Tuesday, May 5, 2015

Big-O Complexity != Actual Performance

Any easy mistake to make is to assume that because one algorithm has a better Big-O complexity, it must be faster to execute. Similarly, two algorithms with the same Big-O, tackling the same problem, and all things similar may have vastly different performance.

Consider for example a "unique string" algorithm, where you want to check if a string contains all unique characters. A naive approach to the problem, which does not use extra data structures, is to go through the string, iterating each letter against the entire string searching for a match. This approach is O(n^2).

int strunique_naive(const char *str)
{
    char *iter, *tmp;

    for (iter = (char *)str; *iter != '\0'; ++iter)
        for (tmp = (char *)str; *tmp != '\0'; ++tmp)
            if (*tmp == *iter)
                return 0;

    return 1;
}


Now consider a better approach to the problem. We can optimize the algorithm by adding an offset for the iterating over the string part, since for example in a short string there is no reason to test the last character with the first character as the first character already performed that test during its loop-through.

int strunique_better(const char *str)
{
    size_t offset = 0;    /* offset at 0 */
    char *iter, *tmp;

    for (iter = (char *)str; *iter != '\0'; ++iter, ++offset)
        for (tmp = (char *)str + offset; *tmp != '\0'; ++tmp)
            if (*tmp == *iter)
                return 0;

    return 1;
}

int strunique_best(const char *str)
{
    size_t offset = 1;   /* pre-emptively shift offset up */

    /* repeat code above */
}


When we calculate the number of loops for the worst-case scenario (i.e. a string that is unique) for the naive algorithm, we get n^2 loops. The better algorithm is done in (n^2 + n) / 2, and the best is n * (n - 1) / 2 loops. The following table shows the differences:

# of Loops on Worst Case

strunique_naivestrunique_betterstrunique_best
1 character110
2 characters431
3 characters963
5 characters251510
10 characters1005545
100 characters10,0005,0504,950


So why can you not trust Big-O complexity for actual performance? All 3 algorithms are actually O(n^2)! The better and best algorithms just have an improved actual performance equation.

Algorithm Benchmarks

strunique_naivestrunique_betterstrunique_best
Big-O ComplexityO(n^2)O(n^2)O(n^2)
Actual Performancen^2(n^2 + n) / 2n * (n - 1) / 2