Thursday 8 January 2009

Handling a Common strcmp() Case

In this section of the newsletter we will present some practical performance tips for improving code speed and reducing memory usage. Some of these tips will be useful only for C++ code and some will be more general and applicable to C or other languages.

As a first example, consider an application using C-style strings and functions such as strcmp(). A recent experience with this sort of application involved a function that does word stemming, that is, takes words such as "motoring" and reduces them to their root stem, in this case "motor".

In profiling this function, it was observed that much of the overall time was being spent in the strcmp() function. For the C++ compiler in question (Borland 3.1), this function is written in assembly language and is quite fast, and attempts to speed it up by unrolling the equivalent code locally at the point of function call will typically result in slowing things down.

But it's still the case that calling a function, even one implemented in assembly language, has some overhead, which comes from saving registers, manipulating stack frames, actual transfer of control, and so on. So it might be worth trying to exploit a common case -- the case where you can determine the relationship of the strings by looking only at the first character.

So we might use an inline function in C++ to encapsulate this logic:

inline int local_strcmp(const char* s, const char* t)
{
return (*s != *t ? *s - *t : strcmp(s, t));
}

If the first characters of each string do not match, there's no need to go further by calling strcmp(); we already know the answer.

Another way to implement the same idea is via a C macro:

#define local_strcmp(s, t) ((s)[0] != (t)[0] ? (s)[0] - (t)[0] : \
strcmp((s), (t)))

This approach has a couple of disadvantages, however. Macros are hard to get right because of the need to parenthesize arguments so as to avoid subtly wrong semantics. Writing local_strcmp() as a real function is more natural.

And macros are less likely to be understood by development tools such as browsers or debuggers. Inline functions are also a source of problems for such tools, but they at least are part of the C++ language proper, and many C++ compilers have a way of disabling inlining to help address this problem.

How much speedup is this approach good for? In the word stemming program, for input of about 65000 words, the times in seconds were:

strcmp() 9.7

inline local_strcmp() 7.5

#define local_strcmp() 7.5

or a savings of about 23%. Obviously, this figure will vary with the compiler and the application.

This particular speedup is achieved by exploiting a common case -- the case where the first letters of two strings are different. For applications involving English words, this is often a good assumption. For some other types of strings, it may not be.

0 comments: