Sunday, October 7, 2012

C++'s bad things for performance critical applications

One of the most often disputed things about C++ are its exceptions, virtuals, and runtime type information (RTTI).  Some of the arguments centered around these are often subjected to bias, and I should disclaimer that this article may be subjected to my bias as well.  But I'd like to take the time and reflect a few of the common statements about these thing.

  1. Virtual functions are slow!
  2. Exceptions are slow!
  3. RTTI is slow!

Virtual functions are slow:

Back when C++ introduced virtual functions the state of computing in general was (for a lack of a better word) shitty.  The cost of a virtual table look up on a 100MHz CPU was negligible, but a general rule of thumb now is if it cost more than allocating from main memory, it's slow.  Virtual functions aren't that slow, they're a little more expensive than a non-virtual member function, if they're called in a predictable pattern, if that pattern changes at any-time you will get a miss-prediction penalty of 10 to 30 clock cycles [reference] (page 44 of Agners Frog's "Optimizing Software in C++" ).  This performance penalty isn't really terrible on modern computers, but if you find yourself working with embedded systems or need to barge a little more speed out of your application you might want to reconsider your use of virtual functions.

Exceptions are slow:

Exceptions are a really handy feature to have in C++, they've got their wonderful uses, but again suffer from some serious overhead and might not work well in performance critical situations.  To understand why they're so slow let me explain to you how a typical implementation of exception handling (which is used in visual studio) works. I should note that no compiler for C++ currently implements exceptions in a zero-overhead fashion (that actually works), for both visual studio and GCC the compiler adds prologue and epilogue instructions to track information about the currently executing scope.  This information allows for faster stack unwinding, although this has one problem and that is, it greatly imposes a larger run time overhead when a exception isn't thrown.  In visual studios explicit case, the compiler creates hidden local variables in every function that references back to an exception handler function, and a table of data specific for that function [reference] ("How a C++ compiler implements exception handling.").  The overhead of exception handling in this example for when there is an absence of a throw is literally the cost of registering a handler as each function is entered, as well as the unregistering when the function is left.  If you observe some visual studio produced assembler you can instantly see that's about three push instructions, two movs for the prologue, and about three more movs for the epilogue.  This is (as you can already tell) rather expensive.  Of course there is yet another problem we derive from all this stuff the compiler splices into the instruction stream, that is cache locality.  With the presence of all these exception handlers (scattered through the instruction space) our cache locality is severely affected.  So when you consider all the variables in this particular example, the overhead of exception handling when an exception is thrown is the overall cost of unwinding the stack, iterating over all the function information tables for every function.  Each of these info tables must also be searched to find stack objects that require destruction, and to check the appropriate type catch blocks for what ever was thrown, handle the destruction accordingly and jump to that scope.  This is all very slow.  So you might want to reconsider your use of exceptions.

Runtime Type Information is slow:

RTTI is often abused for things like dynamic_cast, a clever cast that does a type check to see if the cast between two types is legal, however some might not recognize that this operation is very expensive both in time and space complexity.  The space cost itself for RTTI is very high, literally every class that has at least one virtual method will be given a vtable with some information about it's name, and information about it's base classes, this information grows exponentially for deep hierarchies, as you can already tell.  There is no point in further discussing this, RTTI is just plain wrong.