Thursday, October 7, 2010

Optimization - 02

Code Optimization - Inline Functions
Inline function is the optimization technique used by the compilers. One can simply prepend inline keyword/specifier to function prototype to make a function inline. This does not change the behavior of a function itself, but instruct compiler to embed the complete body of the function wherever that function is called, instead of being inserted only once and perform a regular call to it. When we declare a function or member function as inline we are trying to avoid the overhead of a function. If done carefully, this can improve the application's performance in exchange for increased compile time and possibly (but not always) an increase in the size of the generated binary executables.

//The format for its declaration is:
inline type name ( arguments ... )
// instructions ...

and the call is just like the call to any other function. You do not have to include the inline keyword when calling the function, only in its declaration.
Most compilers already optimize code to generate inline functions when it is more convenient. This specifier only indicates the compiler that inline is preferred for this function.
  1. Optimizations
C++ inline functions present a potential optimization to code: make a function inline and you save the overhead of a function call, Which is: loading parameters onto the stack, making a jump, popping parameters off the stack, and then it all in reverse when you return.
Modern CPUs have pipelines and attempt to optimize code execution on the fly. A function call can disrupt this pipeline when it is encountered, and again, when it returns. So that, not only we are avoiding the function call, but also we potentially allow the CPU to further improve execution.
Not only this, but a compiler may be able to optimize code which has been inlined in a way it cannot if there is a function call simply because the optimization becomes visible, e.g. optimizing register allocations. In the case of empty functions - fairly typical of constructors - making the function inline may allow the compiler to remove the call altogether.
Sometimes an inline function may save time and space. Making a function call requires instruction codes to make the call both on the part of the caller and callee. On occasions the number of bytes required for this may actually be greater than the number of bytes in the function body itself.
  1. Comparison to Macros
Traditionally, in languages such as C, inline expansion was accomplished at the source level using parameterized macros. Use of true inline functions, as are available in C99, provides several benefits over this approach:
  • Macro invocations do not perform type checking, or even check that arguments are well-formed, whereas function calls usually do.
  • In C, a macro cannot use the return keyword with the same meaning as a function would do (it would make the function that asked the expansion terminate, rather than the macro). In other words, you cannot make the macro return something which is not the result of the last expression invoked inside it.
  • Since C macros use mere textual substitution, this may result in unintended side-effects and inefficiency due to re-evaluation of arguments and order of operations.
  • Compiler errors within macros are often difficult to understand, because they refer to the expanded code, rather than the code the programmer typed.
  • Many constructs are awkward or impossible to express using macros, or use a significantly different syntax. Inline functions use the same syntax as ordinary functions, and can be inlined and un-inlined at will with ease.
  • Debugging information for inlined code is usually more helpful than that of macro-expanded code.
Many compilers can also inline expand some recursive functions; recursive macros are typically illegal.
Bjarne Stroustrup, the designer of C++, likes to emphasize that macros should be avoided wherever possible, and advocates extensive use of inline functions.
  1. Inline as a Better Macro
C programmers used to simulate inline functions with macros. C++ inline functions where in part, an attempt to encode this as a language feature rather than yet another use of the macro pre-processor. However, they are not an exact replacement.
  • If you use a macro the compiler has no desecration, the function will be inlined.
  • Scope rules are different.
  • Macros, particularly multi-line macros are much more difficult to understand and maintain.
In short, inline is an improvement on macro, but they are not a direct replacement.
  1. Catches
With these kinds of optimizations available we might wonder why we don't just make all functions inline. Simply, there are downsides, and in my opinion they out weigh the advantages.
For starters inline is only a hint to the compiler, it is free to ignore it. It will typically ignore it if:
  • The member is a virtual: it's not really possible to have an inline virtual function as virtual implies an indirect function call - although compiler can sometimes do it if it has type information.
  • The function is recursive or mutually recursive with other functions.
  • The compiler will also ignore the inline if it considers the body of the function "too big." How big is too big depends on your compiler.
The Other Inline
C++ inline functions are not the only "inlines" supported by most compilers. It is also possible to place assembler code within C++ code and this is commonly referred to as "inline assembler". It's normally clear which type is being talked about, but it's more difficult when doing a keyword search in MSDN or a search engine to pin down what you want.
The days when most systems contained some inline assembler to improve speed are fast receding. Traditionally used to speed up performance critical sections of a program it's now doubtful that even this is possible. Increasingly, to get the best performance out of a CPU requires you to know a lot of things about memory layout, cache architecture, pipelining, out of order execution and so on. While these rules can be encoded in a good compiler it is hard for an individual to know and apply all these rules.
  • There are a few occasions where inline assembler is still useful though. Microsoft use a little bit in the ATL to "thunk" a function call, and sometimes hard coding a break point in your code (asm { int 03h; } in Intel speak) has it's uses.
  • Inline code will not be generated when using a variable number of arguments to a function call (e.g. printf would not be inlined) or a variable sized data type, e.g.
void foo(int n)
char str[n];
  • Compilers have a limit to the depth they will inline functions, e.g. if inline A calls inline B, which calls inline C and so on. This is often configurable through a pragma or command line switch.
Of course, every compiler will differ slightly and you or I can't guarantee that there isn't a compiler out that can inline some of these things. Conversely, there are more reasons why your compiler will not inline a function that those given here. These are clear-cut cases but there are several other reasons why I avoid using inline functions. The main culprit is: code bloat, inline functions usually make your code bigger because you have duplicated code for each invocation of the function. Apart from needing a bigger hard disc code bloat has several implications:
  • Increased number of CPU cache misses: a CPU may keep a frequently executed function entirely in it's cache, if you increase the function size with lots of inline functions it may be too big for the cache, and because your functions are bigger fewer may be held in cache at once.
  • Disc cache: if every function is inlined the OS may not load the entire program into memory and your disc will start to thrash.
So, don't be surprised if making methods inline actually makes your program slower!
In addition embedded systems are very sensitive to code bloat: a PC may have a big CPU cache, oodles of fast RAM and a big fast hard disc but an embedded system probably won't. In the extreme this may mean you need to fit your device with a bigger ROM, which means it costs more, which in a competitive market may make the difference between success and failure.
  1. And some more catches...
There are catches which have to do less with an individual piece of code and more with how a system hangs together. These design issues mean you should think carefully before making a method inline:
  • Coupling between headers and implementation increases because a change is more likely to occur in the header file, which causes a bigger rebuild because other files depend on your header files. If the code is in the implementation file by contrast only that file needs to be recompiled. On large systems the difference can be hours.
  • Putting the implementation in the header file reduces the opportunity to forward declare classes, so you have to include another header file, increasing coupling; making the code less flexible and more prone to ripple effects - again increasing build times.
Perhaps a more significant point is what a lot of inline functions says about your design. Inlining is typically used forget and set functions. Classes with lots of such functions aren't really hiding anything, they are just containers for values which begs the question, just how object oriented is your design?
Some other points that are worth noting about inlined functions are:
  • Finding code in source files can be time consuming, if you have lots of inline functions you need to look in the .h files as well as the .cpp files. This may seem trivial but on a large system with several thousand files the time adds up.
  • If some of your code is compiled with inlining enabled and some with them disabled you will encounter link errors.
  • Some compilers have an auto-inline function that lets the compiler decide which functions to inline and which to leave as is. Of course, how aggressive this is, and hence how much difference it makes, is dependent on your compiler.
  • Inlining is usually only enabled when optimised code is being generated, so your debug builds won't have any inlined functions.
  • Although inline is commonly thought of as a C++ optimisation it is also available in C, and other languages, although these may need vendor specific extensions.
  • Theoretically each inline function should have one, and only one implementation body. However, it would be possible to provide different bodies in different files. Stroustrup points out in the Design & Evolution of C++ that most compilers don't check for this, we can hope that in the 7 years since he made his comments the situation has improved but I'm not sure. Before anyone think of a use for this "feature" consider the maintenance nightmare and please don't do it.
Seeing inline at work
The most reliable way to see if a function is being inlined or not is to look at the output from the compiler. Most compilers have a switch to output assembler code for your inspection. In Visual C++ there are several /FA which control this - the "listing file type" in the project C++ settings box.
The other switches worth playing with are the /Ob which control the aggressiveness of inline expansion - the "Inline function expansion" option in C++ settings.

It's interesting to experiment with these switches and the following example:

struct Widget {
int number_;
int t() const;

Widget::Widget() : number_(99) { }

int Widget::t() const
return number_;

int main()
Widget w;
int i = w.t();
return i;

It's interesting to make the constructor and t function inline and look at the assembler code, or use the /Ob2 switch to force functions to be inline, at this point the compiler reduces the program to:

push 99
pop eax

The program still comes out at 19,968 bytes, which demonstrates the overhead that even the simplest programs may carry.
  1. When to Use Inlines
Despite all this there are occasions when inlines make sense.
  • Templates based code is always inlined: this is one of the reasons people believe templates lead to bigger executables.
  • Sometimes making a function inline will improve performance for all the reasons given at the start.
The orthodox approach to performance today is to design a system for ease of maintenance, flexibility and extendibility, and only consider performance second. Usually, this means producing a working system and then performance profiling, improving the slowest bit and repeating until the system is fast enough.
When you profile your code you may find a hot spot in a function that could benefit from inlining. This is quite understandable, and at this point in the cycle your can afford to sacrifice some maintainability for speed if it is needed. However, after making the change repeat the performance test to ensure it actually does help.
One compromise is to avoid inlining other than for templates. Then as part of the QA or performance cycle use your compilers auto-inline feature. This compromise is not ducking the issue, or delivering code with sub-optimal performance. In fact, it's taking an engineering line, we build something, measure it, and if necessary improve it. (Stroustrup questions if letting the compiler decide is the best policy.)
  1. Advantages
  • It does not require function calling overhead.
  • It also save overhead of variables push/pop on the stack, while function calling.
  • It also save overhead of return call from a function.
  • It increases locality of reference by utilizing instruction cache.
  • After in-lining compiler can also apply intraprocedural optmization if specified. This is the most important one, in this way compiler can now focus on dead code elimination, can give more stress on branch prediction, induction variable elimination etc..
  1. Disadvantages
  • May increase function size so that it may not fit on the cache, causing lots of cahce miss.
  • After in-lining function if variables number which are going to use register increases than they may create overhead on register variable resource utilization.
  • It may cause compilation overhead as if some body changes code inside inline function than all calling location will also be compiled.
  • If used in header file, it will make your header file size large and may also make it unreadable.
  • If somebody used too many inline function resultant in a larger code size than it may cause thrashing in memory. More and more number of page fault bringing down your program performance.
  • Its not useful for embeded system where large binary size is not preferred at all due to memory size constraints.
  1. Performance
Now covering the topic which most the people are interested in the "Performance".
In most of the cases Inline function boost performance if used cautiously as it saves lots of overhead as discussed in our Advantages section above but as we have also discussed its disadvantages one need to be very cautious while using them. Today's modern compiler inline functions automatically, so no need to specify explicitly in most of the cases. Although placing inline keyword only gives compiler a hint that this function can be optimized by doing in-lining, its ultimately compiler decision to make it inline. Though there are ways to instruct compiler too, for making a function call inline like one can use __forceinline to instruct compiler to inline a function while working with microsoft visual c++. I suggest not to use this keyword until you are very sure about performance gain. Making a function inline may or may not give you performance boost, it all depends on your code flows too. Don't expect a magical performance boost by prepending inline keyword before a function to your code as most of the compiler nowadays does that automatically.
As we have seen inline function serves in terms of performacen but one has to use it with extreme cautions.
  1. Uses Guidelines
  • Always use inline function when your are sure it will give performance.
  • Always prefer inline function over macros.
  • Don't inline function with larger code size, one should always inline small code size function to get performance.
  • If you want to inline a function in class, then prefer to use inkine keyword outside the class with the function definition.
  • In c++, by default member function declared and defined within class get linlined. So no use to specify for such cases.
  • Your function will not be inlined in case there is differences between exception handling model. Like if caller function follows c++ structure handling and your inline function follows structured exception handling.
  • For recursive function most of the compiler would not do in-lining but microsoft visual c++ compiler provides a special pragma for it i.e. pragma inline_recursion(on) and once can also control its limit with pragma inline_depth.
  • If the function is virtual and its called virtually then it would not be inlined. So take care for such cases, same hold true for the use of function pointers.
  1. Conclusion
While inline functions may improve performance seldom will you be able to radically change a program's performance by making a bunch of functions inline. Inlines are not free; they will usually increase your program size and may actually impair performance.
They may also increase your development cycle because they increase coupling between files and lead to longer build times and more ripple effects.
Heavy reliance on inline functions may indicate flaws in your design; either a lack of object-orientedness or failure to design a system with the required performance characteristics.


No comments:

Post a Comment

Note: Only a member of this blog may post a comment.