Tuesday, October 12, 2010

Optimization - 03

Optimization - Inline Functions in C
  1. Introduction
GNU C (and some other compilers) had inline functions long before standard C introduced them (in the 1999 standard). The point of making a function inline is to hint to the compiler that it is worth making some form of extra effort to call the function faster than it would otherwise - generally by substituting the code of the function into its caller. As well as eliminating the need for a call and return sequence, it might allow the compiler to perform certain optimizations between the bodies of both functions.
Sometimes it is necessary for the compiler to emit a stand-alone copy of the object code for a function even though it is an inline function - for instance if it is necessary to take the address of the function, or if it can't be inlined in some particular context, or (perhaps) if optimization has been turned off. (And of course, if you use a compiler that doesn't understand inline, you'll need a stand-alone copy of the object code so that all the calls actually work at all.)
There are various ways to define inline functions; any given kind of definition might definitely emit stand-alone object code, definitely not emit stand-alone object code, or only emit stand-alone object code if it is known to be needed. Sometimes this can lead to duplication of object code, which is a potential problem for the following reasons:
  • It wastes space.
  • It can cause pointers to what is apparently the same function to compare not equal to one another.
  • It might reduce the effectiveness of the instruction cache. (Although inlining might do that in other ways too.)
  1. Motivation
Inline expansion is used to eliminate the time overhead when a function is called. It is typically used for functions that execute frequently. It also has a space benefit for very small functions, and is an enabling transformation for other optimizations.
Without inline functions, however, the compiler decides which functions to inline. The programmer has little or no control over which functions are inlined and which are not. Giving this degree of control to the programmer allows her/him to use application-specific knowledge in choosing which functions to inline.
In some computer languages inline functions interact intimately with the compilation model. In C++, for example, it is necessary to define an inline function in every module that uses it, whereas an ordinary function must be defined in only a single module. This makes it possible to compile a single module independently of all other modules.
  1. Microsoft Visual C++ Specific
Microsoft Visual C++ and a few other compilers support non-standard constructs for defining inline functions, such as the __inline and __forceinline specifiers.
  1. __inline
The __inline keyword is equivalent to inline.
  1. __forceinline
The __forceinline keyword allows the programmer to force the compiler to inline the function, but indiscriminate use of __forceinline can result in larger code (a bloated executable file), little or no performance gain, and in some cases even performance losses. The compiler cannot inline the function in all circumstances, even with the __forceinline keyword applied. If the compiler cannot inline a function declared with __forceinline, a level 1 warning is generated.
  1. cases when __forceinline will not take effect
  • The function or its caller is compiled with /Ob0 (the default option for debug builds).
  • The function and the caller use different types of exception handling (C++ exception handling in one, structured exception handling in the other).
  • The function has a variable argument list.
  • The function uses inline assembly, unless compiled with /Og, /Ox, /O1, or /O2.
  • The function is recursive and not accompanied by #pragma inline_recursion(on). With the pragma, recursive functions are inlined to a default depth of 16 calls. To reduce the inlining depth, use the inline_depth pragma.
  • The function is virtual and is called virtually. Direct calls to virtual functions can be inlined.
  • The program takes the address of the function and the call is made via the pointer to the function. Direct calls to functions that have had their address taken can be inlined.
  • The function is also marked with the naked __declspec modifier.
  1. __forceinline is useful if:
  • inline or __inline is not respected by the compiler (ignored by compiler cost/benefit analyzer)
  • code portability is not required
  • inlining results in a necessary performance boost
  1. Example of portable code
#ifdef _MSC_VER
#define INLINE __forceinline // use __forceinline (VC++ specific)
#else
#define INLINE inline // use standard inline
#endif

INLINE void helloworld() { /* function body */ }
  1. C99 Inline Rules
The specification for inline is section 6.7.4 of the C99 standard (ISO/IEC 9899:1999). Unfortunately this isn't freely available. The following possibilities exist:
  1. A Standalone Function
A function where all the declarations (including the definition) mention inline and never extern. There must be a definition in the same translation unit. The standard refers to this as an inline definition. No stand-alone object code is emitted, so this definition can't be called from another translation unit.
You can have a separate (not inline) definition in another translation unit, and the compiler might choose either that or the inline definition.
Such functions may not contain modifiable static variables, and may not refer to static variables or functions elsewhere in the source file where they are declared.
  1. In this example, all the declarations and definitions use inline but not extern:
// a declaration mentioning inline
inline int max(int a, int b);

// a definition mentioning inline
inline int max(int a, int b) {
    return a > b ? a : b;
}
The function won't be callable from other files; if another file has an external definition, that one might be used instead.
The standard is vague on this point. It says that the inline definition does not forbid an external definition elsewhere, but then that it provides an alternative to an external definition. Unfortunately this doesn't really make clear whether this external definition must actually exist.
In practice, unless you are torture-testing your compiler, it will exist: if you wanted to keep your inline function entirely private to one translation unit, you would make it static inline.
  1. A Function with extern
A function where at least one declaration mentions inline, but where some declaration doesn't mention inline or does mention extern. There must be a definition in the same translation unit. Stand-alone object code is emitted (just like a normal function) and can be called from other translation units in your program.
The same constraint about statics above applies here, too.
  1. In this example all the declarations and definitions use inline but one adds extern:
// a declaration mentioning extern and inline
extern inline int max(int a, int b);

// a definition mentioning inline
inline int max(int a, int b) {
    return a > b ? a : b;
}
  1. In this example, one of the declarations does not mention inline:
// a declaration not mentioning inline
int max(int a, int b);

// a definition mentioning inline
inline int max(int a, int b) {
    return a > b ? a : b;
}
In either example, the function will be callable from other files.
  1. A Function with static
A function defined static inline. A local definition may be emitted if required. You can have multiple definitions in your program, in different translation units, and it will still work. Just dropping the inline reduces the program to a portable one (all other things being equal).
This is probably useful primarily for small functions that you might otherwise use macros for. If the function isn't always inlined then you get duplicate copies of the object code, with the problems described above.
A sensible approach would be to put the static inline functions in either a header file if they are to be widely used or just in the source files that use them if they are only ever used from one file.
  1. In this example the function is defined static inline:
// a definition using static inline
static inline int max(int a, int b) {
    return a > b ? a : b;
}
The first two possibilities go together naturally. You either write inline everywhere and extern in one place to request a stand-alone definition, or write inline almost everywhere but omit it exactly once to get the stand-alone definition.
main is not allowed to be an inline function.
(If you think I've misinterpreted these rules, please let me know!)
(C++ is stricter: a function which is inline anywhere must be inline everywhere and must be defined identically in all the translation units that use it.)
  1. GNU C Inline Rules
The GNU C rules are described in the GNU C manual, which is included with the compiler. This is freely available if you follow links from e.g. http://gcc.gnu.org. The following possibilities exist:
  1. A Standalone Function
A function defined with inline on its own. Stand-alone object code is always emitted. You can only write one definition like this in your entire program. If you want to use it from translation units other than the one where it is defined, you put a declaration in a header file; but it would not be inlined in those translation units.
This is of rather limited use: if you only want to use the function from one translation unit then static inline below makes more sense; if not, then you probably want some form that allows the function to be inlined in more than one translation unit.
However it does have the advantage that by defining away the inline keyword, the program reduces to a portable program with the same meaning (provided no other non-portable constructions are used).
  1. A Function with extern
A function defined with extern inline. Stand-alone object code is never emitted. You can have multiple such definitions and your program will still work. However, you should add a non-inline definition somewhere too, in case the function is not inlined everywhere. This provides sensible semantics (we can avoid duplicate copies of the functions' object code) but is a bit inconvenient to use.
One approach to using this would be to put the definitions in a header file, surrounded by a #if that expands to true either when using GNU C, or when the header has been included from the file that is to contain the emitted definitions (whether or not using GNU C). In the latter case the extern is omitted (for instance by writing EXTERN and #define-ing that to either extern or nothing). The #else branch would contain just declarations of the functions, for non-GNU compilers.
  1. A Function with static
A function defined with static inline. Stand-alone object code may be emitted if required. We can have multiple definitions in our program, in different translation units, and it will still work. This is the same as the C99 rules.
As of release 4.3, GNU C supports the C99 inline rules mentioned above and defaults to them with the -std=c99 or -std=gnu99 options. The old rules can be requested in new compilers with the -fgnu89-inline option or via the gnu_inline function attribute.
If the C99 rules are in force then GCC will define the __GNUC_STDC_INLINE__ macro. Since GCC 4.1.3, it will define __GNUC_GNU_INLINE__ if the GCC-only rules are in use, but older compilers use these rules without defining either macro. You could normalize the situation with a fragment like this:
#if defined __GNUC__ && !defined __GNUC_STDC_INLINE__ && !defined __GNUC_GNU_INLINE__
# define __GNUC_GNU_INLINE__ 1
#endif
  1. Strategies For Using Inline Functions
These rules suggest several possible models for using inline functions in more or less portable ways.
  1. A Simple Portable Model
Use static inline (either in a common header file or just in one file). If the compiler needs to emit a definition (e.g. to take its address, or because it doesn't want to inline some call) then you waste a bit of space; if you take the address of the function in two translation units then the two addresses won't compare equal.
For instance, in a header file:
static inline int max(int a, int b) {
    return a > b ? a : b;
}
You can support legacy compilers (i.e. anything without inline) via -Dinline="", although this may waste space if the compiler does not optimize out unused functions.
  1. A GNU C Model
Use extern inline in a common header and provide a definition in a .c file somewhere, perhaps using macros to ensure that the same code is used in each case. For instance, in the header file:
#ifndef INLINE
# define INLINE extern inline
#endif
INLINE int max(int a, int b) {
    return a > b ? a : b;
}
...and in exactly one source file:
#define INLINE
#include "header.h"
Supporting legacy compilers is awkward unless you don't mind wasting space and having multiple addresses for the same function; you need to restrict the definitions to a single translation unit (with INLINE defined to the empty string) and add some external declarations in the header file.
  1. A C99 Model
Use inline in a common header, and provide definitions in a .c file somewhere, via extern declarations. For instance, in the header file:
inline int max(int a, int b) {
    return a > b ? a : b;
}
...and in exactly one source file:
#include "header.h"
extern int max(int a, int b);
To support legacy compilers, you have to swap the whole thing around so that the declarations are visible in the common header and the definitions are restricted to a single translation unit, with inline defined away.
  1. A Complicated Portable Model
Use macros to choose either extern inline for GNU C, inline for C99, or neither for a definition. For instance, in the header:
#ifndef INLINE
# if __GNUC__ && !__GNUC_STDC_INLINE__
#  define INLINE extern inline
# else
#  define INLINE inline
# endif
#endif
INLINE int max(int a, int b) {
    return a > b ? a : b;
}
...and in exactly one source file:
#define INLINE
#include "header.h"
Supporting legacy compilers has the same issues as the GNU C model.
  1. Problems With Inline Functions
Besides the problems associated with inline expansion in general, inline functions as a language feature may not be as valuable as they appear, for a number of reasons:
  • Often, a compiler is in a better position than a human to decide whether a particular function should be inlined. Sometimes the compiler may not be able to inline as many functions as the programmer indicates.
  • An important point to note is that the code (of the inline function) gets exposed to the calling function.
  • As functions evolve, they may become suitable for inlining where they were not before, or no longer suitable for inlining where they were before. While inlining or un-inlining a function is easier than converting to and from macros, it still requires extra maintenance which typically yields relatively little benefit.
  • Heavy use of inline functions in native C-based compilation systems can increase compilation time, since the intermediate representation of their bodies is copied into each call site where they are inlined. The potential increase in code size is mirrored by a potential increase in compilation time.
  • The specification of inline in C99 requires exactly one external definition of the function, in a separate compilation unit, in addition to the inline definition (which can occur multiple times in different compilation units), if that function is used somewhere. That can easily lead to linker errors when the programmer has not provided such a definition. By contrast, the C++ specification of inline does not require an additional non-inline definition of such a function. For this reason, inline in C99 is often used together with static, which gives the function internal linkage.
