Linux Programming: C++ Game Program Optimization

Source: Internet
Author: User
In general, C++ games are more reusable and maintainable than C programs. But is that really worth it? Can complex C++ compete with traditional C programs in terms of speed?
If you have a good compiler and a solid understanding of the language, it is quite possible to write efficient game code in C++. This article describes several typical techniques you can use to speed up your game. It assumes that you are already convinced of the benefits of using C++ and that you are reasonably familiar with the basic concepts of optimization.
The first basic concept, and one that bears repeating, is the importance of profiling. Without profiling, programmers tend to make two kinds of mistakes. The first is optimizing the wrong code. If a piece of code does not account for a significant share of the running time, making it faster is a waste of effort. Intuition about which code matters is unreliable, and the only way to know is to measure directly. The second is that programmers sometimes "optimize" code in ways that actually make it slower. This is a particular problem in C++, where a single innocent-looking line can generate a large amount of machine code. Examine your compiler's output regularly, and profile it.
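You do not need a full profiler to start measuring. The minimal sketch below uses a hypothetical `ScopedTimer` class built on `std::chrono::steady_clock` to time whatever runs in its enclosing scope. `SumTo` is only a stand-in workload. A sampling profiler will tell you much more, but even rough numbers like these are better than intuition.

```cpp
#include <chrono>
#include <cstdio>

// Hypothetical helper: measures wall-clock time for its enclosing scope and
// stores the result (in milliseconds) when the scope ends.
class ScopedTimer {
public:
    ScopedTimer(const char* label, double& outMs)
        : mLabel(label), mOutMs(outMs), mStart(std::chrono::steady_clock::now()) {}

    ~ScopedTimer() {
        auto end = std::chrono::steady_clock::now();
        mOutMs = std::chrono::duration<double, std::milli>(end - mStart).count();
        std::printf("%s: %.3f ms\n", mLabel, mOutMs);
    }

private:
    const char* mLabel;
    double& mOutMs;
    std::chrono::steady_clock::time_point mStart;
};

// Stand-in workload to be timed.
long long SumTo(long long n) {
    long long total = 0;
    for (long long i = 1; i <= n; ++i)
        total += i;
    return total;
}
```

To use it, wrap the suspected hot spot in a block and declare a `ScopedTimer` at the top of that block. The timer's destructor records the time when the block exits.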

1. Object Construction and Destruction
Object construction and destruction are among the core concepts of C++ and a major source of compiler-generated code. Carelessly designed programs often spend a lot of time calling constructors, copying objects, and initializing temporary objects. Fortunately, common sense and a few simple rules can make object-heavy code run nearly as fast as C.
Do not construct objects until you need them
The fastest code is code that never runs. Why create an object you are not going to use? Consider the following code:

void Function(int arg)
{
    Object obj;
    if (arg == 0)
        return;
    // ...
}

Even when arg is 0, we pay the cost of calling Object's constructor. If arg is often 0, and especially if Object itself allocates memory, the waste adds up. The obvious solution is to move the definition of obj after the check.
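Here is a minimal sketch of that fix. It uses a hypothetical `Object` whose constructor only counts how often it runs, so the difference can be observed:

```cpp
// Hypothetical Object: its constructor just counts how often it runs.
struct Object {
    static int constructions;
    Object() { ++constructions; }
    void DoWork(int) {}
};
int Object::constructions = 0;

void FunctionEager(int arg) {
    Object obj;          // constructed even when we return immediately
    if (arg == 0)
        return;
    obj.DoWork(arg);
}

void FunctionLazy(int arg) {
    if (arg == 0)        // do the cheap check first
        return;
    Object obj;          // constructed only when it will actually be used
    obj.DoWork(arg);
}
```

When `arg == 0`, `FunctionEager` still constructs and destroys an `Object`, while `FunctionLazy` constructs none.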
Be careful when defining complex variables inside loops. If you follow the construct-only-when-needed rule and build a complex object inside a loop, you pay for a constructor (and a destructor) on every iteration. It is better to construct the object once, outside the loop. If a function called in an inner loop constructs an object on the stack, consider constructing that object outside the loop and passing it in by reference.
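The sketch below applies both suggestions, with a `std::vector` standing in for the complex object. The hypothetical helper `CollectEven` writes into a buffer passed by reference instead of building a new vector on each call. The loop in `CountEvenAcrossFrames` constructs that buffer only once. Because `std::vector::clear()` leaves the capacity unchanged, later iterations can reuse the same heap block.

```cpp
#include <vector>

// Hypothetical helper: collects the even values of 'input' into 'out'.
// 'out' is supplied by the caller, so no vector is constructed here.
void CollectEven(const std::vector<int>& input, std::vector<int>& out) {
    out.clear();                      // keeps capacity, so usually no new allocation
    for (int v : input)
        if (v % 2 == 0)
            out.push_back(v);
}

int CountEvenAcrossFrames(const std::vector<std::vector<int>>& frames) {
    std::vector<int> scratch;         // constructed once, outside the loop
    int total = 0;
    for (const auto& frame : frames) {
        CollectEven(frame, scratch);  // reused on every iteration via a reference
        total += static_cast<int>(scratch.size());
    }
    return total;
}
```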

1.1 Use initialization lists
Consider the following class:

#include <string>

class Vehicle
{
public:
    Vehicle(const std::string& name)
    {
        mName = name;   // mName was already default-constructed; this is an assignment
    }
private:
    std::string mName;
};

Member variables are constructed before the body of the constructor runs. As a result, this code first calls the default constructor of the string mName and then calls operator= to copy the value. A typical drawback in this case is that, in some implementations, the string's default constructor allocates memory, possibly much more than is actually needed. The following code is better because it avoids the call to operator=. A non-default constructor can also be more efficient because it is given more information. In addition, the compiler may be able to optimize a constructor whose body is empty.

#include <string>

class Vehicle
{
public:
    Vehicle(const std::string& name) : mName(name)   // copy-constructs mName directly
    {}
private:
    std::string mName;
};
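To make the difference visible, the sketch below replaces `std::string` with a hypothetical `Tracer` type that counts which special member functions run. Assigning in the constructor body costs a default construction plus an assignment. The initialization list costs a single copy construction.

```cpp
// Hypothetical member type that records which special members are called.
struct Tracer {
    static int defaultCtors, copyCtors, copyAssigns;
    Tracer() { ++defaultCtors; }
    Tracer(const Tracer&) { ++copyCtors; }
    Tracer& operator=(const Tracer&) { ++copyAssigns; return *this; }
    static void Reset() { defaultCtors = copyCtors = copyAssigns = 0; }
};
int Tracer::defaultCtors = 0;
int Tracer::copyCtors = 0;
int Tracer::copyAssigns = 0;

class AssignInBody {
public:
    AssignInBody(const Tracer& t) { mT = t; }   // default ctor, then operator=
private:
    Tracer mT;
};

class InitList {
public:
    InitList(const Tracer& t) : mT(t) {}        // copy ctor only
private:
    Tracer mT;
};
```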

1.2 Pre-increment or post-increment (that is, ++i or i++)
When you write x = y++, the increment operator has to make a copy of y's original value, increment y, and then return the original value. Post-increment therefore constructs a temporary object, and pre-increment does not. For built-in integers this costs nothing extra, but for user-defined types it is wasteful. Use pre-increment whenever possible. Loop variables are where this situation comes up most often.
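The sketch below shows the conventional way to write both operators for a hypothetical user-defined type, `Counter`, whose copy constructor counts copies. The postfix form has to copy the object before incrementing it. The prefix form modifies the object in place and returns a reference.

```cpp
// Hypothetical user-defined type whose copy constructor counts copies.
class Counter {
public:
    static int copies;

    Counter() = default;
    Counter(const Counter& other) : mValue(other.mValue) { ++copies; }
    Counter& operator=(const Counter&) = default;

    // Pre-increment: modify in place and return a reference. No copy.
    Counter& operator++() {
        ++mValue;
        return *this;
    }

    // Post-increment: save the old value in a temporary and return it by value.
    Counter operator++(int) {
        Counter old(*this);   // the extra copy that pre-increment avoids
        ++mValue;
        return old;
    }

    int Value() const { return mValue; }

private:
    int mValue = 0;
};
int Counter::copies = 0;
```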
Overloaded operators that return their result by value are common in C++. Vector addition, for example, is often written like this:

Vector operator+(const Vector& v1, const Vector& v2);

This operator must return a new Vector object, and it must return it by value. It lets you write expressions such as v = v1 + v2. However, constructing a temporary and copying it costs too much for an operation called as often as vector addition. Sometimes the code can be arranged so that the compiler optimizes the temporary away (this is called the return value optimization). In the general case, though, you may be better off swallowing your pride and writing code that is uglier but faster:

void Vector::Add(const Vector& v1, const Vector& v2);

Note that operator+= does not have the same problem. It modifies its first operand in place and does not need to return a temporary object. So prefer += over + wherever possible.
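The sketch below puts the three forms side by side on a hypothetical three-component `Vector`, assumed for illustration to be a plain aggregate of three floats. `Add` and `operator+=` write directly into an existing object. `operator+` is implemented in terms of `Add`, but it still returns a new object by value.

```cpp
// Hypothetical minimal 3D vector, used only for illustration.
struct Vector {
    float x, y, z;

    // Writes v1 + v2 into *this; no temporary Vector is created.
    void Add(const Vector& v1, const Vector& v2) {
        x = v1.x + v2.x;
        y = v1.y + v2.y;
        z = v1.z + v2.z;
    }

    // Modifies *this in place and returns a reference, so again no temporary.
    Vector& operator+=(const Vector& v) {
        x += v.x;
        y += v.y;
        z += v.z;
        return *this;
    }
};

// Convenient, but returns a new Vector by value (unless the compiler applies RVO).
Vector operator+(const Vector& v1, const Vector& v2) {
    Vector result;
    result.Add(v1, v2);
    return result;
}
```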

1.3 Use lightweight constructors
In the previous example, should the Vector constructor initialize its elements to 0? The same question may come up for several classes in your code. If the answer is yes, every construction pays the initialization cost, whether it is needed or not. Temporary vectors and member variables in particular incur this overhead for nothing.
A good compiler may be able to remove the redundant code, but why take the risk? As a general rule, you want constructors to initialize all member variables, because uninitialized data leads to bugs. However, in small classes that are instantiated frequently, and especially in classes used for temporaries, you should be prepared to bend this rule for the sake of efficiency. The prime candidates are the vector and matrix classes found in many games. These classes should provide methods for setting themselves to zero and to identity, but their default constructors should be empty.
A corollary is that you should give such classes additional constructors. Suppose the Vehicle class from our second example were written as follows:
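The sketch below shows what such classes might look like, using hypothetical `Vec3` and `Matrix3` types. The default constructors deliberately do nothing. Callers that need a known value call `SetZero()` or `SetIdentity()`. Reading a member before it has been set is undefined behavior, which is the risk you accept by making this trade-off.

```cpp
// Hypothetical math types with deliberately empty default constructors.
struct Vec3 {
    float x, y, z;
    Vec3() {}                                          // no initialization cost
    Vec3(float ax, float ay, float az) : x(ax), y(ay), z(az) {}
    void SetZero() { x = y = z = 0.0f; }
};

struct Matrix3 {
    float m[3][3];
    Matrix3() {}                                       // left uninitialized on purpose
    void SetZero() {
        for (auto& row : m)
            for (float& v : row)
                v = 0.0f;
    }
    void SetIdentity() {
        SetZero();
        m[0][0] = m[1][1] = m[2][2] = 1.0f;
    }
};
```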

class Vehicle
{
public:
    Vehicle()
    {
    }
    void SetName(const std::string& name)
    {
        mName = name;
    }
private:
    std::string mName;
};

With this version, we would pay to construct mName and then pay again to set its value later with SetName. Similarly, using the copy constructor is cheaper than constructing an object and then calling operator=. Prefer constructing an object like this: Vehicle v1(v2); rather than like this:

Vehicle v1; v1 = v2;

If you want to stop the compiler from copying objects for you, declare the copy constructor and operator= as private and do not implement them. Any attempt to copy the object will then cause a compile-time error. It is also good practice to declare single-argument constructors explicit unless you intend them to be used for type conversion. This prevents the compiler from silently creating hidden temporary objects during type conversions.
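The sketch below shows both habits applied to the `Vehicle` class. The private, undefined copy operations are the idiom described above. C++11 and later can state the same intent with `= delete`, shown here on a hypothetical `Texture` class. The `static_assert`s check at compile time that copying and implicit conversion are rejected.

```cpp
#include <string>
#include <type_traits>

class Vehicle {
public:
    // 'explicit' stops a std::string from silently converting into a
    // temporary Vehicle.
    explicit Vehicle(const std::string& name) : mName(name) {}

    const std::string& Name() const { return mName; }

private:
    // Pre-C++11 idiom: declared private and never defined, so copying fails to compile.
    Vehicle(const Vehicle&);
    Vehicle& operator=(const Vehicle&);

    std::string mName;
};

// C++11 and later can express the same intent with '= delete'.
class Texture {
public:
    Texture() = default;
    Texture(const Texture&) = delete;
    Texture& operator=(const Texture&) = delete;
};

static_assert(!std::is_copy_constructible<Vehicle>::value, "Vehicle must not be copyable");
static_assert(!std::is_convertible<std::string, Vehicle>::value, "no implicit conversion");
static_assert(!std::is_copy_constructible<Texture>::value, "Texture must not be copyable");
```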

1.4 Preallocate and cache objects
A game usually has a few classes whose objects are allocated and freed frequently, such as weapons. In a C program you would allocate a large array and hand out elements as needed. With a little planning, you can do the same in C++. Instead of constantly constructing and destroying objects, you request a new object from a cache and return old ones to it when you are done. The cache can be implemented as a template that works for any class with a default constructor. A sample cache template can be found on the accompanying CD.
You can fill the cache with objects as they are needed, or preallocate them. If you also need stack-like behavior for these objects (that is, before deleting object X you must delete all objects allocated after X), you can allocate the cache as a single contiguous block of memory.
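The CD sample is not reproduced here. As a rough sketch that assumes nothing beyond a default-constructible `T`, a minimal cache template (given the hypothetical name `ObjectCache`) could keep released objects on a free stack and hand them out again on request:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical cache template: hands out preallocated T objects and takes them back.
// T must be default-constructible. Objects are reused as-is, not reset.
template <typename T>
class ObjectCache {
public:
    explicit ObjectCache(std::size_t preallocate = 0) {
        for (std::size_t i = 0; i < preallocate; ++i)
            mFree.push_back(new T());
    }
    ~ObjectCache() {
        for (T* p : mFree)
            delete p;                 // only objects currently in the cache are owned
    }
    ObjectCache(const ObjectCache&) = delete;
    ObjectCache& operator=(const ObjectCache&) = delete;

    T* Acquire() {
        if (mFree.empty())
            return new T();           // cache empty: fall back to a fresh allocation
        T* p = mFree.back();
        mFree.pop_back();
        return p;
    }

    void Release(T* p) { mFree.push_back(p); }   // back to the cache instead of delete

    std::size_t FreeCount() const { return mFree.size(); }

private:
    std::vector<T*> mFree;
};
```

Objects come back with whatever state they had when they were released, so a real cache would usually reset them. Objects still checked out when the cache is destroyed are not freed by it.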
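Below is a sketch of such a stack-style allocator over one contiguous block, written as a hypothetical `StackAllocator` class. Objects are not freed individually. Instead, the caller records a marker and later rolls back to it, which releases everything allocated since. The allocator hands out raw memory only, so it suits plain data. Objects with destructors would have to be destroyed by the caller first. The `align` argument is assumed to be a power of two.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical LIFO allocator over one contiguous block. Rolling back to a marker
// releases everything allocated after that marker.
class StackAllocator {
public:
    using Marker = std::size_t;

    explicit StackAllocator(std::size_t bytes) : mBuffer(bytes), mTop(0) {}

    void* Allocate(std::size_t size, std::size_t align = alignof(std::max_align_t)) {
        std::uintptr_t base = reinterpret_cast<std::uintptr_t>(mBuffer.data());
        std::uintptr_t cur = base + mTop;
        // Round up to the requested (power-of-two) alignment.
        std::uintptr_t aligned = (cur + align - 1) & ~(static_cast<std::uintptr_t>(align) - 1);
        std::size_t newTop = static_cast<std::size_t>(aligned - base) + size;
        if (newTop > mBuffer.size())
            return nullptr;                 // out of space
        mTop = newTop;
        return reinterpret_cast<void*>(aligned);
    }

    Marker GetMarker() const { return mTop; }
    void FreeToMarker(Marker m) { mTop = m; }   // frees everything allocated after m

private:
    std::vector<unsigned char> mBuffer;     // the single contiguous block
    std::size_t mTop;                       // offset of the first free byte
};
```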

2. Memory Management
C++ applications generally need to pay closer attention to memory-management details than C programs do. In C, all allocation happens explicitly through malloc and free. C++ can also allocate memory implicitly, through the constructors of temporary objects and member variables. Many C++ games need their own memory manager. Because a C++ game performs many allocations, you must watch out for heap fragmentation. One option is to take a hard line: either allocate no memory at all after the game starts, or maintain one large contiguous block that is freed periodically (for example, between levels). On modern machines such strict rules are unnecessary, provided you are careful with your memory usage.
The first step is to overload the new and delete operators. Your own versions can redirect the game's most frequent allocations away from malloc and into preallocated memory blocks. For example, suppose you find that at most 10,000 4-byte allocations are outstanding at any time. You can then allocate 40,000 bytes up front and hand out pieces as needed. To track which blocks are free, maintain a free list in which each free block points to the next free block. To allocate, remove the block at the front of the list. To free, push the block back onto the front. The figure below shows how the free list evolves over a series of allocations and frees within a contiguous memory block.


[Figure: A linked free list]
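Below is a minimal sketch of the free-list scheme, using a hypothetical `FixedPool` template and a hypothetical `Particle` class. Each free block stores a pointer to the next free block. Allocation pops the front block, and freeing pushes the block back onto the front. `Particle` overloads its own `operator new` and `operator delete`, so only its allocations are redirected into the pool and the global allocator is left untouched.

```cpp
#include <cstddef>
#include <new>

// Hypothetical fixed-size block pool. Free blocks are linked through their own
// storage: each free block holds a pointer to the next free block.
template <std::size_t BlockSize, std::size_t BlockCount>
class FixedPool {
    struct Node { Node* next; };
    // Each block must be able to hold a Node and keep the next block aligned.
    static constexpr std::size_t kRaw = BlockSize > sizeof(Node) ? BlockSize : sizeof(Node);
    static constexpr std::size_t kStride = (kRaw + alignof(Node) - 1) / alignof(Node) * alignof(Node);

public:
    FixedPool() {
        for (std::size_t i = 0; i < BlockCount; ++i)
            Free(mStorage + i * kStride);     // thread every block onto the list
    }
    FixedPool(const FixedPool&) = delete;
    FixedPool& operator=(const FixedPool&) = delete;

    void* Allocate() {
        if (mHead == nullptr)
            return nullptr;                   // pool exhausted
        Node* n = mHead;                      // take the block at the front
        mHead = n->next;
        return n;
    }

    void Free(void* p) {
        mHead = new (p) Node{mHead};          // push the block back onto the front
    }

private:
    alignas(std::max_align_t) unsigned char mStorage[kStride * BlockCount];
    Node* mHead = nullptr;
};

// Hypothetical game class whose allocations go to the pool instead of the heap.
struct Particle {
    float x, y, z, life;
    static void* operator new(std::size_t size);
    static void operator delete(void* p) noexcept;
};

static FixedPool<sizeof(Particle), 1024> gParticlePool;

void* Particle::operator new(std::size_t size) {
    // Sketch assumption: only Particle itself (no larger derived class) uses this.
    void* p = (size == sizeof(Particle)) ? gParticlePool.Allocate() : nullptr;
    if (p == nullptr)
        throw std::bad_alloc();
    return p;
}

void Particle::operator delete(void* p) noexcept {
    if (p != nullptr)
        gParticlePool.Free(p);
}
```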


You will probably find that a game makes many small, short-lived allocations, so you will want to reserve space for many small blocks. Reserving larger blocks in the same way wastes a lot of memory on blocks that are not currently in use. Above a certain size, you should pass allocations on to a separate large-block allocator or directly to malloc().

3. Virtual Functions
Critics of C++ in games often point to virtual functions as a mysterious feature that drains performance. Conceptually, the mechanism is simple. To call a virtual function on an object, the compiler reads the object's virtual function table and fetches the pointer to the member function. It then sets up the call and jumps to that address. A C function call, by comparison, just sets up the call and jumps to a fixed address. The extra cost of a virtual call is the indirection through the virtual function table. Because the target address is not known in advance, the call may also cause a processor cache miss.
Any real C++ program uses many virtual functions, so the main technique is to avoid virtual calls in performance-critical code. Here is a typical example:

class BaseClass
{
public:
    virtual char* GetPointer() = 0;
};

class Class1 : public BaseClass
{
    virtual char* GetPointer();
};

class Class2 : public BaseClass
{
    virtual char* GetPointer();
};

void Function(BaseClass* pObj)
{
    char* ptr = pObj->GetPointer();   // virtual call on every invocation
}

If Function() is performance-critical, we might change GetPointer from a virtual function into an inline function. One way is to add a data member to BaseClass that each derived class sets through a protected setter, and to return that member from an inline accessor:

class BaseClass
{
public:
    inline char* GetPointerFast()
    {
        return mpData;
    }
protected:
    inline void SetPointer(char* pData)
    {
        mpData = pData;
    }
private:
    char* mpData;
};

void Function(BaseClass* pObj)
{
    char* ptr = pObj->GetPointerFast();   // inline, no virtual dispatch
}
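The following self-contained sketch shows how derived classes might use this arrangement. The hypothetical `Class1` and `Class2` constructors each record their pointer once through `SetPointer`. The offset used by `Class2` is arbitrary and exists only so that the two classes behave differently:

```cpp
class BaseClass {
public:
    char* GetPointerFast() const { return mpData; }   // inline, non-virtual
protected:
    BaseClass() : mpData(nullptr) {}
    void SetPointer(char* pData) { mpData = pData; }
private:
    char* mpData;
};

// Hypothetical derived classes: each stores its pointer once, at construction,
// instead of overriding a virtual GetPointer().
class Class1 : public BaseClass {
public:
    explicit Class1(char* buffer) { SetPointer(buffer); }
};

class Class2 : public BaseClass {
public:
    explicit Class2(char* buffer) { SetPointer(buffer + 8); }
};

char* Function(BaseClass* pObj) {
    return pObj->GetPointerFast();   // no virtual dispatch
}
```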

A more radical approach is to restructure your class hierarchy. If Class1 and Class2 differ only slightly, you can merge them into a single class. A flag then indicates whether an object behaves like Class1 or like Class2, and the pure virtual function in BaseClass can be removed. GetPointer can then be written inline, as in the previous example. This kind of flexibility is not elegant. But when an inner loop runs on a machine with little or no cache, you may be willing to make things uglier to get rid of virtual function calls.
Each additional virtual function adds only one pointer to each class's virtual table, which is usually a negligible cost. The first virtual function, however, requires a pointer to the virtual table in every object. So using any virtual function in a very small, frequently used class can add unacceptable overhead. Inheritance generally requires one or more virtual functions (at least a virtual destructor), so you should usually avoid inheritance altogether for small, frequently used objects.
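Below is a sketch of the merged-class approach. It assumes, purely for illustration, that Class1 and Class2 differed only in whether their data was preceded by a 4-byte header. The hypothetical `Buffer` class stores a flag, and `GetPointer` becomes an ordinary inlinable function that uses a branch instead of a virtual call:

```cpp
// Hypothetical merged class replacing Class1 and Class2.
class Buffer {
public:
    enum Kind { kRaw, kWithHeader };   // formerly Class1 and Class2
    static const int kHeaderSize = 4;

    Buffer(Kind kind, char* data) : mKind(kind), mpData(data) {}

    // One predictable branch replaces an indirect call through the vtable.
    char* GetPointer() const {
        return mKind == kRaw ? mpData : mpData + kHeaderSize;
    }

private:
    Kind mKind;
    char* mpData;
};
```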
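A quick way to see the per-object cost on your own compiler is to compare sizes. In the sketch below, the two hypothetical structs hold the same data, but `VirtualPoint` declares a virtual destructor. On mainstream compilers it grows by at least one pointer, plus any padding. The C++ standard itself does not prescribe how virtual dispatch is implemented.

```cpp
#include <cstdio>

struct PlainPoint {       // no virtual functions: just the data
    float x, y;
};

struct VirtualPoint {     // one virtual function adds a hidden vtable pointer per object
    float x, y;
    virtual ~VirtualPoint() {}
};

void PrintPointSizes() {
    std::printf("PlainPoint: %zu bytes, VirtualPoint: %zu bytes\n",
                sizeof(PlainPoint), sizeof(VirtualPoint));
}
```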

4. Code Size
Compilers are notorious for producing bloated code from C++. Memory is limited and small code tends to be fast, so it is important to keep your executable as small as possible. Start by studying your compiler's settings. If your compiler stores debug information in the executable, remove it. (Note that MS VC stores debug information outside the executable, so there it does not matter.) Exception handling generates extra code, so try to eliminate it. Make sure the linker is configured to remove unused functions and classes. Enable the compiler's highest optimization level, and try optimizing for size rather than for speed. Sometimes the resulting improvement in cache hits makes the program run faster overall. (If you use this setting, check that intrinsic functions are still enabled.) Remove the space-wasting strings used only for debug output, and have the compiler merge identical strings into a single instance.
Inlining is usually the chief culprit behind large functions. The compiler is free to honor or ignore your inline keywords, and it may also inline functions you did not mark. This is another reason to keep constructors lightweight: objects on the stack then do not bloat the code through large amounts of inlined code. Also be careful with overloaded operators. Even a simple expression such as m1 = m2 * m3 can generate a lot of inline code if m2 and m3 are matrices. Make sure you thoroughly understand your compiler's inlining settings.
To support run-time type information (RTTI), the compiler must generate some static data for every class. RTTI is usually enabled by default so that code can call dynamic_cast and query an object's type. Consider disabling RTTI and avoiding dynamic_cast to save space. (Moreover, dynamic_cast can be quite expensive in some implementations.) When you genuinely need behavior that depends on an object's type, add a virtual function that behaves differently in each class instead. This is also better object-oriented design. (Note that static_cast is different: it is as efficient as a C-style cast.)
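The sketch below shows one way to strip debug-only strings, using a hypothetical `DEBUG_LOG` macro controlled by a hypothetical `GAME_DEBUG_OUTPUT` build flag. When the flag is not defined, each call expands to nothing, so neither the call nor its format string ends up in the executable. Note that the arguments are not evaluated in that case, so they must not have side effects the program depends on.

```cpp
#include <cstdio>

// Hypothetical logging macro: compiled out entirely unless GAME_DEBUG_OUTPUT is defined.
#ifdef GAME_DEBUG_OUTPUT
#define DEBUG_LOG(...) std::printf(__VA_ARGS__)
#else
#define DEBUG_LOG(...) ((void)0)
#endif

int UpdateScore(int score, int bonus) {
    DEBUG_LOG("UpdateScore: score=%d bonus=%d\n", score, bonus);
    return score + bonus;
}
```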
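Below is a sketch of the virtual-function alternative, using hypothetical `Entity`, `Monster`, and `Pickup` classes. The code does not ask each entity whether it is a `Monster` with `dynamic_cast`. Instead, it asks every entity for its contact damage through a virtual function, which also works when RTTI is disabled:

```cpp
// Hypothetical game entities.
class Entity {
public:
    virtual ~Entity() {}
    virtual int DamageOnContact() const { return 0; }   // default: harmless
};

class Monster : public Entity {
public:
    int DamageOnContact() const override { return 10; }
};

class Pickup : public Entity {};

// RTTI-dependent version, for comparison:
//   if (Monster* m = dynamic_cast<Monster*>(e)) damage += 10;
int TotalContactDamage(Entity* const* entities, int count) {
    int damage = 0;
    for (int i = 0; i < count; ++i)
        damage += entities[i]->DamageOnContact();       // virtual dispatch, no RTTI
    return damage;
}
```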

5. Standard Template Library (STL)
The standard library is a set of templates that implement common data structures and algorithms, such as dynamic arrays (vector), sets, and maps. Using the STL can save you a lot of time you would otherwise spend writing and debugging those containers. As before, though, if you want maximum efficiency from your system, you must pay attention to the implementation details of your STL.
To suit the widest possible range of applications, the STL standard is silent about memory allocation. Each operation on an STL container has a guaranteed complexity. For example, inserting into a set takes O(log n) time. However, there is no guarantee about how much memory a container uses.
Let's look closely at a very common problem in game development. You want to store a group of objects. We will call it an object list, although it need not be stored in an STL list. Usually each object should appear in the list only once, and you don't want to worry about accidentally inserting an element that is already there. An STL set ignores duplicates, and insertion, deletion, and lookup are all O(log n). Is a set a good choice?
Although most set operations are O(log n), there is a hidden danger. Memory usage depends on the implementation, but many implementations are based on red-black trees, in which each node of the tree holds one element of the container. A common approach is to allocate a node when an element is added and free it when the element is removed. Depending on how often you insert and remove elements, the time spent in the memory manager can eat into the benefits of using a set.
An alternative is to store the elements in a vector. A vector guarantees efficient (amortized constant-time) insertion at the end of the container. In practice this means it reallocates memory only occasionally, when it is full, typically growing geometrically (for example, by doubling its capacity). When you use a vector to hold a list of unique elements, you first check whether the element already exists and add it only if it does not. Checking the whole vector takes O(n) time, but the actual time is often small. The vector's elements are stored contiguously in memory, so scanning all of them is cache-friendly. Scanning a set, by contrast, tends to cause cache misses, because the separately allocated nodes of a red-black tree can be scattered all over memory. A set must also store extra bookkeeping data to maintain the tree. If you are storing object pointers, a set may use three to four times as much memory as a vector.
Deletion from a set, at O(log n), looks fast if you ignore the call to free(). Deletion from a vector is O(n), because every element after the deleted one must be moved down one position. If the elements are pointers, this move can be done with a single memmove()-style block copy, which is quite fast. (This is also why object pointers, rather than the objects themselves, are usually stored in STL containers. Storing objects directly causes many extra constructor calls during operations such as deletion.)
Sets and maps are often more trouble than they are worth. If you are not yet convinced, consider iterating over a container, for example:
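Below is a minimal sketch of the vector-based approach, written as a hypothetical `AddUnique` helper. It performs a linear `std::find` over the contiguous elements and calls `push_back` only when the pointer is not already present. Like `std::set::insert`, it silently ignores duplicates.

```cpp
#include <algorithm>
#include <vector>

// Adds pObj to the list only if it is not already present.
// Linear search over contiguous memory: O(n), but cache-friendly.
template <typename T>
bool AddUnique(std::vector<T*>& list, T* pObj) {
    if (std::find(list.begin(), list.end(), pObj) != list.end())
        return false;   // already present: ignore it, as set::insert would
    list.push_back(pObj);
    return true;
}
```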

for (Collection::iterator it = collection.begin(); it != collection.end(); ++it)

If Collection is a vector, ++it is usually just a pointer increment. If Collection is a set or a map, however, ++it means moving to the next node of the red-black tree. This is a comparatively complex operation, and it can easily cause a cache miss, because the tree's nodes may be located almost anywhere in memory.
Of course, if you store a very large number of elements and perform many membership queries, the O(log n) efficiency of a set can more than make up for its memory cost. Likewise, if you only use the container occasionally, the difference in efficiency is negligible. You should measure to find out at what value of n a set becomes faster. You may be surprised to find that a vector outperforms a set in most typical game uses.
That is not the whole story on STL memory usage. Be sure to find out whether a container actually releases its memory when you call clear(). If it does not, memory fragmentation can result. For example, suppose you create an empty vector at the start of the game, add elements during play, and call clear() when the game restarts. The vector will typically not release its memory, so the "empty" vector keeps occupying heap memory and can fragment it. If you really need to structure the game this way, there are two solutions. First, you can call reserve() when you create the vector, reserving enough space for the maximum number of elements you will ever need. If that is not feasible, you can force the vector to release its memory completely:
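Below is a rough sketch of such a measurement, written as a hypothetical `CompareLookups` function. It fills a `std::vector` and a `std::set` with the same n values and then times 2n membership queries against each, half of which are hits. The timings it prints depend heavily on n, the element type, the hardware, and the compiler. Treat it as a starting point for your own tests, not as a result.

```cpp
#include <algorithm>
#include <chrono>
#include <cstdio>
#include <set>
#include <vector>

// Times 2n lookups in a vector (linear search) and a set (tree search) holding
// the same n even values. Returns the hit count, or -1 if the containers disagree.
long long CompareLookups(int n) {
    std::vector<int> vec;
    std::set<int> s;
    for (int i = 0; i < n; ++i) {
        vec.push_back(i * 2);
        s.insert(i * 2);
    }

    using Clock = std::chrono::steady_clock;
    long long vecHits = 0, setHits = 0;

    auto t0 = Clock::now();
    for (int q = 0; q < 2 * n; ++q)
        vecHits += std::find(vec.begin(), vec.end(), q) != vec.end();
    auto t1 = Clock::now();
    for (int q = 0; q < 2 * n; ++q)
        setHits += static_cast<long long>(s.count(q));
    auto t2 = Clock::now();

    using std::chrono::duration_cast;
    using std::chrono::microseconds;
    std::printf("n=%d vector: %lld us, set: %lld us\n", n,
                static_cast<long long>(duration_cast<microseconds>(t1 - t0).count()),
                static_cast<long long>(duration_cast<microseconds>(t2 - t1).count()));
    return vecHits == setHits ? vecHits : -1;
}
```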

std::vector<Object*> v;
// ... elements are inserted into v here
std::vector<Object*>().swap(v);  // causes v to free its memory
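Here is the same technique as a reusable sketch, wrapped in a hypothetical `ReleaseMemory` helper. During the swap, the temporary empty vector takes ownership of `v`'s old buffer. It frees that buffer when the temporary is destroyed at the end of the statement. C++11 also added `shrink_to_fit()`, but it is only a non-binding request, so the swap remains the dependable way to release the block.

```cpp
#include <vector>

// Empties v and releases its heap block: the temporary takes v's old buffer in the
// swap and frees it when it is destroyed at the end of the statement.
template <typename T>
void ReleaseMemory(std::vector<T>& v) {
    std::vector<T>().swap(v);
}
```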

Sets, lists, and maps do not have this problem, because they allocate and free memory separately for each element.

6. Advanced Features
You may not need every feature the language offers. Some features that look simple can be inefficient, while some that look complex perform well. These dark corners of C++ depend on the compiler, so when you use them you must understand what they cost.
The C++ string class is a good example: it is convenient, but it should be avoided where efficiency is critical. Consider the following code:

void Function(const std::string& str)
{
}

Function("hello");

The call to Function() includes a call to the string constructor that takes a const char* argument. In a typical implementation, this constructor performs a malloc(), a strlen(), and a memcpy(). The destructor then immediately does some pointless work (the string is never used in this example), followed by a free(). The memory allocation is wasted, because the string "hello" is already in the program's data segment, so we already have a copy in memory. (Many modern implementations store short strings like this one inline and skip the heap allocation, but the construction and copy still happen.) If Function took a const char* parameter, none of these extra calls would occur. That is the price of the convenience of std::string.
Templates are an example of the opposite: a feature that looks expensive but need not be. According to the language standard, the compiler generates code when a template is instantiated for a specific type. In theory, then, a single template declaration can produce a lot of nearly identical code. If you have a vector of Class1 pointers and a vector of Class2 pointers, you get two copies of vector's code in your executable.
In practice, most compilers do better. First, they generate only the template member functions that are actually used. Second, when the compiler can tell in advance that the code behaves identically, it may generate only one copy. In the vector example, you may find that only one copy of the code (typically for vector<void*>) is generated. With a good compiler, templates can provide the benefits of generic programming while remaining efficient.
Some C++ features, such as initialization lists and pre-increment, generally improve efficiency. Others, such as operator overloading and RTTI, look innocent but can cause serious performance problems. The STL containers show how an algorithm's theoretical running time can lead you astray. Avoid language and library features that are potentially inefficient, and take the time to understand your compiler's options. You will soon learn to design efficient code and to fix performance problems in your game.
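Below is a minimal sketch contrasting the two signatures, using hypothetical `LengthViaString` and `LengthViaPointer` functions. Passing a literal to the first creates a temporary `std::string`. The second receives a pointer to the literal already stored in the program's data segment. (In C++17, `std::string_view` is another option that avoids the copy while still carrying a length.)

```cpp
#include <cstring>
#include <string>

// Takes std::string: a string literal argument first builds a temporary std::string.
std::size_t LengthViaString(const std::string& str) {
    return str.size();
}

// Takes const char*: the literal is passed as a pointer, with no temporary object.
std::size_t LengthViaPointer(const char* str) {
    return std::strlen(str);
}
```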
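If your compiler does not share the code for you, you can get the same effect by hand. The sketch below uses a hypothetical `PtrList` class, a thin-template technique rather than anything the standard library does for you. It stores everything in a single `std::vector<void*>`, so every `PtrList<T>` reuses one container instantiation and adds only inline casts:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical thin wrapper: all PtrList<T> share the std::vector<void*> code;
// the per-type part is just inline casts.
template <typename T>
class PtrList {
public:
    void Add(T* p) { mImpl.push_back(p); }
    T* Get(std::size_t i) const { return static_cast<T*>(mImpl[i]); }
    std::size_t Size() const { return mImpl.size(); }
private:
    std::vector<void*> mImpl;   // the only real container code that gets instantiated
};
```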

7. Acknowledgments
Thanks to Pete Isensee and Christopher Kirmse for reviewing this gem.