13 Performance Optimization

Source: Internet
Author: User
 
gprof, a program performance analysis tool on Linux

gprof is typically installed in the /usr/bin directory on Linux. It analyzes your program and shows which parts of it take the most time.

gprof reports how many times each function in the program was called and what percentage of the total run time each function accounts for. This information is useful when you want to improve your program's performance.

To use gprof, you must compile the program with the -pg option. The program will then write a file named gmon.out each time it runs, and gprof uses this file to produce the profiling information. After you have run your program and gmon.out has been generated, you can obtain the profile with the following command:

gprof <program_name>

The program_name parameter is the name of the program that generated the gmon.out file. To illustrate, a count_sum() function is added to the program below to consume CPU time. The program is as follows:

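#include <stdio.h>
#include <stdlib.h>   /* malloc, free */
#include <string.h>   /* strlen */

static void my_print(char *);
static void my_print2(char *);

int main(void)
{
    char my_string[] = "Hello world!";

    my_print(my_string);
    my_print2(my_string);
    my_print(my_string);
    return 0;
}

/* Burns CPU time so that the function shows up in the profile. */
void count_sum(void)
{
    int i;
    long long sum = 0;   /* wide enough that the sum does not overflow */

    for (i = 0; i < 1000000; i++)
        sum += i;
}

static void my_print(char *string)
{
    count_sum();
    printf("The string is %s\n", string);
}

static void my_print2(char *string)
{
    char *string2;
    int size, i;
    long long sum = 0;

    count_sum();
    size = (int)strlen(string);
    string2 = (char *)malloc(size + 1);
    if (string2 == NULL)
        return;

    /* Copy the string in reverse order. */
    for (i = 0; i < size; i++)
        string2[size - 1 - i] = string[i];
    string2[size] = '\0';

    /* Extra work so that my_print2 dominates the profile. */
    for (i = 0; i < 5000000; i++)
        sum += i;

    printf("The string printed backward is %s\n", string2);
    free(string2);
}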

$ gcc -pg -o hello hello.c

$ ./hello

$ gprof hello | more

The following output will be generated:

Flat profile:

Each sample counts as 0.01 seconds.

  %   cumulative   self               self      total
 time   seconds   seconds    calls   us/call   us/call  name
 69.23      0.09     0.09        1  90000.00 103333.33  my_print2
 30.77      0.13     0.04        3  13333.33  13333.33  count_sum
  0.00      0.13     0.00        2      0.00  13333.33  my_print

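From the data above, we can see that my_print() itself takes almost no time (its self seconds value is 0.00). However, it calls count_sum(), so its total time per call is 13333.33 microseconds. The cumulative seconds column is a running total of the self seconds of each function and the functions listed above it, which is why it reaches 0.13 on the last line.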

Tip: gprof generates a large amount of profiling data. If you want to examine it, it is best to redirect the output to a file.

Factors Affecting Software Efficiency

Software performance is determined by two factors: design and coding. Software design is language-independent. Designers must fully understand the main functions of the software and should not expect coding to solve performance problems caused by poor design. Coding also affects performance. For example, you should not place an expression whose value never changes inside a loop, because it may then be recomputed on every iteration.
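The loop example can be sketched as follows. This is only an illustration: the function names scale_naive and scale_hoisted are made up for this sketch, and an optimizing compiler may well perform the same transformation on its own.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Naive version: std::sqrt(scale) does not depend on i, yet it is written
// inside the loop, so it may be evaluated on every iteration.
std::vector<double> scale_naive(const std::vector<double>& in, double scale) {
    std::vector<double> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = in[i] * std::sqrt(scale);
    }
    return out;
}

// Hoisted version: the invariant expression is computed once, before the loop.
std::vector<double> scale_hoisted(const std::vector<double>& in, double scale) {
    std::vector<double> out(in.size());
    const double factor = std::sqrt(scale);
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = in[i] * factor;
    }
    return out;
}
```

Both functions return the same values; only the amount of repeated work inside the loop differs.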

Design affects performance through two factors: algorithms and data structures, and program decomposition. Technically every program is an algorithm, but the term "algorithms and data structures" usually refers to searching, sorting, accessing, compressing, and otherwise operating on large data sets. Algorithms and data structures often affect program performance, but they are not the only factor. Program decomposition means breaking a large task into a series of related sub-tasks, and covers the object architecture, functions, data, and business flow.

The effect of coding on performance can be divided into four sub-factors: the language, the system, the libraries, and the compiler.

C++ supplements and improves on C, but some C++ features sacrifice some performance. This is the language factor.

Many modern operating systems give the impression that memory never runs out, that programs run in parallel, that the CPU serves your program exclusively, and that all memory is accessed uniformly. In reality, even with virtual memory, memory can be exhausted. The CPU does not run your program continuously; all programs take turns, and each one runs for a time slice. Memory access is not uniform either: the hard disk, main memory, and the cache have very different access costs, so performance differs accordingly. On a single-CPU machine, parallel execution often brings no benefit and may even slow the program down.
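The point about non-uniform memory access can be illustrated with a small sketch. The functions sum_row_major and sum_col_major are hypothetical names chosen for this example. Both compute the same sum over a matrix stored row by row in one contiguous vector; the first visits adjacent addresses, while the second jumps by a whole row on each step, which tends to use the cache less effectively on large matrices.

```cpp
#include <cstddef>
#include <vector>

// Row-major traversal: consecutive iterations touch adjacent elements.
long long sum_row_major(const std::vector<int>& m,
                        std::size_t rows, std::size_t cols) {
    long long sum = 0;
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            sum += m[r * cols + c];
    return sum;
}

// Column-major traversal of the same layout: each step skips `cols` elements.
long long sum_col_major(const std::vector<int>& m,
                        std::size_t rows, std::size_t cols) {
    long long sum = 0;
    for (std::size_t c = 0; c < cols; ++c)
        for (std::size_t r = 0; r < rows; ++r)
            sum += m[r * cols + c];
    return sum;
}
```

How large the difference is depends on the matrix size and the machine, so it should be measured rather than assumed.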

Different libraries may provide the same functionality without providing the same performance, for example sprintf and itoa.
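As a rough sketch of two library routes to the same result: itoa is not part of standard C or C++, so this example assumes C++17 and uses std::to_chars as a stand-in for it, next to std::snprintf. The wrapper names format_with_snprintf and format_with_to_chars are made up for illustration.

```cpp
#include <charconv>
#include <cstddef>
#include <cstdio>
#include <string>

// Formats an int through the printf family, which must parse a format string.
std::string format_with_snprintf(int value) {
    char buf[16];  // large enough for any 32-bit int plus the terminator
    int n = std::snprintf(buf, sizeof buf, "%d", value);
    return std::string(buf, n > 0 ? static_cast<std::size_t>(n) : 0);
}

// Formats an int with std::to_chars (C++17), which does integer conversion only.
std::string format_with_to_chars(int value) {
    char buf[16];
    std::to_chars_result r = std::to_chars(buf, buf + sizeof buf, value);
    return std::string(buf, r.ptr);  // r.ptr points one past the last digit
}
```

Both functions produce the same text; which one is faster on a given platform is something to measure with a profiler.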

Compiler optimization depends entirely on the compiler vendor's implementation, so the results differ greatly between compilers.

Suggestions for writing high-performance code

Making a design or code more complex and less readable in the name of performance, without a verified performance requirement (such as actual measurements compared against a target) to justify it, essentially does the program no good.

First, make the code as clear and easy to understand as possible; clear code is easier to optimize. It is easier to make the program correct first and then make it faster.

However, you should still master some common techniques, such as using the prefix forms of ++ and --, passing arguments by reference, delaying variable definitions until they are needed, and initializing member variables in the constructor's initializer list.
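The techniques listed above can be shown together in one short sketch. The Person class and the total_name_length function are hypothetical and exist only for this illustration.

```cpp
#include <cstddef>
#include <list>
#include <string>

class Person {
public:
    // Initializer list: name_ is constructed directly from the argument
    // instead of being default-constructed and then assigned in the body.
    explicit Person(const std::string& name) : name_(name) {}
    const std::string& name() const { return name_; }
private:
    std::string name_;
};

// Pass by const reference: the list is not copied when the function is called.
std::size_t total_name_length(const std::list<Person>& people) {
    std::size_t total = 0;
    // Prefix ++ on an iterator avoids the temporary copy that postfix ++ may create.
    for (std::list<Person>::const_iterator it = people.begin();
         it != people.end(); ++it) {
        // Delayed definition: len is defined only at the point where it is needed.
        const std::size_t len = it->name().size();
        total += len;
    }
    return total;
}
```

None of these habits makes the code harder to read, which is why they are reasonable to apply by default.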

Do not inline too early. Use tools to find out which functions actually need to be inlined.

When you really do need to optimize performance, first use a modern profiling tool to identify the main bottlenecks in the program.

However, a library designer can hardly predict which operations will end up in performance-sensitive client code. In this special case, designers rely mostly on experience, educated guesses, and extensive testing of client code. For example, ACE inlines the member functions of the C++ classes in its wrapper facade layer. I believe this decision was based on testing.

Constructors and Destructors

With inheritance, the constructor of a derived class always calls the constructor of its parent class first. In addition, the compiler must generate code to set up the virtual function table pointer correctly (because a parent class usually needs a virtual destructor, a virtual function table is usually unavoidable) and to initialize the member variables of the parent class and the subclass. The deeper the inheritance hierarchy, the higher the cost. Because destruction is the reverse of construction, destructors pay a similar cost. Therefore, in a system with high performance requirements, take this factor into account when designing C++ classes and avoid redundant parent classes. (This is hard to judge, because a design that follows the is-a relationship naturally introduces parent classes.)

With composition, a class's constructor must correctly initialize its member variables. If a member variable has a constructor of its own, that constructor is called as well, and the same applies to destructors.
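The chain of calls described in this section can be made visible with a small sketch. The types Base, Member, and Derived, the global g_log, and the function trace_lifetime are all hypothetical names introduced for illustration. Each constructor and destructor records its name, so the order in which they run can be inspected.

```cpp
#include <string>
#include <vector>

// Global log of constructor and destructor calls, used only for this illustration.
std::vector<std::string> g_log;

struct Member {
    Member()  { g_log.push_back("Member ctor"); }
    ~Member() { g_log.push_back("Member dtor"); }
};

struct Base {
    Base()          { g_log.push_back("Base ctor"); }
    virtual ~Base() { g_log.push_back("Base dtor"); }
};

struct Derived : Base {
    Derived()           { g_log.push_back("Derived ctor"); }
    ~Derived() override { g_log.push_back("Derived dtor"); }
    Member member;  // composition: constructed after Base, before Derived's body
};

// Creates and destroys one Derived object and returns the recorded call order.
std::vector<std::string> trace_lifetime() {
    g_log.clear();
    {
        Derived d;
    }  // d is destroyed here
    return g_log;
}
```

Calling trace_lifetime() yields six entries for a single object: Base, Member, and Derived constructors in that order, followed by the destructors in reverse. A deeper hierarchy or more members adds more entries, which is the cost the text refers to.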

Virtual Functions

The advantage of virtual functions is that dynamic typing provides better abstraction: client code only deals with the base-class interface, so the code is more elegant and easier to maintain. However, virtual functions may affect performance in three ways:

1) Initializing the virtual function table pointer (discussed earlier) adds overhead to constructors and destructors.

2) A virtual function call looks up the target in a table at run time and then calls it, so it is slower than a direct function call.

3) Inlining virtual functions is difficult and complex for the compiler, and some compilers do not inline them at all.

The first two should not be counted as a real performance burden. If you do not use virtual functions, a common alternative is to give the base class a type field that each subtype sets in its constructor; this costs about as much as initializing the virtual function table pointer. You then use a switch/case statement to determine the type and cast the base-class pointer to the subclass pointer; this costs about as much as the virtual function table lookup.
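The alternative described here can be sketched as follows. The Shape, Circle, and Square types and the area function are hypothetical and serve only to show the pattern: a type field set in each constructor, and a switch with a cast in place of a virtual call.

```cpp
#include <stdexcept>

struct Shape {
    enum Kind { kCircle, kSquare };
    explicit Shape(Kind k) : kind(k) {}
    Kind kind;  // set by each subtype's constructor; plays the role of the vtable pointer
};

struct Circle : Shape {
    explicit Circle(double r) : Shape(kCircle), radius(r) {}
    double radius;
};

struct Square : Shape {
    explicit Square(double s) : Shape(kSquare), side(s) {}
    double side;
};

// The switch plus the cast stands in for the virtual function table lookup.
double area(const Shape& s) {
    switch (s.kind) {
    case Shape::kCircle: {
        const Circle& c = static_cast<const Circle&>(s);
        return 3.14159265358979323846 * c.radius * c.radius;
    }
    case Shape::kSquare: {
        const Square& q = static_cast<const Square&>(s);
        return q.side * q.side;
    }
    }
    throw std::logic_error("unknown shape kind");
}
```

The hand-written version still stores a field per object and still branches on it at run time, so it saves little compared with a virtual call, and every new subtype requires editing the switch.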

So the only real problem seems to be that when a virtual function is simple and called frequently, the failure to inline it causes a performance loss.

The performance problems caused by virtual functions can sometimes be eliminated. Virtual functions are closely tied to inheritance: without inheritance there are no virtual functions. One way to reduce the dependency on inheritance is to use templates.

For example, suppose there is a mystring class that must be thread-safe, and there are two candidate synchronization schemes: a mutex and a critical section. Assume there are two classes that both implement lock and unlock functions:

class criticalsection
{
public:
    void lock();
    void unlock();
    // ...
};

class mutex
{
public:
    void lock();
    void unlock();
    // ...
};

Here we do not derive these two classes from a lockbase class in which lock and unlock would be virtual functions; they remain ordinary member functions. Instead, we design mystring as a class template:

template <typename lock>
class mystring
{
private:
    lock _lock;
};

By using templates, we replace run-time polymorphism with compile-time polymorphism, which eliminates the virtual functions. As a result, the lock and unlock functions are much more likely to be inlined.
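A more complete sketch of this idea follows. Everything beyond the template parameter and the _lock member is an assumption made for illustration: the two lock types mutex_lock and null_lock, the append and value member functions, and the lock_object accessor are hypothetical, and std::lock_guard from the standard library is used to call lock() and unlock().

```cpp
#include <mutex>
#include <string>

// Hypothetical lock type wrapping std::mutex; any class with lock()/unlock() works.
class mutex_lock {
public:
    void lock()   { m_.lock(); }
    void unlock() { m_.unlock(); }
private:
    std::mutex m_;
};

// Hypothetical lock for single-threaded use; it only counts calls.
class null_lock {
public:
    void lock()   { ++lock_calls; }
    void unlock() { ++unlock_calls; }
    int lock_calls = 0;
    int unlock_calls = 0;
};

template <typename Lock>
class mystring {
public:
    void append(const std::string& s) {
        // The calls to Lock::lock and Lock::unlock are resolved at compile time,
        // so no virtual dispatch is involved and the compiler is free to inline them.
        std::lock_guard<Lock> guard(_lock);
        data_ += s;
    }
    std::string value() const {
        std::lock_guard<Lock> guard(_lock);
        return data_;
    }
    Lock& lock_object() { return _lock; }
private:
    mutable Lock _lock;  // mutable so that value() can lock in a const function
    std::string data_;
};
```

A caller picks the synchronization scheme by choosing the template argument, for example mystring<mutex_lock> for shared use or mystring<null_lock> where no locking is needed. The trade-off is that the lock type becomes part of the string's type and cannot be swapped at run time.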
