The path to C program optimization

Source: Internet
Author: User

The path to C program optimization
This article is excerpted from a game to a development article.
This article describes the common optimization methods for compiling C program code, including I/O, memory, algorithm, and MMX assembly.
I. I/O
If a program reads or writes files, file access will be a major factor in its running speed. There are two main ways to speed up file access: one is to use memory-mapped files, the other is to use a memory buffer. The following test data (see section 3.9 of Advanced Programming in the UNIX Environment) shows the results of reading a 1,468,802-byte file with 18 different buffer lengths.
Buffer size   User CPU (s)   System CPU (s)   Clock time (s)   Number of loops
      1           23.8           397.9            423.4          1,468,802
      2           12.3           202.0            215.2            734,401
      4            6.1           100.6            107.2            367,201
      8            3.0            50.7             54.2            183,601
     16            1.5            25.3             27.0             91,801
     32            0.7            12.8             13.7             45,901
     64            0.3             6.6              7.0             22,951
    128            0.2             3.3              3.6             11,476
    256            0.1             1.8              1.9              5,738
    512            0.1             1.0              1.1              2,869
  1,024            0.0             0.6              0.6              1,435
  2,048            0.0             0.4              0.4                718
  4,096            0.0             0.4              0.4                359
  8,192            0.0             0.3              0.3                180
 16,384            0.0             0.3              0.3                 90
 32,768            0.0             0.3              0.3                 45
 65,536            0.0             0.3              0.3                 23
131,072            0.0             0.3              0.3                 12

As the table shows, performance is already at its best once the buffer reaches 8192 bytes, which is why image-coding programs such as H.263 use a buffer size of 8192 (sometimes 2048). The main advantages of the memory-buffer method are portability, low memory usage, and ease of hardware implementation. Below is C pseudo code for reading a file this way:
int len;
byte buffer[8192];

if (the buffer is empty) {
    len = read(file, buffer, 8192);
    if (len == 0)
        /* no more data: exit */
}
However, if the machine has plenty of memory, a memory-mapped file can give even better performance, and the programming is simple. For details on memory mapping, see the October 2001 Platform SDK in MSDN:
Documentation - Base Services - File Storage - File Mapping. Below are some suggestions:
① A memory-mapped file cannot exceed the size of virtual memory, and it is best not to make it too large: if the mapping approaches the size of virtual memory, program speed drops sharply (in fact the whole system slows down for lack of virtual memory). You can then consider mapping the file in blocks, but in that case I think it is better to use a memory buffer directly.
② The two methods can be combined. For example, when I process large image files (on a UNIX workstation, with memory measured in gigabytes), I use a memory-mapped file, but for best performance I also keep a one-row image cache, which guarantees purely sequential reads and writes (memory-mapped files are specially optimized for sequential access).
③ Writing a file through a memory mapping is a little tricky: first create a file that is large enough, then map it; after processing, use the SetFilePointer and SetEndOfFile functions to trim the file to its final length.
④ Operating on a memory-mapped file is just like operating on memory (like an array). For large block reads and writes, remember to use the memcpy() function (or the CopyMemory() function).

A memory-mapped file is the right choice when either: 1. the file to be processed is relatively small; or 2. the file is large, but the machine also has a lot of memory, no other memory-hungry programs run at the same time, speed matters and memory usage does not. If neither condition holds, I recommend the memory-buffer method.
II. Memory
The previous section covered optimizing file reads and writes. This one focuses on optimizing memory operations, covering array addressing, pointers and linked lists, and some practical tricks.
1. Optimized array addressing
When programming, we often use a one-dimensional array a[m*n] to simulate a two-dimensional array a[n][m]. To access element (j, i) we then write a[j*m+i]. There is nothing wrong with this, but clearly every such access costs a multiplication j*m. Now consider the addressing of a genuine two-dimensional array; here we have to look at how such an array is actually laid out when it is allocated dynamically, as in the H.263 code below. It takes slightly more space than a[m*n] and consists of two parts:
① a pointer array holding the starting address of each row; this is why a[j] is a pointer rather than the datum a[j][0].
② a contiguous m*n data block, which is why such a two-dimensional array can also be addressed as a one-dimensional one (a[j][i] is equivalent to (a[0])[j*m+i]).
With this in mind, we can see that addressing the two-dimensional array is cheaper than the simulated one-dimensional form: a[j][i] only reads the pointer array to get the address of row j and then adds i; there is no multiplication!
Therefore, when working with the simulated one-dimensional array, the following optimization is common (pseudo code):
int a[m * n];
int *b = a;
for (...)
{
    b[...] = ...;
    ............
    b[...] = ...;
    b += m;      /* advance to the next row */
}

This is an optimization for traversing the array: each b += m moves b to the head pointer of the next row. Of course, if you prefer, you can define an array of pointers storing the starting address of each row and then use two-dimensional addressing on the one-dimensional array; but at that point I suggest simply allocating a two-dimensional array. Below is the C code for dynamically allocating and freeing one.
int get_mem2d_int(int ***array2d, int rows, int columns)   /* H.263 source code */
{
    int i;

    if ((*array2d = (int **)calloc(rows, sizeof(int *))) == NULL)
        no_mem_exit(1);
    if (((*array2d)[0] = (int *)calloc(rows * columns, sizeof(int))) == NULL)
        no_mem_exit(1);
    for (i = 1; i < rows; i++)
        (*array2d)[i] = (*array2d)[i - 1] + columns;
    return rows * columns * sizeof(int);
}
void free_mem2d(byte **array2d)
{
    if (array2d)
    {
        if (array2d[0])
            free(array2d[0]);
        else
            error("free_mem2d: trying to free unused memory", 100);
        free(array2d);
    }
    else
    {
        error("free_mem2d: trying to free unused memory", 100);
    }
}
By the way, if your array accesses carry a constant offset, do not write a[x + offset]; instead set b = a + offset once and then access b[x].
However, if your program does not need special processing speed, such optimization is unnecessary. Remember: for an ordinary program, readability and portability come first.
2. Arrays starting from a negative index
Do you often have to deal with boundary problems when programming? When handling boundaries, subscripts often need to start from a negative number. Usually we separate the boundary handling out and write extra code for it; once you know how to use an array that starts at a negative index, boundary handling becomes much easier. The following gives a static array indexed from -1:
int a[m];
int *pa = a + 1;
Now accessing a through pa runs from -1 to m-2; it is that simple. (If a was dynamically allocated, call free(a), not free(pa), because pa is not the start address of the block.)
3. Do we really need a linked list?
You became quite familiar with linked lists when studying data structures, so I suspect some people reach for a linked list even when writing time-critical algorithms. That certainly (seemingly) uses less memory, but what about speed? Try this test: allocate and traverse a 10,000-element linked list versus an array of the same size, and you will find the gap is enormous (in one test of mine, the linked-list version took 1 minute and the array version 4 seconds). So my advice is: do not use linked lists in time-critical code!
In fact, a linked list does not really save memory either. For many algorithms we know in advance (at least roughly) how much memory will be needed, so grabbing it all at once with an array beats nibbling it away piece by piece with a linked list. Linked lists belong where there are few elements or where the code is not time-critical.
(My guess is that the linked list is slow because it allocates memory one node at a time; if it could grab one big block like an array, it would not cost much extra time. I have not tested this in detail; it is only a conjecture :P)
This article describes common optimization methods for C code: I/O, memory, and algorithms. MMX was originally planned as part of it, but since that content does not really match the title, I decided to split it off under the name "MMX technical details", together with the MMX application in H.263 video compression.
III. Algorithms
The previous section covered memory operations; this one describes some common algorithmic optimizations. There is a lot of material, and it may be a little disorganized; sorry about that.
1. Start with the small things:
First, a few small tricks:
① Writing n/2 as n >> 1 is a common trick, but note that the two are not fully equivalent! If n = 3, then n/2 = 1 and n >> 1 = 1; but if n = -3, then n/2 = -1 while n >> 1 = -2. For non-negative numbers both round down, but negative numbers differ. (The integer YUV-to-RGB transform in JPEG 2000 must use >> rather than division.)
② Also, a = a + 1 should be written a++, and a = a + b should be written a += b (I suspect only VB habits produce a = a + 1 :P).
③ Merge multiple operations: for example, a[i++] accesses a[i] and then increments i by 1. From the assembly point of view this really is an optimization: written as a[i] followed by i++, the variable i may be read twice and written once (depending on the compiler's optimization ability), whereas with a[i++] it need only be read once. However, be careful when merging operations into a condition test, as in the zero-block test of the IDCT transform (Chen-Wang algorithm) in the H.263 source code, which is roughly of the form
if (!(x1 = (blk[8*4] ...)) ...)
i.e. the assignment statements are merged into the condition. When the condition holds, those assignments were not needed; in other words, junk work is done exactly in the zero-block case, which is a flaw in the H.263 source code. Although these junk statements add about 30% to the time spent on zero blocks, IDCT only accounts for about 1% of total time, and zero blocks are only 30%~70% of that, so the performance loss is negligible. (This was my conclusion when I rewrote that source code in assembly.) It also shows that optimization effort belongs in the most time-consuming parts of a program; optimizing code that takes little time has little practical value.
2. Trade memory for speed:
You can rarely have everything, and programming is no different: most of the time, speed trades against memory (or against other qualities, such as compression performance). One of the most common acceleration techniques is to use a lookup table to avoid computation (for example, the Huffman code tables in JPEG and the YUV-to-RGB conversion tables), so that a formerly complex computation becomes a mere table lookup. Some memory is wasted, but the speedup is significant and well worth it. The same idea appears in database queries, where hot data is cached to accelerate lookups. An improvised example: suppose a program must frequently (it really must be frequently!) compute the factorials of the numbers between 1000 and 2000. We can use an array a[1000] to compute these values once and keep them; later, to get the value for 1200, we just look up a[1200-1000].
3. Gather scattered work into whole blocks:
Scattered memory allocation and the creation of many small objects are time-consuming, so optimizing them is sometimes very effective. The linked-list problem mentioned in the previous section is one case: it allocates a large amount of scattered memory. Here is another, from a VB program of mine (haha, I could knock out a small program in VB in half a day, faster than in VC): using the MSFlexGrid control (a table control), I found that adding one row at a time made refreshing very slow, so I added 100 rows at a time, and only when the data outgrew them did I add the next 100, keeping the row count at a whole block. With this method, refreshing became many times faster than before. There are many applications of this idea: for example, a program can reserve a chunk of space at startup and serve later small allocations out of that space, which both minimizes memory fragmentation and speeds up the operations.
4. Put the most likely case first in if/else chains and switch statements:
The effect of this optimization is not obvious. Take it if it comes naturally; otherwise forget it.
5. For readability, do not hand-do what the compiler can do, or optimizations with no visible effect:
This is very important. The quality of an ordinary program lies first in its readability, portability, and reusability, and only then in its performance. So where the compiler can optimize for us, there is no need to write things nobody else will understand. For example, write a = 52 (end) - 16 (start): someone reading the program will then understand the meaning of a. We do not need to write a = 36, because the compiler will compute it for us.
IV. Analyze the concrete situation:
Concrete analysis of the concrete situation is an eternal truth; without it, you cannot apply solutions to problems flexibly. Here are some ways to find the time-consuming spots in a program (starting with the simplest: call the function GetTickCount() once at the start and once at the end; the difference of the return values is the elapsed time, accurate to about 1 ms):
① For a function suspected of being time-consuming, run it twice, or comment out its internal statements (keeping the program runnable), and see how much more (or less) time is spent. This method is simple but not very accurate.
② Measure each candidate spot with GetTickCount(). Note that GetTickCount() is only accurate to milliseconds; intervals under about 10 ms are generally not reliable.
③ Use two more functions: QueryPerformanceCounter(&counter) and QueryPerformanceFrequency(&frequency). Read the high-resolution counter before and after; the difference divided by the frequency gives the time. If you want accuracy at this level, I suggest raising the process to the highest priority so it is not preempted while being measured.
Finally, an example from my own work: a program (I forget exactly which) had a function containing a large loop, and the processing inside the loop was time-consuming. The program started out fast and grew slower as it ran. Tracing the variables, I found that the early iterations of the inner loop exited after only a few passes, while later iterations ran longer and longer. Once the cause of the slowdown was found, the remedy was clear: instead of restarting each search from the beginning, I started it from around the point where the previous pass had exited (searching a little to either side, since the next exit point may lie slightly before the previous one rather than only after it), and the program became fast again. In practice, you have to analyze the real reason a program is slow to get the best optimization result.

