108 Odd Tricks for Linux Programming, Part 4: Compile-Time Expansion (continued)

Source: Internet
Author: User

The previous post on compile-time expansion was featured by CSDN on its homepage.

The key code is:

#define DO(x)   x
#define DO4(x)  x x x x
#define DO8(x)  DO4(x) DO4(x)
#define DO16(x) DO8(x) DO8(x)

This is not hard to understand, and readers can extend the same pattern to larger unroll factors.
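
To make the expansion concrete, here is a minimal sketch that counts how many copies of a statement the 16-way macro emits. The uppercase macro names and the helper `count_expansions` are assumptions made for illustration, not taken from the original post; the 4-way macro is written so that it repeats its argument four times.

```c
#include <assert.h>

/* Each level pastes its argument repeatedly; DO16(x) yields 16 copies of x. */
#define DO(x)   x
#define DO4(x)  x x x x
#define DO8(x)  DO4(x) DO4(x)
#define DO16(x) DO8(x) DO8(x)

/* Hypothetical helper: returns the number of copies DO16 emits. */
static int count_expansions(void)
{
    int n = 0;
    DO16(n++;)   /* the preprocessor turns this into 16 "n++;" statements */
    return n;
}
```

Calling `count_expansions()` should return 16, since the preprocessor has already written the increment out sixteen times before the compiler sees the function.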

 

We use computing Fibonacci numbers as the example and unroll the loop body 16 times. Because FX is supplied by the user, it may or may not be a multiple of 16, so some conversion is needed: split the work as 16*idx + r. The unrolled loop handles the idx full blocks of 16 elements, and an ordinary loop at the end handles the remaining r elements, as shown below:

int i = 2;                       /* f[0] and f[1] are the seed values */
for (; i + 16 <= FX; )           /* run while a full block of 16 elements remains */
{
    DO16(f[i] = f[i-1] + f[i-2]; i++;);   /* expands to 16 copies of the two statements */
}
for (; i < FX; i++)              /* tail loop: the leftover elements */
{
    f[i] = f[i-1] + f[i-2];
}
Without compile-time expansion, we would have to unroll the loop by hand, and the code would look like this:

for (; i + 16 <= FX; )
{
    f[i] = f[i-1] + f[i-2]; i++;
    f[i] = f[i-1] + f[i-2]; i++;
    f[i] = f[i-1] + f[i-2]; i++;
    /* ... the same line 16 times in total; it is ugly to write by hand ... */
}
for (; i < FX; i++)
{
    f[i] = f[i-1] + f[i-2];
}
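
The two variants above can be put side by side in a self-contained sketch. The function names `fib_plain` and `fib_unrolled`, the `uint64_t` element type, and the seed values f[0] = 0 and f[1] = 1 are assumptions for illustration; the original post does not show them. Unsigned arithmetic is used so that overflow for large indices wraps in a well-defined way.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DO4(x)  x x x x
#define DO8(x)  DO4(x) DO4(x)
#define DO16(x) DO8(x) DO8(x)

/* Reference version: one element per loop iteration. */
static void fib_plain(uint64_t *f, size_t fx)
{
    if (fx > 0) f[0] = 0;
    if (fx > 1) f[1] = 1;
    for (size_t i = 2; i < fx; i++)
        f[i] = f[i - 1] + f[i - 2];
}

/* Unrolled version: 16 elements per iteration, then a tail loop. */
static void fib_unrolled(uint64_t *f, size_t fx)
{
    size_t i = 2;
    if (fx > 0) f[0] = 0;
    if (fx > 1) f[1] = 1;
    for (; i + 16 <= fx; ) {     /* only while a full block of 16 fits */
        DO16(f[i] = f[i - 1] + f[i - 2]; i++;)
    }
    for (; i < fx; i++)          /* leftover elements */
        f[i] = f[i - 1] + f[i - 2];
}
```

Both functions should fill the array identically for any length, including lengths below 2 and lengths where no leftover elements remain; the bound `i + 16 <= fx` is what keeps the unrolled block from writing past the end.
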
Second, pay special attention to the memset. It is called before the timing starts so that the measurement is more accurate. For large requests, malloc obtains memory with mmap, which only reserves virtual addresses; no physical pages are mapped yet, and each page is faulted in the first time it is touched. The memset forces those page faults to happen up front. You can try this yourself: malloc a large buffer, run the same pass over it twice (for example, two identical memset calls), and time each pass with rdtsc. The first pass is slower and the second is faster, because the first pass takes a page fault on every page while the second takes almost none (provided physical memory is large enough; otherwise pages may be swapped out and a small number of page faults can still occur).
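
The first-touch effect can be sketched as follows. This version swaps rdtsc for the POSIX `clock_gettime` call so that it is not tied to x86; the helper names `timed_memset` and `first_touch_experiment` are hypothetical. The memset is called through a volatile function pointer only to keep the compiler from optimizing the stores away.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Calling through a volatile pointer stops the compiler from eliding memset. */
static void *(*volatile memset_fn)(void *, int, size_t) = memset;

/* Hypothetical helper: seconds taken by one full pass over the buffer. */
static double timed_memset(void *buf, size_t len)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    memset_fn(buf, 0, len);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}

/* Touch a freshly allocated buffer twice.
   out[0] is the first pass (pages faulted in), out[1] the second.
   Returns 0 on success, -1 if the allocation fails. */
static int first_touch_experiment(size_t len, double out[2])
{
    char *buf = malloc(len);
    if (buf == NULL)
        return -1;
    out[0] = timed_memset(buf, len);
    out[1] = timed_memset(buf, len);
    free(buf);
    return 0;
}
```

With a buffer of tens of megabytes, `out[0]` is usually noticeably larger than `out[1]` on Linux, but the exact ratio depends on the allocator, the kernel, and the machine, so treat any single run as indicative only.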

Why the memset matters is not hard to see: if its cost is mixed into the timing, the real difference between two measurements can be hidden. Suppose a city's GDP is 10 in the first quarter and 15 in the second; it seems to have grown by 50%. But if both figures include a fixed 5 units (think of them as the cost of the memset), then after deducting those 5 units GDP actually grew from 5 to 10, an increase of 100%. So when designing an experiment, deduct the interference terms before comparing, so that the data is meaningful.
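
The arithmetic in this analogy can be written as a small function. The name `percent_gain` and its parameters are hypothetical, chosen only to mirror the example: a fixed overhead is subtracted from both measurements before the relative change is computed.

```c
#include <assert.h>

/* Percentage change from `before` to `after` once a fixed overhead
   has been removed from both measurements. */
static double percent_gain(double before, double after, double overhead)
{
    assert(before > overhead);   /* avoid dividing by zero or a negative base */
    return ((after - overhead) - (before - overhead))
           / (before - overhead) * 100.0;
}
```

With the figures from the text, `percent_gain(10, 15, 0)` gives 50 and `percent_gain(10, 15, 5)` gives 100: the same raw numbers describe a much larger real change once the shared overhead is removed.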

 

Third, my experiment shows that unrolling 4 times is the fastest. I think there are two main reasons, though further experiments would be needed to confirm them:

(1) If the unroll factor is too large, the code grows. Code is stored in the executable file, and it has to be read from disk before it can run. In addition, large code has poor locality: part of the L1 cache holds instructions, and the smaller the code, the more compactly it executes. Unrolled 1024 times, the code's locality is bound to be poor.

(2) If the unroll factor is too small, the code is compact, but there are too many jumps and the pipeline does not run smoothly, which may also lower performance.

 

Therefore, there is always some unroll factor that balances pipeline efficiency against the compactness of the code. In my experiment, unrolling 4 times was the fastest. This depends on the machine environment, and I don't know what results other readers will get.
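
Readers who want to repeat the comparison on their own machine could start from a harness like the sketch below. Everything here is an assumption for illustration: the generator macro `DEFINE_FIB`, the function names `fib_u1`, `fib_u4`, `fib_u16`, the timing helper `seconds_for`, and the use of `clock_gettime` rather than rdtsc. It is not the author's benchmark.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#define DO1(x)  x
#define DO4(x)  x x x x
#define DO16(x) DO4(x) DO4(x) DO4(x) DO4(x)

/* Generates a Fibonacci fill whose main loop is unrolled N times via DO. */
#define DEFINE_FIB(name, N, DO)                                         \
    static void name(uint64_t *f, size_t fx)                            \
    {                                                                   \
        size_t i = 2;                                                   \
        if (fx > 0) f[0] = 0;                                           \
        if (fx > 1) f[1] = 1;                                           \
        for (; i + (N) <= fx; ) { DO(f[i] = f[i-1] + f[i-2]; i++;) }    \
        for (; i < fx; i++) f[i] = f[i-1] + f[i-2];                     \
    }

DEFINE_FIB(fib_u1, 1, DO1)
DEFINE_FIB(fib_u4, 4, DO4)
DEFINE_FIB(fib_u16, 16, DO16)

/* Hypothetical helper: wall-clock seconds for one call of fn. */
static double seconds_for(void (*fn)(uint64_t *, size_t),
                          uint64_t *f, size_t fx)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    fn(f, fx);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

For a fair comparison, touch the whole array first (for example with memset) so that page faults are not charged to whichever variant runs first, repeat each measurement several times, and compile all variants with the same flags. An optimizing compiler may unroll the plain loop on its own, which can narrow or erase the differences.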

 

The final semicolon in DO16(f[i] = f[i-1] + f[i-2]; i++;); is unnecessary; it is added only to make the code look more natural. I believe most careful readers will have noticed this.

 

Other articles in this series: http://blog.csdn.net/pennyliang/category/746545.aspx
