108 odd tricks for Linux programming-4 (Compilation) (continued)

Last Update:2018-12-04 Source: Internet

Author: User

The earlier post on this blog about compile-time loop unrolling ("compilation expansion") was featured on the CSDN home page.

The key code is:

#define DO(x) x
#define DO4(x) x x x x
#define DO8(x) DO4(x) DO4(x)
#define DO16(x) DO8(x) DO8(x)
This is easy to follow, and readers can extend the macros further by following the same pattern.
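As a minimal sketch of how these macros expand, the fragment below uses DO16 to repeat a counter increment. The function name count_do16 and the counter are assumptions made for illustration; they are not part of the original post.

```c
#include <assert.h>

/* Repetition macros: each level pastes its argument more times. */
#define DO(x)   x
#define DO4(x)  x x x x
#define DO8(x)  DO4(x) DO4(x)
#define DO16(x) DO8(x) DO8(x)

/* Hypothetical helper: counts how many copies of a statement DO16 emits. */
int count_do16(void)
{
    int n = 0;
    DO16(n++;)          /* the preprocessor emits "n++;" 16 times, no loop */
    assert(n == 16);
    return n;
}
```

Running the preprocessor alone (for example with gcc -E) on such a file shows the 16 pasted copies of the statement.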

We use the calculation of Fibonacci numbers as the example and unroll the loop body 16 times. Because FX is entered by the user, it may or may not be a multiple of 16, so some conversion is required: write FX as FX = 16 * idx + r. The unrolled loop then handles the part covered by idx, and an ordinary loop at the end handles the rest, as shown below:

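int r = FX % 16;    // remainder, handled by the second loop
int idx = FX / 16;  // number of whole groups of 16
int i = 2;
// Each pass advances i by 16, so continue only while a whole group still fits.
for (; i + 16 <= 16 * idx;)
{
    DO16(F[i] = F[i-1] + F[i-2]; i++;); // expands to 16 copies of the statement
}
// Ordinary loop for the elements that are left over.
for (; i < FX; i++)
{
    F[i] = F[i-1] + F[i-2];
}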
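The fragment above can be turned into a self-contained function along the following lines. This is only a sketch: the function names, the parameter n (which plays the role of FX), the element type, and the starting values F[0] = 0 and F[1] = 1 are all assumptions, since the original post does not show them.

```c
#include <assert.h>
#include <stddef.h>

#define DO4(x)  x x x x
#define DO8(x)  DO4(x) DO4(x)
#define DO16(x) DO8(x) DO8(x)

/* Fills F[0..n-1] with Fibonacci numbers using a loop unrolled 16 times.
 * Hypothetical helper; assumes F[0] = 0 and F[1] = 1. */
void fib_unrolled(unsigned long long *F, size_t n)
{
    size_t i = 2;
    assert(F != NULL);
    if (n > 0) F[0] = 0;
    if (n > 1) F[1] = 1;
    /* Main loop: one pass writes 16 elements, so it runs only while
     * 16 more elements still fit below n. */
    while (i + 16 <= n) {
        DO16(F[i] = F[i - 1] + F[i - 2]; i++;)
    }
    /* Tail loop: the remaining elements, fewer than 16. */
    for (; i < n; i++) {
        F[i] = F[i - 1] + F[i - 2];
    }
}

/* Reference version without unrolling, used only to cross-check results. */
void fib_plain(unsigned long long *F, size_t n)
{
    assert(F != NULL);
    if (n > 0) F[0] = 0;
    if (n > 1) F[1] = 1;
    for (size_t i = 2; i < n; i++) {
        F[i] = F[i - 1] + F[i - 2];
    }
}
```

Both functions should fill an array identically; with unsigned long long elements the values wrap around once n exceeds 94, which does not matter for timing purposes.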
Without the macros doing the unrolling at compile time, we would have to write the code by hand like this:

for (; i + 16 <= 16 * idx;)
{
    F[i] = F[i-1] + F[i-2]; i++;

    F[i] = F[i-1] + F[i-2]; i++;

    // ... the same line written 16 times in total, which is ugly
}
for (; i < FX; i++)
{
    F[i] = F[i-1] + F[i-2];
}
Second, pay special attention to the use of memset. The memset is performed before timing starts because this makes the measurement more accurate. For large requests, malloc obtains memory through mmap: only virtual addresses are reserved, and no physical pages are mapped until they are first touched. The memset touches every page, so the page faults happen before the timed section. You can try this experiment: run the same memset twice on a freshly malloc'ed buffer and time each run with rdtsc. You will find that the first run is slower and the second is faster, because the first run takes the page faults and the second takes almost none (provided there is enough free memory; otherwise swapping may occur, and a small number of page faults may remain).
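A first-touch experiment of this kind can be sketched as follows. Note the assumptions: the text uses rdtsc, while this sketch substitutes the POSIX clock_gettime call for portability, and the helper names time_memset_ns and measure_first_touch are made up for illustration.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime under strict -std modes */
#endif
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: nanoseconds taken by one memset pass over buf. */
static long long time_memset_ns(unsigned char *buf, size_t len)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    memset(buf, 1, len);
    /* Volatile read so the compiler cannot treat the memset as dead. */
    unsigned char last = ((volatile unsigned char *)buf)[len - 1];
    clock_gettime(CLOCK_MONOTONIC, &t1);
    assert(last == 1);
    (void)last;
    return (t1.tv_sec - t0.tv_sec) * 1000000000LL + (t1.tv_nsec - t0.tv_nsec);
}

/* Times two memset passes over one freshly malloc'ed buffer.
 * Returns 0 on success, -1 if the allocation fails. */
int measure_first_touch(size_t len, long long *first_ns, long long *second_ns)
{
    assert(len > 0 && first_ns != NULL && second_ns != NULL);
    unsigned char *buf = malloc(len);
    if (buf == NULL)
        return -1;
    *first_ns = time_memset_ns(buf, len);   /* pages are touched here first */
    *second_ns = time_memset_ns(buf, len);  /* pages are already mapped */
    free(buf);
    return 0;
}
```

With a buffer of tens of megabytes one would expect first_ns to come out larger than second_ns on a typical Linux system, but the actual numbers depend on the machine and the allocator, so the sketch makes no claim about them.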

The reason for the memset is not hard to understand: if this cost is mixed into the timing, the real difference may be hidden. Suppose a city's GDP is 10 in the first quarter and 15 in the second, which looks like an increase of 50%. If both figures include the same 5 units of overhead (think of this as the page-fault cost that memset takes out of the measurement), then after deducting it the GDP actually rose from 5 to 10, an increase of 100%. Therefore, when designing an experiment, we need to remove such interference before comparing, so that the experimental data is meaningful.

Third, my experiment results show that unrolling 4 times is the fastest. I think the main reasons are the following, but further experiments need to be designed to prove them:

(1) If the loop is unrolled too many times, the code becomes large. The code is stored in the executable file, so it has to be read from disk before it can run. In addition, large code has poor locality: part of the L1 cache holds instructions, so the smaller the code, the more compactly it executes. If the loop is unrolled 1024 times, the locality of the code is bound to be poor.

(2) If the loop is unrolled too few times, the code is compact, but there are too many jumps and the pipeline does not run smoothly, which may also lower performance.

Therefore, there is always some unrolling factor that balances pipeline efficiency against the compactness of the code. In my experiment, unrolling 4 times was the fastest, but this depends on the machine environment, and I do not know what results other readers will get.
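One way to repeat this comparison on another machine is sketched below: the same Fibonacci fill is generated with unrolling factors of 1, 4, and 16 and each variant is timed. Everything here is an assumption made for illustration (the DEFINE_FILL macro, the fill_x1, fill_x4, and fill_x16 names, the starting values, and the use of clock_gettime instead of rdtsc); it is not the author's benchmark.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime under strict -std modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <time.h>

#define DO4(x)  x x x x
#define DO8(x)  DO4(x) DO4(x)
#define DO16(x) DO8(x) DO8(x)

/* One Fibonacci step; contains no top-level comma, so it can be passed
 * as a single macro argument. */
#define STEP F[i] = F[i - 1] + F[i - 2]; i++;

/* Generates fill_<name>(): the main loop body holds `width` copies of
 * STEP, and a plain tail loop finishes the leftover elements. */
#define DEFINE_FILL(name, width, body)                     \
    void fill_##name(unsigned long long *F, size_t n)      \
    {                                                      \
        size_t i = 2;                                      \
        if (n > 0) F[0] = 0;                               \
        if (n > 1) F[1] = 1;                               \
        while (i + (width) <= n) { body }                  \
        for (; i < n; i++) F[i] = F[i - 1] + F[i - 2];     \
    }

DEFINE_FILL(x1, 1, STEP)          /* no unrolling */
DEFINE_FILL(x4, 4, DO4(STEP))     /* unrolled 4 times */
DEFINE_FILL(x16, 16, DO16(STEP))  /* unrolled 16 times */

typedef void (*fill_fn)(unsigned long long *, size_t);

/* Nanoseconds for one call of fill(F, n). */
long long time_fill_ns(fill_fn fill, unsigned long long *F, size_t n)
{
    struct timespec t0, t1;
    assert(fill != NULL && F != NULL);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    fill(F, n);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (t1.tv_sec - t0.tv_sec) * 1000000000LL + (t1.tv_nsec - t0.tv_nsec);
}
```

For a fair comparison, the array should be touched with memset before the first timed call, as discussed above, and each variant should be run several times on a large array. An optimizing compiler may unroll the plain loop on its own, so the optimization level is worth recording along with the timings.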

The final semicolon in DO16(F[i] = F[i-1] + F[i-2]; i++;); is unnecessary, because the macro argument already ends with a semicolon; it is added here only to make the code look more natural. I believe most careful readers will have noticed this.

Other articles in this series: http://blog.csdn.net/pennyliang/category/746545.aspx

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
