Linux Kernel Study Notes: CPU cache line alignment

Source: Internet
Author: User
A CPU's cache is generally divided into a level-1 (L1) cache and a level-2 (L2) cache, and many CPUs now also provide a level-3 (L3) cache. At run time, the CPU first tries to read data from the L1 cache; on a miss it falls back to the L2 cache, and on a further miss it reads from main memory. The difference in access latency, measured in clock cycles, between the L1 cache, the L2 cache, and main memory is very large, so cache capacity and speed directly affect CPU performance. The L1 cache is built into the CPU and runs at the CPU's own clock speed, which makes it very effective at keeping the CPU fed with data; the larger the L1 cache, the higher the CPU's running efficiency.

The L1 cache is split into a data cache and an instruction cache, each composed of cache lines. On the x86 CPUs discussed here a cache line is 32 bytes (modern processors typically use 64-byte lines). Early CPUs had only about 512 cache lines, i.e. roughly 16 KB of L1 cache; current CPUs generally have 32 KB or more.

When the CPU reads a variable, the entire 32-byte block of memory containing that variable is read into a cache line together. For programs with strict performance requirements, it is therefore important to take full advantage of cache lines: align frequently accessed data on a 32-byte boundary so that it is brought into the cache in a single line fill, reducing data traffic between the CPU's caches and main memory.

However, on computers with multiple CPUs the situation is different. For example:

1. CPU1 reads a byte; that byte and its adjacent bytes are read into CPU1's cache line.

2. CPU2 does the same, so the caches of CPU1 and CPU2 now hold the same data.

3. CPU1 modifies the byte. The modified byte is put back into CPU1's cache line, but the change is not written to RAM.

4. CPU2 then accesses the byte; because CPU1 has not written the data back to RAM, the two copies are out of sync.

When a CPU modifies bytes in a cache line, the other CPUs in the machine are notified and must treat their copies of that line as invalid. So in the scenario above, CPU2 finds that the data in its cache is invalid; CPU1 immediately writes its line back to RAM, and CPU2 then re-reads the data. Cache lines can therefore become a performance liability on multiprocessor systems.

From this we can see that when designing a data structure, we should try to separate read-only data from read/write data, and to group data that is accessed together. That way the CPU can fetch all of the data it needs in a single line fill.

 

For example:

struct _
{
    int id;           // rarely changes
    int factor;       // changes frequently
    char name[64];    // rarely changes
    int value;        // changes frequently
};

Such a layout is very unfavorable: read-mostly and frequently written fields are interleaved, so they end up sharing cache lines.

On x86, it can be rearranged and padded as follows:

struct _
{
    int id;           // rarely changes
    char name[64];    // rarely changes
    char _align1[32 - (sizeof(int) + 64) % 32];
    int factor;       // changes frequently
    int value;        // changes frequently
    char _align2[32 - (2 * sizeof(int)) % 32];
};

 

In the padding expression 32 - (sizeof(int) + 64) % 32, the constant 32 is the cache-line size of the x86 architecture described here, and 64 is the size of the name array. The _align members perform explicit alignment, so that the read-mostly fields and the frequently written fields fall on separate cache lines.
