An in-depth look at the Intel Core microarchitecture


The Core 2 two-level cache

The L1 cache is split into a 32 KB L1I (instruction) cache and a 32 KB L1D (data) cache. Both are 8-way set-associative and write-back, with 64-byte lines. Each core has its own independent L1 cache, while the two cores share the L2 cache and the bus interface. The L2 cache is 16-way set-associative with 64-byte lines, and the data path between L1 and L2 is 256 bits wide. The L1D caches of the two cores can transfer data directly to each other.

The L1 cache has several hardware prefetchers for data and instructions, and the L2 prefetcher works according to the access patterns and intensity of the L1 prefetches. An improved round-robin algorithm dynamically allocates bandwidth between the two cores, and the front-side bus interface uses a similar approach to keep access balanced. The L1 and L2 caches use an independent access design: a core can fetch data directly from L2 or from main memory without passing through each level in turn. Intel's caches mainly use an inclusive design, in contrast to the exclusive design used by AMD.

The importance of aligning data and instruction addresses

When loading from and storing to memory, modern microprocessors can move n bytes (n = 2^m) in a single operation if the address is an integer multiple of n. If the n bytes start at an address that is not a multiple of n, the access takes two or more clock cycles, and the penalty is more severe when the access spans a cache line boundary. For example, when reading a 4-byte int, if the variable's physical address (note: the physical address, not the linear address) is an integer multiple of 4, the operation completes efficiently within a single cache line. Such data is said to be aligned to its natural length. Another advantage of natural alignment is that it minimizes the chance of data crossing a cache line (see the description of the Core microarchitecture's caches). Since alignment is so important, how do we place data at aligned addresses?

For global variables we can use __declspec(align(N)) to achieve this. Some Intel-specific types, such as __m128, are already 16-byte aligned. For example:

typedef __declspec(align(16)) int a16int;

a16int i;

Now the address of this variable i is 16-byte aligned.

For variables on the stack we can also use __declspec(align(N)) to align their addresses, but note that the default stack alignment of the Microsoft Visual C++ compiler is 4 bytes (for 32-bit x86 code). The official documentation says the compiler decides how to lay out the data and performs the necessary alignment adjustments, but the default stack alignment behaves rather unpredictably. I tested it with compiler optimizations enabled: in practice the order of variables on the stack usually differs from the order in which they are defined, struct members are not guaranteed to be aligned, and a double is not 8-byte aligned unless you add __declspec(align(N)) manually. With /O2, the stack in my tests was aligned to the natural length of each variable, but this is limited to my own testing; I found no document stating clearly that stack data must be naturally aligned under /O2. So if you need data at a specific alignment, add the alignment specifier explicitly.

Alignment of dynamic memory allocations

See _aligned_malloc and _mm_malloc. If a block allocated with malloc is larger than 8 bytes, it is 8-byte aligned; otherwise it is aligned to the largest power of two that does not exceed the requested size. For example, a 7-byte allocation is 4-byte aligned.

