Intel System Programming Guide, Chapter 11: Cache Control

Source: Internet
Author: User
Tags: prefetch, Intel Core i7, Intel Core 2 Duo

The Intel 64 and IA-32 architectures provide a variety of mechanisms for controlling the caching of data and instructions, and for controlling the ordering of reads and writes between the processor, the caches, and memory. These mechanisms can be divided into two groups:

1. Cache control registers and bits -- The Intel 64 and IA-32 architectures define several dedicated registers, and various bits within control registers and within page-directory and page-table entries, that control the caching of system memory locations in the L1, L2, and L3 caches. These mechanisms control the caching of virtual memory pages and of regions of physical memory.

2. Cache control and memory ordering instructions -- The Intel 64 and IA-32 architectures provide several instructions that control the caching of data, the ordering of memory reads and writes, and the prefetching of data. These instructions let software control the caching of specific data structures, control memory coherency for specific locations in memory, and force strong memory ordering at specific locations in a program.

The following sections describe these two groups of cache control mechanisms.

 

11.5.1 Cache control registers and bits

 

Figure 11-3 depicts the cache control mechanisms in IA-32 processors. Apart from the size of the memory address space, these work the same way in Intel 64 processors.

The Intel 64 and IA-32 architectures provide the following cache control registers and bits for enabling or restricting caching of various pages or regions of memory:

 

1. CD flag, bit 30 of control register CR0 -- Controls caching of system memory locations (see section 2.5). If the CD flag is clear, caching is enabled for the whole of system memory, but can be restricted for individual pages or regions of memory by other cache control mechanisms. When the CD flag is set, caching is restricted in the processor's caches (the cache hierarchy) for the P6 and more recent processor families, and prevented for the Pentium processor. With the CD flag set, however, the caches still respond to snoop traffic. The caches should be explicitly flushed to ensure memory coherency. For highest processor performance, both the CD and NW flags in control register CR0 should be cleared.

The effect of setting the CD flag is somewhat different for the P6 and more recent processor families than for the Pentium processor. To ensure memory coherency after the CD flag is set, the caches should be explicitly flushed (see section 11.5.3). Setting the CD flag on the P6 and more recent processor families modifies cache line fill and update behavior. Setting the CD flag on these processors does not force strict ordering of memory accesses unless the MTRRs are disabled and/or all of memory is referenced as uncached (see section 8.2.5).

 

2. NW flag, bit 29 of control register CR0 -- Controls the write policy for system memory locations (see section 2.5). If the NW and CD flags are clear, write-back is enabled for the whole of system memory, but can be restricted for individual pages or regions of memory by other cache control mechanisms.

Note: For Pentium 4 and Intel Xeon processors, the NW flag is a don't-care flag. That is, when the CD flag is set, the processor uses the no-fill cache mode regardless of the setting of the NW flag.

For Intel Atom processors, the NW flag is also a don't-care flag. That is, when the CD flag is set, the processor disables caching regardless of the setting of the NW flag.

For the Pentium processor, when the L1 cache is disabled (the CD and NW flags in control register CR0 are both set), external snoops are accepted in DP (dual-processor) systems and inhibited in uniprocessor systems.

When snoops are inhibited, address parity is not checked and the APCHK# signal is not asserted for a corrupt address. When snoops are accepted, address parity is checked and APCHK# is asserted for corrupt addresses.

 

3. PCD flag in the page-directory and page-table entries -- Controls caching for individual page tables and pages, respectively (see section 4.9). This flag has effect only when paging is enabled and the CD flag in control register CR0 is clear. The PCD flag enables caching of the page table or page when clear and prevents caching when set.

4. PWT flag in the page-directory and page-table entries -- Controls the write policy for individual page tables and pages, respectively (see section 4.9). This flag has effect only when paging is enabled and the NW flag in control register CR0 is clear. The PWT flag enables write-back caching of the page table or page when clear and write-through caching when set.

5. PCD and PWT flags in control register CR3 -- Control the global caching and write policy for the page directory (see section 2.5). The PCD flag enables caching of the page directory when clear and prevents caching when set. The PWT flag enables write-back caching of the page directory when clear and write-through caching when set. These flags do not affect the caching and write policy for individual page tables. They have effect only when paging is enabled and the CD flag in control register CR0 is clear.

 

6. G (global) flag in the page-directory and page-table entries (introduced to the IA-32 architecture in the P6 family processors) -- Controls the flushing of TLB entries for individual pages. See section 4.10.

7. PGE (page global enable) flag in control register CR4 -- Enables the creation of global pages with the G flag. See section 4.10.

 

8. Memory type range registers (MTRRs) (introduced in the P6 family processors) -- Control the type of caching used in specific regions of physical memory. Any of the caching types described in section 11.3 can be selected.

9. Page Attribute Table (PAT) MSR (introduced in the Pentium III processor) -- Extends the memory typing capabilities of the processor so that memory types can be assigned on a page-by-page basis (see section 11.12).

 

10. Third-level cache disable flag, bit 6 of the IA32_MISC_ENABLE MSR (available only in processors based on the Intel NetBurst microarchitecture) -- Allows the L3 cache to be disabled and enabled independently of the L1 and L2 caches.

11. KEN# and WB/WT# pins (Pentium processor) -- Allow external hardware to control the caching method used for specific areas of memory. They perform functions similar (but not identical) to those of the MTRRs in the P6 family processors.

12. PCD and PWT pins (Pentium processor) -- These pins (which are associated with the PCD and PWT flags in control register CR3 and in the page-directory and page-table entries) allow caching in an external L2 cache to be controlled on a page-by-page basis, consistent with the control exercised on the L1 cache of these processors. The P6 and more recent processor families do not provide these pins because the L2 cache is internal to the chip package.
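
The CD and NW flags from items 1 and 2 combine into four possible CR0 settings. As a minimal sketch, the C helper below classifies a CR0 value by those two bits. The bit positions come from the list above; the enum labels and the function name `classify_cr0` are illustrative only, and treating CD = 0 with NW = 1 as an invalid setting is an assumption taken from the manual's table of CD/NW interactions, which is not reproduced on this page.

```c
#include <stdint.h>

/* Bit positions of the NW and CD flags in CR0, as listed above. */
#define CR0_NW (UINT64_C(1) << 29)
#define CR0_CD (UINT64_C(1) << 30)

/* Labels are this sketch's own shorthand, not architectural names. */
enum cache_mode {
    CACHE_MODE_NORMAL,    /* CD = 0, NW = 0: normal caching, best performance  */
    CACHE_MODE_INVALID,   /* CD = 0, NW = 1: assumed to be an invalid setting   */
    CACHE_MODE_NO_FILL,   /* CD = 1, NW = 0: caching restricted, no new fills   */
    CACHE_MODE_CD_NW_SET  /* CD = 1, NW = 1: behavior depends on the processor  */
};

/* Classify a CR0 value by its CD and NW bits; all other bits are ignored. */
enum cache_mode classify_cr0(uint64_t cr0)
{
    int cd = (cr0 & CR0_CD) != 0;
    int nw = (cr0 & CR0_NW) != 0;

    if (!cd && !nw)
        return CACHE_MODE_NORMAL;
    if (!cd && nw)
        return CACHE_MODE_INVALID;
    if (cd && !nw)
        return CACHE_MODE_NO_FILL;
    return CACHE_MODE_CD_NW_SET;
}
```

Reading CR0 itself requires privileged code, so the helper only takes the value as an argument; how the value is obtained is left to the caller.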

 

 

11.5.2 Cache control priority

 

The cache control flags and the MTRRs operate hierarchically for restricting caching. That is, if the CD flag is set, caching is prevented globally (see table 11-5). If the CD flag is clear, the page-level cache control flags and/or the MTRRs can be used to restrict caching. If page-level and MTRR caching controls overlap, the mechanism that prevents caching has precedence. For example, if an MTRR makes a region of system memory uncacheable, a page-level caching control cannot be used to enable caching for a page in that region. The converse is also true: if a page-level caching control designates a page as uncacheable, an MTRR cannot be used to make the page cacheable.

When the write-back and write-through caching policies are both assigned to an overlapping page and region of memory, the write-through policy takes precedence. The write-combining policy (which can only be assigned through an MTRR or the PAT) takes precedence over both write-through and write-back.

The selection of memory types at the page level depends on whether the PAT is being used to select memory types for pages, as described in the following sections.

On processors based on the Intel NetBurst microarchitecture, the third-level cache can be disabled with bit 6 of the IA32_MISC_ENABLE MSR. IA32_MISC_ENABLE[bit 6] takes precedence over the CD flag, the MTRRs, and the PAT for the L3 cache in those processors. That is, when the third-level cache disable flag is set (cache disabled), the other cache controls have no effect on the L3 cache; when the flag is clear (cache enabled), the cache controls have the same effect on the L3 cache as they have on the L1 and L2 caches.

IA32_MISC_ENABLE[bit 6] is not supported in Intel Core i7 processors, nor in processors based on the Intel Core and Intel Atom microarchitectures.

 

 

11.5.2.1 Selecting memory types for Pentium Pro and Pentium II processors

The Pentium Pro and Pentium II processors do not support the PAT. Here, the effective memory type for a page is selected with the MTRRs and with the PCD and PWT bits in the page-table or page-directory entry for the page. Table 11-6 describes the mapping of MTRR memory types and page-level caching attributes to effective memory types when normal caching is in effect (the CD and NW flags in control register CR0 are clear). Combinations shown in gray are implementation-defined for the Pentium Pro and Pentium II processors. System designers are encouraged to avoid these implementation-defined combinations.

When normal caching is in effect, the effective memory type shown in table 11-6 is determined by the following rules:

1. If the PCD and PWT attributes for the page are both 0, the effective memory type is the same as the memory type defined by the MTRRs.

2. If the PCD flag is set, the effective memory type is UC.

3. If the PCD flag is clear and the PWT flag is set, the effective memory type is WT for the WB memory type and the MTRR-defined memory type for all other memory types.

4. Setting the PCD and PWT flags to opposite values is considered model-specific for the WP and WC memory types and architecturally defined for the WB, WT, and UC memory types.
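
The four rules above can be sketched as a small C function. This is an illustration of the rules as listed here, not a reproduction of table 11-6, which is not included on this page. The type names, `struct effective_type`, and `effective_memory_type` are hypothetical. For combinations that rule 4 calls model-specific, the sketch still reports what rules 1 to 3 would give but sets a flag, and that reported type should not be relied on for real hardware.

```c
#include <stdbool.h>

/* Memory types named in the text; the enum itself is illustrative. */
enum mem_type { MT_UC, MT_WC, MT_WT, MT_WP, MT_WB };

struct effective_type {
    enum mem_type type;   /* result of applying rules 1 to 3               */
    bool model_specific;  /* true when rule 4 marks the case model-specific */
};

/* Combine the MTRR memory type with a page's PCD and PWT attributes. */
struct effective_type effective_memory_type(enum mem_type mtrr,
                                            bool pcd, bool pwt)
{
    struct effective_type r;

    /* Rule 4: opposite PCD/PWT values are model-specific for WP and WC. */
    r.model_specific = (pcd != pwt) && (mtrr == MT_WC || mtrr == MT_WP);

    if (pcd)
        r.type = MT_UC;                 /* rule 2 */
    else if (pwt && mtrr == MT_WB)
        r.type = MT_WT;                 /* rule 3, WB case */
    else
        r.type = mtrr;                  /* rule 1, and rule 3 for other types */

    return r;
}
```

A caller would typically check `model_specific` first and avoid such combinations, in line with the advice to system designers above.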

 

11.5.2.2 Selecting memory types for Pentium III and more recent processor families

The Intel Core 2 Duo, Intel Atom, Intel Core Duo, Intel Core Solo, Pentium M, Pentium 4, Intel Xeon, and Pentium III processors use the PAT to select effective page-level memory types. Here, the memory type for a page is selected by the MTRRs and by the value in a PAT entry. That PAT entry is selected with the PAT, PCD, and PWT bits in a page-table or page-directory entry (see section 11.12.3). Table 11-7 describes the mapping of MTRR memory types and PAT entry types to effective memory types when normal caching is in effect (the CD and NW flags in control register CR0 are clear).
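
To make the PAT entry selection concrete, the sketch below builds the 3-bit PAT index from the PAT, PCD, and PWT bits of an entry and looks up the memory type in an IA32_PAT value. Several details are assumptions not stated on this page: the bit positions (PWT bit 3, PCD bit 4, PAT bit 7) apply to an entry that maps a 4-KByte page, each PAT entry occupies one byte with the type in its low 3 bits, and `PAT_RESET_VALUE` is the commonly documented power-on default. The function names are hypothetical. The sketch stops at the PAT entry type; combining it with the MTRR type (table 11-7) is not shown.

```c
#include <stdint.h>

/* Assumed flag positions in an entry that maps a 4-KByte page. */
#define PTE_PWT (UINT64_C(1) << 3)
#define PTE_PCD (UINT64_C(1) << 4)
#define PTE_PAT (UINT64_C(1) << 7)

/* Assumed power-on default of the IA32_PAT MSR (eight one-byte entries). */
#define PAT_RESET_VALUE UINT64_C(0x0007040600070406)

/* Assumed memory type encodings used inside PAT entries. */
enum pat_type {
    PAT_UC = 0, PAT_WC = 1, PAT_WT = 4,
    PAT_WP = 5, PAT_WB = 6, PAT_UC_MINUS = 7
};

/* Build the PAT index: PAT is bit 2, PCD is bit 1, PWT is bit 0. */
unsigned pat_index(uint64_t pte)
{
    unsigned pat = (pte & PTE_PAT) ? 1u : 0u;
    unsigned pcd = (pte & PTE_PCD) ? 1u : 0u;
    unsigned pwt = (pte & PTE_PWT) ? 1u : 0u;

    return (pat << 2) | (pcd << 1) | pwt;
}

/* Extract the memory type stored in PAT entry `index` (0 to 7). */
unsigned pat_entry_type(uint64_t ia32_pat, unsigned index)
{
    return (unsigned)((ia32_pat >> (8u * (index & 7u))) & 0x7u);
}
```

With the assumed default value, an entry with only PWT set selects PAT entry 1, which holds the write-through encoding.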

 

11.5.2.3 Writing values across pages with different memory types

If two adjoining pages in memory have different memory types, and a word or longer operand is written to a memory location that crosses the boundary between those two pages, the operand might be written to memory twice. This does not present a problem for writes to actual memory. However, if a device is mapped to the memory space assigned to the pages, the device might malfunction.
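
Software that writes to device-mapped memory can guard against this case by checking whether an access straddles a page boundary. The helper below is a minimal sketch that assumes 4-KByte pages; the name `write_crosses_page` is hypothetical, and larger page sizes would need a different constant.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed page size for this sketch. */
#define PAGE_SIZE_4K UINT64_C(4096)

/* Return true if a write of `len` bytes at `addr` spans two 4-KByte pages. */
bool write_crosses_page(uint64_t addr, size_t len)
{
    uint64_t offset = addr & (PAGE_SIZE_4K - 1);  /* offset within the page */

    if (len == 0)
        return false;
    return offset + (uint64_t)len > PAGE_SIZE_4K;
}
```

An 8-byte write at offset 0xFF8 ends exactly at the page boundary and stays within one page, while the same write at offset 0xFFC spills into the next page.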

 

11.5.3 Preventing caching

To disable the L1, L2, and L3 caches after they have been enabled and have received cache fills, perform the following steps:

1. Enter the no-fill cache mode. (Set the CD flag in control register CR0 to 1 and clear the NW flag to 0.)

2. Flush all caches using the WBINVD instruction.

3. Disable the MTRRs and set the default memory type to uncached, or set all MTRRs to the uncached memory type.

The caches must be flushed (step 2) after the CD flag is set to ensure system memory coherency. If the caches are not flushed, cache hits on reads will still occur and data will be read from valid cache lines.

The three separate steps listed above address three distinct requirements: (i) stop new data from replacing existing data in the cache, (ii) ensure that data already in the cache is evicted to memory, and (iii) ensure that subsequent memory references observe UC memory type semantics. Different processor implementations of the cache control hardware may allow some variation in how software implements these three requirements.
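
The register values involved in steps 1 and 3 can be sketched as pure functions. The privileged operations themselves (writing CR0, executing WBINVD, writing the MSR) need ring-0 code and are only described in comments. The layout of the MTRR default-type register used here (type in bits 7:0, enable flag E in bit 11, UC encoded as 0) is an assumption not stated on this page, and the function names are hypothetical.

```c
#include <stdint.h>

#define CR0_NW (UINT64_C(1) << 29)
#define CR0_CD (UINT64_C(1) << 30)

/* Assumed layout of the MTRR default-type register. */
#define MTRR_DEF_TYPE_MASK UINT64_C(0xFF)       /* default memory type field */
#define MTRR_DEF_ENABLE    (UINT64_C(1) << 11)  /* E flag: MTRRs enabled     */
#define MEM_TYPE_UC        UINT64_C(0)          /* assumed encoding for UC   */

/* Step 1: CR0 value for the no-fill cache mode (CD = 1, NW = 0). */
uint64_t cr0_enter_no_fill(uint64_t cr0)
{
    return (cr0 | CR0_CD) & ~CR0_NW;
}

/* Step 3: default-type value with MTRRs disabled and default type UC. */
uint64_t mtrr_def_type_all_uncached(uint64_t def_type)
{
    def_type &= ~(MTRR_DEF_ENABLE | MTRR_DEF_TYPE_MASK);
    return def_type | MEM_TYPE_UC;
}

/*
 * Intended ring-0 order, following the steps above:
 *   1. write cr0_enter_no_fill(current CR0) back to CR0
 *   2. execute WBINVD
 *   3. write mtrr_def_type_all_uncached(current value) to the
 *      MTRR default-type MSR
 */
```

Keeping the bit manipulation separate from the privileged writes makes the values easy to check in ordinary user-space tests.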

 

11.5.4 Disabling and enabling the L3 cache

On processors based on the Intel NetBurst microarchitecture, the third-level cache can be disabled with bit 6 of the IA32_MISC_ENABLE MSR. The third-level cache disable flag (bit 6 of the IA32_MISC_ENABLE MSR) allows the L3 cache to be disabled and enabled independently of the L1 and L2 caches. Before using this control to disable or enable the L3 cache, software should disable and flush all the processor caches, as described in section 11.5.3, to prevent loss of information stored in the L3 cache. After the L3 cache has been disabled or enabled, caching for the whole processor can be restored.

Newer Intel 64 processors with an L3 cache do not support IA32_MISC_ENABLE[bit 6]; on those processors, the procedure described in section 11.5.3 applies to the entire cache hierarchy.

 

11.5.5 Cache management instructions

The Intel 64 and IA-32 architectures provide several instructions for managing the L1, L2, and L3 caches. The INVD and WBINVD instructions are system instructions that operate on the L1, L2, and L3 caches as a whole. The PREFETCHh and CLFLUSH instructions and the non-temporal move instructions (MOVNTI, MOVNTQ, MOVNTDQ, MOVNTPS, and MOVNTPD), which were introduced in the SSE/SSE2 extensions, offer more granular control over caching.

The INVD and WBINVD instructions are used to invalidate the contents of the L1, L2, and L3 caches. The INVD instruction invalidates all internal cache entries and then generates a special-function bus cycle indicating that external caches should also be invalidated. The INVD instruction should be used with care. It does not force a write-back of modified cache lines, so data that is stored in the caches and has not been written back to system memory is lost. Unless there is a specific requirement for, or benefit from, invalidating the caches without writing back the modified lines (such as during testing or fault recovery, where cache coherency with main memory is not a concern), software should use the WBINVD instruction.

The WBINVD instruction first writes back any modified lines in all the internal caches and then invalidates the contents of the L1, L2, and L3 caches. It ensures that cache coherency with main memory is maintained regardless of the write policy in effect (that is, write-through or write-back). After this operation, the WBINVD instruction generates one (P6 family processors) or two (Pentium and Intel486 processors) special-function bus cycles to indicate to external cache controllers that a write-back of modified data, followed by invalidation of the external caches, should occur. The amount of time or number of cycles that WBINVD takes to complete varies with the size of the cache hierarchy and other factors. As a consequence, use of the WBINVD instruction can affect interrupt/event response time.

The PREFETCHh instructions allow a program to suggest to the processor that a cache line from a specified location in system memory be prefetched into the cache hierarchy (see section 11.8).

The CLFLUSH instruction allows selected cache lines to be flushed from the caches to memory. It gives a program the ability to explicitly free up cache space when it is known that cached sections of system memory will not be accessed in the near future.

The non-temporal move instructions (MOVNTI, MOVNTQ, MOVNTDQ, MOVNTPS, and MOVNTPD) allow data to be moved from the processor's registers directly into system memory without also being written into the L1, L2, and/or L3 caches. These instructions can be used to prevent cache pollution when operating on data that will be modified only once before being stored back into system memory. These instructions operate on data in the general-purpose, MMX, and XMM registers.
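
The unprivileged instructions above are reachable from C through compiler intrinsics. The following is a minimal sketch, assuming a compiler that provides `<immintrin.h>` and defines `__SSE2__` (GCC and Clang on x86-64 do); on other targets it falls back to plain loads and stores so that it still compiles. The 64-byte line size, the prefetch distance of one line, and all function names are assumptions made for illustration. INVD and WBINVD are privileged and are not shown.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <immintrin.h>
#define HAVE_CACHE_INTRINSICS 1
#else
#define HAVE_CACHE_INTRINSICS 0
#endif

/* Assumed cache line size; real code should query it from the processor. */
#define ASSUMED_LINE_SIZE 64u

/* Fill dst with `value` using non-temporal stores (MOVNTI) where available. */
void fill_non_temporal(int *dst, size_t n, int value)
{
    for (size_t i = 0; i < n; i++) {
#if HAVE_CACHE_INTRINSICS
        _mm_stream_si32(&dst[i], value);
#else
        dst[i] = value;
#endif
    }
#if HAVE_CACHE_INTRINSICS
    _mm_sfence();  /* order the weakly ordered stores before later stores */
#endif
}

/* Flush every cache line that overlaps [p, p + len) using CLFLUSH. */
void flush_range(const void *p, size_t len)
{
#if HAVE_CACHE_INTRINSICS
    uintptr_t line = (uintptr_t)p & ~(uintptr_t)(ASSUMED_LINE_SIZE - 1);
    uintptr_t end = (uintptr_t)p + len;

    for (; line < end; line += ASSUMED_LINE_SIZE)
        _mm_clflush((const void *)line);
    _mm_mfence();  /* wait for the flushes to complete */
#else
    (void)p;
    (void)len;
#endif
}

/* Sum an array, hinting one cache line ahead with PREFETCHT0. */
long sum_with_prefetch(const int *src, size_t n)
{
    long sum = 0;

    for (size_t i = 0; i < n; i++) {
#if HAVE_CACHE_INTRINSICS
        if ((i % 16) == 0 && i + 16 < n)
            _mm_prefetch((const char *)&src[i + 16], _MM_HINT_T0);
#endif
        sum += src[i];
    }
    return sum;
}
```

None of these calls change program results; they only influence where data lives in the cache hierarchy, so the benefit, if any, has to be measured on the target processor.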

 

11.5.6 L1 data cache context mode

L1 data cache context mode is a feature of processors based on the Intel NetBurst microarchitecture that support Intel Hyper-Threading Technology. When CPUID.1:ECX[bit 10] = 1, the processor supports setting the L1 data cache context mode with the L1 data cache context mode flag (IA32_MISC_ENABLE[bit 24]). The selectable modes are adaptive mode (the default) and shared mode.

The BIOS is responsible for configuring the L1 data cache context mode.

 

11.5.6.1 Adaptive mode

Adaptive mode facilitates L1 data cache sharing between logical processors. When running in adaptive mode, the L1 data cache is shared across logical processors in the same core if:

1. The CR3 control registers of the logical processors sharing the cache are identical.

2. The same paging mode is used by the logical processors sharing the cache.

In this situation, the entire L1 data cache is available to each logical processor (instead of being competitively shared).

If the CR3 values differ between the logical processors sharing an L1 data cache, or the logical processors use different paging modes, the processors compete for cache resources. This reduces the effective size of the cache for each logical processor. Aliasing of the cache is not allowed (which prevents data thrashing).

 

11.5.6.2 Shared mode

In shared mode, the L1 data cache is competitively shared between logical processors. This is true even if the logical processors use identical CR3 registers and paging modes.

In shared mode, linear addresses in the L1 data cache can be aliased, meaning that one linear address in the cache can point to different physical locations. The mechanism for resolving aliasing can lead to thrashing. For this reason, IA32_MISC_ENABLE[bit 24] = 0 is the preferred configuration for processors based on the Intel NetBurst microarchitecture that support Intel Hyper-Threading Technology.
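
The two modes can be summarized as a small decision function. This sketch models only the rules stated in sections 11.5.6.1 and 11.5.6.2. It assumes that IA32_MISC_ENABLE[bit 24] = 1 selects shared mode (consistent with bit 24 = 0 being the adaptive default), represents the paging mode as an opaque integer, and uses hypothetical names throughout.

```c
#include <stdbool.h>
#include <stdint.h>

#define CPUID_1_ECX_CNXT_ID     (UINT32_C(1) << 10)  /* mode is selectable  */
#define MISC_ENABLE_L1D_CONTEXT (UINT64_C(1) << 24)  /* assumed: 1 = shared */

enum l1d_sharing {
    L1D_MODE_NOT_SELECTABLE,  /* CPUID.1:ECX[bit 10] = 0                   */
    L1D_FULL_CACHE_EACH,      /* adaptive mode, CR3 and paging mode match  */
    L1D_COMPETITIVE           /* logical processors compete for the cache  */
};

/* State of one logical processor; the paging mode is an opaque identifier. */
struct logical_cpu {
    uint64_t cr3;
    int paging_mode;
};

/* Decide how two logical processors on one core share the L1 data cache. */
enum l1d_sharing l1d_sharing_for(uint32_t cpuid1_ecx, uint64_t misc_enable,
                                 const struct logical_cpu *a,
                                 const struct logical_cpu *b)
{
    if (!(cpuid1_ecx & CPUID_1_ECX_CNXT_ID))
        return L1D_MODE_NOT_SELECTABLE;

    /* Shared mode: always competitive, even with identical CR3 values. */
    if (misc_enable & MISC_ENABLE_L1D_CONTEXT)
        return L1D_COMPETITIVE;

    /* Adaptive mode: the full cache is available only if both conditions hold. */
    if (a->cr3 == b->cr3 && a->paging_mode == b->paging_mode)
        return L1D_FULL_CACHE_EACH;

    return L1D_COMPETITIVE;
}
```

The function takes the CPUID and MSR values as arguments because reading the MSR requires privileged code.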

 
