Cache coherency for the PCIe non-transparent bridge
The PCIe non-transparent bridge (NTB) provides two mechanisms for moving data from the local node to a remote node: address mapping and an embedded DMA engine. On the remote node, the CPU may not be aware that data has arrived, so cache coherency must be ensured. On the local node, when data is transferred into its own memory by DMA, the CPU is not notified either, so cache coherency must be considered there as well.
Platforms implement cache coherency differently: ARM platforms require software participation, whereas on Intel x86 platforms the hardware maintains cache coherency automatically. x86 also provides different levels of cache control, and some special applications may require a custom cache-coherency management strategy.
1. Intel x86 cache coherency levels
Intel defines different coherency levels for different application requirements:
[Figure: cache_Level.png]
2. Three levels of Intel x86 cache management
Intel provides three granularities for managing cache behavior:
CD/NW bits of the CR0 register in the processor core: enable or disable caching for the entire system;
PCD/PWT bits in CR3 and the PCD/PWT attribute bits in page-directory and page-table entries: the CR3 bits govern accesses to the top-level paging structure, while the entry bits govern the specific page table or page that each entry references;
MTRRs (Memory Type Range Registers): specify whether a physical address range is cacheable or uncacheable.
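As a rough illustration of the first two levels, the sketch below decodes the relevant bits from raw register and page-table-entry values. The bit positions follow the Intel SDM, Volume 3A (CR0.NW is bit 29, CR0.CD is bit 30, PWT is bit 3 and PCD is bit 4 of a paging entry). The helper names and the three-way classification are assumptions made for this example, and the classification deliberately ignores the PAT and MTRRs, which also contribute to the effective memory type on real hardware.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as documented in the Intel SDM, Volume 3A. */
#define CR0_NW  (1u << 29)  /* Not Write-through */
#define CR0_CD  (1u << 30)  /* Cache Disable */
#define PTE_PWT (1u << 3)   /* page-level write-through */
#define PTE_PCD (1u << 4)   /* page-level cache disable */

/* Hypothetical helper: true when CR0 selects normal caching (CD = 0, NW = 0). */
static bool cr0_normal_caching(uint32_t cr0)
{
    return (cr0 & (CR0_CD | CR0_NW)) == 0;
}

/* Simplified classification for this sketch only (PAT and MTRRs ignored). */
enum page_cache_mode { PAGE_WRITE_BACK, PAGE_WRITE_THROUGH, PAGE_UNCACHED };

/* Hypothetical helper: classify a paging entry by its PCD/PWT bits. */
static enum page_cache_mode pte_cache_mode(uint64_t pte)
{
    if (pte & PTE_PCD)
        return PAGE_UNCACHED;       /* PCD set: caching disabled for the page */
    if (pte & PTE_PWT)
        return PAGE_WRITE_THROUGH;  /* PWT set: writes also go to memory */
    return PAGE_WRITE_BACK;         /* neither bit set: default write-back */
}
```

For example, a page-table entry with only the present and writable bits set classifies as write-back in this model, and setting PCD on the same entry classifies it as uncached.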
3. Cache coherency and DMA
DMA buffer and DMA memory
DMA buffer: a region of physical memory that holds data received by DMA or data to be sent by DMA
DMA memory: storage located on a physical peripheral, such as dedicated memory on a video card, I/O space, or PCI memory space
Coherent DMA and streaming DMA
DMA moves data between the DMA buffer and DMA memory. The coherency requirement is that data read from the buffer after a DMA transfer is the latest data, and that data to be sent by DMA is up to date in memory when the DMA engine reads it.
Coherent DMA: the DMA buffer is physically contiguous, made up of consecutive physical pages, so a single DMA operation can transfer data into these pages as long as the transfer length permits. Its advantage is speed; its drawback is that a contiguous block of physical pages must be found. The buffer must also be cache coherent. The kernel's dma_alloc_coherent() function satisfies both requirements. On x86, because the hardware already ensures cache coherency for the DMA buffer, the function only needs to find a physically contiguous block of pages. If the hardware does not guarantee cache coherency, these physical addresses must additionally be mapped uncached.
Streaming DMA: the DMA buffer is contiguous in virtual address space, but its physical pages are not guaranteed to be contiguous. Because a DMA descriptor requires a physically contiguous region, every physical page frame behind the virtual range must be found and transferred with its own DMA operation. The advantage is that there is no restriction on the buffer address: the driver and the kernel hide the details of splitting the buffer into physical page frames, and the buffer is handed over as each transfer is issued. In this mode, how cache coherency is maintained depends on the transfer direction:
From memory to DMA: write the cache contents for each page frame back to memory
From DMA to memory: invalidate the cache for each page frame, so that subsequent accesses go to memory
On x86, the hardware already manages cache coherency, so the cache write-back and invalidation steps above are not required.
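The direction rule and the page-frame splitting described above can be modelled in a few lines of user-space C. This is only a sketch of the decision logic, not kernel code: the enum names, the 4096-byte page size, and the hw_coherent flag are assumptions introduced for the example. In a real Linux driver this work is done by the streaming DMA mapping API (for example dma_map_single() and the dma_sync_single_for_cpu()/dma_sync_single_for_device() helpers).

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u  /* assumed page size for this sketch */

/* Hypothetical names for the two transfer directions in the text. */
enum dma_dir { DMA_MEM_TO_DEV, DMA_DEV_TO_MEM };

/* Cache maintenance a driver would need before/after the transfer. */
enum cache_op { CACHE_NONE, CACHE_WRITEBACK, CACHE_INVALIDATE };

/*
 * Pick the cache operation for one page frame.
 * hw_coherent != 0 models a platform such as x86, where hardware keeps
 * DMA coherent and no explicit maintenance is needed.
 */
static enum cache_op cache_op_for(enum dma_dir dir, int hw_coherent)
{
    if (hw_coherent)
        return CACHE_NONE;
    return dir == DMA_MEM_TO_DEV ? CACHE_WRITEBACK : CACHE_INVALIDATE;
}

/*
 * Number of page frames touched by the virtually contiguous range
 * [vaddr, vaddr + len). Each frame may be physically separate and so
 * may need its own DMA descriptor.
 */
static size_t pages_spanned(uintptr_t vaddr, size_t len)
{
    if (len == 0)
        return 0;
    uintptr_t first = vaddr / MODEL_PAGE_SIZE;
    uintptr_t last = (vaddr + len - 1) / MODEL_PAGE_SIZE;
    return (size_t)(last - first + 1);
}
```

Note that a buffer of exactly one page in length still spans two page frames when it does not start on a page boundary, which is why the split is computed from the address as well as the length.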
4. Cache coherency considerations for the PCIe non-transparent bridge
Setting special cases aside, the analysis above shows that when DMA is used for data transfer, cache coherency of the local DMA buffer is always guaranteed. Because of the address translation performed by the non-transparent bridge, in real deployments the local DMA memory actually maps to the remote node's local DMA buffer, so its cache coherency is likewise guaranteed by the x86 hardware. If, however, the non-transparent bridge is used to support non-volatile storage, where data loss must be prevented, then in addition to ensuring cache coherency the following are required:
All write accesses go directly to memory;
All reads also come from memory (reading from the cache may be allowed for performance reasons).
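One way to read these two requirements is as a write-through policy. That mapping is an interpretation rather than something stated above, and the toy single-line cache below is a hypothetical model, not a description of any real hardware. It contrasts the two policies: under write-through the backing memory holds the new value as soon as the write completes, while under write-back the value exists only in the cache until it is flushed and would be lost in this model if the cache contents disappeared.

```c
#include <stdbool.h>
#include <stdint.h>

enum write_policy { WRITE_BACK, WRITE_THROUGH };

/* Toy model: one cache line in front of one word of backing memory. */
struct toy_line {
    uint32_t memory;  /* backing memory: what survives in this model */
    uint32_t cached;  /* cached copy of the word */
    bool valid;       /* cached copy is usable */
    bool dirty;       /* cached copy is newer than memory */
};

/* Write a value under the given policy. */
static void toy_write(struct toy_line *l, uint32_t v, enum write_policy p)
{
    l->cached = v;
    l->valid = true;
    if (p == WRITE_THROUGH) {
        l->memory = v;    /* the write reaches memory immediately */
        l->dirty = false;
    } else {
        l->dirty = true;  /* memory is stale until a later write-back */
    }
}

/* Read a value; served from the cache when valid, else filled from memory. */
static uint32_t toy_read(struct toy_line *l)
{
    if (!l->valid) {
        l->cached = l->memory;
        l->valid = true;
        l->dirty = false;
    }
    return l->cached;
}
```

Under write-through, serving reads from the cache is safe in this model because the cached copy never differs from memory, which matches the performance allowance in the second requirement.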
Resources:
1. Intel, ia32_dev_3a.pdf
2. Chen Xuesong, "In-Depth Linux Device Driver Kernel Mechanisms"
This article is from the "Storage Chef" blog; please contact the author before reproducing it.