"Go" DMA and cache consistency

Source: Internet
Author: User

DMA and cache consistency issues

Cache principle

The CPU cache is a small, fast memory that sits between the CPU and RAM: it is much smaller than main memory, but much faster to access. Its main purpose is to bridge the gap between CPU speed and memory read/write speed. Because the CPU runs far faster than memory, the CPU would otherwise spend a long time waiting for data to arrive from memory or to be written to it. The cache holds only a small portion of main memory's contents, but it is the portion the CPU is about to access in the near future; when the CPU needs data, it can often fetch it from the cache instead of going directly to memory, which speeds up access.

As long as the mapping between the cache and main memory is kept within a proper proportion, the cache hit rate remains quite high. A typical cache-to-memory ratio is about 4:1000: a 128 KB cache can map 32 MB of memory, and a 256 KB cache can map 64 MB. In that case the hit rate is above 90%. Data that is not hit must be fetched by the CPU directly from memory, and it is copied into the cache as well.

Cache consistency issues

Because the cache sits between the CPU and memory, a peripheral that modifies memory does not automatically cause the same update in the cache, and the processor's modification of cached contents does not guarantee that the data in memory is updated. This lack of synchronization between cached data and in-memory data can cause errors when data is transferred using DMA or when the processor runs self-modifying code.

Cache consistency means that the data in the cache is consistent with the corresponding data in memory.

The basic structure of the cache

The cache is usually implemented with associative memory. Each storage block of the associative memory carries additional information called a tag. When the associative memory is accessed, the address is compared with every tag simultaneously, so the block whose tag matches can be accessed. The three basic cache structures are as follows:

Fully associative cache

In a fully associative cache there is no fixed relationship between the stored blocks: neither their storage order nor the memory addresses they come from is constrained. A program may access many subroutines, stacks, and segments located in different parts of main memory, so the cache may hold a number of unrelated blocks of data.

The cache must store the address of each block as well as the block itself. When data is requested, the cache controller compares the requested address with all stored addresses to determine whether it is a hit.

The main advantage of this structure is that any block of main memory can be stored in any cache location at a given time, so the hit rate is high. The disadvantage is that every data request must be compared against every address held in the cache, which takes considerable time and makes lookups slower.

Direct-mapped cache

A direct-mapped cache differs from a fully associative cache in that the address needs to be compared only once.

In a direct-mapped cache, each main-memory block has only one possible location in the cache, so the number of address comparisons is reduced to one. This is done by assigning an index field to each block location in the cache and using a tag field to distinguish between the different memory blocks that can be stored at that location. Direct mapping divides main memory into pages, each the same size as the cache; the offset within a main-memory page maps directly to the same offset in the cache, and the cache's tag memory holds the page address (page number) of the block currently stored there.
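To make the address split concrete, here is a minimal C sketch (not part of the original article), under the assumption of a 128 KB direct-mapped cache; the address value and sizes are made up for illustration:

#include <stdint.h>
#include <stdio.h>

/* Hypothetical direct-mapped cache of 128 KB; main memory is viewed as
 * pages of the same size.  All sizes are assumptions for illustration. */
#define CACHE_SIZE (128u * 1024u)

int main(void)
{
    uint32_t addr = 0x0012ABCDu;              /* example physical address        */

    uint32_t page_number = addr / CACHE_SIZE; /* stored in the tag memory        */
    uint32_t offset      = addr % CACHE_SIZE; /* maps directly to a cache offset */

    /* A lookup compares only one tag: the one stored at this offset's location. */
    printf("addr=0x%08x page=%u cache_offset=0x%05x\n",
           (unsigned)addr, (unsigned)page_number, (unsigned)offset);
    return 0;
}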

As can be seen from the above, the direct-mapped cache can be searched more quickly than the fully associative cache. Its disadvantage is that when a program frequently alternates between main-memory pages that map to the same cache locations, the cache controller must repeatedly replace the cached blocks.

Set-associative cache

The set-associative cache is a structure between the fully associative cache and the direct-mapped cache. It uses several groups (ways) of direct-mapped blocks, so several block locations are allowed for a given index. This increases the hit rate and the system's efficiency.
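Again as an illustration only (not from the original), a short C sketch of a set-associative lookup: for a given index (set), the tag is compared against each of the ways rather than against every line in the cache. The way count, set count, and line size are assumptions:

#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 4-way set-associative cache: 256 sets of 32-byte lines.
 * All sizes are assumptions for illustration. */
#define WAYS      4u
#define NUM_SETS  256u
#define LINE_SIZE 32u

struct cache_line {
    bool     valid;
    uint32_t tag;
    uint8_t  data[LINE_SIZE];
};

static struct cache_line cache[NUM_SETS][WAYS];

/* Return true on a hit; a given address may sit in any of the WAYS lines of its set. */
bool cache_lookup(uint32_t addr)
{
    uint32_t set = (addr / LINE_SIZE) % NUM_SETS;
    uint32_t tag = addr / (LINE_SIZE * NUM_SETS);

    for (uint32_t way = 0; way < WAYS; way++) {
        if (cache[set][way].valid && cache[set][way].tag == tag)
            return true;   /* hit: only WAYS tag comparisons, not one per line */
    }
    return false;          /* miss: the block must be fetched from main memory */
}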

Cache and DRAM access consistency

Once a cache is added between the CPU and main memory, the question arises of how data accesses are coordinated between the CPU, the cache, and main memory. There are two read policies and two write policies.

Read-through (look-through)

In this arrangement the cache is placed between the CPU and main memory: every data request from the CPU to main memory is first sent to the cache, which searches itself. If it hits, the cache cuts off the CPU's request to main memory and returns the data itself; if it misses, the request is passed on to main memory.

The advantage of this method is that it reduces the number of CPU requests to main memory; the disadvantage is that it delays the CPU's access to main memory on a miss.

Bypass readout (look-aside)

In this arrangement the CPU's data request does not go through the cache alone; instead, the request is issued to the cache and to main memory at the same time. Because the cache is faster, if it hits, the cache returns the data to the CPU and aborts the CPU's request to main memory; if it misses, the cache does nothing and the CPU accesses main memory directly. The advantage is that there is no extra delay; the disadvantage is that every CPU access also goes to main memory, so it occupies part of the bus time.

Write-through

Anything the CPU writes to the cache is also written to main memory, ensuring that main memory is updated synchronously. The advantage is simplicity of operation; the disadvantage is that, because main memory is slow, it reduces the system's write speed and occupies bus time.

Write-back (copy back)

Write-back was introduced to overcome the drawback of write-through, where every write must access main memory, slowing system writes and occupying bus time; it minimizes the number of main-memory accesses.

It works like this: data is normally written only to the cache, so it can happen that the data in the cache has been updated while the data in main memory has not (the memory data becomes stale). In that case a flag and address information are kept for the cache line. Only when the data in that cache line is about to be replaced is the previously updated data written back to the corresponding main-memory location, after which the line can accept the new data. This ensures that the data in the cache and in main memory do not conflict.
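As a minimal sketch (not from the original), the write-back idea in C: a CPU write only updates the cache and marks the line dirty; the stale main-memory block is updated only when the line is replaced. The data structures and sizes are assumptions:

#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define LINE_SIZE 32u
#define MEM_SIZE  (64u * 1024u)

static uint8_t main_memory[MEM_SIZE];   /* stand-in for RAM, for illustration */

/* Hypothetical cache line with a dirty flag (assumed layout). */
struct cache_line {
    bool     valid;
    bool     dirty;              /* set when the line differs from main memory */
    uint32_t mem_addr;           /* address of the block this line holds       */
    uint8_t  data[LINE_SIZE];
};

/* Write-back policy: a CPU write only touches the cache and marks the line
 * dirty; main memory is not updated yet and becomes stale. */
void cache_write(struct cache_line *line, uint32_t offset, uint8_t value)
{
    line->data[offset] = value;
    line->dirty = true;
}

/* Only when the line is replaced is the previously updated data written back
 * to main memory; then the line can accept new data. */
void cache_evict_and_refill(struct cache_line *line, uint32_t new_addr)
{
    if (line->valid && line->dirty)
        memcpy(&main_memory[line->mem_addr], line->data, LINE_SIZE);

    memcpy(line->data, &main_memory[new_addr], LINE_SIZE);
    line->mem_addr = new_addr;
    line->valid = true;
    line->dirty = false;
}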

Cache and DMA consistency issues

When DMA operations are involved, the following two errors can occur if the cache is not handled properly:

1. DMA reads data from a peripheral for the processor to use. DMA transfers the external data directly into memory, but the cache still holds the old data, so when the processor accesses the data it gets stale data from the cache.

2. DMA writes data provided by the processor to a peripheral. While the processor was preparing the data, the data was placed in the cache, and at that moment it may not yet have been written back to memory. If DMA then transfers the data directly from memory to the peripheral, the peripheral may get stale data.

In order to perform DMA transfers correctly, the necessary cache operations must be carried out. Cache operations are mainly divided into invalidate and writeback; sometimes both are used together.

If DMA is used together with a cache, cache consistency must be considered. The simplest way to solve the consistency problem caused by DMA is to disable caching within the DMA target address range, but that sacrifices performance.

Therefore, if DMA is used with the cache, a decision can be made based on how long the DMA buffer is expected to be kept. DMA mappings are divided into consistent (coherent) DMA mappings and streaming DMA mappings.

A buffer requested through a consistent DMA mapping can use the cache while maintaining cache consistency. A consistent mapping has a long life cycle; the mapping registers it occupies are not freed during that time even when they are unused. Its life cycle is that of the driver.

The streaming DMA mapping is more complex to implement. For now it is enough to know that its life cycle is relatively short and that caching is disabled for it. Some hardware is optimized for streaming mappings. To establish a streaming DMA mapping, the direction of the data transfer must be given to the kernel.
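For readers working in Linux, the two mapping styles correspond to the kernel's coherent and streaming DMA APIs. The following is only a hedged sketch, assuming a driver that already has a struct device pointer; the buffer size and the function names other than the kernel API itself are made up:

#include <linux/dma-mapping.h>
#include <linux/errno.h>
#include <linux/gfp.h>

#define BUF_SIZE 4096   /* assumed buffer size for the example */

/* Consistent (coherent) mapping: allocated once, typically for the driver's
 * whole lifetime; the kernel keeps the CPU and device views of the buffer coherent. */
void *coherent_buf;
dma_addr_t coherent_handle;

int setup_coherent(struct device *dev)
{
    coherent_buf = dma_alloc_coherent(dev, BUF_SIZE, &coherent_handle, GFP_KERNEL);
    return coherent_buf ? 0 : -ENOMEM;
}

/* Streaming mapping: short-lived, set up around a single transfer; the data
 * direction must be given so the kernel can do the right cache maintenance. */
int do_streaming_tx(struct device *dev, void *buf)
{
    dma_addr_t handle = dma_map_single(dev, buf, BUF_SIZE, DMA_TO_DEVICE);

    if (dma_mapping_error(dev, handle))
        return -ENOMEM;

    /* ... start the DMA transfer to the peripheral here ... */

    dma_unmap_single(dev, handle, BUF_SIZE, DMA_TO_DEVICE);
    return 0;
}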

1. When DMA reads data from a peripheral for the processor, an invalidate operation can be performed on the buffer in advance. This forces the processor to reload the data from memory into the cache when it next reads it, ensuring consistency between the cache and memory.

2. When DMA writes data provided by the processor to a peripheral, a writeback operation can be performed in advance. This causes the data in the cache to be written back to memory before DMA transfers it to the peripheral.

If the direction of the DMA operation is not known, the invalidate and writeback operations can be performed together, which has the combined effect of both.
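As a hedged illustration (the original gives no code), the two cases above map onto the Linux streaming-DMA synchronization helpers; the wrapper function names and parameters below are assumptions:

#include <linux/dma-mapping.h>

/* Case 1: device -> memory -> CPU. Before the CPU reads the buffer, hand it
 * back to the CPU; on most architectures this invalidates the stale cache lines. */
void after_dma_read(struct device *dev, dma_addr_t handle, size_t len)
{
    dma_sync_single_for_cpu(dev, handle, len, DMA_FROM_DEVICE);
}

/* Case 2: CPU -> memory -> device. Before the device reads the buffer, hand it
 * to the device; on most architectures this writes dirty cache lines back to memory. */
void before_dma_write(struct device *dev, dma_addr_t handle, size_t len)
{
    dma_sync_single_for_device(dev, handle, len, DMA_TO_DEVICE);
}

/* Unknown direction: DMA_BIDIRECTIONAL tells the kernel the buffer may move
 * both ways, so its cache maintenance must cover writeback and invalidation. */
void sync_both_ways(struct device *dev, dma_addr_t handle, size_t len)
{
    dma_sync_single_for_device(dev, handle, len, DMA_BIDIRECTIONAL);
}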

The Windows CE (WinCE) operating system also provides a cache operation interface:

void OEMCacheRangeFlush(LPVOID pAddr, DWORD dwLength, DWORD dwFlags);
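A hedged usage sketch, assuming a WinCE OAL/BSP build environment where OEMCacheRangeFlush and the CACHE_SYNC_* flags are declared by the platform headers; the buffer variables and the specific flag choices are assumptions:

#include <windows.h>

/* Prototype as given above; normally provided by the WinCE OAL headers (assumption). */
void OEMCacheRangeFlush(LPVOID pAddr, DWORD dwLength, DWORD dwFlags);

/* Assumed DMA buffer set up elsewhere in the driver. */
extern LPVOID g_dmaBuffer;
extern DWORD  g_dmaLength;

void PrepareDmaToDevice(void)
{
    /* Write dirty cache lines back to memory before the peripheral reads the buffer. */
    OEMCacheRangeFlush(g_dmaBuffer, g_dmaLength, CACHE_SYNC_WRITEBACK);
}

void FinishDmaFromDevice(void)
{
    /* Discard cached copies so the CPU re-reads what the peripheral wrote. */
    OEMCacheRangeFlush(g_dmaBuffer, g_dmaLength, CACHE_SYNC_DISCARD);
}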

"Go" DMA and cache consistency

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.