1 Basic Concepts
In embedded software development we often hear that a block of memory is cached or non-cached. What does that mean? When should each be used? And how are cached and non-cached memory regions configured? This post discusses these questions.
A cache is a buffering mechanism that sits between the CPU and the DDR, providing fast intermediate storage for reads and writes between the two. A cache is usually SRAM, fabricated with the same semiconductor process as the CPU itself; it is more expensive than DDR but much faster to read and write. For example, when the CPU executes instructions stored in DDR, it can fetch a block of instructions into the cache once, and subsequent fetches are served directly from the cache instead of repeatedly accessing the slower DDR. Likewise, when the CPU writes data destined for DDR, it can write quickly into the cache and then either execute an explicit flush instruction to push the data out to DDR, or not flush at all and let the cache write the contents back to DDR at a time of its own choosing. In a word, the cache exists to bridge the performance gap between the CPU and the DDR and to improve overall system performance.
2 When cache cannot be used
Most of the time the cache is a good helper and we need it. But there are exceptions; consider the following scenarios.

Case 1: The CPU reads memory belonging to a peripheral whose data changes on its own, such as a network card receiving external data. If the CPU performs two reads of the same address in quick succession, the data from the first read may still be sitting in the cache, so the second read may return the stale cached copy rather than the peripheral's new data.

Case 2: The CPU writes data to a peripheral, for example to memory mapped to a serial controller. If the data from the first write is still in the cache when the CPU writes to the same address a second time, the CPU may only update the cache: only the second write's content ever reaches the serial port, and the data from the first write is lost.
Case 3: In embedded development it is common to use a PC-side debugging tool to confirm that certain events have occurred by viewing memory directly, for example a global variable that counts interrupts or task cycles. If that variable lives in cached memory, you may find that the system runs normally yet the variable appears frozen for long stretches. In fact the increments are accumulating in the cache; since nothing else references the variable, the cache line is not flushed to DDR for a long time.

Case 4: Consider a dual-CPU environment (two separate CPUs, not a dual-core part). CPU1 and CPU2 share a region of DDR that both can access, used for inter-processor communication. As soon as CPU1 finishes writing data, it raises an interrupt to notify CPU2 to read the shared memory. If the region is cached, CPU1's update may still be sitting in its cache, not yet written back to DDR, by the time CPU2 starts reading, so CPU2 does not see the expected data. The process is shown in the figure:
In addition, developers of bare-metal embedded programs with modest performance requirements (bootloaders, board-level peripheral verification, and so on) know that the cache there is often counterproductive, so they tend to switch the CPU's cache off entirely, trading a little performance for the guarantee that what the CPU writes and reads is exactly what the peripheral sees.
Here we can sum up a little: on a single CPU that does not touch peripherals and only reads and writes DDR, you can use the cache with confidence. For the other cases, there are two approaches:
1) Manage the cache manually. This requires a good understanding of the cache mechanism, and finding the right moments to flush (write the cached data back to memory) or invalidate (discard the cache contents, so that the next read must go to DDR for the latest data).
2) Configure the memory as non-cacheable.
3 How to set memory as non-cacheable
Different processor platforms handle non-cacheable memory differently. On higher-end CPUs, memory is generally marked non-cacheable dynamically at run time via page tables. For example, the Linux kernel has a commonly used function, `ioremap`, frequently called when accessing peripherals: it maps a peripheral's physical address into the kernel's virtual address space for driver use, and while building the mapping it configures the page-table entries for the register addresses as non-cacheable, so data is read from and written to the peripheral's address space directly and stays consistent.
On lower-end CPUs (such as ARM Cortex-A5 or MIPS R3000 class parts, close to MCUs), the split is generally preset at compile/link time: a dedicated section is defined in the linker script or scatter file, and the memory blocks that need to be non-cacheable are placed into it. At startup, the boot code programs the MMU or MPU according to these section definitions, mapping the non-cache section to a specific address known to the CPU. After that, every memory access passes through the MMU or MPU, which determines whether the address is cacheable or non-cacheable.
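A GNU ld fragment might carve out such a section. This is a hedged sketch only: the region names, addresses, and sizes below are invented and entirely board-specific, and the boot code must still configure the MMU/MPU so that the region holding `.noncache` is actually marked non-cacheable.

```ld
/* Hypothetical linker-script fragment; addresses and names are examples. */
MEMORY
{
    DDR_CACHED  (rwx) : ORIGIN = 0x80000000, LENGTH = 31M
    DDR_NOCACHE (rw)  : ORIGIN = 0x81F00000, LENGTH = 1M
}

SECTIONS
{
    .noncache (NOLOAD) :
    {
        *(.noncache)          /* variables tagged with the section attribute */
    } > DDR_NOCACHE
}
```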
On MIPS, the program address space is divided into regions such as kseg0 and kseg1. kseg0 spans 0x80000000~0x9FFFFFFF and is unmapped and cached; kseg1 spans 0xA0000000~0xBFFFFFFF and is unmapped and uncached. Both point to the same physical address space; that is, 0x82001234 and 0xA2001234 refer to the same physical address, but MIPS accesses them differently. When the CPU accesses an address in kseg0, it first looks for the target in the cache and only goes to the physical address on a miss; when the address is in kseg1, the CPU bypasses the cache and reads or writes the physical address directly.
4 Is there a limit to the size of the non-cacheable area?
The answer is no. If you like, you can set almost the entire DDR to non-cacheable, but the cost of doing so is huge: you are throwing away the cache mechanism the CPU designers built so carefully, every read and write must go to a DDR that is far slower than the CPU, and the whole system slows down noticeably. In a test I ran under otherwise identical conditions, writing to cached memory was roughly twice as fast as writing to non-cached memory.
5 Application Cases
Suppose an embedded application needs to copy 4 KB of memory to another address, and both the source and destination physical addresses map to DDR. Naturally we think of that hard-working porter, the DMA engine: let DMA do this tedious, time-consuming copy while the CPU gets on with other work. Anyone who uses DMA regularly knows the standard three steps:
1) Flush the cache for the source address, in case the source contents were updated in the cache while DDR still holds stale data; DMA knows nothing about the cache.
2) Invalidate the cache for the destination address: its contents are about to be overwritten, so any cached copy of it is outdated.
3) Set the source address, destination address, and data length, and start the DMA.

Not too troublesome, is it? And if both the source and destination buffers are preset global data, they can be defined in the non-cacheable section, so that steps 1 and 2 are no longer needed. Note that both the source and the destination must be placed in the non-cache area; miss either one and you will probably not get the correct result. Because cache behavior is highly unpredictable, the memory contents you see after the DMA transfer may well be very strange numbers.