1.Cache Introduction
Based on the principle of locality in program execution, a small, high-speed memory is placed between main memory and the CPU's general-purpose registers. A portion of the instructions or data near the address currently being executed is copied from main memory into this memory for the CPU to use over a period of time, which helps improve system performance. This small, high-speed memory between main memory and the CPU is called the cache.
After the cache is enabled, when the CPU reads data it first checks whether a copy is already in the cache; if so, the copy is returned directly. Otherwise the data is read from main memory and a copy is stored in the cache, so that the copy can be used directly the next time the same data is read.
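The following is a conceptual sketch, in C, of the read behavior just described: a toy direct-mapped cache model with made-up sizes, intended only to illustrate the hit/miss logic and not to describe the 2440's actual hardware.

#define LINE_SIZE 32
#define NUM_LINES 10

static unsigned char main_memory[1024];            /* toy stand-in for main memory */

struct cache_line {
    int           valid;
    unsigned long tag;
    unsigned char data[LINE_SIZE];
};
static struct cache_line cache[NUM_LINES];

unsigned char cache_read_byte(unsigned long addr)
{
    unsigned long block = addr / LINE_SIZE;        /* which memory block the address belongs to */
    unsigned int  idx   = block % NUM_LINES;       /* which cache line that block maps to */
    unsigned int  off   = addr % LINE_SIZE;
    struct cache_line *l = &cache[idx];

    if (!l->valid || l->tag != block) {            /* miss: copy the whole block in from main memory */
        for (unsigned int i = 0; i < LINE_SIZE; i++)
            l->data[i] = main_memory[block * LINE_SIZE + i];
        l->tag   = block;
        l->valid = 1;
    }
    return l->data[off];                           /* hit (or just filled): return the cached copy */
}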
2.Write-Through Mode and Write-Back Mode
After the cache is enabled, the CPU can write data in two ways: write-through and write-back.
(1) Write through mode
Any data the CPU writes to the cache is also written to main memory at the same time, so that main memory is always kept up to date. This method is simple, but because main memory is relatively slow, it lowers the system's write speed and occupies more bus time.
(2) Write back mode
Data is normally written only to the cache, so the data in the cache may have been updated while the data in main memory has not. The updated data is written back to the corresponding locations in main memory only when the cache line is swapped out or a "clean" operation is forced.
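Below is a matching conceptual sketch in C contrasting the two write policies (again a toy model with made-up sizes, not the 2440's hardware): write-through updates main memory on every write, while write-back only marks the cache line dirty and defers the memory update until the line is cleaned or replaced.

#define LINE_SIZE 32
#define NUM_LINES 10

static unsigned char main_memory[1024];            /* toy stand-in for main memory */

struct cache_line {
    int           dirty;
    unsigned char data[LINE_SIZE];
};
static struct cache_line cache[NUM_LINES];

/* Write-through: the cache copy and main memory are updated together. */
void write_through(unsigned long addr, unsigned char value)
{
    cache[(addr / LINE_SIZE) % NUM_LINES].data[addr % LINE_SIZE] = value;
    main_memory[addr] = value;                     /* main memory stays in step (slower, uses the bus) */
}

/* Write-back: only the cache copy is updated; the line is marked dirty. */
void write_back(unsigned long addr, unsigned char value)
{
    struct cache_line *l = &cache[(addr / LINE_SIZE) % NUM_LINES];
    l->data[addr % LINE_SIZE] = value;
    l->dirty = 1;
}

/* The deferred write-out ("clean"): dirty data finally reaches main memory. */
void clean_line(unsigned long addr)
{
    struct cache_line *l   = &cache[(addr / LINE_SIZE) % NUM_LINES];
    unsigned long     base = (addr / LINE_SIZE) * LINE_SIZE;
    if (l->dirty) {
        for (unsigned int i = 0; i < LINE_SIZE; i++)
            main_memory[base + i] = l->data[i];
        l->dirty = 0;
    }
}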
3.Two Cache Operations
(1) "Empty": Writes the updated data in cache or write buffer to main memory.
(2) "invalidates": it is no longer available and does not write updated data to main memory.
4.The 2440's Built-in Instruction Cache, Data Cache, and Write Buffer
(1) Instruction cache (ICaches)
Using ICaches is relatively simple. When the system has just powered up or been reset, the contents of ICaches are invalid and ICaches is disabled. Writing 1 to the Icr bit (bit 12 of CP15 coprocessor register 1) enables ICaches; writing 0 disables it.
ICaches is usually used after the MMU has been enabled, and the C bit (Ctt) in each page table descriptor indicates whether the corresponding memory region may be cached. However, ICaches can also be used when the MMU is disabled, in which case all memory the CPU fetches instructions from is treated as cacheable. With ICaches disabled, the CPU must read main memory for every instruction fetch and performance is very low, so ICaches should be enabled as early as possible.
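A minimal sketch of enabling ICaches as just described, assuming a GCC-style ARM toolchain (the function name is illustrative):

static inline void enable_icache(void)
{
    unsigned int ctrl;
    __asm__ volatile ("mrc p15, 0, %0, c1, c0, 0" : "=r"(ctrl));   /* read CP15 register 1 */
    ctrl |= (1u << 12);                                            /* set the Icr bit (bit 12) */
    __asm__ volatile ("mcr p15, 0, %0, c1, c0, 0" :: "r"(ctrl));   /* write it back */
}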
Once ICaches is enabled, the CPU first checks ICaches on every instruction fetch to see whether the required instruction is there, regardless of whether Ctt is 0 or 1. After ICaches is enabled, instruction fetches fall into the following 3 cases:
A. Cache hit and Ctt is 1 (Ctt = 1 means caching is allowed): the instruction is read from ICaches and returned to the CPU.
B. Cache miss and Ctt is 1: the CPU reads the instruction from main memory. At the same time, the 8 words of the memory region containing the instruction are written into an ICaches entry. The replacement algorithm is either pseudo-random or round-robin, selected by bit 14 (the RR bit) of CP15 register 1.
C. Ctt is 0: the CPU reads the instruction from main memory.
(2) Data cache (DCaches)
When the system is powered up or reset, DCaches is disabled by default and the contents of the write buffer are invalid. Writing 1 to the Ccr bit (bit 2 of CP15 coprocessor register 1) enables DCaches; writing 0 disables it. The write buffer is tightly coupled to DCaches and has no separate control bit to enable or disable it.
DCaches must be used after the MMU has been enabled, because only with the MMU on can the descriptors in the page table define how each region of memory uses DCaches and the write buffer.
When DCaches is disabled, the CPU operates on main memory directly for every data read and write, and DCaches and the write buffer are ignored completely. After DCaches is enabled, the CPU first checks DCaches on every data read or write to see whether the required data is there, regardless of whether Ctt is 0 or 1.
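A matching sketch for enabling DCaches, again assuming a GCC-style toolchain and that the MMU has already been enabled (the function name is illustrative):

static inline void enable_dcache(void)
{
    unsigned int ctrl;
    __asm__ volatile ("mrc p15, 0, %0, c1, c0, 0" : "=r"(ctrl));   /* read CP15 register 1 */
    ctrl |= (1u << 2);                                             /* set the Ccr bit (bit 2) */
    __asm__ volatile ("mcr p15, 0, %0, c1, c0, 0" :: "r"(ctrl));   /* write it back */
}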
5.Mapping Between the Cache and Main Memory
There are roughly three ways of mapping main memory onto the cache: fully associative mapping, direct mapping, and set-associative mapping. Both the cache and main memory are divided into blocks of the same size. For illustration, assume main memory is divided into 300 blocks (numbered 1~300) and the cache into 10 blocks (numbered 1~10).
(1) Fully associative mapping
Any of blocks 1~300 in main memory can be mapped to any of blocks 1~10 in the cache. The mapping rule is simple, but lookups are inefficient, since every cache block has to be checked.
(2) Direct mapping
Suppose blocks 1~30 of main memory can only be mapped to block 1 of the cache, blocks 31~60 only to block 2, and so on. Compared with fully associative mapping, lookups are efficient: from the virtual address (VA) it can quickly be determined whether the access is a hit or a miss.
(3) Set-associative mapping
Direct mapping can cause thrashing, in which cache blocks are replaced over and over: for example, if blocks 1, 2, and 3 of main memory (which all map to cache block 1) are accessed alternately, cache block 1 is replaced constantly. Set-associative mapping combines fully associative and direct mapping: the cache is divided into groups (the groups themselves are direct-mapped), and within each group the blocks are fully associative, which effectively avoids the impact of thrashing.
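The sketch below shows how an address can be split for a set-associative lookup: the set index is the direct-mapped part, and the tag is compared against every block in the selected group. The sizes used (32-byte blocks, 8 groups, 4 blocks per group) are illustrative only and are not the 2440's actual cache geometry.

#include <stdio.h>

#define LINE_SIZE 32u   /* bytes per block */
#define NUM_SETS   8u   /* number of groups (the direct-mapped part) */
#define NUM_WAYS   4u   /* blocks per group (the fully associative part) */

int main(void)
{
    unsigned long va     = 0x30001234;                   /* example address */
    unsigned long offset = va % LINE_SIZE;               /* byte within the block */
    unsigned long set    = (va / LINE_SIZE) % NUM_SETS;  /* which group the block must go into */
    unsigned long tag    = va / (LINE_SIZE * NUM_SETS);  /* compared against all NUM_WAYS blocks in that group */

    printf("set=%lu, offset=%lu, tag=0x%lx\n", set, offset, tag);
    return 0;
}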
6.The 2440's MMU, TLB, and Cache Control Instructions
The 2440's coprocessor is another processor that works alongside the main CPU and helps it carry out special functions, such as floating-point computation. Operations on the MMU, TLB, and cache all involve the coprocessor. Two instructions are used to transfer data between the CPU core and the coprocessor, MRC and MCR, with the following format:
<MCR|MRC> {cond} p#, <expression1>, Rd, cn, cm {, <expression2>}
MCR: transfers data from a CPU core register to the coprocessor
MRC: transfers data from the coprocessor to a CPU core register
{cond}: execution condition; if omitted, the instruction executes unconditionally
p#: coprocessor number
<expression1>: Constant 1
Rd: CPU core register
cn and cm: registers in the coprocessor
<expression2>: Constant 2
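For example, the sketch below (GCC inline assembly assumed; the wrapper function name is illustrative) uses MRC to bring a value from the coprocessor into the CPU core and MCR to send values the other way, with the standard CP15 encodings for reading the ID register and invalidating the TLBs and caches:

static inline void cp15_examples(void)
{
    unsigned int id;
    __asm__ volatile ("mrc p15, 0, %0, c0, c0, 0" : "=r"(id));   /* MRC: coprocessor ID register -> CPU core */
    __asm__ volatile ("mcr p15, 0, %0, c8, c7, 0" :: "r"(0));    /* MCR: CPU core -> coprocessor, invalidate the TLBs */
    __asm__ volatile ("mcr p15, 0, %0, c7, c7, 0" :: "r"(0));    /* MCR: invalidate both ICaches and DCaches */
    (void)id;                                                    /* id would normally be used or returned */
}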
7.Notes on Cache Operations
As with the TLB, using the cache requires that the contents of the cache, the write buffer, and main memory be kept consistent, which comes down to the following two principles:
(1) Clean DCaches, so that the data in main memory gets updated.
(2) Invalidate ICaches, so that the CPU re-reads instructions from main memory when fetching.
When actually writing programs, pay attention to the following points:
(1) Before enabling the MMU, invalidate ICaches, DCaches, and the write buffer.
(2) Before disabling the MMU, clean ICaches and DCaches, i.e. write the updated data back to main memory.
(3) If the code has changed, invalidate ICaches so that the CPU re-reads the instructions from main memory when fetching.
(4) When using DMA on memory that can be cached: clean the cache before the memory's data is sent out, and invalidate the cache when data is read into the memory (see the sketch after this list).
(5) When changing the address mapping relationships in the page table, cache consistency should likewise be considered carefully.
(6) When enabling ICaches or DCaches, consider whether their existing contents are consistent with main memory.
(7) Do not use the cache or write buffer for I/O address space. In I/O address space, two consecutive write operations cannot be merged into one, and every read/write must access the device directly; otherwise the program will not behave as expected.
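The sketch referred to in point (4), assuming a GCC-style toolchain, the 2440's 32-byte cache line, and the standard per-address CP15 clean/invalidate operations (the function names are illustrative):

#define CACHE_LINE 32u

/* Before DMA sends data out of a cacheable buffer: clean it out to main memory. */
static void dma_clean_range(unsigned long start, unsigned long end)
{
    unsigned long mva;
    for (mva = start & ~(unsigned long)(CACHE_LINE - 1); mva < end; mva += CACHE_LINE)
        __asm__ volatile ("mcr p15, 0, %0, c7, c10, 1" :: "r"(mva));  /* clean D-cache entry by address */
    __asm__ volatile ("mcr p15, 0, %0, c7, c10, 4" :: "r"(0));        /* drain the write buffer */
}

/* Before the CPU reads data that DMA has placed in a cacheable buffer: invalidate it. */
static void dma_invalidate_range(unsigned long start, unsigned long end)
{
    unsigned long mva;
    for (mva = start & ~(unsigned long)(CACHE_LINE - 1); mva < end; mva += CACHE_LINE)
        __asm__ volatile ("mcr p15, 0, %0, c7, c6, 1" :: "r"(mva));   /* invalidate D-cache entry by address */
}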