Objective: Enable the MMU, map the SDRAM address space, and light the LEDs through virtual addresses, in order to master the use of the MMU.
Tutorial environment and description: h2410, the hengyi S3C2410 development board. The h2410 core board carries 64 MB of K4S561632 SDRAM (4 M * 16 bit * 4 banks) at addresses 0x30000000 ~ 0x33ffffff. The GPIO registers occupy addresses 0x56000000 ~ 0x560000b0.
Experiment idea: After power-on, the board automatically copies the first 4 KB of NAND flash into the on-chip SRAM and starts executing at address 0. That code initializes the memory controller so that the SDRAM can be used, copies the 2 KB of second-part code from SRAM to SDRAM (to 0x30004000, because the first 16 KB of SDRAM hold the page table), sets up the page table, and enables the MMU so that the GPIO registers and the SDRAM are mapped to virtual addresses. Finally it resets the stack pointer and jumps to the entry point of the LED code in SDRAM (virtual address 0xb0004000) to light the LEDs.
Knowledge: MMU address translation, memory access permission check, TLB and Cache Usage
I. MMU address conversion:
1. First, why use an MMU at all? MMU stands for memory management unit. Put simply, think of the tableware in a school canteen. There is not enough for every student to eat at the same time, yet the canteen does not want to buy more (it costs money and takes up space, just like adding memory). Is there a way out? Experience says the whole school never eats at the same moment, so the canteen appoints a few people to manage the tableware (they play the role of the MMU). They hand out tableware so that every student who comes to eat gets a set, and they collect it again after use. This corresponds to establishing the mapping between virtual and physical addresses: the amount of memory has not changed, but each individual program sees as much as it needs. And if one student tries to take several sets, that is refused, which corresponds to the memory access permission check.
MMU address translation involves three kinds of addresses:
(VA --- virtual address): the counter where tableware is handed out, open to everyone. The CPU core sees and uses only the virtual address VA. How a VA maps to a physical address PA is no concern of the core, just as students do not care how many sets of tableware exist in total.
(MVA --- modified virtual address): the caches and the MMU never see the VA; they use the MVA, which is derived from it, for the translation to PA. In the analogy, this is the canteen during a holiday: with few diners, used tableware need not be collected right away, which saves the staff some work.
(PA --- physical address): the tableware that actually exists. The actual devices see neither VA nor MVA; they are read and written using the physical address PA.
2. The conversion from a virtual address to a physical address. ARM uses page tables for the conversion, and the S3C2410 uses at most two levels of page table. Section (1 MB) mapping uses only the first-level page table; page mapping uses both levels. There are three page sizes: large page (64 KB), small page (4 KB), and tiny page (1 KB). This article uses section address translation as the example to illustrate the conversion process.
★First, the page table base register (register C2 of coprocessor CP15) holds the address of the first-level page table; reading it gives the starting position of the first-level page table. The first-level page table must be 16 KB aligned, so bits [13:0] of the register are 0 and bits [31:14] hold the page table base address. A first-level page table uses 4096 descriptors to cover the 4 GB address space, so each descriptor corresponds to 1 MB of virtual addresses. A descriptor stores either the starting address of the corresponding 1 MB of physical space or the address of a second-level page table. MVA[31:20] indexes the first-level page table (bits 31 to 20 are 12 bits, and 2^12 = 4096, hence 4096 descriptors) and selects one descriptor. Each descriptor occupies 4 bytes.
★When the lowest two bits of a descriptor are 0b10, it is a section descriptor. Bits [31:20] are the section base address; with the lower 20 bits filled with 0, this is the starting address of a 1 MB physical address space. MVA[19:0] addresses a byte within this 1 MB space, so descriptor bits [31:20] and MVA[19:0] together form the physical address corresponding to the virtual address MVA. With section mapping, the conversion from MVA to PA is as follows: ① bits [31:14] of the page table base register and MVA[31:20] form a 32-bit address whose lowest two bits are 0, and the MMU uses this address to fetch the section descriptor; ② bits [31:20] of the section descriptor (the section base address) and MVA[19:0] form a 32-bit physical address (the PA corresponding to the MVA).
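The two steps of section translation can be sketched in plain C and tried on a PC. This is only an illustration of the bit arithmetic described above; the function names l1_descriptor_addr and section_translate are made up for this example and are not part of the tutorial's code.

```c
#include <assert.h>
#include <stdint.h>

/* Step 1: address of the first-level descriptor for an MVA.
 * Layout: TTB[31:14] : MVA[31:20] : 00 */
static uint32_t l1_descriptor_addr(uint32_t ttb, uint32_t mva)
{
    return (ttb & 0xFFFFC000u) | ((mva >> 20) << 2);
}

/* Step 2: physical address produced by a section descriptor.
 * Layout: descriptor[31:20] : MVA[19:0] */
static uint32_t section_translate(uint32_t descriptor, uint32_t mva)
{
    assert((descriptor & 0x3u) == 0x2u);   /* bits [1:0] = 0b10: section */
    return (descriptor & 0xFFF00000u) | (mva & 0x000FFFFFu);
}
```

For example, with the page table at 0x30000000, the descriptor for MVA 0xa0000050 lies at 0x30002800, and a section descriptor whose base is 0x56000000 turns that MVA into PA 0x56000050.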
II. Memory access permission check
The memory access permission check determines whether a block of memory may be read or written. It is determined jointly by CP15 register C3 (domain access control), the domain field of the descriptor, the R/S/A bits of CP15 register C1, and the AP bits of the descriptor. The "domain" decides whether a permission check is performed on a block of memory at all; the "AP" bits decide how the check is performed. The ARM920T core (used in both the S3C2410 and the S3C2440) has 16 domains. Every two bits of CP15 register C3 correspond to one domain (32 bits in total) and indicate whether permission checks are performed for that domain.
Meaning of each two-bit field:
00 --- no access (any access causes a "domain fault" exception);
01 --- client mode (permissions are checked using the AP bits of the section or page descriptor);
10 --- reserved (currently equivalent to "no access");
11 --- manager mode (no permission check is performed and any access is allowed).
The "domain" field of a descriptor occupies 4 bits and indicates which of the 16 domains the memory belongs to.
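As a small illustration of how the 32-bit value of register C3 is split into sixteen two-bit fields, the following host-side sketch extracts and replaces the field of one domain. The enum and function names are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

enum domain_access {
    DOMAIN_NO_ACCESS = 0,  /* 00: any access raises a domain fault */
    DOMAIN_CLIENT    = 1,  /* 01: check the AP bits of the descriptor */
    DOMAIN_RESERVED  = 2,  /* 10: reserved, currently like no access */
    DOMAIN_MANAGER   = 3   /* 11: no permission check at all */
};

/* Extract the two-bit field of one domain (0..15) from a C3 value. */
static enum domain_access domain_get(uint32_t dacr, unsigned domain)
{
    assert(domain < 16);
    return (enum domain_access)((dacr >> (2 * domain)) & 0x3u);
}

/* Return a copy of dacr with the field of one domain replaced. */
static uint32_t domain_set(uint32_t dacr, unsigned domain,
                           enum domain_access access)
{
    assert(domain < 16);
    dacr &= ~(0x3u << (2 * domain));
    return dacr | ((uint32_t)access << (2 * domain));
}
```

A C3 value of 0xffffffff puts every domain in manager mode, so domain_get returns DOMAIN_MANAGER for all sixteen domains. Setting domain 0 to client mode in an otherwise zero register gives 0x00000001.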
III. TLB and Cache
First, both rely on the principle of locality in program accesses: performance is improved by adding a small, fast memory.
1. (TLB --- translation lookaside buffer): Converting an MVA to a PA requires several memory accesses, which greatly reduces CPU performance, so the TLB was introduced as a remedy. When the CPU issues a virtual address, the MMU first looks in the TLB. If the TLB holds a descriptor that can translate this virtual address, that descriptor is used directly for address translation and the permission check. Otherwise the MMU walks the page table to find the descriptor, performs the address translation and permission check, and stores the descriptor in the TLB so that it can be used directly the next time this virtual address is accessed. When using the TLB, make sure its contents are consistent with the page table. Pay particular attention before enabling the MMU and whenever the page table changes: usually the entire TLB is invalidated before the MMU is enabled, and after a page table change the TLB entries for the affected virtual addresses are invalidated.
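The following toy model shows why a stale TLB entry is a problem. It is a simplified software sketch, not a description of the real ARM920T TLB hardware: the size TLB_ENTRIES, the direct-mapped layout, and all names are assumptions made for the example, and only section descriptors are handled.

```c
#include <assert.h>
#include <stdint.h>

#define TLB_ENTRIES 8   /* hypothetical size, chosen for the example */

struct tlb_entry { int valid; uint32_t tag; uint32_t descriptor; };

struct tlb {
    struct tlb_entry e[TLB_ENTRIES];
    unsigned walks;     /* number of page table walks performed */
};

static void tlb_invalidate_all(struct tlb *t)
{
    for (int i = 0; i < TLB_ENTRIES; i++)
        t->e[i].valid = 0;
}

static void tlb_invalidate_entry(struct tlb *t, uint32_t mva)
{
    uint32_t tag = mva >> 20;
    struct tlb_entry *e = &t->e[tag % TLB_ENTRIES];
    if (e->valid && e->tag == tag)
        e->valid = 0;
}

/* Translate an MVA, walking the first-level table only on a TLB miss. */
static uint32_t tlb_translate(struct tlb *t, const uint32_t *l1_table,
                              uint32_t mva)
{
    uint32_t tag = mva >> 20;
    struct tlb_entry *e = &t->e[tag % TLB_ENTRIES];
    if (!e->valid || e->tag != tag) {       /* miss: read the page table */
        e->descriptor = l1_table[tag];
        e->tag = tag;
        e->valid = 1;
        t->walks++;
    }
    assert((e->descriptor & 0x3u) == 0x2u); /* sections only in this model */
    return (e->descriptor & 0xFFF00000u) | (mva & 0x000FFFFFu);
}
```

In this model, if a page table entry is changed after it has been cached, tlb_translate keeps returning the old physical address until tlb_invalidate_entry (or tlb_invalidate_all) is called for that address.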
2. (Cache): To speed up program execution, a small, fast memory is placed between main memory and the CPU's general-purpose registers. Instructions or data near the address currently being accessed are loaded into it, so that the CPU can use them from there for a while.
★Two ways of writing data: ① (Write through) --- every write the CPU issues to the cache is also written to main memory, so main memory is always updated in step. The advantage is simplicity; the drawback is that main memory is slow, so writes become slower and occupy the bus. ② (Write back) --- data is normally written only to the cache, so the cache may hold updated data while main memory still holds the old (stale) value. The cache therefore keeps a flag for each entry that marks it as modified, together with its address. Only when the entry is swapped out of the cache or explicitly "cleaned" is the updated data written to the corresponding location in main memory, which restores consistency between the cache and main memory.
★The cache supports the following two operations: ① (Clean) --- write the dirty data in the cache or write buffer (modified but not yet written to main memory) to main memory. ② (Invalidate) --- mark the contents as unusable, without writing dirty data to main memory.
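The difference between the two write policies, and between clean and invalidate, can be seen in a toy model of a single cached word. This is a deliberately minimal sketch for illustration; the struct and function names are invented, and a real cache works on whole lines with tags.

```c
#include <assert.h>
#include <stdint.h>

enum write_policy { WRITE_THROUGH, WRITE_BACK };

/* One cached word backed by one word of main memory. */
struct cache_line {
    enum write_policy policy;
    int valid;
    int dirty;
    uint32_t data;
    uint32_t *mem;      /* the main memory word this line caches */
};

static void line_write(struct cache_line *l, uint32_t value)
{
    assert(l->mem != 0);
    l->data = value;
    l->valid = 1;
    if (l->policy == WRITE_THROUGH)
        *l->mem = value;            /* main memory updated immediately */
    else
        l->dirty = 1;               /* main memory is now stale */
}

/* Clean: write dirty data back to main memory. */
static void line_clean(struct cache_line *l)
{
    if (l->valid && l->dirty) {
        *l->mem = l->data;
        l->dirty = 0;
    }
}

/* Invalidate: drop the cached copy without writing it back. */
static void line_invalidate(struct cache_line *l)
{
    l->valid = 0;
    l->dirty = 0;
}

static uint32_t line_read(struct cache_line *l)
{
    if (!l->valid) {                /* miss: refill from main memory */
        l->data = *l->mem;
        l->valid = 1;
        l->dirty = 0;
    }
    return l->data;
}
```

With WRITE_BACK, a write followed by line_invalidate loses the new value, while a write followed by line_clean makes it visible in main memory. This is the reason dirty data has to be cleaned before a cache is invalidated or turned off.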
★The S3C2440 has built-in ICaches (instruction cache), DCaches (data cache), and a write buffer. Operating them involves the C bit (called Ctt) and the B bit (called Btt) of the descriptor.
① (ICaches, instruction cache) --- At power-on or reset the contents of the ICaches are invalid and the ICaches are disabled. Writing 1 to bit 12 of CP15 register C1 enables the ICaches; writing 0 disables them. The ICaches are normally used after the MMU is enabled, in which case the C bit of the descriptor (Ctt) indicates whether a block of memory may be cached: Ctt = 1 allows caching, Ctt = 0 does not. The ICaches can also be used with the MMU disabled; in that case all memory from which the CPU fetches instructions is treated as cacheable. With the ICaches disabled, the CPU reads main memory on every instruction fetch, which is slow, so the ICaches are usually enabled as early as possible. Once the ICaches are enabled, on every instruction fetch the CPU first checks whether the instruction is in the ICaches, regardless of whether Ctt is 0 or 1. If it is found, this is a cache hit; if not, a cache miss. An instruction fetch then falls into one of three cases: on a cache hit with Ctt = 1, the instruction is taken from the ICaches and returned to the CPU; on a cache miss with Ctt = 1, the instruction is fetched from main memory and also stored in the cache; with Ctt = 0, the instruction is fetched from main memory.
② (DCaches, data cache) --- As with the ICaches, at power-on or reset the contents of the DCaches are invalid and the DCaches are disabled; the contents of the write buffer are discarded as well. Writing 1 to bit 2 of CP15 register C1 enables the DCaches; writing 0 disables them. The write buffer is closely coupled to the DCaches and has no dedicated control bit of its own to enable or disable it. Unlike the ICaches, the DCaches can only be used after the MMU is enabled. With the DCaches disabled, the CPU accesses main memory for every data access. Once the DCaches are enabled, on every data read or write the CPU first checks whether the data is in the DCaches, regardless of whether Ctt is 0 or 1. If it is found, this is a cache hit; if not, a cache miss.
★When using the caches, make sure the contents of the caches and the write buffer are consistent with main memory. Two principles apply: ① clean the DCaches so that main memory holds the up-to-date data; ② invalidate the ICaches so that the CPU re-reads main memory on its next instruction fetch.
Note the following points when writing a program: ① Before enabling the MMU, invalidate the ICaches, DCaches, and write buffer. ② Before disabling the MMU, clean the ICaches and DCaches, that is, write the "dirty" data to main memory. ③ If the code has changed, invalidate the ICaches so that the CPU re-reads main memory when it fetches instructions. ④ When DMA operates on cacheable memory: clean the cache before memory data is sent out, and invalidate the cache when data is read into memory. ⑤ Take care when changing address mappings in the page table. ⑥ When enabling the ICaches or DCaches, consider whether their contents are consistent with main memory. ⑦ Do not use the cache or write buffer for the I/O address space.
IV. MMU, TLB, and cache control instructions
In addition to the ARM920T CPU core, the S3C2410 has several coprocessors that help the main CPU with special functions. Operations on the MMU, TLB, and caches go through a coprocessor. The instruction format is as follows:
<MCR|MRC>{condition} coprocessor number, coprocessor opcode 1, destination register, source register 1, source register 2, coprocessor opcode 2
<MCR|MRC>{cond} p#, <expression1>, Rd, cn, cm{, <expression2>}
MRC            // transfer data from the coprocessor to an ARM920T CPU core register
MCR            // transfer data from an ARM920T CPU core register to the coprocessor
{cond}         // execution condition; if omitted, the instruction executes unconditionally
p#             // coprocessor number
<expression1>  // a constant
Rd             // ARM920T CPU core register
cn and cm      // coprocessor registers
<expression2>  // a constant
<expression1>, cn, cm, and <expression2> are interpreted only by the coprocessor; their meaning depends on the specific coprocessor.
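To make the roles of the operands concrete, the sketch below packs them into the 32-bit ARM machine word for an unconditional MCR or MRC instruction. It is a host-side illustration only; the function name encode_mcr_mrc is made up, and the bit positions follow the standard ARM coprocessor register transfer encoding.

```c
#include <assert.h>
#include <stdint.h>

/* Encode an unconditional (cond = AL) MCR or MRC instruction.
 * is_mrc: 0 for MCR (ARM register to coprocessor),
 *         1 for MRC (coprocessor to ARM register). */
static uint32_t encode_mcr_mrc(int is_mrc, unsigned cp, unsigned op1,
                               unsigned rd, unsigned cn, unsigned cm,
                               unsigned op2)
{
    assert(cp < 16 && op1 < 8 && rd < 16 && cn < 16 && cm < 16 && op2 < 8);
    return 0xEE000010u                           /* cond 1110, [27:24] 1110, bit 4 set */
         | ((uint32_t)op1 << 21)                 /* <expression1> */
         | ((uint32_t)(is_mrc ? 1 : 0) << 20)    /* direction */
         | ((uint32_t)cn << 16)                  /* cn */
         | ((uint32_t)rd << 12)                  /* Rd */
         | ((uint32_t)cp << 8)                   /* p# */
         | ((uint32_t)op2 << 5)                  /* <expression2> */
         | (uint32_t)cm;                         /* cm */
}
```

For instance, encode_mcr_mrc(0, 15, 0, 0, 7, 7, 0) corresponds to "mcr p15, 0, r0, c7, c7, 0" and yields 0xee070f17, while encode_mcr_mrc(1, 15, 0, 0, 1, 0, 0) corresponds to "mrc p15, 0, r0, c1, c0, 0" and yields 0xee110f10.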
Sample code walkthrough:
Enable the MMU and map virtual addresses 0xa0000000 ~ 0xa0100000 to physical addresses 0x56000000 ~ 0x56100000 (the physical address of GPFCON is 0x56000050 and that of GPFDAT is 0x56000054), and virtual addresses 0xb0000000 ~ 0xb3ffffff to physical addresses 0x30000000 ~ 0x33ffffff. This example uses section mapping, so only the first-level page table is needed. The first-level page table uses 4096 descriptors to cover the 4 GB address space (each descriptor corresponds to 1 MB), and each descriptor occupies 4 bytes, so the first-level page table occupies 16 KB. The first 16 KB of SDRAM store the first-level page table, so the usable memory after it starts at 0x30004000. This address ends up at virtual address 0xb0004000 (so the code's run address is 0xb0004000).
★Sample code for the main flow of program execution.
.text
.global _start
_start:
    bl  disable_watch_dog       @ disable the watchdog, otherwise the CPU keeps resetting
    bl  mem_control_setup       @ set up the memory controller so that SDRAM can be used
    ldr sp, =4096               @ set the stack pointer; a stack is needed before calling C functions
    bl  copy_2th_to_sdram       @ copy the second part of the code to SDRAM
    bl  create_page_table       @ set up the page table
    bl  mmu_init                @ enable the MMU; from here on the code uses virtual addresses
    ldr sp, =0xb4000000         @ reset the stack pointer to the top of SDRAM (virtual address)
    ldr pc, =0xb0004000         @ jump to SDRAM and continue with the second part of the code
halt_loop:
    b   halt_loop
★Set up the page table.
void create_page_table(void)
{
    /*
     * Macros for section descriptors: [31:20] section base address, [11:10] AP, [8:5] domain, [3] C, [2] B, [1:0] 0b10 marks a section descriptor
     */
    #define mmu_full_access  (3 << 10)   /* access permission AP */
    #define mmu_domain       (0 << 5)    /* which domain the section belongs to */
    #define mmu_special      (1 << 4)    /* must be 1 */
    #define mmu_cacheable    (1 << 3)    /* cacheable, C bit */
    #define mmu_bufferable   (1 << 2)    /* bufferable, B bit */
    #define mmu_section      (2)         /* marks a section descriptor */
    #define mmu_secdesc      (mmu_full_access | mmu_domain | mmu_special | mmu_section)
    #define mmu_secdesc_wb   (mmu_full_access | mmu_domain | mmu_special | mmu_cacheable | mmu_bufferable | mmu_section)
    #define mmu_section_size 0x00100000  /* each section descriptor covers 1 MB */

    unsigned long virtuladdr, physicaladdr;
    unsigned long *mmu_tlb_base = (unsigned long *)0x30000000;  /* the page table is stored at the start of SDRAM */

    /*
     * The starting physical address of Steppingstone is 0, and the first part of the program also runs from address 0.
     * So that the first part of the program keeps running after the MMU is enabled,
     * map virtual addresses 0 ~ 1 MB to the same physical addresses
     */
    virtuladdr = 0;
    physicaladdr = 0;
    /* Virtual address bits [31:20] index the first-level page table to find the descriptor, i.e. (virtuladdr >> 20).
     * Bits [31:20] of the section descriptor hold the physical section address, i.e. (physicaladdr & 0xfff00000) */
    *(mmu_tlb_base + (virtuladdr >> 20)) = (physicaladdr & 0xfff00000) | mmu_secdesc_wb;

    /*
     * 0x56000000 is the starting physical address of the GPIO registers. The physical addresses of gpbcon and gpbdat
     * are 0x56000010 and 0x56000014. So that the second part of the program can operate gpbcon and gpbdat
     * through addresses 0xa0000010 and 0xa0000014,
     * map the 1 MB virtual address space starting at 0xa0000000 to the 1 MB physical address space starting at 0x56000000
     */
    virtuladdr = 0xa0000000;
    physicaladdr = 0x56000000;
    *(mmu_tlb_base + (virtuladdr >> 20)) = (physicaladdr & 0xfff00000) | mmu_secdesc;

    /*
     * The physical address range of SDRAM is 0x30000000 ~ 0x33ffffff. Map virtual addresses 0xb0000000 ~ 0xb3ffffff
     * to physical addresses 0x30000000 ~ 0x33ffffff, 64 MB in total, which takes 64 section descriptors
     */
    virtuladdr = 0xb0000000;
    physicaladdr = 0x30000000;
    while (virtuladdr < 0xb4000000)
    {
        *(mmu_tlb_base + (virtuladdr >> 20)) = (physicaladdr & 0xfff00000) | mmu_secdesc_wb;
        virtuladdr += mmu_section_size;
        physicaladdr += mmu_section_size;
    }
}
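The descriptor values written by this function can be checked on a PC before trying them on the board. The sketch below builds the same three mappings into an ordinary array instead of SDRAM. The names map_sections and build_example_table are invented for this example, and unlike the code above it first fills the table with zero entries (bits [1:0] = 0b00 mark an invalid descriptor that faults on access), which is an addition and not part of the original listing.

```c
#include <assert.h>
#include <stdint.h>

#define L1_ENTRIES   4096u
#define SECDESC      ((3u << 10) | (0u << 5) | (1u << 4) | 2u)  /* 0xc12 */
#define SECDESC_WB   (SECDESC | (1u << 3) | (1u << 2))          /* 0xc1e */
#define SECTION_SIZE 0x00100000u

/* Map count consecutive 1 MB sections starting at va to pa. */
static void map_sections(uint32_t *table, uint32_t va, uint32_t pa,
                         uint32_t count, uint32_t attrs)
{
    assert((va & 0xFFFFFu) == 0 && (pa & 0xFFFFFu) == 0);
    assert((va >> 20) + count <= L1_ENTRIES);
    for (uint32_t i = 0; i < count; i++)
        table[(va >> 20) + i] =
            ((pa + i * SECTION_SIZE) & 0xFFF00000u) | attrs;
}

static void build_example_table(uint32_t *table)
{
    for (uint32_t i = 0; i < L1_ENTRIES; i++)
        table[i] = 0;   /* assumption of this sketch: unmapped = fault */
    map_sections(table, 0x00000000u, 0x00000000u, 1,  SECDESC_WB);
    map_sections(table, 0xA0000000u, 0x56000000u, 1,  SECDESC);
    map_sections(table, 0xB0000000u, 0x30000000u, 64, SECDESC_WB);
}
```

After build_example_table, entry 0xa00 holds 0x56000c12 (GPIO, neither cached nor buffered), and entries 0xb00 through 0xb3f hold 0x30000c1e through 0x33f00c1e (SDRAM, cached and buffered).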
★Enable the MMU.
void mmu_init(void)
{
    unsigned long ttb = 0x30000000;
    __asm__(
        "mov r0, #0\n"
        "mcr p15, 0, r0, c7, c7, 0\n"    /* invalidate ICaches and DCaches */
        "mcr p15, 0, r0, c7, c10, 4\n"   /* drain write buffer on v4 */
        "mcr p15, 0, r0, c8, c7, 0\n"    /* invalidate instruction and data TLBs */
        "mov r4, %0\n"                   /* r4 = page table base address */
        "mcr p15, 0, r4, c2, c0, 0\n"    /* set the page table base register */
        "mvn r0, #0\n"
        "mcr p15, 0, r0, c3, c0, 0\n"    /* domain access control register = 0xffffffff: no permission checks */
        /*
         * For the control register, first read its value, modify the bits of interest, and then write it back
         */
        "mrc p15, 0, r0, c1, c0, 0\n"    /* read the control register */
        /* Meaning of the low 16 bits of the control register: .RVI ..RS B... .CAM
         * R : algorithm used to replace entries in the cache. 0 = random replacement; 1 = round robin replacement
         * V : location of the exception vector table. 0 = low addresses = 0x00000000; 1 = high addresses = 0xffff0000
         * I : 0 = disable ICaches; 1 = enable ICaches
         * R, S : used together with the descriptors in the page table to determine memory access permissions
         * B : 0 = CPU is little-endian; 1 = CPU is big-endian
         * C : 0 = disable DCaches; 1 = enable DCaches
         * A : 0 = no alignment check on data accesses; 1 = alignment check on data accesses
         * M : 0 = disable MMU; 1 = enable MMU
         */
        /*
         * Clear the bits of interest first, then set the ones that are needed
         */
                                         /* .RVI ..RS B... .CAM */
        "bic r0, r0, #0x3000\n"          /* ..11 .... .... .... clear V, I */
        "bic r0, r0, #0x0300\n"          /* .... ..11 .... .... clear R, S */
        "bic r0, r0, #0x0087\n"          /* .... .... 1... .111 clear B/C/A/M */
        /*
         * Set the required bits
         */
        "orr r0, r0, #0x0002\n"          /* .... .... .... ..1. enable alignment check */
        "orr r0, r0, #0x0004\n"          /* .... .... .... .1.. enable DCaches */
        "orr r0, r0, #0x1000\n"          /* ...1 .... .... .... enable ICaches */
        "orr r0, r0, #0x0001\n"          /* .... .... .... ...1 enable MMU */
        "mcr p15, 0, r0, c1, c0, 0\n"    /* write the modified value back to the control register */
        : /* no outputs */
        : "r" (ttb)
        : "r0", "r4", "memory");         /* r0 and r4 are overwritten above, so tell the compiler */
}
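The read, clear, set sequence applied to the control register can be reproduced as plain C arithmetic, which makes it easy to see the final value for any starting value. This is a host-side sketch only; the CR_* macro names and the function control_value_for_mmu are invented for the example, with bit positions taken from the comment in mmu_init.

```c
#include <assert.h>
#include <stdint.h>

#define CR_M (1u << 0)    /* MMU enable */
#define CR_A (1u << 1)    /* alignment check */
#define CR_C (1u << 2)    /* DCaches enable */
#define CR_B (1u << 7)    /* big-endian */
#define CR_S (1u << 8)    /* S bit (permission check) */
#define CR_R (1u << 9)    /* R bit (permission check) */
#define CR_I (1u << 12)   /* ICaches enable */
#define CR_V (1u << 13)   /* high exception vectors */

/* Compute the value mmu_init writes back, given the value it read. */
static uint32_t control_value_for_mmu(uint32_t old)
{
    uint32_t v = old;
    v &= ~(CR_V | CR_I);                 /* bic #0x3000 */
    v &= ~(CR_R | CR_S);                 /* bic #0x0300 */
    v &= ~(CR_B | CR_C | CR_A | CR_M);   /* bic #0x0087 */
    v |= CR_A | CR_C | CR_I | CR_M;      /* orr #0x0002, #0x0004, #0x1000, #0x0001 */
    assert((v & (CR_V | CR_R | CR_S | CR_B)) == 0);
    return v;
}
```

Starting from 0, the result is 0x1007: ICaches, DCaches, alignment check, and MMU are on, while V, R, S, and B are off. All other bits keep whatever value was read.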