Linux Memory Initialization (II): Identity Mapping and Kernel Image Mapping

I. Preface

This article has no particular framework: it simply follows the execution path of the __create_page_tables code and records how, during the initialization phase, the kernel creates the page tables it needs in order to run. For the more general, framework-level background, you can refer to the memory initialization overview document.

The code in this article comes from the ARM64 architecture, kernel version 4.4.6. Before reading, it is best to be familiar with the translation table descriptor formats defined in ARMv8.

II. create_table_entry

This macro creates a translation table descriptor at an intermediate level. In Linux terms, it creates a descriptor in the PGD, PUD, or PMD; in ARM64 terms, it creates a descriptor at level 0, 1, or 2. Which level's descriptor is created is determined by the tbl parameter, which points to the memory of that translation table. The virt parameter gives the virtual address for which the mapping is to be created, while the shift and ptrs parameters determine into which entry the descriptor is written. Recall that when locating a descriptor, part of the virtual address is used as an offset (index): shifting the virtual address right by shift and masking off a bit field of size ptrs yields the entry index. tmp1 and tmp2 are temporary variables. The code of create_table_entry is as follows:

	.macro	create_table_entry, tbl, virt, shift, ptrs, tmp1, tmp2
	lsr	\tmp1, \virt, #\shift
	and	\tmp1, \tmp1, #\ptrs - 1	// table index -----------(1)
	add	\tmp2, \tbl, #PAGE_SIZE		// ------------------------(2)
	orr	\tmp2, \tmp2, #PMD_TYPE_TABLE	// ------------------------(3)
	str	\tmp2, [\tbl, \tmp1, lsl #3]	// ------------------------(4)
	add	\tbl, \tbl, #PAGE_SIZE		// ------------------------(5)
	.endm

(1) Compute, from the virtual address virt, the corresponding entry index in the translation table, and save it in tmp1.

(2) The page tables used at this initial stage are defined in the linker script, as follows:

	BSS_SECTION(0, 0, 0)

	. = ALIGN(PAGE_SIZE);
	idmap_pg_dir = .;
	. += IDMAP_DIR_SIZE;
	swapper_pg_dir = .;
	. += SWAPPER_DIR_SIZE;

The initial-stage page tables (PGD/PUD/PMD/PTE) are laid out contiguously, each occupying one page. In other words, if create_table_entry is currently working on the PGD, then tmp2 holds the address of the next-level page table, i.e. the PUD.

(3) This step synthesizes the descriptor value. Besides the address of the next-level translation table, the descriptor must also indicate whether it is valid (bit 0) and which type it is (bit 1). For an intermediate-level page table, the descriptor cannot be a block entry, only a table-type descriptor, so the lowest two bits of the descriptor are 0b11.

#define PMD_TYPE_TABLE		(_AT(pmdval_t, 3) << 0)

(4) This is the most critical step: writing the descriptor into the page table. The "lsl #3" operation is needed because each descriptor occupies 8 bytes.

(5) Advance tbl to the next-level translation table for further setup.
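To make the macro's arithmetic concrete, here is a minimal C model of create_table_entry (an illustrative sketch, not kernel code; the PAGE_SIZE and PMD_TYPE_TABLE values assume 4K pages):

```c
#include <stdint.h>

#define PAGE_SIZE      4096ULL  /* assumes 4K pages */
#define PMD_TYPE_TABLE 3ULL     /* bits[1:0] = 0b11: valid table descriptor */

/*
 * C model of create_table_entry: write one table descriptor into *tbl
 * and advance tbl to the next-level table, which (per the linker script
 * layout above) sits in the page immediately after the current one.
 */
static void create_table_entry(uint64_t **tbl, uint64_t virt,
                               unsigned shift, uint64_t ptrs)
{
    uint64_t index = (virt >> shift) & (ptrs - 1);   /* (1) table index      */
    uint64_t next  = (uintptr_t)*tbl + PAGE_SIZE;    /* (2) next-level table */
    (*tbl)[index]  = next | PMD_TYPE_TABLE;          /* (3)+(4) write entry  */
    *tbl = (uint64_t *)(uintptr_t)next;              /* (5) advance          */
}
```

For a 48-bit VA with 4K pages (shift 39, 512 entries per table), the kernel virtual address 0xffff800000000000 lands in PGD entry 256.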

III. create_pgd_entry

Literally, create_pgd_entry seems to create only a descriptor in the PGD, but in fact it does more: if next-level translation tables such as the PUD or PMD need to be established, it creates those descriptors as well. The end result is that all intermediate-level translation tables are created (in fact, only one descriptor is created in each table), leaving only the PTE level, which is handled by other code. The macro takes four parameters: tbl is the address of the PGD translation table, virt specifies the virtual address for which descriptors are created, and tmp1 and tmp2 are temporary variables. The code of create_pgd_entry is as follows:

	.macro	create_pgd_entry, tbl, virt, tmp1, tmp2
	create_table_entry \tbl, \virt, PGDIR_SHIFT, PTRS_PER_PGD, \tmp1, \tmp2	// -----(1)
#if SWAPPER_PGTABLE_LEVELS > 3						// -----(2)
	create_table_entry \tbl, \virt, PUD_SHIFT, PTRS_PER_PUD, \tmp1, \tmp2	// -----(3)
#endif
#if SWAPPER_PGTABLE_LEVELS > 2						// -----(4)
	create_table_entry \tbl, \virt, SWAPPER_TABLE_SHIFT, PTRS_PER_PTE, \tmp1, \tmp2
#endif
	.endm

(1) create_table_entry was described in the previous section; this call creates a table-type descriptor in the PGD for the virtual address virt.

(2) The macro SWAPPER_PGTABLE_LEVELS is related to ARM64_SWAPPER_USES_SECTION_MAPS, which has already been described in an earlier article on this site and is not repeated here. SWAPPER_PGTABLE_LEVELS defines the number of page table levels in the swapper process address space, which may be 3 or 2; how many of those are intermediate levels depends on the configuration. With section mapping, the intermediate levels are the PGD and PUD, and the PMD is the last level. With page mapping, there are three intermediate levels, PGD, PUD and PMD, and the PTE is the last level. Of course, if the total number of levels is 3 or 2, the PUD or PMD level may not exist at all.

(3) When SWAPPER_PGTABLE_LEVELS > 3, a translation table must be created at the PUD level.

(4) When SWAPPER_PGTABLE_LEVELS > 2, a translation table must be created at the PMD level.

This is all rather dry, so here are some examples:

Example 1: with a 48-bit virtual address and 4K page size, there are 4 page table levels and the mapping chain is PGD (L0) ---> PUD (L1) ---> PMD (L2) ---> Page Table (L3) ---> page. With section mapping (which is always used with 4K pages here), the chain becomes PGD (L0) ---> PUD (L1) ---> PMD (L2) ---> section. The two intermediate levels, PGD and PUD, are created in the create_pgd_entry macro.

Example 2: with a 48-bit virtual address and 16K page size (section mapping cannot be used), there are 4 page table levels and the mapping chain is PGD (L0) ---> PUD (L1) ---> PMD (L2) ---> Page Table (L3) ---> page. The three intermediate levels, PGD, PUD and PMD, are created in the create_pgd_entry macro.

Example 3: with a 39-bit virtual address and 4K page size, there are 3 page table levels and the mapping chain is PGD (L1) ---> PMD (L2) ---> Page Table (L3) ---> page. Since the page size is 4K, section mapping is used and the chain becomes PGD (L1) ---> PMD (L2) ---> section. Only one intermediate level, the PGD, is created in the create_pgd_entry macro.
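The shift values behind these examples follow one rule: each table page holds 2^(PAGE_SHIFT - 3) eight-byte descriptors, so each level consumes PAGE_SHIFT - 3 bits of the virtual address. The small sketch below (level_shift is a hypothetical helper, not a kernel function) checks the shifts implied by the examples:

```c
/*
 * Shift of a translation level, given the page size and how many
 * levels sit below it: each level below contributes PAGE_SHIFT - 3
 * index bits, on top of the PAGE_SHIFT page-offset bits.
 */
static unsigned level_shift(unsigned page_shift, unsigned levels_below)
{
    return page_shift + (page_shift - 3) * levels_below;
}
```

With 4K pages this gives the familiar PGDIR_SHIFT = 39, PUD_SHIFT = 30, PMD_SHIFT = 21.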

IV. create_block_map

create_block_map is well named: its function is to create block descriptors in the translation table specified by tbl to complete the address mapping. Concretely, it maps the virtual address range from start to end onto the physical addresses beginning at phys. The code is as follows:

	.macro	create_block_map, tbl, flags, phys, start, end
	lsr	\phys, \phys, #SWAPPER_BLOCK_SHIFT
	lsr	\start, \start, #SWAPPER_BLOCK_SHIFT
	and	\start, \start, #PTRS_PER_PTE - 1	// table index
	orr	\phys, \flags, \phys, lsl #SWAPPER_BLOCK_SHIFT	// table entry
	lsr	\end, \end, #SWAPPER_BLOCK_SHIFT
	and	\end, \end, #PTRS_PER_PTE - 1		// table end index
9999:	str	\phys, [\tbl, \start, lsl #3]		// store the entry
	add	\start, \start, #1			// next entry
	add	\phys, \phys, #SWAPPER_BLOCK_SIZE	// next block
	cmp	\start, \end
	b.ls	9999b
	.endm
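The loop above can be modeled in C as follows (an illustrative sketch, not kernel code; the constants assume 4K pages with 2MB section mapping, and the flags value in the test is merely illustrative):

```c
#include <stdint.h>

#define SWAPPER_BLOCK_SHIFT 21U   /* assumes 4K pages: 2MB sections */
#define SWAPPER_BLOCK_SIZE  (1ULL << SWAPPER_BLOCK_SHIFT)
#define PTRS_PER_PTE        512ULL

/*
 * C model of create_block_map: fill the last-level table slots covering
 * virtual addresses [start, end] (inclusive, hence b.ls in the assembly)
 * with block descriptors mapping physical addresses starting at phys.
 */
static void create_block_map(uint64_t *tbl, uint64_t flags,
                             uint64_t phys, uint64_t start, uint64_t end)
{
    uint64_t first = (start >> SWAPPER_BLOCK_SHIFT) & (PTRS_PER_PTE - 1);
    uint64_t last  = (end   >> SWAPPER_BLOCK_SHIFT) & (PTRS_PER_PTE - 1);
    /* block-aligned physical address merged with the attribute flags */
    uint64_t entry = flags | ((phys >> SWAPPER_BLOCK_SHIFT) << SWAPPER_BLOCK_SHIFT);

    for (uint64_t i = first; i <= last; i++) {
        tbl[i] = entry;                  /* store the entry */
        entry += SWAPPER_BLOCK_SIZE;     /* next 2MB block */
    }
}
```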

V. __create_page_tables

1. Preparation Stage

__create_page_tables:
	adrp	x25, idmap_pg_dir		// ---------------------(1)
	adrp	x26, swapper_pg_dir
	mov	x27, lr

	mov	x0, x25				// ---------------------(2)
	add	x1, x26, #SWAPPER_DIR_SIZE
	bl	__inval_cache_range

	mov	x0, x25				// ---------------------(3)
	add	x6, x26, #SWAPPER_DIR_SIZE
1:	stp	xzr, xzr, [x0], #16
	stp	xzr, xzr, [x0], #16
	stp	xzr, xzr, [x0], #16
	stp	xzr, xzr, [x0], #16
	cmp	x0, x6
	b.lo	1b

	ldr	x7, =SWAPPER_MM_MMUFLAGS	// ---------------------(4)

(1) Take the physical address of the symbol idmap_pg_dir and save it in x25; take the physical address of swapper_pg_dir and save it in x26. There is nothing special here except the adrp instruction itself. adrp computes the address of the specified symbol relative to the runtime PC value, at 4K granularity (the low 12 bits of the result are 0). The immediate (the offset) occupies 21 bits of the instruction encoding, and since the offset is counted in 4K units, the target symbol must lie within ±4G of the instruction. Because the MMU has not been enabled when this instruction executes, adrp yields a physical address, with the low 12 bits all zero. And because idmap_pg_dir and swapper_pg_dir are page-size aligned in the linker script, using adrp here is safe.

(2) This code invalidates the cache for the page table area used by the identity mapping and the kernel image mapping: the range starts at idmap_pg_dir and ends at swapper_pg_dir + SWAPPER_DIR_SIZE.

Why call __inval_cache_range to invalidate the cache for the idmap_pg_dir and swapper_pg_dir page table space? According to the boot protocol, by the time this code executes, the cache lines covering the kernel image have been cleaned to the PoC, but the page table space at idmap_pg_dir and swapper_pg_dir is not part of the kernel image, so its cache lines may still hold stale, invalid data that must be discarded.

(3) Setting the contents of the idmap and swapper page tables to 0 is meaningful. Most of these translation table entries are unused: the PGD and PUD each have only one useful entry, and the number of valid entries in the PMD depends on the size of the mapped address range. Clearing a page table to 0 marks all of its descriptors as invalid (bit 0 of a descriptor indicates validity; 0 means invalid).

(4) Besides the VA and PA, creating a mapping requires a memory attribute parameter, defined as follows:

#if ARM64_SWAPPER_USES_SECTION_MAPS
#define SWAPPER_MM_MMUFLAGS	(PMD_ATTRINDX(MT_NORMAL) | SWAPPER_PMD_FLAGS)
#else
#define SWAPPER_MM_MMUFLAGS	(PTE_ATTRINDX(MT_NORMAL) | SWAPPER_PTE_FLAGS)
#endif

To understand these definitions, one needs the descriptor formats for the block and page types; please refer to the ARMv8 documentation for the details, which are not reproduced here. SWAPPER_MM_MMUFLAGS defines the memory attributes used for the mapping. The memory holding the kernel image is of course normal memory, so MT_NORMAL is used: the address mappings created below are all for normal memory. The other flags are defined as follows:

#define SWAPPER_PTE_FLAGS	(PTE_TYPE_PAGE | PTE_AF | PTE_SHARED)
#define SWAPPER_PMD_FLAGS	(PMD_TYPE_SECT | PMD_SECT_AF | PMD_SECT_S)

AF in PMD_SECT_AF (PTE_AF) stands for Access Flag, which indicates whether the entry has been used: when a program accesses the corresponding page or section, the entry counts as used; if it has never been accessed, AF is 0, otherwise 1. This bit is mainly used by the operating system to track whether a page has been recently accessed. When a page is first created, AF is 0; the first access to it produces an MMU fault, at which point the exception handler sets AF to 1 so that subsequent accesses to the page do not fault. Here, the descriptors for the kernel image pages are created with AF already set to 1, marking them as active (recently accessed): only user-space pages are candidates for being swapped out based on the AF bit, while the pages of the kernel image are always active.

PMD_SECT_S (PTE_SHARED) corresponds to the shareable attribute bits; these two bits define the shareability of the page. So what is the shareable attribute? It defines whether a memory location is shared coherently among multiple bus masters in the system. It is encoded as follows:

SH[1:0]		Normal memory
00		Non-shareable
01		Reserved (invalid)
10		Outer Shareable
11		Inner Shareable

Here the SH attribute is set to 0b11, i.e. Inner Shareable. If a page is marked Inner Shareable, then within the inner shareable domain all bus masters access the memory of that page coherently (the hardware handles cache coherence) and software need not worry about the caches. In general, all CPU cores belong to one inner shareable domain, which means that for the kernel direct mapping, memory accesses from all CPU cores are coherent.

The other fields of the memory attributes are not explicitly specified, meaning they are all 0, and we can pass over most of them quickly. The AP value of 0 means the page is read/write for kernel mode (EL1) and inaccessible from user space (EL0). The nG bit being 0 means the translation is global rather than process-specific, which is reasonable: kernel page mappings are of course global.
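Putting the pieces together, the section-mapping variant of SWAPPER_MM_MMUFLAGS can be reconstructed in C (field encodings as I read them from the v4.4 arm64 headers; treat this as a sketch and double-check against the real pgtable-hwdef.h):

```c
#include <stdint.h>

/* Field encodings, assumed from arch/arm64/include/asm/pgtable-hwdef.h
 * and memory.h in v4.4. */
#define PMD_TYPE_SECT   (1ULL << 0)           /* bits[1:0] = 0b01: block   */
#define PMD_SECT_AF     (1ULL << 10)          /* Access Flag               */
#define PMD_SECT_S      (3ULL << 8)           /* SH[1:0] = 0b11: inner sh. */
#define PMD_ATTRINDX(t) ((uint64_t)(t) << 2)  /* AttrIndx, bits[4:2]       */
#define MT_NORMAL       4ULL

/* Section-mapping variant of SWAPPER_MM_MMUFLAGS. */
static uint64_t swapper_mm_mmuflags(void)
{
    return PMD_ATTRINDX(MT_NORMAL) | PMD_TYPE_SECT | PMD_SECT_AF | PMD_SECT_S;
}
```

Note how every attribute discussed above (descriptor type, AF, SH, AttrIndx) occupies its own bit field in the descriptor.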

2. Establish identity mapping

	mov	x0, x25				// ---------------------(1)
	adrp	x3, __idmap_text_start		// ---------------------(2)

#ifndef CONFIG_ARM64_VA_BITS_48			// ---------------------(3)
#define EXTRA_SHIFT	(PGDIR_SHIFT + PAGE_SHIFT - 3)	// ---------(4)
#define EXTRA_PTRS	(1 << (48 - EXTRA_SHIFT))	// ---------(5)

#if VA_BITS != EXTRA_SHIFT			// ---------------------(6)
#error "Mismatch between VA_BITS and page size/number of translation levels"
#endif

	adrp	x5, __idmap_text_end		// ---------------------(7)
	clz	x5, x5
	cmp	x5, TCR_T0SZ(VA_BITS)		// ---------------------(8)
	b.ge	1f

	adr_l	x6, idmap_t0sz			// ---------------------(9)
	str	x5, [x6]
	dmb	sy
	dc	ivac, x6

	create_table_entry x0, x3, EXTRA_SHIFT, EXTRA_PTRS, x5, x6	// ---(10)
1:
#endif

	create_pgd_entry x0, x3, x5, x6		// ---------------------(11)
	mov	x5, x3				// __pa(__idmap_text_start)
	adr_l	x6, __idmap_text_end		// __pa(__idmap_text_end)
	create_block_map x0, x7, x3, x5, x6	// ---------------------(12)

(1) x0 holds the physical address of idmap_pg_dir, which is the PGD of the identity mapping.

(2) x3 holds the physical address of __idmap_text_start; for the identity mapping, x3 also holds the virtual address, since virtual and physical addresses are equal.

(3) Creating the identity mapping is normally straightforward, but if physical memory sits at a very high address it becomes a problem: the configured VA_BITS may not be large enough to cover that physical address, in which case the virtual address range must be extended. Of course, with VA_BITS configured to 48 this problem cannot arise, since 48 is the maximum VA width ARMv8 supports and no further extension is possible.

(4) When the configured virtual address width is less than 48 bits and the system's physical memory sits at a very high address, the virtual address range must be extended to complete the identity mapping. By how much? Up to 48 bits. After the extension an extra level is added, and the mapping chain becomes extra ---> PGD ---> ..., where EXTRA_SHIFT equals (PGDIR_SHIFT + PAGE_SHIFT - 3).

(5) After the extension, a new top level is added to the address mapping; how many entries does this extra-level translation table hold? EXTRA_PTRS gives the answer.

(6) The current Linux kernel requires that the address mapping configuration makes the PGD full. For example: for a 48-bit virtual address with 4K page size, the mapping is PGD (9-bit) + PUD (9-bit) + PMD (9-bit) + PTE (9-bit) + page offset (12-bit); for a 42-bit virtual address with 64K page size, the mapping is PGD (13-bit) + PTE (13-bit) + page offset (16-bit). The common feature of these two examples is that the PGD is full: the number of bits used to index the PGD equals PAGE_SHIFT - 3. If this relationship does not hold, the Linux kernel considers the configuration invalid. Note: this is a kernel requirement; the ARM64 hardware itself does not demand it.

Precisely because a correct configuration has a full PGD, after the extension EXTRA_SHIFT must equal VA_BITS; otherwise the configuration is wrong. Continuing with an example of how the virtual address width is extended: for a 42-bit virtual address with 64K page size, after extension the virtual address is 48 bits and the mapping becomes extra (6-bit) + PGD (13-bit) + PTE (13-bit) + page offset (16-bit).
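The arithmetic can be checked for this 42-bit / 64K configuration (the *_CFG names below are mine, not kernel macros):

```c
/* EXTRA_SHIFT / EXTRA_PTRS arithmetic from head.S, instantiated for
 * the 42-bit VA, 64K page configuration discussed above. */
#define PAGE_SHIFT_CFG  16                                       /* 64K pages */
#define VA_BITS_CFG     42
#define PGDIR_SHIFT_CFG (PAGE_SHIFT_CFG + (PAGE_SHIFT_CFG - 3))  /* 2 levels  */
#define EXTRA_SHIFT_CFG (PGDIR_SHIFT_CFG + PAGE_SHIFT_CFG - 3)
#define EXTRA_PTRS_CFG  (1 << (48 - EXTRA_SHIFT_CFG))
```

The extra level thus indexes the top 6 bits with a 64-entry table, and EXTRA_SHIFT comes out equal to VA_BITS, exactly what the #if check in (6) demands.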

(7) x5 holds the physical address of __idmap_text_end, because the highest physical address of the identity mapping must be examined: counting the leading zeros of that physical address tells us how high it sits in the physical address space.

(8) The macro TCR_T0SZ computes, for a given virtual address width, the corresponding number of leading zeros. For a 48-bit virtual address the leading-zero count is 16. If the leading-zero count of the current physical address (the value in x5) is less than that of the configured virtual address width, extension is required.
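The clz/cmp/b.ge sequence can be expressed in C (idmap_needs_extension is a hypothetical helper mirroring that test, under the assumption that TCR_T0SZ(x) reduces to 64 - x, i.e. a zero field offset):

```c
#include <stdint.h>

/*
 * Extension is needed when the physical address of __idmap_text_end has
 * fewer leading zero bits than T0SZ = 64 - VA_BITS allows; the b.ge 1f
 * in the assembly skips the extension when leading >= T0SZ.
 */
static int idmap_needs_extension(uint64_t idmap_text_end, unsigned va_bits)
{
    unsigned leading = idmap_text_end ?
        (unsigned)__builtin_clzll(idmap_text_end) : 64;
    return leading < 64 - va_bits;
}
```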

(9) Now we enter the branch that performs the extension. The virtual address width is actually configured through the T0SZ field of the TCR_EL1 register, but this is not the moment to program it (that happens in the __cpu_setup function); here we only need to set the variable idmap_t0sz, and __cpu_setup later reads that variable and writes it into TCR_EL1. In the code, x6 is the physical address of the idmap_t0sz variable and x5 is the leading-zero count of the physical address, which is stored into idmap_t0sz.

(10) Create an entry in the extra translation table. The parameters are passed as follows:

x0: page table address idmap_pg_dir

x3: the virtual address to be mapped (x3 holds a physical address, but for the identity mapping VA and PA are identical)

EXTRA_SHIFT: when building the topmost level normally, the shift would be PGDIR_SHIFT, but because the physical address sits too high, an additional level of mapping is needed, so the shift becomes PGDIR_SHIFT + (PAGE_SHIFT - 3).

EXTRA_PTRS: having added one more translation table level, we must specify how many descriptors that table holds; EXTRA_PTRS provides this.

(11) create_pgd_entry has been explained above: it establishes the table descriptors of each intermediate level.

(12) Create the entries of the last-level translation table. Each entry may be a page descriptor or a block descriptor. The parameters are passed as follows:

x0: points to the last-level translation table

x7: the memory attributes for the mapping to be created

x3: the physical address

x5: the start of the virtual address range (actually equal to x3)

x6: the end of the virtual address range

3. Create Kernel Direct mapping

	mov	x0, x26				// ---------------------(1)
	mov	x5, #PAGE_OFFSET		// ---------------------(2)
	create_pgd_entry x0, x5, x3, x6		// ---------------------(3)
	ldr	x6, =KERNEL_END			// __va(KERNEL_END)
	mov	x3, x24				// phys offset
	create_block_map x0, x7, x3, x5, x6	// ---------------------(4)

	mov	x0, x25
	add	x1, x26, #SWAPPER_DIR_SIZE
	dmb	sy
	bl	__inval_cache_range

	mov	lr, x27
	ret

(1) swapper_pg_dir is the page table of the swapper process's address space (the process with PID 0, i.e. the idle process); at this point x0 points to the PGD base address of the kernel address space.

(2) PAGE_OFFSET is the start address of the kernel image; for a 48-bit VA, this address is 0xffff800000000000.
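For reference, PAGE_OFFSET can be computed as follows (a sketch of the v4.4 definition in arch/arm64/include/asm/memory.h, as I recall it; verify against the real header):

```c
#include <stdint.h>

/* PAGE_OFFSET = 0xffffffffffffffff << (VA_BITS - 1): the kernel region
 * starts at the top half of the VA_BITS-sized kernel address space. */
static uint64_t page_offset(unsigned va_bits)
{
    return 0xffffffffffffffffULL << (va_bits - 1);
}
```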

(3) Create the intermediate-level table descriptors for PAGE_OFFSET (the start address of the kernel image, a virtual address).

(4) Create the last-level descriptors for the address mappings from PAGE_OFFSET to KERNEL_END.

Reference documents:

1. ARMv8 Architecture Reference Manual

2. Linux 4.4.6 kernel source code
