Linux Memory Initialization (Part 1)


1. Preface

I have always been fascinated by two film techniques. One is slow motion: every detail is presented to the audience. The other is the fast cut: to convey the changes of an era, a few typical snapshots are taken from a very long span of time and condensed into a shot of a dozen seconds, letting the audience quickly grasp how something developed. These correspond to two ways of working at the technical level. Slow motion is like scenario analysis: every line of code is examined in detail to understand the technical fine points. The fast cut is like data-flow analysis: it outlines a process and the evolution of the data structures involved. This article takes the fast-cut approach to describing memory initialization. It does not dwell on the code of specific functions; it only aims to leave a rough overall impression (interested readers can study the code on their own). BTW, all architecture-specific content in this article is described in terms of ARM64.

2. Before the Start

Before elaborating on how the Linux kernel initializes memory, we first need to understand the situation the kernel faces before it executes its first instruction. The memory state at that point is shown in the figure below:

The bootloader has its own way of discovering the memory layout of the system, after which it copies the kernel image (green in the figure) and the DTB image (blue) to their designated locations in memory. The kernel image is preferably placed at TEXT_OFFSET bytes from the start of main memory; of course, TEXT_OFFSET has to be agreed upon with the kernel. Must the kernel image be located at the beginning of main memory (the lowest memory address)? Not necessarily. However, any memory below the kernel image will not be included in the kernel's own memory management system. Linux places no special requirement on the location of the DTB image. Since the MMU is off, the CPU can only see the physical address space. The requirement on the cache is simple: the cache lines covering the kernel image must be cleaned to the PoC, that is, all observers in the system must see consistent data when accessing the memory addresses holding the kernel image.

3. The Assembly Era

Once execution jumps into the Linux kernel, the kernel takes full control of the memory system. The first thing it needs to do is turn on the MMU, and in order to turn on the MMU it must create the page tables the kernel needs for normal operation. That is the main content of this section.

In the architecture-specific assembly initialization phase, we prepare two sets of mappings in the page tables. The first is the identity mapping, which maps virtual addresses that are numerically equal to physical addresses onto those physical addresses. The code that turns on the MMU requires such a mapping (other CPU architectures may differ, but the ARM architecture strongly recommends doing so). The second is the kernel image mapping: to execute happily, the kernel code of course needs mappings for the addresses it uses at run time (kernel text, rodata, data, BSS, and so on). The specific mappings are shown in the figure below:


The code that turns on the MMU is placed in a special section named .idmap.text; the corresponding block of physical address space is labeled idmap_text in the figure. This code is mapped twice. As part of the kernel image it is mapped to the virtual addresses starting at __idmap_text_start, and, assuming the idmap_text block starts at physical address A, it is also identity-mapped to the virtual addresses starting at A. Although A appears in the figure to be larger than PAGE_OFFSET, no such relationship is actually required; it depends on the specific processor implementation.

What the compiler perceives is the virtual addresses of the kernel image (the left side of the figure): the kernel's linker script defines a number of symbols, all of which are virtual addresses. But at the very beginning, before the MMU is turned on, the code actually runs at physical addresses, so the assembly code at the start of the kernel is basically position-independent. It first needs to locate the page tables, then populate them with the entries for the kernel image mapping and the identity mapping. The location of the page tables is easy to settle (right after the BSS section), but their size takes some thought: we want to choose a size that covers the address ranges of both the kernel image mapping and the identity mapping without wasting memory.

Taking the kernel image mapping as an example, let us walk through the reasoning for sizing the translation tables. Assume a 48-bit virtual address configuration with a 4K page size. This requires 4 levels of mapping, with the address split as 9 (level 0, the PGD) + 9 (level 1, the PUD) + 9 (level 2, the PMD) + 9 (level 3, the PTE) + 12 (page offset). If we allocate 4 pages of translation tables, one per level, the maximum range we can map is 512 entries x 4K = 2M. A size of 2M is certainly not ideal; it cannot hold the kernel image's address range. What to do? Use section mapping: let the PMD entries be block descriptors, so that 3 pages of translation tables can map an address range of 512 entries x 2M = 1G. Of course, this approach has a side effect: PAGE_OFFSET must be 2M aligned.

For 16K or 64K page sizes, section mapping is somewhat inappropriate, because the alignment requirement becomes too high: a 16K page size would require 32M alignment, and a 64K page size would require 512M alignment. But this is no great loss; after all, the pages themselves have become larger, and a large area can be covered without section mapping. For example, with a 16K page size, one 16K page can hold 2K entries and can therefore cover an address range of 2K x 16K = 32M. With a 64K page size, one 64K page can hold 8K entries, covering 8K x 64K = 512M. 32M and 512M are basically enough. The final conclusion: the number of pages that must be reserved for the swapper process's (kernel-space) page tables depends on the number of page table levels. With section mapping, PGTABLE_LEVELS - 1 pages must be reserved; without section mapping, PGTABLE_LEVELS pages.

The conclusion above also fits the identity mapping in most cases, but there are exceptions (the points to consider mainly concern where the physical addresses are located). Let us assume a configuration like this: the virtual address is configured as 39 bits, the physical address as 48 bits, and the idmap_text block sits at a high physical address, larger than the range 39 bits can represent. In this case the conclusion above breaks down, because PGTABLE_LEVELS depends on the number of virtual address bits and on PAGE_SIZE, but not on the physical address configuration. The Linux kernel solves this problem in a clever way; you can read the code to find out, and we will not say more here.

Once the page tables are set up, the MMU is turned on and the kernel formally enters the world of virtual address space. The fly in the ointment is that the kernel's virtual world is not very big yet: the entire physical address space it once owned has disappeared, and only the kernel image mapping and the identity mapping are visible. But it does not matter; this is just the beginning, and memory initialization still has a long way to go.

4. Seeing the DTB

Through the kernel image mapping and the identity mapping, the kernel can peek at the physical address space, but it is only a glimpse; it does not yet understand the whole world. So how does the kernel come to understand the physical world around it? The answer is the DTB. But here a problem arises: at this point the kernel has not yet created a mapping for the memory holding the DTB, so even with the MMU on it cannot access the DTB directly. A mapping for the DTB must be created, and creating an address mapping requires allocating page table memory. At a time when the memory layout is unknown and the memory management modules have not been initialized, how can memory be allocated?

The figure below shows the solution:

The whole virtual address space is large and can be divided into two halves: the upper half mainly serves various special purposes, while the lower half is mainly used for the direct mapping of physical memory. For the DTB, we borrow the concept of fixed-mapped addresses. The fixmap is a mechanism the Linux kernel uses to solve a class of problems with these characteristics: (1) the address mapping is needed at an early stage, when the memory management modules have not been initialized and memory cannot be allocated dynamically; that is, the page table memory required to create the mapping cannot be allocated dynamically; (2) the physical address is fixed, or can be determined at run time. For this class of problems, the kernel defines a block of fixed virtual addresses so that the various modules using the fixmap mechanism can create address mappings early in system startup. Of course, the mechanism is not very flexible, because the virtual addresses are fixed at compile time.

With that, we can consider creating the third set of address mappings, and of course creating an address mapping means building the descriptors at each level. Since the fixed-mapped virtual address space also lies in kernel space, the PGD is naturally the swapper process's PGD reused (in fact the whole system has just this one PGD), while the translation tables for the other levels are statically defined (in arch/arm64/mm/mmu.c) and located in the kernel's BSS section. Because all of these translation tables fall within the range of the kernel image mapping, the kernel can access them without pressure and create the PUD, PMD, and PTE entries corresponding to the fixed-mapped address space. All the intermediate-level translation tables are initialized in the early_fixmap_init function; the last level is filled in by each specific module, and for the DTB this happens in the fixmap_remap_fdt function.

The system requires that the DTB be no larger than 2M. This requirement mainly ensures that creating the address mapping (create_mapping) will not need to allocate additional translation table pages; that is, all the translation tables involved must be statically defined. Why? Because the memory management modules have not been initialized at this point; even the memblock module (the module that allocates memory during the initialization phase) has not been initialized (there is no memory layout information yet), so memory cannot be allocated dynamically.

5. Early ioremap

Besides the DTB, other modules also want to create address mappings during the startup phase. For these requirements the kernel likewise uses the fixmap mechanism. The specifics of the fixmap area are shown in the figure below:

As the figure shows, the fix-mapped virtual addresses are divided into two parts: one is the permanent fixmap, the other the temporary fixmap. "Permanent" means the mapping relationship persists: for example, for the FDT region, once the address mapping is completed the kernel can access the DTB, and the mapping remains in place from then on. The temporary fixmap is different: in general, a module that uses this part of the virtual address space needs to release the virtual addresses as soon as possible so that other modules can use them.

You might be puzzled: in an ordinary driver module, address mapping is usually done with the ioremap function, so why is an early IO remap needed? In fact, using ioremap has preconditions. During the address mapping process, if a level of translation table does not exist, the function needs to call the buddy system's interface to allocate one page of memory to create the translation table for that level. But in the startup phase the buddy system is not ready yet; in fact, at this time the kernel does not even know how much memory is connected to the system. Early IO remap can be used after early_ioremap_init has run. For more specific information, refer to the file mm/early_ioremap.c.

Conclusion: if you want to access device registers before the buddy system is initialized, consider the early IO remap mechanism.

6. Memory Layout

After the DTB mapping is completed, the kernel can access that piece of memory, and by parsing the contents of the DTB the kernel can sketch the entire memory layout, laying the groundwork for the subsequent initialization of memory management. The memory layout information is collected mainly from the following sources:

(1) The chosen node. This node has a bootargs property that defines the kernel's boot parameters, which may include an entry such as mem=nn[KMG]. Its initrd-start and initrd-end properties define the physical address range of the initial ramdisk image.

(2) The memory node. This node mainly defines the layout of physical memory in the system. The main layout information is given by the reg property, which defines a number of (start address, size) entries.

(3) The memreserve fields in the DTB header. In a DTS file these are statements defined outside the root node, for example: /memreserve/ 0x05e00000 0x00100000;. The two values after /memreserve/ define the start address and the size of a reserved region, respectively. In the DTB, the memreserve statements are parsed by dtc and become part of the DTB's memory reservation block. More specific information on the structure of the DTB can be found in the Devicetree base documentation.

(4) The reserved-memory node. This node and its child nodes define regions of reserved memory in the system. There are two types of reserved memory. One is statically defined, with the address and size given by the reg property. The other is dynamically defined: only the length of the reserved region is given by the size property, and optionally its alignment by the alignment property; a child node of the dynamic type cannot precisely pin down the start address of the reserved region. When establishing address mappings, the no-map property can be used to prevent an address mapping from being created for the reserved region. More specific information can be found in reference [1].
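Pulling the four sources together, a DTS fragment along these lines (a hypothetical example, not taken from any real board file) shows where each piece of layout information lives:

```dts
/memreserve/ 0x05e00000 0x00100000;     /* (3) DTB header reservation */

/ {
        chosen {                        /* (1) boot parameters */
                bootargs = "console=ttyAMA0 mem=512M";
        };

        memory {                        /* (2) physical memory layout */
                device_type = "memory";
                reg = <0x0 0x40000000 0x0 0x20000000>;  /* 512M at 1G */
        };

        reserved-memory {               /* (4) reserved regions */
                #address-cells = <2>;
                #size-cells = <2>;
                ranges;

                /* static: exact address and size, never mapped */
                secure@45000000 {
                        reg = <0x0 0x45000000 0x0 0x00100000>;
                        no-map;
                };

                /* dynamic: only size and alignment are specified */
                linux,cma {
                        compatible = "shared-dma-pool";
                        size = <0x0 0x04000000>;
                        alignment = <0x0 0x00400000>;
                        reusable;
                };
        };
};
```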

By parsing the information above from the DTB, the kernel basically has the memory layout in hand. But how is this information managed? This is where the memblock module comes in: it is mainly responsible for managing physical memory during the initialization phase. A reference picture is shown below:

After the kernel collects the various memory-related information, it calls the memblock module's interface APIs (for example memblock_add, memblock_reserve, memblock_remove, and so on) to manage this layout information. The memory resources that the kernel needs to manage dynamically are saved in memblock's memory-type array (the green blocks in the figure, ordered by address), while the regions that need to be reserved and are not managed by the kernel are saved in memblock's reserved-type array (the cyan blocks in the figure, also ordered by address). For further information, refer to the implementation of the setup_machine_fdt and arm64_memblock_init functions in the kernel code.

7. Seeing the Memory

The kernel now knows the layout of physical memory, but it can still only access parts of it (the memory holding the kernel image and the DTB, the yellow blocks in the figure). Most of the memory still lies in darkness, waiting for the light to arrive; that is, address mappings need to be created for it.

At this point in time, creating address mappings for memory runs into a paradox: creating an address mapping requires allocating memory for page tables, but at this moment the buddy system is not ready and memory cannot be allocated dynamically. Perhaps you would say: isn't memblock ready? Can't we call memblock_alloc to allocate physical memory? Of course we can, but the physical memory allocated by memblock_alloc still has to be accessed through a virtual address, and no mapping has yet been created for it, so as soon as the kernel touches the physical memory allocated by memblock_alloc, tragedy strikes.

What to do? The kernel takes a clever approach: it controls the order in which address mappings are created and in which memblock_alloc allocates page table memory. In other words, the address mappings created first require no page table memory to be allocated, and by the time the kernel does need to call memblock_alloc to allocate physical pages for page tables, plenty of memory with mappings already in place is available, so create_mapping never has to allocate page table memory it cannot access. For a more specific explanation, refer to the figure below:

We know that at kernel build time, several pages are set aside after the BSS section for the swapper process's (kernel-space) page tables. Of course, since the kernel image does not need to map that many addresses, the entries in the last level of the swapper process's translation tables are not fully populated. In other words, the range of addresses the swapper process's page tables can support is much larger than the kernel image mapping; the size of the address range they can support is in fact SWAPPER_INIT_MAP_SIZE. When creating the mapping from the virtual range (PAGE_OFFSET, PAGE_OFFSET + SWAPPER_INIT_MAP_SIZE) to the physical range (PHYS_OFFSET, PHYS_OFFSET + SWAPPER_INIT_MAP_SIZE), the call to create_mapping allocates nothing, because all the needed page tables already exist.

Once the address mapping of the physical range (PHYS_OFFSET, PHYS_OFFSET + SWAPPER_INIT_MAP_SIZE) is completed, memblock_alloc can finally be used for memory allocation. Of course, a limit must be imposed to ensure that the allocated memory falls within that already-mapped physical range. After the address mappings of all memory-type memory regions are completed, the limit can be removed and memory allocated freely. At this point, all memory-type address regions (the green blocks in the upper part of the figure) are visible, and these valuable memory resources are exactly what the memory management modules need to manage. For the specific code, refer to the implementation of paging_init ---> map_mem.

8. Concluding Remarks

At this point, all the preparation for memory management is complete: the information about the entire memory layout has been collected, all the information needed to manage the memory regions has been saved in the memblock module, and the system has created address mappings for all of the RAM (except for the no-map reserved regions). Although the full memory management system is not yet ready, the memblock module can already allocate memory dynamically during the subsequent initialization process. With this groundwork in place, the real memory management system can be initialized, but that is a story for the next article.

References:

[1] Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt

[2] Linux 4.4.6 kernel source code
