Original article: http://www.wowotech.net/linux_kenrel/turn-on-mmu.html
I. Preface
After a long preamble, we finally get to turn on the MMU. This article describes the code logic from just before the MMU is turned on up to the jump to start_kernel. Once this section is done, we can leave painful assembly behind and enter the much-loved world of C code.
II. Overview: before and after turning on the MMU
For the CPU and the program it executes, turning on the MMU is a fascinating event, as if suddenly crossing from the real world into a wondrous unreal one. In this section we look at how the kernel makes that "crossing". The figure below depicts the two different worlds:
Before the MMU is turned on, the CPU accesses physical memory or I/O memory directly when fetching instructions and accessing data. Although a 64-bit CPU theoretically has a huge address space, the physical main memory that holds the kernel image is not that large; in general, the system's main memory occupies a small segment at the low end of the physical address space, as shown in the figure above. Once the MMU is on, the CPU's accesses no longer touch the physical space directly; they must be translated through a series of translation tables. The virtual address space is divided into three segments. The low end, 0x00000000_00000000~0x0000ffff_ffffffff, is used for user space. The high end, 0xffff0000_00000000~0xffffffff_ffffffff, is used for kernel space. Addresses in between are invalid, and accessing them produces an MMU fault. The virtual address space is also shown in the figure above.
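The three-way split of the virtual address space can be illustrated with a small C sketch. It assumes 48-bit virtual addresses (matching the ranges above) and ignores features such as top-byte tagging; the enum and function names are hypothetical. On ARMv8, addresses in the low range are translated through TTBR0_EL1 and addresses in the high range through TTBR1_EL1.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 48-bit virtual addresses (VA_BITS = 48), no address tagging. */
enum va_region { VA_USER, VA_KERNEL, VA_INVALID };

static enum va_region classify_va(uint64_t va)
{
    uint64_t top = va >> 48;      /* bits [63:48] of the address */

    if (top == 0x0000)
        return VA_USER;           /* low range, translated via TTBR0_EL1 */
    if (top == 0xffff)
        return VA_KERNEL;         /* high range, translated via TTBR1_EL1 */
    return VA_INVALID;            /* the hole in between: access faults */
}
```

For example, the kernel link address 0xffff8000_00080000 falls in the kernel half, while 0x0001000000000000 lands in the invalid hole.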
The linker works with virtual addresses: when it links the kernel's object files into a kernel image, the addresses that the image's binary code accesses are virtual addresses, i.e. the kernel image is meant to run in the virtual address space laid out by the linker. So at which address does the kernel image run? The most intuitive choice would be the very base of the kernel's linear mapping (PAGE_OFFSET, 0xffff8000_00000000 with 48-bit VAs), but for various reasons an architecture may specify an offset (TEXT_OFFSET) when the kernel is compiled. For ARM64 it is 512 KB (0x00080000), so the compiled kernel runs at 0xffff8000_00080000. At boot, the bootloader copies the kernel image into main memory, and just as in the virtual address space, the image is not placed at the very start of main memory but keeps an offset of the same size. Now the question: early in kernel startup the MMU is off, so the kernel image runs directly at physical addresses, yet it was linked for virtual addresses. Can the kernel run correctly before the MMU is turned on? Yes: if the code that runs before the MMU is on is PIC (position-independent code), it can run at any address. Look closely at the code that runs before the MMU is turned on and you will see that it is all position independent.
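The relationship between the link address and the load address is simple offset arithmetic. This sketch assumes 48-bit VAs, so the linear mapping starts at PAGE_OFFSET = 0xffff8000_00000000; the PHYS_OFFSET value (RAM starting at 0x80000000) is a hypothetical board layout, and the helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_OFFSET 0xffff800000000000ULL /* linear-map base, 48-bit VAs */
#define TEXT_OFFSET 0x0000000000080000ULL /* 512 KB, as stated for ARM64 */
#define PHYS_OFFSET 0x0000000080000000ULL /* hypothetical start of RAM */

/* Virtual address the linker assigns to the start of the kernel image. */
static uint64_t kernel_link_addr(void)
{
    return PAGE_OFFSET + TEXT_OFFSET;
}

/* Physical address the bootloader copies the image to: the same
 * TEXT_OFFSET, measured from the start of RAM. */
static uint64_t kernel_load_addr(uint64_t phys_offset)
{
    return phys_offset + TEXT_OFFSET;
}
```

Because both addresses carry the same TEXT_OFFSET, every virtual address in the image differs from its physical address by the constant PAGE_OFFSET - PHYS_OFFSET.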
OK, we have solved the problem of running before the MMU is on, and now we are ready to "cross over". Actually turning on the MMU takes just one instruction, which sets one bit of a system register to 1. This splits the instructions into two groups: the green instructions before the turn-on-MMU instruction and the orange instructions after it, as shown in the following illustration:
Because modern CPU designs introduce pipelining, superscalar issue, out-of-order execution, branch prediction and so on, the state of the instructions around the turn-on-MMU instruction at the moment it executes is somewhat muddled. A green instruction's data load might only be issued as an actual bus transaction after the MMU is on, even though it should have accessed the physical address space. Conversely, an orange instruction might be executed early, so that its memory operation completes before the MMU is on. One pragmatic way to defuse this confusion is to create an identity mapping: if the kernel image occupies the physical address range a~b, then when the page tables are built, the virtual range a~b is mapped to the physical range a~b. The instructions around the turn-on-MMU instruction then have nothing to worry about: whether they access memory through a virtual or a physical address, they reach the same physical memory.
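A toy model of the identity-mapping idea: the "page table" is reduced to a list of contiguous mappings, one identity mapping for the image range a~b and one for the image's link-time virtual addresses. The addresses, the 16 MiB image size and all names are hypothetical; real ARM64 page tables are multi-level and walked by hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: a 16 MiB kernel image loaded at physical a = 0x80080000. */
#define IMG_PHYS_START 0x80080000ULL
#define IMG_SIZE       0x01000000ULL
#define PAGE_OFFSET    0xffff800000000000ULL
#define PHYS_OFFSET    0x80000000ULL

/* One contiguous mapping: [va, va + size) -> [pa, pa + size). */
struct mapping { uint64_t va, pa, size; };

static const struct mapping maps[] = {
    /* identity map: VA a~b -> PA a~b */
    { IMG_PHYS_START, IMG_PHYS_START, IMG_SIZE },
    /* kernel map: link-time VA -> the same physical pages */
    { PAGE_OFFSET + (IMG_PHYS_START - PHYS_OFFSET), IMG_PHYS_START, IMG_SIZE },
};

/* Look up va in the toy table; false means unmapped (an MMU fault). */
static bool translate(uint64_t va, uint64_t *pa)
{
    for (size_t i = 0; i < sizeof(maps) / sizeof(maps[0]); i++) {
        if (va >= maps[i].va && va - maps[i].va < maps[i].size) {
            *pa = maps[i].pa + (va - maps[i].va);
            return true;
        }
    }
    return false;
}
```

An instruction at physical 0x80081000 resolves to the same physical byte whether it is reached through the identity address 0x80081000 or through the kernel address 0xffff8000_00081000, which is exactly why the instructions around the MMU switch do not care which side of it they execute on.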
Another approach is to clearly separate the instructions before and after the MMU is turned on, using an instruction synchronization barrier, as follows:
The instruction barrier divides execution into three clear segments. First, the green instructions complete before the turn-on-MMU instruction executes. Then the turn-on-MMU instruction executes, and the instruction barrier that follows it guarantees that the instruction has fully taken effect (the whole system's view has switched to the virtual world). Only then are the orange instructions fetched, decoded and executed.
III. The code that turns on the MMU
The code that turns on the MMU is in the __enable_mmu function:
__enable_mmu:
	ldr	x5, =vectors
	msr	vbar_el1, x5			// (1)
	msr	ttbr0_el1, x25			// (2) load TTBR0
	msr	ttbr1_el1, x26			// load TTBR1
	isb
	msr	sctlr_el1, x0			// (3)
	isb
	br	x27				// jump to __mmap_switched; LR is not set
ENDPROC(__enable_mmu)
Four parameters are passed into this function. x0 holds the SCTLR_EL1 value to write when turning on the MMU (prepared in the __cpu_setup function); x25 holds the address of idmap_pg_dir; x26 holds the address of swapper_pg_dir; and x27 holds the address to continue at after this function finishes (__mmap_switched).
(1) VBAR_EL1, the Vector Base Address Register (EL1), holds the base address of the exception vector table for EL1. In ARMv8, when an exception occurs, the first thing determined is which exception level will handle it. If the exception is taken to EL1, the CPU jumps into this vector table to execute the handler. Exception handling in detail is described in other documents and is not covered here.
(2) idmap_pg_dir is the identity mapping used while turning on the MMU. Later, TTBR0 is used for user-space processes: switching address spaces during a process switch actually amounts to changing the value of TTBR0. TTBR1 is used for kernel space, which all kernel threads share, namely swapper_pg_dir.
(3) Turn on the MMU. There are ISB instructions both before and after this instruction, which in theory strictly order the code executed around the MMU turn-on. Given that, I feel there is no real need to set up the idmap_pg_dir pages; of course, this is just speculation.
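Some architectural details behind steps (1) and (3) can be written down as C arithmetic. The SCTLR_EL1 bit positions (M = bit 0, C = bit 2, I = bit 12) and the vector-table layout (four groups of four entries, 0x80 bytes each, with VBAR_EL1 2 KB aligned) come from the ARMv8 architecture; the macro, enum and function names are hypothetical, and the VBAR value in the usage note is made up.

```c
#include <assert.h>
#include <stdint.h>

/* SCTLR_EL1 bits (ARMv8); macro names are illustrative. */
#define SCTLR_M (1ULL << 0)   /* stage 1 MMU enable for EL1&0 */
#define SCTLR_C (1ULL << 2)   /* data/unified cache enable */
#define SCTLR_I (1ULL << 12)  /* instruction cache enable */

/* AArch64 vector table: 4 groups x 4 exception types, 0x80 bytes per entry. */
enum vec_group { CUR_EL_SP0, CUR_EL_SPX, LOWER_EL_A64, LOWER_EL_A32 };
enum vec_type  { VEC_SYNC, VEC_IRQ, VEC_FIQ, VEC_SERROR };

/* Step (1): the entry the CPU branches to for an exception taken to EL1. */
static uint64_t vector_entry(uint64_t vbar_el1, enum vec_group g, enum vec_type t)
{
    assert((vbar_el1 & 0x7ff) == 0);  /* VBAR_EL1 bits [10:0] are RES0 */
    return vbar_el1 + (uint64_t)g * 0x200 + (uint64_t)t * 0x80;
}

/* Step (3): setting the M bit is what turns stage 1 translation on.
 * In the kernel, the value in x0 already has M set by __cpu_setup. */
static uint64_t sctlr_mmu_on(uint64_t sctlr)
{
    return sctlr | SCTLR_M;
}
```

For instance, with a hypothetical VBAR_EL1 of 0xffff8000_00081000, an IRQ taken from a 64-bit user process (lower EL, AArch64) lands at 0xffff8000_00081480.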
IV. On to start_kernel
I hate assembly; whenever I can avoid it, I absolutely will. Fortunately, we are about to reach start_kernel:
__mmap_switched:
	adr_l	x6, __bss_start
	adr_l	x7, __bss_stop
1:	cmp	x6, x7
	b.hs	2f
	str	xzr, [x6], #8		// clear BSS
	b	1b
2:
	adr_l	sp, initial_sp, x4	// set up the stack, tying this code to the swapper process
	str_l	x21, __fdt_pointer, x5	// save FDT pointer
	str_l	x24, memstart_addr, x6	// save PHYS_OFFSET
	mov	x29, #0
	b	start_kernel
ENDPROC(__mmap_switched)
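The BSS-clearing loop at the start of __mmap_switched corresponds roughly to the following C. In the kernel the bounds are the linker symbols __bss_start and __bss_stop; here they are ordinary parameters, and the function name is hypothetical. Like the str xzr, [x6], #8 instruction, it writes 8 bytes per iteration, so it assumes an 8-byte aligned range.

```c
#include <assert.h>
#include <stdint.h>

static void clear_bss(uint64_t *start, uint64_t *stop)
{
    uint64_t *p = start;      /* x6 */

    while (p < stop)          /* cmp x6, x7; b.hs 2f */
        *p++ = 0;             /* str xzr, [x6], #8 */
}
```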
__mmap_switched is divided into two parts: one clears the BSS, and the other prepares for entering C code (mainly the stack). Clearing the BSS gives uninitialized global variables their initial value of 0; there is nothing more to say about it. C code such as start_kernel cannot run without a stack, so how is the stack set up? Anyone familiar with the kernel knows that when a user-space process traps into the kernel, its stack switches to the kernel stack, which is actually the top of the memory block holding the process's thread_info (4 KB or 8 KB on many architectures; 16 KB on ARM64). The swapper process works on the same principle:
	.set	initial_sp, init_thread_union + THREAD_START_SP
If the code before this point was executing like a wandering ghost, then once "adr_l sp, initial_sp, x4" has executed, the initialization code has finally found a home: it has its own thread_info, its own task_struct and its own PID, everything a process (kernel thread) should have. From then on, this code belongs to the idle process, the process whose PID is 0.
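How initial_sp is derived from init_thread_union can be sketched in C. The sketch assumes THREAD_SIZE = 16 KB and THREAD_START_SP = THREAD_SIZE - 16, the values used by ARM64 kernels of this era; struct thread_info is a simplified stand-in and the helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define THREAD_SIZE     16384              /* assumed, as on ARM64 */
#define THREAD_START_SP (THREAD_SIZE - 16)

/* Simplified stand-in for the kernel's struct thread_info. */
struct thread_info { unsigned long flags; int preempt_count; };

/* thread_info sits at the bottom of the block; the stack grows down from the top. */
union thread_union {
    struct thread_info thread_info;
    unsigned long stack[THREAD_SIZE / sizeof(unsigned long)];
};

static union thread_union init_thread_union;

/* initial_sp = init_thread_union + THREAD_START_SP */
static uintptr_t initial_sp(union thread_union *u)
{
    return (uintptr_t)u + THREAD_START_SP;
}
```

Because thread_info lives at the bottom of the same block, a kernel of this era can find it from any stack address by masking with ~(THREAD_SIZE - 1), provided the block is THREAD_SIZE aligned.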
To make them easy for later code to access, two more variables are initialized: __fdt_pointer (the device tree blob, as a physical address) and memstart_addr (PHYS_OFFSET: the physical base that the kernel image is offset from, generally the start address of main memory). memstart_addr is mainly used to convert between physical and virtual addresses of main memory; see the implementations of __virt_to_phys and __phys_to_virt for details.
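The linear-map conversion that memstart_addr enables is a fixed offset. This sketch mirrors the idea behind __virt_to_phys and __phys_to_virt on ARM64 kernels of this era (va - PAGE_OFFSET + PHYS_OFFSET, with PHYS_OFFSET read from memstart_addr); it assumes 48-bit VAs, the memstart_addr value is hypothetical, and the lower-case function names are not the kernel's.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_OFFSET 0xffff800000000000ULL  /* linear-map base, 48-bit VAs */

/* Set once at boot (str_l x24, memstart_addr); value here is hypothetical. */
static uint64_t memstart_addr = 0x80000000ULL;
#define PHYS_OFFSET memstart_addr

static uint64_t virt_to_phys(uint64_t va)
{
    assert(va >= PAGE_OFFSET);             /* linear-map addresses only */
    return va - PAGE_OFFSET + PHYS_OFFSET;
}

static uint64_t phys_to_virt(uint64_t pa)
{
    assert(pa >= PHYS_OFFSET);             /* RAM above the base only */
    return pa - PHYS_OFFSET + PAGE_OFFSET;
}
```

With these values, the kernel's link address 0xffff8000_00080000 maps to physical 0x80080000 and back.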
V. References
1. ARM Architecture Reference Manual
Change log:
1. 2015-11-30: emphasized the connection between the initialization code and the idle process
2. 2015-12-02: revised the figure of the physical and virtual address spaces
Original article; please credit the source when reposting. wowotech
Comments:
amusion
2015-10-28 11:25 I came across this site by accident and read a few articles. Truly admirable; the analysis is very in-depth. Would you consider writing an analysis of SMP startup?
linuxer
2015-10-29 08:59 @amusion: Sir, this site does not take "à la carte" orders for now, hehe~~~ Just kidding. Everyone is busy with work and writes articles in their spare time for their own enjoyment, so we write about whatever comes to mind. SMP code is scattered across the kernel's various subsystems, so it is actually not easy to write about.
kitty
2015-10-27 17:31 You really work hard. Bloggers like you, who calmly dig deep and take things so seriously, are all too rare.
linuxer
2015-10-27 18:30 @kitty: It's not hard work. If you really love something, it isn't hard, ^_^
Many people like to dig deep; they just haven't come together. That is exactly why wowo (this site) was set up. Everyone who is passionate about technology is welcome.
kitty
2015-10-30 10:38 @linuxer: Hi linuxer, I read the power-management articles you wrote; they are very detailed. But the power-management architecture is due for many changes: at the end of this year Linaro will release a new scheduler architecture, EAS, which brings DVFS and CPU idle into the CFS scheduler, and thermal management will also be replaced by the IPA mechanism. This is a new research direction; if you are interested, we can study it together.
Mobz
2015-10-27 14:16 Hi linuxer, I see you have recently analyzed system startup in detail. I just ran into a problem and would like to ask you: the kernel function kernel_execve contains the following piece of assembly, which eventually branches to ret_to_user. What exactly is going on here?
asm("add	r0, %0, %1\n\t"
    "mov	r1, %2\n\t"
    "mov	r2, %3\n\t"
    "bl	memmove\n\t"	/* copy regs to top of stack */
    "mov	r8, #0\n\t"	/* not a syscall */
    "mov	r9, %0\n\t"	/* thread structure */
    "mov	sp, r0\n\t"	/* reposition stack pointer */
    "b	ret_to_user"
    :
    : "r" (current_thread_info()),
      "Ir" (THREAD_START_SP - sizeof(regs)),
      "r" (&regs),
      "Ir" (sizeof(regs))
    : "r0", "r1", "r2", "r3", "r8", "r9", "ip", "lr", "memory");
linuxer
2015-10-27 17:47 @mobz: Both user space and kernel space need to execute programs. For example, when kernel space hands control over to user space, it needs to execute /sbin/init (and possibly other programs). User space has many more use cases: when you type a program name at a terminal's command line, the shell forks and then calls execve to run the program.
Executing a binary program really means that the kernel's loader destroys the current process's address space (text, data, BSS and stack) and builds a new one from the image of the new executable. So returning to the calling function makes no sense (the caller no longer exists). However, the loader must always hand control to the newly created process, so it simulates a trap into the kernel on the new process's kernel stack: it builds a "scene" on the kernel stack and then calls ret_to_user to return to user space and start executing the new process. Of course, the CPU's PC value is set to the entry point of the binary image.
Mobz
2015-10-27 20:03 @linuxer: ret_to_user invokes arch_ret_to_user r1, lr here, but I still don't understand how arch_ret_to_user is implemented. I can't find it by searching the code. Or did I miss it?
I ask because I often hit a problem where the kernel boot hangs after "Freeing init memory". Debugging showed it gets stuck when returning to user space to run the init process; one class of cases was caused by wrongly set bootargs.
ENTRY(ret_to_user)
ret_slow_syscall:
	disable_irq				@ disable interrupts
ENTRY(ret_to_user_from_irq)
	ldr	r1, [tsk, #TI_FLAGS]
	tst	r1, #_TIF_WORK_MASK
	bne	work_pending
no_work_pending:
#if defined(CONFIG_IRQSOFF_TRACER)
	asm_trace_hardirqs_on
#endif
	/* perform architecture specific actions before user return */
	arch_ret_to_user r1, lr
	restore_user_regs fast = 0, offset = 0
ENDPROC(ret_to_user_from_irq)
ENDPROC(ret_to_user)
linuxer
2015-10-28 08:58 @mobz: I am reading the ARM64 code of version 4.1.10, which has neither a kernel_execve function nor arch_ret_to_user.
It seems the ARM platform defines arch_ret_to_user in the linux/arch/arm/kernel/entry-common.S file:
#ifdef CONFIG_NEED_RET_TO_USER
#include <mach/entry-macro.S>
#else
	.macro	arch_ret_to_user, tmp1, tmp2
	.endm
#endif
Mobz
2015-10-28 09:49 @linuxer: Yes, but how does the return to user space actually happen here? I only see the definition, not the implementation, and I don't understand it.
linuxer
2015-10-28 12:28 @mobz: arch_ret_to_user is not so much architecture specific as machine specific within the ARM arch. Some special ARM machines (such as arch_iop13xx) need a special operation when returning to user space, but for most ARM processors arch_ret_to_user is empty.
Mobz
2015-10-28 13:39 @linuxer: So, in fact, going from the kernel to user space comes down to just the following three instructions?
	disable_irq
	ldr	r1, [tsk, #TI_FLAGS]
	tst	r1, #_TIF_WORK_MASK
linuxer
2015-10-28 14:41 @linuxer: Isn't there a restore_user_regs after them? It restores the user-space context from the kernel stack.
Mobz
2015-10-28 18:46 @linuxer: I can't reply under your latest reply, so I'm replying here. Like arch_ret_to_user, restore_user_regs seems to be empty on the ARM platform; I didn't find the relevant code for ARM, but I did find it for the FRV platform.
linuxer
2015-10-29 08:35 @linuxer: It is defined in arch/arm/kernel/entry-header.S (my kernel version is 4.1.10; other versions should be similar).
passerby
2015-10-27 09:10 In my kernel, map_mem() (called from paging_init()) does:
for_each_memblock(memory, reg) {
	phys_addr_t start = reg->base;
	phys_addr_t end = start + reg->size;

	if (start >= end)
		break;

	create_mapping(start, __phys_to_virt(start), end - start, false);
}
This maps every memblock, including the kernel image that was already mapped in the assembly code.
linuxer
2015-10-27 15:30 @passerby: To answer this question, you need to understand the memblock module (the mm/memblock.c file).
The module defines a global variable:
struct memblock memblock
This global variable is used to manage all memory blocks in the system. These blocks are divided into two types:
1. reserved
2. memory
Logically, whichever type a block belongs to, its address regions cannot overlap (hence the operations that split and merge memory regions).
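The non-overlap rule within one region type can be illustrated with a simplified region list that stays sorted and merges any regions that overlap or touch when a new one is added. This is not memblock's actual code (in mm/memblock.c that work is split between memblock_add_range() and memblock_merge_regions(), which also track node IDs and flags); the types, the fixed capacity and region_add() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_REGIONS 16  /* hypothetical fixed capacity */

struct region { uint64_t base, size; };

/* One region type, e.g. "memory" or "reserved". */
struct region_type {
    size_t cnt;
    struct region regions[MAX_REGIONS];  /* sorted by base, never overlapping */
};

/* Add [base, base + size), merging with regions that overlap or are adjacent.
 * Returns 0 on success, -1 if the array is full. */
static int region_add(struct region_type *t, uint64_t base, uint64_t size)
{
    uint64_t end = base + size;
    size_t i = 0, j;

    if (size == 0)
        return 0;

    /* Skip regions that end strictly before the new range starts. */
    while (i < t->cnt && t->regions[i].base + t->regions[i].size < base)
        i++;

    /* Absorb every region that overlaps or touches [base, end). */
    for (j = i; j < t->cnt && t->regions[j].base <= end; j++) {
        uint64_t rend = t->regions[j].base + t->regions[j].size;

        if (t->regions[j].base < base)
            base = t->regions[j].base;
        if (rend > end)
            end = rend;
    }

    if (i == j) {
        /* Nothing absorbed: open a new slot at index i. */
        if (t->cnt == MAX_REGIONS)
            return -1;
        memmove(&t->regions[i + 1], &t->regions[i],
                (t->cnt - i) * sizeof(t->regions[0]));
        t->cnt++;
    } else if (j - i > 1) {
        /* Regions i..j-1 collapse into slot i; close the gap behind it. */
        memmove(&t->regions[i + 1], &t->regions[j],
                (t->cnt - j) * sizeof(t->regions[0]));
        t->cnt -= j - i - 1;
    }
    t->regions[i] = (struct region){ base, end - base };
    return 0;
}
```

Adding 0x1000~0x2000 and 0x3000~0x4000 yields two regions; adding 0x2000~0x3000 afterwards fills the gap and leaves a single region 0x1000~0x4000.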
The function arm64_memblock_init() calls memblock_reserve(__pa(_text), _end - _text) to add the kernel image's memory region to the "reserved" type of memory blocks.
The for_each_memblock(memory, reg) you mention simply traverses the blocks of the "memory" type. Once the kernel image's memory region has been reserved, it no longer appears among the memory-type blocks, so it will not be mapped again.
BTW, I only skimmed the memblock.c code; much of this is logical deduction and may be incorrect.
passerby
2015-10-27 16:52 @linuxer: I have some doubts about it not being mapped, because other reserved memory needs to be mapped too. For example:
cont_splash_mem: splash_region@83000000 {
	linux,reserve-contiguous-region;
	linux,reserve-region;
	reg = <0x0 0x83000000 0x0 0x2000000>;
	label = "cont_splash_mem";
};
The kernel image may not need mapping here, but then the other reserved regions would never be mapped. I haven't seen reserved memory mapped anywhere else; this is the only place I found.
linuxer
2015-10-27 18:26 @passerby: Several reserved memory blocks can be defined through the reserved-memory node in the device tree. These reserved memory regions are basically meant for specific drivers, so my view is that the kernel does not map them; the driver that uses this memory should be responsible for mapping it.