PowerPC-based Linux kernel tour, 2nd station: __secondary_start (start_here), part 2

Last Update:2018-12-03 Source: Internet

Author: User

Tags command access


This article covers part of the second stage of the Linux startup process on PowerPC. MMU initialization involves a great deal of material and code; this installment continues with MMU hardware initialization and the final step of enabling the MMU.

Before getting started, I need to correct an error in the previous article. When I introduced the rfi instruction in the mmu_off function, I described it as a return from interrupt. On reflection, interrupts are not yet enabled during CPU initialization, so that description was wrong. After checking the documentation, I found that rfi can also be used as a jump: it loads the MSR from SRR1 and continues execution at the address in SRR0. The advantage of jumping with rfi is that the instruction is context synchronizing, so the effect of an isync is obtained automatically and instruction fetch stays consistent after the jump. During Linux initialization it is common to use rfi this way, and in that role it has nothing to do with returning from an interrupt. Sorry for the confusion.

In addition, I have been swamped with work recently. I feel that my analysis of the MMU hardware initialization is quite poor, and I will come back and improve it later. I also hope that experts will not hesitate to offer corrections.

Let's take a look at the detailed code of MMU_init_hw (in mm/ppc_mmu_32.c):

    void __init MMU_init_hw(void)
    {
        unsigned int hmask, mb, mb2;
        unsigned int n_hpteg, lg_n_hpteg;

        /* Defined in hash_low_32.S; used to fill and clear the hash table */
        extern unsigned int hash_page_patch_A[];
        extern unsigned int hash_page_patch_B[], hash_page_patch_C[];
        extern unsigned int hash_page[];
        extern unsigned int flush_hash_patch_A[], flush_hash_patch_B[];

        if (!mmu_has_feature(MMU_FTR_HPTE_TABLE)) {
            /*
             * Put a blr instruction at the start of hash_page, because
             * a DSI (data storage interrupt) exception can still be
             * received on a 603 processor.
             */
            hash_page[0] = 0x4e800020;
            /* Flush the instruction cache for the patched word */
            flush_icache_range((unsigned long) &hash_page[0],
                               (unsigned long) &hash_page[1]);
            return;
        }

        if (ppc_md.progress)
            ppc_md.progress("hash:enter", 0x105);

    #define LG_HPTEG_SIZE   6                       /* each PTEG is 64 bytes */
    #define SDR1_LOW_BITS   ((n_hpteg - 1) >> 10)
    #define MIN_N_HPTEG     1024                    /* min 64 kB hash table */

        /* Allow one HPTE for each page of memory */
        n_hpteg = total_memory / (PAGE_SIZE * 8);
        if (n_hpteg < MIN_N_HPTEG)
            n_hpteg = MIN_N_HPTEG;
        lg_n_hpteg = __ilog2(n_hpteg);
        if (n_hpteg & (n_hpteg - 1)) {
            ++lg_n_hpteg;               /* round up if not a power of 2 */
            n_hpteg = 1 << lg_n_hpteg;
        }
        Hash_size = n_hpteg << LG_HPTEG_SIZE;

        /*
         * Request memory for the hash table; these two steps are
         * similar to malloc and memset.
         */
        if (ppc_md.progress)
            ppc_md.progress("hash:find piece", 0x322);
        Hash = __va(memblock_alloc_base(Hash_size, Hash_size,
                                        __initial_memory_limit_addr));
        cacheable_memzero(Hash, Hash_size);
        _SDR1 = __pa(Hash) | SDR1_LOW_BITS;

        Hash_end = (struct hash_pte *) ((unsigned long)Hash + Hash_size);

        /* Patch up the instructions in hash_low_32.S: create_hpte */
        if (ppc_md.progress)
            ppc_md.progress("hash:patch", 0x345);
        Hash_mask = n_hpteg - 1;
        hmask = Hash_mask >> (16 - LG_HPTEG_SIZE);
        mb2 = mb = 32 - LG_HPTEG_SIZE - lg_n_hpteg;
        if (lg_n_hpteg > 16)
            mb2 = 16 - LG_HPTEG_SIZE;

        hash_page_patch_A[0] = (hash_page_patch_A[0] & ~0xffff)
                               | ((unsigned int)(Hash) >> 16);
        hash_page_patch_A[1] = (hash_page_patch_A[1] & ~0x7c0) | (mb << 6);
        hash_page_patch_A[2] = (hash_page_patch_A[2] & ~0x7c0) | (mb2 << 6);
        hash_page_patch_B[0] = (hash_page_patch_B[0] & ~0xffff) | hmask;
        hash_page_patch_C[0] = (hash_page_patch_C[0] & ~0xffff) | hmask;

        /*
         * Make sure the patched locations are written out from the data
         * cache and invalidated in the instruction cache.
         */
        flush_icache_range((unsigned long) &hash_page_patch_A[0],
                           (unsigned long) &hash_page_patch_C[1]);

        /* Patch up the instructions in hash_low_32.S: flush_hash_page */
        flush_hash_patch_A[0] = (flush_hash_patch_A[0] & ~0xffff)
                                | ((unsigned int)(Hash) >> 16);
        flush_hash_patch_A[1] = (flush_hash_patch_A[1] & ~0x7c0) | (mb << 6);
        flush_hash_patch_A[2] = (flush_hash_patch_A[2] & ~0x7c0) | (mb2 << 6);
        flush_hash_patch_B[0] = (flush_hash_patch_B[0] & ~0xffff) | hmask;
        flush_icache_range((unsigned long) &flush_hash_patch_A[0],
                           (unsigned long) &flush_hash_patch_B[1]);

        if (ppc_md.progress)
            ppc_md.progress("hash:done", 0x205);
    }
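To make the sizing arithmetic and the instruction patching in the listing easier to follow, here is a minimal user-space sketch of the same calculations. It is not kernel code: the names hash_geometry, size_hash_table, ilog2_u32, patch_imm16, and patch_mb are hypothetical, a 4 kB page size is assumed, and the patch helpers mask their inputs for safety, which the kernel code does not do.

```c
#include <stdint.h>

#define LG_HPTEG_SIZE 6       /* each PTEG is 64 bytes */
#define MIN_N_HPTEG   1024u   /* minimum 64 kB hash table */
#define PAGE_SIZE     4096u   /* assumption: 4 kB pages */

/* Hypothetical container for the values MMU_init_hw derives. */
struct hash_geometry {
    uint32_t n_hpteg;     /* number of PTEGs, always a power of two */
    uint32_t lg_n_hpteg;  /* log2(n_hpteg) */
    uint32_t hash_size;   /* table size in bytes */
    uint32_t sdr1_low;    /* low bits OR-ed into SDR1 */
    uint32_t hmask;       /* immediate patched into the patch_B/patch_C slots */
    uint32_t mb, mb2;     /* MB fields patched into the patch_A slots */
};

/* floor(log2(x)) for x > 0; stands in for the kernel's __ilog2(). */
static uint32_t ilog2_u32(uint32_t x)
{
    uint32_t lg = 0;
    while (x >>= 1)
        ++lg;
    return lg;
}

static struct hash_geometry size_hash_table(uint64_t total_memory)
{
    struct hash_geometry g;

    /* One HPTE per page means one PTEG (8 HPTEs) per 8 pages. */
    g.n_hpteg = (uint32_t)(total_memory / (PAGE_SIZE * 8));
    if (g.n_hpteg < MIN_N_HPTEG)
        g.n_hpteg = MIN_N_HPTEG;
    g.lg_n_hpteg = ilog2_u32(g.n_hpteg);
    if (g.n_hpteg & (g.n_hpteg - 1)) {      /* not a power of two: round up */
        ++g.lg_n_hpteg;
        g.n_hpteg = 1u << g.lg_n_hpteg;
    }
    g.hash_size = g.n_hpteg << LG_HPTEG_SIZE;
    g.sdr1_low = (g.n_hpteg - 1) >> 10;
    g.hmask = (g.n_hpteg - 1) >> (16 - LG_HPTEG_SIZE);
    g.mb = g.mb2 = 32 - LG_HPTEG_SIZE - g.lg_n_hpteg;
    if (g.lg_n_hpteg > 16)
        g.mb2 = 16 - LG_HPTEG_SIZE;
    return g;
}

/* Replace the 16-bit immediate in the low halfword of an instruction. */
static uint32_t patch_imm16(uint32_t insn, uint32_t imm)
{
    return (insn & ~0xffffu) | (imm & 0xffffu);
}

/* Replace the 5-bit MB field (mask 0x7c0) of a rotate-and-mask instruction. */
static uint32_t patch_mb(uint32_t insn, uint32_t mb)
{
    return (insn & ~0x7c0u) | ((mb & 0x1fu) << 6);
}
```

With 256 MB of RAM this gives 8192 PTEGs, a 512 kB hash table, and mb = 13. The patch helpers show why the listing uses the masks 0xffff and 0x7c0: the first selects an instruction's immediate halfword, the second its MB field.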

The MMU of a 32-bit PowerPC is implemented with a hash table containing groups of PTEs, plus 16 segment registers, which together define the mapping between virtual addresses and real addresses. Here the hash table serves as an extended TLB (translation lookaside buffer), that is, a cache of the currently active mappings. The code in hash_low_32.S extracts a PTE from the page table tree, puts it into the hash table, and then updates the changed bit in the page table tree. On PowerPC, a PTEG contains 8 PTEs of 8 bytes each, and the PTEG address is the entry point for a table lookup.

As for DSI: Linux classifies memory access faults into faults on a process's data space and faults on its program (instruction) space. Correspondingly, the e300 core provides two exceptions for the two kinds of fault: DSI (data storage interrupt, the data access exception) and ISI (instruction storage interrupt, the instruction access exception). The main causes of a DSI in the kernel are reading an address that the MMU does not allow to be read and writing to an address that the MMU does not allow to be written. Linux deliberately sets up the MMU page tables so that a DSI is raised and then handles the exception; the DSI mentioned in the code above is of this kind. The exception handler itself is a large topic, so I will only note it here and return to it later.

The flush_icache_range function calls __flush_icache_range in misc_32.S. It writes all modified data cache blocks in the range back to memory and invalidates the corresponding instruction cache blocks. The code is as follows:

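    _KPROBE(__flush_icache_range)
    BEGIN_FTR_SECTION
        blr                             /* for 601, do nothing */
    END_FTR_SECTION_IFSET(CPU_FTR_COHERENT_ICACHE)
        li      r5,L1_CACHE_BYTES-1     /* L1 cache line size; 32 bytes on e300 */
        andc    r3,r3,r5
        subf    r4,r3,r4
        add     r4,r4,r5
        srwi.   r4,r4,L1_CACHE_SHIFT
        beqlr
        mtctr   r4
        mr      r6,r3
    1:  dcbst   0,r3                    /* store the data cache block, i.e. copy it to memory */
        addi    r3,r3,L1_CACHE_BYTES
        bdnz    1b
        sync                            /* wait for the dcbst instructions to complete */
    #ifndef CONFIG_44x
        mtctr   r4
    2:  icbi    0,r6                    /* invalidate the instruction cache block */
        addi    r6,r6,L1_CACHE_BYTES
        bdnz    2b
    #else
        iccci   0,r0                    /* invalidate the whole instruction cache */
    #endif
        sync                            /* additional sync needed on G4 */
        isync
        blr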
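The pointer arithmetic at the top of the routine computes how many cache lines the range touches. A small sketch of that calculation in C, assuming 32-byte cache lines (L1_CACHE_SHIFT = 5) and using the hypothetical function name cache_lines_in_range:

```c
#include <stdint.h>

#define L1_CACHE_SHIFT 5                        /* assumption: 32-byte lines */
#define L1_CACHE_BYTES (1u << L1_CACHE_SHIFT)

/*
 * Number of cache lines covering [start, stop), mirroring the
 * li/andc/subf/add/srwi. sequence of the assembly routine.
 */
static uint32_t cache_lines_in_range(uint32_t start, uint32_t stop)
{
    uint32_t mask = L1_CACHE_BYTES - 1;
    uint32_t first = start & ~mask;           /* round start down to a line */
    uint32_t len = stop - first;              /* bytes from that line to stop */
    return (len + mask) >> L1_CACHE_SHIFT;    /* round up to whole lines */
}
```

A result of zero corresponds to the early return in the assembly; otherwise the count is loaded into the CTR register and drives the dcbst loop.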

For create_hpte, r3 holds the VSID, r4 holds the virtual address, r5 holds the Linux page table entry (PTE), r6 holds the old value of the Linux PTE (before _PAGE_HASHPTE was set), and r7 holds the offset to be added to addresses (0 when the MMU is on, -KERNELBASE, with KERNELBASE = 0xc0000000, when it is off).

Now for the hash_page_patch locations, which involve two functions in hash_low_32.S. The first, create_hpte, is relatively simple: it creates an HPTE. In the assembly code this amounts to filling in r8 and r5, as shown below:

    _GLOBAL(create_hpte)
        /* Convert the Linux-style PTE (r5) to the low word of the PPC-style PTE (r8) */
        rlwinm  r8,r5,32-10,31,31       /* _PAGE_RW -> PP lsb */
        rlwinm  r0,r5,32-7,31,31        /* _PAGE_DIRTY -> PP lsb */
        and     r8,r8,r0                /* writable if _RW & _DIRTY */
        rlwimi  r5,r5,32-1,30,30        /* _PAGE_USER -> PP msb */
        rlwimi  r5,r5,32-2,31,31        /* _PAGE_USER -> PP lsb */
        ori     r8,r8,0xe04             /* clear out reserved bits */
        andc    r8,r5,r8                /* PP = user ? (rw & dirty ? 2 : 3) : 0 (rather intricate!) */
    BEGIN_FTR_SECTION
        rlwinm  r8,r8,0,~_PAGE_COHERENT /* clear M */
    END_FTR_SECTION_IFCLR(CPU_FTR_NEED_COHERENT)

        /* Construct the high word of the PPC-style PTE (r5) */
        rlwinm  r5,r3,7,1,24            /* put VSID in 0x7fffff80 bits */
        rlwimi  r5,r4,10,26,31          /* put in API (abbrev page index) */
        SET_V(r5)                       /* set V (valid) bit */
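The bit manipulation above is dense, so here is a sketch of the same two results in plain C. The flag values PTE_USER, PTE_DIRTY, and PTE_RW are assumptions inferred from the rotate counts in the listing (32-2, 32-7, and 32-10), and the function names hpte_pp and hpte_high_word are hypothetical.

```c
#include <stdint.h>

/* Assumed Linux PTE flag values, inferred from the rotate counts above. */
#define PTE_USER   0x004u
#define PTE_DIRTY  0x080u
#define PTE_RW     0x400u

/* PP field of the hardware PTE: user ? (rw && dirty ? 2 : 3) : 0 */
static uint32_t hpte_pp(uint32_t linux_pte)
{
    int user = (linux_pte & PTE_USER) != 0;
    int rw_dirty = (linux_pte & PTE_RW) && (linux_pte & PTE_DIRTY);

    if (!user)
        return 0;               /* kernel-only page */
    return rw_dirty ? 2 : 3;    /* 2 = read/write, 3 = read-only */
}

/* High word of the hardware PTE: V bit, 24-bit VSID, 6-bit API. */
static uint32_t hpte_high_word(uint32_t vsid, uint32_t va)
{
    uint32_t w = (vsid << 7) & 0x7fffff80u;   /* rlwinm r5,r3,7,1,24   */
    w |= (va >> 22) & 0x3fu;                  /* rlwimi r5,r4,10,26,31 */
    return w | 0x80000000u;                   /* SET_V                 */
}
```

One consequence worth noticing is that a writable user page that is not yet dirty gets PP = 3 (read-only), so the first write faults and gives Linux the chance to set the dirty bit.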

Next, let's take a look at flush_hash_pages. Its main job is to remove the entries for specific memory pages from the hash table. It begins by disabling interrupts so that the _PAGE_HASHPTE bit cannot be changed by other code during the operation and remains a reliable indication of whether an HPTE exists. Data address translation is also turned off so that no hash table miss can be taken.

        mfmsr   r10
        SYNC                            /* CPU-dependent sync (feature section) */
        rlwinm  r0,r10,0,17,15          /* disable external interrupts (clear MSR_EE) */
        rlwinm  r0,r0,0,28,26           /* disable data address translation (clear MSR_DR) */
        mtmsr   r0
        SYNC_601
        isync
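Both rlwinm instructions here use a wrapped mask (MB greater than ME) to clear a single bit. A small emulation in C may make that idiom clearer. The helper names ppc_mask, rotl32, and rlwinm are hypothetical; the constants follow the standard MSR layout (MSR_EE = 0x8000, MSR_DR = 0x10).

```c
#include <stdint.h>

#define MSR_EE 0x8000u  /* external interrupt enable */
#define MSR_DR 0x0010u  /* data address translation */

/*
 * Mask with ones in bits mb..me, using IBM numbering (bit 0 is the
 * most significant bit). If mb > me the mask wraps around.
 */
static uint32_t ppc_mask(unsigned mb, unsigned me)
{
    uint32_t from_mb = 0xffffffffu >> mb;         /* bits mb..31 */
    uint32_t to_me = 0xffffffffu << (31 - me);    /* bits 0..me  */
    return (mb <= me) ? (from_mb & to_me) : (from_mb | to_me);
}

static uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31;
    return n ? (x << n) | (x >> (32 - n)) : x;
}

/* rlwinm rA,rS,SH,MB,ME: rotate left, then AND with the mask. */
static uint32_t rlwinm(uint32_t rs, unsigned sh, unsigned mb, unsigned me)
{
    return rotl32(rs, sh) & ppc_mask(mb, me);
}
```

With these helpers, rlwinm(msr, 0, 17, 15) equals msr & ~MSR_EE and rlwinm(msr, 0, 28, 26) equals msr & ~MSR_DR, which is what the two instructions in the listing compute.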

The routine then searches for the PTE and clears it, using the _PAGE_HASHPTE bit as the criterion. Along the way it also converts the context and virtual address into a VSID.

        /* Find a PTE that has _PAGE_HASHPTE set */
    #ifndef CONFIG_PTE_64BIT
        rlwimi  r5,r4,22,20,29
    #else
        rlwimi  r5,r4,23,20,28
    #endif
    1:  lwz     r0,PTE_FLAGS_OFFSET(r5) /* the offset is 0 here, so r0 <- word at r5 */
        cmpwi   cr1,r6,1
        andi.   r0,r0,_PAGE_HASHPTE
        bne     2f                      /* found one: jump */
        ble     cr1,19f
        addi    r4,r4,0x1000
        addi    r5,r5,PTE_SIZE
        addi    r6,r6,-1
        b       1b

        /* Convert context and va to VSID */
    2:  mulli   r3,r3,897*16            /* multiply context by context skew */
        rlwinm  r0,r4,4,28,31           /* get ESID (top 4 bits of va) */
        mulli   r0,r0,0x111             /* multiply by ESID skew */
        add     r3,r3,r0                /* note code below trims to 24 bits */

        /* Construct the high word of the PPC-style PTE (r11) */
        rlwinm  r11,r3,7,1,24           /* put VSID in 0x7fffff80 bits */
        rlwimi  r11,r4,10,26,31         /* put in API (abbrev page index) */
        SET_V(r11)                      /* set V (valid) bit */

        /*
         * Check the _PAGE_HASHPTE bit in the current PTE. If it is already
         * clear, we are done; otherwise clear it atomically.
         */
    #if (PTE_FLAGS_OFFSET != 0)
        addi    r5,r5,PTE_FLAGS_OFFSET
    #endif
    33: lwarx   r8,0,r5                 /* fetch the PTE flags word */
        andi.   r0,r8,_PAGE_HASHPTE
        beq     8f                      /* done if HASHPTE is already clear */
        rlwinm  r8,r8,0,31,29           /* clear HASHPTE bit */
        stwcx.  r8,0,r5                 /* update the PTE */
        bne-    33b
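The lwarx/stwcx. loop at label 33 is a load-reserve/store-conditional retry loop. As a rough analogy only, the same "clear the bit unless it is already clear" logic can be written with C11 atomics. The name clear_hashpte is hypothetical, and the value 0x002 for the HASHPTE flag is an assumption.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define PTE_HASHPTE 0x002u   /* assumed value of _PAGE_HASHPTE */

/*
 * Atomically clear the HASHPTE flag in a Linux PTE word.
 * Returns true if this call cleared it, false if it was already
 * clear (the "beq 8f" case, where there is nothing to flush).
 */
static bool clear_hashpte(_Atomic uint32_t *pte)
{
    uint32_t old = atomic_load(pte);

    do {
        if (!(old & PTE_HASHPTE))
            return false;
        /* On failure, old is reloaded and the loop retries (bne- 33b). */
    } while (!atomic_compare_exchange_weak(pte, &old, old & ~PTE_HASHPTE));

    return true;
}
```

A compare-and-exchange is not identical to a reservation-based store, but the retry structure is the same: if another agent modifies the word between the load and the store, the update is abandoned and attempted again.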

The code here is scattered, and it is hard to analyze clearly. My understanding is really limited, even after several days of digging. Painful!

After checking, correcting, and sorting out the physical memory, MMU_init calls the mapin_ram function to map the physical address space used by the Linux kernel into virtual address space. mapin_ram first calls mmu_mapin_ram, which, depending on the situation, uses the first two or three BATs to map the physical memory used by the kernel. mmu_mapin_ram relies on the setbat function, which sets up one I/D BAT register pair covering a block of 128 kB to 256 MB. After that, __mapin_ram_chunk is called to map the remaining physical memory one page at a time.
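As a rough illustration of how memory might be split between BAT blocks and page-by-page mapping, here is a simplified sketch. It assumes the classic BAT block limits of 128 kB and 256 MB, ignores alignment constraints that real code has to respect, and uses the hypothetical names bat_block_size and bat_coverage; it is not the kernel's mmu_mapin_ram.

```c
#include <stdint.h>

#define BAT_MIN_SIZE (128u << 10)   /* 128 kB, smallest BAT block */
#define BAT_MAX_SIZE (256u << 20)   /* 256 MB, largest BAT block  */

/* Largest power-of-two block within the BAT limits that fits in `bytes`. */
static uint32_t bat_block_size(uint32_t bytes)
{
    uint32_t bl = BAT_MIN_SIZE;

    while (bl < BAT_MAX_SIZE && bl * 2 <= bytes)
        bl <<= 1;
    return bl;
}

/*
 * Bytes of RAM covered when up to `nbats` BATs are used greedily.
 * Whatever is left over would be mapped page by page.
 */
static uint32_t bat_coverage(uint32_t total, unsigned nbats)
{
    uint32_t done = 0;

    while (nbats-- > 0 && total - done >= BAT_MIN_SIZE)
        done += bat_block_size(total - done);
    return done;
}
```

For example, with 400 MB of RAM and two BATs, this sketch covers 256 MB with the first block and 128 MB with the second, leaving 16 MB for page-by-page mapping.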

Let's take a look at the second half of the code. Compared with the major functions called earlier, this part is relatively simple. The MMU is turned off so that the system runs unmapped again; new values can then be loaded into SDR1 and the segment registers.

        lis     r4,2f@h
        ori     r4,r4,2f@l
        tophys(r4,r4)
        li      r3,MSR_KERNEL & ~(MSR_IR|MSR_DR)    /* disable address translation, i.e. turn the MMU off */
        FIX_SRR1(r3,r5)                             /* empty? */
        mtspr   SPRN_SRR0,r4
        mtspr   SPRN_SRR1,r3                        /* MSR value to be loaded by rfi */
        SYNC
        RFI
    /* Load up the kernel context */
    2:  bl      load_up_mmu

    /* Only now is the MMU really turned on */
        li      r4,MSR_KERNEL                       /* re-enable the MMU */
        FIX_SRR1(r4,r5)
        lis     r3,start_kernel@h                   /* init/main.c: start of the kernel's C code; setup_arch is called from there */
        ori     r3,r3,start_kernel@l
        mtspr   SPRN_SRR0,r3
        mtspr   SPRN_SRR1,r4
        SYNC
        RFI
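Both halves of this listing use the same pattern: put the target address in SRR0, the desired MSR in SRR1, and execute rfi. A toy model in C shows why this works as a "jump that also switches the MMU state". The struct cpu_model and both function names are hypothetical, the MSR value 0x1032 used in the usage note is only illustrative, and a real rfi copies only selected SRR1 bits into the MSR.

```c
#include <stdint.h>

#define MSR_IR 0x0020u  /* instruction address translation */
#define MSR_DR 0x0010u  /* data address translation */

/* Toy model of the registers involved; not a real CPU description. */
struct cpu_model {
    uint32_t pc;
    uint32_t msr;
    uint32_t srr0;
    uint32_t srr1;
};

/* rfi: continue at SRR0 with the MSR taken from SRR1, in one step. */
static void rfi(struct cpu_model *cpu)
{
    cpu->msr = cpu->srr1;
    cpu->pc = cpu->srr0;
}

/* The mtspr SRR0 / mtspr SRR1 / rfi sequence from the listing. */
static void jump_with_msr(struct cpu_model *cpu, uint32_t target,
                          uint32_t new_msr)
{
    cpu->srr0 = target;
    cpu->srr1 = new_msr;
    rfi(cpu);
}
```

Calling jump_with_msr with new_msr = 0x1032 & ~(MSR_IR | MSR_DR) models the first rfi (translation off, continue at the physical address of label 2), and calling it again with 0x1032 models the second (translation on, continue at start_kernel). Because the program counter and the MSR change together, there is no window in which code is fetched with the wrong translation mode.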

We can see that the program first turns the MMU off and then calls load_up_mmu to load the kernel context into the MMU. SDR1 describes the page table used to translate virtual addresses to real addresses: it holds the pointer to the hash table. r3 and r4 are reloaded from r31 and r30, where they were saved at the very beginning of startup. According to the documentation, if Linux does not support the OF (Open Firmware) structure of the e300 core, the general-purpose registers r3, r4, r5, r6, and r7 are saved (to r31, ..., r27 respectively). These are passed in by the boot program, and they hold the following values:

r3 holds a pointer to bd_info, which describes the hardware of the current system, including the processor frequency, physical memory size, and NIC address; r4 holds the start address of the initial ramdisk (initrd); r5 holds the end address of the initrd; r6 holds the start address of the kernel command line (bootargs); r7 holds the end address of the kernel command line. The Linux version discussed here does support the OF structure, so only two registers are saved at startup: r3 holds the physical address of the OF tree structure, also called the device tree block, and r4 holds the physical address of the Linux kernel.

The load_up_mmu function runs while the MMU is disabled. Its purpose is to load into the hardware the HPTE (hash table) and BAT configuration prepared by MMU_init, so that it takes effect once the MMU is enabled again.

    load_up_mmu:
        sync                    /* force all PTE updates to finish */
        isync
        tlbia                   /* clear all TLB entries */
        sync                    /* wait for completion */
        TLBSYNC                 /* ... on all CPUs */
        /* Load the SDR1 register (hash table base address and size) */
        lis     r6,_SDR1@ha
        tophys(r6,r6)
        lwz     r6,_SDR1@l(r6)
        mtspr   SPRN_SDR1,r6
        li      r0,16           /* load up the segment registers */
        mtctr   r0              /* for context 0 */
        lis     r3,0x2000       /* Ku = 1, VSID = 0 */
        li      r4,0
    3:  mtsrin  r3,r4
        addi    r3,r3,0x111     /* increment VSID */
        addis   r4,r4,0x1000    /* address of next segment */
        bdnz    3b

        /* Load the BATs with the values set up by setbat in MMU_init */
        mfpvr   r3
        srwi    r3,r3,16
        cmpwi   r3,1
        lis     r3,BATS@ha
        addi    r3,r3,BATS@l
        tophys(r3,r3)
        LOAD_BAT(0,r3,r4,r5)
        LOAD_BAT(1,r3,r4,r5)
        LOAD_BAT(2,r3,r4,r5)
        LOAD_BAT(3,r3,r4,r5)
        blr
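The segment register loop is short, but the values it produces are easier to see in C. The sketch below reproduces the loop's arithmetic for context 0 and, for comparison, the general context-to-VSID calculation used in flush_hash_pages earlier (context skew 897*16, ESID skew 0x111, trimmed to 24 bits). All function names are hypothetical.

```c
#include <stdint.h>

#define SR_KU 0x20000000u   /* Ku key bit, from "lis r3,0x2000" */

/* Value written to segment register `seg` (0..15) for context 0. */
static uint32_t sr_value_context0(unsigned seg)
{
    return SR_KU + seg * 0x111u;    /* addi r3,r3,0x111 each iteration */
}

/* Effective address that selects segment `seg` for mtsrin. */
static uint32_t segment_base(unsigned seg)
{
    return (uint32_t)seg << 28;     /* addis r4,r4,0x1000 each iteration */
}

/* General VSID for a context and virtual address, trimmed to 24 bits. */
static uint32_t vsid_for(uint32_t context, uint32_t va)
{
    uint32_t esid = va >> 28;       /* top 4 bits of the address */
    return (context * (897u * 16u) + esid * 0x111u) & 0xffffffu;
}
```

For context 0 the two agree: the kernel segment at 0xc0000000 is segment 12, so its VSID is 12 * 0x111 = 0xccc and its segment register value is 0x20000ccc.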

Once this step completes, the MMU is truly enabled, and the kernel can allocate memory dynamically from C code, much as malloc does in user space. The program then jumps to the start_kernel function in init/main.c to begin the kernel's C-level initialization.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
