Linux kernel source code scenario analysis: system initialization

Source: Internet
Author: User

We skip boot and setup and go directly to the head code. The starting point of the kernel image is stext (also named _stext). After booting and decompression, the whole image is placed in memory starting at physical address 0x100000, i.e., at 1MB. The startup_32 code that the CPU executes sits at the very beginning of the kernel image, so its physical address is also 0x100000.

During normal operation, however, the whole kernel image should live in system space, and system-space virtual addresses differ from physical addresses by a fixed displacement of 0xc0000000, i.e., 3GB. Therefore, when the kernel image is linked, an offset of 0xc0000000 is added to every symbol address, so the virtual address of startup_32 becomes 0xc0100000.

On entry to startup_32, the CPU is already running in protected mode with segmented addressing. The descriptors selected by __KERNEL_CS and __KERNEL_DS both have a base address of 0. The code segment register CS has been set to __KERNEL_CS before startup_32 is entered, but the data segment registers have not yet been set to __KERNEL_DS.

Although CS has already been set to __KERNEL_CS and startup_32 is linked at 0xc0100000, the jump into this entry point is made with "ljmp 0x100000" rather than "ljmp startup_32". The value loaded into the CPU's instruction pointer (EIP) is therefore the physical address 0x100000, not the virtual address 0xc0100000, and after entering startup_32 the CPU keeps fetching instructions at physical addresses. Because the CS base is 0 and paging is not yet enabled, the code can keep running this way as long as it does not use an absolute address taken from its own symbols, such as an absolute jump or an absolute call to a subroutine. In addition, CPU interrupts have already been disabled before startup_32 is entered.

The assembly code starting at startup_32 is in arch/i386/kernel/head.S:

/*
 *  linux/arch/i386/head.S -- the 32-bit startup code.
 *
 *  Copyright (C) 1991, 1992  Linus Torvalds
 *
 *  Enhanced CPU detection and feature setting code by Mike Jagdis
 *  and Martin Mares, November 1997.
 */

.text
#include <linux/config.h>
#include <linux/threads.h>
#include <linux/linkage.h>
#include <asm/segment.h>
#include <asm/page.h>
#include <asm/pgtable.h>
#include <asm/desc.h>

#define OLD_CL_MAGIC_ADDR   0x90020
#define OLD_CL_MAGIC        0xA33F
#define OLD_CL_BASE_ADDR    0x90000
#define OLD_CL_OFFSET       0x90022
#define NEW_CL_POINTER      0x228   /* Relative to real mode data */

/*
 * References to members of the boot_cpu_data structure.
 */
#define CPU_PARAMS      SYMBOL_NAME(boot_cpu_data)
#define X86             CPU_PARAMS+0
#define X86_VENDOR      CPU_PARAMS+1
#define X86_MODEL       CPU_PARAMS+2
#define X86_MASK        CPU_PARAMS+3
#define X86_HARD_MATH   CPU_PARAMS+6
#define X86_CPUID       CPU_PARAMS+8
#define X86_CAPABILITY  CPU_PARAMS+12
#define X86_VENDOR_ID   CPU_PARAMS+16

/*
 * swapper_pg_dir is the main page directory, address 0x00101000
 *
 * On entry, %esi points to the real-mode code as a 32-bit pointer.
 */
ENTRY(stext)
ENTRY(_stext)
startup_32:
/*
 * Set segments to known values
 */
    cld
    movl $(__KERNEL_DS),%eax
    movl %eax,%ds
    movl %eax,%es
    movl %eax,%fs
    movl %eax,%gs           /* DS, ES, FS and GS are all set to __KERNEL_DS */
...
/*
 * Initialize page tables
 */
    movl $pg0-__PAGE_OFFSET,%edi    /* pg0 is a linked (virtual) address; subtracting
                                       __PAGE_OFFSET (3GB) gives its physical address */
    movl $007,%eax                  /* "007" means PRESENT+RW+USER */
2:  stosl                           /* store %eax at (%edi), then %edi += 4 */
    add $0x1000,%eax                /* the next entry maps the next 4KB physical page */
    cmp $empty_zero_page-__PAGE_OFFSET,%edi   /* stop when %edi reaches empty_zero_page */
    jne 2b
/*
 * The 8KB from pg0 up to empty_zero_page now form temporary page tables
 * whose entries map physical pages 0, 1, 2, ... (0x0, 0x1000, 0x2000, ...).
 * The two pages hold 2K entries and therefore map 8MB, which is the
 * minimum amount of memory the Linux kernel requires.
 */

/*
 * Enable paging
 */
3:
    movl $swapper_pg_dir-__PAGE_OFFSET,%eax  /* physical address of the page directory */
    movl %eax,%cr3          /* load the page directory base into CR3 */
    movl %cr0,%eax
    orl $0x80000000,%eax
    movl %eax,%cr0          /* set the PG bit: paging is now enabled */
    jmp 1f                  /* EIP still holds a physical address here. This works
                               because the first two of the low 768 directory entries,
                               0x00102007 and 0x00103007, identity-map the low 8MB
                               as a transitional measure */
1:
    movl $1f,%eax
    jmp *%eax               /* this jump uses the linked virtual address of label 1,
                               i.e. its physical address + 3GB. Paging translates it
                               through the first two of the high 256 directory entries
                               (also 0x00102007 and 0x00103007) back to the physical
                               address of label 1, i.e. the virtual address minus 3GB */
1:
    /* Set up the stack pointer */
    lss stack_start,%esp    /* set the stack position */
...
/*
 * Clear BSS first so that there are no surprises...
 * No need to cld as DF is already clear from cld above...
 */
    xorl %eax,%eax          /* (not discussed here) */
    movl $ SYMBOL_NAME(__bss_start),%edi
    movl $ SYMBOL_NAME(_end),%ecx
    subl %edi,%ecx
    rep
    stosb

/*
 * Start system 32-bit setup. We need to re-do some of the things done
 * in 16-bit mode for the "real" operations.
 */
    call setup_idt          /* initialize the interrupt descriptor table */
/*
 * Initialize eflags.  Some BIOS's leave bits like NT set.  This would
 * confuse the debugger if this code is traced.
 * XXX - best to initialize before switching to protected mode.
 */
    pushl $0
    popfl
/*
 * Copy bootup parameters out of the way. First 2kB of
 * _empty_zero_page is for boot parameters, second 2kB
 * is for the command line.
 *
 * Note: %esi still has the pointer to the real-mode data.
 */
    movl $ SYMBOL_NAME(empty_zero_page),%edi  /* copy the boot parameters and command
                                                 line passed by setup into empty_zero_page */
    movl $512,%ecx
    cld
    rep
    movsl
    xorl %eax,%eax
    movl $512,%ecx
    rep
    stosl
    movl SYMBOL_NAME(empty_zero_page)+NEW_CL_POINTER,%esi
    andl %esi,%esi
    jnz 2f                  # New command line protocol
    cmpw $(OLD_CL_MAGIC),OLD_CL_MAGIC_ADDR
    jne 1f
    movzwl OLD_CL_OFFSET,%esi
    addl $(OLD_CL_BASE_ADDR),%esi
2:
    movl $ SYMBOL_NAME(empty_zero_page)+2048,%edi
    movl $512,%ecx
    rep
    movsl
...
    movl $-1,X86_CPUID      # -1 for no CPUID initially

/* check if it is 486 or 386. */
/*
 * XXX - this does a lot of unnecessary setup.  Alignment checks don't
 * apply at our cpl of 0 and the stack ought to be aligned already, and
 * we don't need to preserve eflags.
 */
    /* CPU type detection: not discussed further here */
    movl $3,X86             # at least 386
    pushfl                  # push EFLAGS
    popl %eax               # get EFLAGS
    movl %eax,%ecx          # save original EFLAGS
    xorl $0x40000,%eax      # flip AC bit in EFLAGS
    pushl %eax              # copy to EFLAGS
    popfl                   # set EFLAGS
    pushfl                  # get new EFLAGS
    popl %eax               # put it in eax
    xorl %ecx,%eax          # change in flags
    andl $0x40000,%eax      # check if AC bit changed
    je is386

    movl $4,X86             # at least 486
    movl %ecx,%eax
    xorl $0x200000,%eax     # check ID flag
    pushl %eax
    popfl                   # if we are on a straight 486DX, SX, or
    pushfl                  # 487SX we can't change it
    popl %eax
    xorl %ecx,%eax
    pushl %ecx              # restore original EFLAGS
    popfl
    andl $0x200000,%eax
    je is486

    /* get vendor info */
    xorl %eax,%eax          # call CPUID with 0 -> return vendor ID
    cpuid
    movl %eax,X86_CPUID     # save CPUID level
    movl %ebx,X86_VENDOR_ID     # lo 4 chars
    movl %edx,X86_VENDOR_ID+4   # next 4 chars
    movl %ecx,X86_VENDOR_ID+8   # last 4 chars

    orl %eax,%eax           # do we have processor info as well?
    je is486

    movl $1,%eax            # Use the CPUID instruction to get CPU type
    cpuid
    movb %al,%cl            # save reg for future use
    andb $0x0f,%ah          # mask processor family
    movb %ah,X86
    andb $0xf0,%al          # mask model
    shrb $4,%al
    movb %al,X86_MODEL
    andb $0x0f,%cl          # mask mask revision
    movb %cl,X86_MASK
    movl %edx,X86_CAPABILITY

is486:
    movl %cr0,%eax          # 486 or better
    andl $0x80000011,%eax   # Save PG,PE,ET
    orl $0x50022,%eax       # set AM, WP, NE and MP
    jmp 2f

is386:
    pushl %ecx              # restore original EFLAGS
    popfl
    movl %cr0,%eax          # 386
    andl $0x80000011,%eax   # Save PG,PE,ET
    orl $2,%eax             # set MP
2:  movl %eax,%cr0
    call check_x87
...
    lgdt gdt_descr          /* load the CPU's Global Descriptor Table Register (GDTR) */
    lidt idt_descr          /* load the CPU's Interrupt Descriptor Table Register (IDTR) */
    ljmp $(__KERNEL_CS),$1f /* reload CS; DS, ES, FS and GS are reloaded below */
1:  movl $(__KERNEL_DS),%eax    # reload all the segment registers
    movl %eax,%ds           # after changing gdt.
    movl %eax,%es
    movl %eax,%fs
    movl %eax,%gs
    lss stack_start,%esp    # Load processor stack
...
    xorl %eax,%eax
    lldt %ax                /* load a null selector into LDTR */
    cld                     # gcc2 wants the direction flag cleared at all times
...
    call SYMBOL_NAME(start_kernel)  /* start executing start_kernel() */
L6:
    jmp L6                  # main should never return here, but
                            # just in case, we know what happens.

#ifdef CONFIG_SMP
ready:  .byte 0
#endif

/*
 * We depend on ET to be correct. This checks for 287/387.
 */
check_x87:
    movb $0,X86_HARD_MATH
    clts
    fninit
    fstsw %ax
    cmpb $0,%al
    je 1f
    movl %cr0,%eax          /* no coprocessor: have to set bits */
    xorl $4,%eax            /* set EM */
    movl %eax,%cr0
    ret
    ALIGN
1:  movb $1,X86_HARD_MATH
    .byte 0xDB,0xE4         /* fsetpm for 287, ignored by 387 */
    ret

/*
 *  setup_idt
 *
 *  sets up an idt with 256 entries pointing to
 *  ignore_int, interrupt gates. It doesn't actually load
 *  idt - that can be done only after paging has been enabled
 *  and the kernel moved to PAGE_OFFSET. Interrupts
 *  are enabled elsewhere, when we can be relatively
 *  sure everything is ok.
 */
setup_idt:                  /* each entry is 8 bytes; all 256 entries point to the
                               same handler, ignore_int */
    lea ignore_int,%edx
    movl $(__KERNEL_CS << 16),%eax
    movw %dx,%ax            /* selector = 0x0010 = cs */
    movw $0x8E00,%dx        /* interrupt gate - dpl=0, present */

    lea SYMBOL_NAME(idt_table),%edi
    mov $256,%ecx
rp_sidt:
    movl %eax,(%edi)
    movl %edx,4(%edi)
    addl $8,%edi
    dec %ecx
    jne rp_sidt
    ret

ENTRY(stack_start)          /* the task_struct and the kernel stack share two pages
                               (8KB); the stack starts at the high-address end */
    .long SYMBOL_NAME(init_task_union)+8192
    .long __KERNEL_DS

/* This is the default interrupt "handler" :-) */
int_msg:
    .asciz "Unknown interrupt\n"
    ALIGN
ignore_int:                 /* the interrupt handler */
    cld
    pushl %eax
    pushl %ecx
    pushl %edx
    pushl %es
    pushl %ds
    movl $(__KERNEL_DS),%eax
    movl %eax,%ds
    movl %eax,%es
    pushl $int_msg
    call SYMBOL_NAME(printk)
    popl %eax
    popl %ds
    popl %es
    popl %edx
    popl %ecx
    popl %eax
    iret

/*
 * The interrupt descriptor table has room for 256 idt's,
 * the global descriptor table is dependent on the number
 * of tasks we can have..
 */
#define IDT_ENTRIES     256
#define GDT_ENTRIES     (__TSS(NR_CPUS))

.globl SYMBOL_NAME(idt)
.globl SYMBOL_NAME(gdt)

    ALIGN
    .word 0
idt_descr:
    .word IDT_ENTRIES*8-1           /* IDT limit: table size in bytes minus 1 */
SYMBOL_NAME(idt):
    .long SYMBOL_NAME(idt_table)    /* IDT base address; idt_table is a global variable */

    .word 0
gdt_descr:
    .word GDT_ENTRIES*8-1           /* GDT limit: table size in bytes minus 1 */
SYMBOL_NAME(gdt):
    .long SYMBOL_NAME(gdt_table)    /* GDT base address; gdt_table is defined below */

/*
 * This is initialized to create an identity-mapping at 0-8M (for bootup
 * purposes) and another mapping of the 0-8M area at virtual address
 * PAGE_OFFSET.
 */
.org 0x1000
ENTRY(swapper_pg_dir)               /* explained below */
    .long 0x00102007                /* points to pg0 */
    .long 0x00103007                /* points to pg1 */
    .fill BOOT_USER_PGD_PTRS-2,4,0      /* BOOT_USER_PGD_PTRS = 768 */
    /* default: 766 entries */
    .long 0x00102007                /* points to pg0 */
    .long 0x00103007                /* points to pg1 */
    /* default: 254 entries */
    .fill BOOT_KERNEL_PGD_PTRS-2,4,0    /* BOOT_KERNEL_PGD_PTRS = 256 */

/*
 * The page tables are initialized to only 8MB here - the final page
 * tables are set up later depending on memory size.
 */
.org 0x2000                         /* physical address 0x00102000 (entry 0x00102007) */
ENTRY(pg0)

.org 0x3000                         /* physical address 0x00103000 (entry 0x00103007) */
ENTRY(pg1)

/*
 * empty_zero_page must immediately follow the page tables ! (The
 * initialization loop counts until empty_zero_page)
 */
.org 0x4000
ENTRY(empty_zero_page)

.org 0x5000
ENTRY(empty_bad_page)

.org 0x6000
ENTRY(empty_bad_pte_table)

#if CONFIG_X86_PAE
.org 0x7000
ENTRY(empty_bad_pmd_table)
.org 0x8000
#else
.org 0x7000
#endif

/*
 * This starts the data section. Note that the above is all
 * in the text section because it has alignment requirements
 * that we cannot fulfill any other way.
 */
.data

ALIGN
/*
 * The number of quadwords in this table depends on NR_CPUS.
 *
 * NOTE! Make sure the gdt descriptor in head.S matches this if you
 * change anything.
 */
ENTRY(gdt_table)
    .quad 0x0000000000000000    /* NULL descriptor */
    .quad 0x0000000000000000    /* not used */
    .quad 0x00cf9a000000ffff    /* 0x10 kernel 4GB code at 0x00000000 */
    .quad 0x00cf92000000ffff    /* 0x18 kernel 4GB data at 0x00000000 */
    .quad 0x00cffa000000ffff    /* 0x23 user   4GB code at 0x00000000 */
    .quad 0x00cff2000000ffff    /* 0x2b user   4GB data at 0x00000000 */
    .quad 0x0000000000000000    /* not used */
    .quad 0x0000000000000000    /* not used */
    /*
     * The APM segments have byte granularity and their bases
     * and limits are set at run time.
     */
    .quad 0x0040920000000000    /* 0x40 APM set up for bad BIOS's */
    .quad 0x00409a0000000000    /* 0x48 APM CS    code */
    .quad 0x00009a0000000000    /* 0x50 APM CS 16 code (16 bit) */
    .quad 0x0040920000000000    /* 0x58 APM DS    data */
    .fill NR_CPUS*4,8,0         /* space for TSS's and LDT's */

/*
 * This is to aid debugging, the various locking macros will be putting
 * code fragments here.  When an oops occurs we'd rather know that it's
 * inside the .text.lock section rather than as some offset from whatever
 * function happens to be last in the .text segment.
 */
.section .text.lock
ENTRY(stext_lock)

.org 0x1000
ENTRY(swapper_pg_dir)
    .long 0x00102007
    .long 0x00103007
    .fill BOOT_USER_PGD_PTRS-2,4,0      /* 768 */
    /* default: 766 entries */
    .long 0x00102007
    .long 0x00103007
    /* default: 254 entries */
    .fill BOOT_KERNEL_PGD_PTRS-2,4,0    /* 256 */

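This code deserves a separate explanation. A page directory has 1024 entries, each covering 4MB, so together they span the 4GB virtual address space. The Linux kernel divides this space at 3GB into user space and system space; therefore the low 768 entries of the page directory map user space and the high 256 entries map system space. In swapper_pg_dir, entries 0 and 1 and entries 768 and 769 all point to the same two page tables, pg0 (0x00102007) and pg1 (0x00103007), so the first 8MB of physical memory is visible both at virtual address 0 and at virtual address 3GB (0xc0000000).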


In Linux 0.11, kernel space and user space are handled as follows.

The page directory is organized as follows.

The first 4 entries of the page directory belong to kernel space. They point to page tables 0, 1, 2 and 3, which together map 16MB. Kernel mode uses the GDT, whose segments have a base address of 0, so the kernel can access every memory address in that range.

When process 2 runs in user mode, it uses page directory entries 32 to 47, and the 16 page tables behind those entries belong to that process. User mode uses the LDT, whose segment base address for this process is 128MB. For example, for CS:EIP with EIP = 0, segmentation produces the virtual (linear) address 128MB. Paging then uses the top 10 bits of this address to select entry 32 of the page directory, uses the middle 10 bits (0) to select the first entry of the page table that entry points to, and finally combines the page address in that page table entry with the low 12 bits (also 0) to obtain the physical address to access.


In Linux 2.4, kernel space and user space are handled as follows.

Each process has its own page directory. As described above, its 1024 entries cover 4GB of virtual space, which the Linux kernel divides at 3GB: the low 768 entries map user space and the high 256 entries map system space.

User-space virtual addresses range from 0 to 3GB, which corresponds to the low 768 entries of the page directory. Recall that user-space virtual addresses are allocated within the range 0 to 3GB, as discussed in "Linux kernel source code scenario analysis: execve()".

Kernel-space virtual addresses range from 3GB to 4GB, which corresponds to the high 256 page directory entries. Because kernel symbols are linked at their physical address plus 3GB, a kernel-space access uses a virtual address between 3GB and 4GB, and the paging mechanism (as described above) translates it into the actual physical address, which is effectively the virtual address minus 3GB.

Linux 2.4 does not normally use the LDT; it relies on the GDT alone. In both kernel space and user space the segment bases are 0, so the virtual (linear) address produced by segmentation is identical to the logical address.

The GDT is as follows:

ENTRY(gdt_table)
    .quad 0x0000000000000000    /* NULL descriptor */
    .quad 0x0000000000000000    /* not used */
    .quad 0x00cf9a000000ffff    /* 0x10 kernel 4GB code at 0x00000000 */
    .quad 0x00cf92000000ffff    /* 0x18 kernel 4GB data at 0x00000000 */
    .quad 0x00cffa000000ffff    /* 0x23 user   4GB code at 0x00000000 */
    .quad 0x00cff2000000ffff    /* 0x2b user   4GB data at 0x00000000 */
    .quad 0x0000000000000000    /* not used */
    .quad 0x0000000000000000    /* not used */
