Author: wzt
EMail: wzt@xsec.org
Site: http://www.xsec.org
Date: 2008-6-13
1. Introduction
2. x86 hardware addressing method
3. Kernel page table settings
4. Instance analysis of the mapping mechanism
1. Introduction
In a program's disassembly we often see addresses such as 0x32118965. The operating system calls such an address a linear address or virtual address. What are virtual addresses for, and how is a virtual address converted into a physical memory address? This chapter gives a brief description.
1.1 Linux memory addressing overview
On 32-bit x86, modern operating systems run in protected mode, and each process can generally address a 4 GB address space. Physical memory, however, is usually only a few hundred MB. How can a process obtain a 4 GB space?
This is the benefit of virtual addresses. A technique called virtual memory is normally used, in which part of the hard disk can serve as memory.
In addition, the operating system divides the address space into system space and user space, and virtual addresses effectively protect kernel space from being damaged by user space.
The conversion from a virtual address to a physical address is carried out jointly by the operating system and the CPU: the operating system sets up page tables for the CPU, and the CPU performs the address translation in its MMU.
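As a minimal sketch of what the MMU sees, assuming ordinary two-level i386 paging without PAE (a 10-bit directory index, a 10-bit table index and a 12-bit page offset), the following C fragment splits a linear address into its three fields. The struct linear_parts type and the helpers split_linear and join_linear are hypothetical names used only for this illustration; they are not kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: two-level i386 paging without PAE (10 + 10 + 12 bits). */
#define PGDIR_SHIFT   22
#define PAGE_SHIFT    12
#define PTRS_PER_PGD  1024u
#define PTRS_PER_PTE  1024u
#define PAGE_SIZE     (1u << PAGE_SHIFT)

/* Hypothetical helper type for this sketch, not a kernel structure. */
struct linear_parts {
    uint32_t dir;     /* index into the page directory */
    uint32_t table;   /* index into the selected page table */
    uint32_t offset;  /* byte offset inside the 4 KB page */
};

/* Split a 32-bit linear address into the fields used for translation. */
static struct linear_parts split_linear(uint32_t addr)
{
    struct linear_parts p;
    p.dir    = (addr >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1);
    p.table  = (addr >> PAGE_SHIFT) & (PTRS_PER_PTE - 1);
    p.offset = addr & (PAGE_SIZE - 1);
    return p;
}

/* Inverse operation: rebuild the linear address from its fields. */
static uint32_t join_linear(struct linear_parts p)
{
    assert(p.dir < PTRS_PER_PGD && p.table < PTRS_PER_PTE && p.offset < PAGE_SIZE);
    return (p.dir << PGDIR_SHIFT) | (p.table << PAGE_SHIFT) | p.offset;
}
```

For the address 0x32118965 mentioned in the introduction, split_linear yields directory index 200, table index 280 and page offset 0x965.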
1.2 Kernel code browsing tools
The current kernel is very large, so we need tools to read such a huge source tree. Kernel developers generally browse the kernel code with vim + ctags + cscope.
A ready-made Makefile is used to generate the ctags, cscope, and etags index files.
1. Usage:
Find an empty directory and copy the attached Makefile into it. Then run make in that directory:
$ make
The source files under /usr/src/linux are processed, and the ctags and cscope files are generated in the current directory.
Note: SRCDIR specifies the kernel source directory. If it is not specified, the default is /usr/src/linux/
1) Create ctags only
$ make SRCDIR=/usr/src/linux-2.6.12/ tags
2) Create cscope only
$ make SRCDIR=/usr/src/linux-2.6.12/ cscope
3) Create both ctags and cscope
$ make SRCDIR=/usr/src/linux-2.6.12/
4) Create etags only
$ make SRCDIR=/usr/src/linux-2.6.12/ TAGS
2. Kernel source files included during processing:
1) The drivers and sound directories are not included.
2) Unrelated architecture directories are excluded.
3) Of the fs directory, only the top-level directory and the ext2 and proc directories are included.
3. The simplest ctags commands
1) Jump to a tag
After starting vim, use
:tag func_name
to jump to the function func_name.
2) Jump to the identifier under the cursor
To jump to the definition of the function under the cursor, use
CTRL+]
3) Go back
Use CTRL+T to go back.
1.3 Kernel version selection
For this article I chose the linux-2.6.10 kernel. The latest kernel is 2.6.25, but mainstream servers currently run RedHat AS4, which uses the 2.6.9 kernel. I chose 2.6.10 because it is very similar to 2.6.9. Red Hat Enterprise Linux 4 is based on the Linux 2.6.9 kernel and is described as the most stable and powerful commercial product.
During 2004, open-source projects such as Fedora provided an environment in which Linux 2.6 kernel technology matured, which allowed the Red Hat Enterprise Linux v.4 kernel to offer more and better features and algorithms, including:
• Generic logical CPU scheduler: handles multi-core and hyperthreaded CPUs.
• Object-based reverse mapping virtual memory: improves the performance of memory-constrained systems.
• Read-copy-update (RCU): an SMP algorithm optimization for operating system data structures.
• Multiple I/O schedulers: selectable according to the application environment.
• Enhanced SMP and NUMA support: improves the performance and scalability of large servers.
• Network interrupt mitigation (NAPI): improves performance on high-traffic networks.
The Linux 2.6 kernel uses many techniques to improve its handling of large amounts of memory, making Linux more suitable for enterprises than ever before. These include reverse mapping, larger memory pages, storing page table entries in high memory, and a more stable memory manager. So I chose the linux-2.6.10 kernel as the object of analysis.
2. x86 hardware addressing method
Please refer to Intel x86 manual ^_^
3. Kernel page table settings
Before the CPU can perform address mapping, the operating system must prepare the kernel page table for it. The kernel sets up page tables at two points: at the very beginning of system startup, and after system initialization is complete.
3.1 Several macros related to memory mapping
These macros convert an unsigned integer into the corresponding type.
#define __pte(x) ((pte_t) { (x) })
#define __pmd(x) ((pmd_t) { (x) })
#define __pgd(x) ((pgd_t) { (x) })
#define __pgprot(x) ((pgprot_t) { (x) })
These macros convert x back into an unsigned integer according to its type.
#define pte_val(x) ((x).pte_low)
#define pmd_val(x) ((x).pmd)
#define pgd_val(x) ((x).pgd)
#define pgprot_val(x) ((x).pgprot)
Convert a linear address in kernel space to a physical address.
#define __pa(x) ((unsigned long)(x) - PAGE_OFFSET)
Convert a physical address to a linear address.
#define __va(x) ((void *)((unsigned long)(x) + PAGE_OFFSET))
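The two address conversion macros are plain offset arithmetic. The following is a user-space model only, assuming PAGE_OFFSET is 0xC0000000 (the 3 GB / 1 GB split used throughout this article); model_pa and model_va are hypothetical names standing in for __pa and __va and work on 32-bit integers rather than pointers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 3 GB user space / 1 GB kernel space split. */
#define PAGE_OFFSET 0xC0000000u

/* Model of __pa(): kernel linear address -> physical address. */
static uint32_t model_pa(uint32_t vaddr)
{
    /* Only meaningful for addresses in the kernel's linear mapping. */
    assert(vaddr >= PAGE_OFFSET);
    return vaddr - PAGE_OFFSET;
}

/* Model of __va(): physical address -> kernel linear address. */
static uint32_t model_va(uint32_t paddr)
{
    /* The sum must still fit in the 32-bit linear address space. */
    assert(paddr < 0x40000000u);
    return paddr + PAGE_OFFSET;
}
```

With these definitions, the kernel linear address 0xC0100000 corresponds to the physical address 0x00100000, and model_va maps it back again.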
x is a page table entry value. pte_pfn obtains the corresponding physical page frame number, and pfn_to_page then returns the descriptor of that physical page.
#define pte_page(x) pfn_to_page(pte_pfn(x))
Returns 1 if the page table entry value is 0.
#define pte_none(x) (!(x).pte_low)
x is a page table entry value. Shifting it right by 12 bits gives the corresponding physical page frame number.
#define pte_pfn(x) ((unsigned long)(((x).pte_low >> PAGE_SHIFT)))
Combine a page frame number and the attribute bits of a page table entry into a page table entry value.
#define pfn_pte(pfn, prot) __pte(((pfn) << PAGE_SHIFT) | pgprot_val(prot))
Combine a page frame number and attribute bits into a page middle directory entry value.
#define pfn_pmd(pfn, prot) __pmd(((pfn) << PAGE_SHIFT) | pgprot_val(prot))
Write the specified value into a table entry.
#define set_pte(pteptr, pteval) (*(pteptr) = pteval)
#define set_pte_atomic(pteptr, pteval) set_pte(pteptr, pteval)
#define set_pmd(pmdptr, pmdval) (*(pmdptr) = pmdval)
#define set_pgd(pgdptr, pgdval) (*(pgdptr) = pgdval)
Extract the high 10 bits of a linear address, that is, its index in the page directory.
#define pgd_index(address) (((address) >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1))
Build a page table entry value from a page descriptor and attributes.
#define mk_pte(page, pgprot) pfn_pte(page_to_pfn(page), (pgprot))
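To see how these macros fit together, here is a small user-space model. It assumes the two-level (non-PAE) layout, replaces unsigned long with uint32_t so the arithmetic stays 32-bit on any host, and copies only the macros it needs. The helper map_frame is a hypothetical name introduced for this sketch and does not exist in the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed constants for two-level i386 paging without PAE. */
#define PAGE_SHIFT    12
#define PGDIR_SHIFT   22
#define PTRS_PER_PGD  1024

/* User-space stand-ins for the kernel types (32-bit entries). */
typedef struct { uint32_t pte_low; } pte_t;
typedef struct { uint32_t pgprot; } pgprot_t;

#define __pte(x)      ((pte_t) { (x) })
#define __pgprot(x)   ((pgprot_t) { (x) })
#define pte_val(x)    ((x).pte_low)
#define pgprot_val(x) ((x).pgprot)
#define pte_none(x)   (!(x).pte_low)
#define pte_pfn(x)    ((uint32_t)((x).pte_low >> PAGE_SHIFT))
#define pfn_pte(pfn, prot) __pte(((pfn) << PAGE_SHIFT) | pgprot_val(prot))
#define set_pte(pteptr, pteval) (*(pteptr) = pteval)
#define pgd_index(address) (((address) >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1))

/* Hypothetical helper: build an entry for page frame `pfn` with the
 * attribute bits `attrs` and store it in the slot `slot`. */
static void map_frame(pte_t *slot, uint32_t pfn, uint32_t attrs)
{
    set_pte(slot, pfn_pte(pfn, __pgprot(attrs)));
}
```

Mapping frame 0x1234 with attributes 0x007 produces the entry value 0x01234007, from which pte_pfn recovers 0x1234. pgd_index(0xC0000000) evaluates to 768, the first directory entry of kernel space.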
3.2 Kernel page table initialization
Paging is not yet enabled before the kernel enters protected mode. The kernel must first create a temporary kernel page table, because after entering protected mode it continues initializing, and until the complete memory mapping mechanism is established it still needs a page table to map the memory addresses it uses. The temporary page table is initialized in arch/i386/kernel/head.S:
swapper_pg_dir is the page global directory of the temporary page table. It is statically initialized when the kernel is compiled.
pg0 is where the first page table starts. It is also statically initialized when the kernel is compiled.
The kernel uses the following code to create the temporary page table:
ENTRY(startup_32)
............
/* Byte offset of the first kernel directory entry: __PAGE_OFFSET >> 20 = 0xc00 = 768 * 4. The kernel mapping therefore starts at entry 768 of swapper_pg_dir, which covers linear addresses from 0xc0000000 upward; in other words, the kernel is initializing its own page table. */
page_pde_offset = (__PAGE_OFFSET >> 20);
/* The address of pg0 had 0xc0000000 added when the kernel was compiled; subtracting 0xc0000000 gives the corresponding physical address. */
movl $(pg0 - __PAGE_OFFSET), %edi
/* Load the physical address of the page directory into edx. The kernel also creates page table mappings starting from 0x00000000, which ensures a smooth transition from fetching instructions by physical address to fetching them by linear address in system space. The smooth transition is explained in detail below. */
movl $(swapper_pg_dir - __PAGE_OFFSET), %edx
movl $0x007, %eax /* 0x007 = PRESENT+RW+USER */
10:
leal 0x007(%edi), %ecx /* build the directory entry for the current page table */
movl %ecx, (%edx) /* store the identity (low) directory entry */
movl %ecx, page_pde_offset(%edx) /* store the kernel directory entry */
addl $4, %edx
movl $1024, %ecx
11:
stosl
addl $0x1000, %eax
loop 11b
/* How many page tables the kernel creates, that is, how much memory is mapped, depends on this condition. During initialization the kernel only needs to map the memory that holds its code segment, data segment, initial page tables, and some space for dynamic data structures. */
leal (INIT_MAP_BEYOND_END+0x007)(%edi), %ebp
cmpl %ebp, %eax
jb 10b
movl %edi, (init_pg_tables_end - __PAGE_OFFSET)
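The effect of the startup_32 loop can be modelled in user-space C. This is only a sketch under stated assumptions: the number of page tables is fixed by the hypothetical constant BOOT_TABLES instead of the INIT_MAP_BEYOND_END test, and a directory entry stores the table's index in its address bits rather than a real physical address. The functions build_boot_tables and translate, and the constant NOT_MAPPED, are likewise names invented for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_OFFSET   0xC0000000u
#define PAGE_SHIFT    12
#define PGDIR_SHIFT   22
#define PTRS_PER_PGD  1024
#define PTRS_PER_PTE  1024
#define BOOT_ATTRS    0x007u       /* PRESENT + RW + USER, as in head.S */
#define BOOT_TABLES   2            /* assumption: two tables, i.e. 8 MB mapped */
#define NOT_MAPPED    0xFFFFFFFFu  /* sentinel used only by this model */

static uint32_t swapper_pg_dir[PTRS_PER_PGD];
static uint32_t pg0[BOOT_TABLES][PTRS_PER_PTE];

/* Mirror of the boot loop: every page table is installed twice, once at
 * the low (identity) directory slot and once at the kernel slot. */
static void build_boot_tables(void)
{
    uint32_t frame = 0;
    for (uint32_t t = 0; t < BOOT_TABLES; t++) {
        /* Model only: the table index stands in for the table's address. */
        uint32_t pde = (t << PAGE_SHIFT) | BOOT_ATTRS;
        swapper_pg_dir[t] = pde;
        swapper_pg_dir[(PAGE_OFFSET >> PGDIR_SHIFT) + t] = pde;
        /* Fill the table with consecutive physical frames starting at 0. */
        for (uint32_t i = 0; i < PTRS_PER_PTE; i++, frame++)
            pg0[t][i] = (frame << PAGE_SHIFT) | BOOT_ATTRS;
    }
}

/* Two-level lookup of a linear address in the model tables. */
static uint32_t translate(uint32_t linear)
{
    uint32_t pde = swapper_pg_dir[linear >> PGDIR_SHIFT];
    if (!(pde & 1u))               /* PRESENT bit clear in the directory */
        return NOT_MAPPED;
    uint32_t pte = pg0[pde >> PAGE_SHIFT][(linear >> PAGE_SHIFT) & (PTRS_PER_PTE - 1)];
    if (!(pte & 1u))               /* PRESENT bit clear in the table */
        return NOT_MAPPED;
    return (pte & ~0xFFFu) | (linear & 0xFFFu);
}
```

After build_boot_tables(), the linear addresses 0x00100000 and 0xC0100000 both translate to physical address 0x00100000, which is exactly the double mapping that the boot code sets up. An address beyond the mapped range, such as 0x00800000 in this model, is not mapped.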
In the startup_32 code above, why does the kernel map the first few directory entries of both user space and kernel space to the same page tables? In head.S the kernel has already entered protected mode, but it is a protected mode that uses segment addressing only. Because paging has not been enabled, the kernel is fetching instructions by physical address, and whenever the code refers to a symbolic address it has to subtract 0xc0000000. After paging is enabled, the instruction pointer eip in the CPU still points to the low region, and it stays there until a symbolic address is used as the target of an absolute jump or a subroutine call. If only the kernel-space mapping had been established, the low addresses could no longer be translated once paging was enabled, because no page table would cover them. This is why the low region is mapped as well.
Next, the CPU's paging mechanism is enabled:
movl $swapper_pg_dir-__PAGE_OFFSET, %eax
movl %eax, %cr3 /* save the page directory address in control register cr3 */
movl %cr0, %eax /* enable paging by setting the top bit (bit 31, PG) of cr0 to 1 */
orl $0x80000000, %eax
movl %eax, %cr0
ljmp $__BOOT_CS, $1f /* Clear prefetch and normalize %eip */
1:
lss stack_start, %esp
The ljmp $__BOOT_CS,$1f instruction makes the CPU continue executing in system space: __BOOT_CS is the kernel code segment selector, and the jump target, label 1, is a symbolic address above 0xc0000000.
After head.S has created the temporary kernel page table, it continues the initialization. This includes initializing INIT_TASK, the first process after system startup, setting up the complete interrupt handling mechanism, and reloading the GDT. Finally it jumps to the start_kernel function in init/main.c to continue initialization.
3.3 Creating the complete kernel page table
The kernel continues with the second phase of initialization in start_kernel(). At this stage the kernel is already in protected mode, and the kernel page table has so far only been set up in a simple form.