To understand how the Linux kernel works, you need to know a little about the underlying hardware. Here we introduce several registers in the core components of the Intel 80x86 series CPU in protected mode; they play a vital role while the Linux kernel is running. Other kinds of hardware are covered together with their specific device drivers. First, let's look at the main parts of the CPU architecture:
EU (general-purpose registers, ALU, and control logic), the execution unit: performs the operations required to complete an instruction.
SU (segment registers, segment translator), the segmentation unit: handles address requests from the execution unit and converts virtual (logical) addresses to linear addresses.
PU (TLB, page translator), the paging unit: converts linear addresses to physical addresses.
BIU (bus interface), the bus interface unit: services instruction prefetch requests and the execution unit's data access requests; data access requests take precedence over instruction prefetch requests.
IPU (control logic and prefetch queue), the instruction prefetch unit: holds a 16-byte instruction prefetch queue and issues prefetch requests.
IDU (instruction decoder, 6-byte instruction queue), the instruction decode unit: decodes instructions.
FPU (on-chip floating-point coprocessor): a unit dedicated to floating-point operations.
Below we describe the EU, SU, and PU modules in detail. The other modules are not covered here; they come up in the corresponding Linux topics.
1 EU Module
The EU module is the core and most important component of the CPU. The Pentium line has been evolving for years, but the parts that matter most are still the arithmetic logic unit (ALU), a set of general-purpose registers, the flags register, and the control logic. As shown in the figure:
First, the 8 32-bit general-purpose registers fall into three groups by use: pointer registers, index registers, and data registers.
[1] Pointer registers: mainly supply all or part of an offset
ESP: dedicated to holding the offset of the top-of-stack cell in the stack segment.
EBP: holds all or part of the offset of a cell in the stack segment, or holds a 32-bit or 16-bit operand or result.
[2] Index registers
ESI/EDI: hold all or part of the offset of a memory operand, and can also hold operands and results. In most cases the two are interchangeable, but not in string instructions: there the source operand's offset must be in ESI and the destination operand's offset must be in EDI.
[3] Data registers
The data registers (EAX, EBX, ECX, EDX) can be used as 4 32-bit registers, as 4 16-bit registers (AX, BX, CX, DX), or as 8 8-bit registers (AH/AL, BH/BL, CH/CL, DH/DL).
In a program, a data register holds an operand, the result of an operation, or other information.
Many instructions require a particular data register, either implicitly or explicitly; consult the reference material for details.
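To make the register aliasing concrete, here is a minimal sketch of how one 32-bit value in EAX is seen through its narrower names AX, AH, and AL. The function name `sub_registers` and the sample value are made up for illustration.

```python
def sub_registers(eax: int) -> dict:
    """Split a 32-bit EAX value into its AX, AH and AL views."""
    eax &= 0xFFFFFFFF              # keep only the low 32 bits
    ax = eax & 0xFFFF              # AX is the low 16 bits of EAX
    return {
        "EAX": eax,
        "AX": ax,
        "AH": (ax >> 8) & 0xFF,    # AH is bits 8..15
        "AL": ax & 0xFF,           # AL is bits 0..7
    }


regs = sub_registers(0x12345678)
print({name: hex(value) for name, value in regs.items()})
```

The same slicing applies to EBX, ECX, and EDX.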
Second, the 4 control registers CR0~CR3
[1] CR0: evolved from the 80286 MSW register, with 2 bits added. Linux cares most about its PG bit: PG=1 enables paging; PG=0 disables it.
[2] CR1: reserved, not used.
[3] CR2: page fault address register; holds the 32-bit linear address that caused the page fault.
[4] CR3: page directory base register; holds the base address of the page directory.
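As a small illustration of the PG bit, the sketch below tests a CR0 value held in an ordinary integer. It assumes the documented bit positions (PE is bit 0, PG is bit 31); the helper name `paging_enabled` is hypothetical, and real code would read CR0 with a privileged instruction rather than take it as an argument.

```python
CR0_PE = 1 << 0     # protected mode enable
CR0_PG = 1 << 31    # paging enable


def paging_enabled(cr0: int) -> bool:
    """Return True when the PG bit of the given CR0 value is set."""
    return bool(cr0 & CR0_PG)


print(paging_enabled(CR0_PE | CR0_PG))  # protected mode with paging
print(paging_enabled(CR0_PE))           # protected mode without paging
```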
Finally, take a look at the flags register FR
FR records the state of program execution, that is, the status of the result after the ALU has operated on its operands:
[1] CF (Carry Flag)
[2] PF (Parity Flag)
[3] AF (Auxiliary Carry Flag)
[4] ZF (Zero Flag)
[5] SF (Sign Flag)
[6] OF (Overflow Flag)
[7] TF (Trap Flag), used for single-stepping
[8] IF (Interrupt-enable Flag)
[9] DF (Direction Flag)
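The six status flags in this list (CF, PF, AF, ZF, SF, OF) can be illustrated with a sketch that computes them for a 32-bit addition the way the ALU is documented to. The function `add32_flags` is a made-up name and this is a model of the rules, not of any hardware implementation; TF, IF, and DF are control flags and are not affected by arithmetic.

```python
MASK32 = 0xFFFFFFFF
SIGN32 = 1 << 31


def add32_flags(a: int, b: int) -> dict:
    """Add two 32-bit values and derive the status flags of the result."""
    a &= MASK32
    b &= MASK32
    full = a + b
    result = full & MASK32
    return {
        "result": result,
        "CF": int(full > MASK32),                              # carry out of bit 31
        "PF": int(bin(result & 0xFF).count("1") % 2 == 0),     # even parity of low byte
        "AF": int(((a ^ b ^ result) & 0x10) != 0),             # carry out of bit 3
        "ZF": int(result == 0),
        "SF": int((result & SIGN32) != 0),
        "OF": int(((a ^ result) & (b ^ result) & SIGN32) != 0),  # signed overflow
    }


print(add32_flags(0x7FFFFFFF, 1))
```

Adding 1 to 0x7FFFFFFF is a handy test case because it overflows the signed range without producing a carry.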
2 SU Module
Next, the SU. Linux uses this unit too, but not to virtualize addresses the way the Intel manuals describe; it uses segmentation mainly to switch between user mode and kernel mode. Address virtualization is done by the PU, that is, by the paging mechanism, which is covered in the memory management topics.
First, look at the architecture diagram of the SU module:
The processor provides 6 segment registers whose only purpose is to hold 16-bit segment selectors. They are called CS, SS, DS, ES, FS, and GS. Although there are only 6 of them, a program can reuse the same segment register for different purposes by saving its value in memory and restoring it later.
Of the 6 registers, 3 have special uses:
CS: the code segment register, pointing to the segment that contains the program's instructions.
SS: the stack segment register, pointing to the segment that contains the current program stack.
DS: the data segment register, pointing to the segment that contains static or global data.
The other three segment registers are general purpose and can point to arbitrary data segments.
Each segment is described by an 8-byte segment descriptor, which records the segment's characteristics. Descriptors are stored either in the Global Descriptor Table (GDT) or in a Local Descriptor Table (LDT), both of which reside in memory, as shown in the figure. On a multiprocessor system each CPU has its own GDT, and a process that needs segments beyond those in the GDT can have its own LDT. The base address and size of the GDT in main memory are held in the GDTR processor register; the address and size of the LDT currently in use are held in the LDTR processor register.
A logical address consists of a 16-bit selector and a 32-bit offset, and the segment registers hold only the selector. The CPU's segmentation unit (SU) performs the following steps:
[1] Examine the TI field of the selector to determine which descriptor table holds the descriptor. TI indicates whether the descriptor is in the GDT (in which case the segmentation unit gets the GDT's linear base address from the GDTR register) or in the active LDT (in which case it gets the LDT's linear base address from the LDTR register).
[2] Compute the address of the descriptor from the selector's index field: the index is multiplied by 8 (the size of a descriptor; in effect this masks off the low three bits of the selector, which hold the privilege level and the TI flag) and the result is added to the contents of the GDTR or LDTR register.
[3] Copy the descriptor from memory into the hidden, nonprogrammable part of the segment register, which acts as a cache; its contents change only when the selector changes.
[4] Add the offset of the logical address to the Base field of the cached descriptor to obtain the linear address.
Note that, thanks to the nonprogrammable hidden cache associated with each segment register, the first three steps need to be performed only when the contents of the segment register change.
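The four steps above can be sketched roughly as follows. This is only an illustration: the function names, table base addresses, and the selector value are hypothetical, and the descriptor's Base field is passed in directly instead of being read from a table in memory.

```python
from typing import NamedTuple


class Selector(NamedTuple):
    index: int   # bits 3..15: position of the descriptor in its table
    ti: int      # bit 2: 0 = GDT, 1 = LDT
    rpl: int     # bits 0..1: privilege level


def decode_selector(sel: int) -> Selector:
    """Split a 16-bit segment selector into its three fields."""
    return Selector(index=(sel >> 3) & 0x1FFF, ti=(sel >> 2) & 1, rpl=sel & 3)


def descriptor_address(sel: int, gdt_base: int, ldt_base: int) -> int:
    """Steps 1 and 2: pick the table by TI, then add index * 8."""
    s = decode_selector(sel)
    table_base = ldt_base if s.ti else gdt_base
    return table_base + s.index * 8


def linear_address(segment_base: int, offset: int) -> int:
    """Step 4: descriptor base plus offset, truncated to 32 bits."""
    return (segment_base + offset) & 0xFFFFFFFF


print(decode_selector(0x73))
print(hex(descriptor_address(0x73, gdt_base=0x1000, ldt_base=0x2000)))
print(hex(linear_address(0x00000000, 0x08048000)))
```

With a segment base of 0, the linear address equals the offset, which is why a flat segment layout effectively hands address translation over to paging.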
The LDT is used very little in Linux, so we won't dwell on it; its format is much the same as that of the IDT discussed next.
The Interrupt Descriptor Table (IDT) is a system table that associates each interrupt or exception vector with the entry address of the corresponding interrupt or exception handler. The kernel must initialize the IDT properly before enabling interrupts.
The format of the IDT is very similar to that of the GDT and LDT: each entry corresponds to one interrupt or exception vector and consists of 8 bytes. Therefore at most 256x8=2048 bytes are needed to store the IDT (Linux has 256 interrupt vectors).
The IDTR register specifies the IDT's linear base address and its limit (maximum length), so the IDT can be located anywhere in memory. IDTR must be initialized with the lidt assembly instruction before interrupts are enabled.
The IDT can contain three types of descriptors; the following figure shows the meaning of the 64 bits in each. Note in particular that the Type field in bits 40~43 identifies the type of descriptor.
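As a rough illustration of the 64-bit layout, the sketch below packs and decodes a gate descriptor held in a Python integer. It assumes the standard 32-bit gate layout (offset in bits 0~15 and 48~63, selector in bits 16~31, Type in bits 40~43, DPL in bits 45~46, Present in bit 47). The helper names and the sample handler address and selector are made up.

```python
GATE_TYPES = {0x5: "task gate", 0xE: "interrupt gate", 0xF: "trap gate"}


def pack_gate(offset: int, selector: int, gate_type: int, dpl: int) -> int:
    """Build a present 64-bit gate descriptor from its fields."""
    return (
        (offset & 0xFFFF)                      # offset bits 0..15
        | ((selector & 0xFFFF) << 16)          # segment selector
        | ((gate_type & 0xF) << 40)            # Type field, bits 40..43
        | ((dpl & 0x3) << 45)                  # descriptor privilege level
        | (1 << 47)                            # Present bit
        | (((offset >> 16) & 0xFFFF) << 48)    # offset bits 16..31
    )


def decode_gate(descriptor: int) -> dict:
    """Extract the fields of a 64-bit gate descriptor."""
    return {
        "offset": (descriptor & 0xFFFF) | (((descriptor >> 48) & 0xFFFF) << 16),
        "selector": (descriptor >> 16) & 0xFFFF,
        "type": GATE_TYPES.get((descriptor >> 40) & 0xF, "unknown"),
        "dpl": (descriptor >> 45) & 0x3,
        "present": (descriptor >> 47) & 0x1,
    }


gate = pack_gate(offset=0xC0123456, selector=0x60, gate_type=0xE, dpl=0)
print(hex(gate), decode_gate(gate))
```

For a task gate the offset field is not used; only the selector, which refers to a TSS descriptor, matters.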
The catch is that the TSS is a very dated mechanism. Linux does not rely on task gates in the IDT for process switching as Intel intended; it keeps the TSS descriptor (TSSD) in the global descriptor table (GDT). Each CPU's TR register contains the selector of that CPU's TSSD (this part is programmable) plus two hidden, nonprogrammable fields, the Base and Limit fields of the TSSD, which act as a hidden cache so that the processor can address the TSS directly without fetching its address from the GDT. The TSS is mainly used to save part of the CPU registers on a process switch (essentially the registers needed when the stack is switched). Linux prepares only one TSS data structure per CPU, tss_struct, which holds part of the current process's state, rather than one TSS per process holding the full context as Intel recommends. So, as I understand it, the thread_struct structure of each process holds the register contents that tss_struct needs while that process is running.
While an instruction executes, the CS and EIP registers contain the logical address of the next instruction to be executed. Before handling that instruction, the control unit checks whether an interrupt or exception occurred while the previous instruction was running. If one did, the control unit performs the following actions:
1. Determine the vector i (0≤i≤255) associated with the interrupt or exception.
2. Read the i-th entry of the IDT pointed to by the IDTR register.
3. Obtain the base address of the GDT from the GDTR register and look in the GDT to read the segment descriptor identified by the selector in the IDT entry. This descriptor specifies the base address of the segment that contains the interrupt or exception handler.
4. Make sure the interrupt was issued by an authorized source. First, compare the current privilege level CPL (stored in the low two bits of the CS register) with the DPL of the segment descriptor (stored in the GDT); if the CPL is less than the DPL, a "general protection" exception is raised, because an interrupt handler cannot have lower privilege than the program that caused the interrupt. For programmed exceptions, make a further security check: compare the CPL with the DPL of the gate descriptor in the IDT, and raise a "general protection" exception if the DPL is less than the CPL. This last check prevents user applications from accessing specific trap gates or interrupt gates.
5. Check whether the privilege level is changing, that is, whether the CPL differs from the DPL of the selected segment descriptor. If so, the control unit must start using the stack associated with the new privilege level, through the following steps:
I. Read the TR register to access the TSS segment of the running process.
II. Load the SS and ESP registers with the stack segment and stack pointer values for the new privilege level. These values are found in the TSS.
III. Save the previous values of SS and ESP on the new stack; they define the logical address of the stack associated with the old privilege level.
6. If a fault has occurred, load the CS and EIP registers with the logical address of the instruction that caused the exception, so that it can be executed again.
7. Save the contents of EFLAGS, CS, and EIP on the stack.
8. If the exception carries a hardware error code, save it on the stack.
9. Load the CS and EIP registers with the segment selector and offset fields, respectively, of the gate descriptor in the i-th entry of the IDT. These values give the logical address of the first instruction of the interrupt or exception handler.
The last step the control unit performs amounts to a jump to the interrupt or exception handler. In other words, once the interrupt signal has been processed, the instruction the control unit executes next is the first instruction of the selected handler.
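The privilege checks in steps 4 and 5 are easy to get backwards, so here is a minimal sketch of just that decision logic. The function name and its string results are invented for illustration; the real checks are performed by the control unit in hardware.

```python
def interrupt_checks(cpl: int, segment_dpl: int, gate_dpl: int,
                     programmed_exception: bool) -> str:
    """Model the privilege checks made before entering a handler.

    cpl          -- current privilege level (low two bits of CS)
    segment_dpl  -- DPL of the handler's code segment descriptor in the GDT
    gate_dpl     -- DPL of the gate descriptor in the IDT
    """
    # The handler may not be less privileged than the interrupted code.
    if cpl < segment_dpl:
        return "general protection"
    # Software-raised exceptions must also be allowed through the gate.
    if programmed_exception and gate_dpl < cpl:
        return "general protection"
    # A change of privilege level requires switching stacks via the TSS.
    return "stack switch" if cpl != segment_dpl else "same stack"


print(interrupt_checks(cpl=3, segment_dpl=0, gate_dpl=3, programmed_exception=True))
print(interrupt_checks(cpl=3, segment_dpl=0, gate_dpl=0, programmed_exception=True))
print(interrupt_checks(cpl=0, segment_dpl=0, gate_dpl=0, programmed_exception=False))
```

Note that the gate DPL is ignored for hardware interrupts, which is why a device interrupt arriving in user mode can still reach a gate whose DPL is 0.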
After the interrupt or exception has been handled, the handler must return control to the interrupted process by issuing an iret instruction, which forces the control unit to:
1. Load the CS, EIP, and EFLAGS registers with the values saved on the stack. If a hardware error code was pushed on the stack above the EIP contents, it must be popped before iret is executed.
2. Check whether the CPL of the handler equals the value in the low two bits of CS (which means the interrupted process was running at the same privilege level as the handler). If so, iret finishes; otherwise, go on to the next step.
3. Load the SS and ESP registers from the stack, thereby returning to the stack associated with the old privilege level.
4. Check the contents of the DS, ES, FS, and GS segment registers; if any of them contains a selector that refers to a segment descriptor whose DPL is less than the CPL, clear that segment register. The control unit does this to keep user-mode programs (CPL=3) from using segment registers previously used by kernel routines (DPL=0). If these registers were not cleared, a malicious user-mode program could use them to access the kernel address space.
3 PU Module
The job of the paging unit (PU) is to convert linear addresses into physical addresses. One of its key tasks is to check the requested access type against the access rights of the linear address; if the memory access is not valid, it raises a page fault exception.
The paging unit treats all of main memory as a sequence of page frames (sometimes called physical pages). Each page frame has a fixed size (the biggest difference from a segment; 4 KB on 32-bit 80x86 processors, while some 64-bit architectures use larger pages such as 64 KB) and holds one page.
The data structures that map linear addresses to physical addresses are called page tables; they are stored in main memory and must be properly initialized by the kernel before the paging unit is enabled. Starting with the 80386, all 80x86 processors support paging, which is enabled by setting the PG flag of the CR0 register. When PG=0, linear addresses are interpreted as physical addresses.
Starting with the 80386, the paging unit of Intel processors handles 4 KB pages. A 32-bit linear address is divided into 3 fields:
Directory: the most significant 10 bits
Table: the middle 10 bits
Offset: the least significant 12 bits
Every running process must have a page directory assigned to it, each entry of which points to a page table. However, memory for all of a process's page tables need not be allocated at once; for efficiency, Linux allocates RAM for a page table only when the process actually needs it.
The physical address of the page directory in use is stored in control register CR3. The Directory field (top 10 bits) of a linear address selects an entry in the page directory, which points to the appropriate page table. The Table field (middle 10 bits) in turn selects an entry in that page table, which contains the physical address of the page frame holding the page. The Offset field (low 12 bits) gives the position within the page frame (see figure). Because the offset is 12 bits long, each page contains 4096 bytes of data.
Page directory entries and page table entries have the same structure. Each entry consists mainly of the index of the corresponding page frame (a page table is itself a page) and the status of that page; we cover this in detail in the post on the Linux paging mechanism in the memory management series.
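The two-level lookup just described can be sketched as follows. This is a simplified model: nested Python dictionaries stand in for the page directory and page tables in RAM, the entries hold bare frame numbers with no status flags, and the names `split_linear` and `translate` are made up.

```python
def split_linear(addr: int) -> tuple:
    """Split a 32-bit linear address into (directory, table, offset)."""
    directory = (addr >> 22) & 0x3FF   # top 10 bits
    table = (addr >> 12) & 0x3FF       # middle 10 bits
    offset = addr & 0xFFF              # low 12 bits
    return directory, table, offset


def translate(addr: int, page_directory: dict):
    """Walk directory -> page table -> frame; return None if unmapped."""
    directory, table, offset = split_linear(addr)
    page_table = page_directory.get(directory)
    if page_table is None:
        return None                    # no page table allocated yet
    frame = page_table.get(table)
    if frame is None:
        return None                    # page not mapped
    return (frame << 12) | offset


# Hypothetical mapping: directory entry 32, table entry 0x48 -> frame 0x1234.
page_directory = {32: {0x48: 0x1234}}
print(split_linear(0x08048123))
print(hex(translate(0x08048123, page_directory)))
```

A missing dictionary key here plays the role of an entry that is not present, which in hardware would lead to a page fault.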
Next, the hardware protection scheme for paging, which differs from that of the segmentation unit. Although the 80x86 processor allows four possible privilege levels for a segment, only two privilege levels are associated with pages and page tables, controlled by the User/Supervisor flag found in both page directory entries and page table entries. If this flag is 0, the page can be addressed only when the CPL is less than 3 (for Linux, this means the processor is in kernel mode); if the flag is 1, the page can always be addressed.
In addition, instead of the three access rights of a segment (read, write, execute), a page has only two (read, write). If the Read/Write flag of a page directory entry or page table entry is 0, the corresponding page table or page is read-only; otherwise it is readable and writable.
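The two flags can be modeled with a short check like the one below. It is a deliberately simplified sketch that follows only the rules stated above and assumes the usual bit positions in an entry (Read/Write is bit 1, User/Supervisor is bit 2); the function name is hypothetical, and other processor settings that affect supervisor writes are ignored.

```python
PTE_RW = 1 << 1   # Read/Write flag: 0 = read-only, 1 = read and write
PTE_US = 1 << 2   # User/Supervisor flag: 0 = kernel only, 1 = any CPL


def access_allowed(entry_flags: int, cpl: int, is_write: bool) -> bool:
    """Apply the User/Supervisor and Read/Write rules to one access."""
    if not (entry_flags & PTE_US) and cpl == 3:
        return False       # supervisor page touched from user mode
    if is_write and not (entry_flags & PTE_RW):
        return False       # write to a read-only page
    return True


print(access_allowed(PTE_US | PTE_RW, cpl=3, is_write=True))
print(access_allowed(PTE_RW, cpl=3, is_write=False))
print(access_allowed(PTE_US, cpl=3, is_write=True))
```

A denied access in this model corresponds to the page fault exception raised by the paging unit.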
4 Cache
Today's microprocessors run at clock frequencies of several GHz, while dynamic RAM (DRAM) chips have access times hundreds of clock cycles long. This means the CPU may wait a long time when executing an instruction that fetches an operand from RAM or stores a result to RAM.
To address this, a new unit called the line was introduced into the 80x86 architecture. A line consists of a few dozen contiguous bytes, which are transferred in burst mode between the slow DRAM and the fast on-chip static RAM (SRAM) used to implement the cache.
The implementation details of caches are too complex to cover here, so I will describe only the principle: when accessing a RAM memory cell, the CPU extracts the set index from the physical address and compares the tags of all lines in that set with the high-order bits of the physical address. If some line's tag matches the high-order bits of the physical address, the CPU has a cache hit; otherwise it has a cache miss.
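The lookup principle can be sketched with a toy set-associative cache. The geometry below (32-byte lines, 128 sets) is an arbitrary assumption chosen for illustration, not a claim about any particular processor, and the cache is modeled as a dictionary from set index to the set of tags currently stored there.

```python
LINE_SIZE = 32    # bytes per cache line (assumed)
NUM_SETS = 128    # number of sets in the cache (assumed)


def split_physical(addr: int) -> tuple:
    """Split a physical address into (tag, set index, offset in line)."""
    offset = addr % LINE_SIZE
    set_index = (addr // LINE_SIZE) % NUM_SETS
    tag = addr // (LINE_SIZE * NUM_SETS)     # the remaining high-order bits
    return tag, set_index, offset


def is_hit(cache: dict, addr: int) -> bool:
    """Compare the address tag with the tags of all lines in its set."""
    tag, set_index, _ = split_physical(addr)
    return tag in cache.get(set_index, set())


cache = {26: {18}}                  # one line cached: set 26, tag 18
print(split_physical(0x12345))
print(is_hit(cache, 0x12345))       # same set, same tag
print(is_hit(cache, 0x22345))       # same set, different tag
```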
On a cache hit, the cache controller behaves differently depending on the access type. For a read, the controller selects the data from the cache line and sends it to a CPU register; RAM is not accessed and CPU time is saved, which is exactly what the cache is for. For a write, the controller may use one of two basic strategies, called write-through and write-back. With write-through, the controller always writes to both RAM and the cache line, which effectively turns the cache off for write operations. With write-back, which is more efficient, only the cache line is updated and the contents of RAM are left unchanged. Of course, RAM must eventually be updated after a write-back. The cache controller writes the cache line back to RAM only when the CPU executes an instruction that requires the cache entry to be flushed, or when a FLUSH hardware signal occurs (usually after a cache miss).
On a cache miss, the cache line is written back to memory if necessary, and the correct line is fetched from RAM into the cache entry. It's complicated, but we can be glad that all of this is done in hardware and the kernel need not be concerned with it.
Cache technology is moving fast. For example, the first Pentium chips included a single on-chip cache called the L1 cache. More recent chips also include larger, slower on-chip caches called the L2 cache and L3 cache. Consistency between the cache levels is handled by the hardware. Linux ignores these hardware details and assumes there is a single cache.
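The difference between the two write strategies shows up clearly when RAM writes are counted. The class below is a toy model written for this purpose: `TinyCache` and its attributes are invented names, each address is treated as its own cache line, and eviction is not modeled.

```python
class TinyCache:
    """Toy cache that counts how often RAM is written."""

    def __init__(self, write_back: bool):
        self.write_back = write_back
        self.lines = {}        # address -> cached value
        self.dirty = set()     # addresses changed but not yet in RAM
        self.ram = {}          # address -> value stored in RAM
        self.ram_writes = 0

    def write(self, addr: int, value: int) -> None:
        self.lines[addr] = value
        if self.write_back:
            self.dirty.add(addr)            # defer the RAM update
        else:
            self._write_ram(addr, value)    # write-through: update RAM now

    def flush(self) -> None:
        """Write every dirty line back to RAM."""
        for addr in sorted(self.dirty):
            self._write_ram(addr, self.lines[addr])
        self.dirty.clear()

    def _write_ram(self, addr: int, value: int) -> None:
        self.ram[addr] = value
        self.ram_writes += 1


for policy in (False, True):
    cache = TinyCache(write_back=policy)
    for value in (1, 2, 3):
        cache.write(0x1000, value)
    cache.flush()
    print("write-back" if policy else "write-through", cache.ram_writes)
```

Three writes to the same location cost three RAM writes with write-through but only one with write-back, at the price of RAM being stale until the flush.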
The CD flag of the processor's CR0 register is used to enable or disable the cache circuitry. The NW flag in the same register specifies whether the cache uses a write-through or write-back policy.
In addition to the general-purpose hardware cache, 80x86 processors include another cache called the Translation Lookaside Buffer, or TLB (also called "associative memory" in some books), to speed up linear address translation. When a linear address is used for the first time, the corresponding physical address is computed through slow accesses to the page tables in RAM. The physical address is then stored in a TLB entry so that later references to the same linear address can be translated quickly, as shown in the figure.
For example, the CPU issues an effective address (D,P,W). The page number P is sent to the input register and compared at once against the page numbers in all TLB cells; if it matches the page number in some cell, the block number B in that cell is sent to the output register. The address (D,B,W) can then be used to access main memory.
In a multiprocessor system, each CPU has its own TLB, called that CPU's local TLB. Unlike the hardware cache, corresponding entries in different TLBs need not be synchronized, because processes running on different CPUs may associate the same linear address with different physical addresses.
When a CPU's CR3 control register is modified, the hardware automatically invalidates all entries in the local TLB, because a new set of page tables is now in use and the TLB still points to the old data.
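To tie the TLB behavior together, here is a minimal sketch of a local TLB that caches page translations and is emptied when CR3 is reloaded. The class and method names are invented, the TLB is given unlimited capacity, and the slow page table walk is passed in as an ordinary function.

```python
class TLB:
    """Toy local TLB: caches page-number -> frame-number translations."""

    def __init__(self):
        self.entries = {}
        self.hits = 0
        self.misses = 0

    def translate(self, linear: int, walk_page_tables) -> int:
        page, offset = linear >> 12, linear & 0xFFF
        if page in self.entries:
            self.hits += 1                                  # fast path
        else:
            self.misses += 1                                # slow path
            self.entries[page] = walk_page_tables(page)
        return (self.entries[page] << 12) | offset

    def load_cr3(self) -> None:
        """Model a CR3 write: every cached translation becomes invalid."""
        self.entries.clear()


def slow_walk(page: int) -> int:
    """Stand-in for the page table walk; the mapping is arbitrary."""
    return page + 0x100


tlb = TLB()
print(hex(tlb.translate(0x5123, slow_walk)))   # first use: miss
print(hex(tlb.translate(0x5456, slow_walk)))   # same page: hit
tlb.load_cr3()
print(hex(tlb.translate(0x5123, slow_walk)))   # after the flush: miss again
print(tlb.hits, tlb.misses)
```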
Reprint: http://blog.csdn.net/yunsongice/article/details/5478032