Linux Kernel Primer (5): Required Hardware Knowledge

Last Update:2018-12-03 Source: Internet

Author: User

To understand how the Linux kernel works, you need some basic hardware knowledge. This post mainly introduces the functions of several registers in the core components of Intel 80x86 CPUs running in protected mode; these registers play a vital role while the Linux kernel runs. Other hardware devices will be introduced together with their drivers when we discuss device drivers. First, let's take a look at the main architecture of the CPU:

EU (general-purpose registers, arithmetic unit, and control unit), the execution unit: carries out the operations required by each instruction.
SU (segment registers, segment translator), the segmentation unit: handles address requests from the execution unit and converts logical (virtual) addresses into linear addresses.
PU (TLB, page translator), the paging unit: converts linear addresses into physical addresses.
BIU (bus interface), the bus interface unit: services instruction prefetch requests and the data access requests of the execution unit; data access requests take precedence over prefetch requests.
IPU (control logic and prefetch queue), the instruction prefetch unit: maintains a 16-byte instruction prefetch queue and issues prefetch requests.
IDU (instruction decoder, 6-byte instruction queue), the instruction decode unit: decodes instructions.
FPU (on-chip floating-point coprocessor): a unit dedicated to floating-point operations.

Next, we describe the EU, SU, and PU modules in detail. The other modules are not covered for now; they will come up in the corresponding Linux topics.

1 EU module

The EU is the core and most important component of the CPU. Although CPUs have evolved a great deal through the Pentium generations, the EU still consists essentially of an arithmetic logic unit (ALU), a set of general-purpose registers, a flags register, and control logic.

First, the eight 32-bit general-purpose registers fall into three groups by usage: pointer registers, index registers, and data registers.

[1] Pointer registers: mainly provide all or part of an offset.

ESP: holds the offset of the top of the stack within the stack segment.
EBP: holds all or part of the offset of a location in the stack segment, and can also hold 32-bit or 16-bit operands or results.

[2] Index registers

ESI/EDI: hold all or part of the offset of memory operands, and can also hold 16-bit or 32-bit operands and results. In most instructions they are interchangeable, but not in the string instructions:

the source operand must use ESI to supply its offset, and the destination operand must use EDI.

[3] Data registers

◆ The data registers EAX, EBX, ECX, and EDX can be used as four 32-bit registers, as four 16-bit registers (AX, BX, CX, DX), or as eight 8-bit registers (AH, AL, BH, BL, CH, CL, DH, DL).
◆ Programs use the data registers to hold operands, results, and other information.
◆ Many instructions name a data register explicitly, but some also use one implicitly (for example, LOOP uses ECX as its counter). For details, refer to the Intel manuals.
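The way one data register splits into 16-bit and 8-bit views can be modeled with plain bit masks. This is a minimal sketch, not kernel code: the helper names (reg_ax, set_al, and so on) are hypothetical, and only EAX is shown; EBX, ECX, and EDX split the same way.

```c
#include <stdint.h>

/* Read the 16-bit and 8-bit aliases of a 32-bit EAX value. */
static uint16_t reg_ax(uint32_t eax) { return (uint16_t)(eax & 0xFFFF); }
static uint8_t  reg_al(uint32_t eax) { return (uint8_t)(eax & 0xFF); }
static uint8_t  reg_ah(uint32_t eax) { return (uint8_t)((eax >> 8) & 0xFF); }

/* Writing AL changes only bits 0-7 of EAX; all other bits are preserved. */
static uint32_t set_al(uint32_t eax, uint8_t al) {
    return (eax & 0xFFFFFF00u) | al;
}

/* Writing AH changes only bits 8-15 of EAX. */
static uint32_t set_ah(uint32_t eax, uint8_t ah) {
    return (eax & 0xFFFF00FFu) | ((uint32_t)ah << 8);
}
```

For EAX = 0x12345678, AX is 0x5678, AH is 0x56, and AL is 0x78; writing 0xAB into AL yields 0x123456AB.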

Second, there are four control registers, CR0 to CR3.

[1] CR0: evolved from the MSW register of the 80286, with 2 bits added. For Linux the most important bit is PG: PG = 1 enables paging, and PG = 0 disables it.
[2] CR1: reserved, unused.
[3] CR2: page-fault linear address register; holds the 32-bit linear address that caused the last page fault.
[4] CR3: page directory base register; holds the physical base address of the page directory.
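Reading CR0 or CR3 for real requires a privileged mov instruction at CPL 0, so the sketch below only decodes sample values. The bit positions follow the Intel manuals; the macro and function names are hypothetical, and 0x80050033 is used only as an example of a value with PE, MP, ET, NE, WP, AM, and PG set.

```c
#include <stdint.h>
#include <stdbool.h>

/* Selected CR0 bit positions. */
#define CR0_PE (1u << 0)   /* Protection Enable */
#define CR0_WP (1u << 16)  /* Write Protect */
#define CR0_NW (1u << 29)  /* Not Write-through */
#define CR0_CD (1u << 30)  /* Cache Disable */
#define CR0_PG (1u << 31)  /* Paging: 1 = enabled */

static bool paging_enabled(uint32_t cr0) { return (cr0 & CR0_PG) != 0; }
static bool protected_mode(uint32_t cr0) { return (cr0 & CR0_PE) != 0; }

/* Without PAE, CR3 keeps the 4 KB-aligned physical base of the page
   directory in bits 12-31; the low 12 bits hold flags, not address bits. */
static uint32_t page_directory_base(uint32_t cr3) { return cr3 & 0xFFFFF000u; }
```

With the sample value, paging_enabled(0x80050033) is true and the CD bit is clear; page_directory_base(0x00123018) gives 0x00123000.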

Finally, let's look at the flags register FR (EFLAGS).

FR records the state of the program as it executes. The first six flags below are status flags that reflect the result of an ALU operation on two operands; the last three are control flags:

[1] Carry flag CF
[2] Parity flag PF
[3] Auxiliary carry flag AF
[4] Zero flag ZF
[5] Sign flag SF
[6] Overflow flag OF
[7] Trap (single-step) flag TF
[8] Interrupt-enable flag IF
[9] Direction flag DF
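To make the six status flags concrete, here is a minimal C model of how a 32-bit ADD sets them. The struct and function names are hypothetical; the formulas are the standard definitions of each flag for addition, and the control flags TF, IF, and DF are not involved.

```c
#include <stdint.h>
#include <stdbool.h>

/* Status flags produced by a 32-bit ADD. */
struct alu_flags { bool cf, pf, af, zf, sf, of; };

static struct alu_flags add32_flags(uint32_t a, uint32_t b, uint32_t *result) {
    uint64_t wide = (uint64_t)a + b;          /* keep the carry out of bit 31 */
    uint32_t r = (uint32_t)wide;
    struct alu_flags f;

    f.cf = (wide >> 32) & 1;                  /* unsigned overflow */
    f.zf = (r == 0);
    f.sf = (r >> 31) & 1;                     /* copy of the result's top bit */
    /* signed overflow: operands share a sign but the result's sign differs */
    f.of = ((~(a ^ b) & (a ^ r)) >> 31) & 1;
    /* auxiliary carry: carry out of bit 3 into bit 4 */
    f.af = ((a ^ b ^ r) >> 4) & 1;
    /* parity: set when the low byte of the result has an even number of 1 bits */
    int ones = 0;
    for (int i = 0; i < 8; i++)
        ones += (r >> i) & 1;
    f.pf = (ones % 2) == 0;

    *result = r;
    return f;
}
```

Adding 1 to 0x7FFFFFFF sets OF and SF but not CF (signed overflow only), while adding 1 to 0xFFFFFFFF sets CF and ZF but not OF (unsigned overflow only).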

2 SU module

Next we describe the SU. Linux also uses this unit, but not to virtualize addresses as the Intel manual intends; Linux uses it mainly to switch between user mode and kernel mode. Address virtualization is done by the PU, that is, by the paging mechanism, which will be covered in detail in the memory management topic.

First, let's look at the architecture of the SU module:

The processor provides six segment registers, CS, SS, DS, ES, FS, and GS, whose sole purpose is to hold 16-bit segment selectors. Although there are only six segment registers, a program can reuse one for different purposes by saving its value in memory and restoring it afterwards.

Three of the six registers have special purposes:
CS: code segment register, which points to the segment containing the program's instructions.
SS: stack segment register, which points to the segment containing the current program stack.
DS: data segment register, which points to a segment containing static or global data.

The other three segment registers (ES, FS, and GS) are general-purpose and can point to any data segment.

Each segment is represented by an 8-byte segment descriptor that describes its characteristics. Descriptors are stored either in the Global Descriptor Table (GDT) or in a Local Descriptor Table (LDT); both tables reside in memory. In a multiprocessor system each CPU has its own GDT, and a process can have its own LDT if it needs segments beyond those in the GDT. The address and size of the GDT in main memory are held in the GDTR register, and the address and size of the LDT currently in use are held in the LDTR register.
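The 8-byte descriptor scatters its base and limit across several fields, which is easiest to see by decoding one. This is a minimal sketch following the Intel descriptor layout; the struct and function names are hypothetical, and fields not discussed here (AVL, D/B, L) are ignored.

```c
#include <stdint.h>

struct seg_desc {
    uint32_t base;     /* 32-bit segment base */
    uint32_t limit;    /* effective limit in bytes (granularity applied) */
    unsigned type;     /* bits 40-43 */
    unsigned s;        /* bit 44: 1 = code/data, 0 = system */
    unsigned dpl;      /* bits 45-46: descriptor privilege level */
    unsigned present;  /* bit 47 */
};

static struct seg_desc decode_descriptor(uint64_t d) {
    struct seg_desc s;
    /* base 23:0 lives in bits 16-39, base 31:24 in bits 56-63 */
    s.base = (uint32_t)(((d >> 16) & 0xFFFFFF) | (((d >> 56) & 0xFF) << 24));
    /* limit 15:0 lives in bits 0-15, limit 19:16 in bits 48-51 */
    uint32_t raw_limit = (uint32_t)((d & 0xFFFF) | (((d >> 48) & 0xF) << 16));
    unsigned g = (unsigned)((d >> 55) & 1);  /* granularity: 1 = 4 KB units */
    s.limit = g ? (raw_limit << 12) | 0xFFF : raw_limit;
    s.type = (unsigned)((d >> 40) & 0xF);
    s.s = (unsigned)((d >> 44) & 1);
    s.dpl = (unsigned)((d >> 45) & 3);
    s.present = (unsigned)((d >> 47) & 1);
    return s;
}
```

Decoding 0x00CF9A000000FFFF yields a flat, present code segment (type 0xA) with base 0, limit 0xFFFFFFFF, and DPL 0.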

A logical address consists of a 16-bit segment selector and a 32-bit offset, and a segment register holds only the selector. To translate a logical address, the segmentation unit (SU) performs the following steps:
[1] Examine the TI field of the selector to determine which descriptor table holds the descriptor: the GDT (the SU then takes the GDT's linear base address from GDTR) or the active LDT (the SU takes the LDT's linear base address from LDTR).
[2] Compute the address of the descriptor from the selector's index field: multiply the index by 8 (the size of a descriptor; in effect this just masks off the low three bits of the selector, namely the TI bit and the two-bit RPL field, which in CS is the CPL) and add the result to the base address from GDTR or LDTR.
[3] Copy the descriptor from memory into the hidden cache of the segment register. The cached copy is reloaded only when the selector in the segment register changes.
[4] Add the offset of the logical address to the Base field of the cached descriptor to obtain the linear address.

Note that, thanks to the non-programmable hidden cache associated with each segment register, the first three steps are needed only when the contents of the segment register change.
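The four steps above can be sketched in C. This is a minimal model, not how the hardware is built: the helper names and the struct seg_reg type, which stands in for a segment register plus the base copied into its hidden cache, are hypothetical.

```c
#include <stdint.h>

/* Fields of a 16-bit segment selector. */
struct selector { uint16_t index; uint8_t ti; uint8_t rpl; };

static struct selector decode_selector(uint16_t sel) {
    struct selector s = { (uint16_t)(sel >> 3),            /* bits 3-15 */
                          (uint8_t)((sel >> 2) & 1),        /* bit 2: 0 = GDT, 1 = LDT */
                          (uint8_t)(sel & 3) };             /* bits 0-1 */
    return s;
}

/* Steps 1 and 2: choose the table by TI, then add index * 8.
   Masking off the low 3 bits of the selector equals index * 8. */
static uint32_t descriptor_address(uint16_t sel, uint32_t gdtr_base, uint32_t ldtr_base) {
    uint32_t table = ((sel >> 2) & 1) ? ldtr_base : gdtr_base;
    return table + (uint32_t)(sel & ~0x7u);
}

/* Visible selector plus the base cached from its descriptor (step 3). */
struct seg_reg { uint16_t selector; uint32_t cached_base; };

/* Step 4: linear address = cached base + offset (wraps modulo 2^32). */
static uint32_t linear_address(const struct seg_reg *seg, uint32_t offset) {
    return seg->cached_base + offset;
}
```

For example, selector 0x73 decodes to index 14, TI = 0 (GDT), and RPL = 3, so its descriptor sits at GDT base + 0x70.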

Linux rarely uses the LDT, so we will not go into it here; its format is similar to that of the IDT described below.

The Interrupt Descriptor Table (IDT) is a system table that associates each interrupt or exception vector with the address of the corresponding interrupt or exception handler. The kernel must initialize the IDT properly before it enables interrupts.

The IDT format is very similar to that of the GDT and LDT. Each entry corresponds to one interrupt or exception vector and is 8 bytes long, so at most 256 × 8 = 2048 bytes are needed to store the IDT (the x86 architecture, and hence Linux, has 256 vectors).

The IDTR register allows the IDT to be located anywhere in memory: it holds the IDT's linear base address and its limit (maximum length). Before interrupts are enabled, IDTR must be initialized with the LIDT assembly instruction.

The IDT may contain three types of descriptors: task gates, interrupt gates, and trap gates. The figure shows the meaning of the 64 bits of each descriptor; in particular, the value of the Type field in bits 40 to 43 identifies the descriptor type.
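A gate descriptor splits the handler offset across its low and high 16 bits, with the selector, Type, DPL, and Present fields in between. The sketch below builds and decodes such a gate and computes the IDTR limit (table size in bytes minus one). It is a minimal model under the 32-bit gate layout; the function names are hypothetical and this is not the kernel's own gate-setup code.

```c
#include <stdint.h>

#define GATE_TASK 0x5u  /* task gate */
#define GATE_INT  0xEu  /* 32-bit interrupt gate */
#define GATE_TRAP 0xFu  /* 32-bit trap gate */

/* Build an 8-byte IDT gate descriptor. */
static uint64_t make_gate(uint32_t offset, uint16_t selector, unsigned type, unsigned dpl) {
    uint64_t g = 0;
    g |= (uint64_t)(offset & 0xFFFF);     /* offset 15:0  -> bits 0-15  */
    g |= (uint64_t)selector << 16;        /* selector     -> bits 16-31 */
    g |= (uint64_t)(type & 0xF) << 40;    /* Type         -> bits 40-43 */
    g |= (uint64_t)(dpl & 3) << 45;       /* DPL          -> bits 45-46 */
    g |= (uint64_t)1 << 47;               /* Present      -> bit 47     */
    g |= (uint64_t)(offset >> 16) << 48;  /* offset 31:16 -> bits 48-63 */
    return g;
}

static unsigned gate_type(uint64_t g) { return (unsigned)((g >> 40) & 0xF); }
static unsigned gate_dpl(uint64_t g)  { return (unsigned)((g >> 45) & 3); }
static uint32_t gate_offset(uint64_t g) {
    return (uint32_t)((g & 0xFFFF) | ((g >> 48) << 16));
}

/* IDTR limit for a table of the given number of 8-byte entries. */
static uint16_t idtr_limit(unsigned entries) { return (uint16_t)(entries * 8 - 1); }
```

For a full 256-entry table, idtr_limit(256) is 2047, matching the 2048-byte maximum size.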

Hardware task switching through the TSS is an outdated technique. Linux does not place task gates in the IDT as Intel's design suggests; the TSS descriptors (TSSD) are stored in the GDT. The TR register of each CPU contains the selector of the TSSD of its TSS (this part is programmable) plus two hidden, non-programmable fields: copies of the Base and Limit fields of the TSSD. This cache lets the processor address the TSS directly without fetching its address from the GDT. The TSS is mainly used to hold some CPU register contents needed during switches (in practice, mainly the registers used for stack switching). Linux prepares only one TSS data structure per CPU, tss_struct, which holds only part of the register contents of the current process; it does not prepare one TSS per process storing all registers, as Intel recommends. As I understand it, the thread_struct of each process therefore stores the register values that must be placed into tss_struct while that process runs.

After each instruction executes, the CS:EIP pair holds the logical address of the next instruction to be executed. Before executing it, the control unit checks whether an interrupt or exception occurred while the previous instruction was executing. If one did, the control unit performs the following steps:

1. Determine the vector i (0 ≤ i ≤ 255) associated with the interrupt or exception.
2. Read the i-th entry of the IDT pointed to by the IDTR register.
3. Obtain the base address of the GDT from GDTR and read the segment descriptor in the GDT identified by the selector in that IDT entry. This descriptor specifies the segment containing the interrupt or exception handler, including its base address.
4. Make sure the interrupt was issued by an authorized source. First, compare the current privilege level CPL (the two low-order bits of CS) with the DPL of the segment descriptor stored in the GDT. If CPL is smaller than DPL, a general protection exception is raised, because the handler cannot have a lower privilege than the program that caused the interrupt. For programmed exceptions, a further security check is made: CPL is compared with the DPL of the gate descriptor in the IDT, and if that DPL is smaller than CPL, a general protection exception is raised. This last check prevents user applications from reaching specific trap or interrupt gates.
5. Check whether the privilege level changes, that is, whether CPL differs from the DPL of the selected segment descriptor. If it does, the control unit must switch to the stack associated with the new privilege level, as follows:

i. Read the TR register to access the TSS of the running process.

ii. Load SS and ESP with the stack segment and stack pointer values for the new privilege level, which are found in the TSS.

iii. Save the old SS and ESP values on the new stack. These values define the logical address of the stack associated with the old privilege level.

6. If a fault occurred, load CS and EIP with the logical address of the instruction that caused the exception, so that it can be executed again.
7. Save the contents of EFLAGS, CS, and EIP on the stack.
8. If the exception carries a hardware error code, save it on the stack.
9. Load CS and EIP with the Segment Selector and Offset fields of the gate descriptor stored in the i-th IDT entry. These values give the logical address of the first instruction of the interrupt or exception handler.

The last step performed by the control unit is the jump to the interrupt or exception handler. In other words, once the interrupt signal has been processed, the next instruction the control unit executes is the first instruction of the selected handler.
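The privilege check of step 4 and the stack pushes of steps 5, 7, and 8 can be modeled as a small simulation. This is a minimal sketch under stated assumptions: the enum, check_privilege, and push_frame are hypothetical helpers, the stack is an array of 32-bit slots that grows toward index 0, and the TSS lookup, fault restart (step 6), and the final jump are left out.

```c
#include <stdint.h>
#include <stdbool.h>

enum check_result { DELIVER_OK, RAISE_GP };

/* Step 4. Numerically smaller privilege level = more privileged. */
static enum check_result check_privilege(unsigned cpl, unsigned handler_seg_dpl,
                                         unsigned gate_dpl, bool programmed) {
    if (cpl < handler_seg_dpl)          /* handler less privileged than the caller */
        return RAISE_GP;
    if (programmed && gate_dpl < cpl)   /* e.g. an INT n from CPL 3 to a DPL 0 gate */
        return RAISE_GP;
    return DELIVER_OK;
}

/* Steps 5(iii), 7, and 8: push onto stack[] starting at index sp and
   return the new top-of-stack index. */
static int push_frame(uint32_t *stack, int sp, bool priv_change,
                      uint32_t old_ss, uint32_t old_esp,
                      uint32_t eflags, uint32_t cs, uint32_t eip,
                      bool has_error, uint32_t error_code) {
    if (priv_change) {                  /* old stack saved on the new stack */
        stack[--sp] = old_ss;
        stack[--sp] = old_esp;
    }
    stack[--sp] = eflags;
    stack[--sp] = cs;
    stack[--sp] = eip;
    if (has_error)
        stack[--sp] = error_code;
    return sp;
}
```

A system call from user mode through a DPL 3 gate into a DPL 0 handler passes both checks; the same INT n aimed at a DPL 0 gate raises a general protection exception instead.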

After the interrupt or exception has been handled, the handler must execute an iret instruction to return control to the interrupted process. iret forces the control unit to:

1. Load CS, EIP, and EFLAGS with the values saved on the stack. If a hardware error code was pushed on the stack above the EIP contents, it must be popped before iret is executed.
2. Check whether the CPL of the handler equals the value in the two low-order bits of the saved CS (which means the interrupted process ran at the same privilege level as the handler). If so, iret finishes; otherwise, it continues with the next step.
3. Load SS and ESP from the stack, thereby returning to the stack associated with the old privilege level.
4. Examine the DS, ES, FS, and GS segment registers. If any of them holds a selector referring to a segment descriptor whose DPL is smaller than CPL, clear that register. This prevents user-mode programs (CPL = 3) from using segment registers previously used by the kernel (DPL = 0); if the registers were not cleared, a malicious user-mode program could use them to access the kernel address space.
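The two decisions iret makes in steps 2 and 4 can be sketched in C as follows. The struct data_seg model (a selector paired with the DPL of the descriptor it selects) and both helper names are hypothetical; real hardware reads the DPL from the descriptor table and applies the clearing rule only to data and non-conforming code segments.

```c
#include <stdint.h>
#include <stdbool.h>

/* A data segment register and the DPL of the descriptor it currently selects. */
struct data_seg { uint16_t selector; unsigned dpl; };

/* Step 2: a privilege change occurs when the handler's CPL differs from
   the two low-order bits of the saved CS. */
static bool iret_changes_privilege(unsigned handler_cpl, uint32_t saved_cs) {
    return handler_cpl != (saved_cs & 3);
}

/* Step 4: after returning to new_cpl, clear (load the null selector into)
   DS, ES, FS, and GS if their descriptor DPL is smaller than new_cpl.
   Returns how many registers were cleared. */
static int iret_sanitize_segments(struct data_seg regs[4], unsigned new_cpl) {
    int cleared = 0;
    for (int i = 0; i < 4; i++) {
        if (regs[i].dpl < new_cpl) {
            regs[i].selector = 0;
            cleared++;
        }
    }
    return cleared;
}
```

Returning to CPL 3 with one register still selecting a DPL 0 kernel segment clears exactly that register and leaves the user-mode selectors alone.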

3 PU module

The paging unit (PU) translates linear addresses into physical addresses. A key part of this work is checking the requested access type against the access rights of the linear address; if the access is not valid, a page fault exception is raised.

The paging unit treats all of main memory as divided into fixed-size blocks called page frames (sometimes called physical pages). Each page frame holds one page, and all frames have the same fixed size, which is the biggest difference from segments. On 80x86 processors, both 32-bit and 64-bit, the standard page size is 4 KB; some other 64-bit architectures can also be configured with larger base pages such as 64 KB.

The data structures that map linear addresses to physical addresses are called page tables, and they are stored in main memory. The kernel must initialize the page tables properly before enabling the paging unit. Since the 80386, all 80x86 processors support paging, which is enabled by setting the PG flag of CR0. When PG = 0, linear addresses are interpreted as physical addresses.

Starting with the 80386, the paging unit of Intel processors handles 4 KB pages. A 32-bit linear address is divided into three fields:
Directory: the most significant 10 bits
Table: the middle 10 bits
Offset: the least significant 12 bits

When a process runs, it must have a page directory, each entry of which points to a page table. However, there is no need to allocate RAM for all of a process's page tables at once; to save memory, Linux allocates RAM for a page table only when the process actually needs it.

The physical address of the page directory in use is stored in control register CR3. The Directory field (the top 10 bits of the linear address) selects an entry in the page directory, which points to the appropriate page table. The Table field (the middle 10 bits) then selects an entry in that page table, which holds the physical address of the page frame containing the page. The Offset field (the low 12 bits) gives the position within the page (see figure). Because the offset is 12 bits long, each page holds 4096 bytes of data.
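The two-level translation just described can be simulated on a toy physical memory. This is a minimal sketch, not the kernel's page-walk code: physical memory is modeled as an array of 32-bit words, only the Present bit is checked, and all names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

#define PTE_PRESENT 0x001u       /* bit 0: Present */
#define FRAME_MASK  0xFFFFF000u  /* bits 12-31: page frame address */

/* Split a 32-bit linear address into its three fields. */
static unsigned dir_index(uint32_t lin)   { return lin >> 22; }           /* top 10 bits */
static unsigned table_index(uint32_t lin) { return (lin >> 12) & 0x3FF; } /* middle 10 */
static unsigned page_offset(uint32_t lin) { return lin & 0xFFF; }         /* low 12 */

/* Toy physical memory: read the 32-bit word at a physical address. */
static uint32_t read_phys(const uint32_t *mem, uint32_t paddr) { return mem[paddr / 4]; }

/* Walk page directory -> page table -> frame. Returns false where the
   hardware would raise a page fault (entry not present). */
static bool translate(const uint32_t *mem, uint32_t cr3, uint32_t lin, uint32_t *phys) {
    uint32_t pde = read_phys(mem, (cr3 & FRAME_MASK) + dir_index(lin) * 4);
    if (!(pde & PTE_PRESENT))
        return false;
    uint32_t pte = read_phys(mem, (pde & FRAME_MASK) + table_index(lin) * 4);
    if (!(pte & PTE_PRESENT))
        return false;
    *phys = (pte & FRAME_MASK) | page_offset(lin);
    return true;
}
```

With the page directory at physical 0x0, entry 1 pointing to a page table at 0x1000, and entry 3 of that table pointing to frame 0x2000, the linear address 0x00403ABC (directory 1, table 3, offset 0xABC) translates to 0x2ABC.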

Page directory entries have the same structure as page table entries. Each entry mainly holds the page frame address of the corresponding page (a page table is itself stored in a page frame) and the status of that page. We will cover the Linux multi-level paging mechanism in detail in the memory management post.

Next, let's look at the paging hardware's protection scheme, which differs from that of the segmentation unit. Although 80x86 processors allow four privilege levels for segments, pages and page tables have only two, controlled by the User/Supervisor flag found in both page directory entries and page table entries. If the flag is 0, the page can be addressed only when CPL is less than 3 (for Linux, when the processor is in kernel mode). If the flag is 1, the page can always be addressed.

In addition, unlike the three access rights of segments (read, write, and execute), pages have only two (read and write). If the Read/Write flag of a page directory entry or page table entry is 0, the corresponding page table or page is read-only; otherwise it can be both read and written.
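The User/Supervisor and Read/Write rules can be combined into one access check. This is a minimal sketch under stated assumptions: the names are hypothetical, the effective rights are taken as the more restrictive of the directory and table entries (classic 32-bit paging), and the wp parameter models CR0.WP, a flag introduced with the 80486 that makes the kernel honor Read/Write = 0 as well.

```c
#include <stdint.h>
#include <stdbool.h>

#define PG_RW 0x002u  /* bit 1: Read/Write (1 = writable) */
#define PG_US 0x004u  /* bit 2: User/Supervisor (1 = user accessible) */

static bool access_allowed(uint32_t pde, uint32_t pte, unsigned cpl,
                           bool write, bool wp) {
    uint32_t both = pde & pte;          /* a permission counts only if set in both */
    if (cpl == 3) {                     /* user mode */
        if (!(both & PG_US))
            return false;               /* supervisor-only page */
        if (write && !(both & PG_RW))
            return false;               /* read-only page */
        return true;
    }
    /* supervisor mode (CPL 0, 1, or 2): any page is addressable */
    if (write && wp && !(both & PG_RW))
        return false;                   /* only when CR0.WP is set */
    return true;
}
```

A user-mode read of a page whose table entry has U/S = 0 is refused, while the same read from kernel mode succeeds.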

4 Hardware cache

Today's microprocessors run at clock rates of several GHz, while a dynamic RAM (DRAM) access takes hundreds of clock cycles. This means the CPU may be held up for a long time while it fetches an operand from RAM or stores a result into RAM.

To narrow this gap, hardware caches were introduced into the 80x86 architecture, together with a new unit called the line. A line consists of a few dozen contiguous bytes that are transferred in burst mode between the slow DRAM and the fast on-chip static RAM (SRAM) used to implement the cache.

The implementation details of hardware caches are complex, so here is just the principle. The cache is organized into subsets of lines. When accessing a RAM location, the CPU extracts the subset index from the physical address and compares the tags of all lines in that subset with the high-order bits of the physical address. If a line's tag matches, the CPU has a cache hit; otherwise, it has a cache miss.
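The subset-and-tag lookup can be sketched as a set-associative cache in C. The geometry (64-byte lines, 64 subsets, 4 lines per subset) and the simplistic replacement that evicts way 0 once a subset is full are assumptions of this sketch, not properties of any particular 80x86 processor; the loop also checks the lines one by one, whereas hardware compares all tags of a subset in parallel.

```c
#include <stdint.h>
#include <stdbool.h>

#define LINE_BITS 6                 /* 64-byte lines (assumed) */
#define SET_BITS  6                 /* 64 subsets (assumed) */
#define WAYS      4                 /* 4 lines per subset (assumed) */
#define NSETS     (1u << SET_BITS)

struct line  { bool valid; uint32_t tag; };
struct cache { struct line sets[NSETS][WAYS]; };

/* Subset index: the bits just above the byte-within-line offset. */
static uint32_t set_index(uint32_t paddr) { return (paddr >> LINE_BITS) & (NSETS - 1); }
/* Tag: the remaining high-order bits of the physical address. */
static uint32_t tag_of(uint32_t paddr)    { return paddr >> (LINE_BITS + SET_BITS); }

/* Compare the tag against every valid line of the selected subset. */
static bool cache_lookup(const struct cache *c, uint32_t paddr) {
    const struct line *set = c->sets[set_index(paddr)];
    for (int w = 0; w < WAYS; w++)
        if (set[w].valid && set[w].tag == tag_of(paddr))
            return true;            /* cache hit */
    return false;                   /* cache miss */
}

/* Install the line holding paddr: first invalid way, else way 0. */
static void cache_fill(struct cache *c, uint32_t paddr) {
    struct line *set = c->sets[set_index(paddr)];
    int victim = 0;
    for (int w = 0; w < WAYS; w++)
        if (!set[w].valid) { victim = w; break; }
    set[victim].valid = true;
    set[victim].tag = tag_of(paddr);
}
```

After filling address 0x12345, any address in the same 64-byte line (such as 0x12340) hits, while 0x13345 maps to the same subset with a different tag and misses.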

On a cache hit, the cache controller behaves differently depending on the access type. For a read, the controller selects the data from the cache line and transfers it into a CPU register; RAM is not accessed and CPU time is saved, which is exactly what the cache is for. For a write, the controller may use one of two basic strategies: write-through or write-back. With write-through, the controller always writes to both RAM and the cache line, effectively switching off the cache for write operations. With write-back, only the cache line is updated and the RAM contents are left unchanged, which is faster. Of course, RAM must eventually be updated: the cache controller writes cache lines back to RAM only when the CPU executes an instruction that flushes cache entries, or when a FLUSH hardware signal occurs (usually after a cache miss).

On a cache miss, the cache line is written back to memory if necessary, and the correct line is fetched from RAM into the cache entry. Complicated, right? Fortunately, all of this is handled in hardware, and the kernel does not need to worry about it.
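The difference between write-through and write-back, and the write-back of a dirty line on a miss, can be seen with a one-line cache in front of a small RAM array. This is a minimal sketch: the single cached line, the 4096-byte RAM, the write-allocate behavior on a write miss, and all names are assumptions made for illustration; addresses must be below RAM_SIZE.

```c
#include <stdint.h>
#include <stdbool.h>
#include <string.h>

#define LINE_SIZE 64
#define RAM_SIZE  4096

enum write_policy { WRITE_THROUGH, WRITE_BACK };

struct one_line_cache {
    bool valid, dirty;
    uint32_t base;                  /* RAM address of the cached line */
    uint8_t data[LINE_SIZE];
};

static uint8_t ram[RAM_SIZE];

/* Copy a modified line back to RAM (eviction or flush). */
static void write_back_if_dirty(struct one_line_cache *c) {
    if (c->valid && c->dirty) {
        memcpy(&ram[c->base], c->data, LINE_SIZE);
        c->dirty = false;
    }
}

/* Make sure the line containing addr is cached; on a miss, write back
   the old line if needed, then fetch the new line from RAM. */
static void ensure_line(struct one_line_cache *c, uint32_t addr) {
    uint32_t base = addr & ~(uint32_t)(LINE_SIZE - 1);
    if (c->valid && c->base == base)
        return;                     /* hit */
    write_back_if_dirty(c);
    memcpy(c->data, &ram[base], LINE_SIZE);
    c->base = base;
    c->valid = true;
    c->dirty = false;
}

static void cache_write(struct one_line_cache *c, enum write_policy p,
                        uint32_t addr, uint8_t v) {
    ensure_line(c, addr);
    c->data[addr - c->base] = v;
    if (p == WRITE_THROUGH)
        ram[addr] = v;              /* RAM updated immediately */
    else
        c->dirty = true;            /* RAM updated only on write-back */
}
```

Under write-back, a byte written at address 10 stays only in the cache until a write to a different line evicts it, at which point RAM is updated; under write-through, RAM changes at once.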

Cache technology keeps advancing. For example, the first Pentium models included a single on-chip cache called the L1 cache, while more recent chips also include larger and slower on-chip caches called the L2 and L3 caches. Consistency between the cache levels is maintained by hardware. Linux ignores these hardware details and assumes there is a single cache.

The CD flag of the processor's CR0 register enables or disables the cache circuitry, and the NW flag of the same register selects whether the cache uses the write-through or the write-back strategy.

Besides general-purpose hardware caches, 80x86 processors include another cache called the Translation Lookaside Buffer (TLB), also called "associative memory" in some textbooks, which speeds up linear address translation. When a linear address is used for the first time, the corresponding physical address is computed through slow accesses to the page tables in RAM. The physical address is then stored in a TLB entry, so that later references to the same linear address can be translated quickly.

For example, if the effective address provided by the CPU is (d, p, w), where p is the page number and w is the offset within the page, the page number p is sent to the input register and compared simultaneously with the page number in every TLB entry. If it matches one entry, the frame (block) number b stored there is sent to the output register, and the corresponding main memory location can then be accessed as (d, b, w).

In a multiprocessor system, each CPU has its own TLB, called its local TLB. Unlike hardware caches, the corresponding TLB entries need not be kept synchronized, because processes running on different CPUs may associate the same linear address with different physical addresses.

When the CR3 control register of a CPU is modified, the hardware automatically invalidates all entries of the local TLB, because a new set of page tables is now in use and the TLB entries would point to stale data.
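The TLB behavior described above, lookup by page number, filling after a page-table walk, and invalidation when CR3 changes, can be modeled in a few lines of C. This is a minimal sketch: the 8-entry size, round-robin replacement, and all names are assumptions, global pages are ignored, and the loop checks entries one at a time whereas hardware compares them all in parallel.

```c
#include <stdint.h>
#include <stdbool.h>

#define TLB_ENTRIES 8               /* assumed size */

struct tlb_entry { bool valid; uint32_t page; uint32_t frame; };
struct tlb { struct tlb_entry e[TLB_ENTRIES]; unsigned next; };

/* Match the page number (top 20 bits) of lin against every entry. */
static bool tlb_lookup(const struct tlb *t, uint32_t lin, uint32_t *phys) {
    uint32_t page = lin >> 12;
    for (int i = 0; i < TLB_ENTRIES; i++)
        if (t->e[i].valid && t->e[i].page == page) {
            *phys = (t->e[i].frame << 12) | (lin & 0xFFF);
            return true;            /* TLB hit */
        }
    return false;                   /* TLB miss: walk the page tables */
}

/* Record a translation computed by the page-table walk. */
static void tlb_insert(struct tlb *t, uint32_t lin, uint32_t frame) {
    struct tlb_entry *slot = &t->e[t->next];
    t->next = (t->next + 1) % TLB_ENTRIES;   /* round-robin replacement */
    slot->valid = true;
    slot->page = lin >> 12;
    slot->frame = frame;
}

/* Effect of loading CR3 on the local TLB: every entry becomes invalid. */
static void tlb_flush(struct tlb *t) {
    for (int i = 0; i < TLB_ENTRIES; i++)
        t->e[i].valid = false;
}
```

After tlb_insert(&t, 0x00403ABC, 0x2), any address in page 0x403 hits (0x00403123 translates to 0x2123) until tlb_flush models a CR3 reload.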

This article is an English version of an article originally written in Chinese on aliyun.com.
