[Windows Kernel Principle and Implementation] Reading Notes (2)

Source: Internet
Author: User
Tags: apc

Content: Pages 43-51

Processor Mode

On Intel x86 processors, a segment descriptor carries a two-bit privilege level: 0 is the most privileged level and 3 is the least privileged. Windows uses only levels 0 and 3. Privilege level 0 means the CPU is in kernel mode, and level 3 means user mode. Many instructions can be executed only at privilege level 0, such as I/O instructions and instructions that manipulate internal registers (for example the GDT, the IDT, and the MSRs). In Windows, when the processor is in user mode it can access only the address space of the current process. In kernel mode it can access both the address space of the current process and the system address space.

While an instruction stream (a thread) is executing, a mode switch may occur in the following situations:

1. User-mode code triggers an exception.

2. User-mode code is interrupted during execution.

3. A special mode-switching instruction is executed.

Memory Management

In 32-bit Windows, 0 to 2 GB is the process address space and 2 GB to 4 GB is the system address space. To manage the 2 GB of system address space effectively, Windows divides it into a number of fixed regions during initialization, each with a dedicated purpose. The Windows kernel uses a set of global variables to record the boundaries of each region. Initializing the system address space means setting these global variables and initializing each region accordingly. Windows boot options and system configuration (in the registry) may affect the location and size of some regions.

The main regions of system space include the kernel module images, the PFN database, the paged pool, the nonpaged pool, session space, the system cache, the system PTE region, system views, and the page tables.

Windows uses the two-level or multi-level page table mechanism of Intel x86 to access virtual memory. Address translation involves looking up the page directory and then the page table. If the page referenced by a page table entry is not in physical memory, a page fault is raised. Through the page fault exception, the virtual memory manager brings data or code that has been swapped out to disk back into physical memory so that it can be accessed. On top of the page table mechanism, Windows implements a flexible paging algorithm that supports one or more page files, and it also implements features such as copy-on-write for memory pages.
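
The two-level lookup can be sketched as follows. This is a minimal Python model, assuming the classic non-PAE x86 layout (4 KB pages, a 10-bit page directory index, a 10-bit page table index, and a 12-bit offset). The dictionaries, the `PageFault` exception, and the sample mapping are illustrative stand-ins, not Windows data structures.

```python
PAGE_SHIFT = 12            # 4 KB pages
INDEX_BITS = 10            # 1024 entries per directory / table
INDEX_MASK = (1 << INDEX_BITS) - 1
OFFSET_MASK = (1 << PAGE_SHIFT) - 1


class PageFault(Exception):
    """Raised when a directory or table entry is not present (hypothetical name)."""


def split_va(va):
    """Split a 32-bit virtual address into (PDE index, PTE index, offset)."""
    pde_index = (va >> (PAGE_SHIFT + INDEX_BITS)) & INDEX_MASK
    pte_index = (va >> PAGE_SHIFT) & INDEX_MASK
    return pde_index, pte_index, va & OFFSET_MASK


def translate(page_directory, va):
    """Walk directory -> table -> frame; a missing entry models a page fault."""
    pde_index, pte_index, offset = split_va(va)
    page_table = page_directory.get(pde_index)
    if page_table is None:
        raise PageFault(hex(va))
    frame = page_table.get(pte_index)
    if frame is None:
        raise PageFault(hex(va))
    return (frame << PAGE_SHIFT) | offset


# Made-up mapping for illustration: directory entry 1, table entry 1 -> frame 0x1A2B3.
directory = {1: {1: 0x1A2B3}}
physical = translate(directory, 0x00401ABC)
```

In a real system the `PageFault` path is where the virtual memory manager would step in and bring the page back from disk; here it simply propagates to the caller.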

In system space, different regions use different page management algorithms:

1. Nonpaged pool: the pool is mapped to physical pages during initialization. Free pages are linked into lists by granularity, so allocating and freeing pages are really operations on the free lists.

2. Paged pool: Windows uses a bitmap to manage page allocation.

3. System PTE region: this region does not store PTEs. The name only indicates that the address range is managed in units of PTEs, which are handed out as a resource.
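
Item 2 says the paged pool tracks page allocation with a bitmap. A minimal sketch of that idea follows: one flag per page and a first-fit search for a run of free pages. The class name and the first-fit policy are assumptions made for illustration; the notes do not say how Windows searches its bitmap.

```python
class PageBitmap:
    """One flag per page: True = allocated, False = free (illustrative model)."""

    def __init__(self, page_count):
        self.bits = [False] * page_count

    def allocate(self, count):
        """Return the first page index of a free run of `count` pages, or None."""
        run_start, run_length = 0, 0
        for index, used in enumerate(self.bits):
            if used:
                # The run is broken; restart just past the allocated page.
                run_start, run_length = index + 1, 0
                continue
            run_length += 1
            if run_length == count:
                for page in range(run_start, run_start + count):
                    self.bits[page] = True
                return run_start
        return None

    def free(self, start, count):
        """Clear the flags for pages [start, start + count)."""
        for page in range(start, start + count):
            self.bits[page] = False
```

A bitmap makes it cheap to find contiguous runs, but a request can fail even when enough pages are free in total if they are not adjacent.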

On top of these system memory regions, the Windows executive provides a set of finer-grained memory pools (allocations are multiples of 8 bytes): the executive paged pool and the executive nonpaged pool. These pools track the free blocks on each allocated page through free lists. When a block is freed, it is automatically merged with adjacent free blocks to form a larger free block. Other kernel components use these pools through API functions exported by the executive, such as ExAllocatePoolWithTag and ExFreePoolWithTag.

 

The process address space is created along with the process. It is managed according to whether virtual addresses are reserved or committed; user-mode code calls the Windows API functions VirtualAlloc and VirtualFree to allocate and release address ranges. The virtual memory manager in the kernel uses a balanced binary search tree to track how the process address space is used. Each node in the tree is a VAD (Virtual Address Descriptor), which describes one contiguous range of addresses.
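
As a sketch of what a VAD lookup does, the model below keeps non-overlapping address ranges ordered by start address and finds the range containing an address by binary search. A sorted list with `bisect` stands in for the balanced tree here; the kernel uses a real tree, and the `Vad` fields and the `VadSet` class are simplified assumptions.

```python
import bisect
from dataclasses import dataclass


@dataclass
class Vad:
    start: int   # first address in the range
    end: int     # last address in the range (inclusive)
    kind: str    # illustrative label, e.g. "private" or "section"


class VadSet:
    """Sorted-list stand-in for the balanced tree of VADs."""

    def __init__(self):
        self._starts = []   # sorted start addresses, parallel to _vads
        self._vads = []

    def insert(self, vad):
        """Insert a range, rejecting any overlap with an existing one."""
        pos = bisect.bisect_left(self._starts, vad.start)
        if pos > 0 and self._vads[pos - 1].end >= vad.start:
            raise ValueError("overlaps previous range")
        if pos < len(self._vads) and self._vads[pos].start <= vad.end:
            raise ValueError("overlaps next range")
        self._starts.insert(pos, vad.start)
        self._vads.insert(pos, vad)

    def find(self, address):
        """Return the Vad containing `address`, or None if it is unallocated."""
        pos = bisect.bisect_right(self._starts, address) - 1
        if pos >= 0 and self._vads[pos].end >= address:
            return self._vads[pos]
        return None
```

The lookup is logarithmic in the number of ranges, which is the property the balanced tree provides; a tree additionally keeps insertion and removal logarithmic, which the list version does not.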

An important kind of VAD node is the section object, which is the usual way to share memory between two or more processes on Windows. A section object can be mapped to the system page file, to an executable file or other data file, or to physical memory. A section object represents a physical storage resource.

 

Another important job of the memory manager is managing the available physical memory. A region of the Windows system address space is reserved for the PFN database (page frame number database). Each physical page has an entry in the PFN database that describes the state of the page. Windows supports eight states: active, standby, modified, modified-no-write, transition, free, zeroed, and bad. Active means the page is in use by a process or by system space, and a valid PTE points to it. Bad means a hardware error has been detected on the physical page.

 

The working set manager addresses the question of how the memory manager should distribute a limited number of physical pages among processes when they need a large amount of memory. The working set is the set of physical pages a process is currently using. Each process has a working set list in which each entry records the physical page number along with other attributes, including the page's age. The working set manager picks processes to trim according to certain policies, then chooses which of the selected process's pages to swap out to disk, freeing the physical pages. The working set manager runs in a thread called the balance set manager, which is triggered once per second and also whenever available memory runs too low. The balance set manager also periodically wakes the process/stack swapper, which is a separate thread. Once awakened, it swaps processes and stacks that meet certain conditions into or out of memory.

 

Process and thread management

Windows priorities range from 0 to 31. 0 is the system priority, the lowest, and is used only by the zero-page thread. 1-15 are the dynamic priorities; in some situations a thread's priority can be adjusted within this range. 16-31 are the real-time priorities, intended for real-time tasks.
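
The three ranges can be written down as a small classifier. This is only a sketch: the function names are hypothetical, and the `boost` helper assumes that adjustments apply only to dynamic priorities and are clamped to the top of the dynamic range at 15.

```python
def priority_class(priority):
    """Classify a thread priority (0-31) using the ranges described above."""
    if not 0 <= priority <= 31:
        raise ValueError("priority must be in 0..31")
    if priority == 0:
        return "system"       # used only by the zero-page thread
    if priority <= 15:
        return "dynamic"
    return "real-time"


def boost(base, amount):
    """Apply a temporary boost (assumption: dynamic range only, capped at 15)."""
    if priority_class(base) != "dynamic":
        return base           # system and real-time priorities are left alone
    return min(15, base + amount)
```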

A job is a kernel object supported by the executive. It allows one or more processes to be managed and controlled as a unit.

A fiber is a user-mode thread that is invisible to the kernel and is implemented by kernel32.dll.

 

Interrupts and exceptions

Interrupts are an important way for the processor to interact with external devices. Exceptions are special events that arise while the processor executes the normal instruction stream and that must be handled urgently before the original instruction stream can continue. Both break into a normal instruction stream. The difference is that an interrupt has no inherent relation to the current instruction stream, whereas an exception is a direct result of executing it. Interrupts are asynchronous and exceptions are synchronous.

Intel x86 uses the same trap mechanism to handle interrupts and exceptions: the IDT associates each interrupt or exception with the service routine that handles it. On top of this hardware mechanism, Windows provides a more flexible software mechanism that lets a driver attach its interrupt service routine (ISR) to a particular interrupt vector. Multiple interrupt objects can be connected to one interrupt vector, where an interrupt object is a kernel object that encapsulates an interrupt service routine. When an interrupt occurs, the service routines in all of these interrupt objects get a chance to handle it. This also allows multiple hardware devices to share the same hardware interrupt vector.

An interrupt controller (such as the APIC) allows a priority to be set for each hardware interrupt, but Windows does not use the controller's priorities directly. Instead it defines its own set of software interrupt priorities, called interrupt request levels (IRQLs). On Intel x86 systems, Windows uses 0 to 31 for IRQLs, and a larger value means a higher priority. A running processor always has a current IRQL. If an interrupt occurs and the IRQL of the interrupt source is equal to or lower than the current level, the interrupt is masked until the processor's IRQL drops. IRQL 0 is where ordinary threads run and is called PASSIVE_LEVEL. It has the lowest priority and can be interrupted by an interrupt at any other level. IRQL 1 is for asynchronous procedure calls (APCs) and is called APC_LEVEL; inserting an APC object into a thread can interrupt that thread's execution. IRQL 2 means the processor is doing one of two things: thread scheduling, or processing the less urgent second half of a hardware interrupt, which is called a deferred procedure call (DPC). IRQL 2 is therefore called the dispatch/DPC level, DISPATCH_LEVEL. Levels 3 to 26 are device IRQLs, and 27 to 31 are for special hardware interrupts, including the clock interrupt and inter-processor interrupts.
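
The masking rule (an interrupt at or below the current IRQL waits until the IRQL drops) can be modeled in a few lines. This is a toy simulation, not kernel code: the `Processor` class and its methods are hypothetical, and delivering held interrupts highest level first is an assumption of the sketch.

```python
PASSIVE_LEVEL, APC_LEVEL, DISPATCH_LEVEL = 0, 1, 2


class Processor:
    """Toy model of one CPU's IRQL state (illustrative only)."""

    def __init__(self):
        self.irql = PASSIVE_LEVEL
        self.pending = []     # IRQLs of interrupts waiting to be delivered
        self.log = []         # order in which interrupts were serviced

    def interrupt(self, level):
        """Deliver now if `level` is above the current IRQL, else hold it."""
        if level > self.irql:
            self._service(level)
        else:
            self.pending.append(level)

    def lower_irql(self, new_irql):
        """Lower the IRQL, then deliver held interrupts, highest level first."""
        self.irql = new_irql
        while True:
            ready = [lvl for lvl in self.pending if lvl > self.irql]
            if not ready:
                break
            level = max(ready)
            self.pending.remove(level)
            self._service(level)

    def _service(self, level):
        """Run the handler at its own IRQL, then restore the previous level."""
        previous, self.irql = self.irql, level
        self.log.append(level)
        self.irql = previous
```

For example, with the processor at DISPATCH_LEVEL, an APC_LEVEL request is held while a device interrupt at level 5 is serviced immediately; the held request is delivered only after the IRQL is lowered.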

DPCs are an important concept. They are typically used for work that is less urgent than the high-priority task currently running. A hardware interrupt service routine can put its non-urgent work into a DPC object, which reduces the time the processor spends at high IRQL. A typical use of DPCs in the Windows kernel is the implementation of timers. Besides updating the interrupt time, the system time, and the current thread's time accounting, the clock interrupt service routine checks whether any timer in the timer array has expired. If so, it requests a DISPATCH_LEVEL software interrupt. Timer expiration is then handled during DPC delivery, where the timer is delivered and executed as a special DPC object.
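
The split between the urgent and the deferred half of interrupt handling can be sketched like this. The queue, the function names, and the FIFO ordering are assumptions made for the illustration; the notes do not describe how the kernel orders its DPC queue.

```python
from collections import deque

dpc_queue = deque()      # pending deferred routines (FIFO in this sketch)
events = []              # trace of what ran, in order


def queue_dpc(routine, context):
    """Called from an ISR: remember the work, but do not run it yet."""
    dpc_queue.append((routine, context))


def device_isr(device):
    """Urgent half only: note the interrupt and defer the rest."""
    events.append(f"isr:{device}")
    queue_dpc(finish_io, device)


def finish_io(device):
    """Less urgent half, run later when the DPC queue is drained."""
    events.append(f"dpc:{device}")


def drain_dpcs():
    """Run every queued DPC; models the DISPATCH_LEVEL software interrupt."""
    while dpc_queue:
        routine, context = dpc_queue.popleft()
        routine(context)
```

Two back-to-back interrupts both finish their short ISR before either deferred routine runs, which is the point of the design: time at high IRQL stays short.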

An APC is a thread-specific routine: it can run only in the context of a particular thread, and therefore in a particular address space. When the thread gets to run, its APC routines are executed right away. This makes APCs well suited to asynchronous notifications. For example, I/O completion notification can be implemented with an APC.

On Intel x86 processors, exceptions are also dispatched through the IDT. Vectors 0 through 0x1F in the IDT are reserved by Intel. Vector 2 is reserved for the NMI (non-maskable interrupt); the other defined vectors are for exceptions raised under various conditions. Windows provides handlers for all of these exceptions. Some are handled directly by the kernel, such as the page fault (exception 0x0E), which is taken over by the virtual memory manager; others must be handled by the code of the current thread or by the Windows subsystem.

In kernel mode, the exception dispatcher first hands the exception to the kernel debugger. If the kernel debugger does not handle it, or no kernel debugger is present, the dispatcher tries a frame-based exception handler. Exception handlers are associated with stack frames, so when an exception occurs the dispatcher searches the current stack for handlers associated with its frames. If no such handler is found, the dispatcher hands the exception to the kernel debugger a second time. If it is still not handled, it is treated as a fatal error and the system crashes.
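
The kernel-mode order described above (debugger first chance, frame-based handlers, debugger second chance, crash) can be laid out as a short function. Everything here is a model: the function name, the callable-based handlers, and the returned strings are hypothetical and only mirror the order given in the text.

```python
def dispatch_kernel_exception(debugger, frame_handlers):
    """Return a string naming who handled the exception.

    debugger: callable(first_chance: bool) -> bool, or None if none is attached.
    frame_handlers: handlers for the current stack, innermost frame first;
                    each is a callable() -> bool.
    """
    # First chance: the kernel debugger, if one is attached.
    if debugger is not None and debugger(True):
        return "debugger (first chance)"
    # Frame-based handlers, searched from the innermost frame outward.
    for depth, handler in enumerate(frame_handlers):
        if handler():
            return f"frame handler {depth}"
    # Second chance: the kernel debugger again.
    if debugger is not None and debugger(False):
        return "debugger (second chance)"
    return "bugcheck"   # unhandled: the system crashes
```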

In user mode, the exception dispatcher first checks whether the process has a valid debug port. If so, it sends a message to the debug port and waits for a reply; otherwise it hands the exception to the kernel debugger. If the exception is still unhandled, control switches to user mode, where the user-mode exception dispatcher (the KiUserExceptionDispatcher function in ntdll) searches for a frame-based exception handler. If none is found, control returns to kernel mode. This time the kernel-mode exception dispatcher first tries the debug port again and, if the exception is still not handled, tries the exception port of the current process. The Windows subsystem listens on the process's exception port, so it gets a final chance to handle exceptions in the processes that belong to it. If it cannot handle the exception either, the process is terminated.

 

Synchronization

In modern operating systems, sources of concurrency such as multiple processors, multiple cores, and interrupts mean that the same code may be executed concurrently and the same data accessed concurrently. Data that may be accessed concurrently therefore needs proper synchronization. Depending on whether the IRQL of the execution environment is above APC_LEVEL or at PASSIVE_LEVEL, the synchronization mechanisms fall into two categories: "thread-independent synchronization mechanisms" and "synchronization mechanisms based on thread scheduling".

When the IRQL is above APC_LEVEL, Windows provides the following typical synchronization mechanisms:

  • Raising the IRQL
  • Interlocked operations
  • Lock-free singly linked lists
  • Spin locks (busy waiting)

The other category is synchronization between threads at PASSIVE_LEVEL. Windows defines a unified mechanism to support the various thread synchronization primitives: the dispatcher object, whose data structure begins with a DISPATCHER_HEADER. Windows Server 2003 implements the following dispatcher objects:

  • Event
  • Mutant
  • Semaphore
  • Queue object
  • Process object
  • Thread object
  • Timer object
  • Gate object
