Windows NT/2000 paging mechanism (http://webcrazy.yeah.net)

WebCrazy (tsu00@263.net)

Memory management is one of the most important parts of an operating system and largely determines its performance. Intel x86 processors provide both segmentation and paging, and Windows NT/2000 makes full use of these mechanisms. For paging, an IA-32 system uses a page directory and page tables (the SoftICE page command can display them) to address a 4 GB space. This article discusses only Windows NT/2000 on 32-bit Intel processors, referred to below simply as Windows. In Windows, the base addresses of the code, data, and stack segments are all 0 in both user mode and kernel mode, so the logical address discussed in this article (segment base address plus offset) is numerically equal to the linear address. Since linear addresses are what users see, every address below is a linear address unless I explicitly call it a physical address.

The page directory consists of 1024 entries (PDEs), each pointing to a page table. Each page table in turn consists of 1024 entries (PTEs), each mapping one page. A page on IA-32 is 4 KB, so the addressable range is 4 GB (1024*1024*4 KB). In Windows, each process has its own address space, that is, its own page directory and page tables. Every process sees its own page directory at linear address C0300000h, and the page tables that the directory entries point to are laid out in sequence starting at linear address C0000000h, each occupying 4 KB (1024*4 bytes). For example, the first page table is at C0000000h and the second at C0000000h + 1000h (4 KB), that is, C0001000h. In general, a page table is at C0000000h + page directory index (the upper 10 bits of the linear address) * 1000h. I use this formula below. All of this assumes that the page table is present in physical memory (indicated by the P bit of the corresponding page directory entry). This is also why IA-32 uses two levels of page tables: otherwise each process would need 4 MB (4*1024*1024 bytes) of memory for page tables in addition to its code and data.
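The split of a linear address and the page-table formula described above can be sketched in a few lines of C. This is only an illustration that runs in user mode on plain integers; the helper names (pd_index, pt_index, page_offset, page_table_va) are made up for this sketch, and the two base addresses are the ones quoted in the text.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_TABLE_BASE 0xC0000000u /* page tables start here, per the text */
#define PAGE_DIR_BASE   0xC0300000u /* page directory, per the text */

/* Upper 10 bits: index into the page directory. */
uint32_t pd_index(uint32_t la) { return la >> 22; }

/* Middle 10 bits: index into the selected page table. */
uint32_t pt_index(uint32_t la) { return (la >> 12) & 0x3FFu; }

/* Lower 12 bits: byte offset inside the 4 KB page. */
uint32_t page_offset(uint32_t la) { return la & 0xFFFu; }

/* Linear address of the page table that maps `la`:
   C0000000h + page directory index * 1000h. */
uint32_t page_table_va(uint32_t la)
{
    return PAGE_TABLE_BASE + pd_index(la) * 0x1000u;
}

/* Usage example with values taken from the text. */
void example_split(void)
{
    assert(pd_index(0xC0300000u) == 0x300u);
    assert(pt_index(0xC0300000u) == 0x300u);
    assert(page_offset(0xC0300000u) == 0u);
    assert(page_table_va(0x00400000u) == 0xC0001000u); /* second page table */
    /* The page table covering C0300000h is the page directory itself. */
    assert(page_table_va(PAGE_DIR_BASE) == PAGE_DIR_BASE);
}
```

The last assertion is plain arithmetic (C0000000h + 300h * 1000h = C0300000h), and it shows why the page directory appears inside the page-table region.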

The mechanism above is how a linear address is resolved to a physical address. When analyzing Windows NT/2000 we need to convert between physical and linear addresses in both directions: the CPU only ever converts linear addresses to physical ones when it accesses memory, but when analyzing code we sometimes need to go from a physical address back to a linear one. Take the following SoftICE session as an example:

// After the ADDR Explorer command, the operations that follow take place in the private address space of the Explorer process.
:addr explorer

// Display the physical address of the Explorer process's page directory, that is, the value loaded into PDBR (CR3) when the system switches to Explorer
:addr
CR3 LDT Base:Limit KPEB Addr PID Name
.
.
.
00c10000 ff9fc920 036c Explorer
.
.
.

/*

Linear address format: bits 0-11   offset within one page (4096 bytes)
                       bits 12-21  index into the 1024 entries of the current page table
                       bits 22-31  index into the page directory
The upper 20 bits (bits 12-31) are also called the page number.
According to the formula given earlier, the page table entry supplies the upper 20 bits of the
physical address, and the page offset of the linear address supplies the lower 12 bits.
Converting a linear address to a physical address can therefore be written as:

  (@(C0000000h + PDE * 1000h + PTE * 4) & 0FFFFF000h) + PO
= (@(C0000000h + 4 * (PDE * 400h + PTE)) & 0FFFFF000h) + PO
= (@(C0000000h + (((PDE << 10d) + PTE) << 2d)) & 0FFFFF000h) + PO
= (@(C0000000h + ((LA >> 12d) << 2d)) & 0FFFFF000h) + PO
= (@(C0000000h + ((LA & 0FFFFF000h) >> 10d)) & 0FFFFF000h) + PO

In these formulas PDE and PTE denote the page directory index and the page table index, LA denotes
the given linear address, PO denotes the lower 12 bits of LA, the suffixes h and d mark hexadecimal
and decimal numbers, and @ denotes the contents stored at the given address.
By this analysis, for linear address C0300000h the page directory index is 300h, the page table index is 300h, and the offset is 0.
Its page table entry is therefore at C0000000h + PDE * 1000h + PTE * 4 = C0000000h + 300h * 1000h + 300h * 4 = C0300C00h
*/
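As a sanity check on the chain of equalities above, the first and last forms of the page-table-entry address can be compared directly in C. This is a sketch on plain integers, not kernel code; the function names are invented for the illustration.

```c
#include <assert.h>
#include <stdint.h>

/* First form: C0000000h + PDE * 1000h + PTE * 4, using the two indices. */
uint32_t pte_va_indexed(uint32_t la)
{
    uint32_t pde = la >> 22;
    uint32_t pte = (la >> 12) & 0x3FFu;
    return 0xC0000000u + pde * 0x1000u + pte * 4u;
}

/* Last form: C0000000h + ((LA & 0FFFFF000h) >> 10). */
uint32_t pte_va_shifted(uint32_t la)
{
    return 0xC0000000u + ((la & 0xFFFFF000u) >> 10);
}

/* Returns 1 if both forms agree on a sample of addresses across the 4 GB range. */
int forms_agree(void)
{
    for (uint64_t a = 0; a <= 0xFFFFFFFFull; a += 0xFF123ull) {
        uint32_t la = (uint32_t)a;
        if (pte_va_indexed(la) != pte_va_shifted(la))
            return 0;
    }
    return 1;
}

/* Usage example: the entry for C0300000h sits at C0300C00h. */
void example_pte_va(void)
{
    assert(pte_va_indexed(0xC0300000u) == 0xC0300C00u);
    assert(pte_va_shifted(0xC0300000u) == 0xC0300C00u);
    assert(forms_agree());
}
```

The two forms agree because PDE * 1000h + PTE * 4 is just the upper 20 bits of the address shifted left by 2, which is the same as masking off the low 12 bits and shifting right by 10.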

:dd c0000000+300*1000+300*4 l 4
0010:c0300c00 00c10063 01a31063 00000000 0141f163 c.
                    |
                    |_ The low 12 bits (0-11), here 063, hold the attribute bits, the Intel reserved bits, and the bits available to the system (OS)

// Compute the physical address of c0300000 (00c10000)
:? dword(@(c0000000+300*1000+300*4)) & fffff000 + c0300000 & 00000fff
00c10000

// Verify with SoftICE
:phys dword(@(c0000000+300*1000+300*4)) & fffff000 + c0300000 & 00000fff
c0300000

:page c0300000
Linear    Physical  Attributes
c0300000  00c10000  P D A S RW
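The attribute value 063 and the flags P D A S RW printed by the page command are the same information in two notations. A minimal sketch of the decoding, using the bit positions defined for IA-32 page table entries (bit 0 P, bit 1 R/W, bit 2 U/S, bit 5 A, bit 6 D, bit 8 G), could look like this. The function names and the output format are my own and do not reproduce SoftICE's exact layout.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Formats the attribute bits of a PTE into buf, for example "P RW S A D". */
const char *pte_flags(uint32_t pte, char *buf, size_t len)
{
    snprintf(buf, len, "%s %s %s%s%s%s",
             (pte & 0x001u) ? "P"  : "NP",  /* present / not present */
             (pte & 0x002u) ? "RW" : "RO",  /* writable / read-only */
             (pte & 0x004u) ? "U"  : "S",   /* user / supervisor */
             (pte & 0x020u) ? " A" : "",    /* accessed */
             (pte & 0x040u) ? " D" : "",    /* dirty */
             (pte & 0x100u) ? " G" : "");   /* global */
    return buf;
}

/* Upper 20 bits: physical address of the 4 KB frame. */
uint32_t pte_frame(uint32_t pte) { return pte & 0xFFFFF000u; }

/* Usage example with the entry read at C0300C00h above. */
void example_flags(void)
{
    char buf[32];
    assert(strcmp(pte_flags(0x00C10063u, buf, sizeof buf), "P RW S A D") == 0);
    assert(pte_frame(0x00C10063u) == 0x00C10000u);
}
```

Under this decoding, 005 is a present, read-only, user-mode page and 163 is a global supervisor page.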

In fact, the last command above can do the work of all the others. The same translation implemented in code:

// Linear address -> physical address
// Equivalent to the page command in SoftICE.
// A linear address corresponds to at most one physical address.
// A return value of 0 means the linear address is not mapped to a physical address.

ULONG LinearAddressToPhysicalAddress(ULONG lAddress)
{
    unsigned int *pAddr;
    unsigned int *PageDirectoryEntry = (unsigned int *)0xc0300000;
    unsigned int *PageTableEntry = (unsigned int *)0xc0000000;

    // Check that the page directory entry is valid. Bit 0 (P) is the present bit. For details, see the references.
    if (!(PageDirectoryEntry[lAddress >> 22] & 0xfffff000)
        || !(PageDirectoryEntry[lAddress >> 22] & 0x00000001))
        return 0;

    // (@(C0000000h + ((LA & 0FFFFF000h) >> 10d)) & 0FFFFF000h) + PO
    pAddr = (unsigned int *)((unsigned int)PageTableEntry + ((lAddress & 0xfffff000) >> 10));
    if ((*pAddr) & 1)
        return ((*pAddr) & 0xfffff000) | (lAddress & 0x00000fff);
    return 0;
}
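The routine above only runs in kernel mode on a real system. To experiment with the same two-level walk in an ordinary program, one option is to simulate physical memory with a small array. The sketch below does that; the array phys, the frame count, and the functions translate and map_page are all hypothetical stand-ins, and the sample mapping is the first .data page from the mspaint.exe listing later in this article.

```c
#include <assert.h>
#include <stdint.h>

#define FRAMES 8                      /* tiny simulated physical memory */
static uint32_t phys[FRAMES][1024];   /* each frame holds 1024 dwords (4 KB) */

#define PTE_P 0x1u                    /* present bit */

/* Walks the simulated tables; frame `cr3_frame` holds the page directory.
   Returns 1 and stores the physical address in *pa, or 0 if unmapped. */
int translate(uint32_t cr3_frame, uint32_t la, uint32_t *pa)
{
    uint32_t pde = phys[cr3_frame][la >> 22];
    if (!(pde & PTE_P) || (pde >> 12) >= FRAMES)
        return 0;                     /* no page table for this 4 MB region */
    uint32_t pte = phys[pde >> 12][(la >> 12) & 0x3FFu];
    if (!(pte & PTE_P))
        return 0;                     /* page not present */
    *pa = (pte & 0xFFFFF000u) | (la & 0xFFFu);
    return 1;
}

/* Maps one 4 KB page la -> pa, using frame `pt_frame` as the page table. */
void map_page(uint32_t cr3_frame, uint32_t pt_frame, uint32_t la, uint32_t pa)
{
    phys[cr3_frame][la >> 22] = (pt_frame << 12) | 0x063u;
    phys[pt_frame][(la >> 12) & 0x3FFu] = (pa & 0xFFFFF000u) | 0x047u;
}

/* Usage example. */
void example_walk(void)
{
    uint32_t pa = 0;
    map_page(0, 1, 0x0103C000u, 0x0225F000u);
    assert(translate(0, 0x0103C010u, &pa) && pa == 0x0225F010u);
    assert(!translate(0, 0x00010000u, &pa));   /* nothing mapped there */
}
```

Returning a separate status flag, rather than using 0 as "not mapped", avoids confusing an unmapped address with a page that really is at physical address 0.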

So how do we go the other way, from a physical address to a linear address? There is no direct inverse mapping, but since we know exactly where the page directory and page tables are, we can simply search them. In theory the search covers at most 1024*1024 entries of 4 bytes each (4 MB). In practice many page directory entries are not present in physical memory, so the range is much smaller. Those missing entries are also a hazard: accessing a nonexistent address from kernel mode can cause a blue screen. The code below therefore checks the P bit of every page directory entry.

// Physical address -> linear address
// Equivalent to the phys command in SoftICE.
// Searches all valid page tables for the specified physical address.
// Several linear addresses may map to the same physical address at the same time.
// If this function prints nothing, no linear address is mapped to this physical address.

VOID PhysicalAddressToLinearAddress(ULONG pAddress)
{
    unsigned int *pAddr;
    unsigned int *PageDirectoryEntry = (unsigned int *)0xc0300000;
    unsigned int *PageTableEntry = (unsigned int *)0xc0000000;
    unsigned int i, j;   // unsigned, so i * 4 MB cannot overflow a signed int

    DbgPrint("\n");
    for (i = 0; i < 1024; i++)
        if ((PageDirectoryEntry[i] & 0xfffff000) && (PageDirectoryEntry[i] & 0x00000001))
            for (j = 0; j < 1024; j++) {
                pAddr = (unsigned int *)((unsigned int)PageTableEntry + i * 4096 + j * 4);
                if ((*pAddr) & 0x00000001)
                    if (((*pAddr) & 0xfffff000) == (pAddress & 0xfffff000))
                        DbgPrint("%08x\n",
                                 ((i * 4 * 1024 * 1024 + j * 4 * 1024) & 0xfffff000) | (pAddress & 0x00000fff));
            }
}
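The reverse search can also be tried out in user mode against simulated page tables. The following is a sketch under the same assumptions as a toy model: phys is a made-up array standing in for physical memory, and set_entry and reverse_lookup are hypothetical helpers. It collects the matches into an array instead of printing them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FRAMES 8
static uint32_t phys[FRAMES][1024];   /* simulated physical memory, 4 KB frames */

/* Writes one page directory or page table entry in the simulation. */
void set_entry(uint32_t frame, uint32_t index, uint32_t value)
{
    phys[frame][index] = value;
}

/* Collects up to `max` linear addresses that map to physical address `pa`.
   Returns the number of matches found, which may exceed `max`. */
size_t reverse_lookup(uint32_t cr3_frame, uint32_t pa, uint32_t *out, size_t max)
{
    size_t found = 0;
    for (uint32_t i = 0; i < 1024; i++) {
        uint32_t pde = phys[cr3_frame][i];
        if (!(pde & 1u) || (pde >> 12) >= FRAMES)
            continue;                 /* skip page tables that are not present */
        for (uint32_t j = 0; j < 1024; j++) {
            uint32_t pte = phys[pde >> 12][j];
            if ((pte & 1u) && (pte & 0xFFFFF000u) == (pa & 0xFFFFF000u)) {
                if (found < max)
                    out[found] = (i << 22) | (j << 12) | (pa & 0xFFFu);
                found++;
            }
        }
    }
    return found;
}

/* Usage example: two linear pages aliased to the same physical frame. */
void example_reverse(void)
{
    uint32_t out[4];
    set_entry(0, 4, (1u << 12) | 0x063u);   /* PDE 4 -> page table in frame 1 */
    set_entry(1, 1, 0x00596005u);           /* 01001000 -> 00596000 */
    set_entry(1, 2, 0x00596005u);           /* 01002000 -> 00596000 (alias) */
    assert(reverse_lookup(0, 0x00596010u, out, 4) == 2);
    assert(out[0] == 0x01001010u && out[1] == 0x01002010u);
}
```

Because several linear addresses can map to one physical page, the function reports every match rather than stopping at the first.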

Both routines above access memory in the upper 2 GB of the 4 GB address space (linear addresses at or above C0000000h), which ordinary user-mode programs cannot do. In Windows, put the code in a device driver so that it runs correctly in kernel mode.

How does Windows use paging to make efficient and sensible use of limited physical memory? Jeffrey Richter's classic book <Programming Applications for Microsoft Windows, Fourth Edition> describes the Windows memory management mechanism; see that book for details. Below, the linear-to-physical mappings of two running instances of the same program (mspaint.exe) are used to illustrate how Windows paging works.

// Address of each section after mspaint.exe is loaded into memory (the mappings are identical for the two mspaint.exe processes running at the same time)
:map32 mspaint
Owner    Obj Name  Obj#  Address        Size      Type
mspaint  .text     0001  001b:01001000  0003a500  CODE  RO
mspaint  .data     0002  0023:0103c000  00002670  IDATA RW
mspaint  .rsrc     0003  0023:0103f000  00020.c8  IDATA RO
                         -------------
                               |
                               |_ Logical address

// Linear-to-physical address mappings of the first running instance of mspaint.exe

Linear address range   Physical address range   Attributes
----------------------------------------------------------
00010000-00010fff      03a8b000-03a8bfff        047
00020000-00020fff      03bcc000-03bccfff        047
0006d000-0006dfff      018bc000-018bcfff        047
...
...
...
// .text section of the first instance of mspaint.exe
01001000-01001fff      00596000-00596fff        005
01002000-01002fff      03f97000-03f97fff        005
01003000-01003fff      03d58000-03d58fff        005
...
...
...
// .data section of the first instance of mspaint.exe
0103c000-0103cfff      0225f000-0225ffff        047
0103d000-0103dfff      03620000-03620fff        047
0103e000-0103efff      03c1e000-03c1efff        047
...
...
...
// .rsrc section of the first instance of mspaint.exe
0103f000-0103ffff      01652000-01652fff        025
01040000-01040fff      02653000-02653fff        005
01041000-01041fff      003d4000-003d4fff        005
...
...
...
// Page directory and page tables of the first instance of mspaint.exe
c0300000-c0300fff      030fd000-030fdfff        063
c0301000-c0301fff      017fe000-017fefff        063
c0303000-c0303fff      0141f000-0141ffff        163
...
...
...
ffd0f000-ffd0ffff      000ff000-000fffff        023
ffdf0000-ffdf0fff      0026a000-0026afff        163
ffdff000-ffdfffff      00269000-00269fff        163

// Linear-to-physical address mappings of the second running instance of mspaint.exe

Linear address range   Physical address range   Attributes
----------------------------------------------------------
00010000-00010fff      03a6a000-03a6afff        047
00020000-00020fff      0352b000-0352bfff        067
0006d000-0006dfff      03413000-03413fff        047
...
...
...
// .text section of the second instance of mspaint.exe
01001000-01001fff      00596000-00596fff        005
01002000-01002fff      03f97000-03f97fff        005
01003000-01003fff      03d58000-03d58fff        005
...
...
...
// .data section of the second instance of mspaint.exe
0103c000-0103cfff      030df000-030dffff        047
0103d000-0103dfff      009a0000-009a0fff        047
0103e000-0103efff      02089000-02089fff        047
...
...
...
// .rsrc section of the second instance of mspaint.exe
0103f000-0103ffff      01652000-01652fff        005
01040000-01040fff      02653000-02653fff        005
01041000-01041fff      003d4000-003d4fff        005
...
...
...
// Page directory and page tables of the second instance of mspaint.exe
c0300000-c0300fff      037c9000-037c9fff        063
c0301000-c0301fff      02f8a000-02f8afff        063
c0303000-c0303fff      0141f000-0141ffff        163
...
...
...
ffd0f000-ffd0ffff      000ff000-000fffff        023
ffdf0000-ffdf0fff      0026a000-0026afff        163
ffdff000-ffdfffff      00269000-00269fff        163

The listings above were obtained directly from the page directories and page tables of the two simultaneously running instances of mspaint.exe. Once you understand the conversion between physical and linear addresses, you can obtain the same information by slightly modifying the two routines given earlier (the test machine was a 32-bit x86 platform running Windows 2000 Server build 2195 with 64 MB of physical memory, 04000000h).

In Windows, each process has its own private linear address space. In the first 2 GB of linear addresses (user space), the .text and .rsrc sections of the two running instances map to the same physical pages, while the .data sections map to different physical pages. This follows from the role and nature of each section and is not hard to understand. In the upper 2 GB (kernel space; MmSystemRangeStart in the Windows 2000 Server ntoskrnl.exe gives its starting linear address), most physical pages are shared by the two instances. In fact, all running instances of all programs share this 2 GB. The exceptions are special regions such as the page directory (C0300000h) and the page tables (C0000000h), and the example above largely follows this rule. Even the page directory and page tables contain entries that refer to the same physical pages and so share physical memory: in both mspaint.exe instances above, the page table at C0303000-C0303FFF, which by the formula given earlier maps the 4 MB of linear addresses C0C00000-C0FFFFFF, sits on the same physical page.
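The sharing pattern described above can be checked mechanically against the two listings. The sketch below copies a few rows from them into a table and compares the physical pages of the two instances; the struct and function names are invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct page_pair { uint32_t la, pa_first, pa_second; };

/* Rows copied from the two mspaint.exe listings. */
static const struct page_pair pages[] = {
    { 0x01001000u, 0x00596000u, 0x00596000u },  /* .text */
    { 0x01002000u, 0x03F97000u, 0x03F97000u },  /* .text */
    { 0x0103C000u, 0x0225F000u, 0x030DF000u },  /* .data */
    { 0x0103D000u, 0x03620000u, 0x009A0000u },  /* .data */
    { 0x0103F000u, 0x01652000u, 0x01652000u },  /* .rsrc */
};

#define PAGE_COUNT (sizeof pages / sizeof pages[0])

/* Returns 1 if both instances map `la` to the same physical page,
   0 if they map it to different pages, -1 if `la` is not in the table. */
int is_shared(uint32_t la)
{
    for (size_t i = 0; i < PAGE_COUNT; i++)
        if (pages[i].la == la)
            return pages[i].pa_first == pages[i].pa_second;
    return -1;
}

/* Counts the table rows whose physical page is common to both instances. */
size_t count_shared(void)
{
    size_t n = 0;
    for (size_t i = 0; i < PAGE_COUNT; i++)
        n += (pages[i].pa_first == pages[i].pa_second);
    return n;
}

/* Usage example. */
void example_sharing(void)
{
    assert(is_shared(0x01001000u) == 1);   /* code is shared */
    assert(is_shared(0x0103C000u) == 0);   /* data is private */
    assert(count_shared() == 3);
}
```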

The above only covers how Windows shares memory efficiently between programs. Windows has many other paging mechanisms. With copy-on-write, for example, .text sections can also end up on different physical pages when needed, typically when you debug an application with a user-mode debugger (such as the Microsoft Visual C++ IDE). Windows also provides a way for data sections to share the same physical memory: use the /SECTION switch of the Microsoft linker (link) to give a specific section the shared (S) attribute.
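A minimal sketch of the shared-section technique with the Microsoft toolchain follows. It uses the data_seg pragma and the linker's /SECTION option; the section name .shared, the variable, and the function are hypothetical. The pragmas are guarded so that other compilers simply see an ordinary global variable, in which case nothing is shared.

```c
#include <assert.h>

#ifdef _MSC_VER
#pragma data_seg(".shared")            /* open a named data section */
#endif
/* Must be explicitly initialized, or it will not be placed in the named section. */
volatile long g_instance_count = 0;
#ifdef _MSC_VER
#pragma data_seg()                     /* back to the default data section */
#pragma comment(linker, "/SECTION:.shared,RWS")  /* R = read, W = write, S = shared */
#endif

/* Increments the counter and returns the new value.
   A real program would use an interlocked increment, since several
   processes may run this at the same time. */
long register_instance(void)
{
    return ++g_instance_count;
}

/* Usage example: within one process the counter simply counts calls. */
void example_shared(void)
{
    long before = g_instance_count;
    assert(register_instance() == before + 1);
}
```

When built with the Microsoft linker, every running instance of the module maps the .shared section to the same physical pages, so the counter is visible across instances instead of being copied per process.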

In <Analysis of Windows NT/2000 environment switching> (nsfocus magazine 12), I described in detail the code that switches the page directory base register (CR3) during a Windows NT/2000 context switch. But how does Windows NT/2000 allocate a page directory and page tables for an application in the first place? Since page directories and page tables are only allocated when a new process is created, let's look at the CreateProcessW code in kernel32.dll (CreateProcessA calls CreateProcessW indirectly).

The process can be summarized by the following listing:

kernel32!CreateProcessW
.
. (error-handling code, such as checking that the process's file exists and kernel object security checks)
.
; Open the file
001b:77e7ddd2 call [ntdll!NtOpenFile]
.
. (mostly code that pushes parameters onto the stack)
.

; Allocate virtual address space for the executable file
001b:77e7de0a call [ntdll!NtCreateSection]
.
.
.
; Close the file
001b:77e7de1e call [ntdll!NtClose]
.
.
.
; Call NtCreateProcess to create the process
001b:77e7df83 call [ntdll!NtCreateProcess]
.
.
.

In Windows 2000 Server build 2195, these four ntdll.dll routines are system services with service IDs 64h, 2Bh, 18h, and 29h respectively. For more about system services, see <Windows NT/2000 Internal data structure> (nsfocus magazine 11).

Let's continue with NtCreateProcess:

:u ntdll!NtCreateProcess // user mode; this is what is commonly called the native API
ntdll!NtCreateProcess
001b:77f92d2c mov eax, 00000029
001b:77f92d31 lea edx, [esp+04]
001b:77f92d35 int 2e // enter kernel mode through an interrupt gate
001b:77f92d37 ret 0020

:ntcall
Service table address: 804704d8 Number of services: 000000f8
.
.
.
0029 0008:804ad948 params=08 ntoskrnl!SeUnlockSubjectContext+0514
          |
          |_ Entry address of the system service with ID 29h (NtCreateProcess)
.
.
.
:u 8:804ad948
0008:804ad948 55 push ebp
0008:804ad949 8bec mov ebp, esp
0008:804ad94b 6aff push ff
0008:804ad94d 6890354080 push 80403590
0008:804ad952 682ccc4580 push ntoskrnl!_except_handler3
0008:804ad957 64a100000000 mov eax, fs:[00000000]
0008:804ad95d 50 push eax
0008:804ad95e 64892500000000 mov fs:[00000000], esp
.
.
.
; At this point [ebp-30] holds the KPEB of the new process. The following instructions zero the 0a2h * 4 (648) bytes of the KPEB.
0008:804adaf5 b9a2000000 mov ecx, 000000a2
0008:804adafa 33c0 xor eax, eax
0008:804adafc 8b7dd0 mov edi, [ebp-30]
0008:804adaff f3ab repz stosd
.
.
.
0008:804ad5e7 55 push ebp
0008:804ad5e8 8bec mov ebp, esp
; The KPEB and the process context are passed in as the first and fourth parameters of this routine ([ebp+8] and [ebp+14h])
0008:804ad5ea 8b4508 mov eax, [ebp+08]
0008:804ad5ed 8d4808 lea ecx, [eax+08]
0008:804ad5f0 c60003 mov byte ptr [eax], 03
0008:804ad5f3 89480c mov [eax+0c], ecx
0008:804ad5f6 c640021b mov byte ptr [eax+02], 1b
0008:804ad5fa 8909 mov [ecx], ecx
0008:804ad5fc 8a4d0c mov cl, [ebp+0c]
0008:804ad5ff 884862 mov [eax+62], cl
0008:804ad602 8b4d10 mov ecx, [ebp+10]
0008:804ad605 89485c mov [eax+5c], ecx
0008:804ad608 8a4d18 mov cl, [ebp+18]
0008:804ad60b 884864 mov [eax+64], cl
0008:804ad60e 8b4d14 mov ecx, [ebp+14]
0008:804ad611 8b11 mov edx, [ecx]
; edx now holds the process context (that is, the physical address of the page directory)
; How the process context is computed depends not only on several variables in ntoskrnl.exe but also on the execution environment. If you are interested, trace it with SoftICE.
; Store the process context in the new KPEB
0008:804ad613 895018 mov [eax+18], edx ; 18h is the offset of the process context within the KPEB
.
.
.
; The following code inserts the KPEB of the newly created process into the system's doubly linked list of KPEBs.
0008:804add22 a184a14680 mov eax, [8046a184]

/*
What 8046a180 is can be seen from the output of the following two SoftICE commands:
? (@8046a180)-a0
fe4e1d60 4266532192 (-28435104) "comment '"
? @PsInitialSystemProcess // display the KPEB of the System process
fe4e1d60 4266532192 (-28435104) "comment '"
In other words, the new KPEB is appended to the end of the existing linked list.
Once the KPEB has been inserted, the system can schedule the process using the page directory set up above (that is, the new process now has its own private address space).
*/

0008:804add27 8b4dd0 mov ecx, [ebp-30]
0008:804add2a c781a000000080a14680 mov dword ptr [ecx+000000a0], 8046a180
0008:804add34 8b4dd0 mov ecx, [ebp-30]
0008:804add37 8981a4000000 mov [ecx+000000a4], eax
0008:804add3d 8b4dd0 mov ecx, [ebp-30]
0008:804add40 81c1a0000000 add ecx, 000000a0 ; this reveals the offset of the list entry within the KPEB
0008:804add46 8908 mov [eax], ecx
0008:804add48 8b45d0 mov eax, [ebp-30]
0008:804add4b 05a0000000 add eax, 000000a0
0008:804add50 a384a14680 mov [8046a184], eax
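Read as C, the instruction sequence above is a tail insertion into a circular doubly linked list whose head is at 8046a180 and whose entries sit at offset 0A0h inside each KPEB. The sketch below mirrors it step by step. The type and function names are hypothetical, and fake_kpeb is only a placeholder with the link field at the right offset, not the real KPEB layout.

```c
#include <assert.h>
#include <stddef.h>

struct list_entry { struct list_entry *flink, *blink; };

/* An empty circular list points at itself in both directions. */
void list_init(struct list_entry *head) { head->flink = head->blink = head; }

/* Appends `entry` at the tail; comments show the matching instructions. */
void insert_tail(struct list_entry *head, struct list_entry *entry)
{
    struct list_entry *old_tail = head->blink;  /* mov eax, [8046a184] */
    entry->flink = head;                        /* mov [ecx+a0], 8046a180 */
    entry->blink = old_tail;                    /* mov [ecx+a4], eax */
    old_tail->flink = entry;                    /* mov [eax], ecx */
    head->blink = entry;                        /* mov [8046a184], eax */
}

/* Hypothetical stand-in for a KPEB: only the link offset matters here. */
struct fake_kpeb { unsigned char pad[0xA0]; struct list_entry links; };

/* Usage example: two processes appended in creation order. */
void example_list(void)
{
    struct list_entry head;
    struct fake_kpeb first, second;
    list_init(&head);
    insert_tail(&head, &first.links);
    insert_tail(&head, &second.links);
    assert(head.flink == &first.links && head.blink == &second.links);
    assert(offsetof(struct fake_kpeb, links) == 0xA0);
}
```

Recovering the owning structure from a list entry is then a matter of subtracting the offset, which is what the SoftICE expression (@8046a180)-a0 does.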

To be more intuitive, the code above is listed in the order the system executes it rather than in the order it is stored on disk. In fact, at the start of process creation the system first uses ObCreateObject to create a section kernel object (a section object does not allocate physical memory; see the Windows 2000 DDK documentation). For this kind of flow analysis, IDA gives a much clearer picture of the code's control flow. For the process context and the system's doubly linked KPEB list mentioned above, see <Analysis of Windows NT/2000 environment switching>, where I describe them in detail.

A deeper analysis of the Windows APIs related to memory operations (such as VirtualAllocEx, CreateFileMapping, and HeapAlloc) reveals a lot of important information. For example, by tracing ntoskrnl!NtCreateSection (which Windows ultimately calls both when loading executable files and for CreateFileMapping), you can find out how mechanisms such as copy-on-write are implemented. Windows 2000 supports multiple paging files (named pagefile.sys), which involve further mechanisms such as prototype PTEs (shown as ProtoPTE in SoftICE) and PTEs that refer to paging files; understanding them also requires a good grasp of file system drivers (FSDs) in the Windows 2000 layered driver model. For simplicity, this article has only discussed the case where the P bit in the PTE is 1. As always, if there are mistakes in the analysis above, please point them out (tsu00@263.net)!

References:
1. Jeffrey Richter, <Programming Applications for Microsoft Windows, Fourth Edition>
2. Intel Corp., <Intel Architecture Software Developer's Manual, Volume 3>
3. <Undocumented Windows NT>, with source code
4. Windows 2000 DDK documentation
