Exploring the Windows NT/2000 Copy-on-Write Mechanism (http://wecrazy.yeah.net)

WebCrazy (http://wecrazy.yeah.net)

Copy-on-write is a typical application of lazy evaluation. It is widely used in the memory management of modern operating systems such as Windows NT/2000 and Unix/Linux. This article analyzes the copy-on-write mechanism in Windows NT/2000 in depth, with the aim of exploring several important data structures of the Windows NT/2000 kernel-mode Memory Manager. Before reading on, you should be familiar with terms such as PDE/PTE and VAD (see my earlier articles "Windows NT/2000 Paging Mechanism" and "Analysis of Windows NT/2000 Heap Memory and Virtual Memory Organization"). I will also discuss several other terms related to the memory subsystem, such as control area, subsection, working set list, and page frame database. The code I used to analyze the copy-on-write mechanism is listed below, and all descriptions in this article are based on it.

// cow.c
// Written by chenchengqin (tsu00@263.net)

#include <stdio.h>
#include <string.h>

#define BUFSIZE 10

// Place data in its own section so that it is easy to find in the image.
#pragma data_seg(".seg_cow")
char data[BUFSIZE] = {'A', 'A', 'A', '\0'};
#pragma data_seg()
// Ask the linker to give the section the R, W and C attributes.
#pragma comment(linker, "/section:.seg_cow,RWC")

void main(int argc, char *argv[])
{
    int i;
    if (argc > 1) {
        // memset(data, 'B', BUFSIZE);
        _asm int 3;    // break into the debugger before the first write
        for (i = 0; i < BUFSIZE; i++)
            data[i] = 'B';
        data[BUFSIZE - 1] = '\0';
    }
    printf("%s", data);

    getchar();    // keep the process alive so that it can be inspected
}

This very simple program is compiled with the following command:

cl /Zi /FA cow.c

Run one instance (and do not exit it):
C:\> cow
AAAAAAAAAA

Use SoftICE to view the layout of the cow.exe image in memory:

:addr cow
:map32 cow
Owner  Obj Name  Obj#  Address        Size      Type
cow    .text     0001  001b:00401000  00006bb8  CODE  RO
cow    .rdata    0002  0023:00408000  000005c8  IDATA RO
cow    .data     0003  0023:00409000  00002ec4  IDATA RW
cow    .idata    0004  0023:0040c000  00000633  IDATA RW
cow    .seg_cow  0005  0023:0040d000  0000010c  IDATA RW
cow    .reloc    0006  0023:0040e000  000006b2  IDATA RO

// View the PTE for the first address of the .seg_cow section (virtual address 0x40d000). For details, see "Windows NT/2000 Paging Mechanism".
:dd c0000000+1*1000+d*4 l 10
0023:c0001034 003f9225 00000000 00000000 00c3c067 %.?.........g...
              --------
              |
              |_ PTE for the first address of the .seg_cow section in the cow.exe process

The page command of SoftICE dumps the attributes of this PTE, as shown below:

:page 40d000
Linear    Physical  Attributes
0040d000  003f9000  P A U R

However, the attributes listed here do not show the copy-on-write attribute of .seg_cow. The low 12 bits (0-11) of an x86 PTE are the attribute bits, and the hardware itself defines no copy-on-write bit. Microsoft (and not only Microsoft) uses the bits within those 12 that the x86 leaves available to the operating system. The format of this DWORD value in Windows 2000 is as follows:

struct _HARDWARE_PTE_X86 (sizeof=4)
+0 bits0-0   Valid
+0 bits1-1   Write
+0 bits2-2   Owner
+0 bits3-3   WriteThrough
+0 bits4-4   CacheDisable
+0 bits5-5   Accessed
+0 bits6-6   Dirty
+0 bits7-7   LargePage
+0 bits8-8   Global
+0 bits9-9   CopyOnWrite
+0 bits10-10 Prototype
+0 bits11-11 reserved
+0 bits12-31 PageFrameNumber

The PTE value 003f9225 shows that 0x40d000 is both read-only and copy-on-write. The read-only attribute is what makes copy-on-write work: when another cow instance tries to write to the .seg_cow section, the x86 raises interrupt 0Eh (page fault), and Windows handles the copy-on-write operation in its trap handler. Based on this principle, I run another cow instance and trace the copy-on-write mechanism:

C:\> cow 1 (pass an argument so that cow tries to modify the .seg_cow section)

In the code I placed an INT 3 instruction so that SoftICE pops up when i3here is set to on. I then use the bpint e command to make SoftICE trap interrupt 0Eh. When the i386 raises interrupt 0Eh, it saves the virtual address that caused the page fault in the CR2 register. KiTrap0E, the entry point of the Windows 2000 page fault handler, uses this address to create a private copy of the .seg_cow page for the faulting cow process and to replace the original page in the working set list of the second cow instance. Only the process working set is involved here, not the system working set. I am not going to list the assembly code of KiTrap0E. I only want to describe the steps Windows takes to find the attributes of the address held in CR2, which Windows 2000 implements in MiQueryAddressState. KiTrap0E also calls MiQueryAddressState.

1. First, check the PDE of 0x40d000 (the address given by CR2). This is implemented by MiDoesPdeExistAndMakeValid.
2. Check the PTE.
3. Query the page frame database (PFN database, located through the kernel array variable MmPfnDatabase).
4. Find the working set list entry (WSLE; the base address of the entries is given by the kernel variable MmWsle) using the working set index stored in the PFN entry. This step is done by the kernel routine MiLocateWsle.
5. Look up the MmProtectToValue array using the attributes of the WSLE (again its low 12 bits) to obtain a form that user mode understands, that is, PAGE_WRITECOPY, PAGE_READWRITE and so on, as defined in winnt.h.

Steps 3 to 5 actually convert a physical address back to a linear address. Of course, this presumes that the address is present. It is also why Windows 2000 uses such complex and cumbersome structures to manage the memory subsystem. In particular, I have not discussed the PteAddress member of the PFN entry (it can be seen in the i386kd output below). It is the basis of pagefile.sys handling, of shared memory blocks (implemented through the ProtoPTE, that is, the prototype PTE), and so on. David Solomon's book "Inside Windows NT, 2nd Edition" describes the four forms of the PFN structure and gives many other details.

The following is the analysis with i386kd. In fact, with the kernel variables I have given above, SoftICE can show the same things, but it is far less convenient than i386kd.

// cow.exe process ID (ClientId) = 4ac
kd> !process 4ac
!process 4ac
Searching for process with Cid == 4ac
PROCESS ff605d60 SessionId: 0 Cid: 04ac Peb: 7ffdf000 ParentCid: 04a0
    DirBase: 017bc000 ObjectTable: ff610f88 TableSize: 12.
    Image: cow.exe
    VadRoot ff624168 Clone 0 Private 37. Modified 0. Locked 0.
.
.
.
// Convert 40d000 to a physical address and obtain the PFN.
// In SoftICE, use the page command.
kd> !vtop 017bc000 40d000
          --------
          |_ DirBase (see the !process output)
!vtop 017bc000 40d000
Pdi 1 Pti d // the PDE and PTE indexes
0040d000 00a6e000 pfn(00a6e) // the PFN

// Query the PFN database
// In SoftICE:
// dd @MmPfnDatabase+a6e*18 (each PFN entry occupies 0x18 bytes)
kd> !pfn a6e
!pfn a6e
PFN 00000a6e at address ffb8ea50
flink 00000097 blink / share count 00000001 pteaddress e151e1b4
reference count 0001 color 0
restore pte 056b04b0 containing page 002af3 Active P
Shared

/*
Check MmWsle
In SoftICE:
:dd @MmWsle+97*4 l 10
0023:c05028fc 0040df29 000006a0 00510c09 000009b0 .....
              --------
              |_ 0040d000 is the virtual address of PFN a6e
*/
kd> !wsle 4ac
!wsle 4ac

// The VmWorkingSetList member of the KPEB (offset 0f0 in Windows 2000 Server build 2195, with the value c0502000) points to the working set list
// See "Windows 2000 Kernel KPEB/KTEB Detailed Structure"

Working Set @ c0502000
    Quota: 2f FirstFree: 22 FirstDynamic: 4
    LastEntry ad NextSlot: 16 LastInitialized 257
    NonDirect 1e HashTable: c06f3000 HashTableSize: 200
.
.
.

The five steps above are clearly illustrated by these i386kd commands. However, to understand copy-on-write you also need a basic understanding of the section object (a file mapping, in user-mode terms). When NtCreateSection/NtOpenSection is used (CreateFileMapping/OpenFileMapping call these routines indirectly), Windows 2000 usually inserts a node into the self-balancing binary tree of VADs (I introduced the VAD in detail in "Analysis of Windows NT/2000 Heap Memory and Virtual Memory Organization"). When SoftICE dumps the VAD tree with the query command, each node shows a member MMCI (a memory management structure) that points to the control area, as follows:

// SoftICE output
:query cow
Address Range      Flags     MMCI      PTE       Name
00010000-00010000  c4000001
.
.
.
00400000-0040e000  07100005  ff62fd48  e11ebf80  cow.exe
                             --------
                             |_ Control Area
.
.
.

In fact, the !memusage command of i386kd can dump all control areas in the system:

kd> !memusage
loading PFN database
loading (99% complete)
          Zeroed:    15 (    60 kb)
            Free:     0 (     0 kb)
         Standby:  1274 (  5096 kb)
        Modified:   686 (  2744 kb)
 ModifiedNoWrite:     1 (     4 kb)
    Active/Valid: 14380 ( 57520 kb)
      Transition:    11 (    44 kb)
         Unknown:     0 (     0 kb)
           TOTAL: 16367 ( 65468 kb)
Building kernel map
Finished building kernel map

Control  Valid Standby Dirty Shared Locked PageTables name
.
.
.
ff62fd48    32       0     0      0      0          0 mapped_file( cow.exe )
--------
|_ The control area matches the MMCI value shown by SoftICE above
.
.
.

The !ca command of i386kd displays a control area structure:

kd> !ca ff62fd48

ControlArea @ff62fd48
  Segment: e11ebf48 Flink 0 Blink: 0
  Section Ref 1 Pfn Ref 8 Mapped Views: 1
  User Ref 2 Subsections 7 Flush Count: 0
  File Object ff684288 ModWriteCount 0 System Views: 0
  WaitForDel 0 Paged Usage a0 NonPaged Usage 120
  Flags (0000a0) Image File HadUserReference

File: \desktop\hack\cow.exe

Segment @ e11ebf48:
  Base address 0 Total Ptes f NonExtendPtes: f
  Image commit 5 ControlArea ff62fd48 SizeOfSegment: f000
  Image Base 0 Committed 0 PTE Template: 54f1c30
  Based Addr 400000 ProtoPtes e11ebf80 Image Info: e11ebfc0

Subsection 1. @ ff62fd80
  ControlArea: ff62fd48 Starting Sector 0 Number Of Sectors 8
  Base Pte e11ebf80 Ptes In subsect 1 Unused Ptes 0
  Flags 15 Sector Offset 0 Protection 1
  ReadOnly CopyOnWrite
.
.
.

The output of the !ca command shows that this control area has seven subsections. I have deleted part of the output to save space; you can compare the complete output with the output of SoftICE's map32 command. Within a control area, the subsection structures are laid out linearly, one after another. Each subsection occupies 0x20 bytes in Windows 2000 Server build 2195, so all of them can also be examined easily with SoftICE.

Note that the control area is used not only by the section object but also by the SECTION_OBJECT_POINTERS structure in Windows 2000:

typedef struct _SECTION_OBJECT_POINTERS {
    PVOID DataSectionObject;   // Control Area
    PVOID SharedCacheMap;
    PVOID ImageSectionObject;  // Control Area
} SECTION_OBJECT_POINTERS;

Each FILE_OBJECT has a SECTION_OBJECT_POINTERS member (see ntddk.h). Windows 2000 uses this mechanism both for loading executable files and for file I/O (note DataSectionObject and ImageSectionObject). Only when you are familiar with these structures can you understand the copy-on-write mechanism. The rest is left for you to study further.

Regarding cow.exe, there are two more points to note:
1. cow.c uses #pragma comment(linker, "/section:.seg_cow,RWC") to set the copy-on-write attribute of the .seg_cow section explicitly, but in Windows NT/2000 this attribute exists by default: when an executable file is mapped, its readable and writable data is set to copy-on-write by default. You can verify this statement with Jeffrey Richter's VMMap.
2. Windows NT/2000 uses the executable file name (image name) to identify the multiple instances of a program that share memory through copy-on-write. For executable files with different file names, this mechanism does not apply even if their contents are identical; each gets its own mapped section object instead.

As far as I have found, none of the above is documented by Microsoft. Everything here comes from my own preliminary analysis of Windows 2000. If you have technical issues to discuss, feel free to write to me (tsu00@263.net)!

References:
1. David Solomon, Inside Windows NT, 2nd Edition
