Discussion of Windows 2000/XP Pagefile Organization and Management

Source: Internet
Author: User

From: http://www.cnblogs.com/Sonic2007/archive/2008/07/08/1238167.html

Physical memory is always scarcer than disk space. Thanks to the virtual memory mechanism we have a comparatively rich address space (a 32-bit virtual address gives 4 GB of addressable space), which generally far exceeds the physical memory installed. Modern operating systems therefore use policies such as FIFO or LRU, when memory runs short, to move some pages from physical memory to relatively cheap disk space. UNIX systems commonly use a dedicated partition for this, the swap partition. Windows instead uses an ordinary file, normally named pagefile.sys, located in the root directory of a partition. Because the PTE format used for pagefile references reserves only 4 bits to identify the pagefile, Windows supports at most 16 pagefile.sys files.

As described above, pagefile.sys is a special file whose size can grow or shrink according to system conditions. It is normally configured through the "System" applet in Control Panel. Because of its special nature, Windows creates a FILE_OBJECT for each pagefile.sys during startup and sets its SharedRead field to FALSE. The System process holds a handle to each of these FILE_OBJECTs, so only the system can operate on them, which prevents accidents such as deletion.

To manage pagefile.sys, Windows keeps an array of 16 entries, one per pagefile. The array is the system variable MmPagingFile. Each entry is a pointer to an _MMPAGING_FILE structure, which has the following format:

+0x000 Size             : Uint4B
+0x004 MaximumSize      : Uint4B
+0x008 MinimumSize      : Uint4B
+0x00c FreeSpace        : Uint4B
+0x010 CurrentUsage     : Uint4B
+0x014 PeakUsage        : Uint4B
+0x018 Hint             : Uint4B
+0x01c HighestPage      : Uint4B
+0x020 Entry            : [2] Ptr32 _MMMOD_WRITER_MDL_ENTRY
+0x028 Bitmap           : Ptr32 _RTL_BITMAP
+0x02c File             : Ptr32 _FILE_OBJECT
+0x030 PageFileName     : _UNICODE_STRING
+0x038 PageFileNumber   : Uint4B
+0x03c Extended         : UChar
+0x03d HintSetToZero    : UChar
+0x03e BootPartition    : UChar
+0x040 FileHandle       : Ptr32 Void

With this structure we can easily obtain the usage of the corresponding pagefile (MaximumSize, MinimumSize, FreeSpace, CurrentUsage, PeakUsage; compare the WinDbg !vm command) as well as its FILE_OBJECT. In addition, through the DeviceObject and Vpb fields of the FILE_OBJECT we can tell which partition the pagefile lives on and which file system that partition uses. The Bitmap member deserves a closer look.

Bitmap is an RTL_BITMAP structure, which is defined in ntddk.h:

typedef struct _RTL_BITMAP {
    ULONG SizeOfBitMap;   // Number of bits in bit map
    PULONG Buffer;        // Pointer to the bit map itself
} RTL_BITMAP;

As with the page frame number database (PFN database) and virtual memory (PAGE_SIZE is 4 KB on x86), Windows divides the pagefile into 4 KB blocks, each called a page. The state of each page is given by the corresponding bit in Bitmap: 1 means in use, 0 means free. Functions such as RtlFindClearBits or RtlFindClearBitsAndSet operate on the bitmap to find unused pages in the file. Although the bitmap tracks the pagefile in 4 KB units, for performance Windows usually writes to the pagefile 64 KB (MmModifiedWriteClusterSize pages) at a time. The _MMMOD_WRITER_MDL_ENTRY member is explained later, where the relevant material comes up.
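The bitmap search can be pictured with a small user-mode sketch. It is not the kernel routine: `page_bitmap` and `find_clear_bits_and_set` are hypothetical names that only mimic the role of RTL_BITMAP and RtlFindClearBitsAndSet, and the hint index the real routine accepts is left out. Asking for a run of 16 pages corresponds to one 64 KB write cluster.

```c
#include <assert.h>
#include <stdint.h>

/* User-mode stand-in for RTL_BITMAP: one bit per 4 KB pagefile page. */
typedef struct {
    uint32_t size_of_bitmap;  /* number of valid bits */
    uint32_t *buffer;         /* bit i lives in buffer[i / 32], bit (i % 32) */
} page_bitmap;

static int bit_is_set(const page_bitmap *bm, uint32_t i)
{
    return (int)((bm->buffer[i / 32] >> (i % 32)) & 1u);
}

/* Hypothetical helper: find the first run of `count` clear bits with a
 * linear scan, mark the run as used, and return its starting index.
 * Returns UINT32_MAX when no such run exists. */
uint32_t find_clear_bits_and_set(page_bitmap *bm, uint32_t count)
{
    uint32_t run = 0;

    assert(count > 0);
    for (uint32_t i = 0; i < bm->size_of_bitmap; i++) {
        run = bit_is_set(bm, i) ? 0 : run + 1;
        if (run == count) {
            uint32_t start = i + 1 - count;
            for (uint32_t j = start; j <= i; j++)
                bm->buffer[j / 32] |= 1u << (j % 32);
            return start;
        }
    }
    return UINT32_MAX;
}
```

With a 64-bit map whose first four pages are already used, a request for 16 pages returns index 4, and a second request returns index 20.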

Let us use WinDbg to make the above discussion concrete:

kd> dd MmPagingFile l10   // The output shows that two pagefiles are configured on this machine.
80547020 80d2af80 feec1548 00000000 00000000
80547030 00000000 00000000 00000000 00000000
80547040 00000000 00000000 00000000 00000000
80547050 00000000 00000000 00000000 00000000
kd> dd @$p l40   // The first pagefile.
80d2af80 00006400 0000c800 00006400 00000c38
80d2af90 000057c7 000057c7 00000000 00000000
80d2afa0 feea1cb8 feea1c6 fecbb000 feddc428
.
.
.
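Read against the structure layout, the first four DWORDs of this dump are Size, MaximumSize, MinimumSize and FreeSpace. The following sketch converts them to sizes, assuming (this is my assumption, the dump itself does not say so) that the fields count 4 KB pages. The helper names are made up for illustration.

```c
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096u  /* x86 page size, as stated in the text */

/* Convert a page count to bytes; 64-bit result so large files cannot overflow. */
uint64_t pages_to_bytes(uint32_t pages)
{
    return (uint64_t)pages * PAGE_SIZE_BYTES;
}

/* Convert a page count to whole mebibytes (rounded down). */
uint32_t pages_to_mib(uint32_t pages)
{
    return (uint32_t)(pages_to_bytes(pages) >> 20);
}
```

Under that assumption, Size and MinimumSize (0x6400 pages) come to 100 MB, MaximumSize (0xc800 pages) to 200 MB, and FreeSpace (0xc38 pages) to a little over 12 MB.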

kd> dd feddc428 l4   // The file object (offset 0x2c) is easily obtained from the _MMPAGING_FILE structure above.
feddc428 00700005 80ecf2f0 80ecf268 fee66c10

kd> !devobj 80ecf2f0   // The FILE_OBJECT structure is given in ntddk.h. DeviceObject is its third member (the second DWORD above).
Device object (80ecf2f0) is for:
 HarddiskVolume2 \Driver\Ftdisk DriverObject 80d97030
Current Irp 00000000 RefCount 1316 Type 00000007 Flags 00001150
Vpb 80ecf268 Dacl e13d1484 DevExt 80ecf3a8 DevObjExt 80ecf490 Dope 80ecf210 DevNode 80d95bd0
ExtensionFlags (0000000000)
AttachedDevice (Upper) 80d954b8 \Driver\VolSnap
Device queue is not busy.

In addition, the Vpb member of the FILE_OBJECT directly follows DeviceObject; in the dump above it is 80ecf268, the same value that !devobj reports as the VPB. It can be examined with !vpb. I will not list that analysis here.
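To make the reading of the `dd feddc428` output explicit, here is a small sketch that splits the four dumped DWORDs into the leading FILE_OBJECT fields. It assumes the 32-bit x86 layout from ntddk.h, in which the structure begins with CSHORT Type, CSHORT Size, then the DeviceObject, Vpb and FsContext pointers. The type and function names are hypothetical.

```c
#include <stdint.h>

/* Leading fields of FILE_OBJECT as they appear in four dumped DWORDs.
 * Assumed 32-bit layout: Type (16 bit), Size (16 bit), DeviceObject,
 * Vpb, FsContext. */
typedef struct {
    uint16_t type;
    uint16_t size;
    uint32_t device_object;
    uint32_t vpb;
    uint32_t fs_context;
} file_object_head;

file_object_head decode_file_object_head(const uint32_t dwords[4])
{
    file_object_head h;

    /* x86 is little-endian, so the first field sits in the low half. */
    h.type          = (uint16_t)(dwords[0] & 0xFFFFu);
    h.size          = (uint16_t)(dwords[0] >> 16);
    h.device_object = dwords[1];
    h.vpb           = dwords[2];
    h.fs_context    = dwords[3];
    return h;
}
```

Applied to 00700005 80ecf2f0 80ecf268 fee66c10 this gives Type 5, Size 0x70, DeviceObject 80ecf2f0 and Vpb 80ecf268, which agrees with the Vpb value in the !devobj output.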

The WinDbg session above gives us a basic picture of the pagefile. Next we turn to how the memory subsystem and the I/O subsystem (which calls the FSD) organize and manage the pagefile.

Generally, a process sees only virtual addresses and accesses memory through them. For an address whose page is not present (on x86, the P bit of its PTE is 0), the hardware raises a fault (int 0x0E on x86) and software then parses the PTE. It may be a prototype PTE (which I introduce in detail in "Windows 2000/XP prototype PTE"), or a transition PTE. Transition pages are pages that became reusable because of working set trimming or similar reasons, but whose contents are still valid for the process and can be reclaimed at any time; Windows uses the term transition to distinguish them from pages on the plain free or zeroed lists (I covered the PFN lists in "Analysis of Windows 2000/XP physical memory management"). Finally, there is a PTE that points into the corresponding pagefile.sys. It is resolved by MiResolvePageFileFault, and the page fault is then serviced through IoPageRead, which is described below.

So before continuing the discussion, here is the format of the pagefile PTE:

Valid        : Pos 0, 1 Bit
PageFileLow  : Pos 1, 4 Bits
Protection   : Pos 5, 5 Bits
Prototype    : Pos 10, 1 Bit
Transition   : Pos 11, 1 Bit
PageFileHigh : Pos 12, 20 Bits
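The field list above translates directly into shifts and masks. The following is a minimal sketch that decodes a raw 32-bit PTE value into those fields; `pagefile_pte` and `decode_pagefile_pte` are names I made up, not kernel definitions.

```c
#include <stdint.h>

/* Fields of a pagefile PTE, using the bit positions listed above. */
typedef struct {
    unsigned valid;           /* bit 0 */
    unsigned page_file_low;   /* bits 1-4: which pagefile */
    unsigned protection;      /* bits 5-9 */
    unsigned prototype;       /* bit 10 */
    unsigned transition;      /* bit 11 */
    uint32_t page_file_high;  /* bits 12-31: page offset in the pagefile */
} pagefile_pte;

pagefile_pte decode_pagefile_pte(uint32_t raw)
{
    pagefile_pte p;

    p.valid          = raw & 0x1u;
    p.page_file_low  = (raw >> 1) & 0xFu;
    p.protection     = (raw >> 5) & 0x1Fu;
    p.prototype      = (raw >> 10) & 0x1u;
    p.transition     = (raw >> 11) & 0x1u;
    p.page_file_high = raw >> 12;
    return p;
}
```

For example, the made-up value 0x057C7082 decodes to PageFileLow 1, Protection 4 and PageFileHigh 0x57c7, with Valid, Prototype and Transition all 0.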

For the prototype PTE and the transition PTE there is always 1 bit that identifies the PTE type (the Prototype and Transition fields above), but the pagefile PTE has no identifying bit of its own. In fact, MiDispatchFault (reached from KiTrap0E) calls MiResolvePageFileFault only after the prototype PTE (MiResolveProtoPteFault), transition PTE (MiResolveTransitionFault) and demand-zero (MiResolveDemandZeroFault) cases have been ruled out. Of course, MiResolveProtoPteFault may itself end up calling MiResolvePageFileFault.
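The "pagefile PTE is whatever is left over" logic can be sketched as a chain of checks on the raw PTE. This only illustrates the order of the tests, not the kernel code; the enum and function names are hypothetical, and treating a PageFileHigh of 0 as the demand-zero case is my assumption.

```c
#include <stdint.h>

typedef enum {
    PTE_VALID,        /* P bit set: no software fault handling needed */
    PTE_PROTOTYPE,    /* Prototype bit (bit 10) set */
    PTE_TRANSITION,   /* Transition bit (bit 11) set */
    PTE_DEMAND_ZERO,  /* assumed: no pagefile offset recorded */
    PTE_PAGEFILE      /* everything else: page lives in a pagefile */
} pte_kind;

/* Classify a raw 32-bit PTE in the order the checks are described above. */
pte_kind classify_pte(uint32_t raw)
{
    if (raw & 0x1u)
        return PTE_VALID;
    if ((raw >> 10) & 0x1u)
        return PTE_PROTOTYPE;
    if ((raw >> 11) & 0x1u)
        return PTE_TRANSITION;
    if ((raw >> 12) == 0u)
        return PTE_DEMAND_ZERO;
    return PTE_PAGEFILE;
}
```

Only a PTE that fails every earlier test is treated as a pagefile PTE, which is why the format needs no dedicated bit for it.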

Suppose we access a page that currently resides in a pagefile. After MiDispatchFault transfers control to MiResolvePageFileFault, the latter indexes the MmPagingFile array with the PageFileLow field of the PTE to determine which pagefile.sys holds the page. Because PageFileLow is 4 bits wide, Windows supports at most 16 pagefile.sys files. The memory subsystem then takes the FILE_OBJECT of that pagefile from the _MMPAGING_FILE structure at that index (as described above). With the offset into pagefile.sys given by PageFileHigh, MiResolvePageFileFault returns a special NTSTATUS value, 0xC0033333, which tells MiDispatchFault to call IoPageRead to fetch the page. The IoPageRead prototype is as follows (defined in ntifs.h):

NTKERNELAPI
NTSTATUS
IoPageRead(
    IN PFILE_OBJECT FileObject,
    IN PMDL MemoryDescriptorList,
    IN PLARGE_INTEGER StartingOffset,
    IN PKEVENT Event,
    OUT PIO_STATUS_BLOCK IoStatusBlock
    );
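The lookup that precedes this call can be sketched as follows: PageFileLow selects one of the 16 array entries, and PageFileHigh, multiplied by the 4 KB page size, gives the byte offset that would presumably be passed as StartingOffset. `pagefile_location` and `locate_page` are hypothetical names for illustration only.

```c
#include <stdint.h>

#define PAGE_SHIFT_X86 12  /* 4 KB pages */

/* Where a pagefile PTE says its page lives. */
typedef struct {
    unsigned file_index;   /* index into the 16-entry pagefile array */
    uint64_t byte_offset;  /* byte offset of the page within that file */
} pagefile_location;

pagefile_location locate_page(uint32_t raw_pte)
{
    pagefile_location loc;

    loc.file_index  = (raw_pte >> 1) & 0xFu;                       /* PageFileLow */
    loc.byte_offset = (uint64_t)(raw_pte >> 12) << PAGE_SHIFT_X86; /* PageFileHigh * 4 KB */
    return loc;
}
```

The field widths also bound what one PTE can address: 4 bits give indexes 0 to 15, and 20 bits of page number times 4 KB put the highest page at byte offset 0xFFFFF000, just under 4 GB per pagefile.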

Of course, before calling IoPageRead the memory manager must allocate a physical page, calling MiRemoveAnyPage to obtain one when necessary. It then calls MiInitializeReadInProgressPfn to mark the page as read-in-progress and points the MDL parameter required by IoPageRead, MemoryDescriptorList, at this page. The virtual address field of the MDL is the virtual address to which the page read by IoPageRead will be mapped, that is, the faulting address in our example.

IoPageRead itself allocates an IRP, sets it up for direct I/O (using the MDL we supplied), installs a completion routine that clears the read-in-progress state once the page has been read, and then goes through the I/O subsystem, via IoCallDriver, to the corresponding file system driver (usually determined by the Vpb of the FILE_OBJECT). How the FSD reads pagefile.sys is not discussed here; the fastfat source code that ships with ntifs is a good place to start.

Note that IoPageRead is a synchronous operation: processing continues only after the page has been read. This is also the main reason that MiDispatchFault can only run below DISPATCH_LEVEL IRQL. IoPageRead achieves this through the IRP_SYNCHRONOUS_PAGING_IO flag on the IRP it allocates. It also sets the IRP_PAGING_IO and IRP_NOCACHE flags, which have special meaning for the FSD.

Because of working set trimming, some pages are written to the pagefile by the MiModifiedPageWriter (MPW) thread. MPW uses the Entry members, of type _MMMOD_WRITER_MDL_ENTRY, in the _MMPAGING_FILE structure. _MMMOD_WRITER_MDL_ENTRY is used not only by MiModifiedPageWriter but also by MiMappedPageWriter (for mapped files), so the structure contains a control area as well as an MDL. I will not list the structure here. Through IoAsynchronousPageWrite, MPW writes pages to the pagefile in units of MmModifiedWriteClusterSize, as described above. The IRP flags used by IoAsynchronousPageWrite are IRP_PAGING_IO and IRP_NOCACHE, without the synchronous flag, so the operation is asynchronous, as its name also suggests. The related routine IoSynchronousPageWrite, by contrast, is synchronous.
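The flag combinations mentioned for the read and write paths can be compared with a small sketch. The three flag values are the ones defined in wdm.h; `paging_io_kind` and `classify_irp_flags` are hypothetical names, and the function only shows how an FSD might tell the cases apart, not how any particular driver does it.

```c
#include <stdint.h>

/* IRP flag values as defined in wdm.h. */
#define IRP_NOCACHE               0x00000001u
#define IRP_PAGING_IO             0x00000002u
#define IRP_SYNCHRONOUS_PAGING_IO 0x00000040u

typedef enum {
    NOT_PAGING_IO,
    ASYNC_PAGING_IO,  /* e.g. the modified page writer's writes */
    SYNC_PAGING_IO    /* e.g. a page read for a fault */
} paging_io_kind;

/* Classify an IRP's Flags field by its paging-related bits. */
paging_io_kind classify_irp_flags(uint32_t flags)
{
    if (!(flags & IRP_PAGING_IO))
        return NOT_PAGING_IO;
    return (flags & IRP_SYNCHRONOUS_PAGING_IO) ? SYNC_PAGING_IO
                                               : ASYNC_PAGING_IO;
}
```

A read issued with all three flags classifies as synchronous paging I/O, while a write carrying only IRP_PAGING_IO and IRP_NOCACHE classifies as asynchronous.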

By now we should have a basic picture of how page files are organized and managed. Note that IoPageRead is used not only for the pagefile but also for mapped files, and the same is true of MiModifiedPageWriter: were it not for the need to avoid deadlocks, there would be no separate MiMappedPageWriter. In fact, the Windows memory manager manages pagefiles and mapped files in the same way, and the FSD treats them only slightly differently. So, combined with concepts I introduced earlier such as the control area, this article can also serve as a reference for understanding mapped files. As always, I welcome corrections of any errors.
