Exploring the Pool Allocation Process in Windows 2000/XP
WebCrazy (http://webcrazy.yeah.net)
For driver writers, one of the most troublesome tasks is managing the various kinds of memory buffers. (Buffers may also bring MDLs to mind; in fact, an MDL merely records the page frame numbers of the pool memory that begins at its StartVa.) In user mode, small, sporadic allocations are served from the heap. Windows 2000/XP provides a comparable mechanism in kernel mode to meet the memory needs of kernel-mode modules; to distinguish it from the user-mode heap, it is called a pool. Setting aside the differences that follow from the nature of kernel and user modules (for example, a pool cannot be accessed directly by user-mode code, and a heap belongs to a process whereas a pool belongs to the system), pools and heaps are organized and managed in similar ways. This article offers a preliminary analysis of how the pool is organized.
Windows has two basic types of pool: nonpaged pool (NonPagedPool) and paged pool (PagedPool). Further types are distinguished by attributes such as aligned and must-succeed. In ntddk.h, POOL_TYPE is defined as an enum: nonpaged pool types always have even values (NonPagedPool is 0), and paged pool types have odd values (PagedPool is 1). Memory is normally allocated with the kernel routines ExAllocatePool and ExAllocatePoolWithTag, which take a POOL_TYPE parameter and an allocation size. When the requested size is larger than PAGE_SIZE (more precisely, larger than PAGE_SIZE - 0x10; PAGE_SIZE is 4 KB on x86), the returned block is always page aligned. When it is smaller (more precisely, smaller than PAGE_SIZE - 0x10), the block always lies within a single page and is always 8-byte aligned. These two facts are critical to the discussion that follows.
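The two rules above (parity of the pool type, and the PAGE_SIZE - 0x10 threshold) can be sketched in a few lines of user-mode C. This is only an illustration: the `sketch_` names are hypothetical, the enum lists only the two base values named in the text, and treating a request of exactly PAGE_SIZE - 0x10 bytes as a small allocation is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SIZE 0x1000u                      /* 4 KB on x86, as stated in the text */
#define SMALL_POOL_LIMIT (PAGE_SIZE - 0x10u)   /* threshold quoted in the text */

/* Only the two base values named in the article; the other variants are omitted. */
enum sketch_pool_type { SKETCH_NON_PAGED_POOL = 0, SKETCH_PAGED_POOL = 1 };

/* Odd values denote paged pool types, even values nonpaged pool types. */
static bool sketch_is_paged(unsigned pool_type)
{
    return (pool_type & 1u) != 0;
}

/* Requests above the limit are served as whole, page-aligned pages
 * (assumption: a request of exactly the limit still counts as small). */
static bool sketch_needs_whole_pages(size_t bytes)
{
    return bytes > SMALL_POOL_LIMIT;
}

/* Number of pages a page-sized request occupies. */
static size_t sketch_pages_for(size_t bytes)
{
    assert(bytes > 0);
    return (bytes + PAGE_SIZE - 1) / PAGE_SIZE;
}
```

For instance, `sketch_needs_whole_pages(0xB18)` is false, so such a request would be carved out of a single page with 8-byte alignment.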
Windows reserves separate virtual address ranges for the nonpaged pool and the paged pool. Kernel variables such as MmPagedPoolStart (0xE1000000) define the range of the paged pool. The nonpaged pool consists of two regions: the regular region, which begins at MmNonPagedPoolStart, and an expansion region, which is bounded by MmNonPagedPoolExpansionStart and MmNonPagedPoolEnd. Driver developers can therefore easily tell whether an address belongs to the nonpaged or the paged pool by checking which of these ranges it falls in.
Windows describes each of the paged and nonpaged pools with a POOL_DESCRIPTOR. (Windows actually maintains further POOL_DESCRIPTORs, such as the one for session space, which this article does not discuss.) It is defined as follows:
   +0x000 PoolType         : _POOL_TYPE
   +0x004 PoolIndex        : Uint4B
   +0x008 RunningAllocs    : Uint4B
   +0x00c RunningDeAllocs  : Uint4B
   +0x010 TotalPages       : Uint4B
   +0x014 TotalBigPages    : Uint4B
   +0x018 Threshold        : Uint4B
   +0x01c LockAddress      : Ptr32 Void
   +0x020 PendingFrees     : Ptr32 Void
   +0x024 PendingFreeDepth : Int4B
   +0x028 ListHeads        : [512] _LIST_ENTRY
The POOL_DESCRIPTORs for the nonpaged and paged pools are organized in the system variable PoolVector, an array; PoolVector[PoolType & 1] yields the descriptor for a given pool type. Note the array of 512 doubly linked lists at the end of POOL_DESCRIPTOR. It serves allocations smaller than one page: because such blocks are 8-byte aligned and always lie within a single page, 512 lists suffice, one for each block size from 8 to 4096 bytes in 8-byte steps. Each list links together the free blocks of that size on pages already allocated to the pool. A contiguous free region within a page is always kept on the list for the largest size that fits: a free 16-byte region, for example, is placed on ListHeads[1] as one node rather than on ListHeads[0] as two. With these lists the system can manage small allocations easily. Requests larger than PAGE_SIZE may require allocating new system PTEs, which is described after the discussion of allocations under 4 KB.
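As a rough illustration of the descriptor lookup and the 8-byte granularity, here is a minimal user-mode sketch. The structure is a trimmed-down, hypothetical stand-in for POOL_DESCRIPTOR, `sketch_pool_vector` merely imitates PoolVector, and none of the `sketch_` names come from Windows.

```c
#include <assert.h>
#include <stddef.h>

#define LIST_COUNT  512   /* ListHeads[512], one list per 8-byte size class */
#define GRANULARITY 8u    /* small blocks are 8-byte aligned */

struct sketch_list_entry { struct sketch_list_entry *flink, *blink; };

/* Hypothetical, trimmed-down stand-in for POOL_DESCRIPTOR. */
struct sketch_pool_descriptor {
    unsigned pool_type;
    struct sketch_list_entry list_heads[LIST_COUNT];
};

static struct sketch_pool_descriptor sketch_nonpaged = { .pool_type = 0 };
static struct sketch_pool_descriptor sketch_paged    = { .pool_type = 1 };

/* Stand-in for PoolVector: [0] = nonpaged, [1] = paged. */
static struct sketch_pool_descriptor *sketch_pool_vector[2] = {
    &sketch_nonpaged, &sketch_paged
};

/* Mirrors PoolVector[PoolType & 1]: only the parity of the type matters. */
static struct sketch_pool_descriptor *sketch_descriptor_for(unsigned pool_type)
{
    return sketch_pool_vector[pool_type & 1u];
}

/* Round a size up to the next multiple of 8 bytes. */
static size_t sketch_round_up(size_t bytes)
{
    return (bytes + GRANULARITY - 1) & ~(size_t)(GRANULARITY - 1);
}

/* Size of a sub-page block expressed in 8-byte units (1..512). */
static size_t sketch_block_units(size_t bytes)
{
    assert(bytes > 0 && bytes <= 0x1000);
    return sketch_round_up(bytes) / GRANULARITY;
}
```

Because only the low bit selects the descriptor, every even pool type maps to the nonpaged descriptor and every odd one to the paged descriptor.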
Before discussing these small allocations, we need to understand lookaside lists. A lookaside list differs from the general pool in that it serves only blocks of one fixed size. ExInitializeNPagedLookasideList and ExInitializePagedLookasideList initialize a lookaside list; for the routines that allocate from one, see the DDK documentation. Allocation from a lookaside list is much faster than ordinary pool allocation because it does not need spin lock synchronization and every block has the same size. Put another way, when initializing a lookaside list with ExInitializeNPagedLookasideList we can supply Allocate and Free routines that obtain larger regions of memory (ExAllocatePool and ExFreePool are used by default), from which the small allocations are then served. For speed, the Windows executive creates lookaside lists for NonPagedPool and PagedPool at 8-byte intervals from 8 to 256 bytes. They are located in the KPRCB (from my analysis of ExAllocatePoolWithTag, at offsets 0x598 and 0x698 respectively in Windows XP build 2600). For requests smaller than 0x20 * 8 bytes, that is, 256 bytes, ExAllocatePool allocates directly from these lookaside lists. The memory manager periodically adjusts the depth of the lookaside lists through KiAdjustLookasideDepth.
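To make the idea of a fixed-size cache concrete, the following is a minimal user-mode sketch of a lookaside-style list. It is not the kernel implementation: `malloc` and `free` stand in for the backing Allocate and Free routines, all `sketch_` names are hypothetical, and the mapping from request size to list index (including whether a request of exactly 256 bytes is covered) is an assumption based on the "every 8 bytes from 8 to 256" description.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct sketch_lookaside {
    size_t   block_size;  /* every block served by this list has this size */
    unsigned depth;       /* blocks currently cached */
    unsigned max_depth;   /* cache limit; the kernel tunes its equivalent over time */
    void    *free_head;   /* singly linked stack of cached blocks */
};

static void sketch_lookaside_init(struct sketch_lookaside *l,
                                  size_t block_size, unsigned max_depth)
{
    assert(block_size >= sizeof(void *)); /* a free block stores the next pointer */
    l->block_size = block_size;
    l->depth = 0;
    l->max_depth = max_depth;
    l->free_head = NULL;
}

/* Pop a cached block if there is one, otherwise fall back to the allocator. */
static void *sketch_lookaside_alloc(struct sketch_lookaside *l)
{
    if (l->free_head != NULL) {
        void *block = l->free_head;
        l->free_head = *(void **)block;
        l->depth--;
        return block;
    }
    return malloc(l->block_size);
}

/* Cache the block if there is room, otherwise return it to the allocator. */
static void sketch_lookaside_free(struct sketch_lookaside *l, void *block)
{
    if (l->depth < l->max_depth) {
        *(void **)block = l->free_head;
        l->free_head = block;
        l->depth++;
    } else {
        free(block);
    }
}

/* Assumed mapping of a small request to one of 32 lists (8, 16, ..., 256 bytes);
 * returns -1 when the request is outside that range. */
static int sketch_lookaside_index(size_t bytes)
{
    if (bytes == 0 || bytes > 256)
        return -1;
    return (int)((bytes + 7) / 8) - 1;
}
```

The fast path touches only the head pointer and a counter, which is the reason a fixed-size cache can beat a general-purpose allocator that must search and split free lists.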
Now suppose we allocate a block between 0x100 (256 bytes) and 0x1000 (4 KB, PAGE_SIZE) bytes, say 0xB18 bytes. The system actually allocates 0xB20 bytes; the extra 8 bytes hold the POOL_HEADER structure, which is used for pool management. The structure of POOL_HEADER is as follows:
   +0x000 PreviousSize     : Pos 0, 9 Bits
   +0x000 PoolIndex        : Pos 9, 7 Bits
   +0x002 BlockSize        : Pos 0, 9 Bits
   +0x002 PoolType         : Pos 9, 7 Bits
   +0x000 Ulong1           : Uint4B
   +0x004 ProcessBilled    : Ptr32 _EPROCESS
   +0x004 PoolTag          : Uint4B
   +0x004 AllocatorBackTraceIndex : Uint2B
   +0x006 PoolTagHash      : Uint2B
For an allocation in this range, the system first finds the corresponding POOL_DESCRIPTOR in PoolVector based on the POOL_TYPE. It then examines the doubly linked list PoolDescriptor.ListHeads[0x164], because 0xB20 / 8 is 0x164. If this list is not empty, the system removes one node from it, which represents a free block of 0xB20 bytes. If the list is empty, the system moves on to the following lists, starting at 0x165, looking for a free block of between 0xB28 and 0x1000 bytes. After the request has been carved out of such a block, the system inserts the remaining space into the list for its size (merging adjacent fragments must be considered here), where it waits for a later allocation. If no list can satisfy the request, the system calls MiAllocatePoolPages to allocate a whole page for the pool.
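The search described above can be modeled with a deliberately simplified sketch. Instead of real doubly linked lists it keeps only a count of free blocks per size class, it indexes the classes directly by block size in 8-byte units (following the 0xB20 / 8 = 0x164 example), and it ignores the merging of adjacent fragments. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define UNIT         8u    /* allocation granularity in bytes */
#define HEADER_BYTES 8u    /* size of POOL_HEADER as given in the text */
#define LISTS        513   /* indices 1..512 used: blocks of 8..4096 bytes */

/* Simplified model: free_blocks[i] counts free blocks of i * 8 bytes. */
struct sketch_pool {
    unsigned free_blocks[LISTS];
};

/* Size class of a request including its header, e.g. 0xB18 -> 0xB20 / 8 = 0x164. */
static size_t sketch_list_index(size_t request_bytes)
{
    size_t total = (request_bytes + HEADER_BYTES + UNIT - 1) & ~(size_t)(UNIT - 1);
    return total / UNIT;
}

/* Take a block for the request. Returns the size class of the block that was
 * used, or 0 when no free block fits and whole pages would have to be
 * requested instead. Any remainder goes back on the list for its size. */
static size_t sketch_take_block(struct sketch_pool *pool, size_t request_bytes)
{
    size_t want = sketch_list_index(request_bytes);
    assert(want >= 1 && want < LISTS);

    for (size_t i = want; i < LISTS; i++) {
        if (pool->free_blocks[i] == 0)
            continue;                       /* empty list, try the next larger one */
        pool->free_blocks[i]--;
        if (i > want)
            pool->free_blocks[i - want]++;  /* split: keep the leftover free */
        return i;
    }
    return 0;
}
```

Starting from a single free 4096-byte block, a request of 0xB18 bytes consumes 0x164 units and leaves one free block of 0x9C units (0x4E0 bytes) for later requests.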
When MiAllocatePoolPages runs, it first determines whether the request is for the paged or the nonpaged pool. For the nonpaged pool, it first searches MmNonPagedPoolFreeListHead. Like the ListHeads of POOL_DESCRIPTOR, MmNonPagedPoolFreeListHead is an array of doubly linked lists, in this case with four elements, which hold the free runs of nonpaged pool that are 1 to 4 pages long. (Free runs of 4 pages or more are all kept on the fourth list, which requires an additional size check that is not described here.) These lists are maintained by the memory manager. MiAllocatePoolPages first tries to obtain free pages from them. If it succeeds, it updates the PFN database; if not, it must reserve system PTEs, which is discussed later. The paged pool is handled differently. It involves another important system structure, MM_PAGED_POOL_INFO, which is defined as follows:
   +0x000 PagedPoolAllocationMap : Ptr32 _RTL_BITMAP
   +0x004 EndOfPagedPoolBitmap : Ptr32 _RTL_BITMAP
   +0x008 PagedPoolLargeSessionAllocationMap : Ptr32 _RTL_BITMAP
   +0x00c FirstPteForPagedPool : Ptr32 _MMPTE
   +0x010 LastPteForPagedPool : Ptr32 _MMPTE
   +0x014 NextPdeForPagedPoolExpansion : Ptr32 _MMPTE
   +0x018 PagedPoolHint    : Uint4B
   +0x01c PagedPoolCommit  : Uint4B
   +0x020 AllocatedPagedPool : Uint4B
FirstPteForPagedPool is the address of the first PTE of the paged pool's virtual address range. It is normally the PTE of MmPagedPoolStart (0xE1000000), that is, ((0xE1000000 >> 10) & 0x3FFFFC) - 0x40000000 = 0xC0384000 in 32-bit arithmetic. LastPteForPagedPool is obtained just as easily from the end address of the paged pool. MiAllocatePoolPages allocates pages through RTL_BITMAPs such as PagedPoolAllocationMap. I described RTL_BITMAP in detail in "Windows 2000/XP pagefile organization management", where I also noted that RtlFindSetBitsAndClear and RtlSetBits are used to locate free bits; MiAllocatePoolPages uses the same approach.
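The PTE address arithmetic can be checked with a small sketch. It assumes the non-PAE x86 layout implied by the text, with 4-byte PTEs mapped at 0xC0000000; adding 0xC0000000 is the same as subtracting 0x40000000 in 32-bit arithmetic. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PTE_BASE 0xC0000000u   /* start of the PTE mapping, per the text */

/* Address of the PTE that maps a virtual address: one 4-byte PTE per 4 KB page,
 * so the page number (va >> 12) times 4 becomes (va >> 10) with the low two
 * bits masked off. */
static uint32_t sketch_pte_address(uint32_t va)
{
    return ((va >> 10) & 0x3FFFFCu) + SKETCH_PTE_BASE;
}

/* Inverse mapping: the page-aligned virtual address a PTE describes. */
static uint32_t sketch_va_from_pte(uint32_t pte_address)
{
    assert(pte_address >= SKETCH_PTE_BASE &&
           pte_address < SKETCH_PTE_BASE + 0x400000u);
    return (pte_address - SKETCH_PTE_BASE) << 10;
}
```

With these definitions, `sketch_pte_address(0xE1000000)` evaluates to 0xC0384000, matching the value given for FirstPteForPagedPool.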
Next, consider allocations larger than PAGE_SIZE (0x1000 bytes; in practice larger than 0xFF0 bytes, because the pool block header occupies 16 bytes). These go through MiAllocatePoolPages exactly as described above. For the paged pool, the system sets up the PTEs during initialization so that they point into pagefile.sys. Although pagefile.sys can grow automatically, growth only involves initializing additional MMPTE_SOFTWARE entries (PTEs that point into the page file; see "Windows 2000/XP pagefile organization management" for details), and the rest is just RTL_BITMAP bit manipulation. Faults raised when these pages are accessed are handled by int 0xE (on x86). I will therefore concentrate on the nonpaged pool.
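The bitmap bookkeeping mentioned above amounts to finding a run of free bits and marking it as used. The sketch below is a plain C model of that idea, not the Rtl implementation. It adopts the convention that a set bit means a free page, following the RtlFindSetBitsAndClear name used in the text; the polarity of the real paged pool bitmap is not confirmed here, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read bit i of a bitmap stored as 32-bit words, least significant bit first. */
static int sketch_test_bit(const uint32_t *map, size_t i)
{
    return (int)((map[i / 32] >> (i % 32)) & 1u);
}

/* Clear bit i. */
static void sketch_clear_bit(uint32_t *map, size_t i)
{
    map[i / 32] &= ~(UINT32_C(1) << (i % 32));
}

/* Find the first run of `count` consecutive set bits among the first `nbits`
 * bits, clear the run, and return its starting index, or -1 if no run exists. */
static long sketch_find_set_bits_and_clear(uint32_t *map, size_t nbits, size_t count)
{
    size_t run = 0;

    assert(count > 0);
    for (size_t i = 0; i < nbits; i++) {
        run = sketch_test_bit(map, i) ? run + 1 : 0;
        if (run == count) {
            size_t start = i + 1 - count;
            for (size_t j = start; j <= i; j++)
                sketch_clear_bit(map, j);
            return (long)start;
        }
    }
    return -1;
}
```

The returned bit index corresponds to a page offset within the pool, so multiplying it by the page size and adding the pool's start address would give the virtual address of the allocation.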
For the nonpaged pool, we have already seen how pages are taken directly from MmNonPagedPoolFreeListHead. If no pages can be obtained from those lists, the system must call MiReserveSystemPtes to request system PTEs, which then calls MiReserveAlignedSystemPtes. Once the system PTEs have been reserved (the number of pages is derived from the allocation size), MiChargeCommitmentCantExpand must also be called, and after that the PFN database is updated.
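The decision between the free lists and the system PTE path can be modeled as follows. This is a rough sketch under several assumptions: the four lists are reduced to counters, larger lists are searched when the exact one is empty, any run on the last list is treated as large enough (the real code performs an extra size check), and the leftover of a larger run is not tracked. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define FREE_LISTS 4   /* four lists: runs of 1, 2, 3, and 4-or-more pages */

/* Simplified state: a count of free runs per list, plus a counter that
 * records how often the system PTE path had to be taken. */
struct sketch_nonpaged_state {
    unsigned free_runs[FREE_LISTS];
    unsigned pte_reservations;
};

enum sketch_source { SKETCH_FROM_FREE_LIST, SKETCH_FROM_SYSTEM_PTES };

/* List index for a run of `pages` pages; runs of 4 or more share the last list. */
static unsigned sketch_free_list_index(size_t pages)
{
    assert(pages >= 1);
    return pages >= FREE_LISTS ? FREE_LISTS - 1 : (unsigned)(pages - 1);
}

/* Try the free lists first; fall back to reserving system PTEs. */
static enum sketch_source sketch_allocate_pages(struct sketch_nonpaged_state *s,
                                                size_t pages)
{
    for (unsigned i = sketch_free_list_index(pages); i < FREE_LISTS; i++) {
        if (s->free_runs[i] > 0) {
            s->free_runs[i]--;          /* assumed: the run is large enough */
            return SKETCH_FROM_FREE_LIST;
        }
    }
    s->pte_reservations++;              /* stands in for the system PTE path */
    return SKETCH_FROM_SYSTEM_PTES;
}
```

In this model a one-page request is satisfied from a free two-page run when no one-page run exists, and only an entirely empty set of lists triggers a PTE reservation.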
MiReserveAlignedSystemPtes reserves system PTEs, which in effect means allocating system virtual address space. It uses MmFirstFreeSystemPte to locate a free virtual address for the specified pool (for example, 0xFB2B6000) and computes the address of that address's PTE relative to MmSystemPteBase (in Windows XP, MmSystemPteBase is 0xC0000000, so the result is 0xC03ECAD8). The number of free pages is stored at this PTE address (I do not know why Microsoft stores the count there), and the virtual addresses are handed out from the end of the free range. Because allocation proceeds from the end, MmFirstFreeSystemPte does not need to be updated: only the free page count stored at the PTE is reduced, and its location stays the same.
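The effect of allocating from the end of a free range can be shown with a short sketch. The structure below is a hypothetical stand-in for the bookkeeping kept at the free PTE, and the count of 16 free pages used in the example is invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 0x1000u

/* Hypothetical stand-in for one free range of system PTEs: the range starts
 * at base_va and currently contains free_pages free pages. */
struct sketch_free_range {
    uint32_t base_va;
    uint32_t free_pages;
};

/* Reserve `pages` pages from the END of the range. Only the count shrinks;
 * base_va (the analogue of MmFirstFreeSystemPte) never changes.
 * Returns the first reserved virtual address, or 0 if the range is too small. */
static uint32_t sketch_reserve_from_end(struct sketch_free_range *r, uint32_t pages)
{
    assert(r != NULL);
    if (pages == 0 || pages > r->free_pages)
        return 0;
    r->free_pages -= pages;
    return r->base_va + r->free_pages * PAGE_SIZE;
}
```

For a range at 0xFB2B6000 with 16 free pages, reserving 2 pages returns 0xFB2C4000 and leaves 14 free pages at the same base address, which is why the head of the free list does not have to move.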
Once the virtual address has been obtained, physical pages must be allocated to back it. This is usually done through MiChargeCommitmentCantExpand, which calls MiRemoveAnyPage if necessary and fills in the system PTEs that were just reserved, completing the allocation.
This covers most of what happens in a call to ExAllocatePoolWithTag (ExAllocatePool simply calls ExAllocatePoolWithTag with the tag 'None'). Releasing pool memory, that is, the ExFreePool path, is not hard to analyze with this knowledge. ExAllocatePoolWithTag is considerably more complex than other routines. It must take into account the many attributes encoded in POOL_TYPE, such as aligned or must-succeed. For the sake of debugging tools such as Driver Verifier and poolmon, it must also handle pool tracking (ExpInsertPoolTracker) and the special pool. I only wanted to outline the basic process, which may leave many points hard to follow, since it draws on a good deal of what I have written previously about the PFN database, PTEs, and so on. What is more, I have not mastered this material well enough to explain it clearly to everyone. This article is offered merely as a reference.