64-bit system detours

Source: Internet
Author: User

I assume the reader is already very familiar with Detours. This article is only meant to deepen that understanding and to show how to implement an x64 hook, so I will not go over the basic Detours principle again.

X86 kernel hook
Earlier this year I ported Detours 1.5 to the x86 kernel, and it worked well. I used it to hook some internal system functions, and sometimes exported functions such as IoCreateFile. Making Detours 1.5 stable in the kernel is not hard. The C/C++ side can be a little awkward, but that is quickly solved. The one thing to note is that Detours 1.5 uses VirtualProtect to make memory read/write/execute. In the kernel there are two ways to do this. The first is to modify CR0, which is the popular choice. The second is to call the native API from kernel mode to do what VirtualProtect does.
The Detours method has some clear advantages over import/export table hooking. The biggest is that it can hook internal functions. Because it patches the function body directly, the hook is hard to bypass no matter what tricks the caller plays.
The main disadvantages of Detours are as follows:
1. Detours x86 cannot hook functions smaller than 5 bytes.
2. Detours x86 needs a complete disassembler and interpreter, which the Detours code does not actually include. So if you want to write a function that defeats other people's hooks, you can write it like this:
Proc near
xor eax, eax
jeax 1
int 3
... // Do something
Proc end
Note the jump here (jeax 1 stands for a jump that is taken when eax is zero and skips one byte). Because eax is always 0, the int 3 is never executed in the original function. The copy of this code that Detours places in the trampoline, however, will probably run into the int 3. To keep the relocated code away from the int 3, Detours would have to understand what the first three lines mean and rewrite jeax 1 as jeax 1 + (function - trampoline). Similar techniques can also be used to fool Detours.
3. Detours x86 cannot handle functions like the following:
Proc near
Flag: ... // The first five bytes of the function
... // Do something
jmp Flag
... // Do something
Proc end
This function has a JMP in its body that jumps back into the first five bytes. After Detours patches the function, those five bytes have been overwritten with JMP trampoline. To handle this case, Detours would have to disassemble the entire function body and fix up jmp Flag in the way described in item 2.
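
The two fix-ups discussed in items 2 and 3 can be sketched in a few lines of C. This is only an illustration under stated assumptions: the function names relocate_disp and target_in_patched_bytes are hypothetical, and the instruction length and displacement are assumed to come from a disassembler that is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Recompute the displacement of a relative branch that is copied from
 * old_addr to new_addr but must still reach the same absolute target.
 * insn_len is the length of the branch instruction itself.
 * The result equals old_disp + (old_addr - new_addr). */
int64_t relocate_disp(uint64_t old_addr, uint64_t new_addr,
                      unsigned insn_len, int64_t old_disp)
{
    assert(insn_len > 0);
    uint64_t target = old_addr + insn_len + (uint64_t)old_disp;
    return (int64_t)(target - (new_addr + insn_len));
}

/* True if a branch target lands inside the bytes the hook overwrote
 * (the first patch_len bytes of the function), as in the "jmp Flag" case. */
bool target_in_patched_bytes(uint64_t func, unsigned patch_len,
                             uint64_t target)
{
    return target >= func && target < func + patch_len;
}
```

A hook engine would call relocate_disp for every relative branch it copies into the trampoline, and would refuse to hook (or redirect the branch) whenever target_in_patched_bytes reports a jump back into the overwritten prologue.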

To sum up, the Detours idea is good, but it has defects, and fixing them requires a complete disassembler.

X64 kernel hook
Recently I needed to implement a similar hook module on x64. I found Detours 2.1 and sent an email to MS. The reply was that the 64-bit version of Detours 2.1 costs 10,000 USD.
So I deleted the email and started doing it myself. Below is the general idea, with the principles and the things to watch out for.

x64 hooks and x86 hooks are similar in principle: both modify the beginning of the original function. The difference is that x64 has no JMP 64_address instruction. On x64 it has to be JMP [64_address], and the machine code is no longer E9 XXXXXXXX but FF 25 [XXXXXXXX], where the memory at XXXXXXXX holds the 64-bit address. Note that XXXXXXXX is still a 32-bit offset, so that memory must be in the same 4 GB range as the function (strictly, within ±2 GB, because the offset is signed).
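
As a rough illustration of the two encodings, the following C sketch writes each form into a byte buffer. The helper names emit_jmp_rel32 and emit_jmp_abs64 are hypothetical. The absolute form uses a zero offset, so the 8-byte address slot sits immediately after the 6-byte instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Short form: E9 rel32. rel32 is counted from the end of the 5-byte
 * instruction, so the target must be within a signed 32-bit distance. */
size_t emit_jmp_rel32(uint8_t *buf, uint64_t from, uint64_t to)
{
    int64_t rel = (int64_t)(to - (from + 5));
    assert(rel >= INT32_MIN && rel <= INT32_MAX);
    uint32_t r = (uint32_t)(int32_t)rel;
    buf[0] = 0xE9;
    for (int i = 0; i < 4; i++)
        buf[1 + i] = (uint8_t)(r >> (8 * i));   /* little-endian rel32 */
    return 5;
}

/* Long form: FF 25 00000000 followed by the 8-byte absolute target.
 * The zero offset means "read the address stored right after me". */
size_t emit_jmp_abs64(uint8_t *buf, uint64_t to)
{
    buf[0] = 0xFF;
    buf[1] = 0x25;
    for (int i = 0; i < 4; i++)
        buf[2 + i] = 0;
    for (int i = 0; i < 8; i++)
        buf[6 + i] = (uint8_t)(to >> (8 * i));  /* little-endian address */
    return 14;
}
```

The short form costs 5 bytes and the long form 6 + 8 = 14 bytes, which is where the 14-byte figure used in the rest of the article comes from.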

This limit is not a big problem for ordinary compiled code, because few EXE files exceed 4 GB, so the code generated by the compiler still uses E9 XXXXXXXX. Calls into an imported DLL usually go through CALL [XXXXXXXX], just as before. The difference is that [XXXXXXXX] used to hold a 32-bit address and now holds a 64-bit address. This way it does not matter if the DLL and the EXE are not loaded in the same 4 GB range.

For Detours, the consequence of all this is as follows. The trampoline is usually located in heap memory or nonpaged pool, new_function is located in the DLL or driver we write, and old_function is located in the module we want to hook. There is a basic conflict here: new_function and old_function are usually in two different DLL or .sys files, and the system may load them very far apart, i.e. abs(new_function - old_function) > 4G. In that case E9 XXXXXXXX cannot be used. FF 25 [XXXXXXXX] must be used instead, and because XXXXXXXX is a 32-bit offset, the address slot [XXXXXXXX] cannot be located in our DLL/sys.
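
The distance check can be written down directly. The sketch below is an assumption-laden illustration (the names rel32_reachable and patch_size are hypothetical). It uses the exact reach of E9 rel32, which is a signed 32-bit displacement measured from the end of the 5-byte instruction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Can a 5-byte "E9 rel32" placed at 'from' reach 'to'? */
bool rel32_reachable(uint64_t from, uint64_t to)
{
    int64_t rel = (int64_t)(to - (from + 5));
    return rel >= INT32_MIN && rel <= INT32_MAX;
}

/* Number of bytes that must be overwritten at the entry of old_function:
 * 5 for E9 rel32, 14 for FF 25 00000000 plus an 8-byte address. */
unsigned patch_size(uint64_t old_function, uint64_t new_function)
{
    assert(old_function != new_function);
    return rel32_reachable(old_function, new_function) ? 5u : 14u;
}
```

With two modules loaded far apart, patch_size returns 14, which is the case the algorithm in this section has to deal with.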

Based on the above analysis, the following algorithm can be obtained:
1. Find the address of the function to hook.
2. Disassemble at least 6 + 8 = 14 bytes of code starting from the start address of the function. An instruction must not be split. These two steps are the same as in Detours x86. The difference is that Detours x86 uses E9 XXXXXXXX, so only 5 bytes are needed, whereas we must use FF 25 [XXXXXXXX]. If the function body is smaller than 14 bytes, the function cannot be detoured. However, if the function is smaller than 14 bytes because it immediately executes a CALL or JMP, decode that instruction, set the start address of the function to the jump target, and repeat the process.
3. Copy these 14 (or 15, 16, ...) bytes to pre-allocated memory. We call it the trampoline.
4. Change the first six bytes of the function to FF 25 [0], that is, FF 25 00000000.
5. Store the start address of new_function in the following 8 bytes.
6. Fix up the copied code in the trampoline. If it contains jump instructions such as JMP or CALL, adjust their offsets. If a jump now crosses the 4 GB limit, rewrite it using the method above, in which case the number of trampoline bytes may increase.
7. Append FF 25 [0] after the trampoline code, and put old_function + 14 in the following 8 bytes.
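
Steps 3, 4, 5 and 7 can be sketched on plain byte buffers as follows. This is a minimal illustration, not a working hook engine: install_hook14 and put_abs_jmp are hypothetical names, copy_len is assumed to come from a disassembler (step 2), step 6 is left out entirely, and real code would also have to make the target memory writable first.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { ABS_JMP_LEN = 14 };

/* FF 25 00000000 followed by the 8-byte little-endian target. */
static void put_abs_jmp(uint8_t *p, uint64_t target)
{
    p[0] = 0xFF;
    p[1] = 0x25;
    p[2] = p[3] = p[4] = p[5] = 0;
    for (int i = 0; i < 8; i++)
        p[6 + i] = (uint8_t)(target >> (8 * i));
}

/* code      : the bytes of old_function (a writable copy, for illustration)
 * code_addr : the address those bytes are assumed to live at
 * copy_len  : >= 14 and ending on an instruction boundary
 * tramp     : buffer of at least copy_len + 14 bytes
 * Returns the number of trampoline bytes used. Step 6 (fixing relative
 * branches inside the copied bytes) is deliberately not done here. */
size_t install_hook14(uint8_t *code, uint64_t code_addr, size_t copy_len,
                      uint8_t *tramp, uint64_t new_function)
{
    assert(copy_len >= ABS_JMP_LEN);
    memcpy(tramp, code, copy_len);                        /* step 3 */
    put_abs_jmp(tramp + copy_len, code_addr + copy_len);  /* step 7 */
    put_abs_jmp(code, new_function);                      /* steps 4 and 5 */
    return copy_len + ABS_JMP_LEN;
}
```

The trampoline is filled in before the function entry is overwritten, so the original bytes are saved before they are destroyed.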

The trampoline can be a pre-allocated 100-byte buffer that is filled entirely with NOPs during initialization. In step 7, write FF, 25, 00, 00, 00, 00, 64_bit_old_function + 14 (15, 16, ...) at position 100 - 14, i.e. in the last 14 bytes of the buffer.
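
A sketch of this fixed-size layout, again on a plain byte buffer. The name build_trampoline and the constants are hypothetical. The copied prologue sits at the front, the jump back sits in the last 14 bytes, and the NOP padding in between simply falls through to it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { TRAMP_SIZE = 100, ABS_JMP_LEN = 14, NOP = 0x90 };

/* prologue    : the bytes copied from the start of old_function
 * copy_len    : how many bytes were copied (14, 15, 16, ...)
 * resume_addr : old_function + copy_len, where execution continues */
void build_trampoline(uint8_t tramp[TRAMP_SIZE], const uint8_t *prologue,
                      size_t copy_len, uint64_t resume_addr)
{
    assert(copy_len <= TRAMP_SIZE - ABS_JMP_LEN);
    memset(tramp, NOP, TRAMP_SIZE);       /* everything starts as NOP */
    memcpy(tramp, prologue, copy_len);    /* original code at the front */

    uint8_t *tail = tramp + TRAMP_SIZE - ABS_JMP_LEN;   /* offset 86 */
    tail[0] = 0xFF;
    tail[1] = 0x25;
    tail[2] = tail[3] = tail[4] = tail[5] = 0;
    for (int i = 0; i < 8; i++)
        tail[6 + i] = (uint8_t)(resume_addr >> (8 * i));
}
```

Keeping the jump at a fixed offset means the trampoline can grow during fix-ups without the tail having to move, at the cost of executing some NOPs on every call to the original function.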

The disadvantages of the above algorithm are the same as those of x86 Detours. First of all, it cannot hook a function smaller than 14 bytes.

14 bytes is quite a lot, and sometimes this limitation is intolerable. For those cases, here is a dirtier approach.

When code is loaded into memory, there is usually a lot of wasted space, that is, regions that contain only NOPs or that will never be executed. IDA can be used to find these regions. If you can find one that is large enough to hold a 64-bit address and close enough to the function, you only need to overwrite the start of the function with JMP [XXXXXXXX] (6 bytes: FF 25 plus the 32-bit offset of that region). Likewise, only those bytes need to be copied to the trampoline. The last 14 bytes of the trampoline stay the same.

The above is the detours process under x64.

There is one thing to be aware of on x64: VC8 does not support the __asm keyword for x64, so
__asm {
cli
mov eax, cr0
and eax, not 10000h   ; clear CR0.WP (bit 16)
mov cr0, eax }
can no longer be used. Use this instead:
_disable();
UINT64 cr0 = __readcr0();
cr0 &= 0xfffffffffffeffff;   // clear CR0.WP (bit 16)
__writecr0(cr0);
Of course, the native API can still be used, but the method above is concise and popular. For intrinsics such as _disable, refer to the latest version of MSDN.
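
The mask arithmetic can be checked in ordinary user-mode C without touching the real register. In the sketch below, CR0_WP, cr0_without_wp and cr0_with_wp are hypothetical names. In a driver, the input would come from __readcr0() and the result would go to __writecr0().

```c
#include <assert.h>
#include <stdint.h>

/* Write Protect is bit 16 of CR0. While it is set, ring 0 cannot write
 * to read-only pages, which is why it is cleared before patching code. */
#define CR0_WP (1ULL << 16)

/* Value to write to CR0 before patching. */
uint64_t cr0_without_wp(uint64_t cr0)
{
    return cr0 & ~CR0_WP;
}

/* Value to write back once the patch is in place. */
uint64_t cr0_with_wp(uint64_t cr0)
{
    return cr0 | CR0_WP;
}
```

Here ~CR0_WP is 0xFFFFFFFFFFFEFFFF, the same constant as in the intrinsic version. After patching, the bit should be set again and interrupts re-enabled, for example with the _enable() intrinsic.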

I don't know anything about IA64.

By the way:
1. When an EM64T CPU is running a Win32 OS, VMware cannot install or run a Win64 OS on it. On an AMD64 CPU, however, even if a Win32 OS is installed, you can still install and run a Win64 OS in VMware on it.
2. SoftICE has stopped development and does not support x64. It is supported only in virtual mode. Since it has been discontinued, I recommend using WinDbg.
3. IDA Pro 5.0 disassembles x64 code with a great many errors, which is quite troublesome. You basically have to press U and then C first.

Because the 14-byte limit is so large, it kept bothering me. Then I came up with a solution.

Let the original function be old_func and the new function be new_func. By some technical means, we make sure that the memory we allocate is in the same 4 GB range as old_func. This can be done with VirtualAlloc. Specifically, call VirtualAlloc repeatedly, changing the first parameter each time, until the return value is not NULL.
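
The probing loop might look like the sketch below. To keep it portable, the actual allocation is hidden behind a callback. On Windows that callback would presumably wrap VirtualAlloc with the candidate address as its first parameter. The names alloc_near, alloc_at_fn, fake_region and fake_alloc_at are all hypothetical, and fake_alloc_at is only a stand-in that pretends exactly one address is free.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Tries to allocate 'size' bytes exactly at 'addr'.
 * Returns the address on success, 0 on failure. */
typedef uint64_t (*alloc_at_fn)(uint64_t addr, size_t size, void *ctx);

/* Probe addresses around near_addr, alternating below and above it and
 * moving outwards by 'step' each round, until the callback succeeds or
 * max_dist is exceeded. 'step' must be a power of two. */
uint64_t alloc_near(uint64_t near_addr, size_t size, uint64_t step,
                    uint64_t max_dist, alloc_at_fn alloc_at, void *ctx)
{
    assert(step != 0 && (step & (step - 1)) == 0);
    uint64_t base = near_addr & ~(step - 1);
    for (uint64_t d = step; d <= max_dist; d += step) {
        uint64_t p;
        if (base >= d && (p = alloc_at(base - d, size, ctx)) != 0)
            return p;
        if (base <= UINT64_MAX - d && (p = alloc_at(base + d, size, ctx)) != 0)
            return p;
    }
    return 0;
}

/* Demonstration stand-in for the OS allocator: only one address is free. */
struct fake_region { uint64_t free_addr; unsigned calls; };

uint64_t fake_alloc_at(uint64_t addr, size_t size, void *ctx)
{
    struct fake_region *r = ctx;
    (void)size;
    r->calls++;
    return addr == r->free_addr ? addr : 0;
}
```

A step of 0x10000 (the usual Windows allocation granularity) and a max_dist somewhat below 2 GB would be reasonable choices, so that an E9 rel32 placed at old_func can always reach the returned block.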

In this way, the logic of detours is changed:

1. First copy the first five bytes of old_func to trampoline + 14, and then overwrite them with a JMP offset, that is, E9 (trampoline - old_func - 5).
2. The first 6 bytes of the trampoline are FF 25 [0], and the next 8 bytes are the address of new_func.
3. The five bytes at trampoline + 14 + 5 are a JMP back into the original function: E9 ((old_func + 5) - (trampoline + 14 + 5 + 5)).

In this way, when old_func is called, the JMP offset at its entry first transfers control to the trampoline, and the JMP at the start of the trampoline then transfers control to new_func. When new_func wants to call the original function, it calls trampoline + 14, which executes the original first five bytes and then jumps back into the rest of the original function body.
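
The resulting 24-byte layout can be sketched as follows. This is an illustration under assumptions: build_near_hook and put_rel_jmp are hypothetical names, the buffers stand in for real code memory, and the five copied bytes are assumed to be whole instructions without relative operands.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* E9 rel32 written at p, as if p were located at address 'at'. */
static void put_rel_jmp(uint8_t *p, uint64_t at, uint64_t target)
{
    int64_t rel = (int64_t)(target - (at + 5));
    assert(rel >= INT32_MIN && rel <= INT32_MAX);
    uint32_t r = (uint32_t)(int32_t)rel;
    p[0] = 0xE9;
    for (int i = 0; i < 4; i++)
        p[1 + i] = (uint8_t)(r >> (8 * i));
}

/* Trampoline layout (24 bytes):
 *   +0  FF 25 00000000        jmp [new_func slot]
 *   +6  8-byte address of new_func
 *   +14 original first 5 bytes of old_func
 *   +19 E9 rel32              jmp old_func + 5 */
void build_near_hook(uint8_t *old_code, uint64_t old_addr,
                     uint8_t *tramp, uint64_t tramp_addr, uint64_t new_func)
{
    tramp[0] = 0xFF;
    tramp[1] = 0x25;
    tramp[2] = tramp[3] = tramp[4] = tramp[5] = 0;
    for (int i = 0; i < 8; i++)
        tramp[6 + i] = (uint8_t)(new_func >> (8 * i));

    memcpy(tramp + 14, old_code, 5);                    /* saved prologue */
    put_rel_jmp(tramp + 19, tramp_addr + 19, old_addr + 5);
    put_rel_jmp(old_code, old_addr, tramp_addr);        /* patch the entry */
}
```

new_func would keep tramp_addr + 14 as its pointer to the original function. Only five bytes of old_func are overwritten, which is the whole point of allocating the trampoline close to it.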

So everything is perfect.

This article is from the CSDN blog. If you reproduce it, please cite the source: http://blog.csdn.net/zzw315/archive/2009/04/23/4102488.aspx
