Inline hooks on x64 Windows

Source: Internet
Author: User

I have written inline hooks for IA-32 before, so extending the technique to EM64T (x64) was the natural next step.

On x64, virtual addresses are 64 bits wide, but the addresses and immediates encoded in most instructions are still 32 bits; at execution time they are sign-extended to 64 bits.
Therefore you cannot simply place a near jump with a 32-bit relative offset at the start of the function: the distance between the source address and the target address is likely to exceed 2 GB, the range a signed 32-bit integer can express.
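To make the 2 GB limit concrete, here is a small C sketch (the function name is hypothetical, not from the original source) that checks whether a 5-byte jmp rel32 (opcode E9) placed at address src can reach dst. The displacement is measured from the end of the instruction and sign-extended, so the difference must fit in a signed 32-bit integer.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if a 5-byte "jmp rel32" (E9 xx xx xx xx) placed at src can reach dst.
   The displacement is relative to the end of the instruction (src + 5) and is
   sign-extended to 64 bits, so it only covers about +/-2 GB. */
bool rel32_reachable(uint64_t src, uint64_t dst)
{
    int64_t delta = (int64_t)(dst - (src + 5));
    return delta >= INT32_MIN && delta <= INT32_MAX;
}
```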
You can choose one of two sequences:
1: mov r64, targetaddr
   jmp r64
2: push targetaddrlow
   mov dword [rsp+4], targetaddrhigh
   ret
The first sequence is shorter (12 bytes) but overwrites a general-purpose register. The second relies only on the stack and takes 14 bytes.
The caller-saved (volatile) GPRs are RAX, RCX, RDX and R8-R11, so the first method can use one of these; because RCX, RDX, R8 and R9 hold the first four arguments on entry, RAX, R10 or R11 are the practical choices. I use the second method.
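The byte encodings of both sequences can be sketched in C as follows. This is an illustrative sketch: the function names are hypothetical, method 1 is shown with RAX, and it assumes the caller has already made the destination memory writable. Note that push imm32 pushes 8 bytes (the sign-extended immediate), so the low dword sits at [rsp] and the high dword is patched at [rsp+4].

```c
#include <stddef.h>
#include <stdint.h>

/* Store v as n little-endian bytes (x64 is little-endian). */
static void put_le(uint8_t *p, uint64_t v, int n)
{
    for (int i = 0; i < n; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

/* Method 1 (12 bytes): mov rax, imm64 ; jmp rax. Clobbers RAX. */
size_t emit_jmp_via_rax(uint8_t *buf, uint64_t target)
{
    buf[0] = 0x48; buf[1] = 0xB8;              /* REX.W + B8: mov rax, imm64 */
    put_le(buf + 2, target, 8);
    buf[10] = 0xFF; buf[11] = 0xE0;            /* FF /4: jmp rax */
    return 12;
}

/* Method 2 (14 bytes): push imm32 ; mov dword [rsp+4], imm32 ; ret.
   Only the stack is touched; ret pops the full 64-bit address. */
size_t emit_jmp_via_ret(uint8_t *buf, uint64_t target)
{
    buf[0] = 0x68;                             /* push imm32 (low dword) */
    put_le(buf + 1, target & 0xFFFFFFFFu, 4);
    buf[5] = 0xC7; buf[6] = 0x44;              /* mov dword [rsp+4], imm32 */
    buf[7] = 0x24; buf[8] = 0x04;              /* SIB = rsp, disp8 = 4 */
    put_le(buf + 9, target >> 32, 4);          /* high dword */
    buf[13] = 0xC3;                            /* ret */
    return 14;
}
```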

Once the jump is placed at the start of the target routine, every call to that routine is diverted to the hook routine. To call the original routine, the overwritten instructions must be executed first. There are two ways to do this:
1: Restore the original instructions at the start of the target routine, then call the routine.
2: Execute copies of the overwritten instructions somewhere else, then jump to the first instruction after the overwritten ones.

Obviously, the first method only works for non-reentrant routines; otherwise problems are likely. Locking does not solve this, because the hooked routine may have been designed to let multiple threads execute it at the same time, and a lock changes that behavior.

Therefore I chose the second method. To copy whole instructions, you need to know the length of each instruction.
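A minimal sketch of the second method in C: copy the overwritten bytes into a trampoline and append a 14-byte absolute jump back to the first instruction after them. The function name is hypothetical, and the sketch assumes the copied instructions contain no relative addressing and that stolen_len already ends on an instruction boundary; both issues are discussed later in this article.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store v as n little-endian bytes. */
static void put_le(uint8_t *p, uint64_t v, int n)
{
    for (int i = 0; i < n; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

/* Build a trampoline: the stolen_len bytes copied from the original function
   (located at orig_addr), followed by push/mov/ret back to orig_addr + stolen_len.
   Assumes the copied bytes contain no relative addressing. Returns the size. */
size_t build_trampoline(uint8_t *tramp, const uint8_t *orig_code,
                        uint64_t orig_addr, size_t stolen_len)
{
    uint64_t resume = orig_addr + stolen_len;   /* first instruction not overwritten */
    uint8_t *p = tramp;

    memcpy(p, orig_code, stolen_len);           /* the overwritten instructions */
    p += stolen_len;

    *p++ = 0x68;                                /* push imm32 (low dword) */
    put_le(p, resume & 0xFFFFFFFFu, 4); p += 4;
    *p++ = 0xC7; *p++ = 0x44; *p++ = 0x24; *p++ = 0x04; /* mov dword [rsp+4], imm32 */
    put_le(p, resume >> 32, 4); p += 4;
    *p++ = 0xC3;                                /* ret */

    return (size_t)(p - tramp);
}
```

For example, a prolog of mov [rsp+8], rbx / mov [rsp+10h], rsi / push rdi / sub rsp, 20h is 15 bytes long, so the trampoline is 15 + 14 = 29 bytes.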
In 32-bit code, many hotpatchable functions start with
mov edi, edi
push ebp
mov ebp, esp
This prolog occupies 5 bytes, exactly the size of a 0xE9 JMP (jmp rel32), so you can copy the first 5 bytes as they are and, after executing them, jump to the function address + 5. On x64, however, the jump sequences are 12 or 14 bytes long, and the function prolog no longer guarantees that an instruction boundary falls at exactly 12 or 14 bytes. Therefore we need to calculate the length of each instruction.
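Once instruction lengths are known, deciding how many bytes to copy is a simple accumulation: keep taking whole instructions until at least the jump size is covered. A sketch in C (hypothetical name), assuming the lengths come from a length decoder like the one described next:

```c
#include <stddef.h>

/* Given the lengths of consecutive instructions at the start of a function,
   return how many bytes must be copied so that a patch_size-byte jump
   (5 on x86, 12 or 14 on x64) overwrites only whole instructions.
   Returns 0 if the listed instructions are too short in total. */
size_t bytes_to_steal(const size_t *insn_lens, size_t count, size_t patch_size)
{
    size_t total = 0;
    for (size_t i = 0; i < count && total < patch_size; i++)
        total += insn_lens[i];
    return total >= patch_size ? total : 0;
}
```

With the 32-bit hotpatch prolog (2 + 1 + 2 bytes) and a 5-byte jump, exactly 5 bytes are copied; on x64 the count usually overshoots 12 or 14.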
My first idea was to use the single-step (trap) interrupt, but that is not feasible, because branch instructions and the side effects of actually executing the instructions get in the way.
The solution is to analyze the instruction encoding yourself, using the instruction reference manual provided by Intel. Because writing code that analyzes every instruction's encoding by hand is laborious, I copied the binary encoding table from the manual (it needs a few corrections) and convert the bytes in memory into a string that is matched against the table to obtain the instruction length. The general steps are:
1: Starting at the given address, consume legacy prefix bytes (segment override, operand-size override, address-size override, LOCK and REP/REPNE) until some other value appears.
2: Check whether the next byte is a REX prefix (40h-4Fh). The REX.W bit selects a 64-bit operand size; for the immediate it matters for MOV r64, imm64 (opcodes B8h-BFh), whose immediate grows to 8 bytes, while the 66h operand-size prefix shrinks a 4-byte immediate to 2 bytes.
3: Look up the remaining bytes in the encoding table.
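The steps above can be sketched as a deliberately tiny length decoder in C. It is not the author's table-driven implementation: it hard-codes a handful of common prolog opcodes (a hypothetical subset chosen for illustration) and returns 0 for anything else, so a hook built on it should refuse to patch when it sees 0.

```c
#include <stddef.h>
#include <stdint.h>

/* Size of a ModR/M operand: ModR/M byte, optional SIB byte, optional displacement.
   In 64-bit mode, mod = 00 with r/m = 101 means [rip + disp32]. */
static size_t modrm_len(const uint8_t *p)
{
    uint8_t mod = p[0] >> 6, rm = p[0] & 7;
    size_t len = 1;
    if (mod != 3 && rm == 4) {                 /* a SIB byte follows */
        len++;
        if (mod == 0 && (p[1] & 7) == 5)       /* SIB without base register: disp32 */
            len += 4;
    }
    if (mod == 0 && rm == 5) len += 4;          /* RIP-relative disp32 */
    else if (mod == 1) len += 1;                /* disp8 */
    else if (mod == 2) len += 4;                /* disp32 */
    return len;
}

static int is_legacy_prefix(uint8_t b)
{
    switch (b) {
    case 0x26: case 0x2E: case 0x36: case 0x3E: case 0x64: case 0x65: /* segment */
    case 0x66: case 0x67:                       /* operand size, address size */
    case 0xF0: case 0xF2: case 0xF3:            /* LOCK, REPNE, REP */
        return 1;
    default:
        return 0;
    }
}

/* Length of the instruction at p, or 0 if its opcode is not in this small table. */
size_t insn_len(const uint8_t *p)
{
    const uint8_t *start = p;
    int opsize16 = 0, rex_w = 0;

    while (is_legacy_prefix(*p)) {              /* step 1: legacy prefixes */
        if (*p == 0x66) opsize16 = 1;
        p++;
    }
    if ((*p & 0xF0) == 0x40) {                  /* step 2: REX prefix */
        rex_w = (*p >> 3) & 1;
        p++;
    }

    uint8_t op = *p++;                          /* step 3: (tiny) opcode table */
    size_t imm_z = opsize16 ? 2 : 4;            /* 16/32-bit immediates, never 64 */
    size_t rest;
    if (op >= 0x50 && op <= 0x5F) rest = 0;                        /* push/pop r64 */
    else if (op == 0x90 || op == 0xC3 || op == 0xCC) rest = 0;     /* nop, ret, int3 */
    else if (op == 0x01 || op == 0x03 || op == 0x29 || op == 0x2B ||
             op == 0x31 || op == 0x33 || op == 0x85 || op == 0x89 ||
             op == 0x8B || op == 0x8D || op == 0xFF) rest = modrm_len(p); /* ModR/M only */
    else if (op == 0x83) rest = modrm_len(p) + 1;                  /* group 1, imm8 */
    else if (op == 0x81 || op == 0xC7) rest = modrm_len(p) + imm_z;
    else if (op >= 0xB8 && op <= 0xBF) rest = rex_w ? 8 : imm_z;   /* mov r, imm */
    else if ((op >= 0x70 && op <= 0x7F) || op == 0xEB || op == 0xE3) rest = 1; /* rel8 */
    else if (op == 0xE8 || op == 0xE9) rest = 4;                   /* call/jmp rel32 */
    else if (op == 0x0F && p[0] >= 0x80 && p[0] <= 0x8F) rest = 5; /* Jcc rel32 */
    else return 0;
    return (size_t)(p - start) + rest;
}
```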

Now whole instructions can be copied, but the copied instructions may use relative addressing.
If the target of a relative address lies within the copied instructions themselves, no correction is needed; otherwise it must be corrected.
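The check for whether a relative reference needs correcting can be sketched as below (hypothetical names). The target is the end of the instruction plus the sign-extended displacement; the sketch assumes no instruction inside the copied block changes length, since widening a branch would shift internal targets as well.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Absolute target of a relative reference: end of instruction + sign-extended disp. */
uint64_t rel_target(uint64_t insn_addr, size_t insn_len, int32_t disp)
{
    return insn_addr + insn_len + (uint64_t)(int64_t)disp;
}

/* True if the target lies inside the copied block [copy_start, copy_start + copy_len).
   The relative distance is then preserved in the copy, so no fix-up is needed
   (assuming no instruction in the block changes length). */
bool target_inside_copy(uint64_t insn_addr, size_t insn_len, int32_t disp,
                        uint64_t copy_start, size_t copy_len)
{
    uint64_t t = rel_target(insn_addr, insn_len, disp);
    return t >= copy_start && t < copy_start + copy_len;
}
```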
x64 instructions that use relative addressing include the direct forms (offset encoded in the instruction) of short JMP, near JMP, near CALL, short Jcc, near Jcc and JRCXZ/JECXZ, plus RIP-relative memory operands encoded in the ModR/M byte (mod = 00, r/m = 101). If the displacement is 4 bytes and the copied instruction is less than 2 GB from the address it references, the displacement can be corrected in place: new displacement = original displacement + original instruction address - new instruction address. If the displacement is 1 byte, as in short JMP, short Jcc and JRCXZ, the instruction must be widened: short JMP becomes near JMP, short Jcc becomes near Jcc, and JRCXZ (which has no 4-byte-displacement form) becomes a combination of JRCXZ and a near JMP.
This changes the instruction length, which shifts the positions of the following instructions and must be taken into account. x64 has no branch instruction with a 2-byte displacement.
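The displacement formula and the short-to-near conversion might look like this in C. This is a sketch with hypothetical names; it handles short JMP (EB) and short Jcc (70h-7Fh) and deliberately rejects JRCXZ (E3), which needs the JRCXZ + near JMP combination instead.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* new disp = old disp + old address - new address (addresses of the same
   instruction before and after copying). Fails if the result leaves int32 range. */
bool relocate_disp32(int32_t old_disp, uint64_t old_addr, uint64_t new_addr,
                     int32_t *new_disp)
{
    int64_t d = (int64_t)old_disp + (int64_t)(old_addr - new_addr);
    if (d < INT32_MIN || d > INT32_MAX)
        return false;
    *new_disp = (int32_t)d;
    return true;
}

/* Re-encode a short JMP (EB rel8) or short Jcc (70+cc rel8) located at old_addr
   as a near JMP (E9 rel32) or near Jcc (0F 80+cc rel32) placed at new_addr,
   keeping the same target. Returns the new length (5 or 6), or 0 if the opcode
   is unsupported or the target is out of rel32 range. */
size_t widen_short_branch(const uint8_t *insn, uint64_t old_addr,
                          uint8_t *out, uint64_t new_addr)
{
    uint64_t target = old_addr + 2 + (uint64_t)(int64_t)(int8_t)insn[1];
    size_t len;

    if (insn[0] == 0xEB) {
        out[0] = 0xE9;
        len = 5;
    } else if (insn[0] >= 0x70 && insn[0] <= 0x7F) {
        out[0] = 0x0F;
        out[1] = (uint8_t)(insn[0] + 0x10);    /* 7x -> 0F 8x, same condition */
        len = 6;
    } else {
        return 0;                              /* e.g. JRCXZ (E3) */
    }

    int64_t d = (int64_t)(target - (new_addr + len));
    if (d < INT32_MIN || d > INT32_MAX)
        return 0;
    uint32_t u = (uint32_t)(int32_t)d;
    for (int i = 0; i < 4; i++)
        out[len - 4 + i] = (uint8_t)(u >> (8 * i));
    return len;
}
```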

If the address referenced by the original instruction is more than 2 GB away from the copied instruction, a bigger change is needed.
For direct short JMP and direct near JMP, you can convert the instruction to
push targetaddrlow
mov dword [rsp+4], targetaddrhigh
ret
For conditional branches, you can combine the sequence above with a conditional branch.
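One way to build the conditional case, sketched in C with a hypothetical function name: invert the condition (flipping the lowest bit of the condition code turns JE into JNE, for example) and use a short branch to skip over the 14-byte push/mov/ret sequence, which is then reached only when the original condition holds.

```c
#include <stddef.h>
#include <stdint.h>

/* Store v as n little-endian bytes. */
static void put_le(uint8_t *p, uint64_t v, int n)
{
    for (int i = 0; i < n; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

/* Replacement for "jcc target" (cc = low nibble of the opcode, e.g. 4 for JE)
   when target is out of rel32 range:
       j!cc  skip                          ; inverted condition, skips 14 bytes
       push  low32(target)
       mov   dword [rsp+4], high32(target)
       ret
   skip:                                   ; fall-through path continues here
   Returns the length (16). */
size_t emit_far_jcc(uint8_t cc, uint64_t target, uint8_t *out)
{
    out[0] = (uint8_t)(0x70 | ((cc & 0x0F) ^ 1));  /* flip lowest bit = invert */
    out[1] = 14;                                     /* rel8 over push/mov/ret */
    out[2] = 0x68;
    put_le(out + 3, target & 0xFFFFFFFFu, 4);
    out[7] = 0xC7; out[8] = 0x44; out[9] = 0x24; out[10] = 0x04;
    put_le(out + 11, target >> 32, 4);
    out[15] = 0xC3;
    return 16;
}
```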
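For a direct near CALL, you can transform it into
   jmp rip + 15              (short jump over the next 15 bytes)
   nop
   ...
A:                           (make A a multiple of 8 so the 8-byte load is aligned; misaligned access can raise an alignment exception on x64)
   dq target address         (8 bytes)
   ...
   nop
   call qword [rip + A - B]
B:
The NOPs before and after A, together with the 8-byte target address, fill exactly the 15 bytes skipped by the first jmp; when the called routine returns, execution continues at B.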
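The CALL layout above can be generated as follows. This sketch (hypothetical name) assumes the output buffer will execute at out_addr. The first jmp always skips 15 bytes, which hold the alignment padding (0 to 7 NOPs), the 8-byte target at A and the trailing NOPs; the call is encoded as FF 15 disp32 with disp32 = A - B.

```c
#include <stddef.h>
#include <stdint.h>

/* Store v as n little-endian bytes. */
static void put_le(uint8_t *p, uint64_t v, int n)
{
    for (int i = 0; i < n; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

/* Emit: jmp short +15 ; nops ; A: dq target ; nops ; call qword [rip + A - B] ; B:
   out_addr is the address at which out will execute. Returns 23. */
size_t emit_far_call(uint8_t *out, uint64_t out_addr, uint64_t target)
{
    size_t i = 0;
    out[i++] = 0xEB; out[i++] = 15;                    /* jmp short over 15 bytes */
    size_t data_start = i;
    while ((out_addr + i) % 8 != 0)                    /* pad so A is 8-aligned */
        out[i++] = 0x90;
    size_t a = i;                                      /* label A */
    put_le(out + i, target, 8);
    i += 8;
    while (i < data_start + 15)                        /* fill the rest of the 15 */
        out[i++] = 0x90;
    out[i++] = 0xFF; out[i++] = 0x15;                  /* call qword [rip + disp32] */
    int32_t disp = (int32_t)((int64_t)a - (int64_t)(i + 4)); /* A - B */
    put_le(out + i, (uint32_t)disp, 4);
    i += 4;
    return i;                                          /* label B */
}
```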
For RIP-relative memory operands I have not found a good correction method.
My first idea was to convert such an instruction into
mov rax, [moffset]
(the original instruction, with its memory operand replaced by RAX)
mov [moffset], rax        (only if the instruction writes memory)
On x64, only the MOV instructions with opcodes A0, A1, A2 and A3 accept an 8-byte absolute address (moffset), and their other operand must be the accumulator.
However, the Intel encoding makes it hard to tell whether an instruction writes to memory, so it is hard to decide whether the trailing mov [moffset], rax is needed. In addition, this approach changes the value of RAX. You can save it with push and restore it with pop, but that does not work for indirect branches, because control never reaches the pop.
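For reference, the moffs64 forms of MOV used in that idea encode as shown in this sketch (hypothetical function name). Here addr would be the absolute address the RIP-relative operand referred to, that is, the end of the original instruction plus its displacement.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* mov rax, [moffs64]  = 48 A1 <8-byte address>
   mov [moffs64], rax  = 48 A3 <8-byte address>
   Returns the length (10). */
size_t emit_mov_rax_moffs(uint8_t *out, uint64_t addr, bool store)
{
    out[0] = 0x48;                     /* REX.W: 64-bit operand, i.e. RAX */
    out[1] = store ? 0xA3 : 0xA1;      /* A3 = store RAX, A1 = load RAX */
    for (int i = 0; i < 8; i++)        /* absolute address, little-endian */
        out[2 + i] = (uint8_t)(addr >> (8 * i));
    return 10;
}
```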

The method I currently use is to find a suitable free address and place the copied instructions there. The general process is:
1: Compute the lowest (lowref) and highest (highref) target addresses of the RIP-relative references in the original instructions.
2: Compute lowbound = highref - 2G and highbound = lowref + 2G. Every instruction placed in [lowbound, highbound] is less than 2 GB away from all RIP-relative operands of the original instructions.
3: If lowbound >= highbound, the instructions cannot be relocated this way.
4: Use NtQueryVirtualMemory to find a free region inside [lowbound, highbound], allocate a block there with NtAllocateVirtualMemory, pass it to RtlCreateHeap to create a heap, and record the heap's start address and maximum size.
5: Then use RtlAllocateHeap to allocate memory from that heap, place the copied instructions there, and correct them.
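Steps 1 to 4 can be sketched in portable C as follows. The free_region array is a hypothetical stand-in for the free ranges a loop over NtQueryVirtualMemory would report, and the 64 KB rounding assumes the usual Windows allocation granularity (GetSystemInfo reports the actual value in dwAllocationGranularity); the allocation and heap calls themselves are not shown. The window uses 0x7FFFFFFF, the largest positive rel32, on both sides, which is slightly conservative, and the whole block must fit inside it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RANGE_2G          0x7FFFFFFFULL   /* largest positive rel32 */
#define ALLOC_GRANULARITY 0x10000ULL      /* assumed 64 KB */

/* Hypothetical: one free (MEM_FREE) address range. */
typedef struct { uint64_t base, size; } free_region;

/* Steps 1-3: window [lowbound, highbound] from the RIP-relative targets. */
bool placement_window(const uint64_t *refs, size_t n,
                      uint64_t *lowbound, uint64_t *highbound)
{
    if (n == 0)
        return false;
    uint64_t lowref = refs[0], highref = refs[0];
    for (size_t i = 1; i < n; i++) {
        if (refs[i] < lowref) lowref = refs[i];
        if (refs[i] > highref) highref = refs[i];
    }
    *lowbound = highref > RANGE_2G ? highref - RANGE_2G : 0;
    *highbound = lowref + RANGE_2G;
    return *lowbound < *highbound;        /* empty window: give up */
}

/* Step 4 (selection only): first granularity-aligned block of block_size bytes
   that lies in a free region and entirely inside the window. */
bool pick_block(const free_region *regions, size_t n, uint64_t lowbound,
                uint64_t highbound, uint64_t block_size, uint64_t *out_base)
{
    for (size_t i = 0; i < n; i++) {
        uint64_t start = regions[i].base > lowbound ? regions[i].base : lowbound;
        start = (start + ALLOC_GRANULARITY - 1) & ~(ALLOC_GRANULARITY - 1);
        uint64_t end = regions[i].base + regions[i].size;   /* exclusive */
        if (start + block_size <= end && start + block_size - 1 <= highbound) {
            *out_base = start;
            return true;
        }
    }
    return false;
}
```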

Afterwards, whenever instructions need to be copied, look for a suitable heap among those already created, and create a new heap when none fits.

Source code: see http://ishare.iask.sina.com.cn/f/24072861.html
This is intended only as learning material. In production, please use Detours (x64 support is paid) or the free NCodeHook.
