About mode transitions and jumps

The main branch instructions on ARM, such as BX and BLX, can switch the instruction set state; see ARM's architecture reference manual for details. The discussion here is about which mode to use and when to switch. One point to keep in mind: because of the pipeline, the PC value an instruction reads is not its own address. In ARM state it reads its own address + 8 (two ARM instructions ahead), and in Thumb state it reads its own address + 4. For example, if the current instruction is at 0x8000, it reads PC as 0x8000 + 2 * 2 = 0x8004 in Thumb state and as 0x8000 + 4 * 2 = 0x8008 in ARM state.
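The PC offset rule above can be captured in a tiny helper, which later calculations (literal addresses, jump stubs) depend on. This is a minimal sketch; `pc_read_value` and `enum isa_mode` are hypothetical names, and it models only the plain "read PC" value, not the word-aligned base that Thumb PC-relative loads use.

```c
#include <assert.h>
#include <stdint.h>

/* Instruction set state of the executing code. */
enum isa_mode { MODE_ARM, MODE_THUMB };

/* PC value seen by an instruction located at insn_addr:
 * +8 in ARM state, +4 in Thumb state (also for 32-bit Thumb-2 instructions). */
static uint32_t pc_read_value(uint32_t insn_addr, enum isa_mode mode)
{
    /* ARM instructions are word-aligned, Thumb instructions halfword-aligned. */
    assert((insn_addr & (mode == MODE_ARM ? 3u : 1u)) == 0);
    return insn_addr + (mode == MODE_ARM ? 8u : 4u);
}
```

For the example in the text, `pc_read_value(0x8000, MODE_THUMB)` gives 0x8004 and `pc_read_value(0x8000, MODE_ARM)` gives 0x8008.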
Since I did not find an easy way to determine the instruction set of the function being hooked, the decision is left to the user of the hook: before hooking, use a reverse-engineering tool to find out whether the target function is ARM or Thumb code.
Compiling the hook function with different options depending on the target's instruction set would obviously be a hassle. ARM-mode instructions are the preferred choice, because a single ARM instruction carries more semantics than a single Thumb instruction. So we can compile all hook functions as ARM code only and switch to ARM state when jumping into them.
For the inserted jump itself, branch instructions with an immediate offset have a limited range: about ±32 MB in ARM state, and only ±256 bytes (conditional) or ±2 KB (unconditional) for a 16-bit Thumb branch. The jump is therefore implemented with LDR pc, which loads a full 32-bit absolute address into PC. In ARM mode this is easy:
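LDR pc, [pc, #-4]
.word <32-bit absolute jump address>
The LDR is encoded as the single 32-bit word 0xE51FF004. Because PC reads as the LDR's own address + 8, [pc, #-4] addresses the word immediately following the instruction.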
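The ARM-mode stub above can be generated with a few lines of C, together with a check showing why a plain B is often not enough. This is a sketch under stated assumptions: the helper names (`write_arm_abs_jump`, `arm_b_reachable`, `put_le32`) are hypothetical, the target is assumed little-endian, and making the code page writable and flushing the instruction cache are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encoding of "LDR pc, [pc, #-4]" in ARM state (condition AL). */
#define ARM_LDR_PC_MINUS4 0xE51FF004u

/* Store a 32-bit word in little-endian byte order (assumed target byte order). */
static void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v);
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Can an ARM "B" at insn_addr reach target? B encodes a signed 24-bit word
 * offset relative to PC (insn_addr + 8), i.e. roughly +/-32 MB. */
static int arm_b_reachable(uint32_t insn_addr, uint32_t target)
{
    int64_t off = (int64_t)target - ((int64_t)insn_addr + 8);
    return (off % 4 == 0) && off >= -(1LL << 25) && off <= (1LL << 25) - 4;
}

/* Write the 8-byte absolute-jump stub:
 *   LDR pc, [pc, #-4]   ; PC reads as stub + 8, so this loads the word at stub + 4
 *   .word target
 * Returns the number of bytes written. */
static size_t write_arm_abs_jump(uint8_t *buf, uint32_t target)
{
    assert(buf != NULL);
    put_le32(buf, ARM_LDR_PC_MINUS4);
    put_le32(buf + 4, target);
    return 8;
}
```

The stub always costs 8 bytes of the original function, but it reaches any address, whereas `arm_b_reachable` fails as soon as the hook function lies more than about 32 MB away.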
In Thumb mode, however, the 16-bit LDR instruction cannot load PC, which makes the choice problematic. With 16-bit Thumb instructions only, the jump sequence occupies many bytes. Moreover, ARM compilers often use PC as the base address for computing locations, so PC-relative instructions are very likely to be among the instructions the jump overwrites. Those instructions must be corrected in the relocated code, and since Thumb supports only small immediates and short jump ranges, the correction is often cumbersome, requiring several equivalent instructions to replace a single one. After weighing this, I decided to drop support for ARMv5 and rely directly on the Thumb-2 instruction set available from ARMv6T2. Thumb-2 supports 32-bit Thumb instructions, and its LDR can take PC as the destination register:
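LDR.W pc, [pc, #0]
.word <32-bit absolute jump address>
The LDR.W is encoded as the two halfwords 0xF8DF 0xF000 (bytes DF F8 00 F0 in memory, i.e. the little-endian word 0xF000F8DF). In Thumb state a PC-relative load uses Align(PC, 4) as its base, with PC = the instruction's address + 4, so the LDR.W must start on a 4-byte boundary for [pc, #0] to address the word right after it; otherwise a 2-byte NOP has to be placed in front of it.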
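A sketch of emitting the Thumb-2 stub, including the alignment padding described above. Assumptions: the helper names are hypothetical, the target is little-endian, `stub_addr` is the runtime address where the bytes will end up, and 0xBF00 (the 16-bit Thumb-2 NOP) is used as padding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define THUMB_NOP        0xBF00u /* 16-bit Thumb-2 NOP */
#define THUMB2_LDR_PC_HI 0xF8DFu /* first halfword of LDR.W pc, [pc, #0] */
#define THUMB2_LDR_PC_LO 0xF000u /* second halfword: Rt = pc, imm12 = 0 */

static void put_le16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
}

static void put_le32(uint8_t *p, uint32_t v)
{
    put_le16(p, (uint16_t)v);
    put_le16(p + 2, (uint16_t)(v >> 16));
}

/* Write "LDR.W pc, [pc, #0]; .word target" for code that will live at stub_addr.
 * The literal base is Align(insn_addr + 4, 4), so if the LDR.W would start on a
 * halfword (not word) boundary, a NOP is emitted first to realign it.
 * target: bit 0 clear jumps to ARM code (the ARM-compiled hook function),
 * bit 0 set jumps to Thumb code. Returns the number of bytes written (8 or 10). */
static size_t write_thumb2_abs_jump(uint8_t *buf, uint32_t stub_addr, uint32_t target)
{
    size_t n = 0;
    assert(buf != NULL && (stub_addr & 1u) == 0);
    if (stub_addr & 2u) {
        put_le16(buf, THUMB_NOP);
        n = 2;
    }
    put_le16(buf + n, THUMB2_LDR_PC_HI);
    put_le16(buf + n + 2, THUMB2_LDR_PC_LO);
    put_le32(buf + n + 4, target);
    return n + 8;
}
```

The padded form costs 10 bytes instead of 8, which also increases the number of original instructions that have to be relocated.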
For every jump target address, pay attention to how bit 0 is handled. If bit 0 is 1, the jump switches to Thumb state; if bit 0 is 0, it switches to ARM state. When the target is ARM code, no special handling is needed, because the compiler takes care of computing the address. When the target is Thumb code, however, bit 0 must be handled both when jumping from the hooked function to the hook function and when calling the original function.
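The bit 0 rule reduces to two small conversions: between a code address and the value loaded into PC, and back. This is a sketch with hypothetical names (`branch_target`, `code_address`); for example, the trampoline address handed to the hook for calling an original Thumb function would go through `branch_target(..., MODE_THUMB)`, while the location to patch is recovered from a Thumb function pointer with `code_address`.

```c
#include <assert.h>
#include <stdint.h>

enum isa_mode { MODE_ARM, MODE_THUMB };

/* Value to load into PC (or pass to BX/BLX) to run the code at code_addr:
 * bit 0 set selects Thumb state, bit 0 clear selects ARM state. */
static uint32_t branch_target(uint32_t code_addr, enum isa_mode mode)
{
    if (mode == MODE_THUMB)
        return code_addr | 1u;
    assert((code_addr & 3u) == 0); /* ARM code must be word-aligned */
    return code_addr;
}

/* Memory address of the code a branch target refers to (Thumb bit stripped),
 * e.g. to find the bytes to patch when given a Thumb function pointer. */
static uint32_t code_address(uint32_t target)
{
    return target & ~1u;
}
```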
About relocating the original instructions

Following the implementation of the Detours library on Win32, the first few instructions of the hooked function are moved to a trampoline, followed by a jump back to the rest of the original code. While they are moved, any address-dependent instructions must be corrected. Inline hooking on ARM requires the same step, but in practice it is very hard to do, because ARM compilers commonly generate instruction blocks that use PC as the address base, such as PC-relative literal loads (LDR Rt, [pc, #imm]).
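As an illustration of what "correcting" a moved instruction involves, the sketch below recognizes one common PC-relative form, the ARM-state LDR (literal) with an immediate offset, and computes the absolute address it reads from at its original location. The names are hypothetical, and only this single encoding is handled; a real relocator must cover many more forms (ADR/ADD with PC, PC-relative branches, Thumb encodings), and what it emits in place of the instruction is a separate design decision.

```c
#include <assert.h>
#include <stdint.h>

/* Matches ARM "LDR Rt, [pc, #+/-imm12]" (LDR literal, A1 encoding), any condition:
 * bits 27..20 = 0101 U001 (I=0, P=1, B=0, W=0, L=1) and Rn (bits 19..16) = 1111.
 * Bit 23 (U) is masked out because it only selects add or subtract. */
static int is_arm_ldr_literal(uint32_t insn)
{
    return (insn & 0x0F7F0000u) == 0x051F0000u;
}

/* Absolute address of the literal the instruction loads, given the address the
 * instruction had in the original function. PC reads as insn_addr + 8 in ARM
 * state; U = 1 adds imm12, U = 0 subtracts it. */
static uint32_t arm_ldr_literal_addr(uint32_t insn, uint32_t insn_addr)
{
    uint32_t imm12 = insn & 0xFFFu;
    uint32_t pc = insn_addr + 8u;
    assert(is_arm_ldr_literal(insn));
    return (insn & (1u << 23)) ? pc + imm12 : pc - imm12;
}
```

For instance, LDR r0, [pc, #4] (0xE59F0004) originally at 0x8000 reads the literal at 0x800C; once moved to the trampoline, the same encoding would read the wrong word, which is why the relocated code must refer to 0x800C explicitly.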
Because the addressing range on ARM is small, compilers typically interleave data (literal pools) with instructions in memory. In Thumb mode the problem is especially prominent, since an instruction can hold only very small immediates, and a single instruction often expands into several when it is corrected. Code correction can therefore be a lot of work. Because it is so time-consuming, my research implementation of ARM inline hooking does not handle this problem. If the hook is to be used in a real project, this part will likely take more effort than the jump itself. If concurrency and efficiency are not a concern, the hook function can call the original function by temporarily removing the hook, making the call, and reinstalling the hook once the call returns. This is hardly an elegant implementation, though.
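The "unhook, call, rehook" workaround can be sketched as follows. The `struct inline_hook` bookkeeping and all function names are hypothetical; the sketch only swaps saved bytes in and out. On a real device the page would first have to be made writable (for example with mprotect) and the instruction cache flushed after every write (for example with `__builtin___clear_cache`); both steps are omitted here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_PATCH 16

/* Hypothetical bookkeeping for one inline hook. */
struct inline_hook {
    uint8_t *target;             /* patched location in the hooked function */
    size_t len;                  /* number of bytes overwritten by the jump stub */
    uint8_t original[MAX_PATCH]; /* bytes saved before patching */
    uint8_t patch[MAX_PATCH];    /* jump stub written over target */
};

/* Omitted here: making the page writable and flushing the instruction cache. */
static void hook_enable(struct inline_hook *h)
{
    assert(h->len <= MAX_PATCH);
    memcpy(h->target, h->patch, h->len);
}

static void hook_disable(struct inline_hook *h)
{
    assert(h->len <= MAX_PATCH);
    memcpy(h->target, h->original, h->len);
}

/* Call the original function with the hook temporarily removed.
 * Not thread-safe: another thread may run the function while it is unhooked
 * (bypassing the hook) or while its first bytes are only partly rewritten. */
static void call_with_hook_disabled(struct inline_hook *h,
                                    void (*call_original)(void *ctx), void *ctx)
{
    hook_disable(h);    /* restore the original instructions */
    call_original(ctx); /* the original code runs unmodified */
    hook_enable(h);     /* install the jump again */
}
```

This avoids relocating instructions entirely, at the cost of two code patches per call and the race conditions noted in the comment.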