An Improvement in Thunk Technology

Author: Nanfeng

Download source code

Abstract: This article shows how to avoid writing machine code by hand when using the thunk technique.

Keywords: thunk, machine code, this pointer

Thunk technology generally means that a program directly constructs executable code at run time (normally this is the compiler's job). The origin of the word is traced in Inside the C++ Object Model (page 162 of the Chinese edition), which says that "thunk" is Knuth's name spelled backwards. Knuth is the author of The Art of Computer Programming, a famous computer-science classic that programmers call the "programming bible". Together with Newton's Mathematical Principles of Natural Philosophy, it has reportedly been named one of the ten greatest scientific books in history (I could not find out who made that assessment, but the book is awesome anyway).
The usual thunk technique works like this: look up the machine code of each instruction in advance, fill an array or struct with those binary values, and then jump to the start address of the array or struct. For example, the code in reference [1]:

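    #include <stdio.h>
    #include <windows.h>  /* DWORD */

    void foo(int a) { printf("in foo, a = %d\n", a); }

    void demo(void)
    {
        /* 32-bit x86 only. Under DEP, code[] must live in executable memory
           (for example, memory allocated with VirtualAlloc and PAGE_EXECUTE_READWRITE). */
        static unsigned char code[9];
        *((DWORD *)&code[0]) = 0x042444FF;                         /* inc dword ptr [esp + 4] */
        code[4] = 0xE9;                                            /* jmp rel32 */
        *((DWORD *)&code[5]) = (DWORD)&foo - (DWORD)&code[0] - 9;  /* jump offset, relative to the end of the thunk */
        void (*pf)(int /* a */) = (void (*)(int))&code[0];
        pf(6);
    }

This is typical thunk code. It prints "in foo, a = 7".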
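To see what the two hand-written values encode, the sketch below rebuilds the same nine bytes in a plain buffer without executing anything. `build_inc_jmp_thunk` is a hypothetical helper, and `code_addr` and `target` are made-up 32-bit addresses standing in for `&code[0]` and `&foo`. It assumes the standard IA-32 encodings: `FF 44 24 04` for `inc dword ptr [esp + 4]`, and `E9` followed by a 32-bit displacement, measured from the end of the 9-byte thunk, for `jmp`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Fills code[0..8] with: inc dword ptr [esp + 4] ; jmp target
// code_addr is the address the thunk would occupy (nothing is executed here).
void build_inc_jmp_thunk(std::uint8_t code[9], std::uint32_t code_addr,
                         std::uint32_t target) {
    assert(code != nullptr);
    // inc dword ptr [esp + 4]  ->  FF 44 24 04 (little-endian DWORD 0x042444FF)
    const std::uint8_t inc_arg[4] = {0xFF, 0x44, 0x24, 0x04};
    std::memcpy(code, inc_arg, 4);
    // jmp rel32  ->  E9 followed by a little-endian 32-bit displacement
    code[4] = 0xE9;
    // The displacement is relative to the next instruction, i.e. code_addr + 9.
    std::uint32_t rel = target - (code_addr + 9);
    for (int i = 0; i < 4; ++i)
        code[5 + i] = static_cast<std::uint8_t>(rel >> (8 * i));
}
```

Because the displacement is relative, the same thunk placed at a different address needs a different offset, which is why both `&foo` and `&code[0]` appear in the original expression.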
As you can see, it defines an array code[9] and fills it directly with the machine code of each assembly instruction, looked up in advance. It then defines a function pointer equal to the start address of the array and calls the thunk through that pointer. Calling through a function pointer keeps the code clear and readable; you could also reach the thunk with an assembly jmp or call, in which case no extra function pointer is needed.
Thunk code found on the Internet follows basically the same idea. If you actually write such code, you will find it tedious: looking up the machine code of every assembly instruction in a manual is no fun. Step back and think about it: isn't this exactly the kind of work a computer is best at? Isn't it what the compiler does anyway?
Take the code above as an example and reconsider the whole process. Our goal is to increment the argument by 1 before foo is called. Normally you do this because you do not have the source code of foo or cannot modify it; otherwise you could simply change foo itself, so why bother? To keep calls simple, it makes sense to define a function pointer; otherwise you would have to write an assembly jmp or call for every call. The function pointer must therefore point at some piece of code. But does that code have to be built from raw machine code? It could just as well be written in assembly.
There is a catch, of course. In assembly, every instruction must be written out completely; we cannot write half an instruction and leave the rest to the assembler. In the code above, the first instruction, inc, can be written directly as an assembly statement. The following jmp cannot, because its jump offset is unknown when we write the assembly; it only becomes known later, once the addresses of the thunk and of the target are fixed. And we cannot write jmp without an operand, because that will not assemble.
The solution is as follows. When writing the jmp, we put a placeholder DWORD where the operand goes and give it a special value, such as 0xFFFFFFFF (that is the principle; in practice a workaround is needed, explained later). The only requirement is that this value does not occur anywhere else in the thunk code. Before the first call, we search the thunk code for the value and replace it with the dynamically computed one. With this approach, no hand-written machine code is needed in the thunk at all.
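Before trusting a placeholder, it can be worth checking how often its byte pattern actually occurs in the template. The hypothetical helper `find_placeholder` below reports every offset, overlapping ones included. The template bytes are hand-assembled, assuming the standard encodings `B9 imm32` (`mov ecx, imm32`), `B8 imm32` (`mov eax, imm32`) and `FF E0` (`jmp eax`), which is the same instruction sequence as the `__thiscall` thunk used later in this article.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns every offset at which the 32-bit little-endian value `placeholder`
// occurs in `code`, including overlapping 4-byte windows.
std::vector<std::size_t> find_placeholder(const std::vector<std::uint8_t>& code,
                                          std::uint32_t placeholder) {
    std::vector<std::size_t> hits;
    for (std::size_t i = 0; i + 4 <= code.size(); ++i) {
        std::uint32_t v = static_cast<std::uint32_t>(code[i]) |
                          static_cast<std::uint32_t>(code[i + 1]) << 8 |
                          static_cast<std::uint32_t>(code[i + 2]) << 16 |
                          static_cast<std::uint32_t>(code[i + 3]) << 24;
        if (v == placeholder) hits.push_back(i);
    }
    return hits;
}
```

For the template `B9 FF FF FF FF B8 FE FF FF FF FF E0`, the value 0xFFFFFFFF (-1) matches twice: at offset 1, the real operand, and at offset 7, spuriously straddling the tail of -2 and the `jmp eax` opcode. A replacement that stops at the first match still patches the intended spot here, but a different placeholder value or instruction order could break that assumption.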
To generate the correct machine code, we use two functions: one serves as a template from which the machine code is produced, and the other fills the values that must be computed at run time into that template. For example:

    void thunktemplate(DWORD &addr1, DWORD &addr2)  // produces the machine-code template
    {
        int flag = 0;
        DWORD x1, x2;
        if (flag)
        {
            // Note: the code in this block must never be executed directly,
            // because it may contain meaningless placeholder values.
            __asm
            {
            thunk_begin:
                // write the thunk code here
                ....
            thunk_end:
            }
        }
        __asm
        {
            mov x1, offset thunk_begin  // get the address range of the thunk code
            mov x2, offset thunk_end
        }
        addr1 = x1;
        addr2 = x2;
    }

The function above produces the thunk machine-code template. It is called a template because it contains meaningless placeholder values that must be replaced with real ones before the code can run. That is why the thunk code sits inside an if block whose flag is always 0 (in effect an if (0)): calling the function never executes the thunk code. To make the template's address easy to obtain, the function returns the start and end addresses of the thunk code through its two reference parameters.

The function that replaces the placeholder values is simple: a direct search and replace.

    void replacecodebuf(BYTE *code, int len, DWORD old, DWORD x)  // replace a placeholder with a dynamic value
    {
        int i = 0;
        for (i = 0; i <= len - 4; ++i)  // the last 4-byte window starts at len - 4
        {
            if (*((DWORD *)&code[i]) == old)
            {
                *((DWORD *)&code[i]) = x;
                return;  // only the first occurrence is replaced
            }
        }
    }
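A portable restatement of `replacecodebuf` might look like the sketch below. It assumes 32-bit placeholders and a host whose byte order matches the target (true on x86). Using `memcpy` instead of `*(DWORD *)` casts avoids unaligned reads and aliasing problems, and the `i + 4 <= len` bound both checks the final 4-byte window and never underflows when `len < 4`. The name `replace_code_buf` and the `bool` return value are illustrative additions, not part of the original code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Replaces the first 4-byte window equal to old_value with new_value.
// Returns true if a replacement was made, false if the placeholder is absent.
bool replace_code_buf(std::uint8_t* code, std::size_t len,
                      std::uint32_t old_value, std::uint32_t new_value) {
    assert(code != nullptr);
    for (std::size_t i = 0; i + 4 <= len; ++i) {  // last window starts at len - 4
        std::uint32_t v;
        std::memcpy(&v, code + i, 4);             // alignment-safe read
        if (v == old_value) {
            std::memcpy(code + i, &new_value, 4); // alignment-safe write
            return true;
        }
    }
    return false;
}
```

Returning whether a replacement happened lets the caller detect a template in which the placeholder is missing, instead of silently running unpatched code.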

The two functions are used as follows:

    DWORD addr1, addr2;
    thunktemplate(addr1, addr2);
    memset(m_thunk, 0, 100);                        // m_thunk is an array: char m_thunk[100];
    memcpy(m_thunk, (void *)addr1, addr2 - addr1);  // copy the code into m_thunk
    replacecodebuf((BYTE *)m_thunk, addr2 - addr1, -1, (DWORD)(void *)this);  // replace -1 in m_thunk with the this pointer

That covers the technique itself. Now for a complete, practical example. Callback functions are very common in Windows, for example window procedures and timer callbacks. You write these functions but never call them directly; instead you pass their addresses to the system, which calls them when certain events occur. That works fine, but if you want to encapsulate everything in a class, a problem arises.
The problem is that the signatures of these callbacks are fixed in advance. You cannot use a class member function as a callback because the types do not match. Microsoft is not to blame: it could not define a callback as a member function (which class would it even belong to?), so a callback can only be a global function. As a remedy, Microsoft often adds a void * parameter to the callback, which is usually used to pass the this pointer of the class. We can then give the system a global function as the callback; inside it, we reach the object through the extra void * parameter and call the member function directly. This completes the encapsulation, except that a global function is still needed as the callback.
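The `void *` remedy described above looks roughly like this. `for_each_item` and `ItemCallback` are hypothetical stand-ins for a system API that hands an extra context pointer back to the callback; a static member function plays the role of the global callback, since it has an ordinary function type.

```cpp
#include <cassert>

// Hypothetical C-style API: calls cb once per item and passes context back unchanged.
typedef void (*ItemCallback)(int item, void* context);

void for_each_item(int count, ItemCallback cb, void* context) {
    for (int i = 0; i < count; ++i) cb(i, context);
}

class Summer {
public:
    int total = 0;
    // Registers the static trampoline and passes this as the context pointer.
    void run(int count) { for_each_item(count, &Summer::static_callback, this); }

private:
    void on_item(int item) { total += item; }
    // Plays the "global function" role: recover the object, forward to the member.
    static void static_callback(int item, void* context) {
        static_cast<Summer*>(context)->on_item(item);
    }
};
```

Calling `s.run(5)` on a `Summer s` adds 0 + 1 + 2 + 3 + 4 to `s.total`.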

However, not every callback is lucky enough to get an extra parameter from Microsoft. The timer callback, for example, has none:

    VOID CALLBACK TimerProc(
        HWND hwnd,         // handle to window
        UINT uMsg,         // WM_TIMER message
        UINT_PTR idEvent,  // timer identifier
        DWORD dwTime       // current system time
    );

Each of the four parameters has its own purpose, so there is no room left to pass a this pointer. You can still smuggle it in through one of them. For example, you can pass a pointer to a struct as hwnd, where the struct holds the original hwnd and a this pointer. In the timer callback, you cast hwnd back to the struct pointer, take out the original hwnd and the this pointer, and then use this to call member functions freely. But that is not what I want; I want a universal, unified solution. Piggybacking on a parameter usually works, but what if a callback has no parameters at all? Besides, the whole point was to encapsulate everything in a class, and we would still be dragging a global function along. Doesn't that feel a little awkward?
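The struct-based workaround can be sketched as follows. `Handle`, `TimerCallback`, `TimerContext`, `timer_proc` and `fire_timer` are hypothetical stand-ins for `HWND`, the `TimerProc` type and the system; only the callback side is shown, and the system's call is simulated rather than made through a real timer.

```cpp
#include <cassert>

// Stand-ins for the Win32 types (assumption: real code would use HWND, UINT, ...).
typedef void* Handle;
typedef void (*TimerCallback)(Handle hwnd, unsigned msg, unsigned long id, unsigned long time);

class Timer {
public:
    int ticks = 0;
    Handle last_hwnd = nullptr;
    void on_tick(Handle hwnd) { ++ticks; last_hwnd = hwnd; }
};

// The struct smuggled through the hwnd parameter: original handle plus this.
struct TimerContext {
    Handle original_hwnd;
    Timer* self;
};

// Global callback: cast hwnd back to the struct, then call the member function.
void timer_proc(Handle hwnd, unsigned, unsigned long, unsigned long) {
    TimerContext* ctx = static_cast<TimerContext*>(hwnd);
    assert(ctx != nullptr);
    ctx->self->on_tick(ctx->original_hwnd);
}

// Hypothetical stand-in for the system firing the timer once (0x0113 is WM_TIMER).
void fire_timer(TimerCallback cb, Handle hwnd) { cb(hwnd, 0x0113, 1, 0); }
```

The member function receives the original handle, but a global `timer_proc` is still required, which is the awkwardness the paragraph above points out.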
This is exactly where thunks come in. A class member function differs from the corresponding global function essentially by the this pointer. If we set up the this pointer properly before the system calls the function, the system can call the member function correctly.
The idea is as follows: when the system asks for a callback address, we pass it the address of a thunk code segment. The thunk does two things:

1. Prepare the this pointer.
2. Call the member function.
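For comparison, here is a portable technique that performs the same two steps without generating any code: a fixed pool of pre-compiled trampolines, each reading its own static object pointer. This is a swapped-in technique, not the article's thunk; `Handler`, `PlainCallback`, `Slot` and `bind_handler` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

typedef void (*PlainCallback)(int value);  // hypothetical callback with no context parameter

class Handler {
public:
    int total = 0;
    void on_event(int value) { total += value; }
};

// Each Slot<N> is a distinct plain function with its own static object pointer:
// step 1 (prepare this) reads the slot, step 2 calls the member function.
template <std::size_t N>
struct Slot {
    static Handler* self;
    static void invoke(int value) { self->on_event(value); }
};
template <std::size_t N>
Handler* Slot<N>::self = nullptr;

// Binds obj to slot `index` and returns that slot's plain function pointer,
// or nullptr when the fixed-size pool has no such slot.
PlainCallback bind_handler(std::size_t index, Handler* obj) {
    switch (index) {
        case 0: Slot<0>::self = obj; return &Slot<0>::invoke;
        case 1: Slot<1>::self = obj; return &Slot<1>::invoke;
        case 2: Slot<2>::self = obj; return &Slot<2>::invoke;
        default: return nullptr;
    }
}
```

The pool size is fixed at compile time (three slots here). That is exactly the limitation a run-time thunk removes: each object gets its own code, however many objects exist.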

The key code is as follows (the complete project is included in the attachment):

    void thunktemplate(DWORD &addr1, DWORD &addr2, int calltype = 0)
    {
        int flag = 0;
        DWORD x1, x2;
        if (flag)
        {
            __asm // __thiscall
            {
            thiscall_1:
                mov ecx, -1   // -1 is a placeholder, replaced with the this pointer at run time
                mov eax, -2   // -2 is a placeholder, replaced with the address of ctimer::callbcak at run time
                jmp eax
            thiscall_2:
            }
            __asm // __stdcall
            {
            stdcall_1:
                push dword ptr [esp]         // save (copy) the return address onto the top of the stack
                mov dword ptr [esp + 4], -1  // store the this pointer in the slot of the original return address
                mov eax, -2
                jmp eax                      // jump to the target message handler (a class member function)
            stdcall_2:
            }
        }
        if (calltype == 0)  // __thiscall
        {
            __asm
            {
                mov x1, offset thiscall_1  // get the address range of the thunk code
                mov x2, offset thiscall_2
            }
        }
        else
        {
            __asm
            {
                mov x1, offset stdcall_1
                mov x2, offset stdcall_2
            }
        }
        addr1 = x1;
        addr2 = x2;
    }

A few points about the function above:

1. To support the two different calling conventions for member functions, two pieces of code are written here. The calltype parameter selects which one is copied into the buffer.
2. A single jmp xxxx instruction is split into two instructions:

    mov eax, -2
    jmp eax

This is dictated by how the assembler works. Writing jmp -2 directly is not acceptable: depending on the target address, jmp can be assembled into several different forms, so a real address must appear there for the assembler to decide which form to use.
3. If you are not familiar with how the this pointer is passed, see another article in the VC knowledge base, "Directly call the class member function address".
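Besides the assembler issue, the absolute form also suits a thunk that is copied into another buffer. The sketch below encodes both forms using hand-assembled IA-32 encodings (`E9 rel32`, and `B8 imm32` followed by `FF E0`); the helper names and addresses are hypothetical. A `jmp rel32` depends on where the instruction sits, while `mov eax, imm32` / `jmp eax` produces the same bytes wherever the template ends up.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Appends v as four little-endian bytes.
static void put_u32(std::vector<std::uint8_t>& out, std::uint32_t v) {
    for (int i = 0; i < 4; ++i) out.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
}

// jmp rel32: E9 + (target - address of the next instruction). Depends on `at`.
std::vector<std::uint8_t> encode_jmp_rel32(std::uint32_t at, std::uint32_t target) {
    std::vector<std::uint8_t> out{0xE9};
    put_u32(out, target - (at + 5));
    return out;
}

// mov eax, imm32 (B8 + imm32) then jmp eax (FF E0). Absolute, so position-independent.
std::vector<std::uint8_t> encode_mov_eax_jmp_eax(std::uint32_t target) {
    std::vector<std::uint8_t> out{0xB8};
    put_u32(out, target);
    out.push_back(0xFF);
    out.push_back(0xE0);
    return out;
}
```

With `target` set to 0xFFFFFFFE (-2), the second helper yields `B8 FE FF FF FF FF E0`, the bytes the placeholder search looks for. Overwriting `eax` is harmless at this point because neither `__thiscall` nor `__stdcall` passes an argument in `eax`.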

The complete code for setting up the thunk is as follows:

    DWORD funcaddr;
    notify(funcaddr, &ctimer::callbcak);  // obtain the address of the member function
    DWORD addr1, addr2;
    thunktemplate(addr1, addr2, 0);
    memset(m_thunk, 0, 100);
    memcpy(m_thunk, (void *)addr1, addr2 - addr1);
    replacecodebuf((BYTE *)m_thunk, addr2 - addr1, -1, (DWORD)(void *)this);  // replace -1 with the this pointer
    replacecodebuf((BYTE *)m_thunk, addr2 - addr1, -2, funcaddr);             // replace -2 with the member function address
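To make the effect of this setup sequence concrete, the sketch below performs the same steps (clear, copy, replace -1, replace -2) on a byte array and never executes it. It assumes the template bytes `B9 FF FF FF FF B8 FE FF FF FF FF E0`, hand-assembled from the `__thiscall` part of `thunktemplate`; `self` and `func` are made-up 32-bit values standing in for `this` and `funcaddr`, and all function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// __thiscall template (hand-assembled): mov ecx, -1 / mov eax, -2 / jmp eax
const std::uint8_t kThiscallTemplate[] = {
    0xB9, 0xFF, 0xFF, 0xFF, 0xFF,  // mov ecx, -1  (placeholder: this)
    0xB8, 0xFE, 0xFF, 0xFF, 0xFF,  // mov eax, -2  (placeholder: member function)
    0xFF, 0xE0                     // jmp eax
};

// Little-endian 32-bit load/store, independent of host byte order.
static std::uint32_t load_u32(const std::uint8_t* p) {
    return static_cast<std::uint32_t>(p[0]) | static_cast<std::uint32_t>(p[1]) << 8 |
           static_cast<std::uint32_t>(p[2]) << 16 | static_cast<std::uint32_t>(p[3]) << 24;
}
static void store_u32(std::uint8_t* p, std::uint32_t v) {
    for (int i = 0; i < 4; ++i) p[i] = static_cast<std::uint8_t>(v >> (8 * i));
}

// Same idea as replacecodebuf: patch the first matching 4-byte window.
static bool patch_first(std::uint8_t* code, std::size_t len,
                        std::uint32_t old_value, std::uint32_t new_value) {
    for (std::size_t i = 0; i + 4 <= len; ++i) {
        if (load_u32(code + i) == old_value) {
            store_u32(code + i, new_value);
            return true;
        }
    }
    return false;
}

// Clear, copy, then patch -1 and -2 in the same order as the setup code above.
bool build_thiscall_thunk(std::uint8_t (&buf)[100], std::uint32_t self, std::uint32_t func) {
    std::memset(buf, 0, sizeof buf);
    std::memcpy(buf, kThiscallTemplate, sizeof kThiscallTemplate);
    return patch_first(buf, sizeof kThiscallTemplate, 0xFFFFFFFFu, self) &&
           patch_first(buf, sizeof kThiscallTemplate, 0xFFFFFFFEu, func);
}
```

After the call, the buffer reads `mov ecx, self` / `mov eax, func` / `jmp eax`, which is what the system would execute when it invokes the callback address.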

If you still want to assign the machine code directly to an array as before (it does look cool, I fully understand), you can call thunktemplate once to generate m_thunk, print the contents of the array, and then assign those values to m_thunk directly in your program, just like most thunk code on the Internet. Of course, you still have to replace the placeholders before the first call. Either way, calling these two functions to generate the machine code is much easier than looking it up by hand. If you agree, this article was not written in vain.
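The "print the array" step could use a small helper like the hypothetical `to_c_initializer` below, which turns a byte buffer into text that can be pasted into a C or C++ array initializer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Formats len bytes as "{0xB9, 0xFF, ...}" for pasting into source code.
std::string to_c_initializer(const std::uint8_t* code, std::size_t len) {
    std::string out = "{";
    char byte_text[8];  // longest piece is ", 0xFF" plus the terminator
    for (std::size_t i = 0; i < len; ++i) {
        std::snprintf(byte_text, sizeof byte_text, "%s0x%02X",
                      i ? ", " : "", static_cast<unsigned>(code[i]));
        out += byte_text;
    }
    out += "}";
    return out;
}
```

The printed bytes still contain the -1 and -2 placeholders, so the replacement step remains necessary before the first call.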

References:
[1] Class member message-processing functions implemented with thunks

Thank you, good article! (posted by Bingdao68 at 22:57:00, 2011-3-22)
Excellent article! (posted by Wyyayy at 18:31:00)
Clever.
I looked at it, but it only calls an existing function. I doubt whether the thunk technique has any practical value...

(posted by tgl10 at 14:21:00)
Oh, please ignore the question I raised earlier; I was wrong. If the called function is changed accordingly, the stack stays balanced. (posted by jink1025 at 9:44:00)
    __asm // __stdcall
    {
    stdcall_1:
        push dword ptr [esp]         // save (copy) the return address onto the top of the stack
        mov dword ptr [esp + 4], -1  // store the this pointer in the slot of the original return address
        mov eax, -2
        jmp eax                      // jump to the target message handler (class member function)
    stdcall_2:
    }

This looks wrong.
With __stdcall, the caller is responsible for cleaning up the stack. If you push something onto the stack on your own, the stack will be wrong after the function returns. (posted by jink1025 at 9:38:00)
That's right, you could put it that way too. (posted by sheds at 22:01:00)
