Analysis and Implementation of the ATL Window Thunk Mechanism

Source: Internet
Author: User

ATL's window procedure uses a calling mechanism called a thunk to solve the following problem:
To create a window in a UI thread, you register a window class, create the window, and display it; the corresponding API functions are RegisterClass, CreateWindowEx, and ShowWindow. When registering the window class, you pass the address of a callback function (the window procedure), and DispatchMessage in the message loop delivers each message to this function. Every window has such a message-processing function, namely the callback address passed in at registration time, and the pointer can be retrieved with GetWindowLongPtr (GWLP_WNDPROC). In an ATL window class, this callback is a static member function, because an operating-system callback cannot supply a C++ this pointer. Yet each window object needs its own message handling. Therefore, the key to a class-based window framework is that this shared callback must be able to find the window object that corresponds to the window handle, because the window object contains the real message-processing code.
How, then, can a shared static class function obtain the object's address?
A simple solution is to handle the WM_NCCREATE message: take the object address that was supplied when the window was created from the lParam parameter (lParam points to a CREATESTRUCT whose lpCreateParams member carries the value passed to CreateWindowEx), and call SetWindowLongPtr to store it in the window's system-side data, for example in the GWLP_USERDATA slot. The shared callback then calls GetWindowLongPtr to obtain the object address and calls the object's member function.
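A minimal sketch of this lookup-based approach follows. To keep it runnable on any platform, a std::unordered_map stands in for the per-window slot that SetWindowLongPtr/GetWindowLongPtr with GWLP_USERDATA would access, lParam is simplified to a void*, and FakeCreateStruct, SetUserData, and GetUserData are hypothetical names, not Win32 APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Stand-ins for Win32 types so the sketch compiles on any platform.
using HWND = void*;
using UINT = unsigned int;
using LRESULT = long;
struct FakeCreateStruct { void* lpCreateParams; };  // hypothetical mini CREATESTRUCT

constexpr UINT kWmNcCreate = 0x0081;  // numeric value of WM_NCCREATE

// Simulates the per-window GWLP_USERDATA slot. On Windows these helpers would be
// SetWindowLongPtr(hwnd, GWLP_USERDATA, v) and GetWindowLongPtr(hwnd, GWLP_USERDATA).
std::unordered_map<HWND, std::intptr_t> g_userData;
void SetUserData(HWND hwnd, std::intptr_t v) { g_userData[hwnd] = v; }
std::intptr_t GetUserData(HWND hwnd) {
    auto it = g_userData.find(hwnd);
    return it == g_userData.end() ? 0 : it->second;
}

class Window {
public:
    UINT lastMsg = 0;
    LRESULT OnMessage(UINT msg) { lastMsg = msg; return 0; }

    // One static callback shared by every window of the class.
    static LRESULT StaticWndProc(HWND hwnd, UINT msg, void* lParam) {
        if (msg == kWmNcCreate) {
            auto* cs = static_cast<FakeCreateStruct*>(lParam);
            SetUserData(hwnd, reinterpret_cast<std::intptr_t>(cs->lpCreateParams));
        }
        // An extra lookup on every single message: the cost the article objects to.
        auto* self = reinterpret_cast<Window*>(GetUserData(hwnd));
        return self ? self->OnMessage(msg) : 0;
    }
};
```

Two windows share StaticWndProc, yet each message reaches the right object, because the handle is mapped back to its object on every call.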
This method is simple but mediocre. In terms of efficiency, every message requires an extra GetWindowLongPtr system call, and a system call takes hundreds of CPU cycles, which is not efficient enough.
ATL's implementation:
ATL uses a mechanism called a thunk. In short, the shared window procedure is still a static class function, but it handles only the first message the window receives (such as WM_NCCREATE). Its job is to set up a small block of memory, called the thunk, that actually contains machine code, and then to install the thunk's address as the window procedure by calling SetWindowLongPtr. From then on, the thunk is the real window procedure: it replaces hWnd with the object address and then jumps to another shared static class function.
Although the jump target is a static function with no this pointer, the first argument on the stack has already been replaced with the object address. The function can therefore use that address to call the real message-processing function, which dispatches to the appropriate handler through the message-map macros. The substitution works because a window handle and a pointer have the same size: 32 bits on 32-bit platforms and 64 bits on 64-bit platforms. (The stack-based substitution described here is the x86 __stdcall form; on x64 the first argument is passed in the RCX register, so ATL uses a different thunk there that loads RCX before jumping.)
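The substitution can be modeled in portable C++ to show the data flow. This is only a model: the real thunk is machine code executed in place of a window procedure, whereas ThunkModel below (a hypothetical name) is an ordinary function object, and the Win32 types are stand-ins.

```cpp
#include <cassert>

using HWND = void*;          // stand-in for the Win32 handle type
using UINT = unsigned int;
using LRESULT = long;
using ProcFn = LRESULT (*)(HWND, UINT);

// Models the thunk's job: discard the incoming hWnd, substitute the stored
// object pointer as the first argument, and forward to a fixed static function.
// (The real thunk does this with two x86 instructions instead of a C++ call.)
struct ThunkModel {
    void* pThis;
    ProcFn target;
    LRESULT operator()(HWND /*hWnd is ignored*/, UINT msg) const {
        return target(static_cast<HWND>(pThis), msg);
    }
};

class Window {
public:
    Window() : thunk_{this, &Window::WindowProc} {}
    const ThunkModel& thunk() const { return thunk_; }
    UINT lastMsg = 0;

private:
    // Static, so there is no implicit this; the "hWnd" it receives is really the object.
    static LRESULT WindowProc(HWND hWnd, UINT msg) {
        Window* self = static_cast<Window*>(hWnd);
        self->lastMsg = msg;  // a real class would dispatch through its message map here
        return 0;
    }
    ThunkModel thunk_;
};
```

Each Window owns its own thunk, so the same incoming handle value still reaches different objects, and no per-message lookup is needed.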
An actual implementation needs to consider two issues:
1. Data Execution Prevention (DEP)
2. How to obtain the object address while handling WM_NCCREATE
ATL solves the second problem in its own way. The simplest method would be to pass the object address as a parameter when creating the window; a better choice is to store the required parameters in thread-local storage (TLS). Either way, ATL's solution has no efficiency problem, because this code runs only once during the entire lifetime of a window.
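The TLS option can be sketched with a C++ thread_local variable. This is an illustration under stated assumptions, not ATL's code: StashCreatingWindow and TakeCreatingWindow are hypothetical names, and the approach relies on the messages sent during CreateWindowEx being delivered to the window procedure on the creating thread.

```cpp
#include <cassert>

// Hypothetical TLS handoff: each thread has its own slot, so no locking is needed.
thread_local void* t_pendingWindow = nullptr;

// Called by the window class right before CreateWindowEx.
void StashCreatingWindow(void* pThis) {
    assert(t_pendingWindow == nullptr);  // one creation in flight per thread
    t_pendingWindow = pThis;
}

// Called once from the start window procedure when the first message arrives.
void* TakeCreatingWindow() {
    void* p = t_pendingWindow;
    t_pendingWindow = nullptr;           // consume it so the next creation starts clean
    return p;
}
```

Because the slot is consumed on first use, it is free again by the time the same thread creates its next window.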
The thunk is defined as follows:

  #pragma pack(push,1)
  struct _stdcallthunk
  {
      DWORD   m_mov;          // mov dword ptr [esp+0x4], pThis (esp+0x4 is hWnd)
      DWORD   m_this;         // pThis, the immediate operand of the mov
      BYTE    m_jmp;          // jmp WndProc
      DWORD   m_relproc;      // relative jmp
      BOOL Init(DWORD_PTR proc, void* pThis)
      {
          m_mov = 0x042444C7;  // C7 44 24 04, stored little-endian
          m_this = PtrToUlong(pThis);
          m_jmp = 0xe9;
          m_relproc = DWORD((INT_PTR)proc - ((INT_PTR)this+sizeof(_stdcallthunk)));
          // write block from data cache and
          //  flush from instruction cache
          FlushInstructionCache(GetCurrentProcess(), this, sizeof(_stdcallthunk));
          return TRUE;
      }
      //some thunks will dynamically allocate the memory for the code
      void* GetCodeAddress()
      {
          return this;
      }
      void* operator new(size_t)
      {
          return __AllocStdCallThunk();
      }
      void operator delete(void* pThunk)
      {
          __FreeStdCallThunk(pThunk);
      }
  };
  #pragma pack(pop)

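The code above comes from ATL; the beta_stdcallthunk defined later in this article is based on this structure. Because the structure holds machine code, it must be byte-packed (#pragma pack(push,1)) so that the compiler inserts no padding between the instruction fields.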
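To make the byte layout concrete, here is a portable sketch that writes the same 13 bytes into an array instead of executable memory. It assumes a 32-bit address space (this thunk form exists only on x86), and EncodeStdcallThunk is a hypothetical helper, not part of ATL.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Encodes the 13-byte x86 __stdcall thunk for a thunk located at thunkAddr:
//   C7 44 24 04 <pThis:4>   mov dword ptr [esp+4], pThis
//   E9 <rel32:4>             jmp proc
// rel32 is measured from the end of the jmp instruction, i.e. thunkAddr + 13.
std::array<std::uint8_t, 13> EncodeStdcallThunk(std::uint32_t thunkAddr,
                                                std::uint32_t pThis,
                                                std::uint32_t proc) {
    std::array<std::uint8_t, 13> code{};
    auto put32 = [&code](std::size_t at, std::uint32_t v) {
        for (int i = 0; i < 4; ++i)
            code[at + i] = static_cast<std::uint8_t>(v >> (8 * i));  // little-endian
    };
    put32(0, 0x042444C7u);               // same constant ATL stores in m_mov
    put32(4, pThis);
    code[8] = 0xE9;
    put32(9, proc - (thunkAddr + 13u));  // unsigned wraparound gives the two's-complement offset
    return code;
}
```

For a thunk at 0x1000 jumping to 0x2000, the displacement is 0x2000 - 0x100D = 0xFF3, which is the value Init computes as proc - (this + sizeof(_stdcallthunk)).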

ATL overloads operator new, which actually allocates memory through the __AllocStdCallThunk function. Unfortunately, that function is only declared in the headers, but we can guess how it is implemented.
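The effect of overloading operator new can be shown with a portable sketch in which a class-specific operator new routes every thunk allocation to a dedicated pool. The pool here (thunk_pool, eight slots in an ordinary static array) is hypothetical and only stands in for __AllocStdCallThunk/__FreeStdCallThunk; a real thunk pool would live in executable memory.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical dedicated pool standing in for __AllocStdCallThunk / __FreeStdCallThunk.
namespace thunk_pool {
constexpr std::size_t kSlotSize = 16;
constexpr std::size_t kSlots = 8;
alignas(16) unsigned char storage[kSlots * kSlotSize];
bool used[kSlots] = {};

void* Alloc() {
    for (std::size_t i = 0; i < kSlots; ++i) {
        if (!used[i]) { used[i] = true; return storage + i * kSlotSize; }
    }
    return nullptr;
}

void Free(void* p) {
    if (!p) return;
    std::size_t i = static_cast<std::size_t>(static_cast<unsigned char*>(p) - storage) / kSlotSize;
    used[i] = false;
}
}  // namespace thunk_pool

struct Thunk {
    unsigned char code[13];  // room for the 13-byte x86 stdcall thunk

    // Class-specific operator new/delete: `new Thunk` is always served by the
    // dedicated pool, never by the general-purpose heap that holds ordinary data.
    static void* operator new(std::size_t size) {
        assert(size <= thunk_pool::kSlotSize);
        void* p = thunk_pool::Alloc();
        if (!p) throw std::bad_alloc();
        return p;
    }
    static void operator delete(void* p) { thunk_pool::Free(p); }
};
```

Callers still write new Thunk and delete, so the choice of backing memory stays hidden inside the class, which is the same design ATL uses.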
A thunk needs only 13 bytes, or 16 bytes when rounded up. For code stored in a data area to run, the memory must be readable, writable, and executable; otherwise Data Execution Prevention (DEP) blocks it. The key point is that this memory must be kept separate from the program's ordinary allocations. For example, if I did not overload new, but simply allocated ordinary memory for the thunk and initialized it, two problems would occur. First, the thunk could not execute with DEP enabled. Second, if a buffer overflow elsewhere in the code accidentally overwrote the thunk, the program would inevitably crash, and the cause would be difficult to locate.
A simple solution is to call the low-level memory-management function VirtualAlloc directly, allocate a readable, writable, and executable region (PAGE_EXECUTE_READWRITE) to serve as the thunk area, and initialize it. The problem is that in user mode (ring 3), VirtualAlloc reserves address space with a 64 KB allocation granularity, so allocating one region per 13-byte thunk is far too wasteful.
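The waste comes from rounding: VirtualAlloc places each reservation on an allocation-granularity boundary (reported in SYSTEM_INFO::dwAllocationGranularity, 64 KB on typical systems). The helpers below are portable equivalents of the ROUND_DOWN/ROUND_UP macros used in the heap manager code; the function and constant names are assumptions of this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Portable equivalents of ROUND_DOWN / ROUND_UP; align must be a power of two.
constexpr std::uintptr_t RoundDown(std::uintptr_t n, std::uintptr_t align) {
    return n & ~(align - 1);
}
constexpr std::uintptr_t RoundUp(std::uintptr_t n, std::uintptr_t align) {
    return RoundDown(n + align - 1, align);
}

constexpr std::uintptr_t kAllocationGranularity = 0x10000;  // 64 KB on typical Windows systems
constexpr std::uintptr_t kThunkSize = 13;

// The next reservation cannot start before the next 64 KB boundary, so a
// reservation made for a single thunk effectively ties up a whole 64 KB block.
static_assert(RoundUp(kThunkSize, kAllocationGranularity) == 0x10000,
              "one 13-byte thunk still occupies a 64 KB block of address space");
```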
I think the best solution is to allocate 64 KB once during program initialization and use 56 KB of it as the thunk heap. With 16-byte slots, 56 KB / 16 bytes = 3584, so in theory up to 3584 windows can exist at the same time. The first and last pages of the 64 KB region are set to no-access (PAGE_NOACCESS) to catch overflow errors.
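The layout arithmetic can be checked at compile time. This sketch assumes 4 KB pages and 16-byte slots, as in the text; the constant names and SlotOffset are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Layout of the proposed thunk heap, assuming 4 KB pages and 16-byte slots.
constexpr std::size_t kRegionSize  = 64 * 1024;                  // one 64 KB VirtualAlloc region
constexpr std::size_t kPageSize    = 4 * 1024;                   // first and last page are guards
constexpr std::size_t kSlotSize    = 16;                         // 13-byte thunk rounded up
constexpr std::size_t kUsable      = kRegionSize - 2 * kPageSize;
constexpr std::size_t kSlots       = kUsable / kSlotSize;
constexpr std::size_t kBitmapBytes = kSlots / 8;                 // one bit per slot

static_assert(kUsable == 56 * 1024, "two guard pages leave 56 KB");
static_assert(kSlots == 3584, "56 KB / 16 B");
static_assert(kBitmapBytes == 448, "matches the 448-byte bitmap in the heap manager");

// Offset of slot i from the start of the 64 KB region (skipping the low guard page).
constexpr std::size_t SlotOffset(std::size_t i) { return kPageSize + i * kSlotSize; }
```

The last slot ends exactly where the high guard page begins, so an overrun past it faults immediately.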

This solution is essentially a thunk heap manager. The code is as follows:

  #include <windows.h>
  #include <assert.h>
  #include <string.h>
  #include <atlbase.h>   // ATLASSERT, ATLASSUME

  class CYYCriticalSection
  {
  public:
      CYYCriticalSection() throw() { memset(&m_sec, 0, sizeof(CRITICAL_SECTION)); }
      ~CYYCriticalSection() throw() {}
      HRESULT Init() throw()
      {
          HRESULT hRes = E_FAIL;
          __try
          {
              InitializeCriticalSection(&m_sec);
              hRes = S_OK;
          }
          // structured exception may be raised in low memory situations
          __except (STATUS_NO_MEMORY == GetExceptionCode())
          {
              hRes = E_OUTOFMEMORY;
          }
          return hRes;
      }
      HRESULT Term() throw() { DeleteCriticalSection(&m_sec); return S_OK; }
      HRESULT Lock() throw() { EnterCriticalSection(&m_sec); return S_OK; }
      HRESULT Unlock() throw() { LeaveCriticalSection(&m_sec); return S_OK; }
  private:
      CRITICAL_SECTION m_sec;
  };

  template <typename TLock>
  class beta_CComCritSecLock
  {
  public:
      beta_CComCritSecLock(TLock& cs);
      ~beta_CComCritSecLock() throw();
      HRESULT Lock() throw();
      void Unlock() throw();
  // Implementation
  private:
      TLock& m_cs;
      bool m_bLocked;
  };

  template <typename TLock>
  inline beta_CComCritSecLock<TLock>::beta_CComCritSecLock(TLock& cs)
      : m_cs(cs), m_bLocked(false)
  {
      Lock();  // acquire on construction (RAII), as ATL's CComCritSecLock does by default
  }

  template <class TLock>
  inline beta_CComCritSecLock<TLock>::~beta_CComCritSecLock() throw()
  {
      if (m_bLocked)
          Unlock();
  }

  template <class TLock>
  inline HRESULT beta_CComCritSecLock<TLock>::Lock() throw()
  {
      ATLASSERT(!m_bLocked);
      HRESULT hr = m_cs.Lock();
      if (FAILED(hr))
          return hr;
      m_bLocked = true;
      return S_OK;
  }

  template <class TLock>
  inline void beta_CComCritSecLock<TLock>::Unlock() throw()
  {
      ATLASSUME(m_bLocked);
      m_cs.Unlock();
      m_bLocked = false;
  }

  #define PAGE_SIZE   0x1000
  #define BETA_64KB   0x10000
  #define PAGE_ROUND_UP(x) \
      ((((ULONG_PTR)(x)) + PAGE_SIZE - 1) & (~(PAGE_SIZE - 1)))
  #define ROUND_DOWN(n, align) \
      (((ULONG_PTR)(n)) & ~((align) - 1))
  #define ROUND_UP(n, align) \
      ROUND_DOWN(((ULONG_PTR)(n)) + (align) - 1, (align))

  static unsigned char bitmapmask[8] = { 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80 };

  #pragma pack(push, 1)
  struct beta_stdcallthunk
  {
      DWORD m_mov;      // mov dword ptr [esp+0x4], pThis (esp+0x4 is hWnd)
      DWORD m_this;     // pThis
      BYTE  m_jmp;      // jmp WndProc
      DWORD m_relproc;  // relative jmp
      void* GetCodeAddress() { return this; }
      BOOL Init(DWORD_PTR proc, void* pThis)
      {
          m_mov = 0x042444C7;  // C7 44 24 04
          m_this = PtrToUlong(pThis);
          m_jmp = 0xe9;
          m_relproc = DWORD((INT_PTR)proc - ((INT_PTR)this + sizeof(beta_stdcallthunk)));
          // write block from data cache and
          // flush from instruction cache
          FlushInstructionCache(GetCurrentProcess(), this, sizeof(beta_stdcallthunk));
          return TRUE;
      }
  };
  #pragma pack(pop)

  // Special handling for large pages
  class MyHeap
  {
  public:
      MyHeap()
          : m_bf(true), _blpage(false), m_heap(NULL), m_heapbk(NULL),
            _bitcount(448), _handle(NULL), _pagesize(0x400000)
      {
          memset(_bitmap, 0, 448);
          SYSTEM_INFO sysinfo;
          ::GetSystemInfo(&sysinfo);
          _pagesize = sysinfo.dwPageSize;
          if (sysinfo.dwPageSize > PAGE_SIZE)
              _blpage = true;
      }
      char* GetHeap()
      {
          if (m_bf && NULL == m_heapbk)
          {
              m_heap = (char*)::VirtualAlloc(NULL, BETA_64KB, MEM_COMMIT | MEM_RESERVE,
                                             PAGE_EXECUTE_READWRITE);
              if (m_heap)
              {
  #ifdef _DEBUG
                  // Protection code: VirtualAlloc returns a page-aligned address,
                  // so this always holds; it can be removed
                  assert(PAGE_ROUND_UP(m_heap) == (ULONG_PTR)m_heap);
  #endif
                  m_heapbk = m_heap;   // defensive code: keep the base address for VirtualFree
                  m_bf = false;
                  char* tem = m_heap;
                  char* tpend = (char*)(ROUND_UP(m_heap + PAGE_SIZE, BETA_64KB));
                  tpend -= PAGE_SIZE;  // last page of the 64 KB region
                  m_heap = (char*)(PAGE_ROUND_UP(m_heap));
                  if (m_heap > tem)
                  {
                      // Defensive code that should never be reached
                      tpend = tem + BETA_64KB;
                      tpend = (char*)(ROUND_DOWN(tpend, PAGE_SIZE));
                      tpend -= PAGE_SIZE;
                  }
                  // Protect memory: first and last pages become no-access guard pages
                  DWORD lpflOldProtect;
                  ::VirtualProtect(m_heap, PAGE_SIZE, PAGE_NOACCESS, &lpflOldProtect);
                  ::VirtualProtect(tpend, PAGE_SIZE, PAGE_NOACCESS, &lpflOldProtect);
                  m_heap += PAGE_SIZE;
                  _bitcount = (long)(tpend - m_heap) / (_granularity * 8);
  #ifdef _DEBUG
                  // Protection code
                  assert(_bitcount == 448);
  #endif
              }
          }
          return m_heap;
      }
      char* MyHeapAlloc();
      void MyHeapFree(char* pv);
      ~MyHeap()
      {
          if (m_heapbk)
          {
              // MEM_RELEASE alone decommits and releases the region;
              // VirtualFree rejects MEM_RELEASE combined with MEM_DECOMMIT
              ::VirtualFree(m_heapbk, 0, MEM_RELEASE);
              m_heapbk = NULL;
              m_heap = NULL;
          }
          if (_handle)
          {
              ::HeapDestroy(_handle);
              _handle = NULL;
          }
          m_bf = true;
      }
      // If the page is large, use this function instead
      HANDLE LGetHeap()
      {
          if (m_bf && NULL == _handle)
          {
              // HeapCreate also accepts HEAP_CREATE_ENABLE_EXECUTE for executable heap blocks
              _handle = ::HeapCreate(0, _pagesize, _pagesize);
              if (_handle)
                  m_bf = false;
          }
          return _handle;
      }
      bool IsLPage() { return _blpage; }
  private:
      bool m_bf;
      bool _blpage;
      char* m_heap;
      char* m_heapbk;
      unsigned char _bitmap[448];
      static int _granularity;
      int _bitcount;   // byte count of the bitmap
      HANDLE _handle;
      int _pagesize;
  };
  int MyHeap::_granularity = 16;   // slot size: one 16-byte thunk per bitmap bit

  // Supports at most 448 bitmap bytes (448 * 8 slots)
  char* MyHeap::MyHeapAlloc()
  {
      // beta_WinModule is a global defined elsewhere in the author's project (not shown)
      beta_CComCritSecLock<CYYCriticalSection> lock(beta_WinModule.m_csWindowCreate);
      int i;
      int j = 0;
      for (i = 0; i < 448; ++i)
      {
          if (_bitmap[i] != 0xff)   // this byte still has a free bit
              break;
      }
      if (i >= _bitcount)
          return NULL;              // heap is full
      for (j = 0; j < 8; ++j)
      {
          if (0 == (_bitmap[i] & bitmapmask[j]))
              break;
      }
      assert(j < 8);
      _bitmap[i] = _bitmap[i] | bitmapmask[j];   // mark the slot as used
      return (i * 8 + j) * _granularity + m_heap;
  }

  void MyHeap::MyHeapFree(char* pv)
  {
      ATLASSERT(pv >= m_heap);      // pv must come from this heap
      unsigned long region = static_cast<unsigned long>(pv - m_heap);
      unsigned long t = region / _granularity;
      assert((region % _granularity) == 0);
      int bytepos = t / 8;
      int bitpos = t % 8;
      // synchronize
      beta_CComCritSecLock<CYYCriticalSection> lock(beta_WinModule.m_csWindowCreate);
      if (_bitmap[bytepos] & bitmapmask[bitpos])
      {
          _bitmap[bytepos] = _bitmap[bytepos] & (~bitmapmask[bitpos]);
          return;
      }
      assert(0);   // the slot was not allocated (double free)
  }

Analysis: the code above implements a thunk heap manager. Thread synchronization must be considered, because the manager must remain correct when multiple threads access it simultaneously. The manager also takes large pages into account. What is a large page? On a few Windows systems the page size is 1 MB or 4 MB, and the manager handles that case specially.
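The allocation logic, and where the lock belongs, can be seen in a portable sketch of the same bitmap scheme. It is not the author's code: a plain array replaces the VirtualAlloc'd region, std::mutex with std::lock_guard replaces CYYCriticalSection and beta_CComCritSecLock, and SlotHeap is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>

// Portable sketch of the bitmap slot allocator behind MyHeapAlloc/MyHeapFree.
class SlotHeap {
public:
    static constexpr std::size_t kSlotSize = 16;
    static constexpr std::size_t kBitmapBytes = 448;          // 448 * 8 = 3584 slots
    static constexpr std::size_t kSlots = kBitmapBytes * 8;

    unsigned char* Alloc() {
        std::lock_guard<std::mutex> guard(mutex_);
        for (std::size_t i = 0; i < kBitmapBytes; ++i) {
            if (bitmap_[i] == 0xFF) continue;                 // all 8 slots in this byte taken
            for (unsigned j = 0; j < 8; ++j) {
                const unsigned char mask = static_cast<unsigned char>(1u << j);
                if ((bitmap_[i] & mask) == 0) {
                    bitmap_[i] |= mask;                       // mark the slot as used
                    return storage_ + (i * 8 + j) * kSlotSize;
                }
            }
        }
        return nullptr;                                       // heap is full
    }

    void Free(unsigned char* p) {
        assert(p >= storage_ && p < storage_ + sizeof(storage_));
        const std::size_t offset = static_cast<std::size_t>(p - storage_);
        assert(offset % kSlotSize == 0);                      // must be a slot boundary
        const std::size_t slot = offset / kSlotSize;
        const unsigned char mask = static_cast<unsigned char>(1u << (slot % 8));
        std::lock_guard<std::mutex> guard(mutex_);
        assert(bitmap_[slot / 8] & mask);                     // catches double frees
        bitmap_[slot / 8] &= static_cast<unsigned char>(~mask);
    }

private:
    std::mutex mutex_;
    unsigned char bitmap_[kBitmapBytes] = {};
    alignas(16) unsigned char storage_[kSlots * kSlotSize];
};
```

Both Alloc and Free read and modify the bitmap, so both take the same lock; without it, two threads creating windows at once could claim the same slot.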

The program can record in a global variable whether large pages are in use. If they are (IsLPage returns true), call LGetHeap to obtain the heap handle, call HeapAlloc to allocate the thunk, and then set the memory's access mode to read, write, and execute.
