PE file format and API HOOK

Source: Internet
Author: User

In low-level Windows programming, API interception is always exciting: what could be more interesting than using your own code to change the behavior of other programs? Implementing API interception also gives us a chance to become familiar with things that are rarely touched in a RAD programming environment, such as remote DLL injection, memory management, and the PE file format. Much commercial software, such as the Kingsoft dictionary, various real-time Chinese localization tools, and even some plug-ins for online games, uses this technology, and it is needed to some degree in various debugging tools as well.

One way to implement API interception is to modify the import address table of a PE file. On 32-bit Windows, .EXE and .DLL files use the PE file format. A PE file stores the addresses of all API functions called by the program in the import address table, and the program code calls an API not through the address of the API function itself but through the corresponding entry in that table. Modifying the function address stored in the import address table is therefore enough to intercept the API. First, let's get familiar with the PE file format. Because the format is complicated and involves many data types, only the relevant parts are introduced here. I have drawn a picture that roughly depicts the PE file format; the fields in the structures that hold an RVA are marked in the figure.

A PE file starts with a DOS file header, followed by a DOS stub, which is actually a complete DOS program. Both exist in PE files mainly for compatibility: if a Win32 program is executed under DOS, the stub displays a message such as "This program cannot be run in DOS mode." These two parts are not important in themselves, but because the size of the DOS stub varies from file to file, we must use e_lfanew, a member of the DOS file header, to locate the PE file header. The DOS file header is defined as the IMAGE_DOS_HEADER structure. Its member e_lfanew contains the offset of the PE file header from the start of the file, which is also the header's "relative virtual address" (RVA) once the file is loaded, because the headers are mapped at the start of the module.

RVA (relative virtual address) is a term that appears often in PE files, so it deserves an explanation. An RVA is an offset relative to the starting address of the module; to obtain the real address, add the module's starting address to the RVA. It is called "virtual" because an RVA has no meaning until the PE file has been loaded into memory.

 

Example: assume that a PE file is loaded at virtual address (VA) 400000h and that the member e_lfanew in its DOS header has the value 40h (an RVA). The virtual address (VA) of the PE file header it refers to is then 400040h.
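
The arithmetic in this example can be sketched in a few lines of Python. This is only an illustration on a synthetic buffer, not part of the original assembly source: the helper names rva_to_va and read_e_lfanew are hypothetical, and the only layout fact relied on is that e_lfanew is the 4-byte little-endian field at offset 3Ch of IMAGE_DOS_HEADER.

```python
import struct

E_LFANEW_OFFSET = 0x3C  # offset of e_lfanew inside IMAGE_DOS_HEADER


def rva_to_va(module_base, rva):
    """Convert a relative virtual address into a virtual address."""
    return module_base + rva


def read_e_lfanew(image):
    """Return e_lfanew from a buffer that starts with a DOS header."""
    if bytes(image[:2]) != b"MZ":
        raise ValueError("missing 'MZ' signature")
    (e_lfanew,) = struct.unpack_from("<I", image, E_LFANEW_OFFSET)
    return e_lfanew


# Synthetic image (an assumption for the demo): 'MZ', padding,
# e_lfanew = 40h, and the PE signature at offset 40h.
image = bytearray(0x80)
image[0:2] = b"MZ"
struct.pack_into("<I", image, E_LFANEW_OFFSET, 0x40)
image[0x40:0x44] = b"PE\0\0"

pe_offset = read_e_lfanew(image)
pe_header_va = rva_to_va(0x400000, pe_offset)
print(hex(pe_header_va))  # 0x400040
```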

After the DOS stub comes the PE file header we are interested in. It is defined as the IMAGE_NT_HEADERS structure, which contains information about the entire PE file. Its definition is as follows (shown here in assembly language; winnt.h defines it in C):

IMAGE_NT_HEADERS STRUCT
    Signature       dd ?
    FileHeader      IMAGE_FILE_HEADER <>
    OptionalHeader  IMAGE_OPTIONAL_HEADER32 <>
IMAGE_NT_HEADERS ENDS

In this structure, the member relevant to API interception is the last one, OptionalHeader. It is defined as the IMAGE_OPTIONAL_HEADER32 structure, which has 31 fields; most of them are irrelevant to API interception, so only the last two are shown:

IMAGE_OPTIONAL_HEADER32 STRUCT
    ...
    ...
    NumberOfRvaAndSizes  dd ?
    DataDirectory        IMAGE_DATA_DIRECTORY 16 dup (<>)
IMAGE_OPTIONAL_HEADER32 ENDS

What we need is the final field, DataDirectory, which is called the "data directory". It is an array of 16 IMAGE_DATA_DIRECTORY structures, each describing one important data structure of the PE file. The second element describes the "import table", which stores information about the DLLs and external functions used by the PE file, including the name of each DLL, the names of the imported functions, and the addresses of the imported functions. API interception works by replacing an imported function's address in the import table with the address of our own function. IMAGE_DATA_DIRECTORY is defined as follows:

IMAGE_DATA_DIRECTORY STRUCT
    VirtualAddress  dd ?
    isize           dd ?
IMAGE_DATA_DIRECTORY ENDS
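
As a hedged sketch of how the second data directory entry is reached from the start of a module, the following Python reads it from a synthetic header. It assumes a PE32 (32-bit) image, where the optional header starts 24 bytes after the PE signature (4-byte Signature plus 20-byte IMAGE_FILE_HEADER) and DataDirectory starts 96 bytes into the optional header. The function name read_data_directory and the directory values 2000h and 28h are made up for the demo.

```python
import struct

NT_OPTIONAL_HEADER_OFFSET = 4 + 20   # Signature + IMAGE_FILE_HEADER
DATA_DIRECTORY_OFFSET = 96           # inside IMAGE_OPTIONAL_HEADER32
IMAGE_DIRECTORY_ENTRY_IMPORT = 1     # second element of DataDirectory


def read_data_directory(image, index):
    """Return (VirtualAddress, size) of DataDirectory[index] in a PE32 image."""
    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)
    if bytes(image[e_lfanew:e_lfanew + 4]) != b"PE\0\0":
        raise ValueError("PE signature not found")
    entry = (e_lfanew + NT_OPTIONAL_HEADER_OFFSET
             + DATA_DIRECTORY_OFFSET + 8 * index)  # 8 = sizeof IMAGE_DATA_DIRECTORY
    return struct.unpack_from("<II", image, entry)


# Synthetic headers only; every other field is left as zero.
image = bytearray(0x200)
image[0:2] = b"MZ"
struct.pack_into("<I", image, 0x3C, 0x40)
image[0x40:0x44] = b"PE\0\0"
struct.pack_into("<II", image, 0x40 + 24 + 96 + 8, 0x2000, 0x28)

rva, size = read_data_directory(image, IMAGE_DIRECTORY_ENTRY_IMPORT)
print(hex(rva), hex(size))  # 0x2000 0x28
```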

VirtualAddress is the relative virtual address of the data structure, and isize contains the size of the data structure that VirtualAddress points to. For example, in the IMAGE_DATA_DIRECTORY structure for the import table, VirtualAddress contains the RVA of the import table. With this RVA, we can find the import table.

The import table is an array of IMAGE_IMPORT_DESCRIPTOR structures. Each element contains information about one DLL referenced by the PE file, so the number of elements equals the number of DLLs the PE file references. The array ends with an IMAGE_IMPORT_DESCRIPTOR structure filled entirely with zeros. Let's take a look at the definition of the IMAGE_IMPORT_DESCRIPTOR structure:

IMAGE_IMPORT_DESCRIPTOR STRUCT
    union
        Characteristics     dd ?
        OriginalFirstThunk  dd ?
    ends
    TimeDateStamp   dd ?
    ForwarderChain  dd ?
    Name1           dd ?
    FirstThunk      dd ?
IMAGE_IMPORT_DESCRIPTOR ENDS
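
To make the layout concrete, here is a minimal Python sketch that walks such a descriptor array in a synthetic buffer and collects the DLL names. It assumes the buffer represents an image already mapped into memory, so an RVA can be used directly as an offset into it; the helper names (read_cstring, imported_dlls, put) and all offsets are invented for the demo.

```python
import struct

DESCRIPTOR_SIZE = 20   # sizeof IMAGE_IMPORT_DESCRIPTOR: five DWORDs


def read_cstring(image, offset):
    """Read a NUL-terminated ASCII string starting at offset."""
    end = image.index(b"\0", offset)
    return bytes(image[offset:end]).decode("ascii")


def imported_dlls(image, import_table_rva):
    """List the DLL names named by the descriptors of a mapped image."""
    names = []
    offset = import_table_rva
    while True:
        fields = struct.unpack_from("<5I", image, offset)
        if not any(fields):            # all-zero descriptor ends the array
            break
        names.append(read_cstring(image, fields[3]))   # fields[3] is Name1
        offset += DESCRIPTOR_SIZE
    return names


def put(image, offset, data):
    """Copy data into the image without changing the image size."""
    image[offset:offset + len(data)] = data


image = bytearray(0x200)
put(image, 0x100, b"kernel32.dll\0")
put(image, 0x110, b"user32.dll\0")
struct.pack_into("<5I", image, 0x40, 0x180, 0, 0, 0x100, 0x1A0)
struct.pack_into("<5I", image, 0x54, 0x190, 0, 0, 0x110, 0x1B0)
# The descriptor at 0x68 stays all zero and terminates the array.

dlls = imported_dlls(image, 0x40)
print(dlls)  # ['kernel32.dll', 'user32.dll']
```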

Not every member of this structure is related to the API interception we are discussing, but the structure is interesting in its own right, so some of its members are introduced here.

The first member is a union. The union merely gives OriginalFirstThunk an alias (Characteristics); the field contains the RVA of an array of IMAGE_THUNK_DATA structures.

So what is IMAGE_THUNK_DATA? It is defined as follows:

IMAGE_THUNK_DATA STRUCT
    union u1
        ForwarderString  dd ?
        Function         dd ?
        Ordinal          dd ?
        AddressOfData    dd ?
    ends
IMAGE_THUNK_DATA ENDS

Although it looks complicated, it is actually a single DWORD. We generally treat it as an RVA pointing to an IMAGE_IMPORT_BY_NAME structure, which stores information about one imported function. Its definition:

IMAGE_IMPORT_BY_NAME STRUCT
    Hint   dw ?
    Name1  db ?
IMAGE_IMPORT_BY_NAME ENDS

Hint is the index of the function in the export table of the DLL, and Name1 contains the function name. (The member is originally called Name, but Name is an assembler directive, so Name1 is used instead. Note that Name1 holds the function name itself, not an RVA.)
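
A short Python sketch of decoding one such entry follows: a 2-byte Hint followed directly by the NUL-terminated name. The function name read_import_by_name and the hint value B7h are made up for illustration, and the buffer again stands in for a mapped image.

```python
import struct


def read_import_by_name(image, rva):
    """Decode the IMAGE_IMPORT_BY_NAME structure at rva in a mapped image."""
    (hint,) = struct.unpack_from("<H", image, rva)
    name_start = rva + 2                       # the name follows the Hint
    name_end = image.index(b"\0", name_start)
    return hint, bytes(image[name_start:name_end]).decode("ascii")


image = bytearray(0x40)
entry = struct.pack("<H", 0x00B7) + b"ExitProcess\0"   # hint value is made up
image[0x10:0x10 + len(entry)] = entry

decoded = read_import_by_name(image, 0x10)
print(decoded)  # (183, 'ExitProcess')
```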

FirstThunk is the member that really matters for API interception. It also contains the RVA of an array of IMAGE_THUNK_DATA structures. This array and the one pointed to by OriginalFirstThunk are different arrays, but they are related: before the PE file is loaded into memory, their contents are identical. After the PE file is loaded, the array pointed to by OriginalFirstThunk remains unchanged and still points to the IMAGE_IMPORT_BY_NAME structures, while the array pointed to by FirstThunk is overwritten with the actual addresses of the imported functions. This second array is called the Import Address Table (IAT). The key to implementing API interception is to modify the entries in the IAT, replacing them with the addresses of our own functions.
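
The relationship between the two arrays can be illustrated with a small Python simulation. It is a sketch under simplifying assumptions, not the real loader: every import is by name (no ordinal imports), the buffer is a mapped image so RVAs are offsets, and the function addresses in fake_exports are invented.

```python
import struct


def read_thunks(image, rva):
    """Read a zero-terminated array of 32-bit IMAGE_THUNK_DATA values."""
    thunks = []
    while True:
        (value,) = struct.unpack_from("<I", image, rva + 4 * len(thunks))
        if value == 0:
            return thunks
        thunks.append(value)


def bind_imports(image, oft_rva, ft_rva, resolve):
    """Mimic the loader: overwrite each FirstThunk slot with an address."""
    for i, name_rva in enumerate(read_thunks(image, oft_rva)):
        end = image.index(b"\0", name_rva + 2)         # skip the 2-byte Hint
        name = bytes(image[name_rva + 2:end]).decode("ascii")
        struct.pack_into("<I", image, ft_rva + 4 * i, resolve(name))


image = bytearray(0x100)
for rva, name in ((0x80, b"ExitProcess"), (0x90, b"Sleep")):
    entry = struct.pack("<H", 0) + name + b"\0"
    image[rva:rva + len(entry)] = entry
struct.pack_into("<3I", image, 0x20, 0x80, 0x90, 0)   # OriginalFirstThunk array
struct.pack_into("<3I", image, 0x40, 0x80, 0x90, 0)   # FirstThunk array (IAT)

before = read_thunks(image, 0x40)                     # identical to the other array
fake_exports = {"ExitProcess": 0x7C81CAA2, "Sleep": 0x7C802442}  # made up
bind_imports(image, 0x20, 0x40, fake_exports.__getitem__)

names_after = read_thunks(image, 0x20)   # still RVAs of IMAGE_IMPORT_BY_NAME
iat_after = read_thunks(image, 0x40)     # now holds function addresses
print([hex(v) for v in names_after], [hex(v) for v in iat_after])
```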

After reading the introduction above, can you see how our API interception works? First we obtain the starting address of the module, then use the e_lfanew field of the IMAGE_DOS_HEADER structure to locate the IMAGE_NT_HEADERS structure, find the data directory in its OptionalHeader, take the second element of the data directory, and read its VirtualAddress. This gives us the IMAGE_IMPORT_DESCRIPTOR array, that is, the import table. The key code is as follows:

mov eax, hMoudle                         ; hMoudle is the starting address of the module
mov esi, eax
assume esi: ptr IMAGE_DOS_HEADER         ; treat esi as a pointer to an IMAGE_DOS_HEADER structure
add esi, [esi].e_lfanew                  ; esi now points to the PE header
assume esi: ptr IMAGE_NT_HEADERS         ; treat esi as a pointer to an IMAGE_NT_HEADERS structure
mov ebx, [esi].OptionalHeader.DataDirectory[sizeof IMAGE_DATA_DIRECTORY].VirtualAddress  ; RVA of the import table
add eax, ebx                             ; module start address + RVA = actual address of the import table
mov esi, eax
assume esi: ptr IMAGE_IMPORT_DESCRIPTOR  ; treat esi as a pointer to an IMAGE_IMPORT_DESCRIPTOR structure

We traverse every IMAGE_IMPORT_DESCRIPTOR structure in this array and check the IAT pointed to by its FirstThunk. If an entry equals the address of the API function we want to intercept, we modify it.

invoke GetModuleHandle, addr DllName     ; get the handle of the DLL that exports the API to intercept
invoke GetProcAddress, eax, addr ApiName
mov ProcAddr, eax                        ; address of the API we want to intercept, stored in ProcAddr
; the import table ends with an all-zero IMAGE_IMPORT_DESCRIPTOR
.while !([esi].OriginalFirstThunk == 0 && [esi].TimeDateStamp == 0 && [esi].ForwarderChain == 0 && [esi].Name1 == 0 && [esi].FirstThunk == 0)
    mov edi, hMoudle
    add edi, [esi].FirstThunk            ; starting address of the IAT
    assume edi: ptr IMAGE_THUNK_DATA     ; treat edi as a pointer to IMAGE_THUNK_DATA
    .while [edi] != 0                    ; check each IAT entry
        mov ebx, [edi]                   ; after loading, the entry holds the address of an imported function
        .if ebx == ProcAddr              ; it is the API we want to intercept
            invoke GetCurrentProcess
            mov ProcHandle, eax          ; handle of the current process, stored in ProcHandle
            invoke VirtualProtectEx, eax, edi, sizeof DWORD, PAGE_READWRITE, addr Old  ; make the entry writable
            mov eax, offset NewExitProcess  ; NewExitProcess is our own replacement function
            mov NewAddr, eax
            invoke WriteProcessMemory, ProcHandle, edi, addr NewAddr, sizeof DWORD, NULL  ; overwrite the entry
        .endif
        add edi, sizeof IMAGE_THUNK_DATA
    .endw
    add esi, sizeof IMAGE_IMPORT_DESCRIPTOR
.endw
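
The control flow of the two nested loops above can be checked with a platform-neutral Python simulation. This is only a sketch: it operates on a synthetic byte buffer instead of a live process, it leaves out the memory protection change, and the function name patch_iat and all addresses are made up.

```python
import struct


def patch_iat(image, import_table_rva, target_addr, new_addr):
    """Replace every IAT slot equal to target_addr; return the patched slots."""
    patched = []
    desc = import_table_rva
    while any(struct.unpack_from("<5I", image, desc)):   # all zero ends the table
        (slot,) = struct.unpack_from("<I", image, desc + 16)   # FirstThunk
        while True:
            (addr,) = struct.unpack_from("<I", image, slot)
            if addr == 0:                       # end of this DLL's IAT
                break
            if addr == target_addr:
                struct.pack_into("<I", image, slot, new_addr)
                patched.append(slot)
            slot += 4                           # sizeof IMAGE_THUNK_DATA
        desc += 20                              # sizeof IMAGE_IMPORT_DESCRIPTOR
    return patched


EXIT_PROCESS = 0x7C81CAA2       # made-up address of the real API
NEW_EXIT_PROCESS = 0x10001000   # made-up address of the replacement

image = bytearray(0x100)
# One descriptor at 0x10 (only Name1 and FirstThunk are filled in), then zeros.
struct.pack_into("<5I", image, 0x10, 0, 0, 0, 0x60, 0x40)
# Its IAT at 0x40: two resolved addresses and a terminating zero.
struct.pack_into("<3I", image, 0x40, 0x7C802442, EXIT_PROCESS, 0)

patched = patch_iat(image, 0x10, EXIT_PROCESS, NEW_EXIT_PROCESS)
print([hex(s) for s in patched])  # ['0x44']
```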

[Figure: locating the IAT starting from the module's start address]

 

In the IMAGE_IMPORT_DESCRIPTOR structure, Name1 contains the RVA of the DLL name. You can use it to list the DLLs referenced by a PE file.

Now we know the key to implementing API interception, but some problems remain unsolved.

Let's start with the first problem. Windows does not let a process directly access the memory space of another process, so we cannot simply use one process to modify the IAT of another; the modification has to be made from inside the target process. An existing program will of course not modify its own IAT for us, but we can inject our own DLL into its process space. Once a DLL has been injected into a process, it becomes part of that process, can access all of its memory, and can therefore modify its IAT. There are many ways to inject a DLL into a target process, but for compatibility it is best to use the hooks that Windows itself provides. We can install a system hook with SetWindowsHookEx, which is declared as follows:

HHOOK SetWindowsHookEx (
    int idHook,        // hook type; this example uses WH_GETMESSAGE (see MSDN for other types)
    HOOKPROC lpfn,     // the hook's callback function
    HINSTANCE hMod,    // handle of the DLL that contains the callback function
    DWORD dwThreadId   // identifier of the thread to monitor; 0 here, because a system-wide hook is required
);

The main purpose of installing a system hook is to get our DLL injected into other processes, so the hook's callback function is not important: it only needs to call CallNextHookEx to pass the event along the hook chain. To remove a system hook, call UnhookWindowsHookEx; it takes a single parameter, the hook handle.

The second problem is: once the DLL has been injected into the target process, when should it modify the IAT? This is a job for the DLL entry point function. Every DLL has an entry point function that is executed automatically when the DLL is loaded into a process and when it is unloaded from it. The entry point function normally performs initialization and cleanup, which makes it the most appropriate place for our API interception code. A process space is composed of one executable module and several DLL modules. For the DLLs a program links against, the loader first loads the executable module and then all of its DLL modules, and only after that does the main thread start to run. A DLL injected through a hook is instead loaded into an already running process when the hook is first triggered there. Either way, the entry point function runs automatically as soon as the injected DLL is loaded and modifies the IAT; from then on, whenever the process calls an API whose address we have replaced, the call is transferred to our own function, which implements the API interception. The DLL entry point function is written as follows:

DllEntry proc hInstDll: HINSTANCE, reason: DWORD, reserved1: DWORD  ; DLL entry point function
    .if reason == DLL_PROCESS_ATTACH         ; the DLL is being loaded into the process
        push hInstDll
        pop DllhInst                         ; save the DLL handle in the variable DllhInst
        .........
    .elseif reason == DLL_PROCESS_DETACH     ; the DLL is being unloaded from the process space
        .........
    .endif
DllEntry endp

However, Windows does not let us write to arbitrary memory: pages that hold code, for example, are generally executable and readable but not writable, and writing to a page without write access causes a protection fault. We must therefore make the memory we want to modify readable and writable first, which can be done with VirtualProtectEx; its parameters are described in MSDN. It is sometimes claimed that WriteProcessMemory alone is enough to modify such memory, but that is not necessarily true: unless VirtualProtectEx is used to change the protection beforehand, WriteProcessMemory is not guaranteed to succeed. The code is as follows:

invoke GetCurrentProcess
mov ProcHandle, eax                      ; handle of the current process, stored in ProcHandle
invoke VirtualProtectEx, eax, edi, sizeof DWORD, PAGE_READWRITE, addr Old  ; make the entry writable
mov eax, offset NewExitProcess           ; NewExitProcess is our own replacement function
mov NewAddr, eax
invoke WriteProcessMemory, ProcHandle, edi, addr NewAddr, sizeof DWORD, NULL  ; overwrite the entry

In addition, if our DLL is unloaded for some reason, the addresses we wrote into the target process's IAT become invalid, and the process will crash if it then calls the intercepted API. Therefore, when the DLL is unloaded from the process, we must restore the original data in the IAT. This restoration also belongs in the DLL entry point function, because that function is executed automatically when the DLL is unloaded.
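
One possible way to organize this bookkeeping is sketched below in Python. The article's own source does not implement the restoration, so everything here is an assumption made for illustration: the class IatPatcher, the stand-in dll_entry function, the string reason codes, and the addresses. The idea is simply to remember each original slot value at patch time and write it back on detach.

```python
import struct


class IatPatcher:
    """Remember overwritten IAT slots so they can be put back later."""

    def __init__(self, image):
        self.image = image
        self.saved = {}                    # slot offset -> original address

    def patch(self, slot, new_addr):
        (old,) = struct.unpack_from("<I", self.image, slot)
        self.saved.setdefault(slot, old)   # keep the first, true original
        struct.pack_into("<I", self.image, slot, new_addr)

    def restore_all(self):
        for slot, old in self.saved.items():
            struct.pack_into("<I", self.image, slot, old)
        self.saved.clear()


def dll_entry(reason, patcher, slot, new_addr):
    """Stand-in for the DLL entry point: patch on attach, restore on detach."""
    if reason == "DLL_PROCESS_ATTACH":
        patcher.patch(slot, new_addr)
    elif reason == "DLL_PROCESS_DETACH":
        patcher.restore_all()
    return True


image = bytearray(16)
struct.pack_into("<I", image, 4, 0x7C81CAA2)      # made-up original address
patcher = IatPatcher(image)

dll_entry("DLL_PROCESS_ATTACH", patcher, 4, 0x10001000)
during = struct.unpack_from("<I", image, 4)[0]
dll_entry("DLL_PROCESS_DETACH", patcher, 4, 0x10001000)
after = struct.unpack_from("<I", image, 4)[0]
print(hex(during), hex(after))  # 0x10001000 0x7c81caa2
```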

Another question is how to obtain the starting address of each module. PE files use RVAs, and a real memory address can only be obtained by adding the module's starting address to the RVA. As mentioned above, the address space of a process is composed of one executable module and several DLL modules. DLL modules have their own import tables too, and the API we want to intercept may be called from the executable module or from any DLL module, so to intercept it properly we must enumerate all modules in the process and modify each one's IAT. A few APIs are needed for this. CreateToolhelp32Snapshot creates a snapshot; it takes two parameters, and with TH32CS_SNAPMODULE as the first and 0 as the second it returns a handle to a snapshot of the modules of the current process. Module32First and Module32Next then enumerate the addresses of all modules in the process. Note that our own DLL is also a module in the process, and its IAT contains the intercepted API as well, since our replacement function typically still calls the real one; this module must not be modified. Before modifying a module, we must therefore check whether it is our own DLL. VirtualQuery gives us the starting address of the DLL that contains the modification code, and we compare each enumerated module against that address. The code is as follows:

invoke VirtualQuery, offset Modify, addr MemBaseinform, sizeof MemBaseinform  ; query the memory region that contains our own DLL
invoke CreateToolhelp32Snapshot, TH32CS_SNAPMODULE, NULL  ; create a module snapshot and return its handle
mov snapshot, eax
mov module.dwSize, sizeof MODULEENTRY32      ; dwSize must be set before calling Module32First, otherwise the call fails
invoke Module32First, snapshot, addr module  ; information about the first module in the process
.while eax == TRUE                           ; check every module in the process space
    mov ebx, MemBaseinform.AllocationBase    ; ebx holds the starting address of our own DLL
    .if module.hModule != ebx
        invoke Modify, module.hModule        ; module.hModule is the starting address of the module to modify
    .endif
    invoke Module32Next, snapshot, addr module  ; move on to the next module
.endw
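
The filtering logic of this loop, stripped of the Win32 calls, looks roughly like the following Python sketch. The module list stands in for the snapshot returned by the toolhelp APIs; the names ModuleEntry and patch_other_modules, the module names, and the base addresses are all made up.

```python
from typing import Callable, Iterable, List, NamedTuple


class ModuleEntry(NamedTuple):
    """The two MODULEENTRY32 fields this sketch needs."""
    name: str
    base: int          # plays the role of hModule


def patch_other_modules(modules: Iterable[ModuleEntry],
                        own_base: int,
                        modify: Callable[[int], None]) -> List[str]:
    """Call modify(base) for every module except the one at own_base."""
    touched = []
    for module in modules:
        if module.base == own_base:      # never patch the hooking DLL itself
            continue
        modify(module.base)
        touched.append(module.name)
    return touched


snapshot = [                             # made-up module list
    ModuleEntry("winword.exe", 0x00400000),
    ModuleEntry("kernel32.dll", 0x7C800000),
    ModuleEntry("apidll.dll", 0x10000000),
]
seen_bases = []
touched = patch_other_modules(snapshot, 0x10000000, seen_bases.append)
print(touched)  # ['winword.exe', 'kernel32.dll']
```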

The whole source code is written in macro assembly, because assembly is the most direct of programming languages and shows the problem most clearly. In my example, the API I intercept is ExitProcess. Of course I do not write an ExitProcess of my own; I only play a piece of music before calling ExitProcess, so that a process plays music when it calls ExitProcess to exit. For simplicity, some things are not implemented in the code, such as uninstalling the hook and restoring the IAT when the DLL is unloaded. You can add them yourself.

DLL part: apidll.asm

(Omitted)

DEF file of the DLL: apidll.def

LIBRARY apidll
EXPORTS MouseProc
EXPORTS InstallHook

Assemble command: ml /c /coff apidll.asm

Link command: link /subsystem:windows /section:.bss,RWS /dll /def:apidll.def apidll.obj

That is the DLL part. We also need a program that installs the system hook:

Installer: me.asm

(Omitted)

Assemble command: ml /c /coff me.asm

Link command: link /subsystem:windows me.obj

Once both programs are assembled and linked, you can run the installer. It only provides the function to install the hook, not to uninstall it; you can add that yourself. Run the program and click the command button to install the system hook; API interception starts at that moment. Because we have installed a system-wide hook, all processes in the system are affected. Try it with any program. Since this article was written in Word 2000, let's try Word 2000: start it, and nothing seems to happen, because what we intercept is ExitProcess. Now close Word 2000. Well, did you hear the music?

Reprinted from: http://blog.csdn.net/benny5609/archive/2008/04/25/2326849.aspx

 
