[Learning] Windows PE File Study (1: The Export Table)
Today I wrote a small program that reads the export table from a PE file, as a learning exercise.
I referred to the book "Windows PE Authoritative Guide".
First, PE stands for Portable Executable, that is, an executable file that can be ported between systems. The common EXE, DLL, OCX, SYS, and COM files are PE files.
In a Windows program, almost every feature is implemented by calling API functions provided by system DLLs. To use a function from any DLL, the program has to import it, that is, use the import table. A DLL that offers functions to others must, in turn, list them in its export table before other programs can use them. This holds both for the standard DLLs shipped with the system and for DLLs you compile yourself. Any development environment can build a module with exported functions: the linker creates the export table automatically, and the programmer only has to specify the name or ordinal of each function to export.
The export table is usually located in the .edata section of the DLL file.
Once we know where the export table is, we can obtain the addresses of the exported functions and hook them. Our goal for now is to learn the structure of the export table in a PE file, so we first need to understand the structure of the PE file itself.
1. Basic Concepts
Note: the following reference material comes from the Internet.
The table below describes some concepts used throughout this article:
Name | Description
Address | A "virtual address", not a "physical address". Why not a physical address? Because the location of data in physical memory changes often; virtual addressing saves memory and avoids wrong memory locations, among other advantages. The user also does not need to know the "real address", because the system prepares the memory space for the program (as long as memory is large enough).
Image file | Covers "executable files", typified by EXE files, and "dynamic link libraries", typified by DLL files. Why "image"? Because they are usually copied straight into memory, like a mirror image of the file. Westerners do have quite an imagination. ^0^
RVA | Short for Relative Virtual Address: an offset relative to the base address of the image. (Sometimes the base is not the image base; the start address of some structure may be used instead.)
Section | The basic unit of code or data in a PE file. In principle, sections divide only into "code sections" and "data sections". (In the file, sections are usually aligned to the size of a physical disk sector, 512 bytes. Once the image is loaded into memory, sections are aligned to the memory page size: 4 KB on x86 and x64, 8 KB on Itanium.)
VA | Short for Virtual Address: an ordinary address in virtual memory that needs no conversion.
Some special sections use an alignment granularity different from that of other sections, both in the file and in memory. For example, resource data is aligned on double-word boundaries.
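The RVA and alignment rules from the table can be written down in a few lines. The following is a minimal sketch, not code from the book; the helper names `rva_to_va`, `va_to_rva`, and `align_up` are made up for this illustration, and the alignment is assumed to be a power of two.

```c
#include <assert.h>
#include <stdint.h>

/* VA = base + RVA, where the base is usually the image base address. */
static uint64_t rva_to_va(uint64_t image_base, uint32_t rva)
{
    return image_base + rva;
}

/* Inverse mapping; the VA must not lie below the image base. */
static uint32_t va_to_rva(uint64_t image_base, uint64_t va)
{
    assert(va >= image_base);
    return (uint32_t)(va - image_base);
}

/* Round value up to the next multiple of alignment (a power of two). */
static uint32_t align_up(uint32_t value, uint32_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~(alignment - 1);
}
```

With a file alignment of 512 bytes, `align_up(1000, 512)` gives 1024; with a 4 KB page, a 1-byte section still occupies `align_up(1, 4096)` = 4096 bytes in memory.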
2. PE file structure
Overall structure of a PE file: three headers and a body, as shown in the figure. The three headers are the DOS header, the NT header, and the section table (section headers). Each header is a data structure whose struct definition can be found in the winnt.h header file (the NT header comes in a 32-bit and a 64-bit version).
Because the PE format stays compatible with DOS, any current PE file can still be started under DOS. Most files, however, will only print one sentence: "This program cannot be run in DOS mode". This behavior is determined by the DOS header in the PE file structure.
Open any image file in Notepad. The first two bytes are always the string "MZ", the initials of Mark Zbikowski, one of the original MS-DOS designers. They are followed by parameters that MS-DOS needs in order to run the program. At the end of these parameters, at file offset 0x3C (byte 60), is the 4-byte file offset of the PE signature. This field has a special name, e_lfanew. The signature itself is "PE\0\0" (the letters "P" and "E" followed by two NULL bytes). Right after e_lfanew comes an MS-DOS program, a valid application that runs under MS-DOS. When the executable (usually an EXE or COM file) is run under MS-DOS, this program prints the message "This program cannot be run in DOS mode". You can replace this program with your own, as some recovery software does; this is also why some programs can run under both DOS and Windows. In Notepad.exe the whole DOS part takes 224 bytes, a value that applies to most Win32 files that cannot run under DOS. The MS-DOS program is optional: to make the file as small as possible you can leave it out and fill the preceding parameters with zeros.
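The lookup described above (check "MZ", read the 4-byte offset at 0x3C, check "PE\0\0") could look roughly like this. It is a sketch under assumptions: the whole file has already been read into a byte buffer, and `find_pe_signature` and `read_u32_le` are names invented here, not Windows APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a little-endian 32-bit value from a byte buffer. */
static uint32_t read_u32_le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Return the file offset of the "PE\0\0" signature (the value of e_lfanew),
 * or -1 if the buffer does not look like a PE file.
 */
static int64_t find_pe_signature(const uint8_t *data, size_t size)
{
    assert(data != NULL);
    if (size < 0x40 || data[0] != 'M' || data[1] != 'Z')
        return -1;                      /* too short, or no "MZ" */
    uint32_t e_lfanew = read_u32_le(data + 0x3C);
    if (e_lfanew > size || size - e_lfanew < 4)
        return -1;                      /* offset points outside the file */
    if (memcmp(data + e_lfanew, "PE\0\0", 4) != 0)
        return -1;                      /* signature mismatch */
    return (int64_t)e_lfanew;
}
```

Reading the field byte by byte keeps the sketch independent of host endianness and of struct packing, which is why it does not cast the buffer to a header struct.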
3. NT header IMAGE_NT_HEADERS
This is the more complex part of the PE file.
The "PE\0\0" signature that DosHeader->e_lfanew points to (see section 2) is the first member of the NT header. In code we locate the NT header the same way, because the MS-DOS program that follows the DOS header can vary in size, so the DOS part as a whole has no fixed length; only e_lfanew tells us where it ends.
Apart from the 4-byte signature, the NT header has two parts, as its definition in winnt.h shows:
    typedef struct _IMAGE_NT_HEADERS {
        DWORD Signature;                        // 4 bytes, PE header flag: (e_lfanew) -> "PE\0\0"
        IMAGE_FILE_HEADER FileHeader;           // 20 bytes, physical layout information of the PE file
        IMAGE_OPTIONAL_HEADER32 OptionalHeader; // 224 bytes, logical layout information of the PE file
    } IMAGE_NT_HEADERS32, *PIMAGE_NT_HEADERS32;
IMAGE_FILE_HEADER is called the file header, and IMAGE_OPTIONAL_HEADER32 is called the optional header (I usually call it the option header). The option header is the most important and most complex part of a PE file, even though its name says "optional".
We can also see that the option header structure differs between 32-bit and 64-bit PE files. The differences are small and the overall layout is the same, but code has to account for them: because the option header differs, the NT header differs as well.
    typedef struct _IMAGE_FILE_HEADER {
        WORD  Machine;              // target platform
        WORD  NumberOfSections;     // number of sections
        DWORD TimeDateStamp;        // file creation date and time
        DWORD PointerToSymbolTable; // pointer to the symbol table (mainly used for debugging)
        DWORD NumberOfSymbols;      // number of symbols in the symbol table
        WORD  SizeOfOptionalHeader; // size of the optional header
        WORD  Characteristics;      // file attributes
    } IMAGE_FILE_HEADER, *PIMAGE_FILE_HEADER;
The file header, by contrast, is straightforward; the commonly used members are Machine and Characteristics. Machine gives the target platform of the PE file, that is, the CPU instruction set it is built for, and can generally be used to tell whether the file is 32-bit or 64-bit. Characteristics packs a lot of information about the file into flag bits; the most important use is to determine whether the file is a DLL. The flag values and machine values are:
    #define IMAGE_FILE_RELOCS_STRIPPED           0x0001  // Relocation info stripped from file.
    #define IMAGE_FILE_EXECUTABLE_IMAGE          0x0002  // File is executable (i.e. no unresolved external references). Set for images that can be loaded and run, DLLs included.
    #define IMAGE_FILE_LINE_NUMS_STRIPPED        0x0004  // Line numbers stripped from file.
    #define IMAGE_FILE_LOCAL_SYMS_STRIPPED       0x0008  // Local symbols stripped from file.
    #define IMAGE_FILE_AGGRESIVE_WS_TRIM         0x0010  // Aggressively trim working set
    #define IMAGE_FILE_LARGE_ADDRESS_AWARE       0x0020  // App can handle >2gb addresses
    #define IMAGE_FILE_BYTES_REVERSED_LO         0x0080  // Bytes of machine word are reversed.
    #define IMAGE_FILE_32BIT_MACHINE             0x0100  // 32 bit word machine.
    #define IMAGE_FILE_DEBUG_STRIPPED            0x0200  // Debugging info stripped from file in .DBG file
    #define IMAGE_FILE_REMOVABLE_RUN_FROM_SWAP   0x0400  // If Image is on removable media, copy and run from the swap file.
    #define IMAGE_FILE_NET_RUN_FROM_SWAP         0x0800  // If Image is on Net, copy and run from the swap file.
    #define IMAGE_FILE_SYSTEM                    0x1000  // System File.
    #define IMAGE_FILE_DLL                       0x2000  // File is a DLL. (important)
    #define IMAGE_FILE_UP_SYSTEM_ONLY            0x4000  // File should only be run on a UP machine
    #define IMAGE_FILE_BYTES_REVERSED_HI         0x8000  // Bytes of machine word are reversed.

    #define IMAGE_FILE_MACHINE_UNKNOWN           0
    #define IMAGE_FILE_MACHINE_I386              0x014c  // Intel 386, 32-bit
    #define IMAGE_FILE_MACHINE_R3000             0x0162  // MIPS little-endian, 0x160 big-endian
    #define IMAGE_FILE_MACHINE_R4000             0x0166  // MIPS little-endian
    #define IMAGE_FILE_MACHINE_R10000            0x0168  // MIPS little-endian
    #define IMAGE_FILE_MACHINE_WCEMIPSV2         0x0169  // MIPS little-endian WCE v2
    #define IMAGE_FILE_MACHINE_ALPHA             0x0184  // Alpha_AXP
    #define IMAGE_FILE_MACHINE_SH3               0x01a2  // SH3 little-endian
    #define IMAGE_FILE_MACHINE_SH3DSP            0x01a3
    #define IMAGE_FILE_MACHINE_SH3E              0x01a4  // SH3E little-endian
    #define IMAGE_FILE_MACHINE_SH4               0x01a6  // SH4 little-endian
    #define IMAGE_FILE_MACHINE_SH5               0x01a8  // SH5
    #define IMAGE_FILE_MACHINE_ARM               0x01c0  // ARM Little-Endian
    #define IMAGE_FILE_MACHINE_THUMB             0x01c2  // ARM Thumb/Thumb-2 Little-Endian
    #define IMAGE_FILE_MACHINE_ARMNT             0x01c4  // ARM Thumb-2 Little-Endian
    #define IMAGE_FILE_MACHINE_AM33              0x01d3
    #define IMAGE_FILE_MACHINE_POWERPC           0x01F0  // IBM PowerPC Little-Endian
    #define IMAGE_FILE_MACHINE_POWERPCFP         0x01f1
    #define IMAGE_FILE_MACHINE_IA64              0x0200  // Intel 64, 64-bit
    #define IMAGE_FILE_MACHINE_MIPS16            0x0266  // MIPS
    #define IMAGE_FILE_MACHINE_ALPHA64           0x0284  // ALPHA64
    #define IMAGE_FILE_MACHINE_MIPSFPU           0x0366  // MIPS
    #define IMAGE_FILE_MACHINE_MIPSFPU16         0x0466  // MIPS
    #define IMAGE_FILE_MACHINE_AXP64             IMAGE_FILE_MACHINE_ALPHA64
    #define IMAGE_FILE_MACHINE_TRICORE           0x0520  // Infineon
    #define IMAGE_FILE_MACHINE_CEF               0x0CEF
    #define IMAGE_FILE_MACHINE_EBC               0x0EBC  // EFI Byte Code
    #define IMAGE_FILE_MACHINE_AMD64             0x8664  // AMD64 (K8), 64-bit
    #define IMAGE_FILE_MACHINE_M32R              0x9041  // M32R little-endian
    #define IMAGE_FILE_MACHINE_CEE               0xC0EE
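As a small illustration of how Machine and Characteristics are typically tested, here is a hedged sketch. The constant names and values are the winnt.h ones (guarded in case windows.h is already included); the helpers `machine_bits` and `is_dll` are hypothetical, and `machine_bits` deliberately handles only three machine types.

```c
#include <assert.h>
#include <stdint.h>

/* winnt.h values, defined here only when <windows.h> is not available. */
#ifndef IMAGE_FILE_DLL
#define IMAGE_FILE_DLL            0x2000
#define IMAGE_FILE_MACHINE_I386   0x014c
#define IMAGE_FILE_MACHINE_IA64   0x0200
#define IMAGE_FILE_MACHINE_AMD64  0x8664
#endif

/* Return 32 or 64 for the machine types handled here, 0 for anything else. */
static int machine_bits(uint16_t machine)
{
    switch (machine) {
    case IMAGE_FILE_MACHINE_I386:  return 32;
    case IMAGE_FILE_MACHINE_IA64:
    case IMAGE_FILE_MACHINE_AMD64: return 64;
    default:                       return 0;
    }
}

/* Characteristics is a bit field, so test the flag with a mask. */
static int is_dll(uint16_t characteristics)
{
    return (characteristics & IMAGE_FILE_DLL) != 0;
}

/* Small self-check with hand-picked values. */
static void check_file_header_helpers(void)
{
    assert(machine_bits(IMAGE_FILE_MACHINE_AMD64) == 64);
    assert(is_dll(0x2102));   /* EXECUTABLE_IMAGE | 32BIT_MACHINE | DLL */
    assert(!is_dll(0x0102));  /* the same flags without IMAGE_FILE_DLL */
}
```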
Next, we focus on the option header IMAGE_OPTIONAL_HEADER.
Offset (32/64) | Size | Field | Name | Description
0 | 2 | Magic | Magic number | An unsigned integer that identifies the type of the image file. 0x10B means a 32-bit image, 0x107 a ROM image, and 0x20B a 64-bit image.
2 | 1 | MajorLinkerVersion | Linker major version | The major version number of the linker.
3 | 1 | MinorLinkerVersion | Linker minor version | The minor version number of the linker.
4 | 4 | SizeOfCode | Code section size | Code is generally placed in the ".text" section. If there are multiple code sections, this is the sum of all of them. It must be an integer multiple of FileAlignment and is the size in the file.
8 | 4 | SizeOfInitializedData | Initialized data size | Initialized data is generally placed in the ".data" section. If there are multiple such sections, this is the sum of all of them. It must be an integer multiple of FileAlignment and is the size in the file.
12 | 4 | SizeOfUninitializedData | Uninitialized data size | Uninitialized data is generally placed in the ".bss" section. If there are multiple such sections, this is the sum of all of them. It must be an integer multiple of FileAlignment and is the size in the file.
16 | 4 | AddressOfEntryPoint | Entry point | RVA of the entry point once the executable is loaded into memory. For a program image this is the startup address. If the value is 0, execution starts from ImageBase. The entry point is optional for DLLs.
20 | 4 | BaseOfCode | Code base address | RVA of the start of the code section once the image is loaded into memory. It must be an integer multiple of SectionAlignment.
24 (32-bit only) | 4 | BaseOfData | Data base address | RVA of the start of the data section once the image is loaded into memory. (A 64-bit file has no such field; its space is merged into the ImageBase that follows.) It must be an integer multiple of SectionAlignment.
28/24 | 4/8 | ImageBase | Image base address | The preferred address of the first byte of the image when it is loaded into memory. It must be a multiple of 64 K. The default is 0x10000000 for a DLL, 0x00010000 for a Windows CE EXE, and 0x00400000 for an EXE on other Windows versions.
32 | 4 | SectionAlignment | Memory alignment | Alignment (in bytes) of sections when loaded into memory. It must be greater than or equal to FileAlignment. The default is the page size of the system.
36 | 4 | FileAlignment | File alignment | Alignment factor (in bytes) used to align the raw data of sections in the image file. It should be a power of 2 between 512 and 64 K inclusive. The default is 512. If SectionAlignment is smaller than the page size of the system, FileAlignment must equal SectionAlignment.
40 | 2 | MajorOperatingSystemVersion | Operating system major version | The version number of the operating system can be seen under "My Computer" > "Help". Windows XP is 5.1: 5 is the major version, 1 the minor version.
42 | 2 | MinorOperatingSystemVersion | Operating system minor version |
44 | 2 | MajorImageVersion | Image major version |
46 | 2 | MinorImageVersion | Image minor version |
48 | 2 | MajorSubsystemVersion | Subsystem major version |
50 | 2 | MinorSubsystemVersion | Subsystem minor version |
52 | 4 | Win32VersionValue | Reserved | Must be 0.
56 | 4 | SizeOfImage | Image size | The size of the image once loaded into memory, including all headers. Rounded up to a multiple of SectionAlignment.
60 | 4 | SizeOfHeaders | Header size | Total size of all headers, rounded up to a multiple of FileAlignment. It is also the file offset of the first section of the PE file.
64 | 4 | CheckSum | Checksum | Image file checksum. The algorithm for computing it is built into Imagehlp.dll. The following are validated at load time to check that they are legitimate: all drivers, any DLL loaded at boot, and any DLL loaded into a critical Windows process.
68 | 2 | Subsystem | Subsystem type | The subsystem required to run this image. See the "Windows subsystem" section of the referenced material.
70 | 2 | DllCharacteristics | DLL characteristics | See the "DLL features" section of the referenced material.
72 | 4/8 | SizeOfStackReserve | Stack reserve size | The maximum stack size. The default is 1 MB.
76/80 | 4/8 | SizeOfStackCommit | Stack commit size | The size of the stack that is committed initially. The default is 4 KB.
80/88 | 4/8 | SizeOfHeapReserve | Heap reserve size | The maximum size of the local heap. The default is 1 MB.
84/96 | 4/8 | SizeOfHeapCommit | Heap commit size | The size of the local heap space that is committed initially. The default is 4 KB.
88/104 | 4 | LoaderFlags | Reserved | Must be 0.
92/108 | 4 | NumberOfRvaAndSizes | Number of directory entries | The number of data directory entries. For historical reasons going back to earlier Windows NT releases, it is 16.
96/112 | 8*16 | DataDirectory | Data directory | Array of directory entries, 16 entries in total.
That is the complete option header structure. Here we only discuss Magic and DataDirectory. The base address and relocation issues that arise when an image is loaded are not covered in this article, because parsing a PE file does not require loading the image into our own process as a module; we only need to map the file into memory and parse its contents.
Checking the Magic field tells us whether the file is 32-bit or 64-bit, so together with FileHeader.Machine we now have two ways to tell them apart.
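A possible way to act on the Magic value is sketched below. The two constants are the winnt.h names for 0x10B and 0x20B; `bits_from_magic` and `data_directory_offset` are hypothetical helpers, and the offsets 96 and 112 are taken from the last row of the table above.

```c
#include <assert.h>
#include <stdint.h>

/* winnt.h values, defined here only when <windows.h> is not available. */
#ifndef IMAGE_NT_OPTIONAL_HDR32_MAGIC
#define IMAGE_NT_OPTIONAL_HDR32_MAGIC 0x10b
#define IMAGE_NT_OPTIONAL_HDR64_MAGIC 0x20b
#endif

/* Return 32 or 64 according to Magic, or 0 for ROM images and unknown values. */
static int bits_from_magic(uint16_t magic)
{
    switch (magic) {
    case IMAGE_NT_OPTIONAL_HDR32_MAGIC: return 32;
    case IMAGE_NT_OPTIONAL_HDR64_MAGIC: return 64;
    default:                            return 0;
    }
}

/* Offset of DataDirectory inside the option header, or 0 if Magic is unknown. */
static uint32_t data_directory_offset(uint16_t magic)
{
    switch (bits_from_magic(magic)) {
    case 32: return 96;
    case 64: return 112;
    default: return 0;
    }
}

/* Small self-check with the three Magic values from the table. */
static void check_magic_helpers(void)
{
    assert(bits_from_magic(0x10b) == 32);
    assert(data_directory_offset(0x20b) == 112);
    assert(bits_from_magic(0x107) == 0);  /* ROM image: not handled here */
}
```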
The protagonist of this article, the export table, is located through the directory entry DataDirectory[0], whose type is:
    typedef struct _IMAGE_DATA_DIRECTORY {
        DWORD VirtualAddress;
        DWORD Size;
    } IMAGE_DATA_DIRECTORY, *PIMAGE_DATA_DIRECTORY;
From this we can see that DataDirectory is not itself the export table. DataDirectory is an array whose elements all have the same type, IMAGE_DATA_DIRECTORY. Each element consists of an address and a size; element 0 gives the base address (an RVA) and the size of the export table. (Don't underestimate the size; we will use it.)
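Fetching one directory entry from the raw bytes of the option header might look like the sketch below. It assumes the caller already has a pointer to the start of the option header; `get_data_directory` and the `data_directory` struct are names invented for this example, and the field offsets (92/108 for NumberOfRvaAndSizes, 96/112 for DataDirectory) come from the option header table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t virtual_address;  /* RVA of the table */
    uint32_t size;             /* size of the table in bytes */
} data_directory;

static uint16_t read_u16_le(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_u32_le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Copy DataDirectory[index] into *out. Returns 1 on success, 0 on failure. */
static int get_data_directory(const uint8_t *opt, size_t opt_size,
                              uint32_t index, data_directory *out)
{
    assert(opt != NULL && out != NULL);
    if (opt_size < 2)
        return 0;
    size_t dir_off;
    switch (read_u16_le(opt)) {          /* Magic selects the layout */
    case 0x10B: dir_off = 96;  break;    /* 32-bit option header */
    case 0x20B: dir_off = 112; break;    /* 64-bit option header */
    default:    return 0;
    }
    if (opt_size < dir_off)
        return 0;
    /* NumberOfRvaAndSizes sits directly in front of the array. */
    uint32_t count = read_u32_le(opt + dir_off - 4);
    if (index >= count)
        return 0;
    uint64_t entry_off = dir_off + (uint64_t)index * 8;
    if (entry_off + 8 > opt_size)
        return 0;                        /* entry lies outside the buffer */
    out->virtual_address = read_u32_le(opt + (size_t)entry_off);
    out->size = read_u32_le(opt + (size_t)entry_off + 4);
    return 1;
}
```

Calling it with index 0 yields the RVA and size of the export table; a result of 0 for both fields would mean the file exports nothing.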
Once we have the address and size of the export table, we can get to work.
    typedef struct _IMAGE_EXPORT_DIRECTORY {
        DWORD Characteristics;
        DWORD TimeDateStamp;
        WORD  MajorVersion;
        WORD  MinorVersion;
        DWORD Name;                  // RVA of the module name of this PE file
        DWORD Base;                  // ordinal base
        DWORD NumberOfFunctions;     // total number of exported functions (entries in the EAT)
        DWORD NumberOfNames;         // number of functions exported by name; some exports have no name, only an ordinal
        DWORD AddressOfFunctions;    // RVA from base of image: the Export Address Table (EAT)
        DWORD AddressOfNames;        // RVA from base of image: array of RVAs that point to the name strings
        DWORD AddressOfNameOrdinals; // RVA from base of image: array of WORD indexes into the EAT, one per name; the values are not necessarily contiguous
    } IMAGE_EXPORT_DIRECTORY, *PIMAGE_EXPORT_DIRECTORY;
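To show how these fields line up byte by byte, here is a sketch that decodes the 40-byte structure from a buffer. It assumes the image is laid out as it would be in memory, so that an RVA can be used directly as an offset into the buffer (for a raw file, RVAs must first be translated through the section table). The `export_directory` struct and `read_export_directory` are illustrative names, not winnt.h definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EXPORT_DIRECTORY_SIZE 40u   /* 2 DWORD + 2 WORD + 7 DWORD */

typedef struct {
    uint32_t characteristics, time_date_stamp;
    uint16_t major_version, minor_version;
    uint32_t name, base, number_of_functions, number_of_names;
    uint32_t address_of_functions, address_of_names, address_of_name_ordinals;
} export_directory;

static uint16_t read_u16_le(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_u32_le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode the export directory found at dir_rva. Returns 1 on success. */
static int read_export_directory(const uint8_t *image, size_t image_size,
                                 uint32_t dir_rva, export_directory *out)
{
    assert(image != NULL && out != NULL);
    if (dir_rva > image_size || image_size - dir_rva < EXPORT_DIRECTORY_SIZE)
        return 0;                        /* structure would run past the end */
    const uint8_t *p = image + dir_rva;
    out->characteristics          = read_u32_le(p + 0);
    out->time_date_stamp          = read_u32_le(p + 4);
    out->major_version            = read_u16_le(p + 8);
    out->minor_version            = read_u16_le(p + 10);
    out->name                     = read_u32_le(p + 12);
    out->base                     = read_u32_le(p + 16);
    out->number_of_functions      = read_u32_le(p + 20);
    out->number_of_names          = read_u32_le(p + 24);
    out->address_of_functions     = read_u32_le(p + 28);
    out->address_of_names         = read_u32_le(p + 32);
    out->address_of_name_ordinals = read_u32_le(p + 36);
    return 1;
}
```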
This is the structure of the export table; the important fields are the ones with comments.
The material I found online explains it clearly:
Export Address Table (EAT)
An entry of the export address table has one of the two formats described in the following table. If the specified address is NOT located in the export section (whose address and length are given by the NT header), the field is an Export RVA; otherwise the field is a Forwarder RVA, which gives the name of a symbol located in another DLL.
Offset | Size | Field | Description
0 | 4 | Export RVA | The RVA of the exported function once the image is loaded into memory.
0 | 4 | Forwarder RVA | A pointer to a NULL-terminated ASCII string in the export section. The string must lie within the range given by the export table data directory entry. It gives the name of the DLL that holds the exported function together with the function's name (for example "MYDLL.expfunc"), or the DLL name together with the function's ordinal (for example "MYDLL.#27").
A Forwarder RVA exports a function that is defined in another image, making it look as if the current image exported it. For the current image, the symbol is therefore both an imported and an exported function.
For example, in Kernel32.dll on Windows XP, the export "HeapAlloc" is forwarded to "NTDLL.RtlAllocateHeap". This lets an application use the Ntdll.dll module on Windows XP without containing any import information for it; the application's import table refers only to Kernel32.dll.
An entry in the export address table is sometimes 0, meaning that no function is exported at that position. This keeps compatibility with earlier versions and saves the trouble of renumbering.
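The three cases for an export address table entry (zero, ordinary Export RVA, Forwarder RVA) can be told apart with the address and size from DataDirectory[0]. The sketch below is one way to write that check; `export_kind` and `classify_export` are hypothetical names, and the RVAs in the self-check are made up.

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    EXPORT_UNUSED,     /* entry is 0: nothing exported at this index */
    EXPORT_FUNCTION,   /* ordinary Export RVA */
    EXPORT_FORWARDER   /* Forwarder RVA: points at a "DLL.Name" string */
} export_kind;

/*
 * dir_rva and dir_size are the VirtualAddress and Size of DataDirectory[0].
 * An entry that falls inside [dir_rva, dir_rva + dir_size) is a forwarder.
 */
static export_kind classify_export(uint32_t entry_rva,
                                   uint32_t dir_rva, uint32_t dir_size)
{
    if (entry_rva == 0)
        return EXPORT_UNUSED;
    if (entry_rva >= dir_rva && entry_rva - dir_rva < dir_size)
        return EXPORT_FORWARDER;
    return EXPORT_FUNCTION;
}

/* Self-check with a made-up export directory at RVA 0x12000, 0x180 bytes long. */
static void check_classify_export(void)
{
    assert(classify_export(0, 0x12000, 0x180) == EXPORT_UNUSED);
    assert(classify_export(0x1000, 0x12000, 0x180) == EXPORT_FUNCTION);
    assert(classify_export(0x12050, 0x12000, 0x180) == EXPORT_FORWARDER);
}
```

Writing the range test as `entry_rva - dir_rva < dir_size` avoids the overflow that `dir_rva + dir_size` could cause with hostile header values.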
Export name index table
The export name index table is an array of addresses (RVAs) of strings in the export name table. The entries are sorted so that a binary search can be used.
An export name is defined only if the export name index table contains a pointer to it. In other words, a value in the export name index table may be 0, for compatibility with earlier versions.
Export ordinal table
The export ordinal table is an array of indexes into the export address table, each 16 bits wide. The value of the Ordinal Base field must be subtracted from an ordinal to obtain the real index into the export address table; note that real indexes start at 0. Evidently the Ordinal Base caused Microsoft some trouble as well. Both the values in the export ordinal table and the indexes into the export address table are unsigned.
The export name index table and the export ordinal table are two parallel arrays, kept separate so that each can be aligned on its own natural boundary (4 bytes for the former, 2 bytes for the latter). In use, an entry of the export name index table gives the name of an exported function, and the entry at the same position in the export ordinal table gives the corresponding number. Members of the two tables are associated through their shared index.
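The interplay of the three arrays (AddressOfNames, AddressOfNameOrdinals, AddressOfFunctions) is easier to see in code. The following sketch assumes the arrays have already been resolved from RVAs into ordinary pointers and that the names are sorted in `strcmp` order; the `export_view` struct and both functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint32_t           base;                 /* Ordinal Base */
    uint32_t           number_of_functions;  /* entries in functions[] */
    uint32_t           number_of_names;      /* entries in both name arrays */
    const uint32_t    *functions;            /* export address table */
    const char *const *names;                /* name strings, sorted */
    const uint16_t    *name_ordinals;        /* parallel to names[] */
} export_view;

/* Binary search by name; returns the index into functions[], or -1. */
static int32_t find_export_index(const export_view *v, const char *name)
{
    assert(v != NULL && name != NULL);
    uint32_t lo = 0, hi = v->number_of_names;   /* search in [lo, hi) */
    while (lo < hi) {
        uint32_t mid = lo + (hi - lo) / 2;
        int cmp = strcmp(name, v->names[mid]);
        if (cmp == 0) {
            /* Same position in the parallel array gives the EAT index. */
            uint16_t index = v->name_ordinals[mid];
            return index < v->number_of_functions ? (int32_t)index : -1;
        }
        if (cmp < 0)
            hi = mid;
        else
            lo = mid + 1;
    }
    return -1;
}

/* Convert a public ordinal into an EAT index by subtracting Ordinal Base. */
static int32_t ordinal_to_index(const export_view *v, uint32_t ordinal)
{
    assert(v != NULL);
    if (ordinal < v->base || ordinal - v->base >= v->number_of_functions)
        return -1;
    return (int32_t)(ordinal - v->base);
}
```

Note the asymmetry: the values stored in the 16-bit array are used as indexes directly, while an ordinal supplied from outside (for example by a caller importing by ordinal) has the base subtracted first.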
Export name table (ENT)
The export name table is a series of variable-length, NULL-terminated ASCII strings. It holds the strings that the export name index table actually points to. The RVA of this table is given by the first value in the export name index table. The strings are all function names, the public names through which other files call these functions.
Note that when traversing the export address table, you may get not an address (that is, not the address of the target function) but a string. This is the function forwarding case. The way to detect it, checking whether the pointer falls within the range of the export table, was described above.
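Once an entry is known to be a forwarder, its string still has to be split into a DLL name and a function name or ordinal. The sketch below handles the two forms mentioned earlier ("NTDLL.RtlAllocateHeap" and "MYDLL.#27"). The `forwarder` struct, the buffer sizes, and the choice to split at the last dot are assumptions of this example, not something the PE format prescribes here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char     module[64];    /* DLL name without extension, e.g. "NTDLL" */
    char     symbol[128];   /* function name, or "#n" for an ordinal */
    int      by_ordinal;    /* 1 if the target is given as "#n" */
    uint32_t ordinal;       /* valid only when by_ordinal is 1 */
} forwarder;

/*
 * Split "MODULE.Symbol" or "MODULE.#n". Returns 1 on success, 0 if the
 * string is malformed or too long (in which case *out may be partly filled).
 */
static int parse_forwarder(const char *text, forwarder *out)
{
    assert(text != NULL && out != NULL);
    const char *dot = strrchr(text, '.');       /* split at the last dot */
    if (dot == NULL || dot == text || dot[1] == '\0')
        return 0;
    size_t module_len = (size_t)(dot - text);
    const char *symbol = dot + 1;
    if (module_len >= sizeof out->module ||
        strlen(symbol) >= sizeof out->symbol)
        return 0;
    memcpy(out->module, text, module_len);
    out->module[module_len] = '\0';
    strcpy(out->symbol, symbol);                /* length checked above */
    out->by_ordinal = 0;
    out->ordinal = 0;
    if (symbol[0] == '#') {
        if (symbol[1] < '0' || symbol[1] > '9')
            return 0;                           /* "#" must be followed by digits */
        char *end = NULL;
        unsigned long n = strtoul(symbol + 1, &end, 10);
        if (*end != '\0' || n > 0xFFFF)
            return 0;
        out->by_ordinal = 1;
        out->ordinal = (uint32_t)n;
    }
    return 1;
}
```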
When learning the PE format, the organization of its data structures can be hard to picture because it is complicated, so I suggest looking up a diagram of the PE file structure on the Internet.