PE Study Notes (1)

Source: Internet
Author: User

PE stands for Portable Executable. The overall layout of a PE file is:
 
+----------------+
| DOS MZ header  |
+----------------+
| DOS stub       |
+----------------+
| PE header      |
+----------------+
| Section table  |
+----------------+
| Section 1      |
+----------------+
| Section 2      |
+----------------+
| Section ...    |
+----------------+
| Section N      |
+----------------+
 
1. Overview of the PE file format

1.1. DOS MZ header:
Every PE file (including 32-bit DLLs) must start with a simple DOS MZ header. Thanks to it, when the program is run under DOS, DOS recognizes the file as a valid executable and runs the DOS stub that follows the MZ header.

1.2. DOS stub:
The DOS stub is actually a valid MS-DOS .EXE or .COM program (if its format is invalid, an error is reported). On an operating system that does not support the PE file format, it either displays the string "This program cannot be run in DOS mode" with a simple call to interrupt 21h, service 9, or runs full DOS code of the programmer's choosing. Its size is not fixed. The linker option /STUB:filename replaces this program.

1.3. PE header:
The PE header follows the DOS stub. "PE header" is shorthand for the PE-related structure IMAGE_NT_HEADERS, which contains many important fields used by the PE loader. When the executable runs on an operating system that supports the PE file structure, the PE loader reads the starting offset of the PE header from the DOS MZ header (IMAGE_DOS_HEADER). It therefore skips the DOS stub and goes straight to the real file header, the PE header.

1.4. Section table:
The section table, an array of structures, follows the PE header. If the PE file contains five sections, the array has five members. Each member holds the attributes, file offset, and virtual offset of the corresponding section.

1.5. Sections:
The real content of a PE file is divided into blocks called sections. The name of each standard section starts with a dot. Sections are ordered by starting address, not alphabetically. Common section names and their purposes:
 
Section name    Purpose
.arch           Alpha architecture information
.bss            Uninitialized data
.CRT            Read-only C run-time data
.data           Initialized data
.debug          Debugging information
.didata         Delay-import file name table
.edata          Export table
.idata          Import table
.pdata          Exception information
.rdata          Read-only initialized data
.reloc          Relocation table information
.rsrc           Resources
.text           Executable code of .exe or .dll files
.tls            Thread-local storage
.xdata          Exception handling table
 
Sections are divided by the common attributes of their data, not by logical concept. Each section is a block of data that shares attributes such as code/data and read/write. Data or code in the PE file with the same attributes can be placed in the same section. The section name is only a label that tells sections apart; names such as "data" and "code" exist purely for easy identification. It is the section's attribute settings that determine its characteristics and function.

1.6. Main steps for loading a PE file:

1. When the PE file is executed, the PE loader reads the offset of the PE header from the DOS MZ header. If it finds one, it jumps to the PE header.
2. The PE loader checks whether the PE header is valid. If it is, the loader jumps to the end of the PE header.
3. The section table immediately follows the PE header. The PE loader reads the section information, maps the sections into memory using file mapping, and applies the section attributes specified in the section table.
4. Once the PE file is mapped into memory, the PE loader processes the logical parts of the PE file, such as the import table.

2. DOS MZ header and PE header

2.1. The DOS MZ header is defined as IMAGE_DOS_HEADER (64 bytes). The structure is defined as follows:

typedef struct _IMAGE_DOS_HEADER {  // DOS .EXE header
    WORD e_magic;      // Magic number
    WORD e_cblp;       // Bytes on last page of file
    WORD e_cp;         // Pages in file
    WORD e_crlc;       // Relocations
    WORD e_cparhdr;    // Size of header in paragraphs
    WORD e_minalloc;   // Minimum extra paragraphs needed
    WORD e_maxalloc;   // Maximum extra paragraphs needed
    WORD e_ss;         // Initial (relative) SS value
    WORD e_sp;         // Initial SP value
    WORD e_csum;       // Checksum
    WORD e_ip;         // Initial IP value
    WORD e_cs;         // Initial (relative) CS value
    WORD e_lfarlc;     // File address of relocation table
    WORD e_ovno;       // Overlay number
    WORD e_res[4];     // Reserved words
    WORD e_oemid;      // OEM identifier (for e_oeminfo)
    WORD e_oeminfo;    // OEM information; e_oemid specific
    WORD e_res2[10];   // Reserved words
    LONG e_lfanew;     // File address of new EXE header
} IMAGE_DOS_HEADER, *PIMAGE_DOS_HEADER;

The e_lfanew member of the IMAGE_DOS_HEADER structure holds the file offset of the PE header (it is a file offset, not an RVA). e_magic contains the string "MZ".

2.2. The PE header is actually an IMAGE_NT_HEADERS structure, defined as follows:

typedef struct _IMAGE_NT_HEADERS {
    DWORD                 Signature;
    IMAGE_FILE_HEADER     FileHeader;
    IMAGE_OPTIONAL_HEADER OptionalHeader;
} IMAGE_NT_HEADERS, *PIMAGE_NT_HEADERS;

Description of the IMAGE_NT_HEADERS structure members:

1. Signature: a DWORD whose bytes are 50h, 45h, 00h, 00h ("PE\0\0"). If the Signature field of IMAGE_NT_HEADERS equals "PE\0\0", the file is a valid PE file. Microsoft defines the constant IMAGE_NT_SIGNATURE for this purpose. The definitions are as follows:

#define IMAGE_DOS_SIGNATURE     0x5A4D      // MZ
#define IMAGE_OS2_SIGNATURE     0x454E      // NE
#define IMAGE_OS2_SIGNATURE_LE  0x454C      // LE
#define IMAGE_VXD_SIGNATURE     0x454C      // LE
#define IMAGE_NT_SIGNATURE      0x00004550  // PE00

2. FileHeader: this structure contains information about the physical layout of the PE file, such as the number of sections and the target machine.

3. OptionalHeader: this structure contains information about the logical layout of the PE file. Although the field name contains the word "optional", the structure is always present.

2.3. The steps for verifying that a file is a valid PE file are:

1. Check whether the first word of the file equals IMAGE_DOS_SIGNATURE. If so, the DOS MZ header is valid.
2. Once the DOS header is known to be valid, use e_lfanew to locate the PE header.
3. Check whether the first DWORD of the PE header equals IMAGE_NT_SIGNATURE. If it does, the file is regarded as a valid PE file.

The following VC++ 6.0 example checks whether a file is a valid PE file:

First, use the GetOpenFileName dialog box to choose a file and load it into memory (for example with CreateFile, CreateFileMapping, and MapViewOfFile, or by getting the file size and allocating a buffer: m_buffer = new unsigned char[m_size];). Next, read the first two bytes of the file (((unsigned short *)m_buffer)[0]) and check whether they are "MZ". If they are, read the location of the PE header from offset 0x3C (*(unsigned int *)(m_buffer + 0x3c)), and compare the DWORD found at that location with 0x00004550 ("PE\0\0") to verify that the file is a valid PE file.
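
The same check can be sketched in portable C. This is a minimal sketch rather than the VC++ 6.0 program itself: the names is_valid_pe, rd16, and rd32 are hypothetical, and it assumes the whole file has already been read into a buffer. It reads the little-endian fields byte by byte and adds bounds checks so that a truncated or corrupt file cannot cause an out-of-range read.

```c
#include <stddef.h>
#include <stdint.h>

#define DOS_SIGNATURE 0x5A4Du      /* "MZ", same value as IMAGE_DOS_SIGNATURE */
#define NT_SIGNATURE  0x00004550u  /* "PE\0\0", same value as IMAGE_NT_SIGNATURE */

/* Read little-endian values byte by byte, so the code depends neither on
   host endianness nor on the alignment of the buffer. */
static uint16_t rd16(const unsigned char *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const unsigned char *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 if buf passes the three validation steps, 0 otherwise. */
int is_valid_pe(const unsigned char *buf, size_t size) {
    uint32_t e_lfanew;

    if (buf == NULL || size < 0x40) return 0;   /* IMAGE_DOS_HEADER is 64 bytes */
    if (rd16(buf) != DOS_SIGNATURE) return 0;   /* step 1: "MZ" */

    e_lfanew = rd32(buf + 0x3C);                /* step 2: locate the PE header */
    if (e_lfanew > size || size - e_lfanew < 4) return 0;

    return rd32(buf + e_lfanew) == NT_SIGNATURE; /* step 3: "PE\0\0" */
}
```

A caller would read or map the file, then pass the base pointer and the file size to is_valid_pe. A real loader performs many more checks; this sketch covers only the three steps listed in 2.3.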

3. File header

The file header (IMAGE_FILE_HEADER) is contained in the PE header (IMAGE_NT_HEADERS). Its structure is defined as follows:

typedef struct _IMAGE_FILE_HEADER {
    WORD  Machine;
    WORD  NumberOfSections;
    DWORD TimeDateStamp;
    DWORD PointerToSymbolTable;
    DWORD NumberOfSymbols;
    WORD  SizeOfOptionalHeader;
    WORD  Characteristics;
} IMAGE_FILE_HEADER, *PIMAGE_FILE_HEADER;

Description of the IMAGE_FILE_HEADER structure members:

1. Machine: the CPU the file is intended to run on. For the Intel platform, the value is IMAGE_FILE_MACHINE_I386 (14Ch). We tried the values 14Dh and 14Eh mentioned in Luevelsmeyer's pe.txt, but Windows could not run the file correctly.

Some CPU identifiers:

Intel i386        0x14C
Intel i860        0x14D
MIPS R3000        0x162
MIPS R4000        0x166
DEC Alpha AXP     0x184
PowerPC           0x1F0 (little endian)
Motorola 68000    0x268
PA RISC           0x290 (Precision Architecture)

#define IMAGE_FILE_MACHINE_UNKNOWN    0
#define IMAGE_FILE_MACHINE_I386       0x014c  // Intel 386.
#define IMAGE_FILE_MACHINE_R3000      0x0162  // MIPS little-endian, 0x160 big-endian
#define IMAGE_FILE_MACHINE_R4000      0x0166  // MIPS little-endian
#define IMAGE_FILE_MACHINE_R10000     0x0168  // MIPS little-endian
#define IMAGE_FILE_MACHINE_WCEMIPSV2  0x0169  // MIPS little-endian WCE v2
#define IMAGE_FILE_MACHINE_ALPHA      0x0184  // Alpha_AXP
#define IMAGE_FILE_MACHINE_POWERPC    0x01F0  // IBM PowerPC little-endian
#define IMAGE_FILE_MACHINE_SH3        0x01a2  // SH3 little-endian
#define IMAGE_FILE_MACHINE_SH3E       0x01a4  // SH3E little-endian
#define IMAGE_FILE_MACHINE_SH4        0x01a6  // SH4 little-endian
#define IMAGE_FILE_MACHINE_ARM        0x01c0  // ARM little-endian
#define IMAGE_FILE_MACHINE_THUMB      0x01c2
#define IMAGE_FILE_MACHINE_IA64       0x0200  // Intel 64
#define IMAGE_FILE_MACHINE_MIPS16     0x0266  // MIPS
#define IMAGE_FILE_MACHINE_MIPSFPU    0x0366  // MIPS
#define IMAGE_FILE_MACHINE_MIPSFPU16  0x0466  // MIPS
#define IMAGE_FILE_MACHINE_ALPHA64    0x0284  // ALPHA64
#define IMAGE_FILE_MACHINE_AXP64      IMAGE_FILE_MACHINE_ALPHA64

2. NumberOfSections: the number of sections in the file. To add or delete a section, this value must be updated.

3. TimeDateStamp: the date and time the file was created, expressed as the number of seconds since 4:00 p.m. on December 31, 1969, US Pacific time (that is, 00:00 on January 1, 1970, UTC). By my calculation, the maximum value 0xFFFFFFFF corresponds to about 136.19 years.
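
To make the TimeDateStamp encoding concrete, the sketch below converts a stamp to a readable UTC date with the standard C library. The function name format_timedatestamp is hypothetical, and the sketch assumes a platform whose time_t counts seconds from January 1, 1970 (UTC), which holds on the common desktop platforms.

```c
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Formats a TimeDateStamp as "YYYY-MM-DD HH:MM:SS" (UTC) into out.
   Returns the number of characters written, or 0 on failure. */
size_t format_timedatestamp(uint32_t stamp, char *out, size_t out_size) {
    time_t t = (time_t)stamp;       /* seconds since 1970-01-01 00:00 UTC */
    struct tm *utc = gmtime(&t);    /* broken-down time in UTC */

    if (utc == NULL) return 0;
    return strftime(out, out_size, "%Y-%m-%d %H:%M:%S", utc);
}
```

A stamp of 0 formats as 1970-01-01 00:00:00, and each additional 86400 seconds advances the date by one day.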

4. PointerToSymbolTable: the file offset of the COFF symbol table. This field is only useful for COFF debugging information.

5. NumberOfSymbols: the number of symbols in the COFF symbol table.

6. SizeOfOptionalHeader: the size of the optional header (IMAGE_OPTIONAL_HEADER) structure that follows this structure. It must hold a valid value.

7. Characteristics: flags describing the file. Some important ones are:

0x0001    The file contains no relocation information
0x0002    The file is an executable image (that is, not an OBJ or LIB)
0x2000    The file is a DLL, not an EXE
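
As an illustration of how these flags combine, here is a minimal sketch that classifies a file from its Characteristics word. The macro names are shortened local stand-ins for the flag values listed above, and image_kind is a hypothetical helper, not part of any Windows header.

```c
#include <stdint.h>

#define FILE_RELOCS_STRIPPED  0x0001u  /* no relocation information */
#define FILE_EXECUTABLE_IMAGE 0x0002u  /* runnable image, not OBJ or LIB */
#define FILE_DLL              0x2000u  /* DLL rather than EXE */

/* Classifies a file from the Characteristics word of its file header. */
const char *image_kind(uint16_t characteristics) {
    if (!(characteristics & FILE_EXECUTABLE_IMAGE)) return "object or library";
    if (characteristics & FILE_DLL) return "DLL";
    return "EXE";
}
```

The executable bit is tested first because the DLL flag only has meaning for an image that is executable at all.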

In practice, NumberOfSections is the field you need when traversing the section table; the other fields rarely matter.
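
Traversing the section table also requires knowing where it starts. The sketch below computes the file offset of a section table entry from e_lfanew and SizeOfOptionalHeader. The function name is hypothetical. The 4-byte signature and 20-byte file header follow from the structures shown earlier; the 40-byte size of a section table entry (IMAGE_SECTION_HEADER) is not covered in these notes and is stated here as a known property of the format.

```c
#include <stdint.h>

/* Sizes fixed by the PE format. */
enum {
    PE_SIGNATURE_SIZE   = 4,   /* DWORD Signature */
    FILE_HEADER_SIZE    = 20,  /* IMAGE_FILE_HEADER */
    SECTION_HEADER_SIZE = 40   /* one section table entry */
};

/* File offset of section table entry `index` (0-based).
   The caller is expected to keep index below NumberOfSections. */
uint32_t section_header_offset(uint32_t e_lfanew,
                               uint16_t size_of_optional_header,
                               uint16_t index) {
    return e_lfanew + PE_SIGNATURE_SIZE + FILE_HEADER_SIZE +
           size_of_optional_header + (uint32_t)index * SECTION_HEADER_SIZE;
}
```

For example, with e_lfanew = 0x80 and SizeOfOptionalHeader = 0xE0, the first entry is at file offset 0x178 and the second at 0x1A0.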
 

4. Optional header

4.1. RVA and related concepts:

RVA stands for relative virtual address. An RVA is a distance from a reference point in the virtual address space. It is similar to a file offset, except that it is relative to an address in the virtual address space rather than to the start of the file. For example, if a PE file is loaded at 400000h in the virtual address (VA) space and the process starts executing at virtual address 401000h, we say that execution starts at RVA 1000h. Every RVA is relative to the starting VA of the module: virtual address (VA) 0x401464 - base address (BA) 0x400000 = RVA 0x1464. The base address is the starting position at which the EXE or DLL is mapped into memory.

Why does the PE file format use RVAs? To lighten the load on the PE loader. Because a module may be relocated anywhere in the virtual address space, having the PE loader fix up every address in the file would be a nightmare. If those addresses are stored as RVAs instead, the PE loader does not have to touch them: it only needs to relocate the whole module to its new starting VA. The idea is the same as relative and absolute paths: an RVA is like a relative path, and a VA is like an absolute path.

Most addresses in a PE file are RVAs, and RVAs are meaningful only once the PE loader has loaded the file into memory. If you map the file into memory yourself rather than loading it through the PE loader, you cannot use those RVAs directly; they must first be converted to file offsets.
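
A minimal sketch of that RVA-to-file-offset conversion follows. It relies on three fields of each section table entry (VirtualAddress, SizeOfRawData, and PointerToRawData in IMAGE_SECTION_HEADER), which these notes do not describe; the SectionInfo type and the function name are hypothetical stand-ins. The sketch ignores RVAs that fall inside the headers.

```c
#include <stddef.h>
#include <stdint.h>

/* The three section table fields needed for the conversion. */
typedef struct {
    uint32_t virtual_address;      /* RVA of the section once loaded */
    uint32_t size_of_raw_data;     /* bytes the section occupies in the file */
    uint32_t pointer_to_raw_data;  /* file offset of the section's data */
} SectionInfo;

/* Returns the file offset that corresponds to rva, or UINT32_MAX if no
   section's raw data contains it. */
uint32_t rva_to_offset(uint32_t rva, const SectionInfo *sections, size_t count) {
    size_t i;

    for (i = 0; i < count; i++) {
        uint32_t start = sections[i].virtual_address;
        if (rva >= start && rva - start < sections[i].size_of_raw_data)
            return sections[i].pointer_to_raw_data + (rva - start);
    }
    return UINT32_MAX;
}
```

For a section with RVA 0x1000 whose raw data is 0x600 bytes long and starts at file offset 0x400, RVA 0x1464 maps to file offset 0x864.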

4.2. The optional header is the last member of IMAGE_NT_HEADERS. It describes the logical layout of the PE file. The structure has 31 fields; some are critical and others are rarely used. Its definition is as follows:

typedef struct _IMAGE_OPTIONAL_HEADER {
    WORD  Magic;
    BYTE  MajorLinkerVersion;
    BYTE  MinorLinkerVersion;
    DWORD SizeOfCode;
    DWORD SizeOfInitializedData;
    DWORD SizeOfUninitializedData;
    DWORD AddressOfEntryPoint;
    DWORD BaseOfCode;
    DWORD BaseOfData;
    DWORD ImageBase;
    DWORD SectionAlignment;
    DWORD FileAlignment;
    WORD  MajorOperatingSystemVersion;
    WORD  MinorOperatingSystemVersion;
    WORD  MajorImageVersion;
    WORD  MinorImageVersion;
    WORD  MajorSubsystemVersion;
    WORD  MinorSubsystemVersion;
    DWORD Win32VersionValue;
    DWORD SizeOfImage;
    DWORD SizeOfHeaders;
    DWORD CheckSum;
    WORD  Subsystem;
    WORD  DllCharacteristics;
    DWORD SizeOfStackReserve;
    DWORD SizeOfStackCommit;
    DWORD SizeOfHeapReserve;
    DWORD SizeOfHeapCommit;
    DWORD LoaderFlags;
    DWORD NumberOfRvaAndSizes;
    IMAGE_DATA_DIRECTORY DataDirectory[IMAGE_NUMBEROF_DIRECTORY_ENTRIES];
} IMAGE_OPTIONAL_HEADER, *PIMAGE_OPTIONAL_HEADER;

Description of the IMAGE_OPTIONAL_HEADER structure members:

1. Magic: identifies the type of image.

0x0107 (IMAGE_ROM_OPTIONAL_HDR_MAGIC): a ROM image.
0x010B (IMAGE_NT_OPTIONAL_HDR_MAGIC): a normal executable image. Most PE files contain this value.

2. MajorLinkerVersion and MinorLinkerVersion: the version of the linker that produced the PE file, written in decimal rather than hexadecimal, for example version 2.23.

3. SizeOfCode: the total size of all code sections. Most programs have only one code section, so this is usually the size of the .text section.

4. SizeOfInitializedData: the total size of all sections (other than code sections) that contain initialized data. In practice, however, the value does not always seem to include the initialized data sections.

5. SizeOfUninitializedData: the total size of all sections for which the PE loader must allocate address space but which occupy no space in the file on disk. These sections need no particular content when the program starts, hence the name uninitialized data. Uninitialized data is usually placed in the .bss section.

6. AddressOfEntryPoint: the location where the PE file starts executing. It is an RVA that usually falls within the .text section. The field applies to both EXEs and DLLs.

7. BaseOfCode: an RVA indicating where the program's code section starts. The code section usually comes after the PE header and before the data section. In EXEs produced by the Microsoft linker, this value is usually 0x1000. Borland's TLINK32 usually sets it to 0x10000, because TLINK32 aligns on 64 KB by default, while the Microsoft linker uses 4 KB.

8. BaseOfData: an RVA indicating where the program's data section starts. The data section usually comes after the PE header and the code section.
 
9. ImageBase: the preferred load address of the PE file. For example, if the value is 400000h, the PE loader tries to load the file at 400000h in the virtual address space. The word "preferred" means that if the address range is already occupied by another module, the PE loader picks another free address.

10. SectionAlignment: the alignment granularity of sections in memory. For example, if the value is 4096 (1000h), the starting address of every section must be a multiple of 4096. If the first section starts at 401000h and is 10 bytes long, the next section must start at 402000h, even though most of the space between 401000h and 402000h is unused.

11. FileAlignment: the alignment granularity of sections in the file. For example, if the value is 512 (200h), the file offset of every section must be a multiple of 512. If the first section starts at file offset 200h and is 10 bytes long, the next section must start at file offset 400h, even though the space between offsets 522 and 1024 is unused or undefined. The default value is 200h.
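
Both alignment rules come down to rounding an address or offset up to the next multiple of the alignment. The sketch below shows one common way to do that; the function name align_up is hypothetical, and the bit trick assumes the alignment is a power of two, as the values 200h and 1000h are.

```c
#include <stdint.h>

/* Rounds value up to the next multiple of alignment.
   Assumes alignment is a power of two (for example 0x200 or 0x1000). */
uint32_t align_up(uint32_t value, uint32_t alignment) {
    return (value + alignment - 1) & ~(alignment - 1);
}
```

With an alignment of 1000h, a section that ends at 40100Ah is followed by one at align_up(0x40100A, 0x1000) = 402000h. With an alignment of 200h, data that ends at file offset 20Ah is followed by a section at offset 400h. A value that is already aligned is returned unchanged.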

12. MajorOperatingSystemVersion/MinorOperatingSystemVersion: the minimum operating system version required to run the executable. In Win32 programs these two fields are usually set to 1.0.

13. MajorSubsystemVersion/MinorSubsystemVersion: the Win32 subsystem version. If the PE file is designed specifically for Win32, the subsystem version must be 4.0; otherwise, dialog boxes will not have the 3-D look.

14. MajorImageVersion/MinorImageVersion: user-defined fields that let you distinguish versions of an EXE or DLL. Set them with the linker's /VERSION option, for example LINK /VERSION:2.0 myobj.obj.

15. Reserved1 (Win32VersionValue in the structure above): appears to always be 0.

16. SizeOfImage: the size of the whole PE image in memory: all headers and sections after section alignment, measured from the image base to the end of the last section. The end of the last section must be a multiple of SectionAlignment.

17. SizeOfHeaders: the combined size of all headers and the section table, equal to the file size minus the total size of all sections in the file. It can also be used as the file offset of the first section.

18. CheckSum: a checksum of the file. In PE files this field is often ignored and set to 0. However, all drivers, all DLLs loaded at boot time, and all server DLLs must have a valid checksum. The algorithm is provided by imagehlp.dll, whose code can be found in the Win32 SDK.

19. Subsystem: identifies the subsystem the PE file runs under. For most Win32 programs there are only two values: Windows GUI and Windows CUI (console). WINNT.H defines the following:

#define IMAGE_SUBSYSTEM_UNKNOWN         0  // Unknown subsystem
#define IMAGE_SUBSYSTEM_NATIVE          1  // No subsystem required (for example, drivers)
#define IMAGE_SUBSYSTEM_WINDOWS_GUI     2  // Runs in the Windows GUI subsystem
#define IMAGE_SUBSYSTEM_WINDOWS_CUI     3  // Runs in the Windows character-mode subsystem (a console application)
#define IMAGE_SUBSYSTEM_OS2_CUI         5  // Runs in the OS/2 character-mode subsystem (OS/2 1.x applications)
#define IMAGE_SUBSYSTEM_POSIX_CUI       7  // Runs in the POSIX character-mode subsystem
#define IMAGE_SUBSYSTEM_NATIVE_WINDOWS  8  // A Win9x driver
#define IMAGE_SUBSYSTEM_WINDOWS_CE_GUI  9  // Runs in the Windows CE subsystem

20. DllCharacteristics: a set of flags indicating the circumstances in which the DLL initialization function (such as DllMain) is called. The value always appears to be 0, yet the operating system still calls the DLL initialization function in all four cases. The flag values have the following meanings:

0x0001: when the DLL is loaded into a process's address space
0x0002: when a thread ends
0x0004: when a thread starts
0x0008: when the DLL exits
0x2000: a WDM driver

21. SizeOfStackReserve: the amount of virtual memory reserved for the initial thread's stack. Not all of this memory is committed at first. The default is 0x100000 (1 MB). If your program calls CreateThread with a stack size of 0, the new thread gets a stack of this same size.

22. SizeOfStackCommit: the amount of memory initially committed for the initial thread's stack. Microsoft's linker defaults to 0x1000 (one page), and Borland's TLINK32 to 0x2000 (two pages).

23. SizeOfHeapReserve: the amount of virtual memory reserved for the initial process heap. The heap handle can be obtained with GetProcessHeap. Not all of this memory is committed.

24. SizeOfHeapCommit: the amount of memory initially committed in the process heap. The default is 0x1000 bytes.

25. LoaderFlags: used for debugging support. Possible flags:
A. Trigger a breakpoint before starting the process.
B. Invoke a debugger on the process after it has been loaded.

26. NumberOfRvaAndSizes: the number of entries in the DataDirectory array (the next field). Current tools always set this value to 16.

27. DataDirectory[IMAGE_NUMBEROF_DIRECTORY_ENTRIES]: an array of IMAGE_DATA_DIRECTORY structures. Each structure gives the RVA and size of an important data structure. The first element holds the address and size of the export table (if any), the second holds the address and size of the import table, and so on. The complete list, in order, is:

// Directory entries
#define IMAGE_DIRECTORY_ENTRY_EXPORT        0   // Export directory
#define IMAGE_DIRECTORY_ENTRY_IMPORT        1   // Import directory
#define IMAGE_DIRECTORY_ENTRY_RESOURCE      2   // Resource directory
#define IMAGE_DIRECTORY_ENTRY_EXCEPTION     3   // Exception directory
#define IMAGE_DIRECTORY_ENTRY_SECURITY      4   // Security directory
#define IMAGE_DIRECTORY_ENTRY_BASERELOC     5   // Base relocation table
#define IMAGE_DIRECTORY_ENTRY_DEBUG         6   // Debug directory
#define IMAGE_DIRECTORY_ENTRY_COPYRIGHT     7   // Description string
#define IMAGE_DIRECTORY_ENTRY_GLOBALPTR     8   // Machine value (MIPS GP)
#define IMAGE_DIRECTORY_ENTRY_TLS           9   // TLS directory
#define IMAGE_DIRECTORY_ENTRY_LOAD_CONFIG   10  // Load configuration directory
#define IMAGE_DIRECTORY_ENTRY_BOUND_IMPORT  11  // Bound import directory in headers
#define IMAGE_DIRECTORY_ENTRY_IAT           12  // Import address table

Offset (PE32/PE32+)  Size  Field                    Description
96/112               8     Export Table             Export table address and size.
104/120              8     Import Table             Import table address and size.
112/128              8     Resource Table           Resource table address and size.
120/136              8     Exception Table          Exception table address and size.
128/144              8     Certificate Table        Attribute certificate table address and size.
136/152              8     Base Relocation Table    Base relocation table address and size.
144/160              8     Debug                    Debug data starting address and size.
152/168              8     Architecture             Architecture-specific data address and size.
160/176              8     Global Ptr               Relative virtual address of the value to be stored in the global pointer register. The Size member of this structure must be set to 0.
168/184              8     TLS Table                Thread local storage (TLS) table address and size.
176/192              8     Load Config Table        Load configuration table address and size.
184/200              8     Bound Import             Bound import table address and size.
192/208              8     IAT                      Import address table address and size.
200/216              8     Delay Import Descriptor  Address and size of the delay import descriptor.
208/224              8     COM+ Runtime Header      COM+ runtime header address and size.
216/232              8     Reserved
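
The offset pairs listed above follow a simple rule: the data directory starts at offset 96 of the optional header in a PE32 file and at offset 112 in a PE32+ file, and each entry takes 8 bytes. The sketch below expresses that rule. The function name is hypothetical, and the Magic value 0x020B used to recognize PE32+ is not mentioned in these notes; it is included here as an assumption about the 64-bit variant of the format.

```c
#include <stdint.h>

enum { DATA_DIRECTORY_COUNT = 16, DATA_DIRECTORY_ENTRY_SIZE = 8 };

/* Offset of DataDirectory[index] from the start of the optional header.
   Returns 0 if the Magic value is not recognized or index is out of range. */
uint32_t data_directory_offset(uint16_t magic, unsigned index) {
    uint32_t base;

    if (index >= DATA_DIRECTORY_COUNT) return 0;
    if (magic == 0x010B) base = 96;        /* PE32 */
    else if (magic == 0x020B) base = 112;  /* PE32+ (assumed, see lead-in) */
    else return 0;

    return base + index * DATA_DIRECTORY_ENTRY_SIZE;
}
```

For example, the import table entry (index 1) of a PE32 file is at offset 104, and the IAT entry (index 12) of a PE32+ file is at offset 208, matching the rows of the table.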

Originally written by Rivershan on 2003.1.18.
