PE file structure learning notes


PE file structure

Author: Jiang
E-mail: jznsmail@163.net
Blog: http://blog.csdn.net/jznsmail/
QQ: 457283

PE file Layout




PE Header
The PE header contains important information such as the sizes and locations of the code and data areas, the operating system the file is intended for, and the initial stack size. The PE header is not at the beginning of the file.
The first few hundred bytes of a file are the DOS stub: a very small DOS program that prints a message such as "This program cannot be run in DOS mode". When the Win32 loader maps a PE file into memory, the first byte of the memory-mapped file corresponds to the first byte of the DOS stub. A field in the DOS header at the start of the stub lets you find the real PE header:
pNTHeader = (PIMAGE_NT_HEADERS)((BYTE *)dosHeader + dosHeader->e_lfanew);
e_lfanew is a relative offset that points to the real PE header.
dosHeader is the base address of the image.
Note: addresses grow upward in memory, so the offset is added to the base address, not subtracted from it. The addition must be done in bytes, hence the cast to BYTE *.
The PE Header is the entire IMAGE_NT_HEADERS. This structure has a DWORD and two sub-structures:
DWORD Signature;
IMAGE_FILE_HEADER FileHeader;
IMAGE_OPTIONAL_HEADER OptionalHeader;
If e_lfanew points to an NE signature instead of a PE signature, the file is a Win16 NE executable. An LE signature indicates a VxD file, and an LX signature indicates an OS/2 file.
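The lookup described above can be sketched in portable C. This is only an illustration under stated assumptions: it works on a raw byte buffer instead of the winnt.h structures, it relies on the fact that e_lfanew sits at offset 0x3C of the DOS header, and the names `exe_kind`, `read_le32`, and `classify_image` are hypothetical helpers invented for this sketch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical result codes for this sketch; they are not winnt.h names. */
enum exe_kind { KIND_INVALID, KIND_DOS_ONLY, KIND_PE, KIND_NE, KIND_LE, KIND_LX };

/* Read a little-endian 32-bit value without assuming alignment. */
uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Follow e_lfanew from the DOS header and inspect the signature there. */
enum exe_kind classify_image(const uint8_t *base, size_t size)
{
    /* The DOS header is 0x40 bytes long and starts with "MZ". */
    if (size < 0x40 || base[0] != 'M' || base[1] != 'Z')
        return KIND_INVALID;

    /* e_lfanew is the last field of the DOS header, at offset 0x3C. */
    uint32_t e_lfanew = read_le32(base + 0x3C);
    if (e_lfanew > size - 4)
        return KIND_DOS_ONLY;   /* offset points outside the file */

    /* Base plus offset, computed in bytes. */
    const uint8_t *sig = base + e_lfanew;

    if (memcmp(sig, "PE\0\0", 4) == 0) return KIND_PE;
    if (memcmp(sig, "NE", 2) == 0)     return KIND_NE;
    if (memcmp(sig, "LE", 2) == 0)     return KIND_LE;
    if (memcmp(sig, "LX", 2) == 0)     return KIND_LX;
    return KIND_DOS_ONLY;       /* no newer header recognized */
}
```

On Windows the same check would normally use IMAGE_DOS_HEADER and IMAGE_NT_HEADERS from winnt.h; the byte-level version here just makes the offset arithmetic explicit.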
 
 
The IMAGE_FILE_HEADER structure is as follows:
WORD Machine; the type of CPU the file targets. The values are defined in WINNT.H (in my copy of the header they are as follows):
#define IMAGE_FILE_MACHINE_UNKNOWN 0
#define IMAGE_FILE_MACHINE_I386 0x014c // Intel 386.
#define IMAGE_FILE_MACHINE_R3000 0x0162 // MIPS little-endian, 0x160 big-endian
#define IMAGE_FILE_MACHINE_R4000 0x0166 // MIPS little-endian
#define IMAGE_FILE_MACHINE_R10000 0x0168 // MIPS little-endian
#define IMAGE_FILE_MACHINE_WCEMIPSV2 0x0169 // MIPS little-endian WCE v2
#define IMAGE_FILE_MACHINE_ALPHA 0x0184 // Alpha_AXP
#define IMAGE_FILE_MACHINE_POWERPC 0x01F0 // IBM PowerPC little-endian
#define IMAGE_FILE_MACHINE_SH3 0x01a2 // SH3 little-endian
#define IMAGE_FILE_MACHINE_SH3E 0x01a4 // SH3E little-endian
#define IMAGE_FILE_MACHINE_SH4 0x01a6 // SH4 little-endian
#define IMAGE_FILE_MACHINE_ARM 0x01c0 // ARM little-endian
#define IMAGE_FILE_MACHINE_THUMB 0x01c2
#define IMAGE_FILE_MACHINE_IA64 0x0200 // Intel 64
#define IMAGE_FILE_MACHINE_MIPS16 0x0266 // MIPS
#define IMAGE_FILE_MACHINE_MIPSFPU 0x0366 // MIPS
#define IMAGE_FILE_MACHINE_MIPSFPU16 0x0466 // MIPS
#define IMAGE_FILE_MACHINE_ALPHA64 0x0284 // ALPHA64
#define IMAGE_FILE_MACHINE_AXP64 IMAGE_FILE_MACHINE_ALPHA64
WORD NumberOfSections; the number of sections in the EXE or OBJ.
DWORD TimeDateStamp; the time at which the linker produced this file, as the number of seconds since 00:00 UTC on January 1, 1970 (4:00 p.m. on December 31, 1969, US Pacific time).
DWORD PointerToSymbolTable; file offset of the COFF symbol table. Only useful for COFF debugging information.
DWORD NumberOfSymbols; number of symbols in the COFF symbol table.
WORD SizeOfOptionalHeader; size of the optional header that follows this structure. In an EXE this is the size of IMAGE_OPTIONAL_HEADER. In an OBJ the value is 0 in most cases.
WORD Characteristics; flags describing the file. The important ones are:
0x0001 the file contains no relocation information
0x0002 the file is an executable image
0x2000 the file is a dynamic-link library
All the flags defined by the system are listed below:
#define IMAGE_FILE_RELOCS_STRIPPED 0x0001 // Relocation info stripped from file.
#define IMAGE_FILE_EXECUTABLE_IMAGE 0x0002 // File is executable (i.e. no unresolved external references).
#define IMAGE_FILE_LINE_NUMS_STRIPPED 0x0004 // Line numbers stripped from file.
#define IMAGE_FILE_LOCAL_SYMS_STRIPPED 0x0008 // Local symbols stripped from file.
#define IMAGE_FILE_AGGRESIVE_WS_TRIM 0x0010 // Aggressively trim working set
#define IMAGE_FILE_LARGE_ADDRESS_AWARE 0x0020 // App can handle >2 GB addresses
#define IMAGE_FILE_BYTES_REVERSED_LO 0x0080 // Bytes of machine word are reversed.
#define IMAGE_FILE_32BIT_MACHINE 0x0100 // 32 bit word machine.
#define IMAGE_FILE_DEBUG_STRIPPED 0x0200 // Debugging info stripped from file in .DBG file
#define IMAGE_FILE_REMOVABLE_RUN_FROM_SWAP 0x0400 // If image is on removable media, copy and run from the swap file.
#define IMAGE_FILE_NET_RUN_FROM_SWAP 0x0800 // If image is on net, copy and run from the swap file.
#define IMAGE_FILE_SYSTEM 0x1000 // System file.
#define IMAGE_FILE_DLL 0x2000 // File is a DLL.
#define IMAGE_FILE_UP_SYSTEM_ONLY 0x4000 // File should only be run on a UP machine
#define IMAGE_FILE_BYTES_REVERSED_HI 0x8000 // Bytes of machine word are reversed.
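To show how the Machine and Characteristics fields might be interpreted, here is a small sketch. The constants are copied locally from the listing above so the code compiles without winnt.h; the functions `machine_name` and `file_kind` and the strings they return are hypothetical, chosen only for illustration, and only a few machine types are covered.

```c
#include <stdint.h>

/* Local copies of a few winnt.h constants from the listing above. */
#define IMAGE_FILE_MACHINE_I386      0x014c
#define IMAGE_FILE_MACHINE_ARM       0x01c0
#define IMAGE_FILE_MACHINE_IA64      0x0200
#define IMAGE_FILE_EXECUTABLE_IMAGE  0x0002
#define IMAGE_FILE_DLL               0x2000

/* Map a Machine value to a readable name (only a few cases shown). */
const char *machine_name(uint16_t machine)
{
    switch (machine) {
    case IMAGE_FILE_MACHINE_I386: return "Intel 386";
    case IMAGE_FILE_MACHINE_ARM:  return "ARM";
    case IMAGE_FILE_MACHINE_IA64: return "Intel 64";
    default:                      return "other";
    }
}

/* Use the Characteristics bit flags to tell an EXE from a DLL. */
const char *file_kind(uint16_t characteristics)
{
    /* Without the executable bit the file has unresolved references. */
    if (!(characteristics & IMAGE_FILE_EXECUTABLE_IMAGE))
        return "not executable";
    /* A DLL has the executable bit and the DLL bit set together. */
    if (characteristics & IMAGE_FILE_DLL)
        return "DLL";
    return "EXE";
}
```

Because Characteristics is a bit mask, each property is tested with a bitwise AND rather than compared for equality; a typical DLL has several flags set at once.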


IMAGE_OPTIONAL_HEADER structure
This structure holds additional information beyond what is in IMAGE_FILE_HEADER.
WORD Magic; identifies the kind of image. For example, 0x0107 indicates a ROM image and 0x010b indicates a normal EXE image.
BYTE MajorLinkerVersion;
BYTE MinorLinkerVersion; the version of the linker that produced the PE file, in decimal.
For the other fields, see the WINNT.H header file.

Section Table

The section table contains information about each section of the image. Sections are ordered by starting address rather than alphabetically. Each entry in the section table records where the section's raw data lives in the file and the address at which it is mapped into memory. A section is a range of memory; all the code and data needed by the program and the operating system are stored in some section.

The section table is an array of IMAGE_SECTION_HEADER elements that immediately follows the PE header. The number of elements in the array is recorded in IMAGE_NT_HEADERS.FileHeader.NumberOfSections.

IMAGE_SECTION_HEADER holds the complete description of one section of an EXE or OBJ file.
BYTE Name[IMAGE_SIZEOF_SHORT_NAME]; an 8-byte ANSI name (not necessarily null-terminated) that gives the section name, such as .text.
union {
DWORD PhysicalAddress;
DWORD VirtualSize;
} Misc;
In an EXE, this is the virtual memory size of the code or data section, before alignment.
In an OBJ, it is the physical address of the section. The first section starts at 0; the address of each following section is the previous section's address plus its SizeOfRawData value.
DWORD VirtualAddress; in an EXE, the relative virtual address at which the loader maps the section. The actual starting address of the section is this value plus the image base address. Linkers often set it to 0x1000 for the first section.
In an OBJ it is meaningless and always 0.
DWORD SizeOfRawData; in an EXE, the size of the section after it has been rounded up to the file alignment.
In an OBJ, the exact size of the section as emitted by the compiler or assembler.
DWORD PointerToRawData; file offset, from the start of the file, of the section's raw data.
DWORD PointerToRelocations; meaningless in an EXE, where it is always 0.
In an OBJ, it is the file offset of the section's relocation information. The relocation information for each OBJ section follows that section's raw data.
DWORD PointerToLinenumbers; file offset of the line number table. In an EXE the line number information is placed near the end of the file. In an OBJ the line number table for a section comes after that section's raw data and relocation table.
WORD NumberOfRelocations; number of entries in the relocation table (pointed to by PointerToRelocations). Used only in OBJ files.
WORD NumberOfLinenumbers; number of entries in the line number table (pointed to by PointerToLinenumbers).
DWORD Characteristics; a set of flags describing the section's attributes (for example code or data, readable or writable). See the IMAGE_SCN_XXX_XXX definitions in WINNT.H.
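The fields above are enough to locate the section table and to translate a relative virtual address (RVA) into a file offset. The following is a minimal sketch, assuming a 32-bit image whose headers have already been read into memory. The struct `section_header` mirrors the layout of IMAGE_SECTION_HEADER but is declared locally, and the three helper functions are hypothetical names invented for this example.

```c
#include <stdint.h>
#include <string.h>

/* Local mirror of IMAGE_SECTION_HEADER (40 bytes); not the winnt.h type. */
typedef struct {
    uint8_t  Name[8];
    uint32_t VirtualSize;          /* Misc.VirtualSize in winnt.h */
    uint32_t VirtualAddress;
    uint32_t SizeOfRawData;
    uint32_t PointerToRawData;
    uint32_t PointerToRelocations;
    uint32_t PointerToLinenumbers;
    uint16_t NumberOfRelocations;
    uint16_t NumberOfLinenumbers;
    uint32_t Characteristics;
} section_header;

/* The table follows the 4-byte signature, the 20-byte IMAGE_FILE_HEADER,
 * and the optional header, whose size is given by SizeOfOptionalHeader. */
uint32_t section_table_offset(uint32_t e_lfanew, uint16_t size_of_optional_header)
{
    return e_lfanew + 4 + 20 + size_of_optional_header;
}

/* Name may use all 8 bytes with no terminator, so copy and terminate. */
void section_name(const section_header *s, char out[9])
{
    memcpy(out, s->Name, 8);
    out[8] = '\0';
}

/* Translate an RVA to a file offset. Returns 1 on success, or 0 when no
 * section has file-backed data at that RVA. */
int rva_to_file_offset(const section_header *secs, unsigned count,
                       uint32_t rva, uint32_t *offset)
{
    for (unsigned i = 0; i < count; i++) {
        uint32_t start = secs[i].VirtualAddress;
        if (rva >= start && rva - start < secs[i].SizeOfRawData) {
            *offset = secs[i].PointerToRawData + (rva - start);
            return 1;
        }
    }
    return 0;
}
```

For instance, with a .text section at VirtualAddress 0x1000 and PointerToRawData 0x400, the RVA 0x1010 maps to file offset 0x410. Only the range covered by SizeOfRawData is treated as present in the file, since memory beyond it has no raw data behind it.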


Sections
.text section
Contains all general-purpose program code. Besides the code generated by the compiler and the runtime library code, .text holds something else. In a PE file, when you call a function in another module (such as GetMessage in USER32.DLL), the CALL instruction generated by the compiler does not transfer control directly to the function in the DLL. Instead, it transfers control to a JMP DWORD PTR [XXXXXXXX] instruction, which is also in .text. The JMP instruction jumps through a DWORD in .idata, and that DWORD contains the real entry address of the function.

Why are DLL calls made this way?
By funnelling all calls to a given DLL function through one place, the loader does not need to patch every instruction that calls the DLL. It only has to write the real address of the DLL function into the DWORD in .idata. This scheme has one drawback: you cannot initialize a variable with the real address of a DLL function.
For example:
FARPROC pfnGetMessage = GetMessage;
This variable actually receives the address of the JMP DWORD PTR [XXXXXXXX] instruction, not the address of the real function.
For API functions declared with __declspec(dllimport), the compiler generates CALL DWORD PTR [XXXXXXXX], which calls through the DWORD in .idata directly, instead of calling the JMP DWORD PTR [XXXXXXXX] instruction.
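The indirection can be imitated in plain C with a function pointer standing in for the DWORD in .idata. This is only an analogy, not what a compiler or loader emits: `real_function`, `iat_slot`, `thunk`, and `fake_loader_bind` are all hypothetical names, and no actual DLL is involved.

```c
typedef int (*import_fn)(int);

/* Stand-in for a function that lives in a DLL. */
int real_function(int x)
{
    return x * 2;
}

/* Stand-in for the DWORD in .idata: one slot per imported function. */
import_fn iat_slot;

/* Stand-in for the JMP DWORD PTR [XXXXXXXX] stub in .text: every
 * ordinary call site calls this, and it forwards through the slot. */
int thunk(int x)
{
    return iat_slot(x);
}

/* Stand-in for the loader: it patches the single slot once, instead of
 * patching every call site in the program. */
void fake_loader_bind(void)
{
    iat_slot = real_function;
}
```

After `fake_loader_bind()` runs, calling `thunk(21)` and calling `iat_slot(21)` both reach `real_function`; the second form corresponds to the CALL DWORD PTR [XXXXXXXX] sequence generated for __declspec(dllimport). Taking the address of `thunk` yields a pointer different from `real_function`, which is the drawback noted above.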


.data section
Holds initialized data, including global variables, string constants, and static variables that are given initial values at compile time. The linker combines all the .data sections from the OBJ and LIB files into the .data section of the EXE. Local variables, by contrast, are placed on the stack at run time.

.bss section
Holds uninitialized static and global variables. The linker combines all the .bss sections from the OBJ and LIB files into the .bss section of the EXE. In the section table, the raw data offset (PointerToRawData) of .bss is 0, indicating that the section occupies no space in the file.

.CRT section
Another initialized data section, used by the Microsoft C/C++ runtime library (CRT). It holds the constructors of static C++ objects that are executed before main or WinMain.

.rsrc section
Holds the module's resources, such as the contents of .RES files.

.idata section
Contains information about the functions and data that the module imports from other DLLs.

.edata section
Holds information about the functions that the PE file exports. An .edata section is usually found only in DLLs.

.reloc section
Holds the base relocation table. A base relocation is an adjustment that must be applied to an instruction or an initialized variable. The loader makes these adjustments only if it cannot load the EXE or DLL at its preferred address.
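The adjustment itself is simple arithmetic: every stored address is shifted by the difference between the actual and the preferred load address. The sketch below is simplified on purpose. It assumes a flat list of byte offsets to patch, whereas the real .reloc data is organized into per-page blocks (IMAGE_BASE_RELOCATION) that a loader must decode first. It also assumes 32-bit addresses stored in the host's byte order, and `apply_relocations` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Add (actual_base - preferred_base) to each 32-bit address stored at
 * the given byte offsets inside the mapped image. */
void apply_relocations(uint8_t *image,
                       uint32_t preferred_base, uint32_t actual_base,
                       const uint32_t *fixup_offsets, size_t count)
{
    /* Unsigned subtraction wraps, so this also works when the image is
     * loaded below its preferred address. */
    uint32_t delta = actual_base - preferred_base;

    for (size_t i = 0; i < count; i++) {
        uint32_t value;
        /* memcpy avoids unaligned access; fixups need not be aligned. */
        memcpy(&value, image + fixup_offsets[i], sizeof value);
        value += delta;
        memcpy(image + fixup_offsets[i], &value, sizeof value);
    }
}
```

If an image linked for base 0x00400000 is loaded at 0x10000000 instead, a stored address of 0x00401000 becomes 0x10001000. When the image loads at its preferred address the delta is zero and nothing needs to change.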

.tls section
When __declspec(thread) is used, the data so declared does not go into .data or .bss; instead, a copy is placed in .tls.
TLS stands for Thread Local Storage. It lets each thread have its own set of static data, and the code that uses the data does not need to know which thread is executing it. Suppose a program has several threads working on the same job. If you declare a TLS variable like this:
__declspec(thread) int i = 0; // This is a global variable declaration
each thread will have its own copy of the variable i.


.rdata section
It serves at least four purposes:
1. In an EXE produced by the Microsoft linker, .rdata contains the debug directory (OBJ files do not have one).
2. If a DESCRIPTION is specified in the program's .DEF file, the specified string appears in .rdata.
3. GUID values are placed in the .rdata of an EXE or DLL.
4. It holds the TLS (Thread Local Storage) directory, which is used by the compiler's runtime library.


.drectve section
Appears only in OBJ files and contains a text representation of command-line arguments for the linker.


PE file imports

Before the file is loaded into memory, the information stored in .idata is what the loader uses to determine the addresses of the imported functions and patch them into the image. After loading, .idata contains pointers to the functions that the EXE or DLL imports.
The .idata section (the import table) starts with an array of IMAGE_IMPORT_DESCRIPTOR structures. Each DLL that the PE file links against has a corresponding IMAGE_IMPORT_DESCRIPTOR structure here.

PE file exports

Information about the functions a PE file exports is stored in .edata. The .edata section begins with an IMAGE_EXPORT_DIRECTORY structure.


Base address relocation of PE files
