PE File Format Overview


Summary of this chapter

· PE file Format overview

· PE file structure

· How to get the OEP in a PE file

· How to get resources in a PE file

· How to modify a PE file so that it displays a MessageBox

2.1 Introduction

Executable (EXE) files under Windows are typically in PE format. PE stands for Portable Executable, a binary format for executables and DLLs that Microsoft designed for Windows NT, Windows 95, and Win32s. Object files and library files often use the same format. The format was standardized in 1993 by the TIS (Tool Interface Standard) Committee (Microsoft, Intel, Borland, Watcom, IBM, and others). It clearly borrows from COFF (Common Object File Format), which was used on some Unix systems and on VMS.

Understanding the structure of executable files is very important, both under DOS and especially under Windows. Once you know the structure, you can encrypt, pack, and modify executables, and some hackers exploit these techniques. To help readers understand the PE file format, this chapter introduces it from a programmer's point of view. If you are already familiar with this material, you can skip this chapter.

2.2 PE file format overview

To understand a PE file, you need to know its structural layout and how it is loaded into memory. The two topics are described separately below.

2.2.1 PE file structure layout

There are two ways to locate a piece of structure information in a file. The first is the linked-list method, in which the location where data is stored in the file is relatively free. The second is to use compact, fixed positions; this method requires fixed-size data structures whose locations in the file are relatively fixed. The PE file structure uses both methods.

Because each data structure in the PE file header has a fixed size, it is possible to write a program that calculates the location of a given parameter value in a PE file. When writing such a program, the data structure definitions used, including the type and position of each variable and the size of each array in the structure, must follow the prototypes provided by Windows. The overall hierarchical layout of the PE file structure is shown in Figure 2.1 and is as follows:

Figure 2.1 Overall hierarchical layout of the PE file structure

· DOS MZ Header

All PE files (even 32-bit DLLs) must start with a simple DOS MZ header, which is an IMAGE_DOS_HEADER structure. With it, if the program is run under DOS, DOS recognizes the file as a valid executable and runs the DOS stub that immediately follows the MZ header.

· DOS Stub

The DOS stub is actually a valid EXE. On an operating system that does not support the PE file format, it simply displays an error message such as "This program requires Windows"; alternatively, the programmer can supply full DOS code of their own. In most cases, the DOS stub is generated automatically by the assembler or compiler.

· PE Header

Immediately after the DOS stub comes the PE header, which is an IMAGE_NT_HEADERS structure. It contains many important fields that are needed to load the PE file into memory. When the file is executed on an operating system that supports the PE file structure, the PE loader reads the starting offset of the PE header from the DOS MZ header, skips the DOS stub, and goes directly to the real file header, the PE header.

· Section Table

The PE header is followed by the section table, which is an array of structures. If the PE file has 5 sections, the array has 5 IMAGE_SECTION_HEADER members, each holding the attributes, file offset, virtual offset, and so on of the corresponding section. By default, the first member of the section table is usually .text, the header of the code section. The other section headers can be found by traversing the array.

· Sections

The real content of a PE file is divided into blocks called sections. The names of the standard sections start with a dot, although a name does not have to, and a section name is at most 8 bytes long. Sections are ordered by their starting address, not alphabetically. You can find the sections by using the information in the section table. The program's code, resources, and so on are stored in these sections.

Data is divided into sections according to shared attributes, not according to logical concepts. Each section is a block of data with common attributes, such as code/data or read/write. Data or code in the PE file that has the same attributes can be grouped into the same section. A section name is only a label that tells sections apart; names like "data" and "code" exist only for ease of identification. Only the attribute settings of a section determine its characteristics and function.

2.2.2 PE file memory mapping

Under Windows, the layout of a PE file's data structures on disk is consistent with their layout in memory when the application runs. When the system loads an executable, the Windows loader (also called the PE loader) first maps the file from disk into the address space of the process; it walks the PE file and decides which parts of the file to map. Higher file offsets are mapped to higher memory addresses. Once the file is loaded into memory, the offset of an item may differ from its original offset in the file, but the memory offset can be derived from the disk file offset by a conversion, as shown in Figure 2.2.

Figure 2.2 PE file memory mapping

When a PE file is loaded into memory, the in-memory version is called a module, and the starting address of the mapped file is called the module handle (HMODULE); the other data structures in memory can be reached through the module handle. This starting address is also known as the image base (ImageBase). The main steps in loading a PE program are as follows:

(1) When the PE file is executed, the PE loader first allocates a 4 GB virtual address space for the process, and then maps the disk space occupied by the program into this address space as virtual memory. In general, the image is mapped at address 0x400000. Loading an application takes less time than most people think, because loading a PE file does not read the whole file from disk into memory in one go; it is simply a memory mapping, so mapping a large file takes hardly any longer than mapping a small one. Of course, when code in the file is actually executed, the operating system still has to bring that code from disk-backed virtual memory into physical memory (RAM). This transfer does not move the whole address range occupied by the file into physical memory at once; the operating system pages in one or more pages on demand, depending on need and memory usage. The transfer is also bidirectional: parts of physical memory that are not currently in use may be paged out to disk.

(2) The PE loader creates process objects and main thread objects and other content in the kernel.

(3) The PE loader searches the import table in the PE file and loads the dynamic-link libraries used by the application. A dynamic-link library is loaded in exactly the same way as an application.

(4) The PE loader executes the code at the address specified in the PE file header, starting the application's main thread.

2.2.3 Big-endian and little-endian

The Machine member of IMAGE_FILE_HEADER in the PE header, as defined in WinNT.h, should be 0x014c for Intel CPUs. But when you open the PE file in a hexadecimal editor, that WORD is shown as 4c 01. In fact, 4c 01 is 0x014c; it is displayed this way because Intel CPUs are little-endian. The following example illustrates big-endian and little-endian order. Take an int variable 4 bytes long whose value is 0x12345678: in big-endian order the bytes are stored as {12,34,56,78}, and in little-endian order as {78,56,34,12}. Note that Intel uses little-endian order.

2.2.4 Three different types of addresses

The various structures in a PE file involve many addresses and offsets. Some are offsets in the file and some are offsets in memory. Of the three types below, the first is an address in the file, and the second and third are addresses in memory.

First, the address in the file. For example, when you open a PE file in a hexadecimal editor, the address (offset) shown is the address in the file. With the file address of a structure, you can find that structure in the file.

Second, the virtual address (VA) in memory when the whole file is mapped into memory, as some PE analysis tools do. The memory address of a structure is then the address at which the PE file is mapped plus the address of the structure in the file.

Third, the relative virtual address (RVA), which is needed when the PE file is executed and the loader loads it into memory. If you know the RVA of a structure, adding the load address of the program gives the memory address of the structure. For example, if the PE file is loaded at 0x400000 in the virtual address space and the RVA of a structure is 0x1000, then its virtual address is 0x401000.
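The arithmetic in the third case can be written down directly. The following is a minimal sketch in C, assuming 32-bit addresses; rva_to_va and va_to_rva are hypothetical helper names, not part of any Windows API.

```c
#include <assert.h>
#include <stdint.h>

/* Convert an RVA to a virtual address, given the address at which
   the module was loaded (hypothetical helper, 32-bit addresses assumed). */
uint32_t rva_to_va(uint32_t load_base, uint32_t rva)
{
    return load_base + rva;
}

/* Convert a virtual address inside the module back to an RVA. */
uint32_t va_to_rva(uint32_t load_base, uint32_t va)
{
    assert(va >= load_base);   /* the address must lie at or above the load address */
    return va - load_base;
}
```

With a load address of 0x400000 and an RVA of 0x1000, rva_to_va returns 0x401000. If the same module were loaded at 0x10000000 instead, the unchanged RVA would give 0x10001000, which is why values stored as RVAs do not need to be rewritten when the load address changes.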

The PE file format uses RVAs mainly to reduce the burden on the PE loader. Because a module may be loaded at any address in the virtual address space, it would be a nightmare if the PE loader had to fix up every relocatable item. If all relocatable items use RVAs instead, the PE loader does not have to worry about them: it simply places the whole module at the new starting VA. This is like relative and absolute paths: an RVA resembles a relative path, and a VA resembles an absolute path.

Note that RVA and VA both refer to memory, not to the file. An RVA is an offset from the load address rather than a memory address; only the RVA plus the load address gives the actual memory address.

2.3 PE File Structure

There is a definition of the PE file format in the file winnt.h of the Win32 SDK. The variables used in this article, if not specifically stated, are defined in the file winnt.h.

Some PE header structures come in 32-bit and 64-bit variants, such as IMAGE_NT_HEADERS32 and IMAGE_NT_HEADERS64. Apart from some widened fields in the 64-bit versions, the structures are the same. Whether the 32-bit or the 64-bit variant is used depends on whether _WIN64 is defined (#define _WIN64); if it is not defined, the 32-bit structures are used. The compiler picks the appropriate variant based on this definition.

2.3.1 MS-DOS header

The MS-DOS header occupies the first 64 bytes of the PE file. Its structure is as follows:

    // This structure is defined in WINNT.H
    typedef struct _IMAGE_DOS_HEADER {  // DOS .EXE header
        WORD e_magic;      // Magic number
        WORD e_cblp;       // Bytes on last page of file
        WORD e_cp;         // Pages in file
        WORD e_crlc;       // Relocations
        WORD e_cparhdr;    // Size of header in paragraphs
        WORD e_minalloc;   // Minimum extra paragraphs needed
        WORD e_maxalloc;   // Maximum extra paragraphs needed
        WORD e_ss;         // Initial (relative) SS value
        WORD e_sp;         // Initial SP value
        WORD e_csum;       // Checksum
        WORD e_ip;         // Initial IP value
        WORD e_cs;         // Initial (relative) CS value
        WORD e_lfarlc;     // File address of relocation table
        WORD e_ovno;       // Overlay number
        WORD e_res[4];     // Reserved words
        WORD e_oemid;      // OEM identifier (for e_oeminfo)
        WORD e_oeminfo;    // OEM information
        WORD e_res2[10];   // Reserved words
        LONG e_lfanew;     // File address of new EXE header
    } IMAGE_DOS_HEADER, *PIMAGE_DOS_HEADER;

The first field, e_magic, known as the magic number, identifies an MS-DOS-compatible file type. All MS-DOS-compatible executables set this value to 0x5A4D, which represents the ASCII characters MZ. This is why the MS-DOS header is sometimes called the MZ header. Many of the other fields are useful to MS-DOS, but for Windows NT only one field matters: the last field, e_lfanew, a 4-byte file offset that locates the PE file header.

2.3.2 IMAGE_NT_HEADERS header

The PE header immediately follows the MS-DOS header and the real-mode stub program. Its structure is as follows:

    typedef struct _IMAGE_NT_HEADERS {
        DWORD Signature;                         // PE file signature: "PE\0\0"
        IMAGE_FILE_HEADER FileHeader;            // Physical layout of the PE file
        IMAGE_OPTIONAL_HEADER32 OptionalHeader;  // Logical layout of the PE file
    } IMAGE_NT_HEADERS32, *PIMAGE_NT_HEADERS32;
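As an illustration of how e_lfanew in the DOS header leads to the Signature field, here is a minimal sketch in C that works on a file already read into a byte buffer. It assumes the 64-byte DOS header described in 2.3.1, which places e_lfanew at offset 60, and reads values byte by byte in little-endian order so that it depends neither on winnt.h nor on the host CPU. The names find_pe_header, read_u16_le, and read_u32_le are hypothetical, chosen for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DOS_MAGIC        0x5A4Du      /* "MZ" read as a little-endian WORD */
#define NT_SIGNATURE     0x00004550u  /* "PE\0\0" read as a little-endian DWORD */
#define DOS_HEADER_SIZE  64u          /* size of the MS-DOS header */
#define E_LFANEW_OFFSET  60u          /* 14 WORDs + e_res[4] + 2 WORDs + e_res2[10] */

/* Read a 16-bit little-endian value from a byte buffer. */
uint16_t read_u16_le(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Read a 32-bit little-endian value from a byte buffer. */
uint32_t read_u32_le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Return the file offset of the PE signature, or -1 if the buffer
   does not look like a PE file. */
int64_t find_pe_header(const uint8_t *file, size_t size)
{
    assert(file != NULL);
    if (size < DOS_HEADER_SIZE)
        return -1;                              /* too small for a DOS header */
    if (read_u16_le(file) != DOS_MAGIC)
        return -1;                              /* e_magic is not "MZ" */

    uint32_t e_lfanew = read_u32_le(file + E_LFANEW_OFFSET);
    if (e_lfanew > size || size - e_lfanew < 4)
        return -1;                              /* signature would lie outside the file */
    if (read_u32_le(file + e_lfanew) != NT_SIGNATURE)
        return -1;                              /* not "PE\0\0" */
    return (int64_t)e_lfanew;
}
```

The file header then starts 4 bytes after the returned offset, directly behind the signature.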

Immediately after the PE signature comes the file header structure, which is 20 bytes long and is defined as:

    typedef struct _IMAGE_FILE_HEADER {
        WORD  Machine;
        WORD  NumberOfSections;
        DWORD TimeDateStamp;
        DWORD PointerToSymbolTable;
        DWORD NumberOfSymbols;
        WORD  SizeOfOptionalHeader;
        WORD  Characteristics;
    } IMAGE_FILE_HEADER, *PIMAGE_FILE_HEADER;

    #define IMAGE_SIZEOF_FILE_HEADER 20

Note that the size of the file header is already defined in the include file, which makes it convenient to get the size of the structure.

Machine: The environment and platform on which the program is meant to run. The known values are shown in Table 2.1.

Table 2.1 Environment and platform codes for application execution

    Value                                  Platform
    IMAGE_FILE_MACHINE_I386 (0x014c)       Intel 80386 processor or later
    0x014d                                 Intel 80486 processor or later
    0x014e                                 Intel Pentium processor or later
    0x0160                                 R3000 (MIPS) processor, big endian
    IMAGE_FILE_MACHINE_R3000 (0x0162)      R3000 (MIPS) processor, little endian
    IMAGE_FILE_MACHINE_R4000 (0x0166)      R4000 (MIPS) processor, little endian
    IMAGE_FILE_MACHINE_R10000 (0x0168)     R10000 (MIPS) processor, little endian
    IMAGE_FILE_MACHINE_ALPHA (0x0184)      DEC Alpha AXP processor
    IMAGE_FILE_MACHINE_POWERPC (0x01F0)    IBM PowerPC, little endian

NumberOfSections: The number of sections.

TimeDateStamp: The time at which the file was created. This value can be used to tell apart different versions of the same file, even if their commercial version numbers are identical. The format of the value is not explicitly defined, but most C compilers evidently set it to the number of seconds since 1970-01-01 00:00:00 (time_t). The value is also sometimes used by the bound import directory. Note: some compilers ignore this value.

PointerToSymbolTable and NumberOfSymbols: Used for debugging information. Their exact use is not very clear, but in executables the values are usually 0.

SizeOfOptionalHeader: The size of the optional header (sizeof(IMAGE_OPTIONAL_HEADER)), which can be used to check the validity of the PE file.

Characteristics: A set of flags, most of which are used in OBJ or LIB files.
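The 20-byte layout makes the file header easy to decode by hand. The sketch below, in C, reads the seven fields from a raw byte array in little-endian order, as discussed in 2.2.3. The FileHeader type and the parse_file_header function are hypothetical names that mirror IMAGE_FILE_HEADER; they are not taken from winnt.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical portable mirror of IMAGE_FILE_HEADER (20 bytes on disk). */
typedef struct {
    uint16_t machine;
    uint16_t number_of_sections;
    uint32_t time_date_stamp;
    uint32_t pointer_to_symbol_table;
    uint32_t number_of_symbols;
    uint16_t size_of_optional_header;
    uint16_t characteristics;
} FileHeader;

/* Little-endian readers: the lowest-addressed byte is the least significant. */
static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode the 20 raw bytes that follow the "PE\0\0" signature. */
FileHeader parse_file_header(const uint8_t *raw)
{
    FileHeader h;
    assert(raw != NULL);
    h.machine                 = rd16(raw + 0);
    h.number_of_sections      = rd16(raw + 2);
    h.time_date_stamp         = rd32(raw + 4);
    h.pointer_to_symbol_table = rd32(raw + 8);
    h.number_of_symbols       = rd32(raw + 12);
    h.size_of_optional_header = rd16(raw + 16);
    h.characteristics         = rd16(raw + 18);
    return h;
}
```

Given the bytes 4C 01 at the start of the array, the machine field comes out as 0x014C, and a TimeDateStamp stored as 78 56 34 12 is read as 0x12345678.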

Below the file header is the optional header, a structure called IMAGE_OPTIONAL_HEADER that is 224 bytes long in the 32-bit version. Despite the name "optional header", this header is not optional but required. The optional header contains a lot of important information about the executable image, such as the initial stack size, the location of the program entry point, the preferred base address, the operating system version, and the section alignment. The IMAGE_OPTIONAL_HEADER structure is as follows:

    #define IMAGE_NUMBEROF_DIRECTORY_ENTRIES 16

    typedef struct _IMAGE_OPTIONAL_HEADER {
        //
        // Standard fields
        //
        WORD  Magic;
        BYTE  MajorLinkerVersion;
        BYTE  MinorLinkerVersion;
        DWORD SizeOfCode;
        DWORD SizeOfInitializedData;
        DWORD SizeOfUninitializedData;
        DWORD AddressOfEntryPoint;
        DWORD BaseOfCode;
        DWORD BaseOfData;
        //
        // NT additional fields
        //
        DWORD ImageBase;
        DWORD SectionAlignment;
        DWORD FileAlignment;
        WORD  MajorOperatingSystemVersion;
        WORD  MinorOperatingSystemVersion;
        WORD  MajorImageVersion;
        WORD  MinorImageVersion;
        WORD  MajorSubsystemVersion;
        WORD  MinorSubsystemVersion;
        DWORD Win32VersionValue;
        DWORD SizeOfImage;
        DWORD SizeOfHeaders;
        DWORD CheckSum;
        WORD  Subsystem;
        WORD  DllCharacteristics;
        DWORD SizeOfStackReserve;
        DWORD SizeOfStackCommit;
        DWORD SizeOfHeapReserve;
        DWORD SizeOfHeapCommit;
        DWORD LoaderFlags;
        DWORD NumberOfRvaAndSizes;
        IMAGE_DATA_DIRECTORY DataDirectory[IMAGE_NUMBEROF_DIRECTORY_ENTRIES];
    } IMAGE_OPTIONAL_HEADER, *PIMAGE_OPTIONAL_HEADER;
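Because every field in the 32-bit optional header sits at a fixed offset, individual values can be read without declaring the whole structure. The following C sketch assumes the 32-bit layout (Magic 0x010B) and byte offsets obtained by summing the field sizes in the structure: AddressOfEntryPoint at 16, ImageBase at 28, and the data directory at 96. The function and constant names are hypothetical. It computes the virtual address the entry point would have if the image were loaded at its preferred base.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    OPT_MAGIC_OFFSET       = 0,       /* WORD  Magic */
    OPT_ENTRY_POINT_OFFSET = 16,      /* DWORD AddressOfEntryPoint (an RVA) */
    OPT_IMAGE_BASE_OFFSET  = 28,      /* DWORD ImageBase (a VA) */
    OPT_DATA_DIR_OFFSET    = 96,      /* start of the DataDirectory array */
    OPT_DATA_DIR_ENTRY     = 8,       /* two DWORDs per directory entry */
    OPT_MAGIC_PE32         = 0x010B   /* magic of the 32-bit optional header */
};

static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Size in bytes of a 32-bit optional header with num_dirs directory entries. */
size_t optional_header32_size(uint32_t num_dirs)
{
    return (size_t)OPT_DATA_DIR_OFFSET + (size_t)num_dirs * OPT_DATA_DIR_ENTRY;
}

/* Entry-point VA at the preferred load address, or 0 if the buffer
   does not carry the 32-bit magic. */
uint32_t preferred_entry_va(const uint8_t *opt)
{
    assert(opt != NULL);
    if (rd16(opt + OPT_MAGIC_OFFSET) != OPT_MAGIC_PE32)
        return 0;
    return rd32(opt + OPT_IMAGE_BASE_OFFSET) + rd32(opt + OPT_ENTRY_POINT_OFFSET);
}
```

With 16 directory entries, optional_header32_size gives 96 + 16 * 8 = 224 bytes, the size quoted for the 32-bit structure. If the loader places the image at a different address, the entry-point VA is that actual load address plus AddressOfEntryPoint.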

The fields have the following meanings.

Magic: This value is 0x010B for 32-bit (PE32) images; 64-bit (PE32+) images use 0x020B.

MajorLinkerVersion and MinorLinkerVersion: The version number of the linker. This value is not very reliable.

SizeOfCode: The size of the executable code.

SizeOfInitializedData: The size of the initialized data (data section).

SizeOfUninitializedData: The size of the uninitialized data (BSS section).

AddressOfEntryPoint: The RVA of the code at which execution of the program begins, often called the original entry point (OEP) of the program.

BaseOfCode: The RVA at which the executable code starts.

BaseOfData: The RVA at which the initialized data starts.

ImageBase: The preferred load address of the program. This is a virtual address, not an RVA, and the loader may place the image at a different address.

SectionAlignment: The alignment of sections in memory after loading.

FileAlignment: The alignment of sections in the file.

MajorOperatingSystemVersion and MinorOperatingSystemVersion: The operating system version.

MajorImageVersion and MinorImageVersion: The program version.

MajorSubsystemVersion and MinorSubsystemVersion: The subsystem version number. The system does take this field into account. For example, if the program runs under NT and the subsystem version number is not 4.0, dialog boxes may not be displayed in the expected visual style.

Win32VersionValue: This value is normally 0.

SizeOfImage: The amount of memory (in bytes) that the image occupies once loaded. It covers the headers and all sections, rounded up to a multiple of SectionAlignment.

SizeOfHeaders: The combined size of all headers, which equals the file offset of the first section's raw data.

CheckSum: A checksum. It is checked only for drivers and may be 0 in ordinary executables. Microsoft has not published how it is calculated, but the CheckSumMappedFile() function in Imagehlp.dll can compute it.

Subsystem: An enumeration value that identifies the subsystem the executable expects.

DllCharacteristics: Flags describing DLL characteristics.

SizeOfStackReserve: The amount of stack to reserve.

SizeOfStackCommit: The amount of stack actually committed at startup; it can grow as needed.

SizeOfHeapReserve: The amount of heap to reserve.

SizeOfHeapCommit: The amount of heap actually committed.

LoaderFlags: Currently unused.

NumberOfRvaAndSizes: The number of entries in the data directory that follows. This value is not reliable; the constant IMAGE_NUMBEROF_DIRECTORY_ENTRIES can be used in its place. Current versions of Windows set it to 16. Note that if this value is not 16, the size of the data structure is no longer fixed and the positions of the data that follows cannot be determined.

DataDirectory: An array of IMAGE_DATA_DIRECTORY structures with IMAGE_NUMBEROF_DIRECTORY_ENTRIES elements. Each element has the following structure:

    typedef struct _IMAGE_DATA_DIRECTORY {
        DWORD VirtualAddress;  // Starting RVA
        DWORD Size;            // Length
    } IMAGE_DATA_DIRECTORY, *PIMAGE_DATA_DIRECTORY;

2.3.3 IMAGE_SECTION_HEADER header

In the PE file format, all section headers are located after the optional header. Each section header is 40 bytes long and contains no padding. The section header is defined as the following structure:

    #define IMAGE_SIZEOF_SHORT_NAME 8

    typedef struct _IMAGE_SECTION_HEADER {
        BYTE  Name[IMAGE_SIZEOF_SHORT_NAME];  // Section name, such as ".text"
        union {
            DWORD PhysicalAddress;            // Physical address
            DWORD VirtualSize;                // Actual size
        } Misc;
        DWORD VirtualAddress;                 // RVA
        DWORD Sizeofrawda
