PE file structure (1) Basic Concepts

Source: Internet
Author: User

A PE (Portable Executable) file is the general term for executable files on Windows. Common examples include DLL, EXE, OCX, and SYS files. In fact, whether a file is a PE file has nothing to do with its extension; a PE file can have any extension. So how does Windows tell executable files from non-executable ones? When we call LoadLibrary and pass it a file name, how does the system determine that the file is a dynamic library? The answer lies in the PE file structure.

Generally, a PE file is laid out as shown in the following figure. Starting from the beginning of the file, it consists of the DOS header, the NT headers, the section table, and then the sections themselves.


  • The DOS header exists for compatibility with MS-DOS: when the file is run under MS-DOS, it prints a short message, in most cases "This program cannot be run in DOS mode." Its other purpose is to specify the position of the NT headers in the file.
  • The NT headers contain the main information about the Windows PE file: the "PE" signature, the PE file header (IMAGE_FILE_HEADER), and the PE optional header (IMAGE_OPTIONAL_HEADER32). The detailed structure and meaning of these headers are described in the PE File Header document.
  • Section table: describes the sections that follow in the PE file. Windows loads each section according to its entry in the section table.
  • Sections: each section is a container that can hold code, data, and so on. Each section can have its own memory permissions; for example, the code section has read/execute permissions by default. The names and number of sections are up to you; there need not be exactly three.
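The header chain described above can be sketched in a few lines of Python. This is a minimal illustration, not how the Windows loader is implemented: the function name parse_pe_headers and the helper build_demo_image are made up for this example, and the demo bytes are synthetic rather than taken from a real file. The field offsets (e_lfanew at 0x3C, the 20-byte IMAGE_FILE_HEADER after the "PE" signature) and the IMAGE_FILE_DLL flag value 0x2000 follow the published PE layout.

```python
import struct

IMAGE_FILE_DLL = 0x2000  # Characteristics flag marking the file as a DLL


def parse_pe_headers(data: bytes) -> dict:
    """Return a few header fields, or raise ValueError if data is not a PE file."""
    # The DOS header is 64 bytes long and starts with the signature "MZ".
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    # e_lfanew (4 bytes at offset 0x3C) is the file offset of the NT headers.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    # The NT headers start with the 4-byte signature "PE\0\0",
    # followed by the 20-byte IMAGE_FILE_HEADER.
    if e_lfanew + 24 > len(data) or data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    (machine, number_of_sections, _timestamp, _symtab, _nsyms,
     size_of_optional_header, characteristics) = struct.unpack_from(
        "<HHIIIHH", data, e_lfanew + 4)
    return {
        "machine": machine,
        "number_of_sections": number_of_sections,
        "is_dll": bool(characteristics & IMAGE_FILE_DLL),
        # The section table comes right after the optional header.
        "section_table_offset": e_lfanew + 24 + size_of_optional_header,
    }


def build_demo_image() -> bytes:
    """Build synthetic header bytes (hypothetical values, demo only)."""
    dos = bytearray(64)
    dos[:2] = b"MZ"
    struct.pack_into("<I", dos, 0x3C, 64)  # NT headers right after the DOS header
    # Machine=0x14C (i386), 3 sections, optional header size 0xE0, DLL flag set.
    file_header = struct.pack("<HHIIIHH", 0x14C, 3, 0, 0, 0, 0xE0, 0x2102)
    return bytes(dos) + b"PE\0\0" + file_header


info = parse_pe_headers(build_demo_image())
print(info)
```

To try it on a real file, read the file with open(path, "rb").read() and pass the bytes to parse_pe_headers; the extension of the file plays no role in the result.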

When a PE file is loaded into memory, it is called an image. In general, a PE file does not look exactly the same on disk as it does in memory: once loaded, it occupies more virtual address space than it occupies space on disk. This is because the sections are stored contiguously on disk, whereas in memory each section is page-aligned, so after loading there are "holes" between the sections.
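A small calculation makes the size difference concrete. The sketch below lays out the same three sections twice, once with a small on-disk alignment and once with page alignment. The section names and sizes, the header size, and the helper names align_up and layout are all hypothetical; 0x200 and 0x1000 are common values for the FileAlignment and SectionAlignment fields of the optional header, used here as assumptions.

```python
def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


FILE_ALIGNMENT = 0x200      # assumed on-disk alignment
SECTION_ALIGNMENT = 0x1000  # assumed in-memory alignment (one page)
HEADERS_SIZE = 0x400        # hypothetical total size of all headers

# (name, size of the actual content in bytes), hypothetical values
SECTIONS = [(".text", 0x1234), (".data", 0x0150), (".rsrc", 0x0A00)]


def layout(sections, headers_size, alignment):
    """Place sections one after another, each starting on an aligned boundary."""
    offset = align_up(headers_size, alignment)
    placed = []
    for name, size in sections:
        placed.append((name, offset))
        offset = align_up(offset + size, alignment)
    return placed, offset


file_layout, file_size = layout(SECTIONS, HEADERS_SIZE, FILE_ALIGNMENT)
image_layout, image_size = layout(SECTIONS, HEADERS_SIZE, SECTION_ALIGNMENT)

for (name, file_off), (_, rva) in zip(file_layout, image_layout):
    print(f"{name}: file offset {file_off:#x}, RVA {rva:#x}")
print(f"size on disk {file_size:#x}, size in memory {image_size:#x}")
```

With these numbers the image needs 0x5000 bytes of address space while the file needs only 0x2400, and the gap between the end of each section's content and the next page boundary is the "hole".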

Because of this alignment, a position in a PE file can be expressed in two ways. For the file as stored on disk, the address is called the raw address (or physical address) and is the offset from the start of the file. For the image loaded into memory, it is called the relative virtual address (RVA) and is the offset from the start of the memory image.
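Converting between the two kinds of address is a lookup in the section table: find the section that contains the RVA, then apply the same distance from the section start to the section's position in the file. The following is a minimal sketch under stated assumptions. The Section tuple mirrors four fields of a section table entry (VirtualAddress, VirtualSize, PointerToRawData, SizeOfRawData), while the function name rva_to_file_offset and the sample values are hypothetical.

```python
from typing import NamedTuple, Optional, Sequence


class Section(NamedTuple):
    name: str
    virtual_address: int      # RVA of the section start in the image
    virtual_size: int         # size of the section in memory
    pointer_to_raw_data: int  # offset of the section start in the file
    size_of_raw_data: int     # size of the section data in the file


def rva_to_file_offset(rva: int, sections: Sequence[Section]) -> Optional[int]:
    """Map an RVA to a file offset, or return None if it has no bytes on disk."""
    for s in sections:
        if s.virtual_address <= rva < s.virtual_address + s.virtual_size:
            delta = rva - s.virtual_address
            if delta >= s.size_of_raw_data:
                # Inside the section in memory but past its data on disk
                # (for example zero-initialized data).
                return None
            return s.pointer_to_raw_data + delta
    return None  # not inside any section


# Hypothetical section table for illustration only.
SECTIONS = [
    Section(".text", 0x1000, 0x1234, 0x0400, 0x1400),
    Section(".data", 0x3000, 0x2000, 0x1800, 0x0200),
]

offset = rva_to_file_offset(0x1010, SECTIONS)
print(hex(offset))
```

The None case reflects the point made above: the image in memory can be larger than the file, so some RVAs have no counterpart on disk.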

However, some CPU instructions require absolute addresses. For example, to access a global variable or to pass the address of a function, the compiled instruction must contain the absolute address rather than an offset relative to the start of the image. The PE file therefore tells the operating system the memory address at which it would like to be loaded (this is called the base address). The compiler computes the addresses of global variables and functions from this base address and uses them in the corresponding instructions. For example, this is how it looks in IDA:

This representation is called a virtual address (VA).
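The relationship between the two is simple addition: VA = base address + RVA. As a small sketch, assuming a preferred base address of 0x00400000 (a common default for 32-bit EXE files, used here only for illustration; the function names are made up for this example):

```python
PREFERRED_BASE = 0x00400000  # assumed base address for illustration


def rva_to_va(rva: int, image_base: int = PREFERRED_BASE) -> int:
    """Absolute address of a location, given its offset from the image start."""
    return image_base + rva


def va_to_rva(va: int, image_base: int = PREFERRED_BASE) -> int:
    """Offset from the image start, given an absolute address."""
    if va < image_base:
        raise ValueError("VA lies below the image base")
    return va - image_base


print(hex(rva_to_va(0x1010)))    # 0x401010
print(hex(va_to_rva(0x403000)))  # 0x3000
```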

Some may ask: VA is so straightforward, so why do we need RVA at all? Although a PE file specifies the base address at which it wants to be loaded, Windows has many DLLs, and each application ships its own DLLs as well. What if the specified address is already occupied by another DLL? If the PE file cannot be loaded at the expected address, the system selects another suitable base address and loads it there, and every original VA becomes invalid. The NT headers store the information needed to load the PE file, and a VA is meaningless until the actual load base address is known. Therefore, most fields in the PE headers use RVAs, while VAs are used for the addresses of global variables and functions in code.

The next question: what happens to the VAs in code once the load base address has changed? The answer is relocation. The system has its own way of correcting these values, which will be described in detail in a later article on the relocation table.

And since relocation exists, why can't the NT headers simply use VAs as well? Because not all PE files are relocated; early EXE files, for example, are not.
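The idea behind the correction can be sketched as follows: every stored VA is shifted by the difference between the actual and the expected base address. This is a simplified illustration, not the loader's implementation. The function name rebase is made up, and fixup_rvas is a hypothetical flat list of locations to patch; the real relocation table has its own format, which is left to the later article.

```python
import struct


def rebase(image: bytearray, fixup_rvas, preferred_base: int, actual_base: int) -> None:
    """Patch every 32-bit VA listed in fixup_rvas for the new base address."""
    delta = actual_base - preferred_base
    for rva in fixup_rvas:
        # Read the VA that was computed against the preferred base...
        (va,) = struct.unpack_from("<I", image, rva)
        # ...and write it back shifted by delta, wrapped to 32 bits.
        struct.pack_into("<I", image, rva, (va + delta) & 0xFFFFFFFF)


# Synthetic 32-byte "image" holding one absolute address at RVA 0x10.
image = bytearray(0x20)
struct.pack_into("<I", image, 0x10, 0x00403000)  # VA of a hypothetical global

rebase(image, [0x10], preferred_base=0x00400000, actual_base=0x10000000)
(patched,) = struct.unpack_from("<I", image, 0x10)
print(hex(patched))  # 0x10003000
```

Note that the RVA of the global (0x3000) is the same before and after; only the VA changes, which is why header fields expressed as RVAs need no correction.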

We all know that a PE file can export functions for other PE files to use and can import functions from other PE files. How does this work? A PE file uses its export table to specify the functions it exports, and its import table to specify which functions it imports from which modules. The structure of the import and export tables is described in a separate article.

Well, after reading this article, you should have a general understanding of the PE file. In future articles, I will "disassemble" the common parts of the PE file one by one, so stay tuned.
