In-depth explanation of PE file structure

Source: Internet
Author: User

I. PE Structure Basics

I have read a lot of material on the PE structure: some of it only covers the overall layout, and some is mostly ASM code. As a beginner (a "cainiao") I found it hard going, so I decided to write my own post and learn PE along the way. Let's first look at a few questions.
1: Several address concepts
VA: virtual address, that is, an address in memory.
RVA: relative virtual address, equal to VA - ImageBase.
Offset: the physical address, i.e. the position in the file on disk. It equals VA - ImageBase - section offset (that is, RVA - section offset), where the section offset is the section's VirtualAddress minus its PointerToRawData.
PE loader: a program can run only after it has been loaded into memory; the PE loader is what loads PE files.
Loading: the process of copying a program's instructions and data from disk into memory and translating their addresses.
Linking: the process of combining multiple OBJ files into a module that the PE loader can load.
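The address definitions above reduce to simple arithmetic. Below is a minimal C++ sketch. It assumes you already know which section the address falls in; that section's VirtualAddress and PointerToRawData come from IMAGE_SECTION_HEADER, covered later. The function names are illustrative, not part of any Windows API.

```cpp
#include <cassert>
#include <cstdint>

// VA = ImageBase + RVA
uint64_t RvaToVa(uint64_t imageBase, uint32_t rva) {
    return imageBase + rva;
}

// RVA = VA - ImageBase (the VA must lie inside the image)
uint32_t VaToRva(uint64_t imageBase, uint64_t va) {
    assert(va >= imageBase);
    return static_cast<uint32_t>(va - imageBase);
}

// File offset = RVA - section offset, where
// section offset = VirtualAddress - PointerToRawData of the section holding the RVA.
uint32_t RvaToFileOffset(uint32_t rva, uint32_t sectionVirtualAddress,
                         uint32_t sectionPointerToRawData) {
    assert(rva >= sectionVirtualAddress);  // the RVA must be inside this section
    return (rva - sectionVirtualAddress) + sectionPointerToRawData;
}
```

For example, take ImageBase 0x400000. VA 0x401234 is then RVA 0x1234. If that RVA lies in a section with VirtualAddress 0x1000 and PointerToRawData 0x400, its file offset is 0x634.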
Let's look at the loading and linking concepts in more detail below.
2: Why are the addresses shown in OD (OllyDbg) virtual addresses?
Many of you know that addresses in memory are VAs, but why? The answer lies in how loading works. In the DOS era only one task was in memory at any time, so a program could be loaded straight into memory; this is called static loading. Then came multitasking operating systems, where many tasks appear to run at once because the CPU switches between them in time slices too short for humans to notice. Several tasks must then be in memory together, each placed after the previous one, so translating a program's disk addresses at load time requires knowing where the previous task sits in memory. This is relocatable loading. With x86 protected mode came dynamic relocation, which is what Windows uses: addresses are translated only when the program actually runs. Loading a program merely maps it into memory; when the program executes, the system uses the page tables to find the real (physical) address of its data and instructions. Two basic paging concepts are involved: the process's address space is divided into pages, and physical memory is divided into blocks (frames) of the same size. The page table records the correspondence between the process's page numbers and the memory block numbers, so be sure to think of it as a mapping between two sets.
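Here is a toy C++ sketch of the page-table idea. It assumes 4 KB pages and a single-level table kept in a std::map. Real x86 paging uses multi-level tables that the OS maintains and the CPU walks in hardware, so this only illustrates splitting an address into a page number and an offset. PageTable and Translate are illustrative names.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>

// Assumed page size: 4 KB, the common x86 page size.
constexpr uint32_t kPageShift = 12;
constexpr uint32_t kPageMask = (1u << kPageShift) - 1;

// Toy single-level page table: virtual page number -> physical frame number.
using PageTable = std::map<uint32_t, uint32_t>;

// Split the virtual address into page number and in-page offset, look the
// page up, and rebuild the physical address from the frame number.
std::optional<uint32_t> Translate(const PageTable& table, uint32_t virtualAddress) {
    uint32_t pageNumber = virtualAddress >> kPageShift;
    uint32_t offset = virtualAddress & kPageMask;
    auto it = table.find(pageNumber);
    if (it == table.end()) return std::nullopt;  // unmapped: real hardware would raise a page fault
    return (it->second << kPageShift) | offset;
}
```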
3: Where do DLL files come from?
Everyone knows that DLL files are dynamic-link libraries, but to make this clearer, let's look at where they came from. Programs are all different, yet they overlap: many of them need code that performs the same function. If we extract what many programs have in common, a program that needs it does not have to write it again; it can simply pull it in. That is why the concept of a library was introduced. The earliest way of using libraries was static linking. Its fundamental difference from dynamic linking is that with static linking each program contains its own copy of the library, whereas with dynamic linking a program holds only a reference describing the library, and the library code is not written into every program. This is why, on Windows, some programs cannot run when a DLL file is missing, and the PE loader reports the error only when the program is launched. This clearly shows that Windows DLLs are linked dynamically, at run time.
So what are the benefits? A library collects the functions that programs have in common. We cannot know in advance how many such shared functions there will be, or when their code will be upgraded. If a statically linked library is updated, every program that uses it has to be rebuilt. The drawbacks of static linking are exactly the advantages of dynamic linking, and that is the fundamental motivation for introducing it; the whole history is a process of problem solving. Linking dynamically at run time also avoids loading every library into memory up front, since a program does not necessarily need every library on every run. On the other hand, purely run-time linking would make programs inefficient overall, so Windows combines the approaches. DLLs listed in a program's import table are bound when the program is loaded, while others can be loaded on demand at run time.


II. PE Structure
All the elements of the PE structure serve a single purpose: loading the program into memory. That is the root problem, and the reason the PE structure looks the way it does. Put bluntly, the problem it solves is address translation: how to convert addresses in the file on disk into addresses in memory in a way that makes execution convenient. To learn it efficiently, we will first grasp the important parts and then explain what each one is for.
1: Overall Structure
IMAGE_DOS_HEADER
DOS stub
IMAGE_NT_HEADERS
IMAGE_SECTION_HEADER (section table)
Section 1
Section 2
Section ...
Section N

IMAGE_DOS_HEADER (64 bytes) and the DOS stub
Both are products of the DOS era. Early Windows ran on top of DOS, so the PE format keeps this part for compatibility with DOS.
Only two fields of IMAGE_DOS_HEADER matter. One is the well-known "MZ" signature, called e_magic in code. The other is e_lfanew, at offset 0x3C (you can see it by opening a PE file in C32Asm), which holds the file offset of IMAGE_NT_HEADERS. That's all about IMAGE_DOS_HEADER.
The DOS stub is a standard DOS EXE, similar to a DOS program written with MASM.
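Below is a portable C++ sketch (no Windows headers) that pulls these two fields out of a raw file buffer. It assumes the whole file has already been read into a std::vector<uint8_t>. ReadU16, ReadU32 and DosFields are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// PE fields are little-endian; read them byte by byte so the code does not
// depend on the host's byte order or on struct packing.
uint16_t ReadU16(const std::vector<uint8_t>& b, std::size_t off) {
    return static_cast<uint16_t>(b[off] | (b[off + 1] << 8));
}

uint32_t ReadU32(const std::vector<uint8_t>& b, std::size_t off) {
    return static_cast<uint32_t>(b[off]) | (static_cast<uint32_t>(b[off + 1]) << 8) |
           (static_cast<uint32_t>(b[off + 2]) << 16) | (static_cast<uint32_t>(b[off + 3]) << 24);
}

struct DosFields {
    uint16_t e_magic;   // offset 0x00: 0x5A4D ("MZ") in a valid file
    uint32_t e_lfanew;  // offset 0x3C: file offset of IMAGE_NT_HEADERS
};

// Returns the two fields that matter, or nothing if the buffer is smaller
// than the 64-byte IMAGE_DOS_HEADER.
std::optional<DosFields> ReadDosFields(const std::vector<uint8_t>& file) {
    if (file.size() < 64) return std::nullopt;
    return DosFields{ReadU16(file, 0x00), ReadU32(file, 0x3C)};
}
```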

Because these two parts are of little practical use, they are often modified. Common reasons are to hinder debugging, to evade antivirus detection ("kill-free"), or to hold the import table. PE deformation is an interesting technique, and overlapping IMAGE_DOS_HEADER with IMAGE_NT_HEADERS is the most common trick. Search for an article on building a tiny PE if you are interested.

IMAGE_NT_HEADERS (248 bytes for a 32-bit PE)
As the plural name suggests, this is a group of headers. What does it contain?
It consists of three parts: the Signature (IMAGE_NT_SIGNATURE), IMAGE_FILE_HEADER, and IMAGE_OPTIONAL_HEADER.
IMAGE_FILE_HEADER: physical (file-level) information.
IMAGE_OPTIONAL_HEADER: in-memory information, which is why most of its addresses are RVAs.


As mentioned above, the fundamental problem the PE structure solves is address translation, and solving it takes several steps. The first is to know whether the file is a valid PE at all. For a valid PE, e_magic in IMAGE_DOS_HEADER must equal IMAGE_DOS_SIGNATURE ("MZ", 2 bytes), and the Signature of IMAGE_NT_HEADERS must equal IMAGE_NT_SIGNATURE ("PE\0\0", 4 bytes). However, I have found that this does not always hold, and deformed PE files can produce errors here. As the saying goes, believing everything in books is worse than having no books. For now, just remember that checking IMAGE_DOS_SIGNATURE and IMAGE_NT_SIGNATURE is how we verify that a PE file is valid. This is the first step in the PE structure's solution.
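The two checks can be sketched in portable C++ as follows, assuming the whole file is in a byte buffer. This is deliberately minimal, since the real Windows loader performs further checks. LooksLikePe is an illustrative name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr uint16_t kDosSignature = 0x5A4D;     // "MZ", same value as IMAGE_DOS_SIGNATURE
constexpr uint32_t kNtSignature = 0x00004550;  // "PE\0\0", same value as IMAGE_NT_SIGNATURE

uint16_t ReadU16(const std::vector<uint8_t>& b, std::size_t off) {
    return static_cast<uint16_t>(b[off] | (b[off + 1] << 8));
}

uint32_t ReadU32(const std::vector<uint8_t>& b, std::size_t off) {
    return static_cast<uint32_t>(b[off]) | (static_cast<uint32_t>(b[off + 1]) << 8) |
           (static_cast<uint32_t>(b[off + 2]) << 16) | (static_cast<uint32_t>(b[off + 3]) << 24);
}

// Checks "MZ" at offset 0, then "PE\0\0" at e_lfanew. Every read is
// bounds-checked first, because a malformed file may point e_lfanew anywhere.
bool LooksLikePe(const std::vector<uint8_t>& file) {
    if (file.size() < 0x40 || ReadU16(file, 0) != kDosSignature) return false;
    uint32_t ntOffset = ReadU32(file, 0x3C);  // e_lfanew
    if (ntOffset > file.size() - 4) return false;
    return ReadU32(file, ntOffset) == kNtSignature;
}
```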

IMAGE_FILE_HEADER (20 bytes)
Two of its fields matter most. One is SizeOfOptionalHeader, the size of the optional header. The PE structure is designed for automatic loading, so each structure is linked to the next one, and PE code follows these links with simple arithmetic. The other is NumberOfSections, the number of entries in the section table. Here I like to apply a "sets within sets" idea, which comes from the Taiji (yin-yang) diagram. Look at the diagram closely and you will see a small circle of yin inside the yang and of yang inside the yin (the Bruce Lee story calls it an eye). It means that every yin contains yang and every yang contains yin. Applied to PE, it describes an array of structures. The file has many sections, each described by one section-table structure, and together they form an array of structures, the simplest form of this idea. So NumberOfSections actually tells the PE loader the length of that array.
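These links turn into plain offset arithmetic. Below is a portable C++ sketch, assuming the file is in a byte buffer and e_lfanew has already been read. It uses SizeOfOptionalHeader instead of a hard-coded 224, so it also works for 64-bit (PE32+) files, whose optional header is 240 bytes. LocateSectionTable and SectionTableInfo are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

uint16_t ReadU16(const std::vector<uint8_t>& b, std::size_t off) {
    return static_cast<uint16_t>(b[off] | (b[off + 1] << 8));
}

struct SectionTableInfo {
    uint16_t numberOfSections;      // IMAGE_FILE_HEADER.NumberOfSections
    uint16_t sizeOfOptionalHeader;  // IMAGE_FILE_HEADER.SizeOfOptionalHeader
    uint32_t sectionTableOffset;    // file offset of the first IMAGE_SECTION_HEADER
};

// Follows the chain: e_lfanew -> "PE\0\0" (4 bytes) -> IMAGE_FILE_HEADER (20 bytes)
// -> optional header (SizeOfOptionalHeader bytes) -> section table.
std::optional<SectionTableInfo> LocateSectionTable(const std::vector<uint8_t>& file,
                                                   uint32_t e_lfanew) {
    const std::size_t fileHeader = static_cast<std::size_t>(e_lfanew) + 4;
    if (fileHeader + 20 > file.size()) return std::nullopt;
    SectionTableInfo info{};
    info.numberOfSections = ReadU16(file, fileHeader + 2);
    info.sizeOfOptionalHeader = ReadU16(file, fileHeader + 16);
    info.sectionTableOffset = e_lfanew + 4 + 20 + info.sizeOfOptionalHeader;
    return info;
}
```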

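IMAGE_OPTIONAL_HEADER (224 bytes for a 32-bit PE)
Here are its commonly used fields. After the list I will show you how to remember where they are.
AddressOfEntryPoint: the RVA of the program's entry point, the first instruction executed, familiar to everyone from debugging.
ImageBase: the preferred base address at which the image is loaded, as explained in the basic concepts above.
SectionAlignment: the alignment granularity in memory. By default it equals the system page size, which GetSystemInfo() reports in dwPageSize.
FileAlignment: the alignment granularity of section data in the file.
SizeOfImage: the size of the whole image in memory, a multiple of SectionAlignment.
SizeOfHeaders: the combined size of all headers, including the section table, rounded up to FileAlignment; the first section's raw data usually starts at this file offset. The section table itself starts right after the optional header.
DataDirectory: an array of 16 IMAGE_DATA_DIRECTORY entries (128 bytes). It locates data the operating system needs, such as the import and export tables.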
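Reading these fields only requires their offsets inside the optional header. Below is a portable C++ sketch for 32-bit (PE32) files only. It assumes the file is in a byte buffer and the optional header's file offset (e_lfanew + 24) is known. ReadOptionalFields and OptionalFields are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

uint16_t ReadU16(const std::vector<uint8_t>& b, std::size_t off) {
    return static_cast<uint16_t>(b[off] | (b[off + 1] << 8));
}

uint32_t ReadU32(const std::vector<uint8_t>& b, std::size_t off) {
    return static_cast<uint32_t>(b[off]) | (static_cast<uint32_t>(b[off + 1]) << 8) |
           (static_cast<uint32_t>(b[off + 2]) << 16) | (static_cast<uint32_t>(b[off + 3]) << 24);
}

// The fields listed above, as stored in a 32-bit (PE32) optional header.
struct OptionalFields {
    uint32_t addressOfEntryPoint;  // RVA
    uint32_t imageBase;
    uint32_t sectionAlignment;
    uint32_t fileAlignment;
    uint32_t sizeOfImage;
    uint32_t sizeOfHeaders;
    uint32_t importRva;   // DataDirectory[1].VirtualAddress (import table)
    uint32_t importSize;  // DataDirectory[1].Size
};

std::optional<OptionalFields> ReadOptionalFields(const std::vector<uint8_t>& file,
                                                 std::size_t optOff) {
    // 224 bytes for PE32; Magic 0x10B marks PE32 (0x20B would mean PE32+).
    if (optOff + 224 > file.size() || ReadU16(file, optOff) != 0x10B) return std::nullopt;
    OptionalFields f{};
    f.addressOfEntryPoint = ReadU32(file, optOff + 16);
    f.imageBase           = ReadU32(file, optOff + 28);
    f.sectionAlignment    = ReadU32(file, optOff + 32);
    f.fileAlignment       = ReadU32(file, optOff + 36);
    f.sizeOfImage         = ReadU32(file, optOff + 56);
    f.sizeOfHeaders       = ReadU32(file, optOff + 60);
    f.importRva           = ReadU32(file, optOff + 96 + 8 * 1);      // DataDirectory starts at +96
    f.importSize          = ReadU32(file, optOff + 96 + 8 * 1 + 4);  // each entry: RVA + Size
    return f;
}
```

In PE32+ (64-bit) files ImageBase is 8 bytes wide and starts at offset 24, and DataDirectory starts at offset 112, so those reads would need different offsets.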

So what are these important fields for? When you get confused, go back to the root question: how do we load the file into memory? First we must find the first instruction to run, which is the entry point. Next, where should the image be loaded? That is given by the base address. A program is large, so how do we lay it out in a regular way? We need the smallest unit of layout in memory, the memory alignment granularity (SectionAlignment). Knowing the unit in memory, we also need the unit in the file (FileAlignment) before we can load anything. The purpose of both is address conversion. I also mentioned earlier that every structure is linked to the next one; let's summarize these links.
IMAGE_DOS_HEADER.e_lfanew gives the file offset of IMAGE_NT_HEADERS. IMAGE_FILE_HEADER.NumberOfSections gives the length of the section-table array, and SizeOfOptionalHeader gives the size of the optional header.
IMAGE_OPTIONAL_HEADER specifies the address of the first instruction to run, where the image is loaded, and the alignment units used for loading. All of this is only groundwork; the real work involves IMAGE_SECTION_HEADER. How do we locate it? e_lfanew leads to IMAGE_NT_HEADERS. The section table begins immediately after IMAGE_OPTIONAL_HEADER, the last header inside IMAGE_NT_HEADERS.
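To see why both granularities matter, take a section whose data is 0x1234 bytes long. Assume the common values SectionAlignment = 0x1000 and FileAlignment = 0x200. In memory the section then occupies 0x2000 bytes (two units), and in the file it occupies 0x1400 bytes. Below is a small C++ sketch of that rounding. It assumes power-of-two alignments, as used in practice, and the function names are illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Round value up to the next multiple of alignment. The mask trick below
// only works for power-of-two alignments, so that is checked first.
uint32_t AlignUp(uint32_t value, uint32_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~(alignment - 1);
}

// Number of alignment units a block of the given size occupies.
uint32_t UnitsNeeded(uint32_t size, uint32_t alignment) {
    return AlignUp(size, alignment) / alignment;
}
```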

Now let's memorize these locations. You only need two numbers: 16 and 32.
IMAGE_NT_SIGNATURE is the 4-byte "PE\0\0" that marks the start of the PE header. Find it and you have found the start of IMAGE_NT_HEADERS.

The next 20 bytes are IMAGE_FILE_HEADER. The key offsets below are counted from the end of IMAGE_FILE_HEADER, which is the start of IMAGE_OPTIONAL_HEADER.
Add 16 bytes: AddressOfEntryPoint.
Add 32 bytes: SectionAlignment, directly to the right of ImageBase (at +28), followed by FileAlignment (at +36).
Next, count from the end of FileAlignment, not from the end of IMAGE_FILE_HEADER: 16 bytes further on is SizeOfImage, followed by SizeOfHeaders.

Remember 16 and 32 and you will remember most of it.
DataDirectory is the last 128 bytes of the optional header. In a hex editor, when you reach the .text or CODE section header, step back 128 bytes to find DataDirectory.
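The 16/32 rule can be checked mechanically. This C++ sketch mirrors the field order of the 32-bit IMAGE_OPTIONAL_HEADER with fixed-width types and lets the compiler confirm each offset quoted above. OptionalHeader32 and DataDirectoryEntry are illustrative stand-ins for the <windows.h> definitions. All fields are naturally aligned, so no packing pragma is assumed to be needed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct DataDirectoryEntry { uint32_t VirtualAddress; uint32_t Size; };

// Same field order as a 32-bit (PE32) IMAGE_OPTIONAL_HEADER.
struct OptionalHeader32 {
    uint16_t Magic;
    uint8_t  MajorLinkerVersion, MinorLinkerVersion;
    uint32_t SizeOfCode, SizeOfInitializedData, SizeOfUninitializedData;
    uint32_t AddressOfEntryPoint, BaseOfCode, BaseOfData, ImageBase;
    uint32_t SectionAlignment, FileAlignment;
    uint16_t MajorOperatingSystemVersion, MinorOperatingSystemVersion;
    uint16_t MajorImageVersion, MinorImageVersion;
    uint16_t MajorSubsystemVersion, MinorSubsystemVersion;
    uint32_t Win32VersionValue, SizeOfImage, SizeOfHeaders, CheckSum;
    uint16_t Subsystem, DllCharacteristics;
    uint32_t SizeOfStackReserve, SizeOfStackCommit, SizeOfHeapReserve, SizeOfHeapCommit;
    uint32_t LoaderFlags, NumberOfRvaAndSizes;
    DataDirectoryEntry DataDirectory[16];
};

// Counted from the start of the optional header (= end of IMAGE_FILE_HEADER):
static_assert(offsetof(OptionalHeader32, AddressOfEntryPoint) == 16, "entry point at +16");
static_assert(offsetof(OptionalHeader32, ImageBase) == 28, "ImageBase just before +32");
static_assert(offsetof(OptionalHeader32, SectionAlignment) == 32, "SectionAlignment at +32");
static_assert(offsetof(OptionalHeader32, FileAlignment) == 36, "FileAlignment follows it");
// 16 bytes after the end of FileAlignment (36 + 4 = 40):
static_assert(offsetof(OptionalHeader32, SizeOfImage) == 40 + 16, "SizeOfImage");
static_assert(offsetof(OptionalHeader32, SizeOfHeaders) == 60, "SizeOfHeaders follows it");
// DataDirectory is the last 128 bytes (16 entries x 8 bytes):
static_assert(offsetof(OptionalHeader32, DataDirectory) == 224 - 128, "DataDirectory");
static_assert(sizeof(OptionalHeader32) == 224, "PE32 optional header is 224 bytes");
```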

IMAGE_DATA_DIRECTORY
This is also an array of structures, indexed with macros such as IMAGE_DIRECTORY_ENTRY_IMPORT. Here I will only discuss its relationship with the import table. IMAGE_DATA_DIRECTORY has just two fields. VirtualAddress is the RVA of the data it describes, and Size is the size of that data. The RVA locates the data, and the size tells you how large the array stored there is.
The import table is an array of IMAGE_IMPORT_DESCRIPTOR structures. The VirtualAddress of the import entry in DataDirectory is the RVA of that array. Size divided by sizeof(IMAGE_IMPORT_DESCRIPTOR) gives the array length; the array also ends with an all-zero descriptor.
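Below is a portable C++ sketch of this lookup for PE32 files. It assumes the file is in a byte buffer and the optional header's file offset is known. The constants carry the values of the Windows definitions named in the comments; the function and struct names are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kImportDirectoryIndex = 1;   // IMAGE_DIRECTORY_ENTRY_IMPORT
constexpr std::size_t kImportDescriptorSize = 20;  // sizeof(IMAGE_IMPORT_DESCRIPTOR)
constexpr std::size_t kDataDirectoryOffset = 96;   // within a PE32 optional header

uint32_t ReadU32(const std::vector<uint8_t>& b, std::size_t off) {
    return static_cast<uint32_t>(b[off]) | (static_cast<uint32_t>(b[off + 1]) << 8) |
           (static_cast<uint32_t>(b[off + 2]) << 16) | (static_cast<uint32_t>(b[off + 3]) << 24);
}

struct DirectoryEntry { uint32_t rva; uint32_t size; };

// Reads DataDirectory[index]: each entry is an RVA (4 bytes) followed by a Size (4 bytes).
DirectoryEntry ReadDirectory(const std::vector<uint8_t>& file, std::size_t optOff,
                             std::size_t index) {
    std::size_t off = optOff + kDataDirectoryOffset + 8 * index;
    assert(off + 8 <= file.size());
    return {ReadU32(file, off), ReadU32(file, off + 4)};
}

// Number of descriptor slots covered by Size. The all-zero terminator is
// often counted in Size, so the real DLL count can be one less.
std::size_t ImportDescriptorSlots(const DirectoryEntry& importDir) {
    return importDir.size / kImportDescriptorSize;
}
```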


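The main fields of IMAGE_IMPORT_DESCRIPTOR are:
OriginalFirstThunk: RVA of the INT (import name table, also called the hint-name table)
FirstThunk: RVA of the IAT (import address table)
Name: RVA of the DLL's file name

All of these values are RVAs. OriginalFirstThunk and FirstThunk point to arrays of IMAGE_THUNK_DATA structures. In the file, each entry holds either the RVA of an IMAGE_IMPORT_BY_NAME structure (a hint plus the function name) or an ordinal. When the program is loaded, the loader overwrites the IAT entries with the actual function addresses. In other words, finding an imported function takes three RVA hops: from DataDirectory to the import descriptor, from the descriptor to the thunk array, and from the thunk to the function name.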
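The three hops can be sketched in C++ as follows. To keep it short, the sketch assumes a 32-bit image that is already laid out as in memory (for example, a dump of a loaded module), so every RVA is simply an index into the buffer. For a file on disk, each RVA would first have to be converted to a file offset. ListImports, ReadString and the "dll!function" output format are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

uint32_t ReadU32(const std::vector<uint8_t>& b, std::size_t off) {
    return static_cast<uint32_t>(b[off]) | (static_cast<uint32_t>(b[off + 1]) << 8) |
           (static_cast<uint32_t>(b[off + 2]) << 16) | (static_cast<uint32_t>(b[off + 3]) << 24);
}

// Reads a NUL-terminated string starting at an RVA.
std::string ReadString(const std::vector<uint8_t>& img, uint32_t rva) {
    std::string s;
    for (std::size_t i = rva; i < img.size() && img[i] != 0; ++i) s += static_cast<char>(img[i]);
    return s;
}

std::vector<std::string> ListImports(const std::vector<uint8_t>& img, uint32_t importRva) {
    std::vector<std::string> result;
    // Hop 1: DataDirectory RVA -> array of IMAGE_IMPORT_DESCRIPTOR (20 bytes each).
    for (std::size_t d = importRva; d + 20 <= img.size(); d += 20) {
        uint32_t originalFirstThunk = ReadU32(img, d);
        uint32_t nameRva = ReadU32(img, d + 12);
        uint32_t firstThunk = ReadU32(img, d + 16);
        if (nameRva == 0 && firstThunk == 0) break;  // all-zero terminator descriptor
        std::string dll = ReadString(img, nameRva);
        // Hop 2: descriptor -> IMAGE_THUNK_DATA array (INT, or the IAT if there is no INT).
        uint32_t thunkRva = originalFirstThunk ? originalFirstThunk : firstThunk;
        for (std::size_t t = thunkRva; t + 4 <= img.size(); t += 4) {
            uint32_t thunk = ReadU32(img, t);
            if (thunk == 0) break;
            if (thunk & 0x80000000u) {  // IMAGE_ORDINAL_FLAG32: imported by ordinal
                result.push_back(dll + "!#" + std::to_string(thunk & 0xFFFF));
            } else {
                // Hop 3: thunk -> IMAGE_IMPORT_BY_NAME (2-byte Hint, then the name).
                result.push_back(dll + "!" + ReadString(img, thunk + 2));
            }
        }
    }
    return result;
}
```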

IMAGE_SECTION_HEADER
Its main fields are:
VirtualAddress: the section's RVA in memory (VOffset in LordPE).
PointerToRawData: the section's offset in the file (ROffset in LordPE).
VirtualSize: the section's size in memory. Rounded up to SectionAlignment, it tells you how many memory units the section occupies.
SizeOfRawData: the section's size in the file, a multiple of FileAlignment, so dividing it by FileAlignment gives the number of file units.
These fields are used to calculate the section offset (VirtualAddress - PointerToRawData). Each IMAGE_SECTION_HEADER is 40 bytes. In the PE file the headers form an array whose length is given by IMAGE_FILE_HEADER.NumberOfSections.
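The import table, like any other RVA, can live in any section, so a robust conversion first finds the section that contains the RVA. Below is a portable C++ sketch, assuming the section headers have already been read into memory. SectionHeader is an illustrative mirror of IMAGE_SECTION_HEADER, and its 40-byte size is checked at compile time.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

struct SectionHeader {
    uint8_t  Name[8];
    uint32_t VirtualSize;           // Misc.VirtualSize
    uint32_t VirtualAddress;        // RVA of the section in memory (LordPE: VOffset)
    uint32_t SizeOfRawData;         // size in the file, a multiple of FileAlignment
    uint32_t PointerToRawData;      // file offset of the section (LordPE: ROffset)
    uint32_t PointerToRelocations;
    uint32_t PointerToLinenumbers;
    uint16_t NumberOfRelocations;
    uint16_t NumberOfLinenumbers;
    uint32_t Characteristics;
};
static_assert(sizeof(SectionHeader) == 40, "IMAGE_SECTION_HEADER is 40 bytes");

// Finds the section containing `rva` and applies
// file offset = RVA - (VirtualAddress - PointerToRawData).
std::optional<uint32_t> RvaToFileOffset(const std::vector<SectionHeader>& sections,
                                        uint32_t rva) {
    for (const SectionHeader& s : sections) {
        // Defensive choice: if VirtualSize is 0, use SizeOfRawData as the extent.
        uint32_t size = s.VirtualSize ? s.VirtualSize : s.SizeOfRawData;
        if (rva < s.VirtualAddress || rva - s.VirtualAddress >= size) continue;
        if (rva - s.VirtualAddress >= s.SizeOfRawData) return std::nullopt;  // zero-filled, not in the file
        return rva - s.VirtualAddress + s.PointerToRawData;
    }
    return std::nullopt;  // e.g. an RVA inside the headers or outside every section
}
```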
Open a PE file and try locating these fields yourself. Below is a C++ implementation of what has been described above.

#include <windows.h>
#include <iostream>
using namespace std;

int main(int argc, char* argv[])
{
    // Define variables
    IMAGE_DOS_HEADER dosHeader;
    IMAGE_NT_HEADERS ntHeader;        // PE32 or PE32+ layout, matching the build target
    IMAGE_SECTION_HEADER secHeader;

    HANDLE hFile;
    char fileName[MAX_PATH];
    DWORD dwSize = 0;
    DWORD offsetSection = 0, numSection = 0;

    // Take the built-in cmd program as an example.
    // (A 32-bit build on 64-bit Windows is redirected to SysWOW64\cmd.exe,
    // so the file's layout matches this build's IMAGE_NT_HEADERS.)
    GetSystemDirectoryA(fileName, MAX_PATH);
    lstrcatA(fileName, "\\cmd.exe");
    hFile = CreateFileA(fileName, GENERIC_READ, FILE_SHARE_READ, NULL,
                        OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (hFile == INVALID_HANDLE_VALUE)
    {
        cout << "INVALID_HANDLE_VALUE" << endl;
        return 0;
    }

    // Read IMAGE_DOS_HEADER from the start of the file
    SetFilePointer(hFile, 0, NULL, FILE_BEGIN);
    ReadFile(hFile, &dosHeader, sizeof(dosHeader), &dwSize, NULL);

    if (dwSize != sizeof(dosHeader) || dosHeader.e_magic != IMAGE_DOS_SIGNATURE)  // "MZ"
    {
        cout << "No DOS header" << endl;
        CloseHandle(hFile);
        return 0;
    }
    cout << "DOS header" << endl;

    // e_lfanew is the file offset of IMAGE_NT_HEADERS
    SetFilePointer(hFile, dosHeader.e_lfanew, NULL, FILE_BEGIN);
    ReadFile(hFile, &ntHeader, sizeof(ntHeader), &dwSize, NULL);

    // "PE\0\0", and an optional header whose layout matches this build
    if (dwSize != sizeof(ntHeader) || ntHeader.Signature != IMAGE_NT_SIGNATURE
        || ntHeader.OptionalHeader.Magic != IMAGE_NT_OPTIONAL_HDR_MAGIC)
    {
        cout << "No PE header" << endl;
        CloseHandle(hFile);
        return 0;
    }

    DWORD importRva = ntHeader.OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_IMPORT].VirtualAddress;

    cout << hex;  // print all numbers in hexadecimal
    cout << "PE valid" << endl;
    cout << "####### IMAGE_FILE_HEADER ########" << endl;
    cout << "Machine: " << ntHeader.FileHeader.Machine << endl;
    cout << "NumberOfSections: " << ntHeader.FileHeader.NumberOfSections << endl;
    cout << "SizeOfOptionalHeader: " << ntHeader.FileHeader.SizeOfOptionalHeader << endl;
    cout << endl;
    cout << "####### IMAGE_OPTIONAL_HEADER ########" << endl;
    cout << "AddressOfEntryPoint: " << ntHeader.OptionalHeader.AddressOfEntryPoint << endl;
    cout << "ImageBase: " << ntHeader.OptionalHeader.ImageBase << endl;
    cout << "SectionAlignment: " << ntHeader.OptionalHeader.SectionAlignment << endl;
    cout << "FileAlignment: " << ntHeader.OptionalHeader.FileAlignment << endl;
    cout << "SizeOfImage: " << ntHeader.OptionalHeader.SizeOfImage << endl;
    cout << "SizeOfHeaders: " << ntHeader.OptionalHeader.SizeOfHeaders << endl;
    cout << endl;
    cout << "####### RVA of the IMAGE_IMPORT_DESCRIPTOR array ########" << endl;
    cout << "IMAGE_IMPORT_DESCRIPTOR RVA: " << importRva << endl;

    // Section table = e_lfanew + sizeof(Signature) + sizeof(IMAGE_FILE_HEADER) + SizeOfOptionalHeader
    numSection = ntHeader.FileHeader.NumberOfSections;
    offsetSection = static_cast<DWORD>(dosHeader.e_lfanew + sizeof(DWORD) + sizeof(IMAGE_FILE_HEADER)
                                       + ntHeader.FileHeader.SizeOfOptionalHeader);
    for (DWORD i = 0; i < numSection; i++)
    {
        SetFilePointer(hFile, static_cast<LONG>(offsetSection + sizeof(IMAGE_SECTION_HEADER) * i),
                       NULL, FILE_BEGIN);
        ReadFile(hFile, &secHeader, sizeof(IMAGE_SECTION_HEADER), &dwSize, NULL);

        cout << endl;
        // Output the section name (8 bytes, not always NUL-terminated)
        for (int j = 0; j < IMAGE_SIZEOF_SHORT_NAME && secHeader.Name[j] != 0; j++)
        {
            cout << secHeader.Name[j];
        }
        cout << endl;
        // Output the information of each section
        cout << "PointerToRawData: " << secHeader.PointerToRawData << endl;
        cout << "VirtualAddress: " << secHeader.VirtualAddress << endl;

        // For the section that holds the import table, convert its RVA to a file offset:
        // file offset = RVA - section offset, section offset = VirtualAddress - PointerToRawData
        DWORD secSize = secHeader.Misc.VirtualSize ? secHeader.Misc.VirtualSize : secHeader.SizeOfRawData;
        if (importRva != 0 && importRva >= secHeader.VirtualAddress
            && importRva < secHeader.VirtualAddress + secSize)
        {
            cout << "Section offset: " << secHeader.VirtualAddress - secHeader.PointerToRawData << endl;
            cout << "IMAGE_IMPORT_DESCRIPTOR file offset: "
                 << importRva - (secHeader.VirtualAddress - secHeader.PointerToRawData) << endl;
        }
    }

    CloseHandle(hFile);
    return 0;
}

 

III. Summary of Ideas
The code should be easy to follow. I also recommend downloading the PE structure diagram from Kanxue and studying it carefully. With this material you should basically be able to modify deformed PE files.
Let's sum up the ideas we have used.
1: The idea of sets
Group different contents into a set, then map each element to its corresponding content, much as a function does.
2: Arrays of structures
A conclusion drawn from the yin-yang (Taiji) diagram discussed earlier.
3: Going back to the root
When something is hard to understand, return to the root question it answers.
4: Whatever exists has a reason to exist
I think this idea is very important. At the start of this article I explained why things are the way they are, for example why addresses are virtual. It is a simple phenomenon that hides the whole history of how it developed.
5: Understanding the process
Understand a process as a whole first, then its shortcomings and specific details; this is a holistic way of thinking.
