Learning to write a compression shell, part 2: master the PE structure and the rest follows naturally

Source: Internet
Author: User

Author: little heart
Previous: http://www.bkjia.com/Article/201202/118214.html
 
A PE file contains many structures, but more than half of them exist simply to tell the loader how to load the file. Normal PE files fill in these structures strictly, but a small number of deformed (malformed) PE files do not follow the rules so closely, so some of the information in the PE structure ends up serving mainly as a consistency check. When studying the overall PE layout, we should not get stuck on the details of every data structure but should grasp the big picture. This article tries to explain, through analogies and examples, how the loader consumes the important data in the PE structure. There are many excellent articles about PE and the author's level is limited, so if there are any errors, please correct me.
 
Next, we will introduce the important structures one by one:
1. Headers
1.1 DOS Header
1.2 PE File Header
1.3 Block Header
2. Import table
3. Export table
4. Relocation table
5. Resources
6. Additional data
 
1. Headers

There are three kinds of headers. The first is the DOS header, kept for compatibility with the earlier platform. The second is the PE file header, which holds the configuration the loader needs to load the PE itself. The third is the block header (the section table), which describes each block (section) of the PE file. Taken together, the headers answer the loader's question: "What are the attributes of this PE, and how is the whole PE laid out in memory?"
 
1.1 DOS Header
To stay compatible with the earlier DOS platform, Microsoft kept an MS-DOS header at the start of the PE format. Its first field, at file offset 0x00, is the "MZ" signature. This is one of the conditions a shell program checks when deciding whether a file is a PE file. The other condition involves the field at offset 0x3C (e_lfanew), which stores the file offset of the PE file header. That is where the loader starts loading the PE in the true sense. The "PE" signature is stored at offset 0x00 of the PE header.

From these two pieces of information we can determine whether the file to be shelled is a PE file.
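
The two checks above can be sketched in a few lines of Python. This is only a minimal illustration: the function name is_pe and the hand-built sample buffer are assumptions made for the example, and a real shell would perform more validation.

```python
import struct

def is_pe(data: bytes) -> bool:
    """Return True if data has an MZ signature and e_lfanew points at 'PE\\0\\0'."""
    # The DOS header is 0x40 bytes long and starts with the "MZ" signature.
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    # e_lfanew: 4-byte little-endian file offset of the PE header, stored at 0x3C.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    # An out-of-range offset yields a short slice, which simply fails the comparison.
    return data[e_lfanew:e_lfanew + 4] == b"PE\x00\x00"

# Hypothetical minimal buffer: a DOS header whose e_lfanew points to 0x80.
sample = bytearray(0x84)
sample[0:2] = b"MZ"
struct.pack_into("<I", sample, 0x3C, 0x80)
sample[0x80:0x84] = b"PE\x00\x00"

print(is_pe(bytes(sample)))
```

To check a real file, read its bytes with open(path, "rb").read() and pass them to is_pe.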
 
1.2 PE File Header
Immediately after the "PE" signature comes FILE_HEADER (IMAGE_FILE_HEADER). The offsets below are measured from the "PE" signature, which this article calls the PE header. NumberOfSections, at offset 0x06, gives the number of blocks (sections) in the file. SizeOfOptionalHeader, at offset 0x14, gives the size of the optional header. Since the block table immediately follows the optional header, this value can be used to locate the block table.

OptionalHeader starts at offset 0x18 of the PE header. It is one of the more important structures in PE. Although it is called "optional", many of its fields are required.
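
As a rough sketch of the fixed-offset reads described here, the following Python reads the IMAGE_FILE_HEADER that follows the 4-byte "PE" signature and derives where the optional header and the block table start. The function name, the returned dictionary keys and the sample values are assumptions for illustration. Measured from the signature, NumberOfSections sits at 0x06 and SizeOfOptionalHeader at 0x14.

```python
import struct

def read_file_header(data: bytes, pe_offset: int) -> dict:
    """Read IMAGE_FILE_HEADER, which follows the 4-byte PE signature."""
    # Layout: Machine, NumberOfSections, TimeDateStamp, PointerToSymbolTable,
    # NumberOfSymbols, SizeOfOptionalHeader, Characteristics (20 bytes total).
    (_machine, number_of_sections, _timestamp, _symbol_table, _symbol_count,
     size_of_optional_header, _characteristics) = struct.unpack_from(
        "<HHIIIHH", data, pe_offset + 4)
    optional_header = pe_offset + 0x18  # signature (4) + file header (20)
    return {
        "number_of_sections": number_of_sections,
        "size_of_optional_header": size_of_optional_header,
        "optional_header_offset": optional_header,
        # The block (section) table starts right after the optional header.
        "section_table_offset": optional_header + size_of_optional_header,
    }

# Hypothetical header: signature at 0x80, 3 blocks, 0xE0-byte optional header.
buf = bytearray(0x80 + 4 + 20)
buf[0x80:0x84] = b"PE\x00\x00"
struct.pack_into("<HHIIIHH", buf, 0x84, 0x014C, 3, 0, 0, 0, 0xE0, 0x0102)

info = read_file_header(bytes(buf), 0x80)
print(info)
```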
 
The following fields are important:

AddressOfEntryPoint
Specifies the RVA where execution starts after the file is loaded into memory, i.e. the entry point. For a shelled file, the original value is usually called the OEP (original entry point). For example, when shelling a file we save the original entry point and then change this field to point to the shell code, so that the shelled PE first decompresses itself and then jumps to the original entry point to continue running.
 
ImageBase
Specifies the preferred load address of the file. The loader first tries to load the file at the address given in the ImageBase field. If loading at that address fails, the file is loaded at another address.
An EXE can usually be loaded at exactly the address it specifies. Because Microsoft gives each process its own 4 GB address space and the EXE is loaded into it with priority, the preferred address is normally free, so loading at ImageBase succeeds.
For DLL files, the preferred address may already be occupied by another DLL, because one EXE can load many different DLLs into its address space. DLLs therefore usually carry relocation information for this second placement: if the address is occupied, the loader picks another base address and uses the relocation table to fix up the addresses in the code.
 
SectionAlignment and FileAlignment
These are the alignment granularities of blocks in memory and in the file, respectively. For more information, see section 1.
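
Alignment granularity means that sizes and offsets are rounded up to the next multiple of the alignment value. A minimal sketch of that rounding follows. The helper name align_up and the sample values (0x200 for FileAlignment, 0x1000 for SectionAlignment) are assumptions chosen for the example, not values taken from a particular file.

```python
def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment

# A block with 0x1234 bytes of content, under two hypothetical granularities.
print(hex(align_up(0x1234, 0x200)))   # size on disk with FileAlignment 0x200
print(hex(align_up(0x1234, 0x1000)))  # size in memory with SectionAlignment 0x1000
```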
 
DataDirectory
A data directory made up of 16 identical IMAGE_DATA_DIRECTORY structures. A PE file contains many tables, such as the import table, the export table and the relocation table, so a structure is needed to record where these tables are and tell the loader where to find them and how large they are. Each IMAGE_DATA_DIRECTORY structure gives the location (RVA) and length of one data table.
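
Each IMAGE_DATA_DIRECTORY entry is just two DWORDs, an RVA and a size, so the whole directory can be read in a short loop. The sketch below assumes the caller already knows the file offset of the array (in a 32-bit PE it begins 0x60 bytes into the optional header). The function name and the sample values are illustrative only.

```python
import struct

def read_data_directories(data: bytes, offset: int, count: int = 16) -> list:
    """Return a list of (rva, size) tuples, one per IMAGE_DATA_DIRECTORY entry."""
    entries = []
    for i in range(count):
        # Each entry is 8 bytes: VirtualAddress (RVA) followed by Size.
        rva, size = struct.unpack_from("<II", data, offset + i * 8)
        entries.append((rva, size))
    return entries

# Hypothetical directory array: index 1 is the import table, index 5 the relocation table.
directory = bytearray(16 * 8)
struct.pack_into("<II", directory, 1 * 8, 0x2000, 0x50)
struct.pack_into("<II", directory, 5 * 8, 0x5000, 0x100)

dirs = read_data_directories(bytes(directory), 0)
print(dirs[1], dirs[5])
```

An entry whose RVA and size are both 0 means the file does not contain that table.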
 
1.3 Block Header
The block table (section table) follows IMAGE_NT_HEADERS. Before reading it, the loader knows only the number of blocks, from the NumberOfSections field described earlier. It does not yet know each block's size, offset or name, so each block has a header entry that supplies this information. With it, the loader can map the blocks into memory according to its loading rules.
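
A minimal sketch of reading the block table: every block header is 40 bytes, and the first 24 bytes hold the name, the size and address in memory, and the size and offset in the file. The Section tuple, the function name and the two sample blocks are assumptions made for this example.

```python
import struct
from typing import NamedTuple

class Section(NamedTuple):
    name: str
    virtual_size: int
    virtual_address: int
    raw_size: int
    raw_offset: int

SECTION_HEADER_SIZE = 40

def read_sections(data: bytes, table_offset: int, count: int) -> list:
    """Read `count` block headers starting at table_offset."""
    sections = []
    for i in range(count):
        # Name[8], VirtualSize, VirtualAddress, SizeOfRawData, PointerToRawData
        name, vsize, vaddr, rsize, roff = struct.unpack_from(
            "<8sIIII", data, table_offset + i * SECTION_HEADER_SIZE)
        text = name.rstrip(b"\x00").decode("ascii", "replace")
        sections.append(Section(text, vsize, vaddr, rsize, roff))
    return sections

# Hypothetical table with two blocks.
table = bytearray(2 * SECTION_HEADER_SIZE)
struct.pack_into("<8sIIII", table, 0, b".text", 0x1234, 0x1000, 0x1400, 0x400)
struct.pack_into("<8sIIII", table, 40, b".data", 0x200, 0x3000, 0x200, 0x1800)

sections = read_sections(bytes(table), 0, 2)
for s in sections:
    print(s)
```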
 
Note: the PE being loaded may be a deformed file. When the values given in these structures are inconsistent with the loader's rules, the loader falls back to its own alignment granularity when loading.
 
First, locate the start of each header (using the method described earlier for checking whether the file is a PE file). Because the field offsets within each header are fixed, reading a value only requires moving a fixed offset from the header's start.
 
At this point the important structures of the headers have been introduced. We have seen that they are all read by addressing at fixed offsets, and that they tell the loader how to load the PE file. Seen from a macro perspective, they give the rough outline of a PE file. As described earlier, the loader simply treats them as data: it reads them and then checks them against its own loading rules.
 
 
The following describes several important parts of PE:
2. Import table
An EXE uses functions provided by external DLLs, and during loading the loader has to resolve the address of each of these functions. When the EXE is built, the linker emits every call to an imported function in the same form, call [xxxxxxxx]. The slot at address xxxxxxxx is only reserved in the PE file. It is at load time that the loader writes the real function address into it.
 
For example:


Code:
00401000 CALL [00404000]
Before the file is loaded, the slot at 00404000 is only a placeholder (usually filled with the same value as the corresponding OriginalFirstThunk entry). Once the loader has loaded the file, it holds the address of the function to be called.
 
Each DLL corresponds to one IMAGE_IMPORT_DESCRIPTOR. Its important members are as follows:

Code:
Dword OriginalFirstThunk // RVA of the import name table, an array of IMAGE_THUNK_DATA -> each points to IMAGE_IMPORT_BY_NAME
Dword TimeDateStamp
Dword ForwarderChain
Dword Name1 // RVA of the DLL name
Dword FirstThunk // RVA of the import address table, an array of IMAGE_THUNK_DATA -> each points to IMAGE_IMPORT_BY_NAME before loading
There are three structures in total. IMAGE_THUNK_DATA is really a DWORD-sized union. When its highest bit is 1, the lower 31 bits hold the function's ordinal (serial number). When its highest bit is 0, the value is an RVA pointing to IMAGE_IMPORT_BY_NAME. IMAGE_IMPORT_BY_NAME describes one function: it has a Hint value (which can be understood as a sequence number) followed by a variable-length byte array holding the name of the imported function. It is like going to school: besides our own name, we are also assigned a student ID.
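
The high-bit test on a 32-bit IMAGE_THUNK_DATA value can be sketched as follows. The function name and the returned tuples are assumptions for illustration. The mask follows the description above (lower 31 bits), and in well-formed files only the low 16 bits of an ordinal are actually used.

```python
ORDINAL_FLAG32 = 0x80000000

def decode_thunk32(value: int) -> tuple:
    """Classify a 32-bit IMAGE_THUNK_DATA value as an ordinal or a name RVA."""
    if value & ORDINAL_FLAG32:
        # Highest bit set: import by ordinal, stored in the lower bits.
        return ("ordinal", value & 0x7FFFFFFF)
    # Highest bit clear: RVA of an IMAGE_IMPORT_BY_NAME structure.
    return ("name_rva", value)

print(decode_thunk32(0x80000010))
print(decode_thunk32(0x00002050))
```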


 
As shown in the following figure, before the loader loads the PE, the IMAGE_THUNK_DATA entries reached through OriginalFirstThunk and through FirstThunk point to the same structure, IMAGE_IMPORT_BY_NAME. The loader follows these RVAs to locate IMAGE_IMPORT_BY_NAME and so learns the name of the imported function. Everything is that simple :)


 



 
After the loader loads the PE, the entries reached through OriginalFirstThunk still point to IMAGE_IMPORT_BY_NAME, but the entries reached through FirstThunk now hold the real addresses of the imported functions in the system.
 
An EXE usually imports from many DLLs. With so many descriptors to read, how does the loader tell where one array of structures ends? In C, we know that the end of a string is marked by '\0'. In the same way, the end of an array of structures can be marked by an entry of the same size as the structure whose bytes are all 0 :).
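
The zero-terminator idea can be sketched as a loop that reads 20-byte IMAGE_IMPORT_DESCRIPTOR entries until it meets one whose bytes are all 0. The function name, the dictionary keys and the sample RVAs below are hypothetical, and the input is assumed to be the raw bytes of the descriptor array.

```python
import struct

DESCRIPTOR_SIZE = 20  # five DWORDs

def read_import_descriptors(data: bytes, offset: int = 0) -> list:
    """Collect descriptors until the all-zero terminator (or the end of data)."""
    descriptors = []
    while True:
        raw = data[offset:offset + DESCRIPTOR_SIZE]
        # Stop at the all-zero entry, or if the data runs out early.
        if len(raw) < DESCRIPTOR_SIZE or raw == bytes(DESCRIPTOR_SIZE):
            break
        oft, _stamp, _chain, name_rva, ft = struct.unpack("<5I", raw)
        descriptors.append({
            "original_first_thunk": oft,
            "name_rva": name_rva,
            "first_thunk": ft,
        })
        offset += DESCRIPTOR_SIZE
    return descriptors

# Hypothetical array: two DLL descriptors followed by the zero terminator.
array = (struct.pack("<5I", 0x2050, 0, 0, 0x2100, 0x2000)
         + struct.pack("<5I", 0x2060, 0, 0, 0x2110, 0x2010)
         + bytes(DESCRIPTOR_SIZE))

found = read_import_descriptors(array)
print(len(found), [hex(d["name_rva"]) for d in found])
```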


 
 
We can understand the import table this way. Suppose students are being organized for an exam (the PE file). When seats are assigned, we use each student's ID or name to hand out a seat number (finding IMAGE_IMPORT_BY_NAME through OriginalFirstThunk and FirstThunk). Once the students are seated, we distribute the papers and let them take the exam (when loading the PE file, the loader modifies the data pointed to by FirstThunk). The seat number is only a means to an end, and the real goal is to let the students take the exam (loading the PE file).
 
3. Export table
An important job of a DLL is to encapsulate functions, and the task of telling the loader about those functions falls to the export table. The export table has to tell the loader which functions the DLL provides. It does this in two ways: by export ordinal (serial number) and by function name. It is like a teacher calling the roll: either a student ID or a name can be used.
 
The important members of the export table are as follows:


Code:
Name // RVA of the DLL name
Base // starting ordinal; ordinal = index into AddressOfFunctions + Base
NumberOfFunctions // total number of functions exported by the DLL
NumberOfNames // number of functions exported by name
AddressOfFunctions // RVA of the array holding the RVAs of all exported functions
AddressOfNames // RVA of the array holding the names of the functions exported by name
AddressOfNameOrdinals // RVA of the array of indexes that links names to AddressOfFunctions
We can understand how the export table works this way. Suppose a teacher wants to call the roll in a class (corresponding to the DLL, identified by Name). The first thing to know is the total number of students in the class (corresponding to NumberOfFunctions). The teacher also needs the student ID of the first student (corresponding to Base). Then every student can be called (corresponding to AddressOfFunctions) simply by incrementing the student ID. Some students drop out and their names are struck off, but to keep student IDs unique and consistent their IDs are retained and the IDs of the others do not change. Some students are diligent (corresponding to NumberOfNames): they answer questions in class and leave a deep impression, so the teacher knows them and can call them by name. The students the teacher does not know can only be called by student ID (their count is NumberOfFunctions - NumberOfNames). When calling the roll, the teacher must make sure the student ID stays in range, i.e. that every ID belongs to this class. The calling order may differ from the ID order (the entries of AddressOfNameOrdinals need not follow the order of AddressOfFunctions). To be clear: every export function that exists has at least an ordinal, while a name is optional.
 
 
An exported function is therefore described by: ordinal (its index in AddressOfFunctions + Base) + name (if its index appears in AddressOfNameOrdinals, the name at the same position in AddressOfNames) + address (its entry in AddressOfFunctions).
 
Note the following two points about export functions:

1. AddressOfNameOrdinals is the link between AddressOfFunctions and AddressOfNames. The i-th entry of AddressOfNames pairs with the i-th entry of AddressOfNameOrdinals, whose value is an index into AddressOfFunctions, and that value + Base gives the export ordinal of the function. AddressOfNameOrdinals is thus the basis for getting from AddressOfNames to AddressOfFunctions. Its values are the export table's internal index numbers, and the ordinal is derived from them.
2. If a function is exported by ordinal only, its ordinal is still its index in AddressOfFunctions + Base, but that index does not appear anywhere in AddressOfNameOrdinals. In addition, if the AddressOfFunctions entry for an ordinal is 0, no export function exists for that ordinal.
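
The relationship between the three arrays can be sketched with plain Python lists standing in for the already-decoded tables. The function name list_exports and the sample data are assumptions for illustration. A real parser would first convert the RVAs to file offsets and read the arrays from the file.

```python
def list_exports(base: int, functions: list, names: list, name_ordinals: list) -> list:
    """Return (ordinal, name or None, rva) for every existing export.

    functions     -- AddressOfFunctions entries (RVAs, 0 means "no function")
    names         -- decoded strings from AddressOfNames
    name_ordinals -- AddressOfNameOrdinals entries (indexes into functions)
    """
    # names[i] belongs to the function at index name_ordinals[i].
    name_by_index = {index: name for name, index in zip(names, name_ordinals)}
    exports = []
    for index, rva in enumerate(functions):
        if rva == 0:
            continue  # this ordinal is reserved but exports nothing
        exports.append((index + base, name_by_index.get(index), rva))
    return exports

# Hypothetical DLL: four slots, one of them empty, two exported by name.
exports = list_exports(
    base=1,
    functions=[0x1000, 0, 0x1020, 0x1030],
    names=["Alpha", "Beta"],
    name_ordinals=[2, 0],
)
for entry in exports:
    print(entry)
```

In this sample the function at index 3 has ordinal 4 and no name, which is the "exported by ordinal only" case from point 2.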
 
 
4. Relocation table
When a DLL is about to be loaded, the address it expects to be loaded at may already be occupied. In that case the relocation table gives the loader a series of data, and the loader uses it to fix up the code for the actual load address. This is the job of the relocation table.
 
For example, the DLL contains the following instruction:

0010720B A1 00104000 mov eax, dword ptr ds:[0x00401000]

It tries to copy the data at 0x00401000 into eax. But when the DLL was actually loaded, the loader placed it 0x100000 higher than expected, so the data now sits at 0x00501000. What should it do? The loader modifies the code according to the information provided by the relocation table, as follows:

0010720B A1 00105000 mov eax, dword ptr ds:[0x00501000]
 
Clearly, the expected load address is already recorded in the PE optional header (ImageBase), and the actual load address is known to the loader at load time. The only thing that still has to be told to the loader is where in the code the addresses that need relocating are.
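
The fix-up itself is simple arithmetic: add the difference between the actual and the expected load address to the 32-bit value stored in the instruction. The sketch below replays the example instruction. The helper name apply_highlow is an assumption for illustration, and it assumes the little-endian DWORD starts one byte after the A1 opcode.

```python
import struct

def apply_highlow(code: bytearray, offset: int, delta: int) -> None:
    """Add delta to the little-endian 32-bit address stored at code[offset]."""
    (value,) = struct.unpack_from("<I", code, offset)
    struct.pack_into("<I", code, offset, (value + delta) & 0xFFFFFFFF)

# mov eax, dword ptr ds:[0x00401000]  ->  bytes A1 00 10 40 00
code = bytearray(b"\xA1\x00\x10\x40\x00")
delta = 0x00501000 - 0x00401000  # actual address minus expected address

apply_highlow(code, 1, delta)
print(code.hex())
```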
 
The relocation table is one of the easier parts of the PE format to understand. Its structure tells the loader where to find the table and then supplies a series of data (in effect, the addresses of the code locations to relocate), which the loader corrects in a uniform way for the new load address. It is like going to the library to return a book (loading a DLL). You find a queue at the returns desk on the second floor (the load address is occupied and relocation is needed). You check the book's registration number to find other places where it can be returned (look up the locations to relocate and choose another load address) and find that the fourth floor takes returns with no queue, so you go to the fourth floor and return the book there (load with the updated code).
 
The structure of the relocation table is relatively simple, as shown below:


Code:
Dword VirtualAddress // starting RVA of this relocation block
Dword SizeOfBlock // size of the current relocation block
Word TypeOffset // array. The high 4 bits of each entry give the relocation type (on x86 this is normally IMAGE_REL_BASED_HIGHLOW). The low 12 bits give the offset of the relocation. Offset + VirtualAddress locates the address to be modified.
The following figure shows the overall structure:
 
 
A relocation block may end with an entry of type IMAGE_REL_BASED_ABSOLUTE. This entry does nothing and is only padding, so that the next relocation block starts on a four-byte boundary. The whole sequence of relocation blocks ends with a block whose VirtualAddress is 0.
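
A minimal sketch of decoding one relocation block: read VirtualAddress and SizeOfBlock, then split each 16-bit TypeOffset entry into its 4-bit type and 12-bit offset. The function name and the sample block are assumptions for illustration, and only the HIGHLOW type is handled.

```python
import struct

IMAGE_REL_BASED_ABSOLUTE = 0  # padding entry, nothing to fix up
IMAGE_REL_BASED_HIGHLOW = 3   # 32-bit address to adjust

def read_reloc_block(data: bytes, offset: int = 0) -> tuple:
    """Return (list of RVAs to patch, offset of the next block)."""
    page_rva, block_size = struct.unpack_from("<II", data, offset)
    count = (block_size - 8) // 2  # the 8-byte header is followed by WORD entries
    entries = data[offset + 8:offset + 8 + count * 2]
    targets = []
    for (entry,) in struct.iter_unpack("<H", entries):
        rel_type, rel_offset = entry >> 12, entry & 0x0FFF
        if rel_type == IMAGE_REL_BASED_HIGHLOW:
            targets.append(page_rva + rel_offset)
        # IMAGE_REL_BASED_ABSOLUTE entries are skipped as padding.
    return targets, offset + block_size

# Hypothetical block for page 0x1000: three HIGHLOW entries plus one padding entry.
block = struct.pack("<II4H", 0x1000, 16, 0x3004, 0x3010, 0x320C, 0x0000)

targets, next_offset = read_reloc_block(block)
print([hex(t) for t in targets], next_offset)
```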
 
 
5. Resource segment
Resources are arguably the most complex structure in PE, and the types of data they hold are also varied. The various interface elements in Windows, as well as sounds, images and even a stub of code, can all be saved as resources.
Resources are divided into 16 types. Each type contains differently named items, and each item holds its own data, so the resource table arranges its structure according to these attributes. For example, a common resource tree:


 
A PE resource table generally has at most three layers. The first layer indicates the resource type, the second layer indicates each resource item of that type, and the third layer represents the attributes of each resource item. Three important data structures implement these functions:
1. IMAGE_RESOURCE_DIRECTORY
 
You only need to care about its last two members: NumberOfNamedEntries and NumberOfIdEntries. The former gives the number of entries identified by name and the latter the number of entries identified by ID. Their sum is the number of directory entries in this resource directory.
 
2. IMAGE_RESOURCE_DIRECTORY_ENTRY
 
This structure immediately follows IMAGE_RESOURCE_DIRECTORY and has two members: Name and OffsetToData. The former holds the directory entry's name or ID, and the latter gives the offset of either the resource data entry or a subdirectory. Their meaning depends on the level of the directory.
 
Name
1. In the first-level directory it indicates the type of the resource.
2. In the second-level directory it indicates the resource ID or name.
3. In the third-level directory it identifies the individual resource item. When the highest bit is 0, the value is an ID. When the highest bit is 1, the lower bits are a pointer to a structure that holds a Unicode-encoded string.
 
Offset
This is the second member, OffsetToData. Its own highest bit determines its meaning.
1. If the highest bit is 1, the lower bits point to the next-level IMAGE_RESOURCE_DIRECTORY.
2. If the highest bit is 0, it points to an IMAGE_RESOURCE_DATA_ENTRY structure.

It is worth noting that when Name and OffsetToData are used as pointers, they are measured from the beginning of the resource segment, not from the file header.
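
The two high-bit tests on an IMAGE_RESOURCE_DIRECTORY_ENTRY can be sketched as follows. The function name, the dictionary keys and the sample entries are assumptions for illustration. The returned offsets are relative to the start of the resource segment, as noted above.

```python
import struct

HIGH_BIT = 0x80000000

def decode_resource_entry(data: bytes, offset: int = 0) -> dict:
    """Split the Name and OffsetToData members of one directory entry."""
    name, offset_to_data = struct.unpack_from("<II", data, offset)
    return {
        # High bit of Name: 1 = offset of a Unicode name, 0 = numeric ID.
        "named": bool(name & HIGH_BIT),
        "name_or_id": name & 0x7FFFFFFF,
        # High bit of OffsetToData: 1 = subdirectory, 0 = data entry.
        "is_directory": bool(offset_to_data & HIGH_BIT),
        "target_offset": offset_to_data & 0x7FFFFFFF,
    }

# Hypothetical entries: ID 3 leading to a subdirectory, and a named entry leading to data.
by_id = decode_resource_entry(struct.pack("<II", 3, 0x80000020))
by_name = decode_resource_entry(struct.pack("<II", 0x80000100, 0x48))
print(by_id)
print(by_name)
```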
 
3. IMAGE_RESOURCE_DATA_ENTRY
This is the most critical structure for locating resource data. No matter how many layers of directories precede it, the loader traverses them all only in order to reach this structure and find the address of the resource. The structure has four members:

Code:
OffsetToData // RVA of the resource data
Size // length of the resource data
CodePage // code page, usually set to 0
Reserved // reserved field
Here, OffsetToData and Size tell the loader where the resource data is and how much to load.
 
 

 
Once the loader has located the beginning of the resource segment, the rest of the structure only has to be followed step by step in the predefined format. For example, at 0xE17h the byte is 0x80h, which tells the loader that there is an IMAGE_RESOURCE_DIRECTORY on the second layer. The loader continues traversing, and 0xE2Fh likewise tells it that the third layer also has an IMAGE_RESOURCE_DIRECTORY. The loader keeps going until, at 0xE48h, it reads the IMAGE_RESOURCE_DATA_ENTRY structure, from which the real resource location can be found and the resource loaded.

 
6. Additional data
Additional data, as the name implies, is data attached to the file. Why this name? The loader determines the size of a PE file (that is, how much of the file it will load) from the largest file offset among the blocks plus the size on disk of that last block. Data beyond this size is not loaded by the loader, that is, it is not mapped into memory. It is just like bonus points in an exam: they exist, but they are not counted in the total score :). Additional data can hold many things, such as special data the program reads or verifies itself with, and run-time configuration. Some viruses place the original infected program in the additional data.

Additional data is easy to handle. You only need to locate the end of the last section, and the size of the additional data equals the file size minus the offset of the end of that section.
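
The calculation just described can be sketched as follows. The function name overlay_range and the sample numbers are assumptions for illustration, and the sections are given as (file offset, size on disk) pairs that a parser has already read from the block table.

```python
def overlay_range(file_size: int, sections: list) -> tuple:
    """Return (offset where additional data starts, size of the additional data)."""
    # End of the last block on disk = largest (file offset + size on disk).
    end = max((offset + size for offset, size in sections if size), default=0)
    return end, max(file_size - end, 0)

# Hypothetical file of 0x1200 bytes with two blocks.
start, size = overlay_range(0x1200, [(0x400, 0x600), (0xA00, 0x600)])
print(hex(start), hex(size))
```

Blocks with a size of 0 on disk are skipped, since they occupy no bytes in the file.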
For example:


 
We can see that the Offset + size of the last section is 0x1000h. That is to say, the offset at the end of the last section should be 0x9ffh. The data after it is additional data and will not be loaded by the loader.
 

 
Mastering the PE structure is the basis for working with PE files. The most commonly used structures of PE files have been described above. I hope this article helps beginners so that we can make progress together. My level is limited, so corrections are welcome.
