PE Study Notes (2)

Source: Internet
Author: User


V. Section Table

The section table is an array of structures that immediately follows the PE header. The number of elements in the array is given by the NumberOfSections field of the file header (IMAGE_FILE_HEADER). Each element is an IMAGE_SECTION_HEADER structure (40 bytes), defined as follows:

typedef struct _IMAGE_SECTION_HEADER {
    BYTE  Name[IMAGE_SIZEOF_SHORT_NAME];
    union {
        DWORD PhysicalAddress;
        DWORD VirtualSize;
    } Misc;
    DWORD VirtualAddress;
    DWORD SizeOfRawData;
    DWORD PointerToRawData;
    DWORD PointerToRelocations;
    DWORD PointerToLinenumbers;
    WORD  NumberOfRelocations;
    WORD  NumberOfLinenumbers;
    DWORD Characteristics;
} IMAGE_SECTION_HEADER, *PIMAGE_SECTION_HEADER;


1. Name (IMAGE_SIZEOF_SHORT_NAME bytes): the section name, at most 8 bytes long. The name is only a label: you can choose any name or even leave it empty. It is not an ASCIIZ string, so it is not guaranteed to end with a null; a name that uses all 8 bytes has no terminator.

2. PhysicalAddress: specifies the file address. It shares storage with VirtualSize through the Misc union, so the two names refer to the same DWORD.

3. VirtualSize: the meaning of this field depends on the file type. In an EXE it is the size of the section once loaded into memory, that is, the actual size before it is rounded up to the nearest multiple of the file alignment. The SizeOfRawData field described below holds the rounded size. This field is meaningless in OBJ files.

4. VirtualAddress: the RVA (relative virtual address) of the section. The PE loader reads this value when mapping the section into memory. For example, if the field is 1000h and the PE file is loaded at 400000h, the section is loaded at 401000h. Microsoft tools set this field to 0x1000 for the first section. The field is meaningless in OBJ files, where it is always 0.

5. SizeOfRawData: the size of the section's data after rounding up to the file alignment. The PE loader uses this value to know how many bytes to map into memory. For example, suppose the file alignment is 0x200. If the VirtualSize field gives the section length as 0x388 bytes, this field is 0x400, meaning that the section occupies 0x400 bytes in the file. In OBJ files, this field is the actual section size as emitted by the compiler or assembler.

6. PointerToRawData: the file offset of the section's data. The PE loader uses this value to find the section data in the file. If you map a PE file into memory yourself (rather than having the operating system loader load it), you must locate the section data through this value, not through the RVA in VirtualAddress.

7. PointerToRelocations: in OBJ files, this is the file offset of the section's relocation information. The relocation information of each OBJ section follows that section's data. In EXE files, this field and the next one are meaningless and always 0. When the linker builds an EXE, it resolves most of the fixups itself, leaving only base relocations and imported function addresses to be resolved at load time. Those two kinds of information are kept in the base relocation section and the import section, so an EXE does not need relocation data after each section.

8. PointerToLinenumbers: the file offset of the line number table. The line number table relates source line numbers to the addresses of the code generated for them. In an EXE file, the line number information is placed at the end of the file. If there are no COFF line numbers, the field is 0.

9. NumberOfRelocations: the number of entries in the relocation table pointed to by PointerToRelocations. This field is used only in OBJ files; in EXE files it is 0.

10. NumberOfLinenumbers: the number of entries in the line number table pointed to by PointerToLinenumbers.

11. Characteristics: flags that describe the attributes of the section, such as whether it contains executable code, initialized data, or uninitialized data, and whether it is readable or writable. Some of the flags:

IMAGE_SCN_TYPE_REG: reserved.
IMAGE_SCN_TYPE_DSECT: reserved.
IMAGE_SCN_TYPE_NOLOAD: reserved.
IMAGE_SCN_TYPE_GROUP: reserved.
IMAGE_SCN_TYPE_NO_PAD: reserved.
IMAGE_SCN_TYPE_COPY: reserved.
IMAGE_SCN_CNT_CODE: section contains executable code.
IMAGE_SCN_CNT_INITIALIZED_DATA: section contains initialized data.
IMAGE_SCN_CNT_UNINITIALIZED_DATA: section contains uninitialized data.
IMAGE_SCN_LNK_OTHER: reserved.
IMAGE_SCN_LNK_INFO: reserved.
IMAGE_SCN_TYPE_OVER: reserved.
IMAGE_SCN_LNK_COMDAT: section contains COMDAT data.
IMAGE_SCN_MEM_FARDATA: reserved.
IMAGE_SCN_MEM_PURGEABLE: reserved.
IMAGE_SCN_MEM_16BIT: reserved.
IMAGE_SCN_MEM_LOCKED: reserved.
IMAGE_SCN_MEM_PRELOAD: reserved.
IMAGE_SCN_ALIGN_1BYTES: align data on a 1-byte boundary.
IMAGE_SCN_ALIGN_2BYTES: align data on a 2-byte boundary.
IMAGE_SCN_ALIGN_4BYTES: align data on a 4-byte boundary.
IMAGE_SCN_ALIGN_8BYTES: align data on an 8-byte boundary.
IMAGE_SCN_ALIGN_16BYTES: align data on a 16-byte boundary.
IMAGE_SCN_ALIGN_32BYTES: align data on a 32-byte boundary.
IMAGE_SCN_ALIGN_64BYTES: align data on a 64-byte boundary.
IMAGE_SCN_LNK_NRELOC_OVFL: section contains extended relocations.
IMAGE_SCN_MEM_DISCARDABLE: section can be discarded as needed.
IMAGE_SCN_MEM_NOT_CACHED: section cannot be cached.
IMAGE_SCN_MEM_NOT_PAGED: section cannot be paged.
IMAGE_SCN_MEM_SHARED: section can be shared in memory.
IMAGE_SCN_MEM_EXECUTE: section can be executed as code.
IMAGE_SCN_MEM_READ: section can be read.
IMAGE_SCN_MEM_WRITE: section can be written.
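
Two of the fields described above lend themselves to a small worked example: the rounding that turns an actual section size into SizeOfRawData, and the memory flags in Characteristics. The following is a minimal sketch, not a definitive implementation. The helper names align_up and section_perms are made up for illustration, and the three flag values are copied from winnt.h under shortened names so that the code builds without the Windows headers.

```c
#include <stdint.h>

/* Flag values as defined in winnt.h, renamed (assumption: illustration only)
   so the sketch compiles without the Windows headers. */
#define SCN_MEM_EXECUTE 0x20000000u
#define SCN_MEM_READ    0x40000000u
#define SCN_MEM_WRITE   0x80000000u

/* Rounds size up to a multiple of alignment.
   Only valid when alignment is a power of two, as file alignments are. */
uint32_t align_up(uint32_t size, uint32_t alignment)
{
    return (size + alignment - 1) & ~(alignment - 1);
}

/* Writes an "rwx"-style summary of the memory flags into out[4]. */
void section_perms(uint32_t characteristics, char out[4])
{
    out[0] = (characteristics & SCN_MEM_READ)    ? 'r' : '-';
    out[1] = (characteristics & SCN_MEM_WRITE)   ? 'w' : '-';
    out[2] = (characteristics & SCN_MEM_EXECUTE) ? 'x' : '-';
    out[3] = '\0';
}
```

With the numbers from item 5, align_up(0x388, 0x200) gives 0x400. A Characteristics value of 0x60000020 (code, executable, readable) is summarized as "r-x".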

To traverse the section table, follow these steps:

1. Verify that the file is a valid PE file.
2. Locate the start of the PE header.
3. Read the number of sections from the NumberOfSections field of the file header.
4. Locate the section table. It immediately follows the PE header, so its address is the start of the PE header plus the size of the PE header: the 4-byte signature, the 20-byte IMAGE_FILE_HEADER, and SizeOfOptionalHeader bytes of optional header. If you are not using file mapping, use SetFilePointer to move the file pointer to that offset. Note that SizeOfHeaders (a member of IMAGE_OPTIONAL_HEADER) is not the offset of the section table: it is the combined size of all the headers, including the section table, rounded up to the file alignment.
5. Process each IMAGE_SECTION_HEADER structure in turn.
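
The steps above can be sketched in portable C that works on a file already read into a buffer. This is a minimal sketch under stated assumptions, not the loader's actual code: the function names find_section_table and rva_to_offset are hypothetical, the table is located as the PE header offset plus 24 bytes (signature and file header) plus SizeOfOptionalHeader, and fields are read byte by byte so that the code does not depend on the Windows headers or on host byte order.

```c
#include <stddef.h>
#include <stdint.h>

/* Little-endian readers. */
static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns the file offset of the first IMAGE_SECTION_HEADER, or 0 on error.
   *count receives NumberOfSections. */
size_t find_section_table(const uint8_t *file, size_t len, uint16_t *count)
{
    if (len < 0x40 || file[0] != 'M' || file[1] != 'Z') return 0;
    size_t pe = rd32(file + 0x3C);                  /* e_lfanew */
    if (pe > len || len - pe < 24) return 0;
    if (rd32(file + pe) != 0x00004550u) return 0;   /* "PE\0\0" signature */
    *count = rd16(file + pe + 6);                   /* NumberOfSections */
    size_t table = pe + 24 + rd16(file + pe + 20);  /* + SizeOfOptionalHeader */
    if (table > len || (len - table) / 40 < *count) return 0;
    return table;
}

/* Converts an RVA to a file offset by scanning the section table.
   Returns 0 when no section's raw data contains the RVA. */
uint32_t rva_to_offset(const uint8_t *file, size_t table, uint16_t count,
                       uint32_t rva)
{
    for (uint16_t i = 0; i < count; i++) {
        const uint8_t *s = file + table + (size_t)i * 40;
        uint32_t va       = rd32(s + 12);   /* VirtualAddress   */
        uint32_t raw_size = rd32(s + 16);   /* SizeOfRawData    */
        uint32_t raw_ptr  = rd32(s + 20);   /* PointerToRawData */
        if (rva >= va && rva - va < raw_size)
            return raw_ptr + (rva - va);
    }
    return 0;
}
```

The second function shows why the table matters when reading a file from disk: an RVA found elsewhere in the headers has to be translated through VirtualAddress and PointerToRawData before it can be used as a file offset.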

VI. Import Table

6.1 Import functions:

An import function is a function that a module calls but that is not located in the calling module, hence the name "import". Import functions actually reside in one or more DLLs. The calling module keeps only some information about them: the function names and the names of the DLLs where they reside.

Before a PE file is loaded into memory, the information stored in its .idata section is used by the loader to determine the function locations and fix them up for use by the image. After loading, .idata contains pointers to the functions that the EXE/DLL imports.

6.2. Data directory:

The data directory is an array of 16 IMAGE_DATA_DIRECTORY structures. It holds the location and size of the important data structures in a PE file; each element describes one of them.

Each element of the data directory is an IMAGE_DATA_DIRECTORY structure, defined as follows:

typedef struct _IMAGE_DATA_DIRECTORY {
    DWORD VirtualAddress;
    DWORD Size;
} IMAGE_DATA_DIRECTORY, *PIMAGE_DATA_DIRECTORY;

IMAGE_DATA_DIRECTORY member description:
1. VirtualAddress: the relative virtual address (RVA) of the data structure. For example, if the entry describes the import symbols, this field contains the RVA of the IMAGE_IMPORT_DESCRIPTOR array.

2. Size: the size in bytes of the data structure that VirtualAddress points to.

6.3. Common method for finding an important data structure in a PE file:

1. Locate the PE header from the DOS header.
2. Read the address of the data directory from the optional header.
3. Multiply the size of the IMAGE_DATA_DIRECTORY structure by the index of the structure you are looking for. For example, to find the location of the import symbols, multiply the IMAGE_DATA_DIRECTORY size (8 bytes) by 1, the index of the import symbols in the data directory.
4. Add the data directory address to that result to obtain the IMAGE_DATA_DIRECTORY entry that describes the structure you are looking for.
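
The index arithmetic in steps 3 and 4 can be sketched as follows. This is a minimal sketch that assumes a 32-bit (PE32) optional header, where NumberOfRvaAndSizes is at offset 92 and the data directory starts at offset 96; in a PE32+ header those offsets are 108 and 112. The function name read_data_directory is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Reads entry `index` of the data directory.
   opt points at the start of a PE32 optional header (magic 0x10B).
   Returns 0 on success and -1 on any inconsistency. */
int read_data_directory(const uint8_t *opt, size_t opt_len, unsigned index,
                        uint32_t *rva, uint32_t *size)
{
    const size_t dir_base = 96;   /* offset of DataDirectory in PE32 */
    const size_t entry_size = 8;  /* sizeof(IMAGE_DATA_DIRECTORY)    */

    if (opt_len < dir_base) return -1;
    if (opt[0] != 0x0B || opt[1] != 0x01) return -1;   /* not PE32 */
    if (index >= 16 || index >= rd32(opt + 92)) return -1;

    size_t off = dir_base + (size_t)index * entry_size;
    if (opt_len - off < entry_size) return -1;
    *rva  = rd32(opt + off);       /* VirtualAddress */
    *size = rd32(opt + off + 4);   /* Size           */
    return 0;
}
```

Passing index 1 returns the entry for the import symbols, and index 0 the entry for the export table.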

6.4 Import tables:

The VirtualAddress of the second entry in the data directory array (index 1) contains the RVA of the import table. The import table is an array of IMAGE_IMPORT_DESCRIPTOR structures. Each structure holds information about one DLL from which the PE file imports functions. The array ends with an element that is all zeros.

IMAGE_IMPORT_DESCRIPTOR structure definition:

typedef struct _IMAGE_IMPORT_DESCRIPTOR {
    union {
        DWORD Characteristics;    // 0 for terminating null import descriptor
        DWORD OriginalFirstThunk; // RVA to original unbound IAT (PIMAGE_THUNK_DATA)
    };
    DWORD TimeDateStamp;          // 0 if not bound,
                                  // -1 if bound, and real date/time stamp
                                  //    in IMAGE_DIRECTORY_ENTRY_BOUND_IMPORT (new BIND)
                                  // O.W. date/time stamp of DLL bound to (old BIND)

    DWORD ForwarderChain;         // -1 if no forwarders
    DWORD Name;
    DWORD FirstThunk;             // RVA to IAT (if bound this IAT has actual addresses)
} IMAGE_IMPORT_DESCRIPTOR;

IMAGE_IMPORT_DESCRIPTOR member description:
1. The first member is a union. The union merely gives OriginalFirstThunk an alias: you can also call it Characteristics. This member contains the RVA of an array of IMAGE_THUNK_DATA structures.

2. TimeDateStamp: a time stamp. This field is usually 0. Microsoft's BIND program can write here the build time of the DLL that the IMAGE_IMPORT_DESCRIPTOR refers to.

3. ForwarderChain: this field relates to forwarding, in which a function exported by one DLL is actually implemented in another DLL. For example, on Windows NT, kernel32.dll forwards some of its exported functions to ntdll.dll. An application may believe it is calling kernel32.dll when it is in fact calling ntdll.dll. This field contains an index into the FirstThunk array; the function at that index is a forwarded function.

4. Name: the RVA of the DLL name, which is a null-terminated ASCII string.

5. FirstThunk: very similar to OriginalFirstThunk. It also contains the RVA of an array of IMAGE_THUNK_DATA structures (a different array from the one OriginalFirstThunk points to).

In the IMAGE_IMPORT_DESCRIPTOR array, the most important parts are the imported DLL name and the two IMAGE_THUNK_DATA arrays. Each IMAGE_THUNK_DATA corresponds to one import function. In an EXE, the two arrays (pointed to by OriginalFirstThunk, alias Characteristics, and by FirstThunk) run in parallel, and both end with a null (all-zero) entry.

Why do we need two parallel arrays? The first array (pointed to by OriginalFirstThunk) is never modified; it is sometimes called the hint-name table. The second array (pointed to by FirstThunk) is overwritten by the loader. The loader examines each IMAGE_THUNK_DATA in turn, finds the address of the function it describes, and writes that address into the IMAGE_THUNK_DATA DWORD. Because this array ends up holding the addresses of the imported functions, it is called the import address table (IAT). The IAT is a writable area, and API hooking exploits this feature. After the PE loader has loaded the file, the IMAGE_THUNK_DATA entries that FirstThunk points to have been overwritten, while those that OriginalFirstThunk points to have not. So if you need the names of the import functions in a loaded image, you can still find them through the array that OriginalFirstThunk points to.

6.4 IMAGE_THUNK_DATA:

IMAGE_THUNK_DATA is a DWORD-sized union. We usually interpret it as a pointer to an IMAGE_IMPORT_BY_NAME structure. Note that IMAGE_THUNK_DATA holds a pointer to an IMAGE_IMPORT_BY_NAME structure, not the structure itself.

IMAGE_THUNK_DATA structure definition:
typedef struct _IMAGE_THUNK_DATA32 {
    union {
        PBYTE  ForwarderString;
        PDWORD Function;
        DWORD  Ordinal;
        PIMAGE_IMPORT_BY_NAME AddressOfData;
    } u1;
} IMAGE_THUNK_DATA32;

The final content of an IMAGE_THUNK_DATA is determined only after the PE file is loaded. The Win32 loader uses the initial content (a reference to the function name, or the function's ordinal) to find the address of the import function, and then replaces the content of the IMAGE_THUNK_DATA with that address.

6.5. IMAGE_IMPORT_BY_NAME:

IMAGE_IMPORT_BY_NAME structure definition:
typedef struct _IMAGE_IMPORT_BY_NAME {
    WORD Hint;
    BYTE Name[1];
} IMAGE_IMPORT_BY_NAME, *PIMAGE_IMPORT_BY_NAME;

1. Hint: the index of the function in the export table of the DLL where it resides. The PE loader uses it to look up the function quickly in the DLL's export table. The value is not required; some linkers set it to 0.

2. Name: the name of the imported function, as a null-terminated ASCII string. Although Name is declared as a single byte, it is actually a variable-size field; there is simply no better way to express a variable-size field in a structure. The structure is provided so that you can refer to the parts of the data by descriptive names.

In some cases a function is exported by ordinal only. Such a function cannot be imported by name, only by its ordinal. In that case, no IMAGE_IMPORT_BY_NAME structure exists for it in the calling module. Instead, the low word of the function's IMAGE_THUNK_DATA value holds the ordinal, and the most significant bit (MSB) is set to 1. For example, if a function is exported by ordinal only and its ordinal is 1234h, its IMAGE_THUNK_DATA value is 80001234h. Microsoft provides a convenient constant for testing the MSB of the DWORD: IMAGE_ORDINAL_FLAG32, whose value is 80000000h.
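
The ordinal test described above comes down to one mask and one truncation. A minimal sketch follows; the helper names thunk_is_ordinal and thunk_ordinal are made up for illustration, and the constant is defined locally with the value of IMAGE_ORDINAL_FLAG32 so that the code builds without the Windows headers.

```c
#include <stdint.h>

#define ORDINAL_FLAG32 0x80000000u   /* value of IMAGE_ORDINAL_FLAG32 */

/* Nonzero when the thunk imports by ordinal rather than by name. */
int thunk_is_ordinal(uint32_t thunk)
{
    return (thunk & ORDINAL_FLAG32) != 0;
}

/* The ordinal is the low word; only meaningful when thunk_is_ordinal() holds. */
uint16_t thunk_ordinal(uint32_t thunk)
{
    return (uint16_t)(thunk & 0xFFFFu);
}
```

For the value 80001234h from the example, thunk_is_ordinal returns nonzero and thunk_ordinal returns 1234h.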

6.6 To list all the import functions of a PE file, follow these steps:

1. Check that the file is a valid PE file.
2. Locate the PE header from the DOS header.
3. Obtain the address of the data directory in the optional header.
4. Go to the second entry of the data directory and read its VirtualAddress value.
5. Use that value to locate the first IMAGE_IMPORT_DESCRIPTOR structure.
6. Check the OriginalFirstThunk value. If it is not 0, follow the RVA in OriginalFirstThunk to the array of RVAs. If OriginalFirstThunk is 0, use the FirstThunk value instead. Some linkers set OriginalFirstThunk to 0 when generating PE files, which is a bug; to be safe, always check OriginalFirstThunk first.
7. For each array element, test the value against IMAGE_ORDINAL_FLAG32. If the most significant bit is 1, the function is imported by ordinal, and the ordinal is in the low word of the value.
8. If the most significant bit is 0, treat the value as an RVA to an IMAGE_IMPORT_BY_NAME structure. Skip the Hint, and what follows is the function name.
9. Move to the next array element and repeat until the end of the array (a null entry). At that point all the imports from one DLL have been listed, and you can move on to the next DLL.
10. Move to the next IMAGE_IMPORT_DESCRIPTOR and process it, repeating until the end of the array. (The IMAGE_IMPORT_DESCRIPTOR array ends with an element whose fields are all 0.)
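
Steps 5 to 10 can be sketched as the loop below. It is a minimal sketch under a simplifying assumption: the buffer holds the image laid out as it is in memory, so an RVA can be used directly as an offset into the buffer (for a raw file on disk, each RVA would first have to be converted to a file offset through the section table). The names import_entry and list_imports are hypothetical, and the terminator is detected from the three RVA fields of the descriptor.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    const char *dll;      /* DLL name                                */
    const char *name;     /* function name, NULL if import by ordinal */
    uint16_t    ordinal;  /* valid only when name is NULL             */
} import_entry;

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* 1 when `need` bytes starting at rva lie inside the image. */
static int in_image(size_t len, uint32_t rva, size_t need)
{
    return rva <= len && len - rva >= need;
}

/* Returns the string at rva if it is null-terminated inside the image. */
static const char *str_at(const uint8_t *img, size_t len, uint32_t rva)
{
    if (rva >= len || memchr(img + rva, 0, len - rva) == NULL) return NULL;
    return (const char *)(img + rva);
}

/* Fills out[] with up to max_out imports.
   Returns the number stored, or -1 on malformed data. */
int list_imports(const uint8_t *img, size_t len, uint32_t import_rva,
                 import_entry *out, int max_out)
{
    int n = 0;
    for (uint32_t d = import_rva; ; d += 20) {   /* 20 = descriptor size */
        if (!in_image(len, d, 20)) return -1;
        uint32_t oft  = rd32(img + d);           /* OriginalFirstThunk */
        uint32_t name = rd32(img + d + 12);      /* Name               */
        uint32_t ft   = rd32(img + d + 16);      /* FirstThunk         */
        if (oft == 0 && name == 0 && ft == 0) return n;   /* terminator */
        const char *dll = str_at(img, len, name);
        if (dll == NULL) return -1;
        /* Prefer OriginalFirstThunk; fall back to FirstThunk when it is 0. */
        for (uint32_t t = oft ? oft : ft; ; t += 4) {
            if (!in_image(len, t, 4)) return -1;
            uint32_t thunk = rd32(img + t);
            if (thunk == 0) break;               /* end of this DLL's list */
            if (n == max_out) return n;
            out[n].dll = dll;
            if (thunk & 0x80000000u) {           /* IMAGE_ORDINAL_FLAG32 */
                out[n].name = NULL;
                out[n].ordinal = (uint16_t)(thunk & 0xFFFFu);
            } else {                             /* skip the 2-byte Hint */
                out[n].name = str_at(img, len, thunk + 2);
                out[n].ordinal = 0;
                if (out[n].name == NULL) return -1;
            }
            n++;
        }
    }
}
```

Returning the entries in a caller-supplied array keeps the sketch free of allocation; the pointers in each entry refer into the image buffer, so the buffer must outlive them.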

6.7. Bound import:

When the PE loader loads a PE file, it examines the import table and maps the required DLLs into the process address space. Then, much as we did above, it walks the IMAGE_THUNK_DATA array and replaces each IMAGE_THUNK_DATA value with the actual address of the import function. This step takes time. If the function addresses can be predicted correctly in advance, the PE loader does not need to fix up the IMAGE_THUNK_DATA values every time it loads the file. Bound import is the product of this idea.

Microsoft development tools such as Visual Studio include bind.exe, which examines the import table of a PE file and replaces the IMAGE_THUNK_DATA values with the real addresses of the import functions. When the file is loaded, the PE loader must check that those addresses are still valid. If the DLL version differs from the one recorded in the PE file, or if the DLL has to be relocated, the loader treats the precomputed addresses as invalid and walks the array pointed to by OriginalFirstThunk to obtain the new addresses of the import functions.

VII. Export Table

When the PE loader runs a program, it loads the required DLLs into the address space of the process. Then, based on the import information of the main program, it finds the actual function addresses in those DLLs and fixes up the main program. To do so, the PE loader looks up the export functions of the DLLs. A PE file places the information about its export functions in the .edata section.

A DLL/EXE can export a function in two ways: by name or by ordinal. For example, suppose a DLL exports a function named "GetSysConfig". If it is exported by name, other DLLs/EXEs must use the name GetSysConfig to call it. The other way is to export by ordinal. An ordinal is a 16-bit number that uniquely identifies a function within a given DLL. In the example above, the DLL could export the function by ordinal, say 16, and other DLLs/EXEs that want to call the function must then pass this value to GetProcAddress. This is called export by ordinal.

7.1 The export table is described by the first entry of the data directory. It is an IMAGE_EXPORT_DIRECTORY structure, defined as follows:
typedef struct _IMAGE_EXPORT_DIRECTORY {
    DWORD Characteristics;
    DWORD TimeDateStamp;
    WORD  MajorVersion;
    WORD  MinorVersion;
    DWORD Name;
    DWORD Base;
    DWORD NumberOfFunctions;
    DWORD NumberOfNames;
    DWORD AddressOfFunctions;     // RVA from base of image
    DWORD AddressOfNames;         // RVA from base of image
    DWORD AddressOfNameOrdinals;  // RVA from base of image
} IMAGE_EXPORT_DIRECTORY, *PIMAGE_EXPORT_DIRECTORY;

IMAGE_EXPORT_DIRECTORY member description:

1. Characteristics: unused, always 0.

2. TimeDateStamp: the time when the file was generated.

3. MajorVersion/MinorVersion: not actually used, 0.

4. Name: an RVA pointing to an ASCIIZ string that holds the module's real name (the DLL name, such as mydll.dll). The field is needed because the file may be renamed; in that case the PE loader uses this internal name.

5. Base: the starting ordinal number. Subtracting Base from an ordinal gives the index into the function address array.

6. NumberOfFunctions: the total number of functions/symbols exported by the module.

7. NumberOfNames: the number of functions/symbols exported by name. This is not the total number exported by the module, which is given by NumberOfFunctions above. The value can be 0, meaning that the module exports by ordinal only. If the module exports no functions or symbols at all, the RVA of the export table in the data directory is 0.

8. AddressOfFunctions: the module keeps an array of RVAs, one for each exported function/symbol. This field is the RVA of that array. In short, the RVAs of all the exported functions are stored in an array, and this field points to its first element.

9. AddressOfNames: similar to the previous field. The module keeps an array of RVAs, one for each exported function name, and this field is the RVA of that array.

10. AddressOfNameOrdinals: an RVA pointing to an array of 16-bit values that hold, for each name in the AddressOfNames array, the index of the corresponding function.

The export table is designed for the convenience of the PE loader.
First, the module must keep the addresses of all its export functions so that the PE loader can look them up. The module stores them in the array pointed to by the AddressOfFunctions field, and the number of elements is stored in the NumberOfFunctions field. So if the module exports 40 functions, the array that AddressOfFunctions points to has 40 elements and NumberOfFunctions is 40.

Next, if some functions are exported by name, the module must keep the names in the file. The RVAs of the names are stored in an array for the PE loader to search. This array is pointed to by AddressOfNames, and NumberOfNames holds the number of names. Consider how the PE loader works: it knows a function name and wants the function's address. So far the module has two arrays, the name array and the address array, but nothing connects them. Something is needed to link a function name to its address. The PE specification uses an index into the address array as that link. When the PE loader finds the matching name in the name array, it also obtains the index of the corresponding element in the address array. These indexes are stored in a third array, pointed to by the AddressOfNameOrdinals field. Because this array links names to addresses, it must have the same number of elements as the name array. Each name has exactly one address, but the reverse does not hold: one address can have several names, which gives "aliases" for the same address. For the link to work, the name array and the index array must be used in parallel: the first element of the index array holds the index for the first name, and so on.

7.2 If we have the name of an export function and want its address, we can proceed as follows:

1. Locate the PE header.
2. Read the virtual address of the export table from the data directory.
3. Go to the export table and read the number of names (NumberOfNames).
4. Walk the arrays pointed to by AddressOfNames and AddressOfNameOrdinals in parallel, looking for the matching name. When the name is found in the AddressOfNames array, take the value at the same position in the AddressOfNameOrdinals array. For example, if the RVA of the matching name is the 77th element of the AddressOfNames array, take the 77th element of the AddressOfNameOrdinals array as the index. If all NumberOfNames elements have been examined without a match, the module does not export that name.
5. Use the value taken from the AddressOfNameOrdinals array as an index into the AddressOfFunctions array. For example, if the value is 5, read the element at index 5 of the AddressOfFunctions array. That value is the RVA of the function.
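
The lookup by name can be sketched as follows. It is a minimal sketch that assumes the buffer holds the image laid out as it is in memory, so that an RVA is an offset into the buffer. The function name export_rva_by_name is hypothetical, and the search is a plain linear scan that follows the steps above.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* 1 when `need` bytes starting at rva lie inside the image. */
static int in_image(size_t len, uint32_t rva, uint64_t need)
{
    return rva <= len && (uint64_t)(len - rva) >= need;
}

/* Returns the function's RVA, or 0 if the name is not exported
   or the export data is malformed. */
uint32_t export_rva_by_name(const uint8_t *img, size_t len,
                            uint32_t export_rva, const char *wanted)
{
    if (!in_image(len, export_rva, 40)) return 0;   /* 40 = directory size */
    const uint8_t *dir = img + export_rva;
    uint32_t n_funcs = rd32(dir + 20);   /* NumberOfFunctions     */
    uint32_t n_names = rd32(dir + 24);   /* NumberOfNames         */
    uint32_t funcs   = rd32(dir + 28);   /* AddressOfFunctions    */
    uint32_t names   = rd32(dir + 32);   /* AddressOfNames        */
    uint32_t ords    = rd32(dir + 36);   /* AddressOfNameOrdinals */
    if (!in_image(len, names, (uint64_t)n_names * 4) ||
        !in_image(len, ords,  (uint64_t)n_names * 2) ||
        !in_image(len, funcs, (uint64_t)n_funcs * 4)) return 0;

    for (uint32_t i = 0; i < n_names; i++) {
        uint32_t name_rva = rd32(img + names + (size_t)i * 4);
        if (name_rva >= len ||
            memchr(img + name_rva, 0, len - name_rva) == NULL) return 0;
        if (strcmp((const char *)(img + name_rva), wanted) != 0) continue;
        /* Same position in the parallel array gives the function index. */
        uint16_t idx = rd16(img + ords + (size_t)i * 2);
        return idx < n_funcs ? rd32(img + funcs + (size_t)idx * 4) : 0;
    }
    return 0;
}
```

Returning 0 for "not found" relies on the assumption that no exported function has RVA 0.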

7.3 Suppose we have only the ordinal of a function. How do we obtain its address? Proceed as follows:

1. Locate the PE header.
2. Read the virtual address of the export table from the data directory.
3. Go to the export table and read the Base value.
4. Subtract Base from the ordinal to get the index into the AddressOfFunctions array.
5. Compare the index with NumberOfFunctions. If it is greater than or equal to NumberOfFunctions, the ordinal is invalid.
6. Use the index to read the function's RVA from the AddressOfFunctions array.
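
The lookup by ordinal is shorter, because it needs only Base, NumberOfFunctions and AddressOfFunctions. The following is a minimal sketch under the same assumption as before: the buffer holds the image laid out as it is in memory, so an RVA is an offset into the buffer. The function name export_rva_by_ordinal is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* 1 when `need` bytes starting at rva lie inside the image. */
static int in_image(size_t len, uint32_t rva, uint64_t need)
{
    return rva <= len && (uint64_t)(len - rva) >= need;
}

/* Returns the function's RVA, or 0 if the ordinal is invalid
   or the export data is malformed. */
uint32_t export_rva_by_ordinal(const uint8_t *img, size_t len,
                               uint32_t export_rva, uint32_t ordinal)
{
    if (!in_image(len, export_rva, 40)) return 0;   /* 40 = directory size */
    const uint8_t *dir = img + export_rva;
    uint32_t base    = rd32(dir + 16);   /* Base               */
    uint32_t n_funcs = rd32(dir + 20);   /* NumberOfFunctions  */
    uint32_t funcs   = rd32(dir + 28);   /* AddressOfFunctions */

    if (ordinal < base) return 0;
    uint32_t idx = ordinal - base;       /* step 4: subtract Base   */
    if (idx >= n_funcs) return 0;        /* step 5: range check     */
    if (!in_image(len, funcs, (uint64_t)n_funcs * 4)) return 0;
    return rd32(img + funcs + (size_t)idx * 4);     /* step 6 */
}
```

With Base equal to 1 and three exported functions, ordinals 1 to 3 are valid and map to indexes 0 to 2; any other ordinal yields 0.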
