The format of an operating system's executable file is in many ways a mirror of the operating system.
WINNT.H is a very important header file that defines many of the internal structures used by Windows.
The PE format is documented (in the loosest sense of the word) in the WINNT.H header file. About midway through WINNT.H is a section titled "Image Format." This section starts out with small tidbits from the old familiar MS-DOS MZ format and NE format headers before moving into the newer PE information. WINNT.H provides definitions of the raw data structures used by PE files, but contains only a few useful comments about what the structures and flags mean. Whoever wrote the header file for the PE format (the name Michael J. O'Leary keeps popping up) was certainly a believer in long, descriptive names, along with deeply nested structures and macros. When coding with WINNT.H, it's not uncommon to end up with long expressions that reach through several levels of nested structures.
The article's author, Matt Pietrek, wrote a program for analyzing the PE format. An open-source PE dump tool is available at:
http://github.com/zed-0xff/pedump
You can also upload PE files to the website http://pedump.me/ for online PE format analysis.
(Screenshot of the online analysis results, including the Imports listing.)
Let's go over a few fundamental ideas that permeate the design of a PE file (see Figure 1). I'll use the term "module" to mean the code, data, and resources of an executable file or DLL that has been loaded into memory.
The first important thing to know about PE files is that the executable file on disk is very similar to what the module will look like after Windows has loaded it. The Windows loader doesn't need to work extremely hard to create a process from the disk file. The loader uses the memory-mapped file mechanism to map the appropriate pieces of the file into the virtual address space. (Note: because the on-disk layout of an EXE is so close to its layout after loading, a memory-mapped file is enough to bring the EXE into the virtual address space.)
For Win32, all the memory used by the module for code, data, resources, import tables, export tables, and other required module data structures is in one contiguous block of memory. All you need to know in this situation is where the loader mapped the file into memory. You can easily find all the various pieces of the module by following pointers that are stored as part of the image.
Another idea you should be acquainted with is the Relative Virtual Address (RVA). Many fields in PE files are specified in terms of RVAs. An RVA is simply the offset of some item, relative to where the file is memory-mapped. For example, let's say the loader maps a PE file into memory starting at address 0x10000 in the virtual address space. If a certain table in the image starts at address 0x10464, then the table's RVA is 0x464.
To convert an RVA into a usable pointer, simply add the RVA to the base address of the module. The base address is the starting address of a memory-mapped EXE or DLL and is an important concept in Win32. For the sake of convenience, Windows NT and Windows 95 use the base address of a module as the module's instance handle (HINSTANCE). What's important for Win32 is that you can call GetModuleHandle for any DLL that your process uses to get a pointer for accessing the module's components. (Note: base address plus RVA gives a usable address, and the base address is the module handle of the DLL or EXE, which can be obtained through GetModuleHandle.)
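The RVA arithmetic described above can be sketched in a few lines of Python. This is only an illustration: the helper names `rva_to_va` and `va_to_rva` are made up for this example and do not come from WINNT.H or the article.

```python
def rva_to_va(image_base: int, rva: int) -> int:
    """Convert a Relative Virtual Address into an absolute virtual address."""
    return image_base + rva


def va_to_rva(image_base: int, va: int) -> int:
    """Convert an absolute virtual address back into an RVA."""
    if va < image_base:
        raise ValueError("address lies below the image base")
    return va - image_base


# The example from the text: image mapped at 0x10000, table at 0x10464.
table_rva = va_to_rva(0x10000, 0x10464)
print(hex(table_rva))  # 0x464
```

The point of storing RVAs rather than absolute addresses is that the same value stays valid wherever the loader ends up mapping the module.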
The final concept that you need to know about PE files is sections. Unlike segments, sections are blocks of contiguous memory with no size constraints. Some sections contain code or data that your program declared and uses directly, while other data sections are created for you by the linker and librarian, and contain information vital to the operating system. In some descriptions of the PE format, sections are also referred to as objects. The term object has so many overloaded meanings that I'll stick to calling the code and data areas sections.
The PE Header
Like other executable file formats, the PE file has a collection of fields at a known (or easy to find) location that define what the rest of the file looks like. This header contains information such as the locations and sizes of the code and data areas, what operating system the file is intended for, the initial stack size, and other vital pieces of information. As with other executable formats from Microsoft, this main header isn't at the very beginning of the file. The first few hundred bytes of the typical PE file are taken up by the MS-DOS stub. This stub is a tiny program that prints out something to the effect of "This program cannot be run in MS-DOS mode."
As in other Microsoft executable formats, you find the real header by looking up its starting offset, which is stored in the MS-DOS stub header. The WINNT.H file includes a structure definition for the MS-DOS stub header that makes it very easy to look up where the PE header starts. The e_lfanew field is a relative offset (or RVA, if you prefer) to the actual PE header. To get a pointer to the PE header in memory, just add this field's value to the image base.
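As a rough sketch of that lookup, the Python below reads e_lfanew from a byte buffer. It assumes the usual layout of the MS-DOS header (the "MZ" magic at offset 0 and e_lfanew as a 32-bit value at offset 0x3C). The function names and the synthetic buffer are invented for the example; the buffer is not a loadable executable.

```python
import struct

DOS_MAGIC = b"MZ"        # e_magic at the start of the MS-DOS header
E_LFANEW_OFFSET = 0x3C   # position of e_lfanew inside the MS-DOS header


def find_pe_header_offset(image: bytes) -> int:
    """Return the offset of the PE header, as stored in e_lfanew."""
    if len(image) < E_LFANEW_OFFSET + 4:
        raise ValueError("too short to hold an MS-DOS header")
    if image[:2] != DOS_MAGIC:
        raise ValueError("missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<i", image, E_LFANEW_OFFSET)
    if e_lfanew < 0 or e_lfanew + 4 > len(image):
        raise ValueError("e_lfanew points outside the image")
    return e_lfanew


def make_fake_image(pe_offset: int = 0x80) -> bytes:
    """Build a tiny synthetic buffer (hypothetical, for illustration only)."""
    buf = bytearray(pe_offset + 4)
    buf[0:2] = DOS_MAGIC
    struct.pack_into("<i", buf, E_LFANEW_OFFSET, pe_offset)
    buf[pe_offset:pe_offset + 4] = b"PE\0\0"
    return bytes(buf)


image = make_fake_image()
print(hex(find_pe_header_offset(image)))  # 0x80
```

For a file read from disk the result is a file offset; for a module already mapped into memory, the same value is added to the image base to get a pointer.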
Once you have a pointer to the main PE header, the fun can begin. The main PE header is a structure of type IMAGE_NT_HEADERS, which is defined in WINNT.H. This structure is composed of a DWORD Signature followed by two substructures: FileHeader, of type IMAGE_FILE_HEADER, and OptionalHeader, of type IMAGE_OPTIONAL_HEADER.
(Two diagrams of the structure appeared here: an overview and a detailed field-level view.)
The Signature field viewed as ASCII text is "PE\0\0".
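The layout of IMAGE_NT_HEADERS can be illustrated with a small Python sketch that checks the signature and works out where the two substructures begin. It assumes the standard 20-byte size of IMAGE_FILE_HEADER; the function name `split_nt_headers` and the demo buffer are hypothetical.

```python
PE_SIGNATURE = b"PE\0\0"
SIZEOF_SIGNATURE = 4       # DWORD Signature
SIZEOF_FILE_HEADER = 20    # size of IMAGE_FILE_HEADER in bytes


def split_nt_headers(image: bytes, pe_offset: int) -> dict:
    """Check the signature and return the offsets of the two substructures."""
    signature = image[pe_offset:pe_offset + SIZEOF_SIGNATURE]
    if signature != PE_SIGNATURE:
        raise ValueError(f"bad PE signature: {signature!r}")
    file_header = pe_offset + SIZEOF_SIGNATURE
    optional_header = file_header + SIZEOF_FILE_HEADER
    return {"FileHeader": file_header, "OptionalHeader": optional_header}


# Synthetic 32-byte buffer with the signature placed at offset 8.
demo = bytearray(32)
demo[8:12] = PE_SIGNATURE
print(split_nt_headers(bytes(demo), 8))
# {'FileHeader': 12, 'OptionalHeader': 32}
```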
Following the PE signature DWORD in the PE header is a structure of type IMAGE_FILE_HEADER. The fields of this structure contain only the most basic information about the file.
IMAGE_FILE_HEADER Fields
WORD Machine
  The CPU that this file is intended for. The CPU IDs are defined in WINNT.H as IMAGE_FILE_MACHINE_xxx constants.
WORD NumberOfSections
  The number of sections in the file.
DWORD TimeDateStamp
  The time that the linker (or compiler for an OBJ file) produced this file. This field holds the number of seconds since December 31st, 1969, at 4:00 PM (Pacific time, which is midnight UTC on January 1, 1970). (Note: I had previously wondered how to get the timestamp of an EXE file; it can be extracted from the PE file itself.)
DWORD PointerToSymbolTable
  The file offset of the COFF symbol table. This field is only used in OBJ files and PE files with COFF debug information. PE files support multiple debug formats, so debuggers should refer to the IMAGE_DIRECTORY_ENTRY_DEBUG entry in the data directory (defined later).
DWORD NumberOfSymbols
  The number of symbols in the COFF symbol table. See above.
WORD SizeOfOptionalHeader
  The size of an optional header that can follow this structure. In OBJs, the field is 0. In executables, it's the size of the IMAGE_OPTIONAL_HEADER structure that follows this structure.
WORD Characteristics
  Flags with information about the file. The individual flag values are defined in WINNT.H.
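The field list above maps directly onto a fixed 20-byte record, which the following Python sketch unpacks with the `struct` module. The names `FileHeader`, `parse_file_header`, and `link_time` are made up for the example, and the sample values are invented rather than taken from a real file.

```python
import struct
from collections import namedtuple
from datetime import datetime, timezone

# WORD, WORD, DWORD, DWORD, DWORD, WORD, WORD (little-endian, 20 bytes)
FILE_HEADER_FORMAT = "<HHIIIHH"
FileHeader = namedtuple(
    "FileHeader",
    "Machine NumberOfSections TimeDateStamp PointerToSymbolTable "
    "NumberOfSymbols SizeOfOptionalHeader Characteristics",
)


def parse_file_header(data: bytes, offset: int = 0) -> FileHeader:
    """Unpack an IMAGE_FILE_HEADER starting at the given offset."""
    return FileHeader(*struct.unpack_from(FILE_HEADER_FORMAT, data, offset))


def link_time(header: FileHeader) -> datetime:
    """Interpret TimeDateStamp as seconds since 1970-01-01 00:00 UTC."""
    return datetime.fromtimestamp(header.TimeDateStamp, tz=timezone.utc)


# Made-up sample values, purely for illustration.
raw = struct.pack(FILE_HEADER_FORMAT, 0x14C, 3, 86400, 0, 0, 224, 0x0102)
header = parse_file_header(raw)
print(header.NumberOfSections, link_time(header).isoformat())
# 3 1970-01-02T00:00:00+00:00
```

In a real file the offset passed to `parse_file_header` would be the PE header offset plus 4, to skip the Signature DWORD.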
The third component of the PE header is a structure of type IMAGE_OPTIONAL_HEADER. For PE files, this portion certainly isn't optional.
The COFF format allows individual implementations to define a structure of additional information beyond the standard IMAGE_FILE_HEADER. The fields in the IMAGE_OPTIONAL_HEADER are what the PE designers felt was critical information beyond the basic information in the IMAGE_FILE_HEADER.
Not all of the fields of the IMAGE_OPTIONAL_HEADER are important to know about. The more important ones to be aware of are the ImageBase and the Subsystem fields. You can skim or skip the descriptions of the other fields.
IMAGE_OPTIONAL_HEADER Fields
WORD Magic
  Appears to be a signature WORD of some sort. Always appears to be set to 0x010b.
BYTE MajorLinkerVersion
BYTE MinorLinkerVersion
  The version of the linker that produced this file. The numbers should be displayed as decimal values, rather than as hex. A typical linker version is 2.23.
DWORD SizeOfCode
  The combined and rounded-up size of all the code sections. Usually, most files only have one code section, so this field matches the size of the .text section.
DWORD SizeOfInitializedData
  This is supposedly the total size of all the sections that are composed of initialized data (not including code segments). However, it doesn't seem to be consistent with what appears in the file.
DWORD SizeOfUninitializedData
  The size of the sections that the loader commits space for in the virtual address space, but that don't take up any space in the disk file. These sections don't need to have specific values at program startup, hence the term uninitialized data. Uninitialized data usually goes into a section called .bss.
DWORD AddressOfEntryPoint
  The address where the loader will begin execution. This is an RVA, and can usually be found in the .text section.
DWORD BaseOfCode
  The RVA where the file's code sections begin. The code sections typically come before the data sections and after the PE header in memory. This RVA is usually 0x1000 in Microsoft linker-produced EXEs. Borland's TLINK32 looks like it adds the image base to the RVA of the first code section and stores the result in this field.
DWORD BaseOfData
  The RVA where the file's data sections begin. The data sections typically come last in memory, after the PE header and the code sections.
DWORD ImageBase
  When the linker creates an executable, it assumes that the file will be memory-mapped to a specific location in memory. That address is stored in this field; assuming a load address allows linker optimizations to take place. If the file really is memory-mapped to this address by the loader, the code doesn't need any patching before it can be run. In executables produced for Windows NT, the default image base is 0x10000. For DLLs, the default is 0x400000. In Windows 95, the address 0x10000 can't be used to load 32-bit EXEs because it lies within a linear address region shared by all processes. Because of this, Microsoft has changed the default base address for Win32 executables to 0x400000. Older programs that were linked assuming a base address of 0x10000 will take longer to load under Windows 95 because the loader needs to apply the base relocations.
DWORD SectionAlignment
  When mapped into memory, each section is guaranteed to start at a virtual address that's a multiple of this value. For paging purposes, the default section alignment is 0x1000.
DWORD FileAlignment
  In the PE file, the raw data that comprises each section is guaranteed to start at a multiple of this value. The default value is 0x200 bytes, probably to ensure that sections always start at the beginning of a disk sector (which are also 0x200 bytes in length). This field is equivalent to the segment/resource alignment size in NE files. Unlike NE files, PE files typically don't have hundreds of sections, so the space wasted by aligning the file sections is almost always very small.
  (Note: SectionAlignment is the alignment unit in memory, while FileAlignment is the alignment unit in the PE file on disk. This also shows that the size of a PE on disk differs from its size in memory.)
WORD MajorOperatingSystemVersion
WORD MinorOperatingSystemVersion
  The minimum version of the operating system required to use this executable. This field is somewhat ambiguous since the subsystem fields (a few fields later) appear to serve a similar purpose. This field defaults to 1.0 in all Win32 EXEs to date.
WORD MajorImageVersion
WORD MinorImageVersion
  A user-definable field. This allows you to have different versions of an EXE or DLL. You set these fields via the linker /VERSION switch. For example, "LINK /VERSION:2.0 myobj.obj".
WORD MajorSubsystemVersion
WORD MinorSubsystemVersion
  Contains the minimum subsystem version required to run the executable. A typical value for this field is 3.10 (meaning Windows NT 3.1).
DWORD Reserved1
  Seems to always be 0.
DWORD SizeOfImage
  This appears to be the total size of the portions of the image that the loader has to worry about. It is the size of the region starting at the image base up to the end of the last section. The end of the last section is rounded up to the nearest multiple of the section alignment.
DWORD SizeOfHeaders
  The size of the PE header and the section (object) table. The raw data for the sections starts immediately after all the header components.
DWORD CheckSum
  Supposedly a CRC checksum of the file. As in other Microsoft executable formats, this field is ignored and set to 0. The one exception to this rule is for trusted services; these EXEs must have a valid checksum.
WORD Subsystem
  The type of subsystem that this executable uses for its user interface. WINNT.H defines the possible values as IMAGE_SUBSYSTEM_xxx constants.
WORD DllCharacteristics
  A set of flags indicating under which circumstances a DLL's initialization function (such as DllMain) will be called. This value appears to always be set to 0, yet the operating system still calls the DLL initialization function for all four events.
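To tie the fields above together, here is a minimal Python sketch that unpacks them in the order listed and applies the two alignment values. It covers only the fields described in these notes, from Magic through DllCharacteristics; the real structure continues past that point. The names `OptionalHeader`, `parse_optional_header`, `align_up`, and `entry_point_address` are hypothetical, and the sample values are made up.

```python
import struct
from collections import namedtuple

# Fields in the order listed above (32-bit PE, little-endian, 72 bytes).
OPTIONAL_HEADER_FORMAT = "<HBB9I6H4I2H"
OptionalHeader = namedtuple(
    "OptionalHeader",
    "Magic MajorLinkerVersion MinorLinkerVersion SizeOfCode "
    "SizeOfInitializedData SizeOfUninitializedData AddressOfEntryPoint "
    "BaseOfCode BaseOfData ImageBase SectionAlignment FileAlignment "
    "MajorOperatingSystemVersion MinorOperatingSystemVersion "
    "MajorImageVersion MinorImageVersion MajorSubsystemVersion "
    "MinorSubsystemVersion Reserved1 SizeOfImage SizeOfHeaders CheckSum "
    "Subsystem DllCharacteristics",
)


def parse_optional_header(data: bytes, offset: int = 0) -> OptionalHeader:
    """Unpack the leading fields of an IMAGE_OPTIONAL_HEADER."""
    return OptionalHeader(
        *struct.unpack_from(OPTIONAL_HEADER_FORMAT, data, offset)
    )


def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def entry_point_address(header: OptionalHeader) -> int:
    """AddressOfEntryPoint is an RVA, so add it to the preferred base."""
    return header.ImageBase + header.AddressOfEntryPoint


# Made-up sample values using the defaults mentioned in the text.
sample = OptionalHeader(
    Magic=0x010B, MajorLinkerVersion=2, MinorLinkerVersion=23,
    SizeOfCode=0x1200, SizeOfInitializedData=0x800,
    SizeOfUninitializedData=0, AddressOfEntryPoint=0x1000,
    BaseOfCode=0x1000, BaseOfData=0x3000, ImageBase=0x400000,
    SectionAlignment=0x1000, FileAlignment=0x200,
    MajorOperatingSystemVersion=1, MinorOperatingSystemVersion=0,
    MajorImageVersion=0, MinorImageVersion=0,
    MajorSubsystemVersion=3, MinorSubsystemVersion=10,
    Reserved1=0, SizeOfImage=0x4000, SizeOfHeaders=0x400, CheckSum=0,
    Subsystem=3, DllCharacteristics=0,
)
header = parse_optional_header(struct.pack(OPTIONAL_HEADER_FORMAT, *sample))

print(hex(entry_point_address(header)))                # 0x401000
print(hex(align_up(0x1234, header.SectionAlignment)))  # 0x2000
print(hex(align_up(0x1234, header.FileAlignment)))     # 0x1400
```

The last two lines show why a section usually occupies a different amount of space in memory than on disk: the same 0x1234 bytes of data round up to 0x2000 with the section alignment but only to 0x1400 with the file alignment.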
Reference:
https://msdn.microsoft.com/en-us/library/ms809762.aspx
The relationship between RVA, VA, ImageBase:
http://blog.csdn.net/fantcy/article/details/4474604
Peering Inside the PE: A Tour of the Win32 Portable Executable File Format, reading notes (not finished)