Analysis of how PE-format files are compiled and linked

Source: Internet
Author: User

 

(* 1 *): Write a program consisting of two files, A.cpp and Foo.cpp.
The content of A.cpp is:

extern void Foo();
void main()
{
    Foo();
}

Foo.cpp contains the following:

#include "stdio.h"
void Foo()
{
    printf("I am foo!");
}
Compile the program to generate A.obj, Foo.obj, and A.exe.

(* 2 *): Copy the three files above into the ./visualstdio/vc98/bin directory.
Then dump each of them with the following commands:
dumpbin /all A.obj > aobj.txt <Enter>
dumpbin /all Foo.obj > fooobj.txt <Enter>
dumpbin /all A.exe > aexe.txt <Enter>

(* 3 *):
Open aobj.txt (the dump of A.obj) and find the code section. It reads as follows:
SECTION HEADER #3
   .text name
       0 physical address
       0 virtual address
      2E size of raw data
     355 file pointer to raw data        // note this value
     383 file pointer to relocation table
     397 file pointer to line numbers
       2 number of relocations
       3 number of line numbers
60501020 flags
         Code
         Communal; sym= _main
         16 byte align
         Execute Read

RAW DATA #3
  00000000: 55 8B EC 83 EC 40 53 56 57 8D 7D C0 B9 10 00 00  U....@SVW.}.....
  00000010: 00 B8 CC CC CC CC F3 AB E8 00 00 00 00 5F 5E 5B  ............._^[
  00000020: 83 C4 40 3B EC E8 00 00 00 00 8B E5 5D C3        ..@;........].
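
The fields that dumpbin prints for section header #3 correspond to a 40-byte COFF section header record. The following is a minimal sketch of decoding such a record from memory. The struct, the helper names, and the flag constants are illustrative names introduced here (the constants mirror the documented IMAGE_SCN_* values), not part of the original walkthrough.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Fields of one 40-byte COFF section header, in file order.
struct SectionHeader {
    std::string   name;                // up to 8 characters, e.g. ".text"
    std::uint32_t physical_address;
    std::uint32_t virtual_address;
    std::uint32_t size_of_raw_data;
    std::uint32_t pointer_to_raw_data; // file offset of the section bytes
    std::uint32_t pointer_to_relocs;
    std::uint32_t pointer_to_linenums;
    std::uint16_t number_of_relocs;
    std::uint16_t number_of_linenums;
    std::uint32_t flags;
};

// Flag bits that appear in the value 60501020.
constexpr std::uint32_t kCode    = 0x00000020; // "Code"
constexpr std::uint32_t kComdat  = 0x00001000; // "Communal"
constexpr std::uint32_t kExecute = 0x20000000; // "Execute"
constexpr std::uint32_t kRead    = 0x40000000; // "Read"

// Read little-endian integers byte by byte, independent of host byte order.
static std::uint16_t le16(const std::uint8_t* p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}
static std::uint32_t le32(const std::uint8_t* p) {
    return static_cast<std::uint32_t>(p[0]) |
           (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) |
           (static_cast<std::uint32_t>(p[3]) << 24);
}

// Hypothetical helper: decode the 40 bytes starting at `raw`.
SectionHeader decode_section_header(const std::uint8_t* raw) {
    assert(raw != nullptr);
    SectionHeader h;
    std::size_t len = 0;               // the name is NUL-padded to 8 bytes
    while (len < 8 && raw[len] != 0) ++len;
    h.name.assign(reinterpret_cast<const char*>(raw), len);
    h.physical_address    = le32(raw + 8);
    h.virtual_address     = le32(raw + 12);
    h.size_of_raw_data    = le32(raw + 16);
    h.pointer_to_raw_data = le32(raw + 20);
    h.pointer_to_relocs   = le32(raw + 24);
    h.pointer_to_linenums = le32(raw + 28);
    h.number_of_relocs    = le16(raw + 32);
    h.number_of_linenums  = le16(raw + 34);
    h.flags               = le32(raw + 36);
    return h;
}

// The alignment is stored as n in bits 20-23, meaning 2^(n-1) bytes.
unsigned alignment_bytes(std::uint32_t flags) {
    unsigned n = (flags >> 20) & 0xF;
    return n == 0 ? 0u : 1u << (n - 1);
}
```

Fed the header shown above, pointer_to_raw_data comes out as 0x355 and alignment_bytes(0x60501020) as 16, matching the "16 byte align" line in the dump.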

(* 4 *): Disassemble the code section of A.obj.
Open the URSoft W32Dasm tool (I use version 8.93).
When opening the file, choose to show all files, because the tool mainly targets executable formats such as PE, LE, and NE. For the same reason,
an offset must be specified before it can disassemble an OBJ file. That offset is the "file pointer to raw data" value flagged above, that is, the file offset of the code section. (Note: another way to obtain this information is dumpbin /section:.text.)
Therefore, in the dialog box that appears when the OBJ file is opened, enter 00000355 so that it reads
"Start disassembly from offset 00000355 hex".
The "Check for 16 bit disassembly" option is not needed.
The disassembled code section reads as follows:

:00000000 55            push ebp
:00000001 8BEC          mov ebp, esp
:00000003 83EC40        sub esp, 00000040
:00000006 53            push ebx
:00000007 56            push esi
:00000008 57            push edi
:00000009 8D7DC0        lea edi, dword ptr [ebp-40]
:0000000C B910000000    mov ecx, 00000010
:00000011 B8CCCCCCCC    mov eax, CCCCCCCC
:00000016 F3            repz
:00000017 AB            stosd
:00000018 E800000000    call 0000001D    // note this call
:0000001D 5F            pop edi
:0000001E 5E            pop esi
:0000001F 5B            pop ebx
:00000020 83C440        add esp, 00000040
:00000023 3BEC          cmp ebp, esp
:00000025 E800000000    call 0000002A
:0000002A 8BE5          mov esp, ebp
:0000002C 5D            pop ebp
:0000002D C3            ret
Brief description:
The 0xE8 is the CALL instruction opcode. The next DWORD should contain the offset to the Foo function (relative to the call instruction).

It's pretty clear that Foo probably isn't zero bytes away from the call instruction. Simply put, this code wouldn't work as expected if you were to execute it. The code is broken, and needs to be fixed up.

In the above example of a call to function Foo, there will be a REL32 fixup record, and it will have the offset of the DWORD that the linker needs to overwrite with the appropriate value.

 

Old Gill's note: 0xE8 is the opcode of the call instruction, and it should be followed by the relative offset of the target function. Because the OBJ file has not been linked yet, that offset cannot be known, so the field is left as zero. At link time, the linker uses the relocation information to write the correct value into this location (offset 00000019).
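
The patching described above can be sketched in a few lines. This is only an illustration under stated assumptions: apply_rel32 is a made-up helper name, the section is held in a std::vector, and the value already stored in the field (zero in A.obj) is treated as an addend. A real linker does considerably more bookkeeping.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: resolve one REL32 fixup inside a code section.
//   section         raw bytes of the section
//   fixup_offset    offset of the DWORD to overwrite (0x19 in A.obj)
//   section_address address at which the section's first byte will be placed
//   target_address  final address of the symbol being referenced
void apply_rel32(std::vector<std::uint8_t>& section, std::size_t fixup_offset,
                 std::uint32_t section_address, std::uint32_t target_address) {
    assert(fixup_offset + 4 <= section.size());

    // Whatever the compiler left in the field (zero here) acts as an addend.
    std::uint32_t addend = 0;
    for (int i = 0; i < 4; ++i)
        addend |= static_cast<std::uint32_t>(section[fixup_offset + i]) << (8 * i);

    // The displacement is measured from the end of the 4-byte field,
    // which is the address of the next instruction.
    std::uint32_t next =
        section_address + static_cast<std::uint32_t>(fixup_offset) + 4;
    std::uint32_t value = target_address - next + addend;

    // Store the result low-order byte first.
    for (int i = 0; i < 4; ++i)
        section[fixup_offset + i] = static_cast<std::uint8_t>(value >> (8 * i));
}
```

With the section placed at 00401000 and Foo at 00401030 (the addresses that appear in the linked A.exe), calling apply_rel32(text, 0x19, 0x00401000, 0x00401030) turns E8 00 00 00 00 into E8 13 00 00 00.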

 

(* 5 *): Look at the relocations that follow the code section:
RELOCATIONS #3
                                                Symbol    Symbol
 Offset    Type              Applied To         Index     Name
 --------  ----------------  -----------------  --------  ------
 00000019  REL32                      00000000        12  ?Foo@@YAXXZ (void __cdecl Foo(void))
 00000026  REL32                      00000000        13  __chkesp

The first fixup record says that the linker needs to calculate the relative offset to
function Foo and write that value to offset 0x19 in the section.
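
Each row of that table comes from a 10-byte COFF relocation record: a 4-byte offset, a 4-byte symbol table index, and a 2-byte type. This agrees with the section header, since 0x383 + 2 * 10 = 0x397, which is where the line numbers start. Below is a minimal sketch of decoding one record. The struct and function names are invented for illustration, 0x0014 is the documented value of IMAGE_REL_I386_REL32, and the symbol index 12 is assumed to be hexadecimal, as dumpbin prints it.

```cpp
#include <cassert>
#include <cstdint>

// One COFF relocation record as stored in the file (10 bytes, little-endian).
struct CoffRelocation {
    std::uint32_t offset;        // where in the section to apply the fixup
    std::uint32_t symbol_index;  // index into the COFF symbol table
    std::uint16_t type;          // fixup kind
};

constexpr std::uint16_t kRelI386Rel32 = 0x0014;  // IMAGE_REL_I386_REL32

// Hypothetical helper: decode the 10 bytes starting at `rec`.
CoffRelocation decode_relocation(const std::uint8_t* rec) {
    assert(rec != nullptr);
    CoffRelocation r;
    r.offset = static_cast<std::uint32_t>(rec[0]) |
               (static_cast<std::uint32_t>(rec[1]) << 8) |
               (static_cast<std::uint32_t>(rec[2]) << 16) |
               (static_cast<std::uint32_t>(rec[3]) << 24);
    r.symbol_index = static_cast<std::uint32_t>(rec[4]) |
                     (static_cast<std::uint32_t>(rec[5]) << 8) |
                     (static_cast<std::uint32_t>(rec[6]) << 16) |
                     (static_cast<std::uint32_t>(rec[7]) << 24);
    r.type = static_cast<std::uint16_t>(rec[8] | (rec[9] << 8));
    return r;
}
```

Under those assumptions, the first record would be stored as the bytes 19 00 00 00 12 00 00 00 14 00.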

(* 6 *): The code section of the example's A.exe reads as follows:
// A.cpp
:00401000 55            push ebp
:00401001 8BEC          mov ebp, esp
:00401003 83EC40        sub esp, 00000040
:00401006 53            push ebx
:00401007 56            push esi
:00401008 57            push edi
:00401009 8D7DC0        lea edi, dword ptr [ebp-40]
:0040100C B910000000    mov ecx, 00000010
:00401011 B8CCCCCCCC    mov eax, CCCCCCCC
:00401016 F3            repz
:00401017 AB            stosd
:00401018 E813000000    call 00401030    // the linker changed E800000000 to E813000000
:0040101D 5F            pop edi
:0040101E 5E            pop esi
:0040101F 5B            pop ebx
:00401020 83C440        add esp, 00000040
:00401023 3BEC          cmp ebp, esp
:00401025 E846000000    call 00401070
:0040102A 8BE5          mov esp, ebp
:0040102C 5D            pop ebp
:0040102D C3            ret

:0040102E CC            int 03
:0040102F CC            int 03
// Nothing is omitted in between.
* Referenced by a CALL at Address:
|:00401018
|
// Foo.cpp
:00401030 55            push ebp
:00401031 8BEC          mov ebp, esp
:00401033 83EC40        sub esp, 00000040
:00401036 53            push ebx
:00401037 56            push esi
:00401038 57            push edi
:00401039 8D7DC0        lea edi, dword ptr [ebp-40]
:0040103C B910000000    mov ecx, 00000010
:00401041 B8CCCCCCCC    mov eax, CCCCCCCC
:00401046 F3            repz
:00401047 AB            stosd
:00401048 68ECC04000    push 0040C0EC
:0040104D E85E000000    call 004010B0
:00401052 83C404        add esp, 00000004
:00401055 5F            pop edi
:00401056 5E            pop esi
:00401057 5B            pop ebx
:00401058 83C440        add esp, 00000040
:0040105B 3BEC          cmp ebp, esp
:0040105D E80E000000    call 00401070
:00401062 8BE5          mov esp, ebp
:00401064 5D            pop ebp
:00401065 C3            ret
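
The two int 03 bytes between main and Foo appear to be alignment padding: the code from A.obj is 0x2E bytes long and its section header asks for 16-byte alignment, so the next function starts at 00401030. The sketch below reproduces that layout arithmetic. The helper names are made up for illustration, and the 0xCC fill byte is taken from the listing rather than from any linker documentation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Round `address` up to the next multiple of `alignment` (a power of two).
std::uint32_t align_up(std::uint32_t address, std::uint32_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (address + alignment - 1) & ~(alignment - 1);
}

// Hypothetical helper: append `code` to `image` at the next aligned address,
// filling the gap with 0xCC (int 3). `base` is the address of image[0].
// Returns the address at which `code` was placed.
std::uint32_t place(std::vector<std::uint8_t>& image, std::uint32_t base,
                    const std::vector<std::uint8_t>& code,
                    std::uint32_t alignment) {
    std::uint32_t end = base + static_cast<std::uint32_t>(image.size());
    std::uint32_t start = align_up(end, alignment);
    image.insert(image.end(), static_cast<std::size_t>(start - end),
                 static_cast<std::uint8_t>(0xCC));
    image.insert(image.end(), code.begin(), code.end());
    return start;
}
```

Placing a 0x2E-byte block and then a 0x36-byte block from base 00401000 with 16-byte alignment yields start addresses 00401000 and 00401030, and a third block would land at 00401070, the address that both functions call in the listing.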

(* 7 *): Now look at the content of Foo.obj. (According to fooobj.txt, the file offset of its code section is 0x000003BF; W32Dasm is again used for the disassembly.)
:00000000 55            push ebp
:00000001 8BEC          mov ebp, esp
:00000003 83EC40        sub esp, 00000040
:00000006 53            push ebx
:00000007 56            push esi
:00000008 57            push edi
:00000009 8D7DC0        lea edi, dword ptr [ebp-40]
:0000000C B910000000    mov ecx, 00000010
:00000011 B8CCCCCCCC    mov eax, CCCCCCCC
:00000016 F3            repz
:00000017 AB            stosd
:00000018 6800000000    push 00000000    // the linker will fill in the address of the string data
:0000001D E800000000    call 00000022    // the linker will fill in the relative offset to printf
:00000022 83C404        add esp, 00000004
:00000025 5F            pop edi
:00000026 5E            pop esi
:00000027 5B            pop ebx
:00000028 83C440        add esp, 00000040
:0000002B 3BEC          cmp ebp, esp
:0000002D E800000000    call 00000032
:00000032 8BE5          mov esp, ebp
:00000034 5D            pop ebp
:00000035 C3            ret
In summary, by the time the linker combines the compilation units (.obj files), A.obj and Foo.obj have already recorded which locations need to be adjusted.
For example, the call to the Foo function in A.obj is at
:00000018 E800000000    call 0000001D    // note this call

RAW DATA #3
  00000000: 55 8B EC 83 EC 40 53 56 57 8D 7D C0 B9 10 00 00  U....@SVW.}.....
  00000010: 00 B8 CC CC CC CC F3 AB E8 00 00 00 00 5F 5E 5B  ............._^[
  00000020: 83 C4 40 3B EC E8 00 00 00 00 8B E5 5D C3        ..@;........].

The relocations that follow section #3:
                                                Symbol    Symbol
 Offset    Type              Applied To         Index     Name
 --------  ----------------  -----------------  --------  ------
 00000019  REL32                      00000000        12  ?Foo@@YAXXZ (void __cdecl Foo(void))
At link time, the linker merges the code sections, placing the code section of Foo.obj after the code section of A.obj, as follows:
:00401000 55            push ebp
....
:00401018 E813000000    call 00401030
....
:0040102D C3            ret
:0040102E CC            int 03
:0040102F CC            int 03
// Nothing is omitted in between.
* Referenced by a CALL at Address:
|:00401018
|
:00401030 55            push ebp
....
:00401065 C3            ret
In call 00401030, the 00400000 part of the target address is the preferred base address at which the image is loaded.
In E813000000, the bytes 13 00 00 00 are the offset value. Its actual value is 00000013; the byte order is a feature of Intel CPUs:
the low-order byte is stored first and the high-order byte last. (Note: a peculiarity of Intel processors, where numerical data is stored in
reverse order to character data.

To copy the 32-bit value 56 A7 00 FE into the eax register, you will find the opcode B8 (mov eax, imm32) followed by FE 00 A7 56:
B8 FE 00 A7 56)
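
The byte reversal can be checked with a small sketch. The function names are illustrative only; the code works on explicit byte arrays, so it gives the same result on any host.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Split a 32-bit value into the byte sequence an x86 instruction stores:
// low-order byte first, high-order byte last.
std::array<std::uint8_t, 4> to_little_endian(std::uint32_t value) {
    std::array<std::uint8_t, 4> bytes{};
    for (int i = 0; i < 4; ++i)
        bytes[i] = static_cast<std::uint8_t>(value >> (8 * i));
    return bytes;
}

// Reassemble the 32-bit value from four little-endian bytes.
std::uint32_t from_little_endian(const std::uint8_t* bytes) {
    assert(bytes != nullptr);
    std::uint32_t value = 0;
    for (int i = 0; i < 4; ++i)
        value |= static_cast<std::uint32_t>(bytes[i]) << (8 * i);
    return value;
}
```

Here to_little_endian(0x00000013) gives 13 00 00 00, the four bytes that follow E8 in the linked call, and to_little_endian(0x56A700FE) gives FE 00 A7 56.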

The call at 00401018 has to transfer control to 00401030, which is why the instruction is E813000000.
Working it out by hand:
The call instruction occupies 5 bytes: one for the call opcode (E8) and four for the offset value.
Since 00401018 + 5 = 0040101D, the offset is actually measured from 0040101D, the address of the next instruction.
Therefore, 00401030 - 0040101D = 13 (hex).
Therefore, the call instruction is E813000000.
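
The hand calculation can be written down as a pair of small functions, one that builds the five instruction bytes and one that recovers the target the way a disassembler does. This is a sketch with invented function names; it covers only the E8 rel32 form of call used throughout this example.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kCallLength = 5;  // E8 opcode + 4-byte displacement

// Build the bytes of "call target" for an instruction located at call_address.
std::array<std::uint8_t, 5> encode_call(std::uint32_t call_address,
                                        std::uint32_t target) {
    // The displacement is counted from the next instruction; unsigned
    // wraparound yields the two's-complement form for backward calls.
    std::uint32_t disp = target - (call_address + kCallLength);
    std::array<std::uint8_t, 5> insn{};
    insn[0] = 0xE8;
    for (int i = 0; i < 4; ++i)
        insn[1 + i] = static_cast<std::uint8_t>(disp >> (8 * i));
    return insn;
}

// Recover the destination address from an encoded call instruction.
std::uint32_t call_target(std::uint32_t call_address,
                          const std::array<std::uint8_t, 5>& insn) {
    assert(insn[0] == 0xE8);
    std::uint32_t disp = 0;
    for (int i = 0; i < 4; ++i)
        disp |= static_cast<std::uint32_t>(insn[1 + i]) << (8 * i);
    return call_address + kCallLength + disp;
}
```

For the call in main, encode_call(0x00401018, 0x00401030) produces E8 13 00 00 00. In the other direction, call_target applied to E8 5E 00 00 00 at 0040104D returns 004010B0, the target shown for that call in the A.exe listing.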

 
