Understanding ELF using readelf and objdump

Source: Internet
Author: User

Translator's note:
Original article address:
http://www.linuxforums.org/articles/understanding-elf-using-readelf-and-objdump_125.html
This is the first technical article I have translated. There may be typos, and some sentences may not read smoothly, partly because I type with the five-stroke (Wubi) input method. Corrections are welcome.
Text in [ ] is the translator's note (Note: XXXX). Italic text is the author's note.


Learning ELF through readelf and objdump

First, you should understand the three types of ELF object files:

  • Relocatable file: this kind of file holds code and data and must be linked with other object files to produce an executable file or a shared object file. In other words, a relocatable file is the building block from which executables and libraries are made.

If you compile source code as follows, you get such a file:

$ gcc -c test.c

This generates test.o, which is a relocatable file.

Kernel modules (*.o or *.ko) are also relocatable files.

  • Executable file: this kind of object file holds an executable program (made up of executable binary code). Your MP3 player, your VCD player, and even your text editor are all ELF executable files.

Compile a program as follows to get one:

$ gcc -o test test.c

After making sure the executable bit of "test" is set (the file's execute permission in Linux), you can run it. A question arises: how is a shell script executed? A shell script is not an ELF executable, but its interpreter is.

  • Shared object file: this kind of file holds code and data and can be used in two ways.

1. The link editor can process it together with other relocatable files and shared object files to generate another object file. [This works like linking a static library together with other *.o files.]

2. The dynamic linker combines it with an executable file and other shared objects to create a process image.

In short: these are the familiar *.so files (usually found in /usr/lib).

Are there other ways to determine the type of an ELF file? Of course. Every ELF file has a file header with a field that records the file type. If you have the binutils package installed, you can read this header with the readelf command. For example (the command output is trimmed to show only the relevant fields):

$ readelf -h /bin/ls
  Type: EXEC (Executable file)

sn@ubuntu:~$ readelf -h /bin/ls
ELF Header:
  Magic:   7f 45 4c 46 01 01 00 00 00 00 00 00 00 00 00 00
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Intel 80386
  Version:                           0x1
  Entry point address:               0x8049d60
  Start of program headers:          52 (bytes into file)
  Start of section headers:          95164 (bytes into file)
  Flags:                             0x0
  Size of this header:               52 (bytes)
  Size of program headers:           32 (bytes)
  Number of program headers:         9
  Size of section headers:           40 (bytes)
  Number of section headers:         29
  Section header string table index: 28

$ readelf -h /usr/lib/crt1.o
  Type: REL (Relocatable file)

sn@ubuntu:~$ readelf -h /usr/lib/crt1.o
ELF Header:
  Magic:   7f 45 4c 46 01 01 00 00 00 00 00 00 00 00 00 00
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              REL (Relocatable file)
  Machine:                           Intel 80386
  ........

$ readelf -h /lib/libc-2.3.2.so
  Type: DYN (Shared object file)

sn@ubuntu:~$ readelf -h /lib/libcap.so.2
ELF Header:
  Magic:   7f 45 4c 46 01 01 00 00 00 00 00 00 00 00 00 00
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              DYN (Shared object file)
  Machine:                           Intel 80386
  ......

The "file" command can also identify the type of an object file, but I won't discuss it here. Let's focus on readelf and objdump and start learning them.

To make ELF easier to study, we will use the following simple C program:

/* test.c */
#include <stdio.h>

int global_data = 4;
int global_data_2;

int main(int argc, char **argv)
{
    int local_data = 3;

    printf("Hello World\n");
    printf("global_data = %d\n", global_data);
    printf("global_data_2 = %d\n", global_data_2);
    printf("local_data = %d\n", local_data);
    return (0);
}

And compile it:

$ gcc -o test test.c

A. View the ELF header

The generated binary is what we will examine. Let's start with the ELF header:

$ readelf -h test
ELF Header:
  Magic:   7f 45 4c 46 01 01 00 00 00 00 00 00 00 00 00 00
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Intel 80386
  Version:                           0x1
  Entry point address:               0x80482c0
  Start of program headers:          52 (bytes into file)
  Start of section headers:          2060 (bytes into file)
  Flags:                             0x0
  Size of this header:               52 (bytes)
  Size of program headers:           32 (bytes)
  Number of program headers:         7
  Size of section headers:           40 (bytes)
  Number of section headers:         28
  Section header string table index: 25

What does this header tell us?

This executable runs on a 32-bit Intel x86 machine (from the "Machine" and "Class" fields).

When executed, the program starts running at virtual address 0x080482c0 (see "Entry point address"). This address is not the address of our familiar main() function; it points to a function named _start. You don't remember writing it, do you? Of course not: the _start function is supplied by the linker, and its purpose is to initialize your program.

This program has 28 sections and 7 segments. [Translator's note: in some ELF articles I read recently, "section" and "segment" are translated with the same word, so pay attention to which one is meant when reading.]

What is a section? A section is an area of the object file that holds information useful for linking: program code, program data (variables, arrays, strings), relocation information, and so on. Each area holds a single kind of information, so each has a clear meaning: the code section contains only code, and the data sections contain only initialized or uninitialized data. The section header table (SHT) tells us exactly which sections the ELF object contains. From the "Number of section headers" field, at least, we know that "test" has 28 sections.

Sections describe the binary from the linker's point of view, but the Linux kernel does not load a program section by section. Instead, the kernel prepares several VMAs (virtual memory areas), each a range of virtually contiguous pages. One or more sections are mapped inside each VMA, and each VMA here represents an ELF segment. How does the kernel know which section goes into which segment? That is the job of the program header table (PHT).

Figure: two different views of the ELF structure.

B. View the section header table (SHT)

Let's look at the sections in our program:

$ readelf -S test

There are 28 section headers, starting at offset 0x80c:

Section Headers:
  [Nr] Name       Type      Addr     Off    Size   ES Flg Lk Inf Al
  [ 4] .dynsym    DYNSYM    08048174 000174 000060 10   A  5   1  4
  ........
  [11] .plt       PROGBITS  08048290 000290 000030 04  AX  0   0  4
  [12] .text      PROGBITS  080482c0 0002c0 0001d0 00  AX  0   0  4
  ........
  [20] .got       PROGBITS  080495d8 0005d8 000004 04  WA  0   0  4
  [21] .got.plt   PROGBITS  080495dc 0005dc 000014 04  WA  0   0  4
  ........
  [22] .data      PROGBITS  080495f0 0005f0 000010 00  WA  0   0  4
  [23] .bss       NOBITS    08049600 000600 000008 00  WA  0   0  4
  ........
  [26] .symtab    SYMTAB    00000000 000c6c 000480 10     27  2c  4
  ........

The compiler puts executable code in the .text section. The .text section is marked executable ('X' in the Flg field). In this section you can find the machine code of our main() function.

$ objdump -d -j .text test

The -d option tells objdump to disassemble the machine code. -j tells objdump to look only at a specific section (.text in this example). The following is part of the output.

08048370 <main>:
 .......
 8048397: 83 ec 08             sub    $0x8,%esp
 804839a: ff 35 fc 95 04 08    pushl  0x80495fc
 80483a0: 68 c1 84 04 08       push   $0x80484c1
 80483a5: e8 06 ff ff ff       call   80482b0
 80483aa: 83 c4 10             add    $0x10,%esp
 80483ad: 83 ec 08             sub    $0x8,%esp
 80483b0: ff 35 04 96 04 08    pushl  0x8049604
 80483b6: 68 d3 84 04 08       push   $0x80484d3
 80483bb: e8 f0 fe ff ff       call   80482b0
 .......

The .data section holds all initialized variables that do not live on the stack. "Initialized" means that the variable is given an initial value, such as "global_data". What about "local_data"? Its value is not in this section; it lives on the process stack.

Use objdump to view the .data section:

$ objdump -d -j .data test

.....

080495fc <global_data>:
 80495fc: 04 00 00 00 ......    [modified here by the translator]

We can see that objdump translates addresses into symbols for us, so we do not need to look them up in the symbol table ourselves. We learn that 080495fc [0x08049424 in the original article] is the address of global_data, and that its initial value is 4. Note that ordinary executables shipped with Linux are usually stripped and have no symbol table, which makes it hard for objdump to annotate addresses like this.

What about .bss? BSS (Block Started by Symbol) is the section that holds uninitialized variables. [Translator's note: this section is mapped, not stored. It occupies no space in the object file, but in the process it has real space; that is, uninitialized variables are created when the program runs, and no space is reserved for them in the file on disk.] You might think that "everything should have a definite initial value". Indeed, in Linux all uninitialized global and static variables are set to 0, which is why .bss contains only zeros. For a variable of character type, that is the null character. Knowing this, we know that global_data_2 is set to 0 at run time.

$ objdump -d -j .bss test

Disassembly of section .bss:

.....

08049604 <global_data_2>:
 8049604: 00 00 00 .........

We mentioned the symbol table earlier. This table lets us find the association between a symbol name (of a non-external function or variable) and an address. With the -s option, readelf decodes the symbol table.

$ readelf -s ./test

Symbol table '.dynsym' contains 6 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
 .....
                       FUNC    GLOBAL DEFAULT  UND printf@GLIBC_2.0 (2)
 .....

Symbol table '.symtab' contains 72 entries:
   Num:    Value  Size Type    Bind   Vis      Ndx Name
 .....
    49: 080495fc     4 OBJECT  GLOBAL DEFAULT   22 global_data
 .....
    55: 08048370   109 FUNC    GLOBAL DEFAULT   12 main
 .....
    59: 00000000    57 FUNC    GLOBAL DEFAULT  UND printf@GLIBC_2.0
 .....
    61: 08049604     4 OBJECT  GLOBAL DEFAULT   23 global_data_2
 .....

"Value" is the address that the symbol corresponds to. For example, when an instruction refers to this address (pushl 0x80495fc), it is referring to global_data. The printf() symbol is handled differently because it is the symbol of an external function: printf is defined in glibc, not in our test program. Later I will explain how our test program calls printf.

C. View the program header table (PHT)

As I explained earlier, segments are how the OS "sees" our program. Let's see how our program is divided into segments:

$ readelf -l test

There are 7 program headers, starting at offset 52

Program Headers:
       Type     Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  [00] PHDR     0x000034 0x08048034 0x08048034 0x000e0 0x000e0 R E 0x4
  [01] INTERP   0x000114 0x08048114 0x08048114 0x00013 0x00013 R   0x1
  [02] LOAD     0x000000 0x08048000 0x08048000 0x004fc 0x004fc R E 0x1000
  [03] LOAD     0x0004fc 0x080494fc 0x080494fc 0x00104 0x0010c RW  0x1000
  [04] DYNAMIC  0x000510 0x08049510 0x08049510 0x000c8 0x000c8 RW  0x4
  [05] NOTE     0x000128 0x08048128 0x08048128 0x00020 0x00020 R   0x4
  [06] STACK    0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0x4

 Section to Segment mapping:
  Segment Sections...
   00
   01     .interp
   02     .interp .note.ABI-tag .hash .dynsym .dynstr .gnu.version .gnu.version_r .rel.dyn .rel.plt .init .plt .text .fini .rodata .eh_frame
   03     .ctors .dtors .jcr .dynamic .got .got.plt .data .bss
   04     .dynamic
   05     .note.ABI-tag
   06

Note: I added the [x] line numbers to this output. They do not appear in the actual output.

This mapping is straightforward. For example, segment 2 has 15 sections mapped into it, including the .text section. Its flags are R and E, meaning readable and executable. W means writable.

Take a look at the "VirtAddr" column: it gives the virtual start address of each segment. The start address of segment 2 is 0x08048000. Later in this section we will find that this is not exactly where the segment ends up in memory. Ignore "PhysAddr", because Linux always runs in protected mode (on 32-bit and 64-bit Intel/AMD CPUs), so we only care about virtual addresses.

There are many types of segments. We only care about two types:

  • LOAD: the content of this segment is loaded from the executable file. "Offset" is the position in the file where the kernel should start reading. "FileSiz" tells us how many bytes are read from the file. For example, segment 2 contains the file content from 0 to 0x4fc. For fast execution, the file content is read into memory only when needed. [Translator's note: "load" here means mapping into the user's virtual address space, not copying the data to physical pages.]
  • STACK: this segment is the stack area. Interestingly, all of its fields are 0 except "Flg" and "Align". Is something wrong? No. The kernel decides where the stack starts and how large it is. Remember: on Intel CPUs the stack grows downward (the address decreases each time something is pushed onto the stack).

Curious to see the real layout of the segments in memory? We can see it in the /proc/<pid>/maps file, where <pid> is the ID of the process we want to inspect. There is one small problem first: our test process finishes so quickly that it is gone before we can look in /proc. I use GDB to solve this. You could also call sleep() before the return statement.

In one console (or a terminal emulator such as xterm):

$ gdb test
(gdb) b main
Breakpoint 1 at 0x8048376
(gdb) r
Breakpoint 1, 0x08048376 in main ()

Leave it stopped here, open another console, and find the PID of test. To save time, you can run:

$ cat /proc/`pgrep test`/maps

You will see the following output: (your output may be a little different)

[1]  0039d000-003b2000 r-xp 00000000 1080084 /lib/ld-2.3.3.so
[2]  003b2000-003b3000 r--p 00014000 1080084 /lib/ld-2.3.3.so
[3]  003b3000-003b4000 rw-p 00015000 1080084 /lib/ld-2.3.3.so
[4]  003b6000-004cb000 r-xp 00000000 1080085 /lib/tls/libc-2.3.3.so
[5]  004cb000-004cd000 r--p 00115000 1080085 /lib/tls/libc-2.3.3.so
[6]  004cd000-004cf000 rw-p 00117000 1080085 /lib/tls/libc-2.3.3.so
[7]  004cf000-004d1000 rw-p 004cf000 00:00 0
[8]  08048000-08049000 r-xp 00000000 66970 /tmp/test
[9]  08049000-0804a000 rw-p 00000000 66970 /tmp/test
[10] b7fec000-b7fed000 rw-p b7fec000 00:00 0
[11] bffeb000-c0000000 rw-p bffeb000 00:00 0
[12] ffffe000-fffff000 ---p 00000000 00:00 0

Note: I added the line number of [x] to these outputs. It does not exist in the actual output.


Return to GDB and enter:

(gdb) q

In the end, we saw 12 segments (actually VMAs). Focus on the first and last fields. The first field shows the address range of the VMA, and the last field shows the file backing it. Do you see the similarity between VMA 8 and segment 2 in the PHT? The difference is that the PHT says segment 2 ends at 0x080484fc, while VMA 8 ends at 0x08049000. The same happens between VMA 9 and segment 3: the PHT says segment 3 starts at 0x080494fc, but the VMA starts at 0x08049000.

We must keep the following facts in mind:

1. Although the VMA starts at a different address, the sections inside it are still mapped at their exact virtual addresses.

2. The kernel allocates memory in 4 KB pages, so the address of every page is a multiple of 4 KB (0x1000). For VMA 9, the page address is 0x08049000. Put technically, the boundaries of a segment are aligned to the page size.

Which VMA is the stack? VMA 11. Generally, the kernel dynamically allocates several pages and maps them at the highest possible virtual addresses in user space; this is the stack region. Simply put, the address space of each process is divided into two parts (assuming a 32-bit CPU): user space and kernel space. User space is 0x00000000-0xc0000000, so kernel space lies at 0xc0000000 and above.

Therefore, the stack is allocated near the 0xc0000000 boundary. Its end address is fixed, and its start address changes depending on how much is stored on the stack.

D. How is a function referenced?

When a program calls one of its own functions (one that lives in the executable itself), it is easy: it simply calls the procedure. But what if it calls a function defined in the glibc shared library, such as printf?

Here we will not discuss in depth how the dynamic linker works. I will focus on how the calling mechanism works inside the executable. With that premise, let's continue.

When a program wants to call a function, it follows this sequence:

1. It jumps to the entry for that function in the PLT (Procedure Linkage Table).

2. In the PLT, there is a jump to the address stored in the related entry of the GOT (Global Offset Table).

3. If this is the first time the function is called, go to step 4. Otherwise, go to step 5.

4. The related GOT entry contains an address that points back to the next instruction in the PLT. The program jumps to this address and then calls the dynamic linker to resolve the function's address. Once the address is found, it is written into the related GOT entry and the function is executed.

So the next time the function is called, the GOT already holds its address, and the PLT jumps directly to it. This mechanism is called "lazy binding": external symbols are not resolved to addresses until they are actually needed for the first time (in this example, a function symbol is resolved only when the function is called). Now go to step 6.

5. Jump to the address stored in the GOT. This address is the address of the function, so there is no need to go through the dynamic linker.

6. After the function finishes, execution returns to the instruction following the call in the caller.

Generally, the best way to examine the contents of an executable is to disassemble it. You can do this:

$ objdump -d -j .text test

You will see code like the following:

...
08048370 <main>:
 ...
 804838f: e8 1c ff ff ff       call   80482b0    [this is the address of the PLT entry]

Let's see what is at 0x80482b0:

080482b0:    [in the PLT, each entry is 16 bytes long and is a small piece of assembly code]
 80482b0: ff 25 ec 95 04 08    jmp    *0x80495ec    [this is the address of a GOT entry; that GOT entry corresponds to this PLT entry]
 80482b6: 68 08 00 00 00       push   $0x8
 80482bb: e9 d0 ff ff ff       jmp    8048290 <_init+0x18>

As you can see, at 0x80482b0 there is an indirect jump through *0x80495ec (note the * before the address). So let's see where it goes by looking at what is stored at 0x80495ec. That address must be in either .got or .got.plt. Looking back at the SHT, we find it in .got.plt. I use readelf to produce a hex dump:

$ readelf -x 21 test

Hex dump of section '.got.plt':
  0x080495dc 080482a6 00000000 00000000 08049510 ................
  0x080495ec 080482b6                            ....

Note that the first column is the virtual address. The data at that address is in the fifth column, not the second.

Yes! Here is our "080482b6". In other words, we are back in the PLT [at the second instruction of the corresponding PLT entry, push $0x8], where we jump to yet another address. From there the work is done by the dynamic linker, so we skip it. Assume the dynamic linker has finished its job and stored the address of the printf function in the GOT entry.

E. Other tools for examining the ELF structure

Besides readelf and objdump, there is another tool, Beye. It is a file viewer that can parse the ELF structure. You can get the source from http://beye.sourceforge.net and compile it yourself.

Beye is also commonly included on Linux live CDs.

I personally like Beye because it provides a GUI-like display. You can navigate between sections, view the ELF header, list the symbol table, and do other tasks with just a few keystrokes.

For example, you can list the symbols and jump directly to a symbol's address. Let's try to jump to the main function. First, start Beye:

$ beye test

Press F7 first, and then press Ctrl+A to view the symbol table. To save time, press F7 to open the "Find string" menu. Enter "main" and press Enter. The highlighted item is the one you are looking for. Press Enter to jump to the address of main. Do not forget to switch to assembly mode (selected with F2) so that you see the machine code in a readable (assembly) form.

Figure: the symbol list shown by Beye.

We usually want to see virtual addresses rather than file offsets, so it is better to switch to the virtual address view. Press F6, then Ctrl+C, and select "local". The leftmost column now shows virtual addresses.

Summary

This article is only an introduction to the ELF structure. You can start with readelf and objdump and, if you like, use Beye to explore the inside of a binary quickly. Practice with what you have learned to become familiar with it.

Further reading:

http://www.linuxjournal.com/article/1059
http://www.linuxjournal.com/article/1060

A good introductory ELF article written by Eric Youngdale.

http://en.wikipedia.org/wiki/Executable_and_Linkable_Format

The Wikipedia entry on ELF, where you can find links to other useful articles.

http://en.wikipedia.org/wiki/Executable_and_Linkable_Format

This document describes the structure of ELF in full. After reading it, you will have a comprehensive understanding of ELF.
