Learning the ELF File Format in C

Source: Internet
Author: User


My recent lab work involves ELF files, so I took the opportunity to learn about the ELF format.

ELF (Executable and Linkable Format) is a file format. For now, I only look at the ELF format of executable files.

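First, the overall layout of the file: an ELF executable begins with the ELF header, followed by the program header table, then the contents of the sections (such as .text and .data), and finally the section header table.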

That is hard to grasp in the abstract, so let's write a small program and inspect it with readelf.

The program is relatively simple:

 

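    #include <stdio.h>
    #include <stdlib.h>

    int data[100] = {0};   /* global array with an (all-zero) initializer */
    int bss[100];          /* global array without an initializer */

    int main()
    {
        int i = 0;
        for (i = 0; i < 100; i++)
            bss[i] = i;
        printf("the bss[3]= %d\n", bss[3]);
        return 1;
    }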

 

 

First, run readelf -h to inspect the ELF header:

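The first field, Magic, is the magic number; programs use it to confirm that what they are reading really is an ELF header. The first byte is always 0x7f, and the next three bytes, 45 4c 46, are the ASCII codes of 'E', 'L' and 'F'. The bytes after that are not meaningless: the next 01 is the file class (01 = 32-bit), the following 01 is the data encoding (01 = little-endian), and the next 01 is the ELF version; the remaining bytes hold the OS/ABI and ABI version, followed by zero padding. Every program that reads an ELF header checks the magic number first, so that it does not mistake some other file for an ELF file.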

The next fields (Class, Data, Version, OS/ABI, ABI Version, Type and Machine) describe the target machine, the operating system, and the file version. They are not the focus this time, so just take a look at them.

The next field, Entry point address 0x8048330, is the program's entry address: once the program has been loaded, the first instruction executed is the one at this address. In other words, the operating system creates the process, maps the relevant parts of the file into its virtual address space, and, when all the preparation is done, sets eip to 0x8048330 and starts executing. (For a dynamically linked program, the kernel actually starts the dynamic linker named by the INTERP program header first, and the dynamic linker then jumps to this address.) When first learning C, many people assume main is required because it is the program's entry point, the first code the program executes. If that were true, the function at 0x8048330 would have to be main.

To check, we can disassemble the program with objdump -d:

As you can see, the code at 08048330 is a function called _start, not the main function we take for granted. Why? Looking at the body of <_start>, we find that it mainly prepares registers and pushes arguments onto the stack, and then calls a function named <__libc_start_main@plt>. In other words, other code runs before the program actually reaches main; main is not the first function to be executed. So what do the functions before main do?

The answer is fairly simple. As soon as main starts, its local variables already live on the stack, and it can call malloc (or new in C++) right away to allocate heap memory. But who chose where the stack starts and set things up? None of that happens inside main. Likewise, main can use stdin and stdout without ever opening them. So the code that runs before main, starting with _start, does this kind of work. On Linux, the kernel itself builds the initial stack (argc, argv and the environment) and the process inherits file descriptors 0, 1 and 2 already open; the C library's startup code then initializes the runtime, including the standard I/O streams, before calling main.

Conceptually, it does something like the following (a simplified sketch, not the real code):

/*************************************************

_start:
    init stack;
    init heap;
    open stdin;
    open stdout;
    open stderr;
    ...
    push argv;
    push argc;
    call main;
    ...
    destroy heap;
    close stdin;
    close stdout;
    close stderr;
    ...
    call _exit;

*************************************************/

So main is just an intermediate function in the life of the program, not the first function it executes!


Next, Start of program headers: 52 (bytes into file)

Start of section headers: 5120 (bytes into file)

These are the positions of the program header table and the section header table, given as byte offsets from the beginning of the file, which is also where the ELF header starts.

A program is always read from disk into memory, so think of these as positions on disk. For example, if the program file starts on disk at 0x1000, the program header table starts at 0x1000 + 52 = 0x1034 and the section header table at 0x1000 + 5120 = 0x2400. Note that these are disk (file) positions, not memory addresses.

What are the program header table and the section header table? Roughly speaking, the program header table describes everything the program needs to load from disk into memory. The system divides the content to be loaded into blocks (segments) and needs a table to manage and record information about them; that table is the program header table. Likewise, the section header table records information about each section. Both tables are arrays of structs, and the two offsets above point to the first element of each array.

The Flags field that follows holds processor-specific flags. I haven't looked into it yet.

Next: Size of this header: 52 (bytes)

This means the ELF header itself is 52 bytes long. Together with Start of program headers: 52 (bytes into file), it tells us that the program header table begins immediately after the last byte of the ELF header, which is consistent with the ELF layout above.


Size of program headers: 32 (bytes)

This is the size of each entry in the program header table. As mentioned above, the table is an array of structs, and 32 bytes is the size of one struct.

Number of program headers: 9

This is the number of entries in the program header table. Multiplying it by the entry size gives the size of the whole table: 9 * 32 = 288 bytes.

 

Size of section headers: 40 (bytes)
Number of section headers: 36

These two values describe the section header table in the same way as for the program header table: each entry is 40 bytes and there are 36 entries, so the table occupies 36 * 40 = 1440 bytes.

On disk, the ELF header is laid out as a C struct. For a 32-bit file it corresponds to something like the following definition:

 

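    #include <stdint.h>

    #define ELF_MAGIC 0x464C457FU   /* "\x7FELF" read as a little-endian uint32_t */

    struct Elf {
        uint32_t e_magic;      /* must equal ELF_MAGIC */
        uint8_t  e_elf[12];    /* rest of e_ident: class, data, version, OS/ABI, padding */
        uint16_t e_type;
        uint16_t e_machine;
        uint32_t e_version;
        uint32_t e_entry;      /* entry point address */
        uint32_t e_phoff;      /* offset of the program header table */
        uint32_t e_shoff;      /* offset of the section header table */
        uint32_t e_flags;
        uint16_t e_ehsize;     /* size of this header: 52 */
        uint16_t e_phentsize;  /* size of one program header */
        uint16_t e_phnum;      /* number of program headers */
        uint16_t e_shentsize;  /* size of one section header */
        uint16_t e_shnum;      /* number of section headers */
        uint16_t e_shstrndx;   /* index of the section-name string table */
    };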

Each field corresponds to one line of the readelf -h output above.

 

 

Next, let's look at the program headers (readelf -l):

These are the entries of the program header table. Together they describe everything that has to be copied from the file into memory for the program to run. This program has 9 program headers in total, each with its own attributes.


The type of the first entry is PHDR, meaning that this segment holds the program header table itself. Offset is where the segment's contents start in the file, measured from the beginning of the file; the system uses it to find the data it must read from disk. Data read from disk has to be placed somewhere in memory, and VirtAddr is the virtual address at which the segment starts. PhysAddr exists for systems that use physical addressing and is normally the same as VirtAddr. FileSiz and MemSiz are the segment's sizes in the file and in memory; they can differ, but FileSiz <= MemSiz always holds. Flg gives the segment's permissions: R means readable, W writable, and E executable. Finally, Align is the segment's alignment: 0x4 means 4-byte alignment and 0x1000 means 4 KiB alignment.

The first program header's type says that it describes the program header table. Its Offset is 0x34, and 0x34 = 52, which matches the start of the program header table given in the ELF header. Its FileSiz is 0x120 = 288 bytes; the ELF header tells us the table has 9 entries of 32 bytes each, so the whole table is 9 * 32 = 288 bytes, which is also consistent. Next, use gdb to examine the memory at 0x8048034.


Each program header is 32 bytes (0x20), so every two lines of the gdb output make up one program header. The values match exactly what readelf reports, which confirms that the program header table is stored at this address.


The remaining eight program headers have the same structure but different types. The types mean the following:

PHDR holds the program header table itself.

INTERP names the interpreter that must be invoked after the program has been mapped from the file into memory. "Interpreter" here does not mean that the binary's contents are interpreted by another program; it refers to the dynamic linker, which loads the other libraries the program needs and resolves its unresolved references.
LOAD describes a segment that is mapped from the file into the virtual address space. In this program, one LOAD segment holds the code and constant data (such as strings), and the other holds the writable data (.data and .bss).
The DYNAMIC segment holds information used by the dynamic linker (that is, the interpreter named in INTERP).
NOTE holds auxiliary, vendor-specific information.

Looking closely, the two LOAD segments cover the contents of almost all the other seven program headers, which can also be confirmed from their address ranges in virtual memory. So when the program is loaded, only the two LOAD segments actually need to be loaded; the other program headers mainly exist to make particular content easy to find.

 

readelf -l also prints the section-to-segment mapping, which shows the sections contained in each program header:

Here we see the familiar .text, .data and .bss sections. So every segment described by a program header is really made up of sections of the program. The whole file is divided into sections in a particular way, and the program headers group together the sections that need to be loaded into memory. This is also reflected in the file layout, which has no separate region of segment data apart from the program header table itself.


Next: the section headers (readelf -S)

An executable has many sections, so not all of them are shown; only part of the list is given:

Here we mainly look at the .text, .data and .bss sections.

.text is the program's code section. Its address in memory is 0x8048330, the same as the entry point of the whole program, because _start sits at the beginning of .text.

Next come the .data and .bss sections. Look at .bss first: the .comment section listed right after .bss has the same file offset. This is what people mean when they say .bss takes up no space in the file. .bss holds global variables that are not initialized (and, with GCC, those explicitly initialized to zero). In C, uninitialized global variables are initialized to 0, so there is no need to store them in the file: every variable in .bss is known to be 0. The section table only needs to record how much memory the .bss variables occupy in total, which is the Size of .bss shown above (its type is NOBITS).

This can be verified with the program headers above: .data and .bss both fall in program header 03.

Here is that entry on its own:


In this program header, FileSiz and MemSiz differ: 0x45c - 0x100 = 0x35c, which is 0xc bytes more than the size of .bss. I guess this is related to alignment: the segment starts at 0x08049f14 in memory and ends at 0x08049f14 + 0x45c = 0x0804a370, and without those extra 0xc bytes the following content could not be properly aligned, so the linker adds them.

So how are the variables in .bss initialized to 0?

When the program is loaded into memory, the loader copies only FileSiz bytes of the segment from the file into memory and then fills the rest of the segment, up to MemSiz, with zeros.

The section-to-segment mapping above shows that .bss sits at the end (the highest addresses) of its segment, and therefore at the end of the program's whole loaded image. So zero-filling the tail of the segment in this way clears every .bss variable, which completes their initialization to 0. Where each variable is located can then be looked up in the symbol table.

The symbol table (only the entries for data and bss are listed):

In the program, there are two global variables:

 

In the symbol table, Value is the starting address of the variable and Size is the amount of memory it occupies. Both variables are int arrays of 100 elements, so each has size 400. Ndx is the index of the section the symbol belongs to; looking up section 25 in the section table shows that it is .bss. data ends up in .bss as well because its initializer is all zeros, just like an uninitialized global: GCC places zero-initialized variables in .bss by default (this can be turned off with -fno-zero-initialized-in-bss), since there is no point storing zeros in the file.

Now compile another, even simpler program:

 

    #include <stdio.h>
    #include <stdlib.h>

    int d = 10;   /* initialized global */
    int b;        /* uninitialized global */

    int main()
    {
        int i = 0;
        printf("the out is  %d\n", b + d + i);
        return 1;
    }

 

 

Its symbol table and section table are:

The section indices of d and b are exactly those of .data and .bss respectively. d is at address 0x0804a014, while .data starts at 0x0804a00c and has size 0xc, so d occupies the last four bytes of .data. Similarly, b is stored in the last four bytes of .bss.

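The modified program is consistent with the analysis above: an initialized global such as d is stored in .data and copied from the file, while an uninitialized one such as b lives in .bss and is zero-filled at load time. The symbol table records where each variable ended up, which is how we can locate them.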


Summary:

The whole file is organized through the ELF header, which records where everything else in the file is. The program is divided into n sections, and information about them is stored in the section header table. The program headers describe the segments, that is, the parts of those sections that need to be loaded into memory. The program header table records each segment's location and size in the file and its address and size in memory. To run the program, the system first finds the program header table, loads the segments into memory according to the information in the program headers, and then starts executing at the entry point.

Studying ELF cleared up a few questions that had been vague to me:

Why does .bss take up 0 bytes in the file?

How are the global variables in a program initialized?

I also hadn't worked out whether the "segment" in "bss segment" and "data segment" is the same thing as a segment in memory management. Now I understand all of these. Learning this was worthwhile; it made several problems clearer.

 

 

 

 

 

 

 

Copyright Disclaimer: This article is an original article by the blogger and cannot be reproduced without the permission of the blogger.

 
