Object File Format


The file a compiler produces from source code is called an object file. What exactly is stored in it, and how is the source code represented after compilation?
Structurally, an object file is already in the compiled executable file format, but it has not been linked yet, so some symbols and addresses have not been resolved. It is stored in the executable file format and differs only slightly in structure from a real executable.
The executable file format touches every stage of a program's life: compilation, linking, loading, and execution.

3.1 Object file format
The popular executable file formats on the PC platform today are PE (Portable Executable) on Windows and ELF (Executable and Linkable Format) on Linux. Both are variants of COFF (Common Object File Format). An object file is the intermediate file produced by compiling source code without linking it. Its content and structure are similar to those of an executable, so it is generally stored in the same format.
Executables (.exe on Windows, ELF executables on Linux) are not the only files stored in the executable file format. Dynamic link libraries (.dll on Windows, .so on Linux) and static link libraries (.lib on Windows, .a on Linux) use it as well: PE-COFF on Windows and ELF on Linux. Strictly speaking, a static library is an archive that bundles object files stored in this format.
The ELF standard classifies ELF files into four categories: relocatable files (such as .o object files), executable files, shared object files (such as .so libraries), and core dump files.


In Linux, we can use the file command to view the corresponding file format.
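To make the four categories concrete, here is a minimal sketch in C of the kind of check involved: it inspects the ELF magic number and the e_type field in the first bytes of a file. The function name elf_kind is hypothetical; the field offsets and the ET_* values follow the ELF specification. This is an illustration only, not how the file command is actually implemented.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Values of the e_type field as defined by the ELF specification. */
enum { ET_REL = 1, ET_EXEC = 2, ET_DYN = 3, ET_CORE = 4 };

/*
 * Classify an in-memory copy of the first bytes of a file.
 * Returns a short description, or "not an ELF file" if the magic
 * number does not match or the buffer is shorter than 18 bytes.
 * (elf_kind is a hypothetical helper written for this example.)
 */
const char *elf_kind(const unsigned char *buf, size_t len)
{
    static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };
    uint16_t type;

    if (len < 18 || memcmp(buf, magic, 4) != 0)
        return "not an ELF file";

    /* e_ident[5] (EI_DATA): 1 = little-endian, 2 = big-endian.
     * e_type is the 2-byte field at offset 16, right after e_ident. */
    if (buf[5] == 2)
        type = (uint16_t)((buf[16] << 8) | buf[17]);
    else
        type = (uint16_t)(buf[16] | (buf[17] << 8));

    switch (type) {
    case ET_REL:  return "relocatable file";
    case ET_EXEC: return "executable file";
    case ET_DYN:  return "shared object file";
    case ET_CORE: return "core dump file";
    default:      return "unknown ELF type";
    }
}
```

To try it, read the first 18 bytes of a file with fread and pass the buffer to elf_kind. Note that position-independent executables are also marked ET_DYN, so they would be reported as shared object files by this sketch.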

3.2 What is in an object file?
An object file must contain at least the compiled machine instructions and data. Beyond that, it also holds information needed for linking, such as the symbol table, debugging information, and strings. This information is generally stored by attribute in sections, sometimes also called segments. Both terms denote a region of a certain length and are normally used interchangeably; the only distinction arises between the ELF linking view and loading view, which is covered later.
The machine instructions compiled from the program source are usually placed in the code section, commonly named ".code" or ".text". Global variables and local static variables are usually placed in the data section, generally named ".data". Consider the structure of a simple program after it has been compiled into an object file.

Assume the executable (or object) file is in ELF format. An ELF file starts with a file header, which describes the attributes of the whole file: whether it is executable, whether it is statically or dynamically linked, the entry address (if it is an executable), the target hardware, the target operating system, and so on. The file header also refers to a section table, an array that describes every section in the file: its offset within the file and its attributes. All information about each section can be obtained from the section table. The file header is followed by the contents of the sections; for example, the code section stores program instructions.
Generally, machine code compiled from C is stored in the .text section, initialized global variables and local static variables are stored in the .data section, and uninitialized global variables and local static variables are put in the .bss section. Uninitialized global variables and local static variables default to 0. They could be placed in .data, but because they are all 0, allocating space in the data section to store zeros is unnecessary. They do occupy memory when the program runs, so the executable must record the total size of all uninitialized global variables and local static variables; that record is the .bss section. The .bss section therefore only reserves a place for these variables. It has no contents, so it occupies no space in the file.
In general, compiled source code falls into two categories: program instructions and program data. The code section holds program instructions, while the .data and .bss sections hold program data.
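The difference between .data and .bss can be sketched with two arrays of the same size. All names below are hypothetical, and the section placement in the comments describes what GCC on ELF typically does rather than a guarantee; comparing object file sizes with and without the initializer is one way to check it on a given toolchain.

```c
#include <stddef.h>

#define TABLE_LEN 1024

/* No initializer: typically placed in .bss, so the object file only
 * records the size of the array, not a block of zeros. */
static int zero_table[TABLE_LEN];

/* Nonzero initializer: typically placed in .data, so the whole array
 * (first element 1, the rest 0) is stored in the object file. */
static int init_table[TABLE_LEN] = { 1 };

/* The machine code for this function typically goes to .text. */
static size_t count_nonzero(const int *p, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++)
        if (p[i] != 0)
            count++;
    return count;
}

/* Both arrays occupy memory at run time, whatever the file stores. */
size_t zero_table_nonzero(void) { return count_nonzero(zero_table, TABLE_LEN); }
size_t init_table_nonzero(void) { return count_nonzero(init_table, TABLE_LEN); }
```

At run time zero_table reads as all zeros even though no zeros were stored for it, which is exactly the space saving that .bss provides.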
Why are instructions and data stored separately? There are several main reasons:
(1) After the program is loaded, data and instructions are mapped to two separate virtual memory regions. The data region is readable and writable by the process, while the instruction region is read-only, so the two regions can be given read-write and read-only permissions respectively. This prevents program instructions from being overwritten, whether intentionally or by accident.
(2) Modern CPUs have very powerful cache systems, and a program should raise its cache hit rate as much as possible. Separating the instruction region from the data region improves the program's locality.
(3) The third reason is also very important. When the system runs multiple copies of a program, their instructions are identical, so only one copy of the instruction part needs to be kept in memory. The same holds for other read-only data, not just instructions. The data regions of each process, of course, differ and are private to that process.

3.3 Digging into SimpleSection.o
The source code of SimpleSection.c is as follows.

  int printf(const char *format, ...);

  int global_init_var = 84;   /* initialized global variable */
  int global_uninit_var;      /* uninitialized global variable */

  void func1(int i)
  {
      printf("%d\n", i);
  }

  int main(void)
  {
      static int static_var = 85;   /* initialized local static variable */
      static int static_var2;       /* uninitialized local static variable */
      int a = 1;
      int b;   /* deliberately left uninitialized for this demonstration */

      func1(static_var + static_var2 + a + b);
      return 0;
  }

Note: Unless otherwise specified, the following analysis uses the ELF format on the 32-bit Intel x86 platform.
We compile this file with GCC; the -c option means compile only, without linking (for example, gcc -c SimpleSection.c).
This yields a 1104-byte object file, SimpleSection.o (the size may vary with the compiler version and machine platform). The internal structure of the object file can be inspected with objdump; its -h option prints the basic information of each section.

The objdump output shows that SimpleSection.o has more sections than one might expect. Besides the basic code, data, and BSS sections, there are three more: the read-only data section (.rodata), the comment section (.comment), and the stack note section (.note.GNU-stack). We will not pursue the meaning of these three additional sections for now.
Let's look at the attributes of the important sections. The easiest to understand are a section's length (Size) and its position in the file (File off). Flags such as "CONTENTS" and "ALLOC" listed for each section are its attributes. "CONTENTS" means the section exists in the file. The BSS section has no "CONTENTS" flag, which means it has no content in the ELF file. The .note.GNU-stack section does have "CONTENTS", but its length is 0, which makes it an odd section; for now we ignore it and treat it as absent from the file. So the sections that actually exist in the ELF file are .text, .data, .rodata, and .comment, whose lengths and file offsets are given in the objdump output.



3.3.1 Code section
The -s option of objdump prints the contents of all sections in hexadecimal, and the -d option disassembles all sections that contain instructions. From the objdump output we pick out the part that concerns the code section (ellipses mark omitted, irrelevant content).

"Contents of section .text" is the .text data printed in hexadecimal, 0x5b bytes in total. The leftmost column is the offset, the four middle columns are the hexadecimal content, and the rightmost column is the ASCII representation of the .text section. Comparing with the disassembly shows that .text contains exactly the instructions of the func1() and main() functions in SimpleSection.c. The first byte of .text, 0x55, is the first instruction of func1(), "push %ebp", and the last byte, 0xc3, is the last instruction of main(), "ret".

3.3.2 Data section and read-only data section
The .data section stores initialized global variables and local static variables. SimpleSection.c has two such variables: global_init_var and static_var. Each is 4 bytes, 8 bytes in total, so the size of the .data section is exactly 8 bytes.
When SimpleSection.c calls printf, it uses the string constant "%d\n", which is read-only data, so it is placed in the .rodata section. The output shows that the 4 bytes of that section are the ASCII bytes of the string constant, ending with '\0'.
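The 4-byte figure can be checked directly in C: the literal "%d\n" consists of '%', 'd', a newline, and the terminating '\0'. The helper names below are hypothetical, and the byte values in the comments assume an ASCII execution character set.

```c
#include <stddef.h>

/* The format string used by func1(): '%', 'd', '\n', '\0'. */
static const char fmt[] = "%d\n";

/* Size of the literal in bytes, including the terminating '\0'. */
size_t fmt_size(void) { return sizeof fmt; }

/* Byte at index i, e.g. 0x25 for '%' and 0x64 for 'd' in ASCII. */
unsigned char fmt_byte(size_t i) { return (unsigned char)fmt[i]; }
```

With ASCII, the four bytes are 0x25, 0x64, 0x0a, and 0x00, which is the pattern to look for in the hexadecimal dump of .rodata.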
The .rodata section stores read-only data, typically read-only variables (such as those declared const) and string constants. Having a separate .rodata section has several advantages. It supports the semantics of the C++ const keyword, and the operating system can map the .rodata section as read-only when loading the program, so that any attempt to modify it is treated as an illegal operation, which helps keep the program safe.
It is also worth mentioning that some compilers put string constants in the .data section instead of a separate .rodata section. If you are interested, rename SimpleSection.c to SimpleSection.cpp and compile it with various versions of the MSVC compiler to see where the string constants are stored.

The first four bytes of the .data section, from low address to high, are 0x54, 0x00, 0x00, and 0x00. This is exactly the value of global_init_var, 84 in decimal. global_init_var is a 4-byte int, so why is it stored as 0x54, 0x00, 0x00, 0x00 rather than 0x00, 0x00, 0x00, 0x54? This comes down to the CPU's byte order, that is, the big-endian versus little-endian question. x86 is little-endian, so the least significant byte is stored at the lowest address.
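The byte-order question can be illustrated with a short sketch in C. The function names are hypothetical. read_le32 and read_be32 interpret the same four bytes under the two conventions, and is_little_endian reports the convention of the machine the code runs on.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if the machine running this code is little-endian. */
int is_little_endian(void)
{
    uint32_t value = 1;
    unsigned char first;
    memcpy(&first, &value, 1);   /* inspect the lowest-addressed byte */
    return first == 1;
}

/* Decode four bytes stored least significant byte first. */
uint32_t read_le32(const unsigned char b[4])
{
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Decode the same four bytes as if stored most significant byte first. */
uint32_t read_be32(const unsigned char b[4])
{
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}
```

Applied to the bytes 0x54, 0x00, 0x00, 0x00, read_le32 yields 84, while read_be32 yields 0x54000000, which is why the little-endian reading is the one that matches global_init_var.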

3.3.3 BSS section
The .bss section holds uninitialized global variables and local static variables. As mentioned above, global_uninit_var and static_var2 are stored in .bss; more precisely, the .bss section reserves space for them. However, the size of this section is only 4 bytes, which does not match the 8 bytes that global_uninit_var and static_var2 occupy together.
In fact, the symbol table shows that only static_var2 is placed in .bss; global_uninit_var is not placed in any section and is just an undefined "COMMON" symbol. This depends on the language and the compiler implementation. Some compilers put uninitialized global variables in the .bss section of the object file. Others do not: they only reserve an undefined global variable symbol and wait until final linking into an executable to allocate space in .bss.
In principle, we can simply think of it as an uninitialized global variable stored in .bss. It is worth noting that a static variable visible only within its compilation unit (for example, global_uninit_var with static added) is indeed stored in .bss, which is easy to understand.

Quiz: where are the variables stored?
Now a small test. Consider the following code:

  static int x1 = 0;
  static int x2 = 1;

Which sections will x1 and x2 be placed in?
x1 is placed in .bss and x2 in .data. Why? Because x1 is 0, it can be treated as uninitialized, since uninitialized variables are all 0 anyway. The compiler therefore moves it to .bss, which saves disk space because .bss occupies no space in the file. x2 is initialized to 1, so it is placed in .data.
Note: Compiler optimizations like this one get in the way when we analyze the mechanisms behind system software, and make many things less obvious at first glance.

3.3.4 Other sections
Besides the most commonly used .text, .data, and .bss sections, an ELF file may contain other sections that store program-related information. Common examples include .symtab (symbol table), .strtab (string table), .shstrtab (section name string table), .debug (debugging information), and .init and .fini (program initialization and termination code).

Section names that start with '.' are reserved by the system. Applications may use other, non-reserved names for their own sections.

Custom sections
Normally, in an object file compiled by GCC, code is placed in ".text" and global and static variables are placed in ".data" and ".bss", as analyzed above. Sometimes, however, you may want certain variables or parts of the code to be placed in a section you specify, in order to implement some specific function. Examples include matching the memory and I/O address layout of certain hardware, or, in the Linux kernel, the sections used for initialization code and for handling page faults during user-space copies. GCC provides an extension that lets the programmer specify the section for a variable or function.

Adding the attribute __attribute__((section("name"))) before a global variable or function places that variable or function in the section named "name".
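A minimal sketch of the attribute in use, assuming GCC (or a compatible compiler such as Clang) targeting ELF. The section names FOO and BAR and the identifiers global_in_foo and read_foo are hypothetical, chosen only for illustration.

```c
/* Place an initialized global variable in a custom section named "FOO"
 * instead of the default .data section (GCC extension, ELF target). */
__attribute__((section("FOO"))) int global_in_foo = 42;

/* Place a function in a custom section named "BAR" instead of .text. */
__attribute__((section("BAR"))) int read_foo(void)
{
    return global_in_foo;
}
```

After compiling with the -c option, the section headers printed by objdump -h should list FOO and BAR alongside the usual sections. Other object formats have different rules for section names, so this sketch is not portable beyond ELF as written.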



Excerpted from: Programmer's Self-Cultivation
