Basic File Formats of ARM Systems

Source: Internet
Author: User

These basic file formats are commonly used in ARM-based embedded system development.

There are three basic file formats for ARM systems:
1) bin, a flat binary format, generally used for writing directly to flash or for loading by a monitor program.
2) ELF, the Executable and Linkable Format, a common object file format, generally generated by the GNU Compiler Collection (GCC).
3) axf, the ARM executable format. It carries the same program image as bin, plus debugging information for use by AXD.

This article mainly discusses bin and ELF.
ELF is an object file format. Object files generally fall into three types: relocatable object files, executable object files, and shared object files. ELF files fall into the same three types.
First, the relocatable object file. It is generally produced by the assembler (as) in GCC (GCC is not just a compiler). Besides binary machine code, it contains information used for relocation. It mainly serves as input to the linker (ld), which relocates the symbols that need relocation based on that information and generates an executable object file. A relocatable object file in ELF format consists of headers and sections.
The headers are the ELF header and the section header table. The ELF header sits at the start of the file and stores the target machine architecture, the endianness, the ELF header size, the object file type, the offset of the section header table in the file, the size of a section header table entry, and the number of entries. The section header table defines the type, location, size, and other information for each section in the file. The linker reads the ELF header to find the section header table, looks up the entry for the section it wants, and then locates that section.
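The header fields listed above can be read straight from the first 52 bytes of a 32-bit ELF file. The following is a minimal sketch in C, assuming the standard ELF32 header layout; the names `ElfSummary` and `parse_elf32_header` are hypothetical and exist only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical summary of the ELF header fields discussed in the text. */
typedef struct {
    uint8_t  data_encoding; /* 1 = little-endian, 2 = big-endian */
    uint16_t type;          /* 1 = relocatable, 2 = executable, 3 = shared */
    uint16_t machine;       /* 40 = ARM */
    uint32_t shoff;         /* file offset of the section header table */
    uint16_t ehsize;        /* size of the ELF header itself */
    uint16_t shentsize;     /* size of one section header table entry */
    uint16_t shnum;         /* number of section header table entries */
} ElfSummary;

static uint16_t rd16(const uint8_t *p, int big)
{
    return big ? (uint16_t)((p[0] << 8) | p[1])
               : (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const uint8_t *p, int big)
{
    return big ? ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                 ((uint32_t)p[2] << 8)  |  (uint32_t)p[3]
               :  (uint32_t)p[0]        | ((uint32_t)p[1] << 8) |
                 ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 0 on success, -1 if buf is not a 32-bit ELF header. */
int parse_elf32_header(const uint8_t *buf, size_t len, ElfSummary *out)
{
    if (len < 52 || memcmp(buf, "\x7f" "ELF", 4) != 0)
        return -1;
    if (buf[4] != 1)                  /* EI_CLASS: only 32-bit handled here */
        return -1;
    if (buf[5] != 1 && buf[5] != 2)   /* EI_DATA: byte order of the file */
        return -1;

    int big = (buf[5] == 2);
    out->data_encoding = buf[5];
    out->type      = rd16(buf + 16, big);
    out->machine   = rd16(buf + 18, big);
    out->shoff     = rd32(buf + 32, big);
    out->ehsize    = rd16(buf + 40, big);
    out->shentsize = rd16(buf + 46, big);
    out->shnum     = rd16(buf + 48, big);
    return 0;
}
```

With `shoff`, `shentsize`, and `shnum` in hand, a tool can step through the section header table entry by entry, which is the lookup the paragraph above describes.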
The sections include:
.text: compiled machine code.
.rodata: read-only data, such as the string "Hello !" in printf("Hello !").
.data: initialized global variables. Local variables are kept on the stack at run time and appear in neither .data nor .bss.
.bss: uninitialized global variables. It is only a placeholder and occupies no actual space in the object file.
.symtab: the symbol table, which stores information about the global variables and functions defined or referenced in the program.
.rel.text: a list of locations in .text that must be modified when the linker merges this file with other object files. These are generally instructions that reference global variables or external functions. Instructions that reference local variables or local functions need no modification, because those are generally addressed with PC-relative offsets. This section and .rel.data below are not needed at run time and are removed when the executable ELF object file is generated.
.rel.data: relocation information for global variables. Generally, a global variable whose initial value is the address of another global variable or of an external function needs relocation.
.debug: debugging information.
.strtab: a string table holding the names used by .symtab, .debug, and the section headers. Every name field in .symtab, .debug, and the section header table actually stores an offset; the corresponding string is found in the string table at that offset.
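The offset-based name lookup described for .strtab can be sketched in a few lines of C. This is an illustration only; `strtab_name` is a hypothetical helper, and it assumes the string table has already been read into memory.

```c
#include <stddef.h>
#include <string.h>

/* Return the NUL-terminated name stored at `offset` in a string table of
 * `size` bytes, or NULL if the offset is out of range or the string is not
 * terminated inside the table. */
const char *strtab_name(const char *strtab, size_t size, size_t offset)
{
    if (offset >= size)
        return NULL;
    if (memchr(strtab + offset, '\0', size - offset) == NULL)
        return NULL;
    return strtab + offset;
}
```

A name field holding 6, for example, simply selects the string that starts at byte 6 of the table.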
The following discusses .symtab in detail:
Each relocatable object file has a .symtab. This symbol table stores all symbols defined and referenced in the object file. When the source is a C program, the symbols in .symtab come directly from the C compiler (cc1). Three kinds of symbols are involved:
1) Global symbols defined in this object file that other object files can reference. In C source, these are mainly non-static global variables and non-static functions. In ARM assembly, they are the symbols exported with the EXPORT directive.
2) Global symbols referenced in this object file but defined in other files. In ARM assembly, they are the symbols introduced with the IMPORT directive.
3) Local symbols, visible only within this object file. "Local" here is from the linker's point of view and differs from local variables in the ordinary programming sense. Local symbols here include global variables declared static, the section names in the object file, and the name of the source file. Ordinary local variables are managed by the run-time environment, and the linker does not care about them.
Each symbol that meets the conditions above has an entry in .symtab. The data structure of an entry is:

typedef struct {
    int name;      // symbol name; actually an offset into .strtab
    int value;     // position in the section, as an offset from the section's address
    int size;      // size
    char type;     // type, generally data or function
    char binding;  // whether the symbol is local or global
    char reserved; // reserved
    char section;  // section the symbol belongs to: .text (represented by 1), .data (represented by 3),
                   // ABS (symbols that should not be relocated), UND (symbols undefined in this
                   // object file, possibly defined in other files), or COM (uninitialized common symbols)
} Elf_sym;
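
The three kinds of symbols can be told apart from the binding and section fields alone. The sketch below mirrors the simplified structure above under the hypothetical name `SimpleSym`. The numeric encodings are assumptions borrowed from the real ELF constants STB_LOCAL (0), STB_GLOBAL (1), and SHN_UNDEF (0); the article itself does not give them.

```c
#include <stdint.h>

enum { BIND_LOCAL = 0, BIND_GLOBAL = 1 }; /* assumed: same values as STB_LOCAL, STB_GLOBAL */
enum { SEC_UND = 0 };                     /* assumed: same value as SHN_UNDEF */

/* Mirror of the simplified symbol entry, with fixed-width integer fields. */
typedef struct {
    int32_t name;
    int32_t value;
    int32_t size;
    char    type;
    char    binding;
    char    reserved;
    char    section;
} SimpleSym;

typedef enum {
    SYM_LOCAL,            /* kind 3: visible only inside this object file */
    SYM_GLOBAL_DEFINED,   /* kind 1: defined here, usable by other files  */
    SYM_GLOBAL_UNDEFINED  /* kind 2: referenced here, defined elsewhere   */
} SymKind;

SymKind classify_symbol(const SimpleSym *s)
{
    if (s->binding == BIND_LOCAL)
        return SYM_LOCAL;
    return (s->section == SEC_UND) ? SYM_GLOBAL_UNDEFINED : SYM_GLOBAL_DEFINED;
}
```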

Now assume all modules of the application have been assembled into relocatable object files. These files share the same structure; each has its own .text and .data sections and its own .symtab. The next step for GCC is to use the linker (ld) to link these object files and the necessary libraries into an executable file with absolute run-time addresses, that is, an executable ELF file.
The linker's work can be divided into two parts:
1) Symbol resolution: determine which definition each referenced symbol points to.
2) Symbol relocation: merge sections, assign run-time addresses, and relocate symbol references.
Symbol resolution:
In an object file, some instructions define symbols and some reference them. A referenced symbol may have more than one definition. The purpose of symbol resolution is to determine which definition each symbol reference in the object file refers to.
During compilation, the compiler generates a symbol table entry for each global symbol defined in the file. When it finds a referenced symbol that is not defined in the file, it also generates a symbol table entry automatically and leaves the resolution of the reference to the linker. The assembler reads these entries during assembly to build .symtab. If it meets a reference it cannot resolve, it generates an additional entry for it, called a relocation entry, stored in the .rel.text or .rel.data section for the linker to resolve. The data structure of a relocation entry is:

typedef struct {
    int offset; // offset within the object of the reference to relocate, that is, where the
                // reference actually sits in the object
    int symbol; // the symbol the relocated reference actually points to
    int type;   // relocation type: R_ARM_PC24: relocate the reference using a 24-bit PC-relative address
                // R_ARM_ABS32: relocate the reference using a 32-bit absolute address
} Elf32_rel;
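
The structure above shows the symbol and the type as separate fields. In the actual ELF32 on-disk layout the two are packed into a single 32-bit info word: the symbol index in the upper 24 bits and the relocation type in the low 8 bits. A small sketch of the packing follows; the helper and enum names are hypothetical, and the values 1 and 2 are those of R_ARM_PC24 and R_ARM_ABS32 in the ARM ELF specification.

```c
#include <stdint.h>

enum { REL_ARM_PC24 = 1, REL_ARM_ABS32 = 2 }; /* values of R_ARM_PC24, R_ARM_ABS32 */

/* On-disk shape of a 32-bit REL entry: an offset plus a packed info word. */
typedef struct {
    uint32_t r_offset;
    uint32_t r_info;
} Rel32;

/* Symbol table index: upper 24 bits of the info word. */
static inline uint32_t rel_symbol(uint32_t info) { return info >> 8; }

/* Relocation type: low 8 bits of the info word. */
static inline uint8_t rel_type(uint32_t info) { return (uint8_t)(info & 0xffu); }

/* Pack a symbol index and a type back into an info word. */
static inline uint32_t rel_info(uint32_t sym, uint8_t type) { return (sym << 8) | type; }
```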

What the linker must resolve are the references recorded in these relocation entries. Following the rules defined by the C language, the linker searches the input object files for the matching symbol for each relocation entry and records it in the entry's symbol field. However, the real address of that symbol is not yet known, so even though the linker knows which symbol the reference points to, it still cannot determine the address the reference should hold.
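The search for a matching definition can be sketched as a scan over the symbol tables of all input object files. This is a simplified illustration with hypothetical types (`SymEntry`, `ObjectFile`); it treats a second global definition as an error and ignores libraries, weak symbols, and common symbols, all of which a real linker also handles.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *name;
    int is_global;   /* nonzero: visible to other object files */
    int is_defined;  /* nonzero: defined in this object file   */
} SymEntry;

typedef struct {
    const SymEntry *syms;
    size_t count;
} ObjectFile;

enum { RESOLVE_UNDEFINED = -1, RESOLVE_DUPLICATE = -2 };

/* Return the index of the object file that defines global symbol `name`,
 * RESOLVE_UNDEFINED if no file defines it, or RESOLVE_DUPLICATE if more
 * than one global definition exists. */
int resolve_symbol(const ObjectFile *objs, size_t nobjs, const char *name)
{
    int found = RESOLVE_UNDEFINED;
    for (size_t i = 0; i < nobjs; i++) {
        for (size_t j = 0; j < objs[i].count; j++) {
            const SymEntry *s = &objs[i].syms[j];
            if (!s->is_global || !s->is_defined || strcmp(s->name, name) != 0)
                continue;
            if (found != RESOLVE_UNDEFINED)
                return RESOLVE_DUPLICATE;
            found = (int)i;
        }
    }
    return found;
}
```

Local symbols are skipped on purpose: they are visible only inside their own object file and never satisfy a reference from another file.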
Symbol relocation:
Symbol relocation solves this problem. The linker first merges sections. The merging process is simple: generally, sections with the same attributes are merged. For example, the .text sections of the different object files are merged into one .text, and the .symtab sections into one .symtab. Two issues arise:
1) The order in which object files are merged. This determines the run-time addresses of the final instructions and symbols, and above all which section comes first. This is the most important point in bare-metal ARM development. After power-on, the CPU of an ARM system automatically fetches and executes instructions starting at address 0x00000000, which is mapped to memory. This behavior is not programmable. The section placed first must therefore contain the program's entry point; otherwise the system cannot run normally.
2) The mapping between input sections and output sections. In theory, any input section can be mapped to any output section. A .data section could be merged with a .text section to produce an output .text, although doing so would be pointless. We must tell the linker which input sections to use to generate each output section.
Both issues are controlled by a file called a linker script. The linker reads the linker script to determine the mapping between input and output sections, to set the program's entry point, and to decide which section is placed at the head of the executable file.
The linker script also specifies the address of each section. After merging the sections, the linker renumbers the symbols in the merged .symtab in a unified way and assigns each an absolute run-time address, using the section's address as the base. If the .text section is at 0x00000000, the symbols in .text are addressed relative to 0x00000000. In embedded development, parameters such as text_base and data_base that must be specified when building a project are passed into the linker script to complete the section address allocation.
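The address assignment can be pictured as placing the input sections one after another from a base address, then adding each symbol's offset to the address its section received. The sketch below is a simplified model, not how ld is implemented; `InputSection`, `layout_sections`, and `symbol_address` are hypothetical names, and alignments are assumed to be powers of two.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t size;   /* size of the input section in bytes        */
    uint32_t align;  /* required alignment, a power of two        */
    uint32_t addr;   /* run-time address, filled in by the layout */
} InputSection;

static uint32_t align_up(uint32_t x, uint32_t a)
{
    return (x + a - 1) & ~(a - 1);
}

/* Place the input sections back to back starting at `base` (for example
 * the value passed as text_base). Returns the first address after the
 * merged output section. */
uint32_t layout_sections(InputSection *in, size_t n, uint32_t base)
{
    uint32_t cursor = base;
    for (size_t i = 0; i < n; i++) {
        cursor = align_up(cursor, in[i].align);
        in[i].addr = cursor;
        cursor += in[i].size;
    }
    return cursor;
}

/* Absolute address of a symbol: section address plus the symbol's offset. */
uint32_t symbol_address(const InputSection *sec, uint32_t value)
{
    return sec->addr + value;
}
```

The order of the array passed to `layout_sections` plays the role of the merge order discussed above: whatever comes first ends up at the base address.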
After these two steps, the linker relocates the symbol references. It walks the .rel sections (.rel.text and .rel.data). For each entry, it looks up the referenced symbol in .symtab to find its actual address (by now every symbol in .symtab has an absolute run-time address), then writes the corresponding address at the location given by the entry's offset field.
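For the two relocation types named earlier, the patching step can be sketched as follows. This is a minimal illustration under stated assumptions: REL-style entries (the addend is stored in the location being patched), ARM-mode branch instructions only, and a host with the same byte order as the target. The function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* R_ARM_ABS32: add the symbol's absolute address S to the addend A already
 * stored in the 32-bit word, so the word becomes S + A. */
void apply_abs32(uint8_t *place, uint32_t sym_addr)
{
    uint32_t word;
    memcpy(&word, place, sizeof word);
    word += sym_addr;
    memcpy(place, &word, sizeof word);
}

/* R_ARM_PC24: patch the 24-bit word offset of an ARM-mode B/BL instruction.
 * The new field is (S + A - P) >> 2, where P is the run-time address of the
 * instruction and A is the sign-extended addend stored in the instruction.
 * Returns 0 on success, -1 if the target is misaligned or out of range. */
int apply_pc24(uint32_t *insn, uint32_t sym_addr, uint32_t place_addr)
{
    int32_t field = (int32_t)(*insn & 0x00ffffffu);
    if (field & 0x00800000)
        field -= 0x01000000;            /* sign-extend the 24-bit field */
    int32_t addend = field * 4;         /* stored in words, used in bytes */

    int32_t delta = (int32_t)(sym_addr + (uint32_t)addend - place_addr);
    if ((delta & 3) != 0 || delta < -(1 << 25) || delta > (1 << 25) - 4)
        return -1;

    *insn = (*insn & 0xff000000u) | (((uint32_t)delta >> 2) & 0x00ffffffu);
    return 0;
}
```

An unresolved BL is typically emitted as 0xebfffffe, that is, an addend of -8 that cancels the ARM pipeline offset; after patching, the branch lands exactly on the symbol's address.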
At this point all symbol relocation is complete. The linker deletes the .rel.text and .rel.data sections that held the relocation information, adds a segment header table and a .init section, and generates the executable ELF object file.
The segment header table stores memory-mapping information for the operating system. The .init section contains an _init function. When a program is loaded, the operating system's program loader reads the segment header table, loads the program into the user's memory space, and maps .text and .data to the appropriate addresses according to the mapping information there. It then calls the _init function in .init to complete initialization.
Because ELF files are so versatile, the popular development flow is to first generate an executable in ELF format with the compilation tools, then use an external tool to extract the relevant parts of the ELF file and generate the bin file. The well-known bootloader U-Boot, for example, does this: the compiler toolchain is GCC, and the bin file is generated with GNU objcopy (objcopy -O binary). ARM's well-known development environment ADS uses its own armcc and armcpp compilers, but they work in the same way as GNU GCC.
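What such an ELF-to-bin tool does can be sketched as copying the loadable parts of the program into one flat buffer, positioned by their run-time addresses. This is only an illustration with hypothetical names (`Chunk`, `flatten_image`); it assumes the chunks have already been extracted from the ELF file and fills any gap between them with zero bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint32_t addr;        /* run-time address of the chunk */
    const uint8_t *data;  /* chunk contents                */
    uint32_t size;        /* chunk size in bytes           */
} Chunk;

/* Copy all chunks into `out`, placing each at its address relative to the
 * lowest address among them. Returns the image size in bytes, or 0 if there
 * are no chunks or `cap` is too small. */
size_t flatten_image(const Chunk *c, size_t n, uint8_t *out, size_t cap)
{
    if (n == 0)
        return 0;

    uint32_t lo = c[0].addr;
    uint32_t hi = c[0].addr + c[0].size;
    for (size_t i = 1; i < n; i++) {
        if (c[i].addr < lo)
            lo = c[i].addr;
        if (c[i].addr + c[i].size > hi)
            hi = c[i].addr + c[i].size;
    }

    size_t total = (size_t)(hi - lo);
    if (total > cap)
        return 0;

    memset(out, 0, total);                       /* zero-fill the gaps */
    for (size_t i = 0; i < n; i++)
        memcpy(out + (c[i].addr - lo), c[i].data, c[i].size);
    return total;
}
```

The result has no headers at all, which is why a bin file can be written to flash as is: byte 0 of the file is the byte at the lowest load address.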
