As you know, programming languages are usually divided into three categories: machine language, assembly language, and high-level languages. A high-level language must be translated into machine language before it can run, and there are two ways to translate it: compilation and interpretation. High-level languages are therefore broadly divided into two groups: compiled languages, such as C, C++, and Java, and interpreted languages, such as Python, Ruby, MATLAB, and JavaScript.
This article shows how a program written in a high-level language such as C or C++ is converted into binary code that a processor can execute. The process consists of four steps: preprocessing, compiling, assembling, and linking.
Introduction to the GCC tool chain
GCC is the abbreviation of GNU Compiler Collection, and it is the most common compilation tool on Linux systems. The GCC toolchain includes GCC itself, Binutils, the C runtime library, and more.
GCC
GCC (originally the GNU C Compiler) is the compiler. It carries out the translation of a program written in C or C++ into binary code that the processor can execute.
Binutils
Binutils is a set of tools for processing binary programs, including addr2line, ar, objcopy, objdump, as, ld, ldd, readelf, size, and so on. (Strictly speaking, ldd ships with glibc rather than Binutils, but it is used alongside these tools.) These tools are indispensable for development and debugging and are described one by one below:
addr2line: Translates a program address into the corresponding source file and line number, and can also report the corresponding function. It helps the debugger locate the relevant source code during debugging.
as: Mainly used for assembling; see the assembling section below for details.
ld: Mainly used for linking; see the linking section below for details.
ar: Mainly used to create static libraries. To help beginners, the concepts of dynamic and static libraries are described here:
When several .o object files are to be bundled into one library file, there are two kinds of library to choose from: static libraries and dynamic (shared) libraries.
On Windows, a static library is a file with the .lib suffix and a shared library is a file with the .dll suffix. On Linux, a static library is a file with the .a suffix and a shared library is a file with the .so suffix.
The difference between a static library and a dynamic library is when the code is loaded. The code of a static library is copied into the executable at link time, so the executable is larger. The code of a shared library is loaded into memory only when the executable runs; at link time only a reference is recorded, so the executable is smaller. On Linux, the ldd command shows which shared libraries an executable depends on.
If several programs that run at the same time use the same libraries, using dynamic libraries saves memory.
ldd: Shows the shared libraries that an executable depends on.
objcopy: Converts an object file from one format to another, for example .bin to .elf or .elf to .bin.
objdump: Its main function is disassembly; see the disassembly section below for details.
readelf: Displays information about ELF files; see below for more information.
size: Lists the size of each part of an executable (code, data, and so on) and the total size; a usage example appears later.
C Runtime Library
The C language standard consists of two main parts: one describes the syntax of C, and the other describes the C standard library. The C standard library defines a set of standard header files, each containing related function, variable, and type declarations and macro definitions. For example, the common printf function is a C standard library function whose prototype is declared in the stdio.h header file.
The C language standard only defines the prototypes of the standard library functions and does not provide implementations. A C compiler therefore usually needs the support of a C runtime library (C Run Time Library, CRT). Like C, C++ defines its own standard and provides its own support library, called the C++ runtime library.
Preparatory work
Because the GCC toolchain is mainly used on Linux, this article uses Linux as its working environment. To demonstrate the whole compilation process, this section first prepares a simple Hello program written in C as an example. Its source code is shown below:
#include <stdio.h>
// This program is very simple: it only prints the string "Hello World".
int main(void)
{
    printf("Hello World! \n");
    return 0;
}
Compilation process
1. Preprocessing
Preprocessing mainly does the following:
Removes all #define directives and expands all macros, and handles all conditional compilation directives such as #if, #ifdef, #elif, #else, and #endif.
Handles #include directives by inserting the included file at the position of the directive.
Removes all comments, both // and /* */.
Adds line numbers and file name markers so that later stages can produce line numbers for debugging and for compile errors and warnings.
Keeps all #pragma directives, because later compilation stages need them.
The commands for preprocessing using GCC are as follows:
$ gcc -E hello.c -o hello.i   # preprocess the source file hello.c to produce hello.i
                              # the -E option makes GCC stop after preprocessing
The hello.i file can be opened as a normal text file for viewing, and its code snippet looks like this:
// hello.i code snippet
extern void funlockfile (FILE *__stream) __attribute__ ((__nothrow__ , __leaf__));
# 942 "/usr/include/stdio.h" 3 4
# 2 "hello.c" 2
# 3 "hello.c"
int
main(void)
{
  printf("Hello World!" "\n");
  return 0;
}
2. Compiling
Compiling performs lexical analysis, syntax analysis, semantic analysis, and optimization on the preprocessed file and produces the corresponding assembly code.
The commands for compiling with GCC are as follows:
$ gcc -S hello.i -o hello.s   # compile the preprocessed file hello.i into the assembly file hello.s
                              # the -S option makes GCC stop after compiling and emit assembly
A snippet of the assembly file hello.s generated by the above command is shown below; all of it is assembly code.
// hello.s code snippet
main:
.LFB0:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    movl    $.LC0, %edi
    call    puts
    movl    $0, %eax
    popq    %rbp
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
3. Assembling
The assembling step processes the assembly code, generates instructions that the processor can recognize, and saves them in an object file with the .o suffix. Because almost every assembly statement corresponds to one processor instruction, assembling is simpler than compiling: the assembler as from Binutils translates the statements one by one according to a table that maps assembly instructions to processor instructions.
When a program is made up of several source files, each file must first be compiled and assembled into a .o object file before the linking step can begin. Note: an object file is already part of the final program, but it cannot be executed until it has been linked.
The command to assemble using GCC is as follows:
$ gcc -c hello.s -o hello.o   # assemble hello.s, produced by the compile step, into the object file hello.o
                              # the -c option makes GCC stop after assembling and produce an object file
# or invoke as directly to assemble
$ as hello.s -o hello.o       # use as from Binutils to assemble hello.s into an object file
Note: the hello.o object file is a relocatable file in ELF (Executable and Linkable Format).
4. Linking
Linking is divided into static linking and dynamic linking. The key points are as follows:
Static linking adds the static library directly into the executable at link time, so the resulting executable is relatively large. The linker copies the code of each function from where it lives (in another object file or in a static library) into the final executable. To create an executable, the linker has two main tasks: symbol resolution (associating each reference to a symbol with its definition in the object files) and relocation (assigning a memory address to each symbol definition and then patching all references to that symbol).
Dynamic linking adds only some descriptive information at link time; the corresponding dynamic library is loaded from the system into memory when the program runs.
On Linux, the search order for libraries when GCC links is usually: first the paths given by the -L option of the gcc command, then the paths given by the LIBRARY_PATH environment variable, and finally the default paths /lib, /usr/lib, and /usr/local/lib.
On Linux, the search order for dynamic libraries when a binary is executed is usually: first the dynamic library search path specified when the program was built, then the paths given by the LD_LIBRARY_PATH environment variable, then the paths listed in the configuration file /etc/ld.so.conf, and finally the default paths /lib and /usr/lib.
On Linux, the ldd command shows which shared libraries an executable depends on.
Because the search paths for dynamic and static libraries can coincide, a static library and a dynamic library with the same name, for example libtest.a and libtest.so, may exist in the same path. By default GCC prefers the dynamic library when linking and will link libtest.so. To make GCC link libtest.a instead, specify the -static option, which forces linking against static libraries. Take Hello World as an example:
The command "gcc hello.c -o hello" links against dynamic libraries. The size of the generated ELF executable (viewed with the size command) and the dynamic libraries it links against (viewed with the ldd command) are as follows:
$ gcc hello.c -o hello
$ size hello    # use size to view the sizes
   text    data     bss     dec     hex filename
   1183     552       8    1743     6cf hello
$ ldd hello     # the executable links against other dynamic libraries, mainly the glibc dynamic library of Linux
    linux-vdso.so.1 => (0x00007fffefd7c000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fadcdd82000)
    /lib64/ld-linux-x86-64.so.2 (0x00007fadce14c000)
The command "gcc -static hello.c -o hello" links against static libraries. The size of the generated ELF executable (viewed with the size command) and the dynamic libraries it links against (viewed with the ldd command) are as follows:
$ gcc -static hello.c -o hello
$ size hello    # use size to view the sizes
   text    data     bss     dec     hex filename
 823726    7284    6360  837370   cc6fa hello    # the text (code) size has become very large
$ ldd hello
    not a dynamic executable    # no dynamic libraries are linked
The final file produced by the linker is an executable in ELF format. An ELF executable is usually organized into different sections, such as .text, .data, .rodata, .bss, and so on.
Analyzing ELF Files
1. Sections of an ELF file
In the ELF file layout, the content located between the ELF header and the section header table is organized into sections. A typical ELF file contains the following sections:
.text: The code section, holding the instructions of the compiled program.
.rodata: ro stands for read only; this section holds read-only data (for example, const constants).
.data: Initialized global variables and static local variables of the C program.
.bss: Uninitialized global variables and static local variables of the C program.
.debug: The debug symbol table; the debugger uses the information in this section to help with debugging.
You can use readelf -S to view information about the sections, as follows:
$ readelf -S hello
There are ... section headers, starting at offset 0x19d8:
Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  ......
  [..] .init             PROGBITS         00000000004003c8  000003c8
       000000000000001a  0000000000000000  AX       0     0     4
  ...
  [14] .text             PROGBITS         0000000000400430  00000430
       0000000000000182  0000000000000000  AX       0     0     16
  [15] .fini             PROGBITS         00000000004005b4  000005b4
  ...
2. Disassembling ELF files
Because ELF files cannot be opened as plain text, viewing the instructions and data they contain requires disassembly.
Use objdump -d to disassemble the file, as follows:
$ objdump -d hello
......
0000000000400526 <main>:      # PC address of the main label
# PC address: instruction encoding, followed by the instruction in assembly form
  400526:  55                    push   %rbp
  400527:  48 89 e5              mov    %rsp,%rbp
  40052a:  bf c4 05 40 00        mov    $0x4005c4,%edi
  40052f:  e8 cc fe ff ff        callq  400400 <puts@plt>
  400534:  b8 00 00 00 00        mov    $0x0,%eax
  400539:  5d                    pop    %rbp
  40053a:  c3                    retq
  40053b:  0f 1f 44 00 00        nopl   0x0(%rax,%rax,1)
...
Use objdump -S to disassemble the file and display the C source code interleaved with the assembly:
$ gcc -o hello -g hello.c   # the -g option is required
$ objdump -S hello
......
0000000000400526 <main>:
#include <stdio.h>

int
main(void)
{
  400526:  55                    push   %rbp
  400527:  48 89 e5              mov    %rsp,%rbp
  printf("Hello World!" "\n");
  40052a:  bf c4 05 40 00        mov    $0x4005c4,%edi
  40052f:  e8 cc fe ff ff        callq  400400 <puts@plt>
  return 0;
  400534:  b8 00 00 00 00        mov    $0x0,%eax
}
  400539:  5d                    pop    %rbp
  40053a:  c3                    retq
  40053b:  0f 1f 44 00 00        nopl   0x0(%rax,%rax,1)
...