How a Compiler Works: Process and Principles

Source: Internet
Author: User

Reprint: http://www.codeceo.com/article/compiler-process.html#0-youdao-1-33675-32553cecb956bf88a1550052113e506a

Before source code can run, it must first be translated into binary machine code. This is the compiler's job.

Take the following source code as an example (assume the file is named test.c).

#include <stdio.h>

int main(void)
{
    fputs("Hello, world!\n", stdout);
    return 0;
}

It has to be processed by the compiler before it can run.

$ gcc test.c
$ ./a.out
Hello, world!

For complex projects, the build is usually split into three steps.

$ ./configure
$ make
$ make install

What exactly do these commands do? Most books and other materials are vague about this: they say only that these commands compile the program and explain nothing further.

This article describes how the compiler works, that is, what each of the three commands above does. I mainly draw on Alex Smith's article "Building C Projects". Note that this article is about the GCC compiler, which means C and C++, and does not necessarily apply to compiling other languages.

Step 1: Configuration (configure)

Before the compiler starts working, it needs to know about the current system environment: where the standard library is, where the software will be installed, which components need to be installed, and so on. Because system environments differ from one computer to another, specifying compilation parameters lets the compiler adapt flexibly and produce machine code that runs in each environment. This step of determining the compilation parameters is called "configuration" (configure).

This configuration information is kept in a configuration file, by convention a script named configure. It is usually generated by the autoconf tool. Running this script is how the build obtains its compilation parameters.

The configure script tries to account for the differences between systems and supplies default values for the various compilation parameters. If the user's system environment is unusual, or there are specific requirements, compilation parameters have to be passed to the configure script by hand.

$ ./configure --prefix=/www --with-mysql

The command above is one build configuration for the PHP source code: the user specifies that the installed files go in the /www directory and that PHP is compiled with support for the MySQL module.
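As a minimal sketch of how configuration results can reach C code: Autoconf-generated configure scripts commonly record what they detect as preprocessor macros, and the source then branches on those macros. The macro name HAVE_STRLCPY and the helper copy_string below are illustrative assumptions and are not taken from the PHP build.

```c
#include <stddef.h>
#include <string.h>

/* HAVE_STRLCPY is a hypothetical macro that a configure script might
   define (for example in a generated config.h) when the system's C
   library provides strlcpy(). */
#ifdef HAVE_STRLCPY
#define copy_string(dst, src, size) strlcpy((dst), (src), (size))
#else
/* Portable fallback used when the macro is not defined: copy at most
   size - 1 bytes, always terminate, and return the length of src. */
static size_t copy_string(char *dst, const char *src, size_t size)
{
    size_t len = strlen(src);
    if (size > 0) {
        size_t n = (len < size - 1) ? len : size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return len;
}
#endif
```

The same source file then builds on systems with and without the function; only the configuration result differs.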

Step 2: Locating the Standard Library and Header Files

The source code is bound to use standard library functions and header files (headers). They can be stored in any directory on the system, and the compiler cannot discover an arbitrary location on its own; it learns where to look from its configuration.

So the second step of compiling is to obtain the locations of the standard library and header files from the configuration. In general, the configuration supplies a list of specific directories, and at compile time the compiler searches these directories in order to find what it needs. With GCC, the built-in default list can be extended with the -I option for header files and the -L option for libraries.
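The in-order directory search described above can be sketched in a few lines of C. This is only an illustration of the idea, not how GCC is implemented; the function name find_header and its parameters are hypothetical.

```c
#include <stdio.h>

/* Try each directory in order and return the index of the first one
   that contains `name`, or -1 if none does. On success the full path
   is left in `out`; on failure `out` is set to the empty string. */
static int find_header(const char *const dirs[], int ndirs,
                       const char *name, char *out, size_t outsize)
{
    for (int i = 0; i < ndirs; i++) {
        int n = snprintf(out, outsize, "%s/%s", dirs[i], name);
        if (n < 0 || (size_t)n >= outsize)
            continue;               /* path did not fit; skip this directory */
        FILE *f = fopen(out, "r");  /* existence check by opening for reading */
        if (f != NULL) {
            fclose(f);
            return i;
        }
    }
    if (outsize > 0)
        out[0] = '\0';
    return -1;
}
```

Because the first match wins, the order of the directory list matters: a header placed in an earlier directory shadows one of the same name in a later directory.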

Step 3: Determining Dependencies

In large projects, source files often depend on one another, so the order in which they are compiled has to be determined. Suppose file A depends on file B; the build must then guarantee the following two things.

(1) File A is compiled only after file B has been compiled.

(2) File A is recompiled whenever file B changes.

The compilation order is recorded in a file called Makefile, which lists which files are compiled first and which are compiled later. The Makefile is generated by the configure script, which is why configure must be run first when building.

While determining dependencies, the compiler also determines which header files will be used during compilation.
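Rule (2) above is commonly decided by comparing file modification times, which is the approach make takes. The sketch below shows that comparison under simplifying assumptions (one target, one dependency, POSIX stat available); the function names is_out_of_date and needs_rebuild are hypothetical.

```c
#include <sys/stat.h>
#include <time.h>

/* Pure decision rule: a target is out of date if it does not exist
   or if its dependency was modified more recently than it was. */
static int is_out_of_date(int target_exists, time_t target_mtime,
                          time_t dep_mtime)
{
    if (!target_exists)
        return 1;
    return dep_mtime > target_mtime;
}

/* Apply the rule to real files. Returns 1 if `target` should be
   rebuilt, 0 if it is up to date, and -1 if `dependency` cannot be
   examined (for example because it does not exist). */
static int needs_rebuild(const char *target, const char *dependency)
{
    struct stat t, d;
    if (stat(dependency, &d) != 0)
        return -1;
    int target_exists = (stat(target, &t) == 0);
    return is_out_of_date(target_exists,
                          target_exists ? t.st_mtime : 0,
                          d.st_mtime);
}
```

For example, needs_rebuild("a.o", "b.h") would report 1 right after b.h is edited, and 0 again once a.o has been regenerated.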

Step 4: Precompiling Header Files (precompilation)

Different source files may include the same header file (for example, stdio.h). During compilation, the header files must be compiled as well. To save time, the compiler compiles the header files before it compiles the source code. This ensures that each header file is compiled only once instead of being recompiled every time it is used.

However, not everything in a header file is precompiled. The #define directives used to define macros are not precompiled.

Step 5: Preprocessing (preprocessing)

After precompilation is complete, the compiler begins replacing the header file inclusions and macros in the source code. Take the source code at the beginning of this article as an example: it includes the header file stdio.h, and after replacement it looks like this.

extern int fputs(const char *, FILE *);
extern FILE *stdout;

int main(void)
{
    fputs("Hello, world!\n", stdout);
    return 0;
}

For readability, the code above keeps only the part of the header file that the source actually uses, namely the fputs and FILE declarations, and omits the rest of stdio.h (which is very long). Also, the header content shown above is not in precompiled form, whereas what is actually inserted into the source is the precompiled result. The compiler also removes comments in this step.

This step is called "preprocessing" (preprocessing) because once it is done, the real processing begins.
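Macro replacement works the same way as header replacement: the preprocessor substitutes text before the compiler proper sees it. In the small sketch below, the macro and function names are made up for illustration; with GCC, the -E option stops after preprocessing so the substituted text can be inspected.

```c
#include <stdio.h>

#define GREETING "Hello, world!\n"
#define SQUARE(x) ((x) * (x))

/* After preprocessing, the return statement reads:
       return ((n + 1) * (n + 1));
   The parentheses in the macro definition keep the expansion correct
   when the argument is itself an expression. */
static int square_of_next(int n)
{
    return SQUARE(n + 1);
}

/* After preprocessing, GREETING has been replaced by the string
   literal. fputs returns a non-negative value on success. */
static int print_greeting(void)
{
    return fputs(GREETING, stdout);
}
```

By the time the compiler generates code, the names GREETING and SQUARE no longer exist; only their expansions do.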

Step 6: Compilation (compilation)

After preprocessing, the compiler starts generating machine code. Some compilers have an intermediate step: the source code is first converted to assembly code (assembly), and the assembly code is then converted to machine code.

The following is the assembly code generated from the source code at the beginning of this article.

    .file   "test.c"
    .section    .rodata
.LC0:
    .string "Hello, world!\n"
    .text
    .globl  main
    .type   main, @function
main:
.LFB0:
    .cfi_startproc
    pushq   %rbp
    .cfi_def_cfa_offset 16
    .cfi_offset 6, -16
    movq    %rsp, %rbp
    .cfi_def_cfa_register 6
    movq    stdout(%rip), %rax
    movq    %rax, %rcx
    movl    $14, %edx
    movl    $1, %esi
    movl    $.LC0, %edi
    call    fwrite
    movl    $0, %eax
    popq    %rbp
    .cfi_def_cfa 7, 8
    ret
    .cfi_endproc
.LFE0:
    .size   main, .-main
    .ident  "GCC: (Debian 4.9.1-19) 4.9.1"
    .section    .note.GNU-stack,"",@progbits

The file produced once this code has been translated to machine code is called an object file.

Step 7: Linking (linking)

An object file cannot be run yet; it must be further turned into an executable file. If you look closely at the output of the previous step, you will see that it references two external symbols: stdout and fwrite. In other words, for the program to run properly it needs, besides the code above, the code behind stdout and fwrite, which is provided by the C standard library.

The compiler's next step is to add the code for these external functions (usually files with the suffix .lib or .a) to the executable file. This is called linking (linking). Adding an external library to the executable by copying it in is called static linking; dynamic linking is covered later in this article.

The make command covers everything from step 4 (precompiling header files) through this step.
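The split between compiling and linking can be seen in a source file that only declares a library function. The sketch below declares strlen by hand instead of including the usual header, to make the point explicit; the function name text_length is hypothetical.

```c
#include <stddef.h>   /* for size_t only */

/* Declaration only: this tells the compiler that strlen exists
   somewhere and what its signature is. No code for strlen is in
   this file. */
extern size_t strlen(const char *s);

/* When this file is compiled to an object file, the call below is
   typically left as a reference to the undefined symbol "strlen".
   The linker later resolves that reference against the C standard
   library. */
size_t text_length(const char *text)
{
    return strlen(text);
}
```

Listing the symbols of the resulting object file with a tool such as nm would typically show strlen marked as undefined (U), which is exactly the gap that the linking step fills.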

Step 8: Installation (installation)

The linking in the previous step produces the executable inside the build directory, not in its final location. Next, the executable must be saved to the installation directory that the user specified earlier.

On the surface, this step is simply a matter of copying the executable (along with its associated data files) over. In practice, it also involves creating directories, saving files, setting permissions, and so on. This whole saving process is called "installation" (installation).

Step 9: Operating System Connection

After the executable is installed, the operating system must be told in some way that the program is now available. For example, after installing a text reader, we usually want it to launch automatically when a TXT file is double-clicked.

This requires registering the program's metadata with the operating system: file name, description, associated file extensions, and so on. On Linux, this information is typically stored in a .desktop file under the /usr/share/applications directory. On Windows, a shortcut also needs to be created in the Start menu.

These tasks are called "operating system connection". The make install command performs both the "installation" and "operating system connection" steps.

Step 10: Building the Installation Package

At this point, the whole process of compiling the source code is essentially complete. But only a small minority of users have the patience to go through the process from start to finish. In fact, if all you hand users is source code, they will think you are unfriendly. Most users want a binary executable that runs right away. This requires the developer to turn the executable generated in the previous steps into an installation package that can be distributed.

So the build process must also be able to produce installation packages. Usually this means saving the executable (along with its associated data files) in a certain directory structure inside a compressed archive, which is then handed to users.

Step 11: Dynamic Linking (dynamic linking)

Normally, by this point the program is ready to run. What happens at run time has nothing to do with the compiler. However, at compile time the developer can choose how the executable links to external libraries: static linking (at compile time) or dynamic linking (at run time). So, finally, a word on what dynamic linking is.

As mentioned earlier, static linking copies the external library into the executable file. The advantage is wide applicability: there is no need to worry that a library file is missing on the user's machine. The disadvantages are that the installation package is relatively large and that multiple applications cannot share the library files. Dynamic linking is the opposite: the external libraries are not included in the installation package and are only referenced dynamically at run time. The advantages are that the installation package is small and multiple applications can share the library files. The disadvantage is that the user must install the library files beforehand, and their version and install location must meet the program's requirements, otherwise the program will not run.

In practice, most software uses dynamic linking and shared libraries. These dynamically shared library files have the suffix .so on Linux, .dll on Windows, and .dylib on Mac.
