GCC compilation process and intermediate RTL exploration

Source: Internet
Author: User
   1. Introduction to GCC
A compiler translates source code (usually written in a high-level language) into target code (usually a low-level language or machine code). Modern compilers generally do this work in two phases:
  
In the first stage, the compiler front end accepts the source code and, through lexical, syntactic, and semantic analysis, produces an intermediate representation of the source program.
  
In the second stage, the compiler back end optimizes the intermediate representation produced by the front end and finally generates code that can run on the target machine.
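
The two stages can be sketched in C as follows. This is a toy and not GCC code: the names IrOp, front_end, and back_end are hypothetical, the "source language" is just sums of non-negative integers such as 1+2+3, the "target" is an imaginary stack machine printed as text, and the optimization step of a real back end is left out.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical intermediate representation: a flat list of stack operations. */
typedef enum { IR_PUSH, IR_ADD } IrKind;
typedef struct { IrKind kind; int value; } IrOp;

/* Front end: scan and parse "number ('+' number)*" and produce IR.
   Returns the number of IR operations written, or -1 on a syntax error
   or when the ir array is too small. */
int front_end(const char *src, IrOp *ir, int max_ops)
{
    int n = 0;
    int need_number = 1;
    while (*src) {
        if (isspace((unsigned char)*src)) { src++; continue; }
        if (need_number && isdigit((unsigned char)*src)) {
            int v = 0;
            while (isdigit((unsigned char)*src))
                v = v * 10 + (*src++ - '0');
            if (n >= max_ops) return -1;
            ir[n].kind = IR_PUSH; ir[n].value = v; n++;
            /* Every number after the first completes an addition. */
            if (n > 1) {
                if (n >= max_ops) return -1;
                ir[n].kind = IR_ADD; ir[n].value = 0; n++;
            }
            need_number = 0;
        } else if (!need_number && *src == '+') {
            src++;
            need_number = 1;
        } else {
            return -1;
        }
    }
    return need_number ? -1 : n;
}

/* Back end: translate the IR into text for the imaginary stack machine.
   Returns 0 on success, -1 if the output buffer is too small. */
int back_end(const IrOp *ir, int n, char *out, size_t out_size)
{
    size_t used = 0;
    if (out_size == 0) return -1;
    out[0] = '\0';
    for (int i = 0; i < n; i++) {
        int w;
        if (ir[i].kind == IR_PUSH)
            w = snprintf(out + used, out_size - used, "push %d\n", ir[i].value);
        else
            w = snprintf(out + used, out_size - used, "add\n");
        if (w < 0 || (size_t)w >= out_size - used) return -1;
        used += (size_t)w;
    }
    return 0;
}
```

The point of the split is that back_end only ever sees IrOp values, so a different front_end for another input syntax could reuse it unchanged.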
  
GCC (GNU Compiler Collection) is a collection of compilers widely used on UNIX and UNIX-like platforms. It supports front ends for multiple languages, including C, C++, Objective-C, Ada, Fortran, Java, and treelang.
  
GCC has two important design goals. The first is that, when building compilers for different hardware platforms, as much code as possible can be reused, so GCC must achieve a degree of hardware independence. The second is to generate high-quality executable code, which requires concentrating code optimization in one place. To achieve both goals, GCC uses an intermediate language that is independent of the hardware platform and abstracts away the actual architecture: RTL (Register Transfer Language).
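
To give a feel for what a hardware-independent register transfer looks like, here is a toy sketch in C. It is not GCC's data structure: ToyRtx, ToyCode, and toy_print are hypothetical names, and the sketch only mimics the Lisp-like textual shape in which RTL is usually shown. Real RTL also carries machine modes and many more expression codes, which are omitted here.

```c
#include <stdio.h>

/* Hypothetical, heavily simplified stand-in for an RTL expression. */
typedef enum { TOY_REG, TOY_CONST_INT, TOY_PLUS, TOY_SET } ToyCode;

typedef struct ToyRtx {
    ToyCode code;
    int value;                       /* register number or constant */
    const struct ToyRtx *op0, *op1;  /* operands of TOY_PLUS and TOY_SET */
} ToyRtx;

/* Write the text of x into buf starting at offset pos.
   Returns the offset just past the text, or -1 if buf is too small. */
int toy_print(const ToyRtx *x, char *buf, int size, int pos)
{
    int w;
    if (pos < 0 || pos >= size) return -1;
    switch (x->code) {
    case TOY_REG:
        w = snprintf(buf + pos, (size_t)(size - pos), "(reg %d)", x->value);
        return (w < 0 || w >= size - pos) ? -1 : pos + w;
    case TOY_CONST_INT:
        w = snprintf(buf + pos, (size_t)(size - pos), "(const_int %d)", x->value);
        return (w < 0 || w >= size - pos) ? -1 : pos + w;
    default:
        break;
    }
    /* Binary forms are printed as "(name op0 op1)". */
    w = snprintf(buf + pos, (size_t)(size - pos), "(%s ",
                 x->code == TOY_PLUS ? "plus" : "set");
    if (w < 0 || w >= size - pos) return -1;
    pos = toy_print(x->op0, buf, size, pos + w);
    if (pos < 0 || pos + 1 >= size) return -1;
    buf[pos++] = ' ';
    buf[pos] = '\0';
    pos = toy_print(x->op1, buf, size, pos);
    if (pos < 0 || pos + 1 >= size) return -1;
    buf[pos++] = ')';
    buf[pos] = '\0';
    return pos;
}
```

Building the expression "register 1 receives register 2 plus 4" from these nodes and printing it yields the text (set (reg 1) (plus (reg 2) (const_int 4))), which names no concrete machine instruction.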
  
Although most research and development work on GCC focuses on back-end code optimization, this article focuses on how the front end works during GCC compilation.
  
The purpose of this study is to separate out the GCC front end. When designing a new compiler, we then only need to focus on designing its front end and can leave code optimization and target code generation to the GCC back end, avoiding the repeated work of designing a back end.
  
This document uses the C language as an example to describe how gcc [2] processes a .c file in the front end after receiving it, obtains an intermediate representation, and hands it to the back end. Then, with the gcc workflow covered, I describe the process of hacking gcc at the RTL level and share some experience, in the hope of helping readers who are interested in researching and developing gcc.
  
   2. gcc workflow
gcc is a driver that accepts and interprets command-line parameters and decides what to do next based on them. gcc provides many options to control the compilation process; detailed information about them can be found in the GCC Manual.
  
gcc is fairly simple to use, but understanding its compilation process in depth is complicated. Faced with a program as huge [3] as gcc, we can only pick the parts of interest to analyze. Detailed documentation on the gcc compilation process is not available [4], mainly because gcc is so complicated and changes constantly, so we have to learn about gcc through other channels. Two methods work well. The first is to read the source and trace the functions you are interested in. Reading the code sounds daunting, but it contains many comments that explain what it does, which makes reading easier; this method helps us grasp gcc as a whole. The second is to debug gcc, that is, to follow the compilation process in a debugger, so that we can see what gcc actually does and track the details that interest us. We first look at some important gcc functions and the call relationships between them in the source, and then debug gcc to track the details we care about while hacking on it. Debugging also lets you find and fix errors in your patches.
  
Before reading the gcc code, we recommend reading the chapter "Passes and Files of the Compiler" in GCC Internals. If you have not read it before, it will help you form a rough picture of the structure of gcc.
  
We take functions as the unit and try to describe the top-down call relationships in gcc as fully as possible. In the gcc source directory it is easy to find a file named main.c, which should be the entry point of gcc. This file contains only one function, main, and main contains only one statement, a call to toplev_main. Having a separate main function that only calls toplev_main makes it easy for the front ends of different languages to supply their own main functions.
  
The toplev_main function is defined in toplev.c. As the name suggests, this file controls the top level of the gcc compilation process; the comment at the beginning of the file also says that it processes command-line parameters, opens files, calls each analysis routine [5] in the appropriate order, and records the time each of them takes. toplev_main first initializes gcc, mainly setting up environment variables and diagnostics, and then parses the command-line parameters, which does not interest us here. What matters is that it next calls do_compile, which, as its name suggests, performs the compilation; toplev_main returns afterwards.
  
The do_compile function is also defined in toplev.c. It calls functions for further initialization, such as initializing the timers used during compilation, the language-specific initialization, and the back-end initialization; it also further processes the command-line parameters parsed in toplev_main. After that, it calls the compile_file() function, which should be where the real compilation happens.
  
The compile_file function is also defined in toplev.c. One point is worth noting about compile_file and do_compile above: both take no parameters and return void. Everything needed during compilation, including the name of the file being compiled, the compilation options, and the hook functions used in gcc, is held in global variables, which have of course been properly initialized by the earlier initialization functions. As for compile_file itself, it performs some initialization that we are not very concerned with, and then finally calls a hook function to analyze (parse) the entire input file:
  
(*lang_hooks.parse_file) (set_yydebug);
  
  
Here, lang_hooks is a global variable. The front end for each language assigns it different values so that its own parser is called. For the definition and initialization of the lang_hooks structure, see langhooks.h, langhooks.c, and langhooks-def.h in the source code; we do not examine them in detail here. For C, this statement is equivalent to calling the c_common_parse_file function in c-opts.c.
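
The hook mechanism can be illustrated with a small C sketch. Everything prefixed with toy_ is hypothetical and far smaller than the real lang_hooks structure; in particular, the sketch switches the table at run time only to make the dispatch visible, whereas the article only says that each front end assigns its own values.

```c
#include <stddef.h>

/* Hypothetical miniature of a language-hooks table. */
struct toy_lang_hooks {
    const char *name;                     /* language name */
    void (*parse_file)(int debug_flag);   /* language-specific parser */
};

/* Records which parser ran last, so the dispatch can be observed. */
const char *toy_last_parser = NULL;

static void toy_c_parse_file(int debug_flag)
{
    (void)debug_flag;
    toy_last_parser = "c";
}

static void toy_fortran_parse_file(int debug_flag)
{
    (void)debug_flag;
    toy_last_parser = "fortran";
}

/* Each front end supplies its own values for the table. */
const struct toy_lang_hooks toy_c_hooks = { "C", toy_c_parse_file };
const struct toy_lang_hooks toy_fortran_hooks = { "Fortran", toy_fortran_parse_file };

/* The single global that the language-independent code consults. */
struct toy_lang_hooks toy_hooks;

/* Language-independent code: it never names a specific parser. */
void toy_compile_file(void)
{
    (*toy_hooks.parse_file)(0);
}
```

After toy_hooks = toy_c_hooks, a call to toy_compile_file() runs the C parser; after toy_hooks = toy_fortran_hooks, the same call runs the Fortran one. The calling code stays identical in both cases, which is the property the hook call in compile_file relies on.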
  
c_common_parse_file calls the c_parse_file function in c-parse.c, which in turn calls the yyparse function, also in c-parse.c. The c-parse.c file deserves an introduction: it is a parser generated from c-parse.y by GNU bison [6]. c-parse.y is a YACC file that uses BNF (Backus-Naur Form) to describe the syntax of a programming language. [7]
  
So far the main call relationships in gcc are quite clear: starting from main, we go down layer by layer into the yyparse function in c-parse.c. As mentioned above, c-parse.c is generated automatically by GNU bison from the YACC file c-parse.y, which makes the code hard to read: the bison-generated c-parse.c contains many goto statements and a switch statement with more than 500 cases. So many branches and jumps make tracing gcc's function calls very difficult, and we cannot continue this way.
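
The call chain traced so far can be summarized with stub functions. This is only a sketch: the stubs are named after the functions discussed above, but their bodies are placeholders that record the call order, the trace_ prefix and the trace_log buffer are hypothetical, and the hook indirection in compile_file is replaced by a direct call.

```c
#include <string.h>

/* Trace buffer so the order of calls can be inspected afterwards. */
char trace_log[256];

static void trace_enter(const char *fn)
{
    if (trace_log[0] != '\0')
        strncat(trace_log, " > ", sizeof trace_log - strlen(trace_log) - 1);
    strncat(trace_log, fn, sizeof trace_log - strlen(trace_log) - 1);
}

/* Generated parser entry point (c-parse.c in the text). */
static int trace_yyparse(void)
{
    trace_enter("yyparse");
    return 0;
}

static void trace_c_parse_file(void)
{
    trace_enter("c_parse_file");
    trace_yyparse();
}

static void trace_c_common_parse_file(void)
{
    trace_enter("c_common_parse_file");
    trace_c_parse_file();
}

/* In the text this call goes through lang_hooks.parse_file. */
static void trace_compile_file(void)
{
    trace_enter("compile_file");
    trace_c_common_parse_file();
}

static void trace_do_compile(void)
{
    trace_enter("do_compile");
    trace_compile_file();
}

/* Stands in for toplev_main, which main calls directly. */
int trace_toplev_main(void)
{
    trace_log[0] = '\0';
    trace_enter("toplev_main");
    trace_do_compile();
    return 0;
}
```

After a call to trace_toplev_main(), trace_log holds the chain from toplev_main down to yyparse in the order described above.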
  
Let's look back at the earlier code, comments, and documents. One function is mentioned many times: rest_of_compilation. It seems to be a very important function, so let's take a look.
  
  
We find this function in toplev.c. According to its comment, it is called after a top-level function definition or variable definition has been processed; it compiles the function or variable and outputs the corresponding assembly code. After it returns, the tree structure used in gcc ceases to exist. So this function does a lot: it generates the assembly code for the source program and releases the space occupied by the corresponding tree structure. What interests us is the RTL representation used internally during compilation, and that work must be done before rest_of_compilation returns.
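
The life cycle described in that comment (take a finished tree, output assembly, then discard the tree) can be sketched as follows. This is a toy under stated assumptions: ToyTree, toy_node, toy_live_nodes, and toy_rest_of_compilation are hypothetical names, the "assembly" is for an imaginary stack machine, and the RTL stage that gcc goes through in between is skipped entirely.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical expression tree for one function body. */
typedef struct ToyTree {
    char op;                      /* '+' for an addition, 0 for a leaf */
    int value;                    /* constant stored in a leaf */
    struct ToyTree *left, *right;
} ToyTree;

/* Number of tree nodes currently allocated. */
int toy_live_nodes = 0;

ToyTree *toy_node(char op, int value, ToyTree *left, ToyTree *right)
{
    ToyTree *t = malloc(sizeof *t);
    if (t == NULL) abort();
    t->op = op; t->value = value; t->left = left; t->right = right;
    toy_live_nodes++;
    return t;
}

/* Post-order walk: operands first, then the operation. */
static void toy_emit(const ToyTree *t, FILE *out)
{
    if (t->op == 0) {
        fprintf(out, "\tpush %d\n", t->value);
        return;
    }
    toy_emit(t->left, out);
    toy_emit(t->right, out);
    fprintf(out, "\tadd\n");
}

static void toy_free(ToyTree *t)
{
    if (t == NULL) return;
    toy_free(t->left);
    toy_free(t->right);
    free(t);
    toy_live_nodes--;
}

/* Output the code for one function, then release its tree. */
void toy_rest_of_compilation(const char *fn_name, ToyTree *body, FILE *asm_out)
{
    fprintf(asm_out, "%s:\n", fn_name);
    toy_emit(body, asm_out);
    fprintf(asm_out, "\tret\n");
    toy_free(body);   /* the tree must not be used after this point */
}
```

Because the tree is gone once the function returns, any inspection or modification of an intermediate form has to happen inside the call, which is the same constraint the text notes for RTL.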
  
Earlier we traced from the main function down to the yyparse function, and now we have found the very important rest_of_compilation function, but we still do not know what gcc does in between, which may well include the RTL processing we care about.
  
Now the only way forward is to debug gcc to see which functions are called after entering yyparse. Here is how to debug gcc:
  
To debug gcc, you need a gcc built locally from source. Go to the directory containing the gcc program and run the following commands:
  
  
$ gdb cc1
(gdb) break main
(gdb) run -dr /PATH/test.c
  
  
This runs gcc on the file test.c with -dr as the compilation option and sets a breakpoint at the entry of the main function. The -dr option asks for the RTL to be dumped to a file ending in .rtl. Next, set another breakpoint at rest_of_compilation, run the continue command to reach it, and run the backtrace command to view the function stack frames at that point:
  
(gdb) break rest_of_compilation
(gdb) continue
(gdb) backtrace
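
The article does not show the contents of test.c. As an assumption, a file as small as the following should be enough for this session, since rest_of_compilation is described above as running after a top-level function definition has been processed; the function name add is purely illustrative.

```c
/* test.c: hypothetical minimal input for the debugging session above. */
int add(int a, int b)
{
    return a + b;
}
```

A single short function keeps the RTL dump small enough to read in full.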
  
Table 1 lists the function calls from main to rest_of_compilation displayed during gdb debugging:
    
   Table 1. Partial list of function call stack frames
  
The debugging results confirm the earlier analysis: the call sequence from main to yyparse matches what we found by reading the code. We now also have the series of function calls from yyparse to rest_of_compilation during gcc compilation, which deserve attention. Let's return to the source code to see what these functions do.
  
Always keep our goal in mind: we are not concerned with how gcc generates the tree structure, nor with how gcc generates assembly code from the intermediate representation RTL. We are interested in how RTL is generated, and we want to make some modifications at the RTL level to achieve our goal. To save space, this article omits the analysis of functions we are not very concerned with and jumps directly to the parts related to RTL generation and processing.
  
Finally, in tree_rest_of_compilation in tree-optimize.c, we find a series of function calls that appear to be related to RTL generation. One hook function in particular draws attention:
  
(*lang_hooks.rtl_expand.stmt) (DECL_SAVED_TREE (fndecl));
  
  
This line of code