C Compiler Anatomy: 6.1 Assembly Code Generation, Introduction

Source: Internet
Author: User

6.1 Introduction to assembly code generation

After lexical analysis, parsing, semantic checking, and intermediate code generation, we have finally reached the target code generation phase. Because the UCC compiler's target code is 32-bit x86 assembly code, we call this chapter "Assembly Code Generation". Most of the UCC source code is shared between the Windows and Linux platforms. However, the default assembler on Windows accepts Intel-style x86 assembly, while the default assembler on Linux uses AT&T-style x86 assembly. The two styles differ in syntax, and to save space we mainly discuss the AT&T assembly used on Linux. In this chapter, the input to the UCC compiler is no longer C source code. It is mainly a linked list of basic blocks, and our task is to translate the intermediate code in those basic blocks into x86 assembly code. The syntax and semantics of x86 assembly were introduced in Section 1.5, in conjunction with C, so we do not repeat them here.
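To make the syntax difference concrete, the following C sketch formats one instruction in both styles. The instruction loads the 32-bit value at offset(%ebp) into a register. The helper name format_load is hypothetical and not part of UCC, and the Intel form follows MASM conventions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of UCC): format "load the 32-bit value at
 * offset(%ebp) into register reg" in both x86 assembly syntaxes. */
static void format_load(int offset, const char *reg,
                        char *att, size_t att_len,
                        char *intel, size_t intel_len)
{
    assert(reg != NULL && att != NULL && intel != NULL);
    /* AT&T: size suffix on the mnemonic, '%' on registers, source first. */
    snprintf(att, att_len, "movl %d(%%ebp), %%%s", offset, reg);
    /* Intel (MASM style): explicit operand size, destination first. */
    snprintf(intel, intel_len, "mov %s, DWORD PTR [ebp%+d]", reg, offset);
}
```

For example, offset -8 and register eax give "movl -8(%ebp), %eax" in AT&T syntax and "mov eax, DWORD PTR [ebp-8]" in Intel syntax.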

Let us again use a simple example to show how the UCC compiler generates assembly code. In Figure 6.1.1, lines 1 to 6 are a simple C program, and lines 11 to 58 are the assembly code that the UCC compiler generates for it. C++-style comments, such as the one on line 13, were added by us. That comment explains that the ".data" directive on line 14 is generated by the Segment() function. We have omitted the assembly code for the main function.


Figure 6.1.1 Example of assembly code generation

A string literal in a C program can be treated as a global character array, whose name is of course invisible to the C programmer. Through the EmitStrings() call on line 15, the UCC compiler generates a character array for the string, as on line 16. Through the EmitFloatConstants() call on line 18, it allocates storage for a floating-point constant such as 1.0. The UCC compiler names strings and floating-point constants implicitly, for example .str0 on line 16 and .flt0 on line 19, and all of these names begin with ".". In C syntax, a variable or function name chosen by the programmer can never begin with ".", which guarantees that these implicit names never clash with user-defined names. Assembly code, on the other hand, does allow symbol names that begin with ".". The EmitGlobals() call on line 21 handles the global variables defined by the C programmer and produces assembly code like lines 22 to 23. The EmitFunction() call on line 26 generates the assembly code for function f, shown on lines 27 to 58.
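The naming scheme above can be sketched as follows. The counters and function names are hypothetical illustrations rather than UCC's actual code. The point is that every generated name starts with '.', and no C identifier can start with that character.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical counters for illustration; UCC keeps its own bookkeeping. */
static int string_count;
static int float_count;

/* Next implicit name for a string literal: .str0, .str1, ... */
static void next_string_name(char *buf, size_t len)
{
    snprintf(buf, len, ".str%d", string_count++);
}

/* Next implicit name for a floating-point constant: .flt0, .flt1, ... */
static void next_float_name(char *buf, size_t len)
{
    snprintf(buf, len, ".flt%d", float_count++);
}

/* '.' can never start a C identifier, so a name beginning with '.'
 * cannot collide with any name the programmer declares. */
static int is_compiler_private(const char *name)
{
    assert(name != NULL);
    return name[0] == '.';
}
```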

Lines 30 to 34 of Figure 6.1.1 save register values, and line 35 reserves stack space for local variables and temporaries. Together these form the prologue, which is the work done when the function starts executing. Lines 53 to 57 of Figure 6.1.1 form the epilogue, which restores the saved register values. The ret instruction on line 58 then pops the return address off the stack and returns to the caller. The return value of f is computed on lines 50 to 51 and left in register EAX. The EmitBlock() call on line 37 generates assembly code for a basic block. Internally, the UCC compiler uses the word "generate" for producing intermediate code and "emit" for producing assembly code. In this text we use "generate" for both.
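Here is a minimal sketch of the prologue and epilogue text. It assumes the four preserved registers are pushed in the order ebp, ebx, esi, edi, and that %ebp is set after the pushes. This order is an assumption, chosen because it matches the frame offsets described later in this section. The function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Sketch: push the four preserved registers (order assumed), point %ebp
 * at the new stack top, then reserve stksize bytes for locals/temporaries. */
static void emit_prologue(char *buf, size_t len, int stksize)
{
    assert(buf != NULL && stksize >= 0);
    int n = snprintf(buf, len,
                     "\tpushl %%ebp\n\tpushl %%ebx\n"
                     "\tpushl %%esi\n\tpushl %%edi\n"
                     "\tmovl %%esp, %%ebp\n");
    if (stksize != 0 && n >= 0 && (size_t)n < len)
        snprintf(buf + n, len - (size_t)n, "\tsubl $%d, %%esp\n", stksize);
}

/* Undo the prologue: discard locals, pop registers in reverse order, return. */
static void emit_epilogue(char *buf, size_t len)
{
    assert(buf != NULL);
    snprintf(buf, len,
             "\tmovl %%ebp, %%esp\n"
             "\tpopl %%edi\n\tpopl %%esi\n"
             "\tpopl %%ebx\n\tpopl %%ebp\n"
             "\tret\n");
}
```

Because %ebp is set after the four pushes, "movl %ebp, %esp" in the epilogue leaves %esp pointing at the saved edi. The pops can therefore simply run in reverse order.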

Note also that the parameter num on line 2 of Figure 6.1.1 is named 20(%ebp) in the assembly code on line 50, and the local variable b on line 3 corresponds to -8(%ebp) on line 40. This reminds us that, at the assembly level, local variables and formal parameters need new names. To obtain the assembly-level name of a symbol object p, the UCC compiler defines the function GetAccessName. Its interface is shown below, and we analyze the function later.

static char *GetAccessName(Symbol p);

Next, let us analyze EmitTranslationUnit, the function that generates the assembly code of Figure 6.1.1, shown in Figure 6.1.2. EmitTranslationUnit, on lines 1 to 11, generates assembly code for the entire translation unit. We annotated the roles of BeginProgram and the other functions it calls in Figure 6.1.1. ImportFunctions on line 8 is used for Intel-style assembly on Windows, where it produces function declarations such as "EXTRN f:NEAR32". On Linux, ImportFunctions has no effect. Lines 12 to 26 of Figure 6.1.2 are the code of EmitFunctions. Its while loop on line 16 calls EmitFunction on line 21 to generate assembly code for each function.


Figure 6.1.2 EmitTranslationUnit()
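The driver structure can be sketched as below. This is not UCC's code. FuncNode, step, and the trace buffer are hypothetical, and each emission step merely records its name so that the calling order is visible. The order follows the annotations of Figure 6.1.1. Switching to the code segment before emitting the functions is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical function-list node; UCC walks its own AST nodes instead. */
struct FuncNode {
    const char *name;
    struct FuncNode *next;
};

static char trace[256];   /* records the order of the emission steps */

static void step(const char *what)
{
    strncat(trace, what, sizeof trace - strlen(trace) - 1);
    strncat(trace, ";", sizeof trace - strlen(trace) - 1);
}

/* Stand-in for EmitFunction: here it only records the function's name. */
static void emit_function(const struct FuncNode *f) { step(f->name); }

/* Stand-in for EmitFunctions: walk the list and emit every function. */
static void emit_functions(const struct FuncNode *list)
{
    for (const struct FuncNode *p = list; p != NULL; p = p->next)
        emit_function(p);
}

/* Order of calls modeled on the annotations in Figure 6.1.1. */
static void emit_translation_unit(const struct FuncNode *list)
{
    step("data");      /* Segment(): .data */
    step("strings");   /* EmitStrings() */
    step("floats");    /* EmitFloatConstants() */
    step("globals");   /* EmitGlobals() */
    step("text");      /* switch to the code segment (assumed) */
    emit_functions(list);
}
```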

Lines 27 to 58 of Figure 6.1.2 are the code of EmitFunction. The Export function called on line 33 produces a declaration of the form ".globl f" on Linux, and the DefineLabel function on line 35 produces a label of the form "f:". The C standard does not prescribe how a struct is returned. Under the calling convention that UCC implements, when a function returns a struct object whose size is not 1, 2, 4, or 8 bytes, the compiler implicitly adds a parameter to the function: a pointer to the struct object that receives the result. For example, an object of type struct Data below occupies 32 bytes, so the compiler implicitly adds a struct Data *recvaddr parameter to GetData. The if statement on lines 36 to 47 of Figure 6.1.2 handles this case.

struct Data { int dt[8]; };

// Function interface as written by the C programmer
struct Data GetData(int num);

// Implicitly rewritten by the C compiler as
void GetData(struct Data *recvaddr, int num);
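The implicit parameter can be written out by hand. In this sketch the size rule is the one stated above, and the body of GetData is invented purely for the demonstration.

```c
#include <assert.h>
#include <stddef.h>

struct Data { int dt[8]; };

/* Rule from the text: a returned struct whose size is not 1, 2, 4 or 8
 * bytes is passed back through a hidden pointer parameter. */
static int needs_hidden_param(size_t size)
{
    return !(size == 1 || size == 2 || size == 4 || size == 8);
}

/* Hand-lowered form of "struct Data GetData(int num)": the caller passes
 * the address that receives the result. The body is made up for the demo. */
static void GetData(struct Data *recvaddr, int num)
{
    assert(recvaddr != NULL);
    for (int i = 0; i < 8; i++)
        recvaddr->dt[i] = num + i;
}
```

At a call site, "struct Data d = GetData(5);" then corresponds to "struct Data d; GetData(&d, 5);". The caller owns the storage, and the callee writes into it.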

The LayoutFrame function called on line 48 of Figure 6.1.2 computes the offsets of the formal parameters, local variables, and temporaries within the activation record. It returns the total stack space occupied by the local variables and temporaries, and we analyze it later. Line 50 calls EmitPrologue to produce the prologue, shown on lines 29 to 35 of Figure 6.1.1. The constant 16 in "subl $16, %esp" on line 35 of Figure 6.1.1 is the total stack space used by the local variables and temporaries of f. The while loop on lines 51 to 55 of Figure 6.1.2 generates assembly code for each basic block, mainly through the EmitBlock call on line 54. Line 56 calls EmitEpilogue to produce the epilogue, shown on lines 52 to 58 of Figure 6.1.1.
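The shape of EmitFunction described above can be sketched like this. BBlock, put, and the asm_out buffer are hypothetical. The prologue and epilogue appear only as assembler comments, and translating each block's intermediate code is left out, since that is the job of EmitBlock.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical basic block; UCC's real blocks also hold IR instructions. */
struct BBlock {
    int id;
    struct BBlock *next;
};

static char asm_out[512];   /* collected assembly text */

static void put(const char *text)
{
    strncat(asm_out, text, sizeof asm_out - strlen(asm_out) - 1);
}

/* Stand-in for EmitBlock: only the block label is produced here. */
static void emit_block(const struct BBlock *bb)
{
    char label[32];
    snprintf(label, sizeof label, ".BB%d:\n", bb->id);
    put(label);
}

/* stksize plays the role of LayoutFrame's return value. */
static void emit_function(const char *name, int stksize,
                          const struct BBlock *entry)
{
    char line[64];

    assert(name != NULL && stksize >= 0);
    snprintf(line, sizeof line, ".globl %s\n%s:\n", name, name);
    put(line);                                      /* Export + DefineLabel */
    put("\t# prologue: save registers\n");          /* EmitPrologue */
    if (stksize != 0) {
        snprintf(line, sizeof line, "\tsubl $%d, %%esp\n", stksize);
        put(line);
    }
    for (const struct BBlock *bb = entry; bb != NULL; bb = bb->next)
        emit_block(bb);                             /* loop over basic blocks */
    put("\t# epilogue: restore registers, ret\n");  /* EmitEpilogue */
}
```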

The EmitStrings and EmitFloatConstants functions called in Figure 6.1.2 are not complex, so we do not discuss them. In the following sections we focus on the translation of basic blocks, that is, the EmitBlock function. In this section, we examine the two functions encountered earlier, LayoutFrame and GetAccessName. LayoutFrame is shown in Figure 6.1.3.

Under the x86 calling convention (a platform ABI rule, not part of the C standard), registers EBX, ESI, and EDI must hold the same values after a called function returns as they did before the call. Such registers are called callee-saved registers. The UCC compiler pushes them onto the stack in the function's prologue and restores them in the epilogue to satisfy this requirement. In addition, when the called function returns, EBP must again point to the caller's activation record, so the callee also saves EBP on the stack. There are therefore 4 registers that the callee must preserve, which is the value of the PRESERVE_REGS macro on line 16 of Figure 6.1.3.


Figure 6.1.3 LayoutFrame()

Lines 3 to 13 of Figure 6.1.3 show the stack layout. The first parameter is at 20(%ebp), and the first local variable or temporary is at -4(%ebp). The 20 accounts for the 4 saved registers (16 bytes) plus the 4-byte return address that lie between %ebp and the parameters. The while loop on lines 24 to 35 computes the offset of each parameter, with offsets increasing from 20. The while loop on lines 39 to 54 computes the offsets of the local variables and temporaries, with offsets decreasing from -4. Line 56 returns the total stack space occupied by the local variables and temporaries.
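The offset rules can be sketched as follows. The sketch assumes every parameter, local, and temporary is padded to a 4-byte slot, although UCC's actual alignment handling may differ. struct Var is a hypothetical stand-in for UCC's symbol objects. The starting offset of 20 is computed from PRESERVE_REGS plus the return address.

```c
#include <assert.h>
#include <stddef.h>

#define PRESERVE_REGS 4      /* ebp, ebx, esi, edi */
#define SLOT_SIZE     4      /* assumed: everything padded to 4 bytes */

/* Hypothetical, simplified variable record. */
struct Var {
    int size;     /* size in bytes */
    int offset;   /* %ebp-relative offset, filled in by layout_frame */
};

static int align_up(int n, int a) { return (n + a - 1) / a * a; }

/* Assign offsets; returns the bytes needed for locals and temporaries. */
static int layout_frame(struct Var *params, size_t nparams,
                        struct Var *locals, size_t nlocals)
{
    assert((params != NULL || nparams == 0) && (locals != NULL || nlocals == 0));

    /* Above %ebp: the saved registers, then the return address. */
    int offset = PRESERVE_REGS * 4 + 4;            /* = 20 */
    for (size_t i = 0; i < nparams; i++) {
        params[i].offset = offset;                 /* 20, 24, ... */
        offset += align_up(params[i].size, SLOT_SIZE);
    }

    /* Below %ebp: locals and temporaries grow downward from -4. */
    offset = 0;
    for (size_t i = 0; i < nlocals; i++) {
        offset += align_up(locals[i].size, SLOT_SIZE);
        locals[i].offset = -offset;                /* -4, -8, ... */
    }
    return offset;   /* the N in "subl $N, %esp" */
}
```

With one int parameter and four int-sized locals or temporaries, this sketch places the parameter at 20(%ebp) and the locals at -4 through -16(%ebp), and it returns 16.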

Next, let us look at the Linux version of GetAccessName, shown in Figure 6.1.4. Lines 6 to 8 handle integer constants, whose names in AT&T assembly take the form "$4". Floating-point constants get names of the form ".flt0", matching the EmitFloatConstants call on line 18 of Figure 6.1.1. Lines 9 to 12 produce names such as ".str0" for strings and labels such as ".BB2". Global variable names and the names of static variables declared outside any function body can be used unchanged in assembly code; the assignment on line 17 sets this. To avoid name clashes, the UCC compiler renames static variables declared inside a function body, producing names such as "c.1" on line 23. This is handled on lines 18 to 26.


Figure 6.1.4 GetAccessName()
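Here is a sketch of the naming cases listed for constants, strings, labels, and statics. The SymKind enumeration, struct Sym, and the name.N suffix scheme for function-local statics are assumptions for illustration. They are modeled on the example names in the text ($4, .flt0, .str0, .BB2, c.1).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical symbol kinds; UCC distinguishes these cases differently. */
enum SymKind { SK_INTCONST, SK_FLTCONST, SK_STRING, SK_LABEL,
               SK_GLOBAL, SK_STATIC_LOCAL };

struct Sym {
    enum SymKind kind;
    const char *name;   /* source-level name, for globals and statics */
    int value;          /* value of an integer constant */
    int id;             /* sequence number used in generated names */
};

/* Write the assembly-level name of p into buf (AT&T syntax). */
static void access_name(const struct Sym *p, char *buf, size_t len)
{
    assert(p != NULL && buf != NULL && len > 0);
    switch (p->kind) {
    case SK_INTCONST: snprintf(buf, len, "$%d", p->value); break;
    case SK_FLTCONST: snprintf(buf, len, ".flt%d", p->id); break;
    case SK_STRING:   snprintf(buf, len, ".str%d", p->id); break;
    case SK_LABEL:    snprintf(buf, len, ".BB%d", p->id);  break;
    case SK_GLOBAL:   snprintf(buf, len, "%s", p->name);   break;
    case SK_STATIC_LOCAL:
        /* static inside a function body: add a suffix to avoid clashes */
        snprintf(buf, len, "%s.%d", p->name, p->id);
        break;
    }
}
```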

For local variables, formal parameters, and temporaries, the assembly code uses names of the form offset(%ebp), such as 20(%ebp) or -8(%ebp). Lines 27 to 32 of Figure 6.1.4 build these names from the offsets computed by LayoutFrame in Figure 6.1.3. Function names can be used directly in assembly code, and lines 33 to 34 of Figure 6.1.4 handle this case. For elements of global or static arrays and members of global or static structs, we can use a name such as "arr+12", as on line 42. Elements of a local array and members of a local struct object instead use a name of the form offset(%ebp). Lines 35 to 56 of Figure 6.1.4 cover these cases. Once the assembly name of a symbol p has been computed and stored in p->aname, there is no need to compute it again. The if statement on line 2 checks for this.
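The remaining cases, together with the p->aname cache, can be sketched as below. struct Sym and access_name are hypothetical simplifications. The builds counter exists only to show that a cached name is not rebuilt.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified symbol record; UCC's Symbol differs. */
struct Sym {
    const char *name;   /* assembly-visible name of a global or static */
    int is_frame;       /* nonzero for parameters, locals, temporaries */
    int offset;         /* %ebp offset, or byte offset inside base */
    struct Sym *base;   /* enclosing array/struct object, or NULL */
    char *aname;        /* cached assembly name; NULL until computed */
    int builds;         /* demo counter: how often a name was built */
};

static char *access_name(struct Sym *p)
{
    char buf[64];

    assert(p != NULL);
    if (p->aname != NULL)               /* computed before: reuse it */
        return p->aname;

    if (p->base != NULL && p->base->is_frame)
        /* element/member of a local object: fold into one %ebp offset */
        snprintf(buf, sizeof buf, "%d(%%ebp)", p->base->offset + p->offset);
    else if (p->base != NULL)
        /* element/member of a global or static object: symbol+offset */
        snprintf(buf, sizeof buf, "%s+%d", p->base->name, p->offset);
    else if (p->is_frame)
        snprintf(buf, sizeof buf, "%d(%%ebp)", p->offset);
    else
        snprintf(buf, sizeof buf, "%s", p->name);  /* globals, functions */

    p->builds++;
    p->aname = malloc(strlen(buf) + 1);
    if (p->aname != NULL)
        strcpy(p->aname, buf);
    return p->aname;
}
```

For a local aggregate, the element's byte offset is folded into the %ebp offset of the object itself. A member at byte 8 of an object at -16(%ebp) therefore becomes -8(%ebp).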

In this section, we used the example of Figure 6.1.1 to discuss the overall flow of assembly code generation. One of the most important issues in assembly code generation is register allocation. In a later section, we will discuss EmitBlock, the function that generates assembly code for basic blocks.
