C compiler analysis: managing registers when generating assembly code

Source: Internet
Author: User


In a computer, the CPU is much faster than memory. A compiler should therefore make full use of register resources and avoid unnecessary memory accesses, which speeds up the code it generates. In the intermediate code generation stage, the UCC compiler uses a temporary variable t to hold the value of a common subexpression, as in "t: a + b;". When generating assembly code, the UCC compiler keeps the values of these common subexpressions in registers as far as possible, so that when a value is needed again it can be taken directly from the corresponding register. However, register resources in the CPU are very limited. On a 32-bit x86 chip, the registers available to assembly programmers are {eax, ebx, ecx, edx, esi, edi, esp, ebp}, but esp normally points to the top of the stack and ebp normally points to the base of the activation record, so only {eax, ebx, ecx, edx, esi, edi} are actually available. When there are more common subexpressions than available registers, we need to write the values of some registers back (WriteBack) to the memory locations of the corresponding temporary variables, freeing those registers to hold other values. Of course, if the value in the register is no longer needed, or it has not changed since it was loaded from memory, we need not waste time on the write-back. This is similar to page replacement in a demand-paging operating system. To keep the register allocation algorithm simple, the UCC compiler allocates registers only for temporary variables; in other words, the goal is to reuse common subexpressions such as "t: a + b;" as much as possible.
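
As a minimal sketch of the register set described above, the eight general-purpose registers can be numbered in their x86 encoding order, with the two stack registers excluded from allocation. The enum and helper names here are illustrative assumptions, not taken from the UCC source.

```c
#include <assert.h>
#include <stdbool.h>

/* The eight 32-bit general-purpose registers, in x86 encoding order. */
enum Reg { EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI, REG_COUNT };

/* esp and ebp manage the stack, so an allocator should never hand them out. */
bool is_allocatable(int r)
{
    assert(r >= EAX && r < REG_COUNT);
    return r != ESP && r != EBP;
}

/* Count how many registers remain for holding temporary variables. */
int count_allocatable(void)
{
    int n = 0;
    for (int r = EAX; r < REG_COUNT; r++) {
        if (is_allocatable(r)) {
            n++;
        }
    }
    return n;
}
```

Calling count_allocatable() in this sketch yields the six registers named in the text.
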

Let's use a simple example to describe register allocation in the UCC compiler. As shown in Figure 6.2.1, lines 2 to 8 are the code of function f and lines 9 to 13 are the code of function g. After assigning to s3 in line 5 of function f, we use the common subexpressions (a + b) and (c + d) again in lines 6 and 7. Lines 32 to 46 are the assembly code for function f, and lines 48 to 58 are the assembly code for function g.

 

Figure 6.2.1 Example of register allocation

From the intermediate code in lines 15 to 23 of Figure 6.2.1, we can see that function f has three temporary variables. Each holds an integer, so together they occupy 12 bytes of stack space; the "subl $12, %esp" instruction reserves these 12 bytes on the stack for the three temporary variables. The "movl a, %eax" instruction in line 36 loads the global variable a from memory into register eax, but once the "addl b, %eax" instruction has executed, eax holds the value of the temporary variable "t0: a + b". Register eax stays assigned to t0 until t0 is no longer used or the registers run short. In lines 42 and 45, we reuse the common subexpression (a + b) held in register eax. In line 44, the value of "(a + b) + (c + d)" is stored in register edx. In function g, by contrast, the value of "(a + b) + (c + d)" is stored in register eax, because after line 57 the temporary variable "t0: a + b" of line 25 is no longer used in basic block BB1.

Note that one intermediate instruction may correspond to several assembly instructions. For example, "t0: a + b;" in line 16 of Figure 6.2.1 corresponds to the two assembly instructions in lines 36 and 37.
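
The figure itself is not reproduced here. As a rough illustration only, the following C source is a hypothetical reconstruction that is consistent with the description above; the exact statements and the names s1, s2, s4, and s5 are assumptions. In f, both subexpressions are reused after s3 is assigned, so three temporaries (t0, t1, t2) are live; in g, nothing is reused after the sum is formed.

```c
int a, b, c, d;
int s1, s2, s3, s4, s5;

/* Hypothetical: (a + b) and (c + d) are reused after s3 is assigned,
   so a compiler would like to keep both values in registers. */
void f(void)
{
    s1 = a + b;              /* t0: a + b   */
    s2 = c + d;              /* t1: c + d   */
    s3 = (a + b) + (c + d);  /* t2: t0 + t1 */
    s4 = a + b;              /* reuses t0   */
    s5 = c + d;              /* reuses t1   */
}

/* Hypothetical: nothing is reused after s3, so the register holding
   (a + b) can simply be overwritten with the final sum. */
void g(void)
{
    s1 = a + b;
    s2 = c + d;
    s3 = (a + b) + (c + d);
}
```
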

If register allocation were extended to named variables (that is, the global, static, and local variables named by the C programmer), the number of memory accesses could be reduced further. For example, once global variable a has been read into a register R1, a later use of a can take the value directly from R1 without accessing memory. However, a C programmer can access a named variable either through its name or indirectly through its address addr, so the access patterns are flexible. If global variable a is accessed in the form *addr, the value of a may be loaded into another register R2, and the contents of R1 and R2 may then become inconsistent. Avoiding such problems requires more complex analysis by the compiler. Temporary variables, by contrast, are generated only by the compiler and are invisible to the C programmer. In general, the UCC compiler assigns to each temporary variable only once, as with the temporary variables in lines 15 to 29 of Figure 6.2.1. There is one exception: when the UCC compiler processes a conditional expression such as "a > 0 ? b : c", it does assign to a temporary variable more than once. We will discuss this later.
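
The aliasing problem can be seen in a small C function. This is only an illustrative sketch; the function and the variable other are made-up names. If a compiler kept a in a register across the store through addr, the second read would return a stale value whenever addr happens to point at a.

```c
int a = 0;       /* a named global variable */
int other = 0;   /* an unrelated variable, for comparison */

/* Reads a, stores through addr, then reads a again.
   Returns the difference between the two reads. */
int reread_after_store(int *addr)
{
    int first = a;        /* could be cached in a register R1 */
    *addr = first + 1;    /* may or may not modify a */
    int second = a;       /* must be reloaded unless aliasing is ruled out */
    return second - first;
}
```

With addr equal to &a, the two reads differ by 1; with addr pointing at other, they are equal. A compiler cannot tell the two cases apart without alias analysis, which is the extra complexity the text refers to.
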

To avoid complex data flow analysis, the UCC compiler allocates registers only for temporary variables. Next, let's look at the GetRegInternal function used to allocate registers, shown in lines 2 to 18 of Figure 6.2.2. The width parameter in line 2 gives the width of the requested register: 1 byte (registers al, cl, and dl), 2 bytes (registers ax, cx, dx, bx, si, and di), or 4 bytes (registers eax, ecx, edx, ebx, esi, and edi). GetRegInternal first calls the FindEmptyReg function to obtain an unallocated register. If there is no empty register, it calls the SelectSpillReg function to choose a register to write back, and then calls the SpillReg function to write the value held in that register back to memory so that the register can be allocated again. Line 17 sets the corresponding flag in the variable UsedRegs, indicating that the i-th register is in use. The variable UsedRegs in line 1 records the registers that have already been allocated to the current intermediate instruction. Before the UCC compiler generates assembly instructions for an intermediate instruction, UsedRegs is cleared to 0; we will see this later when analyzing the EmitBBlock function.
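
The width parameter and the UsedRegs flags can be sketched as follows. This is a simplified illustration under our own assumptions (the table layout and the helper names reg_name, mark_used, is_used, and begin_instruction are invented here), not the code of Figure 6.2.2.

```c
#include <assert.h>
#include <stddef.h>

enum { EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI, REG_COUNT };

/* Register names by width; NULL marks a register that is never
   handed out at that width in this sketch. */
static const char *const Names1[REG_COUNT] =
    { "al", "cl", "dl", NULL, NULL, NULL, NULL, NULL };
static const char *const Names2[REG_COUNT] =
    { "ax", "cx", "dx", "bx", NULL, NULL, "si", "di" };
static const char *const Names4[REG_COUNT] =
    { "eax", "ecx", "edx", "ebx", NULL, NULL, "esi", "edi" };

/* Bit i is set when register i is already claimed by the
   intermediate instruction currently being translated. */
unsigned UsedRegs = 0;

/* Name of register reg at the given width in bytes, or NULL. */
const char *reg_name(int reg, int width)
{
    assert(reg >= 0 && reg < REG_COUNT);
    switch (width) {
    case 1:  return Names1[reg];
    case 2:  return Names2[reg];
    case 4:  return Names4[reg];
    default: return NULL;
    }
}

void mark_used(int reg)      { UsedRegs |= 1u << reg; }
int  is_used(int reg)        { return (int)((UsedRegs >> reg) & 1u); }
void begin_instruction(void) { UsedRegs = 0; }
```
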

 

Figure 6.2.2 GetRegInternal()

The FindEmptyReg function in lines 19 to 28 of Figure 6.2.2 looks for an unallocated register. The condition "X86Regs[i] != NULL" in line 22 excludes the two stack registers esp and ebp. The condition in line 23 checks that the register does not hold the value of any temporary variable (that is, it is an empty register), and the condition in line 24 checks that the register has not been allocated to the current intermediate instruction. If no empty register is found, line 27 returns NO_REG. When all registers are allocated, we use the SelectSpillReg function in lines 29 to 48 to choose a register to evict. This resembles page replacement in a demand-paging system, where pages are evicted using an algorithm such as FIFO or LRU. The UCC compiler chooses according to the total reference count of the temporary variables held in each register: the register with the fewest references is written back, as handled in lines 32 to 45. The SpillReg function in lines 48 to 58 writes the values of the temporary variables held in a register back to memory, an action usually called register spilling. In line 52 of Figure 6.2.2, a nonzero "p->needwb" means that the value of temporary variable p in the register differs from its value in memory, and "p->ref > 0" means that p will be used again. When both conditions hold, line 53 calls the StoreVar function to generate the store instruction. In line 51, p->reg is set to NULL, indicating that the value of temporary variable p is no longer held in a register. In the current UCC compiler, a register normally holds the value of only one temporary variable, so the bodies of the while loops in lines 38 and 50 execute only once. In other words, the linked lists X86Regs[i]->link (line 37) and reg->link never contain more than one element.
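
The three steps (find an empty register, otherwise pick the least-referenced one and spill it) can be sketched in a simplified model. This is not the UCC source: the Temp structure, the Held array, the lower-case function names, and the stores counter that stands in for an emitted store instruction are all assumptions made for illustration, and each register holds at most one temporary.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

enum { EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI, REG_COUNT, NO_REG = -1 };

typedef struct Temp {
    int ref;      /* remaining references to this temporary */
    int needwb;   /* nonzero if register and memory copies differ */
    int reg;      /* register holding the value, or NO_REG */
    int stores;   /* how many stores were "emitted" (demo only) */
} Temp;

Temp *Held[REG_COUNT];   /* temporary held by each register, NULL if empty */
unsigned UsedRegs;       /* registers claimed by the current instruction */

static int allocatable(int r) { return r != ESP && r != EBP; }

/* Find a register that holds nothing and is not claimed already. */
int find_empty_reg(void)
{
    for (int r = EAX; r < REG_COUNT; r++)
        if (allocatable(r) && Held[r] == NULL && !(UsedRegs & (1u << r)))
            return r;
    return NO_REG;
}

/* Choose the register whose temporary has the fewest references left. */
int select_spill_reg(void)
{
    int best = NO_REG, best_ref = INT_MAX;
    for (int r = EAX; r < REG_COUNT; r++) {
        if (!allocatable(r) || (UsedRegs & (1u << r)) || Held[r] == NULL)
            continue;
        if (Held[r]->ref < best_ref) {
            best_ref = Held[r]->ref;
            best = r;
        }
    }
    return best;
}

/* Evict the temporary; store it only if it is dirty and still needed. */
void spill_reg(int r)
{
    Temp *p = Held[r];
    if (p == NULL)
        return;
    if (p->needwb && p->ref > 0) {
        p->stores++;      /* stands in for emitting a store instruction */
        p->needwb = 0;
    }
    p->reg = NO_REG;
    Held[r] = NULL;
}

/* Get a register for temporary t, spilling one if none is empty. */
int get_reg(Temp *t)
{
    int r = find_empty_reg();
    if (r == NO_REG) {
        r = select_spill_reg();
        assert(r != NO_REG);
        spill_reg(r);
    }
    UsedRegs |= 1u << r;
    Held[r] = t;
    t->reg = r;
    return r;
}
```

In this model, filling all six registers and then requesting a seventh evicts the temporary with the smallest ref, and a store is counted only when that temporary is both dirty and still referenced.
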

Intel also provides the x87 floating-point coprocessor to speed up floating-point computation. The x87 provides a stack made up of several floating-point registers, but for simplicity the UCC compiler uses only the register at the top of that stack, which holds a single floating-point temporary variable. The pointer variable X87Top in the UCC compiler points to this temporary variable. When X87Top is not NULL, the value of that temporary variable is held at the top of the x87 stack.

static Symbol X87Top;

With this background, let's look at the EmitBBlock function, which generates assembly code for a basic block, shown in lines 1 to 24 of Figure 6.2.3. The while loop in line 3 traverses all the intermediate instructions in the basic block, and the EmitIRInst function it calls generates the assembly instructions for the intermediate instruction inst. From lines 25 to 29 of Figure 6.2.3, we can see that the appropriate function is called through a table lookup; the name of the EmitJump function of line 33 is stored in the table Emitter used in line 27. The EmitJump function in lines 33 to 38 generates assembly code for an unconditional jump.
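
Table-driven dispatch of this kind can be sketched with an array of function pointers indexed by opcode. The two opcodes, the IRInst layout, and the emitters that merely return a mnemonic are assumptions made to keep the example small; they are not the definitions used in Figure 6.2.3.

```c
#include <assert.h>

/* Hypothetical opcodes; real intermediate code has many more. */
enum Opcode { OP_ADD, OP_JMP, OP_COUNT };

typedef struct IRInst {
    enum Opcode opcode;
} IRInst;

/* Every emitter has the same signature so it can live in one table. */
typedef const char *(*EmitFn)(const IRInst *inst);

const char *emit_add(const IRInst *inst)  { (void)inst; return "addl"; }
const char *emit_jump(const IRInst *inst) { (void)inst; return "jmp"; }

/* One emitter per opcode, indexed by the opcode value. */
static const EmitFn Emitter[OP_COUNT] = {
    [OP_ADD] = emit_add,
    [OP_JMP] = emit_jump,
};

/* Look up the emitter for this instruction and call it. */
const char *emit_ir_inst(const IRInst *inst)
{
    assert((unsigned)inst->opcode < OP_COUNT);
    return Emitter[inst->opcode](inst);
}
```

Adding a new kind of intermediate instruction then only requires a new table entry, with no change to the dispatching function.
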

 

Figure 6.2.3 EmitBBlock()

Although the intermediate code generation stage reuses common subexpressions only within a single basic block, a conditional expression such as "(a > 0 ? b : c)" creates a situation in which a temporary variable assigned in one basic block is used in another. As shown below, the temporary variable t0 is assigned in more than one basic block. When we leave basic block BB3 through "goto BB5", we need to write back the value of the temporary variable t0.

    d = (a > 0 ? b : c) + 1;

    //////////// Corresponding intermediate code /////////////////
    if (a <= 0) goto BB4;
    BB3:
        t0 = b;    // assign a value to the temporary variable t0
        goto BB5;
    BB4:
        t0 = c;    // assign a value to the temporary variable t0
    BB5:
        t1: t0 + 1;
        d = t1;
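
To make the control flow concrete, the intermediate code above can be transcribed into C with goto statements and compared with the original expression. The two function names are ours, and the C version is only a transcription for illustration, not compiler output.

```c
/* The conditional expression as a C programmer writes it. */
int cond_direct(int a, int b, int c)
{
    return (a > 0 ? b : c) + 1;
}

/* The same computation shaped like the intermediate code: t0 is
   assigned in two different basic blocks and read in a third. */
int cond_lowered(int a, int b, int c)
{
    int t0, t1, d;

    if (a <= 0) goto BB4;
    /* BB3: */
    t0 = b;
    goto BB5;
BB4:
    t0 = c;
BB5:
    t1 = t0 + 1;
    d = t1;
    return d;
}
```

Because BB5 reads t0 no matter which block assigned it, both BB3 and BB4 must leave the value where BB5 expects to find it, which is why the value is written back to memory before leaving each block.
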

Therefore, when control flow leaves a basic block through a jump, or when a function call occurs, the registers must be written back. The SaveX87Top function called in line 8 of Figure 6.2.3 writes back the x87 stack-top register, while the write-back of the x86 registers is deferred to functions such as EmitJump, EmitBranch, EmitIndirectJump, and EmitCall, which correspond to an unconditional jump, a conditional jump, a jump through a jump table, and a function call, respectively. Under the x86 C calling convention, the caller only needs to write back the three registers eax, ecx, and edx when calling a function. When a jump is encountered, we call the ClearRegs function to write back the six registers eax, ebx, ecx, edx, esi, and edi, as shown in line 37 of Figure 6.2.3.
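
The difference between the two cases can be sketched as two register masks. The names ExitKind, regs_to_write_back, and count_bits are assumptions made for this illustration; the sketch only records which registers are affected and does not emit any write-back code.

```c
enum { EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI, REG_COUNT };
enum ExitKind { EXIT_CALL, EXIT_JUMP };

#define BIT(r) (1u << (r))

/* Registers that a called function is allowed to overwrite. */
#define CALLER_SAVED (BIT(EAX) | BIT(ECX) | BIT(EDX))
/* Every register the allocator hands out (esp and ebp excluded). */
#define ALLOCATABLE  (CALLER_SAVED | BIT(EBX) | BIT(ESI) | BIT(EDI))

/* Which registers must be written back before this kind of exit. */
unsigned regs_to_write_back(enum ExitKind kind)
{
    return kind == EXIT_CALL ? CALLER_SAVED : ALLOCATABLE;
}

/* Number of registers in a mask. */
int count_bits(unsigned mask)
{
    int n = 0;
    while (mask != 0) {
        n += (int)(mask & 1u);
        mask >>= 1;
    }
    return n;
}
```
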

The code for the SaveX87Top function is shown in lines 48 to 57 of Figure 6.2.3. The PutASMCode function it calls generates the floating-point write-back instruction; we will analyze PutASMCode in later sections. In line 56, X87Top is set to NULL, indicating that no temporary variable remains to be written back.
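
A minimal sketch of this single-slot scheme follows. It assumes that the floating-point case uses the same needwb and ref conditions as SpillReg, which the text does not state explicitly; the Temp structure and the stores counter that stands in for the emitted instruction are also illustrative assumptions.

```c
#include <stddef.h>

typedef struct Temp {
    int needwb;   /* nonzero if the stack-top copy differs from memory */
    int ref;      /* remaining references to this temporary */
    int stores;   /* how many stores were "emitted" (demo only) */
} Temp;

/* The temporary whose value sits on top of the x87 stack, or NULL. */
Temp *X87Top = NULL;

/* Write the stack-top value back if needed, then mark the slot empty. */
void save_x87_top(void)
{
    if (X87Top == NULL)
        return;
    if (X87Top->needwb && X87Top->ref > 0) {
        X87Top->stores++;   /* stands in for a floating-point store */
        X87Top->needwb = 0;
    }
    X87Top = NULL;
}
```

Calling save_x87_top() a second time does nothing, because the slot is already empty.
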

We can also see that when control flow leaves basic block BB4 above, the value of the temporary variable t0 should likewise be written back to memory. This is done by the ClearRegs function called in line 22 of Figure 6.2.3 and the SaveX87Top function called in line 23. After the intermediate instruction in line 10 of Figure 6.2.3 has produced its assembly instructions, lines 14 to 18 decrement the reference count of each operand by 1, which in turn affects which register the SelectSpillReg function described earlier chooses to spill.
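
The reference-count bookkeeping can be sketched as follows, assuming a three-operand instruction whose unused operand slots are NULL. The structure layout and the name release_operands are illustrative assumptions, not the definitions in Figure 6.2.3.

```c
#include <stddef.h>

typedef struct Temp {
    int ref;   /* remaining references to this temporary */
} Temp;

typedef struct IRInst {
    Temp *opds[3];   /* destination and two sources; NULL if absent */
} IRInst;

/* After an instruction has been translated, each of its operands
   has one fewer reference outstanding. */
void release_operands(IRInst *inst)
{
    for (int i = 0; i < 3; i++) {
        if (inst->opds[i] != NULL) {
            inst->opds[i]->ref--;
        }
    }
}
```

A temporary whose count reaches 0 no longer needs a write-back when its register is reclaimed, and a low count makes its register a more likely choice for spilling.
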

In subsequent chapters, we will discuss functions that generate assembly instructions for intermediate code, such as EmitBranch, EmitIndirectJump, and EmitCall.
