TCC, short for Tiny C Compiler (http://bellard.org/tcc/), is an unusual C compiler: you can use it as a C interpreter, and you can also embed it in your own application as a dynamic code generator. The latter is what we do. In our project, the motion rules of the particle system are written in C, and TCC compiles them to native code at run time. This gives us both efficiency and a high degree of flexibility.
However, when you depend on a third-party library, you have to be prepared to swallow its bugs along with its benefits. This time we ran into a very troublesome one.
As we all know, the floating-point unit (FPU) of an x86 CPU has eight floating-point registers, organized as a stack. Once a value is loaded into a register, that register is occupied, and it must be released with a pop-like operation before it can be used again.
With TCC, if a function uses floating-point operations, the code generated for it leaves one piece of garbage on the FPU stack when the function returns (why? answering that is the main purpose of this article), so only seven registers remain available. If your whole program is compiled with TCC, that is fine, but ours is mixed with code compiled by GCC or MSVC. Those compilers always assume that all eight floating-point registers are available on entry to any function. With optimization enabled, they may generate aggressive code that fills all eight registers at once. That is where things go wrong: only seven slots are free (TCC's garbage is squatting in the eighth), and pushing eight values triggers the FPU invalid operation exception (#IE). The catch is that this exception is normally masked by the FPU, so we never notice it and assume all is well, yet the value read back from the floating-point register is wrong.
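The arithmetic of "seven free slots, eight pushes" can be sketched with a toy model. This is only an illustration under simplifying assumptions: it tracks how many slots are occupied, not their values, and the names fpu_model, model_push and eight_pushes_fault are hypothetical, not part of TCC or any library.

```c
#include <stdbool.h>

/* Toy model of x87 stack occupancy. It tracks only how many of the eight
   slots are in use; all names here are hypothetical. */
enum { FPU_SLOTS = 8 };

typedef struct {
    int  used;        /* number of occupied slots, 0..8 */
    bool invalid_op;  /* sticky flag, standing in for #IE in the status word */
} fpu_model;

/* Push one value, as fld does. Pushing onto a full stack flags #IE. */
static void model_push(fpu_model *f)
{
    if (f->used == FPU_SLOTS)
        f->invalid_op = true;   /* stack overflow */
    else
        f->used++;
}

/* A callee leaves `garbage` slots occupied; the caller then pushes eight
   values in a row. Returns true if the model flagged #IE. */
static bool eight_pushes_fault(int garbage)
{
    fpu_model f = { garbage, false };
    for (int i = 0; i < FPU_SLOTS; i++)
        model_push(&f);
    return f.invalid_op;
}
```

In this model, eight_pushes_fault(0) is false and eight_pushes_fault(1) is true: a single leftover slot is enough to make the eighth push overflow.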
This bug plagued us for several days, and our colleague Yun Feng has already written about it. Now I want to walk you through it step by step, track it down to its real location, and stamp it out for good (as you can tell, it still rankles).
Our specimen is tcc-0.9.25, which is also the latest official release at the time of writing. The source code is here: http://download.savannah.nongnu.org/releases/tinycc/tcc-0.9.25.tar.bz2.
TCC leaves the garbage on the FPU stack by emitting the 'fstp %st(1)' instruction. ST is short for FPU stack, and ST(n) refers to the nth register of the floating-point stack, numbered 0, 1, 2, ... starting from the top. This instruction copies the content of ST(0) to ST(1) and then pops ST(0), so the original ST(1) becomes the new top of the stack. After the instruction completes, the top of the FPU stack is therefore always occupied.
The 'fstp %st(1)' instruction (emitted as 0xd9dd, that is, the bytes DD D9) is generated in two places in tccgen.c: at line 689, in the vpop function, and in the save_reg function. As a first step, we change both places to emit 'fstp %st(0)' instead (emitted as 0xd8dd). 'fstp %st(0)' simply pops the top register of the FPU stack, so ST(0) is no longer occupied and no further cleanup is needed. The vpop and save_reg functions emit these instructions only to release a register, so this change matches the original intent of both functions.
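The difference between the two instructions can be sketched with a small software model of the register stack. This is a simplified illustration, not an emulator: it assumes ST(0) holds a value when the store executes, ignores overflow in the push, and every name in it (x87_model, model_fld, model_fstp, occupied_after_store) is hypothetical.

```c
#include <stdbool.h>

enum { NREGS = 8 };

/* Toy x87 register stack: st[0] is the top, full[i] says whether ST(i)
   currently holds a value. Hypothetical model for illustration only. */
typedef struct {
    double st[NREGS];
    bool   full[NREGS];
} x87_model;

/* fld: push a value. Overflow is ignored; the model is only used with
   an empty stack here. */
static void model_fld(x87_model *m, double v)
{
    for (int k = NREGS - 1; k > 0; k--) {
        m->st[k] = m->st[k - 1];
        m->full[k] = m->full[k - 1];
    }
    m->st[0] = v;
    m->full[0] = true;
}

/* fstp %st(i): copy ST(0) into ST(i), then pop ST(0). */
static void model_fstp(x87_model *m, int i)
{
    m->st[i] = m->st[0];
    m->full[i] = true;
    for (int k = 0; k < NREGS - 1; k++) {   /* pop: everything moves up */
        m->st[k] = m->st[k + 1];
        m->full[k] = m->full[k + 1];
    }
    m->full[NREGS - 1] = false;
}

/* Load one value, store it with fstp %st(i), and count what is left. */
static int occupied_after_store(int i)
{
    x87_model m = {0};
    model_fld(&m, 2.7);
    model_fstp(&m, i);
    int n = 0;
    for (int k = 0; k < NREGS; k++)
        n += m.full[k];
    return n;
}
```

In this model, occupied_after_store(0) returns 0 (the stack is empty again), while occupied_after_store(1) returns 1: the copy written into ST(1) becomes the new top and stays behind.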
Is that all? Obviously not. If it were that simple, I would not need to write this article. Let's compile the following function with the modified TCC:
void foo()
{
    double var = 2.7;
    var++;
}
It generates the following machine code:
.text:08000000                 public foo
.text:08000000 foo             proc near
.text:08000000
.text:08000000 var_18          = qword ptr -18h
.text:08000000 var_10          = qword ptr -10h
.text:08000000 var_8           = qword ptr -8
.text:08000000
.text:08000000                 push    ebp
.text:08000001                 mov     ebp, esp
.text:08000003                 sub     esp, 18h
.text:08000009                 nop
.text:0800000a                 fld     L_0
.text:08000010                 fst     [ebp+var_8]
.text:08000013                 fstp    st(0)
.text:08000015                 fld     [ebp+var_8]
.text:08000018                 fst     [ebp+var_10]
.text:0800001b                 fstp    st(0)
.text:0800001d                 fst     [ebp+var_18]
.text:08000020                 fstp    st(0)
.text:08000022                 fld     L_1
.text:08000028                 fadd    [ebp+var_10]
.text:0800002b                 fst     [ebp+var_8]
.text:0800002e                 fstp    st(0)
.text:08000030                 leave
.text:08000031                 retn
.text:08000031 foo             endp
.text:08000031
.text:08000031 _text           ends
--------------------------------------------------
.data:08000040 ; Segment type: Pure data
.data:08000040 ; Segment permissions: Read/Write
.data:08000040 ; Segment alignment '32byte' can not be represented in assembly
.data:08000040 _data           segment page public 'DATA' use32
.data:08000040                 assume cs:_data
.data:08000040                 ;org 8000040h
.data:08000040 L_0             dq 400599999999999Ah
.data:08000048 L_1             dq 3FF0000000000000h
.data:08000048 _data           ends
Note the instructions from 0800000a to 08000013:
// double var = 2.7; load the constant into ST(0)
.text:0800000a                 fld     L_0
// double var = 2.7; copy the content of ST(0) to the variable 'var'
.text:08000010                 fst     [ebp+var_8]
// double var = 2.7; pop ST(0), which empties the floating-point stack
.text:08000013                 fstp    st(0)
The subsequent instructions are generated by TCC's 'void inc(int post, int c)' function (in tccgen.c). The instructions from 08000015 to 0800001b are generated through the call chain inc -> gv_dup:
// Load the content of the variable 'var' into ST(0)
.text:08000015                 fld     [ebp+var_8]
// Copy the content of ST(0) to a temporary location in memory
.text:08000018                 fst     [ebp+var_10]
// Pop ST(0), which empties the floating-point stack
.text:0800001b                 fstp    st(0)
Next, the call chain gen_op('+') -> gen_opif('+') -> gen_opf('+') -> gv(rc=2) -> get_reg(rc=2) -> save_reg(r=3) generates the instructions from 0800001d to 08000020:
// Copy the content of ST(0) to a temporary location in memory. But note that the entire floating-point stack is empty, so there is no valid content in ST(0)!
.text:0800001d                 fst     [ebp+var_18]
// Pop ST(0). Again, the entire floating-point stack is empty!
.text:08000020                 fstp    st(0)
At run time, the instruction at 0800001d raises the FPU invalid operation exception (#IE).
Why does TCC generate such silly code? Read the 'gv_dup' function called by inc. Note that it contains the following lines:
(1): r = gv(rc);
(2): r1 = get_reg(rc);
(3): sv.r = r;
     sv.c.ul = 0;
     load(r1, &sv); /* move r to r1 */
(4): vdup();
(5): /* duplicates value */
     vtop->r = r1;
First of all, it is worth explaining how values are assigned to registers during code generation, because most hardware instructions require at least one operand to be in a register. To handle this, TCC organizes the information about all local variables into a stack called vstack, whose top is called vtop. Each entry records, among other things, where the value lives (in which register, or at which memory address). TCC's assumptions about registers are very conservative: it defines three general-purpose registers and one floating-point register, each identified by an enum value, with TREG_ST0 standing for the floating-point register. Why such conservative assumptions? My guess is that the CPUs of some architectures (such as ARM?) are not so generous, and for the sake of portability TCC picks a minimal register set that all architectures share.
Next, the behavior of a few functions. gv tries to load the value represented by vtop into a register (generating a load instruction if necessary), and that register must belong to the class specified by rc (short for register class, I guess). If vtop is already in a register of this class, gv does nothing. In either case, gv returns the number of the register that holds vtop. get_reg(rc) obtains a free register of class rc, that is, one not occupied by any value. If all registers of the class are occupied, it spills a value that occupies one of them to memory (at a temporary address), thereby freeing a register. The return value of get_reg is the number of the free register. vdup duplicates vtop and makes the copy the new top of vstack, that is, the new vtop.
Now let's go through it line by line. (1) tries to load vtop into a floating-point register. Since there is only one floating-point register, vtop ends up occupying TREG_ST0 and the return value r equals TREG_ST0. (2) tries to obtain a free floating-point register, but the only one is occupied by vtop, so get_reg spills vtop to memory and returns TREG_ST0 as r1. (3) is meant to generate an instruction that moves the content of register r to register r1, but since r equals r1, nothing is emitted. (4) duplicates vtop. Finally, (5) sets the location of the newly copied vtop to TREG_ST0, so TCC believes that the new vtop legitimately occupies the only floating-point register.
Have you spotted the problem? The old vtop has been spilled to memory, so the new vtop copied from it should be in the same place as the old one: in memory, not in a register. Yet TCC uses 'vtop->r = r1' to assign the floating-point register to the new vtop without generating any code to load the new vtop into that register. Then, because gen_op('+') needs at least one operand in a register, and because TCC wrongly believes that TREG_ST0 is occupied by vtop, it generates the instructions at 0800001d and 08000020 to spill vtop to a temporary memory location.
What gv_dup is meant to do is load vtop into register r, duplicate vtop on top of the stack, and load the new vtop into register r1. When r equals r1, however, these semantics cannot be honored, and for the floating-point class r always equals r1. So for floating-point registers, gv_dup cannot do its job as specified.
At this point I can venture a bold guess. The author of TCC probably also intended to emit 'fstp %st(0)' in the places that now emit 'fstp %st(1)', but because of the register allocation bug in gv_dup, doing so produced wrong code for many floating-point operations. The author then tried the 'fstp %st(1)' trick, which seemed to solve the problem, so that code stayed and the real error was hidden. And we users are the ones who suffer!
What is the solution? It is easy: if r equals r1, do not let the new vtop illegally claim register r1. Change (5) to:
if (r != r1)
{
    vtop->r = r1;
}
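The bookkeeping error and the effect of the fix can be reproduced in a toy model with a single floating-point register. This is a sketch under heavy simplification, not TCC's code: the names toy_cc, toy_get_reg, toy_gv, toy_gv_dup and bogus_spills_for_increment are hypothetical, and the model only tracks where the compiler believes each value lives versus what the emitted code really left in ST(0).

```c
#include <stdbool.h>

/* Where the compiler believes a value lives. */
typedef enum { LOC_MEMORY, LOC_ST0 } loc;

enum { MAX_VALUES = 8 };

/* Hypothetical, heavily simplified stand-in for TCC's vstack state. */
typedef struct {
    loc  where[MAX_VALUES]; /* bookkeeping for each value-stack entry */
    int  top;               /* index of vtop */
    bool st0_loaded;        /* what the emitted code really left in ST(0) */
    int  bogus_spills;      /* spills emitted while ST(0) was actually empty */
} toy_cc;

/* get_reg: free the only register by spilling whichever entry claims it. */
static void toy_get_reg(toy_cc *c)
{
    for (int i = 0; i <= c->top; i++) {
        if (c->where[i] == LOC_ST0) {
            if (!c->st0_loaded)
                c->bogus_spills++;  /* fst/fstp on an empty stack: #IE */
            c->st0_loaded = false;  /* stands for: fst [mem]; fstp %st(0) */
            c->where[i] = LOC_MEMORY;
        }
    }
}

/* gv: make sure vtop is in the register, emitting a load if needed. */
static void toy_gv(toy_cc *c)
{
    if (c->where[c->top] != LOC_ST0) {
        toy_get_reg(c);
        c->st0_loaded = true;       /* stands for: fld [mem] */
        c->where[c->top] = LOC_ST0;
    }
}

/* gv_dup for the float class, where r and r1 are both the same register. */
static void toy_gv_dup(toy_cc *c, bool fixed)
{
    toy_gv(c);                                /* (1) r = gv(rc) */
    toy_get_reg(c);                           /* (2) r1 = get_reg(rc) */
                                              /* (3) load emits nothing */
    c->where[c->top + 1] = c->where[c->top];  /* (4) vdup() */
    c->top++;
    if (!fixed)
        c->where[c->top] = LOC_ST0;           /* (5) unconditional claim */
    /* fixed variant: r == r1, so the new vtop stays in memory */
}

/* Model `var++` on a double and count spills of an empty ST(0). */
static int bogus_spills_for_increment(bool fixed)
{
    toy_cc c = { .where = { LOC_MEMORY }, .top = 0 };
    toy_gv_dup(&c, fixed);
    toy_get_reg(&c);  /* gen_op('+') asks for a register for the constant */
    return c.bogus_spills;
}
```

In this model, bogus_spills_for_increment(false) returns 1, matching the stray fst/fstp pair at 0800001d, and bogus_spills_for_increment(true) returns 0.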
Compile our example again with the modified TCC. The generated code is as follows:
.text:08000000                 push    ebp
.text:08000001                 mov     ebp, esp
.text:08000003                 sub     esp, 10h
.text:08000009                 nop
.text:0800000a                 fld     L_0
.text:08000010                 fst     [ebp+var_8]
.text:08000013                 fstp    st(0)
.text:08000015                 fld     [ebp+var_8]
.text:08000018                 fst     [ebp+var_10]
.text:0800001b                 fstp    st(0)
.text:0800001d                 fld     L_1
.text:08000023                 fadd    [ebp+var_10]
.text:08000026                 fst     [ebp+var_8]
.text:08000029                 fstp    st(0)
.text:0800002b                 leave
.text:0800002c                 retn
The code is simpler and more intuitive and, more importantly, it is correct.
If you need a test case, the following program will do:
void foo()
{
    double var = 2.7;
    var++;
}

int
main()
{
    // Unmask FPU #IE (invalid operation exception) flag of control word
    unsigned short custom = 0;
    // fstcw writes to memory, so 'custom' must be an output operand
    asm("fstcw %0"
        : "=m" (custom));
    custom &= 0xfffe;
    asm("fldcw %0"
        :
        : "m" (custom));
    // Before foo(), FPU registers stack is empty
    foo();
    // After foo(), ST(0) is left unclean
    asm("fld1;"
        "fld1;"
        "fld1;"
        "fld1;"
        "fld1;"
        "fld1;"
        "fld1;"
        "fld1;");
    // fnop will throw a FPU #IE exception if the bug exists
    asm("fnop");
    return 0;
}
If the bug exists, you will receive SIGFPE when you run this program compiled with TCC. There are two key points. The first is to unmask the #IE exception (note that GCC inline assembly syntax is used), so that the failure can no longer happen silently. The second is to use eight consecutive fld1 instructions to simulate the floating-point code that other compilers generate when optimizing.
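The constant 0xfffe in the test clears bit 0 of the x87 control word, the invalid-operation mask. The helpers below sketch the bits involved, assuming the architectural layout of the control and status words: bit 0 of the status word is IE, bit 6 is the stack-fault flag SF, and bit 9 (C1) distinguishes overflow from underflow. The function and enum names are hypothetical, and obtaining a real status word (for example with fnstsw) is left out.

```c
#include <stdint.h>

/* Bit positions defined by the x87 architecture. */
#define X87_CW_IM 0x0001u  /* control word: invalid-operation mask */
#define X87_SW_IE 0x0001u  /* status word: invalid-operation flag */
#define X87_SW_SF 0x0040u  /* status word: stack fault */
#define X87_SW_C1 0x0200u  /* status word: set on overflow, clear on underflow */

typedef enum { NO_STACK_FAULT, STACK_UNDERFLOW, STACK_OVERFLOW } stack_fault;

/* Clear the IM bit so that #IE traps instead of being handled silently.
   This is the same operation as `custom &= 0xfffe` in the test program. */
static uint16_t unmask_invalid_op(uint16_t cw)
{
    return cw & (uint16_t)~X87_CW_IM;
}

/* Decide whether a status word reports a register-stack fault. */
static stack_fault classify_status(uint16_t sw)
{
    if (!(sw & X87_SW_IE) || !(sw & X87_SW_SF))
        return NO_STACK_FAULT;
    return (sw & X87_SW_C1) ? STACK_OVERFLOW : STACK_UNDERFLOW;
}
```

For example, unmask_invalid_op(0x037F) gives 0x037E (0x037F being the control word value after FINIT), and classify_status(0x0241) reports STACK_OVERFLOW, the kind of fault that eight fld1 instructions produce when one slot is already taken.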
I have submitted the fix to the TCC development team, and the latest development version already includes the patch. Readers can download it from here: http://repo.or.cz/w/tinycc.git?a=log;h=refs/heads/mob. If you still run into this bug, please let me know. I will keep fighting it; I refuse to believe I cannot stamp this bug out!