The Intel x87 FPU is designed for scalar floating-point computation. It supports single-precision (32-bit), double-precision (64-bit), and double extended-precision (80-bit) floating-point formats and complies with the IEEE 754 standard.
The x87 FPU is available in both 32-bit compatibility mode and 64-bit mode. In both cases it has the same number of data registers: only 8. Unlike the general-purpose registers, the x87 data registers are accessed as a stack. When you load a value from memory into the FPU with the fld instruction, the FPU interprets it according to the operand size (32-bit, 64-bit, or 80-bit) as single precision, double precision, or double extended precision respectively, converts it to double extended-precision format, and pushes it onto the top of the register stack. Each data register is therefore 80 bits wide, and subsequent floating-point calculations operate on double extended-precision values.
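The widening on load and the narrowing on store can be observed from plain C. The sketch below assumes a compiler that maps long double to the x87 double extended format (GCC and Clang on x86 do; other toolchains may not), and the function names are made up for illustration.

```c
/* Returns 1 when a float survives widening to long double and narrowing back.
   This mirrors a 32-bit fld: every single-precision value fits exactly in the
   wider format. (NaN compares unequal to itself and would return 0.) */
int float_roundtrip_is_exact(float f)
{
    long double widened = f;
    return (float)widened == f;
}

/* Returns 1 when narrowing x to single precision changes its value, as a
   32-bit store from the FPU may do when the result needs more than 24
   significand bits. */
int narrowing_to_float_rounds(long double x)
{
    float narrowed = (float)x;
    return (long double)narrowed != x;
}
```

For instance, float_roundtrip_is_exact(0.1f) holds, while narrowing_to_float_rounds(0.1L) reports that the stored single-precision value differs from the extended one.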
The register at the top of the stack has index 0, the one below it has index 1, then 2, and so on, down to 7 at the bottom of the stack. The fst and fstp instructions store data to memory. fst converts the double extended-precision value at the top of the stack to single or double precision according to the size of the destination operand (32-bit or 64-bit) and writes it to the specified memory location. fstp does the same and then pops the stack; it also accepts an 80-bit destination, in which case the value is stored without conversion. In addition, fxch, fst, and fstp can move data between the FPU's own registers: fxch exchanges the contents of the specified data register with the top of the stack, and fst copies the top of the stack into the specified data register.
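The stack-relative numbering can be sketched with a toy model in C. This is only an illustration: the type and function names are hypothetical, and the model ignores the tag word and stack overflow/underflow detection that the real FPU performs.

```c
/* Toy model of the x87 register stack (hypothetical names throughout). */
typedef struct {
    long double reg[8]; /* the eight physical data registers */
    unsigned top;       /* physical index of the register currently seen as ST(0) */
} X87Stack;

/* ST(i) is always addressed relative to the current top of the stack. */
long double *st_ref(X87Stack *s, unsigned i)
{
    return &s->reg[(s->top + i) & 7u];
}

/* fld: move the top down by one slot, then write the new ST(0). */
void model_fld(X87Stack *s, long double v)
{
    s->top = (s->top - 1u) & 7u;
    *st_ref(s, 0) = v;
}

/* fst st(i): copy ST(0) into ST(i) without popping. */
void model_fst(X87Stack *s, unsigned i)
{
    *st_ref(s, i) = *st_ref(s, 0);
}

/* fstp to memory: read ST(0), then pop by moving the top up by one slot. */
long double model_fstp(X87Stack *s)
{
    long double v = *st_ref(s, 0);
    s->top = (s->top + 1u) & 7u;
    return v;
}

/* fxch st(i): exchange ST(0) and ST(i). */
void model_fxch(X87Stack *s, unsigned i)
{
    long double tmp = *st_ref(s, 0);
    *st_ref(s, 0) = *st_ref(s, i);
    *st_ref(s, i) = tmp;
}
```

After pushing 1, 2, and 3 in that order, ST(0) holds 3 and ST(2) holds 1; the indices shift with every push and pop because they are offsets from the top, not fixed register names.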
Note that 80 bits (10 bytes) is not an integer multiple of 32 bits (4 bytes). The load and store instructions themselves transfer exactly 10 bytes, and "tbyte" stands for "ten bytes"; compilers, however, usually pad a double extended-precision variable in memory for alignment (GCC, for example, uses 12 bytes, or 96 bits, in 32-bit code and 16 bytes in 64-bit code). For reference, the operand-size specifiers in Intel assembly syntax are: byte (8 bits), word (16 bits), dword (doubleword, 32 bits), qword (quadword, 64 bits), tbyte (ten bytes, 80 bits), mmword (64 bits, used only with the MMX instruction set), xmmword (128 bits, used with the SSE instruction sets), and ymmword (256 bits, used with the AVX instruction set).
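To make the 10-byte figure concrete: an 80-bit value consists of a 64-bit significand (with an explicit integer bit) followed by a 16-bit field holding the sign and a 15-bit exponent biased by 16383. The decoder below is a minimal sketch with a hypothetical name; it handles only zero and normal numbers, and the result may be rounded because a double keeps just 53 significand bits.

```c
#include <stdint.h>

/* Hypothetical helper: decode a little-endian 10-byte double extended value.
   Handles zero and normal numbers only (no denormals, infinities, or NaNs). */
double decode_extended80(const unsigned char b[10])
{
    uint64_t significand = 0;
    unsigned sign_exp;
    int negative, exponent, i;
    double value;

    /* Bytes 0..7: 64-bit significand, least significant byte first. */
    for (i = 7; i >= 0; --i)
        significand = (significand << 8) | b[i];

    /* Bytes 8..9: 15-bit biased exponent plus the sign in the top bit. */
    sign_exp = (unsigned)b[8] | ((unsigned)b[9] << 8);
    negative = (sign_exp >> 15) & 1;

    /* Remove the bias, then account for the significand being an integer
       scaled by 2^63. */
    exponent = (int)(sign_exp & 0x7FFFu) - 16383 - 63;

    value = (double)significand;
    for (; exponent > 0; --exponent) value *= 2.0; /* scale up by powers of two */
    for (; exponent < 0; ++exponent) value *= 0.5; /* scale down by powers of two */

    return negative ? -value : value;
}
```

The bytes 00 00 00 00 00 00 00 80 FF 3F decode to 1.0. Any padding bytes a compiler adds after these ten carry no information.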
The following code computes sin(10.05) + cos(20.05). Arguments to the trigonometric functions are in radians.
/*
 * hi.c
 * test
 *
 * Created by Zenny Chen on 2/11/11.
 * Copyright 2011 GreenGames Studio. All rights reserved.
 */
#include <stdio.h>
#include <string.h>

/* Relies on the System V x86-64 calling convention: p arrives in rdi, q in rsi.
   Build without optimization so the compiler leaves both registers untouched. */
void test(long double *p, long double *q)
{
    __asm__(".intel_syntax noprefix");

    __asm__("fld tbyte ptr [rdi]\n\t"      // push *p as double extended float
            "fld tbyte ptr [rsi]\n\t"      // push *q as double extended float
            "fcos\n\t"                     // st(0) = cos(*q)
            "fstp st(2)\n\t"               // copy st(0) to st(2) and pop; cos(*q) is now st(1)
            "fsin\n\t"                     // st(0) = sin(*p)
            "fadd st(0), st(1)\n\t"        // st(0) = st(0) + st(1)
            "fstp dword ptr [rdi]\n\t"     // round to single-float, write to the first arg and pop
            "fstp st(0)\n\t"               // pop the leftover cos(*q) so the FPU stack is empty on return
            );

    __asm__(".att_syntax");
}

int main(void)
{
    long double a = 10.05;
    long double b = 20.05;
    float result;

    // calculate:
    // sin(10.05) + cos(20.05) = -0.2233334
    test(&a, &b);

    // test() wrote a single-precision result into the first four bytes of a
    memcpy(&result, &a, sizeof result);
    printf("the answer is: %f\n", result);
}
The inline assembly above can be written in AT&T assembly syntax as follows:
.text
.align 2

.globl _asm_test

_asm_test:
    fldt    (%rdi)
    fldt    (%rsi)
    fcos
    fstp    %st(2)
    fsin
    fadd    %st(1), %st(0)
    fstps   (%rdi)          # the s suffix selects a 32-bit (single-precision) store
    fstp    %st(0)          # pop the leftover cosine so the FPU stack is empty on return

    ret
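Both versions above depend on the arguments being in rdi and rsi. As a possible alternative, GCC-style extended inline assembly lets the compiler place operands on the x87 stack itself through the "t" constraint (top of the FPU stack). The sketch below assumes GCC or Clang targeting x86 or x86-64; the helper names are hypothetical and not part of the original code.

```c
/* Assumes GCC or Clang on x86/x86-64; helper names are illustrative only. */

/* "+t" asks the compiler to put x in ST(0) before the instruction and to
   read the result back from ST(0) afterwards. */
long double x87_sin(long double x)
{
    __asm__("fsin" : "+t"(x));
    return x;
}

long double x87_cos(long double x)
{
    __asm__("fcos" : "+t"(x));
    return x;
}

/* Computes sin(a) + cos(b) in extended precision and rounds the sum to
   single precision, like the 32-bit fstp in the routine above. */
float sin_plus_cos(long double a, long double b)
{
    return (float)(x87_sin(a) + x87_cos(b));
}
```

Calling sin_plus_cos(10.05L, 20.05L) should reproduce the value from the routine above, without fixing the register assignment by hand and without leaving anything on the FPU stack.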