C language floating-point data types as seen from the IEEE 754 standard

Last Update:2018-12-06 Source: Internet

Author: User

Let's take a look at the following questions. If you can answer them all accurately, this article is not for you:

How are floating-point numbers represented in a computer, and how does that differ from the representation of integers?
What are the maximum ranges that the 32-bit float type and the 64-bit double type can represent?
What does the C statement printf("%d\n", 2.5); output, and why?

My answers were as follows:

Signed integers in a computer are represented in two's complement. I had never thought about how floating-point types are represented.
The float type can represent -2^32-1 ~ 2^32, and the double type can represent -2^64-1 ~ 2^64.
The output format requires an integer while the argument is a floating-point number, so the value is converted and the output is 2.

Clearly, all of my answers were wrong. What follows is my summary of some reference materials, which I hope explains these "mysteries" clearly.

The IEEE 754 standard (hereinafter "the standard") is the most widely used standard for floating-point arithmetic and is implemented by many CPUs and floating-point units. The standard defines the format of floating-point numbers, as shown in the figure:

[Figure: floating-point number format defined by IEEE 754]

We will only discuss the binary floating-point representation, which is divided into three parts:

the sign bit, the exponent, and the mantissa. Their meanings are analogous to scientific notation. For example:

Scientific notation (decimal):

(102.35045)_10 = +1.0235045 × 10^2: the sign is positive, the exponent is 2, and the mantissa is 1.0235045.

(-0.00023103)_10 = -2.3103 × 10^-4: the sign is negative, the exponent is -4, and the mantissa is 2.3103.

Likewise, for normalized binary floating-point numbers:

(1001.0111010)_2 = +1.001011101 × 2^3: the sign is positive, the exponent is 3, and the mantissa is 1.001011101.

(-0.0001010011)_2 = -1.010011 × 2^-4: the sign is negative, the exponent is -4, and the mantissa is 1.010011.

The preceding examples show that after a binary floating-point number is normalized, its mantissa has the form 1.xxx. The exponent is the number of places the binary point had to be moved to reach that form: each move to the left adds 1 to the exponent and each move to the right subtracts 1. The exponent can therefore be positive or negative.

The standard also stipulates:

The sign is represented by 1 bit: 0 for a positive number and 1 for a negative number.
The exponent is stored in biased form: the stored value is the actual exponent plus a fixed bias. The bias is 2^(e-1) - 1, where e is the number of bits in the exponent field. The bias turns negative exponents into non-negative stored values, so that two exponents can be compared easily.
The mantissa is stored as a plain unsigned magnitude. As noted above, the leading bit of a normalized binary floating-point number is always 1, so the 1 before the binary point does not need to be stored. It is assumed to be present and is called the "hidden bit".

The standard specifies four floating-point formats: single precision (32 bits), double precision (64 bits), single extended precision (at least 43 bits, rarely used), and double extended precision (at least 79 bits, usually implemented as 80 bits). The float and double types in C correspond to single-precision and double-precision floating-point numbers respectively. Their storage formats are as follows: single precision uses 1 sign bit, 8 exponent bits, and 23 mantissa bits; double precision uses 1 sign bit, 11 exponent bits, and 52 mantissa bits.

The two binary examples above are represented in single and double precision as follows:

(1001.0111010)_2 = +1.001011101 × 2^3

Single precision: the sign bit is 0; the exponent field is 3 + 127 = 130 (10000010); the mantissa is 1.001011101, which becomes 001011101 after the leading 1 is hidden. The number is therefore stored as:

    0 10000010 00101110100000000000000

Double precision: only the exponent bias differs, 3 + 1023 = 1026 (10000000010), so the number is stored as:

    0 10000000010 0010111010000000000000000000000000000000000000000000

(-0.0001010011)_2 = -1.010011 × 2^-4

Single precision: the sign bit is 1; the exponent field is -4 + 127 = 123 (01111011); the mantissa 1.010011 becomes 010011 after the leading 1 is hidden. The number is therefore stored as:

    1 01111011 01001100000000000000000

Double precision: the exponent field is -4 + 1023 = 1019 (01111111011), so the number is stored as:

    1 01111111011 0100110000000000000000000000000000000000000000000000

At this point we have explained how floating-point numbers are stored in a computer, which answers the first question above. The answer to the second question follows from what we have covered; the following table lists the extreme values of single-precision floating-point numbers:

[Table: extreme values of single-precision floating-point numbers]

For the last question, let's write a C program to test it:

    #include <stdio.h>

    int main()
    {
        printf("%d\n", 2.5);
        return 0;
    }

The compilation and running results are as follows:

    [guohl@guohl]$ gcc -o test test.c -g
    [guohl@guohl]$ ./test
    0

The result is not the 2 we expected. Let's debug with GDB: set a breakpoint at main and disassemble the function, which gives the following:

    (gdb) break main
    Breakpoint 1 at 0x8048415: file test.c, line 5.
    (gdb) run
    Starting program: /home/guohl/documents/AS/test

    Breakpoint 1, main () at test.c:5
    5           printf("%d\n", 2.5);
    (gdb) disassemble
    Dump of assembler code for function main:
       0x0804840c <+0>:     push   %ebp
       0x0804840d <+1>:     mov    %esp,%ebp
       0x0804840f <+3>:     and    $0xfffffff0,%esp
       0x08048412 <+6>:     sub    $0x10,%esp
    => 0x08048415 <+9>:     fldl   0x80484e0
       0x0804841b <+15>:    fstpl  0x4(%esp)
       0x0804841f <+19>:    movl   $0x80484d8,(%esp)
       0x08048426 <+26>:    call   0x80482f0 <printf@plt>
       0x0804842b <+31>:    mov    $0x0,%eax
       0x08048430 <+36>:    leave
       0x08048431 <+37>:    ret
    End of assembler dump.

The instruction fldl addr loads the double-precision floating-point number at memory address addr onto the FPU register stack. The instruction fstpl dest pops the double-precision value from the top of the FPU register stack and stores it at dest. Therefore, the two instructions

       0x08048415 <+9>:     fldl   0x80484e0
       0x0804841b <+15>:    fstpl  0x4(%esp)

first load the double-precision floating-point number at memory address 0x80484e0 into FPU register st0, and then pop it from st0 and store it at esp+4. Use the GDB x command to view the contents of memory at 0x80484e0:

    (gdb) x/fg 0x80484e0
    0x80484e0:      2.5
    (gdb) x/2xw 0x80484e0
    0x80484e0:      0x00000000      0x40040000
    (gdb) x/8tb 0x80484e0
    0x80484e0:      00000000 00000000 00000000 00000000 00000000 00000000 00000100 01000000

As shown above, the value stored there is the double-precision number 2.5. Because our platform is little-endian (low-order bytes are stored at lower memory addresses), the bytes listed above must be reversed to obtain the following representation:

    01000000 00000100 00000000 00000000 00000000 00000000 00000000 00000000

Parsing this binary value according to the IEEE 754 double-precision format: the sign bit is 0, so the number is positive; the exponent field is 10000000000 (1024), and subtracting the bias 1023 gives 1; the mantissa field is 0100...000, which with the hidden bit becomes 1.01 (decimal 1.25). The value is therefore +1.25 × 2^1 = 2.5, as expected.

The fstpl instruction thus stores the floating-point number at esp+4 as an argument to printf. Next, the instruction movl $0x80484d8,(%esp) stores the pointer to the format string "%d\n" at the location esp points to, as the first argument of printf. We can use GDB to check the format string at memory address 0x80484d8:

    (gdb) x/4cb 0x80484d8
    0x80484d8:      37 '%'  100 'd' 10 '\n' 0 '\000'

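The stack layout just before printf is called is as follows (reconstructed here from the instructions and memory dumps above):

    esp+8:  0x40040000    high 32 bits of 2.5
    esp+4:  0x00000000    low 32 bits of 2.5
    esp:    0x080484d8    pointer to the format string "%d\n"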

Inside printf, the format string passed as the first argument is parsed. When %d is encountered, the function fetches one integer from the arguments previously placed on the stack, that is, the 4-byte value the caller stored at esp+4, and prints it as an integer: 0. This is the output we saw when running ./test above. The value 2.5 is never converted to an integer to give 2!

References:

http://zh.wikipedia.org/wiki/IEEE_754

Richard Blum, Professional Assembly Language

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.