The IEEE Standard for Binary Floating-Point Arithmetic (IEEE 754) has been the most widely used floating-point standard since the 1980s. It is implemented by many CPUs and floating-point units. The standard defines the formats for representing floating-point numbers (including negative zero, −0) and denormal numbers, some special values (infinity (Inf) and not-a-number (NaN)), and the floating-point operations on these values. It also specifies four rounding rules and five exceptions (including when the exceptions occur and how they are handled).
IEEE 754 specifies four formats for representing floating-point values: single precision (32 bits), double precision (64 bits), single-extended precision (at least 43 bits, rarely used), and double-extended precision (at least 79 bits, usually implemented as 80 bits). Only the 32-bit format is mandatory; the others are optional. Most programming languages provide IEEE floating-point formats and arithmetic, although some list them as optional. For example, the C language, which predates IEEE 754, now includes IEEE arithmetic but does not make it mandatory (float in C usually means IEEE single precision, and double means double precision).
The full name of the standard is the IEEE Standard for Binary Floating-Point Arithmetic (ANSI/IEEE Std 754-1985), also known as IEC 60559:1989, Binary floating-point arithmetic for microprocessor systems (originally numbered IEC 559:1989) [1]. It was followed by IEEE 854-1987, a radix-independent floating-point standard that covers radix 2 and radix 10. The standard was later revised as IEEE 754-2008, which has since been followed by IEEE 754-2019.
In the 1960s and 1970s, the computer models of different companies used very different floating-point representations, and the industry had no common standard. This made data exchange and cooperation between computers very inconvenient. An IEEE floating-point working group began developing a floating-point standard in the late 1970s. In 1980, Intel introduced the single-chip 8087 floating-point coprocessor. Its floating-point representation and defined operations were sound and advanced, and IEEE adopted them as the floating-point standard, which was released on July 15, 1985. Even before that, in the early 1980s, the content of the standard had already been widely adopted by computer companies and had become a de facto industry standard.
Three fields of the IEEE 754 floating point number
- Single-precision binary floating-point numbers, stored in 32 bits.
| 1 bit | 8 bits | 23 bits |
|---|---|---|
| S | Exp | Fraction |
| bit 31 | bits 30 to 23: biased exponent (actual exponent + 127) | bits 22 to 0 (bit 0 is the rightmost) |
S is the sign bit, Exp is the exponent, and Fraction is the significand field. The exponent field uses a so-called biased representation: the stored value is the actual exponent plus a fixed bias (127 for the 32-bit format). The purpose of this representation is to simplify comparison. The exponent may be positive or negative, and if it were stored in two's complement, the sign bit S and Exp together could not be compared in a simple way. For this reason the exponent is stored as an unsigned biased value. The actual exponent of a single-precision number ranges from −126 to +127; adding the bias 127 gives stored values from 1 to 254 (0 and 255 are reserved for special values). When the value of a floating-point number is computed, the stored exponent minus the bias gives the actual exponent.
- Double-precision binary floating-point numbers, stored in 64 bits.
| 1 bit | 11 bits | 52 bits |
|---|---|---|
| S | Exp | Fraction |
| bit 63 | bits 62 to 52: biased exponent (actual exponent + 1023) | bits 51 to 0 (bit 0 is the rightmost) |
S is the sign bit, Exp is the exponent, and Fraction is the significand field. The exponent field uses a biased representation: the stored value is the actual exponent plus a fixed bias (1023 for the 64-bit format). The purpose of this representation is to simplify comparison. The exponent may be positive or negative, and if it were stored in two's complement, the sign bit S and Exp together could not be compared in a simple way. For this reason the exponent is stored as an unsigned biased value. The actual exponent of a double-precision number ranges from −1022 to +1023; adding the bias 1023 gives stored values from 1 to 2046 (0, all bits 0 in binary, and 2047, all bits 1 in binary, are reserved for special values). When the value of a floating-point number is computed, the stored exponent minus the bias gives the actual exponent.
- Converting an IEEE floating-point number between its binary and decimal forms in C++
    #include <bitset>
    #include <cstdint>
    #include <cstring>
    #include <iostream>

    using namespace std;

    int main()
    {
        float a = 1.25f;
        float b;

        // Obtain the value of a as stored in memory. memcpy copies exactly
        // the 4 bytes of the float and avoids casting the float's address.
        uint32_t nMem;
        memcpy(&nMem, &a, sizeof nMem);
        bitset<32> bitA(nMem); // the value in 32-bit floating-point format

        cout << "The conversion result in binary:" << endl;
        cout << bitA << endl;

        // Copy the stored bit pattern back into a float to get the decimal value.
        uint32_t bits = static_cast<uint32_t>(bitA.to_ulong());
        memcpy(&b, &bits, sizeof b);
        cout << "Converted to decimal:" << endl;
        cout << b << endl;
        return 0;
    }