The IEEE 754 floating-point representation standard

Source: Internet
Author: User

Scientific notation for binary numbers

The floating-point numbers used in C++ follow a representation based on the IEEE 754 standard. In mathematics, any nonzero decimal number can be written in base-10 scientific notation, as follows:

$x = a \times 10^{n}$, where $1 \le |a| < 10$ and $n$ is an integer.

This is always possible: if $|a|$ is 10 or greater, or smaller than 1, the number can be rewritten with a different power of 10, for example

$123.45 = 1.2345 \times 10^{2}$ and $0.00123 = 1.23 \times 10^{-3}$.

Writing a binary number in base-10 scientific notation is awkward, however, because the number must first be converted to decimal. A small change to the notation removes the problem. Mathematics uses 10 as the base only because we normally count in decimal. Computers work in binary, so there we switch to scientific notation with base 2 instead of base 10:

$x = a \times 2^{n}$, where $1 \le |a| < 2$ and $n$ is an integer.

For a binary number this form can be read off directly, for example

$0.00101_2 = 1.01_2 \times 2^{-3}$.
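The base-2 form can also be computed rather than read off. The following is a minimal sketch, not part of the original article: the names `BinarySci` and `to_binary_sci` are hypothetical, and the code assumes a finite, nonzero input. It relies on the standard `std::frexp` function, which returns a fraction in [0.5, 1), and rescales that result by one binary place.

```cpp
#include <cmath>

// A value written as significand * 2^exponent, with 1 <= |significand| < 2.
struct BinarySci {
    double significand;
    int exponent;
};

// Decompose a finite, nonzero x into base-2 scientific notation.
BinarySci to_binary_sci(double x) {
    int e = 0;
    double f = std::frexp(x, &e);      // x == f * 2^e with 0.5 <= |f| < 1
    return BinarySci{f * 2.0, e - 1};  // shift one place so 1 <= |significand| < 2
}
```

For 0.15625 this yields a significand of 1.25 (binary 1.01) and an exponent of -3.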

Storage layout under the IEEE 754 standard

Floating-point storage under the IEEE standard consists of three basic components: the sign bit, the exponent, and the mantissa. The mantissa is composed of a fractional part and an implied leading digit. The reason the leading digit can be implied is simple (explained below).

The table below shows the layout of single-precision and double-precision floating-point numbers, including the number of bits in each part (bit ranges are enclosed in square brackets, and 00 is the lowest bit).

Floating-point components

                    Sign      Exponent      Fraction
Single precision    1 [31]    8 [30-23]     23 [22-00]
Double precision    1 [63]    11 [62-52]    52 [51-00]
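
As an illustration of the single-precision row, the sketch below splits a `float` into its three fields with shifts and masks. The names `FloatFields`, `float_bits`, and `split_float` are hypothetical, and the code assumes the platform's `float` is a 32-bit IEEE 754 value, which is common but not guaranteed by the C++ standard.

```cpp
#include <cstdint>
#include <cstring>

struct FloatFields {
    std::uint32_t sign;      // 1 bit,   bit 31
    std::uint32_t exponent;  // 8 bits,  bits 30-23 (stored, biased value)
    std::uint32_t fraction;  // 23 bits, bits 22-00
};

// Copy the raw bits of a float into an integer without violating aliasing rules.
std::uint32_t float_bits(float f) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes a 32-bit float");
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

FloatFields split_float(float f) {
    std::uint32_t u = float_bits(f);
    return FloatFields{u >> 31, (u >> 23) & 0xFFu, u & 0x7FFFFFu};
}
```

For example, `split_float(1.0f)` gives sign 0, exponent 127, and fraction 0.
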
    • Sign bit

The sign bit is simple: it is the highest bit of the stored floating-point number and occupies only 1 bit. 0 indicates a positive number and 1 indicates a negative number. Flipping this bit changes the sign of the floating-point number.
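
A small sketch of that last point, assuming a 32-bit IEEE 754 `float` (the function name `flip_sign` is hypothetical): toggling bit 31 negates the value and leaves the exponent and fraction untouched.

```cpp
#include <cstdint>
#include <cstring>

// Negate a float by toggling only its sign bit (bit 31).
float flip_sign(float f) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes a 32-bit float");
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    u ^= 0x80000000u;               // flip the highest bit
    std::memcpy(&f, &u, sizeof f);
    return f;
}
```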

    • Exponent

Because the exponent field must represent both positive and negative exponents, a bias is added to the real exponent to obtain the value that is stored. For single-precision floating-point numbers under the IEEE standard, this bias is 127. So when the real exponent is 0, the stored exponent is 127. If the stored exponent is 200, the real exponent is (200-127), or 73. For reasons given below, the real exponents -127 (exponent bits all 0) and +128 (exponent bits all 1) are reserved for special values.

For double-precision floating-point numbers, the exponent field is 11 bits long and the bias is 1023.
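
The bias arithmetic can be sketched as follows. The overloaded name `real_exponent` is hypothetical, and the code assumes 32-bit and 64-bit IEEE 754 layouts for `float` and `double`. The result is only meaningful for normalized numbers, since the all-0 and all-1 exponent fields are reserved.

```cpp
#include <cstdint>
#include <cstring>

// Real (unbiased) exponent of a normalized single-precision value.
int real_exponent(float f) {
    static_assert(sizeof(float) == 4, "assumes a 32-bit float");
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    int stored = static_cast<int>((u >> 23) & 0xFFu);   // 8-bit exponent field
    return stored - 127;                                // remove the bias
}

// Real (unbiased) exponent of a normalized double-precision value.
int real_exponent(double d) {
    static_assert(sizeof(double) == 8, "assumes a 64-bit double");
    std::uint64_t u;
    std::memcpy(&u, &d, sizeof u);
    int stored = static_cast<int>((u >> 52) & 0x7FFu);  // 11-bit exponent field
    return stored - 1023;                               // remove the bias
}
```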

    • Mantissa

The mantissa, also known as the significand, determines the precision of the floating-point number. It consists of the implied leading digit (to the left of the binary point) and the fractional part (to the right of the binary point). Because we use base-2 scientific notation, the digit to the left of the binary point of a normalized number is always 1 (the form is 1.f), so the leading digit does not need to be stored explicitly. We only need to store the fractional part of the mantissa.
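
To make the implied leading 1 concrete, the following sketch rebuilds the value of a normalized single-precision number from its three fields. The function name `value_from_fields` is hypothetical. The result is computed in `double` so that the arithmetic itself is exact.

```cpp
#include <cmath>
#include <cstdint>

// Value of a normalized single-precision number:
// (-1)^sign * (1 + fraction / 2^23) * 2^(exponent - 127)
double value_from_fields(std::uint32_t sign, std::uint32_t exponent, std::uint32_t fraction) {
    double significand = 1.0 + static_cast<double>(fraction) / 8388608.0;  // 2^23; adds the implied 1
    double magnitude = std::ldexp(significand, static_cast<int>(exponent) - 127);
    return sign ? -magnitude : magnitude;
}
```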

Floating Point Storage example

The following example shows how a single-precision floating-point number is stored.

The decimal number $0.15625_{10}$ is written in binary as $0.00101_2$. Moving the binary point 3 places to the right and compensating with a factor of $2^{-3}$ gives $1.01_2 \times 2^{-3}$.

From this we can read off the fractional part of the mantissa and the exponent: the fraction is $.01_2$ and the exponent is -3.

Under the IEEE 754 standard, we use three parts to represent this floating-point number:

    • Sign = 0, because the floating-point number is positive (1 for negative numbers);
    • Exponent = 124. The real exponent is -3, but the value we store is the real exponent plus the bias. For a single-precision floating-point number the bias is 127 (1023 for double precision), so the stored exponent is (-3+127), or 124;
    • Fraction = 01000000000000000000000, that is, the bits $.01_2$ padded with zeros to 23 bits. The leading 1 is implied and is not stored.
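
The three fields of this example can be packed into one 32-bit pattern and checked against the compiler's own `float`. This is a sketch under the assumption that `float` is a 32-bit IEEE 754 value. The helper names `pack_float` and `bits_to_float` are hypothetical.

```cpp
#include <cstdint>
#include <cstring>

// Assemble sign (1 bit), biased exponent (8 bits) and fraction (23 bits).
std::uint32_t pack_float(std::uint32_t sign, std::uint32_t biased_exponent, std::uint32_t fraction) {
    return (sign << 31) | ((biased_exponent & 0xFFu) << 23) | (fraction & 0x7FFFFFu);
}

// Reinterpret a 32-bit pattern as a float.
float bits_to_float(std::uint32_t u) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes a 32-bit float");
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}
```

With sign 0, exponent 124 and fraction 0x200000 (binary 0100...0), `pack_float` produces 0x3E200000, and `bits_to_float` turns that pattern back into 0.15625.
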
Range of floating-point numbers

Let's consider the range of single-precision floating-point numbers first. Notice that a single-precision floating-point number is stored in 32 bits of memory, the same size as a 32-bit integer. We have only reinterpreted the rules for reading that block of memory, and this greatly increases the range of values that can be represented. But the larger range comes at a cost.

A 32-bit unsigned integer can represent every integer in the range 0 to $2^{32}-1$. A single-precision floating-point number cannot, because its mantissa has only 24 bits of precision (23 stored bits plus the implied leading 1). When a larger integer is stored, the low-order bits are rounded away, for example

    11110000 11001100 10101010 10101111        // 32-bit integer
    = +1.11100001100110010101011_2 x 2^31      // single-precision float
    = 11110000 11001100 10101011 00000000      // corresponding value

This approximates the 32-bit value but does not reproduce it exactly. Setting precision aside, a single-precision floating-point number can represent magnitudes with exponents up to $2^{127}$, whereas a 32-bit integer only reaches $2^{32}$.
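
The loss of the low-order bits can be observed by converting an integer to `float` and back. This is a sketch only: the name `round_trip_through_float` is hypothetical, and it assumes a 32-bit IEEE 754 `float` and the default round-to-nearest mode. The `volatile` is there only to force the value to be stored as a real `float`.

```cpp
#include <cstdint>

// Convert a 32-bit unsigned integer to float and back.
// Inputs that round up to 2^32 are not handled: that conversion back would overflow.
std::uint32_t round_trip_through_float(std::uint32_t n) {
    volatile float f = static_cast<float>(n);  // keeps only 24 significant bits
    return static_cast<std::uint32_t>(f);
}
```

For the integer 0xF0CCAAAF, the bit pattern in the example above, the round trip returns 0xF0CCAB00.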

Special values
    • Zero

With the representation described so far, the value 0 cannot be represented: because the leading digit is assumed to be 1, the value is never 0, whatever the fraction and exponent are. We therefore stipulate that when the exponent bits are all 0 and the fraction bits of the mantissa are all 0, the value of the floating-point number is 0. Note that +0 and -0 are two different floating-point numbers: they compare as equal, but their bit patterns differ in the sign bit.
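
The two zeros can be told apart only by looking at their bits. The sketch below, assuming a 32-bit IEEE 754 `float` and using the hypothetical helper name `float_bits`, exposes the raw pattern so that it can be compared.

```cpp
#include <cstdint>
#include <cstring>

// Raw 32-bit pattern of a float.
std::uint32_t float_bits(float f) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes a 32-bit float");
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}
```

Here `0.0f == -0.0f` is true, while `float_bits(0.0f)` is 0x00000000 and `float_bits(-0.0f)` is 0x80000000.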

    • Denormalized values

When the exponent bits are all 0 but the fraction bits are not all 0, the floating-point number is a denormalized value. In this case the leading digit is taken to be 0, so the value of a single-precision number is $(-1)^{s} \times 0.f \times 2^{-126}$ and the value of a double-precision number is $(-1)^{s} \times 0.f \times 2^{-1022}$, where $s$ is the sign bit and $f$ is the stored fraction. Note that the base-2 exponents are -126 and -1022, not -127 and -1023. The reason is simple: the smallest magnitudes a normalized number can represent are $(-1)^{s} \times 1 \times 2^{-126}$ and $(-1)^{s} \times 1 \times 2^{-1022}$, and using the same exponent lets the denormalized values continue directly below them. The purpose of denormalized values is to represent magnitudes smaller than the smallest normalized number, so that precision is lost gradually near zero.
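
The single-precision formula above can be evaluated directly. In this sketch the name `subnormal_value` is hypothetical, and the result is computed in `double` so that these very small values are represented exactly.

```cpp
#include <cmath>
#include <cstdint>

// Value of a denormalized single-precision number:
// (-1)^sign * (fraction / 2^23) * 2^(-126), with no implied leading 1.
double subnormal_value(std::uint32_t sign, std::uint32_t fraction) {
    double significand = static_cast<double>(fraction & 0x7FFFFFu) / 8388608.0;  // 0.f
    double magnitude = std::ldexp(significand, -126);
    return sign ? -magnitude : magnitude;
}
```

The smallest positive denormalized value, with only the lowest fraction bit set, is $2^{-149}$. The largest, with all fraction bits set, lies just below the smallest normalized value $2^{-126}$.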

    • Infinity

When the exponent bits are all 1 and the fraction bits of the mantissa are all 0, the value is +∞ or −∞, and the two are distinguished by the sign bit. Floating-point numbers under the IEEE 754 standard can therefore handle infinite results well.
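
A sketch of this bit test, assuming a 32-bit IEEE 754 `float`. The helper names `from_bits` and `is_infinity` are hypothetical. In real code the standard `std::isinf` function does the same job.

```cpp
#include <cstdint>
#include <cstring>

// Reinterpret a 32-bit pattern as a float.
float from_bits(std::uint32_t u) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes a 32-bit float");
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}

// True when the exponent bits are all 1 and the fraction bits are all 0.
bool is_infinity(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return (u & 0x7FFFFFFFu) == 0x7F800000u;  // mask off the sign bit, then compare
}
```

The pattern 0x7F800000 is +∞ and 0xFF800000 is −∞.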

    • Not a Number (NaN)

NaN (Not a Number) represents a value that is not a number. A floating-point number is a NaN when its exponent bits are all 1 and the fraction bits of the mantissa are not all 0. There are two classes of NaN: quiet NaN (QNaN) and signalling NaN (SNaN).

A NaN is a QNaN if the first (highest) bit of the fraction is set to 1. QNaN is the more commonly seen class: arithmetic operations usually pass QNaN values through unchanged, and a QNaN typically marks the result of an operation that is not mathematically defined, such as dividing zero by zero.

A NaN is an SNaN if the first bit of the fraction is 0 (some other fraction bit must then be 1). An SNaN is used to signal an exception when it takes part in an operation, and it can be used, for example, to catch premature use of a variable that has not been initialized.
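
The rules above can be written as a small classifier on raw bit patterns. This is a sketch: `NanKind` and `classify_nan` are hypothetical names, and the quiet/signalling convention follows the description in this article, which matches most current hardware but has not been universal. The function works on the integer pattern, not on a `float`, so that a signalling NaN never passes through floating-point registers.

```cpp
#include <cstdint>

enum class NanKind { NotNan, Quiet, Signaling };

// Classify a single-precision bit pattern.
NanKind classify_nan(std::uint32_t bits) {
    std::uint32_t exponent = (bits >> 23) & 0xFFu;
    std::uint32_t fraction = bits & 0x7FFFFFu;
    if (exponent != 0xFFu || fraction == 0u) {
        return NanKind::NotNan;               // ordinary number, zero, or infinity
    }
    // The highest fraction bit (bit 22) separates quiet from signalling.
    return (fraction & 0x400000u) ? NanKind::Quiet : NanKind::Signaling;
}
```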
