How to understand the requirements of the IEEE 754 standard for float values and double values in Java

Source: Internet
Author: User

In the Java language, we can use the two basic data types of float and double to represent specific data.

These two data types are essentially floating-point numbers (floating-point number), which is a numerical representation of the approximate value of a real number, represented by a valid digit plus a power.

Floating-point numbers are used because the computer cannot accurately convert all decimal decimals to binary in the process of using binary operations, and can only be represented by approximate values.

There are many ways to use floating-point numbers for numeric values, and in Java, like the C language, both float and double use the most widely used IEEE 754 standard.

The IEEE 754 standard is all known as the IEEE binary floating-point arithmetic standard (ansi/ieee Std 754-1985), where ANSI is the American National Standards Institute (American Nation standards Institute), IEEE is an abbreviation for the Institute of Electrical and Electronics Engineers (Institute of Electrical and Electronics Engineers), 754 is the number of the standard, and 1985 is the year the standard was released.

IEEE 754 stipulates that a binary floating-point number consists of three parts in storage:

The first part is the symbol bit (sign bit), the second part is the exponential offset value (exponent bias) for storing the exponential portion of the floating-point number, and the third part is the score (fraction) The decimal part of the valid number used to store floating-point numbers.

For example, in a specific float value, the float value in Java is in the single-precision standard in IEEE754, which is stored using 32 bits. For example:

0100 0100 1010 0110 1001 1110 0000 0000

which

0100 0100 1010 0110 1001 1110 0111 0100

The first digit 0 is the sign bit , it indicates that the positive and negative value of this number is positive, corresponding, if the first bit is 1 indicates this is a negative;

00100 1010 0110 1001 1110 0000 0000

The 2nd to 9th 8 bits store an exponential offset value that records the exponential portion of the float value, with a value range of -126~+127 plus an offset value of 127 (2 of 7 minus 1), and the result is 1~254 (0 and 255 are reserved for special values). In this example, 1000 1001 has a value of 137 in decimal, but 137 is the result of the offset, minus 127 of the offset value, which actually represents the exponential value of 10.

0100 0100 1010 0110 1001 1110 0000 0000

The 23 bits of the 10th to 32nd store the fractional /fractional portion of the float value. 010 0110 1001 1110 0000 0000 It actually represents the binary 0.010 0110 1001 1110 0000 0000 plus 1, which is 1.010 0110 1001 111. This is actually the result of the binary 1010 0110 1001 111 Right Shift 14 bits. 1010 0110 1001 111 The value in decimal is 21327, so 1.010 0110 1001 111 is equal to 21327*2^ (-14).

In the case of:

0100 0100 1010 0110 1001 1110 0000 0000

It is actually equal to:

+ 21327*2^ (-14) * 2^10 = 1332.9375

The first sign bit indicates that the 10th to 32nd decimal point represents the 2nd to 9th exponential bit

Conversely, for example, if we want to know the form of a float a = 1000 in memory, then you need to do the following:

First, 1000 is an integer, so the first bit stores 0.

The decimal 100 is 11 1110 1000 in the binary, which is actually the result of 1.111101 left shift 9 bits,
So, actually, the result of the digital storage should be 9 + 127 = 136 that is 1000 1000

The fractional bits are stored in 1.111101 minus 1, 23 bits, 111 1010 0000 0000 0000 0000

Fully integrated float a = 1000 in-memory storage as:

0100 0100 0111 1010 0000 0000 0000 0000

Also, as mentioned earlier, the exponential offset value of 0 or 255 represents a special value:

When the exponent value is 0, that is 0000 0000, if the decimal part is divided into 0, then the number is ±0 (plus or minus by the sign bit);

When the exponent value is 255, which is 1111 1111, if the fractional part is zero, then the number is ±∞ (plus or minus is determined by the sign bit);

When the exponent value is 255, or 1111 1111, if the fractional part is nonzero, the number is represented as not a number (not a digit).

The above is the part of the float, double the same, but double with the IEEE 754 double precision standard, using 64 bits of storage, where the 2~12 11 bits is the digit, the 13~64 52 bits is the decimal place.

References:

IEEE 754-wiki
https://zh.wikipedia.org/zh-cn/IEEE_754

Floating-point number-wiki
Https://zh.wikipedia.org/zh-cn/%E6%B5%AE%E7%82%B9%E6%95%B0

How to understand the requirements of the IEEE 754 standard for float values and double values in Java

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.