The precision problem of float and double in Java

Source: Internet
Author: User

This article explains why float, which occupies the same 4 bytes as int, has a much larger range than int and yet cannot represent some int values exactly (loss of precision).
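The claim can be checked with a short sketch. The class name IntToFloatLoss and the helper roundTrip are made up for this illustration; the sample values are chosen here and do not come from the original article.

```java
// Sketch: which int values survive a round trip through float?
final class IntToFloatLoss {
    // Converts an int to float and back; a changed result means precision was lost.
    static int roundTrip(int value) {
        float asFloat = (float) value; // rounds to the nearest representable float
        return (int) asFloat;
    }

    public static void main(String[] args) {
        // float covers a far larger range than int ...
        System.out.println(Float.MAX_VALUE > Integer.MAX_VALUE);
        // ... but not every int inside that range is representable.
        System.out.println(roundTrip(16777216));  // 2^24, survives unchanged
        System.out.println(roundTrip(16777217));  // 2^24 + 1, comes back as 16777216
        System.out.println(roundTrip(123456789)); // comes back as 123456792
    }
}
```

Every int up to 2^24 = 16777216 survives the round trip; above that, only some values do.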



1. Background Knowledge
Java material usually gives no detail here. It only says that a float occupies 32 bits and a double occupies 64 bits.
For a computer, the bit is the natural unit to count in. Some people prefer to count in bytes, where one byte is 8 bits:
1 byte = 8 bits.
So a float occupies 4 bytes and a double occupies 8 bytes.
I still prefer to count in bits. It is more direct and makes it easier to see how the computer stores these types.
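These sizes can be read directly from the wrapper classes. A minimal sketch, assuming Java 8 or later (Float.BYTES and Double.BYTES were added in Java 8); the class name TypeSizes and the helper describe are made up for this illustration.

```java
// Sketch: print the storage size of float and double in bits and bytes.
final class TypeSizes {
    // Formats one line of output, e.g. "float: 32 bits, 4 bytes".
    static String describe(String name, int bits, int bytes) {
        return name + ": " + bits + " bits, " + bytes + " bytes";
    }

    public static void main(String[] args) {
        System.out.println(describe("float", Float.SIZE, Float.BYTES));
        System.out.println(describe("double", Double.SIZE, Double.BYTES));
    }
}
```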

For precision and range, refer to C++: its implementations normally use the same IEEE 754 float and double formats that Java does.


2. Storage Knowledge
How a computer stores floating-point numbers: in scientific notation.
The number to be stored is first converted to the form 0.xxxxxx x 10^N. (The machine actually does this in base 2; base 10 is used here only because it is easier to read.)
For example:
3.1415 is converted to: 0.31415 x 10^1
100000 is converted to: 0.1 x 10^6
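The decimal normalization above can be reproduced with BigDecimal. This is only a sketch of the base-10 analogy, not of how the hardware stores a float; the class name DecimalNormalizer and the method normalize are made up for this illustration.

```java
import java.math.BigDecimal;

// Sketch: rewrite a decimal literal in the form 0.xxxxxx x 10^N.
final class DecimalNormalizer {
    static String normalize(String decimal) {
        // Drop trailing zeros so that 100000 becomes the single digit 1 with scale -5.
        BigDecimal d = new BigDecimal(decimal).stripTrailingZeros();
        // precision = number of significant digits, scale = digits after the point.
        int exponent = d.precision() - d.scale();
        String digits = d.unscaledValue().abs().toString();
        String sign = d.signum() < 0 ? "-" : "";
        return sign + "0." + digits + " x 10^" + exponent;
    }

    public static void main(String[] args) {
        System.out.println(normalize("3.1415")); // 0.31415 x 10^1
        System.out.println(normalize("100000")); // 0.1 x 10^6
    }
}
```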

One point to make first: what matters most is the number of significant digits (integer digits and decimal digits together); precision follows from that.


3. Getting to the point
=====================
Problems with the single-precision float type and the double-precision double type in C++

Single precision is the float type. It is stored in 4 bytes (32 bits) and has about 7 significant decimal digits.

In a float, 1 bit is the sign bit, 8 bits are the exponent, and the remaining 23 bits hold the significant digits (the mantissa).
2 to the power 23 is 8388608, a 7-digit number, which gives a precision of about 7 decimal digits.

A single-precision floating-point number occupies 32 bits in memory. According to the floating-point standard, the highest bit represents the sign. Of the remaining bits, one part holds the exponent and the other part holds the fraction.
Converted to decimal, the maximum precision this representation can offer is about 7 digits.

For example:
float a = 3.14159; is conceptually held in memory as 0.314159 x 10^1 with a sign bit of 0. The storage allocated to a is divided into parts: one holds the digits 0.314159 and one holds the exponent 1. Both are stored in binary, not decimal.
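The three fields can be pulled out of an actual float with Float.floatToIntBits. A minimal sketch; the class name FloatFields and its helper methods are made up for this illustration. Note that the stored exponent is a power of 2 with a bias of 127 added, not the decimal exponent used in the analogy above.

```java
// Sketch: split a float into its sign, exponent, and mantissa fields.
final class FloatFields {
    // Highest bit: 0 for positive, 1 for negative.
    static int sign(float v) {
        return Float.floatToIntBits(v) >>> 31;
    }

    // Next 8 bits: the binary exponent plus a bias of 127.
    static int biasedExponent(float v) {
        return (Float.floatToIntBits(v) >>> 23) & 0xFF;
    }

    // Lowest 23 bits: the stored fraction bits.
    static int mantissa(float v) {
        return Float.floatToIntBits(v) & 0x7FFFFF;
    }

    public static void main(String[] args) {
        float a = 3.14159f;
        String fraction = String.format("%23s", Integer.toBinaryString(mantissa(a)))
                .replace(' ', '0');
        System.out.println("sign     = " + sign(a));
        System.out.println("exponent = " + biasedExponent(a)
                + " (unbiased " + (biasedExponent(a) - 127) + ")");
        System.out.println("mantissa = " + fraction);
    }
}
```

For 3.14159f the unbiased exponent is 1, because the value lies between 2^1 and 2^2.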

==================
float: 1 sign bit, 8 exponent bits, 23 mantissa bits
double: 1 sign bit, 11 exponent bits, 52 mantissa bits

The float mantissa has 23 bits: 2^23 = 8388608, about 8.4e6, a 7-digit number. This is why the figure quoted varies: some sources say 7 digits, some say 8.
The double mantissa has 52 bits: 2^52 is about 4.5e15, so a double has about 15 significant digits.
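The digit counts can be estimated as bits x log10(2). The sketch below also evaluates 24 and 53 bits, because normal IEEE 754 values carry one implicit leading bit on top of the stored mantissa; that detail goes beyond the text above. The class name DigitEstimate and the method decimalDigits are made up for this illustration.

```java
// Sketch: how many decimal digits correspond to a given number of binary digits?
final class DigitEstimate {
    // Each binary digit is worth log10(2), roughly 0.301 decimal digits.
    static double decimalDigits(int bits) {
        return bits * Math.log10(2.0);
    }

    public static void main(String[] args) {
        System.out.println("2^23 = " + (1L << 23));
        System.out.println("2^52 = " + (1L << 52));
        System.out.printf("23 bits ~ %.2f decimal digits%n", decimalDigits(23));
        System.out.printf("24 bits ~ %.2f decimal digits%n", decimalDigits(24));
        System.out.printf("52 bits ~ %.2f decimal digits%n", decimalDigits(52));
        System.out.printf("53 bits ~ %.2f decimal digits%n", decimalDigits(53));
    }
}
```

The estimates come out at about 6.9 and 7.2 digits for float and about 15.7 and 16.0 for double, which matches the usual rule of thumb of 7 and 15 digits.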


Postscript:
Count the significant digits (integer digits plus decimal digits): a float holds about 7 and a double about 15.
There is a small difference between cases, though:
float f = (float) 62345678.912345;  // prints 6.234568E7, 7 digits in total
float F2 = (float) 12345678.912345; // prints 1.2345679E7, 8 digits in total

(Precision question: float precision is 7 to 8 digits. The 8-digit case is the one where the leading digit is 1. Once the leading digit is 2 or more, is the eighth digit lost in the carry?)
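One way to look into this question is to print the gap between neighbouring floats with Math.ulp. A float has 24 significant bits, so integers are spaced 1 apart only below 2^24 = 16777216; above that the spacing grows to 2, 4, and so on. The sketch reuses the two values from the postscript; the class name FloatSpacing and its helper methods are made up for this illustration.

```java
// Sketch: compare the printed form and the spacing of the two postscript values.
final class FloatSpacing {
    // The decimal string Java prints for the value once narrowed to float.
    static String printed(double value) {
        return Float.toString((float) value);
    }

    // Distance from the narrowed float to its next larger neighbour.
    static float gap(double value) {
        return Math.ulp((float) value);
    }

    public static void main(String[] args) {
        double[] samples = {62345678.912345, 12345678.912345};
        for (double sample : samples) {
            System.out.println(printed(sample) + " spacing " + gap(sample));
        }
    }
}
```

The first value lies above 2^25, where floats are 4 apart, so the eighth digit cannot be kept. The second lies below 2^24, where floats are 1 apart, so all 8 integer digits fit.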
