Loss of precision in Java float type subtraction operation

Source: Internet
Author: User
Tags: number sign

package test1;

public class Test2 {

    /**
     * @param args
     */
    public static void main(String[] args) {
        float xx = 2.0f;
        float yy = 1.8f;
        float tt = xx - yy;
        System.out.println("tttttt-----" + tt);
    }

}

Sure enough, the result is: tttttt-----0.20000005

I then tested several other float subtractions. Apart from subtractions between values of the form *.0, which gave no trouble, they all showed the same problem: float loses precision in subtraction. I later found a way around it on the internet and am recording it here:

package test1;

import java.math.BigDecimal;

public class Test2 {

    /**
     * @param args
     */
    public static void main(String[] args) {
        float xx = 2.2f;
        float yy = 2.0f;
        float tt = xx - yy;

        // Build each BigDecimal from the float's decimal string, then subtract in base 10.
        BigDecimal b1 = new BigDecimal(Float.toString(xx));
        BigDecimal b2 = new BigDecimal(Float.toString(yy));
        float ss = b1.subtract(b2).floatValue();
        System.out.println("ssss----" + ss);
        System.out.println("tttttt-----" + tt);
    }
}
The output is:

ssss----0.2
tttttt-----0.20000005

Side by side, the difference is obvious.
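Why the workaround goes through Float.toString instead of handing the float directly to BigDecimal can be seen in a small sketch. The class name BigDecimalFloatDemo and the helper subtractExact are made up for illustration and are not part of the original snippet. Note that the float variables still hold binary approximations; the string conversion only picks the shortest decimal that maps back to the same float.

```java
import java.math.BigDecimal;

class BigDecimalFloatDemo {

    // Hypothetical helper: subtract two floats via their decimal strings.
    static BigDecimal subtractExact(float a, float b) {
        BigDecimal x = new BigDecimal(Float.toString(a));
        BigDecimal y = new BigDecimal(Float.toString(b));
        return x.subtract(y);
    }

    public static void main(String[] args) {
        // "2.2" and "2.0" are subtracted exactly in base 10.
        System.out.println(subtractExact(2.2f, 2.0f));

        // Passing the float itself widens it to double and keeps the binary
        // rounding error, so the printed value is slightly above 2.2.
        System.out.println(new BigDecimal(2.2f));
    }
}
```

If the values start out as decimal text (user input, a file), it is usually simpler to construct the BigDecimal from that text and avoid float altogether.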

With the problem solved, the next step is to find out why the difference occurs.

An article on the internet explains it in detail. It is titled "Analysis of the float type's memory storage and loss of precision", and its full text follows:

Question: 12.0f - 11.9f = 0.10000038. Why does the subtraction not come out to exactly 0.1?

Now let's analyze in detail why a floating-point operation causes loss of precision.

1. Representing decimal numbers in binary

First of all, we need to answer the following two questions:

(1) How decimal integers are converted to binary numbers

The algorithm is simple: divide by 2 repeatedly and record the remainders. For example, 11 is converted to binary as follows:

11 / 2 = 5 remainder 1

5 / 2 = 2 remainder 1

2 / 2 = 1 remainder 0

1 / 2 = 0 remainder 1

The quotient is now 0, so we stop. Reading the remainders from bottom to top, 11 in binary is 1011.

One point to note: the algorithm ends as soon as the quotient reaches 0. Is it possible that repeatedly dividing some integer by 2 never reaches 0, so that the conversion loops forever? Absolutely not. An integer can always be represented exactly in binary, but a decimal fraction cannot always be.
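The repeated-division procedure can be sketched in a few lines of Java. The class and method names are illustrative, and the sketch assumes a non-negative input.

```java
class IntToBinary {

    // Repeated division by 2; each remainder becomes the next higher bit.
    // Assumes n >= 0.
    static String toBinary(int n) {
        if (n == 0) {
            return "0";
        }
        StringBuilder bits = new StringBuilder();
        while (n > 0) {
            bits.insert(0, n % 2); // prepend, so the last remainder ends up leftmost
            n /= 2;
        }
        return bits.toString();
    }

    public static void main(String[] args) {
        System.out.println(toBinary(11)); // 1011
    }
}
```

The loop always terminates because the quotient strictly decreases on every division.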

(2) How decimal fractions are converted to binary numbers

The algorithm is to multiply by 2 repeatedly, taking the integer part each time, until no fractional part is left. For example, 0.9 is converted to binary as follows:

0.9 * 2 = 1.8, take integer part 1

0.8 (fractional part of 1.8) * 2 = 1.6, take integer part 1

0.6 * 2 = 1.2, take integer part 1

0.2 * 2 = 0.4, take integer part 0

0.4 * 2 = 0.8, take integer part 0

0.8 * 2 = 1.6, take integer part 1

0.6 * 2 = 1.2, take integer part 1

... Reading the integer parts from top to bottom, 0.9 in binary is 0.1110011001100...

Note: the calculation above has entered a loop (0.8, 0.6, 0.2, 0.4, then 0.8 again), so multiplying by 2 can never eliminate the fractional part and the algorithm would run forever. Clearly, the binary representation of a decimal fraction is sometimes impossible to make exact. The reason is simple: can the decimal system represent 1/3 exactly? In the same way, the binary system cannot represent 1/10 exactly. This explains why floating-point subtraction does not come out exact and loses precision.
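A minimal sketch of the multiply-by-2 procedure follows. It uses BigDecimal so the decimal input itself is exact, and a bit limit (maxBits, an assumed parameter) so that repeating fractions such as 0.9 do not loop forever. The names are illustrative, and the input is assumed to lie in [0, 1).

```java
import java.math.BigDecimal;

class FractionToBinary {

    // Repeated multiplication by 2; the integer part produced at each step is the next bit.
    static String toBinary(String decimalFraction, int maxBits) {
        BigDecimal x = new BigDecimal(decimalFraction);
        BigDecimal two = BigDecimal.valueOf(2);
        StringBuilder bits = new StringBuilder("0.");
        for (int i = 0; i < maxBits && x.signum() != 0; i++) {
            x = x.multiply(two);
            if (x.compareTo(BigDecimal.ONE) >= 0) {
                bits.append('1');
                x = x.subtract(BigDecimal.ONE); // keep only the fractional part
            } else {
                bits.append('0');
            }
        }
        return bits.toString();
    }

    public static void main(String[] args) {
        System.out.println(toBinary("0.9", 16));   // 0.1110011001100110
        System.out.println(toBinary("0.625", 16)); // 0.101 (terminates)
    }
}
```

0.625 terminates because it is 5/8, a fraction whose denominator is a power of 2; 0.9 has no such form, so its bits repeat.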

2. Float type storage in memory

As we all know, the Java float type occupies 4 bytes in memory. The 32-bit structure of a float is as follows:

Float memory storage structure

4 bytes:     31 | 30 | 29-23 | 22-0

Represents:  sign bit of the real number | exponent sign bit | exponent bits | significand bits

In the sign bit, 0 indicates positive and 1 indicates negative. The significand has 24 bits of precision, of which only 23 are stored; the leading "1" is implied.

The steps to convert a float to its memory storage format are:

(1) Convert the absolute value of the real number to binary, using the methods for the integer and fractional parts described above.
(2) Move the binary point of this number left or right by n bits until it sits immediately to the right of the first significant digit.
(3) Starting from the first digit to the right of the binary point, take 23 digits and place them in bits 22 to 0.
(4) If the real number is positive, put "0" in bit 31; otherwise put "1".
(5) If the point was moved to the left, the exponent is positive and "1" is placed in bit 30. If the point was moved to the right or n = 0, "0" is placed in bit 30.
(6) If the point was moved to the left, convert n minus 1 to binary, pad it with "0" on the left to seven bits, and place it in bits 29 to 23. If the point was moved to the right or n = 0, convert n to binary, pad it with "0" on the left to seven bits, invert every bit, and place the result in bits 29 to 23.

Example: the memory storage format of 11.9

(1) Converted to binary, 11.9 is approximately "1011.1110011001100110011001100...".

(2) Move the binary point three bits to the left, to the right of the first significant bit: "1.01111100110011001100110". Keep 24 significant bits and cut off the excess on the right (this is where the error is introduced).

(3) There are now 24 significant bits. Remove the leftmost "1" to get "01111100110011001100110", 23 bits in total, and place them in bits 22 to 0 of the float storage structure.

(4) Since 11.9 is positive, put "0" in bit 31, the sign bit of the real number.

(5) Since the binary point was moved to the left, put "1" in bit 30, the exponent sign bit.

(6) Because the point was moved 3 bits to the left, 3 minus 1 gives 2. Converted to binary and padded to 7 bits this is 0000010, which goes into bits 29 to 23.

The final representation of 11.9 is: 0 1 0000010 01111100110011001100110

One more example: the memory storage format of 0.2356
(1) Converted to binary, 0.2356 is approximately 0.00111100010100000100100000.
(2) Move the binary point three bits to the right to get 1.11100010100000100100000.
(3) Take the 23 significant digits to the right of the binary point, i.e. 11100010100000100100000, and place them in bits 22 to 0.
(4) Since 0.2356 is positive, put "0" in bit 31.
(5) Since the binary point was moved to the right, put "0" in bit 30.
(6) Because the point was moved 3 bits to the right, convert 3 to binary and pad it with "0" on the left to seven bits, giving 0000011. Invert every bit to get 1111100 and place it in bits 29 to 23.

The final representation of 0.2356 is: 0 0 1111100 11100010100000100100000
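Both worked examples can be checked against what the JVM actually stores. The sketch below uses Float.floatToIntBits and splits the 32 bits into the four fields used in this article. The class and method names are illustrative.

```java
class FloatFields {

    // Formats the stored bits as: sign | exponent sign | 7 exponent bits | 23 significand bits.
    static String fields(float f) {
        int bits = Float.floatToIntBits(f);
        // Left-pad the binary string to the full 32 bits.
        String s = String.format("%32s", Integer.toBinaryString(bits)).replace(' ', '0');
        return s.substring(0, 1) + " " + s.substring(1, 2) + " "
                + s.substring(2, 9) + " " + s.substring(9);
    }

    public static void main(String[] args) {
        System.out.println(fields(11.9f));   // 0 1 0000010 01111100110011001100110
        System.out.println(fields(0.2356f)); // 0 0 1111100 11100010100000100100000
    }
}
```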

The steps to convert the binary format of a float stored in memory back to decimal are:
(1) Write out the binary digits in bits 22 to 0 and add a "1" on the far left to get 24 significant digits. Place the binary point to the right of that leftmost "1".
(2) Read the value n held in bits 29 to 23. When bit 30 is "0", invert every bit of n. When bit 30 is "1", increase n by 1.
(3) Move the binary point n bits to the left (when bit 30 is "0") or n bits to the right (when bit 30 is "1") to get the binary representation of the real number.
(4) Convert this binary number to decimal, and add a positive or negative sign according to whether bit 31 is "0" or "1".
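The four decoding steps can be sketched directly in Java. This is only an illustration under stated assumptions: the class and method names are made up, and the code handles normal values only (not zero, subnormals, infinity or NaN).

```java
class FloatDecode {

    // Follows the four decoding steps for a normal (non-zero, finite) float.
    static double decode(int bits) {
        // Step 1: restore the implied leading 1 in front of bits 22 to 0.
        int significand = (bits & 0x7FFFFF) | 0x800000;
        // Step 2: read n from bits 29 to 23; invert it or add 1 depending on bit 30.
        int n = (bits >>> 23) & 0x7F;
        boolean exponentPositive = ((bits >>> 30) & 1) == 1;
        int exponent = exponentPositive ? n + 1 : -(~n & 0x7F);
        // Step 3: the significand is held as an integer scaled by 2^23,
        // so moving the point amounts to multiplying by 2^(exponent - 23).
        double magnitude = significand * Math.pow(2, exponent - 23);
        // Step 4: apply the sign from bit 31.
        return (bits >>> 31) == 0 ? magnitude : -magnitude;
    }

    public static void main(String[] args) {
        int bits = Integer.parseInt("01000001001111100110011001100110", 2);
        System.out.println(decode(bits));          // the value actually stored for 11.9f
        System.out.println(decode(bits) == 11.9f); // true
    }
}
```

The decoded value is returned as a double so that the small gap between the stored value and 11.9 stays visible when printed.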

3. Subtraction of floating-point numbers

Floating-point addition and subtraction are more complicated than their fixed-point counterparts. The operation is carried out in roughly four steps:
(1) Check for a zero operand;

If one of the two floating-point operands is found to be 0, the result is known immediately and the remaining steps can be skipped.

(2) Compare the exponents and align them;

To add or subtract two floating-point numbers, first check whether their exponents are equal, that is, whether their binary points are aligned. If the exponents are equal, the points are aligned and the significands can be added or subtracted directly. If the exponents differ, the points are not aligned, and the two exponents must first be made equal. This process is called exponent alignment.

How to align (let the exponents of the two floating-point numbers be Ex and Ey):

Ex or Ey is changed by shifting the significand until the two are equal. Because floating-point numbers are stored normalized, shifting the significand to the left would lose its most significant bits and cause a large error, while shifting it to the right loses only the least significant bits and causes a small error. The alignment rule is therefore to shift the significand to the right, increasing the exponent by 1 for each bit shifted so that the value stays the same. The exponent that gets increased must be the smaller one, since it has to grow to match the other. In other words, alignment always moves the smaller exponent toward the larger: the significand of the number with the smaller exponent is shifted right (equivalent to moving the binary point left), with its exponent increased by 1 per bit, until the two exponents are equal. The number of bits shifted equals the difference between the two exponents.
(3) Add or subtract the significands;

Once the exponents are aligned, the significands can be combined. Both addition and subtraction are carried out as addition, exactly as in fixed-point arithmetic.
(4) Normalize and round the result.

Omitted.
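The four steps can be sketched on the raw bits of two floats. This is a deliberately simplified illustration, not how the hardware is implemented. It assumes both inputs are positive with a >= b, that a and the result are normal numbers, and it simply drops the bits shifted out during alignment, whereas real hardware keeps extra guard bits and rounds. The class and method names are made up.

```java
class AlignedSubtract {

    // Simplified a - b for positive floats with a >= b (see assumptions above).
    static float subtract(float a, float b) {
        // Step 1: zero operand check.
        if (b == 0.0f) {
            return a;
        }
        int bitsA = Float.floatToIntBits(a);
        int bitsB = Float.floatToIntBits(b);
        int expA = (bitsA >>> 23) & 0xFF;
        int expB = (bitsB >>> 23) & 0xFF;
        int sigA = (bitsA & 0x7FFFFF) | 0x800000; // 24-bit significand with leading 1
        int sigB = (bitsB & 0x7FFFFF) | 0x800000;

        // Step 2: align the smaller exponent to the larger by shifting right.
        int shift = expA - expB;
        sigB = shift < 24 ? sigB >>> shift : 0;

        // Step 3: subtract the significands.
        int sig = sigA - sigB;
        if (sig == 0) {
            return 0.0f;
        }

        // Step 4: normalize by shifting left until the leading 1 is back in bit 23.
        int exp = expA;
        while ((sig & 0x800000) == 0) {
            sig <<= 1;
            exp--;
        }
        return Float.intBitsToFloat((exp << 23) | (sig & 0x7FFFFF));
    }

    public static void main(String[] args) {
        System.out.println(subtract(12.0f, 1.5f));  // exponents differ by 3, so 1.5 is shifted
        System.out.println(subtract(12.0f, 11.9f)); // exponents already equal
    }
}
```

For these two inputs no significant bits are lost in the shift, so the sketch agrees with the built-in float subtraction.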

4. Calculating 12.0f - 11.9f

The memory storage format of 12.0f is: 0 1 0000010 10000000000000000000000

The memory storage format of 11.9f is: 0 1 0000010 01111100110011001100110

The two numbers have the same exponent, so only the significands need to be subtracted: 1.10000000000000000000000 - 1.01111100110011001100110 = 0.00000011001100110011010.

Result of 12.0f - 11.9f (before normalization, with 0 rather than 1 in front of the binary point): 0 1 0000010 00000011001100110011010

Converting the result back to decimal: 0.00011001100110011010 (binary) = 0.10000038
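The same subtraction can be reproduced with integer arithmetic on the stored significands. The sketch assumes what the text established, namely that both operands have exponent 3. The class and method names are illustrative.

```java
class TwelveMinusElevenPointNine {

    // 24-bit significand of a normal float, including the implied leading 1.
    static int significand(float f) {
        return (Float.floatToIntBits(f) & 0x7FFFFF) | 0x800000;
    }

    public static void main(String[] args) {
        int diff = significand(12.0f) - significand(11.9f);
        // Both operands have exponent 3 and the significand is scaled by 2^23,
        // so the difference is worth diff * 2^(3 - 23).
        double value = diff * Math.pow(2, 3 - 23);

        System.out.println(Integer.toBinaryString(diff)); // 11001100110011010
        System.out.println((float) value);                // 0.10000038
        System.out.println(12.0f - 11.9f);                // 0.10000038
    }
}
```

The error comes entirely from the stored value of 11.9f, which is slightly below 11.9; the subtraction step itself is exact here.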

Analysis:

(6) Because the point was moved 3 bits to the left, 3 minus 1 gives 2. Converted to binary and padded to 7 bits this is 0000010, which goes into bits 29 to 23.

The final representation of 11.9 is: 0 1 0000010 01111100110011001100110

The article says that because the binary point was moved 3 bits to the left, the exponent bits hold 3 minus 1, which is 2.

I did not understand why 1 has to be subtracted. After a good deal of searching on Google and Baidu, this is what I learned:

The first floating-point number is the

In the IEEE 754 standard, the 32 bits of a float are laid out as follows:
Sign bit (S): 1 bit | Exponent (E): 8 bits | Mantissa (M): 23 bits
The exponent is stored in offset (biased) code, which can be obtained from the two's complement form by inverting the first bit. For a positive exponent, the positions in front of the binary point are numbered from 0: the first digit is position 0, the second is position 1, the third is position 2, and so on. For example, binary 1000 has its 1 in position 3 and represents 1x2^3. That is why the exponent code of 11.9 is 10000010: its binary form is approximately "1011.1110011001100110011001100...", and when the point moves three bits to the left, the first move counts as 0, the second as 1 and the third as 2, hence 3 minus 1. The same reasoning shows why 0.2356, which in binary is approximately 0.00111100010100000100100000, gets the exponent code 0 1111100: the first bit after the binary point is 2^-1, so a right shift starts counting from -1. In the terms of the standard itself, the bias for float is 127, so the stored exponent is 3 + 127 = 130 = 10000010 for 11.9 and -3 + 127 = 124 = 01111100 for 0.2356.
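The stored exponent and the true exponent can be compared directly in Java. The sketch below reads the 8 exponent bits with Float.floatToIntBits and uses Math.getExponent, which returns the unbiased exponent of a float. The class and method names are illustrative.

```java
class BiasedExponent {

    // The 8 stored exponent bits (bits 30 to 23), read as an unsigned number.
    static int storedExponent(float f) {
        return (Float.floatToIntBits(f) >>> 23) & 0xFF;
    }

    public static void main(String[] args) {
        System.out.println(storedExponent(11.9f));     // 130, i.e. 3 + 127
        System.out.println(Math.getExponent(11.9f));   // 3
        System.out.println(storedExponent(0.2356f));   // 124, i.e. -3 + 127
        System.out.println(Math.getExponent(0.2356f)); // -3
    }
}
```

Read as "bit 30 plus seven bits", 130 is 1 0000010 and 124 is 0 1111100, which is where the "n minus 1" and "invert n" rules in the encoding steps come from.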
