Conversion of floating point precision


On x86/x64 systems, the x87 FPU hardware works in the extended double-precision format, so values are constantly converted between the single/double-precision formats and the extended double-precision format.

Converting to extended double precision

When a single-precision or double-precision number is converted to extended double precision, its exponent must be re-biased using the bias of the extended double-precision format (16383). The biased exponent of the extended double-precision number is:

① Converting from single precision: exponent - 127 + 16383.

② Converting from double precision: exponent - 1023 + 16383.
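Both formulas perform the same operation: remove the source bias, then add the extended bias. A minimal Python sketch, assuming a normal (finite, non-zero, non-denormal) source value; `rebias` and the constant names are illustrative, not from the book.

```python
# Exponent biases of the three binary formats handled by the x87 FPU.
SINGLE_BIAS = 127
DOUBLE_BIAS = 1023
EXTENDED_BIAS = 16383


def rebias(biased_exp: int, src_bias: int, dst_bias: int) -> int:
    """Hypothetical helper: move a biased exponent from one bias to another.

    Only valid for normal numbers; zero, denormals, infinities and NaNs use
    reserved exponent encodings and are not handled here.
    """
    true_exp = biased_exp - src_bias   # the actual power of two
    return true_exp + dst_bias


print(hex(rebias(0xF7, SINGLE_BIAS, EXTENDED_BIAS)))   # 0x4077
print(hex(rebias(0x3FF, DOUBLE_BIAS, EXTENDED_BIAS)))  # 0x3fff (the value 1.0)
```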

The significand of the extended double-precision number is taken from the significand of the single- or double-precision number.

Take the single-precision number 1.11...×2^120 as an example; it is converted to extended double precision as follows.

The encoding of the single-precision number 1.11...×2^120 is 0x7BFFFFFF: its biased exponent is 0xF7 (11110111B) and its 23-bit fraction is all 1s.

The exponent of the extended double-precision number is 0xF7 - 127 + 16383 = 0x4077 (100 0000 0111 0111B). Bit 63 of the extended significand is the explicit integer bit, which is 1 for a normal number; the 23-bit single-precision fraction is copied directly into bits 62 to 40, and the low 40 bits (bits 39 to 0) are set to 0.
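The widening can be sketched for both source formats in Python, working on raw encodings as integers. This assumes normal inputs only; `widen_to_extended` and `fmt80` are hypothetical helpers, and the printed layout (exponent_high32_low32) simply mirrors the notation used in the text.

```python
def widen_to_extended(bits: int, frac_bits: int, exp_bits: int, bias: int) -> int:
    """Hypothetical helper: widen a normal single/double encoding to 80 bits.

    Use (23, 8, 127) for single precision and (52, 11, 1023) for double.
    Zero, denormal, infinity and NaN inputs are rejected in this sketch.
    """
    frac = bits & ((1 << frac_bits) - 1)
    exp = (bits >> frac_bits) & ((1 << exp_bits) - 1)
    sign = bits >> (frac_bits + exp_bits)
    if exp == 0 or exp == (1 << exp_bits) - 1:
        raise ValueError("only normal numbers are handled in this sketch")

    ext_exp = exp - bias + 16383                  # re-bias to 16383
    # Bit 63 is the explicit integer bit (1 for normal numbers). The source
    # fraction is placed directly below it (bits 62..40 for single, 62..11
    # for double), so the remaining low bits are 0.
    significand = (1 << 63) | (frac << (63 - frac_bits))
    return (sign << 79) | (ext_exp << 64) | significand


def fmt80(x: int) -> str:
    """Format an 80-bit value as sign/exponent _ high 32 bits _ low 32 bits."""
    return f"{x >> 64:04X}_{(x >> 32) & 0xFFFFFFFF:08X}_{x & 0xFFFFFFFF:08X}"


print(fmt80(widen_to_extended(0x7BFFFFFF, 23, 8, 127)))             # 4077_FFFFFF00_00000000
print(fmt80(widen_to_extended(0x3FF8000000000000, 52, 11, 1023)))   # 3FFF_C0000000_00000000 (1.5)
```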

The final extended double-precision encoding is 0x4077_FFFFFF00_00000000. For a double-precision number, the 52-bit fraction is copied directly into bits 62 to 11 of the extended significand, and bits 10 to 0 are set to 0.

Converting extended double precision to single precision

Converting from extended double precision to single or double precision is considerably more complicated, because it involves the precision of the destination format. When the extended significand has nonzero bits beyond the precision of the destination format, the value must be rounded, and the inexact result signals a precision exception (#P).

Whether rounding is needed depends on whether the significand bits beyond the destination's precision are all 0, as shown below.

Destination format    Bits beyond precision    Note
Single precision      Bit 39 to bit 0          Rounding occurs when these bits are not all 0
Double precision      Bit 10 to bit 0          Rounding occurs when these bits are not all 0
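The check in the table amounts to masking off the low-order bits of the 64-bit extended significand. A short sketch; `DROPPED_BITS` and `needs_rounding` are hypothetical names chosen for illustration.

```python
# Number of low-order bits of the 64-bit extended significand (bit 63 is the
# integer bit) that lie beyond the precision of each destination format.
DROPPED_BITS = {
    "single": 40,  # keeps bits 63..40 (24 bits), drops bits 39..0
    "double": 11,  # keeps bits 63..11 (53 bits), drops bits 10..0
}


def needs_rounding(significand: int, dest: str) -> bool:
    """Hypothetical helper: True if the discarded bits are not all 0."""
    mask = (1 << DROPPED_BITS[dest]) - 1
    return (significand & mask) != 0


print(needs_rounding(0xFFFFFF00_00000000, "single"))  # False: came from a single
print(needs_rounding(0xFFFFFFFF_FFFFFFFF, "single"))  # True
print(needs_rounding(0xFFFFFFFF_FFFFFFFF, "double"))  # True
```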

The following example converts the extended double-precision number 1.11...×2^120 to single-precision format. In extended double precision, with all 64 significand bits set to 1, this number is encoded as 0x4077_FFFFFFFF_FFFFFFFF.

The exponent of the destination format is calculated as follows.

① Single precision: exponent - 16383 + 127.

② Double precision: exponent - 16383 + 1023.
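Applied to the example encoding 0x4077_FFFFFFFF_FFFFFFFF, these formulas give the destination exponents before any carry from rounding. A sketch; `split_extended` is an illustrative helper, not something defined in the book.

```python
def split_extended(bits80: int):
    """Hypothetical helper: split an 80-bit extended encoding into
    (sign, biased 15-bit exponent, 64-bit significand)."""
    sign = (bits80 >> 79) & 0x1
    exponent = (bits80 >> 64) & 0x7FFF
    significand = bits80 & 0xFFFF_FFFF_FFFF_FFFF
    return sign, exponent, significand


sign, exponent, significand = split_extended(0x4077_FFFFFFFF_FFFFFFFF)
single_exp = exponent - 16383 + 127    # before any carry from rounding
double_exp = exponent - 16383 + 1023   # before any carry from rounding
print(hex(single_exp), hex(double_exp))  # 0xf7 0x477
```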

The conversion process is more complex, as shown below.

The shaded portion of the figure is the part of the significand beyond single precision (bits 39 to 0). Because it is not 0, the value must be rounded, and the rounding direction is determined by the rounding-control (RC) field of the x87 FPU control word.
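The RC field occupies bits 11 and 10 of the x87 control word. The sketch below only decodes and sets the field on a plain integer; it does not touch real hardware (that requires the FNSTCW/FLDCW instructions), and the helper names are hypothetical. 0x037F is the control-word value that FINIT loads.

```python
# Encoding of the x87 control word RC field (bits 11..10).
RC_MODES = {0b00: "nearest", 0b01: "down", 0b10: "up", 0b11: "zero"}
RC_SHIFT = 10
RC_MASK = 0b11 << RC_SHIFT


def rounding_mode(control_word: int) -> str:
    """Hypothetical helper: name of the rounding mode selected by RC."""
    return RC_MODES[(control_word & RC_MASK) >> RC_SHIFT]


def with_rounding_mode(control_word: int, mode: str) -> int:
    """Hypothetical helper: return a copy of the control word with RC replaced."""
    code = {name: rc for rc, name in RC_MODES.items()}[mode]
    return (control_word & ~RC_MASK) | (code << RC_SHIFT)


default_cw = 0x037F                         # value after FINIT
print(rounding_mode(default_cw))            # nearest
up_cw = with_rounding_mode(default_cw, "up")
print(hex(up_cw), rounding_mode(up_cw))     # 0xb7f up
```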

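IEEE 754 defines the following four rounding modes, each selected on the x87 FPU by a value of the RC field.

① Round to nearest (even), RC = 00B: the result is the representable value closest to the exact value; if two values are equally close, the one whose least significant bit is 0 is chosen. A result that overflows becomes ±∞.

② Round down (toward -∞), RC = 01B: the result is the closest representable value that is not greater than the exact value. On overflow, positive results become the largest finite positive value and negative results become -∞.

③ Round up (toward +∞), RC = 10B: the result is the closest representable value that is not less than the exact value. On overflow, positive results become +∞ and negative results become the largest finite negative value.

④ Round toward zero (truncate), RC = 11B: the result is the closest representable value whose magnitude is not greater than that of the exact value. On overflow, results of either sign become the largest finite value of that sign.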
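The four modes differ only in whether the kept part of the significand is incremented by 1. A sketch operating on a sign-magnitude significand, as in the x87 formats; `round_significand` is a hypothetical helper, and handling a carry out of the kept bits or an exponent overflow is left to the caller.

```python
def round_significand(sig: int, drop: int, mode: str, negative: bool):
    """Hypothetical helper: discard the low `drop` bits of `sig` using one of
    the four IEEE 754 rounding modes. Returns (kept_bits, inexact).

    kept_bits can grow by one bit when the increment carries through all 1s.
    """
    kept = sig >> drop
    rest = sig & ((1 << drop) - 1)
    if rest == 0:
        return kept, False                      # exact: nothing to round
    half = 1 << (drop - 1)
    if mode == "nearest":                       # ties go to the even value
        increment = rest > half or (rest == half and (kept & 1) == 1)
    elif mode == "up":                          # toward +infinity
        increment = not negative
    elif mode == "down":                        # toward -infinity
        increment = negative
    elif mode == "zero":                        # truncate the magnitude
        increment = False
    else:
        raise ValueError(f"unknown rounding mode: {mode}")
    return kept + int(increment), True


sig = 0xFFFFFFFF_FFFFFFFF                        # all 64 significand bits set
for mode in ("nearest", "down", "up", "zero"):
    kept, inexact = round_significand(sig, 40, mode, negative=False)
    print(mode, hex(kept), inexact)
```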

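The figure above rounds toward +∞. Because the discarded bits 39 to 0 are not 0 and the number is positive, rounding adds 1 at bit 40. Since bits 63 to 40 are all 1, the carry ripples through the entire 24-bit significand: the 23-bit fraction of the destination becomes 0 and a carry comes out of the integer bit. Round to nearest gives the same result here, because bit 39 is 1 and the lower bits are not all 0, so the discarded part is more than half of the last kept place.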

The destination exponent is first computed as the extended exponent - 16383 + 127 = 0xF7 (11110111B), but because rounding the significand produced a carry, the final exponent is incremented by 1 to 0xF8.
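Putting the steps together (rebias, round the bits beyond precision, and renormalize when the carry runs out of the significand) reproduces the result described in the next paragraph. A Python sketch under stated assumptions: normal inputs and results only, with the rounding mode passed as a string instead of being read from the RC field; `extended_to_single` is a hypothetical name.

```python
import struct


def extended_to_single(bits80: int, mode: str) -> int:
    """Hypothetical helper: convert a normal 80-bit extended encoding to a
    single-precision encoding. Handles rounding and the carry into the
    exponent, but not infinities, NaNs, denormals or overflow.
    """
    sign = (bits80 >> 79) & 0x1
    exponent = ((bits80 >> 64) & 0x7FFF) - 16383 + 127
    significand = bits80 & 0xFFFF_FFFF_FFFF_FFFF

    kept = significand >> 40                 # bits 63..40: 24 bits
    rest = significand & ((1 << 40) - 1)     # bits 39..0: beyond precision
    if rest:
        half = 1 << 39
        if mode == "nearest":
            bump = rest > half or (rest == half and (kept & 1) == 1)
        elif mode == "up":
            bump = not sign
        elif mode == "down":
            bump = bool(sign)
        else:                                # "zero": truncate
            bump = False
        kept += int(bump)

    if kept >> 24:                           # carry out of the integer bit
        kept >>= 1                           # renormalize to 1.0...
        exponent += 1                        # 0xF7 -> 0xF8 in the example

    if not 0 < exponent < 0xFF:
        raise ValueError("result is not a normal single-precision number")
    return (sign << 31) | (exponent << 23) | (kept & 0x7FFFFF)


result = extended_to_single(0x4077_FFFFFFFF_FFFFFFFF, "up")
value = struct.unpack("<f", struct.pack("<I", result))[0]
print(hex(result), value == 2.0 ** 121)      # 0x7c000000 True
```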

Therefore, the final single-precision encoding is 0x7C000000, which represents 1.0×2^121; the rounded result is greater than the original extended double-precision value.

Converting extended double precision to double precision

The procedure is the same as for single precision. The double-precision format has a 52-bit fraction (53 bits of precision counting the integer bit), so the bits beyond its precision are bits 10 to 0.

The destination exponent is the extended double-precision exponent - 16383 + 1023.
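Because only the number of discarded bits and the bias change, the conversion can be sketched once and parameterized by the destination format. This is a hypothetical generalization for illustration (`extended_to_ieee` is not from the book), again limited to normal inputs and results.

```python
import struct


def extended_to_ieee(bits80: int, frac_bits: int, exp_bits: int, bias: int, mode: str) -> int:
    """Hypothetical helper: narrow a normal 80-bit extended encoding to a
    binary format with the given fraction width, exponent width and bias
    (23/8/127 for single, 52/11/1023 for double). Special values, denormal
    results and overflow to infinity are not handled.
    """
    sign = (bits80 >> 79) & 0x1
    exponent = ((bits80 >> 64) & 0x7FFF) - 16383 + bias
    significand = bits80 & 0xFFFF_FFFF_FFFF_FFFF

    drop = 63 - frac_bits                    # 40 for single, 11 for double
    kept = significand >> drop               # integer bit + fraction
    rest = significand & ((1 << drop) - 1)   # bits beyond precision
    if rest:
        half = 1 << (drop - 1)
        bump = {
            "nearest": rest > half or (rest == half and (kept & 1) == 1),
            "up": not sign,
            "down": bool(sign),
            "zero": False,
        }[mode]
        kept += int(bump)

    if kept >> (frac_bits + 1):              # carry out of the integer bit
        kept >>= 1
        exponent += 1

    if not 0 < exponent < (1 << exp_bits) - 1:
        raise ValueError("result is not a normal number in the destination format")
    frac_mask = (1 << frac_bits) - 1
    return (sign << (frac_bits + exp_bits)) | (exponent << frac_bits) | (kept & frac_mask)


r = extended_to_ieee(0x4077_FFFFFFFF_FFFFFFFF, 52, 11, 1023, "up")
print(hex(r))                                                       # 0x4780000000000000
print(struct.unpack("<d", struct.pack("<Q", r))[0] == 2.0 ** 121)   # True
```

Calling the same function with (23, 8, 127) gives the single-precision result 0x7C000000 from the previous section.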

This article is excerpted from "x86/x64 System Exploration and Programming" by Deng Zhi, Publishing House of Electronics Industry.
