Conversion of floating point precision

Last Update:2018-07-26 Source: Internet

Author: User

On x86/x64 systems, the x87 FPU hardware works in the double extended-precision format, so values inevitably have to be converted between the single/double-precision formats and the double extended-precision format.

Converting to an extended double-precision number

When a single-precision or double-precision number is converted to extended double precision, the exponent must be adjusted to the bias of the extended double-precision format. The exponent value of the extended double-precision number is:

① Conversion from single precision: exponent - 127 + 16383.

② Conversion from double precision: exponent - 1023 + 16383.

The significand of the extended double-precision number is copied from the significand of the single/double-precision number.
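The two rules above can be sketched in a few lines of Python using plain integer arithmetic. This is a minimal illustration rather than a description of what the hardware does internally: the function names are hypothetical, only normal (non-zero, non-denormal, finite) inputs are handled, and it assumes the usual 80-bit layout with an explicit integer bit at bit 63.

```python
EXT_BIAS = 16383  # exponent bias of the 80-bit extended double format


def to_extended(bits, exp_bits, frac_bits):
    """Widen a normal single/double encoding to an 80-bit extended encoding."""
    bias = (1 << (exp_bits - 1)) - 1              # 127 for single, 1023 for double
    sign = bits >> (exp_bits + frac_bits)
    exponent = (bits >> frac_bits) & ((1 << exp_bits) - 1)
    fraction = bits & ((1 << frac_bits) - 1)
    if exponent == 0 or exponent == (1 << exp_bits) - 1:
        raise ValueError("this sketch only handles normal numbers")
    ext_exponent = exponent - bias + EXT_BIAS     # re-bias the exponent
    # Bit 63 is the explicit integer bit; the fraction sits directly below it.
    significand = (1 << 63) | (fraction << (63 - frac_bits))
    return (sign << 79) | (ext_exponent << 64) | significand


def single_to_extended(bits):
    return to_extended(bits, exp_bits=8, frac_bits=23)


def double_to_extended(bits):
    return to_extended(bits, exp_bits=11, frac_bits=52)


# 1.0 encoded as a single and as a double widens to the same 80-bit value.
print(hex(single_to_extended(0x3F800000)))          # 0x3fff8000000000000000
print(hex(double_to_extended(0x3FF0000000000000)))  # 0x3fff8000000000000000
```

Widening never loses information, which is why no rounding step appears in this direction.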

Take the single-precision number 1.11...×2^120 as an example. The process of converting it to extended double precision is as follows.

The encoding of the single-precision number 1.11...×2^120 is 0x7BFFFFFF. Its exponent field is 0xF7 (11110111B), and all 23 bits of its significand field are 1.
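The field values quoted above can be checked with a short Python snippet. The helper name `decode_single` is made up for this illustration; the cross-check at the end relies only on the standard `struct` module to reinterpret the same 32 bits as a float.

```python
import struct


def decode_single(bits):
    """Split a 32-bit single-precision encoding into its three fields."""
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    return sign, exponent, fraction


sign, exponent, fraction = decode_single(0x7BFFFFFF)
print(sign, hex(exponent), hex(fraction))   # 0 0xf7 0x7fffff
print(exponent - 127)                       # 120, the unbiased exponent

# Cross-check: a significand of all ones is 2 - 2^-23, scaled by 2^120.
value = struct.unpack(">f", struct.pack(">I", 0x7BFFFFFF))[0]
print(value == (2 - 2**-23) * 2.0**120)     # True
```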

The exponent value of the extended double-precision number is 0xF7 - 127 + 16383 = 0x4077 (100 0000 0111 0111B). The integer bit (bit 63) is set to 1, the 23-bit single-precision significand is moved directly to bit 62 through bit 40 of the extended double-precision number, and the low 40 bits (bit 39 through bit 0) are filled with 0.

The final extended double-precision encoding is 0x4077_FFFFFF00_00000000. For a double-precision number, the 52-bit significand is moved directly to bit 62 through bit 11 of the extended double-precision number.

Converting an extended double-precision number to a single-precision number

Converting from extended double precision to single/double precision is considerably more complicated, because it involves the precision of the destination format. When the extended double-precision significand holds more bits than the destination format can represent, a rounding operation takes place and the precision (inexact-result) exception is raised.

To decide whether rounding is needed, check whether the significand bits that lie beyond the destination precision are all 0, as shown below.

Destination format	Bits beyond the precision	Exact when
Single precision	bit 39 to bit 0	all bits are 0
Double precision	bit 10 to bit 0	all bits are 0

Rounding occurs when this part is not 0.
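Expressed as code, the check is a single mask test on the low bits of the 64-bit significand. The sketch below is illustrative only: the names `DROPPED_BITS` and `is_inexact` are invented here, and the test looks at significand precision alone, not at whether the exponent fits the destination range.

```python
# Number of low significand bits that do not fit in the destination format.
DROPPED_BITS = {"single": 40, "double": 11}


def is_inexact(ext_bits, target):
    """Return True if narrowing the 80-bit encoding to `target` needs rounding."""
    mask = (1 << DROPPED_BITS[target]) - 1
    return (ext_bits & mask) != 0


print(is_inexact(0x4077_FFFFFF00_00000000, "single"))  # False: bits 39..0 are 0
print(is_inexact(0x4077_FFFFFFFF_FFFFFFFF, "single"))  # True
print(is_inexact(0x4077_FFFFFFFF_FFFFFFFF, "double"))  # True
```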

Below, the conversion of the extended double-precision number 1.11...×2^120 to the single-precision format is used as an example. In the extended double-precision format, with every significand bit set to 1, its encoding is 0x4077_FFFFFFFF_FFFFFFFF.

The exponent portion of the destination format is calculated as follows.

① Single precision number: exponent-16383+127.

② Double precision number: exponent-16383+1023.

The conversion process is more complex, as shown below.

The shaded portion of the figure marks the significand bits that lie beyond the single-precision precision (bit 39 to bit 0). Their value is not 0, so rounding is required, and the result of the rounding depends on the rounding-control (RC) field of the x87 FPU control word.

IEEE 754 defines the following four rounding modes.

① Round to nearest: the result is the representable value closest to the exact result, and a tie goes to the value whose least significant bit is 0. On overflow the result is ±∞ (infinity with the sign of the exact result).

② Round down: the result is rounded toward -∞. On overflow, a positive result becomes the maximum normal value and a negative result becomes -∞.

③ Round up: the result is rounded toward +∞. On overflow, a positive result becomes +∞ and a negative result becomes the negative maximum normal value.

④ Round toward zero: the result is truncated toward 0. On overflow, both positive and negative results become the maximum normal value of the same sign.
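To make the four modes concrete, the sketch below drops the low bits of an integer significand under each of them. It is a simplified model under stated assumptions: the function name and the mode strings are made up for this example, the significand is treated as a magnitude with the sign passed separately, and overflow handling is left out.

```python
def round_significand(sig, drop, mode, negative=False):
    """Drop the low `drop` bits of the magnitude `sig` under one rounding mode."""
    kept = sig >> drop
    rest = sig & ((1 << drop) - 1)
    if rest == 0:
        return kept                      # exact, no rounding needed
    half = 1 << (drop - 1)
    if mode == "nearest":                # ties go to the even neighbour
        increase = rest > half or (rest == half and (kept & 1) == 1)
    elif mode == "down":                 # toward -inf: magnitude grows if negative
        increase = negative
    elif mode == "up":                   # toward +inf: magnitude grows if positive
        increase = not negative
    elif mode == "zero":                 # truncate
        increase = False
    else:
        raise ValueError("unknown rounding mode: " + mode)
    return kept + 1 if increase else kept


# 0b1011 with the low two bits dropped: the exact value lies between 2 and 3.
for mode in ("nearest", "down", "up", "zero"):
    print(mode, round_significand(0b1011, 2, mode),
          round_significand(0b1011, 2, mode, negative=True))
```

For the positive input the loop yields 3, 2, 3, 2, and for the negative one 3, 3, 2, 2, matching the directions listed above.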

In the figure above the significand is rounded up, in the direction of +∞: bit 39 is 1, so the value is rounded into bit 40, which has the effect of adding 1 at bit 40. (The default round-to-nearest mode gives the same result here, because the discarded bits amount to more than half of bit 40.) Bit 62 through bit 40 are all 1, so the addition carries out of the significand, and the significand of the destination format is 0 after rounding.

The exponent of the destination format is the extended double-precision exponent - 16383 + 127 = 0xF7 (11110111B), but the final exponent of the destination format is 0xF8 (0xF7 + 1) because rounding the significand produced a carry.
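The whole narrowing step, including the carry into the exponent, can be sketched as follows. This is an illustrative model, not the FPU's actual implementation: the function name is hypothetical, the round-to-nearest mode is assumed, and inputs that are not normal or whose result would overflow or underflow are rejected rather than handled.

```python
def extended_to_single(ext_bits):
    """Narrow a normal 80-bit encoding to single precision (round to nearest)."""
    sign = ext_bits >> 79
    exponent = (ext_bits >> 64) & 0x7FFF
    significand = ext_bits & 0xFFFFFFFFFFFFFFFF   # bit 63 is the integer bit
    if (significand >> 63) == 0:
        raise ValueError("this sketch only handles normal numbers")
    kept = significand >> 40                      # integer bit + 23 fraction bits
    rest = significand & ((1 << 40) - 1)          # bit 39 .. bit 0
    half = 1 << 39
    if rest > half or (rest == half and (kept & 1) == 1):
        kept += 1                                 # round up at bit 40
    new_exponent = exponent - 16383 + 127         # re-bias the exponent
    if kept >> 24:                                # carry out of the significand
        kept >>= 1
        new_exponent += 1
    if not 0 < new_exponent < 0xFF:
        raise ValueError("overflow/underflow is not handled in this sketch")
    return (sign << 31) | (new_exponent << 23) | (kept & 0x7FFFFF)


print(hex(extended_to_single(0x4077_FFFFFFFF_FFFFFFFF)))  # 0x7c000000
print(hex(extended_to_single(0x4077_FFFFFF00_00000000)))  # 0x7bffffff
```

The second call shows the exact case: bit 39 through bit 0 are 0, so nothing is rounded and the exponent stays at 0xF7.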

Therefore, the final converted single-precision encoding is 0x7C000000. The resulting floating-point number is 1.0...×2^121, which is greater than the original extended double-precision number.

Converting an extended double-precision number to a double-precision number

This works the same way as the conversion to a single-precision number. In the double-precision format the significand has 52 bits, so the bits beyond the precision are bit 10 through bit 0.

The exponent is calculated as the extended double-precision exponent - 16383 + 1023.
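For comparison, here is the same idea for the double-precision destination, with the result cross-checked against Python's own double encoding through the standard `struct` module. As before, this is a sketch under assumptions: the function name is invented, round to nearest is assumed, and only normal inputs with in-range results are accepted.

```python
import struct


def extended_to_double(ext_bits):
    """Narrow a normal 80-bit encoding to double precision (round to nearest)."""
    sign = ext_bits >> 79
    exponent = (ext_bits >> 64) & 0x7FFF
    significand = ext_bits & 0xFFFFFFFFFFFFFFFF
    if (significand >> 63) == 0:
        raise ValueError("this sketch only handles normal numbers")
    kept = significand >> 11                      # integer bit + 52 fraction bits
    rest = significand & 0x7FF                    # bit 10 .. bit 0
    if rest > 0x400 or (rest == 0x400 and (kept & 1) == 1):
        kept += 1                                 # round up at bit 11
    new_exponent = exponent - 16383 + 1023        # re-bias the exponent
    if kept >> 53:                                # carry out of the significand
        kept >>= 1
        new_exponent += 1
    if not 0 < new_exponent < 0x7FF:
        raise ValueError("overflow/underflow is not handled in this sketch")
    return (sign << 63) | (new_exponent << 52) | (kept & ((1 << 52) - 1))


bits = extended_to_double(0x4077_FFFFFFFF_FFFFFFFF)
value = struct.unpack(">d", struct.pack(">Q", bits))[0]
print(hex(bits))              # 0x4780000000000000
print(value == 2.0 ** 121)    # True
```

Just as in the single-precision example, the all-ones significand rounds up and carries, so the result is 1.0×2^121.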

This article is excerpted from "x86/x64 System Exploration and Programming"

Publishing Industry Publishing House

Deng Zhi
