The format of floating point numbers in memory (understanding the conversion between floating point and fixed point)


SkFixedToFloat and its inverse appear frequently in Skia. Converting a single-precision float to a fixed-point number is easy: multiply by 65536 and then cast to an integer type. That is what float2fixed does. I forget where I read it, but Skia reportedly uses fixed point for speed.
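
For reference, here is a minimal sketch of that idea, assuming a 32-bit 16.16 fixed-point format (the layout Skia's SkFixed uses); the helper functions are my own illustration, not Skia's actual macros:

    #include <stdio.h>
    #include <stdint.h>

    /* 16.16 fixed point: the high 16 bits hold the integer part,
       the low 16 bits hold the fraction. */
    typedef int32_t fixed_t;                  /* illustrative stand-in for Skia's SkFixed */

    static fixed_t float_to_fixed(float f)   { return (fixed_t)(f * 65536.0f); } /* multiply by 2^16, truncate */
    static float   fixed_to_float(fixed_t x) { return (float)x / 65536.0f; }     /* divide by 2^16 */

    int main(void)
    {
        fixed_t x = float_to_fixed(2.25f);
        printf("2.25f -> 0x%08X -> %f\n", (unsigned)x, fixed_to_float(x)); /* 0x00024000 -> 2.250000 */
        return 0;
    }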


The float and double types are used to store floating point data. A float occupies 32 bits and a double occupies 64 bits. So how is memory laid out when we declare a variable float f = 2.25f? In fact, whether float or double, the storage format in memory follows the IEEE specification: float follows IEEE R32.24 (32 bits with a 24-bit significand), while double follows R64.53 (64 bits with a 53-bit significand).
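
Those sizes can be confirmed with a one-line check (a trivial sketch, not part of the original article):

    #include <stdio.h>

    int main(void)
    {
        /* sizeof reports bytes: 4 * 8 = 32 bits for float, 8 * 8 = 64 bits for double */
        printf("float: %zu bytes, double: %zu bytes\n", sizeof(float), sizeof(double));
        return 0;
    }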


Whether single precision or double precision, the memory representation consists of three parts:

1) Sign bit: 0 indicates positive, 1 indicates negative;

2) Exponent: stores the exponent of the scientific notation, in biased (offset) form;

3) Mantissa: stores the significant digits (the fraction after the leading 1);

    The storage layout of a float (from the most significant bit down) is: 1 sign bit, 8 exponent bits, and 23 mantissa bits.

    The double-precision layout is: 1 sign bit, 11 exponent bits, and 52 mantissa bits.

    R32.24 and R64.53 both store numbers in scientific notation. For example, in decimal scientific notation 8.25 is written as 8.25 × 10^0 and 120.5 as 1.205 × 10^2. Our poor computer knows nothing about decimal; it only knows 0 and 1, so in memory the number must first be rewritten in binary scientific notation. 8.25 in binary is 1000.01, and 120.5 in binary is 1111000.1. In binary scientific notation, 1000.01 is written as 1.00001 × 2^3 and 1111000.1 as 1.1110001 × 2^6.

    The binary scientific notation of any normalized number has the form 1.xxx × 2^n, so only the xxx part needs to be stored as the mantissa: the digit before the binary point is always 1 and can be omitted. This is why the 23-bit mantissa field effectively provides 24 bits of precision. How much is that in decimal? The binary value of 9 is 1001, so roughly 4 bits are needed per decimal digit, and 24 bits therefore make a float accurate to about 6 decimal digits. As for the exponent, it can be positive or negative, and the 8-bit exponent field must cover a range of roughly -127 to 128. The exponent is therefore stored in biased (offset) form: the stored value is the actual exponent + 127.

    Now let's look at how 8.25 and 120.5 are actually stored in memory.

    First, 8.25, whose binary scientific notation is 1.00001 × 2^3.

    According to the storage scheme above, the sign bit is 0 (positive), the exponent field is 3 + 127 = 130, and the mantissa is 00001 padded with zeros (the leading 1 is implicit). Therefore 8.25 is stored as follows:

    0xbffff380: 01000001000001000000000000000000

    Decomposition: 0--10000010--00001000000000000000000

    The sign bit is 0, the exponent part is 10000010, and the mantissa part is 00001000000000000000000.

    Similarly, the storage format of 120.5 in memory is as follows:

    0xbffff384: 01000010111100010000000000000000

    Decomposition: 0--10000101--11100010000000000000000
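
    The two decompositions above can be checked programmatically. Here is a small sketch of my own (not from the original article) that reinterprets the bits of a float and splits out the three fields, assuming the standard single-precision layout (1 sign bit, 8 exponent bits with a bias of 127, 23 mantissa bits):

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    /* Print the sign, biased exponent, and mantissa fields of a float. */
    static void dump_float(float f)
    {
        uint32_t bits;
        memcpy(&bits, &f, sizeof bits);            /* reinterpret the 32 bits of the float */

        unsigned sign     = bits >> 31;            /* 1 sign bit       */
        unsigned exponent = (bits >> 23) & 0xFFu;  /* 8 exponent bits  */
        unsigned mantissa = bits & 0x7FFFFFu;      /* 23 mantissa bits */

        printf("%g: sign=%u exponent=%u (actual %d) mantissa=0x%06X\n",
               f, sign, exponent, (int)exponent - 127, mantissa);
    }

    int main(void)
    {
        dump_float(8.25f);   /* 8.25: sign=0 exponent=130 (actual 3) mantissa=0x040000 */
        dump_float(120.5f);  /* 120.5: sign=0 exponent=133 (actual 6) mantissa=0x710000 */
        return 0;
    }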

    How do you determine the decimal value of a piece of single-precision data in memory? It is simply the reverse of the process above. For example, given the following memory data:

    01000001001000100000000000000000

    Step 1: The sign bit is 0, indicating a positive number;

    Step 2: The exponent field is 10000010, which is 130 in decimal, so the actual exponent is 130 - 127 = 3;

    Step 3: The mantissa field is 01000100000000000000000; with the implicit leading 1 it represents 1 + 1/4 + 1/64;

    So the corresponding decimal value is: 2^3 × (1 + 1/4 + 1/64) = 8 + 2 + 1/8 = 10.125
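
    The reverse conversion can also be checked in code. A minimal sketch of my own, assuming the bit pattern above (0x41220000 in hex):

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    int main(void)
    {
        uint32_t bits = 0x41220000u;  /* 0--10000010--01000100000000000000000 */
        float f;
        memcpy(&f, &bits, sizeof f);  /* reinterpret the 32 bits as a float */
        printf("%f\n", f);            /* prints 10.125000 */
        return 0;
    }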

    Let's look at another example and observe its output:

    C code:

    #include <stdio.h>

    int main(void)
    {
        float f1 = 2.2;
        float f2 = 2.25;

        double d1 = (double)f1;
        double d2 = (double)f2;

        printf("d1 = %.13f, d2 = %.13f\n", d1, d2);

        return 0;
    }



    [doyle@phuang algorithm]$ ./a.out

    d1 = 2.2000000476837, d2 = 2.2500000000000

    The output may be confusing. After the single-precision 2.2 is converted to double precision and printed to 13 digits after the decimal point, it shows as 2.2000000476837, while the single-precision 2.25 converted to double precision is exactly 2.2500000000000. Why did the value of 2.2 change after the conversion while 2.25 did not? Strange, right? In fact, the two storage examples above already contain the answer.

    First look at 2.25. Its single-precision representation is 0 10000000 00100000000000000000000, and its double-precision representation is 0 10000000000 0010000000000000000000000000000000000000000000000000. The mantissa bits (001 followed by zeros) carry over exactly when the exponent is re-biased, so the forced conversion does not change the value.

    Now look at the single-precision and double-precision representation of 2.2. Its binary scientific notation would be 1.0001100110011... × 2^1, which requires converting the fractional part 0.2 to binary. This is done by repeatedly multiplying by 2 and taking the integer part: 0.2 × 2 = 0.4, so the first fractional bit is 0; 0.4 × 2 = 0.8, so the second bit is 0; 0.8 × 2 = 1.6, so the third bit is 1; 0.6 × 2 = 1.2, so the fourth bit is 1; 0.2 × 2 = 0.4, so the fifth bit is 0; and from here the pattern repeats forever without the product ever reaching exactly 1.0. The resulting binary expansion is the infinite repetition 0011 0011 0011 .... For single-precision data the mantissa can only hold 24 bits of precision, so 2.2 is stored as a float as: 0 10000000 00011001100110011001101
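
    The repeated multiply-by-2 procedure is easy to sketch in code (my own illustration; print_fraction_bits is a hypothetical helper, not from the article):

    #include <stdio.h>

    /* Print the first n binary digits of the fractional part of x (0 <= x < 1),
       using the repeated multiply-by-2 method described above. */
    static void print_fraction_bits(double x, int n)
    {
        for (int i = 0; i < n; i++) {
            x *= 2.0;
            int bit = (x >= 1.0);   /* the integer part is the next binary digit */
            printf("%d", bit);
            if (bit)
                x -= 1.0;           /* keep only the fractional part */
        }
        printf("\n");
    }

    int main(void)
    {
        print_fraction_bits(0.2, 24);  /* 001100110011001100110011 -- the pattern never terminates */
        print_fraction_bits(0.25, 4);  /* 0100 -- terminates exactly */
        return 0;
    }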

    With this storage, however, the value read back is no longer exactly 2.2: the decimal-to-binary conversion of a value like 2.2 is inherently inexact. The double type has the same problem, so floating point representation always carries some error, and converting from single precision to double precision exposes it. For decimal values that can be represented exactly in binary, such as 2.25, there is no such error, which is why the program above produces the seemingly strange output.

    In short, a floating point number is stored in memory as an approximation built from powers of 2 (negative powers for the fractional part). If the fractional part of the number can be represented exactly in binary, no precision is lost when it is stored; otherwise this representation loses precision, as with 2.2 above.
