How float, double, and long double are stored in memory in C

Source: Internet
Author: User

Converting the binary storage format to a floating-point number:

A float variable occupies 4 bytes (32 bits) in computer memory. Apart from its sign, a floating-point number consists of two parts: the mantissa (significand) m and the exponent e.

Mantissa part: holds the significant digits of the floating-point number, written in binary (base 2).

Exponent part: occupies 8 bits, so the stored value ranges from 0 to 255. As explained below, it holds the exponent of the number written in binary scientific notation, and it is stored in biased (offset) form.

Detailed analysis:

A float is stored in 4 bytes in the format shown in the following table. The bytes are listed most significant first; on a little-endian machine such as x86 they appear in memory in the reverse order.

Address     +0          +1          +2          +3
Contents    SEEE EEEE   EMMM MMMM   MMMM MMMM   MMMM MMMM

S: the sign bit. 1 means the number is negative, 0 means it is positive.

E: the true exponent plus 127, stored in binary. (Why add 127? The true exponent can be negative, so IEEE 754 stores it with a bias of 127 to keep the stored field non-negative; the true exponent is the stored value minus 127. For normalized numbers the stored field ranges from 1 to 254, so the true exponent of a float ranges from -126 to +127. The stored values 0 and 255 are reserved for zero and subnormal numbers, and for infinity and NaN, respectively.)

M: the 24-bit mantissa. The mantissa is really a 24-bit value, but for a normalized number its highest bit is always 1, so that bit is omitted and only 23 bits are stored.

  Exception: when the floating-point number is 0, the exponent field and the mantissa field are both 0, and the formula above does not hold: 1.0 multiplied by any power of 2 is never 0. Zero is therefore a special case of the format. You do not need to handle it yourself; the compiler and the hardware recognize it automatically.

Example: consider how -12.5 is stored in the computer: 0xC1 0x48 0x00 0x00

Binary: 11000001 01001000 00000000 00000000

Format: SEEE EEEE EMMM MMMM MMMM MMMM MMMM MMMM

From this we can read:

S: 1, so the number is negative.

E (8 bits): 10000010, which is 130 in decimal; 130 - 127 = 3, so the actual exponent is 3.

M (23 bits): 10010000000000000000000. With the omitted leading 1 restored, the mantissa is actually 1.10010000000000000000000

Now we adjust the mantissa M according to the value of the exponent E.

The adjustment works as follows: if the exponent is negative, the binary point of the mantissa moves to the left; if the exponent is positive, it moves to the right. The number of places it moves is the absolute value of the exponent.

Here the exponent is +3, so moving the binary point 3 places to the right gives: 1100.10000000000000000000

Conversion: the 1100 to the left of the binary point represents (1x2^3) + (1x2^2) + (0x2^1) + (0x2^0), which is 12.

The .100... to the right of the binary point represents (1x2^-1) + (0x2^-2) + (0x2^-3) + ..., which is 0.5.

The sum of these two values is 12.5. Because S is 1, the number is negative, that is, -12.5. Therefore the hexadecimal value 0xC1480000 is the floating-point number -12.5.

Converting a floating-point number to its binary storage format:

Here is how to convert a floating-point number into the binary format the computer stores. As an example, convert 17.625 to float.

1. Convert the number to binary: 10001.101

2. Move the binary point 4 places to the left to get 1.0001101

3. The mantissa is 1.0001101 and the stored exponent is 4 + 127 = 131, which is 10000011 in binary.

4. The sign bit is 0 because the number is positive.

5. Combine the fields: 0 10000011 0001101, then pad with zeros on the right to 32 bits: 0100 0001 1000 1101 0000 0000 0000 0000

6. Convert to hexadecimal: 0x41 0x8D 0x00 0x00

Code that prints a floating-point number in its binary storage form:

#include <cstring>
#include <iostream>
using namespace std;

#define Uchar unsigned char

// Print the 8 bits of one byte, most significant bit first.
void binary_print(Uchar c)
{
    for (int i = 0; i < 8; ++i)
    {
        if ((c << i) & 0x80)
            cout << '1';
        else
            cout << '0';
    }
    cout << ' ';
}

int main()
{
    float a;
    Uchar c_save[4];
    Uchar i;
    void *f;
    f = &a;

    cout << "pls input a float num:";
    cin >> a;
    memcpy(c_save, f, 4);  // copy the 4 raw bytes of the float
    // Print from the highest address down, so that on a little-endian
    // machine the sign bit comes first.
    for (i = 4; i != 0; i--)
        binary_print(c_save[i - 1]);
    cout << endl;

    return 0;
}

The C standard requires that the float type be able to represent at least 6 significant decimal digits, for example the first 6 digits of a number such as 33.333333. So why can float represent 6 significant digits?

A rough explanation is as follows: 9 in decimal is 1001 in binary, so one decimal digit needs at most 4 bits. A float has 24 bits of precision, and 24 / 4 = 6, so a float is accurate to 6 significant decimal digits. This is only an estimate: a decimal digit actually needs about 3.32 bits, so 24 bits hold a little over 7 decimal digits, of which 6 are guaranteed to be preserved.

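What about double? The principle is the same as for float; a double simply has more bits. An IEEE 754 double has 64 bits: 1 sign bit, 11 exponent bits (with a bias of 1023), and 52 stored mantissa bits.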

        

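Note that operations on double data can be slower than operations on float data, especially on hardware without native double-precision support, such as many microcontrollers. On most modern desktop CPUs the difference for ordinary scalar arithmetic is small.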

  
