Converting the binary storage format to a floating-point number:
A float variable occupies 4 bytes (32 bits) of memory. Besides a sign bit, a floating-point number consists of two parts: the mantissa m (the "base" part, also called the significand) and the exponent e.
Mantissa part: holds the significant binary digits of the number.
Exponent part: occupies 8 bits, so the stored value ranges from 0 to 255. It is the exponent of the number written in binary scientific notation, stored in biased (offset) form.
Specific analysis:
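A float is stored in 4 bytes laid out as follows. The bytes are listed most significant first, which matches memory order on a big-endian machine; on a little-endian machine such as x86 the bytes appear in memory in reverse order.
Address   +0          +1          +2          +3
Contents  SEEE EEEE   EMMM MMMM   MMMM MMMM   MMMM MMMM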
S: the sign bit (1 bit); 1 means negative, 0 means positive.
E: the exponent plus 127, stored as an 8-bit unsigned binary number. Adding 127 (the bias) lets negative exponents be stored without a separate sign, so the true exponent is the stored value minus 127. Stored values 1 to 254 give true exponents from -126 to +127; 0 and 255 are reserved for special values (0 for zero and subnormal numbers, 255 for infinity and NaN).
M: the 24-bit mantissa. For a normalized number its highest bit is always 1, so that bit is omitted (the "hidden bit") and only 23 bits are stored.
Exception: when the number is 0, the exponent and mantissa fields are both 0. The usual rule (hidden leading 1, scaled by 2 to the power E - 127) does not apply here, because the hidden 1 would make the value nonzero, so 0 is a special case. You do not need to handle it yourself; the compiler and hardware recognize it automatically.
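Putting the three fields together: for a normal float (stored exponent E from 1 to 254), the value is given by the standard IEEE 754 single-precision formula below, where S, E and M are the stored fields read as unsigned integers. The zero case (E = 0, M = 0) is the exception just described and is not covered by this formula.

```latex
v = (-1)^{S} \times \left(1 + \frac{M}{2^{23}}\right) \times 2^{E - 127},
\qquad 1 \le E \le 254,\quad 0 \le M < 2^{23}
```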
Example: -12.5 is stored in the computer as the bytes 0xC1 0x48 0x00 0x00 (most significant byte first).
Binary: 11000001 01001000 00000000 00000000
Format: SEEE EEEE EMMM MMMM MMMM MMMM MMMM MMMM
Reading the fields:
S: 1, so the number is negative.
E (8 bits): 10000010, which is 130 in decimal; 130 - 127 = 3, so the true exponent is 3.
M (23 bits): 10010000000000000000000. Restoring the hidden leading 1, the mantissa is 1.10010000000000000000000.
Now adjust the mantissa using the true exponent.
The rule is: if the exponent is negative, the binary point of the mantissa moves to the left; if it is positive, the binary point moves to the right. The number of places it moves equals the absolute value of the exponent.
Here the exponent is +3, so the binary point moves 3 places to the right: 1100.10000000000000000000
The integer part, 1100, is (1×2^3) + (1×2^2) + (0×2^1) + (0×2^0) = 12.
The fractional part, .100..., is (1×2^-1) + (0×2^-2) + (0×2^-3) + ... = 0.5.
Adding the two parts gives 12.5, and since S is 1 the number is negative: -12.5. So the hexadecimal pattern 0xC1480000 represents the float -12.5.
Converting a floating-point number to its binary storage format:
Going the other way, here is how to encode a number in the float storage format, using 17.625 as the example.
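1. Convert to binary: 17 = 10001 and 0.625 = 0.101, so 17.625 = 10001.101.
2. Normalize: move the binary point 4 places to the left to get 1.0001101 × 2^4.
3. The mantissa is 1.0001101 (only the fraction bits 0001101 are stored); the stored exponent is 4 + 127 = 131, which is 10000011 in binary.
4. The sign bit is 0, because the number is positive.
5. Merge the fields, padding the mantissa with zeros to 23 bits: 0 10000011 00011010000000000000000 (32 bits).
6. Convert to hexadecimal: 0100 0001 1000 1101 0000 0000 0000 0000 = 0x418D0000.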
Code that prints the binary storage form of a float:
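    #include <iostream>
    #include <cstring>
    using namespace std;

    #define Uchar unsigned char

    // Print the 8 bits of one byte, most significant bit first.
    void Binary_print(Uchar c)
    {
        for (int i = 0; i < 8; ++i)
        {
            if ((c << i) & 0x80) cout << '1';
            else cout << '0';
        }
        cout << ' ';
    }

    int main()
    {
        float a;
        Uchar c_save[4];

        cout << "Please enter a float: ";
        cin >> a;
        memcpy(c_save, &a, 4);   // copy the 4 raw bytes of the float

        // Print the highest-addressed byte first. On a little-endian machine
        // (e.g. x86) that byte holds the sign bit, so the output reads S E M.
        for (int i = 4; i != 0; i--)
            Binary_print(c_save[i - 1]);
        cout << endl;

        return 0;
    }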
The C standard requires float to carry at least 6 significant decimal digits (FLT_DIG in <float.h> is at least 6). These are significant digits counted from the first nonzero digit, not digits after the decimal point: for a number such as 33.333333, only the first 6 significant digits, 33.3333, are guaranteed. Why 6?
A rough explanation: the decimal digit 9 is 1001 in binary, so one decimal digit needs up to 4 bits. A float has 24 bits of precision (23 stored plus the hidden 1), so 24 / 4 = 6 decimal digits. A tighter estimate uses log2(10) ≈ 3.32 bits per decimal digit: 24 × log10(2) ≈ 7.2, so a float holds about 7 significant digits, of which 6 are always preserved. Either way, the limit applies to significant digits, not to digits after the decimal point.
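What about double? It works on exactly the same principle, just with more bits: 64 in total, with 1 sign bit, 11 exponent bits (bias 1023) and 52 stored mantissa bits (53 with the hidden 1), which gives about 15 to 16 significant decimal digits (DBL_DIG is 15).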
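Note that double arithmetic can be slower than float arithmetic, for example on processors without double-precision hardware or in vectorized code, where twice as many floats as doubles fit in a register; on many desktop CPUs, scalar float and double operations run at similar speed.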