A float occupies 4 bytes (32 bits) in memory: 32 bits = sign bit (1 bit) + exponent field (8 bits) + fraction field (23 bits).

Exponent part: the exponent field is 8 bits wide, so it holds a value from 0 to 255 (256 values in total). Because the exponent can be negative, IEEE 754 stipulates that the stored value minus 127 is the real exponent. Arithmetically, the exponent of a float therefore ranges from -127 (0-127) to +128 (255-127). The stored values 0 and 255 are reserved (for zero and subnormal numbers, and for infinity and NaN), so normalized numbers use exponents from -126 to +127.

Fraction part: in binary scientific notation, any nonzero number can be written as 1.xxx*2^n. The fractional part is xxx, and the integer part is always 1, so it need not be stored, and leaving it out costs no precision. 23 bits can encode 2^23 = 8388608 different values, a number with 7 decimal digits, which means there can be up to 7 significant digits (the fraction field itself cannot hold a value of 8388608 or more), but only 6 are guaranteed; that is, the precision of float is 6~7 significant digits.

Take 8.25 as an example: 8.25 in binary is 1000.01, and 1000.01 in scientific notation is 1.00001*2^3.
Sign bit: 0 for a positive number, 1 for a negative number, so here it is 0.
Exponent = 3, that is, X-127 = 3, so X = 130; the exponent field is 130, which in binary is 1000 0010.
Fraction = 00001, padded with zeros on the right, so the 23-bit field is 0000 1000 0000 0000 0000 000.
So the binary number in memory is 0 1000 0010 0000 1000 0000 0000 0000 000.
The value of the 4 bytes is 41 04 00 00.
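Note: the order in which these 4 bytes appear in memory depends on the host's byte order: a big-endian host stores them as 41 04 00 00, while a little-endian host stores them in reverse order, 00 00 04 41.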
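The 8.25 example can be checked with a few lines of C. The following is only a minimal sketch, assuming the compiler's float is the 32-bit IEEE 754 format described here; the names float_fields, float_bits, split_float and float_byte_at are made up for this illustration and do not come from any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper type: the three fields of a 32-bit float. */
struct float_fields {
    uint32_t sign;      /* 1 bit */
    uint32_t exponent;  /* 8 bits, stored with a bias of 127 */
    uint32_t fraction;  /* 23 bits, the leading 1 is not stored */
};

/* Return the float's bit pattern as an integer.
   memcpy is used instead of a pointer cast to avoid aliasing problems. */
static uint32_t float_bits(float f)
{
    uint32_t bits;
    assert(sizeof f == sizeof bits);  /* assumption: float is 32 bits wide */
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Split the bit pattern into sign, exponent field and fraction field. */
static struct float_fields split_float(float f)
{
    uint32_t bits = float_bits(f);
    struct float_fields out;
    out.sign     = bits >> 31;            /* top bit */
    out.exponent = (bits >> 23) & 0xFFu;  /* next 8 bits */
    out.fraction = bits & 0x7FFFFFu;      /* low 23 bits */
    return out;
}

/* Return byte i (0..3) of the float exactly as it sits in memory,
   so the result depends on the host's byte order. */
static unsigned char float_byte_at(float f, size_t i)
{
    unsigned char bytes[sizeof f];
    assert(i < sizeof f);
    memcpy(bytes, &f, sizeof f);
    return bytes[i];
}
```

Under these assumptions, float_bits(8.25f) should be 0x41040000, and split_float(8.25f) should give sign 0, exponent field 130 and fraction field 0x040000. float_byte_at(8.25f, 0) is 0x41 on a big-endian host and 0x00 on a little-endian one.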
A double is stored in the same way as a float, except that it occupies 64 bits: 64 bits = sign bit (1 bit) + exponent field (11 bits) + fraction field (52 bits).
The stored exponent value minus 1023 is the real exponent.
The 52-bit fraction field of a double can encode 2^52 = 4,503,599,627,370,496 different values, a number with 16 decimal digits, which means there can be up to 16 significant digits, but only 15 are guaranteed; that is, the precision of double is 15~16 significant digits.
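The same check works for double. This is again a sketch rather than a definitive implementation: it assumes double is the 64-bit IEEE 754 format, and the names double_fields, double_bits, split_double and survives_double are hypothetical. The last helper illustrates the precision limit: an integer that needs more than 53 significant bits (52 stored plus the implicit leading 1) does not survive a round trip through double.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper type: the three fields of a 64-bit double. */
struct double_fields {
    uint64_t sign;      /* 1 bit */
    uint64_t exponent;  /* 11 bits, stored with a bias of 1023 */
    uint64_t fraction;  /* 52 bits, the leading 1 is not stored */
};

/* Return the double's bit pattern as an integer. */
static uint64_t double_bits(double d)
{
    uint64_t bits;
    assert(sizeof d == sizeof bits);  /* assumption: double is 64 bits wide */
    memcpy(&bits, &d, sizeof bits);
    return bits;
}

/* Split the bit pattern into sign, exponent field and fraction field. */
static struct double_fields split_double(double d)
{
    uint64_t bits = double_bits(d);
    struct double_fields out;
    out.sign     = bits >> 63;                        /* top bit */
    out.exponent = (bits >> 52) & 0x7FFu;             /* next 11 bits */
    out.fraction = bits & ((UINT64_C(1) << 52) - 1);  /* low 52 bits */
    return out;
}

/* Return 1 if the integer n is unchanged after being converted to
   double and back. Intended for n well below 2^64. */
static int survives_double(uint64_t n)
{
    volatile double d = (double)n;  /* volatile forces rounding to double */
    return (uint64_t)d == n;
}
```

Under these assumptions, double_bits(8.25) should be 0x4020800000000000, with exponent field 1026 (3 + 1023). survives_double(9007199254740992) (2^53) should return 1, while survives_double(9007199254740993) (2^53 + 1) should return 0.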
How float and double data are stored in memory in C