CSAPP (Computer Systems: A Programmer's Perspective), Chapter 2: Notes


1. You can shift a number right arithmetically and then apply a mask such as 0xFF to extract part of it, for example its sign bit. One of the most important uses of shift operations is to extract a piece of information from a bit pattern with the help of a mask.
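A minimal sketch of the shift-and-mask idea in C. The function names `sign_bit` and `extract_byte` are made up for this illustration, and the code assumes 32-bit values. For a single bit such as the sign bit, the mask is 0x1 rather than 0xFF.

```c
#include <stdint.h>

/* Sign bit of x: shift bit 31 down to bit 0, then mask with 0x1.
   Right-shifting a negative signed value is implementation-defined in C
   (usually an arithmetic shift); the mask removes the copied sign bits,
   so the result is the same for arithmetic and logical shifts. */
unsigned sign_bit(int32_t x) {
    return (unsigned)(x >> 31) & 0x1u;
}

/* Byte i of x (0 = least significant): shift the byte into the low
   8 bits, then mask with 0xFF to drop everything above it. */
unsigned extract_byte(int32_t x, int i) {
    return (unsigned)(x >> (8 * i)) & 0xFFu;
}
```

For example, `extract_byte(0x12345678, 3)` yields 0x12 and `sign_bit(-1)` yields 1.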

2. Conditional statements in C, as well as the ternary conditional operator (?:), can be implemented with shift operations.
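One possible way to do this is to build an all-0s or all-1s mask with shifts and use it to select a value. This is only a sketch: `abs_branchless` and `select_branchless` are hypothetical names, and the code assumes 32-bit two's-complement integers and an arithmetic right shift for negative values, which is what common compilers provide.

```c
#include <stdint.h>

/* Absolute value without an if: mask is all 1s when x is negative and
   all 0s otherwise (assumes arithmetic right shift).
   Not valid for INT32_MIN, whose absolute value does not fit. */
int32_t abs_branchless(int32_t x) {
    int32_t mask = x >> 31;
    return (x ^ mask) - mask;   /* x >= 0: x;  x < 0: ~x + 1, i.e. -x */
}

/* Same result as (cond ? a : b), computed with shifts and masks. */
int32_t select_branchless(int cond, int32_t a, int32_t b) {
    uint32_t bit = (cond != 0);                 /* 0 or 1 */
    int32_t mask = (int32_t)(bit << 31) >> 31;  /* 0 or all 1s */
    return (a & mask) | (b & ~mask);
}
```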

3. Bit extension: when a 32-bit signed number is extended to 64 bits, the value must stay unchanged. The low 31 bits are copied into the low 31 bits of the 64-bit result, and the sign bit (the highest bit) is replicated into the high 33 bits.
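The extension can be written out by hand and compared with what an ordinary cast does. This is an illustrative sketch; the helper names are invented here.

```c
#include <stdint.h>

/* Sign extension by hand: keep the low 32 bits and fill the high
   32 bits with copies of bit 31 (the sign bit). */
uint64_t sign_extend_manual(uint32_t bits) {
    uint64_t low  = bits;
    uint64_t high = (bits >> 31) ? 0xFFFFFFFF00000000ull : 0;
    return high | low;
}

/* The same extension as performed by an int32_t -> int64_t conversion. */
uint64_t sign_extend_cast(int32_t x) {
    return (uint64_t)(int64_t)x;
}
```

Both functions map the 32-bit pattern 0xFFFFFFFE (the value -2) to 0xFFFFFFFFFFFFFFFE, which is still -2 as a 64-bit number.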

4. A bit pattern can represent exactly only those floating-point numbers that can be written in the form x × 2^n. Other rational numbers, such as 0.1, can only be approximated.
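A quick way to see this is to push a value through a float and check whether it comes back unchanged. The helper `exact_in_float` is a made-up name, and the sketch assumes that float is the IEEE single-precision format. Note that a literal such as 0.1 is already the nearest double, so the check compares the double against its float rounding.

```c
/* Returns 1 if x is unchanged by a round trip through float, i.e. the
   value is exactly representable in single precision.
   volatile forces the value to actually be stored as a float. */
int exact_in_float(double x) {
    volatile float f = (float)x;
    return (double)f == x;
}
```

Values such as 0.5 and 0.75 (sums of powers of 2) pass the check, while 0.1 does not.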

5. Because floating-point numbers are stored in a normalized form, the integer part of the significand isn't stored. It is defined to be 1, so only the fraction bits after the binary point need to be saved.

6. So far we have looked at normalized floating-point values, that is, the standard form. Some special values and invalid results also come up during computation and need their own encodings.

Zero: for 0.0, the sign bit, the fraction, and the exponent field are all 0.

Infinity: when the exponent field is all 1s and the fraction is all 0s, look at the sign bit. If the sign bit is 1, the value is negative infinity; if the sign bit is 0, it is positive infinity.

NaN: when the exponent field is all 1s and the fraction isn't 0, the value isn't a valid number. This is NaN ("not a number"), which is used as the result of invalid calculations.
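These cases can be told apart by looking at the exponent and fraction fields of a single-precision bit pattern. The sketch below assumes the IEEE 754 single-precision layout (1 sign bit, 8 exponent bits, 23 fraction bits); the type and function names are invented for this example. It also reports one case that these notes don't cover: an all-0s exponent field with a nonzero fraction, which is a denormalized value.

```c
#include <stdint.h>
#include <string.h>

typedef enum { KIND_ZERO, KIND_DENORM, KIND_NORM, KIND_INF, KIND_NAN } fp_kind;

/* Copy the bytes of a float into an integer so the fields can be inspected. */
uint32_t float_bits(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Classify a single-precision bit pattern by its exponent and fraction fields. */
fp_kind classify_bits(uint32_t bits) {
    uint32_t exp  = (bits >> 23) & 0xFFu;   /* 8-bit exponent field  */
    uint32_t frac = bits & 0x7FFFFFu;       /* 23-bit fraction field */
    if (exp == 0xFFu) return frac == 0 ? KIND_INF : KIND_NAN;
    if (exp == 0)     return frac == 0 ? KIND_ZERO : KIND_DENORM;
    return KIND_NORM;
}
```

The sign bit (`bits >> 31`) then distinguishes positive from negative infinity and +0.0 from -0.0.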

7. In the exponent field of a floating-point number, the all-0s and all-1s patterns are reserved for special values, so normalized values can't use them. To account for this, the exponent is stored with an offset called the bias: the true exponent is E = exp - Bias, where exp is the bit representation of the exponent field.

Bias = 2^(k-1) - 1, where k is the number of bits in the exponent field. The widths of the exponent and fraction fields are fixed by the format.

About the bias: I didn't understand it when I first read the book, and only got it after watching the video lectures. In both single and double precision, the exponent has to be able to represent negative numbers. The exponent field sits between the sign bit and the fraction and is stored as an unsigned number. In single precision it has 8 bits, so the values it can hold for normalized numbers are 1 to 254. Subtracting the bias (127) makes negative exponents possible and gives an exponent range of -126 to 127.
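The arithmetic behind these ranges is small enough to check directly. In this sketch the function names are hypothetical; k = 8 corresponds to single precision and k = 11 to double precision.

```c
/* Bias for an exponent field of k bits: 2^(k-1) - 1. */
int bias(int k) {
    return (1 << (k - 1)) - 1;
}

/* Smallest exponent E of a normalized value: field value 1 minus the bias. */
int min_exponent(int k) {
    return 1 - bias(k);
}

/* Largest exponent E of a normalized value: field value 2^k - 2 minus the bias
   (2^k - 1, the all-1s pattern, is reserved for infinity and NaN). */
int max_exponent(int k) {
    return ((1 << k) - 2) - bias(k);
}
```

For k = 8 this gives a bias of 127 and exponents from -126 to 127; for k = 11 it gives a bias of 1023 and exponents from -1022 to 1023.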


To encode a number as a normalized floating-point value:

(1) Convert the number to binary and write it in scientific notation in the form 1.x. Note that the base is 2. The digits after the binary point are the fraction; pad it with trailing 0s so that it has 23 bits in single precision.

(2) The exponent is computed as E = exp - Bias, so exp = E + Bias. What I want to stress here is that the exponent we see when writing a number down is E. In 1.00000 × 2^12, for example, 12 is E, not exp, the value stored in the bit pattern.

(3) Finally, write the three fields (sign, exponent, fraction) into their positions in the bit pattern.
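The three steps can be sketched for single precision as follows. `pack_float` and `float_bits` are names invented for this illustration, and the code assumes that float uses the IEEE 754 single-precision layout. For 1.00000 × 2^12 (the value 4096.0), E = 12, so exp = 12 + 127 = 139 and the fraction is all 0s.

```c
#include <stdint.h>
#include <string.h>

/* Assemble a single-precision bit pattern from the sign bit, the true
   exponent E, and the 23 fraction bits. The stored field is exp = E + 127. */
uint32_t pack_float(uint32_t sign, int E, uint32_t frac) {
    uint32_t exp = (uint32_t)(E + 127);
    return (sign << 31) | (exp << 23) | (frac & 0x7FFFFFu);
}

/* Bit pattern the machine actually uses for f, for comparison. */
uint32_t float_bits(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

As a second check, 12345 is 1.1000000111001 × 2^13 in binary, so its fraction field is 1000000111001 followed by ten 0s (0x40E400), and `pack_float(0, 13, 0x40E400)` matches `float_bits(12345.0f)`.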



Floating-point operations:

(Unlike integers, floating-point numbers can't always express an exact value, so some rounding may be needed. An encoding therefore doesn't necessarily stand for the exact value: the fraction and the exponent both have limited precision, and some numbers that we can write by hand can't be represented by the computer.)

Two problems can come up in floating-point operations. The first is in addition: when the two numbers have different exponents, the fraction of one has to be shifted so that the binary points line up before the numbers can be added.

The second is in multiplication, which can easily overflow because the exponents of the two operands are added and may be large. The approach is to ignore overflow and similar problems at first, compute the exact result, and then round it so that it fits the available precision.


(a) Types of rounding:

1. Round toward zero
2. Round down (toward negative infinity)
3. Round up (toward positive infinity)
4. Round to nearest (seemingly the best approach, but a value exactly halfway between two candidates is a problem)
5. Round to even (a halfway value goes to the candidate whose last digit is even)

(b) Note, however, that the associative law and the distributive law don't hold for floating-point operations that involve rounding:

(3.14 + 1e10) - 1e10 != 3.14 + (1e10 - 1e10)

1e20 * (1e20 - 1e20) != (1e20 * 1e20) - (1e20 * 1e20)
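Both inequalities can be reproduced in single precision. The sketch below assumes IEEE 754 float arithmetic without aggressive floating-point optimizations such as -ffast-math; the function names are invented, and the volatile temporaries are there only to force each intermediate result to be rounded to float.

```c
/* (x + big) - big: x can be lost when it is rounded away in the first sum. */
float add_then_sub(float x, float big) {
    volatile float t = x + big;
    return t - big;
}

/* x + (big - big): the large values cancel first, so x survives. */
float sub_then_add(float x, float big) {
    volatile float t = big - big;
    return x + t;
}

/* big * (big - big): the difference is 0, so the product is 0. */
float mul_of_diff(float big) {
    volatile float d = big - big;
    return big * d;
}

/* (big * big) - (big * big): each product may overflow to infinity,
   and infinity minus infinity is NaN. */
float diff_of_mul(float big) {
    volatile float p = big * big;
    return p - p;
}

/* NaN is the only value that is not equal to itself. */
int is_nan(float x) {
    return x != x;
}
```

With x = 3.14f and big = 1e10f, the first function returns 0.0 and the second returns 3.14. With big = 1e20f, `mul_of_diff` returns 0.0 while `diff_of_mul` returns NaN.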

By default, floating-point results are rounded to nearest, with halfway cases rounded to even.
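The default mode can be observed by converting integers just above 2^24 to float, where only every second integer is representable. This is a sketch under the assumption of IEEE 754 single precision and the default rounding mode; `to_float_and_back` is a hypothetical helper name.

```c
/* Convert x to float and back, to see which representable neighbor
   the value was rounded to. volatile forces a real float to be stored. */
long long to_float_and_back(int x) {
    volatile float f = (float)x;
    return (long long)f;
}
```

16777217 (2^24 + 1) lies exactly halfway between 16777216 and 16777218 and comes back as 16777216, while 16777219 lies halfway between 16777218 and 16777220 and comes back as 16777220. In both cases the result is the neighbor whose last fraction bit is 0, so halfway cases sometimes round down and sometimes round up.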



Floating-point numbers in C:

The first point is one I already knew well: in ACM contests our teacher kept stressing that floating-point numbers shouldn't be compared with ==, because that causes problems. In fact, many mathematical functions and operators behave in surprising ways on floating-point numbers, so it pays to be careful. A better way to decide whether two floating-point numbers are equal is to subtract them: if the absolute value of the difference is very small, the two numbers can be treated as approximately equal.
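A minimal sketch of the subtraction-based comparison, assuming the caller chooses a tolerance `eps` that suits the magnitude of the data. The names `nearly_equal` and `exactly_equal` are made up for this example.

```c
/* Plain == comparison, kept in a function for side-by-side testing. */
int exactly_equal(double a, double b) {
    return a == b;
}

/* Treat a and b as equal when |a - b| is at most eps. */
int nearly_equal(double a, double b, double eps) {
    double diff = a - b;
    if (diff < 0) diff = -diff;   /* absolute value without <math.h> */
    return diff <= eps;
}
```

With IEEE double arithmetic, `exactly_equal(0.1 + 0.2, 0.3)` is false, while `nearly_equal(0.1 + 0.2, 0.3, 1e-9)` is true. A fixed absolute tolerance works poorly for very large or very small values; scaling eps by the magnitude of the operands is a common alternative.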



Conversions between floating-point numbers and integers:

From int to float: the value may be rounded, but it can't overflow.

From int to double: the conversion is exact as long as the value fits in 53 bits, which is always the case for a 32-bit int.

From float to double: the conversion is exact, because double has both greater range and greater precision than float.

From float or double to int: these conversions can be problematic. First, the fraction has to be shifted according to the exponent, and low-order bits may be lost in the shift; the value is rounded toward zero. Second, the range of floating-point numbers is far larger than the range of int, so the value may overflow. Values that are out of range, as well as special values such as infinity and NaN, are converted to TMin or TMax on typical machines; the C standard itself doesn't define the result.
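The well-defined cases can be checked with a few small helpers. This sketch assumes a 32-bit int and IEEE 754 float and double; the function names are invented. It deliberately avoids out-of-range conversions to int, because their behavior is undefined in C.

```c
#include <limits.h>

/* 1 if x is unchanged by conversion to float. The comparison is done in
   double, which can hold both values exactly, so nothing is converted
   back to int. */
int survives_float(int x) {
    volatile float f = (float)x;
    return (double)f == (double)x;
}

/* 1 if x is unchanged by a round trip through double. */
int survives_double(int x) {
    volatile double d = (double)x;
    return (int)d == x;
}

/* Conversion to int discards the fractional part (rounds toward zero). */
int to_int(double d) {
    return (int)d;
}
```

`survives_float` fails for 16777217 and for INT_MAX, which need more than the 24 bits of precision a float offers, while `survives_double` holds for every int. `to_int(3.99)` is 3 and `to_int(-3.99)` is -3.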






The representation of floating-point numbers is basically a few ways:




