C# in Detail: Compromises and Trade-offs, Deconstructing Decimal Arithmetic in C#

A digression

Before the main text begins, I would like to thank Blog Park for providing such an outstanding platform. By talking with many like-minded friends and helping one another on this excellent platform, I was honored to receive the Microsoft MVP award for 2015. It has also made me more confident that code changes the world. Thank you! I will keep working hard!

0x00 Preface

Murong often meets people, in life and at work, who have blind faith in machines. Many of them believe that machines are perfectly rational, have no feelings, and are truly impartial, so the answer a machine computes is always correct; if an answer is wrong, the operator must be at fault. But are a machine's calculations really always correct? In fact, it is not uncommon for a machine to get a calculation wrong, and a typical example is arithmetic on decimals (fractional numbers). So let's talk about a related topic: how are decimals handled by machines, or more specifically, in the C# language?

0x01 Starting with a wrong answer

Since we want to talk about how machines get arithmetic wrong, let's first look at a small example of a machine miscalculating.

    #include <stdio.h>

    int main(void)
    {
        float num;
        int i;

        num = 0;
        /* add 0.1 one hundred times */
        for (i = 0; i < 100; i++) {
            num += 0.1;
        }
        printf("%f\n", num);
        return 0;
    }

This is a small program written in C. The logic is simple: add 0.1 one hundred times and print the result. We don't need a computer to work it out; we can see right away that the answer is 10. What answer will the computer give us? Let's compile and run the C code.

The output is surprising. How can a computer do worse than a human at arithmetic? The friends mentioned in the preface may wonder whether the code is wrong or whether something is wrong with Murong's computer. In fact the code is correct and the machine is working normally. So why did the computer lose to mental arithmetic? This leads to the next question: how does a computer handle decimals? Once we understand that mechanism, the mystery is solved. (Of course, if you add 0.1 one hundred times in C#, the displayed result may well be 10. But that is only because of work C# does for us behind the scenes; the way the computer handles decimals is essentially the same.)

0x02 Numeric formats

A program can be seen as a digital model of the real world: everything in the real world can be reproduced in the computer world as numbers. So one problem that must be solved is how numbers are represented in a computer, and that is the purpose of a numeric format.

As we all know, machine language consists entirely of numbers, but this article does not cover every binary format. Here we care only about how meaningful numbers are represented in a computer. In short, they fall into the following three formats.

Integer format

Most of the numbers we meet during development are integers, and integers are also the easiest to represent in a computer. Most integers we encounter fit in a 32-bit signed integer (Int32), and a 64-bit signed integer type (Int64) is available when needed. The counterpart of integers is decimals (numbers with a fractional part), which have two main representations.

Fixed-point format

In the so-called fixed-point format, the decimal point of every value sits at a fixed position. The most common example of fixed-point decimals is the money type in SQL Server. Fixed point is actually quite good and suits many situations that involve decimals. However, it has an inherent drawback: because the position of the decimal point is fixed, the range it can represent is limited. That brings us to the protagonist of this article.
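As a minimal sketch of the fixed-point idea (not a description of how SQL Server implements money internally), a value can be stored as an integer count of ten-thousandths, so the decimal point always sits four digits from the right. The type name `fixed4` and the helper names are hypothetical, chosen only for this illustration.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical fixed-point type: an integer count of 1/10000 units,
   so the decimal point always sits four digits from the right. */
typedef int64_t fixed4;

#define FIXED4_SCALE 10000

/* Build a fixed4 from a whole part and a count of ten-thousandths
   (non-negative values only, to keep the sketch short). */
fixed4 fixed4_make(int64_t whole, int64_t ten_thousandths)
{
    return whole * FIXED4_SCALE + ten_thousandths;
}

/* Add 'step' to zero 'times' times; integer addition is exact. */
fixed4 fixed4_repeat_add(fixed4 step, int times)
{
    fixed4 total = 0;
    for (int i = 0; i < times; i++) {
        total += step;
    }
    return total;
}

/* Format a non-negative fixed4 as text with four decimal places. */
void fixed4_format(fixed4 value, char *buf, size_t size)
{
    snprintf(buf, size, "%lld.%04lld",
             (long long)(value / FIXED4_SCALE),
             (long long)(value % FIXED4_SCALE));
}
```

With this representation, `fixed4_repeat_add(fixed4_make(0, 1000), 100)` adds 0.1000 one hundred times and lands exactly on 10.0000. The price is the limited range described above: the scale is baked in, so nothing smaller than 0.0001 can be stored.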

Floating-point format

Floating-point format was introduced to solve this inherent problem of fixed point. A floating-point number represents a decimal using a sign, a mantissa, a base, and an exponent. Because computers are binary, the base is 2 (just as the base of the decimal system is 10), so the computer usually does not record the base in the data and stores only the sign, mantissa, and exponent. Many programming languages provide at least two data types that use floating-point format, namely double and float. C# likewise has two such types: according to the C# language standard, double-precision and single-precision floating-point numbers correspond to System.Double and System.Single. In fact, C# has a third data type that uses a floating-point format to represent decimals, the decimal type, System.Decimal. Note that there are many floating-point representations; for float and double, C# follows the IEEE 754 standard:

- float (single precision) is 32 bits: 1 sign bit, 8 exponent bits, and 23 mantissa bits.
- double (double precision) is 64 bits: 1 sign bit, 11 exponent bits, and 52 mantissa bits.
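The 1 + 8 + 23 layout of a single-precision value can be inspected directly. The following is a small sketch in C; it assumes that `float` on the target platform is a 32-bit IEEE 754 value, and the struct and function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* The three fields of an IEEE 754 single-precision value. */
struct float_fields {
    uint32_t sign;      /* 1 bit */
    uint32_t exponent;  /* 8 bits, stored with a bias of 127 */
    uint32_t mantissa;  /* 23 bits, the leading 1 is not stored */
};

/* Split a float into its fields. Assumes float is a 32-bit
   IEEE 754 value, which holds on most current platforms. */
struct float_fields split_float(float value)
{
    uint32_t bits;
    struct float_fields f;

    memcpy(&bits, &value, sizeof bits);  /* copy the raw bit pattern */
    f.sign     = bits >> 31;
    f.exponent = (bits >> 23) & 0xFFu;
    f.mantissa = bits & 0x7FFFFFu;
    return f;
}
```

For example, `split_float(5.0f)` yields sign 0, exponent 129 (that is, 2 plus the bias of 127), and mantissa 0x200000, because 5 is 1.25 x 2^2 and only the .25 part is stored.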

0x03 Range, precision, and accuracy

Having covered the forms numbers take in a computer, we should mention the criteria used when choosing a numeric format. The most common are range, precision, and accuracy.

Range of a numeric format

As the name suggests, the range of a numeric format is the span from the minimum value to the maximum value that the format can represent. For example, a 16-bit signed integer covers the range from -32768 to 32767. If the number to be represented falls outside this range, it cannot be represented correctly in this format. Of course, a number inside the range may not be represented exactly either. For example, a 16-bit signed integer cannot exactly represent a number with a fractional part, but there is always a nearby value that the 16-bit signed integer format can represent.

Precision of a numeric format

To be honest, many people find precision and accuracy rather vague: they seem to be the same thing, yet somehow different. But Murong must remind you that there is a big difference between precision and accuracy.

In general, the precision of a numeric format can be thought of as the amount of information the format uses to represent a number. Higher precision usually means that more distinct numbers can be represented and, most obviously, that the represented number can be closer to the true value. For example, 1/3 in decimal is 0.3333..., which never ends. With five digits of precision it can be written as 0.3333, and with seven digits as 0.333333 (of course, a five-digit value stored in seven digits would be 0.333300).

The precision of a numeric format also affects calculations. As a simple example, suppose we keep only one decimal digit of precision. The whole calculation then goes like this:

0.5*0.5 + 0.5*0.5 = 0.25 + 0.25

= 0.2 + 0.2

= 0.4

If we use two-digit precision, the calculation process will become the following.

0.5*0.5 + 0.5*0.5 = 0.25 + 0.25

= 0.5

Compare the results under the two precisions. With one digit of precision, the result differs from the correct answer by 0.1; with two digits of precision, the result is correct. This shows how much it matters to preserve precision during a calculation.
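The two calculations can be replayed with a short sketch in C. To keep the demonstration itself free of rounding problems, it works in integer hundredths (25 stands for 0.25); the function names are hypothetical, and truncation is assumed as the way digits are dropped, matching the 0.25 to 0.2 step above.

```c
/* Work in hundredths so the arithmetic itself is exact:
   25 means 0.25, 50 means 0.50. */

/* Keep only 'digits' decimal digits (1 or 2) by truncation. */
int truncate_hundredths(int hundredths, int digits)
{
    if (digits >= 2) {
        return hundredths;            /* nothing to drop */
    }
    return (hundredths / 10) * 10;    /* drop the second digit */
}

/* Compute 0.5*0.5 + 0.5*0.5, truncating each product to
   'digits' decimal digits before adding. */
int sum_of_products(int digits)
{
    int product = (50 * 50) / 100;    /* 0.5 * 0.5 = 0.25, i.e. 25 */
    int kept = truncate_hundredths(product, digits);
    return kept + kept;
}
```

Here `sum_of_products(1)` returns 40 (0.4) and `sum_of_products(2)` returns 50 (0.5), reproducing the gap of 0.1 between the two precisions.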

Accuracy of a numeric format

We have covered the range and precision of a numeric format. Next comes accuracy. As already mentioned, precision and accuracy are easily confused.

So let's define accuracy. Simply put, it describes the deviation between the number represented in the numeric format (in a specific context) and the true value. The higher the accuracy, the smaller the error between the represented number and the true value; the lower the accuracy, the larger that error.

Note that the precision of a numeric format is not directly related to its accuracy, a point that many people get confused about. A number stored in a low-precision format is not necessarily less accurate than a number stored in a high-precision format.

Here is a simple example:

    Byte num = 0x05;
    Int16 num1 = 0x0005;
    Int32 num2 = 0x00000005;
    Single num3 = 5.000000f;
    Double num4 = 5.000000000000000;

Here we use five different numeric formats to represent the same number, 5. Although the precision of the formats differs (from 8 bits to 64 bits), the number each format represents is identical to the true value. In other words, for the number 5, these five numeric formats have the same accuracy.

0x04 Rounding error

Now that we know the common numeric formats, let's look at how a computer uses them to represent real-world numbers. As we all know, a computer works with 0 and 1, that is, binary. Representing integers in binary is easy, but representing fractions in binary often raises questions. For example, what is the binary fraction 1110.1101 in decimal? At first glance the point makes it look confusing, but it is handled just like an integer: multiply each digit by its positional weight and add up the results. We are familiar with the weights before the point, which increase from right to left as 2 to the power of 0, 1, 2, and so on. So the part before the point converts as:

1*8 + 1*4 + 1*2 + 0 = 14

After the point, the weights from left to right are 2 to the power of -1, -2, and so on. So the part after the point converts as:

1*0.5 + 1*0.25 + 0*0.125 + 1*0.0625 = 0.8125

Therefore, the binary number 1110.1101 converts to the decimal number 14.8125.
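The same digit-times-weight sum can be written as a short C function. This is only a sketch: the helper name is hypothetical, and it assumes the input contains nothing but '0', '1', and at most one '.'.

```c
#include <stddef.h>

/* Convert a binary string such as "1110.1101" to its value by
   summing each digit times its positional weight.
   Hypothetical helper: accepts only '0', '1' and one '.'. */
double binary_to_double(const char *text)
{
    double value = 0.0;
    double weight = 0.5;   /* weight of the first digit after the point */
    size_t i = 0;

    /* Integer part: each new digit doubles what came before. */
    for (; text[i] != '\0' && text[i] != '.'; i++) {
        value = value * 2.0 + (text[i] - '0');
    }
    if (text[i] == '.') {
        i++;
    }
    /* Fractional part: the weights are 1/2, 1/4, 1/8, ... */
    for (; text[i] != '\0'; i++) {
        value += (text[i] - '0') * weight;
        weight /= 2.0;
    }
    return value;
}
```

Calling `binary_to_double("1110.1101")` gives exactly 14.8125, since every term in the sum is a power of two that a double can hold without rounding.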

Looking at this conversion, did you notice an interesting fact? Binary digits after the point cannot represent every decimal fraction. In other words, some decimal fractions cannot be converted exactly to binary. This is easy to understand: after the point, binary weights shrink by a factor of 2 at each position, while decimal weights shrink by a factor of 10. So if we use four binary digits after the point, the consecutive binary values from .0000 to .1111 do not correspond to consecutive decimal values. The only possible results are sums of the weights 0.5, 0.25, 0.125, and 0.0625.

Therefore, representing a decimal number exactly in binary may take a very large, or even infinite, number of digits. A good example is using a binary floating-point number to represent decimal 0.1:

double x = 0.1d;

The value stored in variable x is not actually decimal 0.1 but the binary floating-point number closest to it. No matter how many binary digits follow the point, a finite sum of negative powers of 2 can never equal exactly 0.1. Decimal 0.1 therefore becomes an infinitely repeating fraction in binary.
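To see the repetition, the binary digits of a fraction can be generated with exact integer arithmetic, as in this C sketch (the function name is hypothetical). Each step doubles the remainder and emits a 1 whenever the result reaches the denominator.

```c
#include <stddef.h>

/* Write the first 'count' binary digits after the point of the
   fraction numerator/denominator (0 <= numerator < denominator)
   into 'out', using exact integer arithmetic.
   'out' must have room for count + 1 characters. */
void binary_fraction_digits(unsigned numerator, unsigned denominator,
                            char *out, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        numerator *= 2;                 /* move one binary place left */
        if (numerator >= denominator) {
            out[i] = '1';
            numerator -= denominator;
        } else {
            out[i] = '0';
        }
    }
    out[count] = '\0';
}
```

For 1/10 the first twelve digits come out as 000110011001, and the remainder never reaches zero, so the group 0011 keeps repeating. For 13/16 (that is, 0.8125) the digits are 1101 and the remainder is zero after four steps, which is why that value is exact in binary.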

That binary cannot represent some fractions exactly is not so strange: it is much like the way decimal cannot exactly represent a repeating fraction such as 1/3.

At this point we have to compromise with the computer. We now know that the value used inside the computer may not equal the real-world value; it is only a value, expressed in some numeric format, that is very close to the original. Throughout the program's run, the computer computes with this approximation. If the actual value is n, the computer actually uses another value n + e (where e is a very small number that may be positive or negative). This value e is the rounding error.

And that is only the error from storing a single number as an approximation. When such values take part in calculations, more error obviously accumulates. This is why the C program at the start of this article gave a wrong result: the value involved in the calculation could not be represented exactly, so the final result was only an approximation. C# seems somewhat more "advanced": although the value inside the computer is still an approximation, at least the displayed result is what we expect. But does that mean decimal calculations in C# never go wrong? After all, that appearance is only a disguise.
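Both effects, the error e in a single stored value and its growth over repeated additions, can be measured with a small C sketch. The function names are hypothetical, and the code assumes IEEE 754 `float` and `double` with no extra intermediate precision. Note that the first function measures the float against the double approximation of 0.1, which is itself not exact, only much closer.

```c
/* Error e in storing 0.1 as a float, measured against the closer
   double approximation of 0.1 (both are approximations). */
double representation_error(void)
{
    float stored = 0.1f;
    return (double)stored - 0.1;
}

/* Add 0.1f to a float 'times' times and return the result,
   so the caller can compare it with the exact answer. */
float accumulate_tenths(int times)
{
    float sum = 0.0f;
    for (int i = 0; i < times; i++) {
        sum += 0.1f;    /* each addition rounds to float again */
    }
    return sum;
}
```

The stored float turns out slightly larger than 0.1, by roughly 1.5e-9, and `accumulate_tenths(100)` ends up close to, but not equal to, 10.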

0x05 Are two decimals equal in C#?

I don't know whether you have run into unexpected problems when using relational operators, in particular when using the equality operator to compare two decimals directly. I often see people compare two decimals to find out which is larger, but comparing two decimals directly for equality is less common. I would advise against casually comparing two decimals for equality: even in a high-level language like C#, you may get a "wrong" answer, because what we are really comparing is whether the two decimals are "approximately" equal, not whether they are truly equal. The following example illustrates this point:

    using System;

    class Test
    {
        static void Main()
        {
            double f = Sum(0.1d, 0.2d);
            double g = 0.3d;
            Console.WriteLine(f);
            Console.WriteLine(f == g);
        }

        static double Sum(double f1, double f2)
        {
            return f1 + f2;
        }
    }

After compiling and running this code, we can see that the following content is output:

The comparison of the two decimals does not return true, which is not what we expected.
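A common way around this is to test whether the two values are close enough instead of exactly equal. The sketch below is written in C to match the opening example, but the same idea carries over to C#. The function name and the choice of an absolute tolerance are assumptions made for illustration; a suitable tolerance depends on the magnitude of the values being compared.

```c
#include <stdbool.h>

/* Treat two doubles as equal when they differ by no more than
   'tolerance'. The tolerance is an assumption the caller must
   choose to suit the magnitude of the values involved. */
bool nearly_equal(double a, double b, double tolerance)
{
    double diff = a - b;
    if (diff < 0.0) {
        diff = -diff;   /* absolute value without needing math.h */
    }
    return diff <= tolerance;
}
```

With a tolerance of 1e-9, `nearly_equal(0.1 + 0.2, 0.3, 1e-9)` is true even though the exact comparison in the example above is not.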

What floating-point numbers really look like

The binary fraction 1110.1101 shown earlier is written the way humans are used to, but a computer cannot work with a fraction in that form. Instead it uses one of the numeric formats introduced above. So how is a binary floating-point number actually represented in a computer? The format was already described in an earlier section, but it is hard to grasp without seeing a real example. To close this article, let's look at what a binary floating-point number really looks like inside a computer.

0100000001000111101101101101001001001000010101110011000100100011

This is a 64-bit binary number. If we use it as a double-precision floating point number, what does each part represent?

Based on the parts of a floating-point number described above, we can split it as follows:

Sign: 0

Exponent: 10000000100 (binary; 1028 in decimal)

Mantissa: 0111101101101101001001001000010101110011000100100011

Therefore, written as a binary fraction, the value is:

(-1)^0 x 1.0111101101101101001001001000010101110011000100100011 x 2^(1028 - 1023)

= 1.0111101101101101001001001000010101110011000100100011 x 2^5

= 101111.01101101101001001001000010101110011000100100011
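The same split can be checked in C with shifts and masks. In this sketch the 64-bit pattern above is written as the hexadecimal constant 0x4047B6D248573123 (a direct transcription of the bit string); the macro and function names are hypothetical, and `double` is assumed to be a 64-bit IEEE 754 value.

```c
#include <stdint.h>
#include <string.h>

/* The 64-bit pattern from the text, transcribed to hexadecimal. */
#define SAMPLE_BITS 0x4047B6D248573123ull

/* Field widths of an IEEE 754 double: 1 + 11 + 52 bits. */
uint64_t double_sign(uint64_t bits)     { return bits >> 63; }
uint64_t double_exponent(uint64_t bits) { return (bits >> 52) & 0x7FFu; }
uint64_t double_mantissa(uint64_t bits) { return bits & 0xFFFFFFFFFFFFFull; }

/* Reinterpret the bit pattern as a double. Assumes double is a
   64-bit IEEE 754 value with the same byte order as uint64_t. */
double bits_to_double(uint64_t bits)
{
    double value;
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

For `SAMPLE_BITS`, the sign is 0, the stored exponent is 1028, and `bits_to_double` returns a value between 47 and 48, in line with the integer part 101111 (decimal 47) derived above.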

If you look carefully, do you notice something interesting? In the 64 bits the computer uses for this double-precision floating-point number, the mantissa is: 0111101101101101001001001000010101110011000100100011

However, once the stored form is written out as a binary fraction, the number becomes 1.0111101101101101001001001000010101110011000100100011 x 2^5. Why is there an extra 1 in front of the point?

This is because, to give floating-point numbers of every form a uniform representation, the mantissa is required to have 1 as the digit before the point. Since that digit is always 1, it does not need to be stored, which saves one bit.

How do we make sure the digit before the point of a binary fraction is 1? By shifting the binary fraction: after some number of shifts to the left or right, the integer part becomes 1. For example, let's turn the earlier binary fraction 1110.1101 into the mantissa of a floating-point number that the computer can store.

1110.1101 (original value) --> 0001.1101101 (shift right until the integer part is 1) --> 0001.11011010000000000000... (pad with zeros to the width of the format) --> 11011010000000000000... (drop the integer part and keep only the fraction)
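The shifting step can be imitated on a `double` by repeated halving or doubling, which moves the binary point one place at a time. This C sketch uses hypothetical names and handles only positive, finite values.

```c
#include <float.h>

/* A value written as mantissa * 2^exponent with 1 <= mantissa < 2. */
struct normalized {
    double mantissa;
    int exponent;
};

/* Normalize a positive, finite value by repeated halving or
   doubling, a stand-in for shifting the binary point.
   Other inputs are returned unchanged with exponent 0. */
struct normalized normalize(double value)
{
    struct normalized n = { value, 0 };

    if (!(value > 0.0) || value > DBL_MAX) {
        return n;                       /* zero, negative, NaN, infinity */
    }
    while (n.mantissa >= 2.0) {         /* point moves left */
        n.mantissa /= 2.0;
        n.exponent++;
    }
    while (n.mantissa < 1.0) {          /* point moves right */
        n.mantissa *= 2.0;
        n.exponent--;
    }
    return n;
}
```

For 14.8125 (binary 1110.1101) this gives a mantissa of 1.8515625, which is binary 1.1101101, and an exponent of 3, matching the three right shifts in the example above.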

That is all for now on decimal arithmetic in C# and in computers. Discussion is welcome.