Compromise and trade-offs: deconstructing decimal operations in C#

Off Topic

Before I begin, I would like to thank Cnblogs (博客园) for providing this excellent platform. By communicating with and helping many like-minded friends here, I was honored to receive the 2015 Microsoft MVP award. It also makes me more convinced that code changes the world. Grateful! Thankful! Keep working hard!

0x00 Preface

In life and work, Murong often meets people who are rather superstitious about machines. Many of them believe that a machine is perfectly rational and has no feelings, so the machine's output is always the correct answer; if the answer is wrong, it must be the fault of the person operating the machine. But is the machine's result always correct? In fact, machine error is not rare, and a typical example is decimal arithmetic. So let's talk about a related topic: how does a machine, and the C# language in particular, handle decimal numbers?

0x01 Starting with a "wrong" answer

Since we are going to talk about how machines do arithmetic, we naturally need a small example of a machine getting the arithmetic wrong.

#include <stdio.h>

int main(void)
{
    float num = 0;
    int i;

    for (i = 0; i < 100; i++)
    {
        num += 0.1;
    }

    printf("%f\n", num);
    return 0;
}

This is a small C program, and its logic is very easy to follow: it does nothing more than add 0.1 one hundred times and print the result. We don't even need a computer for this; mental arithmetic gives us the answer: 10. So what answer does the computer give us? Let's compile and run this C code.

The output is a surprise: it is not exactly 10. How can the computer lose to a person's mental arithmetic? Friends like the ones Mr. Murong mentioned in the preface may start to wonder whether the code is wrong, or whether something is wrong with Mr. Murong's computer. But in fact the code is correct and the machine is running normally. So why does the computer's arithmetic lose to mental arithmetic? That leads to the next question: how does a computer handle decimals? Once we look at the mechanism a computer uses to handle decimals, the mystery is solved. (Of course, if a friend uses C# to add 0.1 one hundred times, the displayed result is 10. But that is hidden work C# does for us behind the scenes; at bottom it is the same story of how computers deal with decimals.)
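To see the C# side of this, here is a minimal sketch (assuming a plain console program; the variable names are just for illustration) that accumulates 0.1 one hundred times in float, double and decimal. With round-trip formatting the float and double sums reveal a small error (the double typically comes out as 9.99999999999998), while the decimal sum is exactly 10.0:

using System;

class AccumulateDemo
{
    static void Main()
    {
        float f = 0f;
        double d = 0d;
        decimal m = 0m;

        for (int i = 0; i < 100; i++)
        {
            f += 0.1f;   // 0.1f is only an approximation in binary
            d += 0.1d;   // 0.1d is only an approximation in binary
            m += 0.1m;   // 0.1m is stored exactly in the decimal format
        }

        Console.WriteLine(f.ToString("R"));   // accumulated single-precision error
        Console.WriteLine(d.ToString("R"));   // accumulated double-precision error
        Console.WriteLine(m);                 // exactly 10.0
    }
}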

0x02 The format of numbers

A program can be seen as a digital model of the real world. Everything in the real world can be turned into numbers and brought back to life inside the computer. So one problem that has to be solved is how numbers are represented in the computer; this is why number formats exist.

As we all know, machine language is all numbers, but this article is naturally not concerned with every binary format. Here we only care about how meaningful real-world numbers are represented in a computer. Broadly speaking, meaningful numbers fall into the following three formats.

Integer format

Most of the numbers we meet during development are actually integers, and integers are also the easiest to represent in a computer. Most of the integers we encounter can be represented with a 32-bit signed integer (Int32); if needed, there is also a signed 64-bit integer type (Int64) to choose from. The counterpart of the integer is the decimal, and decimals mainly have two representations.

Fixed-point format

The so-called fixed-point format is the convention that the decimal point sits at a fixed position in all data. The most common example of a fixed-point decimal is the money type in SQL Server. Fixed-point decimals are actually quite good, and they are clearly suitable for many situations that deal with fractional values. But they have an inherent disadvantage: because the position of the decimal point is fixed, the range they can represent is limited. And so the protagonist of this article makes its entrance.

Floating-point format

The floating-point format is the answer to the fixed-point format's range problem. A floating-point number consists of a sign, a mantissa, a base (radix) and an exponent, and represents a decimal with these four parts. Since a computer is binary inside, the base is naturally 2 (just as the base of decimal notation is 10). Because the base is always 2, the computer does not need to record it in the data; it stores only the sign, the mantissa and the exponent. Many programming languages provide at least two data types that use the floating-point format to represent decimals: the double-precision floating-point number double and the single-precision floating-point number float that we see so often. Likewise, the C# language has two such types; in the C# language standard, the double-precision and single-precision floating-point numbers correspond to System.Double and System.Single. In fact, C# has a third data type that represents decimals in a floating-point fashion: the decimal type, System.Decimal. It is important to note that a floating-point format can be laid out in many ways, and in C# float and double follow the IEEE 754 standard (a small sketch after the list below shows how to pull these fields out of a double):

    • A float (single-precision floating-point number) is 32 bits wide. The 32-bit structure is: sign part 1 bit, exponent part 8 bits, mantissa part 23 bits.
    • A double (double-precision floating-point number) is 64 bits wide. The 64-bit structure is: sign part 1 bit, exponent part 11 bits, mantissa part 52 bits.
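Here is a minimal sketch of that layout (not from the original article; it simply uses the standard BitConverter.DoubleToInt64Bits call to reinterpret a double and mask out the three fields):

using System;

class DoubleLayoutDemo
{
    static void Main()
    {
        double value = 0.1d;

        // Reinterpret the 8 bytes of the double as a 64-bit integer.
        long bits = BitConverter.DoubleToInt64Bits(value);

        long sign     = (bits >> 63) & 0x1;        // 1-bit sign
        long exponent = (bits >> 52) & 0x7FF;      // 11-bit biased exponent
        long mantissa = bits & 0xFFFFFFFFFFFFF;    // 52-bit mantissa

        Console.WriteLine(sign);
        Console.WriteLine(Convert.ToString(exponent, 2).PadLeft(11, '0'));
        Console.WriteLine(Convert.ToString(mantissa, 2).PadLeft(52, '0'));
    }
}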

0x03 Range, precision and accuracy

Now that we have covered several representations of numbers in the computer, we have to mention the metrics used when choosing a number format. The most common are these: range, precision and accuracy.

Representation range of a number format

As the name implies, the representation range of a number format is the range from the smallest value to the largest value the format can represent. For example, the representation range of a 16-bit signed integer is from -32768 to 32767. If the value to be represented falls outside this range, the format cannot represent it correctly. Of course, even a number inside this range may not be represented exactly; for example, a 16-bit signed integer cannot represent a decimal fraction exactly, but there is always a nearby value that the 16-bit signed integer format can represent.
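As a trivial sketch of these limits in C#, the integral types expose MinValue and MaxValue constants:

using System;

class RangeDemo
{
    static void Main()
    {
        // 16-bit signed integer: -32768 .. 32767
        Console.WriteLine($"Int16: {short.MinValue} .. {short.MaxValue}");

        // 32-bit and 64-bit signed integers
        Console.WriteLine($"Int32: {int.MinValue} .. {int.MaxValue}");
        Console.WriteLine($"Int64: {long.MinValue} .. {long.MaxValue}");
    }
}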

Precision of a number format

To be honest, precision and accuracy feel very fuzzy to many people: they seem to be the same thing, yet somehow different. But Mr. Murong needs to remind you that precision and accuracy are two concepts with a large gap between them.

In layman's terms, the precision of a number format can be thought of as how much of the format's information is used to represent a number. Higher precision usually means more digits can be represented, and the most obvious consequence is that the higher the precision, the closer a number in this format can get to the real number. For example, we know that 1/3 converted to a decimal is 0.3333..., repeating forever. With five digits it can be written as 0.3333, and with seven digits it becomes 0.333333 (and of course, if seven digits are used to hold only five digits' worth of precision, it becomes 0.333300).

The precision of a number format also affects calculations. To give a simple example, suppose we keep only one decimal digit of precision for every intermediate result. Then the whole calculation might turn out like this:

0.5 * 0.5 + 0.5 * 0.5 = 0.25 + 0.25

= 0.2 + 0.2

= 0.4

And if we use two digits of precision, the calculation process becomes the following:

0.5 * 0.5 + 0.5 * 0.5 = 0.25 + 0.25

= 0.5

Comparing the results under the two precisions: with one digit of precision, the computed result differs from the correct result by 0.1, while with two digits of precision the result comes out correctly. It is clear, then, that preserving precision during a calculation matters a great deal.
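The same loss can be mimicked in C# by deliberately rounding every intermediate result; this is only a contrived sketch built on Math.Round to mirror the arithmetic above:

using System;

class PrecisionDemo
{
    static void Main()
    {
        // Keep just one decimal digit after each multiplication
        // (Math.Round uses banker's rounding, so 0.25 becomes 0.2).
        double a = Math.Round(0.5 * 0.5, 1);
        double b = Math.Round(0.5 * 0.5, 1);
        Console.WriteLine(a + b);   // 0.4

        // Keep two decimal digits: nothing is lost, the result is correct.
        double c = Math.Round(0.5 * 0.5, 2);
        double d = Math.Round(0.5 * 0.5, 2);
        Console.WriteLine(c + d);   // 0.5
    }
}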

Accuracy of a number format

Having introduced the representation range and precision of number formats, let's now introduce accuracy. As just mentioned, precision and accuracy are a pair of easily confused concepts.

Putting it in plain terms, accuracy describes the error between the number the format represents (in a specific situation) and the true number. The higher the accuracy, the smaller the error between the represented number and the true value; the lower the accuracy, the larger that error.

It is important to note that the precision of a number format and its accuracy are not directly related, which is exactly where many friends get confused. A number expressed in a low-precision format is not necessarily less accurate than the same number expressed in a high-precision format.

To give a simple example:

Byte   a = 0x05;
Int16  b = 0x0005;
Int32  c = 0x00000005;
Single d = 5.000000f;
Double e = 5.000000000000000;

Here we use five different number formats to represent the same number, 5. Although the precision of these formats differs (from 8 bits to 64 bits), the number each format represents is identical to the real number. In other words, for the number 5, all five formats have exactly the same accuracy.
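A tiny sketch to confirm the point (the variable names are arbitrary): every one of the five values prints as 5, even though the underlying formats range from 8 to 64 bits.

using System;

class AccuracyDemo
{
    static void Main()
    {
        Byte   a = 0x05;
        Int16  b = 0x0005;
        Int32  c = 0x00000005;
        Single d = 5.000000f;
        Double e = 5.000000000000000;

        // Same accuracy, different precision: each line prints "5".
        Console.WriteLine(a);
        Console.WriteLine(b);
        Console.WriteLine(c);
        Console.WriteLine(d);
        Console.WriteLine(e);
    }
}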

0x04 Rounding Error

Now that we have seen several number formats common in computers, let's talk about how a computer represents real-world numbers in those formats. It is well known that a computer uses 0 and 1, that is, binary. Representing integers in binary is familiar, but when binary is used for decimals we tend to have some questions. For example, what is the binary decimal 1110.1101 converted to decimal? At first glance it looks a little confusing. In fact it is handled the same way as an integer: multiply each digit by its weight and sum the results. The weights before the binary point are already familiar: from right to left they are 2^0, 2^1, 2^2 and so on, increasing. So the part before the point converts to decimal as:

1 * 8 + 1 * 4 + 1 * 2 + 0 * 1 = 14

For the part after the binary point, the weights from left to right are 2^-1, 2^-2 and so on, decreasing. So the part after the point converts to decimal as:

1 * 0.5 + 1 * 0.25 + 0 * 0.125 + 1 * 0.0625 = 0.8125

So the binary decimal 1110.1101 converted to decimal is 14.8125.
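For readers who want to try other bit patterns, here is a small sketch that performs exactly this weighted sum (it assumes a well-formed binary string with at most one point; the helper name ParseBinary is made up for this example):

using System;

class BinaryFractionDemo
{
    static double ParseBinary(string s)
    {
        string[] parts = s.Split('.');
        double result = 0;

        // Integer part: weights 2^0, 2^1, ... from right to left.
        for (int i = 0; i < parts[0].Length; i++)
        {
            if (parts[0][parts[0].Length - 1 - i] == '1')
                result += Math.Pow(2, i);
        }

        // Fractional part: weights 2^-1, 2^-2, ... from left to right.
        if (parts.Length > 1)
        {
            for (int i = 0; i < parts[1].Length; i++)
            {
                if (parts[1][i] == '1')
                    result += Math.Pow(2, -(i + 1));
            }
        }

        return result;
    }

    static void Main()
    {
        Console.WriteLine(ParseBinary("1110.1101"));   // 14.8125
    }
}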

Watching this conversion of the digits after the binary point, did you, dear reader, notice a very interesting fact? The bits after the binary point cannot represent every decimal fraction; in other words, some decimal fractions cannot be converted into a finite binary form. This is easy to understand: after the point, the binary weights shrink by dividing by 2 at each step, while decimal weights shrink by dividing by 10. So if we use 4 bits after the point, the contiguous binary values from .0000 to .1111 actually correspond to decimal values that are not contiguous; every representable result is just some combination of the individual weights (0.5, 0.25, 0.125 and 0.0625) added together.

Therefore, a very simple decimal fraction may need a very long, or even infinite, number of bits to be represented exactly in binary. A good example is using a binary floating-point number to represent the decimal 0.1:

double x = 0.1d;

In fact, the value stored in the variable x is not the true decimal 0.1, but the binary floating-point number closest to 0.1. This is because no matter how many binary digits we use after the point, no sum of negative powers of 2 adds up to exactly 0.1; in binary, the decimal 0.1 becomes an infinitely repeating fraction.
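You can peek at the approximation that is actually stored by asking for more digits than the default formatting shows; a minimal sketch (the exact digits depend on the runtime's formatter, but the stored value is slightly larger than 0.1):

using System;

class PointOneDemo
{
    static void Main()
    {
        double x = 0.1d;

        Console.WriteLine(x);                  // default formatting hides the error: 0.1
        Console.WriteLine(x.ToString("G17"));  // enough digits to round-trip,
                                               // typically 0.10000000000000001
    }
}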

Of course, it is easy to accept that binary cannot represent some decimal fractions exactly, because this is a bit like how, in decimal, we cannot exactly write down a repeating fraction such as 1/3.

At this point we have to compromise with the computer. We now know that a value used inside the computer may not equal the value in the real world; it is merely a value, expressed in some number format, that is very close to the original number. And throughout the program's execution, the computer will keep using this merely approximate value in its calculations. Suppose the true value is n; the computer actually uses some other value n + e (where e is a very small positive or negative number). This value e is the rounding error.

And that is just a single number represented approximately; once such values take part in calculations, the error obviously grows. This is why the C program at the beginning of the article produced an error: the value taking part in the calculation could not be represented exactly and ended up as an approximation. Of course, C# is comparatively "high-level": inside the computer the value is still an approximation, but what is displayed to us at least matches what people "expect". But in C#, is decimal arithmetic really free of error? After all, that might just be a decoy.

0x05 Are two decimals in C# equal?

I wonder, dear readers, whether you have ever noticed unexpected behavior with the relational operators, in particular when using the equality operator to compare two decimals directly. Using relational operators to compare which of two decimals is larger is quite common, while comparing two decimals directly for equality is rarer. Either way, let me remind you that it is best not to casually compare two decimals for equality: even in a high-level language like C# you can still get the "wrong" answer, because what actually gets compared is whether the two stored approximations are equal, not whether the two real numbers are equal. The following example illustrates the point:

using System;

class Test
{
    static void Main()
    {
        double f = Sum(0.1d, 0.2d);
        double g = 0.3d;
        Console.WriteLine(f);
        Console.WriteLine(f == g);
    }

    static double Sum(double f1, double f2)
    {
        return f1 + f2;
    }
}

We compile and run this code, and we see the following output:

The comparison of these two decimals comes out False, which is not what we expected: the double produced by 0.1 + 0.2 is not exactly the same double as 0.3.
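A common workaround (not something the original example shows, just the usual technique) is to compare with a small tolerance instead of demanding exact equality:

using System;

class ToleranceDemo
{
    // Treat two doubles as "equal" when they differ by less than epsilon.
    // Picking a sensible epsilon depends on the scale of the values involved.
    static bool NearlyEqual(double a, double b, double epsilon = 1e-10)
    {
        return Math.Abs(a - b) < epsilon;
    }

    static void Main()
    {
        double f = 0.1d + 0.2d;
        double g = 0.3d;

        Console.WriteLine(f == g);             // False
        Console.WriteLine(NearlyEqual(f, g));  // True
    }
}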

The true shape of floating-point numbers

We know that a binary decimal like 1110.1101 above is written according to human habit, but the computer does not recognize anything with a decimal point in it. The computer instead stores such a number using the format described earlier. So what does a binary floating-point number actually look like inside the computer? The earlier introduction already described the format, but without a concrete case it is hard to get an intuitive feel, so in the final part of this article let's look at the real appearance of a binary floating-point number in the computer.

0100000001000111101101101101001001001000010101110011000100100011

This is a 64-bit binary number. If we treat it as a double-precision floating-point number, which bits play which role?

Following the floating-point layout described above, we can split it into the following parts:

Sign: 0

Exponent part: 10000000100 (binary; 1028 in decimal)

Mantissa part: 0111101101101101001001001000010101110011000100100011

Therefore, writing it out as a binary decimal in the notation humans use, it is:

(-1)^0 * 1.0111101101101101001001001000010101110011000100100011 * 2^(1028-1023)

= 1.0111101101101101001001001000010101110011000100100011 * 2^5

= 101111.01101101101001001001000010101110011000100100011
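If you would rather not do the decomposition by hand, this quick sketch (the 64-bit string is the one shown above) reinterprets the bit pattern as a double; it should print a value of roughly 47.428, which is 101111.011011011... in binary:

using System;

class BitsToDoubleDemo
{
    static void Main()
    {
        string bits = "0100000001000111101101101101001001001000010101110011000100100011";

        // Parse the 64-bit pattern and reinterpret those bits as a double.
        long raw = Convert.ToInt64(bits, 2);
        double value = BitConverter.Int64BitsToDouble(raw);

        Console.WriteLine(value);
    }
}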

If you have been reading carefully, did you notice something interesting? In this 64-bit number used to represent a double-precision float in the computer, the mantissa part is: 0111101101101101001001001000010101110011000100100011

But after being turned back into the binary notation humans use for decimals, the number becomes 1.0111101101101101001001001000010101110011000100100011 * 2^5. Why is there an extra 1 before the point?

This is because, for the mantissa, the many possible forms of a floating-point number are unified into a single representation by fixing the digit before the point at 1. Since that digit is always 1, it does not need to be stored in the computer, which saves one bit of data.

So how do we guarantee that the digit before the point of a binary decimal is 1? By logically shifting the binary decimal: the point is moved left or right until the integer part becomes 1. Take the binary decimal 1110.1101 from earlier and let's try to turn it into the mantissa of a floating-point number the computer can store.

1110.1101 (raw data) --> 1.1101101 (shift until the integer part is 1; the factor of 2^3 goes into the exponent) --> 1.110110100000000000000... (extend the digits to fill out the width of the mantissa) --> 110110100000000000000... (drop the integer part 1, keeping only the fractional digits as the stored mantissa)
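To close the loop, a short sketch (reusing the bit masking from the earlier example) confirms that the double 14.8125, i.e. binary 1110.1101, is stored with an unbiased exponent of 3 and a mantissa that begins with 1101101 and is padded out with zeros:

using System;

class NormalizeDemo
{
    static void Main()
    {
        double value = 14.8125;   // 1110.1101 in binary

        long bits = BitConverter.DoubleToInt64Bits(value);
        long exponent = ((bits >> 52) & 0x7FF) - 1023;   // remove the bias of 1023
        long mantissa = bits & 0xFFFFFFFFFFFFF;          // 52 stored bits, leading 1 implied

        Console.WriteLine(exponent);                                         // 3
        Console.WriteLine(Convert.ToString(mantissa, 2).PadLeft(52, '0'));   // 1101101000...0
    }
}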

Well, that is about all there is to say about decimal calculation in C#, and in computers generally. Comments and exchanges are welcome.
