Checksum Algorithm Analysis

Last Update:2018-07-26 Source: Internet

Author: User

When I read computer networking books in the past, I only glanced at the IP or UDP header checksum each time it came up. It seemed simple enough: surely it was just the sum of the 16-bit data. Recently, while studying "TCP/IP Illustrated, Volume 1: The Protocols", I noticed that the checksum is based on the one's complement sum of 16-bit words (I had never noticed that it was a one's complement sum, so evidently I had not been reading carefully; my bad). That struck me as strange: why use a one's complement sum rather than a plain sum? The algorithms and ideas in TCP/IP are generally classics, so there must be a reason. Below we explore how this checksum algorithm is actually implemented. http://blog.csdn.net/li_xiang1102/article/details/6901660

First, the IP, ICMP, UDP and TCP headers all contain a checksum field. It is 16 bits wide, and the algorithm is essentially the same in all four.

To compute the checksum of a packet when sending data, follow these steps:

(1) Set the checksum field to 0;

(2) Treat the data to be checked as a sequence of 16-bit words, compute their one's complement sum, and take the one's complement of that sum;

(3) Store the result in the checksum field.

Verifying the checksum when receiving data is comparatively simple:

(1) Treat the received data, including the checksum field, as a sequence of 16-bit words and run the same calculation over it;

(2) Check whether the computed result is 0;

(3) If it is 0, the checksum is correct. Otherwise the checksum is wrong and the protocol stack discards the packet.
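The send and receive steps can be sketched in C as follows. This is a minimal illustration rather than production code: the names `ones_sum16` and `send_then_receive` are made up for this example, and the sample 20-byte IPv4 header is written as ten 16-bit values with the checksum field at word index 5.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One's complement sum of n 16-bit words, folded back into 16 bits.
 * (Hypothetical helper name, used only for this sketch.) */
uint16_t ones_sum16(const uint16_t *words, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += words[i];
    while (sum >> 16)                          /* end-around carry */
        sum = (sum >> 16) + (sum & 0xFFFF);
    return (uint16_t)sum;
}

/* Sender: zero the checksum field, sum, store the complement.
 * Receiver: run the same calculation over everything; the result must be 0. */
void send_then_receive(void)
{
    /* Sample IPv4 header as 16-bit values; word 5 is the checksum field. */
    uint16_t hdr[10] = { 0x4500, 0x0073, 0x0000, 0x4000, 0x4011,
                         0x0000, 0xC0A8, 0x0001, 0xC0A8, 0x00C7 };

    hdr[5] = 0;                                        /* step (1) */
    uint16_t csum = (uint16_t)~ones_sum16(hdr, 10);    /* step (2) */
    hdr[5] = csum;                                     /* step (3) */
    assert(csum == 0xB861);

    /* Receiver side: the sum is now 0xFFFF, so its complement is 0. */
    assert((uint16_t)~ones_sum16(hdr, 10) == 0);
}
```

Calling `send_then_receive()` from a `main` function exercises both directions; flipping any bit of `hdr` before the receiver check makes the second assertion fail.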

Although the four protocols use the same checksum algorithm, the data it covers differs:

The IP checksum covers only the IP header (20 bytes when there are no options);

The ICMP checksum covers the entire message (ICMP header + ICMP data);

The UDP and TCP checksums cover the entire segment plus a 12-byte pseudo header taken from the IP layer: the source IP address (4 bytes), the destination IP address (4 bytes), the protocol (2 bytes, the first of which is 0), and the TCP/UDP length (2 bytes). In addition, a UDP or TCP segment may contain an odd number of bytes, so a padding byte of 0 is appended for the checksum calculation (the padding byte exists only for the calculation and is not transmitted).
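As a rough sketch of the pseudo header and the odd-length padding, the following C code builds the 12 bytes described above and sums them together with a UDP segment. All function names here (`add_bytes`, `fold`, `udp_checksum`, `udp_roundtrip_demo`) are hypothetical, the addresses and payload are invented sample data, and the bytes are combined as big-endian words so that the sketch behaves the same on any host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Add n bytes to acc as big-endian 16-bit words; an odd trailing byte is
 * padded with a zero byte for the calculation only. */
static uint32_t add_bytes(uint32_t acc, const uint8_t *p, size_t n)
{
    size_t i = 0;
    for (; i + 1 < n; i += 2)
        acc += ((uint32_t)p[i] << 8) | p[i + 1];
    if (i < n)
        acc += (uint32_t)p[i] << 8;            /* virtual pad byte 0x00 */
    return acc;
}

/* Fold the carries held in the high half back into the low 16 bits. */
static uint16_t fold(uint32_t acc)
{
    while (acc >> 16)
        acc = (acc >> 16) + (acc & 0xFFFF);
    return (uint16_t)acc;
}

/* Checksum over the 12-byte pseudo header plus the UDP header and data.
 * seg points at the UDP header; its checksum field (bytes 6-7) must be 0
 * when computing, or hold the received value when verifying. */
uint16_t udp_checksum(const uint8_t src[4], const uint8_t dst[4],
                      const uint8_t *seg, uint16_t seg_len)
{
    uint8_t pseudo[12];
    for (int i = 0; i < 4; i++) {
        pseudo[i]     = src[i];                /* source IP address */
        pseudo[4 + i] = dst[i];                /* destination IP address */
    }
    pseudo[8]  = 0;                            /* zero byte */
    pseudo[9]  = 17;                           /* protocol number of UDP */
    pseudo[10] = (uint8_t)(seg_len >> 8);      /* UDP length, high byte */
    pseudo[11] = (uint8_t)(seg_len & 0xFF);    /* UDP length, low byte */

    uint32_t acc = add_bytes(0, pseudo, sizeof pseudo);
    acc = add_bytes(acc, seg, seg_len);
    return (uint16_t)~fold(acc);
}

/* Round trip: compute, store in the checksum field, then verify. */
void udp_roundtrip_demo(void)
{
    const uint8_t src[4] = { 192, 168, 0, 1 }, dst[4] = { 192, 168, 0, 199 };
    /* 8-byte UDP header (length 11, checksum 0) plus a 3-byte payload. */
    uint8_t seg[11] = { 0x04, 0xD2, 0x00, 0x35, 0x00, 0x0B, 0x00, 0x00,
                        'a', 'b', 'c' };
    uint16_t c = udp_checksum(src, dst, seg, sizeof seg);
    seg[6] = (uint8_t)(c >> 8);
    seg[7] = (uint8_t)(c & 0xFF);
    assert(udp_checksum(src, dst, seg, sizeof seg) == 0);
}
```

For TCP the same sketch would apply with protocol number 6 and the TCP segment length in place of the UDP values.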

It is also worth mentioning that the UDP checksum is optional. A checksum field of 0 means the sender did not compute a checksum, so the receiver skips verification. What if the computed UDP checksum happens to be 0? The book says: "If the calculated checksum is 0, it is stored as all one bits (65535), which is equivalent in one's complement arithmetic."
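The zero-substitution rule fits in a few lines of C. The function names below are invented for this sketch; only the rule itself comes from the text above.

```c
#include <assert.h>
#include <stdint.h>

/* Value to store in the UDP checksum field for a computed checksum.
 * 0x0000 is reserved for "no checksum", so a computed 0 is sent as 0xFFFF,
 * the other one's complement representation of zero. */
uint16_t udp_checksum_field(uint16_t computed)
{
    return computed == 0 ? 0xFFFF : computed;
}

/* Receiver-side view: a stored 0 means the sender skipped the checksum. */
int udp_checksum_present(uint16_t stored)
{
    return stored != 0;
}

void udp_zero_demo(void)
{
    assert(udp_checksum_field(0x0000) == 0xFFFF);
    assert(udp_checksum_field(0xB861) == 0xB861);
    assert(!udp_checksum_present(0x0000));
    assert(udp_checksum_present(0xFFFF));
}
```

Because 0xFFFF and 0x0000 are both zero in one's complement arithmetic, the receiver's verification sum comes out the same whichever of the two was stored.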

That is a lot of background. So how is the checksum actually calculated?

1. What is a one's complement sum?

For unsigned numbers, first take the one's complement of each number, then add bit by bit from low to high, carrying into the next higher bit as in ordinary binary addition. If there is a carry out of the highest bit, add 1 to the lowest bit.

First, the one's complement here is not the same as the one's complement of signed numbers that we learned about (where a positive number's one's complement is itself, and a negative number's is its sign-magnitude form with the sign bit kept and all other bits inverted). Here there is no positive or negative: every bit is simply inverted.

Second, the rule above differs from ordinary addition in one place: a carry out of the highest bit is added back into the lowest bit. Why do that? Take 4-bit numbers as an example. With this rule, the sum wraps around at 1111 rather than at 10000. That also explains why 0000 and 1111 both represent 0: adding either of them to another number leaves that number's value unchanged, exactly as adding 0 should.

Here are two examples that compare ordinary binary addition with one's complement addition. Each operand and result is written as value (bit pattern):

Ordinary addition                     One's complement addition
3 (0011) + 5 (0101) = 8 (1000)        3 (1100) + 5 (1010) = 8 (0111)
8 (1000) + 9 (1001) = 1 (0001)        8 (0111) + 9 (0110) = 2 (1101)

As the two examples show, when the addition does not overflow, both methods give the same value. When it overflows, the results differ: ordinary addition wraps at 10000 while one's complement addition wraps at 1111, so the results differ by exactly 1. These examples are only meant to show the rules of one's complement addition at work. Why the rules are defined this way, and what other properties they have, probably requires some algebra (my math is not strong enough, so this is only a surface-level analysis).
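The 4-bit examples can be reproduced with a few lines of C. The helper name `oc_add4` is an assumption of this sketch; it adds two 4-bit patterns and feeds any carry out of bit 3 back into bit 0.

```c
#include <assert.h>
#include <stdint.h>

/* 4-bit one's complement addition: a carry out of bit 3 is added back
 * into bit 0 (end-around carry). */
uint8_t oc_add4(uint8_t a, uint8_t b)
{
    uint8_t s = (uint8_t)((a & 0xF) + (b & 0xF));
    if (s > 0xF)
        s = (uint8_t)((s & 0xF) + 1);
    return s;
}

void oc_add4_demo(void)
{
    assert(oc_add4(0xC, 0xA) == 0x7);  /* 1100 + 1010 -> 0111, the pattern for 8 */
    assert(oc_add4(0x7, 0x6) == 0xD);  /* 0111 + 0110 -> 1101, the pattern for 2 */
    assert(oc_add4(0x5, 0x0) == 0x5);  /* 0000 acts as zero */
    assert(oc_add4(0x5, 0xF) == 0x5);  /* 1111 acts as zero too */
}
```

The last two assertions show the two zero representations: adding 0000 or 1111 to 0101 gives back 0101.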

One more point about the one's complement sum: inverting first and then adding gives the same result as adding first and then inverting. (In practice, almost all implementations add first and invert at the end.)
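This equivalence can be checked exhaustively on a small word size. The sketch below uses 8-bit words so that every pair can be tried; `oc_add8`, `canon` and `check_complement_order` are names invented for the example. One detail the check brings out: when the two operands are bitwise complements of each other, the two orders produce 0xFF and 0x00 respectively, which are the two representations of zero, so the comparison treats them as equal.

```c
#include <assert.h>
#include <stdint.h>

/* 8-bit one's complement addition with end-around carry. */
static uint8_t oc_add8(uint8_t a, uint8_t b)
{
    unsigned s = (unsigned)a + b;
    return (uint8_t)((s & 0xFF) + (s >> 8));
}

/* Map the all-ones zero (0xFF) onto 0x00 so both zeros compare equal. */
static uint8_t canon(uint8_t x)
{
    return x == 0xFF ? 0x00 : x;
}

/* Assert that invert-then-add matches add-then-invert for every pair. */
void check_complement_order(void)
{
    for (unsigned a = 0; a < 256; a++) {
        for (unsigned b = 0; b < 256; b++) {
            uint8_t invert_first = oc_add8((uint8_t)~a, (uint8_t)~b);
            uint8_t add_first    = (uint8_t)~oc_add8((uint8_t)a, (uint8_t)b);
            assert(canon(invert_first) == canon(add_first));
        }
    }
}
```

Adding first and inverting once at the end needs a single inversion instead of one per word, which is a plausible reason implementations prefer that order.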

2. Implementation of the checksum algorithm

Once the one's complement sum is understood, implementing the checksum is straightforward. Without further ado, here is the code:

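```cpp
typedef unsigned short USHORT;   /* 16-bit word (Windows-style typedef the code relies on) */
typedef unsigned char  UCHAR;    /* 8-bit byte */

/*  1 */ // Compute checksum
/*  2 */ USHORT checksum(USHORT *buffer, int size)
/*  3 */ {
/*  4 */     unsigned long cksum = 0;
/*  5 */     while (size > 1)
/*  6 */     {
/*  7 */         cksum += *buffer++;              // add one 16-bit word
/*  8 */         size -= sizeof(USHORT);
/*  9 */     }
/* 10 */     if (size)
/* 11 */     {
/* 12 */         cksum += *(UCHAR *)buffer;       // odd trailing byte, added as the low byte
/* 13 */     }
/* 14 */     // Fold the 32-bit sum down to 16 bits
/* 15 */     while (cksum >> 16)
/* 16 */         cksum = (cksum >> 16) + (cksum & 0xffff);
/* 17 */     return (USHORT)(~cksum);
/* 18 */ }
```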

buffer is a pointer to the data to be checked, and size is the total length of that data in bytes.

Lines 4 to 13 sum the data in 16-bit units. Because a carry out of the highest bit must later be added back into the lowest bit, cksum has to be an unsigned long of at least 32 bits: the high 16 bits accumulate the carries during the summation. Lines 10 to 13 handle the case where size is odd.

Lines 14 to 16 add the high 16 bits of cksum to the low 16 bits, that is, they add the accumulated carries back into the lowest bit. A while loop is used to test whether the high 16 bits of cksum are nonzero, because the addition on line 16 can itself produce a carry into the high 16 bits.

Some implementations use the following two statements instead:

cksum = (cksum >> 16) + (cksum & 0xffff);
cksum += (cksum >> 16);

Two additions are enough to make the low 16 bits of cksum correct (a carry bit left in the high half is discarded when the result is cast to 16 bits), so the two approaches have the same effect. In fact, with a 32-bit cksum the loop above executes at most twice.
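A quick way to convince yourself that the two folding styles agree is to run both on a handful of 32-bit sums. This is only a sketch; `fold_loop`, `fold_twice` and `compare_folds` are hypothetical names, and the sample values were picked to include the awkward cases where the first fold itself carries.

```c
#include <assert.h>
#include <stdint.h>

/* Loop form: keep folding while anything remains in the high half. */
static uint16_t fold_loop(uint32_t cksum)
{
    while (cksum >> 16)
        cksum = (cksum >> 16) + (cksum & 0xFFFF);
    return (uint16_t)cksum;
}

/* Two-statement form: fold once, then add the possible carry back in. */
static uint16_t fold_twice(uint32_t cksum)
{
    cksum = (cksum >> 16) + (cksum & 0xFFFF);
    cksum += (cksum >> 16);
    return (uint16_t)cksum;     /* truncation drops any leftover high bit */
}

void compare_folds(void)
{
    const uint32_t samples[] = { 0x00000000u, 0x0000FFFFu, 0x00010000u,
                                 0x0002479Cu, 0x0001FFFEu, 0xFFFF0001u,
                                 0xFFFFFFFFu };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++)
        assert(fold_loop(samples[i]) == fold_twice(samples[i]));
    assert(fold_loop(0x0002479Cu) == 0x479E);
}
```

For 0xFFFF0001 the first fold yields 0x10000, so the second addition is what brings the carry back into the low bits; both forms return 0x0001.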

Line 17 takes the bitwise complement of the folded 16-bit sum, which yields the checksum, and the function returns that value.

3. Why use a one's complement sum?

Now the last question: why compute the checksum with one's complement arithmetic instead of ordinary binary or two's complement addition?

I thought about this for a long time without figuring it out, so I searched Baidu at length and found nothing (either Baidu is not up to it, or nobody cares about this question). I then turned to Google and typed three keywords: why checksum TCP. The second result was exactly the article I wanted.

Here is the link: http://www.netfor2.com/checksum.html

That article focuses on the difference between the one's complement sum and the two's complement sum, and on the advantages of using the one's complement sum in the TCP/IP checksum.

It may look awkward to use a 1's complement addition on 2's complement machines. This method however has its own benefits.

Probably the most important is that it is endian independent. Little Endian computers store hex numbers with the LSB last (Intel processors for example). Big Endian computers put the LSB first (IBM mainframes for example). When carry is added to the LSB to form the 1's complement sum (see the example) it doesn't matter if we add […] + 01 + […]. The result is the same.

Other benefits include the easiness of checking the transmission and the checksum calculation plus a variety of ways to speed up the calculation by updating only IP fields that have changed.

The above is an excerpt from the original text. It names several advantages of using the one's complement sum for the TCP/IP checksum:

A. It does not depend on whether the system is big-endian or little-endian. Whether the sender is computing the checksum or the receiver is verifying it, there is no need to call htons or ntohs on the data; the algorithm from section 2 gives the correct result directly. You can work through an example yourself: with a one's complement sum, swapping the bytes of every 16-bit word yields the same result with its own bytes swapped correspondingly, whereas with ordinary or two's complement addition the results may differ.
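Here is one such example as a C sketch. It sums ten sample 16-bit words, then sums the same words with their bytes swapped, and compares the results for a one's complement sum and for an ordinary sum truncated to 16 bits. The function names and the sample words are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Swap the two bytes of a 16-bit word. */
static uint16_t swap16(uint16_t x)
{
    return (uint16_t)((x << 8) | (x >> 8));
}

/* One's complement sum with end-around carry. */
static uint16_t ones_sum16(const uint16_t *w, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += w[i];
    while (sum >> 16)
        sum = (sum >> 16) + (sum & 0xFFFF);
    return (uint16_t)sum;
}

/* Ordinary sum truncated to 16 bits (carries out of bit 15 are lost). */
static uint16_t plain_sum16(const uint16_t *w, size_t n)
{
    uint16_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum = (uint16_t)(sum + w[i]);
    return sum;
}

void byte_order_demo(void)
{
    const uint16_t words[10] = { 0x4500, 0x0073, 0x0000, 0x4000, 0x4011,
                                 0x0000, 0xC0A8, 0x0001, 0xC0A8, 0x00C7 };
    uint16_t swapped[10];
    for (size_t i = 0; i < 10; i++)
        swapped[i] = swap16(words[i]);

    /* One's complement: summing swapped words gives the swapped sum. */
    assert(ones_sum16(swapped, 10) == swap16(ones_sum16(words, 10)));

    /* Ordinary truncated sum: for this data the property does not hold. */
    assert(plain_sum16(swapped, 10) != swap16(plain_sum16(words, 10)));
}
```

The end-around carry is what makes this work: a carry out of the high byte re-enters the low byte, so the two byte lanes behave like a ring and it does not matter which one is treated as the high lane.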

B. Computing and verifying the checksum is simple and fast. To be honest, I do not fully understand this point. From the C language point of view, ordinary or two's complement addition seems simpler to compute. For verification, though, running the same algorithm and checking whether the result is 0 is indeed convenient, so on balance the one's complement sum may be the simpler choice. In addition, as an IP packet travels, a router often modifies only the TTL field (decrementing it by 1), so when forwarding it can adjust the checksum incrementally instead of recomputing it over the whole IP header. Of course, at the assembly language level the one's complement sum also allows many efficient tricks, which I will not dig into here.
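The incremental update for a TTL decrement can be sketched as below. The formula used, HC' = ~(~HC + ~m + m'), is the one given in RFC 1624, where HC is the old checksum, m the old 16-bit word and m' the new one; it is brought in here as an illustration and is not taken from the article quoted above. The function names and the sample header values are likewise assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* 16-bit one's complement addition with end-around carry. */
static uint16_t oc_add16(uint16_t a, uint16_t b)
{
    uint32_t s = (uint32_t)a + b;
    return (uint16_t)((s & 0xFFFF) + (s >> 16));
}

/* Incremental update in the style of RFC 1624: HC' = ~(~HC + ~m + m'). */
uint16_t checksum_update(uint16_t old_csum, uint16_t old_word, uint16_t new_word)
{
    uint16_t s = oc_add16((uint16_t)~old_csum, (uint16_t)~old_word);
    return (uint16_t)~oc_add16(s, new_word);
}

void ttl_decrement_demo(void)
{
    /* The TTL shares a 16-bit word with the protocol field:
     * 0x4011 (TTL 64, UDP) becomes 0x3F11 (TTL 63). */
    uint16_t updated = checksum_update(0xB861, 0x4011, 0x3F11);
    assert(updated == 0xB961);   /* matches a full recomputation for the sample header */
}
```

Since the TTL sits in the high byte of its word, decrementing it lowers the sum by 0x0100, and the checksum correspondingly rises from 0xB861 to 0xB961 without touching the other nine words.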

Conclusion: something I had barely paid attention to turned out to hold a great deal once I looked into it. Learning algorithms does not have to mean chewing through "Introduction to Algorithms" page by page (heh, I have a copy too). I prefer to study algorithms and implementation ideas by exploring the TCP/IP protocols or the workings of the Linux kernel. That is more interesting, and many of those algorithms and ideas are classics; absorb them slowly and you are sure to benefit.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
