Summary of constructing SKB data packets in the kernel

Source: Internet
Author: User

I. IPv4, TCP, and UDP checksums and their calculation

A checksum is a redundant field that network protocols use to detect transmission errors. Some checksums not only detect errors but can also correct certain kinds of errors automatically. The idea is simple: before transmitting a packet, the sender computes a small, fixed-length field (the checksum) that contains a kind of hash of the data. If a few bits of the data change in transit, the corrupted data will most likely produce a different checksum. Depending on the function used to generate it, a checksum provides different levels of reliability. The checksum used by the IP protocol is a simple one: the one's complement of a one's-complement sum. It is too weak to be considered reliable. For stronger integrity checks, you must rely on the L2 CRC or on SSL/IPsec message authentication codes.

Different protocols can use different checksum algorithms. The IP checksum covers only the IP header. The checksums of most L4 protocols cover both the header and the data. It may seem redundant for L2 (for example, Ethernet) to have one checksum, L3 (for example, IP) another, and L4 (for example, TCP) yet another, since they are applied to overlapping parts of the data, but the checks are valuable. Errors occur not only during transmission but also while data moves between layers. Moreover, each protocol is responsible for its own correct transmission and cannot assume that a higher or lower layer takes care of it.

For example, imagine that PC A on LAN1 sends data to PC B on LAN2 over the Internet. Assume that the L2 protocol used on LAN1 has a checksum but the one used on LAN2 does not. It is then important to have at least one higher-layer checksum to reduce the chance of accepting corrupted data.
Using a checksum is recommended in every protocol definition, although it is not required. Still, it must be acknowledged that a well-designed protocol suite could remove some of the overhead caused by overlapping features in different layers. Because most L2 and L4 protocols provide checksums, having one at L3 is not strictly necessary. For this reason, the checksum was removed from IPv6.

In IPv4, the IP checksum is a 16-bit field that covers the entire IP header, including options. The checksum is first computed by the packet's source and is then updated hop by hop all the way to the destination, to reflect the header changes made by each router. Before updating the checksum, each hop must first check the packet's integrity by comparing the checksum in the packet with one computed locally. If the integrity check fails, the packet is discarded, but no ICMP message is generated: the L4 protocol is expected to take care of it (for example, with a timer that forces a retransmission if no acknowledgment is received within a given time).

Here are some situations that require the checksum to be updated:
1. Decrementing the TTL
A router must decrement the TTL in the IP header before forwarding a packet. Because the IP checksum also covers this field, the original checksum is no longer valid. In Chapter 20, "ip_forward function", you will see that the TTL is decremented by ip_decrease_ttl, which also takes care of the checksum.
2. Packet mangling (including NAT)
Any feature that changes one or more IP header fields forces the checksum to be recomputed. NAT is probably the best-known example.
3. IP option processing
Because options are part of the header, they are covered by the checksum. Therefore, whenever processing them adds to or modifies the IP header (for example, adding a timestamp), the checksum must be recomputed.
4. Fragmentation
When a packet is fragmented, each fragment has a different header. Most fields remain unchanged, but the fragmentation-related ones, such as the offset, differ. Therefore, the checksum has to be recomputed.
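The TTL case lends itself to a small illustration. Because decrementing the TTL changes one header word by a known amount, the checksum can be patched instead of recomputed. The following is a user-space sketch in the spirit of ip_decrease_ttl, not the kernel's code: the function names are hypothetical, and the header is assumed to be held as ten 16-bit words in host order, with the TTL in the high byte of word 4 and the checksum in word 5.

```c
#include <assert.h>
#include <stdint.h>

/* Full recomputation over a header given as host-order 16-bit words.
 * The checksum word must be 0 when this is called. */
static uint16_t header_checksum(const uint16_t *words, int nwords)
{
    uint32_t sum = 0;

    assert(nwords > 0);
    for (int i = 0; i < nwords; i++)
        sum += words[i];
    while (sum >> 16)                       /* end-around carry */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Decrement the TTL and patch the checksum incrementally.
 * Assumed layout: words[4] = TTL << 8 | protocol, words[5] = checksum.
 * The caller must make sure the TTL is greater than 0. */
static void decrease_ttl(uint16_t *words)
{
    uint32_t check = words[5];

    words[4] -= 0x0100;                     /* TTL is the high byte */
    check += 0x0100;                        /* sum dropped, so its complement rises */
    words[5] = (uint16_t)(check + (check >= 0xffff));
}
```

Patching costs a couple of additions regardless of header length, which is why a forwarding path would prefer it to a full recomputation.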
Because the checksum used by the IP protocol uses the same simple algorithm as TCP, UDP, and ICMP, they share a set of general functions. There is also a function optimized specifically for the IP checksum. By definition of the IP checksum algorithm, the header is split into 16-bit words that are summed, and the sum is then complemented. Figure 18-13 shows an example of a checksum calculation; for simplicity, it sums only two 16-bit words. Linux does not sum 16-bit words: it sums 32-bit or 64-bit words to speed up the computation (this requires an extra step between the sum and the complement; see the description of csum_fold in the next section). The function that implements this algorithm is called ip_fast_csum, and on most architectures it is written directly in assembly language.
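To make the "extra step" concrete, here is a minimal user-space sketch of summing 32-bit words into a wide accumulator and folding the result back to 16 bits. The names fold64 and checksum32 are hypothetical, and the words are assumed to be given as host-order values of the big-endian header; the kernel's csum_fold and ip_fast_csum differ in detail.

```c
#include <assert.h>
#include <stdint.h>

/* Fold a 64-bit accumulator down to 16 bits with end-around carry.
 * This plays a role similar to csum_fold, but leaves the final
 * complement to the caller. */
static uint16_t fold64(uint64_t sum)
{
    sum = (sum & 0xffffffffu) + (sum >> 32);    /* 64 -> at most 33 bits */
    sum = (sum & 0xffffffffu) + (sum >> 32);    /* absorb the carry */
    sum = (sum & 0xffff) + (sum >> 16);         /* 32 -> at most 17 bits */
    sum = (sum & 0xffff) + (sum >> 16);         /* absorb the carry */
    return (uint16_t)sum;
}

/* One's-complement checksum over 32-bit words; the checksum field
 * inside the data must already be 0. */
static uint16_t checksum32(const uint32_t *words, int nwords)
{
    uint64_t sum = 0;

    assert(nwords > 0);
    for (int i = 0; i < nwords; i++)
        sum += words[i];
    return (uint16_t)~fold64(sum);
}
```

Summing wider words gives the same folded result as summing 16-bit words because one's-complement addition does not depend on how the carries are grouped.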


The checksum of the IPv4 packet header is the one's complement of the one's-complement sum of its 16-bit words. TCP and UDP headers use the same algorithm, but the data that goes into the calculation differs from that of the IP header.

The structure of the IPv4 packet header is as follows:

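 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Version|  IHL  |Type of Service|          Total Length         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Identification        |Flags|     Fragment Offset     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Time to Live |    Protocol   |        Header Checksum        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        Source Address                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      Destination Address                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Options                    |    Padding    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+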

The "Header Checksum" field holds the header checksum. To compute the checksum of an IPv4 packet header, the sender first sets this field to 0 and then sums the header 16 bits at a time, keeping the running total in a 32-bit value. If the total number of bytes is odd, the last byte is added separately. When the summation is complete, add the high 16 bits of the result to the low 16 bits, and repeat until the high 16 bits are all 0. The checksum is the one's complement of the final 16-bit value. The following walks through the whole calculation using an IPv4 packet actually captured on the wire (the capture includes the DLC, that is, data-link, header):

0x0000: 00 60 47 41 11 C9 00 09 6B 7A 5B 3B 08 00 45 00
0x0010: 00 1C 74 68 00 00 80 11 59 8f C0 A8 64 01 AB 46
0x0020: 9C E9 0f 3A 04 05 00 08 7f C5 00 00 00 00 00 00
0x0030: 00 00 00 00 00 00 00 00 00 00 00

In the hexadecimal dump above, the data starts at the beginning of the Ethernet frame (the DLC header). The IPv4 packet header starts at offset 0x000e: its first byte is 0x45 and its last byte is 0xe9, that is, the header ends with the destination IP address. Following the algorithm described above, we can do the following calculation:

(1) 0x4500 + 0x001c + 0x7468 + 0x0000 + 0x8011 + 0x0000 (the checksum field, set to 0 for the calculation) + 0xc0a8 + 0x6401 + 0xab46 + 0x9ce9 = 0x3a66d

(2) 0xa66d + 0x3 = 0xa670

(3) 0xffff - 0xa670 = 0x598f

Note that in step (1) we used 0x0000 in place of the header checksum field. The computed checksum is exactly the value carried in the received header (bytes 59 8f at offset 0x0018). The process above is how the sender computes the initial checksum. In practice, an intermediate router or the final receiver can simply sum the whole received IPv4 header, checksum field included, with the same algorithm: if the folded result is 0xffff, the header is valid.
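The walk-through above can be reproduced with a short user-space sketch. It is an illustration under stated assumptions rather than kernel code: the function names are hypothetical, and the byte array is simply the first 34 bytes of the dump (14-byte Ethernet header followed by the 20-byte IPv4 header).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* First 34 bytes of the captured frame from the dump. */
static const uint8_t frame[] = {
    0x00, 0x60, 0x47, 0x41, 0x11, 0xc9, 0x00, 0x09, 0x6b, 0x7a, 0x5b, 0x3b, 0x08, 0x00,
    0x45, 0x00, 0x00, 0x1c, 0x74, 0x68, 0x00, 0x00, 0x80, 0x11, 0x59, 0x8f,
    0xc0, 0xa8, 0x64, 0x01, 0xab, 0x46, 0x9c, 0xe9
};

/* Folded 16-bit one's-complement sum of len bytes, read as big-endian words. */
static uint16_t ones_sum(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;

    for (; len > 1; p += 2, len -= 2)
        sum += ((uint32_t)p[0] << 8) | p[1];
    if (len)                                /* odd trailing byte, zero-padded */
        sum += (uint32_t)p[0] << 8;
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Sender side: checksum of the header with its checksum field taken as 0. */
static uint16_t ipv4_checksum(const uint8_t *hdr, size_t hlen)
{
    uint8_t tmp[60];                        /* maximum IPv4 header length */

    assert(hlen >= 20 && hlen <= sizeof tmp);
    memcpy(tmp, hdr, hlen);
    tmp[10] = tmp[11] = 0;                  /* checksum field */
    return (uint16_t)~ones_sum(tmp, hlen);
}

/* Receiver side: sum everything, checksum included, and expect 0xffff. */
static int ipv4_header_ok(const uint8_t *hdr, size_t hlen)
{
    return ones_sum(hdr, hlen) == 0xffff;
}
```

Calling ipv4_checksum(frame + 14, 20) yields 0x598f, the value stored in the capture, and ipv4_header_ok(frame + 14, 20) accepts the header as received.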

TCP segments and UDP datagrams also carry a 16-bit checksum in their headers. The algorithm is exactly the same as for the IPv4 packet header, but the data covered is different: the checksum covers not only the entire TCP/UDP datagram but also a pseudo-header, defined as follows:

 0      7 8     15 16    23 24    31
+--------+--------+--------+--------+
|           Source Address          |
+--------+--------+--------+--------+
|         Destination Address       |
+--------+--------+--------+--------+
|  Zero  |Protocol|  TCP/UDP Length |
+--------+--------+--------+--------+

The pseudo-header contains the source IP address, the destination IP address, the protocol number (TCP: 6, UDP: 17), and the total length of the TCP segment or UDP datagram (header + data). Including the pseudo-header in the checksum makes it possible to verify that the datagram reached the correct destination and to guard against spoofing attacks. In the packet above, the protocol byte at offset 0x0017 is 0x11, so this is a UDP packet. The two-byte UDP length is stored at 0x0026 and counts the UDP header (source and destination ports, 4 bytes + UDP length, 2 bytes + checksum, 2 bytes = 8 bytes) plus the UDP data. Its value is 0x0008, so this packet actually carries 0 bytes of UDP data. The source and destination IP addresses occupy the eight bytes from 0x001a to 0x0021. After first setting the two checksum bytes at 0x0028 to 0, the UDP checksum is calculated as follows:

(1) 0xc0a8 + 0x6401 (source IP address) + 0xab46 + 0x9ce9 (destination IP address) + 0x0011 (zero and protocol) + 0x0008 (UDP length) + 0x0f3a (source port) + 0x0405 (destination port) + 0x0008 (UDP length) + 0x0000 (checksum field, preset to 0) + ... (no data here: the UDP data length is 0 bytes) = 0x28038

(2) 0x28038 => 0x8038 + 0x0002 = 0x803a

(3) 0xffff - 0x803a = 0x7fc5

The result matches the value stored at 0x0028. Note that the UDP length is counted twice: once in the pseudo-header and once in the UDP header.
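The same UDP calculation can be done directly on the captured bytes. The sketch below is a user-space illustration with hypothetical names; it assumes Ethernet framing and a 20-byte IPv4 header without options, which is what puts the protocol byte at 0x17, the addresses at 0x1a, the UDP header at 0x22, the UDP length at 0x26, and the UDP checksum at 0x28 in this dump.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* First 42 bytes of the captured frame: Ethernet, IPv4, and UDP headers. */
static const uint8_t frame[] = {
    0x00, 0x60, 0x47, 0x41, 0x11, 0xc9, 0x00, 0x09, 0x6b, 0x7a, 0x5b, 0x3b, 0x08, 0x00,
    0x45, 0x00, 0x00, 0x1c, 0x74, 0x68, 0x00, 0x00, 0x80, 0x11, 0x59, 0x8f,
    0xc0, 0xa8, 0x64, 0x01, 0xab, 0x46, 0x9c, 0xe9,
    0x0f, 0x3a, 0x04, 0x05, 0x00, 0x08, 0x7f, 0xc5
};

/* Offsets into the frame (assumed: no IP options, no VLAN tag). */
enum { OFF_PROTO = 0x17, OFF_SADDR = 0x1a, OFF_UDP = 0x22,
       OFF_UDP_LEN = 0x26, OFF_UDP_CSUM = 0x28 };

/* Add len bytes, read as big-endian 16-bit words, to an unfolded sum. */
static uint32_t sum_bytes(uint32_t sum, const uint8_t *p, size_t len)
{
    for (; len > 1; p += 2, len -= 2)
        sum += ((uint32_t)p[0] << 8) | p[1];
    if (len)                                /* odd trailing byte, zero-padded */
        sum += (uint32_t)p[0] << 8;
    return sum;
}

static uint16_t udp_checksum_from_frame(const uint8_t *f)
{
    uint16_t udp_len = (uint16_t)(f[OFF_UDP_LEN] << 8 | f[OFF_UDP_LEN + 1]);
    uint32_t sum = 0;

    assert(udp_len >= 8);
    sum = sum_bytes(sum, f + OFF_SADDR, 8); /* pseudo-header: both addresses */
    sum += f[OFF_PROTO];                    /* pseudo-header: zero + protocol */
    sum += udp_len;                         /* pseudo-header: UDP length */
    sum = sum_bytes(sum, f + OFF_UDP, 6);   /* ports + length; checksum counts as 0 */
    sum = sum_bytes(sum, f + OFF_UDP + 8, udp_len - 8u);    /* payload, if any */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    sum = ~sum & 0xffff;
    return sum ? (uint16_t)sum : 0xffff;    /* UDP transmits a computed 0 as 0xffff */
}
```

For the frame above, udp_checksum_from_frame(frame) returns 0x7fc5, matching the two bytes stored at 0x0028.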

II. Points to note when computing the checksum of UDP packets built in the kernel

Two functions are involved.

    __wsum skb_checksum(const struct sk_buff *skb, int offset,
                        int len, __wsum csum)

Four parameters:
skb: the packet buffer
offset: offset from skb->data at which to start; here, the IP header length
len: number of bytes to checksum; here, the IP payload length
csum: initial checksum value; pass 0

    __sum16 csum_tcpudp_magic(__be32 saddr, __be32 daddr,
                              unsigned short len,
                              unsigned short proto,
                              __wsum sum)

saddr: source IP address
daddr: destination IP address
len: length of the L4 header plus data
proto: transport protocol
sum: checksum of the IP payload

The two functions are used together: the former computes the partial checksum of the IP payload (the UDP header plus data), and the latter adds the pseudo-header and folds the result into the final 16-bit UDP checksum.

Because the checksum does not involve the link layer, if the skb comes directly from the NIC driver you must first make skb->data point to the IP header (iph).
Before calling skb_checksum, you must set udph->check to 0, as the protocol requires.
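The division of labour between the two kernel functions, and the reason the checksum field must be zeroed first, can be modelled in user space. The sketch below uses two hypothetical stand-ins, partial_sum (in the role of skb_checksum) and pseudo_fold (in the role of csum_tcpudp_magic); they take host-order values and are not the kernel implementations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stage 1: unfolded sum over the L4 header and data. */
static uint32_t partial_sum(const uint8_t *p, size_t len, uint32_t sum)
{
    for (; len > 1; p += 2, len -= 2)
        sum += ((uint32_t)p[0] << 8) | p[1];
    if (len)                                /* odd trailing byte, zero-padded */
        sum += (uint32_t)p[0] << 8;
    return sum;
}

/* Stage 2: add the pseudo-header, fold to 16 bits, and complement.
 * Addresses are host-order 32-bit values in this model. */
static uint16_t pseudo_fold(uint32_t saddr, uint32_t daddr,
                            uint16_t len, uint8_t proto, uint32_t sum)
{
    assert(len >= 8);                       /* at least a UDP header */
    sum += (saddr >> 16) + (saddr & 0xffff);
    sum += (daddr >> 16) + (daddr & 0xffff);
    sum += proto;
    sum += len;
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}
```

With the UDP header of the sample packet and its checksum bytes set to 0, pseudo_fold(0xc0a86401, 0xab469ce9, 8, 17, partial_sum(udp, 8, 0)) gives 0x7fc5. Running the same call with 7f c5 already stored in the header gives 0, which is useful for verifying a received packet but is not the value to transmit; that is why the sender clears the field first.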

III. Constructing SKB data packets in different situations

1. After the network adapter receives the data packet:

The driver's job is to strip the MAC header, fill in some fields, and call netif_rx to hand the datagram that remains after the MAC header is stripped (for example, an IP packet) to the upper-layer protocol, where the protocol stack processes it. Take snull from LDD3 as an example: although snull does not drive real hardware, the process is similar.

    struct sk_buff *skb;
    struct snull_priv *priv = netdev_priv(dev);

    skb = dev_alloc_skb(pkt->datalen + 2);
    if (!skb) {
        if (printk_ratelimit())
            printk(KERN_NOTICE "snull rx: low on mem - packet dropped\n");
        priv->stats.rx_dropped++;
        goto out;
    }
    skb_reserve(skb, 2); /* align IP on 16B boundary */
    memcpy(skb_put(skb, pkt->datalen), pkt->data, pkt->datalen);

    /* Write metadata, and then pass to the receive level */
    skb->dev = dev;
    skb->protocol = eth_type_trans(skb, dev);
    skb->ip_summed = CHECKSUM_UNNECESSARY; /* don't check it */
    priv->stats.rx_packets++;
    priv->stats.rx_bytes += pkt->datalen;
    netif_rx(skb);

The received packet has the following format at this point: MAC + IP + TCP/UDP + data

The processing at this stage strips the MAC header and updates some fields; all of this is done in the eth_type_trans function. Note that skb->dev = dev; is essential: without this statement the system fails and crashes (at least on my board).

Note: the eth_type_trans() function mainly assigns the following fields:

skb->mac.raw, skb->protocol, and skb->pkt_type. Compare this with the code in the next subsection, which handles a packet that has no MAC header.
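What those three assignments amount to can be sketched in user space. This is a simplified model with hypothetical names (demo_rx, demo_eth_type_trans), not the kernel function: it keeps the EtherType in host order and ignores 802.3 length-style frames, which the real eth_type_trans also handles.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum demo_pkt_type { DEMO_PACKET_HOST, DEMO_PACKET_BROADCAST,
                     DEMO_PACKET_MULTICAST, DEMO_PACKET_OTHERHOST };

struct demo_rx {
    const uint8_t *mac_header;      /* start of the Ethernet header (skb->mac.raw) */
    const uint8_t *data;            /* first byte after the stripped MAC header */
    size_t len;                     /* bytes left after stripping */
    uint16_t protocol;              /* EtherType, host order in this model */
    enum demo_pkt_type pkt_type;
};

/* The first 34 bytes of the captured frame, used as sample input. */
static const uint8_t frame[] = {
    0x00, 0x60, 0x47, 0x41, 0x11, 0xc9, 0x00, 0x09, 0x6b, 0x7a, 0x5b, 0x3b, 0x08, 0x00,
    0x45, 0x00, 0x00, 0x1c, 0x74, 0x68, 0x00, 0x00, 0x80, 0x11, 0x59, 0x8f,
    0xc0, 0xa8, 0x64, 0x01, 0xab, 0x46, 0x9c, 0xe9
};

static struct demo_rx demo_eth_type_trans(const uint8_t *f, size_t len,
                                          const uint8_t dev_mac[6])
{
    static const uint8_t bcast[6] = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff};
    struct demo_rx rx;

    assert(len >= 14);                      /* a full Ethernet header is required */
    rx.mac_header = f;
    rx.data = f + 14;                       /* "strip" the MAC header */
    rx.len = len - 14;
    rx.protocol = (uint16_t)(f[12] << 8 | f[13]);
    if (memcmp(f, bcast, 6) == 0)
        rx.pkt_type = DEMO_PACKET_BROADCAST;
    else if (f[0] & 1)                      /* group bit of the destination MAC */
        rx.pkt_type = DEMO_PACKET_MULTICAST;
    else if (memcmp(f, dev_mac, 6) != 0)
        rx.pkt_type = DEMO_PACKET_OTHERHOST;
    else
        rx.pkt_type = DEMO_PACKET_HOST;
    return rx;
}
```

For the sample frame and a device whose address is 00:60:47:41:11:c9, the model reports protocol 0x0800 (IPv4), packet type host, and a data pointer at the 0x45 byte that starts the IP header.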

2. Constructing a new SKB packet from a string.

Previously I had only seen how to modify a packet; this was the first time I had to construct one myself. At first it was really hard, and I had to read the kernel code and work it out on my own. The code I wrote is as follows:
    /* Assume that data points to a string and data_len is its length. */
    struct ipv6hdr *ipv6h;
    struct udphdr *udph;
    struct sk_buff *new_skb;
    int udplen;
    int length = data_len + sizeof(struct ipv6hdr) + sizeof(struct udphdr);

    new_skb = dev_alloc_skb(length);
    if (!new_skb) {
        printk("low memory...\n");
        return -1;
    }
    skb_reserve(new_skb, length);
    memcpy(skb_push(new_skb, data_len), data, data_len);

    new_skb->h.uh = udph = (struct udphdr *)skb_push(new_skb, sizeof(struct udphdr));
    /* Note: udph_tmp is the UDP header of a packet I captured elsewhere.
     * If you build the packet from scratch, fill in the UDP header fields yourself. */
    memcpy(udph, &udph_tmp, sizeof(struct udphdr));
    /* udph->len is stored in network byte order: byte-swap the value
     * (for example with htons()) before assigning it. */
    udph->len = ...;
    udplen = new_skb->len;

    new_skb->nh.ipv6h = ipv6h = (struct ipv6hdr *)skb_push(new_skb, sizeof(struct ipv6hdr));
    memcpy(ipv6h, &ipv6h_tmp, sizeof(struct ipv6hdr)); /* same remark as for the UDP header */
    /* Same as udph->len. This length does not include the IPv6 header itself,
     * only what follows it. */
    ipv6h->payload_len = ..........;

    udph->check = 0;
    udph->check = csum_ipv6_magic(&ipv6h->saddr, &ipv6h->daddr, udplen, IPPROTO_UDP,
                                  csum_partial((char *)udph, udplen, 0));
    /* For IPv4 you would also have to compute the IP header checksum.
     * IPv6 has no header checksum, so there is none here. */

    new_skb->mac.raw = new_skb->data;      /* because there is no MAC header */
    new_skb->protocol = htons(ETH_P_IPV6); /* this is an IPv6 packet */
    new_skb->pkt_type = PACKET_HOST;       /* addressed to the local host */
    /* Essential: without this the kernel dies, at least on my board.
     * can_control is my net_device structure. */
    new_skb->dev = &can_control;
    netif_rx(new_skb);
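The order of operations in that code (reserve the full length as headroom, then push the payload, the UDP header, and the IPv6 header in turn) is easier to see in a small user-space model. Everything here is hypothetical scaffolding: demo_buf and its helpers only imitate the pointer arithmetic of skb_reserve and skb_push, and the headers are filled with just enough fields to be recognisable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_IPV6_HLEN 40
#define DEMO_UDP_HLEN   8

struct demo_buf {
    uint8_t mem[256];
    size_t data;                    /* offset of the first used byte */
    size_t tail;                    /* offset one past the last used byte */
};

/* Like skb_reserve: open up headroom in an empty buffer. */
static void demo_reserve(struct demo_buf *b, size_t n)
{
    assert(b->data == b->tail);
    assert(b->tail + n <= sizeof b->mem);
    b->data += n;
    b->tail += n;
}

/* Like skb_push: grow the used area towards the front. */
static uint8_t *demo_push(struct demo_buf *b, size_t n)
{
    assert(n <= b->data);           /* must fit in the headroom */
    b->data -= n;
    return b->mem + b->data;
}

/* Store a 16-bit value in network (big-endian) byte order. */
static void put_be16(uint8_t *p, size_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)v;
}

/* Build IPv6 + UDP + payload back to front; returns the packet length. */
static size_t demo_build(struct demo_buf *b, const void *payload, size_t plen)
{
    uint8_t *udp, *ip6;

    memset(b, 0, sizeof *b);
    demo_reserve(b, DEMO_IPV6_HLEN + DEMO_UDP_HLEN + plen);
    memcpy(demo_push(b, plen), payload, plen);
    udp = demo_push(b, DEMO_UDP_HLEN);
    put_be16(udp + 4, DEMO_UDP_HLEN + plen);    /* UDP length: header + data */
    ip6 = demo_push(b, DEMO_IPV6_HLEN);
    ip6[0] = 0x60;                              /* version 6 */
    put_be16(ip6 + 4, DEMO_UDP_HLEN + plen);    /* payload length excludes the IPv6 header */
    ip6[6] = 17;                                /* next header: UDP */
    return b->tail - b->data;
}
```

Building back to front means each header lands immediately before what it encapsulates, and the headroom is exactly used up once the outermost header has been pushed.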

3. When you need to change the data area of the original SKB.
There are two cases:
First check the skb's tailroom. If there is enough space, the data to be added can go directly into the tailroom. If the tailroom is not big enough, call skb_copy_expand to enlarge the tailroom or headroom.
For example, to append a 16-byte string to the end of the skb, the code looks something like this:
    if (skb_tailroom(skb) < 16) {
        nskb = skb_copy_expand(skb, skb_headroom(skb),
                               skb_tailroom(skb) + 16, GFP_ATOMIC);
        if (!nskb) {
            printk("low memory....\n");
            dev_kfree_skb(skb);
            return -1;
        }
        /* Note: inside a hook function the skb must not be freed here,
         * otherwise the kernel will crash. */
        kfree_skb(skb);
        skb = nskb;
        /* The headers now live in the new buffer. This assumes skb->data
         * points at the IPv6 header and there are no extension headers. */
        ipv6h = (struct ipv6hdr *)skb->data;
        udph = (struct udphdr *)(ipv6h + 1);
    }

    memcpy(skb_put(skb, 16), ipbuf, 16); /* ipbuf is the string to append to the skb */

    udplen = skb->len - sizeof(struct ipv6hdr);

    /* Both lengths are stored in network byte order:
     * convert, add 16, and convert back. */
    udph->len = htons(ntohs(udph->len) + 16);
    ipv6h->payload_len = htons(ntohs(ipv6h->payload_len) + 16);

    udph->check = 0;
    udph->check = csum_ipv6_magic(&ipv6h->saddr, &ipv6h->daddr, udplen, IPPROTO_UDP,
                                  csum_partial((char *)udph, udplen, 0));

    skb->mac.raw = skb->data;          /* because there is no MAC header */
    skb->protocol = htons(ETH_P_IPV6); /* this is an IPv6 packet */
    skb->pkt_type = PACKET_HOST;       /* addressed to the local host */
    /* Essential: without this the kernel dies, at least on my board.
     * can_control is my net_device structure. */
    skb->dev = &can_control;
    netif_rx(skb);
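The "use the tailroom if it is there, otherwise copy into a bigger buffer" pattern can also be modelled in user space. The sketch below uses hypothetical names throughout (demo_skb, demo_copy_expand, demo_append) and plain malloc; it only mirrors the bookkeeping, not the kernel's memory management.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct demo_skb {
    uint8_t *head;                  /* start of the allocation */
    size_t data, tail, end;         /* offsets into head */
};

static size_t demo_headroom(const struct demo_skb *s) { return s->data; }
static size_t demo_tailroom(const struct demo_skb *s) { return s->end - s->tail; }

/* Allocate an empty buffer with the given headroom and usable size. */
static struct demo_skb *demo_alloc(size_t headroom, size_t size)
{
    struct demo_skb *s;

    assert(headroom + size > 0);
    s = malloc(sizeof *s);
    if (!s)
        return NULL;
    s->head = malloc(headroom + size);
    if (!s->head) {
        free(s);
        return NULL;
    }
    s->data = s->tail = headroom;
    s->end = headroom + size;
    return s;
}

static void demo_free(struct demo_skb *s)
{
    free(s->head);
    free(s);
}

/* Copy the used bytes into a new buffer with the requested head and tail room. */
static struct demo_skb *demo_copy_expand(const struct demo_skb *old,
                                         size_t headroom, size_t tailroom)
{
    size_t len = old->tail - old->data;
    struct demo_skb *s = demo_alloc(headroom, len + tailroom);

    if (!s)
        return NULL;
    memcpy(s->head + s->data, old->head + old->data, len);
    s->tail += len;
    return s;
}

/* Append n bytes, expanding first if the tailroom is too small.
 * On failure returns NULL and the caller still owns the old buffer. */
static struct demo_skb *demo_append(struct demo_skb *s, const void *src, size_t n)
{
    if (demo_tailroom(s) < n) {
        struct demo_skb *bigger =
            demo_copy_expand(s, demo_headroom(s), demo_tailroom(s) + n);
        if (!bigger)
            return NULL;
        demo_free(s);
        s = bigger;
    }
    memcpy(s->head + s->tail, src, n);
    s->tail += n;
    return s;
}
```

Note that after an expansion every pointer into the old buffer is stale, so any header pointers kept by the caller have to be recomputed from the new buffer.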

Note: after calling skb_copy_expand or modifying the skb's data area, you must update udph->len and ipv6h->payload_len. Otherwise, upper-layer applications (such as UDP sockets) will receive the original data rather than the modified data, because the length delivered to them is taken from udph->len.
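Since both length fields are stored in network byte order, updating them is safest through ntohs and htons. The helper below is a minimal sketch with a hypothetical name (be16_add), using the POSIX declarations from arpa/inet.h in user space; kernel code would use the kernel's own byte-order helpers.

```c
#include <assert.h>
#include <stdint.h>
#include <arpa/inet.h>      /* htons, ntohs (POSIX) */

/* Add delta to a 16-bit length that is stored in network byte order. */
static uint16_t be16_add(uint16_t be_len, uint16_t delta)
{
    uint16_t host = ntohs(be_len);

    assert((uint32_t)host + delta <= 0xffff);   /* the field is only 16 bits wide */
    return htons((uint16_t)(host + delta));
}
```

For example, be16_add(htons(248), 16) holds 264 in network byte order on any host. Adding a pre-swapped constant such as 0x1000 to the stored value only works on little-endian machines, and even there it gives a wrong result whenever the addition carries out of the low byte, as it does for 248 + 16.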







