Fragmentation and reassembly of IP datagrams (Linux kernel)

Source: Internet
Author: User

The earlier statement that the ip_append_data function implements the fragmentation of IP packets is incorrect and needs to be corrected. The main task of ip_append_data is to create socket buffers (skbs) for sending network data. Based on the MTU of the output network device, which is obtained from the output route lookup, it splits application data that exceeds the MTU into multiple skbs and puts them on the socket's send queue (sk_write_queue), but it does not add a network layer header to any of them. Then ip_push_pending_frames chains all the skbs in the send queue, as a linked list, onto frag_list in the struct skb_shared_info that sits just past the end member of the first skb, and adds the network layer header only to the first skb.
So the entire application data is in fact still held by a single skb, and ip_append_data only prepares for the real IP fragmentation that follows.
After assembling the skb, ip_push_pending_frames hands it to the ip_output function. ip_output calls ip_finish_output, which checks the length of the skb again: if it exceeds the MTU of the output device and the other fragmentation conditions are met, ip_fragment is called to fragment the data; otherwise, ip_finish_output2 is called to pass the data to the data link layer.
IP fragmentation involves two fields of the IP header, the flags and the fragment offset, and both are stored in frag_off, a member of struct iphdr. The high 3 bits are flag bits. The second flag bit is DF (don't fragment): when it is set, the IP datagram must not be fragmented. If such a datagram has to be sent and its length exceeds the MTU, an ICMP error message is returned to the sender. Its type is Destination Unreachable (3) and its code is "fragmentation needed but DF set" (4). The third flag bit is MF (more fragments): 1 means more fragments follow, and 0 means this fragment is the last one of the IP datagram. The low 13 bits of frag_off give the offset of the first byte of the fragment within the whole IP datagram, in units of 8 bytes. The 13-bit value must therefore be shifted left by three bits to obtain the real offset in bytes.
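As a rough user-space illustration of the frag_off layout described above, the following sketch decodes and encodes a 16-bit value that has already been converted to host byte order. The mask values follow the usual IP_DF, IP_MF and IP_OFFSET definitions; struct frag_info, decode_frag_off and encode_frag_off are hypothetical names used only for this example and are not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Masks for the 16-bit frag_off field (host byte order). */
#define IP_DF     0x4000u /* second flag bit: don't fragment */
#define IP_MF     0x2000u /* third flag bit: more fragments follow */
#define IP_OFFSET 0x1FFFu /* low 13 bits: offset in 8-byte units */

/* Hypothetical helper type for this sketch, not a kernel structure. */
struct frag_info {
    int      dont_fragment;
    int      more_fragments;
    unsigned offset_bytes;
};

/* Decode a frag_off value that is already in host byte order. */
static struct frag_info decode_frag_off(uint16_t frag_off)
{
    struct frag_info fi;

    fi.dont_fragment  = (frag_off & IP_DF) != 0;
    fi.more_fragments = (frag_off & IP_MF) != 0;
    /* The 13-bit value counts 8-byte blocks, so shift left by 3. */
    fi.offset_bytes   = (unsigned)(frag_off & IP_OFFSET) << 3;
    return fi;
}

/* Encode a byte offset (must be a multiple of 8) together with the MF flag. */
static uint16_t encode_frag_off(unsigned offset_bytes, int more_fragments)
{
    assert(offset_bytes % 8 == 0 && (offset_bytes >> 3) <= IP_OFFSET);
    return (uint16_t)((offset_bytes >> 3) | (more_fragments ? IP_MF : 0u));
}
```

A value read directly from a received header is in network byte order and would first need ntohs() before being passed to decode_frag_off.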
With the groundwork done by ip_append_data, the job of ip_fragment is much easier. In the inet protocol family, the cb member of struct sk_buff holds a struct inet_skb_parm, which is defined as follows:
struct inet_skb_parm
{
	struct ip_options opt;
	unsigned char flags;
#define IPSKB_FORWARDED 1
#define IPSKB_XFRM_TUNNEL_SIZE 2
#define IPSKB_XFRM_TRANSFORMED 4
#define IPSKB_FRAG_COMPLETE 8
#define IPSKB_REROUTED 16
};
For an IP packet whose fragmentation is complete, the cb->flags of each of its skbs has the IPSKB_FRAG_COMPLETE flag set. ip_fragment first gives the sk and destructor members of every skb in frag_list the same values as those of the first skb, so that they become ordinary skbs. It then copies the metadata and the network layer header from the first skb into each of them, sets the correct iphdr->frag_off value, and outputs them to the data link layer one by one. At this point, sending at the network layer is complete.
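To make the frag_off values set during fragmentation concrete, here is a minimal user-space sketch of the arithmetic only. It does not model the frag_list path described above, where the skbs have already been cut to size by ip_append_data; it simply assumes a flat payload of data_len bytes and computes the offset, length and frag_off of each fragment. struct frag_plan and plan_fragments are hypothetical names invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IP_MF 0x2000u /* more fragments follow */

/* Hypothetical description of one fragment; not a kernel structure. */
struct frag_plan {
    unsigned offset;   /* byte offset of this fragment's data */
    unsigned len;      /* data bytes carried, excluding the IP header */
    uint16_t frag_off; /* value for iphdr->frag_off, host byte order */
};

/*
 * Split data_len bytes of payload into fragments that fit in mtu.
 * Returns the number of fragments written to out, or 0 if max_out
 * entries are not enough.
 */
static size_t plan_fragments(unsigned data_len, unsigned mtu, unsigned hdr_len,
                             struct frag_plan *out, size_t max_out)
{
    unsigned per_frag, offset = 0;
    size_t n = 0;

    assert(mtu >= hdr_len + 8);
    assert(data_len <= 65535u - hdr_len);
    /* Every fragment except the last must carry a multiple of 8 bytes. */
    per_frag = (mtu - hdr_len) & ~7u;

    do {
        unsigned left = data_len - offset;
        unsigned len = left > per_frag ? per_frag : left;
        int more = (offset + len) < data_len;

        if (n == max_out)
            return 0;
        out[n].offset = offset;
        out[n].len = len;
        /* Offset is stored in 8-byte units; MF is set on all but the last. */
        out[n].frag_off = (uint16_t)((offset >> 3) | (more ? IP_MF : 0u));
        n++;
        offset += len;
    } while (offset < data_len);

    return n;
}
```

For example, 4000 bytes of data with an MTU of 1500 and a 20-byte header yield three fragments carrying 1480, 1480 and 1040 bytes, with MF set on the first two.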
After fragmentation, each fragment of an IP datagram travels through the network independently. The fragments therefore generally do not reach the destination host at the same time, and they may arrive out of order. In addition, they may be reassembled or fragmented again along the transmission path. The frag_off field in the network layer header of each fragment is what makes correct reassembly possible. Next, let's look at how IP fragments are reassembled once they reach the destination host.
When the protocol stack receives an IP datagram, the first network layer function to run is ip_rcv. After ip_rcv has validated the datagram, it hands it to ip_rcv_finish. After the input route lookup, ip_rcv_finish calls dst_input, which calls skb->dst->input. For an IP datagram addressed to the local host, this function is ip_local_deliver. ip_local_deliver first checks the frag_off member of the IP header: if the low 13 bits are not 0, or the third of the three flag bits (MF) is set to 1, the datagram is an IP fragment. (In the first fragment the low 13 bits are 0 but MF is 1, indicating that more fragments follow. In the last fragment MF is 0 but the low 13 bits are not 0. In every fragment in between, both are non-zero.) For an IP fragment, ip_local_deliver calls ip_defrag, which combines it with the fragments already received and waits for the remaining ones until a complete IP datagram has been formed.
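The fragment test described above can be sketched in user space as follows. The frag_off argument is assumed to be in host byte order already; is_fragment, classify_fragment and enum frag_kind are hypothetical names for this illustration and do not exist in the kernel.

```c
#include <assert.h>
#include <stdint.h>

#define IP_MF     0x2000u /* more fragments follow */
#define IP_OFFSET 0x1FFFu /* low 13 bits: offset in 8-byte units */

/* Hypothetical classification used only in this sketch. */
enum frag_kind { NOT_FRAGMENT, FIRST_FRAGMENT, MIDDLE_FRAGMENT, LAST_FRAGMENT };

/* A datagram is a fragment if MF is set or the offset is non-zero. */
static int is_fragment(uint16_t frag_off)
{
    return (frag_off & (IP_MF | IP_OFFSET)) != 0;
}

/* Tell first, middle and last fragments apart from whole datagrams. */
static enum frag_kind classify_fragment(uint16_t frag_off)
{
    int more = (frag_off & IP_MF) != 0;
    int offset_nonzero = (frag_off & IP_OFFSET) != 0;

    if (!more && !offset_nonzero)
        return NOT_FRAGMENT;      /* complete datagram (DF may still be set) */
    assert(is_fragment(frag_off));
    if (more && !offset_nonzero)
        return FIRST_FRAGMENT;    /* offset 0, more to come */
    if (more)
        return MIDDLE_FRAGMENT;   /* both non-zero */
    return LAST_FRAGMENT;         /* MF clear, offset non-zero */
}
```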
All fragments of one IP datagram are kept in a struct ipq, which stores enough information to restore the IP datagram once the last fragment has arrived. The complete definition of struct ipq is as follows:
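struct ipq {
	struct hlist_node list;		/* linkage in the ipq_hash bucket */
	struct list_head lru_list;	/* LRU list member */
	u32 user;			/* source of this fragment group */
	u32 saddr;
	u32 daddr;
	u16 id;
	u8 protocol;
	u8 last_in;
#define COMPLETE 4
#define FIRST_IN 2
#define LAST_IN 1
	struct sk_buff *fragments;	/* linked list of received fragments */
	int len;			/* total length of the original datagram */
	int meat;			/* length of the fragments received so far */
	spinlock_t lock;
	atomic_t refcnt;
	struct timer_list timer;	/* reassembly timeout */
	struct timeval stamp;
	int iif;
	unsigned int rid;
	struct inet_peer *peer;
};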
user is an identifier for the source of the fragment group. For fragments that the protocol stack receives from other hosts on the network or from the local loopback interface, its value is IP_DEFRAG_LOCAL_DELIVER. saddr, daddr, id, and protocol all come from the IP header and are used to establish that the fragments really belong to one and the same IP datagram. Because the protocol stack of a host may be communicating with many hosts at once, there are usually several groups of fragments awaiting reassembly at any given time, that is, several instances of struct ipq.
These struct ipq instances are organized in a hash table, ipq_hash. When a fragment is received, a hash value is computed from the corresponding fields of the IP header to select an entry in ipq_hash, that entry is searched for a struct ipq whose fields match exactly, and the fragment is added to that ipq. If nothing in the hash table matches the header of the current fragment, a new struct ipq instance is created and added to the hash table. Every newly created ipq has a timer (the timer member). The default timeout is IP_FRAG_TIME (30 seconds). If not all fragments of an IP datagram have been received after 30 seconds, the ipq times out and the timeout handler is executed. The handler deletes the ipq and sends an ICMP error message to the sender of the datagram. The message type is Time Exceeded (11), and the code is "fragment reassembly time exceeded" (1).
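The find-or-create lookup described above can be modelled in user space roughly as follows. This is only a sketch under simplifying assumptions: struct frag_queue is a hypothetical stand-in that keeps just the key fields of struct ipq, the hash function is a toy and not the one the kernel uses, the table size is arbitrary, and locking, reference counting and the timeout timer are left out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define QUEUE_HASH_SIZE 64u /* arbitrary power of two chosen for this sketch */

/* Hypothetical, simplified stand-in for struct ipq (key fields only). */
struct frag_queue {
    struct frag_queue *next; /* next queue in the same hash bucket */
    uint32_t user, saddr, daddr;
    uint16_t id;
    uint8_t  protocol;
};

static struct frag_queue *queue_hash[QUEUE_HASH_SIZE];

/* Toy hash over the identifying header fields. */
static unsigned queue_hashfn(uint32_t saddr, uint32_t daddr,
                             uint16_t id, uint8_t protocol)
{
    uint32_t h = saddr ^ (daddr * 2654435761u) ^ ((uint32_t)id << 8) ^ protocol;

    return (h ^ (h >> 16)) & (QUEUE_HASH_SIZE - 1);
}

/* Return the queue matching all key fields, creating it if none exists. */
static struct frag_queue *queue_find(uint32_t user, uint32_t saddr,
                                     uint32_t daddr, uint16_t id,
                                     uint8_t protocol)
{
    unsigned h = queue_hashfn(saddr, daddr, id, protocol);
    struct frag_queue *q;

    assert(h < QUEUE_HASH_SIZE);
    for (q = queue_hash[h]; q != NULL; q = q->next)
        if (q->id == id && q->saddr == saddr && q->daddr == daddr &&
            q->protocol == protocol && q->user == user)
            return q; /* exact match: fragments of the same datagram */

    /* No match in this bucket: start a new fragment group. */
    q = calloc(1, sizeof(*q));
    if (q == NULL)
        return NULL;
    q->user = user;
    q->saddr = saddr;
    q->daddr = daddr;
    q->id = id;
    q->protocol = protocol;
    q->next = queue_hash[h];
    queue_hash[h] = q;
    return q;
}
```

Two fragments with the same user, saddr, daddr, id and protocol resolve to the same queue, while a different id yields a separate one.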
Once an ipq has been obtained, the skb of the received fragment is placed into it. First, last_in of the ipq is checked: if COMPLETE is set, the fragment group is already complete, so the newly received fragment is erroneous and is dropped immediately. Then cb->flags of the newly received skb is checked, because the IPSKB_FRAG_COMPLETE flag is set on every fragment when a datagram is fragmented for sending.
Next, IP_MF (the third of the three flag bits in frag_off) of the received fragment is checked. If it is 0, this is the last fragment, and LAST_IN is set in last_in of the ipq. len is always updated to the offset plus the length of the fragment with the largest offset received so far; if everything is correct, this ends up being the length of the entire IP datagram. After its network layer header has been removed, the skb is added to the fragments linked list. The list is kept in the order of the offset in frag_off, that is, in the real order of the fragments. If the offset of the received fragment is 0, FIRST_IN is set in last_in. meat is always updated to the total length of the fragments received so far. In the end, if everything is correct, meat should be equal to len.
After the fragment has been added, if meat is found to be equal to len, reassembly can go ahead. The ip_frag_reasm function performs the reassembly: it takes the fragments linked list, moves the part of the list that starts with the second skb onto frag_list of the struct skb_shared_info past the end of the first skb, and resets the IP header. A complete IP datagram has then been reassembled. The ipq is released before returning.
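The bookkeeping with last_in, len and meat described above can be sketched in user space as follows. This is a simplified model, not the kernel code: struct reasm_queue and reasm_add are hypothetical names, fragments are kept in a fixed-size sorted array instead of an skb list, and overlapping or duplicate fragments are simply rejected, whereas the kernel trims overlaps.

```c
#include <assert.h>
#include <stddef.h>

#define COMPLETE 4
#define FIRST_IN 2
#define LAST_IN  1
#define MAX_FRAGS 64 /* arbitrary limit for this sketch */

struct frag { unsigned offset, len; };

/* Hypothetical, simplified stand-in for the bookkeeping part of struct ipq. */
struct reasm_queue {
    unsigned last_in;             /* COMPLETE / FIRST_IN / LAST_IN flags */
    unsigned len;                 /* largest end offset seen so far */
    unsigned meat;                /* data bytes received so far */
    size_t   nfrags;
    struct frag frags[MAX_FRAGS]; /* kept sorted by offset */
};

/*
 * Record one fragment. Returns 1 when the datagram is complete,
 * 0 when more fragments are needed, -1 when the fragment is rejected.
 */
static int reasm_add(struct reasm_queue *q, unsigned offset, unsigned len,
                     int more_fragments)
{
    size_t pos, i;
    unsigned end = offset + len;

    assert(q != NULL);
    if (q->last_in & COMPLETE)
        return -1; /* group already complete: drop the fragment */
    if (len == 0 || q->nfrags == MAX_FRAGS)
        return -1;

    /* Find the insert position that keeps the list sorted by offset. */
    for (pos = 0; pos < q->nfrags && q->frags[pos].offset < offset; pos++)
        ;
    /* Reject overlaps with the previous or the next fragment. */
    if (pos > 0 && q->frags[pos - 1].offset + q->frags[pos - 1].len > offset)
        return -1;
    if (pos < q->nfrags && q->frags[pos].offset < end)
        return -1;

    if (!more_fragments) {
        /* MF clear: this is the last fragment and fixes the total length. */
        if (end < q->len || (q->last_in & LAST_IN))
            return -1;
        q->last_in |= LAST_IN;
        q->len = end;
    } else if (end > q->len) {
        if (q->last_in & LAST_IN)
            return -1; /* data beyond the last fragment */
        q->len = end;
    }
    if (offset == 0)
        q->last_in |= FIRST_IN;

    /* Insert at pos and account for the received bytes. */
    for (i = q->nfrags; i > pos; i--)
        q->frags[i] = q->frags[i - 1];
    q->frags[pos].offset = offset;
    q->frags[pos].len = len;
    q->nfrags++;
    q->meat += len;

    if (q->last_in == (FIRST_IN | LAST_IN) && q->meat == q->len) {
        q->last_in |= COMPLETE;
        return 1;
    }
    return 0;
}
```

The sketch additionally requires both FIRST_IN and LAST_IN before declaring completion, so that a lone last fragment is not mistaken for a whole datagram.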
