TCP/IP Illustrated, Volume 2: Implementation notes on IP fragmentation and reassembly

Source: Internet
Author: User

The IP header has three fields used for fragmentation and reassembly: the identification field (ip_id), the flags field (the 3 high-order bits of ip_off), and the offset field (the 13 low-order bits of ip_off). The flags field consists of three 1-bit flags. Bit 0 is reserved and must be 0, bit 1 is the "don't fragment" (DF) flag, and bit 2 is the "more fragments" (MF) flag.

In Net/3, the flags are combined with the offset field, and both are accessed through ip_off.

[Figure: layout of the flags and offset within ip_off.]
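As a minimal sketch of how the two parts of ip_off can be separated, the following C fragment masks out the flags and the offset. The mask values match those in BSD's netinet/ip.h, but the FRAG_* names and the helper functions are illustrative assumptions, and ip_off is assumed to be in host byte order already.

```c
#include <stdint.h>

/* Same values as IP_DF, IP_MF and IP_OFFMASK in BSD's <netinet/ip.h>,
 * redefined under FRAG_* names so the sketch is self-contained. */
#define FRAG_DF      0x4000u  /* don't fragment */
#define FRAG_MF      0x2000u  /* more fragments */
#define FRAG_OFFMASK 0x1fffu  /* 13-bit offset, in 8-byte units */

/* Returns 1 if the DF flag is set in a host-order ip_off value. */
int frag_dont_fragment(uint16_t ip_off)
{
    return (ip_off & FRAG_DF) != 0;
}

/* Returns 1 if the MF flag is set. */
int frag_more(uint16_t ip_off)
{
    return (ip_off & FRAG_MF) != 0;
}

/* Returns the fragment offset in 8-byte units. */
unsigned frag_units(uint16_t ip_off)
{
    return ip_off & FRAG_OFFMASK;
}
```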


The remaining 13 bits of ip_off give the position of the fragment within the original datagram, measured in 8-byte units. Consequently, every fragment except the last must carry a multiple of 8 bytes of data, so that the fragment that follows starts on an 8-byte boundary.

[Figure: relationship between the byte offset in the original datagram and the fragment offset in the fragment's IP header.]


The MF bit is set in every fragment except the last.
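The rules above (offset in 8-byte units, MF set on all but the last fragment) can be sketched as a small encoder. The function name and its error convention are hypothetical and are not taken from Net/3; the result is a host-order value.

```c
/* Builds the host-order ip_off value for a fragment whose data starts
 * at byte_off in the original datagram. Returns -1 if byte_off is not
 * on an 8-byte boundary or does not fit in the 13-bit offset field. */
long frag_make_off(unsigned long byte_off, int is_last)
{
    if (byte_off % 8 != 0 || (byte_off >> 3) > 0x1fffUL)
        return -1;
    /* 0x2000 is the MF bit; it is omitted only on the last fragment. */
    return (long)((byte_off >> 3) | (is_last ? 0UL : 0x2000UL));
}
```

For example, a fragment starting at byte 1480 has an offset of 185 units, since 1480 / 8 = 185.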

ip_id identifies the fragments of a particular datagram. The source system sets ip_id to a value that is unique among datagrams with the same source address (ip_src), destination address (ip_dst), and protocol (ip_p).

1. Fragmentation

In the ip_output function, if the packet fits within the MTU of the selected interface, it is sent in a single link-level frame. Otherwise, the packet must be fragmented and sent in multiple frames. The packet may be a complete datagram, or it may itself be a fragment created by an upstream system. The fragmentation code has three parts:

1. Determine the fragment size

If DF is set, ip_output discards the packet and returns an error (EMSGSIZE). If the datagram was generated locally, the transport protocol passes the error back to the process. If the packet was being forwarded, an ICMP destination unreachable error is generated.

Net/3 does not implement the path MTU discovery algorithm.

Each new fragment contains an IP header, some of the options from the original packet, and at most len bytes of data. To compute len, the length of the packet header is subtracted from the MTU of the interface, and the low-order 3 bits of the result are cleared so that len is a multiple of 8 bytes.
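The calculation of len can be sketched in one line of C. The function name is illustrative; returning 0 when fewer than 8 data bytes fit is this sketch's own convention for "cannot fragment".

```c
/* Largest data payload per fragment: the interface MTU minus the IP
 * header length, with the low 3 bits cleared (rounded down to a
 * multiple of 8). Returns 0 when not even 8 data bytes fit. */
int frag_data_len(int mtu, int hlen)
{
    int len = (mtu - hlen) & ~7;
    return len < 8 ? 0 : len;
}
```

With an Ethernet MTU of 1500 and a 20-byte header, each fragment carries 1480 bytes; with an MTU of 576, the 556 available bytes are rounded down to 552.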

2. Construct the fragment list

The fragment list is built starting with the second fragment. After the list is complete, the original packet is converted into the first fragment, which saves allocating an mbuf. For each fragment, ip_output performs the following steps:

  • Allocate a new mbuf for the packet.

  • Copy options from the original packet into the new packet. Not all options are copied: an option is copied into the fragment only if IPOPT_COPIED indicates that its copied bit is set.

  • Set the MF bit and the offset field (ip_off).

  • Set the fragment length.

  • Copy the data from the original packet into the fragment.

  • Adjust the mbuf packet header of the new fragment so that it has the correct total length, clear the interface pointer of the new fragment, convert ip_off to network byte order, and compute the checksum of the new fragment. Link the fragment to the previous one through m_nextpkt of the mbuf.
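The offset and length bookkeeping in the steps above can be sketched without any mbuf handling. This is a simplified model, not the Net/3 code: it assumes the original packet is an unfragmented datagram, that every fragment repeats the full header, and it produces the fragments in order rather than building the list from the second fragment. The struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

struct frag_plan {
    uint16_t ip_off;   /* host order: MF flag | offset in 8-byte units */
    uint16_t ip_len;   /* header plus data carried by this fragment */
};

/* Fills out[] with one entry per fragment of a datagram that carries
 * data_len bytes behind an hlen-byte header. Returns the fragment
 * count, or -1 if the MTU is too small or out[] has too few slots. */
int frag_plan_build(int data_len, int hlen, int mtu,
                    struct frag_plan *out, size_t max)
{
    int len = (mtu - hlen) & ~7;   /* data bytes per fragment */
    int off, n = 0;

    if (len < 8)
        return -1;
    for (off = 0; off < data_len; off += len) {
        int chunk = data_len - off < len ? data_len - off : len;

        if ((size_t)n == max)
            return -1;
        out[n].ip_off = (uint16_t)(off >> 3);
        if (off + chunk < data_len)
            out[n].ip_off |= 0x2000;   /* more fragments follow */
        out[n].ip_len = (uint16_t)(hlen + chunk);
        n++;
    }
    return n;
}
```

For a datagram with 4000 bytes of data, a 20-byte header, and an MTU of 1500, the sketch yields three fragments at offsets 0, 185, and 370 units, with MF clear only on the last.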

3. Construct the first fragment and send the fragments

After the excess data at its end is trimmed, the original packet becomes the first fragment. The MF bit is set, ip_len and ip_off are converted to network byte order, and a new checksum is computed. All the IP options are retained in this fragment. When the destination host reassembles the datagram, only the IP options of the first fragment are kept. Some options, such as the source route options, must be copied into every fragment even though they are discarded during reassembly.

Finally, each fragment is sent. Any error encountered while sending causes the remaining fragments to be discarded.
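Whether an option must be repeated in every fragment is decided by the copied flag, which RFC 791 defines as the high-order bit of the option type byte. The helper below is an illustrative stand-in for that test, not the Net/3 macro itself.

```c
/* Returns 1 if the option type has the copied flag (the high-order
 * bit) set, meaning the option must appear in every fragment. */
int opt_is_copied(unsigned char opt_type)
{
    return (opt_type & 0x80) != 0;
}
```

The loose and strict source route options (types 131 and 137) have the flag set, while record route (7) and timestamp (68) do not, so the latter two appear only in the first fragment.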


2. Reassembly

ipintr passes only complete datagrams to the transport layer, so the fragments it receives must first be reassembled into a complete datagram.

Net/3 keeps incomplete datagrams on a global doubly linked list, ipq. Entries can be inserted or removed anywhere in the list, not only at its ends. ipintr performs a linear search of the list to find the datagram to which the current fragment belongs. Recall that fragments are uniquely identified by the 4-tuple {ip_id, ip_src, ip_dst, ip_p}. Each entry of ipq is a list of fragments. If ipintr finds a match, fp points to the matching list.

[Listing: the ipq structure.]
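The linear search can be sketched as follows. The structure is a simplified model: its field names follow Net/3's struct ipq, but addresses are plain 32-bit integers, the fragment-list pointers are omitted, and the lookup function is a hypothetical helper (in Net/3 the search is written inline in ipintr).

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified reassembly header, modelled on Net/3's struct ipq. */
struct ipq {
    struct ipq *next, *prev;   /* links to other reassembly headers */
    uint8_t  ipq_ttl;          /* remaining lifetime of this list */
    uint8_t  ipq_p;            /* protocol */
    uint16_t ipq_id;           /* datagram identification */
    uint32_t ipq_src, ipq_dst; /* source and destination addresses */
};

/* Linear search of a circular list whose head is a dummy entry.
 * Returns the matching header, or NULL if no datagram matches. */
struct ipq *ipq_lookup(struct ipq *head, uint16_t id, uint32_t src,
                       uint32_t dst, uint8_t p)
{
    struct ipq *fp;

    for (fp = head->next; fp != head; fp = fp->next)
        if (fp->ipq_id == id && fp->ipq_src == src &&
            fp->ipq_dst == dst && fp->ipq_p == p)
            return fp;
    return NULL;
}
```

A NULL result corresponds to the case where the arriving fragment belongs to a datagram that has no reassembly list yet.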


Most of the work is done by ip_reass. If ip_reass can combine the current fragment with previously received fragments into a complete datagram, it returns a pointer to the reassembled datagram. Otherwise, ip_reass saves the fragment and ipintr goes on to process the next packet.

When reassembly produces a complete datagram, ipintr passes it up to the appropriate transport protocol.
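The test for "a complete datagram" can be sketched as a walk over the saved fragments in offset order. This version uses an array instead of Net/3's linked list, and the struct and function names are assumptions made for illustration.

```c
#include <stddef.h>

struct frag {
    unsigned off;   /* byte offset of the data in the datagram */
    unsigned len;   /* data bytes in this fragment */
    int      mf;    /* nonzero if the MF bit was set */
};

/* f[] must be sorted by offset with overlaps already trimmed.
 * Returns 1 when the fragments cover the datagram without gaps and
 * the final one has MF clear, 0 otherwise. */
int frags_complete(const struct frag *f, size_t n)
{
    unsigned next = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (f[i].off != next)
            return 0;          /* hole before this fragment */
        next += f[i].len;
    }
    return n > 0 && !f[n - 1].mf;
}
```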


3. The ip_reass function

ipintr passes ip_reass the fragment to be processed and a pointer to the matching reassembly header in ipq. ip_reass may complete the reassembly and return a complete datagram, or it may link the fragment into the datagram's reassembly list to await the remaining fragments. The head of each reassembly list is an ipq structure.

The four fields that identify a datagram's fragments, ip_id, ip_src, ip_dst, and ip_p, are saved in the ipq structure at the head of each reassembly list. Net/3 uses next and prev to build the list of datagrams, and ipq_next and ipq_prev to build the list of fragments.

Before the IP header of a packet is placed on a reassembly list, it is converted into an ipasfrag structure.


ip_reass collects the fragments of a datagram on a circular doubly linked list joined by ipf_next and ipf_prev.

[Figure: relationship between the fragment header queue ipq and the fragments (ipasfrag structures).]
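A minimal sketch of such a circular doubly linked list, kept in ascending offset order, is shown below. The link names ipf_next and ipf_prev are borrowed from the text; the node type, the separate head node, and both functions are simplifications and do not reproduce the ipasfrag layout.

```c
struct fragnode {
    struct fragnode *ipf_next, *ipf_prev;  /* circular links */
    unsigned off;                          /* byte offset of the data */
};

/* Initializes an empty list: the head points at itself. */
void fraglist_init(struct fragnode *head)
{
    head->ipf_next = head->ipf_prev = head;
}

/* Links f into the list so that offsets stay in ascending order. */
void fraglist_insert(struct fragnode *head, struct fragnode *f)
{
    struct fragnode *q;

    /* Find the first fragment that starts after f. */
    for (q = head->ipf_next; q != head; q = q->ipf_next)
        if (q->off > f->off)
            break;
    /* Insert f just before q (or at the tail when q is the head). */
    f->ipf_next = q;
    f->ipf_prev = q->ipf_prev;
    q->ipf_prev->ipf_next = f;
    q->ipf_prev = f;
}
```

Because the list is circular, inserting before the head is the same as appending at the tail, so no special case is needed for an empty list.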


That figure does not show the full complexity of the reassembly structures. The reassembly code relies heavily on casting pointers to three different structures that overlay the same underlying mbuf.

[Figure: relationship among the mbuf, the ipq structure, the ipasfrag structure, and the ip structure.]


The figure contains a large amount of information:

1. All the structures are located in the data area of an mbuf.

2. The ipq list consists of ipq structures joined by next and prev. Each ipq structure holds the four fields that uniquely identify an IP datagram.

3. Each ipq structure is treated as an ipasfrag structure when it is accessed as the head of a fragment list. The fragments are linked by ipf_next and ipf_prev, which overlay the ipq_next and ipq_prev members of the ipq structure.

4. Each ipasfrag structure overlays the ip structure of the fragment as it arrived. The data that arrived with the fragment follows the structure in the mbuf.

[Figure: a logical view of the reassembly structures, showing three datagrams being reassembled and the relationship between the ipq list and the ipasfrag structures. The shaded areas are fragments that are still missing.]



Reassembly is described in the following parts:

1. Create a reassembly list

If no ipq entry matches, a reassembly list is created with the first fragment of the new datagram to arrive. An mbuf is allocated to hold the header of the new list (an ipq structure), and the structure is inserted into the list of reassembly lists.

2. Reassembly timeout

In Net/3, the time-to-live field ipq_ttl manages the reassembly timeout. Its initial value is set to 60. Each time ip_slowtimo is called, ipq_ttl is decremented by 1, and the kernel calls ip_slowtimo every 500 ms. If the system has not assembled a complete IP datagram within 30 seconds of receiving any one of the datagram's fragments, it discards the IP reassembly list.
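The timeout arithmetic (60 ticks of 500 ms each, giving 30 seconds) can be sketched with a per-list tick function. REASS_TTL and reass_tick are hypothetical names; the guard against decrementing below zero is this sketch's own addition.

```c
#define REASS_TTL 60   /* initial ipq_ttl: 60 ticks of 500 ms = 30 s */

/* Called once per slow-timer tick for one reassembly list. Decrements
 * the lifetime and returns 1 when the list has expired and should be
 * discarded, 0 otherwise. */
int reass_tick(unsigned char *ipq_ttl)
{
    if (*ipq_ttl > 0)
        --*ipq_ttl;
    return *ipq_ttl == 0;
}
```

Starting from REASS_TTL, the function first reports expiry on the 60th call.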

3. Datagram identifier

At the destination host, the byte ranges covered by fragments may overlap. This happens when a transport protocol retransmits a datagram and the retransmission takes a different route from the original: it may be fragmented differently, which produces overlapping fragments at the destination. For the destination host to recognize the datagram as a retransmission, the transport protocol must be able to force IP to reuse the original ID field.

Net/3 provides no mechanism for a transport protocol to ensure that the IP ID field is reused in a retransmitted datagram. When preparing a new datagram, ip_output assigns a new value by incrementing the global integer ip_id. Even so, a Net/3 system can receive overlapping fragments from a system that does let the transport layer retransmit IP datagrams with the same ID field.

[Figure: the different ways a fragment can overlap fragments that have already arrived. Fragments are numbered in the order they arrive at the destination host, the reassembled fragments are shown at the bottom, and the shaded portions of a fragment are the duplicate bytes that are discarded.]
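One of the overlap cases, a new fragment whose leading bytes duplicate the tail of the fragment before it, can be sketched as follows. The function is a hypothetical helper working on plain offsets and lengths; it assumes the predecessor starts at or before the new fragment and does not cover the case where the new fragment overlaps fragments that follow it.

```c
/* Trims the front of a newly arrived fragment [*off, *off + *len) so
 * that it no longer duplicates bytes already held by its predecessor
 * [prev_off, prev_off + prev_len). Returns the number of bytes
 * dropped; if the new fragment is entirely covered, *len becomes 0
 * and the caller should discard it. */
unsigned trim_front(unsigned prev_off, unsigned prev_len,
                    unsigned *off, unsigned *len)
{
    unsigned prev_end = prev_off + prev_len;
    unsigned drop;

    if (prev_end <= *off)
        return 0;                  /* no overlap */
    drop = prev_end - *off;
    if (drop > *len)
        drop = *len;               /* fully covered by predecessor */
    *off += drop;
    *len -= drop;
    return drop;
}
```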


4. Rebuild the datagram header

ip_reass points ip at the first fragment in the list and changes the ipasfrag structure back into an ip structure. It removes the entire packet from the reassembly list, discards the ipq structure at the head of the list, and adjusts m_len and m_data in the first mbuf to include the previously hidden header and options of the first fragment.

5. Compute the packet length

The number of data bytes in the mbuf chain is computed and saved in m_pkthdr.len.
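The length computation amounts to summing m_len over the chain. The structure below is a minimal stand-in for an mbuf that keeps only the two fields the sketch needs; its name and the function name are assumptions.

```c
#include <stddef.h>

/* Minimal stand-in for an mbuf: only the fields this sketch needs. */
struct mbuf_lite {
    struct mbuf_lite *m_next;  /* next buffer in the chain */
    int m_len;                 /* bytes of data in this buffer */
};

/* Sums m_len over the whole chain; this is the value that would be
 * stored in m_pkthdr.len of the first mbuf. */
int chain_len(const struct mbuf_lite *m)
{
    int len = 0;

    for (; m != NULL; m = m->m_next)
        len += m->m_len;
    return len;
}
```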
