"We read the world wrong, but say the world deceives us."
References: TCP/IP Primer Classic (fifth edition)
TCP/IP Illustrated, Volume 1: The Protocols
First, Introduction
UDP (User Datagram Protocol) is a simple, datagram-oriented transport layer protocol: each output operation of a process produces exactly one UDP datagram, which is encapsulated in one IP datagram to be sent. In the protocol stack it sits in the transport layer, directly above IP and below the application.
Feature: UDP does not provide reliability. It hands the data that the application writes to the IP layer to be sent, but does not guarantee that the data reaches the destination
Second, UDP header
Because UDP does not provide reliability, its header is relatively simple: 8 bytes, made up of four 16-bit fields (source port number, destination port number, UDP length, and UDP checksum).
The fields have the following meanings:
Port numbers: the 16-bit source and destination port numbers are what the transport layer uses to identify the sending and receiving applications
16-bit UDP length: the total length of the UDP header plus the UDP data, in bytes. The minimum UDP length is 8 bytes, which means that a UDP datagram with no data can be sent. Recall from the IP article in this TCP/IP series that the IP header carries both a total length field and a header length field, so the UDP datagram length can also be obtained as the total length of the IP datagram minus the length of the IP header.
16-bit UDP checksum: several points about the checksum deserve special attention.
First, the UDP checksum covers the UDP header and the UDP data, whereas the IP header checksum covers only the IP header. Second, the UDP checksum is optional, whereas the TCP checksum is mandatory. Third, the length of a UDP datagram can be an odd number of bytes, so a zero pad byte is appended for the calculation, but this pad byte is not transmitted. The UDP checksum is calculated the same way as the IP header checksum (the one's-complement sum of 16-bit words)
Finally, UDP includes a 12-byte pseudo-header in the calculation; it exists only for computing the checksum and is never transmitted. The pseudo-header consists of the 32-bit source IP address, the 32-bit destination IP address, 8 bits of zero, the 8-bit protocol number (17 for UDP), and the 16-bit UDP length
If the calculated checksum is 0, it is stored as all ones (65535). A transmitted checksum of 0 means that the sender did not calculate a checksum. If the sender did calculate one and the receiver detects a checksum error, the UDP datagram is silently discarded and no error message is generated
Although the UDP checksum is optional, it should always be used; after all, the correctness of the transmitted data still has to be confirmed
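The checksum rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (ones_complement_sum16, udp_checksum, build_udp_datagram, verify_udp_checksum) are made up for this example, only IPv4 is handled, and the addresses are documentation addresses rather than real hosts.

```python
import socket
import struct

UDP_PROTOCOL = 17  # protocol number carried in the pseudo-header


def ones_complement_sum16(data: bytes) -> int:
    """One's-complement sum of 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"  # pad byte used for the calculation only, never sent
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return total


def pseudo_header(src_ip: str, dst_ip: str, udp_length: int) -> bytes:
    """12-byte pseudo-header: src IP, dst IP, zero, protocol, UDP length."""
    return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!BBH", 0, UDP_PROTOCOL, udp_length))


def udp_checksum(src_ip, dst_ip, src_port, dst_port, payload: bytes) -> int:
    length = 8 + len(payload)  # UDP header plus data
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)
    covered = pseudo_header(src_ip, dst_ip, length) + header + payload
    checksum = ~ones_complement_sum16(covered) & 0xFFFF
    return checksum or 0xFFFF  # a computed 0 is stored as all ones


def build_udp_datagram(src_ip, dst_ip, src_port, dst_port, payload: bytes) -> bytes:
    checksum = udp_checksum(src_ip, dst_ip, src_port, dst_port, payload)
    header = struct.pack("!HHHH", src_port, dst_port, 8 + len(payload), checksum)
    return header + payload


def verify_udp_checksum(src_ip: str, dst_ip: str, datagram: bytes) -> bool:
    checksum = struct.unpack("!H", datagram[6:8])[0]
    if checksum == 0:
        return True  # sender did not compute a checksum
    covered = pseudo_header(src_ip, dst_ip, len(datagram)) + datagram
    # Summing the data together with its checksum must give all ones.
    return ones_complement_sum16(covered) == 0xFFFF


if __name__ == "__main__":
    dgram = build_udp_datagram("192.0.2.1", "192.0.2.2", 5000, 53, b"hello")
    print(verify_udp_checksum("192.0.2.1", "192.0.2.2", dgram))
```

Note that the payload in the usage example is 5 bytes, so the pad byte is exercised: the datagram that comes back is 13 bytes long, without the padding.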
Third, IP fragmentation and reassembly
Why discuss IP fragmentation here? As mentioned above, UDP does not provide transmission reliability, yet each datagram still has to be delivered in one piece, and that responsibility falls to IP. Let's look at some of the concepts involved.
Maximum UDP datagram length
The maximum length of an IP datagram is 65535 bytes. Subtracting the 20 bytes of the IP header and the 8 bytes of the UDP header, the maximum amount of user data in a UDP datagram is theoretically 65507 bytes. In reality, however, most implementations do not reach this length, for the following reasons:
Application interface limitations: the socket API provides a function that an application can call to set the size of the receive and send buffers, and most systems by default allow a UDP datagram of just over 8192 bytes to be read or written.
Kernel implementation: some implementation features (or bugs) may limit the IP datagram length to less than 65535 bytes
Although IP can send a datagram this long, the receiving application is not guaranteed to be able to read that much data. The UDP programming interface therefore lets the application specify the maximum length it can handle. When a datagram is longer than that, different APIs handle it differently, so it is up to the sending side to control the size of the datagram
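As a rough illustration of the two limits just described, the arithmetic and the buffer query can be written as follows. This is a sketch, not a definitive measurement: udp_buffer_sizes is a name invented here, and the values it returns depend entirely on the local operating system.

```python
import socket

IP_MAX_TOTAL_LENGTH = 65535  # largest value of the 16-bit IP total length field
IP_HEADER_LENGTH = 20        # IP header without options
UDP_HEADER_LENGTH = 8

# Theoretical maximum amount of user data in one UDP datagram.
MAX_UDP_PAYLOAD = IP_MAX_TOTAL_LENGTH - IP_HEADER_LENGTH - UDP_HEADER_LENGTH


def udp_buffer_sizes() -> tuple[int, int]:
    """Return the default (send, receive) buffer sizes of a new UDP socket."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        send_size = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
        recv_size = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return send_size, recv_size


if __name__ == "__main__":
    print("theoretical maximum payload:", MAX_UDP_PAYLOAD)
    print("default send/receive buffers:", udp_buffer_sizes())
```

An application that wants larger datagrams can raise these sizes with setsockopt on the same SO_SNDBUF and SO_RCVBUF options, subject to whatever ceiling the system imposes.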
Path MTU
MTU (maximum transmission unit) is the upper limit on the amount of data that the data link layer can carry in one frame; different network types have different values. The path MTU is the smallest MTU of any link on the path between two communicating hosts. The path MTU between two hosts is not necessarily constant: it depends on the route in use at the time, routing is not necessarily symmetric, and the network can change at any time
ICMP
ICMP (Internet Control Message Protocol) is a network layer protocol. It is often considered an integral part of the IP layer: its main job is to carry error messages and other information that needs attention, and ICMP messages are encapsulated inside IP datagrams.
A few common error messages:
Port unreachable: if the receiving side receives a UDP datagram whose destination port does not correspond to any port that a process has in use, UDP returns an ICMP port unreachable message
Fragmentation needed but don't-fragment bit set: when a router receives a datagram that needs to be fragmented but the don't-fragment flag is set in its IP header, the router drops the datagram and returns an ICMP unreachable message. The message can also carry the MTU of the next-hop link back to the source host, which can then send smaller datagrams; repeating this until datagrams get through is known as the path MTU discovery mechanism
Source quench: when a system (router or host) receives datagrams faster than it can process them, it may return a source quench error message. Because source quench messages consume bandwidth and in practice have little effect, generating them is discouraged
IP fragmentation
As already mentioned, the data link layer typically limits the maximum length (MTU) of each frame it sends. Whenever the IP layer receives an IP datagram to send, it determines which local interface the datagram goes out on (routing) and queries that interface for its MTU. If the datagram is longer than the MTU, it has to be fragmented. Fragmentation can take place on the original sending host or on an intermediate router.
Three fields in the "second row" of the IP header are used in fragmentation: the 16-bit identification, the 3-bit flags, and the 13-bit fragment offset. For each IP datagram that the sender transmits, the identification field contains a unique value, and that value is copied into every fragment when the datagram is fragmented, marking them as parts of the same datagram. One bit of the flags field means "more fragments": it is set to 1 when further fragments follow, so it is 0 in the last fragment. The fragment offset field gives the position of the fragment, in units of 8 bytes, from the beginning of the original datagram. In addition, after fragmentation the total length field of each fragment is changed to the length of that fragment. Another bit of the flags field is the "don't fragment" bit; if it is set to 1, IP will not fragment the datagram. If the datagram then cannot be sent without fragmentation, it is discarded and an ICMP unreachable error (fragmentation needed but don't-fragment bit set) is returned
After fragmentation, each fragment becomes a packet. Here a packet is the unit passed between the network layer and the data link layer, whereas an IP datagram is the end-to-end transmission unit of the IP layer. Each packet has its own IP header
It is important to note that:
Except for the last fragment, the data portion of each fragment (everything after the IP header) must be a multiple of 8 bytes long, because the 13-bit fragment offset field counts in units of 8 bytes; this also makes it easy for the receiver to compute offsets during reassembly
The transport layer header appears only in the first fragment
Fragmentation may take place on the source host or on a router along the path, but reassembly takes place only on the destination host, because routers on the path are only responsible for forwarding and do not care about the data content of the IP datagram
Even if only one fragment is lost, the entire datagram has to be retransmitted: IP itself has no retransmission, and if the fragmentation was done by an intermediate router, the source host has no way of knowing how the datagram was fragmented
Fragments may arrive at the destination out of order, but the three IP header fields described above let the destination put the fragments back together correctly
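The fragmentation rules above (8-byte alignment, offset in 8-byte units, "more fragments" bit, per-fragment total length) can be made concrete with a small calculation. The sketch below only computes the header values of each fragment; the Fragment class and the fragment function are hypothetical names for this example, and a 20-byte IP header without options is assumed.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    offset_units: int     # value of the 13-bit fragment offset field (8-byte units)
    more_fragments: bool  # the "more fragments" flag
    data_length: int      # bytes of IP payload carried by this fragment
    total_length: int     # IP total length field of this fragment


def fragment(payload_length: int, mtu: int, ip_header_length: int = 20) -> list[Fragment]:
    """Split an IP payload of payload_length bytes to fit the given MTU."""
    # Every fragment except the last must carry a multiple of 8 bytes.
    max_data = (mtu - ip_header_length) // 8 * 8
    if max_data <= 0:
        raise ValueError("MTU too small to carry any data")
    fragments = []
    offset = 0
    while True:
        data_length = min(max_data, payload_length - offset)
        more = offset + data_length < payload_length
        fragments.append(
            Fragment(offset // 8, more, data_length, ip_header_length + data_length)
        )
        offset += data_length
        if not more:
            return fragments


if __name__ == "__main__":
    # 1473 bytes of user data plus the 8-byte UDP header, sent over an MTU of 1500.
    for frag in fragment(1473 + 8, 1500):
        print(frag)
```

With an MTU of 1500, 1472 bytes of user data still fit in a single packet (20 + 8 + 1472 = 1500); one more byte forces a second fragment that carries a single byte of data at offset 185 (1480 / 8).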
To summarize, the following describes the complete transmission of an IP datagram, assuming that the transport layer datagram to be sent needs to be fragmented and that the sender and receiver are not directly connected:
① The transport layer of the source host passes the datagram to the network layer;
Fragmentation: ②
② IP finds the datagram too large to send as is and fragments it. All fragments carry the same identification field. The first fragment contains the data at the beginning of the datagram, including the transport layer header; its data length is a multiple of 8 bytes, its "more fragments" bit is 1, its fragment offset is 0, and its total length field is set to the length of the fragment. The second through the next-to-last fragments each contain data whose length is a multiple of 8 bytes, and none of them contains the transport layer header; their "more fragments" bit is 1, their fragment offset is set from the amount of data carried by the preceding fragments, and their total length is set in the same way as for the first fragment. The last fragment contains the remainder of the data, whose length is not necessarily a multiple of 8 bytes, and no transport layer header; its "more fragments" bit is 0, and its fragment offset and total length are set in the same way as for the preceding fragments;
③ The network layer passes each fragment as a packet to the data link layer;
Transmission: ④~⑥
④ The data link layer encapsulates the packet in a frame and sends it to the next system (router or host);
⑤ A router that receives the frame passes the packet up to its network layer, which makes a routing decision;
⑥ IP compares the datagram length with the MTU of the outgoing interface: if fragmentation is needed, go to step ②; otherwise go to step ④. This repeats until the packet reaches the destination host;
Reassembly: ⑦
⑦ The destination host receives the fragments one after another. If a fragment is still missing when the reassembly timer expires, the destination discards the fragments it has received and may return an ICMP error message; neither IP nor UDP retransmits, so it is up to the sending application to send the datagram again, returning to step ①. If no fragment is missing, the destination host reassembles all the fragments into the transport layer datagram using the identification, flags, and fragment offset fields, and the transport layer passes the data to the application. The transfer is complete
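Step ⑦ can be sketched as follows. This is a simplified model rather than how any real stack does it: the reassemble function and its tuple format (offset in 8-byte units, "more fragments" flag, data) are assumptions made for this example, all fragments are taken to share one identification value, and timers, duplicates, and overlapping fragments are ignored.

```python
from typing import Iterable, Optional


def reassemble(fragments: Iterable[tuple[int, bool, bytes]]) -> Optional[bytes]:
    """Rebuild the original payload, or return None if a fragment is missing.

    Each fragment is (offset_units, more_fragments, data); fragments may be
    given in any order.
    """
    by_offset = {units * 8: (more, data) for units, more, data in fragments}
    payload = bytearray()
    expected = 0  # byte offset of the next fragment we need
    while expected in by_offset:
        more, data = by_offset[expected]
        payload += data
        expected += len(data)
        if not more:
            return bytes(payload)  # reached the fragment with "more fragments" = 0
        if not data:
            return None  # malformed: empty fragment that claims more will follow
    return None  # gap found before the last fragment


if __name__ == "__main__":
    original = b"x" * 1480 + b"y"
    # Fragments arriving out of order are still put back together.
    arrived = [(185, False, b"y"), (0, True, b"x" * 1480)]
    print(reassemble(arrived) == original)
```

The loop walks forward from offset 0, so arrival order does not matter; a missing fragment shows up as an offset that is absent from the table.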
For an implementation of IP fragment reassembly, see "Implementation of IP fragment reassembly in the Linux TCP/IP protocol stack"
Fourth, UDP server design
Finally, let's discuss the features of UDP that affect the design and implementation of servers that use this protocol
Client IP address and port number
The UDP server must save the source IP address and source port number from each client datagram so that it knows where to send the reply; this is what allows an iterative UDP server to handle multiple clients
Destination IP address
Some applications need to know whom a datagram was sent to, that is, its destination IP address
UDP input queue
Most UDP servers are iterative servers: a single server process handles all client requests on a single UDP port. Each UDP port in use therefore has an input queue of limited size, in which requests that arrive almost simultaneously wait until the server reads them
Restricting local IP addresses
Most UDP servers use a wildcard for the local IP address when they create a UDP endpoint, which means that they accept client requests arriving on any local interface
Restricting remote IP addresses
An endpoint can restrict the remote IP address and port number so that it receives UDP datagrams only from that specific address and port
Multiple receivers per port
When a UDP datagram arrives whose destination IP address is a broadcast or multicast address, and there are multiple endpoints at that destination IP address and port number, a copy of the datagram is delivered to each endpoint. If a UDP datagram arrives for a unicast address, only one of the endpoints receives a copy of the datagram
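Several of these points (saving the client address, the input queue, choosing the local address) show up even in a very small iterative server. The sketch below is an illustration only: make_server, serve_once, and demo are names invented here, the server is bound to the loopback address with a system-chosen port so the example is self-contained, and the reply is simply the request in upper case.

```python
import socket


def make_server(host: str = "127.0.0.1", port: int = 0) -> socket.socket:
    """Create a UDP endpoint; pass host="" to use the wildcard local address."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))  # port 0 lets the system pick a free port
    sock.settimeout(2.0)     # avoid blocking forever in this demo
    return sock


def serve_once(sock: socket.socket, bufsize: int = 2048) -> tuple:
    """Handle one request: read it, then reply to the address it came from."""
    data, client_addr = sock.recvfrom(bufsize)  # client IP address and port
    sock.sendto(data.upper(), client_addr)
    return client_addr


def demo() -> list:
    server = make_server()
    clients = [socket.socket(socket.AF_INET, socket.SOCK_DGRAM) for _ in range(2)]
    replies = []
    try:
        # Both requests are sent before the server reads anything, so they
        # wait in the input queue of the server's port.
        for i, client in enumerate(clients):
            client.settimeout(2.0)
            client.sendto(f"hello {i}".encode(), server.getsockname())
        for _ in clients:
            serve_once(server)  # one process, one port, one request at a time
        for client in clients:
            data, _ = client.recvfrom(2048)
            replies.append(data)
    finally:
        server.close()
        for client in clients:
            client.close()
    return replies


if __name__ == "__main__":
    print(demo())
```

A client that calls connect on its UDP socket restricts the remote address in the sense described above: the socket then only receives datagrams from that one address and port.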
Summary: UDP is a simple protocol, and most of this article was spent on IP fragmentation. The differences and connections between UDP and TCP will be discussed after TCP has been introduced