These days, when someone says they are proficient in networking, what they usually mean is "proficient in TCP". In fact, many people who claim to be proficient in TCP are proficient only in the socket interface, not in how TCP actually behaves. I would not call myself proficient either, but I am certainly at an intermediate level or above. If you really are proficient in TCP behavior, please skip this article, or email me directly so we can discuss it. If you only know the socket interface, I recommend reading this article first and then taking a look at the comprehensive analysis of TCP's thorny problems.
UDP is short for User Datagram Protocol. In a packet-switched network it plays the role of the traditional post office, while TCP plays the role of a telephone operator or a logistics company. For a packet-switched network, UDP is the more basic of the two, whereas TCP can be described as stream-based communication. On top of the IP datagram protocol, TCP and UDP implement higher-level "circuit switching" and "packet switching" respectively.
Many people think it does not matter whether a UDP socket is connected or not. In technical work, however, you cannot simply say "it doesn't matter" unless you can give reasons twice as strong as the reasons that it does.
When working with TCP/IP, or to put it bluntly, when writing network communication code, you must know where the protocol you are using is implemented. In most operating systems the protocol stack is part of the kernel, so it runs in kernel space. This raises an efficiency issue: switching between kernel space and user space. On x86 and most other processors this switch is time-consuming because it involves saving and restoring context. So if you want your application to be efficient, you need to minimize the number of switches. That alone is not enough: where a switch cannot be avoided, you should also minimize the amount of data copied. Because UDP is datagram-based, a send call is made whenever a piece of data is ready, and how often that happens is determined by the application logic. What we can decide is whether to use send or sendto. Looking at the signatures, sendto takes more parameters than send (the destination address and its length), so with sendto more parameters have to be copied into the kernel. Once the parameters are in kernel space, the kernel also has to prepare a data structure to hold them temporarily, and after the data is sent it has to release that temporary storage (of course, the stack can be used to manage it so that it is released automatically).
If a UDP socket is connected, you can call send directly after connect. Once the application has called connect, the kernel keeps the UDP "connection" for as long as the socket stays connected. For each send or receive the kernel no longer needs to allocate and free this information; it only needs to look it up. This also reduces the amount of data copied. Since a connected UDP socket already has a "connection" in the kernel, the kernel can track that "connection" at any time for as long as the communication lasts, which leads to 1.2.
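The difference between the two calling styles can be sketched in a few lines. This is a minimal illustration over loopback, not a benchmark; the function name `udp_roundtrip` and the payload are made up for the example.

```python
import socket

def udp_roundtrip(use_connect: bool) -> bytes:
    """Send one datagram over loopback, via connect()+send() or via sendto()."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
        receiver.settimeout(2.0)
        dest = receiver.getsockname()

        if use_connect:
            sender.connect(dest)          # destination is recorded in the kernel once
            sender.send(b"hello")         # no address argument on each call
        else:
            sender.sendto(b"hello", dest) # destination passed on every call

        data, _addr = receiver.recvfrom(2048)
        return data
```

Both paths deliver the same datagram; what differs is whether the destination travels with every call or is looked up from state the kernel already holds.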
With an unconnected UDP socket, once the data has been sent by calling sendto, the sending side releases all information about the destination. However, if the data is carried over IP (or over any lower-layer protocol that can report errors), an ICMP error is generated when the datagram runs into a problem along the path or at the destination and cannot be delivered (non-IP protocols may use another mechanism). By then the sender no longer knows which application the error message should be delivered to, because the source/destination information has already been released.
For UDP communication with a "connection", the kernel protocol stack maintains a one-way connection from the source to the destination. Therefore, when an error message comes back, the kernel protocol stack can accurately locate the application it should be forwarded to.
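A rough way to observe this error reporting is to connect a UDP socket to a local port where nothing is listening. On Linux the resulting ICMP "port unreachable" typically surfaces as `ConnectionRefusedError` on a later call; other platforms and firewall setups may behave differently, so the sketch below simply reports what it saw. The function name is hypothetical.

```python
import socket

def connected_udp_sees_refusal() -> bool:
    """Return True if a connected UDP socket is told that the port is closed."""
    # Find a port that (very likely) has no listener: bind, note the port, close.
    probe = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    probe.bind(("127.0.0.1", 0))
    port = probe.getsockname()[1]
    probe.close()

    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(1.0)
        s.connect(("127.0.0.1", port))
        try:
            s.send(b"ping")   # the send itself usually succeeds
            s.recv(2048)      # a pending ICMP error is reported on a later call
        except ConnectionRefusedError:
            return True       # the kernel matched the error to this socket
        except OSError:
            return False      # timeout or some other platform-specific outcome
        return False
```

An unconnected socket that used sendto for the same datagram would normally just see its next receive time out, since the kernel has nothing to match the error against.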
One last note: a UDP connection is unidirectional, and calling connect generates no network traffic. It merely binds a five-tuple in the kernel protocol stack: protocol (UDP), source IP address, source port, destination IP address, and destination port.
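The five-tuple can be read back from a connected UDP socket. In this small sketch the destination port 9 is an arbitrary choice for illustration; nothing needs to be listening there, because connect on a UDP socket only sets local kernel state.

```python
import socket

def udp_five_tuple(dest):
    """Connect a UDP socket and return (protocol, src_ip, src_port, dst_ip, dst_port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.connect(dest)                      # no packet leaves the host here
        src_ip, src_port = s.getsockname()   # chosen by the kernel at connect time
        dst_ip, dst_port = s.getpeername()   # the destination recorded by connect
        return ("UDP", src_ip, src_port, dst_ip, dst_port)

if __name__ == "__main__":
    print(udp_five_tuple(("127.0.0.1", 9)))
```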
If you search the Internet for the difference between TCP and UDP, the answers look like this: UDP needs no acknowledgment, UDP is more efficient, connection-oriented versus connectionless, TCP consumes more resources, ...
However, all of these are myths, and myths are there to be broken. UDP is not necessarily more efficient than TCP. TCP has been developed for a long time and its algorithms are now very rich; with reasonable configuration it can cope with all kinds of complex environments, and in most cases it is more efficient than UDP.
TCP/IP is just a protocol family, and within it every job that needs doing is done by some member. UDP is its datagram protocol, and its role is to add the concept of ports so that applications can be multiplexed. In terms of implementation it essentially copies what IP offers and adds an optional checksum on top, which guarantees data integrity to some extent. Of course, you can skip the checksum.
The IP protocol implements the datagram service perfectly: whatever the outcome, it makes its best effort, although ICMP does provide limited feedback. Some people say IP is irresponsible, but the point of a layered model is that each layer provides a single service, which is also the Unix philosophy. IP provides only the lowest layer of packet-switched communication, a communication model that stands entirely parallel to circuit switching. Sometimes we really do need a "fire and forget, best effort" service. The elegant part of TCP/IP is that it does not let IP offer this service to applications directly; instead it provides UDP as a multiplexing layer on top of IP. As a result, a single IP address on a host can carry many "fire and forget, best effort" services, while IP itself provides only the transport.
Having read Section 3, let's go back and ask which services need "fire and forget, best effort" delivery. In real life, ordinary mail is exactly such a service. It is worth remembering the ordinary letters of the 1990s, which nobody needs to write anymore. When I was in college I sent my girlfriend at least one letter a week. Sometimes the letter arrived within the week, but sometimes it was lost, and I had to spend a lot of time and money explaining, to prove that I really had written a letter and that it simply had not been delivered. Sometimes, after I had explained everything, the letter inexplicably turned up after a long delay. That is best-effort service. To make sure a letter could not be lost, I switched to express delivery. It was not noticeably faster, but I got a receipt back, so it was no longer a fire-and-forget service but a service with confirmation. Whether a service is fire-and-forget therefore has no necessary connection with communication efficiency, and this needs to be made clear.
Many people think UDP belongs in applications with strict real-time requirements, and then add that you implement ordering and retransmission yourself. But if we need ordering and retransmission, why not use TCP directly? These people actually believe that TCP is inefficient precisely because it has to handle in-order delivery and retransmission. If those features are added to UDP, hasn't UDP's advantage disappeared?
I think these people must have read too many textbooks, and all those textbooks excerpt the same few classic books and a handful of RFCs, so "textbooks copying one another the world over" has turned the claim into received wisdom. The actual situation is far from that simple. Is TCP really less efficient than UDP? Not necessarily!
We need to know that none of the communication channels on the Internet we live on has a fixed capacity; capacity varies. Standard UDP, however, sends each packet at a fixed length, and for simplicity UDP-based applications typically communicate with fixed-length packets, which leads to problems...
Take a very simple scenario in which both UDP endpoints communicate with fixed-length 512-byte packets. The sender sends 512 bytes, the receiver receives 512 bytes, and every 512 bytes is encapsulated in its own IP packet; apart from port multiplexing, all that is added is the cost of the UDP header. Even when more data could be sent at once, it still has to go out 512 bytes at a time. With TCP, once the data reaches the kernel protocol stack it can, in some cases, be accumulated temporarily, which reduces the encapsulation cost. This does not hurt interactivity by wasting time, because TCP uses a very smart algorithm: when it decides that data has to be accumulated, it is because sending without accumulating would not work well. TCP's sophisticated algorithms strike a good balance between latency and throughput. For details, see the Panoramic Analysis of TCP protocol problems.
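The encapsulation cost in this scenario is easy to estimate. The sketch below assumes IPv4 headers without options (20 bytes), the 8-byte UDP header, a 20-byte TCP header without options, and a 1460-byte TCP segment (which corresponds to a 1500-byte MTU). It ignores link-layer framing and TCP acknowledgment packets, so treat the numbers as a rough comparison only.

```python
IPV4_HEADER = 20   # bytes, assuming no IP options
UDP_HEADER = 8
TCP_HEADER = 20    # bytes, assuming no TCP options

def header_overhead(payload: int, transport_header: int) -> float:
    """Fraction of each IP packet spent on IP + transport headers."""
    headers = IPV4_HEADER + transport_header
    return headers / (headers + payload)

def packets_needed(total_bytes: int, payload_per_packet: int) -> int:
    """Number of packets needed to carry total_bytes (ceiling division)."""
    return -(-total_bytes // payload_per_packet)

if __name__ == "__main__":
    total = 1 << 20  # 1 MiB of application data
    for name, payload, hdr in (("UDP/512", 512, UDP_HEADER),
                               ("TCP/1460", 1460, TCP_HEADER)):
        n = packets_needed(total, payload)
        print(name, n, "packets,",
              n * (IPV4_HEADER + hdr), "header bytes,",
              f"{header_overhead(payload, hdr):.1%} overhead per full packet")
```

Under these assumptions, 1 MiB sent as 512-byte datagrams takes 2048 packets and 57,344 header bytes, while 1460-byte TCP segments take 719 packets and 28,760 header bytes.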
The simplest fact is that the link MTU has almost no influence on how UDP sizes its datagrams, whereas it does influence TCP, and this directly affects IP fragmentation. This is also an efficiency issue. At the extremes, a datagram sent over UDP may be only a few percent of the MTU, or dozens of times the MTU. The former is too inefficient, and the latter pushes the cost down to the IP layer.
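How much work an oversized datagram hands to IP can be estimated with a short calculation. This sketch assumes IPv4 with a 20-byte header and no options, and uses the rule that every fragment except the last carries a multiple of 8 bytes; the function name is made up for the example.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
UDP_HEADER = 8

def ipv4_fragments(udp_payload: int, mtu: int = 1500) -> int:
    """Number of IPv4 packets needed to carry one UDP datagram over a link."""
    ip_payload = udp_payload + UDP_HEADER
    # Non-final fragments must carry a multiple of 8 bytes of IP payload.
    per_fragment = (mtu - IPV4_HEADER) // 8 * 8
    return max(1, -(-ip_payload // per_fragment))

if __name__ == "__main__":
    for size in (512, 1472, 1473, 65507):
        print(size, "bytes ->", ipv4_fragments(size), "IP packet(s)")
```

With a 1500-byte MTU, a 1472-byte payload still fits in one packet, 1473 bytes already needs two, and the largest IPv4 UDP payload (65,507 bytes) needs 45 fragments. If any one of them is lost, the whole datagram is lost.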
TCP can use congestion control and flow control algorithms to regulate its sending rate intelligently. UDP has no acknowledgment mechanism, so even if the network is congested or the peer machine cannot keep up, the sender keeps sending, which only makes the condition worse. The UDP sender and receiver bear no responsibility for this deterioration; see 5.1.
Because of these two problems, some adjustments to UDP have to be made in user space. But since TCP exists, whether to adjust UDP or simply use TCP is a real question, and the final answer comes down to both time and money.
Although UDP has no flow control or congestion control and needs no acknowledgment, that does not necessarily make it more efficient. As the saying goes, some work is necessary and has to be done. UDP is much simpler than TCP, but it inevitably gives an illusion of high efficiency: once a UDP datagram has been sent and no acknowledgment will ever arrive, you assume by default that it has safely reached the peer. This is like deleting the error messages from your code because you do not want the code to have errors.
In fact, under extreme network congestion the UDP packet loss rate is very high, because there is no congestion control. And because there is no flow control either, a high loss rate also occurs when the two ends run at mismatched rates. TCP does not have this problem because it adjusts itself.
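The rate-mismatch case can be provoked on a single machine: a sender pushes datagrams at a receiver that is not reading, and the receiver's socket buffer silently drops whatever does not fit. The sketch below is only an illustration; the buffer size, datagram count, and function name are arbitrary, and the exact loss depends on the operating system.

```python
import socket

def blast_and_count(n_datagrams: int = 2000, size: int = 1024):
    """Send datagrams to an idle receiver; return (sent, actually_received)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as rx, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as tx:
        # Ask for a small receive buffer so that it fills up quickly.
        rx.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 16 * 1024)
        rx.bind(("127.0.0.1", 0))
        dest = rx.getsockname()

        sent = 0
        payload = b"x" * size
        for _ in range(n_datagrams):
            try:
                tx.sendto(payload, dest)  # nothing tells the sender to slow down
                sent += 1
            except OSError:
                pass  # some platforms report a full buffer instead of dropping

        # Only now does the receiver start reading what survived.
        rx.setblocking(False)
        received = 0
        while True:
            try:
                rx.recv(size)
                received += 1
            except BlockingIOError:
                break
        return sent, received

if __name__ == "__main__":
    sent, received = blast_and_count()
    print(f"sent {sent}, received {received}, lost {sent - received}")
```

A TCP sender in the same situation would block or be throttled once the receiver's advertised window closed, instead of losing data.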
Now let's talk about fairness. TCP is inherently fair, but UDP is not. UDP is unregulated, just like the definition of a true packet-switched network. Because there is no regulation, network conditions are never fed back to the endpoints, which not only causes a high packet loss rate but also squeezes the bandwidth available to TCP traffic.
Let's end this section with an example: road traffic. Beijing traffic can be seen as UDP and Shanghai traffic as TCP. Both are congested, but once traffic in Beijing jams, it is almost completely stuck. Some road sections in Shanghai are even more congested than Beijing's, yet they do not lock up, and even when traffic is slow it keeps moving.
For packet-switched network communication, the cost of the protocol stack shows up mainly in two ways: (a) the space overhead caused by encapsulation; (b) the time overhead caused by buffering. Encapsulation and buffering pull in opposite directions. If you want data to be sent immediately, you have to encapsulate it and hand it to the lower layer right away, which consumes more protocol header space. If you do not want to pay that overhead, you can buffer the payload and send it all at once when the buffer reaches a certain size. That minimizes the ratio of protocol header to payload and saves space, but it costs time.
With these principles understood, two kinds of communication come to mind: short-lived connections and long-lived connections. When weighing communication efficiency you must take this continuity into account. If you only need to send a single packet, and that is acceptable whether or not there is a resend/polling mechanism behind it, UDP is the better choice. With TCP, the handshake alone takes two packets before any data is sent (data can be carried on the third handshake packet), so on average it is not cost-effective. DNS queries are an example of this. Conversely, if the connection is long-lived, the extra time spent on TCP's connection setup and teardown handshakes is amortized over the whole conversation, and the application gets additional benefits from the TCP stream, such as accumulated sending and the Nagle algorithm. In most cases acknowledgments are not an overhead either, because many TCP implementations piggyback acknowledgments on data or delay them, so acknowledgment packets rarely slow down sending or take up much bandwidth.
Do you really want the data to be forgotten once it has been sent? If not, use TCP rather than implementing acknowledgments and connections yourself. Of course, for short-lived connections you must carefully weigh the overhead of the TCP handshake against the overhead of acknowledgments you implement yourself.
Take DNS as an example. It is clearly not a fire-and-forget application, but if the DNS client receives no response within a certain time it can simply send another query, and this does not affect the final result. It is only an infrastructure-style, single-shot query, not the end task of the user sitting in front of the computer, so UDP is fine. HTTP is different. HTTP is essentially content transfer: the browser parses the content and then displays it. This places strict requirements on the format, the size of the result is not known in advance, and the format of the result is not fixed, so a slight error can break the browser's parsing. A single HTTP exchange is also not one-to-one application communication; it involves many other components, such as CGI on the server side and scripts on the client side. Precise transmission is therefore absolutely required, UDP cannot be used, and TCP is used even for short-lived connections.
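The "ask again if no answer arrives" pattern that makes UDP acceptable for DNS-style queries can be sketched as follows. This is a simplified illustration rather than a DNS client: the function name, retry count, and timeout are arbitrary, and a real client must also match each reply to its query (DNS uses a transaction ID for that), which is omitted here.

```python
import socket

def query_with_retry(dest, payload: bytes, retries: int = 3, timeout: float = 0.5):
    """Send a query over UDP and resend on timeout.

    Returns (reply, attempts_used), or (None, retries) if every attempt failed.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.connect(dest)          # lets the kernel report ICMP errors to us
        s.settimeout(timeout)
        for attempt in range(1, retries + 1):
            try:
                s.send(payload)
                return s.recv(2048), attempt
            except socket.timeout:
                continue         # no answer in time: just ask again
            except ConnectionRefusedError:
                continue         # port currently closed: try again
    return None, retries
```

The retry logic lives entirely in the application, which is exactly the kind of cost that has to be weighed against a TCP handshake when the exchange is a single request and reply.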
We know that TCP is a connection-oriented protocol. Before any data is transferred you must establish a two-way connection with the destination, and each connection has exactly one destination. To transmit data to multiple destinations, you need to establish multiple connections. Multipoint communication is not easy to implement over TCP, and this follows from TCP's connection setup handshake and teardown handshake.
For UDP, because there is no connection, multipoint communication is easy to implement. For connected UDP, you can use techniques such as DNAT or a server load balancer to achieve multipoint communication. Because UDP does not need to establish a connection, you can send data to a multicast address or send the same data repeatedly to multiple destinations.
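Sending the same data to several destinations needs nothing more than one unconnected socket and a loop. The sketch below is a minimal illustration; the function name is made up, and the destinations are whatever (host, port) pairs the caller supplies.

```python
import socket

def send_to_all(payload: bytes, destinations) -> int:
    """Send one payload to every (host, port) in destinations from one socket."""
    count = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        for dest in destinations:
            s.sendto(payload, dest)  # no per-destination setup or teardown
            count += 1
    return count
```

Doing the same over TCP would require one connection, with its own handshake and teardown, per destination.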
UDP is datagram-based. That is to say, every UDP packet has a boundary, which is the biggest difference from stream communication. For TCP, boundaries are defined entirely by the data itself; for UDP, boundaries are defined by the actual content of each exchange between the sending and receiving sides.
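With UDP, one receive call returns exactly one datagram, so the boundary comes for free. On a stream, the application has to put the boundary into the data itself. One common way, shown here as a sketch, is a length prefix; the 4-byte big-endian header and the helper names are choices made for this example, not part of TCP.

```python
import socket
import struct

def send_msg(sock: socket.socket, payload: bytes) -> None:
    """Write one message to a stream socket as a 4-byte length plus payload."""
    sock.sendall(struct.pack("!I", len(payload)) + payload)

def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes; a stream may deliver them in several pieces."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the stream mid-message")
        buf += chunk
    return bytes(buf)

def recv_msg(sock: socket.socket) -> bytes:
    """Read one length-prefixed message from a stream socket."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)
```

For a quick local check, `socket.socketpair()` gives two connected stream sockets: two `send_msg` calls on one end come back as two separate messages from `recv_msg` on the other, regardless of how the bytes were grouped in transit.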
Two well-known open source projects implement TCP-like functions over UDP: OpenSSL and OpenVPN. In OpenSSL, DTLS exists entirely to provide security for UDP-based applications. Since it is called Transport Layer Security, it has to cover the whole transport layer (in fact, in a layering sense the SSL protocol has nothing to do with the TCP/IP transport layer). OpenVPN implements its own acknowledgment and retransmission for historical reasons: at the time there was no DTLS to draw on, so there was no alternative. We should therefore not treat these two cases as models to follow. When learning about something we must consider its historical background; reading history makes one wise, and this is a case in point.
Therefore, it is wise to use UDP only when you confirm the following facts: