Performance of network protocols
Now we come to the part we can actually control.
The performance of network protocols does not degrade in simple proportion to added latency. This is because the intrinsic operation of most network protocols is a bidirectional exchange of messages, each of which costs a full round trip. The remainder of this chapter focuses on understanding why these exchanges occur and how their frequency can be reduced or even eliminated.
Figure 3: Network protocol
TCP
Transmission Control Protocol (TCP) is a connection-oriented transport protocol built on top of IP. The error-free, full-duplex communication channel that TCP provides is indispensable to other protocols, such as HTTP or TLS.
TCP entails a good deal of the two-way messaging that we need to avoid as much as possible. Some of it can be eliminated by protocol extensions such as TCP Fast Open, while other parts can be minimized by tuning system parameters, such as the initial congestion window. In this section we explore both approaches while providing some background on TCP internals.
TCP Fast Open
Initiating a TCP connection conventionally requires three messages to be exchanged, the so-called three-way handshake. TCP Fast Open (TFO) is an extension of TCP that eliminates the round-trip delay normally incurred by this handshake.
The TCP three-way handshake negotiates parameters between client and server that make robust two-way communication possible. The initial SYN (synchronize) message represents the client's connection request; if the server accepts the request, it replies with a SYN-ACK (synchronize-acknowledge) message, and the client answers with a final ACK (acknowledge) message.
At this point a logical connection exists and the client may begin sending data.
Note that the three-way handshake introduces a delay of at least one RTT before any application data can flow.
Figure 4: TCP three-way handshake
Traditionally, there was no way to avoid the delay imposed by the three-way handshake other than to reuse existing connections. This changed, however, with the introduction of the TCP Fast Open IETF specification.
TFO allows the client to send data before the logical connection is established, which effectively negates the round-trip delay of the three-way handshake.
The cumulative effect of this optimization is impressive.
According to Google's research, TFO can reduce page load times by up to 40%. Although the specification is still a draft, TFO is already supported by mainstream browsers (Chrome 22 and later) and platforms (Linux 3.6 and later), and other vendors have pledged full support in the near future.
TCP Fast Open is a modification of the three-way handshake that allows a small data payload (such as an HTTP request) to be carried inside the SYN message. This payload is passed to the application server as though the connection handshake had already completed.
Earlier extension proposals along the lines of TFO ultimately failed because of security concerns. TFO solves this problem with a security token, or cookie: the client is assigned the token during a conventional TCP connection handshake and is expected to include it in the SYN message of every TFO-optimized request.
A few small caveats apply to the use of TFO. The most notable is the lack of any idempotency guarantee for the data sent in the initiating SYN message. Although TCP ensures that duplicate packets (duplication happens frequently) are ignored by the receiver, this guarantee does not extend to the connection handshake. A solution is currently being standardized in the draft specification; in the meantime TFO can still safely be applied to idempotent transactions.
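To make the mechanics concrete, here is a minimal sketch of a TFO client on Linux. It is illustrative only: the address is a placeholder, error handling is omitted, and client-side TFO must be enabled in the kernel (the net.ipv4.tcp_fastopen sysctl). The key point is that sendto() with the MSG_FASTOPEN flag replaces the usual connect()/send() pair and lets the payload ride in the SYN:

```c
/* Minimal TFO client sketch (Linux 3.6+, illustrative only). */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef MSG_FASTOPEN
#define MSG_FASTOPEN 0x20000000 /* Linux UAPI value, for older libcs */
#endif

int main(void)
{
    const char request[] = "GET / HTTP/1.1\r\nHost: example.com\r\n\r\n";
    struct sockaddr_in addr = { .sin_family = AF_INET,
                                .sin_port   = htons(80) };
    inet_pton(AF_INET, "192.0.2.1", &addr.sin_addr); /* placeholder addr */

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    /* sendto() + MSG_FASTOPEN initiates the connection and carries the
     * payload in the SYN. On first contact no TFO cookie is cached yet,
     * so the kernel falls back to a normal three-way handshake. */
    if (sendto(fd, request, sizeof(request) - 1, MSG_FASTOPEN,
               (struct sockaddr *)&addr, sizeof(addr)) < 0)
        perror("sendto");
    close(fd);
    return 0;
}
```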
Initial Congestion Window
The initial congestion window is a configurable TCP parameter with huge potential to accelerate small network transactions.
A recent IETF specification promotes raising the common initial congestion window setting from 3 message segments (i.e., packets) to 10 segments. The proposal is based on extensive research by Google demonstrating that this change yields an average performance improvement of about 10%. Without an introduction to TCP's congestion window (CWND), however, the purpose and potential impact of this setting cannot be properly understood.
TCP guarantees reliability to client and server even while operating over unreliable networks. This amounts to a promise that all data sent will be received, or at least appear to be. Packet loss is the biggest obstacle to meeting this promise, and it demands detection, correction, and prevention.
TCP detects packet loss with a positive-acknowledgement scheme: every packet sent must be acknowledged by its intended receiver, and a missing acknowledgement implies the packet was lost in transit.
While awaiting acknowledgement, transmitted packets are held in a special buffer known as the congestion window. When this buffer fills up, an event called CWND exhaustion occurs: all transmission stops until the receiver's acknowledgements free up space to send more packets. These events play a critical role in TCP performance.
Apart from the limits of network bandwidth itself, TCP throughput is fundamentally bounded by the frequency of CWND exhaustion events, which is related to the size of the congestion window. Peak TCP performance requires a congestion window tuned to current network conditions: too large, and the risk of network congestion rises (an overcongested network drops many packets); too small, and precious network bandwidth goes unused. Logically, the more that is known about the network, the better the congestion window size can be chosen.
In reality, the key network attributes, such as capacity and latency, are difficult to measure and constantly changing, and the fact that an Internet-based TCP connection must traverse many networks complicates matters further. Lacking any means to determine network capacity accurately, TCP instead infers the appropriate congestion window size from network congestion itself.
When TCP discovers that a packet has been lost, it shrinks the congestion window, on the assumption that a network somewhere along the path cannot handle the current transmission rate. Through this congestion-avoidance mechanism, TCP eventually settles on a window size that consumes its fair share of the available capacity while keeping CWND exhaustion to a minimum. Now, at last, we can explain the importance of the initial congestion window parameter.
Network congestion can be measured only through packet loss, and a new (or idle) connection has no loss history from which to derive the optimal congestion window size. TCP therefore takes the sensible approach of starting with whatever window size is least likely to cause congestion. Originally this meant a setting of 1 segment (approximately 1480 bytes), and this is sometimes still recommended. Later experimentation demonstrated that settings as large as 4 segments also work; in practice the initial congestion window is most commonly set to 3 message segments (approximately 4KB).
Unfortunately, such a small initial congestion window is a handicap for small network transactions, and the effect is easy to demonstrate. With the 3-segment setting, CWND exhaustion occurs after just 3 packets, or 4KB of data, have been sent. Assuming the packets are sent back to back, the corresponding acknowledgements cannot arrive any sooner than one round-trip time (RTT); if the RTT is 100 ms, the effective transfer rate is a measly 40KB per second or so. TCP does adjust its congestion window to take full advantage of the available capacity, but it starts from a crawl. This approach is, in fact, called slow start.
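The arithmetic is easy to reproduce. This small calculation, assuming a typical 1460-byte segment payload, shows the window-limited rate for both the traditional 3-segment setting and the 10-segment setting discussed below:

```c
/* Window-limited TCP throughput: at most one congestion window of
 * data can be in flight per round trip. */
#include <stdio.h>

int main(void)
{
    const double rtt_s     = 0.100;  /* 100 ms round-trip time */
    const double seg_bytes = 1460.0; /* typical TCP segment payload */
    const int    windows[] = { 3, 10 };

    for (int i = 0; i < 2; i++) {
        double rate = windows[i] * seg_bytes / rtt_s; /* bytes/second */
        printf("initcwnd %2d: %5.1f KB/s until the window grows\n",
               windows[i], rate / 1024.0);
    }
    return 0; /* prints ~42.8 KB/s and ~142.6 KB/s */
}
```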
Reducing the performance impact of slow start on smaller downloads requires re-evaluating the risk/reward trade-off of the initial congestion window. That is exactly what Google did, finding that an initial congestion window of 10 segments (about 14KB) yields maximum throughput with minimal network congestion. Real-world results showed that this setting can reduce page load times by 10% overall; connections with higher round-trip latency benefit even more.
Changing the default value of the initial congestion window is not entirely straightforward. Under most server operating systems it is a system-wide setting that only a privileged user can change, and it can rarely, if ever, be configured on the client by an unprivileged application. Note that a larger initial congestion window speeds up downloads on the server side, whereas on the client side it speeds up uploads. The inability to change this setting on the client means that special effort should be made to reduce the size of request payloads.
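On Linux, for instance, the server-side value is typically raised with a privileged route change (the initcwnd option of the ip route command), while an unprivileged application can at least observe the live congestion window of its own sockets. A minimal sketch using the TCP_INFO socket option:

```c
/* Sketch: inspect the live congestion window of a connected TCP
 * socket on Linux via the TCP_INFO socket option. */
#include <netinet/tcp.h>
#include <stdio.h>
#include <sys/socket.h>

void print_cwnd(int fd) /* fd: any connected TCP socket */
{
    struct tcp_info info;
    socklen_t len = sizeof(info);

    if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &info, &len) == 0)
        printf("cwnd: %u segments, rtt: %u us\n",
               info.tcpi_snd_cwnd,  /* congestion window, in segments */
               info.tcpi_rtt);      /* smoothed RTT, in microseconds  */
}
```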
Hypertext Transfer Protocol
This section discusses techniques for mitigating the impact of high round-trip latency on Hypertext Transfer Protocol (HTTP) performance.
KeepAlive
Keepalive is an HTTP convention that allows successive HTTP requests to reuse the same TCP connection.
At the very least, the one round trip required by the TCP three-way handshake is avoided, saving tens or hundreds of milliseconds per request. A further, less-advertised benefit of keepalive is that it preserves the current TCP congestion window size across requests, resulting in fewer CWND exhaustion events.
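A minimal sketch of keepalive in action follows; the address is a placeholder and the response handling is deliberately naive (a real client parses Content-Length or chunked framing to delimit responses). Only the first request pays for the TCP handshake:

```c
/* Sketch: two HTTP requests sharing one kept-alive TCP connection. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

static void do_request(int fd, const char *host, const char *path)
{
    char req[256], buf[4096];
    snprintf(req, sizeof(req),
             "GET %s HTTP/1.1\r\nHost: %s\r\nConnection: keep-alive\r\n\r\n",
             path, host);
    write(fd, req, strlen(req));
    read(fd, buf, sizeof(buf)); /* naive: assumes one read per response */
}

int main(void)
{
    struct sockaddr_in addr = { .sin_family = AF_INET,
                                .sin_port   = htons(80) };
    inet_pton(AF_INET, "192.0.2.1", &addr.sin_addr); /* placeholder addr */

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    connect(fd, (struct sockaddr *)&addr, sizeof(addr)); /* one handshake */
    do_request(fd, "example.com", "/first");
    do_request(fd, "example.com", "/second"); /* reuses the connection,
                                                 and its grown CWND */
    close(fd);
    return 0;
}
```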
Pipelining
Figure 5: HTTP pipelining
HTTP pipelining takes keepalive a step further: multiple requests are sent over the connection without waiting for the corresponding responses. In effect, pipelining amortizes one network round trip of latency across many HTTP transactions. For example, 5 pipelined HTTP requests over a connection with a 100-millisecond RTT incur an average round-trip latency of 20 milliseconds; under the same conditions, the average latency of 10 pipelined requests drops to 10 milliseconds.
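Building on the keepalive sketch above, pipelining amounts to issuing every request before reading any response; the response draining below is illustrative only, since a real client must parse message framing to tell the in-order responses apart:

```c
/* Sketch: pipelined HTTP requests. All writes happen up front, so the
 * server can answer back to back within roughly one round trip. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static void pipeline(int fd, const char *host, const char **paths, int n)
{
    char buf[4096];

    for (int i = 0; i < n; i++) {   /* send all requests, read nothing */
        char req[256];
        snprintf(req, sizeof(req),
                 "GET %s HTTP/1.1\r\nHost: %s\r\n\r\n", paths[i], host);
        write(fd, req, strlen(req));
    }
    /* Responses arrive in request order; naive drain for illustration. */
    while (read(fd, buf, sizeof(buf)) > 0)
        ;
}
```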
However, HTTP pipelining has significant drawbacks that have kept it from wide adoption: a history of spotty HTTP proxy support and susceptibility to denial-of-service attacks.
Transport Layer Security
Transport Layer Security (TLS) is a session-oriented network protocol that allows sensitive information to be exchanged securely over a public network.
Although TLS is highly effective at securing communications, its performance suffers on high-latency networks.
TLS uses a complex handshake protocol that involves two exchanges of client-server messages. This is why HTTP transfers secured by TLS are noticeably slower: finding that TLS is slow is usually really a complaint about the multiple round-trip delays of its handshake protocol.
DNS
Figure 6: DNS query
Most major platforms provide a cache implementation to avoid frequent DNS queries. The semantics of DNS caching are simple: each DNS response includes a time-to-live (TTL) attribute declaring how long the result may be cached. TTLs usually range from a few seconds to a few days, but are typically a few minutes. Very low TTL values, usually under a minute, are used to effect load distribution or to shorten the downtime of a server replacement or an ISP failover.
Refresh on Failure
Highly available systems usually rely on redundant infrastructure hosted at distinct IP addresses. DNS entries with low TTL values can reduce the time clients spend pointing at a failed host, but they also trigger a large number of extra DNS queries. The TTL value is thus a trade-off between minimizing downtime and maximizing client performance.
It generally makes no sense to degrade client performance routinely when server failure is the exception. There is a simple way out of the problem: instead of strictly honoring the TTL, flush the DNS cache only when a higher-level protocol, such as TCP or HTTP, detects an unrecoverable error. This technique behaves like a TTL-respecting cache in most scenarios, yet it all but eliminates the performance penalty of DNS-based high-availability solutions.
Note, however, that this technique is incompatible with DNS-based distributed-load schemes.
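A sketch of the refresh-on-failure idea, with illustrative structure and helper names rather than a real API:

```c
/* Sketch: a DNS cache entry whose TTL is deliberately ignored. The
 * entry is invalidated only when a higher-level protocol (a TCP
 * connect, an HTTP exchange) reports an unrecoverable error. */
#include <netinet/in.h>
#include <string.h>

struct dns_entry {
    char           host[256];
    struct in_addr addr;
    int            valid;
};

/* Serve the cached address even if its TTL expired long ago. */
const struct in_addr *cache_lookup(struct dns_entry *e, const char *host)
{
    if (e->valid && strcmp(e->host, host) == 0)
        return &e->addr;
    return NULL; /* miss: the caller performs a real DNS query */
}

/* Invoked when TCP/HTTP against this host fails unrecoverably. */
void cache_flush_on_error(struct dns_entry *e)
{
    e->valid = 0; /* the next lookup misses and triggers a fresh query */
}
```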
Asynchronous Refresh
Asynchronous refresh is a DNS caching approach that honors the TTL as set, yet largely eliminates the latency of frequent DNS queries. It requires an asynchronous DNS client library such as c-ares.
The approach is simple: a request for an expired entry still returns the stale result, but a non-blocking DNS query is kicked off in the background to refresh the cache. Because stale entries never force a blocking (i.e., synchronous) query, the approach is nearly immune to DNS query latency, while remaining compatible with the great majority of DNS-based failover and distributed-load schemes.
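Here is a minimal sketch of the background half of the technique using c-ares. While this query runs, the application keeps serving the stale cached address; the callback is where a (hypothetical) update_cache helper would store the fresh result:

```c
/* Sketch: a non-blocking background DNS refresh with c-ares. */
#include <ares.h>
#include <netdb.h>
#include <stdio.h>
#include <sys/select.h>

static void refresh_done(void *arg, int status, int timeouts,
                         struct hostent *host)
{
    (void)arg; (void)timeouts;
    if (status == ARES_SUCCESS && host && host->h_addr_list[0]) {
        /* update_cache(host) would go here (hypothetical helper) */
        printf("refreshed %s\n", host->h_name);
    }
}

int main(void)
{
    ares_channel channel;
    ares_library_init(ARES_LIB_INIT_ALL);
    ares_init(&channel);

    /* Kick off the lookup; this call returns immediately. */
    ares_gethostbyname(channel, "example.com", AF_INET, refresh_done, NULL);

    /* Drive c-ares's sockets; in a real program this would be folded
     * into the application's existing event loop, not a private one. */
    for (;;) {
        fd_set readers, writers;
        FD_ZERO(&readers);
        FD_ZERO(&writers);
        int nfds = ares_fds(channel, &readers, &writers);
        if (nfds == 0)
            break; /* no queries outstanding */
        struct timeval tv, *tvp = ares_timeout(channel, NULL, &tv);
        select(nfds, &readers, &writers, NULL, tvp);
        ares_process(channel, &readers, &writers);
    }

    ares_destroy(channel);
    ares_library_cleanup();
    return 0;
}
```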
Summary
To reduce the impact of high latency on mobile networks, the number of network round trips must be reduced, since each one dramatically compounds the latency of the mobile network. Software optimizations that focus on minimizing or eliminating round-trip protocol messages are the key to overcoming this daunting performance problem.
(End of the mobile network performance chapter. Translated from The Performance of Open Source Software: Secrets of Mobile Network Performance.)