"Heavy" mobile network performance Disclosure (next)--network protocol and Performance Improvement practice


Performance of network protocols

Now we turn our attention to what we can actually control.

Network protocol performance degrades out of proportion to increases in latency. This is because the intrinsic operation of most network protocols involves bidirectional exchanges of information. The remainder of this chapter focuses on understanding why these exchanges occur and how their frequency can be reduced or even eliminated.

Figure 3: Network protocol

TCP

Transmission Control Protocol (TCP) is a connection-oriented transport protocol built on IP. TCP affords an error-free, full-duplex communication channel that is essential to other protocols, such as HTTP and TLS.

TCP exhibits many of the round-trip exchanges we want to avoid. Some can be eliminated by adopting protocol extensions such as TCP Fast Open, while others can be minimized by tuning system parameters, such as the initial congestion window. In this section we explore both approaches and provide some background on TCP internals.

TCP Fast Open

Establishing a TCP connection conventionally requires three exchanges of information, the familiar three-way handshake. TCP Fast Open (TFO) is an extension of TCP that eliminates the round-trip delay normally incurred by that handshake.

The TCP three-way handshake negotiates parameters between client and server that make robust two-way communication possible. The initial SYN (synchronize) message represents the client's connection request; if the server accepts, it replies with a SYN-ACK (synchronize-acknowledge) message; finally, the client answers the server with an ACK (acknowledge) message. At this point a logical connection exists and the client may send data. If you have been counting, the three-way handshake introduces at least one RTT of delay before any application data can flow; it would be nice to avoid that.

Figure 4: TCP three-way handshake

Traditionally, the only way to avoid the delay of the three-way handshake was to reuse existing connections. That changed with the introduction of the TCP Fast Open IETF specification.

TFO allows the client to start sending data before the logical connection is established, effectively negating the round-trip delay of the three-way handshake. The cumulative effect of this optimization is impressive: according to Google's research, TFO can reduce page load times by up to 40%. Although the specification is still a draft, TFO is already supported by mainstream browsers (Chrome 22 and later) and platforms (Linux 3.6 and later), and other vendors have pledged full support in the near future.

TCP Fast Open is a modification of the three-way handshake that allows a small data payload (such as an HTTP request) to ride inside the SYN message. This payload is passed to the application server while the connection handshake completes as usual.

Earlier extension proposals like TFO ultimately failed because of security concerns. TFO addresses these with a security token, or cookie: the client is assigned the token during a conventional TCP connection handshake and is expected to include it in the SYN message of any subsequent TFO-optimized request.

There are a few small caveats to using TFO. The most notable is the lack of idempotency guarantees for the data sent in the initiating SYN message. Although TCP ensures that duplicated packets (duplication occurs frequently) are ignored by the receiver, that guarantee does not extend to the connection handshake itself. A solution is being standardized in the draft specification, but in the meantime TFO can still be safely applied to idempotent transactions.
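To make this concrete, here is a minimal sketch of TFO at the sockets layer on Linux, in Python. It assumes a kernel with TFO enabled (the net.ipv4.tcp_fastopen sysctl) and uses placeholder addresses; it is an illustration of the mechanism, not production code.

    import socket

    # Server side: enable TFO on the listening socket. The option value is
    # the maximum number of pending TFO requests the kernel will queue.
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.setsockopt(socket.IPPROTO_TCP, socket.TCP_FASTOPEN, 16)
    srv.bind(("0.0.0.0", 8080))
    srv.listen()

    # Client side: MSG_FASTOPEN connects and sends in one call, carrying the
    # payload (here, an idempotent HTTP GET) inside the SYN segment itself.
    cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    request = b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n"
    cli.sendto(request, socket.MSG_FASTOPEN, ("203.0.113.10", 8080))

Note that the very first connection to a given server still pays the full handshake, because that is when the client obtains its TFO cookie; only subsequent connections can carry data in the SYN.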

Initial congestion window

The initial congestion window is a configurable TCP setting with great potential for accelerating small network transactions.

A recent IETF specification promotes raising the common initial congestion window setting from 3 message segments (i.e., packets) to 10 segments. The proposal is based on extensive research conducted by Google demonstrating that the change yields an average performance improvement of about 10%. But the purpose and potential impact of this setting cannot truly be understood without an introduction to the TCP congestion window (cwnd).

TCP guarantees reliability to client and server even when operating over an unreliable network. This amounts to a promise that all data sent will be received, or at least appear to be. Packet loss is the biggest obstacle to meeting that promise, and it demands detection, correction, and prevention.

TCP uses a positive acknowledgement scheme to detect packet loss: every packet sent must be acknowledged by its intended receiver, and a missing acknowledgement implies the packet was lost in transit. While awaiting acknowledgement, transmitted packets are kept in a special buffer called the congestion window. When this buffer fills, an event known as cwnd exhaustion occurs, and all transmission stops until the receiver's acknowledgements free up space to send more packets. These events play a critical role in TCP performance.

Apart from network bandwidth limits, TCP throughput is essentially bounded by the frequency of cwnd exhaustion events, which in turn depends on the size of the congestion window. Peak TCP performance requires a congestion window tuned to the current state of the network: too large, and the network risks congestion, i.e. overcrowding that causes heavy packet loss; too small, and precious network bandwidth goes unused. Logically, the more that is known about the network, the better the choice of congestion window size. In reality, key network attributes such as capacity and latency are hard to measure and constantly changing, and an Internet-based TCP connection often traverses many networks, which complicates matters further.

Lacking any means to accurately determine network capacity, TCP instead infers the congestion window size from network congestion itself. When TCP discovers that packets are being lost, it shrinks the congestion window, since the loss suggests that somewhere down the line a network cannot handle the current transmission rate. Through this congestion-avoidance approach, TCP eventually settles at a point that minimizes cwnd exhaustion events while consuming a fair share of the available capacity. Now, at last, we have reached the point of explaining the importance of the initial congestion window parameter.

Network congestion can only be detected after packets have been lost. A new or idle connection lacks the packet-loss data needed to justify any particular congestion window size, so TCP takes the sensible approach of starting with the congestion window size least likely to cause congestion. Originally this meant a setting of 1 segment (about 1,480 bytes), and this is sometimes still recommended; later experimentation demonstrated that a setting as high as 4 segments is also workable. In practice, the initial congestion window is usually found set to 3 segments (roughly 4 KB).

A small initial congestion window is hard on small network transactions. The effect is easy to illustrate. With the 3-segment setting, cwnd exhaustion occurs after 3 packets, about 4 KB of data, have been sent. Assuming the packets are sent back to back, acknowledgements will not arrive sooner than one round-trip time (RTT); if the RTT is 100 ms, the effective transfer rate is a dismal 400 kbit/s or so. Although TCP adjusts its congestion window to take full advantage of the available capacity, it starts out slowly. In fact, this approach is called slow start.
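The arithmetic behind that figure, treating the congestion window as a hard cap on the data in flight per round trip:

    max throughput ≈ cwnd / RTT
                   = (3 segments x 1,480 bytes) / 100 ms
                   ≈ 44,400 bytes/s
                   ≈ 355 kbit/s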

To reduce the performance impact of slow start on smaller downloads, the risk/reward trade-off of the initial congestion window deserves reassessment. That is exactly what Google did, finding that an initial congestion window of 10 segments (about 14 KB) maximizes throughput while minimizing network congestion. Real-world results show this setting can reduce page load times by about 10% overall; connections with higher round-trip latencies improve even more.

Modifying the default value of the initial congestion window is not entirely straightforward. Under most server operating systems it is a system-wide setting that only a privileged user can change, and on the client it can rarely, if ever, be configured by an unprivileged application. Note that a larger initial congestion window on the server speeds up downloads, whereas on the client it speeds up uploads. The inability to change this setting on the client side means that special effort should go into reducing the size of request payloads.
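On the server side, where the setting can be changed, the initial congestion window on Linux is a per-route attribute managed with the ip tool. A sketch of a typical change, with a placeholder gateway and device (run as root, matching whatever ip route show reports for the default route):

    ip route change default via 192.168.1.1 dev eth0 initcwnd 10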

Hypertext Transfer Protocol

This section discusses techniques for mitigating the impact of high round-trip latency on Hypertext Transfer Protocol (HTTP) performance.

KeepAlive

KeepAlive is an HTTP convention that permits sequential HTTP requests to use the same TCP connection. At minimum, it avoids the three-way handshake round trip that each request would otherwise require, saving tens or hundreds of milliseconds per request. Less advertised is the additional benefit that KeepAlive preserves the current TCP congestion window between requests, resulting in fewer cwnd exhaustion events.
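The following short Python illustration reuses a single connection for several requests against a placeholder host; with HTTP/1.1, connections are persistent by default, so no special header is needed.

    import http.client

    # One TCP connection, three sequential requests: the handshake is paid
    # once, and the congestion window stays warm between requests.
    conn = http.client.HTTPConnection("example.com")
    for path in ("/", "/styles.css", "/app.js"):
        conn.request("GET", path)
        response = conn.getresponse()
        body = response.read()   # drain the body before reusing the connection
        print(path, response.status, len(body))
    conn.close()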

Pipelining

HTTP pipelining builds on KeepAlive by sending a batch of HTTP requests over the same connection without waiting for the individual responses, which the server then returns in the same order.

Figure 5: HTTP pipelining

In effect, pipelining amortizes the latency of one network round trip across multiple HTTP transactions. For example, 5 pipelined HTTP requests over a 100 ms RTT incur an average round-trip delay of 20 ms per request; under the same conditions, the average delay for 10 pipelined requests drops to 10 ms.
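A bare-bones sketch of the wire behavior, again in Python with a placeholder host: the requests are written back to back, and the responses arrive in the same order. A real client would need to parse the response framing properly rather than reading until the connection closes.

    import socket

    # 'Connection: close' on the last request makes the server close the
    # connection after the third response, ending the naive read loop below.
    requests = (
        b"GET /a.css HTTP/1.1\r\nHost: example.com\r\n\r\n"
        b"GET /b.js HTTP/1.1\r\nHost: example.com\r\n\r\n"
        b"GET /c.png HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n"
    )
    with socket.create_connection(("example.com", 80)) as sock:
        sock.sendall(requests)           # all three requests share one round trip
        data = b""
        while chunk := sock.recv(65536):
            data += chunk                # three responses, in request order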

However, HTTP pipelining has notable drawbacks that have prevented its wide adoption: historically uneven support among HTTP proxies, and exposure to denial-of-service attacks.

Transport Layer Security

Transport Layer Security (TLS) is a session-oriented network protocol that allows sensitive information to be exchanged securely over a public network. Although TLS is highly effective for secure communication, its performance suffers on high-latency networks.

TLS uses a complex handshake protocol that involves two additional exchanges of client-server messages. This is why TLS-secured HTTP transfers are noticeably slower: often, a complaint that TLS is slow is really a complaint about the latency of the extra round trips in its handshake protocol.

DNS

The Domain Name System (DNS) translates a host name into an IP address, and a fresh DNS query must complete before a client can even begin opening a connection.

Figure 6: DNS query

Most platforms provide a DNS cache implementation to avoid frequent DNS queries. The semantics of DNS caching are simple: each DNS response contains a time-to-live (TTL) attribute declaring how long the result may be cached. TTLs usually range from a few seconds to a few days, but are typically a few minutes. Very low TTL values, usually under a minute, are used to effect load distribution or to minimize the time needed for server replacement or ISP failover.

Refresh on failure

Highly available systems often rely on redundant infrastructure reachable at distinct IP addresses. A DNS entry with a low TTL can shorten the time clients keep pointing at a failed host, but it also triggers a large number of additional DNS queries. The TTL value is therefore a trade-off between minimizing downtime and maximizing client performance.

Degrading client performance for the sake of an exceptional event like server failure rarely makes sense. There is a simple fix: rather than strictly obeying the TTL, flush the DNS cache only when a higher-level protocol such as TCP or HTTP detects an unrecoverable error. This technique mimics TTL-based cache behavior in most scenarios while eliminating virtually all of the performance penalty of a DNS-based high-availability solution.
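A minimal sketch of the idea, using a process-local cache and a single retry; socket.gethostbyname stands in for whatever resolver the application actually uses.

    import socket

    _dns_cache = {}   # hostname -> address; no TTL, flushed only on failure

    def resolve(host):
        if host not in _dns_cache:
            _dns_cache[host] = socket.gethostbyname(host)
        return _dns_cache[host]

    def connect(host, port=80):
        try:
            return socket.create_connection((resolve(host), port), timeout=5)
        except OSError:
            # Unrecoverable transport error: flush the cached entry,
            # re-resolve, and retry once against the (possibly new) address.
            _dns_cache.pop(host, None)
            return socket.create_connection((resolve(host), port), timeout=5)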

However, it is important to note that this technique is incompatible with DNS-based load-distribution schemes.

Asynchronous refresh

Asynchronous refresh is a DNS caching approach that honors the TTL as set while largely eliminating the latency of repeated DNS queries. Implementing it requires an asynchronous DNS client library such as c-ares.

The method is simple: a request for an expired cache entry still returns the old result immediately, while a non-blocking DNS query is issued in the background to refresh the cache. Compared with a scheme that uses a blocking (i.e., synchronous) query for every expired entry, this approach is almost immune to DNS query latency, yet remains compatible with most DNS-based failover and load-distribution schemes.
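The same idea in a short thread-based sketch, standing in for a true asynchronous resolver such as c-ares; the TTL value and names here are illustrative only.

    import socket
    import threading
    import time

    TTL = 60.0      # in practice, honor the TTL carried by the DNS response
    _cache = {}     # hostname -> (address, fetched_at)
    _lock = threading.Lock()

    def _refresh(host):
        address = socket.gethostbyname(host)    # blocking, but off the hot path
        with _lock:
            _cache[host] = (address, time.monotonic())

    def resolve(host):
        with _lock:
            entry = _cache.get(host)
        if entry is None:
            _refresh(host)                      # only the very first lookup blocks
            with _lock:
                entry = _cache[host]
        address, fetched_at = entry
        if time.monotonic() - fetched_at > TTL:
            # Stale entry: return the old answer immediately and refresh
            # in the background so a later caller gets the new one.
            threading.Thread(target=_refresh, args=(host,), daemon=True).start()
        return address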

Summary

Reducing the impact of high latency on mobile networks comes down to reducing the number of network round trips, each of which magnifies that latency. Focusing software optimization on minimizing or eliminating round-trip protocol messages is the key to overcoming this daunting performance problem.

(This concludes the Mobile Network Performance chapter.)

Translated by Programmer Architecture from The Performance of Open Source Software | Secrets of Mobile Network Performance.

