Reproduced TCP bottleneck analysis for HTTP requests

Source: Internet
Author: User
Tags: ack, rfc


This article looks at how TCP affects HTTP: the three-way handshake, flow control (rwnd, the receive window), slow start (cwnd, the congestion window), head-of-line blocking, and more.

It is essentially my reading notes for chapters 1 and 2 of the "Web Performance Authoritative Guide" (High Performance Browser Networking), plus some extensions. The book is excellent and highly recommended.

High bandwidth and low latency

Two things have a decisive impact on all network traffic: latency and bandwidth.

    • Latency: the time it takes a message to travel from its source to its destination.
    • Bandwidth: the maximum throughput of a logical or physical communication path.

Components of latency:
    • Propagation delay: the time for a message to travel from sender to receiver (bounded by the speed of light).
    • Transmission delay: the time needed to push all of a message's bits onto the link; a function of message length and link rate (a 10 Mbit/s line and a 1 Mbit/s line take very different times).
    • Processing delay: the time to process packet headers, check for bit errors, and determine the packet's destination (routing).
    • Queuing delay: the time a packet spends waiting in queue before it can be processed.
Propagation delay

Light travels through optical fiber at roughly 200,000,000 meters per second, corresponding to a refractive index of about 1.5. Even at that speed, a round trip (RTT) from New York to Sydney takes roughly 160 ms in theory, and the path a real packet travels is much longer, with every hop along the route adding routing, processing, queuing, and transmission delays. As a result, the actual New York to Sydney RTT is about 200-300 ms.
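As a sanity check on the numbers above, the theoretical propagation delay can be computed directly. This is a sketch; the ~16,000 km New York-Sydney path length is an assumption, and real fiber routes are longer.

```python
# Theoretical propagation delay, assuming light in fiber travels at
# ~200,000 km/s (refractive index ~1.5) and a ~16,000 km great-circle
# path between New York and Sydney (an assumption for illustration).
SPEED_IN_FIBER_KM_S = 200_000
NY_SYD_KM = 16_000

one_way_ms = NY_SYD_KM / SPEED_IN_FIBER_KM_S * 1000  # one-way propagation
rtt_ms = 2 * one_way_ms                              # theoretical round trip

print(f"one-way ~{one_way_ms:.0f} ms, RTT ~{rtt_ms:.0f} ms")
# Real-world RTTs of 200-300 ms add routing, processing, and queuing
# delays on top of this lower bound.
```

The gap between the ~160 ms theoretical floor and the observed 200-300 ms is exactly the per-hop overhead the text describes.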

One-way latency on the China-US backbone is roughly 60 ms, so for a Chinese user accessing a US host the theoretical round-trip transmission delay (RTT) already exceeds 120 ms.

Core network bandwidth

The backbone, the fiber links forming the core data paths, can move hundreds of terabits per second — for example, the submarine cables between China and the US. Optical fiber is a "light pipe" that carries optical signals; copper wire carries electrical signals, but suffers greater signal loss and electromagnetic interference, and costs more to maintain.

With wavelength-division multiplexing (WDM, Wavelength-Division Multiplexing), a single fiber can carry many different wavelengths (channels) of light simultaneously. As of early 2010, researchers could multiplex more than 400 wavelengths into one fiber, with a maximum capacity of 171 Gbit/s per channel and a total per-fiber bandwidth approaching 70 Tbit/s.
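The ~70 Tbit/s per-fiber figure follows directly from the per-channel numbers quoted above:

```python
# Aggregate WDM fiber capacity from the figures in the text:
# ~400 wavelengths (channels) per strand, up to 171 Gbit/s per channel.
channels = 400
gbit_per_channel = 171

total_tbit = channels * gbit_per_channel / 1000  # Tbit/s per fiber strand
print(f"~{total_tbit:.1f} Tbit/s")
```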

Last-mile latency and traceroute

Backbone lines can offer terabits of bandwidth, but capacity at the network edge is far smaller and depends heavily on the deployment technology: dial-up, DSL, cable, various wireless technologies, and fiber-to-the-home. Akamai publishes a global bandwidth report every quarter.

With the traceroute tool, you can inspect the routing topology. Last-mile latency varies significantly with the provider, the deployment method, the network topology, and even the time of day. As an end user, choosing the lowest-latency ISP is the single most effective way to improve your internet speed.

Traceroute sends out three packets per TTL increment. Each column shows the time it took to get one packet back (the round-trip time).

traceroute to … (…), … hops max, … byte packets
 1  198.11.175.248   …  0.898 ms  0.950 ms
 2  10.107.64.22  … / 10.107.64.18  …
 3  198.11.128.162  … / 198.11.128.154  … / 198.11.128.178  …
 4  218.30.53.93  … / 218.30.53.97  … / 218.30.53.126  …
 5  202.97.51.253   …  160.077 ms
 6  202.97.35.105   …  190.518 ms  188.903 ms
 7  202.97.33.37    …  168.109 ms  168.016 ms
 8  202.97.55.14    …  192.572 ms  192.591 ms
 9  220.191.135.106 …  201.542 ms  201.465 ms
10  115.236.101.209 …  211.305 ms  *
…

Note how the RTT jumps from under 1 ms at the first few hops to roughly 160-250 ms once the path crosses the trans-Pacific link.

High bandwidth
There is no reason to expect bandwidth to stop growing; even if the technology stagnates, more cables can simply be laid.

Low Latency

Reducing latency is much harder than increasing bandwidth, and in many ways our infrastructure already seems to have hit its limits. That makes understanding and tuning network protocols especially important.

TCP three-way handshake

The client can send application data immediately after sending its ACK, while the server must wait until it receives that ACK before sending any data. This startup sequence applies to every TCP connection, so it has a significant performance impact on every application that uses TCP: a full round trip must pass before any application data can flow.

With an RTT of at least 120 ms between China and the US, a simple HTTP request needs the handshake plus one request/response round trip = 240 ms, half of it wasted on the handshake. The key to improving TCP application performance, then, is to reuse connections wherever possible.
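The 50% overhead claim above is simple arithmetic; a minimal sketch, assuming the 120 ms China-US RTT stated in the text:

```python
# Cost of the three-way handshake for a single HTTP request over a
# high-latency link (China <-> US, RTT >= 120 ms as assumed above).
rtt_ms = 120

handshake_ms = rtt_ms              # one full round trip before any app data
request_ms = rtt_ms                # request out + response back
total_ms = handshake_ms + request_ms

overhead = handshake_ms / total_ms
print(f"total {total_ms} ms, {overhead:.0%} spent on the handshake")
```

On a reused connection the handshake term disappears entirely, which is why connection reuse matters so much on high-latency paths.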

Extension: the TFO (TCP Fast Open) mechanism is designed specifically to reduce the performance penalty of new TCP connections.

Flow control (rwnd, receive window)

rwnd is an end-to-end control mechanism that prevents a sender from overwhelming its peer. Each side of a TCP connection advertises its own receive window, which tells the other side how much buffer space is available to hold incoming data. Throughout the lifetime of the connection, every ACK carries the latest rwnd value, so both ends can dynamically adjust the data rate to match each side's capacity and processing power.

The window field is only 16 bits, so the original rwnd could not exceed 65,535 bytes (64 KB). TCP window scaling (RFC 1323), now enabled almost everywhere, raises the maximum receive window from 65,535 bytes to 1 GB. On Linux you can check and enable the window scaling option with the following commands:

$> sysctl net.ipv4.tcp_window_scaling
$> sysctl -w net.ipv4.tcp_window_scaling=1

Setting rwnd

For transmission performance, the bigger the value the better, in principle. On Linux, the receive buffer size, which bounds the size of the receive window, is controlled through kernel parameters:

Shell> sysctl -a | grep mem

But won't memory blow up when a large number of connections arrive at once? Generally not: Linux automatically tunes buffer sizes, floating the actual window between the configured minimum and maximum to strike a balance between performance and resource use.

You can check whether the buffer auto-tuning mechanism is enabled (0: off, 1: on) as follows:
Shell> sysctl -a | grep tcp_moderate_rcvbuf
Slow start (cwnd, congestion window)

Flow control between the two ends does prevent the sender from overwhelming the receiver, but nothing stops either end from overwhelming the underlying network. In other words, neither the sender nor the receiver knows how much bandwidth is available when the connection starts, so TCP needs a mechanism to estimate it and to adapt the sending rate as network conditions change. The maximum data TCP can have in flight = min(rwnd, cwnd).

The slow-start algorithm works as follows (cwnd = congestion window):

    • 1) When the connection is established, cwnd is initialized to 1, meaning one MSS-sized segment may be in flight.
    • 2) Each received ACK increments cwnd by one segment.
    • 3) Because every segment in a round trip is acknowledged, cwnd doubles each RTT: exponential growth.
    • 4) There is also ssthresh (the slow-start threshold), an upper bound; once cwnd >= ssthresh, the connection switches to the congestion-avoidance algorithm (described later).

Initially, cwnd was just 1 TCP segment. In April 1999, RFC 2581 raised it to 4 segments, and in April 2013 RFC 6928 raised it again to 10 segments.

Slow start in action

The server sends 4 TCP segments to the client, then must stop and wait for acknowledgments. Thereafter, for every ACK received, the slow-start algorithm lets the server add 1 segment to its cwnd. This phase is called exponential growth, because client and server are rapidly converging on the available bandwidth of the network path between them.
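The per-ACK rule above is what produces per-RTT doubling; a minimal simulation makes the connection explicit:

```python
# Per-ACK growth in slow start, as described above: every ACK adds one
# segment to cwnd, so cwnd roughly doubles each round trip.
def slow_start_rounds(initial_cwnd, rounds):
    """Return cwnd (in segments) at the start of each round trip."""
    cwnd = initial_cwnd
    history = [cwnd]
    for _ in range(rounds):
        acks = cwnd       # one ACK per segment delivered this RTT
        cwnd += acks      # cwnd += 1 per ACK  ->  doubling per RTT
        history.append(cwnd)
    return history

print(slow_start_rounds(4, 4))  # [4, 8, 16, 32, 64]
```

In a real stack, growth stops earlier if ssthresh, rwnd, or the amount of data to send is reached first.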

The time required to reach a given window size is:

Time = RTT × ceil( log2( N / initial cwnd ) )

    • Client and server receive windows: 65,535 bytes (64 KB);
    • Initial congestion window: 4 segments (a segment is typically 1,460 bytes);
    • Round-trip time: 56 ms (London to New York).

To reach the 64 KB limit, the congestion window must grow to 45 segments, which takes 224 ms (four round trips).
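The 224 ms figure drops straight out of the doubling-per-RTT formula; a sketch under the stated assumptions (56 ms RTT, 4-segment initial window, 1,460-byte segments):

```python
import math

# Time for cwnd to grow from the initial window to a target size,
# assuming one doubling per round trip:
#   time = RTT * ceil(log2(target_segments / initial_cwnd))
def time_to_window_ms(rtt_ms, initial_cwnd, target_segments):
    return rtt_ms * math.ceil(math.log2(target_segments / initial_cwnd))

# 64 KB receive window / 1460-byte segments ~= 45 segments
print(time_to_window_ms(56, 4, 45))  # 224 (ms), London <-> New York
```

With the newer 10-segment initial window the same target needs fewer doublings, which is the motivation behind RFC 6928.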

Impact of Slow start

Every TCP connection must go through the slow-start phase, no matter how much bandwidth is available. In other words, a connection cannot exploit its maximum bandwidth right away.

Slow start means it takes hundreds of milliseconds for a client and server to approach the connection's maximum speed. For large streaming downloads the impact is minor, because the slow-start cost is amortized over the whole transfer.

For many HTTP connections, however — especially short, bursty ones — the transfer often ends before the maximum window is ever reached. In other words, the performance of many web applications is constrained by the round-trip time between server and client: slow start caps the available throughput, which is very bad for small file transfers.

Slow start's impact on HTTP: a worked example

Suppose a 20 KB file is transferred over HTTP, with these initial conditions:

    • Round-trip time: 56 ms.
    • Client-to-server bandwidth: 5 Mbit/s.
    • Client and server receive windows: 65,535 bytes.
    • Initial congestion window: 4 segments (4 × 1,460 bytes ≈ 5.7 KB).
    • Server processing time to generate the response: 40 ms.
    • No packets are lost, every packet is acknowledged, and the GET request fits in 1 segment.

    • 0 ms: the client sends a SYN packet to start the TCP handshake.
    • 28 ms: the server replies with SYN-ACK and advertises its rwnd.
    • 56 ms: the client ACKs the SYN-ACK, advertises its rwnd, and immediately sends the HTTP GET request.
    • 84 ms: the server receives the HTTP request.
    • 124 ms: the server finishes generating the 20 KB response and sends 4 TCP segments (the initial cwnd is 4), then waits for an ACK.
    • 152 ms: the client receives the 4 segments and ACKs each one.
    • 180 ms: the server increments cwnd for each ACK and sends 8 segments.
    • 208 ms: the client receives the 8 segments and ACKs each one.
    • 236 ms: the server increments cwnd for each ACK and sends the remaining segments.
    • 264 ms: the client receives the remaining segments and ACKs each one.

For comparison, reuse the connection and issue the same request again:

    • 0 ms: the client sends the HTTP request.
    • 28 ms: the server receives the HTTP request.
    • 68 ms: the server generates the 20 KB response; cwnd is already larger than the 15 segments needed to send the file, so it sends all data segments at once.
    • 96 ms: the client receives all 15 segments and ACKs each one.

Same connection, same request, but without the three-way handshake and slow start it completes in 96 ms — a performance improvement of 275%!
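The two timelines above reduce to a few additions; a sketch using the stated assumptions (56 ms RTT, 40 ms server processing, 4-segment initial cwnd):

```python
# Cold vs. reused connection for the 20 KB transfer worked above.
RTT, HALF, PROC = 56, 28, 40  # full RTT, one-way delay, server processing (ms)

# Cold start: handshake, then three send/ACK rounds under slow start.
cold_ms = (RTT          # handshake done, ACK + GET sent at 56 ms
           + HALF       # request reaches server at 84 ms
           + PROC       # response ready, 4 segments sent at 124 ms
           + HALF       # 4 segments arrive at 152 ms, ACKed
           + RTT        # ACKs back, 8 segments out and arriving at 208 ms
           + RTT)       # remaining segments arrive at 264 ms

# Warm connection: no handshake, cwnd > 15 segments, one round suffices.
warm_ms = HALF + PROC + HALF

print(cold_ms, warm_ms, f"{cold_ms / warm_ms:.0%}")
```

The warm path is bounded only by one round trip plus processing time, which is the whole argument for connection reuse.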

Choosing the initial congestion window

Google has researched this area extensively; weighing efficiency against stability, its final recommendation was 10 MSS. If your Linux kernel is not too old, you can adjust the initial cwnd per route, for example (an illustrative command — substitute your own default route and interface):

Shell> ip route change default via <gateway> dev eth0 initcwnd 10
Keep in mind that raising cwnd unilaterally is not necessarily effective: as noted earlier, the amount of unacknowledged data in flight is bounded by the smaller of rwnd and cwnd, so a small receive window on the peer will keep a large cwnd from being used.

Congestion avoidance

The congestion-avoidance algorithm treats packet loss as the signal of network congestion: somewhere along the path, a link or router was overloaded and was forced to drop packets. The window must therefore be reduced, to avoid causing further loss and to keep the network flowing.

After the congestion window is reset, congestion avoidance grows it again according to its own algorithm, trying to avoid further drops. At some point another packet is lost and the process starts over. If you have ever looked at the throughput curve of a TCP connection and wondered why it is jagged, this is the reason: congestion control and congestion avoidance keep adjusting the congestion window to work around packet loss in the network.

Originally, TCP used the AIMD algorithm (Additive Increase, Multiplicative Decrease): when packet loss occurs, the congestion window is halved, and thereafter it grows by a fixed amount each round trip. AIMD is often too conservative, though, so many newer algorithms have appeared — for example DSACK, which lets the protocol distinguish why a packet was retransmitted or lost.

Bandwidth Delay Product

The maximum amount of unacknowledged data in flight between sender and receiver is the smaller of the congestion window (cwnd) and the receive window (rwnd). The receive window is sent with every ACK, while the congestion window is adjusted dynamically by the sender based on congestion control and avoidance.

BDP (bandwidth-delay product)
The product of a data link's capacity and its end-to-end latency: the maximum amount of data that can be in flight, unacknowledged, at any given moment. Once either side has that much unacknowledged data outstanding, it must stop and wait for the other side to ACK some of it before sending more.

So how large do the flow-control window (rwnd) and congestion-control window (cwnd) need to be? The arithmetic is simple. Suppose the minimum of cwnd and rwnd is 16 KB and the round-trip time is 100 ms:

Regardless of the actual bandwidth available to sender and receiver, this TCP connection's transfer rate cannot exceed 1.31 Mbit/s! To increase throughput, either raise the minimum window or lower the round-trip time. To fully utilize a 10 Mbit/s link at that RTT, the window needs to be at least 122.1 KB — and without window scaling (RFC 1323), the TCP receive window tops out at 64 KB.
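Both numbers above come from the same BDP relation; a minimal calculator, assuming the 16 KB window and 100 ms RTT from the example:

```python
# Bandwidth-delay product: the throughput ceiling imposed by a given
# window, and the window needed to keep a given link full.
def max_throughput_mbps(window_bytes, rtt_s):
    """Throughput cap (Mbit/s) for a window of window_bytes at rtt_s."""
    return window_bytes * 8 / rtt_s / 1_000_000

def window_needed_kb(bandwidth_mbps, rtt_s):
    """Window (KB) required to saturate bandwidth_mbps at rtt_s."""
    return bandwidth_mbps * 1_000_000 * rtt_s / 8 / 1024

# 16 KB window, 100 ms RTT -> throughput ceiling (~1.31 Mbit/s)
print(f"{max_throughput_mbps(16 * 1024, 0.1):.2f} Mbit/s")
# Window needed to saturate 10 Mbit/s at 100 ms RTT (~122.1 KB)
print(f"{window_needed_kb(10, 0.1):.1f} KB")
```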

Delay caused by head-of-line blocking

Every TCP packet is sent with a unique sequence number, and all packets must be delivered to the receiver in order. If a packet in the middle fails to reach the receiver, all subsequent packets are held in the receiver's TCP buffer until the lost packet is retransmitted and arrives. All of this happens inside the TCP layer: the application knows nothing about the retransmissions or the packets queued in the buffer, and must wait for every packet to arrive before it can access the data. Until then, it simply experiences a delay when reading from the socket. This effect is known as TCP head-of-line (HOL) blocking.

Head-of-line blocking lets applications keep their code simple, with no packet reordering or reassembly to worry about. But that simplicity comes at a price: unpredictable variation in packet arrival times, commonly called jitter.

In fact, in some scenarios tolerating packet loss is the key to achieving the best performance. Applications such as voice or game-state communication can tolerate losing a certain number of packets and need neither reliable nor in-order delivery.

Recommendations for TCP optimization

The details of each algorithm and feedback mechanism may keep evolving, but the core principles and their implications are constant:

    • The TCP three-way handshake adds a full round trip of latency;
    • TCP slow start applies to every new connection;
    • TCP flow and congestion control regulate the throughput of every connection;
    • TCP throughput is governed by the current congestion window size.

As a result, in modern high-speed networks the transfer rate of a TCP connection is usually limited by the round-trip time between receiver and sender. And while bandwidth keeps growing, latency is bounded below by the speed of light and is already within a small constant factor of that minimum. In most cases, the TCP bottleneck is latency, not bandwidth.

Server Configuration Tuning
    • Increase TCP's initial congestion window
      A larger initial cwnd lets TCP transfer more data in the first round trip, and subsequent growth accelerates noticeably. This is an especially critical optimization for short, bursty connections.

    • Disable slow-start restart
      Disabling slow-start restart after idle (on Linux, net.ipv4.tcp_slow_start_after_idle=0) improves the performance of long-lived TCP connections that send data in bursts.

    • Window scaling (RFC 1323)
      Enabling window scaling raises the maximum receive window size, letting high-latency connections achieve better throughput.

    • TCP Fast Open
      Allows application data to be sent in the initial SYN packet under certain conditions. TFO (TCP Fast Open) is a newer optimization that requires support on both client and server; investigate whether your application can take advantage of it.

The settings above, combined with an up-to-date kernel, give each TCP connection the best possible latency and throughput.

Application behavior tuning
    • No bit is faster than one that is never sent: send less data.
    • We cannot make data travel faster, but we can make it travel a shorter distance.
    • Reusing TCP connections is critical to improving performance.
Performance checklist
    • Upgrade the server kernel to the latest version (Linux: 3.2+);
    • Make sure the initial cwnd is 10;
    • Disable slow-start restart after idle;
    • Make sure window scaling is enabled;
    • Eliminate redundant data transfers;
    • Compress the data being transferred;
    • Position servers closer to users to reduce round-trip time;
    • Reuse established TCP connections whenever possible.

Related reading:
    • US data centers and the state of China-US network latency
    • On TCP optimization (measuring and calculating rwnd and cwnd)
    • The things about TCP (part 1)
    • What is the TCP/IP stack
