The TCP TIME-WAIT State and Its Impact on Busy Servers

Source: Internet
Author: User

TCP has a TIME-WAIT state, which usually lasts about 2 minutes. A busy website often receives thousands of requests within two minutes. Suppose server A is twice as powerful as server B, but server A has thousands of connections sitting in the TIME-WAIT state. Server B will then be under great pressure during those two minutes.

The following explains the TIME_WAIT state:

MSL (maximum segment lifetime) is the maximum time a TCP segment can survive on the Internet. Every TCP implementation must choose a specific MSL value. RFC 1122 recommends 2 minutes, although some implementations have traditionally used values as low as 30 seconds.
The TIME_WAIT state lasts 2 * MSL, that is, between 1 and 4 minutes depending on the implementation. The IP header carries a TTL of at most 255. Although the TTL is a hop limit rather than a measure of time, we still assume that a packet with a TTL of 255 cannot survive on the Internet for longer than MSL. In transit, a TCP packet may be held up in buffers because of a routing fault, or it may take a non-optimal path. The sender's TCP then times out and retransmits. The earlier packet is called a "wandering duplicate", and the later one a "timeout retransmission". As a reliable, connection-oriented protocol, TCP must handle such duplicates correctly, because both copies may eventually arrive.

The following figure shows the termination of a common TCP connection:

A socket is closed through a four-way handshake in which the two ends exchange messages. When one end calls close(), it signals that the local end has no more data to send. It might seem that once the handshake completes, the socket should go straight to the CLOSED state. But there are two problems:
First, there is no mechanism to guarantee that the last ACK is delivered.
Second, residual packets (wandering duplicates, or old duplicate packets) may still be on the network, and we must be able to handle them properly.

If the last ACK is lost, the server resends its final FIN. The client must therefore keep state so that it can resend the ACK. If it did not keep this state, it would answer the retransmitted FIN with an RST, which the server would treat as an error. For TCP to do everything needed to terminate the data flow cleanly in both directions, all four segments of the four-way handshake must be delivered without loss. This is why the socket remains in the TIME_WAIT state after it is closed: it may have to resend the final ACK.
Suppose both sides of a connection have called close() and both have reached the CLOSED state, with no TIME_WAIT state in between. The following situation can then occur. A new connection is established using exactly the same IP addresses and ports as the previous one; such a connection is called an incarnation of the original connection. If a datagram from the original connection is still in the network, the new connection may receive it as if it were its own. To prevent this, TCP does not allow a new connection to be established on a socket pair that is in the TIME_WAIT state. In the TIME_WAIT state, the socket waits for twice the MSL. MSL is the time a datagram can spend travelling in one direction before it is considered lost, and a datagram may become a residual datagram either on the way out or in the reply, so discarding both a datagram and its reply takes twice the MSL. This means that by the time a new connection is successfully established, all residual datagrams from the previous incarnation have disappeared from the network.
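
The rule described above can be illustrated with a small simulation. This is only a sketch, not how any real TCP stack is implemented: the class name TimeWaitTable, its methods, and the explicit now argument are invented for illustration, and the MSL defaults to the 2 minutes mentioned earlier.

```python
from typing import Dict, Tuple

# (local_ip, local_port, remote_ip, remote_port) identifies one connection.
FourTuple = Tuple[str, int, str, int]


class TimeWaitTable:
    """Toy model (hypothetical helper): remembers closed connections for 2 * MSL."""

    def __init__(self, msl_seconds: float = 120.0) -> None:
        self.wait_seconds = 2 * msl_seconds
        self._expires_at: Dict[FourTuple, float] = {}

    def close(self, conn: FourTuple, now: float) -> None:
        """The side that closes first parks the 4-tuple in TIME_WAIT."""
        self._expires_at[conn] = now + self.wait_seconds

    def can_connect(self, conn: FourTuple, now: float) -> bool:
        """A new incarnation is allowed only once the 2 * MSL wait is over."""
        expires_at = self._expires_at.get(conn)
        if expires_at is not None and now < expires_at:
            return False
        self._expires_at.pop(conn, None)  # expired entries are forgotten
        return True


table = TimeWaitTable()
conn = ("10.0.0.1", 40000, "10.0.0.2", 80)
table.close(conn, now=0.0)
print(table.can_connect(conn, now=100.0))  # False: still inside the 240 s window
print(table.can_connect(conn, now=240.0))  # True: old duplicates have expired
```
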
Because of the problems associated with the TIME_WAIT state, we can set the SO_LINGER option with a linger time of zero to keep a socket from entering TIME_WAIT. The connection is then terminated by sending an RST instead of going through the normal four-way handshake. However, this is not a good idea. TIME_WAIT usually works in our favor.
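
For reference, this is roughly what the SO_LINGER approach looks like with Python's standard socket module. It is a minimal sketch under some assumptions: the helper names linger_option, abortive_close and demo are made up, and the struct linger layout is assumed to be two ints on Unix-like systems and two unsigned shorts on Windows. It is shown to make the mechanism concrete, not as a recommendation.

```python
import socket
import struct
import sys


def linger_option(on: bool, seconds: int) -> bytes:
    """Pack a struct linger (layout per platform is an assumption, see above)."""
    fmt = "HH" if sys.platform == "win32" else "ii"
    return struct.pack(fmt, int(on), seconds)


def abortive_close(sock: socket.socket) -> None:
    """Close with linger on and timeout 0: the stack sends RST, skipping TIME_WAIT."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, linger_option(True, 0))
    sock.close()


def demo() -> str:
    """Abort a loopback connection and report what the peer observes."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server_side, _ = listener.accept()
    abortive_close(client)
    server_side.settimeout(2.0)
    try:
        data = server_side.recv(1)
        result = "eof" if data == b"" else "data"
    except ConnectionResetError:
        result = "reset"  # the peer saw an RST instead of a normal FIN
    finally:
        server_side.close()
        listener.close()
    return result


if __name__ == "__main__":
    print(demo())
```

On Linux the peer typically sees a connection reset rather than an orderly end of stream, which is exactly why this technique can lose data and confuse the other side.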

Impact of the TIME_WAIT state on HTTP

According to the TCP protocol, the side that initiates the close enters the TIME_WAIT state, which lasts 2 * MSL (maximum segment lifetime), 240 seconds by default on Windows. Note that with HTTP, which runs over TCP, it is usually the server that closes the TCP connection, so it is the server that enters the TIME_WAIT state. A web server with heavy traffic will therefore accumulate a large number of TIME_WAIT entries. If the server handles 1000 requests per second, the backlog reaches 240 * 1000 = 240,000 TIME_WAIT entries, and maintaining them puts a burden on the server. Modern operating systems do use fast lookup algorithms to manage these entries, so checking whether a new TCP connection request hits an existing TIME_WAIT entry is not too expensive. Still, keeping that much state around is always a cost.
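
The backlog figure above is simply the close rate multiplied by the TIME_WAIT duration. A tiny helper makes it easy to try other values; the function name time_wait_backlog is made up for this illustration, and it assumes a steady rate with one connection closed per request.

```python
def time_wait_backlog(closes_per_second: float, time_wait_seconds: float = 240.0) -> int:
    """Steady-state number of TIME_WAIT entries: arrival rate * time each entry lives."""
    return int(closes_per_second * time_wait_seconds)


print(time_wait_backlog(1000))        # 240000, the figure from the text
print(time_wait_backlog(1000, 60.0))  # 60000 with a 60-second TIME_WAIT
```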

HTTP/1.1 makes Keep-Alive the default behavior, that is, multiple requests and responses are sent over a single TCP connection. One major reason is this very problem. Another way to relieve TIME_WAIT pressure is to shorten the system's 2 * MSL time, since 240 seconds is rather long. On Windows, edit the registry and add a DWORD value named TcpTimedWaitDelay under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters. As a rule, do not set it below 60 seconds, or you may run into trouble.
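
The effect of Keep-Alive on the number of TCP connections (and therefore on TIME_WAIT entries) can be observed with a small local experiment. The sketch below uses only the Python standard library; CountingServer, Handler and connections_used are hypothetical names introduced for this illustration, and the server is a throwaway test server on the loopback interface.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class CountingServer(ThreadingHTTPServer):
    """Local test server that counts how many TCP connections it accepts."""

    accepted = 0

    def get_request(self):
        conn = super().get_request()
        self.accepted += 1
        return conn


class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # enables persistent (Keep-Alive) connections

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))  # needed for reuse
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def connections_used(requests: int, reuse: bool) -> int:
    """Send `requests` GETs and return how many TCP connections the server saw."""
    server = CountingServer(("127.0.0.1", 0), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    try:
        conn = http.client.HTTPConnection(host, port, timeout=5)
        for _ in range(requests):
            conn.request("GET", "/")
            conn.getresponse().read()
            if not reuse:
                conn.close()  # HTTPConnection reconnects on the next request
        conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return server.accepted


if __name__ == "__main__":
    print(connections_used(5, reuse=True))   # one connection carries all requests
    print(connections_used(5, reuse=False))  # one connection per request
```

With reuse, five requests cost a single connection close; without it, each request leaves its own closed connection behind.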

In socket programming, a common question is how many TCP connections a single machine can establish at most, and which system parameters control that maximum. On Windows, the number of TCP connections on a single machine is determined by several parameters, described one by one below:

Maximum number of TCP connections

[HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Tcpip\Parameters] TcpNumConnections = 0x00fffffe (Default = 16,777,214)

This registry value sets the maximum number of TCP connections allowed on a single machine. The default is about 16 million, which looks very large, but it is not the only limit on the number of connections. Other settings also restrict the maximum number of TCP connections.

Maximum number of dynamic ports

When a TCP client connects to a server, the client must allocate a dynamic port. By default, dynamic ports are allocated from the range 1024-5000, so by default a client can have at most 3977 outgoing socket connections open at the same time. Modify the following registry value to adjust the dynamic port range.

[HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Tcpip\Parameters] MaxUserPort = 5000 (Default = 5000, Max = 65534)
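
The port arithmetic above, and how it interacts with TIME_WAIT, can be sketched in a few lines. The function names are made up for this illustration, and the rate estimate rests on a simplifying assumption: the client is the side that closes first, all connections go to one server address and port, and every closed connection holds its port for the full TIME_WAIT period.

```python
def dynamic_port_count(first_port: int = 1024, max_user_port: int = 5000) -> int:
    """Number of ports in the inclusive range first_port..max_user_port."""
    return max_user_port - first_port + 1


def max_sustained_connect_rate(port_count: int, time_wait_seconds: float = 240.0) -> float:
    """Rough upper bound on new connections per second under the assumptions above."""
    return port_count / time_wait_seconds


default_ports = dynamic_port_count()                   # 3977, as in the text
raised_ports = dynamic_port_count(max_user_port=65534)  # 64511 with MaxUserPort at its maximum
print(default_ports, raised_ports)
print(round(max_sustained_connect_rate(default_ports), 1))  # about 16.6 per second
```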

Maximum TCB count

The system allocates a TCP control block (TCB) for each TCP connection. The control block caches parameters of the connection. Each TCB needs 0.5 KB of paged pool and 0.5 KB of non-paged pool, so each TCP connection occupies 1 KB of system memory.
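
As a quick check of the memory cost, the per-connection figure can be scaled up. This sketch assumes 0.5 KB of each pool per TCB, which is what the 1 KB total in the text implies; the constant and function names are made up for illustration.

```python
TCB_PAGED_KB = 0.5      # paged pool per TCB (assumed split of the 1 KB total)
TCB_NONPAGED_KB = 0.5   # non-paged pool per TCB


def tcb_memory_kb(connections: int) -> dict:
    """Memory used by the TCBs of `connections` TCP connections, in KB."""
    paged = connections * TCB_PAGED_KB
    nonpaged = connections * TCB_NONPAGED_KB
    return {"paged_kb": paged, "nonpaged_kb": nonpaged, "total_kb": paged + nonpaged}


print(tcb_memory_kb(2000))   # 2000 connections cost about 2000 KB in total
print(tcb_memory_kb(16000))  # 16000 connections cost about 16000 KB in total
```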

The maximum number of TCBs in the system is determined by the following registry setting:

[HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Tcpip\Parameters] MaxFreeTcbs = 2000 (Default = RAM dependent, but usually Pro = 1000, Srv = 2000)

On non-Server editions, MaxFreeTcbs defaults to 1000 (when physical memory is larger than 64 MB). On Server editions, it defaults to 2000.

That is to say, by default a Server edition can establish and maintain up to 2000 TCP connections at the same time.

Maximum size of the TCB hash table

TCBs are managed through a hash table. The following registry setting determines the size of the hash table.

[HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Tcpip\Parameters] MaxHashTableSize = 512 (Default = 512, Range = 64-65536)

This value determines how much paged-pool memory is allocated. For example, if MaxFreeTcbs = 1000, the paged-pool memory used is 500 KB.

MaxHashTableSize must then be greater than 500. The larger the value, the more redundancy the hash table has, and the less time each TCP connection allocation and lookup takes. The value must be a power of 2, and the maximum is 65536.

MaxUserPort = 65534 (Decimal)
MaxHashTableSize = 65536 (Decimal)
MaxFreeTcbs = 16000 (Decimal)

Here MaxHashTableSize is configured to be about 4 times MaxFreeTcbs, which can greatly speed up TCP connection establishment.
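
The constraints mentioned in this section can be collected into a small sanity check. This is a sketch only: the function names are made up, the limits are the ones quoted in the text (with the default of 5000 treated as the lower bound for MaxUserPort, which is an assumption), and it does not read or write the Windows registry.

```python
def is_power_of_two(n: int) -> bool:
    """True for 1, 2, 4, 8, ... using the classic bit trick."""
    return n > 0 and n & (n - 1) == 0


def check_tcp_settings(max_user_port: int, max_hash_table_size: int, max_free_tcbs: int) -> list:
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []
    if not 5000 <= max_user_port <= 65534:
        problems.append("MaxUserPort should be between 5000 and 65534")
    if not 64 <= max_hash_table_size <= 65536:
        problems.append("MaxHashTableSize should be between 64 and 65536")
    if not is_power_of_two(max_hash_table_size):
        problems.append("MaxHashTableSize should be a power of 2")
    if max_free_tcbs <= 0:
        problems.append("MaxFreeTcbs should be positive")
    return problems


print(check_tcp_settings(65534, 65536, 16000))  # [] for the values listed above
print(round(65536 / 16000, 2))                  # 4.1: hash table is about 4x MaxFreeTcbs
```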

References:

The TIME-WAIT state in TCP and Its Effect on Busy Servers

TCP/IP Option TcpTimedWaitDelay settings

IBM WebSphere Voice Server configuration

http://www.cnblogs.com/eaglet/archive/2010/09/21/1832233.html
