How to avoid TCP TIME_WAIT status

-- Lvyilong316

What is the meaning of the TIME-WAIT Status of a TCP connection?

First, let's recall what the TCP TIME-WAIT state is.

Before a TCP connection is fully closed, the side that initiates the close first enters the TIME_WAIT state (in other words, it is the side that actively closes the connection that generates TIME_WAIT), while the other side can reclaim the connection quickly. You can run

ss -tan

to view the current state of TCP connections. (Note: the ss command is faster than netstat and provides more detail; for details see http://www.cnphp6.com/archives/66361.)

1. Role of TIME-WAIT Status

The TIME-WAIT Status has two functions.

1. As is well known, a new TCP connection can be affected by delayed packets left over from a previous connection that used the same four-tuple and was closed before all of its segments had drained from the network (a connection is identified only by its four-tuple: source IP address, destination IP address, source port, and destination port). Sequence numbers also help reduce the chance of such confusion, but they cannot rule it out completely, especially for fast connections with large receive windows. RFC 1337 explains what can happen when the TIME-WAIT state is cut short. So what problems does a TIME-WAIT connection that is not recycled quickly avoid? See the following example:

Suppose the packet with sequence number 3 (SEQ3) is delayed in the network (not lost), and the sender retransmits it after a retransmission timeout. If the TIME_WAIT period is shortened, the delayed SEQ3 can be received by a newly established TCP connection on the same four-tuple. With the normal TIME_WAIT mechanism, SEQ3 is simply discarded. Why? Because TIME_WAIT lasts 2 MSL, which is long enough for any packet still in flight either to arrive or to expire.

2. The other role is to handle the loss of the last ACK. If the last ACK is lost, the remote end stays in the LAST-ACK state. Without the TIME-WAIT state, the connection could be reopened immediately with the same four-tuple; when the remote end, still in LAST-ACK, receives the new SYN, it replies with an RST (the sequence number is wrong from its point of view), so the new connection cannot be established and terminates with an error.

In short, if the remote end is stuck in the LAST-ACK state because the last ACK was lost, any new TCP connection using the same four-tuple will be affected.

RFC 793 requires the TIME-WAIT state to last twice the MSL (maximum segment lifetime). On Linux this duration cannot be tuned; it is hard-coded to 1 minute in include/net/tcp.h:

#define TCP_TIMEWAIT_LEN (60*HZ) /* how long to wait to destroy TIME-WAIT
                                  * state, about 60 seconds
                                  */
#define TCP_FIN_TIMEOUT  TCP_TIMEWAIT_LEN
                                 /* BSD style FIN_WAIT2 deadlock breaker.
                                  * It used to be 3 min, new value is 60sec,
                                  * to combine FIN-WAIT-2 timeout with
                                  * TIME-WAIT timer.
                                  */

Someone once proposed making the TIME-WAIT duration a tunable parameter, but the proposal was rejected: this is part of the TCP specification, and for TIME-WAIT the benefits outweigh the drawbacks.

2. Influence of TIME_WAIT status

Now the question is: why does this state affect a server that handles a large number of connections? Three aspects are involved:

- the slot taken in the connection table, which prevents new connections with the same four-tuple;

- the memory occupied by the socket structure in the kernel;

- the additional CPU overhead.

Note: the number of connections in the TIME_WAIT state can be checked with: ss -tan state time-wait | wc -l

2.1. Connection table slot

A TCP connection in the TIME_WAIT state stays in the connection table for 1 minute. During that time no other connection with the same four-tuple (source address, source port, destination address, destination port) can exist, which means a new TCP connection with the same four-tuple cannot be established.

For a web server, the destination address and destination port are fixed. If the web server sits behind an L7 load balancer, the source address is fixed as well. On Linux, the client-side port range defaults to roughly 30,000 ports (net.ipv4.ip_local_port_range, 32768 to 61000 by default). This means that between the load balancer and the web server only about 30,000 connections can be opened per minute, i.e., roughly 30,000 / 60 ≈ 500 connections per second.
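As a rough illustration of this limit, here is a minimal client sketch (the backend address 10.33.1.64:80 is hypothetical, borrowed from the sample output further down) that opens and actively closes connections in a loop; once every ephemeral port has a socket parked in TIME-WAIT, connect() starts failing, as described next:

    #include <arpa/inet.h>
    #include <errno.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void)
    {
        struct sockaddr_in dst = { .sin_family = AF_INET, .sin_port = htons(80) };
        inet_pton(AF_INET, "10.33.1.64", &dst.sin_addr);   /* hypothetical backend */

        for (;;) {
            int fd = socket(AF_INET, SOCK_STREAM, 0);
            if (fd < 0)
                return 1;
            if (connect(fd, (struct sockaddr *)&dst, sizeof(dst)) < 0) {
                if (errno == EADDRNOTAVAIL)                 /* no free source port left */
                    fprintf(stderr, "ephemeral ports exhausted: %s\n", strerror(errno));
                close(fd);
                return 1;
            }
            close(fd);                                      /* active close: this side enters TIME-WAIT */
        }
    }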

If sockets in the TIME-WAIT state pile up on the client side, the problem is easy to spot: connect() returns EADDRNOTAVAIL and the error is recorded in the logs. If they pile up on the server side, the problem is harder to notice, because there is no log message and no counter to consult. You can, however, list and count the four-tuples currently in use on the server:

[root@localhost ~]# ss -tan 'sport = :80' | awk '{print $(NF)" "$(NF-1)}' | sed 's/:[^ ]*//g' | sort | uniq -c

696 10.24.2.30 10.33.1.64

1881 10.24.2.30 10.33.1.65

5314 10.24.2.30 10.33.1.66

5293 10.24.2.30 10.33.1.67

3387 10.24.2.30 10.33.1.68

2663 10.24.2.30 10.33.1.69

1129 10.24.2.30 10.33.1.70

10536 10.24.2.30 10.33.1.73

The solution is to widen the range of possible four-tuples, which can be done in several ways (each of the following suggestions is less practical than the one before it):

1) Adjust the net.ipv4.ip_local_port_range parameter to widen the range of available client ports.

2) Add server ports: have the web server listen on additional ports such as 81, 82 and 83. Since the web server sits behind a load balancer, this is transparent to users.

3) Add client IP addresses, especially on the load balancer, and use more IP addresses to talk to the backend web servers (see the sketch after this list).

4) Add server IP addresses.

5) Finally, there are net.ipv4.tcp_tw_reuse and net.ipv4.tcp_tw_recycle. But do not rush to touch these; they are discussed later.
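For suggestion 3, here is a hedged sketch of how a client (for example, the load balancer talking to its backends) can spread connections over several source IPs by calling bind() with a chosen address and port 0 before connect(); the helper name and the per-call address selection are illustrative, not from the original text:

    #include <arpa/inet.h>
    #include <sys/socket.h>
    #include <unistd.h>

    /* Connect to dst_ip:dst_port using src_ip as the source address.
     * Returns the connected fd, or -1 on error. */
    static int connect_from(const char *src_ip, const char *dst_ip, int dst_port)
    {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd < 0)
            return -1;

        struct sockaddr_in src = { .sin_family = AF_INET, .sin_port = 0 };  /* port 0: kernel picks an ephemeral port */
        inet_pton(AF_INET, src_ip, &src.sin_addr);
        if (bind(fd, (struct sockaddr *)&src, sizeof(src)) < 0) {           /* pin only the source IP */
            close(fd);
            return -1;
        }

        struct sockaddr_in dst = { .sin_family = AF_INET, .sin_port = htons(dst_port) };
        inet_pton(AF_INET, dst_ip, &dst.sin_addr);
        if (connect(fd, (struct sockaddr *)&dst, sizeof(dst)) < 0) {
            close(fd);
            return -1;
        }
        return fd;   /* each extra source IP multiplies the number of usable four-tuples */
    }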

2.2. Memory

When a large number of connections are handled and each closed connection lingers for a minute, some server memory is consumed. For example, if the server accepts 10,000 new TCP connections per second, it will accumulate 10,000/s × 60 s = 600,000 connections in the TIME_WAIT state within a minute. How much memory does that take? Don't worry, not as much as you might think.

First, from the application's point of view, a TIME_WAIT socket consumes no memory at all: the socket has already been closed. In the kernel, a socket in the TIME-WAIT state appears in three different structures, each serving a different purpose.

(1) "TCP established hash table" is connected to store the hash table (including connections in other non-established States). When a new packet is sent, is used to locate the connection to find the survival status.

The buckets of this hash table contain both sockets in the TIME_WAIT state and normal active sockets. The size of the hash table is chosen according to the amount of system memory; it is printed during boot and can be found in the dmesg log:

dmesg | grep "TCP established hash table"

[0.169348] TCP established hash table entries: 65536 (order: 8, 1048576 bytes)

This value can be overridden with the kernel boot parameter thash_entries (which sets the maximum number of entries in the TCP connection hash table).

Within a hash bucket, each socket in the TIME-WAIT state is represented by a tcp_timewait_sock struct, while sockets in other states use the tcp_sock struct:

  
  
    struct tcp_timewait_sock {
        struct inet_timewait_sock tw_sk;
        u32  tw_rcv_nxt;
        u32  tw_snd_nxt;
        u32  tw_rcv_wnd;
        u32  tw_ts_offset;
        u32  tw_ts_recent;
        long tw_ts_recent_stamp;
    };

    struct inet_timewait_sock {
        struct sock_common      __tw_common;
        int                     tw_timeout;
        volatile unsigned char  tw_substate;
        unsigned char           tw_rcv_wscale;
        __be16                  tw_sport;
        unsigned int            tw_ipv6only    : 1,
                                tw_transparent : 1,
                                tw_pad         : 6,
                                tw_tos         : 8,
                                tw_ipv6_offset : 16;
        unsigned long           tw_ttd;
        struct inet_bind_bucket *tw_tb;
        struct hlist_node       tw_death_node;
    };

(2) A linked list called the "death row" is used to expire connections in the TIME_WAIT state. Its entries are ordered by remaining TIME_WAIT time, smallest first, and they directly reuse the entries of the hash table (so no extra memory is consumed): the list is threaded through the hlist_node tw_death_node member of struct inet_timewait_sock, the last field in the listing above.

(3) The third structure is the "hash table of bound ports", which stores the ports that have been bound with bind(). Its main job is to determine whether a port is available when one needs to be allocated dynamically. The memory used by this hash table is also reported in the boot log:

$ dmesg | grep "TCP bind hash table"
[    0.169962] TCP bind hash table entries: 65536 (order: 8, 1048576 bytes)

Each element of this hash table is an inet_bind_bucket struct, and every bound port has one. A web server is bound to port 80, so all of its TIME-WAIT connections share the same entry. Connections made to remote servers have their local port allocated at connect() time, so each of them uses its own entry. Only two structures therefore matter for the TIME_WAIT state: each connection in TIME_WAIT consumes a tcp_timewait_sock struct, and each outbound (client-side) connection in TIME_WAIT additionally consumes an inet_bind_bucket struct.

The tcp_timewait_sock struct is only 168 bytes and the inet_bind_bucket struct only 48 bytes. So tens of thousands of server-side connections in the TIME-WAIT state use less than 10 MB of memory, and the same number of outbound connections in TIME-WAIT add only a few more megabytes for the bind buckets (for instance, 50,000 × 168 bytes ≈ 8 MB plus 50,000 × 48 bytes ≈ 2.4 MB, before slab overhead). The slabtop output below comes from a test with about 50,000 connections in the TIME-WAIT state, a large share of them outbound connections:

$ sudo slabtop -o | grep -E '(^  OBJS|tw_sock_TCP|tcp_bind_bucket)'
  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
 50955  49725  97%    0.25K   3397       15     13588K tw_sock_TCP
 44840  36556  81%    0.06K    760       59      3040K tcp_bind_bucket

The output above is reproduced verbatim. TIME-WAIT connections clearly occupy very little memory. Of course, if your server has to handle thousands of new TCP connections per second, it needs somewhat more memory to exchange data with clients correctly, but in general the memory used by TIME-WAIT connections can be ignored.

2.3. CPU

So how does the TIME_WAIT status affect CPU consumption?

A growing number of TIME_WAIT connections merely keeps some ports occupied for a short while before they are released. Because the ports are stored in a hash table, the system can still find a free port quickly when a new one is needed, so the CPU overhead does not increase noticeably.

3. How to avoid or reduce the influence of TIME_WAIT

Although the analysis above shows that the TIME_WAIT state has little impact on the system, its impact can be reduced further in the following three ways:

- disable socket lingering (delayed close);

- enable net.ipv4.tcp_tw_reuse;

- enable net.ipv4.tcp_tw_recycle.

(1) Disable socket lingering

Normally, when close() is called, the socket lingers: any data remaining in the kernel buffers is still sent to the remote end, and the socket then moves to the TIME-WAIT state. If lingering is turned off, the underlying connection is torn down as soon as close() is called, and data still sitting in the buffers is not sent. Depending on how the lingering parameters are set, there are two behaviours:

① With the linger time set to zero, close() does not send a FIN segment; it sends an RST segment instead, and any data remaining in the buffers is discarded. As a result the socket never enters the TIME-WAIT state.

② With a non-zero linger time, if data remains in the socket's send buffer when close() is called, the process sleeps until either all the data has been sent and acknowledged or the configured linger timer expires. This guarantees that the residual data is flushed within the configured timeout. If the data and the FIN go out normally, the socket moves to TIME-WAIT as usual; otherwise an RST is sent.
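These two behaviours map onto the standard SO_LINGER socket option; a minimal sketch, assuming the usual BSD semantics (the helper names are illustrative only):

    #include <sys/socket.h>
    #include <unistd.h>

    /* ① Abortive close: a zero linger time makes close() drop unsent data and
     *    send an RST instead of a FIN, so the socket never enters TIME-WAIT. */
    static void abortive_close(int fd)
    {
        struct linger lg = { .l_onoff = 1, .l_linger = 0 };
        setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
        close(fd);
    }

    /* ② Lingering close: on a blocking socket, close() waits up to `seconds`
     *    for the remaining data to be sent and acknowledged; on success a
     *    normal FIN is sent and the socket enters TIME-WAIT as usual.        */
    static void lingering_close(int fd, int seconds)
    {
        struct linger lg = { .l_onoff = 1, .l_linger = seconds };
        setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
        close(fd);
    }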

(2) net.ipv4.tcp_tw_reuse

What does this option do? As the name suggests, it allows connections in the TIME_WAIT state to be reused. By default a connection stays in TIME_WAIT for 60 s. With this option enabled, when the system needs to initiate a new outgoing connection, it may directly reuse a TIME_WAIT connection as long as the new connection's timestamp is strictly greater than the one recorded for that TIME_WAIT connection (by more than 1 second). In other words, a connection in the TIME-WAIT state can be reused after only about one second.

Two terms need explaining here. The first is the "outgoing connection": a connection actively initiated toward a remote host, i.e., one made as a client; this option therefore only helps on the client side. The second is the timestamp. RFC 1323 defines TCP extensions for high performance on busy networks, among them a new TCP option carrying two four-byte timestamp fields: the first is the current clock timestamp of the sender, the second is the most recent timestamp received from the remote host. With net.ipv4.tcp_tw_reuse enabled, if the new timestamp is greater than the previously stored one, Linux picks one of the surviving TIME-WAIT connections and reassigns it to the new connection.

So does enabling this option compromise safety? Consider the two roles of TIME_WAIT again. First, TIME_WAIT prevents old segments from being accepted by a new connection; with the timestamp option active, such stale segments are rejected in almost all cases anyway.

Second, what about the loss of the last ACK, i.e., what if the new connection reuses a previous TIME_WAIT connection and then receives a FIN belonging to the old one? In that case the system simply replies with an RST, and the establishment of the new connection continues.
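For reference, a hedged sketch of enabling the option at runtime by writing to procfs, equivalent to running sysctl -w net.ipv4.tcp_tw_reuse=1 (root privileges required; tcp_tw_recycle, discussed next, lives at the neighbouring path):

    #include <stdio.h>

    int main(void)
    {
        /* 0 = off (default), 1 = allow reusing TIME-WAIT sockets for new outgoing connections */
        FILE *f = fopen("/proc/sys/net/ipv4/tcp_tw_reuse", "w");
        if (!f) {
            perror("tcp_tw_reuse");
            return 1;
        }
        fputs("1\n", f);
        fclose(f);
        return 0;
    }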

(3) net.ipv4.tcp_tw_recycle

This option also depends on the timestamp option and, as its name suggests, it speeds up the reclamation of connections in the TIME_WAIT state (60 s by default when it is disabled). When enabled, the TIME_WAIT hold time becomes 3.5 × RTO (the retransmission timeout), a value that changes dynamically with network conditions and is computed from the measured RTT. This option affects all TIME_WAIT connections, incoming and outgoing alike, so enabling it affects both the client and the server. The RTO of a connection can be inspected with the ss command:

$ ss --info sport = :2112 dport = :4057
State      Recv-Q Send-Q      Local Address:Port        Peer Address:Port
ESTAB      0      1831936     10.47.0.113:2112          10.65.1.42:4057
         cubic wscale:7,7 rto:564 rtt:352.5/4 ato:40 cwnd:386 ssthresh:200 send 4.5Mbps rcv_space:5792

4. Summary

1. Both tw_reuse and tw_recycle require TCP timestamps to be enabled on the client and the server (they are enabled by default).

2. tw_reuse takes effect only on the client; once enabled, the client can reuse a TIME_WAIT connection after about 1 s.

3. tw_recycle takes effect on both the client and the server; once enabled, TIME_WAIT connections are reclaimed after 3.5 × RTO, where the RTO ranges from a few hundred milliseconds up to 120 s depending on network conditions.

- For the client:

1) As a client, because of the 65,535-port limit, too many connections stuck in TIME_WAIT directly hurt throughput. Enable tw_reuse to solve this; enabling tw_recycle as well is not recommended, as it adds little.

2) tw_reuse lets the client reclaim connections within about one second, which is enough for a single machine to sustain on the order of tens of thousands of outgoing requests per second; to go beyond that, add more client IP addresses.

3) In internal stress-test scenarios where the client does not need to accept incoming connections, tw_recycle brings a small additional benefit.

4) The service can also be designed so that the server is the side that closes the connection (which moves TIME_WAIT to the server).

- For the server:

1) Enabling tw_reuse has no effect, since it only applies to outgoing connections.

2) Do not enable tw_recycle in the production environment.

If the server sits behind NAT-based load balancing, or clients are behind NAT (which is almost certain, since most corporate and home networks use NAT), enabling it on a service exposed to the public network will make some connections fail. On a pure intranet it can be enabled if really needed. For example, our public-facing services sit behind a load balancer that disables the timestamp option, so enabling tw_recycle there would have no effect anyway.

3) What if the server has a large number of connections in TIME_WAIT?

Unlike the client, the server is not constrained by the port range here, and Linux is well optimized for handling large numbers of TIME_WAIT connections. Each one consumes very little memory, and the upper bound can be set with net.ipv4.tcp_max_tw_buckets (e.g. 262144); modern machines generally do not lack that much memory.
