Time-wait and Close-wait

System tuning you may not know: time_wait and close_wait

2016-03-11, from the WeChat subscription account "Da Fang said" (author: Da Fang)

Have you ever run into a time_wait problem?

I believe many of you have. The moment users start shouting that the network has slowed down, the first thing you do is run netstat -an | grep TIME_WAIT | wc -l, and, sure enough, there are thousands of time_wait entries.

Then the next step: open Google or Bing and search for the keywords "too many time wait". You will certainly find solutions, and the first one, the one reposted everywhere, is bound to be:

Open the sysctl.conf file and modify the following parameters:

net.ipv4.tcp_tw_recycle = 1

net.ipv4.tcp_tw_reuse = 1

net.ipv4.tcp_timestamps = 1

You will also be told that enabling tcp_tw_recycle and tcp_tw_reuse requires timestamp support, and that although these settings are generally not recommended, they are effective at making a lot of time_wait problems disappear.

So you modify these parameters, reload them with sysctl -p, and sure enough, within a few minutes the time_wait count has really dropped, and no user is complaining any more. End of story, it seems.

I would guess that 50% of developers, or more, stop right there. The problem looks solved, but thoroughly understanding and solving it is not that simple; there is still a long way to go!

What are time-wait and close-wait?

To solve a problem, you must first understand it. If you casually change two lines of code and the bug seems to be "gone", that does not mean it is really gone; it may just be hiding somewhere deeper, where you have not looked, or where your current knowledge cannot reach.

As you know, because a socket operates in full-duplex mode, closing one requires a four-way handshake to complete:

    • The side that actively closes the connection calls close(), and its protocol layer sends a FIN packet.

    • When the passively closed side receives the FIN, its protocol layer replies with an ACK, and the passive side enters the CLOSE_WAIT state; the active side, now waiting for the peer to close, enters the FIN_WAIT_2 state. The application on the passive side must still call close() itself.

    • Once all of its data has been sent, the passive side calls close(); its protocol layer sends a FIN to the active side and waits for the ACK, and the passive side enters the LAST_ACK state.

    • When the active side receives that FIN, its protocol layer replies with an ACK; the active side now enters the TIME_WAIT state, while the passive side enters CLOSED.

    • After waiting 2MSL, the active side leaves TIME_WAIT and enters CLOSED.
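If you want to watch these states on a live system, ss can show them, including the remaining TIME_WAIT timer. A minimal sketch (the output you get will depend on your traffic):

    # Count sockets in each TCP state
    ss -tan | awk 'NR>1 {count[$1]++} END {for (s in count) print s, count[s]}'

    # List TIME_WAIT sockets with their remaining 2MSL timer (-o shows timers)
    ss -tan -o state time-wait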

From this close sequence, you can draw the following conclusions:

    1. The side that actively closes the connection, that is, the side that actively calls close() on the socket, is the one that eventually enters the TIME_WAIT state.

    2. The passively closed side goes through an intermediate state, CLOSE_WAIT, because its protocol layer is waiting for the upper-layer application to call close() before it can finish its half of the shutdown.

    3. By default, TIME_WAIT lasts 2MSL before the socket finally enters the CLOSED state.

    4. Until a connection has entered the CLOSED state, it cannot be reused!

So, by intuition alone: TIME_WAIT is not scary (not entirely true, as we will see later); CLOSE_WAIT is scary. A large number of CLOSE_WAIT sockets means either that your application has a bug and never closes the socket where it should, or that it never gets the chance: your server's CPU is too busy, or your application is asleep somewhere else (a lock, file I/O, and so on), never receives enough scheduling time, and so never actually executes the close.

Two more questions remain:

    1. We said above that a connection "cannot be reused"; what exactly is a connection?

    2. Why did the protocol designers create a TIME_WAIT state at all? And why does this state wait 2MSL by default before entering CLOSED?

Once we have answered these two questions, we will come back to what the network settings at the beginning of this article actually do, and to the after-effects of TIME_WAIT.

What exactly is a socket connection?

We talk about sockets all the time, but what is a socket, really? A socket is in fact a five-tuple, consisting of:

    1. Source IP

    2. Source Port

    3. Destination IP

    4. Destination Port

    5. Type: TCP or UDP

This five-tuple identifies one usable connection. Note that many people define a socket as a four-tuple, source IP:source port plus destination IP:destination port; that definition is incorrect.

For example, if your local egress IP is 180.172.35.150 and your browser connects to a web server, say Baidu, the five-tuple of that socket connection might be:

[180.172.35.150:45678, TCP, 180.97.33.108:80]

The source IP is your egress address 180.172.35.150, the source port is a random port 45678, the destination IP is one of Baidu's load-balancer addresses 180.97.33.108, and the destination port is the standard HTTP port 80.

If you now open another browser window and visit Baidu again, a new connection is created:

[180.172.35.150:43678, TCP, 180.97.33.108:80]

The source port of this new connection is a new random port, 43678.
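You can see these five-tuples directly with ss; the protocol element is implied by the -t (TCP) flag. A sketch, with the illustrative addresses from above:

    # Show TCP connections whose destination port is 80
    $ ss -tn dst :80
    State  Recv-Q  Send-Q  Local Address:Port     Peer Address:Port
    ESTAB  0       0       180.172.35.150:45678   180.97.33.108:80
    ESTAB  0       0       180.172.35.150:43678   180.97.33.108:80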

So, if you wanted to load-test Baidu, how many connections could you create from one machine? I touched on this question in my article "Cloud Thinking | Easily Build a Tens-of-Millions Voting System"; if you have not read it, reply "voting system" on the subscription account to get it.

The second question: what is TIME_WAIT for?

If we reach for an analogy, TIME_WAIT is to the protocol what exception handling is to your program: it exists to cope with packet loss and the other problems caused by an unstable network. It does two things.

First, it prevents packets from a previous connection (let's keep using the five-tuple 180.172.35.150:45678, TCP, 180.97.33.108:80 as the example), delayed or retransmitted after loss, from being wrongly accepted by a new connection that reuses the same five-tuple once the old one has closed. If you visit Baidu again and the new connection happens to land on source port 45678 again, it is represented by exactly the same five-tuple. Consider this sequence:

    • A packet with seq=3 is lost; it is retransmitted once, and no ACK has been received for it.

    • If there were no TIME_WAIT, or TIME_WAIT were very short, the closed connection (180.172.35.150:45678, TCP, 180.97.33.108:80 has gone to CLOSED, so the source port is free) could be reused immediately: a new connection to 180.97.33.108:80 picks the same random port 45678 and starts sending packets seq=1, 2, ...

    • At that moment, the old connection's seq=3 packet finally arrives, and its sequence number happens to fit (this is important; otherwise, with a mismatched sequence number, the segment would be rejected with a RST), so data from the old connection is wrongly accepted by the new one.

Second, it ensures that the passively closing side gets to close its connection within a bounded time. This, too, comes down to packet loss. Consider:

    • The actively closing side closes the connection and sends a FIN.

    • The passively closing side replies with an ACK, performs its own close, and sends its FIN packet; it then enters the LAST_ACK state.

    • The actively closing side sends back the final ACK and enters the TIME_WAIT state.

    • But that last ACK is lost, so the passively closing side stays stuck in LAST_ACK.

    • Now, if there were no TIME_WAIT, or TIME_WAIT were very short, the actively closing side would quickly reach CLOSED. If a new connection were created at this point and the random source port were reused, then after connect() sends its SYN, the peer, which still believes this five-tuple belongs to a connection waiting for an ACK, receives a SYN instead and replies with a RST.

    • The side actively creating the new connection therefore fails to connect, because it receives the RST.

So, as you can see, TIME_WAIT exists for good reasons: if you forcibly ignore it, there is still a fair probability of corrupted data or transient connection failures.

So why does the TIME_WAIT state last 2MSL (twice the maximum segment lifetime)? And can this time be adjusted through kernel parameters? First, the 2MSL figure is defined in RFC 793; the relevant passage reads as follows:

[Screenshot: the passage of RFC 793 defining the MSL and the 2MSL TIME-WAIT timeout]
This definition is more of a guaranteed upper bound (the TTL in an IP packet is nominally a maximum hop count, but what it really bounds is how long the data can survive on the network): it leaves room, if the last ACK is lost, for the passively closing side to resend its FIN and wait for a fresh ACK, one trip out and one trip back. In the Linux kernel, the MSL is hard-coded as 30 seconds (readers have reminded me that the RFC actually recommends an MSL of 2 minutes, but many implementations use 30 seconds), so TIME_WAIT lasts 1 minute:
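You can see this constant in the kernel source; assuming you have a kernel tree checked out, a grep of include/net/tcp.h turns it up:

    # TIME_WAIT length as hard-coded in the Linux source tree
    $ grep "define TCP_TIMEWAIT_LEN" include/net/tcp.h
    #define TCP_TIMEWAIT_LEN (60*HZ) /* how long to wait to destroy TIME-WAIT state */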

Now recall the earlier question: even though the four-way close handshake has completed, because of TIME_WAIT this connection's five-tuple cannot be reused for one minute. So if you use a single machine as a load-testing client, how many connection requests can you issue per minute? And if that machine is a load balancer, how many connections per minute can it open to the backend servers?

Is a large number of TIME_WAIT sockets scary?

If ss -tan state time-wait | wc -l shows a large number of TIME_WAIT sockets, many people get nervous. But how large is large? Tens of thousands? At that magnitude there is really no need to worry: the memory a TIME_WAIT socket occupies is tiny, and the CPU spent recording it and searching for an available local port is basically negligible.

Does it consume memory? Of course! Anything you can observe must be backed by a data structure in the kernel. A socket in the TIME_WAIT state is still a socket that exists, so the kernel has to keep its data:

    1. The kernel keeps a hash table holding all connections, those in TIME_WAIT and those in every other state alike. Its main job is to locate the right connection quickly when a new packet arrives. Different kernels size this table differently; you can find your kernel's setting with the dmesg command, as shown below.

    2. There is also a hash table holding all bound ports, used mainly to find an available port, or a random port, quickly.
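The sizes of both tables are printed at boot time, so dmesg can recover them (the numbers below are illustrative; yours depend on your kernel and RAM):

    $ dmesg | grep "TCP established hash table"
    TCP established hash table entries: 524288 (order: 10, 4194304 bytes)
    $ dmesg | grep "TCP bind hash table"
    TCP bind hash table entries: 65536 (order: 8, 1048576 bytes)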

Because the kernel has to keep this data, it necessarily occupies some memory.

Will it consume CPU? Of course! Every time an available random port has to be found, the kernel must walk through the bound-port hash table, which inevitably costs some CPU time.

So a lot of TIME_WAIT sockets do eat both memory and CPU, which is why so many people, upon seeing thousands of them, immediately want to kill them off. But look a little closer: 10,000 TIME_WAIT connections consume only about 1 MB of memory, which is nothing on a modern server. As for CPU, less is always welcome, but an extra ten thousand hash entries is not something to worry about.

If you really want to tune, you first have to understand other people's tuning recommendations, and the meaning behind each tuning parameter!

TIME_WAIT tuning: the parameters you must understand first

Before getting to the concrete scenarios, let's first work out what the relevant parameters mean.

  1. net.ipv4.tcp_timestamps

    As part of its reliability extensions, RFC 1323 introduced the TCP timestamp option: two 4-byte timestamp fields. The first holds the sender's clock when the packet is sent; the second echoes the most recent timestamp received from the peer. These two fields are what make the optimizations below possible.

    tcp_tw_reuse and tcp_tw_recycle both depend on these timestamp fields.
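    You can see the option on the wire with tcpdump: in the options list, TS val is the sender's clock and TS ecr echoes the peer's (the output below is illustrative):

        $ tcpdump -n -i eth0 -c 1 'tcp port 80'
        12:00:00.000000 IP 180.172.35.150.45678 > 180.97.33.108.80: Flags [.], ack 1,
            win 229, options [nop,nop,TS val 123456789 ecr 987654321], length 0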

  2. net.ipv4.tcp_tw_reuse

    Literally: reuse connections that are in the TIME_WAIT state.

     

    Always remember: a socket connection is a five-tuple, and TIME_WAIT always appears on the side that actively closed the connection. So a TIME_WAIT connection can be reused when that same side initiates a connection to the same peer again. For example: a client closes a connection, then connects to the same server again, and the socket can be reused; or a load balancer actively closes a connection to a backend server, and when the next HTTP request arrives and it connects to that backend again, the socket can likewise be reused.

     

    From the literal meaning and the examples, you can see the scenario tcp_tw_reuse is meant for: one side keeps connecting to other servers over short-lived connections and always closes the connection itself (so TIME_WAIT sits on its side), closing and reconnecting to its peers over and over.

     

    So when a connection is reused and a delayed or retransmitted packet arrives, how does the new connection tell whether the data belongs to it or to the connection before reuse? This is where the two timestamp fields come in. When the connection is reused, its timestamp is updated to the current time; when delayed data arrives carrying a timestamp older than the new connection's, the kernel can decide, purely from the timestamps, that the delayed data is safe to discard.

     

    This setting requires that both ends of the connection support timestamps. Also, it affects only outbound connections, that is, the client role: it reuses TIME_WAIT sockets when connecting out to a server [connect(dest_ip, dest_port)].
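    Enabling it is two sysctl lines; a sketch for a client or proxy host (timestamps must be on at both ends):

        # /etc/sysctl.conf -- reuse TIME_WAIT sockets for outbound connections
        net.ipv4.tcp_timestamps = 1
        net.ipv4.tcp_tw_reuse = 1

        # apply and verify
        sysctl -p
        sysctl net.ipv4.tcp_tw_reuse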

     

  3. net.ipv4.tcp_tw_recycle

    Literally: rapidly destroy connections in the TIME_WAIT state.

    With this setting enabled, the kernel recycles sockets in the TIME_WAIT state quickly. How quickly? No longer 2MSL, but one RTO (retransmission timeout, the interval after which an unacknowledged packet is retransmitted), which is computed dynamically from the RTT and is far smaller than 2MSL.

    With this setting, the kernel still has to protect new connections from wrongly accepting lost-and-retransmitted or delayed packets (note that this is no longer reuse: the old TIME_WAIT connection has been destroyed, and the new connection merely shares a five-tuple with the destroyed one). So when the setting is enabled and a socket enters TIME_WAIT, the kernel records some statistics, including the peer IP from the socket's five-tuple and, of course, the timestamp of the last packet received from that peer IP. When a new packet arrives, it is dropped unless its timestamp is later than the one recorded in the kernel.

    This setting also requires timestamp support on both ends of the connection. And it mainly affects inbound connections (it affects outbound connections too, but there the socket is not being reused), that is, the server role: a client connects, the server actively closes the connection, the TIME_WAIT socket sits on the server, and the server recycles such sockets quickly.

As a result, if clients sit behind NAT (multiple clients sharing one egress IP), then with tw_recycle enabled, within one RTO it may happen that only one client's connections succeed, because the clients' clocks are not in step: packets from the others carry timestamps that appear to go backwards, and the server simply drops them.
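If you suspect this is biting you, the kernel keeps counters that netstat can surface; the exact wording varies by kernel version, but it looks roughly like this:

    # connections refused by the timestamp (PAWS-style) check
    $ netstat -s | grep -i "time stamp"
        3764 passive connections rejected because of time stamp
        1123 packets rejects in established connections because of timestamp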

I have tried to explain all this in words, but a few examples and illustrations should help us nail it down.

Consider a network that looks like this:

    1. The client's IP address is 180.172.35.150; think of it as a browser.

    2. The load balancer has two IPs: the external address 115.29.253.156 and the internal address 10.162.74.10. It listens on port 80 on the external address.

    3. Behind the load balancer sit two web servers: one at 10.162.74.43 listening on port 80, the other at 10.162.74.44, also listening on port 80.

    4. The web servers connect to a database server at 10.162.74.45, which listens on port 3306.

With this simple architecture, let's look at how tw_reuse/tw_recycle affect network connectivity in the different situations discussed today.

Let's make two assumptions:

    1. The client talks to the load balancer over HTTP/1.1, that is, with Connection: keep-alive, and we assume it is the client that eventually tears down the client-to-server socket, so TIME_WAIT appears on the client.

    2. For the connections between the web server and the MySQL server, we assume the program on the web server calls close() when it is done with a connection, so TIME_WAIT appears on the web server side.

Under these assumptions:

    1. On the web server, one setting can definitely be enabled: tcp_tw_reuse. If the web server opens many connections to the DB server, this guarantees those sockets get reused.

    2. As for the load balancer and the web server, which of the two closes the connection first determines how we configure tcp_tw_reuse/tcp_tw_recycle.

Scenario one: the load balancer closes the connection first

In this case, because the load balancer initiates the connections to the web servers, TIME_WAIT mostly appears on the load balancer.

Configuration on the load balancer:

    • net.ipv4.tcp_tw_reuse = 1 // reuse connections as much as possible

    • net.ipv4.tcp_tw_recycle = 0 // there is no guarantee that clients are not behind NAT

The configuration on the web server is:

    • net.ipv4.tcp_tw_reuse = 1 // mainly affects the reuse of web-server-to-DB connections

    • net.ipv4.tcp_tw_recycle: whether 1 or 0 makes no difference. In the connection with the load balancer, the web server is the server side, but TIME_WAIT sits on the load balancer; in the connection to the DB it is the client, where recycle has no effect, and what matters is reuse.

Scenario two: the web server closes the connection first

In this case, the web server becomes the TIME_WAIT disaster zone. On the load-balancer-to-web-server connections, the web server closes first, so TIME_WAIT appears on the web server; on the web-server-to-DB connections, the web server also closes first, so TIME_WAIT appears there too. The configuration on the load balancer is then:

    • net.ipv4.tcp_tw_reuse: 0 or 1, it makes no real difference here.

    • net.ipv4.tcp_tw_recycle = 0 // recycle must stay off

Configuration on the web server:

    • net.ipv4.tcp_tw_reuse = 1 // mainly affects the reuse of web-server-to-DB connections

    • net.ipv4.tcp_tw_recycle = 1 // there is no NAT between the load balancer and the web servers, so you can consider enabling recycle to drain the large number of TIME_WAIT sockets created by the load-balancer-to-web-server connections more quickly
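Pulling scenario two together, the web server's settings might look like the sketch below; the no-NAT assumption above is what makes recycle tolerable here:

    # /etc/sysctl.conf on the web server (scenario two; assumes no NAT in front)
    net.ipv4.tcp_timestamps = 1    # prerequisite for both settings below
    net.ipv4.tcp_tw_reuse = 1      # reuse TIME_WAIT sockets when connecting to the DB
    net.ipv4.tcp_tw_recycle = 1    # fast-recycle TIME_WAIT from load-balancer connections

    # apply
    sysctl -p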

Answers to a few questions readers have raised

1. We say that a connection pool reuses connections; does that mean a connection can only be used again once its TIME_WAIT has expired?

What a connection pool reuses are active connections. Active means, first, that the pooled connections are established, and second, that the pool, as an upper-layer application, keeps them alive with periodic heartbeats. Since the connections stay active, TIME_WAIT never enters the picture: as explained earlier, TIME_WAIT appears on the actively closing side only after a connection is closed, and a closed connection is certainly no longer in the pool; the pool has already released it.

2. When a load balancer has used up its random ports under a flood of TIME_WAIT sockets, is there any better solution than tuning the three parameters discussed in this article?

First, if the random ports run out, you can change the net.ipv4.ip_local_port_range setting in /etc/sysctl.conf, widening it at least to net.ipv4.ip_local_port_range = 1024 65535. That guarantees your load balancer roughly 60,000 usable random ports, that is, 60,000 simultaneous reverse-proxied connections to the backend, enough to sustain about 1,000 new connections per second (think it through: a TIME_WAIT socket lingers for 1 minute before disappearing, so at most 60,000 ports can turn over per minute, hence 1,000 per second). If you have used all of that up, it is time to add servers, or give the load balancer multiple IP addresses, or have your backend servers listen on more ports and more IPs (think back to the socket five-tuple).
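A sketch of that first remedy (check your distribution's defaults first, and keep the floor above any ports your own services listen on):

    # widen the ephemeral port range
    echo "net.ipv4.ip_local_port_range = 1024 65535" >> /etc/sysctl.conf
    sysctl -p

    # see how many ports are currently parked in TIME_WAIT
    ss -tan state time-wait | wc -l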

Second, how many TIME_WAIT sockets count as "a lot"? If it is a few thousand, relax: they cost some memory and CPU, but an amount you can ignore.

Third, if the volume really is large, in the tens of thousands, you can consider having the backend servers actively close the connections instead. If the backend servers have no external-facing connections and talk only to the load balancer (above all, no connections coming through NAT), you can enable tw_recycle on the backend servers and, at the same time, tw_reuse on the load balancer.

3. What do you recommend for learning more about networking?

Learning networking is much harder than learning a programming language; the difficulty, really, is the sheer amount of time it takes. I am not an expert myself; I can only claim to have gotten started and to understand the basics. For fundamentals I can recommend: TCP/IP Illustrated, a must-read; Effective TCP/IP Programming: 44 Tips to Improve Your Network Programs, a must-read; Advanced Programming in the UNIX Environment, a must-read; and UNIX Network Programming, Volume 1 (I have only read volume one). Beyond books, you need to get familiar with the network tools tcpdump and Wireshark; my notes contain a one-stop Wireshark study guide at https://github.com/dafang/notebook/issues/114, which is worth reading. With that foundation in place, the rest is practice plus piecemeal learning and accumulation.

In closing

I wrote this article on and off over two days, and verified several points along the way, including against an article by Vincent Bernat and discussions Vincent has had with others in various places. I also spent some time discussing with Vincent a few questions I ran into while digging through the TCP source code.

I have tried to be more accurate and better organized than the articles scattered around the web. Even so, there are bound to be omissions or errors; if you spot one, please correct me and discuss it with me, so we can learn together!

Related article: Getting serious about the net.ipv4.tcp_tw_recycle parameter
