CLOSE_WAIT state and TIME_WAIT state

Source: Internet
Author: User
Tags: socket, error

Not long ago, my socket client ran into a very embarrassing error. It was supposed to send data continuously to the server over a persistent socket connection and to reconnect automatically whenever the connection dropped. One day I found that the program kept trying to establish a connection and always failed. Running netstat showed thousands of socket connections in the CLOSE_WAIT state; the upper limit had been reached, so no new socket connection could be established.

Why? Why were they all in the CLOSE_WAIT state?

Reasons for the CLOSE_WAIT state

First, if our client program is in the CLOSE_WAIT state, its socket was closed passively: the peer closed first. When the server actively closes the connection, the two parties exchange four packets to close the TCP connection:

Server ---> FIN ---> Client
Server <--- ACK <--- Client

At this point the server is in the FIN_WAIT_2 state, and our program is in the CLOSE_WAIT state.

Server <--- FIN <--- Client

When the client sends its FIN to the server, the client enters the LAST_ACK state.

Server ---> ACK ---> Client

The server responds with an ACK, and only then is the client socket truly CLOSED.

Our program is in the CLOSE_WAIT state rather than the LAST_ACK state, which means it has not yet sent its FIN to the server. Perhaps there is still a lot of data to send, or other work to do, before the connection is closed, and so the FIN packet is never sent.
The cause is clear, then. But why don't I send the FIN packet? Is there really that much to do before I close my connection? A further question: why were thousands of connections in this state? Did the server keep actively dropping our connections during that period?

In any case, we must prevent a similar situation from happening again.

First, we need to stop new ports from being opened over and over. This can be done by setting the SO_REUSEADDR socket option.

Reuse local addresses and ports

Previously, the program moved to another port on every reconnect, which is why thousands of ports ended up in the CLOSE_WAIT state. If this embarrassing situation happens again, I want to put a limit on it, so that only the current port is left in the CLOSE_WAIT state.

After calling
sockConnected = socket(AF_INET, SOCK_STREAM, 0);
we need to set the socket option for reuse:

/// Allow reuse of the local address and port.
// The advantage is that, even after the socket is disconnected, calling the socket function above
// does not occupy yet another port; the same port is used every time.
// This keeps a socket that cannot connect from continuously moving to a new port.
int nREUSEADDR = 1;
setsockopt(sockConnected, SOL_SOCKET, SO_REUSEADDR, (const char *)&nREUSEADDR, sizeof(int));
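To check that the option actually took effect, it can be read back with getsockopt. The following is a minimal sketch, assuming POSIX sockets rather than Winsock (on Windows the option value is passed as a char pointer, as in the snippet above); the function name reuseaddr_after_enable is hypothetical.

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: enables SO_REUSEADDR on a new TCP socket and returns
 * the value read back with getsockopt (non-zero means enabled), or -1 if
 * any call fails. */
int reuseaddr_after_enable(void)
{
    int on = 1;
    int value = 0;
    socklen_t len = sizeof(value);
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    /* Set the option before bind() or connect(), then read it back. */
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0 ||
        getsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &value, &len) < 0) {
        close(fd);
        return -1;
    }

    close(fd);
    return value;
}
```

Note that the option only affects how the local address may be bound; it does not by itself close a socket that is stuck in CLOSE_WAIT.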
The textbooks put it this way: if the server is closed or exits, the local address and port may be left in the TIME_WAIT state, and that is where SO_REUSEADDR is very useful.

We may not be able to avoid getting stuck in the CLOSE_WAIT state, but at least we can make sure it does not occupy a new port.

Next, we need to set the SO_LINGER socket option.

Close gracefully or forcibly?

LINGER means "delay". By default (Win2k), the SO_DONTLINGER socket option is 1 and the SO_LINGER option is {l_onoff: 0, l_linger: 0}.

If closesocket() is called while data is still being sent (send() has not completed and data remains unsent), the measure we usually take is a "graceful close". Because I call

/// Disable two-way communication first
shutdown(sockConnected, SD_BOTH);
/// For safety, close the old connection before each new socket connection is established
closesocket(sockConnected);

this time we will do the following: set SO_LINGER to a zero timeout (that is, the l_onoff field of the linger structure is non-zero and l_linger is 0). Then there is no need to worry about the closesocket call blocking (waiting for completion), whether or not queued data is still unsent or unacknowledged. This is called a "forced close", because the socket's virtual circuit is reset immediately and all unsent data is lost. Every recv() call on the remote end fails with the WSAECONNRESET error.

Set this option after connect has successfully established the connection:
linger m_sLinger;
m_sLinger.l_onoff = 1;  // enabled: applies when closesocket() is called while data is still unsent
m_sLinger.l_linger = 0; // the allowed linger time is 0 seconds
setsockopt(sockConnected, SOL_SOCKET, SO_LINGER, (const char *)&m_sLinger, sizeof(linger));
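The effect of a zero linger time on the remote end can be observed on a loopback connection. The following is a minimal sketch, assuming POSIX sockets rather than Winsock (there, close and ECONNRESET play the roles of closesocket and WSAECONNRESET); the function name recv_errno_after_abortive_close is hypothetical.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical demo: returns the errno that recv() reports on one end after
 * the other end closes with SO_LINGER {l_onoff = 1, l_linger = 0}.
 * Returns 0 if recv() did not fail, or -1 if the setup itself failed. */
int recv_errno_after_abortive_close(void)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    struct linger lg;
    char buf[16];
    int result = -1;
    int peer = -1, conn = -1;
    int listener = socket(AF_INET, SOCK_STREAM, 0);

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK); /* port 0: kernel picks one */

    if (listener < 0 ||
        bind(listener, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(listener, 1) < 0 ||
        getsockname(listener, (struct sockaddr *)&addr, &len) < 0 ||
        (peer = socket(AF_INET, SOCK_STREAM, 0)) < 0 ||
        connect(peer, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        (conn = accept(listener, NULL, NULL)) < 0)
        goto out;

    lg.l_onoff = 1;  /* linger enabled ... */
    lg.l_linger = 0; /* ... with a zero timeout */
    if (setsockopt(conn, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg)) < 0)
        goto out;

    close(conn); /* abortive close: TCP sends RST instead of FIN */
    conn = -1;

    /* The other end sees the reset as an error, not as an orderly close. */
    result = (recv(peer, buf, sizeof(buf), 0) < 0) ? errno : 0;

out:
    if (conn >= 0) close(conn);
    if (peer >= 0) close(peer);
    if (listener >= 0) close(listener);
    return result;
}
```

Because the closing end resets the connection, it skips both the FIN exchange and TIME_WAIT; the price is that the remote application gets an error instead of a clean end of stream.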
Summary

We may not be able to keep connections from getting stuck in the CLOSE_WAIT state again, but we can minimize the impact. We hope that the reuse socket option lets the CLOSE_WAIT connection be kicked out the next time a connection is established.

Feedback
# Reply: [Socket] embarrassing CLOSE_WAIT status and Response Policy PM yun. zheng
Reply to: elssann (smelly asshole and his pistachio) () Credit: 51 14:00:00 score: 0

What I mean is this: when one party closes the connection and the other party fails to detect it, CLOSE_WAIT appears. A friend of mine ran into exactly this. He wrote a client that connected to Apache; after Apache closed the connection, his client did not detect it, and CLOSE_WAIT appeared. I told him to track down the spot where this happened, and once he added a closesocket call there, the problem went away.

If CLOSE_WAIT still appears when you close the connection, I suggest removing the shutdown call and simply calling closesocket on both sides.

Another problem:

For example:
After the client connects to the server, it sends an authentication request. The server receives the data and authenticates the client. If the password is incorrect, the server should first send a "wrong password" message to the client and then close the connection.

If
m_sLinger.l_onoff = 1;
m_sLinger.l_linger = 0;
is set, then in many cases the connection is torn down before the client can receive the "wrong password" message.

 

# Reply: [Socket] embarrassing CLOSE_WAIT status and Response Policy PM yun. zheng
Elssann (ODPS and his pistachio) () Credit: 51 13:24:00 score: 0

The cause of CLOSE_WAIT is very simple: one party closes the connection, and the other party neither detects this nor calls closesocket, which leaves its socket in this state. This can be seen clearly in the TCP/IP state transition diagram. There is also a corresponding state, TIME_WAIT.

In addition, setting SO_LINGER on the socket to a zero-second delay (that is, closing immediately) is often harmful.
Also, making ports reusable is an insecure network programming practice.

 

# Reply: [Socket] embarrassing CLOSE_WAIT status and Response Policy PM yun. zheng
Elssann (ODPS and his pistachio) () Credit: 51 14:48:00 score: 0

For more information, see here.
http://blog.csdn.net/cqq/archive/2005/01/26/269160.aspx

 

Let's look at the figure again:

http://tech.ccidnet.com/pub/attachment/2004/8/322252.png

When the connection is closed, the side on the left, which initiates the active close, sends a FIN, and the side on the right, which closes passively, responds with an ACK. This ACK is sent by the TCP stack, not by the application, and at this point the passive side is in the CLOSE_WAIT state. If the passive side does not call closesocket, it never sends its own FIN and stays in CLOSE_WAIT. Only when the passive side calls closesocket does it send a FIN to the active side and move to the LAST_ACK state.
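The passive side of this exchange can be reproduced on a loopback connection. The following is a minimal sketch, assuming POSIX sockets rather than Winsock (close takes the place of closesocket); the function name recv_after_peer_close is hypothetical.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo: returns what recv() reports on the passive side after
 * the other end closes normally. 0 means the peer's FIN arrived, -1 means
 * recv() failed, and -2 means the setup itself failed. */
int recv_after_peer_close(void)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    char buf[16];
    int result = -2;
    int passive = -1, active = -1;
    int listener = socket(AF_INET, SOCK_STREAM, 0);

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK); /* port 0: kernel picks one */

    if (listener < 0 ||
        bind(listener, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(listener, 1) < 0 ||
        getsockname(listener, (struct sockaddr *)&addr, &len) < 0 ||
        (passive = socket(AF_INET, SOCK_STREAM, 0)) < 0 ||
        connect(passive, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        (active = accept(listener, NULL, NULL)) < 0)
        goto out;

    close(active); /* active close: this end sends FIN */
    active = -1;

    /* TCP on the passive side ACKs the FIN by itself; the application only
     * learns about it here, when recv() returns 0. */
    result = (int)recv(passive, buf, sizeof(buf), 0);

    /* From now until close(passive) below, the passive end is in CLOSE_WAIT.
     * Skipping that close() is exactly the bug discussed in this thread. */
out:
    if (active >= 0) close(active);
    if (passive >= 0) close(passive);
    if (listener >= 0) close(listener);
    return result;
}
```

A return value of 0 from recv is therefore the signal to call close (closesocket on Windows) on that socket.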

 

# Reply: [Socket] embarrassing CLOSE_WAIT status and Response Policy PM yun. zheng
Elssann (ODPS and his pistachio) () Credit: 51 15:39:00 score: 0

For example, suppose the client is the side that is closed passively...

When the other party calls closesocket, your program is in

int nRet = recv(s, ....);
if (nRet == SOCKET_ERROR || nRet == 0)  // 0: the peer closed the connection
{
    // closesocket(s);
    return FALSE;
}

Many people forget the closesocket line; this mistake is all too common.

As I understand it, when the active side sends a FIN to the passive side, the passive side's TCP immediately responds with an ACK and notifies the application: recv on that socket returns 0 for an orderly close, or SOCKET_ERROR if the connection was reset. Normally, if closesocket is called at that point, the passive side's TCP sends a FIN and its state changes to LAST_ACK.

 

# Reply: [Socket] embarrassing CLOSE_WAIT status and Response Policy PM yun. zheng
int nRecvBufLength = recv(sockConnected, szRecvBuffer, sizeof(szRecvBuffer), 0);
/// zhengyun 20050130:
/// Following elssann's example: when the other party calls closesocket while I am
/// in recv, my side may not have sent a FIN packet yet; TCP has only returned
/// an ACK packet, so my program enters the CLOSE_WAIT state.
/// As suggested, check here whether an error occurred and, if so, actively call closesocket.
/// Because the recv timeout is set to 30 seconds, a timeout here should show up
/// as the error WSAETIMEDOUT; in that case the connection can be closed as well.
if (nRecvBufLength == SOCKET_ERROR || nRecvBufLength == 0)  // 0: the peer closed the connection
{
    TRACE_INFO(_T("= Socket error when receiving with recv = "));
    closesocket(sockConnected);
    continue;
}

Would this work?

Network connection cannot be released -- CLOSE_WAIT

Keywords: TCP, CLOSE_WAIT, Java, SocketChannel

Problem description: I ran into a problem during recent performance tests. The client uses NIO, while the server still uses ordinary Sockets. After the test had run for a while, the server turned out to have a large number of unreleased network connections. netstat -na showed the connections in the CLOSE_WAIT state. This was strange: why would a connection remain unreleased after the Socket had been closed?

 

Solution: After half a day on Google, I found that most reports of CLOSE_WAIT problems concern C, although a few Java users had run into it too (this article is good and also deals with CLOSE_WAIT, but it does not seem to solve the problem completely and settles for a compromise instead). Then I found this article; because NIO was in use, I first suspected that NIO was the cause. Reading through the replies, several of them pointed to the same thing: after the Socket at one end calls close, the Socket at the other end does not call close. So I checked the code and found that the server did not close the Socket on some exception paths. Fixing that solved the problem.

Most of the time went into searching Google, but I learned a lot. The following figure shows the state transitions of a TCP connection:

[Figure: TCP connection state transition diagram]

Note: the dotted lines and solid lines correspond to the server (passive open) and the client (active open), respectively.

Use the netstat -na command to see the current TCP connection states. LISTEN, ESTABLISHED, and TIME_WAIT are the ones most commonly seen.

 

Analysis:

The problem I encountered above arose mainly because the TCP termination sequence was not completed, so the connection was never released. With the client closing actively, the sequence is as follows:

 

Client              Message               Server

close()
              ------ FIN ------->
FIN_WAIT_1                                CLOSE_WAIT
              <----- ACK --------
FIN_WAIT_2
                                          close()
              <----- FIN --------
TIME_WAIT                                 LAST_ACK
              ------ ACK ------->
                                          CLOSED
CLOSED

 

As shown above, because the server's Socket did not call close when the client closed, the connection on the server was left hanging while the client waited for the server's response. The typical signature of this problem is that one end is in FIN_WAIT_2 and the other end is in CLOSE_WAIT. The root cause, however, is simply that the program was poorly written and needed to be fixed.

TIME_WAIT state

According to the TCP protocol, the party that initiates the close enters the TIME_WAIT state, which lasts 2 * MSL (Maximum Segment Lifetime); the default value is 240 seconds. This post briefly describes why the state is needed.

It is worth noting that with HTTP, which runs over TCP, it is the server that closes the TCP connection, so it is the server that enters the TIME_WAIT state. A web server with heavy traffic will therefore hold a large number of TIME_WAIT connections: if the server handles 1000 requests per second, the backlog is 240 * 1000 = 240,000 TIME_WAIT entries, and maintaining them is a burden on the server. Modern operating systems do use fast lookup algorithms to manage these entries, so checking whether a new TCP connection request hits a TIME_WAIT entry does not take long, but keeping that much state around is still undesirable.
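The backlog figure is simply the close rate multiplied by the time each entry lingers. A minimal sketch of that arithmetic, assuming a steady request rate and one closed connection per request; the function name time_wait_backlog is hypothetical:

```c
/* Steady-state number of TIME_WAIT entries: every connection closed by the
 * server stays in TIME_WAIT for time_wait_seconds before it is released. */
long time_wait_backlog(long closes_per_second, long time_wait_seconds)
{
    return closes_per_second * time_wait_seconds;
}
```

With the numbers from the text, time_wait_backlog(1000, 240) gives 240,000 entries; a 60-second TIME_WAIT would bring the same load down to 60,000.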

HTTP 1.1 makes Keep-Alive the default behavior, that is, multiple requests and responses are carried over one TCP connection; one major reason is precisely this problem. Another way to relieve the TIME_WAIT pressure is to reduce the system's 2 * MSL time, since 240 seconds really is rather long. On Windows, edit the registry and add a DWORD value named TcpTimedWaitDelay under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters. The usual advice is not to set it below 60, otherwise there may be trouble.

For a large service, a single server may not be enough, and a load balancer (LB) is needed to distribute traffic across several backend servers. If the LB works in NAT mode, this can cause a problem. If every IP packet from the LB to a backend server carries the same source address (the LB's internal address), the TCP connections between the LB and that server are limited: connections are opened and closed so frequently that TIME_WAIT entries pile up on the server, and the remote address of every one of them is the LB. The LB has only a little over 60,000 usable source ports (2^16 = 65536, ports 1 to 1023 are reserved, and some other ports are not used by default). Once a port on the LB lands in the server's TIME_WAIT blacklist, it cannot be used to connect to that server for 240 seconds, so the LB and the server can sustain only about 300 new connections per second at most. Without an LB the problem does not arise, because the remote addresses the server sees span the whole Internet, and each address has more than 60,000 ports of its own.
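The ceiling follows from dividing the usable source ports by the time each port stays blocked. A minimal sketch of that estimate, assuming every port becomes unusable for the full TIME_WAIT period after one connection; the function name max_connections_per_second is hypothetical:

```c
/* Upper bound on new connections per second from a single source address:
 * each source port can be used once per TIME_WAIT period. */
long max_connections_per_second(long usable_ports, long time_wait_seconds)
{
    return usable_ports / time_wait_seconds;
}
```

For example, max_connections_per_second(65536 - 1024, 240) is 268 and max_connections_per_second(60000, 240) is 250, which is the same order of magnitude as the rough figure quoted above.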

At first I thought that using an LB would severely limit the number of TCP connections, but an experiment showed otherwise: a Windows server behind the LB still handled 600 requests per second. Was the TIME_WAIT state having no effect? Watching with Net Monitor and netstat, I found that after the connection between the server and port xxxx of the LB entered the TIME_WAIT state, the server still accepted and processed a SYN packet from that same LB port instead of dropping it as I had expected. I went back to the books and found that UNIX Network Programming, Volume 1, Second Edition: Networking APIs: Sockets and XTI mentions that, in BSD-derived implementations, a connection in TIME_WAIT still accepts a SYN as long as the SYN's sequence number is greater than the last sequence number of the closed connection. So can Windows be counted as BSD-derived? With this clue and the keyword (BSD), I found a post saying that NT 4.0 differs from BSD-derived implementations in this respect; but Windows Server 2003 is NT 5.2, so it may behave a little differently.
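The acceptance rule described in the book can be written as a sequence-number comparison. This is only an illustrative sketch of the rule as stated above, not code from any TCP stack; the function name syn_accepted_in_time_wait is hypothetical, and the signed difference is an assumption about how wraparound of 32-bit sequence numbers would be handled.

```c
#include <stdint.h>

/* Sketch of the rule: a connection in TIME_WAIT accepts a new SYN only if
 * the SYN's sequence number is ahead of the last sequence number seen on
 * the closed connection. Taking the difference as a signed 32-bit value
 * keeps the comparison meaningful across wraparound (this relies on the
 * two's-complement conversion that mainstream compilers perform). */
int syn_accepted_in_time_wait(uint32_t syn_seq, uint32_t last_seq)
{
    return (int32_t)(syn_seq - last_seq) > 0;
}
```

Under this rule a client whose initial sequence numbers keep increasing can reuse the same port immediately, while a SYN with a smaller sequence number is dropped.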

I ran an experiment: I wrote a client with the Socket API that binds to the same local port (say 2345) every time and repeatedly opens a TCP connection to send an HTTP request with Keep-Alive disabled to a server. The Windows implementation keeps the sequence number increasing, so although the server holds a TIME_WAIT entry for the client's port 2345, it always accepts the new connection and never rejects it. What if the SYN's sequence number decreases? I used the Socket API again, this time with raw IP, to send a SYN packet with a smaller sequence number. Net Monitor showed that the server received the SYN and then dropped it.

According to the book, this behavior of BSD-derived implementations and Windows Server 2003 carries a security risk, but at least it keeps TIME_WAIT from blocking TCP connection requests. The client has to cooperate, of course, by making sure that the sequence numbers of successive TCP connections increase and never decrease.
