http://wiki.apache.org/HttpComponents/FrequentlyAskedConnectionManagementQuestions
1. Connections in TIME_WAIT State
After running your HTTP application, you use the netstat command and detect a lot of connections in state TIME_WAIT. Now you wonder why these connections are not cleaned up.
1.1. What is the TIME_WAIT State?
The TIME_WAIT state is a protection mechanism in TCP. The side that closes a socket connection orderly will keep the connection in state TIME_WAIT for some time, typically between 1 and 4 minutes. This happens after the connection is closed. It does not indicate a cleanup problem. The TIME_WAIT state protects against loss of data and data corruption. It is there to help. For technical details, have a look at the Unix Socket FAQ, section 2.7.
1.2. Some Connections Go to TIME_WAIT, Others Not
If a connection is orderly closed by your application, it will go to the TIME_WAIT state. If a connection is orderly closed by the server, the server keeps it in TIME_WAIT and your client doesn't. If a connection is reset or otherwise dropped by your application in a non-orderly fashion, it will not go to TIME_WAIT.
Unfortunately, it will not always be obvious to you whether a connection is closed orderly or not. This is because connections are pooled and kept open for re-use by default. HttpClient 3.x, HttpClient 4, and also the standard Java HttpURLConnection do that. Most applications will simply execute requests, then read from the response stream, and finally close that stream.
Closing the response stream is not the same thing as closing the connection! Closing the response stream returns the connection to the pool, but it will be kept open if possible. This saves a lot of time if you send another request to the same host within a few seconds, or even minutes.
Connection pools have a limited number of connections. A pool may have 5 connections, or more, or maybe only 1. When you send a request to a host, and there is no open connection to this host in the pool, a new connection needs to be opened. But if the pool is already full, an open connection has to be closed before a new one can be opened. In this case, the old connection is closed orderly and goes to the TIME_WAIT state.
When your application exits and the JVM terminates, the open connections in the pools will not be closed orderly. They are reset or cancelled, without going to TIME_WAIT. To avoid this, you should call the shutdown method of the connection pools your application is using before exiting. The standard Java HttpURLConnection has no public method to shut down its connection pool.
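The shutdown advice above can be sketched in plain Java. TinyPool below is a hypothetical stand-in for a connection pool, not a class from HttpClient; with a real library you would call the shutdown method of whatever connection manager your application uses. The sketch only shows the pattern: close every pooled socket orderly, and register that step as a JVM shutdown hook so it runs before exit.

```java
import java.io.IOException;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;

public class PoolShutdownSketch {

    /** Hypothetical minimal pool: it only remembers idle sockets. */
    static final class TinyPool {
        private final List<Socket> idle = new ArrayList<>();

        synchronized void add(Socket socket) {
            idle.add(socket);
        }

        /** Closes every pooled socket with a normal close() and returns how many were closed. */
        synchronized int shutdown() {
            int closed = 0;
            for (Socket socket : idle) {
                try {
                    socket.close();
                    closed++;
                } catch (IOException ignored) {
                    // nothing more can be done for this socket during shutdown
                }
            }
            idle.clear();
            return closed;
        }
    }

    public static void main(String[] args) {
        TinyPool pool = new TinyPool();
        pool.add(new Socket()); // unconnected placeholder socket, for illustration only
        // Run the orderly shutdown when the JVM exits normally or on Ctrl-C.
        Runtime.getRuntime().addShutdownHook(new Thread(pool::shutdown));
        System.out.println("shutdown hook registered");
    }
}
```

A shutdown hook does not run if the JVM is killed forcibly, so calling the pool's shutdown explicitly at the end of the application is still the more direct option.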
1.3. Running Out of Ports
Some applications open and orderly close a lot of connections within a short time, for example when load-testing a server. A connection in state TIME_WAIT will prevent that port number from being re-used for another connection. That's not an error, it is the purpose of TIME_WAIT.
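As a rough illustration of why ports run out: if every closed connection keeps its local port blocked for the TIME_WAIT duration, the number of available ephemeral ports divided by that duration gives an upper bound on how many connections per second a client can open and close towards one server address and port. The figures below (16384 ports, 60 and 240 seconds) are hypothetical and not taken from any particular operating system.

```java
public class PortBudgetSketch {

    /**
     * Upper bound on sustained new connections per second towards a single
     * server address and port, assuming each closed connection blocks its
     * local port for the whole TIME_WAIT duration.
     */
    static long maxConnectionsPerSecond(int ephemeralPorts, int timeWaitSeconds) {
        if (ephemeralPorts <= 0 || timeWaitSeconds <= 0) {
            throw new IllegalArgumentException("both arguments must be positive");
        }
        return ephemeralPorts / timeWaitSeconds; // integer division rounds down
    }

    public static void main(String[] args) {
        // Hypothetical numbers; check the real settings of your machine.
        System.out.println(maxConnectionsPerSecond(16_384, 60));
        System.out.println(maxConnectionsPerSecond(16_384, 240));
    }
}
```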
TCP is configured at the operating system level, not through Java. Your first action should be to increase the number of ephemeral ports on the machine. Windows in particular has a rather low default for the ephemeral ports. The PerformanceWiki has tuning tips for the most common operating systems, have a look at the respective Network section.
Only if increasing the number of ephemeral ports does not solve your problem, you should consider decreasing the duration of the TIME_WAIT state. You probably have to reduce the maximum lifetime of IP packets, as the duration of TIME_WAIT is typically twice that timespan to allow for a round-trip delay. Be aware that this will affect all applications running on the machine. Don't ask us how to do it, we're not the experts for network tuning.
There are some ways to deal with the problem at the application level. One is to send a "Connection: close" header with each request. That will tell the server to close the connection, so it goes to TIME_WAIT on the other side. Of course this also disables the keep-alive feature of connection pooling and thereby degrades performance. If you are running load tests against a server, the untypical behavior of your application may distort the test results.
Another is to not orderly close connections. There is a trick to set SO_LINGER to a special value, which will cause the connection to be reset instead of orderly closed. Note that the HttpClient API will not support that directly, you'll have to extend or modify some classes to implement this hack.
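For reference, the SO_LINGER trick looks roughly like this on a plain java.net.Socket: enabling linger with a timeout of 0 makes close() abort the connection with a reset instead of the normal FIN exchange. This is a sketch against a local ServerSocket, not HttpClient code, and the class and method names are made up for illustration. The Unix Socket FAQ excerpt further down argues against using this to avoid TIME_WAIT.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class LingerResetSketch {

    /** Connects, switches the socket to "linger on, timeout 0", then closes it. */
    static boolean connectAndAbort(InetAddress host, int port) throws IOException {
        Socket socket = new Socket(host, port);
        // Linger enabled with timeout 0: close() discards unsent data and resets the connection.
        socket.setSoLinger(true, 0);
        boolean lingerIsZero = socket.getSoLinger() == 0;
        socket.close();
        return lingerIsZero;
    }

    /** Runs the sketch against a throwaway listener on the loopback interface. */
    static boolean runDemo() throws IOException {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            return connectAndAbort(server.getInetAddress(), server.getLocalPort());
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("SO_LINGER was 0 before close: " + runDemo());
    }
}
```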
Yet another is to re-use ports that are still blocked by a connection in TIME_WAIT. You can do this by specifying the SO_REUSEADDR option when opening a socket. Java 1.4 introduced the method Socket.setReuseAddress for this purpose. You will have to extend or modify some classes of HttpClient for this too, but at least it's not a hack.
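A minimal sketch of the SO_REUSEADDR approach on a plain java.net.Socket follows. The option has to be set while the socket is still unbound, so the socket is created unconnected first and connected afterwards. The helper name openWithReuse and the 5 second connect timeout are assumptions for illustration; wiring this into HttpClient would need a custom socket factory, which is not shown.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class ReuseAddressSketch {

    /** Opens a client socket with SO_REUSEADDR enabled before it is bound or connected. */
    static Socket openWithReuse(InetAddress host, int port) throws IOException {
        Socket socket = new Socket(); // unbound and unconnected
        try {
            socket.setReuseAddress(true); // must happen before bind/connect
            socket.connect(new InetSocketAddress(host, port), 5_000);
            return socket;
        } catch (IOException e) {
            socket.close();
            throw e;
        }
    }

    /** Runs the sketch against a throwaway listener on the loopback interface. */
    static boolean runDemo() throws IOException {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket socket = openWithReuse(server.getInetAddress(), server.getLocalPort())) {
            return socket.getReuseAddress();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("SO_REUSEADDR enabled: " + runDemo());
    }
}
```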
1.4. Further Reading
Unix Socket FAQ
java.net.Socket.setReuseAddress
Discussion on the HttpClient mailing list in December 2007
PerformanceWiki
netstat command line tool
http://www.softlab.ntua.gr/facilities/documentation/unix/unix-socket-faq/unix-socket-faq-2.html#ss2.7
2.7 Explain the TIME_WAIT state.
Remember that TCP guarantees all data transmitted will be delivered, if at all possible. When you close a socket, the server goes into a TIME_WAIT state, just to be really really sure that all the data has gone through. When a socket is closed, both sides agree by sending messages to each other that they will send no more data. This, it seemed to me, was good enough, and after the handshaking is done, the socket should be closed. The problem is two-fold. First, there is no way to be sure that the last ACK was communicated successfully. Second, there may be "wandering duplicates" left on the net that must be dealt with if they are delivered.
Andrew Gierth ([email protected]) helped to explain the closing sequence in the following Usenet posting:
Assume that a connection is in ESTABLISHED state, and the client is about to do an orderly release. The client's sequence no. is Sc, and the server's is Ss. The pipe is empty in both directions.
   Client                                                   Server
   ======                                                   ======
   ESTABLISHED                                              ESTABLISHED
   (client closes)
   ESTABLISHED                                              ESTABLISHED
                <CTL=FIN+ACK><SEQ=Sc><ACK=Ss> ------->>
   FIN_WAIT_1
                <<-------- <CTL=ACK><SEQ=Ss><ACK=Sc+1>
   FIN_WAIT_2                                               CLOSE_WAIT
                <<-------- <CTL=FIN+ACK><SEQ=Ss><ACK=Sc+1>  (server closes)
                                                            LAST_ACK
                <CTL=ACK>,<SEQ=Sc+1><ACK=Ss+1> ------->>
   TIME_WAIT                                                CLOSED
   (2*MSL elapses...)
   CLOSED
Note: the +1 on the sequence numbers is because the FIN counts as one byte of data. (The above diagram is equivalent to a figure from RFC 793.)
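The sequence number bookkeeping of the exchange above can be written out as a small Java sketch. It lists the four segments for given starting values Sc and Ss and applies the +1 for each FIN. Sequence numbers are 32-bit values, so the increment is taken modulo 2^32. The class and field names are made up for illustration; this is not a TCP implementation.

```java
import java.util.List;

public class CloseSequenceSketch {

    static final long SEQ_SPACE = 1L << 32; // TCP sequence numbers are 32 bits wide

    /** One segment of the close exchange: who sent it, control flags, SEQ and ACK. */
    static final class Segment {
        final String sender;
        final String ctl;
        final long seq;
        final long ack;

        Segment(String sender, String ctl, long seq, long ack) {
            this.sender = sender;
            this.ctl = ctl;
            this.seq = seq;
            this.ack = ack;
        }

        @Override
        public String toString() {
            return sender + " <CTL=" + ctl + "><SEQ=" + seq + "><ACK=" + ack + ">";
        }
    }

    /** A FIN consumes one sequence number, wrapping around at 2^32. */
    static long afterFin(long seq) {
        return (seq + 1) % SEQ_SPACE;
    }

    /** The four segments of an orderly close started by the client. */
    static List<Segment> orderlyClose(long sc, long ss) {
        return List.of(
            new Segment("client", "FIN+ACK", sc, ss),
            new Segment("server", "ACK", ss, afterFin(sc)),
            new Segment("server", "FIN+ACK", ss, afterFin(sc)),
            new Segment("client", "ACK", afterFin(sc), afterFin(ss)));
    }

    public static void main(String[] args) {
        // Arbitrary example values for Sc and Ss.
        orderlyClose(100, 300).forEach(System.out::println);
    }
}
```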
Now consider what happens if the last of those packets is dropped in the network. The client has done with the connection; it has no more data or control info to send, and never will have. But the server does not know whether the client received all the data correctly; that's what the last ACK segment is for. Now the server may or may not care whether the client got the data, but that is not an issue for TCP; TCP is a reliable protocol, and must distinguish between an orderly connection close where all data is transferred, and a connection abort where data may or may not have been lost.
So, if that last packet is dropped, the server will retransmit it (it is, after all, an unacknowledged segment) and will expect to see a suitable ACK segment in reply. If the client went straight to CLOSED, the only possible response to that retransmit would be a RST, which would indicate to the server that data had been lost, when in fact it had not been.
(Bear in mind that the server's FIN segment may, additionally, contain data.)
DISCLAIMER: This is my interpretation of the RFCs (I have read all the TCP-related ones I could find), but I have not attempted to examine implementation source code or trace actual connections in order to verify it. I am satisfied that the logic is correct, though.
More commentary from Vic:
The second issue was addressed by Richard Stevens ([email protected], author of "Unix Network Programming", see 1.5 Where can I get source code for the book [book title]?). I have put together quotes from some of his postings and email which explain this. I have brought together paragraphs from different postings, and have made as few changes as possible.
From Richard Stevens ([email protected]):
If the duration of the TIME_WAIT state were just to handle TCP's full-duplex close, then the time would be much smaller, and it would be some function of the current RTO (retransmission timeout), not the MSL (the packet lifetime).
A couple of points about the TIME_WAIT state.
- The end that sends the first FIN goes into the TIME_WAIT state, because that is the end that sends the final ACK. If the other end's FIN is lost, or if the final ACK is lost, having the end that sends the first FIN maintain state about the connection guarantees that it has enough information to retransmit the final ACK.
- Realize that TCP sequence numbers wrap around after 2**32 bytes have been transferred. Assume a connection between A.1500 (host A, port 1500) and B.2000. During the connection one segment is lost and retransmitted. But the segment is not really lost, it is held by some intermediate router and then re-injected into the network. (This is called a "wandering duplicate".) But in the time between the packet being lost and retransmitted, and then reappearing, the connection is closed (without any problems) and then another connection is established between the same host, same port (that is, A.1500 and B.2000; this is called another "incarnation" of the connection). But the sequence numbers chosen for the new incarnation just happen to overlap with the sequence number of the wandering duplicate that is about to reappear. (This is indeed possible, given the way sequence numbers are chosen for TCP connections.) Bingo, you are about to deliver the data from the wandering duplicate (the previous incarnation of the connection) to the new incarnation of the connection. To avoid this, you do not allow the same incarnation of the connection to be reestablished until the TIME_WAIT state terminates. Even the TIME_WAIT state doesn't completely solve the second problem, given what is called TIME_WAIT assassination. RFC 1337 has more details.
- The reason that the duration of the TIME_WAIT state is 2*MSL is that the maximum amount of time a packet can wander around a network is assumed to be MSL seconds. The factor of 2 is for the round-trip. The recommended value for MSL is 120 seconds, but Berkeley-derived implementations normally use 30 seconds instead. This means a TIME_WAIT delay between 1 and 4 minutes. Solaris 2.x does indeed use the recommended MSL of 120 seconds.
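The 2*MSL rule is simple enough to check with a few lines of Java. The sketch below assumes MSL values of 30 and 120 seconds, which are the values that produce the 1 to 4 minute range mentioned above; the class name is made up for illustration.

```java
import java.time.Duration;

public class TimeWaitDurationSketch {

    /** TIME_WAIT lasts twice the maximum segment lifetime (one MSL per direction). */
    static Duration timeWait(Duration msl) {
        return msl.multipliedBy(2);
    }

    public static void main(String[] args) {
        // Assumed MSL values: 30 seconds and 120 seconds.
        System.out.println(timeWait(Duration.ofSeconds(30)).toMinutes() + " minute(s)");
        System.out.println(timeWait(Duration.ofSeconds(120)).toMinutes() + " minute(s)");
    }
}
```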
A wandering duplicate is a packet that appeared to be lost and was retransmitted. But it wasn't really lost ... some router had problems, held on to the packet for a while (order of seconds, could be a minute if the TTL is large enough) and then re-injects the packet back into the network. But by the time it reappears, the application that sent it originally has already retransmitted the data contained in that packet.
Because of these potential problems with TIME_WAIT assassinations, one should not avoid the TIME_WAIT state by setting the SO_LINGER option to send an RST instead of the normal TCP connection termination (FIN/ACK/FIN/ACK). The TIME_WAIT state is there for a reason; it's your friend and it's there to help you :-)
I have a long discussion of just this topic in my just-released "TCP/IP Illustrated, Volume 3". The TIME_WAIT state is indeed one of the most misunderstood features of TCP.
I'm currently rewriting "Unix Network Programming" (see 1.5 Where can I get source code for the book [book title]?) and will include lots more on this topic, as it is often confusing and misunderstood.
An additional note from Andrew:
Closing a socket: if SO_LINGER has not been called on a socket, then close() is not supposed to discard data. This is true on SVR4.2 (and, apparently, on all non-SVR4 systems) but apparently not on SVR4; the use of either shutdown() or SO_LINGER seems to be required to guarantee delivery of all data.
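In Java, the shutdown() call mentioned above corresponds to Socket.shutdownOutput(). One common way to close without relying on what close() does with unsent data is to half-close the sending side, read until the peer signals end of stream, and only then close. The sketch below shows that pattern against a local echo peer; the class and method names are made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class GracefulCloseSketch {

    /** Sends data, half-closes, drains the peer's reply until EOF, then closes. Returns bytes received. */
    static int sendAndClose(Socket socket, byte[] data) throws IOException {
        OutputStream out = socket.getOutputStream();
        out.write(data);
        out.flush();
        socket.shutdownOutput(); // our FIN is sent after the queued data

        InputStream in = socket.getInputStream();
        byte[] buffer = new byte[1024];
        int total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) { // -1 means the peer has closed its side too
            total += n;
        }
        socket.close();
        return total;
    }

    /** Runs the pattern against an echo peer on the loopback interface. */
    static int runEchoDemo() throws Exception {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread echo = new Thread(() -> {
                try (Socket peer = server.accept()) {
                    // Echo everything back until the client half-closes.
                    peer.getInputStream().transferTo(peer.getOutputStream());
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            echo.start();
            Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
            int echoed = sendAndClose(client, "hello".getBytes(StandardCharsets.UTF_8));
            echo.join();
            return echoed;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("bytes echoed back: " + runEchoDemo());
    }
}
```

Because the client sends the first FIN here, it is the client side that ends up in TIME_WAIT, as described in the sections above.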
http://blog.csdn.net/liuxuejin/article/details/8552677
When crawling with HttpClient, netstat shows many TIME_WAIT connections