Analysis of the TCP TIME_WAIT and CLOSE_WAIT states
I. Abnormal server conditions
When a server runs into trouble at the TCP level, it usually shows one of the following two symptoms:
1. The server holds a large number of connections in the TIME_WAIT state.
2. The server holds a large number of connections in the CLOSE_WAIT state.
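To check which of the two symptoms a server shows, one option is to tally connections per state. The following is a minimal sketch that parses text in the style of `netstat -ant` output. The sample rows and the function name `count_tcp_states` are hypothetical illustrations, not data from a real server.

```python
from collections import Counter

# Hypothetical sample in the style of `netstat -ant` output; not real data.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN
tcp        0      0 10.234.5.5:80           10.55.55.60:6666        TIME_WAIT
tcp        0      0 10.234.5.5:80           10.55.55.60:6667        TIME_WAIT
tcp        1      0 10.234.5.5:80           10.55.55.61:7000        CLOSE_WAIT
tcp        0      0 10.234.5.5:80           10.55.55.62:7001        ESTABLISHED
"""


def count_tcp_states(text):
    """Count connections per TCP state in netstat-style text."""
    counts = Counter()
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with "tcp" (or "tcp6"); the last column is the state.
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            counts[fields[-1]] += 1
    return counts


if __name__ == "__main__":
    for state, n in count_tcp_states(SAMPLE).most_common():
        print(state, n)
```

On a live machine the same function could be fed the captured output of the real command. A large TIME_WAIT or CLOSE_WAIT count relative to ESTABLISHED would point to the first or second symptom respectively.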
II. The TIME_WAIT state
1. There are two reasons for the TIME_WAIT state:
1) It makes the four-way close handshake more reliable. The last ACK of the four-way handshake is sent by the side that closes actively. If this ACK is lost, the passive closer retransmits its FIN. If the active closer stays in TIME_WAIT for 2MSL, it is still there to receive the retransmitted FIN and send the final ACK again.
If the active closer went straight to CLOSED instead of staying in TIME_WAIT, then after the final ACK is lost and the passive closer retransmits its FIN, the active closer would answer with an RST. On receiving it, the passive closer would treat the segment as an error (in Java this shows up as a SocketException with the message "Connection reset").
Therefore, to terminate a full-duplex TCP connection cleanly, the loss of any of the four segments exchanged during termination must be handled, and the side that actively closes the connection must stay in TIME_WAIT.
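The two outcomes described above can be summarized in a toy model. This is only a sketch of the reasoning, not a TCP implementation: the function names and the string labels are hypothetical, and timers, retransmission counts, and every other state are left out.

```python
def reply_to_retransmitted_fin(active_state):
    """Toy model: what the active closer sends when the peer's FIN arrives again."""
    if active_state == "TIME_WAIT":
        # The connection record still exists, so the final ACK can be re-sent.
        return "ACK"
    if active_state == "CLOSED":
        # No connection record is left, so the unexpected segment is answered with RST.
        return "RST"
    raise ValueError("this toy model only covers TIME_WAIT and CLOSED")


def passive_close_outcome(active_state, final_ack_lost):
    """Return how the close ends from the passive closer's point of view."""
    if not final_ack_lost:
        return "clean close"
    reply = reply_to_retransmitted_fin(active_state)
    return "clean close" if reply == "ACK" else "connection reset"


if __name__ == "__main__":
    for state in ("TIME_WAIT", "CLOSED"):
        print(state, "->", passive_close_outcome(state, final_ack_lost=True))
```

The only case that ends in a reset is the one where the final ACK is lost and the active closer has already dropped to CLOSED, which is exactly the case TIME_WAIT is meant to prevent.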
2) It lets old duplicate segments expire in the network so that they cannot corrupt a new, legitimate connection. Lost duplicates are common in real networks, often because a router fails and the routes have not yet converged. As a result, a packet bounces around in a loop between, say, routers A, B, and C. The IP header carries a TTL that limits the number of hops a packet can take, so such a packet has two possible fates: either its TTL drops to 0 and it is discarded in the network, or routing converges before the TTL reaches 0 and the packet uses its remaining hops to reach the destination. Unfortunately for that packet, TCP's timeout retransmission has already sent an identical segment that reached the destination first, so the late arrival is destined to be discarded by the TCP stack.
A related concept is the incarnation connection: a new connection whose socket pair is exactly the same as that of an earlier connection is called an incarnation of the previous connection. A lost duplicate that meets an incarnation connection can cause a fatal transmission error.
TCP is a byte stream. Segments can arrive out of order, and the TCP stack reassembles them by sequence number. Suppose an incarnation connection is expecting seq = 1000 when a lost duplicate with seq = 1000, len = 1000 arrives. TCP considers the duplicate valid and places it in the receive buffer, which corrupts the transmitted data. Holding TIME_WAIT for 2MSL ensures that all lost duplicates have expired before an incarnation connection can start, so new connections are not affected.
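The seq = 1000, len = 1000 example can be worked through with a simplified acceptability check. This sketch tests whether the first or last byte of a segment falls inside the receive window. It ignores 32-bit sequence-number wraparound and zero-length segments, and the window size of 4096 bytes is an assumption chosen for illustration.

```python
def segment_acceptable(rcv_nxt, rcv_wnd, seq, length):
    """Simplified check: does any edge of the segment fall inside the receive window?

    rcv_nxt: next sequence number the receiver expects
    rcv_wnd: receive window size in bytes
    Wraparound modulo 2**32 and zero-length segments are deliberately ignored.
    """
    if length <= 0 or rcv_wnd <= 0:
        return False
    window_end = rcv_nxt + rcv_wnd
    first_byte_in = rcv_nxt <= seq < window_end
    last_byte_in = rcv_nxt <= seq + length - 1 < window_end
    return first_byte_in or last_byte_in


if __name__ == "__main__":
    # Incarnation connection expects seq = 1000; the window size is assumed.
    print(segment_acceptable(rcv_nxt=1000, rcv_wnd=4096, seq=1000, length=1000))
    # A stale segment far outside the window is rejected.
    print(segment_acceptable(rcv_nxt=1000, rcv_wnd=4096, seq=50000, length=1000))
```

The check has no way to tell which connection a segment originally belonged to. It sees only the socket pair and the sequence numbers, which is why the old duplicate has to expire before an incarnation connection is allowed to start.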
2. Why the TIME_WAIT state is assigned to the side that closes actively:
(1) The final ACK is sent by the active closer.
(2) As long as one side stays in TIME_WAIT, no incarnation connection can be established within 2MSL, so there is no need for both sides to wait.
3. How to handle the 2MSL TIME_WAIT correctly
The RFC requires only that a socket pair in TIME_WAIT cannot start an incarnation connection. Most TCP implementations, however, impose a stricter restriction: during the 2MSL wait, the local port used by the socket cannot be reused by default. Suppose A (10.234.5.5:1234) and B (10.55.55.60:6666) establish a connection and A closes it actively. As long as port 1234 on A is in TIME_WAIT, A is not allowed to start a service on that port again, no matter what the IP address and port of the peer are. This is clearly more restrictive than the RFC, which only requires that the socket pair differ. In these implementations, a port in TIME_WAIT cannot be used at all.
This restriction does not matter to the active opener (usually the client), because it generally uses ephemeral ports. For the passive opener (usually the server) it is a real problem, because servers generally use well-known ports. HTTP, for example, usually runs on port 80, and it is unacceptable for the service to be unable to start for 2MSL. The solution is to set the SO_REUSEADDR option on the server socket, so that the service can still be started on its well-known port even while that port is in TIME_WAIT. Even with SO_REUSEADDR, the socket pair restriction still applies. In the example above, A can listen on port 1234 again thanks to SO_REUSEADDR, but if we then try to connect from that port to port 6666 on B, which recreates the same socket pair, TCP reports that the connection fails with the error "Address already in use".
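Setting the option looks roughly like this. This is a minimal sketch, assuming a loopback address and an OS-assigned port (port 0) so that it runs without privileges. The helper name `make_listener` and its parameters are hypothetical. The option has to be set before `bind()`, and its exact semantics vary between operating systems.

```python
import socket


def make_listener(host="127.0.0.1", port=0, backlog=16):
    """Create a listening TCP socket with SO_REUSEADDR set before bind()."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Allow binding even if earlier connections on this port are in TIME_WAIT.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        sock.bind((host, port))
        sock.listen(backlog)
    except OSError:
        sock.close()
        raise
    return sock


if __name__ == "__main__":
    listener = make_listener()
    print("listening on", listener.getsockname())
    listener.close()
```

A real server would pass its well-known port (for example 80, which usually requires elevated privileges) instead of 0.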