Background:
We run a service, based on the OAuth 2.0 protocol, that provides authorization and information to third parties. A few years ago its access layer and business layer went through a second migration. A brief sketch of the architecture: LVS (access) ---> Nginx ---> Tomcat
Problem: The day after the migration we received complaints from several partners. One of them reported that roughly 20% of the requests from their business cluster were failing, with their logs showing that the connections had been refused.
Diagnosis: Debugging together with their developers, we found that telnet to our port worked and a curl test request also worked. With nothing else to go on, we asked the developers to have their ops colleague run tcpdump for a few minutes and send the capture over for analysis.
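For reference, a capture like that can be taken roughly as follows (the interface name, port and filename here are placeholders, not the values from the actual incident):

    # Capture a few minutes of traffic on the service port and save it for
    # offline analysis in Wireshark; eth0, port 443 and the filename are assumptions.
    tcpdump -i eth0 -s 0 -w oauth_capture.pcap 'tcp port 443'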
Opening the capture, almost everything was gray ... a scary sight. There were also plenty of RSTs coming from the server side, which lines up with the roughly 20% of connections being refused. Looking at those requests, the pattern was basically that the client sent a SYN and the server never answered. Seeing this, I began to suspect that tcp_tw_recycle and tcp_timestamps were enabled on the server side, so I immediately logged into the RS (nginx) machines to check, and sure enough they were. We turned off tcp_tw_recycle and asked the partner's developers to watch their logs again: everything was fine!
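For the record, checking and turning the option off is the usual sysctl routine; roughly like this (a sketch, not the exact commands typed that day):

    # Check whether the two options are enabled (1 = on):
    sysctl net.ipv4.tcp_tw_recycle net.ipv4.tcp_timestamps

    # Turn tcp_tw_recycle off at runtime:
    sysctl -w net.ipv4.tcp_tw_recycle=0

    # To keep it off after a reboot, set "net.ipv4.tcp_tw_recycle = 0"
    # in /etc/sysctl.conf and reload:
    sysctl -p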
Cause:
This behavior is described in RFC 1323, one of the TCP/IP standards:
An additional mechanism could be added to the TCP, a per-host
cache of the last timestamp received from any connection.
This value could then be used in the PAWS mechanism to reject
old duplicate segments from earlier incarnations of the
connection, if the timestamp clock can be guaranteed to have
ticked at least once since the old connection was open. This
would require that the time-wait delay plus the RTT together
must be at least one tick of the sender's timestamp clock.
Such an extension is not part of the proposal of this RFC.
Roughly, this means: TCP may keep, for each host (that is, each IP), a cache of the most recent timestamp value it has received. That cached value can be used by PAWS (Protect Against Wrapped Sequence numbers, a simple mechanism to reject duplicate segments) to discard old duplicate segments left over from earlier incarnations of the connection. Linux implements this mechanism when both net.ipv4.tcp_timestamps and net.ipv4.tcp_tw_recycle are enabled.
This mechanism causes no problems when client and server talk one-to-one. But when the server sits behind a load balancer it is a different story: the load balancer does not rewrite the timestamp inside the packets, the machines out on the internet cannot be expected to keep their clocks in sync, and the load balancer reuses the same TCP ports when opening connections to the internal servers. What happens then is this: the load balancer opens a connection to one of the internal servers from some port, using its own internal address as the source, so the server always sees the same source IP. If two connections happen to use the same source port, the server receives two packets from what looks like the same host. The timestamp of the first packet is cached by the server; when the second packet arrives and is compared against that cache, its timestamp turns out to be older than the first one, because the real clients' clocks are not consistent. Based on PAWS, the server decides the second packet is an old duplicate and discards it.
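You can see this in a capture by printing only the incoming SYNs together with their TCP timestamp option and comparing the TS val of successive SYNs from the same source IP (a sketch; the capture filename is a placeholder):

    # Show only SYN packets (no ACK flag), verbosely so tcpdump prints the TCP
    # options, including "TS val ...":
    tcpdump -nn -v -r capture.pcap 'tcp[tcpflags] & (tcp-syn) != 0 and tcp[tcpflags] & (tcp-ack) == 0'
    # A later SYN from the same IP carrying a smaller TS val than an earlier one
    # is exactly the packet PAWS will treat as an old duplicate and drop.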
On the server this shows up in a capture exactly as we saw it: the SYN arrives, but the server never sends back a SYN-ACK, because the SYN was silently discarded. To confirm that this is what is happening, run netstat -s | grep timestamp and check whether the counter for passive connections rejected because of timestamps keeps increasing.
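A minimal way to watch that counter on a typical Linux box (the exact wording of the counter differs a little between kernel versions, hence the loose pattern):

    # Print the PAWS-related reject counters once:
    netstat -s | grep -iE 'timestamp|time stamp'

    # Watch them for a while; numbers that keep climbing while clients report
    # refused connections mean the SYNs are being dropped by the PAWS check:
    watch -n 5 "netstat -s | grep -iE 'timestamp|time stamp'"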
Reference:
1. tcp_tw_recycle and NAT causing SYN_ACK problems: http://saview.wordpress.com/2011/09/27/tcp_tw_recycle%E5%92%8Cnat%E9%80%A0%E6%88%90syn_ack%E9%97%AE%E9%A2%98/