Analysis of a TCP server receiving SYN but not replying with SYN-ACK
-- Lvyilong316
I recently ran into a strange situation while analyzing a customer's problem. The customer had opened a port on the server, but a client still failed to connect to it via telnet. A packet capture on the server side showed that the client's SYN packet had arrived, but the server did not respond. We checked the current number of connections and found that it was not large, so the problem was not caused by a full connection queue. Later, I remembered that the net.ipv4.tcp_tw_recycle option can cause this behavior, so I disabled the option and the problem went away. The cause is analyzed below.
To avoid having too many connections sit in the TIME_WAIT state, some servers (and clients, for that matter) enable the net.ipv4.tcp_tw_recycle option to speed up the recycling of TIME_WAIT connections. This option only takes effect when the net.ipv4.tcp_timestamps option is also enabled. Although enabling it accelerates the recycling of TIME_WAIT connections, it introduces another problem. Let's look at how the tcp_tw_recycle option works:
After tcp_tw_recycle is enabled, when a connection enters the TIME_WAIT state, the timestamp of the last segment received from the remote host is recorded. If a new segment then arrives from the same host with a timestamp smaller than the recorded one, the packet is discarded (RFC 1323).
Whether Linux enables this behavior depends on both tcp_timestamps and tcp_tw_recycle. Because tcp_timestamps is enabled by default, enabling tcp_tw_recycle is what actually activates it.
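The rule described above can be sketched as a toy model in Python. This is only an illustration, not the kernel's code: the class name PeerTimestampCache and its methods are made up for this example, and the real kernel additionally limits how long a recorded timestamp is honored and compares timestamps in a wraparound-safe way, both of which are left out here.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PeerTimestampCache:
    """Toy model of the per-host timestamp memory (hypothetical names)."""

    # remote address -> last TCP timestamp value recorded for that host
    last_tsval: Dict[str, int] = field(default_factory=dict)

    def remember(self, remote_addr: str, tsval: int) -> None:
        """Record the peer's timestamp when a connection enters TIME_WAIT."""
        self.last_tsval[remote_addr] = tsval

    def accept_syn(self, remote_addr: str, tsval: int) -> bool:
        """Return False if a new SYN from remote_addr would be discarded."""
        recorded = self.last_tsval.get(remote_addr)
        if recorded is None:
            return True  # nothing remembered for this host
        # Simplification: a plain comparison, ignoring 32-bit wraparound.
        return tsval >= recorded


cache = PeerTimestampCache()
cache.remember("192.0.2.10", 5000)            # connection went to TIME_WAIT
print(cache.accept_syn("192.0.2.10", 6000))   # newer timestamp
print(cache.accept_syn("192.0.2.10", 4000))   # older timestamp, same address
```

Note that the lookup key is only the remote address, so any two senders that share one source address are judged against the same recorded timestamp.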
Nowadays, many companies use LVS for load balancing, usually with one LVS in front and multiple backend servers behind it. This is effectively NAT. When a request arrives at LVS, LVS rewrites the address data and forwards the request to a backend server, but it does not modify the timestamp data. From the backend server's point of view, the source address of every request is the LVS address, and ports are reused. Requests from different clients that are forwarded through LVS may therefore appear to come from a single host, while the clocks of the different clients are not synchronized. As a result, the timestamps appear out of order and the later packets are discarded. The typical symptom is that the client sends a SYN but the server never replies with a SYN-ACK. You can also run the following command to check whether packets are being discarded:
shell> netstat -s | grep timestamp
... packets rejects in established connections because of timestamp
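If you want to track these counters from a script, the netstat output can be parsed. The sketch below is in Python and rests on assumptions: the function name timestamp_rejects is made up, the SAMPLE text uses invented numbers purely for illustration, and the counter wording is what I believe net-tools prints, which may differ between versions and locales. In particular, some versions appear to word the passive-connection counter as "time stamp" (two words), which a grep for "timestamp" would not match, so the pattern accepts both spellings.

```python
import re
from typing import Dict

# Match lines such as "    12 packets rejects ... because of timestamp".
# Both "timestamp" and "time stamp" are accepted (assumed wording).
_LINE = re.compile(
    r"^[ \t]*(\d+)[ \t]+(.*(?:timestamp|time stamp).*)$", re.MULTILINE
)


def timestamp_rejects(netstat_output: str) -> Dict[str, int]:
    """Map each timestamp-related counter description to its value."""
    return {
        description.strip(): int(count)
        for count, description in _LINE.findall(netstat_output)
    }


# Illustrative input only: the numbers are invented, not real measurements.
SAMPLE = """\
TcpExt:
    12 packets rejects in established connections because of timestamp
    345 passive connections rejected because of time stamp
    6789 TCP sockets finished time wait in fast timer
"""

for description, count in timestamp_rejects(SAMPLE).items():
    print(f"{count:>8}  {description}")
```

On a live system you could feed the function the text produced by running netstat -s, for example through subprocess.run. A counter that keeps growing while clients report connection failures is a hint that the timestamp check is dropping packets.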
If the server is in a NAT environment, it is safest to keep tcp_tw_recycle disabled. As for the problem of too many TIME_WAIT connections, you can enable tcp_tw_reuse to mitigate it (this only helps on the client side).
Of course, disabling the tcp_timestamps option can also avoid this problem:
Set net.ipv4.tcp_timestamps = 0 in sysctl.conf, or run the command sysctl -w net.ipv4.tcp_timestamps=0.
However, it is recommended that you disable tcp_tw_recycle rather than tcp_timestamps: tcp_tw_recycle has no effect when TCP timestamps are disabled, whereas TCP timestamps can be enabled and used on their own. In addition, other options, such as tcp_tw_reuse, also depend on TCP timestamps.
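To check a machine against the recommendations above, something like the following Python sketch could be used. The function names read_options and review are made up for this example, and the wording of the notes is my own. It assumes the options are exposed as files under /proc/sys/net/ipv4, and it reports an option as None when its file is missing (for instance on kernels that do not provide tcp_tw_recycle).

```python
from pathlib import Path
from typing import Dict, List, Optional

SYSCTL_DIR = Path("/proc/sys/net/ipv4")
OPTIONS = ("tcp_timestamps", "tcp_tw_recycle", "tcp_tw_reuse")


def read_options(base: Path = SYSCTL_DIR) -> Dict[str, Optional[int]]:
    """Read each option as an int; None if it is missing or unreadable."""
    values: Dict[str, Optional[int]] = {}
    for name in OPTIONS:
        try:
            values[name] = int((base / name).read_text().split()[0])
        except (OSError, ValueError, IndexError):
            values[name] = None
    return values


def review(values: Dict[str, Optional[int]]) -> List[str]:
    """Return notes about combinations discussed in the article."""
    notes: List[str] = []
    timestamps_off = values.get("tcp_timestamps") == 0
    if values.get("tcp_tw_recycle") == 1:
        if timestamps_off:
            notes.append("tcp_tw_recycle=1 has no effect: tcp_timestamps=0")
        else:
            notes.append(
                "tcp_tw_recycle=1: SYNs from clients behind NAT may be "
                "dropped; consider setting it to 0"
            )
    if values.get("tcp_tw_reuse") not in (None, 0) and timestamps_off:
        notes.append("tcp_tw_reuse is enabled but tcp_timestamps=0")
    return notes


if __name__ == "__main__":
    current = read_options()
    print(current)
    for note in review(current):
        print("NOTE:", note)
```

Keeping review separate from read_options means the same checks can be applied to values collected some other way, such as from a saved sysctl dump.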