Summary of tcp_tw_recycle and tcp_timestamps articles

Source: Internet
Author: User

As the end of the year approaches, people get restless, and the code written during this period tends to be a mess. But what goes around comes around: I recently found that a PHP script often failed to connect to the server.

My habit is to track down problems like this with the strace command:

shell> strace php /path/to/file
EADDRNOTAVAIL (Cannot assign requested address)

Taken literally, the error points to a network resource problem. A small tip: when debugging, read the strace output from the end backward; that usually makes it easier to find the valuable information.

Checking the current network connections shows a very large number of them in the TIME_WAIT state:

shell> netstat -nt | awk '/^tcp/ {++state[$NF]} END {for (key in state) print key, "\t", state[key]}'
TIME_WAIT 28233
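For readers who would rather do the tally outside awk, the same count can be sketched in Python. This is only an illustration and not part of the original article: the function name count_tcp_states and the sample lines are hypothetical, and it assumes, as the awk script does, that the state is the last whitespace-separated column of each line that starts with "tcp".

```python
from collections import Counter

def count_tcp_states(netstat_lines):
    """Tally connection states from `netstat -nt` style lines.

    Mirrors the awk one-liner: only lines starting with "tcp" are
    counted, and the state is taken from the last column.
    """
    states = Counter()
    for line in netstat_lines:
        if line.startswith("tcp"):
            states[line.split()[-1]] += 1
    return states

# Hypothetical sample input, made up for illustration only.
sample = [
    "Proto Recv-Q Send-Q Local Address Foreign Address State",
    "tcp 0 0 10.0.0.1:40001 10.0.0.2:80 TIME_WAIT",
    "tcp 0 0 10.0.0.1:40002 10.0.0.2:80 TIME_WAIT",
    "tcp 0 0 10.0.0.1:22 10.0.0.3:51000 ESTABLISHED",
]
for state, count in sorted(count_tcp_states(sample).items()):
    print(state, count)
```

In practice the lines would come from the real command output, for example by piping netstat -nt into the script and passing sys.stdin to the function.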

I repeated the test several times, and every time the problem appeared the TIME_WAIT count was exactly 28233. Quite a magic number! The actual reason is simple: it is determined by the kernel parameter net.ipv4.ip_local_port_range:

shell> sysctl -a | grep port
net.ipv4.ip_local_port_range = 32768 61000

Because the port range is a closed interval, the number of ports actually available is:

shell> echo $((61000-32768+1))
28233
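The same closed-interval arithmetic can be written as a small Python helper, which makes it easy to see how many ports a different range would provide. This is a sketch only; the function name local_port_count is made up for the example, and it assumes the range is given as the two whitespace-separated numbers that sysctl prints.

```python
def local_port_count(port_range):
    """Number of ports in an inclusive "low high" range string."""
    low, high = (int(part) for part in port_range.split())
    return high - low + 1

print(local_port_count("32768 61000"))  # 28233
print(local_port_count("10240 61000"))  # 50761
```

The second call uses a wider range, 10240 to 61000, to show how lowering the bottom of the range raises the number of usable ports.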

At this point the analysis is basically clear, and so is the direction of the fix. To keep this short, I will not discuss how to optimize the application code, only how to address the problem at the system level, which comes down to the following two things:

The first is to increase the number of locally available ports. This can be done with the following command:

shell> echo "net.ipv4.ip_local_port_range = 10240 61000" >> /etc/sysctl.conf
shell> sysctl -p

The second is to reduce the number of connections in the TIME_WAIT state. There are already plenty of write-ups about this online, and most of them suggest:

shell> sysctl net.ipv4.tcp_tw_reuse=1
shell> sysctl net.ipv4.tcp_tw_recycle=1

Note: kernel parameters changed with the sysctl command revert after a reboot, so to make them persistent, use the /etc/sysctl.conf method shown earlier.
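Before and after changing any of these settings it can help to read the live values. Here is a minimal sketch, assuming the usual Linux layout in which a sysctl name maps to a path under /proc/sys with the dots replaced by slashes. The helper name read_sysctl and its proc_sys argument are made up for this example. Note that tcp_tw_recycle was removed from the kernel in Linux 4.12, so on newer systems the helper is expected to return None for it.

```python
from pathlib import Path

def read_sysctl(name, proc_sys="/proc/sys"):
    """Return the value of a sysctl such as "net.ipv4.tcp_tw_reuse".

    Returns None when the parameter cannot be read, for example
    because the running kernel does not expose it.
    """
    path = Path(proc_sys, *name.split("."))
    try:
        return path.read_text().strip()
    except OSError:
        return None

for name in ("net.ipv4.ip_local_port_range",
             "net.ipv4.tcp_tw_reuse",
             "net.ipv4.tcp_tw_recycle",
             "net.ipv4.tcp_timestamps"):
    print(name, "=", read_sysctl(name))
```

The proc_sys argument exists only so the helper can be pointed at a different directory when experimenting; on a real system the default is what you want.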

These two options reduce the number of TIME_WAIT connections almost immediately. But if you think the problem is thereby perfectly solved, you are mistaken: doing this can actually introduce a more complicated network failure.

For a detailed description of the kernel parameters, refer to the official documentation. Here is a brief description of the tcp_tw_recycle parameter: it is used to quickly reclaim TIME_WAIT connections, but it can cause problems in a NAT environment.

RFC 1323 contains the following passage:

An additional mechanism could be added to the TCP, a per-host cache of the last timestamp received from any connection. This value could then be used in the PAWS mechanism to reject old duplicate segments from earlier incarnations of the connection, if the timestamp clock can be guaranteed to have ticked at least once since the old connection was open. This would require that the TIME-WAIT delay plus the RTT together must be at least one tick of the sender's timestamp clock. Such an extension is not part of the proposal of this RFC.

Roughly speaking, TCP can cache the most recent timestamp received from each peer host. If a later request carries a timestamp smaller than the cached one, it is considered invalid and the corresponding packet is discarded.

Whether Linux enables this behavior depends on tcp_timestamps and tcp_tw_recycle. Because tcp_timestamps is on by default, turning on tcp_tw_recycle effectively activates the behavior.

Many companies now use LVS for load balancing, typically with one LVS in front and several back-end servers behind it. This is in effect NAT: when a request reaches the LVS, it rewrites the address data and forwards the request to a back-end server, but it does not modify the timestamp data. To the back-end server, the source address of every request is the LVS address, and ports get reused, so requests that originally came from different clients may, after being forwarded by the LVS, look like the same connection. Since the clocks of different clients are unlikely to agree, the timestamps appear to jump around, and later packets are discarded. The usual symptom is that the client sends a SYN but the server never responds with an ACK. The following command can also confirm that packets are continually being discarded:

shell> netstat -s | grep timestamp
... packets rejects in established connections because of timestamp

If the server is in a NAT environment, the safe choice is usually to disable tcp_tw_recycle. As for the problem of too many TIME_WAIT connections, it can be mitigated by enabling tcp_tw_reuse.

Thinking further: since this phenomenon is only triggered when tcp_timestamps and tcp_tw_recycle are both enabled, disabling tcp_timestamps while enabling tcp_tw_recycle should avoid the NAT packet loss and still reduce the number of TIME_WAIT connections. If the server does not rely on RFC 1323, this approach should be feasible, but it is best to test thoroughly in case there are other side effects.

shell> sysctl net.ipv4.tcp_timestamps=0
shell> sysctl net.ipv4.tcp_tw_recycle=1

...

All in all, this network failure is not very sophisticated, and I did not intend to ramble on at such length. But pulling up the radish brings up the mud with it: the many aspects touched on along the way are worth savoring, hence this article.

Recently we saw a number of connect failures. After analysis and testing, we confirmed that they are related to the proc parameters tcp_tw_recycle and tcp_timestamps.
1. Phenomena
The first phenomenon: module A accesses service S through the NAT gateway successfully, while module B, accessing service S through the NAT gateway, sees recurrent connect failures. Packet captures show that the service S side received the SYN packet but did not reply with a SYN-ACK. In addition, module A has TCP timestamps turned off, while module B has them turned on;
The second phenomenon: instances of module C on different hosts (timestamps turned on) access the same service S through the NAT gateway (one egress IP); host C1 connects successfully, while host C2 fails to connect;

2. Analysis
From the phenomena above, the problem is clearly related to TCP timestamps. Reading the Linux 2.6.32 kernel source shows that when tcp_tw_recycle and tcp_timestamps are both enabled, the timestamps in connect requests arriving from the same source IP within 60 s must be increasing.
Source function: tcp_v4_conn_request(), the TCP-layer handler for the SYN packet of the three-way handshake (server side);
Source snippet:
    if (tmp_opt.saw_tstamp &&
        tcp_death_row.sysctl_tw_recycle &&
        (dst = inet_csk_route_req(sk, req)) != NULL &&
        (peer = rt_get_peer((struct rtable *)dst)) != NULL &&
        peer->v4daddr == saddr) {
        if (get_seconds() < peer->tcp_ts_stamp + TCP_PAWS_MSL &&
            (s32)(peer->tcp_ts - req->ts_recent) >
                TCP_PAWS_WINDOW) {
            NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_PAWSPASSIVEREJECTED);
            goto drop_and_release;
        }
    }
tmp_opt.saw_tstamp: the incoming SYN carries the TCP timestamp option, i.e. the peer has tcp_timestamps enabled
sysctl_tw_recycle: the tcp_tw_recycle option is enabled on the local system
TCP_PAWS_MSL: 60 s; this condition checks that the last TCP communication from that source IP occurred within the past 60 s
TCP_PAWS_WINDOW: 1; this condition checks that the timestamp of the last TCP communication from that source IP is greater than the timestamp of the current packet

Analysis: hosts client1 and client2 access serverN through a NAT gateway (one egress IP). Because the timestamp is the time elapsed since system boot, the timestamps of client1 and client2 differ. According to the SYN-handling source above, when tcp_tw_recycle and tcp_timestamps are both enabled, the host with the larger timestamp accesses serverN successfully, while the host with the smaller timestamp fails;
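The check can be replayed in a few lines of Python to see why one client behind the NAT succeeds and the other fails. This is a toy model under stated assumptions, not kernel code: the PeerCache class, the IP address, and all timestamp values are invented for illustration. The real kernel records the peer timestamp when a socket enters TIME_WAIT, which the model reduces to an explicit remember() call.

```python
TCP_PAWS_MSL = 60     # seconds, as stated above
TCP_PAWS_WINDOW = 1   # timestamp ticks, as stated above

def s32(value):
    """Interpret an integer as signed 32-bit, like the (s32) cast."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value

class PeerCache:
    """Toy per-source-IP cache of the last timestamp seen (hypothetical)."""

    def __init__(self):
        self.peers = {}  # source IP -> (tcp_ts, tcp_ts_stamp)

    def remember(self, saddr, tsval, now):
        self.peers[saddr] = (tsval, now)

    def accept_syn(self, saddr, tsval, now):
        """Return False when the SYN would be dropped by the check above."""
        if saddr in self.peers:
            tcp_ts, tcp_ts_stamp = self.peers[saddr]
            if (now < tcp_ts_stamp + TCP_PAWS_MSL and
                    s32(tcp_ts - tsval) > TCP_PAWS_WINDOW):
                return False
        return True

nat_ip = "203.0.113.1"                   # made-up egress IP shared by both clients
cache = PeerCache()
cache.remember(nat_ip, 5_000_000, now=100)               # client1: long uptime
print(cache.accept_syn(nat_ip, 5_010_000, now=110))      # client1 again: True
print(cache.accept_syn(nat_ip, 1_000_000, now=110))      # client2: False
print(cache.accept_syn(nat_ip, 1_000_000, now=200))      # 60 s window over: True
```

The signed 32-bit subtraction matters because timestamp values wrap around; comparing the raw integers would give the wrong answer near the wrap point.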

Parameters: /proc/sys/net/ipv4/tcp_timestamps - turns the timestamp option on or off
/proc/sys/net/ipv4/tcp_tw_recycle - shortens the timeout for releasing TIME_WAIT sockets

3. Workaround
echo 0 > /proc/sys/net/ipv4/tcp_tw_recycle;
tcp_tw_recycle is off by default, but many servers have it turned on in an attempt to improve performance;
To address these issues, we recommend turning off tcp_tw_recycle rather than timestamps: tcp_tw_recycle has no effect when TCP timestamps are off, whereas TCP timestamps can be enabled and work independently.
Source function: tcp_time_wait()
Source snippet:
    if (tcp_death_row.sysctl_tw_recycle && tp->rx_opt.ts_recent_stamp)
        recycle_ok = icsk->icsk_af_ops->remember_stamp(sk);
    ......

    if (timeo < rto)
        timeo = rto;

    if (recycle_ok) {
        tw->tw_timeout = rto;
    } else {
        tw->tw_timeout = TCP_TIMEWAIT_LEN;
        if (state == TCP_TIME_WAIT)
            timeo = TCP_TIMEWAIT_LEN;
    }

    inet_twsk_schedule(tw, &tcp_death_row, timeo,
                       TCP_TIMEWAIT_LEN);

When timestamps and tw_recycle are both enabled, the release timeout of a TIME_WAIT socket depends on the RTO; otherwise the timeout is TCP_TIMEWAIT_LEN, that is, 60 s;
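The branch logic in the snippet can be mirrored in Python to make the two outcomes concrete. This is a sketch only: the function name schedule_time_wait and the sample RTO of 0.7 s are invented, times are plain seconds rather than kernel jiffies, and how the kernel derives rto is outside the quoted excerpt and so is not modeled.

```python
TCP_TIMEWAIT_LEN = 60.0  # seconds, per the text above

def schedule_time_wait(recycle_ok, rto, timeo, in_time_wait):
    """Return (tw_timeout, timeo) following the branches quoted above."""
    if timeo < rto:
        timeo = rto
    if recycle_ok:
        tw_timeout = rto
    else:
        tw_timeout = TCP_TIMEWAIT_LEN
        if in_time_wait:
            timeo = TCP_TIMEWAIT_LEN
    return tw_timeout, timeo

# Hypothetical RTO of 0.7 s, chosen only for illustration.
print(schedule_time_wait(True, 0.7, 0.0, True))   # (0.7, 0.7)
print(schedule_time_wait(False, 0.7, 0.0, True))  # (60.0, 60.0)
```

With recycling in effect the socket is released after roughly one RTO instead of the full 60 s, which is why the option drains TIME_WAIT so quickly.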

The kernel documentation describes this parameter as follows:
tcp_tw_recycle - BOOLEAN
    Enable fast recycling TIME-WAIT sockets. Default value is 0.
    It should not be changed without advice/request of technical
    experts.

Source: http://blog.sina.com.cn/s/blog_781b0c850100znjd.html

On some high-concurrency web servers, net.ipv4.tcp_tw_recycle is turned on so that ports can be reclaimed quickly. When net.ipv4.tcp_tw_recycle is off, the kernel does not check the timestamps of packets from the peer; once tcp_tw_recycle is on, it does. Unfortunately, the timestamps in packets sent through China Mobile's CMWAP gateway jump around erratically, so the server treats packets whose timestamps go "backwards" as "retransmitted data from a recycled TIME_WAIT connection, not a new request" and drops them without replying, which results in a large amount of packet loss.

#ExtMail Professional Edition# For the past two days I have been dealing with a strange problem for a new customer: most users at the branch office can access the service, but a few users intermittently cannot. After narrowing down the scope of the fault, the preliminary finding is that the kernel parameter net.ipv4.tcp_tw_recycle = 1 is the cause; setting it to 0 appears to clear the fault.

This failure is a warning: in day-to-day operations such as changes to programs and systems, restarts, and so on, we need to strictly follow the process, test carefully, and assess the risk of each change along with the rollback and remediation plans. In particular, kernel parameter changes must be thoroughly understood and never made blindly. Changes should then be rolled out gradually to avoid failures with global impact and to keep the failure rate as low as possible.
