Problem Description:
After we recently replaced and upgraded the server, a problem appeared: many users could ping and tracert our site, but the site would not open normally for them. Occasionally a refresh would open it, which was very strange.
A packet capture showed that the users' packets reached the server, but the server did not reply.
Thanks to Huang for the analysis!
Below is a summary of some of the solutions found online.
=============================================================================================================== =============
Original: http://blog.csdn.net/gzh0222/article/details/8000508
Some users could ping our site, but their connections timed out and the server gave no response at all. We suspected the problem lay in the TCP three-way handshake, so we decided to capture packets and analyze them:
1. Capture from the problematic machine:
2. Capture from the normal machine:
3. Finding problems
The capture data shows that the Web server replies to TCP SYN packets with a SYN-ACK for both the problematic machine and the normal machine. For the problematic machine, however, the server sometimes responds to its TCP SYN packets and sometimes does not. When it does not respond, the TCP connection between the client and the Web server is never established, so the page cannot open. Comparing the two captures, the difference is the timestamp: the problematic machine sends its TCP SYN packets with a timestamp, so we suspected that the timestamp was causing the failure.
4. Solving the problem
Since a timestamp was suspected, the next step was to check whether removing the timestamp on the problematic machine solves the problem. Regarding the timestamped TCP SYN packets that get no response, the relevant documentation indicates that the cause is the Tcp1323Opts option in the Windows registry, which makes the client put a timestamp in its packets. After NAT, if the same port was used previously and the timestamp recorded from that earlier connection is greater than the timestamp in the SYN of the new connection, the server ignores the SYN and does not return a SYN-ACK. To the user, the TCP three-way handshake never completes, so the Web page cannot be opened. Outside business hours, the user's NAT port has usually not been used recently, so the page opens normally; when business is busy, NAT ports are reused frequently and it is hard to get an unused one, which causes the problem.
There are two ways to solve this:
(1) Change the variable on the server
First check the current value of net.ipv4.tcp_timestamps on the server. If it is 0, this problem is not the cause; if it is 1, set it to 0.
To view the current value: # cat /proc/sys/net/ipv4/tcp_timestamps
To change the value: edit /etc/sysctl.conf (for example with vim /etc/sysctl.conf) and add net.ipv4.tcp_timestamps=0
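For scripts or monitoring agents, the same check as `cat /proc/sys/net/ipv4/tcp_timestamps` can be done from C. This is a minimal sketch under our own assumptions: the helper names parse_sysctl_int and read_sysctl_int are hypothetical, and the only assumption about the file is that it holds a single integer such as "1\n".

```c
#include <assert.h>
#include <stdio.h>

/* Parse a single integer sysctl value (e.g. "1\n") from an open stream.
 * Returns 0 on success, -1 if no integer could be read. */
int parse_sysctl_int(FILE *f, int *out)
{
    assert(f != NULL && out != NULL);
    return fscanf(f, "%d", out) == 1 ? 0 : -1;
}

/* Read an integer sysctl from its /proc/sys path, i.e. the same file
 * that `cat /proc/sys/net/ipv4/tcp_timestamps` prints. */
int read_sysctl_int(const char *path, int *out)
{
    assert(path != NULL);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;              /* file missing or not readable */
    int rc = parse_sysctl_int(f, out);
    fclose(f);
    return rc;
}
```

Usage: call read_sysctl_int("/proc/sys/net/ipv4/tcp_timestamps", &v); a result of 1 means this is the case where the article above recommends setting the value to 0.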
(2) Set the client's registry value Tcp1323Opts to 0.
Note:
Tcp1323Opts
Description: This parameter controls the RFC 1323 timestamp and window scaling options. By default, timestamps and window scaling are enabled, but they can be controlled with flag bits: bit 0 controls window scaling and bit 1 controls timestamps.
A value of 0 (RFC 1323 options disabled)
A value of 1 (window scaling only)
A value of 2 (timestamps only)
A value of 3 (both options enabled)
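The four values above are just the two flag bits combined. A minimal C sketch of the decoding, assuming only the bit layout stated in the description (bit 0 for window scaling, bit 1 for timestamps); the type and function names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/* Flag bits of Tcp1323Opts as described above. */
#define TCP1323_WINDOW_SCALING 0x1u  /* bit 0 */
#define TCP1323_TIMESTAMPS     0x2u  /* bit 1 */

typedef struct {
    bool window_scaling;
    bool timestamps;
} Rfc1323Options;

/* Decode a Tcp1323Opts value into the two options it controls. */
Rfc1323Options decode_tcp1323opts(unsigned value)
{
    assert(value <= 3u);  /* only 0-3 are listed above */
    Rfc1323Options o;
    o.window_scaling = (value & TCP1323_WINDOW_SCALING) != 0;
    o.timestamps     = (value & TCP1323_TIMESTAMPS) != 0;
    return o;
}
```

Read this way, any value with bit 1 clear (0 or 1) keeps the client from sending timestamps; the article recommends 0, which also turns off window scaling.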
net.ipv4.tcp_timestamps=0
Description: Timestamps protect against sequence number wraparound. A 1 Gbps link is certain to encounter sequence numbers that were used before, and timestamps let the kernel accept such "abnormal" packets. Here it needs to be turned off.
A value of 0 (timestamps disabled)
A value of 1 (timestamps enabled)
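Why a 1 Gbps link is "certain" to reuse sequence numbers can be checked with simple arithmetic: TCP sequence numbers are 32 bits and count bytes, so the whole space is consumed after 2^32 bytes. A small C sketch of that calculation (the function name is hypothetical):

```c
#include <assert.h>

/* Seconds until the 32-bit TCP sequence space (one number per byte)
 * wraps around at a given line rate in bits per second. */
double seq_wrap_seconds(double bits_per_second)
{
    assert(bits_per_second > 0.0);
    const double seq_space = 4294967296.0;   /* 2^32 sequence numbers */
    double bytes_per_second = bits_per_second / 8.0;
    return seq_space / bytes_per_second;
}
```

At 1 Gbps this gives 2^32 / 125,000,000 ≈ 34.4 seconds, so an old segment that lingers in the network for longer than that can carry a sequence number that looks valid again; at 100 Mbps the wrap takes roughly ten times as long.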
The symptom of "ping works but the TCP three-way handshake cannot be established" only occurs when both the client and the server have timestamps enabled. As a service provider, we cannot make sure that every user turns timestamps off, so we must turn them off on the server in order to provide normal service to all users.
Run this command to make the change take effect immediately: /sbin/sysctl -p
=============================================================================================================== =============
Original: http://blog.sina.com.cn/s/blog_781b0c850100znjd.html
Recently we ran into some connect failures. After analysis and testing, we confirmed that they are related to the proc parameters tcp_tw_recycle/tcp_timestamps.
1. Symptoms
First symptom: module A accesses service S through a NAT gateway successfully, while module B, accessing service S through the same NAT gateway, repeatedly fails to connect. Packet capture shows that service S receives the SYN packet but does not reply with a SYN-ACK. In addition, module A has TCP timestamps disabled, while module B has them enabled.
Second symptom: instances of module C on different hosts (timestamps enabled) access the same service S through a NAT gateway (one egress IP); host C1 connects successfully, while host C2 fails to connect.
2. Analysis
Given the symptoms above, the problem is clearly related to TCP timestamps. Reading the Linux 2.6.32 kernel source shows that when tcp_tw_recycle and tcp_timestamps are both enabled, the timestamps in connect requests arriving from the same source IP within 60 s must be increasing.
Source function: tcp_v4_conn_request(), the TCP-layer handler for the SYN packet of the three-way handshake (server side).
Source snippet:
if (tmp_opt.saw_tstamp &&
    tcp_death_row.sysctl_tw_recycle &&
    (dst = inet_csk_route_req(sk, req)) != NULL &&
    (peer = rt_get_peer((struct rtable *)dst)) != NULL &&
    peer->v4daddr == saddr) {
    if (get_seconds() < peer->tcp_ts_stamp + TCP_PAWS_MSL &&
        (s32)(peer->tcp_ts - req->ts_recent) >
                        TCP_PAWS_WINDOW) {
        NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_PAWSPASSIVEREJECTED);
        goto drop_and_release;
    }
}
tmp_opt.saw_tstamp: the incoming SYN carries a TCP timestamp (the connection uses TCP timestamps)
sysctl_tw_recycle: the tcp_tw_recycle option is enabled on this host
TCP_PAWS_MSL: 60 s; this condition checks that the last TCP communication from that source IP happened within the last 60 s
TCP_PAWS_WINDOW: 1; this condition checks that the timestamp from the last TCP communication with that source IP is greater than the timestamp of the current SYN (by more than 1)
Analysis: hosts client1 and client2 access server N through a NAT gateway (one IP address). Because the timestamp counts time since system boot, client1 and client2 send different timestamps. According to the SYN-processing source above, when tcp_tw_recycle and tcp_timestamps are both enabled, the host with the larger timestamp connects to server N successfully, while the host with the smaller timestamp fails.
Parameters: /proc/sys/net/ipv4/tcp_timestamps - turns the timestamp option on/off
/proc/sys/net/ipv4/tcp_tw_recycle - shortens the timeout before TIME_WAIT sockets are released
3. Workaround
echo 0 > /proc/sys/net/ipv4/tcp_tw_recycle
tcp_tw_recycle is off by default, but many servers turn it on to improve performance.
To fix these issues, it is recommended to turn off the tcp_tw_recycle option rather than timestamps: tcp_tw_recycle has no effect when TCP timestamps are off, whereas TCP timestamps can be enabled and work on their own.
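The `echo 0 > /proc/sys/net/ipv4/tcp_tw_recycle` step can also be done from a C program, for example in a provisioning tool. A minimal sketch; the function names are hypothetical, writing the real file requires root, and like echo the change lasts only until reboot (persistence goes through /etc/sysctl.conf, as in the first article).

```c
#include <assert.h>
#include <stdio.h>

/* Write an integer followed by a newline to an open sysctl stream. */
int write_sysctl_int_stream(FILE *f, int value)
{
    assert(f != NULL);
    if (fprintf(f, "%d\n", value) < 0)
        return -1;
    return fflush(f) == 0 ? 0 : -1;
}

/* C counterpart of `echo 0 > /proc/sys/net/ipv4/tcp_tw_recycle`.
 * Returns 0 on success, -1 if the file cannot be opened or written. */
int write_sysctl_int(const char *path, int value)
{
    assert(path != NULL);
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;              /* missing file or not running as root */
    int rc = write_sysctl_int_stream(f, value);
    if (fclose(f) != 0)
        rc = -1;
    return rc;
}
```

Usage: write_sysctl_int("/proc/sys/net/ipv4/tcp_tw_recycle", 0) returns 0 on success.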
Source function: tcp_time_wait()
Source snippet:
if (tcp_death_row.sysctl_tw_recycle && tp->rx_opt.ts_recent_stamp)
    recycle_ok = icsk->icsk_af_ops->remember_stamp(sk);
......
if (timeo < rto)
    timeo = rto;
if (recycle_ok) {
    tw->tw_timeout = rto;
} else {
    tw->tw_timeout = TCP_TIMEWAIT_LEN;
    if (state == TCP_TIME_WAIT)
        timeo = TCP_TIMEWAIT_LEN;
}
inet_twsk_schedule(tw, &tcp_death_row, timeo,
                   TCP_TIMEWAIT_LEN);
When timestamps and tw_recycle are both enabled, the release timeout of a TIME_WAIT socket depends on the RTO; otherwise, the timeout is TCP_TIMEWAIT_LEN, i.e. 60 s.
The kernel documentation describes this parameter as follows:
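The timeout choice in tcp_time_wait() can be reduced to a small pure function, which also makes it visible why tcp_tw_recycle does nothing once timestamps are off. This is a simplified model with a hypothetical name: it assumes remember_stamp() succeeds whenever it is called, and the rto argument is simply passed in by the caller (700 ms in the usage line is an arbitrary example).

```c
#include <assert.h>
#include <stdbool.h>

#define TCP_TIMEWAIT_LEN_MS 60000  /* TCP_TIMEWAIT_LEN: 60 s */

/* Simplified model of how tcp_time_wait() picks tw_timeout.
 * recycle_ok needs tw_recycle on AND timestamps seen on the connection
 * (tp->rx_opt.ts_recent_stamp != 0); remember_stamp() is assumed to succeed. */
int tw_timeout_ms(bool tw_recycle, bool saw_timestamps, int rto_ms)
{
    assert(rto_ms > 0);
    bool recycle_ok = tw_recycle && saw_timestamps;
    return recycle_ok ? rto_ms : TCP_TIMEWAIT_LEN_MS;
}
```

Usage: tw_timeout_ms(true, true, 700) gives 700, while turning off either tw_recycle or timestamps falls back to the full 60000 ms.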
tcp_tw_recycle - BOOLEAN
    Enable fast recycling TIME-WAIT sockets. Default value is 0.
    It should not be changed without advice/request of technical
    experts.
Original link: http://blog.sina.com.cn/u/2015038597
=============================================================================================================== =============
Original: http://www.verydemo.com/demo_c167_i3289.html
Linux server cannot establish TCP connections: timestamps (net.ipv4.tcp_timestamps)
I. Symptoms
1. HTTP access to the site from the company intranet:
Linux hosts fail: analysis with curl and packet capture showed that the server did not respond to requests from Linux clients, no TCP connection could be established, and the browser returned "Unable to connect to server".
Windows hosts work fine.
2. Degraded HTTP access quality:
Keynote monitoring showed that access quality declined after the new architecture went live, mainly:
2.1. Access fails with "Unable to connect to server".
2.2. Only a few people run into this failure, and not on every visit during the day; it comes and goes intermittently.
II. Process
A Google search for the keywords "server cannot establish a TCP connection" turns the answer up directly.
After a few pages of results, I found this post: "http://www.sunchis.com/html/os/linux/2012/0518/413.html".
Its symptoms matched our intranet exactly, but for various reasons (1. weak background knowledge in this area, 2. no time to verify this configuration) I did not act on it.
The problem then dragged on for a long time... I kept assuming it was an internal device problem.
Later, I boldly applied the parameter "net.ipv4.tcp_timestamps = 0" in production and tested again: the failure was gone, and the previously failing machines could access the site normally every time.
However, I did not really understand the principle. My rough understanding is that for users behind NAT (sharing an egress IP address with others), if your timestamp is smaller than someone else's, the server will not respond to your TCP request. To avoid this, set net.ipv4.tcp_timestamps = 0 (in /etc/sysctl.conf).
III. Summary
Later, while researching further, I found a more detailed blog post that explains this thoroughly and also raises a new issue: http://huoding.com/2012/01/19/142
====== Excerpt ======
In fact, Linux servers do not perform this timestamp check by default. Whether Linux enables the behavior depends on both tcp_timestamps and tcp_tw_recycle; because tcp_timestamps is on by default, the behavior is effectively activated as soon as tcp_tw_recycle is turned on.
What is net.ipv4.tcp_tw_recycle? A quick search shows that it is essentially a parameter for recycling TIME_WAIT connections.
When net.ipv4.tcp_timestamps is left at its default (on) and net.ipv4.tcp_tw_recycle is turned on, this nasty error occurs, but note that it only appears in NAT network environments. Moreover, most blogs, and even some experts, recommend turning net.ipv4.tcp_tw_recycle on...
====== Excerpt ======
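Since the problem needs both switches on, a quick host check only has to read two files. A minimal sketch with hypothetical function names; it treats a missing or unreadable file as "off", which also covers kernels where tcp_tw_recycle no longer exists (it was removed in Linux 4.12). It only checks the server side: whether clients actually sit behind NAT cannot be seen from these files.

```c
#include <assert.h>
#include <stdio.h>

/* Read one integer from a /proc/sys file; returns -1 if unreadable. */
int read_flag(const char *path)
{
    assert(path != NULL);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    int v = -1;
    if (fscanf(f, "%d", &v) != 1)
        v = -1;
    fclose(f);
    return v;
}

/* Returns 1 if the combination described above is active:
 * tcp_timestamps on AND tcp_tw_recycle on. -1 (missing) counts as off. */
int nat_syn_drop_risk(int tcp_timestamps, int tcp_tw_recycle)
{
    return (tcp_timestamps > 0 && tcp_tw_recycle > 0) ? 1 : 0;
}
```

Usage: nat_syn_drop_risk(read_flag("/proc/sys/net/ipv4/tcp_timestamps"), read_flag("/proc/sys/net/ipv4/tcp_tw_recycle")) returns 1 only on a host configured the way the excerpt warns about.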
IV. Open items
As mentioned in http://huoding.com/2012/01/19/142 above:
1. (Not verified) The tw_recycle feature has no effect after timestamps are turned off.
2. (Not verified) A different way to deal with too many TIME_WAIT connections: net.ipv4.tcp_max_tw_buckets = 10000 sets a maximum, with the downside that the system log then reports: TCP: time wait bucket table overflow
Users can ping and tracert the website, but cannot open it