Background
I recently wrote a script to monitor a business server. A shell script (the machine that runs it is called the monitoring machine below) calls the project team's dedicated interface test tool, which performs business operations against the designated business server. The script decides whether the business server is working properly from the results the tool returns, and crontab runs the script once a minute.
Before the interface test tool starts, the script runs telnet ip port to determine whether the business server's port is open. The monitoring ran on schedule...
However, when a new business server was added to the monitoring, telnet to that server took more than 1 minute, which meant the script could not finish within the 1-minute monitoring interval. So I set out to investigate the problem.
Exploration and research
Some web pages say that you can set the TMOUT variable in certain files to control telnet's time-out, and others say that you can use the nc command instead of telnet to get a similar effect with a configurable time-out. Before digging into either, I first measured how long telnet to the business server takes:
time telnet 123.59.208.201 62715
Trying 123.59.208.201...
telnet: connect to address 123.59.208.201: Connection timed out
telnet 123.59.208.201 62715  0.00s user 0.00s system 0% cpu 2:07.29 total
The result shows that the duration is 2:07.29, that is, 127 s.
Common time-outs are usually whole minutes or hours, such as 60 s or 3600 s, or multiples of 10, such as 30 s or 200 s. But repeated tests all took 127 s. That aroused my curiosity, because 127 is not an ordinary number: it is exactly 2^7 - 1 (2 to the 7th power, minus 1). That gave my search a direction, and in the search results I finally found this article:
http://www.chengweiyang.cn/2017/02/18/linux-connect-timeout/?utm_source=tuicool&utm_medium=referral
This article cleared up my doubts and started a new journey of discovery.
The origin of 127
Use tcpdump to capture the packets by running the following command:
sudo tcpdump -i eth0 -nn 'host 123.59.208.201'
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
Open a new terminal window and execute the telnet command:
date +%Y%m%d-%H:%M:%S.%N; time telnet 123.59.208.201 62715; date +%Y%m%d-%H:%M:%S.%N;
20170526-18:04:23.764397558
Trying 123.59.208.201...
telnet: connect to address 123.59.208.201: Connection timed out
telnet 123.59.208.201 62715  0.00s user 0.00s system 0% cpu 2:07.22 total
20170526-18:06:30.989480391
At this point, the tcpdump terminal outputs the packets generated by the telnet command:
18:04:23.765507 IP 10.253.4.55.34680 > 123.59.208.201.62715: Flags [S], seq 922947731, win 29200, options [mss 1460,sackOK,TS val 13801993 ecr 0,nop,wscale 7], length 0
18:04:24.768182 IP 10.253.4.55.34680 > 123.59.208.201.62715: Flags [S], seq 922947731, win 29200, options [mss 1460,sackOK,TS val 13802996 ecr 0,nop,wscale 7], length 0
18:04:26.772188 IP 10.253.4.55.34680 > 123.59.208.201.62715: Flags [S], seq 922947731, win 29200, options [mss 1460,sackOK,TS val 13805000 ecr 0,nop,wscale 7], length 0
18:04:30.780189 IP 10.253.4.55.34680 > 123.59.208.201.62715: Flags [S], seq 922947731, win 29200, options [mss 1460,sackOK,TS val 13809008 ecr 0,nop,wscale 7], length 0
18:04:38.796205 IP 10.253.4.55.34680 > 123.59.208.201.62715: Flags [S], seq 922947731, win 29200, options [mss 1460,sackOK,TS val 13817024 ecr 0,nop,wscale 7], length 0
18:04:54.828196 IP 10.253.4.55.34680 > 123.59.208.201.62715: Flags [S], seq 922947731, win 29200, options [mss 1460,sackOK,TS val 13833056 ecr 0,nop,wscale 7], length 0
18:05:26.860210 IP 10.253.4.55.34680 > 123.59.208.201.62715: Flags [S], seq 922947731, win 29200, options [mss 1460,sackOK,TS val 13865088 ecr 0,nop,wscale 7], length 0
Putting together the start time of the telnet command (which is also the time of the first SYN), the times in the tcpdump output, and the time at which telnet returned:
04:23.76
04:24.76
04:26.77
04:30.78
04:38.79
04:54.82
05:26.86
06:30.98
The pattern is easy to see: the differences (in seconds) double each time: 1, 2, 4, 8, 16, 32, 64.
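The gaps can also be read off a saved capture mechanically. The following is a small sketch, not part of the original monitoring script: syn_gaps is a hypothetical helper name, and it assumes every input line starts with a tcpdump timestamp of the form HH:MM:SS.ffffff and that the capture does not cross midnight.

```shell
#!/usr/bin/env bash
# Print the gap, rounded to whole seconds, between consecutive tcpdump lines.
# Assumption: field 1 of every line is a timestamp like 18:04:23.765507.
syn_gaps() {
  awk '{
    split($1, t, ":")                        # t[1]=HH, t[2]=MM, t[3]=SS.ffffff
    now = t[1] * 3600 + t[2] * 60 + t[3]     # seconds since midnight
    if (NR > 1) printf "%.0f\n", now - prev  # gap to the previous packet
    prev = now
  }'
}

# Example: feed it the first few SYN lines from the capture above.
printf '%s\n' \
  '18:04:23.765507 IP 10.253.4.55.34680 > 123.59.208.201.62715: Flags [S]' \
  '18:04:24.768182 IP 10.253.4.55.34680 > 123.59.208.201.62715: Flags [S]' \
  '18:04:26.772188 IP 10.253.4.55.34680 > 123.59.208.201.62715: Flags [S]' \
  '18:04:30.780189 IP 10.253.4.55.34680 > 123.59.208.201.62715: Flags [S]' |
  syn_gaps
```

Note that the capture only contains the SYN packets, so the final 64 s wait (from the last SYN until telnet gives up) shows up in the telnet timing rather than in the tcpdump output.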
Internal principle: TCP timeout and retransmission
Executing a telnet command is actually an attempt to establish a TCP connection. Two concepts are involved when establishing a TCP connection: RTT and RTO.
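RTT (Round-Trip Time): the time for a segment to reach the peer and for its acknowledgment to come back.
RTO (Retransmission Timeout): how long the sender waits for an acknowledgment before retransmitting.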
The time intervals above are obviously RTOs.
The process and details of establishing a TCP connection are not covered here; see TCP/IP Illustrated, Volume 1 if necessary.
Based on the kernel version of the monitoring machine, I looked at the relevant Linux code:
http://elixir.free-electrons.com/linux/v3.10/source/include/net/tcp.h
#define TCP_RTO_MAX ((unsigned)(120*HZ))
#define TCP_RTO_MIN ((unsigned)(HZ/5))
From this:
- HZ corresponds to 1 s, so the minimum RTO is 200 ms and the maximum RTO is 120 s, i.e. 2 minutes.
- From the tcpdump results, the initial RTO on the monitoring machine is 1 s. I did not know the reason at this point (the kernel documentation quoted under Range of values mentions an initial RTO of 1 second).
- According to the algorithm in the code, the retransmission interval grows exponentially with the number of retransmissions. While the RTO is below TCP_RTO_MAX, it doubles each time; once doubling would exceed TCP_RTO_MAX, it no longer doubles but stays fixed at TCP_RTO_MAX, that is, 2 minutes.
The Linux kernel variable net.ipv4.tcp_syn_retries, in turn, tells the kernel how many times to resend the initial SYN when attempting to create a new TCP connection. It can be viewed with the sysctl command.
View the value of tcp_syn_retries
sudo sysctl net.ipv4.tcp_syn_retries
net.ipv4.tcp_syn_retries = 6
Because tcp_syn_retries controls the number of resends, adding the initial SYN gives the 7 packets seen in the tcpdump output above.
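The 127 s total follows directly from this count. As a sketch, assuming the initial RTO is 1 s and every wait doubles (which only holds while the waits stay below TCP_RTO_MAX), n retries mean n + 1 SYNs and therefore n + 1 waits:

```latex
\[
T(n) = \sum_{k=0}^{n} 2^{k}\ \mathrm{s} = \left(2^{n+1} - 1\right)\ \mathrm{s},
\qquad
T(6) = 1 + 2 + 4 + 8 + 16 + 32 + 64 = 2^{7} - 1 = 127\ \mathrm{s}.
\]
```

Here n stands for the value of tcp_syn_retries; the symbol T is introduced only for this illustration.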
Set the value of tcp_syn_retries
Set net.ipv4.tcp_syn_retries and test how long telnet takes:
sudo sysctl net.ipv4.tcp_syn_retries=1
net.ipv4.tcp_syn_retries = 1
Verify:
time telnet 123.59.208.201 62715
Trying 123.59.208.201...
telnet: connect to address 123.59.208.201: Connection timed out
telnet 123.59.208.201 62715  0.00s user 0.00s system 0% cpu 3.008 total
As you can see, with net.ipv4.tcp_syn_retries = 1, telnet takes about 3 s. Predictably, a tcpdump capture would show two packets, and the 3 s is the sum of an RTO of 1 s and an RTO of 2 s.
sudo sysctl net.ipv4.tcp_syn_retries=2
net.ipv4.tcp_syn_retries = 2
time telnet 123.59.208.201 62715
Trying 123.59.208.201...
telnet: connect to address 123.59.208.201: Connection timed out
telnet 123.59.208.201 62715  0.00s user 0.00s system 0% cpu 7.010 total
With net.ipv4.tcp_syn_retries = 2, telnet takes about 7 s, which follows the RTO doubling rule (1 + 2 + 4).
sudo sysctl net.ipv4.tcp_syn_retries=5
net.ipv4.tcp_syn_retries = 5
time telnet 123.59.208.201 62715
Trying 123.59.208.201...
telnet: connect to address 123.59.208.201: Connection timed out
telnet 123.59.208.201 62715  0.00s user 0.00s system 0% cpu 1:03.14 total
With net.ipv4.tcp_syn_retries = 5, telnet takes about 63 s, which again follows the RTO doubling rule.
According to the RTO doubling rule and the TCP_RTO_MAX setting, as net.ipv4.tcp_syn_retries keeps increasing, the RTO should eventually stop doubling. Set net.ipv4.tcp_syn_retries to 10 to see the effect:
sudo sysctl net.ipv4.tcp_syn_retries=10
net.ipv4.tcp_syn_retries = 10
Verify:
date +%Y%m%d-%H:%M:%S.%N; time telnet 123.59.208.201 62715; date +%Y%m%d-%H:%M:%S.%N;
20170526-18:47:25.899074041
Trying 123.59.208.201...
telnet: connect to address 123.59.208.201: Connection timed out
telnet 123.59.208.201 62715  0.00s user 0.00s system 0% cpu 10:08.64 total
20170526-18:57:34.541575652
The tcpdump output:
18:47:25.900294 IP 10.253.4.55.36266 > 123.59.208.201.62715: Flags [S], seq 3545463057, win 29200, options [mss 1460,sackOK,TS val 16384128 ecr 0,nop,wscale 7], length 0
18:47:26.902184 IP 10.253.4.55.36266 > 123.59.208.201.62715: Flags [S], seq 3545463057, win 29200, options [mss 1460,sackOK,TS val 16385130 ecr 0,nop,wscale 7], length 0
18:47:28.908195 IP 10.253.4.55.36266 > 123.59.208.201.62715: Flags [S], seq 3545463057, win 29200, options [mss 1460,sackOK,TS val 16387136 ecr 0,nop,wscale 7], length 0
18:47:32.916178 IP 10.253.4.55.36266 > 123.59.208.201.62715: Flags [S], seq 3545463057, win 29200, options [mss 1460,sackOK,TS val 16391144 ecr 0,nop,wscale 7], length 0
18:47:40.940213 IP 10.253.4.55.36266 > 123.59.208.201.62715: Flags [S], seq 3545463057, win 29200, options [mss 1460,sackOK,TS val 16399168 ecr 0,nop,wscale 7], length 0
18:47:56.972221 IP 10.253.4.55.36266 > 123.59.208.201.62715: Flags [S], seq 3545463057, win 29200, options [mss 1460,sackOK,TS val 16415200 ecr 0,nop,wscale 7], length 0
18:48:29.004213 IP 10.253.4.55.36266 > 123.59.208.201.62715: Flags [S], seq 3545463057, win 29200, options [mss 1460,sackOK,TS val 16447232 ecr 0,nop,wscale 7], length 0
18:49:33.132192 IP 10.253.4.55.36266 > 123.59.208.201.62715: Flags [S], seq 3545463057, win 29200, options [mss 1460,sackOK,TS val 16511360 ecr 0,nop,wscale 7], length 0
18:51:33.580213 IP 10.253.4.55.36266 > 123.59.208.201.62715: Flags [S], seq 3545463057, win 29200, options [mss 1460,sackOK,TS val 16631808 ecr 0,nop,wscale 7], length 0
18:53:33.900217 IP 10.253.4.55.36266 > 123.59.208.201.62715: Flags [S], seq 3545463057, win 29200, options [mss 1460,sackOK,TS val 16752128 ecr 0,nop,wscale 7], length 0
18:55:34.220216 IP 10.253.4.55.36266 > 123.59.208.201.62715: Flags [S], seq 3545463057, win 29200, options [mss 1460,sackOK,TS val 16872448 ecr 0,nop,wscale 7], length 0
Once the RTO has reached 64 s (18:48:29.00 -> 18:49:33.13), doubling would make the next one 128 s. In fact, the RTO becomes a constant 120 s (TCP_RTO_MAX): 18:49:33.13 -> 18:51:33.58 -> 18:53:33.90 -> 18:55:34.22.
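The measurements above can be reproduced on paper. The sketch below adds up the waits under the assumptions drawn from these tests, namely an initial RTO of 1 s, doubling after every SYN, and a cap of 120 s (TCP_RTO_MAX). It is a model of the observed behaviour, not the kernel's code, and syn_timeout is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Estimate how long a connection attempt blocks when no SYN is ever answered.
# Arguments: retries [initial_rto_seconds] [rto_max_seconds]
# Assumed defaults: initial RTO 1 s, cap 120 s, as observed in the tests above.
syn_timeout() {
  local retries=$1 rto=${2:-1} rto_max=${3:-120} total=0 i
  # retries + 1 SYNs are sent; each one is followed by a wait of the current RTO.
  for ((i = 0; i <= retries; i++)); do
    total=$((total + rto))
    rto=$((rto * 2))
    if ((rto > rto_max)); then
      rto=$rto_max   # stop doubling once the cap is reached
    fi
  done
  echo "$total"
}

# Print the estimate for the values of tcp_syn_retries tried above.
for n in 1 2 5 6 10; do
  printf 'tcp_syn_retries=%-2d -> about %ss\n' "$n" "$(syn_timeout "$n")"
done
```

For 10 retries the model gives 127 s of doubling waits plus four waits of 120 s, which is close to the 10:08.64 measured above.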
Range of values
Setting it to 0 produces an error:
sudo sysctl net.ipv4.tcp_syn_retries=0
sysctl: setting key "net.ipv4.tcp_syn_retries": Invalid argument
net.ipv4.tcp_syn_retries = 0
A number of web pages say that net.ipv4.tcp_syn_retries can be set to at most 255 and defaults to 5, but on this monitoring machine the largest value that can be set is 127:
sudo sysctl net.ipv4.tcp_syn_retries=127
net.ipv4.tcp_syn_retries = 127
Attempting to set it to 128 produces an error:
sudo sysctl net.ipv4.tcp_syn_retries=128
sysctl: setting key "net.ipv4.tcp_syn_retries": Invalid argument
net.ipv4.tcp_syn_retries = 128
The Linux kernel documentation clearly states that the value should not exceed 127 and that the default is 6:
https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt
tcp_syn_retries - INTEGER
Number of times initial SYNs for an active TCP connection attempt
will be retransmitted. Should not be higher than 127. Default value
is 6, which corresponds to 63seconds till the last retransmission
with the current initial RTO of 1second. With this the final timeout
for an active TCP connection attempt will happen after 127seconds.
However, many web pages state that the maximum value of net.ipv4.tcp_syn_retries is 255, and some of them appear fairly credible, such as:
https://www.frozentux.net/ipsysctl-tutorial/chunkyhtml/tcpvariables.html
3.3.24. tcp_syn_retries
The tcp_syn_retries variable tells the kernel how many times to try to retransmit the initial SYN packet for an active TCP connection attempt.
This variable takes a integer value, but should not be set higher than 255 since each retransmission would consume huge amounts of time as well as some amounts of bandwidth. Each connection retransmission takes aproximately 30-40 seconds. The default setting is 5, which would lead to an aproximate of seconds delay before the connection times out.
Given the test results and the authority of kernel.org, I trust the kernel documentation: a maximum of 127 and a default of 6. But I am also very curious: where did the content of those other pages come from?
Permanent modification
The changes to net.ipv4.tcp_syn_retries described above are temporary and are lost when the system reboots. To make a change permanent, you need to modify the corresponding configuration file; please look up the details for your system.
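On many Linux distributions the file in question is /etc/sysctl.conf (or a file under /etc/sysctl.d/). That location is an assumption about the system, not something verified on this monitoring machine. A sketch of the entry, using the default value 6 purely as an example:

```conf
# /etc/sysctl.conf
# Number of times the initial SYN is retransmitted (example value only).
net.ipv4.tcp_syn_retries = 6
```

Running sudo sysctl -p afterwards loads /etc/sysctl.conf without a reboot.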
In addition, because net.ipv4.tcp_syn_retries is system-wide and affects the number of SYN retransmissions for every TCP connection the machine initiates, it is recommended to confirm with the system administrator before modifying it.
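Since the sysctl affects the whole machine, one alternative for a monitoring script is to bound the port check itself rather than change the kernel setting. The following is a minimal sketch, not the script used here: port_open is a hypothetical helper name, the 5-second default is an arbitrary choice, and it assumes bash (for its /dev/tcp redirection) and the coreutils timeout command are available on the monitoring machine.

```shell
#!/usr/bin/env bash
# Return 0 if host:port accepts a TCP connection within the limit, non-zero otherwise.
# Arguments: host port [limit_seconds]
# Assumptions: bash provides /dev/tcp, and the coreutils timeout command is installed.
port_open() {
  local host=$1 port=$2 limit=${3:-5}
  # The inner bash opens the connection; timeout kills it if connect() is still
  # waiting on SYN retransmissions when the limit expires.
  timeout "$limit" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Usage (address and port taken from the tests above):
#   if port_open 123.59.208.201 62715 5; then echo "port open"; else echo "port not reachable"; fi
```

With a limit well under 60 s, the check returns in time for the once-a-minute schedule regardless of the value of net.ipv4.tcp_syn_retries.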
Summary
What I grandly call exploration and research is really just climbing out of a pit I dug for myself through a lack of basic knowledge.
Although the origin of 127 is now clear, it has raised many new questions, which will lead me into the next pit.
Reference links
While working through a problem like this, you can always find some good resource sites, such as:
- https://www.kernel.org
- http://lxr.linux.no/
- http://free-electrons.com/docs/
I think this is the best reward for someone who learns only after running into difficulty. It also pushes me to become someone who gains knowledge by studying.
Related knowledge
- TCP protocol
- tcpdump command
- sysctl command
- telnet command
TCP: why did telnet return after 127 s?