Implementation of RTO in Linux

Source: Internet
Author: User

I recently ran into a network timeout problem and troubleshot it along the usual lines:

1. Ruled out code logic issues, TCP-related bugs, kernel parameters, and other likely causes;

2. While investigating KVM, found that the timeouts also occurred on other KVM guests on the same host.

Most of the abnormal connections lasted about 1 s. Packet captures showed that packets were being retransmitted, and that the retransmission timeout was always 1 second.

Why is the retransmission timeout 1 second? What do the standards say, and how is it actually implemented?

This article discusses exactly that, based on the CentOS kernel 2.6.32-358.

RFC standard


The RTO is computed by an algorithm from the measured network conditions (the RTT). "TCP/IP Illustrated, Volume 1" covers this, but its description is outdated.

Among the RFCs, RFC 6298 is the most recent one on the retransmission timeout. It updates RFC 1122 and obsoletes RFC 2988.

Below is a brief summary. Interested readers can consult the RFC itself.

RFC6298

1. The basic RTO calculation:

First, there is a clock-granularity parameter, written here as RTO_MIN (RFC 6298 calls it G).

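Initialization (before any RTT measurement):

$$\mathrm{RTO} \leftarrow 1\ \mathrm{s}$$

First calculation (first RTT measurement R):

$$\mathrm{SRTT} \leftarrow R, \qquad \mathrm{RTTVAR} \leftarrow R/2, \qquad \mathrm{RTO} \leftarrow \mathrm{SRTT} + \max(\mathrm{RTO\_MIN},\ K \cdot \mathrm{RTTVAR}), \quad K = 4$$

Subsequent calculations (new measurement R'):

$$\mathrm{RTTVAR} \leftarrow (1-\beta)\,\mathrm{RTTVAR} + \beta\,|\mathrm{SRTT} - R'|$$

$$\mathrm{SRTT} \leftarrow (1-\alpha)\,\mathrm{SRTT} + \alpha\,R'$$

$$\mathrm{RTO} \leftarrow \mathrm{SRTT} + \max(\mathrm{RTO\_MIN},\ K \cdot \mathrm{RTTVAR})$$

where α = 1/8 and β = 1/4. RTTVAR is updated first, using the old SRTT.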

If the computed RTO is below 1 second, it should be rounded up to 1 second. A maximum may be imposed, provided it is at least 60 seconds.

2. For repeated retransmissions of the same segment, Karn's algorithm must be used. The backed-off (doubled) RTO is kept, and RTT samples must not be taken from retransmitted segments unless the timestamps option is enabled. Timestamps allow the RTT of a retransmitted segment to be measured accurately.

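3. When 4·RTTVAR approaches 0, the variance term must not fall below RTO_MIN (the clock granularity), so RTO stays close to SRTT + RTO_MIN.

The finer the clock granularity, the better. The RFC notes that granularities of 100 ms or finer perform somewhat better.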

4. RTO timer management

(1) Whenever data is sent (including retransmissions), start the timer if it is not already running. Turn the timer off once all outstanding data has been acknowledged.

(2) On each timeout, back off with RTO = RTO × 2.

(3) New FALLBACK rule: suppose the timer expires while waiting for the ACK of a SYN, and the implementation is using an RTO of less than 3 seconds. In that case the RTO must be reset to 3 seconds when data transmission begins, that is, after the three-way handshake completes.
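The RFC 6298 rules above fit into a small estimator. This is a minimal Python sketch, not code from any real TCP stack. The class and method names (`Rfc6298Estimator`, `on_rtt_sample`, `on_timeout`, `on_handshake_complete`) are hypothetical, and times are in seconds. The clock granularity `g` (RTO_MIN in the formulas above) defaults to an assumed 1 ms, and the optional cap to 60 s.

```python
from dataclasses import dataclass
from typing import Optional

K, ALPHA, BETA = 4, 1 / 8, 1 / 4
INITIAL_RTO = 1.0    # (2.1) RTO before any RTT sample, seconds
FALLBACK_RTO = 3.0   # (5.7) RTO for data if the SYN timer expired


@dataclass
class Rfc6298Estimator:
    g: float = 0.001                 # clock granularity (RTO_MIN above), assumed 1 ms
    rto_max: Optional[float] = 60.0  # optional cap; must be >= 60 s if used
    srtt: Optional[float] = None
    rttvar: Optional[float] = None
    rto: float = INITIAL_RTO
    syn_timed_out: bool = False

    def _bound(self, rto: float) -> float:
        rto = max(rto, 1.0)          # (2.4) round up to 1 second
        return rto if self.rto_max is None else min(rto, self.rto_max)

    def on_rtt_sample(self, r: float, retransmitted: bool = False,
                      timestamps: bool = False) -> None:
        # Karn: no samples from retransmitted segments unless timestamps are used
        if retransmitted and not timestamps:
            return
        if self.srtt is None:        # (2.2) first measurement
            self.srtt, self.rttvar = r, r / 2
        else:                        # (2.3) RTTVAR first, using the old SRTT
            self.rttvar = (1 - BETA) * self.rttvar + BETA * abs(self.srtt - r)
            self.srtt = (1 - ALPHA) * self.srtt + ALPHA * r
        self.rto = self._bound(self.srtt + max(self.g, K * self.rttvar))

    def on_timeout(self, syn: bool = False) -> None:
        self.syn_timed_out |= syn
        self.rto = self._bound(self.rto * 2)   # (5.5) back off the timer

    def on_handshake_complete(self) -> None:
        # (5.7) FALLBACK: the SYN timer expired and the initial RTO is below 3 s
        if self.syn_timed_out and INITIAL_RTO < FALLBACK_RTO:
            self.rto = FALLBACK_RTO


est = Rfc6298Estimator()
est.on_rtt_sample(2.0)
print(est.rto)   # 6.0 = 2 + 4 * 1.0
est.on_rtt_sample(2.0)
print(est.rto)   # 5.0 = 2 + 4 * 0.75
```

With these definitions, a LAN-sized RTT sample such as 10 ms still yields an RTO of 1 s because of the round-up rule. A SYN timeout followed by `on_handshake_complete()` leaves the data phase starting at 3 s.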

Analysis of the actual Linux implementation

Sending the SYN of the three-way handshake

    01:00:00.129688 IP 172.16.3.14.1868 > 172.16.10.40.80: Flags [S], seq 3774079837, win 14600, options [mss 1460,nop,nop,sackOK,nop,wscale 7], length 0
    ….129065 IP 172.16.3.14.1868 > 172.16.10.40.80: Flags [S], seq 3774079837, win 14600, options [mss 1460,nop,nop,sackOK,nop,wscale 7], length 0
    ….129063 IP 172.16.3.14.1868 > 172.16.10.40.80: Flags [S], seq 3774079837, win 14600, options [mss 1460,nop,nop,sackOK,nop,wscale 7], length 0
    ….129074 IP 172.16.3.14.1868 > 172.16.10.40.80: Flags [S], seq 3774079837, win 14600, options [mss 1460,nop,nop,sackOK,nop,wscale 7], length 0
    ….129072 IP 172.16.3.14.1868 > 172.16.10.40.80: Flags [S], seq 3774079837, win 14600, options [mss 1460,nop,nop,sackOK,nop,wscale 7], length 0
    ….129128 IP 172.16.3.14.1868 > 172.16.10.40.80: Flags [S], seq 3774079837, win 14600, options [mss 1460,nop,nop,sackOK,nop,wscale 7], length 0

The interval starts at 1 second and doubles each time.

Note that after the fifth retransmission, the application is notified of the timeout only when the timer expires a sixth time. That is 63 seconds (1 + 2 + 4 + 8 + 16 + 32) after the first SYN.
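The 63-second figure is simply the sum of the doubling timeouts. This minimal sketch assumes an initial RTO of 1 s and 5 retries, matching the five retransmissions in the capture. 5 was also the default of `net.ipv4.tcp_syn_retries` in kernels of this era. `retransmit_schedule` is a hypothetical helper, not a kernel function.

```python
def retransmit_schedule(initial_rto, retries, rto_max=120.0):
    """Hypothetical helper: send times and give-up time (seconds) for one segment.

    The first transmission is at t = 0. After each timer expiry the segment is
    resent and the timer doubles (capped at rto_max). The connection is given up
    when the timer expires once more after the last retransmission.
    """
    sends, t, rto = [0.0], 0.0, initial_rto
    for _ in range(retries):
        t += rto
        sends.append(t)
        rto = min(rto * 2, rto_max)
    return sends, t + rto


sends, give_up = retransmit_schedule(1.0, 5)
print(sends)     # [0.0, 1.0, 3.0, 7.0, 15.0, 31.0]
print(give_up)   # 63.0 = 1 + 2 + 4 + 8 + 16 + 32
```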

Sending the SYN-ACK of the three-way handshake

    01:17:20.084839 IP 172.16.3.15.2535 > 172.16.3.14.80: Flags [S], seq 1297135388, win 14600, options [mss 1460,nop,nop,sackOK,nop,wscale 7], length 0
    ….084908 IP 172.16.3.14.80 > 172.16.3.15.2535: Flags [S.], seq 1194120443, ack 1297135389, win 14600, options [mss 1460,nop,nop,sackOK,nop,wscale 7], length 0
    ….284093 IP 172.16.3.14.80 > 172.16.3.15.2535: Flags [S.], seq 1194120443, ack 1297135389, win 14600, options [mss 1460,nop,nop,sackOK,nop,wscale 7], length 0
    ….284088 IP 172.16.3.14.80 > 172.16.3.15.2535: Flags [S.], seq 1194120443, ack 1297135389, win 14600, options [mss 1460,nop,nop,sackOK,nop,wscale 7], length 0
    ….284095 IP 172.16.3.14.80 > 172.16.3.15.2535: Flags [S.], seq 1194120443, ack 1297135389, win 14600, options [mss 1460,nop,nop,sackOK,nop,wscale 7], length 0
    ….284097 IP 172.16.3.14.80 > 172.16.3.15.2535: Flags [S.], seq 1194120443, ack 1297135389, win 14600, options [mss 1460,nop,nop,sackOK,nop,wscale 7], length 0
    ….284093 IP 172.16.3.14.80 > 172.16.3.15.2535: Flags [S.], seq 1194120443, ack 1297135389, win 14600, options [mss 1460,nop,nop,sackOK,nop,wscale 7], length 0

The interval starts at 1 second and doubles each time.

Retransmitting an ordinary data packet

    01:32:20.443757 IP 172.16.3.15.2548 > 172.16.3.14.80: Flags [P.], seq 3319667389:3319667400, ack 1233846614, win 115, length 11
    ….644600 IP 172.16.3.15.2548 > 172.16.3.14.80: Flags [P.], seq 3319667389:3319667400, ack 1233846614, win 115, length 11
    01:32:21.046579 IP 172.16.3.15.2548 > 172.16.3.14.80: Flags [P.], seq 3319667389:3319667400, ack 1233846614, win 115, length 11
    01:32:21.850632 IP 172.16.3.15.2548 > 172.16.3.14.80: Flags [P.], seq 3319667389:3319667400, ack 1233846614, win 115, length 11
    ….458555 IP 172.16.3.15.2548 > 172.16.3.14.80: Flags [P.], seq 3319667389:3319667400, ack 1233846614, win 115, length 11
    ….674594 IP 172.16.3.15.2548 > 172.16.3.14.80: Flags [P.], seq 3319667389:3319667400, ack 1233846614, win 115, length 11
    ….106601 IP 172.16.3.15.2548 > 172.16.3.14.80: Flags [P.], seq 3319667389:3319667400, ack 1233846614, win 115, length 11
    ….970567 IP 172.16.3.15.2548 > 172.16.3.14.80: Flags [P.], seq 3319667389:3319667400, ack 1233846614, win 115, length 11
    01:33:11.698415 IP 172.16.3.15.2548 > 172.16.3.14.80: Flags [P.], seq 3319667389:3319667400, ack 1233846614, win 115, length 11
    01:34:03.154300 IP 172.16.3.15.2548 > 172.16.3.14.80: Flags [P.], seq 3319667389:3319667400, ack 1233846614, win 115, length 11
    01:35:46.065892 IP 172.16.3.15.2548 > 172.16.3.14.80: Flags [P.], seq 3319667389:3319667400, ack 1233846614, win 115, length 11
    01:37:46.065382 IP 172.16.3.15.2548 > 172.16.3.14.80: Flags [P.], seq 3319667389:3319667400, ack 1233846614, win 115, length 11
    01:39:46.064917 IP 172.16.3.15.2548 > 172.16.3.14.80: Flags [P.], seq 3319667389:3319667400, ack 1233846614, win 115, length 11
    ….064466 IP 172.16.3.15.2548 > 172.16.3.14.80: Flags [P.], seq 3319667389:3319667400, ack 1233846614, win 115, length 11
    ….064060 IP 172.16.3.15.2548 > 172.16.3.14.80: Flags [P.], seq 3319667389:3319667400, ack 1233846614, win 115, length 11
    01:45:46.063675 IP 172.16.3.15.2548 > 172.16.3.14.80: Flags [P.], seq 3319667389:3319667400, ack 1233846614, win 115, length 11

The interval starts at 0.2 seconds and doubles up to a cap of 120 seconds. The packet is retransmitted 15 times in total.

Note that the first transmission is at 01:32 and the connection is abandoned at about 01:47, roughly 15 minutes 25 seconds later.
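The same doubling explains the total of roughly 15 minutes for data segments. This minimal sketch assumes an initial RTO of 0.2 s, a 120 s cap, and 15 retransmissions before giving up. 15 is the default of `net.ipv4.tcp_retries2`. The helper name is hypothetical.

```python
def retransmit_schedule(initial_rto, retries, rto_max=120.0):
    """Hypothetical helper: send times and give-up time (seconds) for one segment.

    The timer doubles after each expiry, capped at rto_max. The connection is
    given up when the timer expires once more after the last retransmission.
    """
    sends, t, rto = [0.0], 0.0, initial_rto
    for _ in range(retries):
        t += rto
        sends.append(t)
        rto = min(rto * 2, rto_max)
    return sends, t + rto


sends, give_up = retransmit_schedule(0.2, 15)
gaps = [round(b - a, 1) for a, b in zip(sends, sends[1:])]
print(gaps)
# [0.2, 0.4, 0.8, 1.6, 3.2, 6.4, 12.8, 25.6, 51.2, 102.4, 120.0, 120.0, 120.0, 120.0, 120.0]
print(round(give_up, 1))   # 924.6 s, i.e. about 15 min 25 s
```

The idealized give-up time of 924.6 s is close to the roughly 15 minutes 25 seconds seen in the capture.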

Does Linux support the FALLBACK rule? Here is a simple test.

With the server's iptables blocking traffic, the client connects to the server; iptables is then turned off:

    23:35:01.036565 IP 172.16.3.14.6071 > 172.16.10.40.12345: Flags [S], seq 2364912154, win 14600, options [mss 1460,nop,nop,sackOK,nop,wscale 7], length 0
    ….036152 IP 172.16.3.14.6071 > 172.16.10.40.12345: Flags [S], seq 2364912154, win 14600, options [mss 1460,nop,nop,sackOK,nop,wscale 7], length 0
    23:35:04.036126 IP 172.16.3.14.6071 > 172.16.10.40.12345: Flags [S], seq 2364912154, win 14600, options [mss 1460,nop,nop,sackOK,nop,wscale 7], length 0
    23:35:08.036127 IP 172.16.3.14.6071 > 172.16.10.40.12345: Flags [S], seq 2364912154, win 14600, options [mss 1460,nop,nop,sackOK,nop,wscale 7], length 0
    ….036131 IP 172.16.3.14.6071 > 172.16.10.40.12345: Flags [S], seq 2364912154, win 14600, options [mss 1460,nop,nop,sackOK,nop,wscale 7], length 0
    ….036842 IP 172.16.10.40.12345 > 172.16.3.14.6071: Flags [S.], seq 3634006739, ack 2364912155, win 14600, options [mss 1460], length 0
    ….036896 IP 172.16.3.14.6071 > 172.16.10.40.12345: Flags [.], ack 3634006740, win 14600, length 0

With the server's iptables blocking traffic again, the client sends a packet; iptables is turned off before the 15 retransmissions are used up:

    23:35:48.129273 IP 172.16.3.14.6071 > 172.16.10.40.12345: Flags [P.], seq 2364912155:2364912156, ack 3634006740, win 14600, length 1
    ….129120 IP 172.16.3.14.6071 > 172.16.10.40.12345: Flags [P.], seq 2364912155:2364912156, ack 3634006740, win 14600, length 1
    ….129070 IP 172.16.3.14.6071 > 172.16.10.40.12345: Flags [P.], seq 2364912155:2364912156, ack 3634006740, win 14600, length 1
    ….129068 IP 172.16.3.14.6071 > 172.16.10.40.12345: Flags [P.], seq 2364912155:2364912156, ack 3634006740, win 14600, length 1
    ….129802 IP 172.16.10.40.12345 > 172.16.3.14.6071: Flags [.], …

With iptables off on the server, the client sends a packet:

    23:36:15.217231 IP 172.16.3.14.6071 > 172.16.10.40.12345: Flags [P.], seq 2364912156:2364912157, ack 3634006740, win 14600, length 1
    ….217766 IP 172.16.10.40.12345 > 172.16.3.14.6071: Flags [.], ack 2364912157, win 14600, length 0

With the server's iptables blocking traffic, the client sends a packet:

    23:36:26.658172 IP 172.16.3.14.6071 > 172.16.10.40.12345: Flags [P.], seq 2364912157:2364912158, ack 3634006740, win 14600, length 1
    ….859055 IP 172.16.3.14.6071 > 172.16.10.40.12345: Flags [P.], seq 2364912157:2364912158, ack 3634006740, win 14600, length 1
    ….261065 IP 172.16.3.14.6071 > 172.16.10.40.12345: Flags [P.], seq 2364912157:2364912158, ack 3634006740, win 14600, length 1
    ….065106 IP 172.16.3.14.6071 > 172.16.10.40.12345: Flags [P.], seq 2364912157:2364912158, ack 3634006740, win 14600, length 1
    ….673132 IP 172.16.3.14.6071 > 172.16.10.40.12345: Flags [P.], seq 2364912157:2364912158, ack 3634006740, win 14600, length 1
    ….889068 IP 172.16.3.14.6071 > 172.16.10.40.12345: Flags [P.], seq 2364912157:2364912158, ack 3634006740, win 14600, length 1
    ….321091 IP 172.16.3.14.6071 > 172.16.10.40.12345: Flags [P.], seq 2364912157:2364912158, ack 3634006740, win 14600, length 1
    ….185135 IP 172.16.3.14.6071 > 172.16.10.40.12345: Flags [P.], seq 2364912157:2364912158, ack 3634006740, win 14600, length 1
    ….913091 IP 172.16.3.14.6071 > 172.16.10.40.12345: Flags [P.], seq 2364912157:2364912158, ack 3634006740, win 14600, length 1

This test shows what happens when the handshake takes more than 1 second because the SYN timed out and was retransmitted: the RTO in the data phase starts at 3 seconds. The same happens when the server's SYN-ACK times out.

Once a normal RTT sample has been taken, the RTO converges back to about 200 ms.

Next, let's see how the timestamps option changes this.


With the server's iptables blocking traffic, the client connects to the server; iptables is then turned off:

    23:47:47.754316 IP 172.16.3.14.8603 > 172.16.10.40.12345: Flags [S], seq 479022248, win 14600, options [mss 1460,sackOK,TS val 2336007392 ecr 0,nop,wscale 7], length 0
    ….754079 IP 172.16.3.14.8603 > 172.16.10.40.12345: Flags [S], seq 479022248, win 14600, options [mss 1460,sackOK,TS val 2336008392 ecr 0,nop,wscale 7], length 0
    ….754088 IP 172.16.3.14.8603 > 172.16.10.40.12345: Flags [S], seq 479022248, win 14600, options [mss 1460,sackOK,TS val 2336010392 ecr 0,nop,wscale 7], length 0
    ….754083 IP 172.16.3.14.8603 > 172.16.10.40.12345: Flags [S], seq 479022248, win 14600, options [mss 1460,sackOK,TS val 2336014392 ecr 0,nop,wscale 7], length 0
    ….754094 IP 172.16.3.14.8603 > 172.16.10.40.12345: Flags [S], seq 479022248, win 14600, options [mss 1460,sackOK,TS val 2336022392 ecr 0,nop,wscale 7], length 0
    ….754683 IP 172.16.10.40.12345 > 172.16.3.14.8603: Flags [S.], seq 697602971, ack 479022249, win 14480, options [mss 1460,nop,nop,TS val 4044659641 ecr 2336022392], length 0
    ….754742 IP 172.16.3.14.8603 > 172.16.10.40.12345: Flags [.], ack 697602972, win 14600, options [nop,nop,TS val 2336022392 ecr 4044659641], length 0

With the server's iptables blocking traffic again, the client sends a data packet; iptables is turned off before the 15 retransmissions are used up:

    23:48:11.944170 IP 172.16.3.14.8603 > 172.16.10.40.12345: Flags [P.], seq 479022249:479022250, ack 697602972, win 14600, options [nop,nop,TS val 2336031582 ecr 4044659641], length 1
    ….145036 IP 172.16.3.14.8603 > 172.16.10.40.12345: Flags [P.], seq 479022249:479022250, ack 697602972, win 14600, options [nop,nop,TS val 2336031783 ecr 4044659641], length 1
    ….547084 IP 172.16.3.14.8603 > 172.16.10.40.12345: Flags [P.], seq 479022249:479022250, ack 697602972, win 14600, options [nop,nop,TS val 2336032185 ecr 4044659641], length 1
    ….351106 IP 172.16.3.14.8603 > 172.16.10.40.12345: Flags [P.], seq 479022249:479022250, ack 697602972, win 14600, options [nop,nop,TS val 2336032989 ecr 4044659641], length 1
    ….959080 IP 172.16.3.14.8603 > 172.16.10.40.12345: Flags [P.], seq 479022249:479022250, ack 697602972, win 14600, options [nop,nop,TS val 2336034597 ecr 4044659641], length 1
    ….175092 IP 172.16.3.14.8603 > 172.16.10.40.12345: Flags [P.], seq 479022249:479022250, ack 697602972, win 14600, options [nop,nop,TS val 2336037813 ecr 4044659641], length 1
    ….607088 IP 172.16.3.14.8603 > 172.16.10.40.12345: Flags [P.], seq 479022249:479022250, ack 697602972, win 14600, options [nop,nop,TS val 2336044245 ecr 4044659641], length 1

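With timestamps enabled, the FALLBACK mechanism (resetting the RTO to 3 seconds) does not kick in: the data retransmissions again start at about 200 ms. The echoed timestamp lets the client take a valid RTT sample even though the SYN was retransmitted.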
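The TS val fields in this capture make the timing visible even where the tcpdump timestamps were cut off. This sketch copies the values from the capture above. It assumes the client's TS clock ticks once per millisecond, which is consistent with the 1000-tick gap between the first two SYNs. The constant and function names are hypothetical.

```python
# TS val of the client's five SYNs and seven data transmissions, from the capture
SYN_TS = [2336007392, 2336008392, 2336010392, 2336014392, 2336022392]
DATA_TS = [2336031582, 2336031783, 2336032185, 2336032989,
           2336034597, 2336037813, 2336044245]
SYNACK_ECR = 2336022392   # TS ecr in the server's SYN-ACK
ACK_TS_VAL = 2336022392   # TS val in the client's final handshake ACK


def gaps(values):
    """Differences between consecutive TS values (ms, assuming a 1 ms TS clock)."""
    return [b - a for a, b in zip(values, values[1:])]


print(gaps(SYN_TS))    # [1000, 2000, 4000, 8000]: 1 s initial RTO, doubling
print(gaps(DATA_TS))   # [201, 402, 804, 1608, 3216, 6432]: starts near 200 ms, not 3 s
# The SYN-ACK echoes the TS val of the last SYN, so the sample is unambiguous
print(SYN_TS.index(SYNACK_ECR))   # 4, i.e. the fifth (last) SYN
print(ACK_TS_VAL - SYNACK_ECR)    # 0: the handshake RTT was under one TS tick
```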

Linux's fine-tuning of the RTO calculation

Linux's actual RTO calculation differs from the one in the RFC. If you estimate the RTO purely from the details in the RFC, your estimates will be off.

1. As the previous section shows, Linux sets the minimum RTO to 200 ms (even 50 ms on Ubuntu), whereas the RFC recommends 1 second. It sets the maximum to 120 seconds, whereas the RFC requires any maximum to be at least 60 seconds.

2. From my reading of the Linux code, Linux damps the effect of sharp RTT jitter, which makes the RTO curve smoother.

This shows up in two tweaks:

Fine-tuning 1

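Suppose both of the following hold:

$$R' < \mathrm{SRTT} \quad\text{and}\quad |\mathrm{SRTT} - R'| > \mathrm{RTTVAR}$$

This means the new sample R' has dropped sharply: it is below SRTT, and by more than RTTVAR.

In that case Linux updates

$$\mathrm{RTTVAR} \leftarrow \left(1 - \tfrac{\beta}{8}\right)\mathrm{RTTVAR} + \tfrac{\beta}{8}\,|\mathrm{SRTT} - R'|$$

whereas the RFC uses

$$\mathrm{RTTVAR} \leftarrow (1 - \beta)\,\mathrm{RTTVAR} + \beta\,|\mathrm{SRTT} - R'|$$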

Compared with the RFC, the smoothing factor is multiplied by 1/8. As a result, R' has less influence on RTTVAR, which makes RTTVAR, and therefore the RTO, smoother.
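To see how much fine-tuning 1 damps a sudden drop, here is a one-step comparison of the two RTTVAR updates. This is a minimal floating-point sketch with hypothetical function names. The Linux version models only the condition and gain described above, not the kernel's scaled-integer arithmetic.

```python
BETA = 1 / 4


def rttvar_rfc(rttvar, srtt, r):
    """RFC 6298 update: RTTVAR <- (1 - beta) RTTVAR + beta |SRTT - R'|."""
    return (1 - BETA) * rttvar + BETA * abs(srtt - r)


def rttvar_linux(rttvar, srtt, r):
    """Fine-tuning 1: if R' is below SRTT by more than RTTVAR, use gain beta/8."""
    sharp_drop = r < srtt and srtt - r > rttvar
    beta = BETA / 8 if sharp_drop else BETA
    return (1 - beta) * rttvar + beta * abs(srtt - r)


# SRTT = 100 ms, RTTVAR = 10 ms, then a 20 ms sample arrives
print(rttvar_rfc(10, 100, 20))    # 27.5
print(rttvar_linux(10, 100, 20))  # 12.1875
```

Under the RFC rule, RTTVAR nearly triples, so the RTO rises even though the path just got faster. With the gain of 1/32, it barely moves. An RTT increase is handled identically by both functions.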

Fine-tuning 2

When RTTVAR decreases, Linux smooths the decrease. As a result, the RTO does not drop too far and the RTO curve has no steep falls.

Here, RTTVAR is first recomputed from the latest RTT samples. The resulting value (the largest deviation seen during the last round trip) is floored at the lower limit RTO_MIN and compared with the RTTVAR in use during the previous RTT. If it is smaller, RTTVAR is lowered by only 1/4 of the difference rather than dropping to the new value at once.

Why are increases not smoothed? In my view, increasing the RTO is harmless, but lowering it too much may cause spurious retransmissions (for this term, see the RFC mentioned above).
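Both fine-tunings can be seen together in a small model of the estimator. `LinuxRttEstimator` is a hypothetical sketch modelled on the structure of `tcp_rtt_estimator()` in kernels of this era, not a verbatim copy of kernel code. Times are integer milliseconds instead of jiffies, srtt is stored scaled by 8 and mdev by 4 as in the kernel, and every sample is assumed to end one RTT window.

```python
class LinuxRttEstimator:
    """Hypothetical sketch modelled on the 2.6.32-era tcp_rtt_estimator().

    srtt is kept scaled by 8 and mdev by 4, so mdev ~ 4 * RTTVAR and
    RTO = srtt / 8 + rttvar. Assumption: every sample ends one RTT window.
    """

    def __init__(self, rto_min=200):
        self.rto_min = rto_min
        self.srtt = 0       # 8 * SRTT
        self.mdev = 0       # 4 * RTTVAR
        self.mdev_max = 0   # largest mdev seen in the current RTT window
        self.rttvar = 0     # smoothed mdev_max, never below rto_min

    def sample(self, mrtt):
        m = mrtt or 1                            # a 0 ms sample counts as 1 ms
        if self.srtt == 0:                       # first sample: RTO = 3 * RTT
            self.srtt = m << 3
            self.mdev = m << 1
            self.mdev_max = self.rttvar = max(self.mdev, self.rto_min)
            return
        m -= self.srtt >> 3                      # error against the old SRTT
        self.srtt += m                           # SRTT = 7/8 SRTT + 1/8 R'
        if m < 0:                                # RTT went down
            m = -m - (self.mdev >> 2)
            if m > 0:                            # fine-tuning 1: drop > RTTVAR,
                m >>= 3                          # so gain 1/32 instead of 1/4
        else:
            m -= self.mdev >> 2
        self.mdev += m                           # normally 3/4 RTTVAR + 1/4 |err|
        if self.mdev > self.mdev_max:
            self.mdev_max = self.mdev
            if self.mdev_max > self.rttvar:      # increases take effect at once
                self.rttvar = self.mdev_max
        # End of the RTT window. Fine-tuning 2: lower rttvar by only 1/4 of the
        # gap to this window's maximum, which is floored at rto_min.
        if self.mdev_max < self.rttvar:
            self.rttvar -= (self.rttvar - self.mdev_max) >> 2
        self.mdev_max = self.rto_min

    @property
    def rto(self):
        return (self.srtt >> 3) + self.rttvar


lan = LinuxRttEstimator(rto_min=20)
lan.sample(1)
print(lan.rto)   # 21
wan = LinuxRttEstimator(rto_min=20)
wan.sample(27)
print(wan.rto)   # 81
```

Under the one-sample-per-RTT simplification, feeding `wan` many further 27 ms samples makes rttvar decay in 1/4 steps. The RTO falls from 81 ms and settles just above R + RTO_MIN (47 ms). So the 3R figure best describes a freshly established connection.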

Manually tuning the RTO

Back to the original question: can we shorten the RTO, and how can its value be estimated from Linux's actual implementation?

The initial RTO values (including the FALLBACK value) cannot be changed, because they are hard-coded in the kernel.

Outside the three-way handshake, however, the RTO is predictable.

Assume the network is stable during the estimate, so the RTT is always R. Otherwise, fine-tunings 1 and 2 make the estimate very complicated.

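SRTT then stays R and RTTVAR stays 0.5R, so

$$\mathrm{RTO} = \mathrm{SRTT} + \max(\mathrm{RTO\_MIN},\ 4 \cdot \mathrm{RTTVAR}) = R + \max(\mathrm{RTO\_MIN},\ 2R)$$

that is, RTO = 3R when RTO_MIN < 2R, and RTO = R + RTO_MIN otherwise.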

Therefore, changing RTO_MIN alone is enough to change the RTO significantly.
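The formula above is easy to check numerically. This minimal sketch encodes the stable-network assumption (SRTT = R, RTTVAR = 0.5R), with all values in milliseconds. The function name is hypothetical.

```python
def estimated_rto_ms(rtt_ms, rto_min_ms):
    """RTO under the stable-network model: SRTT = R and RTTVAR = R/2, so
    RTO = R + max(RTO_MIN, 4 * RTTVAR) = R + max(RTO_MIN, 2R)."""
    return rtt_ms + max(rto_min_ms, 2 * rtt_ms)


print(estimated_rto_ms(27, 20))    # 81: RTO_MIN < 2R, so RTO = 3R
print(estimated_rto_ms(1, 20))     # 21: RTO_MIN > 2R, so RTO = R + RTO_MIN
print(estimated_rto_ms(1, 200))    # 201: the default 200 ms floor on a 1 ms LAN
```

On a low-latency network the RTO is dominated by RTO_MIN, which is why lowering it per route has such a large effect.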

RTO_MIN settings

RTO_MIN is set per route with ip route.

    [root@localhost.localdomain ~]# ping www.baidu.com
    PING www.a.shifen.com (180.97.33.107) 56(84) bytes of data.
    64 bytes from 180.97.33.107: icmp_seq=1 ttl=51 time=30.8 ms
    64 bytes from 180.97.33.107: icmp_seq=2 ttl=51 time=29.9 ms

After obtaining Baidu's IP address:

    [root@localhost.localdomain ~]# ip route add 180.97.33.108/32 via 172.16.3.1 rto_min 20
    [root@localhost.localdomain ~]# nc www.baidu.com 80

    [root@localhost.localdomain ~]# ss -eipn '( dport = :www )'
    State      Recv-Q Send-Q     Local Address:Port       Peer Address:Port
    ESTAB      0      0            172.16.3.14:14149     180.97.33.108:80     users:(("nc",7162,3)) ino:48057454 sk:ffff88023905adc0
             sack cubic wscale:… rto:81 rtt:27/13.5 cwnd:10 send 4.3Mbps rcv_space:14600

Because RTO_MIN < 2R, RTO = 3R = 3 × 27 = 81 ms.

On an intranet, the RTT is very small:

    [root@localhost.localdomain ~]# ip route add 172.16.3.16/32 via 172.16.3.1 rto_min 20
    [root@localhost.localdomain ~]# nc 172.16.3.16 22
    SSH-2.0-OpenSSH_5.3

    [root@localhost.localdomain ~]# ss -eipn '( dport = :22 )'
    State      Recv-Q Send-Q     Local Address:Port       Peer Address:Port
    ESTAB      0      0            172.16.3.14:57578     172.16.3.16:22     users:(("nc",7272,3)) ino:48059707 sk:ffff88023b7c7000
             sack cubic wscale:7,7 rto:21 rtt:1/0.5 ato:40 cwnd:10 send 116.8Mbps rcv_space:14600

Because RTO_MIN > 2R, RTO = R + RTO_MIN = 1 + 20 = 21 ms.

If you trust the whole intranet, you can apply the setting to all connections on an interface instead of to a single destination address:

    ip route change dev eth0 rto_min 20ms
Summary

1. Linux's retransmission timeout implementation broadly follows the RFC, with some adjustments:

The RFC has a single initial RTO of 1 second. In Linux, the initial RTO for three-way handshake packets is 1 second, while for other packets it is 0.2 seconds.

Because the RFC algorithm is imperfect, Linux damps the effect of sharp RTT jitter, which makes the RTO curve smoother.

2. The SYN retransmission timeout cannot be adjusted without recompiling the kernel, but the retransmission timeout of data (PSH) packets can be.

3. In a stable network, with the minimum RTO set to RTO_MIN: if RTO_MIN < 2RTT, then RTO = 3RTT; if RTO_MIN > 2RTT, then RTO = RTT + RTO_MIN.

