TCP connection recovery after network disconnection
We ran into a problem in our project. Two machines establish a TCP connection over a socket and communicate in both directions, generating heavy traffic. We then simulated a network outage by setting a packet loss rate of 100% on the router. At that point the socket obviously cannot get any packets through, and a large number of retransmissions occur. We then removed the setting on the router to restore the network. Afterwards, traffic from the TCP client to the server was normal, but traffic from the server to the client failed: whatever we tried, the return value was 0 and errno was EAGAIN.
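For reference, here is a minimal sketch of what such a failing send looks like from user space. It is not the project's code: `try_send` is a hypothetical helper, and it assumes Linux (for `MSG_NOSIGNAL`). Note that a raw `send(2)` call reports a full send buffer as -1 with errno set to EAGAIN, so the return value of 0 described above presumably comes from an application-level wrapper similar to this one (an assumption).

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: try to send len bytes without blocking.
 * Returns the number of bytes queued, 0 if the kernel send buffer is
 * full (errno is left as EAGAIN / EWOULDBLOCK), or -1 on a real error.
 */
ssize_t try_send(int fd, const void *buf, size_t len)
{
    ssize_t n;

    do {
        /* MSG_DONTWAIT: do not block even if fd itself is blocking.
         * MSG_NOSIGNAL: no SIGPIPE if the peer has gone away (Linux). */
        n = send(fd, buf, len, MSG_DONTWAIT | MSG_NOSIGNAL);
    } while (n < 0 && errno == EINTR);  /* retry if interrupted by a signal */

    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return 0;   /* send buffer full: nothing was queued */
    return n;       /* bytes queued, or -1 with errno set */
}
```

If a helper like this keeps returning 0 long after the network is back, the kernel's send buffer is not draining, which matches the symptom described here.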
I used tcpdump to look at the packets at this point (tc2 is the server, tc1 is the client):
12:08:21.020291 IP tc1.corp.com.42171 > tc2.corp.com.3003: S 4009389430:4009389430(0) win 5840
12:08:21.020571 IP tc2.corp.com.3003 > tc1.corp.com.42171: R 0:0(0) ack 4009389431 win 0
12:08:38.934329 IP tc2.corp.com.3903 > tc1.corp.com.3904: P 2398055392:2398056153(761) ack 2538876742 win 724
12:08:38.934519 IP tc1.corp.com.3904 > tc2.corp.com.3903: . ack 2165 win 13756
12:08:39.958457 IP tc1.corp.com.3904 > tc2.corp.com.3903: P 1:763(762) ack 2165 win 13756
12:08:39.958485 IP tc2.corp.com.3903 > tc1.corp.com.3904: . ack 763 win 1448
12:08:39.958653 IP tc1.corp.com.3904 > tc2.corp.com.3903: P 763:881(118) ack 2165 win 13756
12:08:39.958660 IP tc1.corp.com.3904 > tc2.corp.com.3903: P 881:997(116) ack 2165 win 13756
12:08:39.958719 IP tc2.corp.com.3903 > tc1.corp.com.3904: . ack 997 win 1448
12:08:39.958890 IP tc1.corp.com.3904 > tc2.corp.com.3903: P 997:1114(117) ack 2165 win 13756
12:08:39.958898 IP tc1.corp.com.3904 > tc2.corp.com.3903: P 1114:1232(118) ack 2165 win 13756
12:08:39.958903 IP tc1.corp.com.3904 > tc2.corp.com.3903: P 1232:1349(117) ack 2165 win 13756
12:08:39.958971 IP tc2.corp.com.3903 > tc1.corp.com.3904: . ack 1349 win 1448
12:08:39.959141 IP tc1.corp.com.3904 > tc2.corp.com.3903: P 1349:1466(117) ack 2165 win 13756
12:08:39.959149 IP tc1.corp.com.3904 > tc2.corp.com.3903: P 1466:1583(117) ack 2165 win 13756
12:08:39.959154 IP tc1.corp.com.3904 > tc2.corp.com.3903: P 1583:1700(117) ack 2165 win 13756
12:08:39.959222 IP tc2.corp.com.3903 > tc1.corp.com.3904: . ack 1700 win 1448
tc2 does not send any data of its own; it just blindly ACKs the data from tc1. Even after waiting half an hour, nothing changes. Why is the data not sent?
The cause turned out to be that we had set TCP_NODELAY on the socket. After removing this setting and restarting the program, TCP works normally in both directions once the network has been disconnected and restored. tcpdump confirms this:
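For readers unfamiliar with the option: TCP_NODELAY disables Nagle's algorithm and is toggled with setsockopt. The following is a minimal sketch with hypothetical helper names (`set_nodelay`, `get_nodelay`), not the project's actual code; the workaround described here amounts to never enabling the option, or switching it back off.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Enable (on != 0) or disable (on == 0) TCP_NODELAY on a TCP socket.
 * Returns 0 on success, -1 on error with errno set. */
int set_nodelay(int fd, int on)
{
    int flag = on ? 1 : 0;

    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &flag, sizeof(flag));
}

/* Returns 1 if TCP_NODELAY is set, 0 if it is not, -1 on error. */
int get_nodelay(int fd)
{
    int flag = 0;
    socklen_t len = sizeof(flag);

    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &flag, &len) != 0)
        return -1;
    return flag != 0;
}
```

The option is off by default, so a freshly created TCP socket uses Nagle's algorithm unless the program explicitly calls something like `set_nodelay(fd, 1)`.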
16:05:38.782427 IP tc2.corp.alimama.com.3903 > tc1.corp.alimama.com.3904: P 0:887(887) ack 1 win 26064
16:05:38.782619 IP tc1.corp.alimama.com.3904 > tc2.corp.alimama.com.3903: . ack 3783 win 25352
16:05:38.782634 IP tc2.corp.alimama.com.3903 > tc1.corp.alimama.com.3904: . 3783:5231(1448) ack 1 win 26064
16:05:38.782637 IP tc2.corp.alimama.com.3903 > tc1.corp.alimama.com.3904: . 5231:6679(1448) ack 1 win 26064
16:05:38.782890 IP tc1.corp.alimama.com.3904 > tc2.corp.alimama.com.3903: . ack 5231 win 25352
16:05:38.782896 IP tc2.corp.alimama.com.3903 > tc1.corp.alimama.com.3904: . 6679:8127(1448) ack 1 win 26064
16:05:38.782898 IP tc2.corp.alimama.com.3903 > tc1.corp.alimama.com.3904: . 8127:9575(1448) ack 1 win 26064
16:05:38.782901 IP tc1.corp.alimama.com.3904 > tc2.corp.alimama.com.3903: . ack 6679 win 25352
16:05:38.782904 IP tc2.corp.alimama.com.3903 > tc1.corp.alimama.com.3904: . 9575:11023(1448) ack 1 win 26064
16:05:38.783183 IP tc1.corp.alimama.com.3904 > tc2.corp.alimama.com.3903: . ack 8127 win 25352
16:05:38.783188 IP tc2.corp.alimama.com.3903 > tc1.corp.alimama.com.3904: . 11023:12471(1448) ack 1 win 26064
16:05:38.783191 IP tc1.corp.alimama.com.3904 > tc2.corp.alimama.com.3903: . ack 9575 win 25352
16:05:38.783193 IP tc2.corp.alimama.com.3903 > tc1.corp.alimama.com.3904: . 12471:13919(1448) ack 1 win 26064
16:05:38.783196 IP tc1.corp.alimama.com.3904 > tc2.corp.alimama.com.3903: . ack 11023 win 25352
16:05:38.783199 IP tc2.corp.alimama.com.3903 > tc1.corp.alimama.com.3904: . 13919:15367(1448) ack 1 win 26064
16:05:38.783201 IP tc2.corp.alimama.com.3903 > tc1.corp.alimama.com.3904: . 15367:16815(1448) ack 1 win 26064
16:05:38.783502 IP tc1.corp.alimama.com.3904 > tc2.corp.alimama.com.3903: . ack 12471 win 25352
16:05:38.783506 IP tc2.corp.alimama.com.3903 > tc1.corp.alimama.com.3904: . 16815:18263(1448) ack 1 win 26064
16:05:38.783509 IP tc1.corp.alimama.com.3904 > tc2.corp.alimama.com.3903: . ack 13919 win 25352
16:05:38.783512 IP tc2.corp.alimama.com.3903 > tc1.corp.alimama.com.3904: . 18263:19711(1448) ack 1 win 26064
16:05:38.783514 IP tc1.corp.alimama.com.3904 > tc2.corp.alimama.com.3903: . ack 15367 win 25352
16:05:38.783517 IP tc2.corp.alimama.com.3903 > tc1.corp.alimama.com.3904: . 19711:21159(1448) ack 1 win 26064
16:05:38.783519 IP tc1.corp.alimama.com.3904 > tc2.corp.alimama.com.3903: . ack 16815 win 25352
This time tc2 sent its own data stream and tc1 ACKed it. After a while tc1 started sending data as well, and in the end both directions worked normally.
Why can a socket with TCP_NODELAY set not recover after the network comes back?
Let's look at the implementation of the send system call (2.6.9 kernel), which leads to the tcp_sendmsg function:
[net/ipv4/tcp.c --> tcp_sendmsg]
        while (--iovlen >= 0) {
                int seglen = iov->iov_len;
                unsigned char __user *from = iov->iov_base;

                iov++;

                while (seglen > 0) {
                        int copy;

                        skb = sk->sk_write_queue.prev;

                        if (!sk->sk_send_head ||
                            (copy = mss_now - skb->len) <= 0) {