Why a TCP connection was unexpectedly reset

Source: Internet
Author: User
Tags: ack, connection, reset

During a server stress test today I ran into a very strange situation: connections to the server were established successfully but were reset shortly afterwards. It took me half a day to find the cause, so I am writing down the process and the result to share.

The server's normal logic is: accept the connection, wait for the client's registration message, then process other requests. If a connection shows no activity for a period of time, the server actively closes it.

The client's logic is: after connecting to the server, immediately send the registration message, then send a request periodically. Tens of thousands of clients connect to one server at the same time, and each reconnects immediately when a connection error occurs.

When the problem occurs, the client reports a series of errors like the following:

recv: Connection reset by peer
recv: Connection reset by peer
recv: Connection reset by peer
(the same line repeats many more times)

Further testing showed that this only happens when the number of clients exceeds a certain value. That suggested a link to the limit on the number of file descriptors a Linux process can open (a Linux socket is also a file descriptor); I had raised this limit a few days earlier. What exactly the link was, though, was not yet clear.

At first I thought the server program's own logic was closing the connection, but according to the packet capture the server never sends a TCP FIN. The following is a typical connection being established and then reset:
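As a rough illustration of the limit being discussed, the sketch below compares an expected client count against the process's soft descriptor limit. The function name `can_serve` and the `other_descriptors` allowance are made up for this example; only `resource.getrlimit` and `RLIMIT_NOFILE` come from the Python standard library.

```python
import resource


def can_serve(expected_clients, other_descriptors=16):
    """Return True if one descriptor per client fits under the soft limit.

    other_descriptors is an assumed allowance for log files, the
    listening socket and similar; adjust it for a real server.
    """
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return True
    return expected_clients + other_descriptors <= soft


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print("soft limit:", soft, "hard limit:", hard)
    print("room for 50000 clients:", can_serve(50000))
```

Every accepted connection costs one descriptor, so a check like this at startup would flag the situation before the stress test does.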

14:01:03.567888 IP 192.168.6.45.36692 > 192.168.6.46.8080: S 1231228012:1231228012(0) win 5792 <mss 1460,sackOK,timestamp 174530727 168899918,nop,wscale 8>
14:01:03.567969 IP 192.168.6.46.8080 > 192.168.6.45.36692: S 909133089:909133089(0) ack 1231228013 win 5792 <mss 1460,sackOK,timestamp 168900338 174530727,nop,wscale 8>
14:01:03.567978 IP 192.168.6.45.36692 > 192.168.6.46.8080: . ack 1 win <nop,nop,timestamp 174530727 168900338>
14:01:03.568022 IP 192.168.6.45.36692 > 192.168.6.46.8080: P 1:76(75) ack 1 win <nop,nop,timestamp 174530727 168900338>
14:01:03.568110 IP 192.168.6.46.8080 > 192.168.6.45.36692: . ack win <nop,nop,timestamp 168900338 174530727>
14:01:03.568769 IP 192.168.6.46.8080 > 192.168.6.45.36692: R 1:1(0) ack win <nop,nop,timestamp 168900338 174530727>

The server log contains no record of actively closing the connection; it does not even show that the new connection was accepted. This means the connection was closed by the underlying protocol stack. But why would the protocol stack close it on its own?

Connecting to the server with telnet does not work normally either, but that connection is closed normally (with a normal FIN sequence) instead of being reset.

I also suspected a link to the length of the listening socket's queue of completed connections. However, when that queue is full, the protocol stack does nothing and simply lets the client time out and send the SYN again, which is inconsistent with what was observed.
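For reference, the queue length mentioned here is the backlog passed to listen(). On Linux that value is silently capped at net.core.somaxconn. The sketch below shows the capping; the helper names and the fallback default of 128 are assumptions for this example.

```python
def effective_backlog(requested, somaxconn):
    """Backlog the kernel actually uses: the request capped at somaxconn."""
    return min(requested, somaxconn)


def read_somaxconn(path="/proc/sys/net/core/somaxconn", default=128):
    """Read the system-wide cap; fall back to an assumed default if unreadable."""
    try:
        with open(path) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return default


if __name__ == "__main__":
    cap = read_somaxconn()
    print("listen(1024) is treated as listen(%d)" % effective_backlog(1024, cap))
```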

So the limit on open file descriptors does seem to be involved. What happens when no more file descriptors can be opened? And why does the telnet connection behave differently? Answering these questions means looking past the symptoms; my analysis is as follows:

First, the connection really is established, which means the protocol stack accepted it. The application certainly did not accept it, because that would have pushed the number of open file descriptors over the maximum. The protocol stack does eventually close the connection, but not immediately; presumably (I have not verified this) it closes it when the application calls accept, and that accept call fails with a "Too many open files" error. This also indicates that the socket file descriptor is only created at accept time, and that a connection established inside the protocol stack does not yet have a corresponding socket descriptor.
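The situation described above can be reproduced in a single process, as a minimal sketch assuming Linux and a loopback connection. It temporarily lowers the soft descriptor limit (the value 256 is arbitrary), connects a client, uses up every free descriptor, and then calls accept. The function name is made up for this example. It shows only that the handshake and the send succeed while accept fails with EMFILE; whether the kernel then resets the pending connection depends on the kernel version and is not checked here.

```python
import errno
import os
import resource
import socket


def accept_without_free_descriptors():
    """Return (bytes_sent, accept_errno) for a connection made with no free descriptor."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # Arbitrary small limit so the descriptors run out quickly.
    low = 256 if soft == resource.RLIM_INFINITY or soft > 256 else soft
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    spare = []
    try:
        listener.bind(("127.0.0.1", 0))
        listener.listen(8)
        listener.settimeout(2.0)
        client.settimeout(2.0)
        # The kernel completes the handshake; accept() has not been called.
        client.connect(listener.getsockname())
        bytes_sent = client.send(b"register")
        resource.setrlimit(resource.RLIMIT_NOFILE, (low, hard))
        # Use up every remaining descriptor slot.
        while True:
            try:
                spare.append(os.dup(listener.fileno()))
            except OSError as exc:
                if exc.errno != errno.EMFILE:
                    raise
                break
        try:
            conn, _addr = listener.accept()
            conn.close()
            accept_errno = None
        except OSError as exc:
            accept_errno = exc.errno
        return bytes_sent, accept_errno
    finally:
        # Release the descriptors and restore the original limit.
        for fd in spare:
            os.close(fd)
        client.close()
        listener.close()
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))


if __name__ == "__main__":
    sent, err = accept_without_free_descriptors()
    print("bytes sent before accept:", sent)
    print("accept failed with:", errno.errorcode.get(err, err))
```

A server that hits this condition usually wants to log the EMFILE error explicitly, since otherwise nothing in the application log shows that connections are arriving at all.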

The telnet connection differs in that it sends no data to the server, and it is closed in the normal way. Whether this is required by the protocol or is a special case of the Linux implementation has not been verified.
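One possible explanation, which the original analysis does not confirm, is that Linux answers with a reset when a socket is closed while unread data is still in its receive buffer, and with a normal FIN otherwise. The sketch below checks that behaviour on loopback; the function name and the "fin"/"reset" labels are made up for this example.

```python
import select
import socket


def close_result(send_data):
    """Report how the client sees a server-side close: 'fin' or 'reset'."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    listener.settimeout(2.0)
    client = socket.create_connection(listener.getsockname(), timeout=2.0)
    try:
        conn, _addr = listener.accept()
        if send_data:
            client.sendall(b"register")
            # Wait until the data sits in the server's receive buffer.
            select.select([conn], [], [], 2.0)
        conn.close()  # the server closes without reading anything
        try:
            return "fin" if client.recv(16) == b"" else "data"
        except ConnectionResetError:
            return "reset"
    finally:
        client.close()
        listener.close()


if __name__ == "__main__":
    print("no data sent (like telnet):", close_result(False))
    print("data sent (like the test client):", close_result(True))
```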

One more conclusion: data that arrives before accept is still received and acknowledged, and the data on the connection is buffered in the protocol stack. This matches my previous understanding.
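This last point is easy to check on loopback. In the sketch below, the client connects and sends before the server has called accept, and the server still reads the data afterwards. The function name and payload are made up for this example.

```python
import socket


def send_before_accept(payload=b"register"):
    """Send on a connection the server has not accepted yet, then read it back."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    listener.settimeout(2.0)
    client = socket.create_connection(listener.getsockname(), timeout=2.0)
    try:
        # accept() has not been called; the kernel queues the data.
        client.sendall(payload)
        conn, _addr = listener.accept()
        try:
            conn.settimeout(2.0)
            return conn.recv(len(payload))
        finally:
            conn.close()
    finally:
        client.close()
        listener.close()


if __name__ == "__main__":
    print(send_before_accept())
```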
