TCP knowledge notes

Source: Internet
Author: User

TCP ACK sending scenarios

TCP sends an ACK segment in the following scenarios:

1. A segment arrives and the delayed-ACK timer is started. The timer fires before a second segment arrives, and the ACK for that segment is sent by itself. This is the "delayed ACK".
2. A segment arrives and the delayed-ACK timer is started. Before the timer fires, a second segment arrives, and a single ACK acknowledges both segments.
3. A segment arrives and the delayed-ACK timer is started. Before the timer fires, the local end happens to have data to send to the peer, so the ACK rides along on that data segment. This is the "piggybacked ACK".
4. Whenever TCP receives out-of-order data beyond the expected sequence number, it immediately sends an ACK whose acknowledgment number is the expected sequence number (a duplicate ACK).
5. Window update, also called window opening: the receive window had closed because the receive buffer was full; the application then read all buffered data into the process, emptying the receive buffer, so TCP sends an ACK advertising the reopened window to tell the sender it may continue.
6. Under normal circumstances, the response to the peer's active (keepalive) probe is an ACK.

TCP RST sending scenarios

1. Connecting to a port on which nothing is listening (a non-existent port).
2. Sending data on a connection that has already been closed.
3. Sending data to a peer that has crashed (after the connection was established).
4. Calling close(sockfd) while unread data remains in the receive buffer: the data is discarded and an RST is sent to the peer. This behavior is related to the SO_LINGER option.
5. Host A restarts and then receives an active probe from B on the old connection; A replies with an RST to notify B.

In any state, a TCP socket returns to the initial CLOSED state as soon as it receives an RST segment. It is worth noting that an RST segment elicits no response from the other end; it is never acknowledged. The party receiving the RST simply terminates the connection. The program behavior is as follows: in the blocking model, the kernel cannot proactively notify the application layer of the error; only when the application calls an I/O system call such as read() or write() does the kernel report the error caused by the peer's RST. In the non-blocking model, select or epoll reports sockfd as readable, and when the application then reads it, read() fails with the RST-induced error (ECONNRESET).

Abnormal (abortive) close of TCP: the normal way to terminate a connection is to send a FIN, and all data queued in the send buffer is sent before the FIN, so normally no data is lost. Sometimes, however, an RST segment is sent instead of a FIN to close the connection; this is called an abortive close. By default a process closes a socket gracefully; to close it abortively, use the SO_LINGER option. An abortive close offers an application two things: (1) any data still waiting to be sent is discarded and the RST segment is sent immediately; (2) the receiver of the RST can tell that the other end performed an abortive close rather than a normal close. As noted above, the RST is never acknowledged, the receiving party simply terminates the connection, and the error is reported to the application in the same way as described for the blocking and non-blocking models. This option is used in the implementation of haproxy.
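A minimal sketch of requesting such an abortive close with SO_LINGER (setting l_onoff to 1 and l_linger to 0 makes close() send an RST and discard unsent data); the function name is illustrative, and sockfd is assumed to be an already connected TCP socket:

    #include <sys/socket.h>
    #include <unistd.h>

    /* Request an abortive close: close() will send RST instead of FIN
     * and discard any data still queued in the send buffer. */
    static int abortive_close(int sockfd)
    {
        struct linger lg;
        lg.l_onoff  = 1;   /* enable SO_LINGER                 */
        lg.l_linger = 0;   /* linger time 0 => RST on close()  */
        if (setsockopt(sockfd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg)) < 0)
            return -1;
        return close(sockfd);  /* sends RST, not FIN */
    }

On the receiving side, the peer's next read() fails with ECONNRESET, which is how it can distinguish this abortive close from a normal close (where read() returns 0 after the FIN).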
TCP option SO_KEEPALIVE (TCP keepalive)

The keepalive mechanism is specified by the TCP protocol and implemented at the TCP layer (not by application-layer business code) to detect the connectivity of the TCP connection between the local end and the peer host. It prevents a server from failing to notice that a client has gone bad and waiting forever on that TCP connection. The details of this probing behavior can be set with the following options, as shown in the code below:

    #include <netinet/tcp.h>   /* SOL_TCP, TCP_KEEPIDLE, TCP_KEEPINTVL, TCP_KEEPCNT */
    #include <sys/socket.h>    /* setsockopt, SOL_SOCKET, SO_KEEPALIVE              */

    int keepAlive    = 1;   /* non-zero value: enable the keepalive attribute         */
    int keepIdle     = 60;  /* if no data is exchanged for 60 s, the TCP layer probes */
    int keepInterval = 5;   /* interval between probe packets: 5 s                    */
    int keepCount    = 3;   /* number of probe attempts; if the first probe gets a    */
                            /* response, the remaining two are not sent               */

    setsockopt(sockfd, SOL_SOCKET, SO_KEEPALIVE,  (void *)&keepAlive,    sizeof(keepAlive));
    setsockopt(sockfd, SOL_TCP,    TCP_KEEPIDLE,  (void *)&keepIdle,     sizeof(keepIdle));
    setsockopt(sockfd, SOL_TCP,    TCP_KEEPINTVL, (void *)&keepInterval, sizeof(keepInterval));
    setsockopt(sockfd, SOL_TCP,    TCP_KEEPCNT,   (void *)&keepCount,    sizeof(keepCount));

After these options are set, if no data is exchanged in either direction on the connection for 60 seconds, the TCP layer automatically sends a keepalive probe to the peer. The probe is a TCP segment that the peer must respond to, and it leads to one of three situations:

1. The peer is alive and reachable: it responds with the expected ACK. Nothing is reported to the application, and after another 60 seconds TCP starts the next round of probing.
2. The peer has crashed and restarted: it responds with an RST. The socket's pending error is set to ECONNRESET.
3. The peer does not respond at all: for example, the client's network is down, or the client host has crashed outright. TCP retries the probe the specified number of times at the specified interval and then gives up. The socket's pending error is set to ETIMEDOUT.

The global defaults can be changed by adding the following to /etc/sysctl.conf:

    net.ipv4.tcp_keepalive_intvl = 5
    net.ipv4.tcp_keepalive_probes = 3
    net.ipv4.tcp_keepalive_time = 60

Program behavior: in the blocking model, when the TCP layer detects that the peer socket is no longer usable, the kernel cannot proactively notify the application layer of the error; the kernel can only report it when the application calls an I/O system call such as read() or write(). In the non-blocking model, select or epoll reports sockfd as readable, and when the application then reads it, read() returns the error.

Experience: in practice, when writing server programs, we rarely rely on this TCP-layer mechanism to detect dead clients. Instead, a set of request/response heartbeat messages is implemented at the application layer to provide the same function.
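As a minimal sketch (the function name and loop structure are illustrative, not from the original text), this is how a blocking application observes the keepalive failure described above; the pending error surfaces only once read() is actually called:

    #include <errno.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    /* Blocking read loop: a keepalive failure is delivered only when we
     * call read(), as the socket's pending error. */
    static void read_until_dead(int sockfd)
    {
        char buf[4096];
        for (;;) {
            ssize_t n = read(sockfd, buf, sizeof(buf));
            if (n > 0) {
                /* handle n bytes of application data ... */
            } else if (n == 0) {
                printf("peer closed normally (FIN received)\n");
                break;
            } else {
                /* ETIMEDOUT: keepalive probes went unanswered.
                 * ECONNRESET: the peer rebooted and answered a probe with RST. */
                printf("read error: %s\n", strerror(errno));
                break;
            }
        }
        close(sockfd);
    }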
TCP options SO_RCVBUF and SO_SNDBUF

First, a concept: every TCP socket has a send buffer and a receive buffer in the kernel. TCP's full-duplex operation and its sliding window both depend on these two independent buffers and on how full they are.

The receive buffer caches data in the kernel. If the application process has not called read(), the data stays cached in the receive buffer of the corresponding socket. To repeat: whether or not the process reads the socket, data sent by the peer is received by the kernel and cached in the socket's kernel receive buffer. What read() does is copy data from the kernel buffer into the application's user-space buffer. When the process calls send(), in the simplest case (which is also the usual case) the data is copied into the socket's kernel send buffer, and send() then returns to the caller. In other words, when send() returns, the data has not necessarily been delivered to the peer (similar to writing a file): send() merely copies the data from the application-layer buffer into the socket's kernel send buffer.

Every UDP socket has a receive buffer but no send buffer. Conceptually, data is sent as soon as there is data to send, regardless of whether the peer can receive it correctly, so no send-side buffering is required.

The receive buffer is used by both TCP and UDP to cache data arriving from the network, holding it until the application process reads it. For TCP: if the application process never reads and the buffer fills up, TCP's action is to notify the peer to close the window. This is the sliding-window implementation. It guarantees that the TCP receive buffer cannot overflow and therefore that TCP transmission is reliable, because the sender is not permitted to send data beyond the advertised window size. If the sender ignores the window size and sends more than the window allows, the receiving TCP discards the excess. This is TCP flow control. For UDP: when the receive buffer is full, newly arriving datagrams cannot enter the buffer and are discarded. UDP has no flow control; a fast sender can easily overwhelm a slow receiver, causing the receiver's UDP to drop datagrams. The above is the essence of TCP's reliability and UDP's unreliability.

The SO_RCVBUF and SO_SNDBUF options set the sizes of these two buffers for a TCP connection.
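A minimal sketch of setting and reading back the two buffer sizes (the function name is illustrative; on Linux the kernel doubles the value passed in to allow for bookkeeping overhead, and getsockopt() returns the doubled value):

    #include <stdio.h>
    #include <sys/socket.h>

    /* Set the kernel receive/send buffer sizes for a socket.  On Linux,
     * do this before connect()/listen() if you want the TCP window
     * scaling to be negotiated to match. */
    static int set_buffers(int sockfd, int rcv_bytes, int snd_bytes)
    {
        if (setsockopt(sockfd, SOL_SOCKET, SO_RCVBUF,
                       &rcv_bytes, sizeof(rcv_bytes)) < 0)
            return -1;
        if (setsockopt(sockfd, SOL_SOCKET, SO_SNDBUF,
                       &snd_bytes, sizeof(snd_bytes)) < 0)
            return -1;

        /* Read back the effective size; Linux reports roughly double
         * the requested value because of kernel bookkeeping overhead. */
        int val;
        socklen_t len = sizeof(val);
        if (getsockopt(sockfd, SOL_SOCKET, SO_RCVBUF, &val, &len) == 0)
            printf("effective SO_RCVBUF: %d bytes\n", val);
        return 0;
    }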
Half-close of TCP and CLOSE_WAIT

In simple terms, terminating a connection takes a four-segment handshake, and this is a consequence of TCP's half-close. Since a TCP connection is full duplex (data can be transmitted in both directions simultaneously; think of it as two independent channels in opposite directions), each direction must be closed separately. The principle is that when one side finishes its data-sending task, it can send a FIN to terminate the connection in that direction. When an end receives a FIN, the kernel makes read() return 0 to notify the application layer that the peer has terminated data transmission toward this end. Sending a FIN is usually the result of the application layer closing its socket; for example, a TCP client sending a FIN closes the data transfer from client to server.

What is the impact of half-close on the server? First look at the TCP state transition diagram. When the client closes, it sends a FIN packet and, after receiving the server's ACK, stays in the FIN_WAIT_2 state. When the server receives the FIN and sends its ACK, it stays in the CLOSE_WAIT state. This CLOSE_WAIT state is very annoying: it can last a long time, and if the server accumulates a large number of sockets in CLOSE_WAIT, it may exhaust server resources and become unable to provide service.

So how does a server end up with a large number of sockets stuck uncontrolled in CLOSE_WAIT? Let's trace it. The immediate reason is simply that the server never sends its own FIN back to the client. One reason for not sending the FIN may be a business requirement: it is not yet time to send it because the server still has data to send to the client, and once that data is sent the server will call close() and the FIN will go out. That is not the runaway CLOSE_WAIT situation described above; it is under control. So what is the real cause? Let's introduce the two system calls close(sockfd) and shutdown(sockfd, how) and analyze further.

Here we need to clarify a concept: when a process that has opened a socket forks a child process, the socket's descriptor is inherited. A socket is a system-level object, so the result is that the socket is open in two processes and its reference count becomes 2.

Back to closing a socket with the two system calls. When close(sockfd) is called, the kernel checks the reference count of the socket behind this fd. If the count is greater than 1, the kernel just decrements it and returns. If the count equals 1, the kernel actually closes the TCP connection by sending a FIN. When shutdown(sockfd, SHUT_RDWR) is called, the kernel does not check the reference count at all; it closes the TCP connection with a FIN directly.

Now the likely server-side problem should be clear. The parent process opens the socket and forks a child to handle the business, while the parent continues listening for network requests and never exits. When the client sends its FIN, read() in the child handling the business returns 0. The child sees that the peer has closed and calls close() on its end. But that merely decrements the socket's reference count by 1; the socket is not actually closed. As a result, the system gains one more CLOSE_WAIT socket...

How can this be avoided?

    shutdown(sockfd, SHUT_RDWR);
    close(sockfd);

This way the server's FIN is sent, the socket enters the LAST_ACK state, waits for the final ACK to arrive, and then enters the initial CLOSED state (see the sketch at the end of this section).

The shutdown() function: on Linux, the shutdown system call controls how a socket is closed:

    int shutdown(int sockfd, int how);

The how parameter selects one of the following shutdown behaviors:

SHUT_RD: close the read half of the connection. The socket no longer accepts data; any data currently in the socket receive buffer is discarded, and the process can no longer perform any read operation on the socket. Any data received on the TCP socket after this call is acknowledged and then silently discarded.
SHUT_WR: close the write half of the connection. Data currently in the send buffer is still sent, followed by the FIN, and the process can no longer write to the socket.
SHUT_RDWR: equivalent to calling shutdown twice, first with SHUT_RD and then with SHUT_WR.

Note: in a multi-process program, after shutdown(sfd, SHUT_RDWR) the other processes can no longer communicate on that socket, whereas close(sfd) in one process does not affect the other processes.
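A minimal sketch of this pattern in the child of a forking server (the function name and the surrounding accept/fork logic are illustrative assumptions): shutdown() forces the FIN out even though the parent still holds a reference to the socket:

    #include <sys/socket.h>
    #include <unistd.h>

    /* Child-side connection handler in a forking server.  A bare close()
     * here would only drop this process's reference (the parent may still
     * hold one), leaving the socket stuck in CLOSE_WAIT; shutdown() sends
     * the FIN regardless of the reference count. */
    static void handle_client(int connfd)
    {
        char buf[4096];
        ssize_t n;
        while ((n = read(connfd, buf, sizeof(buf))) > 0) {
            /* ... process the request and write the reply ... */
        }
        /* n == 0: the peer sent FIN and we are now in CLOSE_WAIT. */
        shutdown(connfd, SHUT_RDWR);  /* send our FIN: CLOSE_WAIT -> LAST_ACK */
        close(connfd);                /* then release this process's reference */
    }

In practice the parent should also close(connfd) immediately after fork(), which both avoids descriptor leaks and lets a plain close() in the child actually send the FIN; the shutdown() call is the defensive fix the text describes.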
