The keepalive mechanism can be used to detect network exceptions

Last Update: 2018-12-05 Source: Internet

Author: User


1. What is a keepalive timer? [1]

No data flows across an idle TCP connection. Many newcomers to TCP/IP are surprised by this. That is, if neither process at the ends of a TCP connection sends data to the other, nothing is exchanged between the two TCP modules. Other networking protocols may poll, but TCP does not. This means we can start a client process that establishes a TCP connection with a server, walk away for hours, days, weeks, or months, and the connection remains up. Intermediate routers can crash and reboot, and phone lines can go down and come back up, but as long as neither host at the ends of the connection reboots, the connection remains established.

This assumes that neither application, client or server, has an application-level timer that detects inactivity and causes the application to terminate. Sometimes, however, a server wants to know whether the client's host has crashed and is down, or has crashed and rebooted. Many implementations provide a keepalive timer for this purpose.

The keepalive timer is a controversial feature. Many people feel that this kind of polling of the other end, if it is needed at all, belongs in the application and not in TCP. In addition, if an intermediate network between the two end systems has a temporary outage, the keepalive option can terminate a perfectly good connection between two processes. For example, if the keepalive probes are sent while an intermediate router has crashed and is rebooting, TCP will conclude that the client's host has crashed, which is not what happened.

Keepalives are not part of the TCP specification. The Host Requirements RFC gives three reasons not to use them: (1) they can cause perfectly good connections to be dropped during transient failures, (2) they consume unnecessary bandwidth, and (3) they cost money on an internet that charges by the packet. Nevertheless, many implementations provide the keepalive timer.

Some server applications tie up resources on behalf of a client and need to know whether the client's host has crashed. The keepalive timer provides this probing service for them. Many versions of the Telnet server and the rlogin server enable the keepalive option by default.

A common example of the need for the keepalive timer is a personal computer user who logs in to a host over TCP/IP using Telnet. If the user simply powers off the computer at the end of the day without logging off, a half-open connection is left behind. Figure 18.16 showed how sending data across a half-open connection causes a reset to be returned, but that was from the client end, where the client was the one sending data. If the client disappears and leaves the half-open connection on the server's end, and the server is waiting for data from the client, the server will wait forever. The keepalive feature is intended to detect these half-open connections from the server side.

2. How does keepalive work? [1]

In this description, we call the end that enables the keepalive option the server and the other end the client. Nothing prevents a client from setting this option, but it is normally set on servers. If both ends need to know whether the other has disappeared, both can enable it (NFS over TCP is one example).
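As a rough illustration of how one end enables the option, the sketch below turns on SO_KEEPALIVE for a single socket with Python's standard socket module. The helper name enable_keepalive and its parameters are made up for this example. The optional overrides rely on TCP_KEEPIDLE, TCP_KEEPINTVL, and TCP_KEEPCNT, which exist on Linux but not on every platform, so each one is applied only when the constant is present.

```python
import socket


def enable_keepalive(sock, idle_s=None, interval_s=None, probes=None):
    """Enable keepalive probing on one TCP socket (hypothetical helper).

    idle_s, interval_s and probes are optional per-socket overrides.
    The TCP_* constants are platform-specific (assumed Linux here),
    so an override is silently skipped where its constant is missing.
    """
    # Portable part: ask the TCP layer to probe this connection when idle.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    overrides = (
        ("TCP_KEEPIDLE", idle_s),       # idle seconds before the first probe
        ("TCP_KEEPINTVL", interval_s),  # seconds between unanswered probes
        ("TCP_KEEPCNT", probes),        # unanswered probes before giving up
    )
    for name, value in overrides:
        if value is not None and hasattr(socket, name):
            sock.setsockopt(socket.IPPROTO_TCP, getattr(socket, name), value)

    # Report whether the option is now set on the socket.
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        print("keepalive enabled:", enable_keepalive(s))
```

A server would typically call such a helper on each socket returned by accept(), since the option applies to one connection at a time.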

If there is no activity on a given connection for two hours, the server sends a probe segment to the client. (We will see what the probe segment looks like in the examples that follow.) The client host must be in one of the following four states:

1) The client host is still up and running and reachable from the server. The client's TCP responds normally, so the server knows the other end is still up. The server's TCP resets the keepalive timer for two hours in the future. If application traffic crosses the connection before those two hours expire, the timer is reset to two hours after that exchange of data.

2) The client's host has crashed and is either down or in the process of rebooting. In either case, its TCP does not respond. The server receives no response to its probe and times out after 75 seconds. The server sends a total of 10 probes, 75 seconds apart, and if none is answered, it considers the client's host down and terminates the connection.

3) The client's host has crashed and rebooted. In this case, the server receives a response to its keepalive probe, but the response is a reset, which causes the server to terminate the connection.

4) The client's host is up and running but unreachable from the server. This looks the same as state 2, because TCP cannot distinguish between the two. All it can tell is that no reply to its probes has been received.
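The timing in the four states above can be sketched as a small model. This is only an illustration using the figures quoted in this section (two hours idle, 10 probes, 75 seconds per probe). It ignores round-trip time, and the function and constant names are made up for the example.

```python
IDLE_SECONDS = 2 * 60 * 60  # idle time before the first probe is sent
PROBE_INTERVAL = 75         # seconds the server waits after each probe
MAX_PROBES = 10             # probes sent before the server gives up


def keepalive_outcome(replies):
    """Model what the server concludes from the replies to its probes.

    replies holds one entry per probe: "ack" (normal response),
    "rst" (reset) or None (no answer). Probes beyond the end of the
    list are treated as unanswered.
    Returns (outcome, seconds since the last activity on the connection).
    """
    elapsed = IDLE_SECONDS
    for probe in range(MAX_PROBES):
        reply = replies[probe] if probe < len(replies) else None
        if reply == "ack":
            return "alive", elapsed        # state 1: timer is reset
        if reply == "rst":
            return "reset", elapsed        # state 3: host has rebooted
        elapsed += PROBE_INTERVAL          # no answer: wait, then probe again
    return "timed out", elapsed            # state 2 or 4: connection dropped


if __name__ == "__main__":
    for replies in (["ack"], [None, None, "rst"], []):
        print(replies, keepalive_outcome(replies))
```

With these figures, a client that never answers is declared down 7,950 seconds (2 hours, 12 minutes, and 30 seconds) after the last activity on the connection.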

The server does not have to worry about the client's host being shut down and then rebooted (this refers to an operator shutdown, not a crash). When the operator shuts the system down, all application processes, including the client process, are terminated, and the client's TCP sends a FIN on the connection. On receiving the FIN, the server's TCP reports an end-of-file to the server process, which lets the server detect this situation.

In the first state, the server application has no idea that a keepalive probe took place. Everything is handled at the TCP layer, and the probing is transparent to the application unless state 2, 3, or 4 occurs. In those three states, the server's TCP returns an error to the server application. (Normally the server has issued a read from the network and is waiting for data from the client. If the keepalive feature returns an error, it is delivered to the server as the return value of that read.) In state 2 the error is something like "connection timed out." In state 3 it is "connection reset by peer." In state 4 the error may look like a connection timeout, or a different error may be returned, depending on whether an ICMP error related to the connection was received.
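To make the error reporting concrete, the following sketch maps the errno carried by an OSError from a blocked read to the state it most likely indicates. The function name classify_read_error is made up for this example, and the mapping is an assumption based on the description above. The same errno values can also arise for reasons unrelated to keepalive, such as a retransmission timeout while sending data.

```python
import errno


def classify_read_error(exc):
    """Guess which keepalive state an OSError from recv() points to."""
    if exc.errno == errno.ETIMEDOUT:
        # "Connection timed out": no probe was answered.
        return "state 2 or 4: no response to the probes"
    if exc.errno == errno.ECONNRESET:
        # "Connection reset by peer": the probe was answered with a reset.
        return "state 3: client crashed and rebooted"
    if exc.errno in (errno.EHOSTUNREACH, errno.ENETUNREACH):
        # An ICMP error for this connection was received.
        return "state 4: client unreachable from the server"
    return "unrelated error"


if __name__ == "__main__":
    # Simulated errors; a real server would catch OSError around sock.recv().
    for code in (errno.ETIMEDOUT, errno.ECONNRESET, errno.EHOSTUNREACH):
        print(classify_read_error(OSError(code, "simulated")))
```

In a real server, the call to recv() would sit inside a try block, and the except OSError handler would pass the exception to this function before closing the socket.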

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
