Use a keepalive timer in Linux

Last Update:2018-12-03 Source: Internet

Author: User

1. What is a keepalive timer? [1]

No data flows across an idle TCP connection. Many newcomers to TCP/IP are surprised by this. If neither process at the ends of a TCP connection sends data to the other, nothing is exchanged between the two TCP modules. Other network protocols may poll, but TCP does not. This means we can start a client process, establish a TCP connection with a server, and walk away for hours, days, weeks, or months, and the connection remains up. Intermediate routers can crash and reboot, and phone lines can go down and come back up, but as long as neither host at the ends of the connection reboots, the connection remains established.

This assumes that neither application, client or server, has an application-level timer that detects inactivity and causes the application to terminate. Sometimes, however, a server wants to know whether the client's host has crashed and is down, or has crashed and rebooted. Many implementations provide a keepalive timer for this purpose.

The keepalive timer is a controversial feature. Many people feel that, even if it is needed, this kind of polling of the other end belongs in the application and not in TCP. Furthermore, if an intermediate network between the two end systems fails temporarily, the keepalive option can cause a perfectly good connection between two processes to be terminated. For example, if the keepalive probes are sent while an intermediate router has crashed and is rebooting, TCP concludes that the client's host has crashed, which is not the case.

Keepalive is not part of the TCP specification. The Host Requirements RFC lists three reasons for not using it: (1) it can cause a perfectly good connection to be dropped during a transient failure, (2) it consumes unnecessary bandwidth, and (3) it costs money on an internet that charges by the packet. Nevertheless, many implementations provide a keepalive timer.

Some server applications tie up resources on behalf of a client and need to know whether the client's host has crashed. The keepalive timer provides this probing service for them. Many versions of the Telnet and rlogin servers enable the keepalive option by default.

A common example of the keepalive timer at work is a PC user who logs in to a host with Telnet over TCP/IP. If the user simply switches off the power at the end of the day without logging off, a half-open connection is left behind. Figure 18.16 in [1] shows how sending data across a half-open connection returns a reset, but that example involved the client end, where the client was the one sending data. If the client disappears, leaving the half-open connection on the server's end, and the server is waiting for data from the client, the server waits forever. The keepalive feature is intended to detect these half-open connections from the server side.

2. How does keepalive work? [1]

In this description, we call the end that enables the keepalive option the server and the other end the client. Nothing prevents a client from setting the option, but it is normally set by servers. Both ends can set it if each needs to know whether the other has disappeared (NFS is an example).

If there is no activity on a given connection for two hours, the server sends a probe segment to the client. (The examples in [1] show what the probe segment looks like.) The client host must be in one of the following four states:

1) The client host is still up and reachable from the server. The client's TCP responds normally, and the server knows that the other end is still up. The server's TCP resets the keepalive timer for two hours in the future. If application traffic crosses the connection before those two hours expire, the timer is reset for two hours after that exchange.
2) The client host has crashed and is either down or in the process of rebooting. In either case its TCP does not respond. The server receives no response to its probe and times out after 75 seconds. The server sends a total of 10 such probes, 75 seconds apart. If it receives no response at all, it considers the client host down and terminates the connection.
3) The client host has crashed and rebooted. The server receives a response to its keepalive probe, but the response is a reset, which causes the server to terminate the connection.
4) The client host is up and running but unreachable from the server. This is the same as state 2, because TCP cannot distinguish between the two. All it can tell is that it received no reply to its probes.

The server does not have to worry about the client host being shut down and then rebooted (this refers to an orderly shutdown by an operator, not a host crash). When the operator shuts the system down, all application processes (that is, the client process) are terminated, and the client's TCP sends a FIN on the connection. On receiving the FIN, the server's TCP reports an end-of-file to the server process, which lets the server detect this state.

In state 1, the server application has no idea that a keepalive probe took place. Everything is handled by the TCP layer, and the probing is transparent to the application unless state 2, 3, or 4 occurs. In those three states, the server's TCP returns an error to the server application. (Normally the server has issued a read on the network and is waiting for data from the client. If the keepalive feature returns an error, it is delivered to the server as the return value of that read.) In state 2 the error is something like "connection timed out". In state 3 it is "connection reset by peer". State 4 may look like a connection timeout, or another error may be returned depending on whether an ICMP error related to the connection was received.
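The mapping from error to state described above can be sketched in C. This is only an illustration: the function name keepalive_state_for_errno is hypothetical and not part of any library, and the errno values chosen for each state (ETIMEDOUT, ECONNRESET, EHOSTUNREACH, ENETUNREACH) are the usual Linux ones, taken here as an assumption and not as something the source spells out.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical helper: given the errno left by a failed read()/recv() on a
 * socket with keepalive enabled, return a description of the state it most
 * likely corresponds to. Returns NULL for errors not discussed here. */
const char *keepalive_state_for_errno(int err)
{
    switch (err) {
    case ETIMEDOUT:     /* state 2, or state 4 when no ICMP error arrived */
        return "no response to probes: peer is down or unreachable";
    case ECONNRESET:    /* state 3 */
        return "peer crashed and rebooted: connection reset by peer";
    case EHOSTUNREACH:  /* state 4 when an ICMP error was received */
    case ENETUNREACH:
        return "peer unreachable: an ICMP error was reported";
    default:
        return NULL;
    }
}
```

A server would call this after read() or recv() returns -1, passing errno, and treat a non-NULL result as a reason to close the connection.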

3. How to use keepalive in Linux? [2]

Linux has built-in support for keepalive. You need to enable TCP/IP networking in order to use it. You also need procfs support and sysctl support to be able to configure the kernel parameters at runtime. The procedures involving keepalive use three user-driven variables:

tcp_keepalive_time: the interval between the last data packet sent (simple ACKs are not considered data) and the first keepalive probe; after the connection is marked as needing keepalive, this counter is not used any further
tcp_keepalive_intvl: the interval between subsequent keepalive probes, regardless of what the connection has exchanged in the meantime
tcp_keepalive_probes: the number of unacknowledged probes to send before considering the connection dead and notifying the application layer

Remember that keepalive support, even if configured in the kernel, is not the default behavior in Linux. Programs must request keepalive control for their sockets using the setsockopt interface. There are relatively few programs implementing keepalive, but you can easily add keepalive support to most of them by following the instructions later in this article.

As stated above, the Linux kernel supports keepalive through three parameters: tcp_keepalive_time (the idle time before keepalive probing starts), tcp_keepalive_intvl (the interval between keepalive probes), and tcp_keepalive_probes (the number of probes sent without a response before the connection is dropped). How are these three parameters configured? There are two ways to configure keepalive parameters inside the kernel via userspace commands:

procfs interface
sysctl interface

We mainly discuss how this is accomplished on the procfs interface because it is the most used, the recommended one, and the easiest to understand. The sysctl interface, particularly regarding the sysctl(2) syscall and not the sysctl(8) tool, is only here for the purpose of background knowledge.

The procfs interface

This interface requires both sysctl and procfs to be built into the kernel, and procfs mounted somewhere in the filesystem (usually on /proc, as in the examples below). You can read the values for the actual parameters by "catting" files in the /proc/sys/net/ipv4/ directory:

  # cat /proc/sys/net/ipv4/tcp_keepalive_time
  7200

  # cat /proc/sys/net/ipv4/tcp_keepalive_intvl
  75

  # cat /proc/sys/net/ipv4/tcp_keepalive_probes
  9

The first two parameters are expressed in seconds, and the last is a plain count. This means that the keepalive routines wait for two hours (7200 seconds) before sending the first keepalive probe, and then resend it every 75 seconds. If no ACK response is received for nine consecutive probes, the connection is marked as broken.

Modifying these values is straightforward: you write new values into the files. Suppose you decide to configure the host so that keepalive starts after ten minutes of channel inactivity and then sends probes at intervals of one minute. Because of the high instability of the network trunk and the short interval, suppose you also want to increase the number of probes to 20. Here is how we would change the settings:

  # echo 600 > /proc/sys/net/ipv4/tcp_keepalive_time

  # echo 60 > /proc/sys/net/ipv4/tcp_keepalive_intvl

  # echo 20 > /proc/sys/net/ipv4/tcp_keepalive_probes

To be sure that everything succeeded, recheck the files and confirm that the new values appear in place of the old ones. This configures the three parameters. For how to make these settings persist across reboots, see [2].
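To see what a given set of values means in practice, the arithmetic can be sketched in C. Both function names below are hypothetical. The formula assumes the kernel sends the first probe after the idle time, sends one probe per interval after that, and declares the connection dead one interval after the last unanswered probe. The reader helper assumes procfs is mounted on /proc.

```c
#include <stdio.h>

/* Hypothetical helper: seconds from the last data exchange until the
 * connection is declared dead, assuming no probe is ever answered. */
long keepalive_detection_seconds(long idle, long intvl, long probes)
{
    return idle + intvl * probes;
}

/* Hypothetical helper: reads one system-wide keepalive parameter, for
 * example "tcp_keepalive_time", from procfs. Returns -1 on failure. */
long read_keepalive_param(const char *name)
{
    char path[128];
    long value = -1;

    snprintf(path, sizeof path, "/proc/sys/net/ipv4/%s", name);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}
```

With the default values (7200, 75, 9) the result is 7200 + 75 * 9 = 7875 seconds, or 2 hours, 11 minutes, and 15 seconds. With the modified values (600, 60, 20) it is 600 + 60 * 20 = 1800 seconds, or 30 minutes.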

4. How to use keepalive in a program? [2]-[4]

All you need to do to enable keepalive for a specific socket is to set the specific socket option on the socket itself. The prototype of the function is as follows:

  int setsockopt(int s, int level, int optname,
                 const void *optval, socklen_t optlen)

The first parameter is the socket, previously created with socket(2); the second one must be SOL_SOCKET, and the third must be SO_KEEPALIVE. The fourth parameter must be a boolean integer value, indicating that we want to enable the option, while the last is the size of the value passed before. According to the manpage, 0 is returned upon success, and -1 is returned on error (and errno is properly set).

There are also three other socket options you can set for keepalive when you write your application. They all use the SOL_TCP level instead of SOL_SOCKET, and they override the system-wide variables only for the current socket. If you read them without writing first, the current system-wide parameters are returned.

TCP_KEEPCNT: overrides tcp_keepalive_probes

TCP_KEEPIDLE: overrides tcp_keepalive_time

TCP_KEEPINTVL: overrides tcp_keepalive_intvl

As this shows, keepalive is an on/off option that is enabled through a function call. Specifically, you can use the following code, where rs is the socket:

  int keepalive = 1;  // enable the keepalive attribute
  setsockopt(rs, SOL_SOCKET, SO_KEEPALIVE, (void *)&keepalive, sizeof(keepalive));

As mentioned in the excerpt above, the second parameter can be set to SOL_TCP instead in order to set the three keepalive parameters (for details, see [3]); this requires the header file <netinet/tcp.h>. Of course, in an actual program you can also configure the keepalive parameters at the system level instead. For the other parameters of setsockopt, see [4].
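Putting the switch and the three per-socket overrides together, a minimal sketch for Linux might look like the following. The function name enable_keepalive and its parameter names are hypothetical. The sketch uses IPPROTO_TCP as the level, on the assumption that it has the same value as SOL_TCP on Linux and is defined under stricter compiler settings.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: enables keepalive on TCP socket fd and overrides the
 * three system-wide parameters for this socket only. Returns 0 on success,
 * or -1 on error with errno set by setsockopt(). */
int enable_keepalive(int fd, int idle_secs, int intvl_secs, int probe_count)
{
    int on = 1;

    /* Switch keepalive on for this socket. */
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) == -1)
        return -1;
    /* Overrides tcp_keepalive_time for this socket. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_secs, sizeof idle_secs) == -1)
        return -1;
    /* Overrides tcp_keepalive_intvl for this socket. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl_secs, sizeof intvl_secs) == -1)
        return -1;
    /* Overrides tcp_keepalive_probes for this socket. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &probe_count, sizeof probe_count) == -1)
        return -1;
    return 0;
}
```

For example, enable_keepalive(fd, 600, 60, 20) would apply the ten-minute, one-minute, 20-probe settings to a single connection without touching the system-wide values.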

5. How to determine whether a TCP connection is disconnected? [3]

When TCP detects that the peer socket is no longer available (a probe cannot be sent, or a probe receives no ACK in response), select() reports the socket as readable, recv() returns -1, and errno is set to ETIMEDOUT.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
