TCP Heartbeat | TCP Keepalive (Repost)

Source: Internet
Author: User

The application layer enables the keepalive mechanism on each socket with the following call. The keepalive parameters themselves are taken from the system-wide configuration.

setsockopt(rs, SOL_SOCKET, SO_KEEPALIVE, (void *)&keepalive, sizeof(keepalive));

Note: a keepalive probe is a TCP protocol segment, not an application-layer packet, so the application cannot read it with functions such as recv. It can be observed with a packet capture tool.
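As a minimal sketch of the call above on a POSIX system, the helpers below turn the option on and read it back. The function names enable_keepalive and keepalive_enabled are hypothetical and chosen only for illustration; the socket is assumed to be a TCP socket that the caller has already created.

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: turn on SO_KEEPALIVE for one socket.
 * Returns 0 on success, -1 on error (errno is set by setsockopt). */
int enable_keepalive(int fd)
{
    int on = 1;
    return setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on));
}

/* Hypothetical helper: read the option back.
 * Returns 1 if keepalive is enabled, 0 if disabled, -1 on error. */
int keepalive_enabled(int fd)
{
    int value = 0;
    socklen_t len = sizeof(value);

    if (getsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &value, &len) != 0)
        return -1;
    return value != 0;
}
```

Reading the option back with getsockopt is a cheap way to confirm that the setting took effect, since the probes themselves never reach the application.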

==================================================================

1. What is the keepalive timer? [1]

No data flows across an idle TCP connection, which surprises many newcomers to TCP/IP. That is, if neither process at the ends of a TCP connection sends data to the other, nothing is exchanged between the two TCP modules. Other network protocols may poll the peer, but TCP does not. This means we can start a client process, establish a TCP connection with a server, and walk away for hours, days, weeks, or months, and the connection will still be up. Intermediate routers may crash and reboot, and phone lines may go down and come back up, but as long as neither host at the ends of the connection reboots, the connection remains established.

This assumes that neither the client nor the server application has an application-level timer that detects inactivity on the connection and terminates the application. Sometimes, however, a server needs to know whether the client host has crashed and is down, or has crashed and rebooted. Many implementations provide a keepalive timer for this purpose.

The keepalive timer is a controversial feature. Many people think that this polling, if it is needed at all, should be done by the application rather than by TCP. In addition, a temporary loss of connectivity in one of the intermediate networks between the two end systems can cause the keepalive option to terminate an otherwise good connection between two processes. For example, if keepalive probes are sent just while an intermediate router is crashing and rebooting, TCP will conclude that the client host has crashed, which is not the case.

Keepalive is not part of the TCP specification. The Host Requirements RFC lists three reasons not to use it: (1) it can cause a perfectly good connection to be dropped during a transient failure, (2) it consumes unnecessary bandwidth, and (3) it costs money on an internet that charges by the packet. Nevertheless, many implementations provide a keepalive timer.

Some server applications tie up resources on behalf of a client and need to know whether the client host has crashed. The keepalive timer provides this detection for them. Many versions of the Telnet server and the Rlogin server enable the keepalive option by default.

A common example of the need for keepalive is a personal computer user who logs in to a host over TCP/IP using Telnet. If the user simply powers off the computer when finished, without logging off, a half-open connection is left behind. Figure 18.16 showed that sending data across a half-open connection causes a reset to be returned, but in that case it was the client that sent the data. If the client disappears, leaving the half-open connection on the server, and the server is waiting for data from the client, the server will wait forever. The keepalive feature is intended to detect such half-open connections from the server side.

2. How does keepalive work? [1]

In this description, the end that enables the keepalive option is called the server and the other end the client. Nothing prevents a client from setting the option, but it is normally set by servers. If each end needs to know whether the other has disappeared, the option can be set at both ends (as NFS does).

If there is no activity on a given connection for two hours, the server sends a probe segment to the client. (We'll see what the probe segment looks like in the following example.) The client host must be in one of the following four states:

1) The client host is still up and reachable from the server. The client's TCP responds normally, so the server knows that the other end is still up. The server's TCP resets the keepalive timer to fire two hours later. If application traffic crosses the connection before that two-hour timer expires, the timer is reset to two hours again after the data exchange.

2) The client host has crashed and is either down or in the process of rebooting. In either case, its TCP does not respond. The server receives no response to its probe and times out after 75 seconds. The server sends a total of 10 such probes, 75 seconds apart. If none of them is answered, the server considers the client host down and terminates the connection.

3) The client host has crashed and rebooted. In this case, the server receives a response to its keepalive probe, but the response is a reset, which causes the server to terminate the connection.

4) The client host is up and running but unreachable from the server. This is the same as state 2, because TCP cannot distinguish between the two. All it can tell is that no replies to its probes are received.
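The timing in states 2 and 4 can be worked out with simple arithmetic. The sketch below is only an estimate under the assumption that the connection is dropped one probe interval after the last unanswered probe; the function name keepalive_detection_seconds is hypothetical.

```c
/* Hypothetical helper: estimate how long a dead or unreachable peer can go
 * unnoticed. The first probe is sent after idle_s seconds without traffic,
 * and each of the probe_count probes is given probe_interval_s seconds to
 * be answered before the connection is terminated. */
long keepalive_detection_seconds(long idle_s, long probe_interval_s,
                                 long probe_count)
{
    return idle_s + probe_interval_s * probe_count;
}
```

With the values quoted above, keepalive_detection_seconds(7200, 75, 10) gives 7950 seconds, that is, 2 hours, 12 minutes, and 30 seconds after the last activity on the connection.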

The server does not have to worry about the client host being shut down and then rebooted (this refers to an orderly shutdown by an operator, not a crash). When the system is shut down by an operator, all application processes (including the client process) are terminated, which causes the client's TCP to send a FIN on the connection. On receiving the FIN, the server's TCP reports end-of-file to the server process, which allows the server to detect this situation.

In state 1, the server application has no idea that a keepalive probe has taken place. Everything is handled at the TCP layer, and the probing is transparent to the application until state 2, 3, or 4 occurs. In those three states, the server's TCP returns an error to the server application. (Normally the server has issued a read from the network and is waiting for data from the client. If the keepalive feature reports an error, it is delivered as the return value of that read.) In state 2 the error is something like "Connection timed out". In state 3 it is "Connection reset by peer". State 4 may look like a connection timeout, or may produce a different error, depending on whether an ICMP error related to the connection was received.
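On a POSIX system these outcomes surface as the return value of read together with errno. The sketch below shows one way a server might tell them apart. The enum and the function classify_read are hypothetical names, and the mapping of state 4 to EHOSTUNREACH or ENETUNREACH assumes that an ICMP error was received; without one, state 4 is indistinguishable from a timeout.

```c
#include <errno.h>
#include <sys/types.h>

/* Hypothetical classification of what a blocked read() reported. */
enum read_outcome {
    READ_DATA,          /* n > 0: application data arrived              */
    READ_PEER_CLOSED,   /* n == 0: peer sent FIN, end-of-file           */
    READ_TIMED_OUT,     /* state 2 (or 4): probes went unanswered       */
    READ_RESET,         /* state 3: peer rebooted and answered with RST */
    READ_UNREACHABLE,   /* state 4: an ICMP error was received          */
    READ_OTHER_ERROR    /* anything else, e.g. EINTR                    */
};

/* n is the value returned by read(); err is errno captured right after. */
enum read_outcome classify_read(ssize_t n, int err)
{
    if (n > 0)
        return READ_DATA;
    if (n == 0)
        return READ_PEER_CLOSED;

    switch (err) {
    case ETIMEDOUT:
        return READ_TIMED_OUT;
    case ECONNRESET:
        return READ_RESET;
    case EHOSTUNREACH:
    case ENETUNREACH:
        return READ_UNREACHABLE;
    default:
        return READ_OTHER_ERROR;
    }
}
```

A server would typically call classify_read(n, errno) immediately after n = read(fd, buf, sizeof buf) and release the client's resources for every outcome except READ_DATA and transient errors.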

Windows implementations:

On an ordinary TCP connection, suppose we make one of the following blocking recv or send calls with no timeout:

ret = recv(s, &buf[idx], nleft, flags);

Or

ret = send(s, &buf[idx], nleft, flags);

If the peer closes the TCP connection properly, that is, by calling closesocket(s) or shutdown(s), the recv or send call above returns promptly and reports that the connection has been closed. This is because closesocket(s) and shutdown(s) go through the normal TCP close sequence, which tells the other end "this TCP connection is closed; there is no need to send or receive any more". However, if the network cable is suddenly unplugged, or the machine at either end of the connection suddenly loses power or restarts, the end that is executing recv or send receives no notification that the connection is broken and keeps waiting, so it can hang for a very long time. The way to handle this is to enable the keepalive mechanism in the TCP program.

    // Requires <winsock2.h> and <mstcpip.h> (struct tcp_keepalive, SIO_KEEPALIVE_VALS).
    struct tcp_keepalive inKeepAlive = {0};
    unsigned long ulInLen = sizeof(struct tcp_keepalive);
    struct tcp_keepalive outKeepAlive = {0};
    unsigned long ulOutLen = sizeof(struct tcp_keepalive);
    unsigned long ulBytesReturn = 0;

    inKeepAlive.onoff = 1;                 // enable keepalive on this socket
    inKeepAlive.keepaliveinterval = 5000;  // unit: milliseconds
    inKeepAlive.keepalivetime = 1000;      // unit: milliseconds

    ret = WSAIoctl(s, SIO_KEEPALIVE_VALS, (LPVOID)&inKeepAlive, ulInLen,
                   (LPVOID)&outKeepAlive, ulOutLen, &ulBytesReturn, NULL, NULL);

Here keepalivetime is the interval at which probes are sent while the TCP connection is healthy and idle. Once a probe goes unanswered, further probes are sent at the keepaliveinterval. If the probes are still unanswered after several retries, TCP concludes that the connection is broken, so the recv or send call above returns promptly with an error instead of hanging indefinitely.

The packet capture that accompanied the original post illustrates the description above. Before the highlighted row, the TCP connection is healthy and keepalive sends a probe every 1000 milliseconds (the keepalivetime value). The 32nd probe gets no reply, so probes are then sent every 5000 milliseconds (the keepaliveinterval value). After several such retries with no reply, TCP concludes that the connection is broken.

On Windows 2000/XP/2003, the keepalive parameters that apply to every connection on the system are stored under the following registry key:

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters]

"KeepAliveTime"=dword:006ddd00
"KeepAliveInterval"=dword:000003e8
"MaxDataRetries"="5"

For practical applications, an idle time of 2 hours is too long. We therefore need to enable keepalive explicitly and set reasonable parameters. On Windows XP and Windows 2003 the parameters can be set for an individual socket, but on Windows 2000 they cannot: any setting affects every socket on the system.
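The 2-hour figure can be checked against the registry values quoted earlier, which are hexadecimal millisecond counts. The sketch below decodes them; the function name dword_hex_to_ms is hypothetical, and the input is assumed to be the digits that follow "dword:".

```c
#include <stdlib.h>

/* Hypothetical helper: convert the hexadecimal payload of a
 * "dword:xxxxxxxx" registry value into a millisecond count. */
unsigned long dword_hex_to_ms(const char *hex)
{
    return strtoul(hex, NULL, 16);
}
```

dword_hex_to_ms("006ddd00") yields 7200000 milliseconds, which is exactly 2 hours, and dword_hex_to_ms("000003e8") yields 1000 milliseconds, which is 1 second.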

Linux implementations:

SO_KEEPALIVE / TCP_KEEPCNT / TCP_KEEPIDLE / TCP_KEEPINTVL

If one end has closed or abnormally terminated the connection and the other end does not know it, the TCP connection is called half-open. TCP detects half-open connections with the keepalive timer. In a high-concurrency network server, sockets are often leaked, and one symptom is a large number of connections stuck in the CLOSE_WAIT state. Setting the keepalive option is one way to address this. There are other ways as well; see reference 8 for details.

Use it as follows:

    // Requires <sys/socket.h>, <netinet/in.h>, and <netinet/tcp.h>.
    // Settings for keepalive
    int keepalive = 1;         // turn keepalive on
    setsockopt(incomingsock, SOL_SOCKET, SO_KEEPALIVE, (void *)&keepalive, (socklen_t)sizeof(keepalive));

    int keepalive_time = 30;   // idle seconds before the first probe
    setsockopt(incomingsock, IPPROTO_TCP, TCP_KEEPIDLE, (void *)&keepalive_time, (socklen_t)sizeof(keepalive_time));

    int keepalive_intvl = 3;   // seconds between probes
    setsockopt(incomingsock, IPPROTO_TCP, TCP_KEEPINTVL, (void *)&keepalive_intvl, (socklen_t)sizeof(keepalive_intvl));

    int keepalive_probes = 3;  // unanswered probes before the connection is dropped
    setsockopt(incomingsock, IPPROTO_TCP, TCP_KEEPCNT, (void *)&keepalive_probes, (socklen_t)sizeof(keepalive_probes));

Set the SO_KEEPALIVE option to turn keepalive on, then set the start time, probe interval, and probe count with TCP_KEEPIDLE, TCP_KEEPINTVL, and TCP_KEEPCNT. The same values can also be set through the kernel parameters /proc/sys/net/ipv4/tcp_keepalive_time, tcp_keepalive_intvl, and tcp_keepalive_probes, but those affect every socket on the system, so setting them per socket with setsockopt is recommended.
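To see which system-wide values a socket inherits when only SO_KEEPALIVE is set, the kernel parameters can be read as plain text files. This is a minimal sketch; the function name read_sysctl_long is hypothetical, and it assumes each file holds a single decimal integer.

```c
#include <stdio.h>

/* Hypothetical helper: read one integer from a file such as
 * /proc/sys/net/ipv4/tcp_keepalive_time.
 * Returns the value, or -1 if the file cannot be opened or parsed. */
long read_sysctl_long(const char *path)
{
    FILE *f = fopen(path, "r");
    long value = -1;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}
```

For example, read_sysctl_long("/proc/sys/net/ipv4/tcp_keepalive_time") returns the idle time in seconds. Many Linux systems ship with 7200 for tcp_keepalive_time, 75 for tcp_keepalive_intvl, and 9 for tcp_keepalive_probes, but the actual values depend on the machine's configuration.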

http://blog.csdn.net/cccallen/article/details/8003324
