The following are some simple heartbeat functions for client and server programs. These functions can detect premature failure of the peer host or of the communication path to the peer.
Before presenting these functions, some warnings are in order. First, some readers will think of using TCP's keep-alive feature (the SO_KEEPALIVE socket option) for this purpose. However, TCP does not send a keep-alive probe until the connection has been idle for two hours. Once they realize this, their next question is usually how to change the keep-alive parameters to much smaller values (typically on the order of seconds) so that a failure is detected sooner. Although the TCP keep-alive timer parameters can indeed be changed on many systems, they are usually maintained per kernel rather than per socket, so changing them affects every socket that enables the option. In addition, the keep-alive option was never intended for this purpose (high-frequency polling).
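To make the first warning concrete, the sketch below enables SO_KEEPALIVE on a single socket. The switch itself is per socket, but on many systems the probe timing comes from kernel-wide settings, as described above. Some systems (Linux, for example) also offer a per-socket TCP_KEEPIDLE option, which the sketch uses only if the header defines it. The function name enable_keepalive and its idle_secs parameter are hypothetical, chosen for illustration.

```c
#define _DEFAULT_SOURCE          /* expose BSD/Linux extensions under strict -std= modes (glibc) */
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: turn on TCP keep-alive for one socket.
   Returns 0 on success, -1 on failure (errno set by setsockopt). */
int enable_keepalive(int fd, int idle_secs)
{
    int on = 1;

    /* Per-socket switch: keep-alive probes are sent only on sockets that set it. */
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) < 0)
        return -1;

#ifdef TCP_KEEPIDLE
    /* Where supported, override the idle time before the first probe for this
       socket only; otherwise the system-wide default applies. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_secs, sizeof idle_secs) < 0)
        return -1;
#else
    (void)idle_secs;
#endif
    return 0;
}
```

Even with a per-socket idle time, keep-alive still only tells you the connection is dead after several unanswered probes, which is why the functions below implement their own application-level polling instead.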
Second, a transient loss of connectivity between two end systems is not always a bad thing. TCP was designed from the beginning to cope with temporary disconnections, and Berkeley-derived TCP implementations keep retransmitting for 8 to 10 minutes before giving up on a connection. Newer IP routing protocols can detect a link failure and switch to an alternate path within a short time (for example, within seconds). Application developers must therefore consider, for the specific application into which they want to introduce a heartbeat mechanism, whether terminating the connection after hearing nothing from the peer for 5 to 10 seconds is really a good thing or a bad thing. Some applications need this feature, but most do not.
We will use TCP urgent mode to poll the peer periodically. In the following explanation we assume that the peer is polled once every second and that it is considered dead if no response is received for five seconds, but the application can change both values.
In this example, the client sends an out-of-band byte to the server every second, and the server, on receiving that byte, sends an out-of-band byte back to the client. Each end needs to know when the peer no longer exists or is no longer reachable. The client and the server each increment a counter, cnt, every second and reset it to 0 whenever an out-of-band byte arrives. If the counter reaches 5 (that is, the process has not received an out-of-band byte from the peer for 5 seconds), the connection is considered dead. Both the client and the server are notified of an arriving out-of-band byte by the SIGURG signal. The figure also shows that the normal data, the echoed data, and the out-of-band bytes are all exchanged over the single TCP connection.
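The bookkeeping each end performs is a tiny state machine: increment on every timer tick, reset whenever the peer is heard from, and give up past a threshold. The sketch below isolates that logic with hypothetical names (struct hb_counter, hb_init, hb_tick, hb_heard) and no sockets or signals. It uses the same `++n > max` comparison as the heartbeat_cli code, which means that with a threshold of 5 the peer is declared dead on the sixth silent tick, one tick later than a literal reading of "reaches 5".

```c
#include <stdbool.h>

/* Hypothetical model of one end's heartbeat counter (cnt in the text). */
struct hb_counter {
    int nprobes;     /* timer ticks since the peer was last heard from */
    int maxnprobes;  /* ticks tolerated before the peer is presumed dead */
};

void hb_init(struct hb_counter *c, int maxnprobes)
{
    c->nprobes = 0;
    c->maxnprobes = maxnprobes;
}

/* Timer path (SIGALRM in the real code): returns true once the peer is
   presumed dead, using the same ++n > max test as heartbeat_cli. */
bool hb_tick(struct hb_counter *c)
{
    return ++c->nprobes > c->maxnprobes;
}

/* Receive path (SIGURG in the real code): any OOB byte proves liveness. */
void hb_heard(struct hb_counter *c)
{
    c->nprobes = 0;
}
```

Keeping the counter logic separate like this makes the threshold behavior easy to test without real timers.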
The following is our heartbeat_cli function, which sets up the client's heartbeat. Its second argument is the polling interval in seconds, and its third argument is the number of consecutive unanswered polls after which the connection is abandoned.
#include "unp.h"

/* Copies of the heartbeat_cli arguments, kept in globals because the signal
   handlers need them. nprobes counts the SIGALRMs since the last server response. */
static int servfd;
static int nsec;        /* # seconds between each alarm */
static int maxnprobes;  /* # probes w/no response before quit */
static int nprobes;     /* # probes since last server response */
static void sig_urg(int), sig_alrm(int);

void
heartbeat_cli(int servfd_arg, int nsec_arg, int maxnprobes_arg)
{
    servfd = servfd_arg;        /* set globals for signal handlers */
    if ((nsec = nsec_arg) < 1)
        nsec = 1;
    if ((maxnprobes = maxnprobes_arg) < nsec)
        maxnprobes = nsec;
    nprobes = 0;

    signal(SIGURG, sig_urg);              /* out-of-band notification handler */
    fcntl(servfd, F_SETOWN, getpid());    /* deliver SIGURG to this process */

    signal(SIGALRM, sig_alrm);
    alarm(nsec);                          /* schedule the first SIGALRM */
}

static void
sig_urg(int signo)
{
    int  n;
    char c;

    /* Try to read the OOB byte; EWOULDBLOCK only means it has not arrived yet.
       The byte is not read inline, so the client's normal reads are unaffected. */
    if ((n = recv(servfd, &c, 1, MSG_OOB)) < 0) {
        if (errno != EWOULDBLOCK)
            err_sys("recv error");
    }
    nprobes = 0;    /* server is alive: reset counter */
    return;         /* may interrupt client code */
}

static void
sig_alrm(int signo)
{
    /* Here we simply terminate the client. Other designs are possible: signal
       the main loop, or pass heartbeat_cli a caller-supplied function to call
       when the server appears dead. */
    if (++nprobes > maxnprobes) {
        fprintf(stderr, "server is unreachable\n");
        exit(0);
    }
    send(servfd, "1", 1, MSG_OOB);
    alarm(nsec);
    return;         /* may interrupt client code */
}
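Both handlers end with the comment "may interrupt client code": a SIGALRM or SIGURG that arrives while the client's main loop is blocked in a slow system call such as read can make that call fail with errno set to EINTR, depending on whether the handler was installed with restart semantics. A common defensive pattern is to retry on EINTR. The helper below, read_retry, is a hypothetical sketch of that pattern, not part of the original code.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read() that retries when a heartbeat signal handler
   interrupts it before any data has been transferred. */
ssize_t read_retry(int fd, void *buf, size_t len)
{
    ssize_t n;

    do {
        n = read(fd, buf, len);
    } while (n < 0 && errno == EINTR);   /* interrupted: just try again */
    return n;
}
```

A partial read is still returned as-is, so callers that need exactly len bytes must loop on the byte count as well.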
Global variables: The first three variables are copies of the heartbeat_cli arguments: the socket descriptor (used by the signal handlers to send and receive out-of-band data), the SIGALRM interval, and the total number of SIGALRMs that may be processed without a server response before the client considers the server or the connection dead. The variable nprobes counts the SIGALRMs processed since the last response from the server was received.
heartbeat_cli function: heartbeat_cli checks and saves its arguments, establishes signal handlers for SIGURG and SIGALRM, and sets the owner of the socket to the process ID so that SIGURG is delivered to this process. Calling alarm schedules the first SIGALRM.
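The two setup steps just described can be sketched separately. hb_normalize reproduces heartbeat_cli's argument clamping (at least one second between probes, and a probe limit never below nsec). arm_sigurg is a hypothetical variant of the SIGURG setup that uses sigaction() instead of signal() so the handler's semantics are explicit; choosing SA_RESTART here is an assumption, not something the original code specifies. Both names are illustrative.

```c
#define _DEFAULT_SOURCE          /* expose sigaction/SA_RESTART under strict -std= modes (glibc) */
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Same clamping rule heartbeat_cli applies to its arguments. */
void hb_normalize(int *nsec, int *maxnprobes)
{
    if (*nsec < 1)
        *nsec = 1;
    if (*maxnprobes < *nsec)
        *maxnprobes = *nsec;
}

/* Hypothetical SIGURG setup: install handler, then make this process the
   socket owner so the kernel delivers SIGURG to it. Returns 0 or -1. */
int arm_sigurg(int fd, void (*handler)(int))
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;    /* ask the kernel to restart interrupted slow calls where it can */
    if (sigaction(SIGURG, &sa, NULL) < 0)
        return -1;
    return fcntl(fd, F_SETOWN, getpid()) < 0 ? -1 : 0;
}
```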
SIGURG handler: This signal is generated when an out-of-band notification arrives. We try to read the corresponding out-of-band byte, but if it has not arrived yet (EWOULDBLOCK), that does not matter. Note that we do not use the option of receiving out-of-band data inline, because that would interfere with the client's reading of its normal data. Since the server is still alive, we reset nprobes to 0.
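The "read it if it is there" step can be isolated into a small helper. The sketch below, with hypothetical names read_oob_byte and oob_loopback_demo, separates the EWOULDBLOCK case the handler ignores from real errors, and includes a loopback self-test that sends one out-of-band byte over a TCP connection to 127.0.0.1 and reads it back once poll() reports POLLPRI. It assumes default socket options (SO_OOBINLINE off), as in the original code.

```c
#define _DEFAULT_SOURCE          /* expose POSIX/BSD declarations under strict -std= modes (glibc) */
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 1 if the urgent byte was read, 0 if it has not arrived yet
   (EWOULDBLOCK/EAGAIN, the case the SIGURG handler ignores), -1 otherwise. */
int read_oob_byte(int fd, char *out)
{
    ssize_t n = recv(fd, out, 1, MSG_OOB);

    if (n == 1)
        return 1;
    if (n < 0 && (errno == EWOULDBLOCK || errno == EAGAIN))
        return 0;
    return -1;
}

/* Hypothetical self-test over loopback. Returns read_oob_byte()'s result, or
   -1 if setup failed or no urgent data was reported within 2 seconds. */
int oob_loopback_demo(char *out)
{
    struct sockaddr_in addr;
    socklen_t alen = sizeof addr;
    struct pollfd pfd;
    int lfd = -1, cfd = -1, sfd = -1, rc = -1;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                          /* let the kernel pick a port */

    if ((lfd = socket(AF_INET, SOCK_STREAM, 0)) < 0 ||
        bind(lfd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(lfd, 1) < 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &alen) < 0 ||
        (cfd = socket(AF_INET, SOCK_STREAM, 0)) < 0 ||
        connect(cfd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        (sfd = accept(lfd, NULL, NULL)) < 0 ||
        send(cfd, "1", 1, MSG_OOB) != 1)       /* same probe byte the client sends */
        goto done;

    pfd.fd = sfd;
    pfd.events = POLLPRI;                       /* wait for urgent data */
    pfd.revents = 0;
    if (poll(&pfd, 1, 2000) == 1 && (pfd.revents & POLLPRI))
        rc = read_oob_byte(sfd, out);
done:
    if (sfd >= 0) close(sfd);
    if (cfd >= 0) close(cfd);
    if (lfd >= 0) close(lfd);
    return rc;
}
```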
SIGALRM handler: This signal is generated at regular intervals. We increment the counter nprobes, and if it exceeds maxnprobes, we conclude that the server host has crashed or is no longer reachable. In this example we simply terminate the client process. Otherwise, we send a byte containing the character 1 as out-of-band data (the value has no particular meaning) and call alarm to schedule the next SIGALRM.
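Instead of terminating the process directly, the SIGALRM handler could notify the main loop, one of the alternative designs mentioned in the code comments. A minimal sketch of that alternative: the handler only records its verdict in a volatile sig_atomic_t flag, and the main loop checks the flag at a safe point. The names peer_dead, nticks, sig_alrm_flag and install_alrm_flag_handler, and the fixed threshold of 5, are hypothetical; the probe send and the alarm() re-arm are omitted to keep the example focused.

```c
#define _DEFAULT_SOURCE          /* expose sigaction() under strict -std= modes (glibc) */
#include <signal.h>
#include <string.h>

static volatile sig_atomic_t peer_dead = 0;  /* set once the peer is presumed gone */
static volatile sig_atomic_t nticks = 0;     /* SIGALRMs since the last reply */
static const int max_ticks = 5;              /* hypothetical threshold, like maxnprobes */

/* Variant of sig_alrm that reports instead of exiting. The real handler would
   also send the OOB probe and call alarm() again here. */
static void sig_alrm_flag(int signo)
{
    (void)signo;
    if (++nticks > max_ticks)
        peer_dead = 1;   /* the main loop decides what to do */
}

int install_alrm_flag_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = sig_alrm_flag;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGALRM, &sa, NULL);
}
```

In this design the SIGURG handler would reset nticks to 0, and the main loop would close the connection cleanly when it sees peer_dead set, rather than the process exiting from inside a signal handler.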
The following are the heartbeat functions for the server.
#include "unp.h"

static int servfd;
static int nsec;        /* # seconds between each alarm */
static int maxnalarms;  /* # alarms w/no client probe before quit */
static int nprobes;     /* # alarms since last client probe */
static void sig_urg(int), sig_alrm(int);

void
heartbeat_serv(int servfd_arg, int nsec_arg, int maxnalarms_arg)
{
    servfd = servfd_arg;        /* set globals for signal handlers */
    if ((nsec = nsec_arg) < 1)
        nsec = 1;
    if ((maxnalarms = maxnalarms_arg) < nsec)
        maxnalarms = nsec;

    signal(SIGURG, sig_urg);
    fcntl(servfd, F_SETOWN, getpid());

    signal(SIGALRM, sig_alrm);
    alarm(nsec);
}

static void
sig_urg(int signo)
{
    int  n;
    char c;

    /* Read the client's OOB byte if it is there. If recv fails with
       EWOULDBLOCK, whatever happens to be in c is echoed; that is harmless
       because only the arrival of one OOB byte matters, not its value. */
    if ((n = recv(servfd, &c, 1, MSG_OOB)) < 0) {
        if (errno != EWOULDBLOCK)
            err_sys("recv error");
    }
    send(servfd, &c, 1, MSG_OOB);   /* echo back out-of-band byte */

    nprobes = 0;    /* client is alive: reset counter */
    return;         /* may interrupt server code */
}

static void
sig_alrm(int signo)
{
    if (++nprobes > maxnalarms) {   /* client silent too long: give up */
        printf("no probes from client\n");
        exit(0);
    }
    alarm(nsec);    /* schedule the next SIGALRM */
    return;         /* may interrupt server code */
}
heartbeat_serv function: The variable declarations and the heartbeat_serv function are almost identical to the client's heartbeat initialization code.
SIGURG handler: When an out-of-band notification arrives, the server tries to read the corresponding out-of-band byte. As with the client, it does not matter if the byte has not arrived yet. The server sends the byte it read back to the client as out-of-band data. Note that if recv returns EWOULDBLOCK, whatever value happens to be in the automatic variable c is sent to the client. Since we never use the value of the out-of-band byte, this is not a problem: what matters is that one byte of out-of-band data is sent, not what that byte is. Since we have just been notified that the client is still alive, we reset nprobes to 0.
SIGALRM handler: We increment nprobes, and if it exceeds the maxnalarms value specified by the caller, the server process is terminated. Otherwise, we schedule the next SIGALRM.
Http://blog.csdn.net/ctthuangcheng/article/details/9569265
UNIX Network Programming: client/server heartbeat functions (reposted)