Under normal conditions, send returns the number of bytes it sent. On error it returns -1 and sets the global variable errno. In many cases send returns -1 because the peer has closed the connection (it sent an RST or FIN segment); in that case errno is usually set to ECONNRESET (Connection reset by peer) or EPIPE (Broken pipe).
However, if the peer's network cable is unplugged, or its network interface is removed or disabled, the peer never gets a chance to send an RST or FIN to the local host. The local operating system therefore has no reason to believe the peer is gone, and send keeps returning the number of bytes we asked it to send: send only copies the data into the kernel's send buffer and does not wait for the peer to receive it. Since the return value of send cannot tell us whether the peer is still alive, we need the TCP keep-alive mechanism.
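To make the distinction concrete, here is a minimal sketch of classifying the result of a single send call. It assumes a platform that supports the POSIX MSG_NOSIGNAL flag (Linux does), used so that writing to a closed connection returns EPIPE instead of raising SIGPIPE. The enum and the send_and_classify function are hypothetical names, not part of any library.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical outcome categories for a single send() call. */
enum send_status {
    SEND_OK,        /* data was copied into the local send buffer */
    SEND_AGAIN,     /* non-blocking socket with a full buffer; retry later */
    SEND_PEER_GONE, /* peer reset or closed the connection */
    SEND_TIMED_OUT, /* keep-alive or TCP_USER_TIMEOUT gave up on the peer */
    SEND_FAILED     /* any other error; inspect errno */
};

/* Sends len bytes from buf on fd, stores send()'s return value in *sent,
 * and classifies the result. MSG_NOSIGNAL makes a write to a closed
 * connection fail with EPIPE instead of raising SIGPIPE. */
enum send_status send_and_classify(int fd, const void *buf, size_t len,
                                   ssize_t *sent)
{
    ssize_t n = send(fd, buf, len, MSG_NOSIGNAL);

    *sent = n;
    if (n >= 0)
        return SEND_OK; /* says nothing about whether the peer received it */
    if (errno == EAGAIN || errno == EWOULDBLOCK)
        return SEND_AGAIN;
    if (errno == ECONNRESET || errno == EPIPE)
        return SEND_PEER_GONE;
    if (errno == ETIMEDOUT)
        return SEND_TIMED_OUT;
    return SEND_FAILED;
}
```

SEND_OK is exactly the ambiguous case described above: on a connection whose peer has silently vanished, send keeps returning SEND_OK until the kernel itself decides the connection is dead.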
As described in "UNIX Network Programming (Volume 1)", the SO_KEEPALIVE socket option enables the keep-alive mechanism on a socket.
When SO_KEEPALIVE is set on a TCP socket and no data has been exchanged in either direction for two hours, TCP automatically sends a keep-alive probe to the peer.
TCP provides this mechanism to help determine whether the peer is still alive. If the peer does not answer the keep-alive probes, TCP drops the connection, and the next send or recv on the socket fails (with errno set to ETIMEDOUT when the probes simply went unanswered), so the application can detect the failure.
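The failure is recorded as the socket's pending error, which the next send or recv reports through errno. As a sketch (the helper name pending_socket_error is hypothetical), the same error can also be read directly with getsockopt and SO_ERROR, for example after poll reports an error condition on the socket:

```c
#include <sys/socket.h>

/* Hypothetical helper: returns the socket's pending error (for example
 * ETIMEDOUT after unanswered keep-alive probes, or ECONNRESET after an RST),
 * 0 if there is none, or -1 if getsockopt itself fails.
 * Note: reading SO_ERROR clears the pending error. */
int pending_socket_error(int fd)
{
    int err = 0;
    socklen_t len = sizeof(err);

    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) == -1)
        return -1;
    return err;
}
```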
The following code enables and tunes the keep-alive mechanism on a socket:
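```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>   /* TCP_KEEPIDLE, TCP_KEEPINTVL, TCP_KEEPCNT (Linux) */

/* Assumed definition: the original snippet uses ERRSTR without defining it. */
#define ERRSTR strerror(errno)

/* Enables and tunes keep-alive on the TCP socket fd (function name is illustrative). */
void set_keepalive(int fd)
{
    int keep_alive = 1;     /* turn keep-alive on */
    int keep_idle = 5;      /* idle seconds before the first probe */
    int keep_interval = 1;  /* seconds between probes */
    int keep_count = 3;     /* unanswered probes before the peer is declared dead */

    if (-1 == setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &keep_alive, sizeof(keep_alive))) {
        fprintf(stderr, "[%s %d] set socket to keep alive error: %s\n", __FILE__, __LINE__, ERRSTR);
    }
    if (-1 == setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &keep_idle, sizeof(keep_idle))) {
        fprintf(stderr, "[%s %d] set socket keep alive idle error: %s\n", __FILE__, __LINE__, ERRSTR);
    }
    if (-1 == setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &keep_interval, sizeof(keep_interval))) {
        fprintf(stderr, "[%s %d] set socket keep alive interval error: %s\n", __FILE__, __LINE__, ERRSTR);
    }
    if (-1 == setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &keep_count, sizeof(keep_count))) {
        fprintf(stderr, "[%s %d] set socket keep alive count error: %s\n", __FILE__, __LINE__, ERRSTR);
    }
}
```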
- Set SO_KEEPALIVE to 1 to enable the keep-alive mechanism.
- Set TCP_KEEPIDLE to 5 seconds: once the connection has carried no data for 5 seconds, TCP starts sending keep-alive probes. The Linux default is 7200 seconds (2 hours).
- Set TCP_KEEPINTVL to 1 second: once probing has started, a keep-alive probe is sent every second. The Linux default is 75 seconds.
- Set TCP_KEEPCNT to 3: if three consecutive probes go unanswered, the peer is considered dead and the connection is dropped. The Linux default is 9.
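Taken together, these values bound how long a silent peer goes unnoticed: on Linux the first probe goes out after TCP_KEEPIDLE seconds of idleness, and the connection is dropped one interval after the last of TCP_KEEPCNT unanswered probes, so detection takes roughly idle + count × interval seconds. That is 5 + 3 × 1 = 8 seconds with the settings above, versus 7200 + 9 × 75 = 7875 seconds (about 2 h 11 min) with the defaults. The sketch below, assuming Linux and using hypothetical helper names, reads the three options back from a socket and applies that approximation.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: approximate seconds from the last activity on an idle
 * connection until Linux gives up on a silent peer. Wait `idle`, then send
 * `count` probes spaced `interval` apart; the connection is dropped one
 * interval after the last unanswered probe. */
int keepalive_detect_seconds(int idle, int interval, int count)
{
    return idle + count * interval;
}

/* Reads one integer TCP-level option; returns -1 on failure. */
static int get_tcp_int(int fd, int option)
{
    int value = 0;
    socklen_t len = sizeof(value);

    if (getsockopt(fd, IPPROTO_TCP, option, &value, &len) == -1)
        return -1;
    return value;
}

/* Applies the estimate to the keep-alive options currently set on fd.
 * Returns -1 if any option cannot be read. */
int keepalive_detect_seconds_for(int fd)
{
    int idle = get_tcp_int(fd, TCP_KEEPIDLE);
    int interval = get_tcp_int(fd, TCP_KEEPINTVL);
    int count = get_tcp_int(fd, TCP_KEEPCNT);

    if (idle < 0 || interval < 0 || count < 0)
        return -1;
    return keepalive_detect_seconds(idle, interval, count);
}
```

The estimate ignores timer slack and network round-trip time, so treat it as an approximate upper bound rather than an exact deadline.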
This handles the case where the peer's network disappears while the connection is idle.
However, keep-alive probes are only sent on an idle connection. If the sender has transmitted data that the receiver has not yet acknowledged, TCP does not start keep-alive at all; it runs its retransmission timer instead, resending the data with exponential backoff. Keep-alive is therefore ineffective while an ACK is outstanding, and with the Linux default net.ipv4.tcp_retries2 = 15, retransmission can go on for about 15 minutes or longer before the connection is abandoned. While investigating this problem I found the following Stack Overflow discussion: http://stackoverflow.com/questions/5907527/application-control-of-tcp-retransmission-on-linux
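On Linux this state can be observed through the TCP_INFO socket option. The following sketch (the helper name tcp_outstanding is hypothetical) reads tcpi_unacked, the number of segments sent but not yet acknowledged, and tcpi_retransmits, the number of consecutive retransmission timeouts. While tcpi_unacked is non-zero the connection is not idle, so no keep-alive probes are sent.

```c
#define _GNU_SOURCE        /* struct tcp_info in glibc's <netinet/tcp.h> */
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper (Linux-specific): reports outstanding unacknowledged
 * segments and consecutive retransmission timeouts for fd.
 * Returns 0 on success, -1 if TCP_INFO cannot be read. */
int tcp_outstanding(int fd, unsigned *unacked, unsigned *retransmits)
{
    struct tcp_info info;
    socklen_t len = sizeof(info);

    if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &info, &len) == -1)
        return -1;
    *unacked = info.tcpi_unacked;          /* segments in flight, not yet ACKed */
    *retransmits = info.tcpi_retransmits;  /* consecutive retransmission timeouts */
    return 0;
}
```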
According to the first answer, Linux kernel 2.6.37 added a socket option named TCP_USER_TIMEOUT. It is a TCP-level option (level IPPROTO_TCP) that takes an unsigned int value: the maximum time, in milliseconds, that transmitted data may remain unacknowledged. For example, with a value of 10000, if data that has been sent is not acknowledged within 10 seconds, TCP closes the connection, and the next send or recv returns -1 with errno set to ETIMEDOUT (Connection timed out).
The option can be set like this:
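```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>   /* TCP_USER_TIMEOUT (Linux 2.6.37+) */

/* Sets TCP_USER_TIMEOUT on the TCP socket fd (function name is illustrative). */
void set_user_timeout(int fd)
{
    /* Close the connection if sent data stays unacknowledged for 10 s. */
    unsigned int timeout = 10000;  /* milliseconds */

    if (-1 == setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT, &timeout, sizeof(timeout))) {
        fprintf(stderr, "set TCP_USER_TIMEOUT option error: %s\n", strerror(errno));
    }
}
```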
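Because the option exists only on Linux 2.6.37 and later (older kernels reject unknown TCP options with ENOPROTOOPT), a program may want to confirm that the value took effect. A minimal sketch, with the hypothetical helper name user_timeout_roundtrip, sets the option and reads it back with getsockopt:

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: sets TCP_USER_TIMEOUT on fd and returns the value the
 * kernel reports back (in milliseconds), or -1 if the option is unsupported
 * or either call fails. */
long user_timeout_roundtrip(int fd, unsigned int timeout_ms)
{
#ifdef TCP_USER_TIMEOUT
    unsigned int stored = 0;
    socklen_t len = sizeof(stored);

    if (setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT,
                   &timeout_ms, sizeof(timeout_ms)) == -1)
        return -1;
    if (getsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT, &stored, &len) == -1)
        return -1;
    return (long)stored;
#else
    (void)fd;
    (void)timeout_ms;
    return -1; /* system headers do not define the option at all */
#endif
}
```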
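In short, TCP keep-alive (for idle connections) together with TCP_USER_TIMEOUT (for connections with unacknowledged data) lets an application detect, within a bounded time, a peer whose network was disconnected or whose power was cut, instead of leaving the connection hanging for a long time. According to the tcp(7) manual page, when both are used, TCP_USER_TIMEOUT overrides keep-alive in deciding when to close a connection because of keep-alive failure.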
Reposted from http://blog.leeyiw.org/tcp-keep-alive/