TCP keep-alive and TCP_USER_TIMEOUT


In normal communication, a successful call to send returns the number of bytes sent. If an error occurs, send returns -1 and sets the global variable errno. In many cases send returns -1 because the peer has closed the connection (the peer sent an RST or FIN packet); errno is then set to ECONNRESET (Connection reset by peer).
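The return-value check described above can be sketched as a small wrapper. This is a minimal illustration, not code from the original post: the name send_checked and the send_status enum are hypothetical, and the sketch assumes Linux, where MSG_NOSIGNAL turns a write to a closed connection into an EPIPE error instead of a SIGPIPE signal.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical result codes for this sketch; not part of any library. */
enum send_status { SEND_OK = 0, SEND_PEER_CLOSED = 1, SEND_OTHER_ERROR = 2 };

/*
 * Send up to len bytes from buf on fd and classify the outcome.
 * On return, *sent holds whatever send() returned.
 */
static enum send_status send_checked(int fd, const void *buf, size_t len,
                                     ssize_t *sent)
{
    assert(buf != NULL && sent != NULL);

    /* MSG_NOSIGNAL (Linux): report EPIPE instead of raising SIGPIPE. */
    *sent = send(fd, buf, len, MSG_NOSIGNAL);
    if (*sent >= 0)
        return SEND_OK;              /* may be fewer bytes than len */
    if (errno == ECONNRESET || errno == EPIPE)
        return SEND_PEER_CLOSED;     /* peer reset or closed the connection */
    return SEND_OTHER_ERROR;         /* inspect errno for the exact cause */
}
```

A caller would typically close the socket on SEND_PEER_CLOSED and log errno on SEND_OTHER_ERROR. Note that SEND_OK only means the local kernel accepted the data, not that the peer received it.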

However, when the peer's network is cut off, for example because its NIC is unplugged or disabled, the peer has no opportunity to send a TCP RST or FIN packet to close the connection. The local operating system therefore does not know that the peer is gone, and send still returns the number of bytes we asked it to send. Since the return value of send cannot tell us whether the peer is alive, we need the TCP keep-alive mechanism.

 

As described in "UNIX Network Programming (Volume 1)", the SO_KEEPALIVE socket option enables the keep-alive mechanism for a socket.

When the keep-alive option is set on a TCP socket and no data has been exchanged in either direction for two hours, TCP automatically sends a keep-alive probe to the peer.

TCP provides this mechanism to help us determine whether the peer is alive. If the peer does not respond to the keep-alive probes normally, the next send or recv call on the socket may fail, and the application can detect this error.

The following code sets the keepalive mechanism:

    /*
     * Requires <errno.h>, <stdio.h>, <string.h>, <sys/socket.h>,
     * <netinet/in.h> and <netinet/tcp.h>. fd is a TCP socket.
     */
    int keep_alive = 1;
    int keep_idle = 5, keep_interval = 1, keep_count = 3;
    int ret = 0;

    /* Enable keep-alive on the socket. */
    if (-1 == (ret = setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &keep_alive,
        sizeof(keep_alive)))) {
        fprintf(stderr, "[%s %d] set socket to keep alive error: %s\n", __FILE__,
            __LINE__, strerror(errno));
    }
    /* Idle time in seconds before the first probe is sent. */
    if (-1 == (ret = setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &keep_idle,
        sizeof(keep_idle)))) {
        fprintf(stderr, "[%s %d] set socket keep alive idle error: %s\n", __FILE__,
            __LINE__, strerror(errno));
    }
    /* Interval in seconds between successive probes. */
    if (-1 == (ret = setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &keep_interval,
        sizeof(keep_interval)))) {
        fprintf(stderr, "[%s %d] set socket keep alive interval error: %s\n", __FILE__,
            __LINE__, strerror(errno));
    }
    /* Number of unanswered probes before the connection is dropped. */
    if (-1 == (ret = setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &keep_count,
        sizeof(keep_count)))) {
        fprintf(stderr, "[%s %d] set socket keep alive count error: %s\n", __FILE__,
            __LINE__, strerror(errno));
    }
 
  1. Set SO_KEEPALIVE to 1, which enables the keep-alive mechanism.
  2. Set the TCP_KEEPIDLE option to 5 seconds. If no data is transmitted on the TCP connection for five seconds, TCP starts sending keep-alive probes. The default value is 2 hours.
  3. Set the TCP_KEEPINTVL option to 1 second. Once probing has started, a keep-alive probe is sent every second. The default value is 75 seconds.
  4. Set the TCP_KEEPCNT option to 3. If the peer fails to respond to three keep-alive probes, it is declared dead. The default value is 9.

With these settings, a peer that has dropped off the network is detected on an otherwise idle connection within seconds rather than hours.
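To see what these three values mean in practice, the sketch below reads the settings back with getsockopt and adds them up. The helper name keepalive_detection_seconds is hypothetical, and the formula (idle time plus one interval per probe) is only a rough estimate of when the kernel gives up on an idle connection. It assumes the Linux option names used above.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: estimate, in seconds, how long a silent peer goes
 * unnoticed on an idle TCP socket, based on the socket's own keep-alive
 * settings. Returns -1 if any getsockopt() call fails.
 */
static int keepalive_detection_seconds(int fd)
{
    int idle = 0, interval = 0, count = 0;
    socklen_t len;

    assert(fd >= 0);

    len = sizeof(idle);
    if (getsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, &len) == -1)
        return -1;
    len = sizeof(interval);
    if (getsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval, &len) == -1)
        return -1;
    len = sizeof(count);
    if (getsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &count, &len) == -1)
        return -1;

    /* Idle wait before the first probe, then one interval per probe. */
    return idle + interval * count;
}
```

With the values from the code above (5, 1 and 3) the estimate is 5 + 1 × 3 = 8 seconds. With the defaults listed in the steps (2 hours, 75 seconds, 9 probes) it is 7200 + 75 × 9 = 7875 seconds, a little over two hours.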

However, keep-alive probes are only sent on an idle connection. If the sender has transmitted data that the receiver has not yet acknowledged, TCP does not start keep-alive; it starts timeout retransmission instead. The keep-alive mechanism is therefore ineffective while unacknowledged data is outstanding. While investigating this problem I found the following question on Stack Overflow: http://stackoverflow.com/questions/5907527/application-control-of-tcp-retransmission-on-linux

According to the first answer, Linux kernel 2.6.37 added a socket option named TCP_USER_TIMEOUT. It is a TCP-level socket option that takes an unsigned int value: the maximum time, in milliseconds, that transmitted data may remain unacknowledged. For example, if the value is 10000 and sent data is not acknowledged within 10 seconds, the next call to send or recv returns -1 with errno set to ETIMEDOUT, which indicates that the connection timed out.

The implementation code should be as follows:

    unsigned int timeout = 10000; /* milliseconds */

    if (-1 == setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT, &timeout,
        sizeof(timeout))) {
        fprintf(stderr, "set TCP_USER_TIMEOUT option error: %s\n",
            strerror(errno));
    }

 

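Used together, TCP keep-alive and TCP_USER_TIMEOUT let an application detect within a bounded time that the peer has been disconnected or has lost power, instead of leaving the connection hanging for a long time. Keep-alive covers the idle connection, and TCP_USER_TIMEOUT covers the connection with unacknowledged data in flight.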
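A minimal sketch of applying both mechanisms in one place might look like the following. The function name enable_dead_peer_detection is hypothetical. Deriving the user timeout from the keep-alive budget (idle + interval × count, converted to milliseconds) is an assumption made here so that both cases give up after roughly the same time; the original post sets a fixed 10000 ms instead.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: enable keep-alive with the given idle time,
 * probe interval (both in seconds) and probe count, and set
 * TCP_USER_TIMEOUT to the same overall budget in milliseconds.
 * Returns 0 on success, or -1 with errno set by the failing setsockopt().
 */
static int enable_dead_peer_detection(int fd, int idle, int interval,
                                      int count)
{
    int on = 1;
    unsigned int timeout_ms;

    assert(idle > 0 && interval > 0 && count > 0);

    /* Assumption of this sketch: reuse the keep-alive budget. */
    timeout_ms = (unsigned int)(idle + interval * count) * 1000u;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) == -1)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle)) == -1)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval,
                   sizeof(interval)) == -1)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &count, sizeof(count)) == -1)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT, &timeout_ms,
                   sizeof(timeout_ms)) == -1)
        return -1;
    return 0;
}
```

Calling enable_dead_peer_detection(fd, 5, 1, 3) right after connect or accept would apply the keep-alive values used earlier and a user timeout of 8000 ms. An application that prefers independent limits can pass its own timeout instead of deriving it.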

Reposted from http://blog.leeyiw.org/tcp-keep-alive/
