Troubleshooting: What causes clients to drop in batches due to heartbeat timeouts

Source: Internet
Author: User

A heartbeat timeout means that the server has not received any message from an online client (TCP connection) for a specified period of time, so the server considers that client to be offline.

Why is a heartbeat mechanism needed? Because the server does not notice some client drops immediately (some may take a long time to detect), for example when the network is disconnected or the client program exits. A heartbeat mechanism is therefore introduced so that the server can discover offline clients as early as possible. For a more detailed introduction to the heartbeat mechanism, see here.
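The mechanism described above can be sketched as follows. This is a minimal illustration, not the framework's implementation: the class name `HeartbeatMonitor`, the rule that any received message (not only a heartbeat) refreshes a client's last-seen time, and the injectable clock used to make the example deterministic are all assumptions.

```python
import time
from typing import Callable, Dict, List


class HeartbeatMonitor:
    """Hypothetical sketch: tracks each client's last-seen time."""

    def __init__(self, timeout_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s
        self.clock = clock
        self.last_seen: Dict[str, float] = {}

    def on_message(self, client_id: str) -> None:
        # Assumption: any message from the client proves it is still online.
        self.last_seen[client_id] = self.clock()

    def expired_clients(self) -> List[str]:
        # Meant to be called periodically by a timer; clients silent for
        # longer than timeout_s are treated as offline and forgotten.
        now = self.clock()
        expired = [cid for cid, seen in self.last_seen.items()
                   if now - seen > self.timeout_s]
        for cid in expired:
            del self.last_seen[cid]
        return expired


class FakeClock:
    """Manually advanced clock so the example is deterministic."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


if __name__ == "__main__":
    clock = FakeClock()
    monitor = HeartbeatMonitor(timeout_s=30.0, clock=clock)
    monitor.on_message("client-a")
    monitor.on_message("client-b")
    clock.now = 20.0
    monitor.on_message("client-a")    # only client-a sends a heartbeat
    clock.now = 40.0
    print(monitor.expired_clients())  # ['client-b']
```

If many clients stop reaching `on_message` at the same time, they all expire in the same check, which is exactly the "batch drop" symptom discussed below.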

If a large number of clients drop at once due to heartbeat timeouts, the server has not received any heartbeat messages from these clients for some period in the past. There are usually three possible causes:

1. High CPU or memory usage

When this happens, check whether the CPU and memory usage of the server process is abnormal.

For example, when the CPU stays at 100%, the operations that receive data may stall, so messages from clients (including heartbeats) are not read in time.
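For a server process not written in Python, CPU usage is best checked with the operating system's own tools (for example Task Manager on Windows or top on Linux). As a hedged illustration of the check itself, the sketch below shows how a process could sample its own CPU usage and flag a sustained 100% condition. `CpuSampler`, `is_saturated`, the 95% threshold, and the three-sample window are hypothetical choices; `time.process_time()` measures only the current Python process.

```python
import time
from typing import Callable, Iterable, Optional


class CpuSampler:
    """Estimates the current process's CPU usage between two samples."""

    def __init__(self,
                 cpu_clock: Callable[[], float] = time.process_time,
                 wall_clock: Callable[[], float] = time.monotonic) -> None:
        self.cpu_clock = cpu_clock
        self.wall_clock = wall_clock
        self._last_cpu = cpu_clock()
        self._last_wall = wall_clock()

    def sample(self) -> Optional[float]:
        """CPU usage in percent of one core since the previous sample.

        Values above 100 are possible when several threads are busy at once.
        Returns None if no wall-clock time has elapsed.
        """
        cpu, wall = self.cpu_clock(), self.wall_clock()
        d_cpu, d_wall = cpu - self._last_cpu, wall - self._last_wall
        self._last_cpu, self._last_wall = cpu, wall
        if d_wall <= 0:
            return None
        return 100.0 * d_cpu / d_wall


def is_saturated(samples: Iterable[Optional[float]],
                 threshold: float = 95.0, consecutive: int = 3) -> bool:
    """True if `consecutive` samples in a row are at or above `threshold`."""
    run = 0
    for value in samples:
        run = run + 1 if value is not None and value >= threshold else 0
        if run >= consecutive:
            return True
    return False


if __name__ == "__main__":
    sampler = CpuSampler()
    deadline = time.monotonic() + 0.2
    while time.monotonic() < deadline:   # busy loop to burn CPU
        pass
    print(f"CPU usage over the last interval: {sampler.sample():.0f}%")
```

Requiring several consecutive high samples avoids reacting to a single short spike; only sustained saturation is likely to delay heartbeat reception long enough to matter.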

2. Processing some messages takes a long time

If the server's message processing model is set to IocpDirectly, each message is processed directly on the thread that received it, and no further messages from that client are read until processing finishes. Consequently, when processing a message takes longer than the heartbeat timeout configured on the server, the server will misjudge the corresponding client as dropped due to heartbeat timeout.
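The timing effect described above can be reproduced with a small, simplified model. It is a sketch under stated assumptions, not IocpDirectly's actual code: it assumes the receiving side reads the next message from a connection only after the handler for the previous one returns, that reading a message refreshes the client's last-seen time, and that the timeout check runs continuously. All names and the numbers in the example (a heartbeat every 10 s, one message taking 45 s, a 30 s timeout) are hypothetical.

```python
from typing import List, Sequence, Tuple

# (arrival_time_s, processing_time_s) for each message on one connection.
Message = Tuple[float, float]


def read_times_inline(messages: Sequence[Message]) -> List[float]:
    """Read times when each message is handled on the thread that read it.

    The next message is read only after the previous handler returns, so
    one slow message delays every later read, heartbeats included.
    """
    reads: List[float] = []
    busy_until = float("-inf")
    for arrival, cost in messages:
        read_at = max(arrival, busy_until)
        reads.append(read_at)
        busy_until = read_at + cost
    return reads


def read_times_queued(messages: Sequence[Message]) -> List[float]:
    """Read times when the receiver only enqueues work for other threads."""
    return [arrival for arrival, _ in messages]


def misjudged(reads: Sequence[float], timeout_s: float) -> bool:
    """True if the server goes longer than timeout_s without reading anything."""
    return any(later - earlier > timeout_s
               for earlier, later in zip(reads, reads[1:]))


if __name__ == "__main__":
    # Heartbeats every 10 s (negligible cost), plus one business message
    # at t = 12 s whose handler takes 45 s.
    msgs = [(0.0, 0.0), (10.0, 0.0), (12.0, 45.0), (20.0, 0.0),
            (30.0, 0.0), (40.0, 0.0), (50.0, 0.0), (60.0, 0.0)]
    print(misjudged(read_times_inline(msgs), timeout_s=30.0))  # True
    print(misjudged(read_times_queued(msgs), timeout_s=30.0))  # False
    print(misjudged(read_times_inline(msgs), timeout_s=60.0))  # False
```

With inline processing, the heartbeats that arrive at 20, 30, 40, and 50 s are all read at 57 s, leaving a 45 s gap that exceeds the 30 s timeout even though the client never stopped sending. Raising the timeout to 60 s hides the gap, which corresponds to solution (2) below, but the slow handler is still there.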

If this is the cause of the heartbeat timeouts, the corresponding solutions are:

(1) Identify the messages whose processing is very time-consuming, then optimize the processing logic to speed it up.

(2) Set the heartbeat timeout interval to a larger value, or turn off heartbeat detection.

(3) Change the message processing to asynchronous mode.

(4) Change the server's message processing model to TaskQueue mode, which completely avoids misjudgments caused by long processing times.

Obviously, solution (1) is the best, because it addresses the root cause.
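Solutions (3) and (4) rest on the same idea: keep the receive path short, so that a client's last-seen time keeps advancing even while a slow handler runs. The sketch below illustrates that idea with Python's `queue` and `threading` modules. It is not the framework's TaskQueue implementation; the `TaskQueueServer` class, the `(client_id, payload)` message format, and the worker count are hypothetical.

```python
import queue
import threading
import time
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical message format: (client_id, payload).
Message = Tuple[str, str]


class TaskQueueServer:
    """Receive path only records liveness and enqueues; workers do the work."""

    def __init__(self, handler: Callable[[Message], None],
                 workers: int = 2) -> None:
        self.handler = handler
        self.tasks: "queue.Queue[Optional[Message]]" = queue.Queue()
        self.last_seen: Dict[str, float] = {}
        self._lock = threading.Lock()
        self._threads: List[threading.Thread] = [
            threading.Thread(target=self._work, daemon=True)
            for _ in range(workers)
        ]
        for thread in self._threads:
            thread.start()

    def on_receive(self, msg: Message) -> None:
        # Cheap and non-blocking: refresh the heartbeat timestamp first,
        # then hand the message to the worker pool.
        with self._lock:
            self.last_seen[msg[0]] = time.monotonic()
        self.tasks.put(msg)

    def _work(self) -> None:
        while True:
            msg = self.tasks.get()
            try:
                if msg is None:          # shutdown sentinel
                    return
                self.handler(msg)
            except Exception as exc:     # keep the worker alive on errors
                print(f"handler failed for {msg!r}: {exc}")
            finally:
                self.tasks.task_done()

    def shutdown(self) -> None:
        for _ in self._threads:
            self.tasks.put(None)
        for thread in self._threads:
            thread.join()


if __name__ == "__main__":
    def handle(msg: Message) -> None:
        if msg[1] == "slow":
            time.sleep(1.0)              # stands in for a time-consuming message
        print("handled", msg)

    server = TaskQueueServer(handle, workers=1)
    server.on_receive(("client-a", "slow"))
    server.on_receive(("client-a", "heartbeat"))  # returns immediately
    print("last seen:", server.last_seen["client-a"])
    server.tasks.join()
    server.shutdown()
```

With a single worker, messages are handled in arrival order; with several workers, messages from the same client may be handled out of order, which matters if the business logic depends on ordering.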

3. Server network topology: firewalls, routers, network security monitoring, and other related hardware and software

If the preceding possibilities have been ruled out (for example, batch drops still occur even after switching to TaskQueue mode), then almost only one possibility remains: the server really did not receive any messages from these clients during the heartbeat timeout interval. Most likely, the client messages are being blocked by a firewall, a router, or hardware or software used for network security monitoring.

At this point, you need professional operations or network administration staff to help troubleshoot, for example:

(1) Run the netstat command on the server to view the status of connections on the target port.

(2) Run a packet capture tool on the server and check whether data from the clients arrives on the target port.

(3) Analyze the server's network topology: starting from the server, check the settings of the firewalls, routers, network security monitoring, and other related hardware and software, and run targeted tests.
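The output of steps (1) and (2) is easier to interpret with a small helper script. The sketch below is hypothetical: `connection_states` assumes the Linux or Windows `netstat -an` layout, where addresses are written as `host:port` (macOS writes `host.port` and is not handled), and `max_silence` assumes you have exported `(timestamp, client)` pairs for the packets seen on the target port from your capture tool (for example tcpdump or Wireshark). Port 4530 and the addresses are made up for the example.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, Iterable, List, Tuple


def connection_states(netstat_output: str, port: int) -> Counter:
    """Count the states of TCP connections whose local port is `port`.

    Assumes each TCP line contains the local and foreign addresses as
    host:port tokens (local first) and ends with the connection state.
    """
    states: Counter = Counter()
    for line in netstat_output.splitlines():
        tokens = line.split()
        if not tokens or not tokens[0].lower().startswith("tcp"):
            continue                      # skip headers and UDP lines
        addresses = [t for t in tokens if ":" in t]
        if len(addresses) < 2:
            continue
        local_port = addresses[0].rsplit(":", 1)[1]
        if local_port == str(port):
            states[tokens[-1]] += 1
    return states


def max_silence(packets: Iterable[Tuple[float, str]]) -> Dict[str, float]:
    """Longest gap (seconds) between consecutive captured packets per client."""
    seen: DefaultDict[str, List[float]] = defaultdict(list)
    for timestamp, client in packets:
        seen[client].append(timestamp)
    result: Dict[str, float] = {}
    for client, stamps in seen.items():
        stamps.sort()
        gaps = [b - a for a, b in zip(stamps, stamps[1:])]
        result[client] = max(gaps, default=0.0)
    return result


SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:4530            0.0.0.0:*               LISTEN
tcp        0      0 10.0.0.5:4530           10.0.0.9:51234          ESTABLISHED
tcp        0      0 10.0.0.5:4530           10.0.0.7:40000          CLOSE_WAIT
tcp        0      0 10.0.0.5:22             10.0.0.9:50000          ESTABLISHED
"""

if __name__ == "__main__":
    print(connection_states(SAMPLE, 4530))
    packets = [(0.0, "10.0.0.9"), (10.0, "10.0.0.9"), (50.0, "10.0.0.9"),
               (0.0, "10.0.0.7"), (10.0, "10.0.0.7")]
    print(max_silence(packets))   # {'10.0.0.9': 40.0, '10.0.0.7': 10.0}
```

If the capture shows gaps longer than the heartbeat timeout for the dropped clients, their packets never reached the server, which points to the network path. If packets arrived regularly but the server still timed those clients out, look again at causes 1 and 2 on the server side.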

The above analysis should lead you to the root cause of the problem. If it does not, leave me a message and we can discuss it further.
