Zabbix Monitoring: Mass "Zabbix agent on * * * * unreachable for 5 minutes" Alarms

Source: Internet
Author: User

On September 4 and September 9, the company's Zabbix monitoring platform raised two large-scale alarms. Both were "Zabbix agent on * * * * unreachable for 5 minutes", and each time the alarm fired for every monitored host.

Fault description: every monitored host raised the alarm, and the data on all graphs was interrupted.

Action: The first step was to run the zabbix_get command on the Zabbix server. The data could still be retrieved. I then put the "time" command in front of it, and the data came back within a fairly short time.
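The timing check can also be scripted. The following is a minimal sketch in Python, not the procedure the author actually ran: the helper name time_command, the agent address 192.0.2.10, and the item key agent.ping are illustrative assumptions. It only wraps a command and measures how long it takes to answer.

```python
import subprocess
import time


def time_command(argv, timeout=30.0):
    """Run a command and return (elapsed_seconds, return_code, stdout)."""
    start = time.monotonic()
    result = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    elapsed = time.monotonic() - start
    return elapsed, result.returncode, result.stdout.strip()


# Example call (hypothetical agent address and item key, replace with real ones):
#   elapsed, code, output = time_command(
#       ["zabbix_get", "-s", "192.0.2.10", "-k", "agent.ping"])
#   print(f"exit={code} elapsed={elapsed:.3f}s output={output!r}")
```

A short elapsed time with a valid value, as in the case described here, suggests the agent itself is answering and the problem lies elsewhere on the path.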

Results: After 10 minutes, all the alarms cleared or recovered almost at once, the data on every graph was restored, and the graphs were continuous again.

Detection: After the alarms cleared, the first thing I did was check the logs on the Zabbix server side.

LOG: cannot send list of active checks to [* * *]: host [* * *] not found

item "vfs.fs.size[c:,used]" on host "* * *" failed: first network error, wait for ... seconds

The server log looked like the above. When I logged in to a client, its log mostly showed failed connections to port 10051, system interruptions, and similar alerts. I searched Baidu and Bing at length but found nothing that solved my problem, so I had to ask my colleagues.

I think the problem is finally solved today. Since we could not find the cause anywhere else, we ended up checking the system log, /var/log/messages, where two kinds of errors had been reported repeatedly:

    1. 16:17:24 localhost kernel: nf_conntrack: table full, dropping packet.

    2. localhost rsyslogd-2177: imuxsock begins to drop messages from pid 21607 due to rate-limiting

The first message says that the connection tracking table is full and packets are being dropped. The second says that rsyslog is discarding log messages because of rate limiting.
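To see how often these two messages occur, the system log can be scanned with a small script. This is only a sketch under assumptions: the function name count_errors and the pattern names are made up for illustration, and the patterns are written to match the two messages quoted above regardless of spacing or letter case.

```python
import re

# Patterns for the two messages quoted from the system log.
PATTERNS = {
    "conntrack_full": re.compile(
        r"nf_conntrack:\s*table full, dropping packet", re.IGNORECASE),
    "rsyslog_rate_limit": re.compile(
        r"imuxsock begins to drop messages from pid \d+ due to rate-limiting",
        re.IGNORECASE),
}


def count_errors(lines):
    """Count how many lines match each known error pattern."""
    counts = {name: 0 for name in PATTERNS}
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


# Example call (reading the system log usually requires root):
#   with open("/var/log/messages", errors="replace") as log:
#       print(count_errors(log))
```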

I did not pay attention to these at first, but my ops boss thought they were related to the Zabbix data. We kept digging and eventually found that the iptables firewall service was actually running. As we all know, the firewall and SELinux are generally turned off when Zabbix is first set up. Here, the firewall was evidently intercepting the traffic, so its connection tracking table filled up and packets started to be dropped.
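How full the connection tracking table is can be read from the kernel directly. The sketch below assumes the nf_conntrack module exposes its counters as nf_conntrack_count and nf_conntrack_max under /proc/sys/net/netfilter, which is where Linux kernels that log the nf_conntrack message normally put them; the function name conntrack_usage is hypothetical.

```python
from pathlib import Path


def conntrack_usage(base="/proc/sys/net/netfilter"):
    """Return (count, maximum, ratio), or None if the counters are absent.

    The counters are absent when connection tracking is not loaded.
    """
    base = Path(base)
    try:
        count = int((base / "nf_conntrack_count").read_text().strip())
        maximum = int((base / "nf_conntrack_max").read_text().strip())
    except FileNotFoundError:
        return None
    ratio = count / maximum if maximum else 0.0
    return count, maximum, ratio


# Example call:
#   usage = conntrack_usage()
#   if usage is None:
#       print("connection tracking is not loaded")
#   else:
#       print("entries: %d of %d (%.0f%% full)" % (usage[0], usage[1], usage[2] * 100))
```

A ratio close to 1.0 would be consistent with the "table full, dropping packet" message.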

Then, in the root user's .bash_history file under /root, I saw that someone had used the iptables -L command to view the firewall rules. Finally, my boss told me:

Remember: do not use iptables commands (such as iptables -nL) to check the current status while the firewall is stopped! Doing so starts the firewall with an empty rule set. Although nothing is blocked, the state of every connection is still recorded, which wastes resources, hurts performance, and may cause the firewall to actively drop packets!
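One way to check whether this warning applies to a given machine is to list the loaded kernel modules before and after running the iptables command and see whether any connection tracking module has appeared. The sketch below reads /proc/modules, where the first field of each line is the module name; the function name conntrack_modules is an assumption made for illustration.

```python
def conntrack_modules(path="/proc/modules"):
    """Return the sorted names of loaded modules that contain 'conntrack'."""
    try:
        with open(path) as handle:
            names = [line.split()[0] for line in handle if line.strip()]
    except FileNotFoundError:
        return []
    return sorted(name for name in names if "conntrack" in name)


# Example call:
#   print(conntrack_modules())  # an empty list means no tracking module is loaded
```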

All right. Finally, after the firewall was shut down, the "16:17:24 localhost kernel: nf_conntrack: table full, dropping packet" message no longer appeared.

