How to change the default check time for Nagios monitoring

Last Update:2016-05-18 Source: Internet

Author: User

/usr/local/nagios/etc/nagios.cfg:
interval_length is the length of one time unit in seconds. It defaults to 60, so one time unit is 1 minute.
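
As a small illustration of how the time unit works, the sketch below converts an interval given in time units into seconds. The helper name units_to_seconds is hypothetical and is not part of Nagios; only the interval_length default of 60 comes from the text above.

```python
# Hypothetical helper (not a Nagios API): convert time units to seconds.
INTERVAL_LENGTH = 60  # seconds per time unit, the nagios.cfg default described above


def units_to_seconds(units, interval_length=INTERVAL_LENGTH):
    """Return the number of seconds covered by `units` time units."""
    return units * interval_length


print(units_to_seconds(3))  # 180, i.e. a 3-unit check interval is 3 minutes
```

With the default setting, every interval value in the object definitions can therefore be read directly as minutes.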

/usr/local/nagios/etc/objects/services.cfg: (this file does not exist in newer versions)
normal_check_interval: the interval between regular checks, which defaults to 3 time units
check_interval: the same as normal_check_interval, but can only be used in Nagios 3.x
retry_check_interval: the interval between retry checks
max_check_attempts: the number of consecutive failed checks after which an alarm is raised

About the three parameters max_check_attempts, normal_check_interval, and retry_check_interval.
First, two concepts need explaining:
1. The soft state: the monitored item is abnormal and is still within the retry_check_interval recheck period.
2. The hard state: the monitored item has been abnormal for max_check_attempts checks in a row. Any state other than these two we will simply call "normal".
Let's see how Nagios performs state detection and alerting with the following parameters:

max_check_attempts 3
normal_check_interval 3
retry_check_interval 2
notification_interval 3

First, Nagios checks the service every 3 minutes. When a check finds the service abnormal, the service goes straight into the soft state (1/3 SOFT). Nagios then rechecks it every 2 minutes (retry_check_interval), up to two more times, for three checks in total (max_check_attempts). If both rechecks also find the service abnormal, it enters the hard state. Once in the hard state, Nagios goes back to checking the service every 3 minutes (normal_check_interval), the same as in the normal state, and sends an alarm every 3 minutes (notification_interval).
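
The sequence above can be sketched as a short simulation. This is only an illustration of the description in this article, assuming the service stays down after the first failed check; the function failure_timeline is a hypothetical name and does not reproduce Nagios's actual scheduler.

```python
# Illustrative sketch, not Nagios code: list the checks from the first failure
# until the hard state, assuming the service stays down the whole time.
def failure_timeline(max_check_attempts=3, retry_check_interval=2):
    """Return (minute, attempt, state_type) tuples, minute 0 = first failed check."""
    timeline = []
    minute = 0
    for attempt in range(1, max_check_attempts + 1):
        # The state only becomes HARD on the final attempt.
        state_type = "HARD" if attempt == max_check_attempts else "SOFT"
        timeline.append((minute, attempt, state_type))
        minute += retry_check_interval
    return timeline


print(failure_timeline())
# [(0, 1, 'SOFT'), (2, 2, 'SOFT'), (4, 3, 'HARD')]
```

With the example values, the hard state (and so the first alarm) is reached 4 minutes after the first failed check.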

Note: Changes to these parameters do not take effect immediately. Restart Nagios first, then wait for the next check to complete. After that, Nagios calculates check times and alarm counts from the new parameters.

For an email alert to be sent once an alarm is raised, the following conditions must be met:

The service definition contains notifications_enabled=1, and the contacts for that service have service_notification_commands defined.

The command named in service_notification_commands comes from commands.cfg, which defines the commands used to send messages.

The same applies to host alerts.

There are several possible reasons for not receiving a message:

The mail was rejected. The mail log will show this.

Whether Nagios sends an alert message also depends on several parameters in the contact.cfg (or hosts.cfg) configuration file. The key ones are described below:

notifications_enabled: Whether notifications are enabled. 1 is on, 0 is off. When this option is 0, Nagios will not send any mail.
contact_groups: The contact groups that receive notification messages. Make sure your email address is filled in correctly and that you belong to the group.
notification_interval: The minimum interval between repeated notifications. The default interval is 60 minutes. If this value is set to 0, Nagios sends the alert notification only once and does not repeat it.
notification_period: The time period during which alarm notifications are sent. Set critical hosts and services to 7x24, and ordinary hosts and services to working hours (worktime). If a problem occurs outside the defined period, Nagios will not send an alert notification, no matter what happens.
notification_options: The states of the monitored host (object) for which an alert notification is sent. The available options are:

(1) w: warning
(2) u: unknown
(3) c: critical (a critical threshold was reached)
(4) d: down (the host is down)
(5) r: recovery (the status returned to OK)
(6) f: flapping (the state is changing back and forth too frequently)
(7) n: none (no alarm notifications are sent)
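
To make the option letters easier to read in an existing configuration, here is a minimal sketch that expands a notification_options value into the state names listed above. The mapping and the function describe_options are illustrative helpers written for this article, not part of Nagios.

```python
# Illustrative lookup table built from the option list above (not a Nagios API).
OPTION_MEANINGS = {
    "w": "warning",
    "u": "unknown",
    "c": "critical",
    "d": "down",
    "r": "recovery",
    "f": "flapping",
    "n": "none",
}


def describe_options(value):
    """Expand a comma-separated notification_options value, e.g. 'w,c,r'."""
    letters = [item.strip().lower() for item in value.split(",") if item.strip()]
    return [OPTION_MEANINGS[letter] for letter in letters]


print(describe_options("w,u,c,r"))
# ['warning', 'unknown', 'critical', 'recovery']
```

An unrecognized letter raises a KeyError in this sketch, which is a convenient way to spot a typo in the option string.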

Nagios monitoring and alarm interval:

max_check_attempts
check_interval
retry_interval
notification_interval

In the OK state, Nagios checks at the interval defined by check_interval. After a problem occurs, it switches to checking at retry_interval until max_check_attempts is reached, which triggers the first alarm. It then returns to checking at check_interval and sends alarms at the interval defined by notification_interval. When the service recovers, Nagios sends the OK message at the next check_interval point, which completes the alarm cycle.

Special cases:
1. If max_check_attempts is set to 1, the alarm is raised as soon as the problem is detected, with no retries.
2. If notification_interval is set to 0, the alarm is sent only once and is not resent.
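
The alarm cycle and the notification_interval special case can be illustrated with a simplified sketch. It assumes notifications go out at exact multiples of notification_interval after the hard state is reached and stop at recovery, which is a simplification of real Nagios behavior; notification_minutes is a hypothetical helper name.

```python
# Simplified sketch, not Nagios code: minutes at which problem alarms are sent.
def notification_minutes(hard_minute, notification_interval, recovery_minute):
    """List alarm times from the hard state until just before recovery.

    Assumes hard_minute < recovery_minute. An interval of 0 means the
    alarm is sent once and never repeated.
    """
    if notification_interval == 0:
        return [hard_minute]
    return list(range(hard_minute, recovery_minute, notification_interval))


print(notification_minutes(4, 3, 15))  # [4, 7, 10, 13]
print(notification_minutes(4, 0, 15))  # [4]
```

The first call repeats the alarm every 3 minutes until the service recovers at minute 15; the second shows the single alarm produced when notification_interval is 0.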

Using escalations to limit the number of Nagios alarms

Nagios is a very powerful monitoring tool, especially its alarm function. Many alarm channels have been implemented around it, such as the China Mobile 139 mailbox, Fetion, and MSN. However, if a server fault is not resolved promptly, Nagios keeps sending alarm messages, which quickly becomes a headache. The following method solves the problem of excessive Nagios alarms.

vi escalations.cfg

"Escalation" means to adjust automatically, to keep increasing, to rise step by step. The purpose of this configuration file is this: when a service has not recovered after a certain number of alarms, the alarm interval is changed and the alarm messages are sent to designated contacts.

The contents are:

define hostescalation{
    host_name               www-server    ; monitored host name, must match hosts.cfg
    first_notification      4             ; notification number at which the escalation starts
    last_notification       0             ; notification number at which it ends (0 = until recovery)
    notification_interval   30            ; notification interval (minutes)
    contact_groups          sysadmin
}

Note: From the 4th alarm until the server recovers, alarms are sent to the contacts in the sysadmin group, at a rate of one message every 30 minutes.
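
The rule for which notifications an escalation covers can be sketched as follows. This is a minimal illustration of the first_notification and last_notification range described above, with last_notification 0 treated as "no upper limit"; escalation_applies is a hypothetical helper, not a Nagios function.

```python
# Illustrative sketch, not Nagios code: does an escalation cover notification N?
def escalation_applies(notification_number, first_notification, last_notification):
    """True if the notification falls inside the escalation range.

    last_notification == 0 means the escalation stays active until recovery.
    """
    if notification_number < first_notification:
        return False
    return last_notification == 0 or notification_number <= last_notification


for n in (3, 4, 10):
    print(n, escalation_applies(n, first_notification=4, last_notification=0))
# 3 False
# 4 True
# 10 True
```

With first_notification 4 and last_notification 0, the first three alarms go to the normal contacts and every later one follows the escalation settings.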

define serviceescalation{
    host_name               www-server              ; monitored host name, must match hosts.cfg
    service_description     check_http,check_jetty  ; monitored service names, must match services.cfg
    first_notification      4
    last_notification       0
    notification_interval   30
    contact_groups          nt-admins,managers,everyone
}

Save the file.

Modify nagios.cfg

vi nagios.cfg
Add:
cfg_file=/etc/nagios/objects/escalations.cfg

Check that the Nagios configuration is correct:
/usr/sbin/nagios -v /etc/nagios/nagios.cfg

Restart the Nagios service:
service nagios restart

Test:

After Nagios has restarted, stop the corresponding service on the monitored test machine and confirm that the alarm messages are sent to the different mailboxes according to the settings.

Summary

The official definition of escalations is an extension of notifications that makes them more flexible and convenient. The method used here is a small trick: from the fourth alarm onward, all messages are sent to my company mailbox until the server recovers (the recovery message is still sent to the phone), which limits the number of alarm messages sent to the phone. In this way, escalations successfully limit the number of Nagios alarms.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
