/usr/local/nagios/etc/nagios.cfg:
interval_length sets the length of one time unit in seconds. The default is 60, so one time unit is 1 minute.
/usr/local/nagios/etc/objects/services.cfg: (This file does not exist in the new version)
normal_check_interval is the interval between regular checks. The default is 3 time units.
check_interval means the same as normal_check_interval, but is only available in Nagios 3.x and later.
retry_check_interval is the interval between rechecks after a check has failed.
max_check_attempts is the number of consecutive failed checks after which an alert is sent.
The three parameters max_check_attempts, normal_check_interval, and retry_check_interval work together.
Two concepts need explaining first:
1. Soft state: the monitored item has failed a check and is still within the retry_check_interval retry period.
2. Hard state: the monitored item has stayed abnormal for max_check_attempts consecutive checks. Any state other than these two is referred to here as "normal".
Here is how Nagios performs state detection and alerting with the following settings:
max_check_attempts 3
normal_check_interval 3
retry_check_interval 2
notification_interval 3
First, Nagios checks the service every 3 minutes. When a check finds the service abnormal, the service enters the soft state (1/3 soft). Nagios then rechecks it twice at 2-minute intervals (retry_check_interval), for 3 checks in total, which reaches max_check_attempts. If both rechecks also find the service abnormal, it enters the hard state. In the hard state, Nagios goes back to checking the service every 3 minutes (normal_check_interval), the same as in the normal state, and sends an alert every 3 minutes (notification_interval).
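The settings above could be written into a service definition roughly as follows. This is a minimal sketch, not the author's actual configuration: the host name www-server, the service description, the check_http command, the 24x7 time period, and the sysadmin contact group are assumptions for illustration, and the 3.x names check_interval and retry_interval are used in place of normal_check_interval and retry_check_interval. The comments trace the timeline, assuming interval_length is 60 and the check at minute 0 is the first one to fail.

```
define service{
    host_name               www-server   ; assumed host name
    service_description     HTTP         ; assumed service name
    check_command           check_http   ; assumed check command
    contact_groups          sysadmin     ; assumed contact group
    check_period            24x7         ; assumed time period name
    notification_period     24x7         ; assumed time period name
    max_check_attempts      3            ; 3 failed checks lead to the hard state
    check_interval          3            ; regular check every 3 time units
    retry_interval          2            ; recheck every 2 time units while soft
    notification_interval   3            ; repeat the alert every 3 time units
}

# Timeline while the service stays down:
#   minute 0   check fails    -> soft state, attempt 1/3
#   minute 2   recheck fails  -> soft state, attempt 2/3
#   minute 4   recheck fails  -> hard state, attempt 3/3, first alert
#   minute 7   regular check  -> still hard, alert repeated
#   minute 10  regular check  -> still hard, alert repeated
```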
Note: Changes to these parameters do not take effect immediately. Restart Nagios first, then wait for the next check to complete. After that, Nagios calculates check times and the number of alerts from the new parameters.
For Nagios to send an email when an alert occurs, the following conditions must be met:
The service definition has notifications_enabled set to 1, and the contacts of the service have service_notification_commands defined.
The commands named in service_notification_commands come from commands.cfg, which defines the commands used to send messages.
Host alerts work the same way.
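As an illustration of these conditions, a contact and the mail command it refers to might look like the sketch below. The contact name, email address, subject line, and the path /bin/mail are assumptions, the 24x7 time period is assumed to be defined elsewhere, and notify-host-by-email is assumed to be defined in commands.cfg in the same way as the service command shown here.

```
# commands.cfg: the command that actually sends the mail
define command{
    command_name    notify-service-by-email
    command_line    /usr/bin/printf "%b" "$NOTIFICATIONTYPE$: $HOSTNAME$/$SERVICEDESC$ is $SERVICESTATE$\n$SERVICEOUTPUT$\n" | /bin/mail -s "Nagios service alert" $CONTACTEMAIL$
}

# contacts file: the contact that receives the mail
define contact{
    contact_name                    admin               ; hypothetical contact
    email                           admin@example.com   ; hypothetical address
    service_notification_period     24x7
    service_notification_options    w,u,c,r
    service_notification_commands   notify-service-by-email
    host_notification_period        24x7
    host_notification_options       d,r
    host_notification_commands      notify-host-by-email
}
```

If mail still does not arrive with a setup like this, running the command_line by hand with the macros replaced by real values is a quick way to separate a Nagios problem from a mail delivery problem.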
There are several possible reasons for not receiving a message:
The mail was rejected. The mail log will show this.
Whether Nagios sends an alert message also depends on several parameters in the contact.cfg (or hosts.cfg) configuration file. The key ones are described below:
notifications_enabled: Whether notifications are enabled. 1 enables them, 0 disables them. When this option is 0, Nagios will not send any mail.
contact_groups: The contact groups that receive notifications. Make sure your email address is entered correctly and that your contact is in the group.
notification_interval: The minimum interval between repeated notifications. The default interval is 60 minutes. If this value is set to 0, Nagios sends the alert notification only once and does not repeat it.
notification_period: The time period during which alert notifications are sent. Set it to 7x24 for critical hosts and services, and to working hours (worktime) for ordinary ones. If a problem occurs outside the defined period, Nagios will not send an alert notification, no matter what happens.
notification_options: The states for which the monitored host or service sends an alert notification. The available options are:
(1) w: warning
(2) u: unknown
(3) c: critical (a critical threshold has been reached)
(4) d: down (the host is down)
(5) r: recovery (the state has returned to OK)
(6) f: flapping (the state is changing back and forth too frequently)
(7) n: none (no alert notifications are sent)
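Put together, the notification parameters described above might appear in a configuration like the following sketch. The worktime hours, the host and service names, and the sysadmin group are assumptions, and the remaining required service directives are assumed to come from a generic-service template.

```
# A working-hours time period (the hours are illustrative only)
define timeperiod{
    timeperiod_name     worktime
    alias               Working hours
    monday              09:00-18:00
    tuesday             09:00-18:00
    wednesday           09:00-18:00
    thursday            09:00-18:00
    friday              09:00-18:00
}

define service{
    use                     generic-service   ; assumed template
    host_name               www-server        ; assumed host name
    service_description     HTTP              ; assumed service name
    check_command           check_http        ; assumed check command
    notifications_enabled   1                 ; 0 would suppress all notifications
    contact_groups          sysadmin          ; who receives the notifications
    notification_interval   60                ; repeat every 60 time units
    notification_period     worktime          ; nothing is sent outside this period
    notification_options    w,u,c,r           ; warning, unknown, critical, recovery
}
```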
Nagios monitoring and alert intervals are controlled by four parameters:
max_check_attempts
check_interval
retry_interval
notification_interval
In the OK state, Nagios checks at the interval defined by check_interval. After a problem occurs, it switches to checking at retry_interval, up to max_check_attempts times. Reaching max_check_attempts triggers the first alert, and Nagios returns to checking at check_interval while sending alerts at the interval defined by notification_interval. When the service recovers, Nagios sends an OK message (for example by SMS) at the next check_interval point, which completes the alert cycle.
Special cases:
1. If max_check_attempts is 1, the alert is sent as soon as a problem is detected, with no retries.
2. If notification_interval is 0, the alert is sent only once and is not resent.
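Both special cases can be combined in one service, as in this sketch. The host, service, and command names are assumptions, and the other required directives are assumed to come from a generic-service template.

```
define service{
    use                     generic-service   ; assumed template
    host_name               www-server        ; assumed host name
    service_description     HTTP              ; assumed service name
    check_command           check_http        ; assumed check command
    max_check_attempts      1                 ; first failed check alerts at once, no retries
    notification_interval   0                 ; the alert is sent once and never repeated
}
```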
Using escalations to limit the number of Nagios alerts
Nagios is a very powerful monitoring tool, especially its alerting. Many implementations are available online that combine it with channels such as the mobile 139 mailbox, Fetion, and MSN. However, if a server fault is not resolved promptly, Nagios keeps sending alerts, which is a headache. The following method limits the number of alerts Nagios sends.
vi escalations.cfg
Escalation means adjusting automatically and stepping up gradually. The purpose of this configuration file is that, when a service has still not recovered after a certain number of alerts, the alert interval is shortened and the alerts are sent to designated contacts.
The contents are:
define hostescalation{
    host_name               www-server   ; monitored host name, must match hosts.cfg
    first_notification      4            ; nth notification at which this escalation starts
    last_notification       0            ; nth notification at which it ends (0 = until recovery)
    notification_interval   30           ; notification interval (minutes)
    contact_groups          sysadmin
}
Note: From the 4th alert until the server recovers, alerts are sent to the contacts in the sysadmin group at a rate of one message every 30 minutes.
define serviceescalation{
    host_name               www-server             ; monitored host name, must match hosts.cfg
    service_description     check_http,check_jetty ; monitored service names, must match services.cfg
    first_notification      4
    last_notification       0
    notification_interval   30
    contact_groups          nt-admins,managers,everyone
}
Save the file.
Modify nagios.cfg:
vi nagios.cfg
Add:
cfg_file=/etc/nagios/objects/escalations.cfg
Check that the Nagios configuration is correct:
/usr/sbin/nagios -v /etc/nagios/nagios.cfg
Restart the Nagios service:
service nagios restart
Test:
After Nagios has restarted, stop the corresponding service on the monitored test machine and confirm that the alerts are sent to the different mailboxes according to the settings.
Summary
The official definition of escalations is an extension of notifications that makes them more flexible and convenient. The approach used here is a small trick: from the fourth alert onward, all alerts go to my company mailbox until the server recovers (the recovery message is still sent to the phone), which limits the number of alert messages sent to the phone. In this way, escalations successfully limit the number of Nagios alerts.
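The trick described in the summary might be configured along the lines of the sketch below. The contact group names phone-admins and mail-admins are hypothetical, as is the generic-service template. The sketch relies on Nagios using the contact groups of a matching escalation in place of the service's own contact groups.

```
# Notifications 1 to 3 go to the service's own contact group (the phone)
define service{
    use                     generic-service   ; assumed template
    host_name               www-server
    service_description     check_http
    check_command           check_http        ; assumed check command
    contact_groups          phone-admins      ; hypothetical SMS/phone group
}

# From notification 4 until recovery, alerts go to the company mailbox instead
define serviceescalation{
    host_name               www-server
    service_description     check_http
    first_notification      4
    last_notification       0                 ; 0 = stay in effect until recovery
    notification_interval   30
    contact_groups          mail-admins       ; hypothetical company-mail group
}
```

With this split, the phone receives at most three problem messages per incident, and everything after that lands in the mailbox every 30 minutes.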
How to change the default check time for Nagios monitoring