Nagios Study Notes

Source: Internet
Author: User
Document directory
  • Contact Configuration
  • Contact Group Configuration
  • Add Host Configuration
  • Add service configuration
  • Register the plug-in using Nagios
  • Plug-in service configuration
  • Write a plug-in using Python
Introduction

Nagios is a monitoring system that tracks the running status of systems and networks. It can monitor specified local or remote hosts and services and send notifications when something goes wrong. It runs on Linux/Unix platforms and provides an optional browser-based web interface where system administrators can view network status, system problems, and logs.

The key to understanding Nagios is that it does not monitor and track "common" measurement data such as CPU usage. Instead, it reduces all information to three statuses: "working", "suspicious", and "faulty". This helps operators focus on the most important and critical issues, based on predefined and configurable criteria.

Main features of Nagios:

  • Monitors network services (SMTP, POP3, HTTP, NNTP, ping, etc.)
  • Monitors host resources (processes, disks, etc.)
  • A simple plug-in design makes it easy to extend the monitoring functions of Nagios.
  • Checks services and other monitored items concurrently
  • Sends error notifications (via email, pager, or other user-defined methods)
  • Lets you specify custom event handlers
  • An optional browser-based web interface allows system administrators to view network status, system problems, and logs.
  • Lets you view monitoring information on a mobile phone

 

Install

Only installation from source code is covered here.

Required build tools:

  • gcc
  • make
  • autoconf
  • automake

Required libraries:

  • libgd
  • OpenSSL

Many SNMP-related plug-ins also require Perl and the Net::SNMP package.

After Nagios is installed and configured, you can access it at the default URL, http://localhost/nagios.

Configure Nagios

By default, all Nagios configuration files are located in the /etc/nagios directory. The configuration can be split into multiple files, each covering a different part of the configuration. The 3.3.1 configuration directory structure is as follows:

/etc/nagios/
    nagios.cfg            # main Nagios configuration file, which references the other configuration files
    cgi.cfg               # configuration for the Nagios web interface
    resource.cfg          # global variable definitions
    objects/
        commands.cfg      # command and plug-in registration
        contacts.cfg      # contacts and contact groups
        localhost.cfg     # sample configuration for monitoring local resources
        printer.cfg       # sample configuration for a network printer
        switch.cfg
        templates.cfg     # configuration templates
        timeperiods.cfg
        windows.cfg       # sample configuration for a Windows host
        hosts.cfg         # custom definitions of all hosts and host groups
        services.cfg      # custom definitions of all services
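Since the main configuration file references the other files, it can help to list which ones it actually pulls in. The following is a minimal sketch, assuming the references use the cfg_file= and cfg_dir= directives; the function name referenced_configs and the sample text (including its paths) are hypothetical and only illustrate the idea.

```python
def referenced_configs(main_cfg_text):
    """Return (directive, path) pairs for cfg_file= and cfg_dir= lines."""
    refs = []
    for raw_line in main_cfg_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() in ("cfg_file", "cfg_dir"):
            refs.append((key.strip(), value.strip()))
    return refs


# Hypothetical excerpt of a main configuration file (paths are assumptions).
SAMPLE = """\
# Object configuration files
cfg_file=/etc/nagios/objects/commands.cfg
cfg_file=/etc/nagios/objects/contacts.cfg
cfg_dir=/etc/nagios/conf.d
log_file=/var/log/nagios/nagios.log
"""

for directive, path in referenced_configs(SAMPLE):
    print(directive, path)
```

In practice you would pass the contents of your own nagios.cfg instead of SAMPLE.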

Contact Configuration

The first components to set up are contacts and contact groups. Contacts are the people who receive notifications when a host or service goes down. By default, Nagios provides pager and email notification methods. With extensions, you can also send notifications through Jabber and many other channels, which is convenient in some cases.

Contacts are stored in the contacts.cfg file and defined as follows:

define contact{
    contact_name                    jdoe
    alias                           John Doe
    service_notification_commands   notify-by-email
    host_notification_commands      host-notify-by-email
    email                           john.doe@yourcompany.com
}

 

Contact Group Configuration

Contacts are organized into groups: when the status of a host or service changes, Nagios does not notify a specific person directly but notifies the relevant contact group. You can even define the same person several times, each time with a different notification command or address, and then add all of these contacts to the contact groups that the person belongs to.

define contactgroup{
    contactgroup_name   server-admins
    alias               Server Administrators
    members             jdoe,albundy
}
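Contact and contact group definitions share the same shape: a define line, a list of directive/value pairs, and a closing brace. If you maintain many of them, generating the text can reduce typing errors. The sketch below is only an illustration; render_definition is a hypothetical helper, not part of Nagios.

```python
def render_definition(object_type, directives):
    """Format one Nagios object definition as configuration text."""
    lines = ["define %s{" % object_type]
    for name, value in directives:
        lines.append("    %s %s" % (name, value))
    lines.append("}")
    return "\n".join(lines)


# Rebuild the contact group shown above from a list of member names.
members = ["jdoe", "albundy"]
group = render_definition("contactgroup", [
    ("contactgroup_name", "server-admins"),
    ("alias", "Server Administrators"),
    ("members", ",".join(members)),
])
print(group)
```

The same helper works for the other object types in these notes, because it makes no assumption about which directives a type accepts.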


 

Add Host Configuration

The next step is to configure the hosts that Nagios will monitor. Add every host that runs services you want to monitor or whose availability you want to check. Host information is stored in the hosts.cfg configuration file.

The following is an example of a host definition:

define host{
    host_name                      ubuntu_1_2
    alias                          Ubuntu test server
    address                        192.168.1.2
    check_command                  check-host-alive
    max_check_attempts             20
    notifications_enabled          1
    event_handler_enabled          0
    flap_detection_enabled         0
    process_perf_data              1
    retain_status_information      1
    retain_nonstatus_information   1
    notification_interval          60
    notification_period            24x7
    notification_options           d,u,r
}

Add service configuration

The last step of the Nagios configuration is to define services for the configured hosts. This example uses the predefined "ping" Nagios plug-in, which sends Internet Control Message Protocol (ICMP) echo requests to determine whether the host responds.

define service{
    use                    service-template
    host_name              ubuntu_1_2
    service_description    PING
    check_period           24x7
    contact_groups         server-admins
    notification_options   c,r
    check_command          check_ping!300.0,20%!1000.0,60%
}

After completing this configuration, restart the Nagios daemon, wait a few seconds for Nagios to initialize, and then confirm that the PING service appears in the web management interface.
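The two arguments in check_ping!300.0,20%!1000.0,60% are the warning and critical thresholds, each written as a round-trip time in milliseconds followed by a packet loss percentage. The sketch below shows how such a pair could be interpreted. The function names are hypothetical, and the inclusive comparison is an assumption; it does not reproduce the real plug-in's code.

```python
def parse_ping_threshold(text):
    """Split '300.0,20%' into (round_trip_ms, packet_loss_percent)."""
    rta_text, loss_text = text.split(",")
    return float(rta_text), float(loss_text.rstrip("%"))


def classify_ping(rta_ms, loss_pct, warning, critical):
    """Return a status name for one measurement (assumed inclusive limits)."""
    w_rta, w_loss = parse_ping_threshold(warning)
    c_rta, c_loss = parse_ping_threshold(critical)
    # Check the critical limits first so they take precedence over warning.
    if rta_ms >= c_rta or loss_pct >= c_loss:
        return "CRITICAL"
    if rta_ms >= w_rta or loss_pct >= w_loss:
        return "WARNING"
    return "OK"


print(classify_ping(450.0, 0.0, "300.0,20%", "1000.0,60%"))
```

Either value crossing its limit is enough to change the status, which is why a fast but lossy link can still be reported as critical.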

Write a Nagios plug-in

The most exciting aspect of Nagios is how easily you can write your own plug-ins; you only need to follow a few simple guidelines. To run a plug-in, Nagios spawns a child process every time it queries the status of a service, and uses the command's output and exit code to determine the status. The exit codes are as follows:

  • OK (exit code 0): the service is working normally.
  • Warning (exit code 1): the service is in a warning state.
  • Critical (exit code 2): the service is in a critical state.
  • Unknown (exit code 3): the state of the service is unknown.

The last status usually indicates that the plug-in cannot determine the service status. For example, an internal error may occur.
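The mechanism described above (run a command, read its output and exit code) can be imitated in a few lines, which is handy for testing a plug-in before registering it. This is a minimal sketch, not how Nagios itself is implemented; run_check is a hypothetical helper, and treating any exit code outside 0 to 3 as unknown is an assumption of this example.

```python
import subprocess
import sys

STATUS_BY_EXIT_CODE = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def run_check(argv, timeout=10):
    """Run a plug-in command and return (status_name, first_output_line)."""
    try:
        result = subprocess.run(
            argv, capture_output=True, text=True, timeout=timeout
        )
    except (OSError, subprocess.TimeoutExpired) as error:
        # The command could not be started or did not finish in time.
        return "UNKNOWN", str(error)
    # Assumption: unexpected exit codes are reported as UNKNOWN.
    status = STATUS_BY_EXIT_CODE.get(result.returncode, "UNKNOWN")
    lines = result.stdout.splitlines()
    return status, (lines[0] if lines else "")


# Stand-in for a real plug-in: prints one line and exits with code 1.
fake_plugin = [
    sys.executable,
    "-c",
    "import sys; print('DEMO WARNING: value is 3'); sys.exit(1)",
]
print(run_check(fake_plugin))
```

To try it on a real plug-in, replace fake_plugin with the path and arguments of the script you want to test.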

The following sample Python script checks the UNIX load average. It treats a load of 2.0 or more as a warning state and 5.0 or more as a critical state. These values are hard-coded, and the script always uses the load average of the last minute.

#!/usr/bin/env python3
import os
import sys

# Load averages over the last 1, 5, and 15 minutes; only the first is used.
(d1, d2, d3) = os.getloadavg()
if d1 >= 5.0:
    print("GETLOADAVG CRITICAL: Load average is %.2f" % d1)
    sys.exit(2)
elif d1 >= 2.0:
    print("GETLOADAVG WARNING: Load average is %.2f" % d1)
    sys.exit(1)
else:
    print("GETLOADAVG OK: Load average is %.2f" % d1)
    sys.exit(0)

After writing this small executable plug-in, register it with Nagios and create a service definition that checks the load average.

This is also very simple: create a file named /etc/nagios-plugins/config/mygetloadavg.cfg with the command definition below, then add a service to the services.cfg file as in the following example. Remember that localhost must be defined in the hosts.cfg configuration file.

Register the plug-in using Nagios

define command{
    command_name   check_mygetloadavg
    command_line   /path/to/check_getloadavg
}

Plug-in service configuration

define service{
    use                    service-template
    host_name              localhost
    service_description    LoadAverage
    check_period           24x7
    contact_groups         server-admins
    notification_options   c,r
    check_command          check_mygetloadavg
}

Write a complete plug-in

The preceding example illustrates the limitation of a "hard-coded" plug-in: it does not support runtime configuration. In practice, it is better to create a configurable plug-in. That way, you create and maintain a single plug-in, register it once with Nagios, and pass parameters to customize the warning and critical levels for each situation. The next example also includes a usage message, which is very valuable for plug-ins that are used or maintained by several different developers or administrators.

Another good practice is to catch all exceptions and report the unknown status back, so that Nagios can manage notifications about this situation correctly. A plug-in that lets an exception "fall through" usually exits with return value 1, which Nagios interprets as a warning state. Make sure that your plug-in distinguishes correctly between warning and unknown. This matters because notifications for the two states are often configured differently; for example, at least some warning notifications may be disabled while unknown results are treated as errors.
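One way to apply this advice is to route every check through a small wrapper that converts any unexpected exception into the unknown status. The following is a sketch under that assumption; run_plugin, main, and broken_check are hypothetical names introduced only for this illustration.

```python
import sys

UNKNOWN = 3


def run_plugin(check, name="CHECK"):
    """Call check() and return (exit_code, message) without raising."""
    try:
        code, message = check()
    except Exception as error:
        # Without this handler, Python would exit with code 1 (warning).
        return UNKNOWN, "%s UNKNOWN: %s" % (name, error)
    return code, message


def main(check, name):
    """Entry point for a real plug-in: print the message, then exit."""
    code, message = run_plugin(check, name)
    print(message)
    sys.exit(code)


def broken_check():
    """Stand-in for a check whose data source fails."""
    raise RuntimeError("sensor not readable")


print(run_plugin(broken_check, "DEMO"))
```

Keeping the logic in run_plugin and the exit call in main makes the status mapping easy to test, because the test never terminates the interpreter.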

Write a plug-in using Python

Applying the preceding recommendations (runtime parameterization, a usage message, and improved exception handling) makes the source code of the sample plug-in several times longer than the previous version. In return, errors are handled more safely and the plug-in can be reused in a wider range of situations.

#!/usr/bin/env python3

import getopt
import os
import sys


def usage():
    """Print the usage message and exit with the unknown status (3)."""
    print("""Usage: check_getloadavg [-h|--help] [-m|--mode 1|2|3] \
[-w|--warning level] [-c|--critical level]

Mode: 1 - last minute ; 2 - last 5 minutes ; 3 - last 15 minutes
Warning level defaults to 2.0
Critical level defaults to 5.0""")
    sys.exit(3)


try:
    # Long option names are passed as a list, without the leading dashes.
    options, args = getopt.getopt(
        sys.argv[1:],
        "hm:w:c:",
        ["help", "mode=", "warning=", "critical="],
    )
except getopt.GetoptError:
    usage()

argMode = "1"
argWarning = 2.0
argCritical = 5.0

for name, value in options:
    if name in ("-h", "--help"):
        usage()
    if name in ("-m", "--mode"):
        if value not in ("1", "2", "3"):
            usage()
        argMode = value
    if name in ("-w", "--warning"):
        try:
            # Option values arrive as strings and must be converted.
            argWarning = float(value)
        except ValueError:
            print("Unable to convert to floating point value\n")
            usage()
    if name in ("-c", "--critical"):
        try:
            argCritical = float(value)
        except ValueError:
            print("Unable to convert to floating point value\n")
            usage()

try:
    (d1, d2, d3) = os.getloadavg()
except Exception:
    print("GETLOADAVG UNKNOWN: Error while getting load average")
    sys.exit(3)

# Select the 1-, 5-, or 15-minute average according to the mode.
if argMode == "1":
    d = d1
elif argMode == "2":
    d = d2
elif argMode == "3":
    d = d3

if d >= argCritical:
    print("GETLOADAVG CRITICAL: Load average is %.2f" % d)
    sys.exit(2)
elif d >= argWarning:
    print("GETLOADAVG WARNING: Load average is %.2f" % d)
    sys.exit(1)
else:
    print("GETLOADAVG OK: Load average is %.2f" % d)
    sys.exit(0)

To use this new plug-in, register it by creating /etc/nagios-plugins/config/mygetloadavg2.cfg with the following content:

define command{
    command_name   check_mygetloadavg2
    command_line   /path/to/check_getloadavg2 -m $ARG1$ -w $ARG2$ -c $ARG3$
}

In addition, add or modify the service entry in the services.cfg file according to the following example. Note that the exclamation point (!) is used to separate plug-in parameters. As before, localhost must be defined in the hosts.cfg configuration file.

define service{
    use                    service-template
    host_name              localhost
    service_description    LoadAverage2
    check_period           24x7
    contact_groups         server-admins
    notification_options   c,r
    check_command          check_mygetloadavg2!1!3.0!6.0
}
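To see how the check_command value relates to the registered command_line, the following sketch splits the value on exclamation points and fills in $ARG1$, $ARG2$, and so on. It is a simplified illustration only: expand_check_command is a hypothetical function, and it ignores details such as escaped exclamation points and the other macros that Nagios supports.

```python
def expand_check_command(check_command, command_line):
    """Split check_command on '!' and substitute $ARGn$ in command_line."""
    name, *args = check_command.split("!")
    expanded = command_line
    for index, value in enumerate(args, start=1):
        # The trailing '$' keeps $ARG1$ from matching inside $ARG10$.
        expanded = expanded.replace("$ARG%d$" % index, value)
    return name, expanded


name, line = expand_check_command(
    "check_mygetloadavg2!1!3.0!6.0",
    "/path/to/check_getloadavg2 -m $ARG1$ -w $ARG2$ -c $ARG3$",
)
print(name)
print(line)
```

With the service definition above, the plug-in is therefore called with mode 1, a warning level of 3.0, and a critical level of 6.0.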

References:

  • http://www.ibm.com/developerworks/cn/aix/library/au-nagios/
  • http://yahoon.blog.51cto.com/13184/41778
