Nagios Study Notes

Last Update: 2018-12-06  Source: Internet

Author: User

Contents

Contact Configuration
Contact Group Configuration
Add Host Configuration
Add service configuration
Register the plug-in with Nagios
Plug-in service configuration
Writing a plug-in in Python

Introduction

Nagios is a monitoring system that tracks the running status of systems and networks. It can monitor specified local or remote hosts and services and send notifications when problems occur. It runs on Linux/Unix platforms and provides an optional browser-based web interface through which system administrators can view network status, system problems, and logs.

The key to understanding Nagios is that it does not track "ordinary" metrics such as CPU usage over time. Instead, it reduces all information to "working", "suspicious", and "failed" states. This helps operators focus on the most important and critical issues, based on predefined, configurable criteria.

Main features of Nagios:

- Monitoring of network services (SMTP, POP3, HTTP, NNTP, PING, etc.)

- Monitoring of host resources (processes, disks, etc.)

- A simple plug-in design that makes it easy to extend the monitoring functions of Nagios

- Parallelized checking of services and other monitored items

- Notifications when problems occur (via email, pager, or other user-defined methods)

- The ability to define custom event handlers

- An optional browser-based web interface that lets system administrators view network status, system problems, and logs

- The ability to view monitoring information on a mobile phone

Install

Only installation from source is covered here. The following build tools are required:

GCC
make
autoconf
automake

The following libraries are required:

libgd
OpenSSL

Many SNMP-related plug-ins also require Perl and the Net::SNMP package. After Nagios is installed and configured, you can access it at the default URL http://localhost/nagios.
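Before building from source, it can help to confirm that the build tools listed above are on the PATH. The following is a minimal Python sketch. It assumes the tools are installed under their usual executable names (gcc, make, autoconf, automake). It does not check for libgd, OpenSSL, or Perl's Net::SNMP, which are libraries rather than executables. The function name missing_tools is hypothetical.

```python
import shutil

# Executable names assumed for the build tools listed above
BUILD_TOOLS = ("gcc", "make", "autoconf", "automake")


def missing_tools(tools=BUILD_TOOLS):
    """Return the tools from `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing build tools: " + ", ".join(missing))
    else:
        print("All build tools found")
```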

Configure Nagios

By default, all Nagios configuration files are located in the /etc/nagios directory. The configuration can be split across multiple files, each covering a different part of the configuration. The configuration directory is structured as follows:

/etc/nagios/
    nagios.cfg              # main Nagios configuration file; references the other configuration files
    cgi.cfg                 # configuration of the Nagios web interface
    resource.cfg            # global variable definitions
    objects/
        commands.cfg        # command and plug-in definitions
        contacts.cfg        # contacts and contact groups
        localhost.cfg       # sample configuration for monitoring the local host
        printer.cfg         # sample configuration for a network printer
        switch.cfg          # sample configuration for a network switch
        templates.cfg       # object templates
        timeperiods.cfg     # time period definitions
        windows.cfg         # sample configuration for a Windows host
        hosts.cfg           # custom definitions of all hosts and host groups
        services.cfg        # custom definitions of all services
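Because nagios.cfg pulls in the other files through its cfg_file= and cfg_dir= directives, you can find out which object files an installation actually loads by reading those directives. The sketch below is a minimal, hypothetical helper (referenced_config is not part of Nagios). It takes the file's text as a string, skips lines starting with #, and does not expand cfg_dir directories. The sample paths are illustrative only.

```python
def referenced_config(text):
    """Return (cfg_files, cfg_dirs) referenced by the text of a nagios.cfg file."""
    cfg_files, cfg_dirs = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key=value directive
        key, value = key.strip(), value.strip()
        if key == "cfg_file":
            cfg_files.append(value)
        elif key == "cfg_dir":
            cfg_dirs.append(value)
    return cfg_files, cfg_dirs


sample = """
# main configuration file
log_file=/var/log/nagios/nagios.log
cfg_file=/etc/nagios/objects/commands.cfg
cfg_file=/etc/nagios/objects/contacts.cfg
#cfg_file=/etc/nagios/objects/printer.cfg
cfg_dir=/etc/nagios/conf.d
"""

files, dirs = referenced_config(sample)
print(files)  # ['/etc/nagios/objects/commands.cfg', '/etc/nagios/objects/contacts.cfg']
print(dirs)   # ['/etc/nagios/conf.d']
```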

Contact Configuration

The first components to set up are contacts and contact groups. Contacts are the people who receive notifications when a host or service goes down. By default, Nagios provides pager and email notification methods. With extensions, notifications can also be sent via Jabber and many other channels, which is convenient in some cases.

Contacts are stored in the contacts.cfg file and defined as follows:

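define contact{
        contact_name                    jdoe
        alias                           John Doe
        service_notification_period     24x7
        host_notification_period        24x7
        service_notification_options    w,u,c,r
        host_notification_options       d,u,r
        service_notification_commands   notify-by-email
        host_notification_commands      host-notify-by-email
        email                           john.doe@yourcompany.com
        }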
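When many contacts follow the same pattern, it can be convenient to generate the define blocks instead of typing them by hand. The helper below is a hypothetical sketch (render_object is not a Nagios tool). It lays out directive/value pairs in the define type{...} format shown above. You should still let Nagios validate the resulting configuration.

```python
def render_object(obj_type, directives):
    """Render a Nagios object definition such as `define contact{ ... }`.

    `directives` is an ordered sequence of (name, value) pairs; values are
    written verbatim, so lists must already be comma-separated strings.
    """
    # Align all values in one column, four spaces after the longest name
    width = max(len(name) for name, _ in directives) + 4
    lines = ["define %s{" % obj_type]
    for name, value in directives:
        lines.append("        %s%s" % (name.ljust(width), value))
    lines.append("        }")
    return "\n".join(lines)


print(render_object("contact", [
    ("contact_name", "jdoe"),
    ("alias", "John Doe"),
    ("email", "john.doe@yourcompany.com"),
]))
```

Defining the same person several times with different commands or addresses, as described in the next section, then becomes a loop over several directive lists.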

Contact Group Configuration

Contacts are organized into groups: when a host or service changes state, Nagios notifies the relevant contact group rather than naming individual people. You can even define the same person several times, each time with a different notification command or address, and then add all of these contact entries to that person's contact group.

define contactgroup{
        contactgroup_name               server-admins
        alias                           Server Administrators
        members                         jdoe,albundy
        }
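The sketch below makes the group-based model concrete. It resolves a contact group to the email addresses of its members. This is only an illustration under simple assumptions: contacts and groups are held in plain dictionaries, and member lists are comma-separated strings as in the members directive. It is not how Nagios implements notifications internally. The function name group_addresses and albundy's address are invented for the example.

```python
def group_addresses(group_name, groups, contacts):
    """Return the email addresses of all members of a contact group.

    `groups` maps contactgroup_name -> members string (e.g. "jdoe,albundy");
    `contacts` maps contact_name -> email address. Unknown members raise KeyError.
    """
    members = [m.strip() for m in groups[group_name].split(",") if m.strip()]
    return [contacts[m] for m in members]


contacts = {
    "jdoe": "john.doe@yourcompany.com",
    "albundy": "al.bundy@yourcompany.com",  # hypothetical address
}
groups = {"server-admins": "jdoe,albundy"}

print(group_addresses("server-admins", groups, contacts))
```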

Add Host Configuration

The next step is to configure the hosts that Nagios will monitor. Add every host that runs services you want to monitor, or whose availability you want to check. Host definitions are stored in the hosts.cfg file.

The following is an example of a host definition:

define host{
        host_name                       ubuntu_1_2
        alias                           Ubuntu test server
        address                         192.168.1.2
        check_command                   check-host-alive
        check_period                    24x7
        max_check_attempts              20
        contact_groups                  server-admins
        notifications_enabled           1
        event_handler_enabled           0
        flap_detection_enabled          0
        process_perf_data               1
        retain_status_information       1
        retain_nonstatus_information    1
        notification_interval           60
        notification_period             24x7
        notification_options            d,u,r
        }
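Object definitions like the one above are simple directive/value blocks, so a script can inspect them easily, for example to list every host and its address. The parser below is a minimal sketch under simplifying assumptions: one directive per line, # and ; start comments, and templates (use) are not resolved. parse_objects is a hypothetical helper, not part of Nagios.

```python
import re

# Matches "define <type>{" with optional whitespace before the brace
DEFINE_RE = re.compile(r"^define\s+(\w+)\s*\{$")


def parse_objects(text):
    """Parse Nagios object definitions into a list of (type, directives) tuples."""
    objects, current = [], None
    for raw in text.splitlines():
        line = raw.split(";", 1)[0].strip()  # drop inline ; comments
        if not line or line.startswith("#"):
            continue
        match = DEFINE_RE.match(line)
        if match:
            current = (match.group(1), {})
        elif line == "}":
            if current is not None:
                objects.append(current)
            current = None
        elif current is not None:
            parts = line.split(None, 1)  # directive name, then the rest as value
            current[1][parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return objects


sample = """define host{
        host_name                       ubuntu_1_2
        alias                           Ubuntu test server
        address                         192.168.1.2
        check_command                   check-host-alive
        }"""

for obj_type, directives in parse_objects(sample):
    if obj_type == "host":
        print(directives["host_name"], directives["address"])  # ubuntu_1_2 192.168.1.2
```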

Add service configuration

The last step of Nagios configuration is to define services for the configured host. This example uses the predefined "ping" Nagios plug-in, which sends Internet Control Message Protocol (ICMP) echo requests to determine whether the host responds.

define service{
        use                             service-template
        host_name                       ubuntu_1_2
        service_description             PING
        check_period                    24x7
        contact_groups                  server-admins
        notification_options            c,r
        check_command                   check_ping!300.0,20%!1000.0,60%
        }

After completing this configuration, restart the Nagios daemon, wait a few seconds for Nagios to initialize, and then confirm that the PING service appears in the web interface.
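In check_ping!300.0,20%!1000.0,60% from the service definition above, the two arguments are the warning and critical thresholds. Each is a round-trip average in milliseconds followed by a packet-loss percentage. This reading assumes the usual command definition that passes $ARG1$ to -w and $ARG2$ to -c of check_ping. The sketch below only illustrates how such a threshold pair classifies a single measurement. It assumes a threshold is breached when either value reaches it, and it is not the plug-in's actual code. The names parse_threshold and classify_ping are hypothetical.

```python
def parse_threshold(pair):
    """Split a check_ping style threshold like '300.0,20%' into (rta_ms, loss_pct)."""
    rta, loss = pair.split(",")
    return float(rta), float(loss.rstrip("%"))


def classify_ping(rta_ms, loss_pct, warning="300.0,20%", critical="1000.0,60%"):
    """Return 'OK', 'WARNING' or 'CRITICAL' for one ping measurement.

    Assumption: a state is triggered when either the round-trip average or the
    packet loss reaches its threshold; critical is checked first.
    """
    crit_rta, crit_loss = parse_threshold(critical)
    warn_rta, warn_loss = parse_threshold(warning)
    if rta_ms >= crit_rta or loss_pct >= crit_loss:
        return "CRITICAL"
    if rta_ms >= warn_rta or loss_pct >= warn_loss:
        return "WARNING"
    return "OK"


print(classify_ping(12.5, 0))    # OK
print(classify_ping(450.0, 0))   # WARNING (round trip above 300 ms)
print(classify_ping(20.0, 80))   # CRITICAL (80% packet loss)
```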

Writing Nagios Plug-ins

One of the most exciting aspects of Nagios is how easily you can write your own plug-ins; you only need to follow a few simple guidelines. Each time Nagios checks the status of a service, it spawns a child process to run the plug-in. It then uses the command's output and exit code to determine the service's status. The exit codes are interpreted as follows:

OK (exit code 0): the service is working normally.
WARNING (exit code 1): the service is in a warning state.
CRITICAL (exit code 2): the service is in a critical state.
UNKNOWN (exit code 3): the service status is unknown.

The last status usually means that the plug-in could not determine the service status, for example because of an internal error.
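You can exercise the exit-code convention outside Nagios. The sketch below mimics the idea described above: run the plug-in as a child process, then use its exit code and first line of output. This is handy for testing a plug-in by hand. It is a simplified illustration, not Nagios's own scheduler, and run_check is a hypothetical name. Reporting any exit code outside 0-3 as UNKNOWN is an assumption of this sketch; Nagios's own handling of out-of-range codes may differ.

```python
import subprocess
import sys
from enum import IntEnum


class State(IntEnum):
    """Nagios plug-in exit codes."""
    OK = 0
    WARNING = 1
    CRITICAL = 2
    UNKNOWN = 3


def run_check(argv, timeout=30):
    """Run a plug-in command and return (State, first line of its output).

    Exit codes outside 0-3 (and failures to start the command) are reported
    as UNKNOWN in this sketch.
    """
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return State.UNKNOWN, "Could not run plug-in: %s" % exc
    lines = proc.stdout.splitlines()
    output = lines[0] if lines else ""
    try:
        state = State(proc.returncode)
    except ValueError:
        state = State.UNKNOWN
    return state, output


if __name__ == "__main__":
    # A stand-in "plug-in" that prints a status line and exits with 1 (WARNING)
    fake = [sys.executable, "-c", "print('TEST WARNING: demo'); raise SystemExit(1)"]
    print(run_check(fake))
```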

The following sample Python script checks the UNIX load average. It treats a load of 2.0 or more as a warning state and 5.0 or more as a critical state. These values are hard-coded, and the script always uses the load average over the last minute.

#!/usr/bin/env python3
import os
import sys

# 1-, 5- and 15-minute load averages; only the 1-minute value is used here
(d1, d2, d3) = os.getloadavg()
if d1 >= 5.0:
    print("GETLOADAVG CRITICAL: Load average is %.2f" % d1)
    sys.exit(2)
elif d1 >= 2.0:
    print("GETLOADAVG WARNING: Load average is %.2f" % d1)
    sys.exit(1)
else:
    print("GETLOADAVG OK: Load average is %.2f" % d1)
    sys.exit(0)

After writing this small plug-in and making it executable, register it with Nagios and create a service definition that checks the load average.

This is also simple. Create a file named /etc/nagios-plugins/config/mygetloadavg.cfg with the following content, then add a service to the services.cfg file as in the service example that follows. Remember that localhost must be defined in the hosts.cfg configuration file.

define command{
    command_name    check_mygetloadavg
    command_line    /path/to/check_getloadavg
}

Plug-in service configuration

define service{
        use                             service-template
        host_name                       localhost
        service_description             LoadAverage
        check_period                    24x7
        contact_groups                  server-admins
        notification_options            c,r
        check_command                   check_mygetloadavg
        }

Writing a Complete Plug-in

The preceding example illustrates the limitation of a "hard-coded" plug-in: it cannot be configured at run time. In practice, it is better to write a configurable plug-in. That way you can create and maintain a single plug-in, register it with Nagios once, and pass parameters to set the warning and critical levels for each situation. The next example also includes a usage message, which is especially valuable for plug-ins that are used or maintained by several different developers or administrators.

Another good practice is to catch all exceptions and report an UNKNOWN service status so that Nagios can handle notifications about the situation correctly. Plug-ins that let exceptions "fall through" usually exit with return value 1, which Nagios interprets as a WARNING state. Make sure that your plug-in distinguishes correctly between WARNING and UNKNOWN. For example, if errors are reported as UNKNOWN results, you can disable at least some warning notifications.
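One way to follow this advice in Python is to wrap the plug-in's logic in a single function and convert any unexpected exception into an UNKNOWN result with exit code 3. The sketch below assumes a hypothetical check() function that returns an (exit_code, message) pair. The names evaluate, run_plugin, and check are illustrative only.

```python
import sys

UNKNOWN = 3


def evaluate(check):
    """Call check() and return its (exit_code, message).

    Any unexpected exception becomes an UNKNOWN result. Without this, an
    uncaught exception would make Python exit with status 1 (WARNING to Nagios).
    """
    try:
        code, message = check()
    except Exception as exc:  # deliberately broad: any failure means UNKNOWN
        return UNKNOWN, "UNKNOWN: unhandled error: %s" % exc
    return code, message


def run_plugin(check):
    """Print the status line and exit with the matching Nagios exit code."""
    code, message = evaluate(check)
    print(message)
    sys.exit(code)


def check():
    # Stand-in check that fails, to show the wrapper in action
    raise RuntimeError("could not read data")
```

A plug-in script built this way would end with a single call to run_plugin(check). With the failing stand-in above, it prints the UNKNOWN message and exits with status 3.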

Writing a plug-in in Python

Applying the preceding recommendations (runtime parameters, a usage message, and improved exception handling) makes the sample plug-in's source code several times longer than the previous version. In return, it handles errors more safely, and you can reuse it in a much wider range of situations.

#!/usr/bin/env python3

import os
import sys
import getopt

def usage():
    # Print the usage message and exit with UNKNOWN (3)
    print("""Usage: check_getloadavg [-h|--help] [-m|--mode 1|2|3] \
[-w|--warning level] [-c|--critical level]

Mode: 1 - last minute ; 2 - last 5 minutes ; 3 - last 15 minutes
Warning level defaults to 2.0
Critical level defaults to 5.0""")
    sys.exit(3)

try:
    # Long option names are passed as a list, without the leading "--"
    options, args = getopt.getopt(sys.argv[1:], "hm:w:c:",
                                  ["help", "mode=", "warning=", "critical="])
except getopt.GetoptError:
    usage()

# Defaults, overridable with -m, -w and -c
argMode = "1"
argWarning = 2.0
argCritical = 5.0

for name, value in options:
    if name in ("-h", "--help"):
        usage()
    if name in ("-m", "--mode"):
        if value not in ("1", "2", "3"):
            usage()
        argMode = value
    if name in ("-w", "--warning"):
        try:
            argWarning = float(value)
        except ValueError:
            print("Unable to convert to floating point value\n")
            usage()
    if name in ("-c", "--critical"):
        try:
            argCritical = float(value)
        except ValueError:
            print("Unable to convert to floating point value\n")
            usage()

try:
    (d1, d2, d3) = os.getloadavg()
except Exception:
    print("GETLOADAVG UNKNOWN: Error while getting load average")
    sys.exit(3)

# Select the 1-, 5- or 15-minute load average according to the mode
if argMode == "1":
    d = d1
elif argMode == "2":
    d = d2
elif argMode == "3":
    d = d3

if d >= argCritical:
    print("GETLOADAVG CRITICAL: Load average is %.2f" % d)
    sys.exit(2)
elif d >= argWarning:
    print("GETLOADAVG WARNING: Load average is %.2f" % d)
    sys.exit(1)
else:
    print("GETLOADAVG OK: Load average is %.2f" % d)
    sys.exit(0)

To use this new plug-in, register it in /etc/nagios-plugins/config/mygetloadavg2.cfg as follows:

define command{
    command_name    check_mygetloadavg2
    command_line    /path/to/check_getloadavg2 -m $ARG1$ -w $ARG2$ -c $ARG3$
}

In addition, add or modify a service entry in the services.cfg file as in the following example. Note that exclamation points (!) separate the plug-in parameters. As before, localhost must be defined in the hosts.cfg configuration file.

define service{
        use                             service-template
        host_name                       localhost
        service_description             LoadAverage2
        check_period                    24x7
        contact_groups                  server-admins
        notification_options            c,r
        check_command                   check_mygetloadavg2!1!3.0!6.0
        }
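The following sketch shows how the !-separated values reach the plug-in. The text after the command name is split on !, and the n-th piece replaces $ARGn$ in the command's command_line. This is a simplified illustration, and expand_command is a hypothetical name. Real Nagios also expands many other macros (such as $HOSTADDRESS$), which this sketch does not attempt. Expanding missing arguments to an empty string is an assumption here.

```python
import re


def expand_command(check_command, command_lines):
    """Expand a check_command such as 'check_mygetloadavg2!1!3.0!6.0'.

    `command_lines` maps command_name -> command_line. Each $ARGn$ macro is
    replaced by the n-th !-separated argument; missing arguments become "".
    """
    name, *args = check_command.split("!")
    template = command_lines[name]

    def substitute(match):
        index = int(match.group(1)) - 1
        return args[index] if 0 <= index < len(args) else ""

    return re.sub(r"\$ARG(\d+)\$", substitute, template)


commands = {
    "check_mygetloadavg2":
        "/path/to/check_getloadavg2 -m $ARG1$ -w $ARG2$ -c $ARG3$",
}
print(expand_command("check_mygetloadavg2!1!3.0!6.0", commands))
# /path/to/check_getloadavg2 -m 1 -w 3.0 -c 6.0
```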

References:
http://www.ibm.com/developerworks/cn/aix/library/au-nagios/
http://yahoon.blog.51cto.com/13184/41778

This article is an English version of an article originally published in Chinese on aliyun.com.
