Building Nagios Monitoring on Linux
I. What is Nagios
1. Introduction to Nagios
Nagios is a monitoring system that tracks the operational status of systems and networks.
It can monitor specified local or remote hosts and services, and it sends notifications when something goes wrong.
Nagios runs on Linux/Unix platforms and provides an optional browser-based web interface where system administrators can view network status, system problems, logs and so on.
Nagios is a very popular, free and open source monitoring tool for computers and networks.
The name is an abbreviation of "Nagios Ain't Gonna Insist On Sainthood".
It was first released in 1999 under the name "NetSaint". Nagios is primarily used for monitoring in Linux and Unix environments,
but through plugins it can also monitor MS Windows hosts. Nagios was voted the most popular IT operations tool in a poll at LinuxCon.
InfoWorld named it best open source software in 2009, and the SourceForge community chose it as the best systems management tool of the year.
Nagios is also used by many well-known companies, including AOL, DHL, AT&T, L'Oreal, Texas Instruments, Siemens COM CZ, Time Warner Cable, Yahoo, etc.
2. Main features of Nagios:
- Monitors network services (SMTP, POP3, HTTP, NNTP, ping, etc.)
- Monitors host resources (processes, disks, etc.)
- Simple plugin design that makes it easy to extend Nagios's monitoring capabilities
- Runs service checks concurrently
- Sends error notifications (via email, pager, or another user-defined method)
- Lets you define custom event handlers
- Optional browser-based web interface where system administrators can view network status, system problems, logs, etc.
- Monitoring information can be viewed from a phone
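The plugin design listed above rests on a simple contract: a plugin is any executable that prints one line of status text and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). As a minimal sketch of that contract, the function below counts the entries in a directory; the name check_file_count and its thresholds are hypothetical and are not one of the stock plugins.

```shell
# Hypothetical plugin logic: count the entries in a directory and map the
# count onto the standard Nagios plugin return codes.
#   0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN
# Usage: check_file_count DIR WARN_THRESHOLD CRIT_THRESHOLD
check_file_count() {
    dir=$1; warn=$2; crit=$3
    if [ ! -d "$dir" ]; then
        echo "FILES UNKNOWN - $dir is not a directory"
        return 3
    fi
    # Count directory entries (hidden files are ignored in this sketch).
    count=$(ls "$dir" | wc -l | tr -d ' ')
    if [ "$count" -ge "$crit" ]; then
        echo "FILES CRITICAL - $count entries in $dir"
        return 2
    elif [ "$count" -ge "$warn" ]; then
        echo "FILES WARNING - $count entries in $dir"
        return 1
    fi
    echo "FILES OK - $count entries in $dir"
    return 0
}
```

Wrapped in an executable script that calls the function and exits with its return value, such a check could be run by Nagios in the same way as the plugins shipped in nagios-plugins.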
II. Building the Nagios Monitoring Environment
1. Environment overview:
Role      Hostname     IP               System
Server    webserver    192.168.1.20     CentOS 6.6
Client    hpf-linux    192.168.1.110    CentOS 6.6
2. Basic server-side installation:
# yum install -y epel-release    // skip this step if the EPEL repository is already installed
# yum install -y httpd nagios nagios-plugins nagios-plugins-all nrpe nagios-plugins-nrpe    // install the Nagios packages
# htpasswd -c /etc/nagios/passwd nagiosadmin    // create the account and password for the Nagios web interface
New password:
Re-type new password:
Adding password for user nagiosadmin
# nagios -v /etc/nagios/nagios.cfg    // check the Nagios configuration for errors
Total Warnings: 0
Total Errors:   0
Things look okay - No serious problems were detected during the pre-flight check
Start the web server and the Nagios service on the server:
# /etc/init.d/httpd start
# /etc/init.d/nagios start
Open http://ip/nagios in a browser to check that Nagios was set up successfully.
Log in to the Nagios web interface as nagiosadmin with the password you just created.
Click Services to view the monitored services, and debug any service that is not in a normal state.
The HTTP service may show a WARNING at first, with the message: HTTP WARNING: HTTP/1.1 403 Forbidden - 5152 bytes in 0.001 second response t
The reason is that when Nagios checks HTTP it requests the site's default page, which is the index.html file under /var/www/html/. If that file does not exist, the web server answers with 403 Forbidden and the check reports a problem.
Create the file; once it exists, the monitoring status changes to OK.
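As a sketch of this fix, the function below creates a placeholder index.html only when none exists. The function name ensure_index and the page text are illustrative; /var/www/html is the document root mentioned above.

```shell
# Create a placeholder index.html in the given document root if none exists.
# Usage: ensure_index [DOCROOT]   (DOCROOT defaults to /var/www/html)
ensure_index() {
    docroot=${1:-/var/www/html}
    if [ -e "$docroot/index.html" ]; then
        echo "index.html already present in $docroot"
    else
        echo "nagios check_http test page" > "$docroot/index.html"
        echo "created $docroot/index.html"
    fi
}
```

Running ensure_index as root on the server and waiting for the next HTTP check should be enough for the status to change.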
3. Add a monitored host to the Nagios server (add a monitoring client)
Install the Nagios monitoring packages on the client and edit the configuration:
# yum install -y epel-release    // skip this step if the client already has the EPEL repository installed
# yum install -y nagios-plugins nagios-plugins-all nrpe nagios-plugins-nrpe    // install the Nagios monitoring packages
# vi /etc/nagios/nrpe.cfg
Find "allowed_hosts=127.0.0.1" and change it to "allowed_hosts=127.0.0.1,192.168.1.20"; the added IP is the server's IP.
Change "dont_blame_nrpe=0" to "dont_blame_nrpe=1" (this setting lets the server pass arguments to NRPE commands; the two commands below take none).
Below these two settings, change the default check_hda1 command to match the services that will be added on the Nagios server:
command[check_sda1]=/usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /dev/sda1
command[check_sda2]=/usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /dev/sda2
Note: the command[check_sda1] and command[check_sda2] options need to be added on both the monitoring side and the monitored side.
After NRPE and Nagios are restarted, it takes a while before the disk checks that the Nagios web page showed as CRITICAL return to normal.
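The nrpe.cfg edits above can also be scripted. The following is a minimal sketch rather than a tested deployment script: the function name configure_nrpe is hypothetical, and it assumes the file already contains allowed_hosts= and dont_blame_nrpe= lines, as the stock nrpe.cfg described above does.

```shell
# Apply the NRPE edits described above to the file given as $1.
# $2 is the Nagios server IP to allow (192.168.1.20 in this article).
configure_nrpe() {
    cfg=$1; server_ip=$2
    tmp=$(mktemp) || return 1
    # Rewrite the two existing settings, leaving all other lines unchanged.
    sed -e "s/^allowed_hosts=.*/allowed_hosts=127.0.0.1,$server_ip/" \
        -e "s/^dont_blame_nrpe=.*/dont_blame_nrpe=1/" "$cfg" > "$tmp" || return 1
    # Append the two disk checks unless a line for them already exists
    # (a fixed-string match, so a commented-out copy also counts).
    for part in sda1 sda2; do
        if ! grep -qF "command[check_$part]=" "$tmp"; then
            echo "command[check_$part]=/usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /dev/$part" >> "$tmp"
        fi
    done
    # Write back through cat so the original file keeps its owner and mode.
    cat "$tmp" > "$cfg" && rm -f "$tmp"
}
```

A call such as configure_nrpe /etc/nagios/nrpe.cfg 192.168.1.20 would apply the changes; running it twice does not duplicate the command lines.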
Configure the Nagios object files on the server:
# vi /etc/nagios/objects/commands.cfg    // add the following to this file
define command{
        command_name    check_nrpe
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
        }
# cd /etc/nagios/conf.d/
# vi 192.168.1.110.cfg
define host{
        use                     linux-server
        host_name               192.168.1.110
        alias                   1.110
        address                 192.168.1.110
        }
define service{
        use                     generic-service
        host_name               192.168.1.110
        service_description     check_ping
        check_command           check_ping!100.0,20%!200.0,50%
        max_check_attempts      5
        normal_check_interval   1
        }
define service{
        use                     generic-service
        host_name               192.168.1.110
        service_description     check_ssh
        check_command           check_ssh
        # When Nagios detects a problem, it checks 5 times in total before alerting;
        # with a value of 1 it alerts as soon as the problem is detected.
        max_check_attempts      5
        # Interval between re-checks, in minutes; the default is 3 minutes.
        normal_check_interval   1
        # If the fault has not been resolved, how long Nagios waits before notifying
        # the user again, in minutes. Set this to 0 if every event needs only one notification.
        notification_interval   60
        }
define service{
        use                     generic-service
        host_name               192.168.1.110
        service_description     check_http
        check_command           check_http
        max_check_attempts      5
        normal_check_interval   1
        }
define service{
        use                     generic-service
        host_name               192.168.1.110
        service_description     check_load
        check_command           check_nrpe!check_load
        max_check_attempts      5
        normal_check_interval   1
        }
define service{
        use                     generic-service
        host_name               192.168.1.110
        service_description     check_disk_sda1
        check_command           check_nrpe!check_sda1
        max_check_attempts      5
        normal_check_interval   1
        }
define service{
        use                     generic-service
        host_name               192.168.1.110
        service_description     check_disk_sda2
        check_command           check_nrpe!check_sda2
        max_check_attempts      5
        normal_check_interval   1
        }
# nagios -v /etc/nagios/nagios.cfg    // check that the configuration is correct
Total Warnings: 0
Total Errors:   0
Things look okay - No serious problems were detected during the pre-flight check
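Because the NRPE-backed service blocks differ only in the command name, a file like 192.168.1.110.cfg can be generated rather than typed out. The helper below is a hypothetical sketch (make_host_cfg is not part of Nagios); it reuses the address as host_name and alias and the command name as service_description, which is a simplification of the file shown above.

```shell
# Print a host definition plus one NRPE-backed service per command name.
# Usage: make_host_cfg ADDRESS NRPE_COMMAND...
make_host_cfg() {
    addr=$1; shift
    cat <<EOF
define host{
        use                     linux-server
        host_name               $addr
        alias                   $addr
        address                 $addr
        }
EOF
    for cmd in "$@"; do
        cat <<EOF
define service{
        use                     generic-service
        host_name               $addr
        service_description     $cmd
        check_command           check_nrpe!$cmd
        max_check_attempts      5
        normal_check_interval   1
        }
EOF
    done
}
```

For example, make_host_cfg 192.168.1.110 check_load check_sda1 check_sda2 prints the definitions to standard output; redirect them into a file under /etc/nagios/conf.d/ and run nagios -v before restarting.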
Start the NRPE service on the client:
# /etc/init.d/nrpe start
Restart the Nagios service on the server:
# /etc/init.d/nagios restart
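Since a broken object file can stop Nagios from coming back up, it may help to tie the restart to the pre-flight check. The sketch below assumes that nagios -v exits with a non-zero status when it finds errors; the function safe_restart and the NAGIOS_VERIFY and NAGIOS_RESTART variables are hypothetical names introduced only for this example.

```shell
# Restart Nagios only when the pre-flight check passes.
# Both commands can be overridden, so the logic can be tried without
# touching a live service.
NAGIOS_VERIFY=${NAGIOS_VERIFY:-"nagios -v /etc/nagios/nagios.cfg"}
NAGIOS_RESTART=${NAGIOS_RESTART:-"/etc/init.d/nagios restart"}

safe_restart() {
    # The variables are left unquoted on purpose so that each one splits
    # into a command and its arguments.
    if $NAGIOS_VERIFY >/dev/null 2>&1; then
        $NAGIOS_RESTART
    else
        echo "configuration check failed; Nagios not restarted" >&2
        return 1
    fi
}
```

Calling safe_restart after editing a configuration file then replaces the separate verify and restart steps.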
Check in the browser that the services of the new host are displayed in the Nagios web interface.
4. Configure email alerts:
# vim /etc/nagios/objects/contacts.cfg
define contact{
        contact_name    nagios1
        use             generic-contact
        alias           mail1
        email           [email protected]
        }
define contact{
        contact_name    nagios2
        use             generic-contact
        alias           mail2
        email           [email protected]
        }
define contactgroup{
        contactgroup_name   common
        alias               common
        members             nagios1,nagios2
        }
# vi 192.168.1.110.cfg    // in /etc/nagios/conf.d/
The 192.168.1.110.cfg configuration file created earlier contains the following section:
define service{
        use                     generic-service
        host_name               192.168.1.110
        service_description     check_load
        check_command           check_nrpe!check_load
        max_check_attempts      5
        normal_check_interval   1
        }
Add the following four directives at the end of that section:
        contact_groups          common
        # Whether notifications are enabled: 1 is on, 0 is off. This is usually
        # set in the main configuration file (nagios.cfg, where the option is
        # called enable_notifications), with the same effect.
        notifications_enabled   1
        # The time period in which notifications are sent. I define very important
        # hosts (services) as 24x7 and ordinary hosts (services) as working hours.
        # Outside the defined period no notification is sent, whatever happens.
        notification_period     24x7
        # The service states to notify on: w for warning, u for unknown,
        # c for critical, r for recovery.
        # Hosts have a corresponding set: d = down, u = unreachable,
        # r = recovered to OK; it must be added to the host definition.
        notification_options    w,u,c,r
# nagios -v /etc/nagios/nagios.cfg    // check the configuration for errors
Total Warnings: 0
Total Errors:   0
Things look okay - No serious problems were detected during the pre-flight check
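To make the notification_options letters concrete, the sketch below expands a service value such as w,u,c,r into the state names described above. The function describe_service_options is hypothetical, and it covers only the four letters discussed in this article; any other letter is reported as not covered.

```shell
# Expand a service notification_options value (e.g. "w,u,c,r") into state names.
describe_service_options() {
    # Split the argument on commas into the positional parameters.
    old_ifs=$IFS; IFS=,
    set -- $1
    IFS=$old_ifs
    result=""
    for opt in "$@"; do
        case $opt in
            w) name=warning ;;
            u) name=unknown ;;
            c) name=critical ;;
            r) name=recovery ;;
            *) echo "option not covered by this sketch: $opt" >&2; return 1 ;;
        esac
        # Join the names with single spaces.
        result="${result:+$result }$name"
    done
    echo "$result"
}

describe_service_options w,u,c,r   # prints: warning unknown critical recovery
```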
5. Verify the alert email configuration:
Start the mail service on the virtual machine:
# yum install -y sendmail    // install the mail service package
# /etc/init.d/sendmail start    // start the mail service
# netstat -lnp | grep sendmail    // check which port the mail service is listening on
tcp        0      0 127.0.0.1:25        0.0.0.0:*        LISTEN      1011/sendmail
In the 163 mailbox's web interface, set up a whitelist so that alert emails are not treated as spam.
# /etc/init.d/nrpe stop    // stop the NRPE service on the client to see whether the server sends an alert email
Shutting down nrpe: [OK]
The alert email is sent after some delay, so wait patiently.
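Part of this delay follows from the check settings. Nagios notifies only once a problem reaches a hard state, that is, after max_check_attempts consecutive failed checks, and after the first failure the re-checks are spaced by the service's retry interval. The rough estimate below assumes a retry interval of 2 minutes; this is an assumption, because the service definitions above do not set one, so the real value comes from the generic-service template and should be checked in templates.cfg. The function name alert_delay_minutes is hypothetical, and the time until the first failed check and mail delivery time are not counted.

```shell
# Rough number of minutes between the first failed check and the first
# notification: the remaining (max_check_attempts - 1) checks are each one
# retry interval apart.
alert_delay_minutes() {
    max_attempts=$1; retry_interval=$2
    echo $(( (max_attempts - 1) * retry_interval ))
}

# max_check_attempts 5 is taken from the article; the retry interval of 2 is assumed.
alert_delay_minutes 5 2   # prints 8
```

With max_check_attempts set to 1 the estimate drops to 0, which matches the earlier remark that Nagios then alerts as soon as the problem is detected.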
This article is from the "clear" blog; please keep this source attribution: http://duanyexuanmu.blog.51cto.com/1010786/1750019