Nagios Monitoring System

Source: Internet
Author: User
Tags php mysql

Nagios is a free, open source network monitoring tool. It can monitor the status of Windows, Linux, and UNIX hosts as well as switches, routers, and other network devices. When a system or service enters an abnormal state, Nagios sends an email or SMS alert so that operations staff are notified right away. Traffic monitoring is not its strength; for that, Cacti is recommended, since it draws very intuitive graphs.

Nagios can mainly monitor the following areas:

    • Host availability (checked with ping; a host that does not answer is considered down, but this does not affect the other services being monitored)
    • Server resources (CPU usage, remaining disk space, etc.)
    • Network services (SMTP, POP3, HTTP, etc.)
    • Network devices (routers, switches, etc.)

How Nagios Works
Nagios itself does not include the ability to monitor hosts and services. All checking is done by plugins. After installation, the libexec directory under the Nagios home directory holds the plugins. For example, check_disk checks disk space and check_load checks CPU load. Each plugin can be run as check_xxx -h to show its usage and functionality.

Four monitoring statuses for Nagios
Nagios recognizes four status return values. Based on the value a plugin returns, Nagios determines the state of the monitored object and displays it in the web interface so that administrators can spot faults quickly.

    • 0 (OK) indicates a normal state (shown in green)
    • 1 (WARNING) indicates a warning (yellow)
    • 2 (CRITICAL) indicates a very serious error (red)
    • 3 (UNKNOWN) indicates an unknown error (dark yellow)
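
The exit-code convention can be sketched in a few lines of shell. This is a minimal illustration, not one of the bundled plugins: state_name and check_percent are hypothetical helpers, and the integer percentage thresholds are an assumption made for the example.

```shell
#!/bin/sh
# Hypothetical helpers illustrating the plugin exit-code convention.

# Translate a plugin exit code into the state name Nagios displays.
# Anything other than 0, 1, or 2 is reported as UNKNOWN in this sketch.
state_name() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        *) echo "UNKNOWN" ;;
    esac
}

# Compare an integer percentage with warning and critical thresholds,
# print one status line, and return the matching exit code.
check_percent() {
    used=$1
    warn=$2
    crit=$3
    for value in "$used" "$warn" "$crit"; do
        case "$value" in
            ''|*[!0-9]*)
                echo "UNKNOWN - arguments must be non-negative integers"
                return 3
                ;;
        esac
    done
    if [ "$used" -ge "$crit" ]; then
        echo "CRITICAL - ${used}% used"
        return 2
    elif [ "$used" -ge "$warn" ]; then
        echo "WARNING - ${used}% used"
        return 1
    fi
    echo "OK - ${used}% used"
    return 0
}
```

Calling check_percent 85 80 90 prints a WARNING line and returns 1; the exit code, not the text, is what Nagios uses to decide the state.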

How Nagios monitors remote services through the NRPE plugin

    • Nagios runs the check_nrpe plugin installed on the monitoring server and tells it which service to check.
    • check_nrpe connects to the NRPE daemon on the remote machine over SSL.
    • NRPE runs local plugins (check_disk, etc.) to check the resources and status of the remote machine.
    • NRPE returns the results to check_nrpe on the monitoring server, which passes them to the Nagios status queue.
    • Nagios reads the information in the queue and displays the results.
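
The step in which NRPE runs a local plugin relies on a mapping from command name to local command line. In nrpe.cfg these mappings are written as command[name]=command line. The sketch below shows that lookup in shell; nrpe_lookup is a hypothetical helper, not part of NRPE, and it assumes the name contains no regular-expression characters.

```shell
#!/bin/sh
# Hypothetical helper: print the command line that an nrpe.cfg-style file
# associates with a command name (lines of the form command[name]=...).
# Prints nothing when the name is not defined in the file.
nrpe_lookup() {
    name=$1
    cfg=$2
    sed -n "s/^command\[$name]=//p" "$cfg"
}
```

For example, nrpe_lookup check_load /usr/local/nagios/etc/nrpe.cfg would print whatever command line that file assigns to check_load.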

Lab Environment:

    • Nagios server: 192.168.1.10
    • Monitored host: 192.168.1.20 (MySQL and web)

Build Nagios Monitoring System

Create the nagios user and group:

    # mount /dev/cdrom /media/
    # useradd -s /sbin/nologin nagios
    # mkdir /usr/local/nagios
    # chown -R nagios:nagios /usr/local/nagios/

Compile and install Nagios (yum must be configured beforehand). Install the supporting packages:

    # yum -y install httpd php mysql-devel openssl openssl-devel
    # umount /dev/cdrom /media/
    # mount /dev/cdrom /media/
    # cd /media/

Configure:

    # tar zxf nagios-4.0.1.tar.gz -C /usr/src/
    # cd /usr/src/nagios-4.0.1/
    # ./configure --prefix=/usr/local/nagios/

Compile and install:

    # make all                  // compile the main program and CGIs
    # make install              // install the main program, CGIs, and HTML files
    # make install-init         // install the init script in /etc/rc.d/init.d
    # make install-commandmode  // set the directory permissions
    # make install-config       // install the sample configuration files
    # make install-webconf      // install the Nagios web interface configuration; this creates nagios.conf in /etc/httpd/conf.d

After the installation completes, six directories exist under /usr/local/nagios:

    • bin: the Nagios executables; the nagios file is the main program.
    • etc: the configuration files; the default configuration appears here after make install-config.
    • sbin: the Nagios CGI files, that is, the programs the web interface executes.
    • share: the web page files (HTML and related files).
    • var: log files, the PID file, and other runtime files.
    • libexec: the default location for plugins.

Register Nagios as a system service:

    # chkconfig --add nagios
    # chkconfig --level 35 nagios on

Install the Nagios plugins (the monitoring itself is done by plugins):

    # cd /media/
    # tar zxf nagios-plugins-1.5.tar.gz -C /usr/src/
    # cd /usr/src/nagios-plugins-1.5/
    # ./configure --prefix=/usr/local/nagios/

Compile and install:

    # make && make install

Install NRPE (needed to monitor remote servers):

    # cd /media/
    # tar zxf nrpe-2.15.tar.gz -C /usr/src/
    # cd /usr/src/nrpe-2.15/
    # ./configure && make all && make install-plugin

Add the authorization settings at the end of /etc/httpd/conf/httpd.conf. They can be copied from /etc/httpd/conf.d/nagios.conf, so nothing needs to be typed by hand.

    # vim /etc/httpd/conf/httpd.conf

In vim, import the file with :r /etc/httpd/conf.d/nagios.conf. No changes are needed; save and exit.

Run htpasswd to add an authorized user for the Nagios web pages:

    # /usr/bin/htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin

The user name and the password are both nagiosadmin.

Start nagios and httpd:

    # service nagios start
    # service httpd start


Configuring Nagios Monitoring System files

Nagios configuration files:

  1. nagios.cfg: the main configuration file; it defines the names and locations of the other configuration files
  2. cgi.cfg: the configuration file that controls the CGI programs
  3. resource.cfg: the resource file; it defines variables that other files can reference
  4. objects: the directory holding the other configuration files, mainly:
    commands.cfg: the command configuration file; it defines the command formats that other files call
    contacts.cfg: contacts and contact groups, referenced when email and other alerts are sent
    localhost.cfg: the configuration file for monitoring the local machine
    timeperiods.cfg: defines the monitoring time periods that other files reference
    hostgroups.cfg: defines the monitored host groups; this file must be created manually

The relationship between configuration files

Configuring Nagios involves several kinds of definitions: hosts, host groups, services, service groups, contacts, contact groups, monitoring time periods, and monitoring commands. These definitions are interrelated and reference one another across the configuration files, so a working Nagios setup depends on getting those references right. The four most important points are:

  • Define which hosts, host groups, services, and service groups to monitor
  • Define which command performs each check
  • Define the time period during which monitoring takes place
  • Define the contacts and contact groups to notify when a host or service has a problem

Configure Nagios

To keep the configuration clear and easy to maintain, it is recommended to create a separate configuration file for each kind of object Nagios defines.

  • Create a conf directory that holds the host definitions
  • Create a hostgroups.cfg file to define host groups
  • Define contacts and contact groups in the default contacts.cfg file
  • Define commands in the default commands.cfg file
  • Define the monitoring time periods in the default timeperiods.cfg file
  • Use the default templates.cfg file as the reference file for templates
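
Keeping one file per host in the conf directory lends itself to generating those files. The helper below is only a sketch: make_host_cfg is a hypothetical function, and it assumes that the linux-server template from the default templates.cfg supplies the remaining host settings.

```shell
#!/bin/sh
# Hypothetical helper: print a minimal host definition for one host.
# Assumes the "linux-server" template provides check and notification settings.
make_host_cfg() {
    name=$1
    addr=$2
    cat <<EOF
define host{
    use         linux-server
    host_name   $name
    alias       $name
    address     $addr
    }
EOF
}
```

The output could be redirected into the conf directory, for example make_host_cfg web1 192.168.1.20 > /usr/local/nagios/etc/conf/web1.cfg, where web1 is a made-up host name.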

Configure Nagios, modify the configuration file

    # vim /usr/local/nagios/etc/nagios.cfg

Below the existing cfg_file entries, add two lines:

    cfg_file=/usr/local/nagios/etc/objects/hostgroups.cfg
    cfg_dir=/usr/local/nagios/etc/conf

Then create the directory:

    # mkdir /usr/local/nagios/etc/conf

    # vim /usr/local/nagios/etc/objects/commands.cfg

Add the following at the bottom of the file:

    define command{
        command_name    check_nrpe
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
        }
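
When a service uses a check_command such as check_nrpe!check_load, the text after the exclamation mark becomes $ARG1$ and $HOSTADDRESS$ is filled in from the host's address. The sketch below imitates that substitution for the check_nrpe command; expand_nrpe_check is a hypothetical helper, and it leaves out the $USER1$ plugin directory prefix.

```shell
#!/bin/sh
# Hypothetical helper: show the command line that results from a
# "check_nrpe!<argument>" check_command for a given host address.
expand_nrpe_check() {
    check_command=$1
    host_address=$2
    case "$check_command" in
        *!*)
            arg1=${check_command#*!}   # drop the command name and the first "!"
            arg1=${arg1%%!*}           # keep only the first argument
            ;;
        *)
            arg1=""                    # no argument was given
            ;;
    esac
    echo "check_nrpe -H $host_address -c $arg1"
}
```

With the arguments 'check_nrpe!check_load' and 192.168.1.20, the function prints check_nrpe -H 192.168.1.20 -c check_load.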

    # vim /usr/local/nagios/etc/objects/contacts.cfg

Add the following below the contact group whose contactgroup_name is admins:

    define contact{
        contact_name                    ydw
        alias                           ydw
        service_notification_period     24x7
        host_notification_period        24x7
        service_notification_options    w,u,c,r
        host_notification_options       d,u,r
        service_notification_commands   notify-service-by-email
        host_notification_commands      notify-host-by-email
        email                           root
        }
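
The option letters in a contact definition select which state changes produce a notification. The sketch below spells out the letters used in this article; describe_options is a hypothetical helper, and letters that the article does not use are deliberately left out.

```shell
#!/bin/sh
# Hypothetical helper: explain notification option letters.
# $1 is "service" or "host"; $2 is a comma-separated list such as "w,u,c,r".
describe_options() {
    kind=$1
    for opt in $(echo "$2" | tr ',' ' '); do
        case "$kind:$opt" in
            service:w) echo "w = WARNING" ;;
            service:u) echo "u = UNKNOWN" ;;
            service:c) echo "c = CRITICAL" ;;
            host:d)    echo "d = DOWN" ;;
            host:u)    echo "u = UNREACHABLE" ;;
            *:r)       echo "r = recovery" ;;
            *)         echo "$opt = not covered by this sketch" ;;
        esac
    done
}
```

Note that the letter u means UNKNOWN for a service but UNREACHABLE for a host, which is why the helper takes the object kind as its first argument.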

Create /usr/local/nagios/etc/objects/hostgroups.cfg (it defines the host group):

    # vim /usr/local/nagios/etc/objects/hostgroups.cfg

    define hostgroup{
        hostgroup_name  webmysql
        alias           webmysql
        members         192.168.1.20
        }

Create a new 192.168.1.20.cfg file under /usr/local/nagios/etc/conf (it monitors whether host 192.168.1.20 is alive, its load, and its processes):

    # cd /usr/local/nagios/etc/conf/
    # vim 192.168.1.20.cfg

    define host{
        host_name               192.168.1.20
        alias                   192.168.1.20
        address                 192.168.1.20
        check_command           check-host-alive
        max_check_attempts      5
        check_period            24x7
        contact_groups          ydw
        notification_period     24x7
        notification_options    d,u,r
        }

    define service{
        host_name               192.168.1.20
        service_description     check-host-alive
        check_command           check-host-alive
        max_check_attempts      3
        normal_check_interval   2
        retry_check_interval    2
        check_period            24x7
        notification_interval   10
        notification_period     24x7
        notification_options    w,u,c,r
        contact_groups          ydw
        }

    define service{
        host_name               192.168.1.20
        service_description     check-procs
        check_command           check_nrpe!check_total_procs
        max_check_attempts      3
        normal_check_interval   2
        retry_check_interval    2
        check_period            24x7
        notification_interval   10
        notification_period     24x7
        notification_options    w,u,c,r
        contact_groups          ydw
        }

    define service{
        host_name               192.168.1.20
        service_description     check-load
        check_command           check_nrpe!check_load
        max_check_attempts      3
        normal_check_interval   2
        retry_check_interval    2
        check_period            24x7
        notification_interval   10
        notification_period     24x7
        notification_options    w,u,c,r
        contact_groups          ydw
        }

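Explanation of the directives:

    define host{
        use                     linux-server        ; template to inherit from
        host_name               nagios              ; name of the monitored host; avoid spaces
        alias                   nagios              ; alias
        address                 127.0.0.1           ; IP address of the monitored host
        check_command           check-host-alive    ; check command, defined in commands.cfg; tests whether the host is alive
        normal_check_interval   3                   ; interval between checks in the normal state
        retry_check_interval    2                   ; interval between retries
        max_check_attempts      5                   ; number of retries after a failed check
        check_period            24x7                ; period during which checks run, defined in timeperiods.cfg
        notification_interval   10                  ; interval between repeat notifications, in time units (10 minutes with the default interval_length of 60 seconds)
        notification_period     24x7                ; period during which notifications are sent, defined in timeperiods.cfg
        contact_groups          admins              ; contact group to notify, defined in contacts.cfg
        notification_options    d,u,r               ; states that trigger a notification
        }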

Add the following at the end of /usr/local/nagios/etc/objects/contacts.cfg:

    # vim /usr/local/nagios/etc/objects/contacts.cfg

    define contactgroup{
        contactgroup_name   ydw
        alias               ydw
        members             ydw
        }

Restart the nagios service:

    # service nagios restart

Set the SELinux context so that httpd can serve the Nagios files

    # chcon -R -t httpd_sys_content_t /usr/local/nagios/sbin/
    # chcon -R -t httpd_sys_content_t /usr/local/nagios/share/

Configuring the monitored host 192.168.1.20 (MySQL and web)
Install directly with a script

    # mount /dev/cdrom /media/
    # cd /media/

Copy the script and the software to /usr/src

    # cp nagios-plugins-1.5.tar.gz /usr/src/
    # cp nrpe-2.15.tar.gz /usr/src/
    # cp nagiosclient.sh /usr/src/

Switch to the 6.5 disc

    # umount /dev/cdrom /media/
    # mount /dev/cdrom /media/
    # cd /usr/src/

Run the nagiosclient.sh script from /usr/src.

After the installation completes, open /usr/local/nagios/etc/nrpe.cfg and add the address of the Nagios server to allowed_hosts:

    # vim /usr/local/nagios/etc/nrpe.cfg

    allowed_hosts=127.0.0.1,192.168.1.10

Start nrpe:

    # /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d

If the test from the server fails, reboot the monitored host and try again:

    # reboot

(Server test)

    # /usr/local/nagios/libexec/check_nrpe -H 192.168.1.20
    NRPE v2.15


Addendum: the service definitions from the 192.168.1.20.cfg file can also be placed in a services.cfg file.

    # vi /usr/local/nagios/etc/objects/services.cfg

The content is as follows:

    define service{
        use                     local-service
        host_name               nagios
        service_description     Ping
        check_command           check-host-alive
        }

    define service{
        use                     local-service
        host_name               nagios
        service_groups          System Status Check
        service_description     Number of logged-on users
        check_command           check_local_users!20!50
        }

    define service{
        use                     local-service
        host_name               nagios
        service_groups          System Status Check
        service_description     Root partition
        check_command           check_local_disk!20%!10%!/
        }

    define service{
        use                     local-service
        host_name               nagios
        service_groups          System Status Check
        service_description     Total processes
        check_command           check_local_procs!250!400!RSZDT
        }

    define service{
        use                     local-service
        host_name               nagios
        service_groups          System Status Check
        service_description     System load
        check_command           check_local_load!5.0,4.0,3.0!10.0,6.0,4.0
        }

    define service{
        use                     local-service
        host_name               nagios
        service_groups          System Status Check
        service_description     Swap space utilization
        check_command           check_local_swap!20!10
        }

    define servicegroup{
        servicegroup_name       System Status Check
        alias                   System Overview
        }

What the check commands mean:

    • check_local_users!20!50: monitors the number of users currently logged in on the host; WARNING above 20 users, CRITICAL above 50.
    • check_local_disk!20%!10%!/: WARNING when free space on / falls below 20%, CRITICAL when it falls below 10%.
    • check_local_procs!250!400!RSZDT: monitors the total number of processes on the host; WARNING above 250 processes, CRITICAL above 400. The state letters are S (sleeping), R (running), Z (zombie), D (uninterruptible), and T (stopped).
    • check_load -w 5,4,3 -c 10,6,4: WARNING when more than 5 processes are waiting on average over 1 minute, more than 4 over 5 minutes, or more than 3 over 15 minutes; CRITICAL when more than 10 are waiting over 1 minute, more than 6 over 5 minutes, or more than 4 over 15 minutes.

Service groups are not required; they only affect how the Nagios monitoring page is displayed.
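
The load thresholds can be read as three pairwise comparisons. The sketch below evaluates them in shell with awk; load_state is a hypothetical helper, not the check_load plugin itself, and it assumes that a threshold counts as breached as soon as any one of the three averages exceeds its limit.

```shell
#!/bin/sh
# Hypothetical helper: classify load averages against check_load-style
# thresholds. Each argument is a comma-separated triple for the
# 1-, 5-, and 15-minute averages: loads, warning limits, critical limits.
load_state() {
    awk -v loads="$1" -v warn="$2" -v crit="$3" 'BEGIN {
        split(loads, la, ",")
        split(warn, wa, ",")
        split(crit, ca, ",")
        state = "OK"
        for (i = 1; i <= 3; i++) {
            # "+ 0" forces a numeric comparison instead of a string one.
            if (la[i] + 0 > ca[i] + 0) { state = "CRITICAL"; break }
            if (la[i] + 0 > wa[i] + 0) state = "WARNING"
        }
        print state
    }'
}
```

For instance, load_state 6.0,1.0,1.0 5,4,3 10,6,4 reports WARNING because only the 1-minute average exceeds its warning limit.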
