Nagios Enterprise Monitoring Explained
I. About Nagios
Nagios is a long-established monitoring tool that a generation of operations engineers grew up with, and it is still widely used in enterprises today. Three characteristics explain its staying power:
First: Nagios is one of the easiest open source monitoring products to deploy and configure, and it is easier to get started with than most alternatives. Managers like tools that save time and get the job done quickly.
Second: Nagios monitors basic services extremely well. Many of its checks use direct, effective methods that match what most people intuitively mean by "monitoring".
Third: Although many newer monitoring products exist, there is a curious pattern: when companies interview operations engineers or architects, the monitoring questions tend to be about Nagios. Many companies treat Nagios knowledge as a basic indicator of a qualified operations engineer.
II. The two core working modes of Nagios
1. First working mode: remote direct-connection collection
What does remote direct-connection collection mean? The Nagios monitoring server needs to collect data from the monitored machines, and some of that data can be checked directly over the network, for example:
Whether an IP address is reachable
Whether a port is reachable
2. Second working mode: indirect collection through a bridge
CPU load, free disk space, whether a process exists, and similar data are not "services". Unlike the SSH service, which is directly reachable from outside, free disk space is internal to the client's Linux system. What if the Nagios server has no way to read the client's disk usage directly over the network? This is where an extension of Nagios comes in: the NRPE component.
III. Installing the Nagios server and the NRPE component
First, synchronize the time on the Linux servers.
Available time servers are listed at http://www.pool.ntp.org/zh/
# ntpdate 1.cn.pool.ntp.org
Since the default yum repositories do not contain the Nagios packages, install epel-release first.
# yum -y install epel-release
Nagios does not include its own web server, but it presents its interface through the web, so httpd (Apache) must be installed first.
1. Install httpd
# yum install httpd
After the installation is complete, start the service and check that it is running.
# service httpd restart
# netstat -anptu | grep 80
Take a look at the web server's response:
# curl -i 192.168.1.65
2. Install nagios* and nrpe
# yum install nagios* nrpe
Start Nagios after installation:
# service nagios start
After startup, set the initial login password for Nagios.
Here we use the default nagiosadmin as the login name, with the password 123456.
(Note: if the web administrator account is not the default nagiosadmin, you need to modify cgi.cfg. With the yum install used here the file is /etc/nagios/cgi.cfg; a source install usually places it at /usr/local/nagios/etc/cgi.cfg.
# vim /etc/nagios/cgi.cfg
Change every nagiosadmin to the custom user name.)
# htpasswd -c /etc/nagios/passwd nagiosadmin
Next, enter the following in a browser:
http://192.168.1.65/nagios
IV. Nagios configuration files (1): node definition
Configuration file path:
# cd /etc/nagios/objects
This directory already contains many default configuration files ending in .cfg. Most of them are templates that we do not need yet; we will focus on the localhost.cfg file.
1. Define the node
# vim localhost.cfg
The contents are as follows:
define host{
        use             linux-server    ; Name of host template to use
                                        ; This host definition will inherit all variables that are defined
                                        ; in (or inherited by) the linux-server host template definition.
        host_name       centos_67
        alias           localhost
        address         192.168.1.67
        }
We can add more machines to monitor by copying this definition.
Fields to modify:
host_name: centos_67 (Note: centos_67 is a name of our own choosing and is the unique identifier of the host. It cannot be duplicated and is referenced repeatedly below. Any name will do, but the FQDN is recommended.)
alias: an alias, any name.
address: the IP address of the monitored node.
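Since every new machine needs the same block with only these three fields changed, the block can also be generated. The following is a minimal sketch, not something shipped with Nagios: make_host_def is a hypothetical helper name, and it assumes the linux-server template from the default localhost.cfg.

```shell
# Hypothetical helper (not part of Nagios): print a host block for the
# given host_name, alias and address. Assumes the linux-server template.
make_host_def() {
    printf 'define host{\n'
    printf '        use             linux-server\n'
    printf '        host_name       %s\n' "$1"
    printf '        alias           %s\n' "$2"
    printf '        address         %s\n' "$3"
    printf '        }\n'
}

# Example: print the block for the node used in this article.
make_host_def centos_67 localhost 192.168.1.67
```

Redirecting the output with >> /etc/nagios/objects/localhost.cfg would have the same effect as adding the block by hand.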
Restart Nagios after adding nodes:
# service nagios restart
Next, go back to the browser at http://192.168.1.65/nagios/
Click the Hosts button on the left and the centos_67 host appears in the monitoring list.
V. Nagios configuration files (2): group and service definitions
Node definitions in Nagios are scattered, so we can collect the nodes that belong together into a group. Here is the definition in the configuration file:
define hostgroup{
        hostgroup_name  ceshi                   ; The name of the hostgroup
        alias           Linux Servers           ; Long name of the group
        members         localhost,centos_67     ; Comma separated list of hosts that belong to this group
        }
Once the block above is added to localhost.cfg, the nodes defined earlier (including the local machine) are joined into one host group.
hostgroup_name defines the group name, which is self-explanatory.
alias is the group alias.
members is followed by the host_name of each node. Each entry must exactly match the host_name in the node definition; names are separated by commas, and the alias cannot be used.
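Because a members entry that does not exactly match a host_name is an easy mistake to make, a quick comparison can catch it before a restart. This is only a sketch: missing_members is a hypothetical helper, the host names are passed in by hand rather than parsed from the configuration, and it assumes no spaces around the commas.

```shell
# Hypothetical helper: given a comma-separated members list and a
# space-separated list of defined host_name values, print every member
# that has no matching host definition (the comparison is case-sensitive).
missing_members() {
    members=$1
    defined=" $2 "
    old_ifs=$IFS
    IFS=,
    for m in $members; do
        case "$defined" in
            *" $m "*) ;;                 # exact host_name match found
            *) printf '%s\n' "$m" ;;     # no host with this name
        esac
    done
    IFS=$old_ifs
}

missing_members "localhost,centos_67" "localhost centos_67"   # prints nothing
missing_members "localhost,Centos_67" "localhost centos_67"   # prints Centos_67
```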
Then restart Nagios again.
# service nagios restart
Open the web interface and click the Host Groups button on the left. The ceshi group we just set up is shown.
VI. Nagios service definition
Defining a service means declaring which of the services running on a host we want to monitor.
Look at the following block from localhost.cfg. It is the default service definition template, used to monitor services (software) on the local machine:
define service{
        use                     local-service   ; Name of service template to use
        host_name               localhost
        service_description     SSH
        check_command           check_ssh
        notifications_enabled   0
        }
use: keep this field unchanged.
host_name:
The name of the node defined in define host. It must match the earlier host_name exactly or Nagios cannot find the host. This again shows how important the node definition is, since it is referenced many times later.
service_description:
This is just a label for the monitored service; you can write any name you like.
check_command: the most important field. It defines what this service check monitors and which monitoring script is called to do it (script customization is covered later).
notifications_enabled:
Whether to enable alert notifications for this service. 1 is enabled, 0 is disabled. (I will talk about alerting with PagerDuty later.)
Next we monitor the SSH service on centos_67. Add the following definition to the localhost.cfg file:
define service{
        use                     local-service   ; Name of service template to use
        host_name               centos_67
        service_description     SSH
        check_command           check_ssh
        notifications_enabled   0
        }
After restarting the Nagios service, take a look at the Nagios home page.
Select Services on the left to see that SSH monitoring for centos_67 is now in place.
Next, let's monitor the HTTP service. Append the following configuration to localhost.cfg:
define service{
        use                     local-service   ; Name of service template to use
        host_name               centos_67
        service_description     HTTP
        check_command           check_http
        notifications_enabled   0
        }
The HTTP service must be installed and started on the client in advance. We take Apache as an example:
# yum -y install httpd
# service httpd start
Log in to the Nagios web interface to check the result.
What if the site we want to monitor does not use the default port 80 but another port we specify? How can we adjust the monitoring flexibly?
VII. Configuring the service monitoring plugins
In the service configuration above, the check_http value of check_command does not come out of thin air. It can be called directly because it is provided by the Nagios plugins (the many nagios-plugins packages we installed earlier are these plugins).
The check_command in a Nagios service definition performs the actual monitoring by calling a preset script from a plugin (check_http is in fact such a script).
Let's trace the relationship between these scripts and the plugins:
(1) First, find the HTTP-related Nagios plugin with rpm:
# rpm -qa | grep nagios-plugins | grep http
The check_http script comes from this plugin package.
(2) Next, look at which files (scripts) the package contains:
# rpm -ql nagios-plugins-http-2.2.1-4git.el6.x86_64
/usr/lib64/nagios/plugins/check_http
As shown above, check_http sits in the default Nagios plugin path and is called by Nagios as a script.
(3) Now that check_http has been found, try running it locally on Linux:
# /usr/lib64/nagios/plugins/check_http -H 192.168.1.67
HTTP OK: HTTP/1.1 200 OK - 253 bytes in 0.001 second response time |time=0.000980s;;;0.000000 size=253B;;;0
So the script can be called by the Nagios check_command, and it can also be executed directly by us and return a result.
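The output line above follows the usual plugin convention: human-readable status text, then a | character, then performance data as label=value pairs. The following sketch splits such a line into its two parts; plugin_text and plugin_perfdata are hypothetical helper names, and the sample line is modelled on the output shown above.

```shell
# Text before the first "|" is the human-readable status line.
plugin_text() {
    printf '%s\n' "${1%%|*}" | sed 's/[[:space:]]*$//'
}

# Text after the first "|" is the performance data; empty if there is none.
plugin_perfdata() {
    case "$1" in
        *"|"*) printf '%s\n' "${1#*|}" ;;
        *)     printf '\n' ;;
    esac
}

line='HTTP OK: HTTP/1.1 200 OK - 253 bytes in 0.001 second response time |time=0.000980s;;;0.000000 size=253B;;;0'
plugin_text "$line"
plugin_perfdata "$line"
```

The first call prints the status text and the second prints the time and size values, which is the part Nagios can hand to graphing tools.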
(4) Change the httpd port on centos_67 to 8080.
After the change, the original check_http configuration in Nagios can no longer monitor the service:
# /usr/lib64/nagios/plugins/check_http -H 192.168.1.67
So what do we do? Try the port option of the check_http script locally on Linux:
# /usr/lib64/nagios/plugins/check_http -H 192.168.1.67 -p 8080
The local test shows that the parameters can be changed flexibly. Now we need to connect this to Nagios, so let's look at how that association is built. First, find this configuration file:
/etc/nagios/objects/commands.cfg
The relevant section of this file defines the check_http command.
The name check_http and the way its arguments are passed are both set here. In other words, the check_http used in define service must first be defined in commands.cfg. Take a look at this line:
command_line    $USER1$/check_http -I $HOSTADDRESS$ $ARG1$
$USER1$: the plugin directory, /usr/lib64/nagios/plugins, so $USER1$/check_xxxx is the full path of a plugin.
$ARG1$: the value of this argument is supplied in our own service definition (define service), after the command name.
Here is the corresponding define service block:
define service{
        use                     local-service   ; Name of service template to use
        host_name               centos_67
        service_description     HTTP
        check_command           check_http!-p 8080
        notifications_enabled   0
        }
Note the check_command value check_http!-p 8080: this is how extra parameters are added when the service calls the monitoring script. The command name and its arguments are separated with !, and the parameters are used exactly as on the local command line. Then check in the web interface that the HTTP service of the centos_67 machine is monitored normally.
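To make the relationship between check_command, command_line and the macros concrete, here is a rough sketch of the substitution. It is only an illustration of the idea, not how Nagios is implemented: expand_command is a hypothetical helper, and it handles only the three macros used in this article.

```shell
# Hypothetical helper: combine a command_line template with a
# check_command value ("name!arg1") and print the resulting command.
expand_command() {
    template=$1      # command_line from commands.cfg
    check=$2         # check_command from the service definition
    address=$3       # address of the host
    plugin_dir=$4    # value to use for $USER1$

    arg1=
    case "$check" in
        *!*)
            arg1=${check#*!}     # drop the command name and the first "!"
            arg1=${arg1%%!*}     # keep only the first argument
            ;;
    esac

    # "|" is the sed delimiter, so the values must not contain "|" or "&".
    printf '%s\n' "$template" |
        sed -e 's|\$USER1\$|'"$plugin_dir"'|g' \
            -e 's|\$HOSTADDRESS\$|'"$address"'|g' \
            -e 's|\$ARG1\$|'"$arg1"'|g'
}

expand_command '$USER1$/check_http -I $HOSTADDRESS$ $ARG1$' \
    'check_http!-p 8080' 192.168.1.67 /usr/lib64/nagios/plugins
# prints: /usr/lib64/nagios/plugins/check_http -I 192.168.1.67 -p 8080
```

The printed line is the same command that was tested by hand earlier, which is why a parameter that works locally also works after the ! in the service definition.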
VIII. Bridge-mode monitoring with NRPE
The examples so far all belong to the first Nagios monitoring mode, direct-connection monitoring. What about CPU, disk, and memory? That calls for the second mode, indirect collection through a bridge, which is implemented with the NRPE add-on.
Next, install NRPE for indirect monitoring:
(1) The first step is to install the two components nrpe and nagios-plugins on both the Nagios server and the monitored client.
On the server, nrpe was already installed with yum earlier (nagios-plugins* as well), so nothing needs to be installed there now.
Client installation:
Install epel-release first to provide the packages.
# yum -y install epel-release
Next, install the plugins:
# yum install -y nrpe nagios-plugins*
(2) Configure NRPE on the server and the client.
① Go back to the server centos_65 and add the following at the end of the localhost.cfg configuration file to define check_nrpe:
define command{
        command_name    check_nrpe
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
        }
② Then add the following service at the end of localhost.cfg on the Nagios server to monitor the CPU load on centos_67:
define service{
        use                     local-service   ; Name of service template to use
        host_name               centos_67
        service_description     UPTIME
        check_command           check_nrpe!check_cpu
        notifications_enabled   0
        }
③ Next, configure the client.
The check_cpu script collects data indirectly, so it can only run on the client, where it gathers the client's CPU load. The check_nrpe command set up on the server can only reach the NRPE daemon on the client; it cannot call the check_cpu script on the client directly. Only the client's NRPE can call the local check_cpu script, so NRPE on the client and the check_cpu script need to be associated.
check_cpu must exist in two places on the client.
The first place is the NRPE command definition on the client:
# cat /etc/nrpe.d/lcgdm-common.cfg
command[check_cpu]=/usr/lib64/nagios/plugins/lcgdm/check_cpu
command[check_network]=/usr/lib64/nagios/plugins/lcgdm/check_network
command[check_process]=/usr/lib64/nagios/plugins/lcgdm/check_process -p rfiod,globus-gridftp-server
command[check_hostcert]=/usr/lib64/nagios/plugins/lcgdm/check_hostcert -c /etc/grid-security/hostcert.pem -s
The second place is the script itself, /usr/lib64/nagios/plugins/lcgdm/check_cpu:
# ls /usr/lib64/nagios/plugins/lcgdm/check_cpu
/usr/lib64/nagios/plugins/lcgdm/check_cpu
④ Start the nrpe service on both the server and the client, and confirm that it is listening on port 5666.
On the server:
# service nrpe start
Starting nrpe [OK]
# netstat -antpu | grep 5666
tcp 0 0 0.0.0.0:5666 0.0.0.0:* LISTEN 53633/nrpe
tcp 0 0 :::5666 :::* LISTEN 53633/nrpe
On the client:
# service nrpe start
Starting nrpe [OK]
# netstat -antpu | grep 5666
tcp 0 0 0.0.0.0:5666 0.0.0.0:* LISTEN 53202/nrpe
tcp 0 0 :::5666 :::* LISTEN 53202/nrpe
⑤ Modify the main NRPE configuration file on the client; otherwise the Nagios server cannot connect.
# vim /etc/nagios/nrpe.cfg
Change: allowed_hosts=192.168.1.65
192.168.1.65 is the IP of the Nagios server. Restart the NRPE service after the change.
# service nrpe restart
⑥ Run a test from the server:
# /usr/lib64/nagios/plugins/check_nrpe -H 192.168.1.67
If the NRPE version number appears, the connection works.
Finally, on the Nagios home page, the client CPU is now being monitored.
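The association between a command name and a script on the client can be pictured as a simple lookup: the server sends only the name (check_cpu), and the client finds the matching command[...] line. The sketch below imitates that lookup on configuration text; nrpe_lookup is a hypothetical helper for illustration and is not how the NRPE daemon is implemented.

```shell
# Hypothetical helper: read NRPE configuration text on stdin and print
# the command line registered for the given command name.
nrpe_lookup() {
    name=$1
    # Matching lines look like: command[check_cpu]=/path/to/plugin args
    sed -n "s|^command\[$name]=||p" | head -n 1
}

printf '%s\n' \
    'command[check_cpu]=/usr/lib64/nagios/plugins/lcgdm/check_cpu' \
    'command[check_network]=/usr/lib64/nagios/plugins/lcgdm/check_network' |
    nrpe_lookup check_cpu
# prints: /usr/lib64/nagios/plugins/lcgdm/check_cpu
```

A name with no matching line prints nothing, which corresponds to the case where the server asks for a command the client has not defined.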
IX. Writing your own monitoring script and plugging it into Nagios
The checks used so far were provided by Nagios and the nagios-plugins* packages. Nagios also fully supports hand-written plugins. Look at the following shell code on the monitored side:
# vim /usr/lib64/nagios/plugins/check_waiting_connect
(check_waiting_connect is our script name. All self-written scripts are placed under the /usr/lib64/nagios/plugins/ path.)
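#!/bin/bash
# Nagios plugin: alert when too many connections are in a *WAIT state.
STATE_OK=0
STATE_CRITICAL=2

# Count netstat lines containing "wait" (TIME_WAIT, CLOSE_WAIT, ...).
W=$(netstat -an | grep -i wait | wc -l)

if [ "$W" -le 1000 ]; then
    echo "OK, waiting_connections<=1000 low"
    exit $STATE_OK
else
    echo "CRITICAL, waiting_connections>1000 high"
    exit $STATE_CRITICAL
fi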
(The script is very simple: it uses the netstat command to get the number of connections in a wait state on the Linux system (waiting_connections). More than 1000 triggers an alert; 1000 or fewer is normal.)
Nagios recognizes four return statuses:
0 (OK): the status is normal, shown in green
1 (WARNING): a warning, shown in yellow
2 (CRITICAL): a very serious error, shown in red
3 (UNKNOWN): an unknown error, shown in dark yellow
Nagios judges the state of the monitored object from the value the plugin returns and displays it in the web interface so that administrators can spot faults promptly.
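The script above only uses two of the four states. As a sketch of how all four could be used, the function below maps a count to a state with separate warning and critical thresholds. check_threshold is a hypothetical name, and the thresholds 500 and 1000 in the usage comment are example values, not recommendations.

```shell
# Hypothetical helper: print a status line for a connection count and
# return the matching Nagios state (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).
check_threshold() {
    count=$1; warn=$2; crit=$3
    case "$count" in
        ''|*[!0-9]*)
            # Empty or non-numeric input: the check itself failed.
            echo "UNKNOWN - count is not a number"
            return 3 ;;
    esac
    if [ "$count" -ge "$crit" ]; then
        echo "CRITICAL - waiting_connections=$count"
        return 2
    elif [ "$count" -ge "$warn" ]; then
        echo "WARNING - waiting_connections=$count"
        return 1
    fi
    echo "OK - waiting_connections=$count"
    return 0
}

# In a real plugin the return value becomes the exit code, for example:
#   check_threshold "$(netstat -an | grep -ci wait)" 500 1000
#   exit $?
```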
After writing the script, open the NRPE configuration file on the monitored side and add a new command:
# vim /etc/nrpe.d/lcgdm-common.cfg
command[check_waiting_connect]=/usr/lib64/nagios/plugins/check_waiting_connect
Back on the Nagios server centos_65, add a define service block:
define service{
        use                     local-service   ; Name of service template to use
        host_name               centos_67
        service_description     waiting_connects
        check_command           check_nrpe!check_waiting_connect
        notifications_enabled   0
        }
Next, make sure that the nagios user on the client has permission to execute the monitoring script, then test from the server:
# /usr/lib64/nagios/plugins/check_nrpe -H 192.168.1.67 -c check_waiting_connect
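When testing a plugin by hand like this, the text output is visible but the exit code, which is what Nagios actually uses, is not. The following sketch runs any command and prints the state name for its exit code. run_and_report is a hypothetical helper; treating every code other than 0, 1 and 2 as UNKNOWN is a simplification made for this sketch.

```shell
# Hypothetical helper: run a plugin command, let its output through,
# then print the Nagios state name that matches its exit code.
run_and_report() {
    rc=0
    "$@" || rc=$?
    case "$rc" in
        0) state=OK ;;
        1) state=WARNING ;;
        2) state=CRITICAL ;;
        *) state=UNKNOWN ;;   # 3, and any other code in this sketch
    esac
    echo "state: $state (exit code $rc)"
}

# Demo with a stand-in command that behaves like a failing plugin.
run_and_report sh -c 'echo "demo output"; exit 2'
# prints "demo output", then "state: CRITICAL (exit code 2)"
```

On the server, the same helper could wrap the check_nrpe command shown above in place of the stand-in command.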
Once it executes successfully, restart the nagios and nrpe services on the server.
# service nagios restart
# service nrpe restart
View the Nagios home page.
The new check is now displayed there.
That concludes this summary of Nagios.