Using Nagios to monitor remote host availability, disk space, load, process count, and IP connections

Last Update:2014-06-17 Source: Internet

Author: User


This article shows how to monitor a remote host with Nagios. The checks cover host availability, disk space, system load, process count, and IP connection count.
(1) Define the host in the configuration file hosts.cfg on the monitoring server
define host {
    host_name               cacti.com
    alias                   nagios server
    address                 192.168.10.195
    contact_groups          admins
    check_command           check-host-alive
    max_check_attempts      5
    notification_interval   10
    notification_period     24x7
    notification_options    d,u,r
}
Note:
● The contact group named in contact_groups has not been created yet; it must be defined in a later step.
● The host check command is normally check-host-alive, which tests whether the host is reachable.
● Do not set max_check_attempts to 1; 3-4 attempts is generally reasonable.
● Set notification_interval according to your own situation. The unit is minutes.
● The notification_options values mean d (down), u (unreachable), and r (recovery).
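When several hosts share the same settings, the host definition can be generated rather than typed by hand. The following is only a sketch: print_host_def is a hypothetical helper name, not something Nagios provides, and it assumes every host uses the same contact group, check command, and notification settings as the example above.

```sh
#!/bin/sh
# print_host_def NAME ALIAS ADDRESS
# Hypothetical helper: prints a host definition with the same directives
# as the hosts.cfg example. Only the name, alias, and address vary.
print_host_def() {
    name=$1
    alias_text=$2
    address=$3
    cat <<EOF
define host {
    host_name               $name
    alias                   $alias_text
    address                 $address
    contact_groups          admins
    check_command           check-host-alive
    max_check_attempts      5
    notification_interval   10
    notification_period     24x7
    notification_options    d,u,r
}
EOF
}
```

For example, print_host_def cacti.com "nagios server" 192.168.10.195 prints the definition shown above, and the output can be appended to hosts.cfg.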
(2) Define the services in the configuration file services.cfg on the monitoring server
define service {
    host_name               cacti.com
    service_description     check-host-alive
    check_period            24x7
    max_check_attempts      4
    normal_check_interval   3
    retry_check_interval    2
    contact_groups          admins
    notification_interval   10
    notification_period     24x7
    notification_options    w,u,c,r
    check_command           check-host-alive
}
define service {
    host_name               cacti.com
    service_description     check-disk
    check_command           check_nrpe!check_df
    max_check_attempts      4
    normal_check_interval   3
    retry_check_interval    2
    check_period            24x7
    notification_interval   10
    notification_period     24x7
    notification_options    w,u,c,r
    contact_groups          admins
}
define service {
    host_name               cacti.com
    service_description     check-load
    check_command           check_nrpe!check_load
    max_check_attempts      4
    normal_check_interval   3
    retry_check_interval    2
    check_period            24x7
    notification_interval   10
    notification_period     24x7
    notification_options    w,u,c,r
    contact_groups          admins
}
define service {
    host_name               cacti.com
    service_description     total_procs
    check_command           check_nrpe!check_total_procs
    max_check_attempts      4
    normal_check_interval   3
    retry_check_interval    2
    check_period            24x7
    notification_interval   10
    notification_period     24x7
    notification_options    w,u,c,r
    contact_groups          admins
}
define service {
    host_name               cacti.com
    service_description     ip_connets
    check_command           check_nrpe!check_ip_connets
    max_check_attempts      4
    normal_check_interval   3
    retry_check_interval    2
    check_period            24x7
    notification_interval   10
    notification_period     24x7
    notification_options    w,u,c,r
    contact_groups          admins
}
Note:
● host_name must be a host defined in the host configuration file hosts.cfg.
● The check command check_command is either defined in the command configuration file or specified in the nrpe configuration file.
● max_check_attempts is generally set to 3-4, so that a transient network interruption does not trigger a false alarm.
● The check interval and the retry check interval are measured in minutes.
● The notification interval is how long Nagios waits before sending another alarm while a detected fault persists. It is measured in minutes.
● The notification options work as in the host definition, but for services the values are w (warning), u (unknown), c (critical), and r (recovery).
● The contact group contact_groups is defined in the configuration file contactgroup.cfg.
● Checking host resources requires nrpe to be installed and configured on the monitored host; this is done in the following steps.
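The effect of the retry settings can be worked out with simple arithmetic. The sketch below assumes the default Nagios interval unit of one minute and assumes a failing service is rechecked every retry_check_interval until max_check_attempts consecutive checks have failed. minutes_to_hard_state is a hypothetical helper name used only for illustration.

```sh
#!/bin/sh
# minutes_to_hard_state MAX_CHECK_ATTEMPTS RETRY_CHECK_INTERVAL
# Prints the minutes between the first failed check and the final failed
# check that confirms the problem. The first failure counts as attempt 1,
# so only (max attempts - 1) retries remain.
minutes_to_hard_state() {
    echo $(( ($1 - 1) * $2 ))
}
```

With the values used in services.cfg, minutes_to_hard_state 4 2 prints 6: a problem is confirmed about 6 minutes after the first failed check. Since the normal check interval is 3 minutes, the first failed check itself can come up to 3 minutes after the fault begins.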
(3) Configure nrpe on the monitored host
Modify the configuration file /usr/local/nagios/etc/nrpe.cfg. The changed lines are:
# Run as a standalone daemon
server_address=192.168.10.195
#command[check_hda1]=/usr/local/nrpe/libexec/check_disk -w 20 -c 10 -p /dev/hda1
command[check_df]=/usr/local/nagios/libexec/check_disk -x /dev -w 20 -c 10
command[check_ip_connets]=/usr/local/nagios/libexec/ip_conn.sh 8000 10000
Note:
● command[check_df]=/usr/local/nagios/libexec/check_disk -w 20 -c 10 checks the disk utilization of the entire server. On a FreeBSD system the /dev partition is always 100% used, so it must be excluded; the command line then becomes "command[check_df]=/usr/local/nagios/libexec/check_disk -x /dev -w 20 -c 10".
● command[check_ip_connets]=/usr/local/nagios/libexec/ip_conn.sh 8000 10000 checks the number of IP connections.
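It helps to be clear about what "-w 20 -c 10" means. In check_disk, a threshold written without a percent sign is an amount of free space in the plugin's units (megabytes by default), so these options appear to warn below 20 MB free and go critical below 10 MB free. The sketch below only illustrates that comparison. disk_state is a hypothetical name, the exact boundary behavior of the real plugin is not reproduced here, and percentages and inode checks are ignored.

```sh
#!/bin/sh
# disk_state FREE_MB WARN_MB CRIT_MB
# Sketch of the threshold comparison only: the state gets worse as the
# free space drops below each threshold. Not the plugin's implementation.
disk_state() {
    free_mb=$1
    warn_mb=$2
    crit_mb=$3
    if [ "$free_mb" -lt "$crit_mb" ]; then
        echo CRITICAL
    elif [ "$free_mb" -lt "$warn_mb" ]; then
        echo WARNING
    else
        echo OK
    fi
}
```

For example, disk_state 21565 20 10 prints OK, while disk_state 15 20 10 prints WARNING. If percentage thresholds are wanted instead, check_disk accepts values such as "-w 20% -c 10%".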
(4) Create a monitoring script on the monitored host
[root@cacti nagios]# cd /usr/local/nagios/libexec/
[root@cacti libexec]# vi ip_conn.sh
The script content is as follows:
#!/bin/sh
# Argument check, left disabled because nrpe.cfg always passes both thresholds:
#if [ $# -ne 2 ]
#then
#    echo "Usage: $0 -w num1 -c num2"
#    exit 3
#fi
# Count established TCP connections
ip_conns=`netstat -an | grep tcp | grep EST | wc -l`
# $1 is the warning threshold, $2 is the critical threshold
if [ $ip_conns -le $1 ]
then
    echo "OK - connect counts is $ip_conns"
    exit 0
fi
if [ $ip_conns -gt $1 -a $ip_conns -le $2 ]
then
    echo "Warning - connect counts is $ip_conns"
    exit 1
fi
if [ $ip_conns -gt $2 ]
then
    echo "Critical - connect counts is $ip_conns"
    exit 2
fi
[root@cacti libexec]# chmod +x ip_conn.sh
I wrote the two parameters required by the script in the nrpe configuration file nrpe.cfg, so the script does not need to validate its two input values. When the number of established connections exceeds 8000, the system sends a warning alarm; when it exceeds 10000, it sends a critical alarm.
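The threshold logic is easier to test when it is separated from the netstat call. The following sketch is one way to do that, not the author's script: classify_conns is a hypothetical function name, and treating a count equal to a threshold as not exceeding it is an assumption. The return values follow the usual plugin convention of 0 for OK, 1 for warning, 2 for critical, and 3 for unknown.

```sh
#!/bin/sh
# classify_conns COUNT WARN CRIT
# Prints a status line and returns the matching plugin exit code.
# Checks the most severe condition first so each count matches one branch.
classify_conns() {
    if [ $# -ne 3 ]; then
        echo "Unknown - usage: classify_conns count warn crit"
        return 3
    fi
    count=$1
    warn=$2
    crit=$3
    if [ "$count" -gt "$crit" ]; then
        echo "Critical - connect counts is $count"
        return 2
    elif [ "$count" -gt "$warn" ]; then
        echo "Warning - connect counts is $count"
        return 1
    fi
    echo "OK - connect counts is $count"
    return 0
}
```

Because the function takes the count as an argument, it can be tried with fixed numbers, for example classify_conns 9000 8000 10000, before it is wired to the live connection count.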
(5) Restart the nrpe service and check that it is listening
[root@cacti libexec]# killall -9 nrpe
[root@cacti libexec]# /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
[root@cacti libexec]# netstat -nltp | grep 5666
tcp 0 0 192.168.10.195:5666 0.0.0.0:* LISTEN 780/nrpe
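A plain grep for 5666 also matches unrelated lines, such as a process ID of 5666 or a port like 15666. As a sketch of a stricter check, the function below reads netstat output and looks only at the port of the local address on listening TCP sockets. is_listening is a hypothetical name, and the column layout assumed here is the Linux netstat format shown above.

```sh
#!/bin/sh
# is_listening PORT
# Reads "netstat -nlt" style output on stdin. Succeeds only if a TCP
# socket in LISTEN state has PORT as the port of its local address
# (field 4); the state is expected in field 6.
is_listening() {
    awk -v port="$1" '
        $1 ~ /^tcp/ && $6 == "LISTEN" {
            n = split($4, parts, ":")
            if (parts[n] == port) found = 1
        }
        END { exit !found }
    '
}
```

On the monitored host it could be used as: netstat -nlt | is_listening 5666 && echo "nrpe is listening".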
(6) Check the plug-ins from the monitoring server
Check the nrpe service
[root@nagios libexec]# ./check_nrpe -H 192.168.10.195
NRPE v2.12
Check disk utilization through nrpe
[root@nagios libexec]# ./check_nrpe -H 192.168.10.195 -c check_df
DISK OK - free space: / 21565 MB (82% inode=98%); /boot 82 MB (88% inode=99%); /dev/shm 505 MB (100% inode=99%);| /=4723MB;27699;27709;0;27719 /boot=10MB;78;88;0;98 /dev/shm=0MB;485;495;0;505
Check the IP connection count
[root@nagios libexec]# ./check_nrpe -H 192.168.10.195 -c check_ip_connets
OK - connect counts is 5
Check the load
[root@nagios libexec]# ./check_nrpe -H 192.168.10.195 -c check_load
OK - load average: 0.93, 1.12, 1.21|load1=0.930;15.000;30.000;0; load5=1.120;10.000;25.000;0; load15=1.210;5.000;20.000;0;
Check the number of processes
[root@nagios libexec]# ./check_nrpe -H 192.168.10.195 -c check_total_procs
PROCS OK: 92 processes
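Each of these commands reports its result twice: as text, and as an exit code (0 OK, 1 warning, 2 critical, 3 unknown). Anything after the "|" in the text is performance data in label=value form. The sketch below shows one way to read both. state_name and perfdata_value are hypothetical helper names, and the parser is deliberately simple: it assumes labels contain no spaces or quotes.

```sh
#!/bin/sh
# state_name CODE: map a plugin exit code to its state name.
state_name() {
    case $1 in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;
    esac
}

# perfdata_value OUTPUT LABEL: print the value recorded for LABEL in the
# performance data that follows the "|" in a plugin output line.
# Prints nothing if there is no performance data or no such label.
perfdata_value() {
    printf '%s\n' "$1" | awk -v label="$2" '{
        bar = index($0, "|")
        if (bar == 0) exit
        n = split(substr($0, bar + 1), items, " ")
        for (i = 1; i <= n; i++) {
            split(items[i], kv, "=")
            if (kv[1] == label) {
                split(kv[2], fields, ";")   # value;warn;crit;min;max
                print fields[1]
                exit
            }
        }
    }'
}
```

Given the check_load output above, perfdata_value "$output" load5 prints 1.120, and state_name applied to the exit code of check_nrpe gives the state that Nagios will record.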

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
