Using Nagios to monitor remote host availability, disk space, load, process count, and IP connections

Source: Internet
Author: User

This setup monitors host availability (whether the host is alive), disk space, load, process count, and the number of IP connections.
(1) Define the host in the host configuration file hosts.cfg on the monitoring server
define host {
    host_name               cacti.com
    alias                   nagios server
    address                 192.168.10.195
    contact_groups          admins
    check_command           check-host-alive
    max_check_attempts      5
    notification_interval   10
    notification_period     24x7
    notification_options    d,u,r
}
Notes:
● The contact group admins (contact_groups) does not exist yet; it must be defined in a later step.
● For the host check command, check-host-alive (a basic "is the host alive" check) is the usual choice.
● Avoid setting max_check_attempts to 1; 3-4 attempts is generally reasonable.
● Set notification_interval to suit your environment; the unit is minutes.
● notification_options lists the states that trigger notifications: d = down, u = unreachable, r = recovery.
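Because the same letters mean different things in host and service definitions (u is "unreachable" for a host but "unknown" for a service), a small lookup can make a configuration easier to read. The sketch below is a hypothetical helper, decode_notify_opts, which is not part of Nagios; it only knows the letters used in this article and marks anything else as unsupported.

```sh
#!/bin/sh
# decode_notify_opts KIND OPTIONS
# Hypothetical helper (not a Nagios tool): expands a notification_options
# value such as "d,u,r" into state names. KIND is "host" or "service".
decode_notify_opts() {
    kind=$1
    out=""
    # Remove spaces, then turn the comma-separated list into words.
    for opt in $(echo "$2" | tr -d ' ' | tr ',' ' '); do
        case "$kind:$opt" in
            host:d)    name=down ;;
            host:u)    name=unreachable ;;
            service:w) name=warning ;;
            service:u) name=unknown ;;
            service:c) name=critical ;;
            *:r)       name=recovery ;;
            *)         name="unsupported:$opt" ;;
        esac
        out="${out:+$out }$name"
    done
    echo "$out"
}
```

For example, decode_notify_opts host "d,u,r" prints "down unreachable recovery", while decode_notify_opts service "w,u,c,r" prints "warning unknown critical recovery".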
(2) Define the services in the service configuration file services.cfg on the monitoring server
define service {
    host_name               cacti.com
    service_description     check-host-alive
    check_period            24x7
    max_check_attempts      4
    normal_check_interval   3
    retry_check_interval    2
    contact_groups          admins
    notification_interval   10
    notification_period     24x7
    notification_options    w,u,c,r
    check_command           check-host-alive
}
define service {
    host_name               cacti.com
    service_description     check-disk
    check_command           check_nrpe!check_df
    max_check_attempts      4
    normal_check_interval   3
    retry_check_interval    2
    check_period            24x7
    notification_interval   10
    notification_period     24x7
    notification_options    w,u,c,r
    contact_groups          admins
}
define service {
    host_name               cacti.com
    service_description     check-load
    check_command           check_nrpe!check_load
    max_check_attempts      4
    normal_check_interval   3
    retry_check_interval    2
    check_period            24x7
    notification_interval   10
    notification_period     24x7
    notification_options    w,u,c,r
    contact_groups          admins
}
define service {
    host_name               cacti.com
    service_description     total_procs
    check_command           check_nrpe!check_total_procs
    max_check_attempts      4
    normal_check_interval   3
    retry_check_interval    2
    check_period            24x7
    notification_interval   10
    notification_period     24x7
    notification_options    w,u,c,r
    contact_groups          admins
}
define service {
    host_name               cacti.com
    service_description     ip_connets
    check_command           check_nrpe!check_ip_connets
    max_check_attempts      4
    normal_check_interval   3
    retry_check_interval    2
    check_period            24x7
    notification_interval   10
    notification_period     24x7
    notification_options    w,u,c,r
    contact_groups          admins
}
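The four NRPE-based services above differ only in service_description and in the command name after check_nrpe!. A hypothetical generator function, gen_nrpe_service (an illustrative name, not a Nagios tool), can print such blocks so the shared settings stay identical; the directive values are copied from the definitions above.

```sh
#!/bin/sh
# gen_nrpe_service HOST DESCRIPTION NRPE_COMMAND
# Hypothetical helper: prints a service definition that runs NRPE_COMMAND
# on HOST through check_nrpe, reusing the settings from services.cfg.
gen_nrpe_service() {
    cat <<EOF
define service {
    host_name               $1
    service_description     $2
    check_command           check_nrpe!$3
    max_check_attempts      4
    normal_check_interval   3
    retry_check_interval    2
    check_period            24x7
    notification_interval   10
    notification_period     24x7
    notification_options    w,u,c,r
    contact_groups          admins
}
EOF
}

# Example (review the output before appending it to services.cfg):
# gen_nrpe_service cacti.com check-disk  check_df
# gen_nrpe_service cacti.com check-load  check_load
# gen_nrpe_service cacti.com total_procs check_total_procs
# gen_nrpe_service cacti.com ip_connets  check_ip_connets
```

Keeping the common values in one place means a change such as a new contact group only has to be made once.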
Notes:
● host_name must match a host defined in the host configuration file hosts.cfg.
● check_command refers to a command defined in the command configuration file (commands.cfg). For NRPE-based checks, the part after "!" (for example check_df) is passed to check_nrpe and names a command defined in nrpe.cfg on the monitored host.
● max_check_attempts is generally set to 3-4 so that a brief network interruption does not trigger a false alarm.
● normal_check_interval and retry_check_interval are measured in minutes.
● notification_interval is how long Nagios waits before re-sending a notification while the problem persists; it is also measured in minutes.
● For services, notification_options uses w = warning, u = unknown, c = critical, r = recovery. Note that u means "unknown" here, whereas in the host definition it means "unreachable".
● The contact group (contact_groups) is defined in the configuration file contactgroup.cfg.
● Checking host resources (disk, load, processes, connections) requires nrpe to be installed and configured on the monitored host; this is done in the following steps.
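To see how the interval and retry settings interact, the sketch below gives a rough worst-case estimate, in minutes, of how long a persistent failure takes to reach a HARD state, which is when Nagios sends the first notification. It assumes the default interval_length of 60 seconds and ignores check latency: the failure may begin just after a check, is first seen at the next normal check, and is then retried max_check_attempts - 1 times at retry_check_interval. The function name is hypothetical.

```sh
#!/bin/sh
# worst_case_detection NORMAL_INTERVAL MAX_ATTEMPTS RETRY_INTERVAL
# Hypothetical helper: rough upper bound (minutes) before a persistent
# failure becomes a HARD state: up to one normal interval until the first
# failing check, then (max_check_attempts - 1) retries.
worst_case_detection() {
    normal=$1 attempts=$2 retry=$3
    echo $(( normal + (attempts - 1) * retry ))
}

# Values from services.cfg: 3-minute checks, 4 attempts, 2-minute retries.
worst_case_detection 3 4 2   # prints 9
```

With max_check_attempts set to 1 the estimate collapses to a single normal interval and every transient failure alerts immediately, which is why 3-4 attempts is the usual recommendation.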
(3) Configure nrpe on the monitored host
Edit the configuration file /usr/local/nagios/etc/nrpe.cfg and change the following lines (check_load and check_total_procs, used in services.cfg, are already defined in the sample nrpe.cfg):
# Run as a standalone daemon listening on this address
server_address=192.168.10.195
#command[check_hda1]=/usr/local/nrpe/libexec/check_disk -w 20 -c 10 -p /dev/hda1
command[check_df]=/usr/local/nagios/libexec/check_disk -x /dev -w 20 -c 10
command[check_ip_connets]=/usr/local/nagios/libexec/ip_conn.sh 8000 10000
Notes:
● command[check_df]=/usr/local/nagios/libexec/check_disk -w 20 -c 10 checks disk usage on every partition of the server; the integer thresholds are free space in MB (warning below 20 MB free, critical below 10 MB free). On FreeBSD the /dev partition always reports 100% usage, so it must be excluded, which gives the command line "command[check_df]=/usr/local/nagios/libexec/check_disk -x /dev -w 20 -c 10".
● command[check_ip_connets]=/usr/local/nagios/libexec/ip_conn.sh 8000 10000 checks the number of IP connections, with 8000 as the warning threshold and 10000 as the critical threshold.
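Every name after check_nrpe! in services.cfg must match an active command[...] entry in nrpe.cfg on the monitored host, and a mismatch makes the check fail. A minimal sketch that lists the active (uncommented) command names, so they can be compared with services.cfg; list_nrpe_commands is a hypothetical helper name.

```sh
#!/bin/sh
# list_nrpe_commands [FILE]
# Hypothetical helper: prints NAME for every active command[NAME]=... line
# in an nrpe.cfg (reads stdin when FILE is omitted). Commented lines such
# as "#command[check_hda1]=..." are skipped because they start with '#'.
list_nrpe_commands() {
    sed -n 's/^[[:space:]]*command\[\([^]]*\)][[:space:]]*=.*/\1/p' ${1+"$1"}
}

# Usage on the monitored host:
# list_nrpe_commands /usr/local/nagios/etc/nrpe.cfg
```

Run against the lines shown above, it prints check_df and check_ip_connets; on a full sample nrpe.cfg it also lists entries such as check_load and check_total_procs.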
(4) Create a monitoring script on the monitored host
[root@cacti nagios]# cd /usr/local/nagios/libexec/
[root@cacti libexec]# vi ip_conn.sh
The script content is as follows:
#!/bin/sh
# Usage: ip_conn.sh WARNING CRITICAL
# Argument check, left disabled because nrpe.cfg always passes both values:
# if [ $# -ne 2 ]
# then
#     echo "Usage: $0 warning critical"
#     exit 3
# fi
# Count established TCP connections
ip_conns=$(netstat -an | grep tcp | grep EST | wc -l)
if [ $ip_conns -lt $1 ]
then
    echo "OK-connect counts is $ip_conns"
    exit 0
fi
if [ $ip_conns -ge $1 -a $ip_conns -lt $2 ]
then
    echo "Warning-connect counts is $ip_conns"
    exit 1
fi
if [ $ip_conns -ge $2 ]
then
    echo "Critical-connect counts is $ip_conns"
    exit 2
fi
[root@cacti libexec]# chmod +x ip_conn.sh
Because the two parameters the script needs are fixed in the nrpe configuration file nrpe.cfg, the script does not validate its two input values. When the number of established TCP connections on the host reaches 8000, a "warning" alert is sent; when it reaches 10000, a "critical" alert is sent.
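The threshold logic can be tested without generating thousands of real connections by separating the comparison from netstat. A minimal sketch, classify_conns (a hypothetical name), using the same 8000/10000 thresholds and treating a count equal to a threshold as having reached it, so no value falls between the ranges:

```sh
#!/bin/sh
# classify_conns COUNT WARN CRIT
# Hypothetical helper: prints the Nagios state for COUNT and returns the
# matching plugin exit code (0 OK, 1 WARNING, 2 CRITICAL).
classify_conns() {
    if [ "$1" -ge "$3" ]; then
        echo "CRITICAL"; return 2
    elif [ "$1" -ge "$2" ]; then
        echo "WARNING"; return 1
    else
        echo "OK"; return 0
    fi
}

# Boundary values around the thresholds passed in nrpe.cfg (8000 10000):
for n in 5 7999 8000 9999 10000; do
    echo "$n -> $(classify_conns "$n" 8000 10000)"
done
```

The loop prints OK for 5 and 7999, WARNING for 8000 and 9999, and CRITICAL for 10000. In a real plugin the same comparison would follow the netstat pipeline, with the exit code passed back to nrpe.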
(5) Restart the nrpe service and check that it is listening
[root@cacti libexec]# killall -9 nrpe
[root@cacti libexec]# /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
[root@cacti libexec]# netstat -nltp | grep 5666
tcp        0      0 192.168.10.195:5666         0.0.0.0:*                   LISTEN      780/nrpe
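Rather than reading the netstat output by eye, a script can check it. A minimal sketch, assuming Linux netstat -nltp output in the format shown above (local address in the fourth column, state in the sixth); port_listening is a hypothetical helper, and 5666 is nrpe's default port.

```sh
#!/bin/sh
# port_listening PORT
# Hypothetical helper: reads `netstat -nltp` output on stdin and succeeds
# if some line has a local address ending in :PORT in the LISTEN state.
port_listening() {
    awk -v port="$1" '
        $6 == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
        END { exit !found }
    '
}

# Usage on the monitored host:
# netstat -nltp | port_listening 5666 && echo "nrpe is listening"
```

Anchoring the match on ":" plus the port keeps a port such as 15666 from being mistaken for 5666.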
(6) Test the plugins from the monitoring server
Check the nrpe service:
[root@nagios libexec]# ./check_nrpe -H 192.168.10.195
NRPE v2.12
Check disk usage through nrpe:
[root@nagios libexec]# ./check_nrpe -H 192.168.10.195 -c check_df
DISK OK - free space: / 21565 MB (82% inode=98%); /boot 82 MB (88% inode=99%); /dev/shm 505 MB (100% inode=99%);| /=4723MB;27699;27709;0;27719 /boot=10MB;78;88;0;98 /dev/shm=0MB;485;495;0;505
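The performance data shows how the -w 20 -c 10 thresholds from nrpe.cfg are interpreted: as MB of free space, reported as used-space limits. For /, the total is 27719 MB, so the warning limit is 27719 - 20 = 27699 MB used and the critical limit is 27719 - 10 = 27709 MB used. A small sketch of that arithmetic (disk_limits is a hypothetical name):

```sh
#!/bin/sh
# disk_limits TOTAL_MB WARN_FREE_MB CRIT_FREE_MB
# Hypothetical helper: converts free-space thresholds (MB) into the
# used-space limits reported in check_disk performance data.
disk_limits() {
    echo "warn=$(( $1 - $2 )) crit=$(( $1 - $3 ))"
}

disk_limits 27719 20 10   # /        -> warn=27699 crit=27709
disk_limits 98 20 10      # /boot    -> warn=78 crit=88
disk_limits 505 20 10     # /dev/shm -> warn=485 crit=495
```

All three results match the warn and crit fields in the output above.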
Check the IP connection count:
[root@nagios libexec]# ./check_nrpe -H 192.168.10.195 -c check_ip_connets
OK-connect counts is 5
Check the load:
[root@nagios libexec]# ./check_nrpe -H 192.168.10.195 -c check_load
OK - load average: 0.93, 1.12, 1.21|load1=0.930;15.000;30.000;0; load5=1.120;10.000;25.000;0; load15=1.210;5.000;20.000;0;
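Everything after the "|" in a plugin's output is performance data: space-separated items of the form label=value;warn;crit;min;max. A minimal POSIX sh sketch, perf_field (a hypothetical helper), that extracts one field for a given label, applied to the check_load output above with the line break removed:

```sh
#!/bin/sh
# perf_field OUTPUT LABEL INDEX
# Hypothetical helper: from plugin OUTPUT, prints field INDEX (1 = value,
# 2 = warn, 3 = crit, 4 = min, 5 = max) of the perfdata item named LABEL.
# Returns 1 if LABEL is not found.
perf_field() {
    perf=${1#*"|"}               # drop the human-readable text before '|'
    for item in $perf; do        # perfdata items are separated by spaces
        case "$item" in
            "$2"=*)
                echo "${item#*=}" | cut -d';' -f"$3"
                return 0 ;;
        esac
    done
    return 1
}

out='OK - load average: 0.93, 1.12, 1.21|load1=0.930;15.000;30.000;0; load5=1.120;10.000;25.000;0; load15=1.210;5.000;20.000;0;'
perf_field "$out" load5 1   # prints 1.120
perf_field "$out" load1 2   # prints 15.000
```

The pattern includes the "=", so asking for load1 does not accidentally match load15.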
Check the number of processes:
[root@nagios libexec]# ./check_nrpe -H 192.168.10.195 -c check_total_procs
PROCS OK: 92 processes
