Description
The work is split between the monitored host (steps I to III) and the monitoring server (step IV).
I. Add the script to nrpe.cfg
Add a command definition to nrpe.cfg:
command[check_used_mem]=/usr/local/nagios/libexec/check_used_mem.sh 80 90
Description: a WARNING is raised when actual memory usage reaches 80%, and a CRITICAL alert when it reaches 90%.
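The command[...] line maps a command name, which check_nrpe requests from the monitoring server, to a command line; the trailing 80 and 90 reach the script as its first and second arguments. As a sketch of that split, the hypothetical helper parse_nrpe_command below (not part of NRPE) takes one such line apart. It assumes simple whitespace-separated arguments and uses shlex.split only as an approximation of how the shell splits the command line.

```python
import re
import shlex

# Matches lines of the form command[<name>]=<command line>, as used in nrpe.cfg
_COMMAND_RE = re.compile(r"^command\[(?P<name>[^\]]+)\]=(?P<cmd>.+)$")


def parse_nrpe_command(line):
    """Split an nrpe.cfg command line into (name, executable, args).

    Returns None for lines that are not command definitions
    (comments, blank lines, other directives).
    """
    match = _COMMAND_RE.match(line.strip())
    if match is None:
        return None
    parts = shlex.split(match.group("cmd"))
    return match.group("name"), parts[0], parts[1:]
```

For the line above, parse_nrpe_command returns the name check_used_mem, the script path, and the argument list ['80', '90'], which the script reads as $1 and $2.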
II. Add the check script
Reference script, check_used_mem.sh:
#!/bin/bash
warn=$1
critical=$2
# total memory: 2nd field of line 2 (Mem:)
all=`free | sed -n '2p' | awk '{print $2}'`
# memory actually used: 3rd field of line 3 (-/+ buffers/cache)
used=`free | sed -n '3p' | awk '{print $3}'`
let "c=$used*100/$all"
if [[ $c -lt $warn ]]
then
  echo "used mem/total < $warn% [used: $used, total: $all]"
  exit 0
elif [[ $c -lt $critical ]]
then
  echo "used mem/total >= $warn% [used: $used, total: $all]"
  exit 1
else
  echo "used mem/total >= $critical% [used: $used, total: $all]"
  exit 2
fi
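For comparison, the same logic can be sketched in Python. This is an illustrative translation, not a replacement for the shell script: the names used_mem_percent, check and main are hypothetical, and the parsing assumes the older free output format shown in the notes below, where line 3 is the -/+ buffers/cache line (newer versions of free no longer print that line).

```python
import subprocess

OK, WARNING, CRITICAL = 0, 1, 2  # Nagios plugin exit codes


def used_mem_percent(free_output):
    """Integer percentage of memory actually used, from old-style `free` output."""
    lines = free_output.splitlines()
    total = int(lines[1].split()[1])  # sed -n '2p' | awk '{print $2}'
    used = int(lines[2].split()[2])   # sed -n '3p' | awk '{print $3}'
    return used * 100 // total        # integer division, like bash `let`


def check(percent, warn, critical):
    """Map a usage percentage to (exit code, status line)."""
    # Compare as integers, like -lt; as strings, "9" would sort after "80"
    if percent < warn:
        return OK, "used mem/total < %d%%" % warn
    if percent < critical:
        return WARNING, "used mem/total >= %d%%" % warn
    return CRITICAL, "used mem/total >= %d%%" % critical


def main(argv):
    """argv holds the warning and critical thresholds, e.g. ["80", "90"]."""
    warn, critical = int(argv[0]), int(argv[1])
    out = subprocess.run(["free"], capture_output=True, text=True, check=True).stdout
    code, message = check(used_mem_percent(out), warn, critical)
    print(message)
    return code
```

A wrapper script would call sys.exit(main(sys.argv[1:])), so that the exit code, not the printed text, tells Nagios the state.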
Notes:
1. $1 and $2 refer to the first and second arguments passed to the script. For example, if del.sh contains:
#!/bin/bash
echo $1
then either of the following prints a:
sh del.sh a                        # the first argument is a
chmod a+x ./del.sh; ./del.sh a     # the first argument is a
2. free shows memory usage:
[root@xen_202_12 /]# free -m
             total       used       free     shared    buffers     cached
Mem:          3072       2459        612          0        207       1803
-/+ buffers/cache:        447       2624
Swap:         1913          0       1913
total: total memory, 3072
used: memory in use, 2459
free: unused memory, 612
shared: obsolete, always 0
buffers: memory used for buffers, 207
cached: memory used for the page cache, 1803
Relationship (Mem: line): total = used + free
Line 3:
Meaning of -/+ buffers/cache:
-buffers/cache: 447 (equal to used - buffers - cached on the Mem: line)
+buffers/cache: 2624 (equal to free + buffers + cached on the Mem: line)
Note: applying these formulas gives slightly different values from the ones free prints; this is most likely because free works in kilobytes and converts each figure to megabytes separately.
In other words, -buffers/cache is the memory actually consumed by programs, and +buffers/cache is the total memory that programs can still obtain.
3. sed -n '2p' prints only the second line.
4. awk '{print $2}' prints the second column; by default, columns are separated by whitespace. Use -F to specify a different separator:
$ echo 1b234b56b7 | awk -F 'b' '{print $2}'
234
5. Inside [[ ]], < compares strings, while -lt compares integers.
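The -/+ buffers/cache relationships from note 2 can be checked against the Mem: figures of the sample output. The helper name buffers_cache_split is hypothetical. The computed values (449 and 2622) differ slightly from the 447 and 2624 that free printed, which fits the rounding note above.

```python
def buffers_cache_split(used, free, buffers, cached):
    """Return the (-buffers/cache, +buffers/cache) figures.

    -buffers/cache: memory actually held by programs
    +buffers/cache: memory programs could still obtain
    """
    return used - buffers - cached, free + buffers + cached


# Mem: line of the `free -m` sample (values in MB)
used, free, buffers, cached = 2459, 612, 207, 1803
app_used, app_available = buffers_cache_split(used, free, buffers, cached)
print(app_used, app_available)  # 449 2622
# The two figures always add up to used + free
assert app_used + app_available == used + free
```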
III. Restart NRPE
Stop the running nrpe process, then start it again so that it rereads nrpe.cfg:
/usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
IV. Add the service on the monitoring server
define service {
    use                     generic-service
    host_name               100.61.73.2,100.61.73.3
    service_description     Memory
    check_command           check_nrpe!check_used_mem
    notifications_enabled   1
}
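In check_command, the part after ! is passed to the command definition as $ARG1$. The sketch below models that substitution for check_nrpe!check_used_mem. It assumes check_nrpe is defined in commands.cfg with the common command line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$, which this article does not show; expand_check_command is a hypothetical, simplified model of Nagios macro expansion, not Nagios code.

```python
def expand_check_command(check_command, command_lines, host_address):
    """Expand a check_command such as "check_nrpe!check_used_mem".

    command_lines maps command names to command_line templates.
    Arguments after '!' replace $ARG1$, $ARG2$, ...; $HOSTADDRESS$ is
    replaced by the host's address. Other macros (e.g. $USER1$) are kept.
    """
    name, *args = check_command.split("!")
    line = command_lines[name]
    for i, arg in enumerate(args, start=1):
        line = line.replace("$ARG%d$" % i, arg)
    return line.replace("$HOSTADDRESS$", host_address)


# Assumed definition of check_nrpe (typically found in commands.cfg)
COMMANDS = {"check_nrpe": "$USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$"}
```

Under that assumption, the service above runs $USER1$/check_nrpe -H 100.61.73.2 -c check_used_mem for the first host, and NRPE on that host then runs the script registered as check_used_mem.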
Using Python scripts to monitor Linux hosts in Nagios
Monitored host: 192.168.5.110
1. First, put getload.py in /usr/local/nagios/libexec.
[root@nhserver1 ~]# vim /usr/local/nagios/libexec/getload.py
#!/usr/bin/env python
import os
import sys

# 1-, 5- and 15-minute load averages
(d1, d2, d3) = os.getloadavg()
if d1 >= 5.0:
    print("Getloadavg CRITICAL: load average is %.2f" % d1)
    sys.exit(2)
elif d1 >= 2.0:
    print("Getloadavg WARNING: load average is %.2f" % d1)
    sys.exit(1)
else:
    print("Getloadavg OK: load average is %.2f" % d1)
    sys.exit(0)
[root@nhserver1 libexec]# chmod a+x getload.py
[root@nhserver1 libexec]# chown nagios:nagios getload.py
2. Add a custom command to NRPE
[root@nhserver1 libexec]# vim /usr/local/nagios/etc/nrpe.cfg
command[nh_check_getload]=/usr/local/nagios/libexec/getload.py
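The decision logic in getload.py can be separated from reading the load, which makes the 2.0 and 5.0 thresholds easy to test or to pass in as arguments the way check_used_mem.sh does. classify_load and current_status are hypothetical names; os.getloadavg() is the same standard-library call the script uses, and it is only available on Unix.

```python
import os

# Thresholds hard-coded in getload.py
WARN_LOAD = 2.0
CRIT_LOAD = 5.0


def classify_load(load, warn=WARN_LOAD, crit=CRIT_LOAD):
    """Return (exit_code, message) for a 1-minute load average."""
    if load >= crit:
        return 2, "Getloadavg CRITICAL: load average is %.2f" % load
    if load >= warn:
        return 1, "Getloadavg WARNING: load average is %.2f" % load
    return 0, "Getloadavg OK: load average is %.2f" % load


def current_status():
    """Classify the current 1-minute load average (Unix only)."""
    one_minute, _five, _fifteen = os.getloadavg()
    return classify_load(one_minute)
```

A plugin built on this would print the message and call sys.exit with the code, exactly as getload.py does.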
------------------------------------------------------------------------------------------
On the Nagios server (192.168.5.10), test the new command:
[root@nhserver2 libexec]# /usr/local/nagios/libexec/check_nrpe -H 192.168.5.110 -c nh_check_getload
Getloadavg OK: load average is 0.06
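Nagios decides the service state from the plugin's exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and shows the first line of its output as the status information. The sketch below runs a plugin command and reports both; run_check is a hypothetical helper, and treating any other exit code as UNKNOWN is a simplification.

```python
import subprocess

# Nagios plugin return codes and the service states they map to
STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def run_check(argv):
    """Run a plugin command and return (state, first line of output)."""
    result = subprocess.run(argv, capture_output=True, text=True)
    lines = result.stdout.splitlines()
    first_line = lines[0] if lines else ""
    # Exit codes outside 0-3 are reported as UNKNOWN here
    return STATES.get(result.returncode, "UNKNOWN"), first_line
```

On the Nagios server, the test above would correspond to run_check(["/usr/local/nagios/libexec/check_nrpe", "-H", "192.168.5.110", "-c", "nh_check_getload"]).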
On the Nagios server (192.168.5.10), add the host and its services, including the custom check:
[root@nhserver2 ~]# cd /usr/local/nagios/etc/objects
[root@nhserver2 objects]# vim hosts_192.168.5.110.cfg
define host {
    use                     linux-server
    host_name               192.168.5.110
    alias                   192.168.5.110
    address                 192.168.5.110
}
define hostgroup {
    hostgroup_name          nh_linuxs
    alias                   nh_linuxs
    members                 192.168.5.110
}
define service {
    use                     local-service
    host_name               192.168.5.110
    service_description     check-host-alive
    check_command           check-host-alive
    max_check_attempts      5
    normal_check_interval   3
    retry_check_interval    2
    check_period            24x7
    notification_interval   10
    notification_period     24x7
}
define service {
    use                     local-service
    host_name               192.168.5.110
    service_description     SSH
    check_command           check_ssh
    max_check_attempts      5
    normal_check_interval   3
    retry_check_interval    2
    check_period            24x7
    notification_interval   10
    notification_period     24x7
}
define service {
    use                     local-service
    host_name               192.168.5.110
    service_description     check_nrpe_check_users
    check_command           check_nrpe!nh_check_users
    max_check_attempts      5
    normal_check_interval   3
    retry_check_interval    2
    check_period            24x7
    notification_interval   10
    notification_period     24x7
}
define service {
    use                     local-service
    host_name               192.168.5.110
    service_description     check_nrpe_check_getload
    check_command           check_nrpe!nh_check_getload
    max_check_attempts      5
    normal_check_interval   3
    retry_check_interval    2
    check_period            24x7
    notification_interval   10
    notification_period     24x7
}
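The four service definitions differ only in service_description and check_command, so they could also be generated. This is a hypothetical convenience sketch, not a Nagios feature; in Nagios itself the usual way to share the common directives is a service template referenced with use.

```python
# Directives shared by every service in hosts_192.168.5.110.cfg
COMMON = [
    ("use", "local-service"),
    ("max_check_attempts", "5"),
    ("normal_check_interval", "3"),
    ("retry_check_interval", "2"),
    ("check_period", "24x7"),
    ("notification_interval", "10"),
    ("notification_period", "24x7"),
]

# (service_description, check_command) pairs from the file above
SERVICES = [
    ("check-host-alive", "check-host-alive"),
    ("SSH", "check_ssh"),
    ("check_nrpe_check_users", "check_nrpe!nh_check_users"),
    ("check_nrpe_check_getload", "check_nrpe!nh_check_getload"),
]


def service_block(host_name, description, check_command):
    """Render one 'define service' block with aligned directives."""
    directives = [
        COMMON[0],
        ("host_name", host_name),
        ("service_description", description),
        ("check_command", check_command),
    ] + COMMON[1:]
    body = "\n".join("    %-24s%s" % (key, value) for key, value in directives)
    return "define service {\n%s\n}\n" % body


def render_services(host_name):
    """Render all service blocks for one host, separated by blank lines."""
    return "\n".join(service_block(host_name, d, c) for d, c in SERVICES)
```

print(render_services("192.168.5.110")) produces blocks equivalent to the four written by hand above.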
[root@nhserver2 objects]# service nagios reload
The service state is then visible in the Nagios web interface:
Host           Service                   Status  Last Check           Duration      Attempt  Status Information
192.168.5.110  check_nrpe_check_getload  OK      04-17-2014 16:21:53  0d 0h 4m 22s  1/5      Getloadavg OK: load average is 0.00