The data center has no dedicated temperature alarm, so I use the CPU temperature reported by each server as a stand-in for the room temperature. If only one host raises an alarm, that machine probably has a fault of its own. If several hosts raise alarms at the same time, the data center's air conditioning is the likely cause.
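The rule of thumb above can be written as a small shell function. This is only a sketch: the name diagnose_alarms is hypothetical, and treating two or more simultaneous alarms as "several" is an assumption rather than something the setup itself enforces.

```shell
#!/bin/sh
# Print a rough diagnosis from the number of hosts whose CPU temperature
# check is alarming at the same time.
# Assumption: two or more simultaneous alarms count as "several".
diagnose_alarms() {
    count=$1
    if [ "$count" -eq 0 ]; then
        echo "no alarm"
    elif [ "$count" -eq 1 ]; then
        echo "single machine fault"
    else
        echo "possible air-conditioning problem"
    fi
}
```

For example, `diagnose_alarms 3` prints `possible air-conditioning problem`.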
The steps are as follows.
Environment: the monitored host runs CentOS 6.4.
1. Install lm_sensors, the hardware sensor monitoring package:
# yum install lm_sensors*
2. Run sensors-detect to detect the available sensors:
# sensors-detect    # Press Enter at every prompt. This step reports an error in a virtual machine but works on a physical machine.
3. Run sensors to check whether data can be read, as shown below:
[root@rd02 ~]# sensors
coretemp-isa-0000
Adapter: ISA adapter
Core 0:       +32.0°C  (high = +76.0°C, crit = +100.0°C)
Core 1:       +32.0°C  (high = +76.0°C, crit = +100.0°C)
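This check can also be scripted. The sketch below assumes the usual lm_sensors layout, where each reading sits on a line such as `Core 0:       +32.0°C`; the function name has_core_readings is hypothetical.

```shell
#!/bin/sh
# Succeed if sensors-style text on standard input contains at least one
# "Core N:" line followed by a numeric reading.
has_core_readings() {
    grep -Eq '^Core [0-9]+:[[:space:]]*\+?[0-9]'
}
```

On a real host it could be used as `sensors | has_core_readings || echo "no core temperature readings"`, which is a quick way to spot machines (virtual machines in particular) where the plugin would have nothing to read.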
4. # vi /usr/local/nagios/libexec/check_cputemp    # Paste in the content between the two rows of # characters below.
##########################################################
#!/bin/sh
########## check_cputemp ##########
# Date: May 1, 2011
# Licence: GPLv2
# INSTALLATION
# This script requires lm_sensors.
# The output of sensors must be in the following format:
#########################################
# coretemp-isa-0000
# Adapter: ISA adapter
# Core 0:      +27°C  (high =   +85°C)
#
# coretemp-isa-0001
# Adapter: ISA adapter
# Core 1:      +25°C  (high =   +85°C)
#########################################
# You can use NRPE to define the service in Nagios:
# check_nrpe!check_cputemp.sh
# Plugin return codes
STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2
STATE_UNKNOWN=3
print_help_msg(){
    $Echo "Usage: $0 -h to get help."
}
print_full_help_msg(){
    $Echo "Usage:"
    $Echo "$0 [-v] -m sensors -w cpuT -c cpuT"
    $Echo "Specify sensors as the method used to read the temperature data."
    $Echo "The Critical value must be greater than the Warning value."
    $Echo "Example:"
    $Echo "${0} -m sensors -w 40 -c 50"
}
print_err_msg(){
    $Echo "Error."
    print_full_help_msg
}
# Append a message to the debug log when -v (verbose) was given.
to_debug(){
    if [ "$Debug" = "true" ]; then
        $Echo "$*" >> "/var/log/check_sys_temperature.log.$$" 2>&1
    fi
}
unset LANG
Echo="echo -e"
if [ $# -lt 1 ]; then
    print_help_msg
    exit $STATE_UNKNOWN
else
    while getopts :vhm:w:c: OPTION
    do
        case $OPTION in
            v)
                # $Echo "Verbose mode."
                Debug=true
                ;;
            m)
                method=$OPTARG
                ;;
            w)
                WARNING=$OPTARG
                ;;
            c)
                CRITICAL=$OPTARG
                ;;
            h)
                print_full_help_msg
                exit $STATE_UNKNOWN
                ;;
            ?)
                # Any other option, or an option missing its argument
                $Echo "Error: Illegal Option."
                print_help_msg
                exit $STATE_UNKNOWN
                ;;
        esac
    done
    if [ "$method" = "sensors" ]; then
        use_sensors="true"
        to_debug use_sensors
    else
        $Echo "Error. You must specify sensors as the method."
        print_full_help_msg
        exit $STATE_UNKNOWN
    fi
    to_debug "All values are: Warning: $WARNING and Critical: $CRITICAL."
fi
########## lm_sensors ##########
if [ "$use_sensors" = "true" ]; then
    sensorsCheckOut=$(which sensors 2>&1)
    if [ $? -ne 0 ]; then
        echo "$sensorsCheckOut"
        echo "Maybe you need to check your sensors."
        exit $STATE_UNKNOWN
    fi
    to_debug "Use $sensorsCheckOut to check system temperature"
    # Lines 3 and 4 of the sensors output are assumed to be the Core 0 and Core 1 readings.
    TEMP1=$(sensors | head -3 | tail -1 | gawk '{print $3}' | grep -o '[0-9][0-9]')
    TEMP2=$(sensors | head -4 | tail -1 | gawk '{print $3}' | grep -o '[0-9][0-9]')
    # Check for missing readings before doing arithmetic on them.
    if [ -z "$TEMP1" ] || [ -z "$TEMP2" ]; then
        $Echo "No data was read. Please confirm your arguments, re-run in verbose mode, then check the log."
        exit $STATE_UNKNOWN
    fi
    SUM=$(( TEMP1 + TEMP2 ))
    TEMP=$(( SUM / 2 ))
    to_debug "temperature data is $TEMP"
else
    $Echo "Error. You must specify sensors as the method."
    print_full_help_msg
    exit $STATE_UNKNOWN
fi
########## Comparison with the warning and critical thresholds given by the user ##########
CPU_TEMP=$TEMP
# A threshold of 0 disables that level.
if [ "$CPU_TEMP" -gt "$CRITICAL" ] && [ "$CRITICAL" != "0" ]; then
    STATE="$STATE_CRITICAL"
    STATE_MESSAGE="CRITICAL"
    to_debug "$STATE, Message is $STATE_MESSAGE"
elif [ "$CPU_TEMP" -gt "$WARNING" ] && [ "$WARNING" != "0" ]; then
    STATE="$STATE_WARNING"
    STATE_MESSAGE="WARNING"
    to_debug "$STATE, Message is $STATE_MESSAGE"
else
    STATE="$STATE_OK"
    STATE_MESSAGE="OK"
    to_debug "$STATE, Message is $STATE_MESSAGE"
fi
echo "The TEMPERATURE $STATE_MESSAGE - The CPU's Temperature is $CPU_TEMP °C!"
exit $STATE
##########################################################
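The script reads fixed line positions (lines 3 and 4 of the sensors output), which suits a host that lists exactly two cores there. As a possible variation, not part of the original plugin, the sketch below averages every "Core N:" line and applies the same threshold logic. The function names avg_core_temp and classify_temp are hypothetical, and the parsing assumes readings formatted like `+32.0°C` in the third field.

```shell
#!/bin/sh
# Print the integer average (degrees C) of every "Core N:" reading found in
# sensors-style text on standard input; print nothing if none is found.
avg_core_temp() {
    awk '/^Core [0-9]+:/ {
        t = $3                  # e.g. "+32.0°C"
        gsub(/[^0-9.]/, "", t)  # keep only digits and the decimal point
        sum += t; n++
    }
    END { if (n > 0) printf "%d\n", sum / n }'
}

# Print the Nagios state name for a temperature and return the matching
# exit code. A threshold of 0 disables that level, as in the plugin.
classify_temp() {
    temp=$1 warn=$2 crit=$3
    if [ "$crit" != "0" ] && [ "$temp" -gt "$crit" ]; then
        echo CRITICAL; return 2
    elif [ "$warn" != "0" ] && [ "$temp" -gt "$warn" ]; then
        echo WARNING; return 1
    fi
    echo OK; return 0
}
```

A typical combination would be `classify_temp "$(sensors | avg_core_temp)" 38 45`. Averaging hides a single hot core, so taking the maximum instead may be the better choice on some hosts.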
5. Make the script executable:
# chmod +x /usr/local/nagios/libexec/check_cputemp
6. Edit nrpe.cfg and add the following line:
command[check_cputemp]=/usr/local/nagios/libexec/check_cputemp -m sensors -w 38 -c 45
Note: The preceding six steps are completed on the monitored machine.
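Before moving on to the Nagios server, it can help to run the plugin by hand on the monitored machine and look at its exit code, because Nagios derives the service state from that code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). A minimal sketch; the helper names state_name and run_plugin are hypothetical.

```shell
#!/bin/sh
# Map a Nagios plugin exit code to its state name.
state_name() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;
    esac
}

# Run a plugin command line, then print its output followed by the exit
# code and the state Nagios would assign to it.
run_plugin() {
    rc=0
    output=$("$@") || rc=$?
    echo "$output"
    echo "exit code $rc -> $(state_name "$rc")"
}
```

With the command from step 6 this would be `run_plugin /usr/local/nagios/libexec/check_cputemp -m sensors -w 38 -c 45`.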
7. Define the service on the Nagios server:
define service{
        use                     generic-service
        host_name               ; set this to the monitored host's name
        service_description     CPU Temperature
        check_command           check_nrpe!check_cputemp
        }
Save the file and restart the nagios service.
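The configuration can be validated before the restart, and the new check queried by hand afterwards. The sketch below assumes the default source-install layout under /usr/local/nagios (bin/nagios, etc/nagios.cfg, libexec/check_nrpe), which matches the libexec path used earlier but may differ on your system. The function names are hypothetical.

```shell
#!/bin/sh
# Assumed install prefix; override NAGIOS_PREFIX if your layout differs.
NAGIOS_PREFIX=${NAGIOS_PREFIX:-/usr/local/nagios}

# Validate the Nagios configuration without starting the daemon.
verify_config() {
    "$NAGIOS_PREFIX/bin/nagios" -v "$NAGIOS_PREFIX/etc/nagios.cfg"
}

# Ask the NRPE daemon on the monitored host ($1) to run check_cputemp.
query_cputemp() {
    "$NAGIOS_PREFIX/libexec/check_nrpe" -H "$1" -c check_cputemp
}
```

For example, `verify_config && query_cputemp 192.0.2.10`, where 192.0.2.10 is a placeholder for the monitored host's address. If the query prints the same temperature line as a local run of the plugin, NRPE is wired up correctly.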