Some time ago, the air conditioning failed in a small server room that I maintain. The high temperature made the systems sluggish, and nobody noticed. So I decided to use Zabbix to monitor CPU temperature and set alarm thresholds. Temperature also serves as an indirect indicator of other server problems, such as a failed fan, air panels clogged with dust, or a deadlocked process driving CPU usage up.
The servers are two-socket and four-socket x86 machines running CentOS. The lm_sensors package reads the temperatures, and Zabbix collects the data through custom monitoring items and raises the alarms. The details follow.
Install lm_sensors and get CPU temperature
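yum install lm_sensors
After installation, run sensors-detect to detect the kernel modules. Press Enter at each prompt to accept the default options. When detection finishes, run sensors to see the temperature of every core of every CPU.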
Configure Zabbix: custom monitoring items, custom templates, and triggers
Modify the client configuration file zabbix_agentd.conf
When setting up custom monitoring for the first time, set UnsafeUserParameters=1
Add a line to the configuration file:
UserParameter=get_temp_cpu[*],sensors | grep "Physical id $1" | cut -c 17-20
Adjust the character positions passed to cut so that they match the output that sensors actually returns on your system
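Fixed character positions break as soon as the column layout of the sensors output shifts. As a minimal sketch of an alternative, the hypothetical helper below picks the temperature by field instead of by position. It assumes the package line looks like "Physical id 0:  +45.0°C ..." (or "Package id 0: ..." on newer kernels), which is typical coretemp output but may differ on your hardware. The function name is reused from the item key only for illustration.

```shell
# Hypothetical helper: read `sensors` output on stdin and print the
# package temperature of the socket whose index is given as $1.
get_temp_cpu() {
    awk -v id="$1" '
        # Match "Physical id N:" or "Package id N:" for the requested socket
        $1 ~ /^(Physical|Package)$/ && $2 == "id" && $3 == (id ":") {
            t = $4
            gsub(/[^0-9.-]/, "", t)   # drop the leading "+" and the trailing "°C"
            print t
            exit                      # the first matching line is enough
        }
    '
}
```

Called as sensors | get_temp_cpu 0, it prints a bare number such as 45.0. Saved as a script at a path of your choosing, it could stand in for the grep and cut part of the UserParameter line.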
Restart the client
Create a new template, configure the monitoring item, and set the unit to ℃
Because a discovery rule is difficult to set up, I simply created four monitoring items for the four-socket server. For the two-socket server, either clone the template and delete the unneeded items, or disable the unneeded items in the host configuration.
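For readers who do want to try a discovery rule, the core of it is a command that prints the list of sockets as JSON in the form that Zabbix low-level discovery expects. The sketch below is one hedged way to produce that list from the sensors output. The function name and the {#CPUID} macro name are made up for this example, and the line format it matches ("Physical id N:" or "Package id N:") is an assumption about your hardware.

```shell
# Hypothetical helper: read `sensors` output on stdin and print a Zabbix
# low-level discovery document with one {#CPUID} entry per CPU socket.
discover_cpu_sockets() {
    awk '
        BEGIN { printf "{\"data\":[" }
        $1 ~ /^(Physical|Package)$/ && $2 == "id" {
            id = $3
            sub(/:$/, "", id)                       # "0:" -> "0"
            printf "%s{\"{#CPUID}\":\"%s\"}", sep, id
            sep = ","                               # comma before every later entry
        }
        END { print "]}" }
    '
}
```

With a discovery rule fed by sensors | discover_cpu_sockets, an item prototype with the key get_temp_cpu[{#CPUID}] would then yield one temperature item per socket, so the same template could fit both kinds of server.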
Configure triggers in the template
Add a trigger for each of the four monitoring items, with thresholds chosen from historical data or personal experience
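One possible way to turn history into a starting threshold is to take the highest temperature seen under normal load and add a safety margin. The sketch below illustrates only that heuristic. The function name and the default margin of 10 degrees are assumptions for the example, not values from this setup.

```shell
# Hypothetical helper: read one temperature per line on stdin and print
# the highest value plus a margin (first argument, default 10).
suggest_threshold() {
    awk -v m="${1:-10}" '
        # Track the maximum; !seen makes the first value always count
        NF && ($1 + 0 > max || !seen) { max = $1 + 0; seen = 1 }
        END { if (seen) printf "%.1f\n", max + m }
    '
}
```

For example, a history of 45.0, 52.5 and 48.0 with a margin of 10 gives 62.5. Treat the result as a first guess and tune it once you have seen how the servers behave under real load.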
View historical monitoring data
You can view the monitoring data after applying the template to the host for a period of time. As shown, there is a noticeable change in CPU temperature during stress testing on one of the servers.
Simple steps to monitor Linux physical server CPU temperature using Zabbix