HP array card troubleshooting example
The company runs HP Gen8 servers with P420i array controllers. On the monitoring side, Nagios uses the NRPE plugin check_hpasm to check hardware health at regular intervals.
Recently, to make the machines faster, we added SSDs and kept the mechanical hard drives only for bulk storage. Just as we were enjoying the speed, the trouble started.
Nagios raised a hardware error alarm:
CRITICAL-da controller 1 in slot 1 needs attention, System: 'liant dl3x0e gen8', S/N: 'cn74xxxxxx', ROM: 'p73 12/20/2013'
Performance Data: pc_1 = 65 fan_5 = 27% fan_6 = 27% fan_7 = 27% fan_8 = 27% temp_1_ambient = 23; 42; 42 temp_2_cpu #1 = 40; 70; 70 temp_4_memory_bd = 24; 87; 87 bytes = 25; 80; 80 bytes = 26; 80; 80 temp_8_memory_bd = 26; 80; 80 temp_9_memory_bd = 25; 80; 80 temp_10_memory_bd = 25; 80; 80 temp_11_memory_bd = 26; 80; 80 temp_12_system_bd = 35; 60; 60 temp_13_system_bd = 44; 105; 105 temp_14_system_bd = 33; 95; 95; temp_17_power_supply_bay = 26; 80; 80 bytes = 25; 80; 80 bytes = 25; 110; 110 temp_20_system_bd = 21; 110; 110 temp_21_system_bd = 24; 110; 110 temp_22_system_bd = 26; 110; 110 temp_23_system_bd = 21; 65; 65 temp_26_system_bd = 35; 100; 100 temp_28_system_bd = 28; 90; 90 temp_29_ I/o_zone = 85; 100; 100 temp_31_ I/o_zone = 32; 80; 80 temp_32_ I/o_zone = 25; 80; 80 temp_33_system_bd = 32; 80; 80 temp_34_system_bd = 30; 80; 80 temp_35_system_bd = 30; 80; 80 temp_36_system_bd = 31; 80; 80 temp_37_system_bd = 29; 80; 80; 80
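The alarm line names the controller and the slot it sits in, and that slot number is what the array CLI later needs. A minimal sketch of pulling it out of the alert text follows. The function name hpasm_alert_slot is invented for this example, and the pattern assumes the exact wording "controller N in slot M needs attention" shown in the alarm above; other check_hpasm versions may phrase it differently.

```shell
#!/bin/sh
# Hypothetical helper: extract the slot number from a check_hpasm alert line.
# Assumes the wording "controller N in slot M needs attention" seen above.
hpasm_alert_slot() {
    # Print only the slot number; print nothing if the pattern is absent.
    printf '%s\n' "$1" |
        sed -n 's/.*controller [0-9][0-9]* in slot \([0-9][0-9]*\) needs attention.*/\1/p'
}

# Sample taken from the alarm text in this article (shortened).
alert="CRITICAL-da controller 1 in slot 1 needs attention, System: 'liant dl3x0e gen8'"
hpasm_alert_slot "$alert"
```

An empty result means the line did not match, which is a reasonable signal to read the alert by hand rather than guess the slot.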
Because the alarm was so vague, we did not dare put the machine into production and had to haul it back from the machine room for careful testing. It looked like a hardware fault, but after replacing the array card and even reinstalling the operating system N times, the same error was still reported. With no SSD installed and only ordinary hard disks, no error was reported.
The root cause turned out to be the HP SSD Smart Path feature of the HP array card. This feature accelerates SSD reads and writes, and is loosely comparable to a hybrid drive in which the SSD acts as a cache for the mechanical disk. However, if the operating system is installed on an SSD, the error above is reported.
Solution:
yum install -y http://downloads.linux.hp.com/SDR/downloads/MCP/CentOS/7/x86_64/10.0/hpssacli-2.0-22.0.x86_64.rpm http://downloads.linux.hp.com/SDR/downloads/MCP/CentOS/7/x86_64/10.0/hpssa-2.0-22.0.x86_64.rpm
/usr/sbin/hpssacli controller slot=1 array a modify ssdsmartpath=disable
The hpssacli-2.0-22.0.x86_64.rpm package has been tested and works on both CentOS 6 and CentOS 7.
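When the same fix has to be applied to several machines, the disable command can be wrapped so that it is printed for review before anything is changed. The following is only a sketch under stated assumptions: the function smartpath_disable and the variables HPSSACLI and DRY_RUN are invented for this example, and the slot number and array letter must be the ones that apply to your own controller.

```shell
#!/bin/sh
# Sketch: print (default) or run the SSD Smart Path disable command.
# HPSSACLI, DRY_RUN and smartpath_disable are names invented for this example.
HPSSACLI="${HPSSACLI:-/usr/sbin/hpssacli}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the command, anything else = run it

smartpath_disable() {
    slot="$1"
    array="$2"
    # Accept only a numeric slot and an alphabetic array name.
    case "$slot" in ''|*[!0-9]*) echo "bad slot: $slot" >&2; return 2 ;; esac
    case "$array" in ''|*[!a-zA-Z]*) echo "bad array: $array" >&2; return 2 ;; esac
    # Same command as in the solution above, built from the two arguments.
    set -- "$HPSSACLI" controller "slot=$slot" array "$array" modify ssdsmartpath=disable
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

smartpath_disable 1 a
```

With the defaults this only prints the command. Setting DRY_RUN=0 in the environment makes the function actually run hpssacli, which needs root privileges and the package installed above.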