Over the past two days I have been configuring a large 500-node cluster, and an IPMI problem plagued me for a day and a half. This afternoon it was finally solved. Here is a summary:
The Intelligent Platform Management Interface (IPMI) is an industry standard for managing the peripheral devices of Intel-based enterprise systems. You can use IPMI to monitor the physical health of servers, such as temperature, voltage, fan operating status, and power status. The standard was defined by companies such as Intel, Hewlett-Packard, NEC, Dell, and Supermicro. The latest version is IPMI 2.0 (http://www.intel.com/design/servers/ipmi).
First, the prerequisites for using IPMI. To manage a server through IPMI, requirements must be met in hardware, operating system, and management tools:
1. The server hardware itself provides support for IPMI
Currently most servers, such as those from HP, Dell, and NEC, support IPMI 2.0, but not all do. First check the product manual or the BIOS to determine whether the server supports IPMI, that is, whether the motherboard carries a BMC (baseboard management controller) or a similar embedded management microcontroller.
2. The operating system provides the corresponding IPMI driver
For the operating system to read the server's IPMI information, the kernel must provide the corresponding support. Linux provides the IPMI system interface through the kernel's OpenIPMI support (the IPMI driver). Before using it, start the driver:
service ipmi start
Or load the modules:
modprobe ipmi_msghandler
modprobe ipmi_devintf
modprobe ipmi_si
modprobe ipmi_poweroff
modprobe ipmi_watchdog
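To confirm that the driver modules are actually loaded, something like the following sketch may help. It only inspects text in the format of /proc/modules; the function name and the choice of "required" modules are my own assumptions, and a kernel with the IPMI driver built in (rather than loaded as modules) would not list them there at all.

```python
# Hypothetical helper: report which IPMI modules are absent from /proc/modules text.
# Assumption: the three modules below are the minimum needed for "-I open".
REQUIRED_MODULES = ("ipmi_msghandler", "ipmi_devintf", "ipmi_si")


def missing_ipmi_modules(proc_modules_text, required=REQUIRED_MODULES):
    """Return the required module names not found in the given /proc/modules text."""
    # The first whitespace-separated field of each line is the module name.
    loaded = {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }
    return [name for name in required if name not in loaded]


# Typical use on a Linux host:
#     with open("/proc/modules") as handle:
#         print(missing_ipmi_modules(handle.read()))
```

An empty list means all three modules are present; any name returned is a candidate for modprobe.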
3. IPMI management tools
Our cluster uses ipmitool, the command-line IPMI platform management tool for Linux. If you do not have it, download the source from http://ipmitool.sourceforge.net/.
The ipmitool command needs to access the BMC through an interface. When obtaining information locally, it uses -I open, the OpenIPMI interface. ipmitool provides the open, lan, and lanplus interfaces.
The open interface communicates with the BMC through OpenIPMI. The lan interface communicates with the BMC over an Ethernet LAN using UDP over IPv4. Each UDP datagram carries an IPMI request/response message, which has an IPMI session header and an RMCP header.
IPMI supports pre-OS and OS-absent operation using Remote Management Control Protocol (RMCP) version 1. RMCP sends data to UDP port 623. Like the lan interface, lanplus communicates with the BMC over UDP on an Ethernet LAN, but it uses the RMCP+ protocol (described in the IPMI v2.0 specification), which allows improved authentication and data integrity checks. The open interface is used for monitoring the local system; lan and lanplus are used for remote monitoring over the network.
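Since RMCP runs over UDP port 623, one rough way to see whether a BMC is reachable at all is an RMCP/ASF presence ping. The sketch below is not something ipmitool itself is shown doing in this post; the function names are hypothetical, and a BMC or firewall may simply not answer, so a False result is only a hint.

```python
import socket
import struct

RMCP_PORT = 623  # UDP port used by RMCP


def build_asf_ping(tag=0):
    """Build a 12-byte RMCP datagram carrying an ASF Presence Ping."""
    # RMCP header: version 0x06, reserved, sequence 0xFF (no ACK), class 0x06 (ASF).
    rmcp_header = struct.pack("BBBB", 0x06, 0x00, 0xFF, 0x06)
    # ASF body: IANA enterprise number 4542, type 0x80 (ping), tag, reserved, length 0.
    asf_body = struct.pack(">IBBBB", 4542, 0x80, tag, 0x00, 0x00)
    return rmcp_header + asf_body


def bmc_answers_ping(host, timeout=2.0):
    """Send one ping to host:623 and report whether a Presence Pong came back."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_asf_ping(), (host, RMCP_PORT))
            data, _ = sock.recvfrom(512)
        except OSError:  # includes timeouts and unreachable-network errors
            return False
    # A pong is an ASF-class message (byte 3) with message type 0x40 (byte 8).
    return len(data) >= 9 and data[3] == 0x06 and data[8] == 0x40
```

Calling bmc_answers_ping with the BMC address from the monitoring end gives a quick reachability check before trying a full lan or lanplus session.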
The command form for local monitoring is: ipmitool -I open command, where -I open selects the OpenIPMI interface. command can be one of the following:
A) raw: send a raw IPMI request and print the reply.
B) lan: configure the network (LAN) channel.
C) chassis: view the chassis status and configure power.
D) event: send a predefined event to the BMC, to test whether the configured SNMP alerting works.
E) mc: view the status and the enabled options of the MC (management controller).
F) sdr: print all monitored items in the sensor data repository and the values read from the sensors.
G) sensor: print detailed sensor information.
H) fru: print the built-in field replaceable unit (FRU) information.
I) sel: print the System Event Log (SEL).
J) pef: configure platform event filtering (PEF). When the monitoring system detects an event, PEF filters it using the policies in the PEF configuration and then decides whether an alert is required.
K) sol/isol: configure serial over LAN, that is, monitoring the serial port over the LAN.
L) user: configure the user information in the BMC.
M) channel: configure the management controller channels.
The ipmitool -I open sensor list command prints the monitored value of each sensor together with the thresholds for that value (CPU temperature, voltage, fan speed, power module temperature, power supply voltage, and so on):
[root@oss11 chenys]# ipmitool -I open sensor list
Cpu1 temp | 0.000 | unspecified | OK | 0.000 | Na | 0.000 | 0.000 | Na
Cpu2 temp | 0.000 | unspecified | OK | 0.000 | Na | 0.000 | 0.000 | Na
System temp | 39.000 | degrees C | OK | 0.000 | 0.000 | 0.000 | 81.000 | 82.000 | 83.000
Cpu1 vcore | 1.048 | volts | OK | 0.808 | 0.816 | 0.824 | 1.384 | 1.392 | 1.400 |
Cpu2 vcore | 1.048 | volts | OK | 0.808 | 0.816 | 0.824 | 1.384 | 1.392 | 1.400
+ 5 v | 5.040 | volts | OK | 4.280 | 4.320 | 4.360 | 5.240 | 5.280 | 5.320 |
+ 12 V | 11.904 | volts | OK | 10.464 | 10.560 | 10.656 | 13.344 | 13.440 | 13.536 |
Cpu1dimm | 1.544 | volts | OK | 1.320 | 1.328 | 1.336 | 1.656 | 1.664 | 1.672 |
Cpu2dimm | 1.544 | volts | OK | 1.320 | 1.328 | 1.336 | 1.656 | 1.664 | 1.672 |
+ 1.5 V | 1.512 | volts | OK | 1.320 | 1.328 | 1.336 | 1.656 | 1.664 | 1.672 |
+ 3.3 V | 3.240 | volts | OK | 2.880 | 2.904 | 2.928 | 3.648 | 3.672 | 3.696 |
+ 3.3vsb | 3.336 | volts | OK | 2.880 | 2.904 | 2.928 | 3.648 | 3.672 | 3.696
Vbat | 3.336 | volts | OK | 2.880 | 2.904 | 2.928 | 3.648 | 3.672 | 3.696 |
Fan1 | 7072.000 | RPM | OK | 340.000 | 408.000 | 476.000 | 17204.000 | 17272.000 | 17340.000 |
Fan2 | 7072.000 | RPM | OK | 340.000 | 408.000 | 476.000 | 17204.000 | 17272.000 | 17340.000 |
Fan3 | Na | RPM | Na | 340.000 | 408.000 | 476.000 | 17204.000 | 17272.000 | 17340.000
Fan4 | Na | RPM | Na | 340.000 | 408.000 | 476.000 | 17204.000 | 17272.000 | 17340.000
PS status | 0.000 | unspecified | OK | 0.000 | Na | 0.000 | 0.000 | Na
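For scripting against this output, a parser along the following lines could be a starting point. It is only a sketch: the function names are made up, and it assumes the usual column order of sensor list (name, reading, unit, status, then lower non-recoverable, lower critical, lower non-critical, upper non-critical, upper critical, upper non-recoverable). Rows with fewer threshold columns than expected are mapped from the left, which may be wrong for damaged rows.

```python
THRESHOLD_NAMES = ("lnr", "lcr", "lnc", "unc", "ucr", "unr")


def to_number(field):
    """Convert a field to float, or None for non-numeric values such as 'na'."""
    try:
        return float(field)
    except ValueError:
        return None


def parse_sensor_list(text):
    """Parse pipe-separated `sensor list` output into a list of dicts."""
    sensors = []
    for line in text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if fields and fields[-1] == "":
            fields.pop()  # drop the empty field left by a trailing '|'
        if len(fields) < 4:
            continue  # not a sensor row
        name, reading, unit, status = fields[:4]
        sensors.append({
            "name": name,
            "reading": to_number(reading),
            "unit": unit,
            "status": status,
            "thresholds": dict(zip(THRESHOLD_NAMES, map(to_number, fields[4:]))),
        })
    return sensors


def outside_critical(sensor):
    """True if the reading is below lower critical or above upper critical."""
    value = sensor["reading"]
    if value is None:
        return False
    low = sensor["thresholds"].get("lcr")
    high = sensor["thresholds"].get("ucr")
    return bool((low is not None and value < low)
                or (high is not None and value > high))
```

Feeding it the captured output above and filtering with outside_critical gives a simple alarm list; sensors that report no reading are skipped rather than flagged.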
ipmitool -I open sensor get "cpu1 Temp" obtains the monitored value of a single sensor. Here cpu1 Temp is the sensor ID; the IDs differ from server to server.
[root@oss11 chenys]# ipmitool -I open sensor get "cpu1 Temp"
Locating sensor record...
Sensor ID: cpu1 temp (0x1)
Entity ID: 7.1
Sensor Type (analog): Unknown (0xc0)
Sensor reading: 0 (+/-0) Unspecified
Status: OK
Lower non-recoverable: 0.000
Lower critical: Na
Lower non-critical: 0.000
Upper non-critical: 0.000
Upper critical: Na
Upper non-recoverable: Na
Assertion events:
Assertions enabled: LCR-
Deassertions enabled: LCR-
You can look up the remaining commands yourself; only the key ones are covered here:
ipmitool -I open sensor thresh sets the threshold values of the monitored item with the given ID.
ipmitool -I open chassis status shows the chassis status, including power information and the working state of the chassis.
ipmitool -I open chassis restart_cause shows the cause of the last system restart.
ipmitool -I open chassis policy list shows the supported chassis power policies.
ipmitool -I open chassis power on powers the chassis on; you can use this command to start a machine remotely.
ipmitool -I open chassis power off powers the chassis off; you can use this command to shut a machine down remotely.
ipmitool -I open chassis power reset performs a hard reset; you can use this command to restart a machine remotely.
ipmitool -I open mc reset restarts the BMC.
ipmitool -I open mc info shows the BMC hardware information.
ipmitool -I open mc setenables <option>=[on|off] enables or disables a BMC option.
ipmitool -I open mc getenables lists the options enabled on the BMC.
ipmitool -I open lan print 1 prints the information for channel 1. This channel is very important, and the problem that troubled me lies in it, so it is explained in detail below:
Remote Server Monitoring Information Retrieval
To obtain monitoring information from a remote server, the server hardware must support IPMI v1.5 or IPMI v2.0. No additional software needs to be installed on the server; you only need ipmitool on the monitoring client, and you add the name or address of the remote server to the command. ipmitool can monitor a remote system over the LAN interface. The BMC stores a list of user names and passwords, and remote access over the LAN requires a user name and password.
When remotely obtaining server monitoring information, you must add the address of the remote server. Use the following command format:
ipmitool -H 10.6.77.249 -U root -P changeme -I lan command
-H is followed by the server address, -U by the user name, and -P by that user's password. command is the same as for local retrieval.
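When the same remote command has to be run against many nodes, it can be convenient to build the argument list in one place. The following is a minimal sketch; remote_ipmitool_args is a hypothetical helper, not part of ipmitool, and it only assembles the options described above.

```python
def remote_ipmitool_args(host, user, password, command, interface="lan"):
    """Build the argv list for a remote ipmitool call over lan or lanplus."""
    if interface not in ("lan", "lanplus"):
        raise ValueError("remote access uses the lan or lanplus interface")
    # -H: server address, -U: user name, -P: password, then the subcommand.
    return ["ipmitool", "-I", interface,
            "-H", host, "-U", user, "-P", password] + list(command)


# Example (values taken from the command format above):
#     import subprocess
#     args = remote_ipmitool_args("10.6.77.249", "root", "changeme",
#                                 ["sensor", "list"])
#     result = subprocess.run(args, capture_output=True, text=True)
```

Passing the arguments as a list avoids shell quoting problems with sensor names that contain spaces.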
So how do you set the IP address and password of the local BMC?
ipmitool -I open lan print 1 displays the information for BMC channel 1. If you do not know which channel the BMC uses, run the following command to confirm:
ipmitool -I open channel info 1
ipmitool -I open lan set 1 ipsrc static sets the local BMC address source to static; the IP address can be set only after this.
ipmitool -I open lan set 1 ipaddr 10.53.11.113 sets the local BMC IP address.
ipmitool -I open lan set 1 netmask 255.255.255.0 sets the subnet mask; do not forget it.
ipmitool -I open lan set 1 defgw ipaddr 10.53.11.254 sets the gateway. This is optional, but without it the monitoring end and the monitored end must be on the same route.
ipmitool user list 1 shows the BMC user list.
ipmitool user set name 1 username sets the user name of BMC user 1.
ipmitool user set password 1 123456 sets the password of BMC user 1 to 123456.
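With 500 nodes, typing these lan set commands by hand invites typos such as an incomplete netmask. The sketch below generates the command sequence for one node and validates the addresses first; bmc_lan_commands is a hypothetical helper, and the order (ipsrc, ipaddr, netmask, defgw) simply mirrors the steps above.

```python
import ipaddress


def bmc_lan_commands(channel, ip, netmask, gateway=None):
    """Return the ipmitool argv lists that give a BMC channel a static address."""
    # IPv4Address raises ValueError for malformed input such as "255.255.0".
    ipaddress.IPv4Address(ip)
    ipaddress.IPv4Address(netmask)
    base = ["ipmitool", "-I", "open", "lan", "set", str(channel)]
    commands = [
        base + ["ipsrc", "static"],   # must come first so the address can be set
        base + ["ipaddr", ip],
        base + ["netmask", netmask],
    ]
    if gateway is not None:
        ipaddress.IPv4Address(gateway)
        commands.append(base + ["defgw", "ipaddr", gateway])
    return commands
```

Each returned list can be passed to subprocess.run on the node being configured, or joined with spaces and written into a per-node script.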
The following describes the problem I ran into today:
I had configured the IPMI address, subnet mask, user name, password, and so on at the monitored end, but still could not connect to it, and the following was returned:
Error: Unable to establish LAN session
Get Device ID command failed
After half a day of searching, I found that the culprit was the MAC address:
[root@localhost ~]# ipmitool -I open lan print 1
Set in progress: Set complete
Auth type support: None md2 MD5 OEM
Auth type enable: callback: None md2 MD5 OEM
: User: None md2 MD5 OEM
: Operator: None md2 MD5 OEM
: Admin: None md2 MD5 OEM
: OEM:
IP Address Source: DHCP address
IP Address: 10.53.11.61
Subnet Mask: 255.255.255.0
MAC address: 00: 30: 48: C9: 61: 60
SNMP community string: Ami
IP header: TTL = 0x00 flags = 0x00 precedence = 0x00 TOS = 0x00
Bmc arp control: ARP responses enabled, gratuitous ARP disabled
Gratituous ARP intrvl: 0.0 seconds
Default gateway IP: 10.53.11.254
Default Gateway Mac: 00: 00: 00: 00: 00: 00: 00
Backup gateway IP: 0.0.0.0
Backup gateway Mac: 00: 00: 00: 00: 00: 00: 00
802.1Q vlan id: Disabled
802.1Q VLAN priority: 0
RMCP + cipher suites: 1, 2, 3, 6, 7, 8, 11, 12, 0
Cipher Suite priv MAX: aaaaxxaaaxxaaxx
: X = cipher suite unused
: C = callback
: U = user
: O = Operator
: A = Admin
: O = OEM
Therefore, you have to add a static ARP entry for this MAC address at the monitoring end:
arp -s 10.53.11.28 00:30:48:C9:61:60
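If many nodes need the same workaround, the MAC address can be pulled out of the lan print output and turned into the arp command automatically. This is a rough sketch with hypothetical function names; it assumes each line of lan print has the form "Key : value" and ignores continuation lines that start with a colon.

```python
import re


def parse_lan_print(text):
    """Map lower-cased keys of `lan print` output to their values."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if sep and key:  # skip continuation lines such as ": User: ..."
            info[key.lower()] = value.strip()
    return info


def normalize_mac(raw):
    """Return a MAC address as six lower-case, colon-separated octets."""
    digits = re.sub(r"[^0-9A-Fa-f]", "", raw)
    if len(digits) != 12:
        raise ValueError("not a 48-bit MAC address: %r" % raw)
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2)).lower()


def static_arp_args(ip, mac):
    """Build the argv list for adding a static ARP entry on the monitoring end."""
    return ["arp", "-s", ip, normalize_mac(mac)]
```

The IP address passed to static_arp_args must be the address the BMC actually answers on; the helper does not check that it matches the lan print output.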