I have noticed that my articles read like small tech tips. They have little technical depth and are not hard to read; they may even be as bland as cabbage boiled in plain water. Still, the problems in them are ones I really ran into at work, and they were solved only after countless Google searches or after asking colleagues. Others may run into the same pain, so, to achieve the "Chinese Dream" as soon as possible, I record these problems anyway and hope they help someone. I believe that one day pigs will fly.
I once asked for help with this on ChinaUnix. Maybe I posted in the wrong section, because nobody answered for months. Later my boss asked the driver department for help and quickly got an answer.
The problem is simple: we need to monitor the power status of an x86-64 Linux server, check whether both power supplies are working normally, and send an alarm if one of them fails.
At first I went through the Linux documentation hoping to find a command, or a file from which the power status could be read directly. I even compared the Linux directories before and after unplugging a power supply, but after a long search I still had no clue. There are commands for checking CPU and memory, so why would there be none for the power supply? Later I realized that such a command may simply not exist!
I don't know drivers, I don't know hardware, and I can't just run scripts written by others without trying to understand them. What follows is my own understanding; please forgive any errors.
The first thing to understand is that this solution is not universal: how the power status is exposed varies with the hardware, so different servers may need different methods.
The hardware we use has an I2C bus... What? What is I2C? If that is your reaction, great, you are just like me. Below I translate part of an overseas document.
Information Source: http://www.esacademy.com/en/library/technical-articles-and-documents/miscellaneous/i2c-bus.html
I2C bus protocol
The I2C bus physically consists of two active wires and a ground connection. The active wires, called SDA and SCL, are both bidirectional. SDA is the serial data line and SCL is the serial clock line.
Each device connected to the bus has its own address, whether it is an MCU, LCD driver, memory, or ASIC. Each chip can act as a receiver, a transmitter, or both, depending on its function. Obviously, an LCD driver is only a receiver, while a memory or I/O chip can be both transmitter and receiver.
The I2C bus is a multi-master bus. That is, more than one IC capable of initiating a data transfer can be connected to it. According to the I2C specification, the IC that initiates a data transfer is considered the bus master, and all other ICs are regarded as bus slaves.
Because the bus master is usually a microcontroller, let's look at a typical "inter-IC conversation". Consider the following setup, and assume that the MCU wants to send data to one of its slaves.
First, the MCU issues a START condition. This acts as an "Attention" signal to all connected devices. All the ICs on the bus will listen for incoming data.
Then the MCU sends the ADDRESS of the device it wants to access, together with a flag saying whether the access is a read or a write (a write in our example). After receiving the address, all the ICs compare it with their own address. If it does not match, they simply wait until the bus is released by the STOP condition; if it matches, the chip responds with an ACKNOWLEDGE signal.
Once the MCU receives the ACKNOWLEDGE, it can start transmitting or receiving DATA. In our example, the MCU transmits data. When it is done, the MCU issues a STOP condition. This signals that the bus has been released, and the ICs can wait for the next transfer to start.
The example contains several states: START, ADDRESS, ACKNOWLEDGE, DATA, and STOP. These are unique conditions on the bus. Before exploring these bus conditions further, we need to know a little about the physical structure and the hardware of the bus.
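To make the ADDRESS step more concrete: with standard 7-bit addressing, the byte the master sends after START carries the chip address in its upper seven bits and the read/write flag in the lowest bit (0 for write, 1 for read). The following is only a minimal sketch of that arithmetic in shell. The function name i2c_addr_byte is made up for illustration, and 0x20 is simply the chip address that appears in the script later in this article.

```shell
# Hypothetical helper: build the first byte of an I2C transfer
# from a 7-bit chip address ($1) and a direction ($2: "r" or "w").
i2c_addr_byte() {
    case "$2" in
        r) rw=1 ;;   # read: lowest bit set
        w) rw=0 ;;   # write: lowest bit clear
        *) echo "usage: i2c_addr_byte ADDR r|w" >&2; return 1 ;;
    esac
    # Shift the address left by one and place the R/W flag in bit 0.
    printf '0x%02x\n' $(( ($1 << 1) | $rw ))
}
```

For example, i2c_addr_byte 0x20 w prints 0x40 and i2c_addr_byte 0x20 r prints 0x41.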
==================== Gorgeous split line ==============================
The Linux I2C tools (i2c-tools) used to be part of lm-sensors and have since been split into an independent package. The reason is that not all hardware monitoring chips are I2C devices, and not all I2C devices are hardware monitoring chips, so it made little sense to bundle everything together. The tools themselves run in user space and rely on the kernel's i2c_dev driver, so they should work on most Linux distributions once that module is loaded.
Our server may have several I2C buses, and the power controller is attached to one of them. After a power supply is hot-plugged or unplugged, the status information is stored in the power controller. What we need to do is find the address of the power controller on the I2C bus and read the data from it. This is done with the i2cget command.
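Finding the right bus and chip address can be sketched as follows. This is an illustration under assumptions, not the method used on our server: the function names i2c_bus_numbers and scan_all_buses are hypothetical, and the sketch only relies on i2cdetect -l printing one line per adapter that starts with "i2c-N".

```shell
# Hypothetical helper: extract bus numbers from `i2cdetect -l` output
# read on standard input (each adapter line starts with "i2c-N").
i2c_bus_numbers() {
    sed -n 's/^i2c-\([0-9][0-9]*\).*/\1/p'
}

# Hypothetical helper: scan every listed bus for responding chips.
# -y skips the confirmation prompt; probing can confuse some devices,
# so run this with care on production machines.
scan_all_buses() {
    i2cdetect -l | i2c_bus_numbers | while read -r bus; do
        echo "== bus $bus =="
        i2cdetect -y "$bus"
    done
}
```

Running scan_all_buses as root prints an address grid for each bus; an address that answers on one bus is a candidate for the power controller, but which register holds the status still has to come from the hardware vendor.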
i2cget [-y] i2cbus chip-address [data-address [mode]]
The -y option disables interactive mode. It must be given when i2cget is called from a script, so that the command does not stop and wait for confirmation. The i2cbus parameter is the number of the I2C bus to access, which must appear in the list printed by i2cdetect -l; chip-address is the address of the chip on that bus, ranging from 0x03 to 0x77; data-address is the address on the chip to read from, ranging from 0x00 to 0xff. The optional mode selects the type of read transaction (for example b for a byte or w for a word); if it is omitted, a single byte is read by default.
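As an illustration of these parameters, the sketch below checks that a chip address lies in the 0x03 to 0x77 range before calling i2cget. The function names valid_chip_addr and read_i2c_byte are hypothetical, and the values in the usage note (bus 0, chip 0x20, register 0x00) are simply the ones used in the script in this article; other hardware will differ.

```shell
# Hypothetical helper: succeed only if the chip address ($1) is within
# the range described in the text (0x03 to 0x77).
valid_chip_addr() {
    [ $(( $1 )) -ge $(( 0x03 )) ] && [ $(( $1 )) -le $(( 0x77 )) ]
}

# Hypothetical wrapper: read one byte from a register, non-interactively.
# Arguments: bus number, chip address, register (data) address.
read_i2c_byte() {
    valid_chip_addr "$2" || { echo "bad chip address: $2" >&2; return 1; }
    # -y skips the confirmation prompt so the call is safe in a script.
    i2cget -y "$1" "$2" "$3"
}
```

A call such as read_i2c_byte 0 0x20 0x00 then prints the register value as a hexadecimal byte, provided i2c-tools is installed and the chip answers.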
Script snippet:
# Check for i2c-tools
if [ ! -f /usr/sbin/i2cget ] ; then
    echo -e "\nError: i2c-tools are required for this monitoring tool."
    exit
fi

# Check for lm_sensors
# Load i2c_dev modules if not loaded
if [ "`lsmod | grep -c i2c_dev`" == 0 ] ; then
    modprobe i2c_dev 2> /dev/null
fi

# Query for PS modules, 250w PS registered at i2c address 0x20
ps_status=`i2cget -y 0 0x20 0x00 2> /dev/null`
if [ "$ps_status" == "" ] ; then
    echo -e "\nError: PS module not responding, check i2c/lm_sensors setup."
    exit
fi

echo -e "\nSystem Power Supply Status:"
echo -e "---------------------------\n"

# Parse PS1 Bottom power supply module status
# Bit 0 = PS1 FAN status (0 = normal, 1 = fail)
let "ps_fan=$ps_status&1"
# Bit 1 = PS1 OTP (0 = normal, 1 = Over 55C, shuts off at 65C)
let "ps_otp=$ps_status&2"
# Bit 2 = PS1 PG (1 = Power good, 0 = Power good failed)
let "ps_pg=$ps_status&4"
# Bit 3 = PS1 Present (0 = PS detected on backplane, 1 = PS not present)
let "ps_present=$ps_status&8"

# Print PS1 results
if [ "$ps_present" == "0" ] ; then
    echo -en "PS1 BTM MODULE:\tPRESENT\nPS1 FAN STATUS:\t"
    if [ "$ps_fan" == "0" ] ; then
        echo "GOOD"
    else
        echo "FAILED"
    fi
    if [ "$ps_otp" == "0" ] ; then
        echo -e "PS1 OTP:\tGOOD (<55C)"
    else
        echo -e "PS1 OTP:\tFAILED (>55C)"
    fi
    if [ "$ps_pg" == "0" ] ; then
        echo -e "PS1 POWER:\tFAILED"
    else
        echo -e "PS1 POWER:\tGOOD"
    fi
else
    echo -e "PS1 BTM MODULE:\tNOT DETECTED"
fi

# Parse PS2 Top power supply module status
# Bit 4 = PS2 FAN status (0 = normal, 1 = fail)
let "ps_fan=$ps_status&16"
# Bit 5 = PS2 OTP (0 = normal, 1 = Over 55C, shuts off at 65C)
let "ps_otp=$ps_status&32"
# Bit 6 = PS2 PG (1 = Power good, 0 = Power good failed)
let "ps_pg=$ps_status&64"
# Bit 7 = PS2 Present (0 = PS detected on backplane, 1 = PS not present)
let "ps_present=$ps_status&128"

# Print PS2 results
if [ "$ps_present" == "0" ] ; then
    echo -en "\nPS2 TOP MODULE:\tPRESENT\nPS2 FAN STATUS:\t"
    if [ "$ps_fan" == "0" ] ; then
        echo "GOOD"
    else
        echo "FAILED"
    fi
    if [ "$ps_otp" == "0" ] ; then
        echo -e "PS2 OTP:\tGOOD (<55C)"
    else
        echo -e "PS2 OTP:\tFAILED (>55C)"
    fi
    if [ "$ps_pg" == "0" ] ; then
        echo -e "PS2 POWER:\tFAILED"
    else
        echo -e "PS2 POWER:\tGOOD"
    fi
else
    echo -e "\nPS2 TOP MODULE:\tNOT DETECTED"
fi
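The PS1 and PS2 halves of the script decode the same four bits, only shifted by four positions. As a sketch of that idea (not part of the original script), the hypothetical function ps_report below handles either supply from the bit layout given in the script's comments; its name and output format are invented for illustration.

```shell
# Hypothetical helper: print the state of one power supply.
# $1 = label, $2 = status byte, $3 = bit offset (0 for PS1, 4 for PS2).
ps_report() {
    nib=$(( ($2 >> $3) & 0xF ))
    if [ $(( nib & 8 )) -ne 0 ]; then       # bit 3: 1 = not present
        echo "$1: NOT DETECTED"
        return 0
    fi
    [ $(( nib & 1 )) -eq 0 ] && fan=GOOD || fan=FAILED   # bit 0: fan
    [ $(( nib & 2 )) -eq 0 ] && otp=GOOD || otp=FAILED   # bit 1: over-temperature
    [ $(( nib & 4 )) -ne 0 ] && pg=GOOD  || pg=FAILED    # bit 2: power good
    echo "$1: PRESENT fan=$fan otp=$otp power=$pg"
}
```

With a status byte of 0x44, ps_report PS1 0x44 0 and ps_report PS2 0x44 4 both report a present supply with fan, temperature, and power all GOOD.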
Once the power status has been read out, sending an alarm is easy.
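One possible shape for the alarm step is sketched below. It assumes the bit layout from the script's comments, under which a healthy supply reads 0x4 (present, power good, no fan or temperature fault), so two healthy supplies give 0x44. The function names are hypothetical, and logging through logger is just one choice; mail or a monitoring agent would work as well.

```shell
# Hypothetical helper: succeed (exit status 0) when an alarm is needed,
# that is, when the status byte ($1) differs from the healthy value 0x44.
ps_alarm_needed() {
    [ $(( $1 )) -ne $(( 0x44 )) ]
}

# Hypothetical usage: write a critical syslog message when something is wrong.
# Bus 0, chip 0x20, and register 0x00 are the values used in the script.
check_and_alarm() {
    status=$(i2cget -y 0 0x20 0x00 2>/dev/null) || status=
    if [ -z "$status" ]; then
        logger -p daemon.crit "power supply controller not responding"
    elif ps_alarm_needed "$status"; then
        logger -p daemon.crit "power supply fault, status byte $status"
    fi
}
```

Calling check_and_alarm from cron every few minutes would be enough for a simple setup.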