Compared with a PC, a server is much less likely to fail, but when it does fail, the losses are much greater. Server maintenance staff therefore need basic knowledge of server troubleshooting: what they can do during maintenance to solve problems as quickly as possible and keep downtime to a minimum.
This document is not a complete manual for server troubleshooting. However, if you follow the steps below carefully, you may be able to solve most problems. If the problem persists after you have done all of this, there is no need to worry: you can still call in a repair expert. You can rest assured that these steps will not cause major damage; the worst case is "it does not work at all".
This article consists of three parts: the first covers the basic principles of server troubleshooting, the second describes hardware troubleshooting cases, and the third describes software troubleshooting cases.
Part 1: Basic Principles of Server Troubleshooting
I. What should I do if the server shows no display at startup?
1. Check the power supply environment: are the live-to-neutral and neutral-to-ground voltages normal?
2. Check the power indicator: is it on, and is its state normal?
3. When the power switch is pressed, do the keyboard indicators light up? Do all the fans spin?
4. Has a different monitor been tried?
5. Remove any memory that was added.
6. Remove any CPU that was added.
7. Remove any third-party I/O card that was added.
8. Check that the memory and CPUs are seated securely.
9. Clear the CMOS.
10. Replace major components, such as the system board, memory, and CPU.
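The checklist above runs from the cheapest, least invasive checks to the most invasive replacements, and you stop as soon as the fault is found. A minimal Python sketch of that ordering follows; the step callables are hypothetical placeholders for a technician's observations or actions, and in the example their results are simulated.

```python
from typing import Callable, List, Optional, Tuple

# A step pairs a checklist description with a callable that returns True
# when performing or observing that step locates or clears the fault.
# The callables are hypothetical stand-ins for a technician's actions.
Step = Tuple[str, Callable[[], bool]]


def run_checklist(steps: List[Step]) -> Optional[Tuple[int, str]]:
    """Work through the steps in order and stop at the first one that
    locates or clears the fault; return its number and description."""
    for number, (description, step) in enumerate(steps, start=1):
        if step():
            return number, description
    return None  # every step done without success: call in a repair expert


if __name__ == "__main__":
    # Simulated results: only the fan check turns up a problem.
    steps: List[Step] = [
        ("Check the power supply environment", lambda: False),
        ("Check the power indicator", lambda: False),
        ("Check keyboard indicators and fans", lambda: True),
        ("Try a different monitor", lambda: False),
    ]
    print(run_checklist(steps))  # (3, 'Check keyboard indicators and fans')
```

Keeping the steps in a list makes the order explicit, so the invasive steps (clearing the CMOS, replacing the system board) are only reached after every cheaper check has been done.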
II. What are the basic principles of server troubleshooting?
1. Try to restore the default system configuration
A: Hardware configuration: remove added parts and non-standard parts from third-party manufacturers;
B: Resource configuration: clear the CMOS and restore the initial resource configuration;
C: BIOS, firmware, and drivers: upgrade to the latest BIOS, firmware (F/W), and related drivers;
D: TPL: is each added third-party I/O card on the model's hardware compatibility list (TPL)?
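Item D amounts to a membership check. The sketch below assumes the TPL is available as a plain list of card names (a hypothetical format; real lists may identify parts by part number), and the card names in the example are invented.

```python
from typing import Iterable, List


def cards_not_on_tpl(installed_cards: Iterable[str], tpl: Iterable[str]) -> List[str]:
    """Return the installed third-party cards that do not appear on the TPL.

    Names are compared case-insensitively after trimming whitespace.
    """
    allowed = {name.strip().lower() for name in tpl}
    return [card for card in installed_cards if card.strip().lower() not in allowed]


if __name__ == "__main__":
    # Hypothetical card names, not taken from any real compatibility list.
    tpl = ["Example NIC 100", "Example SCSI U160"]
    installed = ["example nic 100", "Other RAID X1"]
    print(cards_not_on_tpl(installed, tpl))  # ['Other RAID X1']
```

Any card this returns is a candidate for removal under item A before further testing.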
2. From simple to complex
A: System: from standalone to networked. First run the faulty server on its own; once it tests normal, connect it to the network, then observe and handle the fault there.
B: Hardware: from the minimum system to the real system. Start from the minimum hardware configuration that can run, then add components step by step until you reach the real system.
C: Software: from the basic system to the real system. Start from a basic operating system, then add software step by step until you reach the real system.
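Items B and C describe an incremental search: start from a configuration that works and add one part at a time until the fault reappears. A minimal sketch, assuming a hypothetical `works(config)` callable that stands for building and testing a given configuration; the same loop applies to software if the list holds software components instead of hardware parts.

```python
from typing import Callable, Iterable, List, Optional


def find_fault_by_addition(
    minimum: List[str],
    extras: Iterable[str],
    works: Callable[[List[str]], bool],
) -> Optional[str]:
    """Start from the minimum system and add one part at a time.

    `works(config)` is a hypothetical stand-in for powering on and testing
    the listed configuration. Returns the first added part after which the
    test fails, or None if the full configuration works.
    """
    config = list(minimum)
    if not works(list(config)):
        raise RuntimeError("the minimum system fails: suspect its core parts")
    for part in extras:
        config.append(part)
        if not works(list(config)):
            return part
    return None


if __name__ == "__main__":
    minimum = ["system board", "CPU 1", "DIMM 1"]
    extras = ["DIMM 2", "CPU 2", "third-party NIC"]
    # Simulated test: the server fails whenever CPU 2 is installed.
    print(find_fault_by_addition(minimum, extras, lambda cfg: "CPU 2" not in cfg))  # CPU 2
```

Adding one part per test costs more power cycles than adding several at once, but it points directly at the part that brings the fault back.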
3. Swap and compare
A: When several causes are equally likely, first swap the components that are easiest to swap and give the clearest result;
B: Swap the NOS carrier, i.e. swap the software environment;
C: Swap hardware, i.e. swap the hardware environment;
D: Swap the entire machine, i.e. swap the overall environment.
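A common way to read a swap test (general practice, not spelled out in the list above) is to check whether the fault follows the part. The sketch below assumes the suspect part was swapped with a known-good part from a working "donor" machine; the function name and result labels are hypothetical.

```python
def interpret_swap(original_fails_after: bool, donor_fails_after: bool) -> str:
    """Interpret a swap of a suspect part with a known-good part from a
    working donor machine, based on which machine fails afterwards."""
    if donor_fails_after and not original_fails_after:
        return "suspect part"       # the fault followed the part
    if original_fails_after and not donor_fails_after:
        return "rest of original"   # the fault stayed with the original machine
    if not original_fails_after and not donor_fails_after:
        return "cleared"            # e.g. reseating the part fixed a poor contact
    return "inconclusive"           # both fail: there may be more than one fault


if __name__ == "__main__":
    print(interpret_swap(original_fails_after=False, donor_fails_after=True))  # suspect part
```

The same reading applies at each level of the list: swapping the NOS carrier, a hardware part, or the whole machine.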
III. What information needs to be collected for server troubleshooting?
· Server information:
1. Machine model
2. Machine serial number (S/N, for example NC00075534)
3. BIOS version
4. Whether other devices have been added, such as NICs, SCSI cards, memory, or CPUs
5. How the hard disks are configured, and whether a disk array is configured
6. Which operating system and version are installed (for example, Windows NT 4, NetWare, SCO, or others)
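To make sure none of these items is forgotten, the server information can be recorded in a structured form. A minimal sketch, assuming Python dataclasses; the field names are illustrative, and the BIOS version in the example is invented (the serial number is the one quoted above).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ServerInfo:
    """Server information to collect before troubleshooting."""
    model: str
    serial_number: str
    bios_version: str
    added_devices: List[str] = field(default_factory=list)  # NICs, SCSI cards, memory, CPUs
    disk_config: str = ""
    array_configured: bool = False
    operating_system: str = ""  # name and version

    def missing_fields(self) -> List[str]:
        """Names of required text fields that are still empty."""
        required = {
            "model": self.model,
            "serial_number": self.serial_number,
            "bios_version": self.bios_version,
            "operating_system": self.operating_system,
        }
        return [name for name, value in required.items() if not value.strip()]


if __name__ == "__main__":
    info = ServerInfo(model="", serial_number="NC00075534", bios_version="1.02")
    print(info.missing_fields())  # ['model', 'operating_system']
```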
· Fault information:
1. Error messages displayed on screen during POST
2. The status of the server indicators
3. Alarms and beep codes
4. NOS event records
5. Event log files
· Determine the fault type and symptoms:
1. No display at startup;
2. Faults during the power-on self-test (POST);
3. Faults and symptoms during installation;
4. Failure to load the operating system;
5. Faults while the system is running.
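The five fault types follow the order in which a server starts up and runs, so the type can be read off from the furthest stage the server reached. A hedged sketch; the enum names and the boolean inputs are assumptions chosen for illustration.

```python
from enum import Enum


class FaultType(Enum):
    NO_DISPLAY = 1    # no display at startup
    POST = 2          # fault during the power-on self-test
    INSTALLATION = 3  # fault during installation
    OS_LOAD = 4       # operating system fails to load
    RUNTIME = 5       # fault while the system is running


def classify(display_on: bool, post_passed: bool, installing: bool, os_loaded: bool) -> FaultType:
    """Pick the fault type from the furthest stage the server reached."""
    if not display_on:
        return FaultType.NO_DISPLAY
    if not post_passed:
        return FaultType.POST
    if installing:
        return FaultType.INSTALLATION
    if not os_loaded:
        return FaultType.OS_LOAD
    return FaultType.RUNTIME


if __name__ == "__main__":
    print(classify(display_on=True, post_passed=True, installing=False, os_loaded=False))
    # FaultType.OS_LOAD
```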