Software faults account for the largest share of server faults, about 70%, so troubleshooting them deserves particular care. They have many causes. The most common are an outdated server BIOS version, bugs in the server's management software or drivers, application conflicts, and human error. The following examples illustrate how to handle each kind of software fault.
Consider an HP LH6000R server configured with dual PIII Xeon 700 CPUs with 2 MB of cache and M memory. After startup, the system log records a voltage regulator module (VRM) exception. The error message is "voltage Regulator Module (VRM) over/under-Voltage 2.88 V/0V". On the surface this looks like a failure of the voltage regulator module or some other hardware, so maintenance personnel are easily misled into treating it as a hardware fault. They immediately swapped in parts from another LH6000R for testing and found that the server still reported the VRM error even with replacement parts. In the end, the maintenance engineer brought the latest firmware for the CPU management board (CPU Management Control, CMC). After the firmware of the CPU management board was upgraded, the server immediately returned to normal.
To upgrade the firmware, extract the CPU management board (CMC) firmware flashing program, FLASH, and the .BIN file (the CPU management board firmware) from the server's NAVIGATOR CD, copy them to a DOS boot disk, and boot the server from that disk. In DOS, run "FLASH /cmc a:LH6KC.BIN" and then restart the server. The same method also works for flashing the system BIOS, but the FLASH command parameters and the firmware and BIOS file names are different. For the parameters, see the server documentation.
The firmware and BIOS of every server contain bugs of one kind or another. Bugs are inevitable, so never assume that a server's BIOS is perfect. Update the firmware and BIOS regularly, but be cautious before upgrading: an incorrect upgrade procedure can have serious consequences.
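One cautious step before any upgrade is to confirm that the available firmware really is newer than the installed one. The following is a minimal sketch of that check. It assumes version strings are purely numeric and dot-separated (for example "1.10.3"), which is a hypothetical format; real firmware versions may use a different scheme, and the function names are illustrative only.

```python
def parse_version(text):
    """Split a dotted numeric version string such as '1.10.3' into a tuple of ints.

    Assumption: every dot-separated part is a plain integer.
    """
    return tuple(int(part) for part in text.strip().split("."))


def needs_upgrade(installed, available):
    """Return True only when the available version is strictly newer."""
    return parse_version(available) > parse_version(installed)
```

Comparing tuples of integers rather than raw strings avoids the trap where "1.10" sorts before "1.9" as text.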
Popular mid-range and high-end servers now come with powerful management programs that give customers convenient management channels. Servers also come with drivers for a variety of operating systems, which makes them easier to use on each of those systems. However, every program has some bugs that affect users. Server vendors generally release updated programs promptly, so customers can avoid most such failures simply by updating these programs in time.
Different software faults on the server show different symptoms. In general, management program bugs slow down the system, increase CPU usage, and make certain functions unusable. Driver bugs cause crashes, conflicts with some software, and unstable disk behavior. The best way to check whether a management program is at fault is to disable the management tools in the system first and then observe whether the server is still abnormal. Because management tools start with the system, they must be prevented from starting. Take Windows NT 4 as an example: first disable the relevant server software services under Services in Administrative Tools, and then modify the startup entries in the registry. If a driver is suspected, boot the system in safe mode and check whether it behaves normally. Note, however, that in safe mode it is normal for the system to run more slowly, especially for disk I/O.
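The disable-and-observe approach can be sketched as a simple loop. This is only an illustration under stated assumptions: `disable`, `enable`, and `is_abnormal` are hypothetical callables that the administrator would supply (for example, wrappers around the operating system's service controls and a manual or scripted health check). Nothing here is a real Windows API.

```python
def find_faulty_services(services, disable, enable, is_abnormal):
    """Disable one management service at a time and note which ones clear the symptom.

    services    -- names of the management services to test (hypothetical)
    disable     -- callable that stops a service by name (supplied by the caller)
    enable      -- callable that restarts a service by name (supplied by the caller)
    is_abnormal -- callable returning True while the server still misbehaves
    """
    suspects = []
    for name in services:
        disable(name)
        try:
            # If the symptom disappears while this service is off, it is a suspect.
            if not is_abnormal():
                suspects.append(name)
        finally:
            # Always restore the service so each test starts from the same state.
            enable(name)
    return suspects
```

Testing one service at a time keeps the result easy to interpret. If several tools interact, the administrator may need to disable them in groups instead.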
Server administrators should regularly download the latest management tools and drivers from the server vendor's website. This prevents a large share of software faults.
In contrast, faults caused by software conflicts are difficult to diagnose. The administrator needs a wealth of experience and keen observation.
A friend once told me that he was unable to install SQL Server 2000 on an Inspur server and had already reinstalled NT many times, which ruled out an operating system fault. The server was going to be a very important database server, so he was very anxious. I went with him to his company. The server was located in a well-equipped data center built to a high standard. I checked the server and found no hardware fault, which ruled out a drive with poor reading ability. However, the SQL Server 2000 disc that my friend had burned made me suspicious. I asked him to use the genuine SQL Server installation disc, but the installation still failed. Setup reported no error at all. It simply exited partway through without any prompt. Then I found a message in the system log of Event Viewer under Administrative Tools: windata.exe had caused an invalid data overflow. Windata was a program my friend had written, and it started with the operating system. As soon as I ended that process, SQL Server setup ran normally.
For software faults like this, first check the relevant logs for suspicious processes in the system. Today both high-end and low-end servers reliably support standard programs such as SQL Server, so troubleshooting should focus on finding and ending suspicious processes.
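Checking logs for suspicious processes can be partly automated. The sketch below assumes the log has been exported as plain text lines, that process names end in ".exe", and that a short list of keywords marks an error entry. These are all assumptions for illustration, not the Event Viewer's actual format.

```python
import re
from collections import Counter

# Assumption: executable names look like "name.exe" somewhere in the message text.
EXE_PATTERN = re.compile(r"\b([\w.-]+\.exe)\b", re.IGNORECASE)
# Assumption: these keywords are enough to flag an entry as an error.
ERROR_WORDS = ("error", "overflow", "exception", "fault")


def suspicious_processes(log_lines):
    """Count executables mentioned in log lines that contain an error keyword.

    Returns (process_name, count) pairs, most frequently mentioned first.
    """
    counts = Counter()
    for line in log_lines:
        lowered = line.lower()
        if any(word in lowered for word in ERROR_WORDS):
            for name in EXE_PATTERN.findall(line):
                counts[name.lower()] += 1
    return counts.most_common()
```

A process that appears repeatedly in error entries, especially one that starts with the operating system, is a good first candidate to end before retrying the failed installation.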
Another kind of software fault is caused by human factors. It generally results from operator error (including operations that do not follow the proper procedure), unexpected shutdown (including sudden power loss), or abnormal termination of applications.
Operator error can be prevented by strengthening management. Here we describe in detail the faults caused by unexpected shutdown or by abnormal termination of programs.
Shutting down system programs properly is very important, especially on a web server. One of my friends suffered data corruption and even data loss because system programs were not shut down properly. He was using an HP web hosting server appliance, so I gave him some usage rules.
These rules are very effective for server maintenance. They mainly cover shutting down system programs properly, avoiding data loss, and recovering after an abnormal system shutdown. The following uses my friend's HP web hosting server appliance as an example (it runs UNIX, but the ideas apply to other operating systems as well).