I have a bit of free time today, and I suddenly remembered a server hardware failure from when I had just joined my current company. I am writing it down to share the experience with friends.

I remember it was when my predecessor in the post was leaving. The manager took me to the computer room to get familiar with the environment and, while we were there, to replace the backup tapes (the backup tapes have to be replaced manually, which made my head spin). Entering the room requires an access pass, and I did not have one, so I would have had to wait for internal staff to let me in. It was also close to the end of the working day. The manager said: "Xiao Liu, wait for me here. I will go in and take a look first. If nothing is wrong you need not go in; you can come next time." I said OK.

About 20 minutes later the manager came out of the computer room and told me that the LED on the front panel of our database server was flashing a yellow warning. The display showed the English text "BP Driver 0", and the small light on one of the hard drives was flashing yellow too. He asked me to go back and find out the reason.

Sitting in the manager's old, smoke-filled car, I thought about the cause but had no clue. Back home I searched Google for a while and found that this was probably related to a hard disk. Wasn't "Driver 0" the ID number of a hard disk in the BIOS? Was that disk the problem? The server uses a RAID 5 array, which can continue to serve after one hard drive has failed. (The server hardware is a relatively old PowerEdge 850 class machine.)
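To show why a RAID 5 array keeps working with one drive missing, here is a minimal sketch of the XOR parity idea. It is only an illustration: the block contents are made up, the helper name `xor_blocks` is my own, and a real controller also rotates parity across the disks, which this sketch leaves out.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three data blocks of one stripe (illustrative values, not real data).
data = [b"AAAA", b"BBBB", b"CCCC"]

# The parity block is the XOR of all data blocks in the stripe.
parity = xor_blocks(data)

# Simulate losing the second disk: rebuild its block from the survivors plus parity.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data[1])
```

Because XOR is its own inverse, any single missing block can be recomputed from the remaining ones, which is why the array could keep serving while one drive was offline. A second failure in the same state would lose data.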

The next morning I came to the office early, called Dell's after-sales technical support, and explained the situation. The Dell technician told me that one hard drive in the array had gone offline (so it really was a hard drive problem), and that pulling the drive out and plugging it back in would fix it.

Relieved, I asked the manager to come over, told him about the situation, and asked when we should deal with the problem.

The manager is a very cautious person who is never willing to take risks. Once he understood the situation, he said to wait before dealing with it: business volume was very high at the moment, and that database was processing business every day, so we would find a time later. (I nearly fainted.) I had to go along with it. About two weeks passed without any problem. I then raised the matter in a summary meeting, and the manager said that dragging it out forever was no solution, and decided that we would go to the computer room on Saturday morning to deal with it.

On Friday we informed the business units that we needed downtime for system maintenance, and got everything ready. We agreed to meet in the computer room on Saturday.

On Saturday I prepared the relevant tools and materials and arrived at the computer room early, then found a monitor, a mouse, and a keyboard. (The computer room technicians do not know how to do any of this, so we have to do everything ourselves.) I connected the monitor, keyboard, and mouse and was about to log in to the system, looking at the yellow warning and silently praying that nothing would go wrong. The process we had agreed on beforehand was: first back up the database, then shut down the database, then hot-swap the hard drive. Then disaster: I had not brought the password. I broke into a cold sweat (so careless) and called the manager.

The call connected. "Boss, are you coming to the computer room? I forgot to bring the password!" The manager shouted down the phone, "Why didn't you bring anything?" I hated myself for being so careless.

"Wait for me to get there!" "OK!" I hung up and waited for the manager to arrive.

Before long the manager arrived and gave me another scolding: "Be more careful in future, and don't be so careless!" I promised I would.

Following the agreed process, I first logged in to the system and backed up the database with an RMAN physical backup. The backup takes a while, so we waited. The manager went outside to smoke while I waited for the backup to finish, which took about 20 minutes.

Step two: shut down the database.

Step three: pull the faulty hard drive out of the front panel and plug it back in (the server's hard drives support hot swapping). The front panel LED display then went back to its normal blue, and the error message disappeared.

There was no problem at this point, but to be sure there was nothing left to worry about we still rebooted the system. Everything was normal after startup.

Step four: start the database instance. There were no errors, and logging in to the site to test showed no problems. With that, the problem was solved.
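The order of these steps matters: nothing touches the hardware until the backup has finished and the database is down. As a rough sketch of that idea, the runbook below runs steps in order and stops at the first one that fails. The `Step` class, the `run_runbook` function, and the placeholder actions are all hypothetical; each lambda stands in for a manual check by the person in the computer room, not for a real RMAN or database command.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Step:
    name: str
    action: Callable[[], bool]  # returns True when the step succeeded

def run_runbook(steps: List[Step]) -> List[str]:
    """Run steps in order; stop at the first failure and return the names completed."""
    done = []
    for step in steps:
        if not step.action():
            break  # do not continue past a failed step
        done.append(step.name)
    return done

# Placeholder actions: every lambda here simply reports success.
steps = [
    Step("back up the database with RMAN", lambda: True),
    Step("shut down the database", lambda: True),
    Step("reseat the offline hard drive", lambda: True),
    Step("reboot and check the front panel", lambda: True),
    Step("start the instance and test the site", lambda: True),
]

completed = run_runbook(steps)
print(completed)
```

If the backup step had returned False, the list of completed steps would be empty and the drive would never have been pulled, which is exactly the behaviour a cautious manager wants.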

In my experience, a hard drive going offline like this is most likely caused by chassis vibration loosening the drive connector after the server has run for a long time, or by thermal expansion and contraction of the hardware as the room temperature changes. Of course, human causes cannot be ruled out either.

From solving this problem, I took away two lessons:

With any failure, first look for the cause from the symptoms yourself, then consult the hardware vendor's after-sales engineers. They handle more fault reports than anyone, so they will quickly give you a direction and point out the cause of the problem.

The other is a personal one: do not be careless. Before going to the computer room for maintenance, think through everything that might come up.
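One simple way to avoid my mistake is to check a packing list before leaving for the computer room. The sketch below is only an illustration: the item names come from this story, and the `REQUIRED` set and `missing_items` function are hypothetical names of my own.

```python
# Items this story shows you need before a computer room visit.
REQUIRED = {"access pass", "server password", "monitor", "keyboard and mouse", "tools"}

def missing_items(packed):
    """Return the required items that were not packed, sorted for stable output."""
    return sorted(REQUIRED - set(packed))

# What I actually had with me on that Saturday morning.
missing = missing_items({"monitor", "keyboard and mouse", "tools"})
print(missing)
```

Running the check before setting out would have flagged the password while it was still easy to fetch.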

