Production accident: accidentally deleting or moving the /lib64 directory

Source: Internet
Author: User
Tags: install, openssl

Accident background:

One machine could not be monitored by Nagios, and "yum install openssl" failed with an error about a "libkrb5.so.3" conflict.

Resolution process:

1. The /lib64 accident

I read several articles about the "libkrb5.so.3" conflict error, but none of them solved it, so I decided to uninstall libkrb5. "rpm -e libkrb5.rpm" refused because of dependency conflicts, so I ran "rpm -e libkrb5.rpm --nodeps". (As it turned out, if you do not know what depends on a package, it is best not to use "--nodeps".) The trouble started as soon as the uninstall finished: yum no longer worked and complained that "libkrb5.so.3" was missing. I copied libkrb5.so.3 over from another machine, but yum kept complaining about missing files, and experience told me that many other library files were probably missing too. This was a production machine and I did not want to spend too much time on it, so I planned to copy /lib64 over from another machine. While I was still thinking it over, my hands typed "mv /lib64 /tmp" on their own. I regretted it the moment I pressed Enter. I hurriedly ran "ls" and it did not work; it turned out that only cd still worked (cd is a shell builtin, so it needs no libraries from disk) and nothing else did. This was at about 15:00. To be honest, I panicked a little, because I had never run into anything like this in production. I had once tried "rm -rf /" on a virtual machine and seen that it really does delete everything, but I had never recovered from it.
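Before reaching for "--nodeps", rpm itself can report what depends on a package and whether a removal would succeed. The following is a minimal sketch, assuming an RPM-based system. The function name check_removal is hypothetical, and the package name is whatever the caller passes in (on CentOS, libkrb5.so.3 normally belongs to the krb5-libs package, which "rpm -qf" on the library file can confirm).

```shell
#!/bin/sh
# check_removal PKG: report what depends on PKG and dry-run its removal.
# Nothing is uninstalled: "rpm -e --test" only checks for problems.
check_removal() {
    pkg=$1
    if ! command -v rpm >/dev/null 2>&1; then
        echo "rpm not found on this machine" >&2
        return 2
    fi
    echo "Installed packages that require $pkg:"
    rpm -q --whatrequires "$pkg"
    echo "Dry run of the removal:"
    # The exit status of this last command becomes the function's status.
    rpm -e --test "$pkg"
}
```

Called as "check_removal krb5-libs", a non-zero exit status from the dry run means that the removal would break other installed packages.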

2. Simulate and handle /lib64 accidents

After thinking for a minute, I decided to reproduce the accident on a virtual machine. I started the VM, ran "mv /lib64 /tmp" and rebooted; the system hung at the boot screen. I then searched for articles about accidentally deleting /lib64 and found almost nothing useful, probably because this kind of accident is rare: nobody does this in production, and hardly anyone tries it as an experiment either. I found no article on the accident itself, but an article on "Linux rescue mode" gave me an idea: I could boot into rescue mode and copy lib64 from the /tmp directory back to the root directory. Wouldn't that solve the problem?

I tried this on the virtual machine. After entering Linux rescue mode, I found that lib64 was not in /tmp, and I wondered whether files in /tmp are deleted automatically on reboot. So I copied /lib64 from the system on the CD image to the root directory of the crashed system and rebooted. It still did not work; perhaps the system still lacked some library files and could not come up.

I was very anxious and worried, because at this point I mistakenly believed that /tmp/lib64 had been deleted by the reboot, and copying /lib64 from the image to the crashed system had not worked either. With no other option, all I could think of was exporting the data from the system. This was a release machine that colleagues in Shenzhen use specifically for publishing code, and it happened to be needed that very night.

At about 17:00 I contacted the Shenzhen colleague: "Brother Yong, I broke the release machine. Are your important files stored together in one place? I will try to copy them out." After that conversation I was basically prepared to copy the files out.

At 17:30 it suddenly occurred to me that in Linux rescue mode the crashed system's tmp directory is not /tmp but /mnt/sysimage/tmp. I went into /mnt/sysimage/tmp and found the lib64 directory still there. I ran "mv /mnt/sysimage/tmp/lib64 /mnt/sysimage/", rebooted, and the system came up normally. I was delighted.
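The recovery step above can be wrapped in a small guarded function. This is only a sketch: the name restore_dir and its two arguments are hypothetical, and in rescue mode the first argument would be /mnt/sysimage. It refuses to run if the destination already exists, so that a second attempt cannot nest one lib64 inside another.

```shell
#!/bin/sh
# restore_dir SYSROOT NAME: move SYSROOT/tmp/NAME back to SYSROOT/NAME.
restore_dir() {
    sysroot=$1
    name=$2
    src="$sysroot/tmp/$name"
    dst="$sysroot/$name"
    if [ ! -d "$src" ]; then
        echo "not found: $src" >&2
        return 1
    fi
    if [ -e "$dst" ]; then
        # Never overwrite or nest into something that is already there.
        echo "already exists: $dst" >&2
        return 1
    fi
    mv -- "$src" "$dst"
}
```

In the rescue shell this would be called as "restore_dir /mnt/sysimage lib64".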

3. Handling the /lib64 accident in production

Next I uploaded an ISO image to the vCenter storage (there was no image on the storage because our new machines are installed through Cobbler). Annoyingly, the upload took half an hour. It was already 18:30 and I was very nervous: the machine would be needed at 20:00 and my manager still did not know. I then set the BIOS to boot from CD-ROM and selected the storage. The first two attempts failed to boot from the CD-ROM because I had picked the wrong storage, but even with the right storage it would not boot. After three or four failed attempts I panicked. What was wrong? Was the image bad? That made no sense, since the copy had reported no errors. I checked again and again, until a single remark from someone woke me up: when booting from CD-ROM, if the option to connect the drive at power-on is not ticked, there is no point in looking any further. I realized that I had selected the image but had not ticked that option, so of course it could not boot; haste really does make mistakes. With that fixed I got into rescue mode, ran "mv /mnt/sysimage/tmp/lib64 /mnt/sysimage/" and rebooted. The system started normally, but there was a new problem: the password was definitely correct, yet login reported it as incorrect.

In my panic I resorted to the classic network administrator's measure, a reboot. The problem was still there afterwards. By now it was 19:00; the manager had left, most colleagues had left, and nobody knew I was dealing with a production accident. To be honest, I was scared of making one more wrong step. But I reasoned that since the system could boot, single-user mode should work, so I would change the password there. I changed the system password to 123456 and rebooted. Login worked and everything looked normal.

Then I tried to connect to the machine remotely and the connection timed out. Back in VMware I found that the sshd service was not running. It was enabled at boot, so I started it manually, and it complained that "libkrb5.so.3" was missing. In other words, this library file affects not only yum but also the SSH service. I went back into Linux rescue mode and copied libkrb5.so.3 from the image to the system. After booting, sshd ran normally and remote connections worked, but there was yet another problem: only the root user worked, and switching to an ordinary user hung. What was the cause? Was it an after-effect of the bastion host (the company uses a bastion host, and after the accident I had removed this machine from it)? I cleared the bastion-related settings from sshd_config, and it still did not work. If users could connect but not switch users, the Shenzhen colleague would still be unable to publish code, so this had to be solved, but I had no ideas.
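A missing library such as libkrb5.so.3 can be spotted before a service fails to start by filtering the output of ldd. The sketch below assumes that ldd itself still runs on the system (it will not if /lib64 is gone entirely). The function name missing_libs is hypothetical.

```shell
#!/bin/sh
# Print the names of shared libraries that ldd reports as "not found".
# Reads ldd output on stdin, so it also works on saved output.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}
```

For example, "ldd /usr/sbin/sshd | missing_libs" prints one library name per line, and prints nothing when every dependency resolves.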

4. Complete recovery

Having no ideas is no excuse for doing nothing; at least do what can be done first. I added the machine back to the bastion host and tried again. I have to say that paid products are worth the money: I do not know the mechanism, but once the machine was back on the bastion host, switching users worked normally. By then it was already past 20:00 (the code release had been delayed for other reasons). I quickly contacted the Shenzhen colleague and asked him to check; he replied that everything was normal. I could finally breathe a sigh of relief.


Similar events:

Not long after this accident, a database server had a disk failure. The disks were in RAID10; one disk was replaced, and then the system crashed (later analysis showed that a RAID card failure was the cause). I mounted the image and entered Linux rescue mode. The system had five or six partitions and I did not know which one had the problem, so I mounted them one by one and found that the /boot partition could not be mounted. Looking into /boot, I found it empty; that is, the files on the /boot partition were gone. The material I consulted said that reinstalling the kernel would fix this, so I reinstalled the kernel, but it did not help. I then copied the contents of /boot from another machine to the crashed machine and rebooted. This time the machine did not drop straight into the grub interface as before, but it hung after the boot progress bar finished, so something was still wrong. A 3M high-availability application was deployed on this database; to resolve the problem quickly, we chose to reinstall the system and redeploy the 3M application.


A few reminders:

1. Production work calls for fast hands, but do not rush to press Enter.

2. Try not to use the rm command; use mv instead.

3. If you are not sure you can handle an accident, report it after you have worked on it for a while. Otherwise it becomes especially awkward: if you do not report it, you may not manage to fix it, and if you report it late, you look foolish for having waited so long.
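Reminders 1 and 2 can be partly automated with a wrapper that moves files aside instead of deleting them and refuses to touch top-level system directories. This is a minimal sketch, not a complete safeguard. The function names, the TRASH_DIR variable and the list of protected paths are all assumptions, and the path check is purely textual, so a relative path such as "../lib64" is not caught.

```shell
#!/bin/sh
# Succeeds if the path is one of the top-level system directories.
is_protected_path() {
    p=$1
    # Strip trailing slashes so "/lib64/" and "/lib64" compare equal.
    while [ "$p" != "/" ] && [ "${p%/}" != "$p" ]; do
        p=${p%/}
    done
    case "$p" in
        /|/bin|/sbin|/lib|/lib64|/usr|/etc|/boot|/var|/tmp) return 0 ;;
        *) return 1 ;;
    esac
}

# Move each argument into a timestamped trash directory instead of deleting it.
# TRASH_DIR is an assumed location; it defaults to ~/.trash.
trash() {
    dir=${TRASH_DIR:-$HOME/.trash}
    # Check every target first, so a bad argument rejects the whole batch.
    for target in "$@"; do
        if is_protected_path "$target"; then
            echo "refusing to touch protected path: $target" >&2
            return 1
        fi
    done
    stamp=$(date +%Y%m%d-%H%M%S)
    mkdir -p "$dir/$stamp" || return 1
    for target in "$@"; do
        mv -- "$target" "$dir/$stamp/" || return 1
    done
}
```

With this in place, "trash old.log" keeps a recoverable copy under the trash directory, while "trash /lib64" is rejected before anything is moved.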


Entering Linux rescue mode:

What rescue mode is for:

Changing the root password;

Repairing hard disks and file systems;

Getting into a system that can no longer boot on its own.

The steps to start the rescue mode are as follows:

1. First enter the BIOS setup at boot (the key differs from computer to computer) and set the boot order so that the CD-ROM drive comes first; use the + and - keys on the keypad to move entries up and down.

If it is VMware Workstation, you can enter the BIOS through "VM → Power → Power On to Firmware";

If it is a physical machine, press F1, F2, F12 or a similar key to enter the BIOS; the key differs by machine, so watch the on-screen prompt;

If it is ESXi, right-click the virtual machine, click Edit, mount the image first, and then change the boot options so that the machine boots into the BIOS screen.


2. Restart the system. At the installation boot menu, use the up and down arrow keys to select "Rescue installed system".


3. Select the language; keep the default, English.


4. Select the keyboard type; keep the default, us.


5. Choose whether to start the network according to your situation. If you need to copy data over the network, select Yes; here we choose No.


6. At the rescue screen, select Continue.


7. The installed system is mounted under /mnt/sysimage. If you want to work with it as the root environment, run the "chroot /mnt/sysimage" command.


8. There are three options: shell starts a command-line shell; fakd runs diagnostics; reboot restarts the computer. Here we choose shell.


9. You are now at the shell command line, and the prompt is bash-4.1#

Running "ls /mnt/sysimage/" lists the files of the installed system, which is mounted there.

Run "chroot /mnt/sysimage/" to make /mnt/sysimage/ the root directory of the shell (no files are moved; only the shell's view of / changes);

After this command the prompt is sh-4.1#

Running "ls" now shows the files in the installed system's root directory;


In practice, missing system files make "chroot /mnt/sysimage" fail, and the error does not tell you what is missing, because whatever is missing, the error is the same: "/bin/bash .... ". When I was missing the /lib64 directory and when I was missing the /boot files, "chroot /mnt/sysimage" failed with the same error both times. You can simply skip this command and do what you need to do, such as editing files or copying directories, directly under /mnt/sysimage; the failed chroot does not affect that.

10. If you are at the sh-4.1# prompt, type exit first to get back to bash-4.1#, then run reboot to restart the system.
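Because a failed chroot reports the same "/bin/bash" error whatever is missing, a quick look at the mounted system can narrow down the cause before trying it. The sketch below is hypothetical: the name chroot_precheck is made up, and the list of paths it checks is an assumption based only on the two failures described in this article (a missing /lib64 and an empty /boot).

```shell
#!/bin/sh
# chroot_precheck SYSROOT: report key paths that are missing under SYSROOT.
# In rescue mode SYSROOT would be /mnt/sysimage.
chroot_precheck() {
    sysroot=$1
    status=0
    if [ ! -x "$sysroot/bin/bash" ]; then
        echo "missing: $sysroot/bin/bash"
        status=1
    fi
    for rel in lib64 boot; do
        # An absent directory and an empty one are reported the same way.
        if [ -z "$(ls -A "$sysroot/$rel" 2>/dev/null)" ]; then
            echo "missing or empty: $sysroot/$rel"
            status=1
        fi
    done
    return "$status"
}
```

Running "chroot_precheck /mnt/sysimage" from the bash-4.1# prompt prints one line per problem it finds and returns a non-zero status if there is any.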



