A Linux System Recovery Case Caused by a NAS Storage Failure

Source: Internet
Author: User

I. Fault description

The NAS runs a Linux-based operating system and has 16 hard disks arranged in two groups, each configured as RAID 5. Linux no longer boots normally: the boot sequence hangs when it reaches the cups service, and pressing Ctrl+C to force it to continue has no effect. The hard disks all report a normal status, with no alarms or warnings.


II. Troubleshooting approach

These symptoms suggest that the NAS hardware is fine and that the storage disks are healthy. Since it is Linux that cannot start, the problem most likely lies in the Linux system itself, so troubleshooting starts there.


III. Troubleshooting process


1. First attempt


The NAS system is essentially a Linux kernel plus file system management software, which manages the system disks, system services, file systems, and so on. Under normal circumstances a Linux-based NAS boots into runlevel 3 (init 3) or runlevel 5 (init 5). Because this NAS uses only the Linux kernel and a few simple services, it should be booting into init 3. Since it can no longer reach the multi-user text interface, why not boot Linux directly into single-user mode (init 1)? Only a handful of essential system services are started in single-user mode, and cups is an application-level service that is certainly not started in init 1, so the cups startup hang is avoided. The next step, therefore, is to get Linux into single-user mode.
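
As a rough illustration of the runlevel reasoning above, the following shell sketch reads the default runlevel from a SysV-style inittab. It assumes the NAS follows the RHEL/CentOS convention of declaring the default in /etc/inittab with a line such as "id:3:initdefault:", which the article does not confirm for this device. The function name default_runlevel is made up for this example.

```shell
# Print the default runlevel declared in a SysV-style inittab file.
# Usage: default_runlevel [path]   (path defaults to /etc/inittab)
# Assumption: the system uses SysV init; this is not confirmed for the NAS.
default_runlevel() {
    inittab="${1:-/etc/inittab}"
    # The relevant line looks like "id:3:initdefault:"; comment lines are skipped.
    awk -F: '!/^[[:space:]]*#/ && $3 == "initdefault" { print $2; exit }' "$inittab"
}
```

Called with no argument, default_runlevel reads /etc/inittab. On SysV systems the runlevel command additionally reports the previous and current runlevel of the running system.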


Many Linux distributions can be booted into single-user mode by changing settings at the boot screen. Judging from the NAS boot messages, this Linux system looks very similar to the RHEL/CentOS distributions, so I tried the RHEL/CentOS way of entering single-user mode.


Entering single-user mode on RHEL/CentOS is simple: when the boot welcome screen appears, press "e", select and edit the appropriate kernel boot line, append the "single" option to the end of it, and finally press "b" to boot into single-user mode.


Next, I restarted the NAS. After the hardware self-test, Linux started to boot. I waited for the boot welcome screen, but it never appeared: the system went straight to loading the kernel image, with no boot menu at all. How could I get into single-user mode? After some thought, I decided to press the "e" key as soon as the hardware self-test finished. Surprisingly, it worked, and the NAS displayed the kernel boot menu. A quick look showed that the second entry was the kernel boot line. I used the up and down arrow keys to select it, pressed "e" to edit it, typed "single" at the end of the line, pressed Enter to return to the previous screen, and then pressed "b" to boot. About a minute later the system dropped into the single-user shell, as hoped.


Once in single-user mode, a lot can be done. The first thing to do is to disable automatic startup of the cups service in multi-user mode with the following command:

chkconfig cups off
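
Before rebooting, the result of the change could be checked from the same shell. The sketch below assumes the usual "chkconfig --list" line format, in which the service name is followed by fields such as 3:on or 3:off. service_on_at is a hypothetical helper written for this example, not part of chkconfig.

```shell
# Succeed (exit 0) if a `chkconfig --list <service>` line read from standard
# input shows the service as "on" for the given runlevel; fail otherwise.
# Usage: chkconfig --list cups | service_on_at 3
service_on_at() {
    level="$1"
    awk -v want="${level}:on" '
        { for (i = 2; i <= NF; i++) if ($i == want) found = 1 }
        END { exit(found ? 0 : 1) }
    '
}
```

For example, "chkconfig --list cups | service_on_at 3 && echo still enabled" would print a message only if cups is still set to start in runlevel 3.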

If the command succeeds, reboot the system into multi-user mode to see whether it now starts normally.


2. Second attempt

With automatic startup of cups disabled, I restarted the NAS and found that the problem remained: the NAS still hung at the cups service during boot. Had the command above failed? The cups service had clearly been disabled, so why was it still being started? I restarted the NAS and entered single-user mode again to find out what was wrong.

In single-user mode, running the chkconfig command again still succeeded. Perhaps the cups service itself had a problem, so I first looked at its configuration file with the following command:

vi /etc/cups/cupsd.conf

This revealed a problem. When vi opened cupsd.conf, it reported "write file in swap". The file clearly existed, so why was it said to be in virtual memory? After some thought, there was only one plausible explanation: the Linux system partition of the NAS had not been mounted correctly, and in single-user mode all the files were being held in virtual memory. This is easy to verify by running the "df" command, as shown below:

[Image 11.png: output of the df command. http://s3.51cto.com/wyfs02/M02/38/D7/wKioL1O0xH7wYz9IAABkm3w7MzU296.jpg]
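
The same check can also be made without reading the df table by eye. The following is a minimal sketch, assuming the kernel mount table is readable at /proc/mounts in the single-user shell (not confirmed for this NAS). mount_source is a name invented for this example.

```shell
# Print "device fstype" for a mount point, reading mount-table lines in the
# /proc/mounts format from standard input. The last match wins, because a
# later mount shadows an earlier one on the same mount point.
# Usage: mount_source / < /proc/mounts
mount_source() {
    awk -v mp="$1" '
        $2 == mp { dev = $1; fs = $3 }
        END { if (dev != "") print dev, fs }
    '
}
```

If "mount_source / < /proc/mounts" does not report a disk partition such as /dev/sda2, the on-disk root file system is probably not what is mounted at /.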


The df output shows that the Linux system partition is not mounted. Next, check the disk partitions with "fdisk -l"; the output is shown below:

[Image 22.png: output of fdisk -l. http://s3.51cto.com/wyfs02/M01/38/D8/wKiom1O0xL7D37WAAAKEsIcahS8234.jpg]


The fdisk output shows that the NAS system disk is /dev/sda, which contains only two system partitions, /dev/sda1 and /dev/sda2. The data disks are configured as RAID 5 and appear on the system as /dev/sdb1 and /dev/sdc1. Because single-user mode does not mount any of the NAS disks by default, I tried mounting the system disk manually with the following commands:

# mount /dev/sda2 /mnt

# mount /dev/sda1 /opt

Here /mnt and /opt are arbitrary mount points; any other empty directory would do. Once the partitions are mounted, look at the contents of each directory, as shown below:

[Image 33.png: directory listing of a mounted partition. http://s3.51cto.com/wyfs02/M00/38/D7/wKioL1O0xKCQ-7CmAAEoImVsd5k932.jpg]


[Image 44.png: directory listing of a mounted partition. http://s3.51cto.com/wyfs02/M01/38/D8/wKioL1O0xLXSBwSIAAISta6Zzh4914.jpg]
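
Telling a root partition from a /boot partition by its contents can be sketched as a small shell heuristic. The directory names it looks for (etc and var for a root file system, grub or a vmlinuz* kernel image for /boot) are assumptions based on a typical RHEL/CentOS layout, and guess_partition_role is a hypothetical helper, so the result is only a hint.

```shell
# Guess the role of a mounted partition from its top-level contents.
# Prints "root", "boot", or "unknown". Heuristic only.
# Usage: guess_partition_role /mnt
guess_partition_role() {
    dir="$1"
    if [ -d "$dir/etc" ] && [ -d "$dir/var" ]; then
        # A root file system normally carries /etc and /var.
        echo root
    elif [ -d "$dir/grub" ] || ls "$dir"/vmlinuz* >/dev/null 2>&1; then
        # A separate /boot partition normally holds kernels and GRUB files.
        echo boot
    else
        echo unknown
    fi
}
```

With the mounts used in this article, the calls would be "guess_partition_role /mnt" and "guess_partition_role /opt".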

Judging from the contents of the two directories, /dev/sda2 should be the Linux root partition and /dev/sda1 should be the /boot partition. Now that the partitions are mounted, run the df command again to check the mount status, as shown below:

[Image 55.png: output of the df command after mounting. http://s3.51cto.com/wyfs02/M00/38/D8/wKioL1O0xTLi4459AACX8nGmaKE649.jpg]


At this point the problem is found: the /dev/sda2 partition has no free disk space left, and this partition is the root partition of the NAS system. With no space on the root partition, the system is bound to have trouble booting.

Returning to the earlier reasoning: when the cups service starts, it writes its startup log to the root partition. Because the root partition has no free space, the log cannot be written, and as a result the cups service cannot start. This explains why the NAS hung at the cups service on every boot.
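
Spotting a full file system can also be automated. The sketch below filters the POSIX-format output of "df -P", where the fifth column is the usage percentage and the sixth is the mount point. full_filesystems and the default threshold of 95 are choices made for this example, and mount points containing spaces are not handled.

```shell
# Read `df -P` output on standard input and print the mount points whose
# usage is at or above the given percentage (default 95).
# Usage: df -P | full_filesystems 95
full_filesystems() {
    awk -v limit="${1:-95}" '
        NR > 1 {
            use = $5
            sub(/%/, "", use)          # "100%" -> "100"
            if (use + 0 >= limit + 0) print $6
        }
    '
}
```

In the situation described here, "df -P | full_filesystems" would be expected to list the mount point of /dev/sda2.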


IV. Solution

Because the NAS system has only a root partition and a /boot partition, all system logs are stored on the root partition. Now that the root partition is full, the first thing to clean up is the system log files under /var, usually the /var/log directory. Run the following command to see how much disk space /var/log occupies:

# du -sh /var/log

50.1G   /var/log

The output shows that /var/log alone occupies 70% of the root partition, so cleaning up the log files in this directory frees most of the space in use on the root partition. After the cleanup I restarted the NAS: the cups service started normally and the NAS services were up and running again.
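
The article does not say which log files were removed or how. As one possible approach, the sketch below lists the largest files under a directory and empties a chosen log in place rather than deleting it. Both function names are hypothetical, and "du -k {} +" assumes a find that supports the "+" form of -exec.

```shell
# List the N largest regular files under a directory, biggest first, as
# "size_in_KiB<TAB>path". Defaults: /var/log and 10 entries.
# Usage: largest_files /var/log 10
largest_files() {
    dir="${1:-/var/log}"
    count="${2:-10}"
    # -xdev keeps find on the same file system as the starting directory.
    find "$dir" -xdev -type f -exec du -k {} + 2>/dev/null | sort -rn | head -n "$count"
}

# Empty a log file in place instead of deleting it, so that a daemon that
# still holds the file open keeps writing to the same, now empty, file.
# Usage: truncate_log /var/log/some.log
truncate_log() {
    : > "$1"
}
```

Truncating is chosen here because removing a file that a running process still holds open does not free its disk space until that process closes it.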

This article is from the "Technology Achievement Dream" blog. Reprinting is not permitted.
