Linux operating system troubleshooting and common fault handling

Source: Internet
Author: User


One. General approach to handling Linux system failures

Read the error messages -> review the log files -> analyze and locate the problem -> resolve the issue.
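As a rough illustration of the "review the log files" step, the sketch below pulls the most recent error-like lines out of a log file. The function name recent_errors, the search pattern, and the example path /var/log/messages are assumptions made for illustration; log locations and error wording vary by distribution and service.

```shell
#!/bin/sh
# recent_errors FILE [COUNT]: hypothetical helper, not a standard command.
# Prints the last COUNT (default 20) lines of FILE that mention an error,
# a failure, or a panic, ignoring case.
recent_errors() {
    file=$1
    count=${2:-20}
    grep -iE 'error|fail|panic' "$file" | tail -n "$count"
}
```

For example, recent_errors /var/log/messages 50 would show the last 50 matching lines of the system log on a distribution that writes to that file.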


Two. Causes of and solutions for a Linux system that fails to start

A system can fail to start for many reasons. The most common situations are:

1 The file system is corrupted, often because a power outage or improper shutdown left the file system structure inconsistent. The fix is to force a repair with the fsck command: enter single-user mode or the interactive maintenance shell, follow the prompts to enter repair mode, unmount the affected partition, and then run fsck on it. Data that cannot be restored to its original location is stored under lost+found.

umount /dev/sda3

fsck.ext4 -y /dev/sda3

2 The system configuration file /etc/fstab is incorrect or missing, so the system cannot start. If the boot process stops at "Starting system logger", the /etc/fstab file must be recovered: boot into Linux rescue mode, obtain the mount point and partition information, and reconstruct the /etc/fstab file.

3 A missing kernel file, a failed kernel upgrade, a boot loader error, a hardware failure, and similar problems can also prevent the system from starting.
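After /etc/fstab has been reconstructed as in item 2, a quick sanity check before rebooting can catch incomplete entries. The sketch below is only an illustration: check_fstab is a hypothetical helper name, and it tests nothing more than that each entry has at least the four leading fields (device, mount point, type, options). It does not verify that the devices or mount points exist.

```shell
#!/bin/sh
# check_fstab FILE: hypothetical helper, not a standard command.
# Prints every entry that has fewer than four fields and returns a
# non-zero status if any such entry is found.
check_fstab() {
    awk '
        /^[ \t]*(#|$)/ { next }    # skip comments and blank lines
        NF < 4 { printf "line %d: %s\n", NR, $0; bad = 1 }
        END { exit bad }
    ' "$1"
}
```

In rescue mode this could be run as check_fstab /etc/fstab against the rebuilt file; no output and a zero exit status mean every entry has the minimum number of fields.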


Three. Troubleshooting workflow for Linux network faults

1 Check whether access is being blocked: iptables, SELinux

2 Check whether the service is running normally; use telnet or netstat to verify that the service port is open

3 Check whether the local network is working: ping the host's own IP, a host on the same network segment, and the gateway

4 Check that DNS resolution works: /etc/hosts and /etc/resolv.conf

5 Check the network card's IP settings, and use route to check that the routing is correct

6 Check the network hardware: network card, router, hub, network cable, switch (lsmod, ifconfig, ip)
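Steps 2 to 5 of the workflow above can be sketched as a small checklist script. This is an illustration under assumptions, not a complete diagnostic: check and net_checks are hypothetical helper names, the host and port are supplied by the caller, and the firewall and hardware steps are left out because they need root access or physical inspection.

```shell
#!/bin/sh
# check LABEL COMMAND...: run COMMAND quietly and report OK or FAIL.
check() {
    label=$1
    shift
    if "$@" >/dev/null 2>&1; then
        echo "OK   $label"
    else
        echo "FAIL $label"
        return 1
    fi
}

# net_checks HOST PORT: hypothetical wrapper that walks steps 2 to 5.
net_checks() {
    host=$1
    port=$2
    # Address of the default gateway, empty if no default route is set.
    gateway=$(ip route | awk '/^default/ {print $3; exit}')
    check "port $port is listening" sh -c "netstat -lnt | grep -q ':$port '"
    check "loopback answers ping" ping -c 1 -W 2 127.0.0.1
    check "gateway $gateway answers ping" ping -c 1 -W 2 "$gateway"
    check "name $host resolves" getent hosts "$host"
    check "default route exists" test -n "$gateway"
}
```

Calling net_checks www.example.com 80 prints one OK or FAIL line per step, which shows at a glance where in the workflow the fault lies.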


Four. Resolving "Read-only file system" errors

Possible causes:

Website program issues

Disk issues

First troubleshoot the website program: read its error messages, the service logs, and the system logs to locate the problem. A "Read-only file system" error is in most cases a disk problem. When it appears, the corresponding directory on the disk cannot be written to, and the file system must be repaired with the fsck command.

First check whether any process is using the disk with fuser -m /dev/sda1; if so, stop the corresponding program

Then unmount the file system: umount /www/data

Then repair the file system: fsck -v -a /dev/sda1

Finally, mount it again: mount /dev/sda1 /www/data
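Before starting the repair, it can help to confirm which file systems are actually mounted read-only. The sketch below reads the mount table and prints the mount points whose options include ro. The helper name list_ro_mounts and the optional file argument are assumptions for illustration; by default it reads /proc/mounts.

```shell
#!/bin/sh
# list_ro_mounts [FILE]: hypothetical helper, not a standard command.
# Prints the mount point of every entry whose option list contains the
# exact option "ro". Options such as errors=remount-ro do not match.
list_ro_mounts() {
    awk '{
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] == "ro") { print $2; break }
    }' "${1:-/proc/mounts}"
}
```

If /www/data appears in the output, the file system behind it has been mounted or remounted read-only and is a candidate for the fsck sequence above.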


Five. Resolving "Argument list too long" errors

This error may occur when deleting a large number of files in a directory, because Linux limits the total size of the arguments passed to a command. The limit can be viewed with getconf ARG_MAX.

Recompiling the kernel with a larger limit can resolve the problem permanently, but modifying the kernel is risky, so the safer methods below are preferred.

Since the files cannot all be removed with one command, delete them in batches, with find, or in a loop. The following commands can be used for the cleanup:

rm -rf [a-n]*

rm -rf [o-z]*

find /www/data -type f -print -exec rm -f {} \;
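A variant of the find approach is sketched below. It uses the "-exec ... {} +" form, which lets find hand the file names to rm in batches that stay under the ARG_MAX limit instead of starting one rm per file. The helper name purge_files is an assumption for illustration, and the function deletes every regular file under the given directory, so it should be tried on a scratch directory first.

```shell
#!/bin/sh
# purge_files DIR: hypothetical helper, not a standard command.
# Removes every regular file under DIR and leaves the directories in place.
# find builds each rm command line itself, so the shell never has to
# expand one huge glob.
purge_files() {
    dir=$1
    [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 1; }
    find "$dir" -type f -exec rm -f {} +
}
```

For the directory used in this section the call would be purge_files /www/data.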


Six. Inode exhaustion

When the inodes are exhausted, the disk still has free space, but writes fail with a "No space left on device" error

Use the df -i command to view the inode usage of all partitions

The inode number of a file can be viewed with ls -i nginx.log, and more information is available with stat nginx.log

When the inodes are exhausted, clean up by deleting useless files, especially large numbers of small, fragmented files.
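To decide where to clean up, it helps to know which directories hold the most files, since each file consumes one inode. The sketch below counts regular files per directory and lists the largest counts first. The helper name count_files_per_dir is an assumption for illustration, and the pipeline assumes file names do not contain newline characters.

```shell
#!/bin/sh
# count_files_per_dir ROOT: hypothetical helper, not a standard command.
# Prints "COUNT DIRECTORY" for every directory under ROOT that contains
# regular files, largest count first. -xdev keeps find on one file system.
count_files_per_dir() {
    find "$1" -xdev -type f |
        sed 's|/[^/]*$||' |    # strip the file name, keep its directory
        sort | uniq -c | sort -rn
}
```

Running count_files_per_dir /www/data | head shows the ten directories that use the most inodes under that path.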


Seven. Disk space not released after files are deleted

A file is stored in two parts: the data itself and the pointer (metadata) that refers to it. Deleting a file removes its name from the directory, but if a process still has the file open, the kernel keeps the data until that process closes the file, so the space is not released.

Use lsof | grep deleted to view deleted files that are still held open. Restarting the process that holds the file releases the space. To avoid the problem in the first place, empty a log file that is in use instead of deleting it, for example with echo "" > /tmp/nginx.log, and the space will be released.
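Where lsof is not installed, the same information can be read from /proc, because the kernel marks the target of such a descriptor with a " (deleted)" suffix. The sketch below is an illustration under assumptions: list_deleted_open is a hypothetical helper name, the optional argument exists only so the function can be pointed at a different proc root, and it must run as root to see the descriptors of other users' processes.

```shell
#!/bin/sh
# list_deleted_open [PROC_ROOT]: hypothetical helper, not a standard command.
# Prints "DESCRIPTOR -> TARGET" for every open file descriptor whose target
# has been deleted. PROC_ROOT defaults to /proc.
list_deleted_open() {
    root=${1:-/proc}
    for fd in "$root"/[0-9]*/fd/*; do
        target=$(readlink "$fd" 2>/dev/null) || continue
        case $target in
            *' (deleted)') echo "$fd -> $target" ;;
        esac
    done
}
```

Each line of output names a descriptor such as /proc/PID/fd/N. Truncating through that path with : > /proc/PID/fd/N frees the space without restarting the process, at the cost of discarding the file's contents.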


Eight. "Too many open files" error

A service fails with the error "Too many open files"

View the current file descriptor limit with ulimit -n; 65535 is the maximum value used here

Check the value configured for the ordinary user that runs the service: cat /etc/security/limits.conf | grep www

If the ordinary user's value is not 65535, add the following limits for that user

www soft nofile 65535

www hard nofile 65535

If the ordinary user's value already shows 65535 and the error still occurs, check whether the limit was added after the application was last started. If the application was started before the limit was added, it is still running with the old limit, and restarting the application applies the new value.
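Whether a running application has picked up the new limit can be checked directly, because the kernel exposes each process's effective limits in /proc/PID/limits. The sketch below prints the soft and hard "Max open files" values for one process. The helper name proc_nofile_limit and its optional second argument are assumptions for illustration.

```shell
#!/bin/sh
# proc_nofile_limit PID [PROC_ROOT]: hypothetical helper, not a standard
# command. Prints the soft and hard open-file limits of process PID as
# "SOFT HARD". PROC_ROOT defaults to /proc.
proc_nofile_limit() {
    awk '/^Max open files/ { print $4, $5; exit }' "${2:-/proc}/$1/limits"
}
```

If proc_nofile_limit PID still prints the old values for the service's process while limits.conf already contains 65535, the process predates the change and needs a restart.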




