Solution to "crash" in Linux

Source: Internet
Author: User

If a problem can be reproduced, it is 80% solved. For the operating system kernel, if there is a way to reproduce the problem, it is fair to say it is 99% solved. The common case, however, is that the system runs normally for a while and then hangs. If the problem cannot be reproduced, all you can do is analyze whatever is left at the scene of the crash.

If the system did not die completely, for example if disk interrupts and the file system are still working, some log information may have been saved to files, but I have never had such luck. If the keyboard interrupt still responds (press Num Lock and see whether the keyboard LED toggles), you are lucky enough to use the magic SysRq key: press Alt-SysRq-t to get the stack information of every process, Alt-SysRq-m to get memory allocation information, and Alt-SysRq-w to get the current register information. (The key bindings depend on the kernel: on current mainline kernels, Alt-SysRq-p dumps the current registers and Alt-SysRq-w lists blocked tasks.)
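If a shell is still reachable, for example over a remote session that has not frozen yet, the same dumps can be requested without the keyboard by writing the command letter to /proc/sysrq-trigger as root. The sketch below wraps that in a small helper. The function name sysrq_send and its optional trigger-file argument are illustrative only and are not part of any standard tool; the output of each command goes to the kernel log and the console.

```shell
#!/bin/sh
# Hypothetical helper: send one or more SysRq command letters through
# the trigger file instead of the keyboard. Needs root on a real system.
# Usage: sysrq_send "t m w" [trigger-file]
sysrq_send() {
    keys=$1
    trigger=${2:-/proc/sysrq-trigger}
    for key in $keys; do
        # Each write of a single letter triggers the matching SysRq action.
        echo "$key" > "$trigger" || return 1
    done
}

# Example (as root): task stacks, memory information, then the "w" dump
# sysrq_send "t m w"
```

The optional second argument exists only so the helper can be tried against an ordinary file before pointing it at the real trigger.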

See linux/Documentation/sysrq.txt for details. In addition, it is best to disable automatic screen blanking on the console, so that at least some information stays visible on the screen when the system dies. The settings are as follows:

# echo 1 > /proc/sys/kernel/sysrq

# setterm -blank 0
It is recommended to add these two settings to a system startup script (for example, /etc/rc.d/rc.local) so that they are applied on every boot.
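As a sketch of what such a startup fragment might look like, the snippet below enables SysRq only when the control file is actually writable and reports a failure otherwise. The function name enable_sysrq and its optional path argument are assumptions made for illustration, so that the logic can be tried against an ordinary file. The exact spelling of the setterm option may differ between util-linux versions.

```shell
#!/bin/sh
# Sketch of an rc.local fragment (function name and argument are illustrative).

# Enable the SysRq key by writing 1 to the control file, if it is writable.
# Usage: enable_sysrq [control-file]
enable_sysrq() {
    ctl=${1:-/proc/sys/kernel/sysrq}
    if [ -w "$ctl" ]; then
        echo 1 > "$ctl"
    else
        echo "cannot write $ctl (not root, or SysRq not available?)" >&2
        return 1
    fi
}

# In rc.local one would then call:
#   enable_sysrq
#   setterm -blank 0    # turn off console screen blanking
```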
If, unfortunately, the keyboard has quietly died as well (which is, sadly, the more common case), you do not have to sit and wait: you can use a serial console to send system information to another machine and locate the problem by analyzing that information. The setup is as follows:


Preparation:
1. A server to be monitored and a PC to do the monitoring.
2. A serial cable connecting the two machines directly.
Steps:
1. Add a new GRUB entry on the server with the kernel parameters "console=ttyS0 console=tty1", for example:
kernel /boot/vmlinuz-2.4.21-9.30AXsmp ro root=LABEL=/1 console=ttyS0 console=tty1
2. Edit /etc/sysconfig/syslog on the server and add the klogd option "-c 7" so that more kernel messages are output. For example:
KLOGD_OPTIONS="-x -c 7"
3. Restart the server
4. Connect the two machines with the serial cable and test:
1) Run "cat /dev/ttyS0" on the PC and "echo hi > /dev/ttyS0" on the server, and check that "hi" appears on the PC.
2) Run "cat /dev/ttyS0" on the PC and "echo w > /proc/sysrq-trigger" on the server, and check that kernel information appears on the PC.
3) Run "cat /dev/ttyS0" on the PC and "modprobe loop" on the server, and check that the corresponding kernel messages appear on the PC.
5. If the tests pass, run on the PC: cat /dev/ttyS0 | tee /tmp/result
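Before relying on the capture, it may be worth confirming on the server that the running kernel was really booted with the console parameters from step 1. A minimal sketch, assuming the standard /proc/cmdline file; the function name has_console and its optional file argument are illustrative:

```shell
#!/bin/sh
# Hypothetical check: succeed if the kernel command line names the given
# console device (e.g. ttyS0), with or without options such as ",9600n8".
# Usage: has_console DEVICE [cmdline-file]
has_console() {
    dev=$1
    file=${2:-/proc/cmdline}
    # Split the command line into one parameter per line, then match
    # "console=DEV" exactly or followed by ",options".
    tr ' ' '\n' < "$file" | grep -Eq "^console=${dev}(,.*)?\$"
}

# Example:
# has_console ttyS0 && echo "serial console is configured"
```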
Alternatively, you can use HyperTerminal on a Windows PC to capture the serial port output.
That's it.
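Because a hang may happen hours or days after the capture starts, it can help to know when each line arrived. The following is a sketch and not part of the original procedure: a small filter (the name stamp_lines is made up here) that prefixes every line read from standard input with the local time, so that it can sit between cat and tee on the monitoring PC.

```shell
#!/bin/sh
# Hypothetical filter: prefix every input line with the current local time.
stamp_lines() {
    # The "|| [ -n ... ]" part keeps a final line that lacks a newline.
    while IFS= read -r line || [ -n "$line" ]; do
        printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$line"
    done
}

# Example on the monitoring PC:
# cat /dev/ttyS0 | stamp_lines | tee /tmp/result
```

The timestamp is the time the line was read on the PC, not the time the server produced it, which is usually close enough to line up with other logs.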
In addition, some kernels support debugging facilities such as LKCD and netdump, which are also worth trying.
The rest depends on experience and luck. In general, a Linux system crashes for one of the following reasons:
System hardware problems (SCSI card, motherboard, RAID card, NIC, hard disk ...)
Peripheral hardware problems (terminal switch, network ...)
Software problems:
  Driver bugs (try to find an updated driver)
  Kernel bugs (search LKML, or try another kernel)
  System settings

Finally, use Google. Sometimes you can simply type in a query such as "Linux system crashes, what should I do?" or "PE6650 often crashes" and see whether anyone has run into the same problem as you. Even if nothing turns up, that is still useful information for the analysis: at the very least, it suggests that something about your system differs from everyone else's.

Investigating Linux system crashes is both a science and an art. It draws on a great deal of hardware and software knowledge and experience, and it is a process of continuous learning.
