Solving Linux system "crashes"

Source: Internet
Author: User
If a problem can be reproduced, it is 80% solved. For an operating system kernel, a problem with a known way to reproduce it can be considered 99% solved. More often, though, the system runs normally for a while and then crashes. If the problem is hard to reproduce, all you can do is analyze whatever the crash left behind.


If the system has not died completely, for example disk interrupts still work and the file system is intact, some log information may make it into the log files, but I have never been that lucky. If the keyboard interrupt still responds (press Num Lock and watch whether the keyboard LED toggles), you are lucky enough to use the magic SysRq key: press Alt-SysRq-T to dump the stacks of all processes, Alt-SysRq-M to dump memory allocation information, and Alt-SysRq-W to dump the current registers. (On current mainline kernels the register dump is Alt-SysRq-P, and W lists blocked tasks instead.)
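When a shell is still reachable, the same dumps can be requested without the keyboard by writing the key letters to /proc/sysrq-trigger. The following is a minimal sketch, not a definitive tool: the function name `sysrq_dump` and the `SYSRQ_TRIGGER` override are hypothetical names introduced here so that the function can be dry-run against an ordinary file.

```shell
#!/bin/sh
# Sketch: request SysRq dumps from a shell instead of the keyboard.
# SYSRQ_TRIGGER is a hypothetical override for dry runs; on a real system
# the file is /proc/sysrq-trigger and writing to it requires root.
SYSRQ_TRIGGER="${SYSRQ_TRIGGER:-/proc/sysrq-trigger}"

sysrq_dump() {
    for key in "$@"; do
        # Write one letter per write call; stop at the first failure.
        printf '%s' "$key" > "$SYSRQ_TRIGGER" || return 1
    done
}

# Usage (as root); the output lands in the kernel log:
#   sysrq_dump t m w
#   dmesg | tail -n 200
```

The dumps go to the kernel log, so they are only useful if the log can still be read or is being forwarded somewhere else.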


For details, see linux/Documentation/sysrq.txt in the kernel source. It is also best to turn off the terminal's automatic screen blanking so that at least some information remains visible on screen when the system dies. To set both up:


# echo 1 > /proc/sys/kernel/sysrq

# setterm -blank 0
It is best to add these two settings to a system startup script (such as /etc/rc.d/rc.local) so that they are applied on every boot.
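To confirm after a reboot that the SysRq setting actually took effect, the control file can be read back. This is a small sketch; `describe_sysrq` is a hypothetical helper name, and the interpretation of values above 1 as a bitmask of allowed functions applies to current kernels (older kernels may treat any non-zero value as simply "enabled").

```shell
#!/bin/sh
# Sketch: translate the value of /proc/sys/kernel/sysrq into words.
# On current kernels: 0 = disabled, 1 = everything enabled,
# larger values = bitmask of the SysRq functions that are allowed.
describe_sysrq() {
    case "$1" in
        0) echo "disabled" ;;
        1) echo "all functions enabled" ;;
        *) echo "restricted (bitmask $1)" ;;
    esac
}

# Report the live setting if the control file is present.
if [ -r /proc/sys/kernel/sysrq ]; then
    describe_sysrq "$(cat /proc/sys/kernel/sysrq)"
else
    echo "SysRq control file not found"
fi
```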
If, unfortunately, the keyboard is dead as well (and, more unfortunately, this is very common), all is not lost: you can use a serial console to send the system's messages to another machine and locate the problem by analyzing them. Set it up as follows:

Preparation

1. A server to be monitored, and a PC to do the monitoring.
2. A serial cable that connects the two machines directly (typically a null-modem cable).
Configuration
1. On the server, add a new GRUB entry whose kernel line carries the kernel parameters "console=ttyS0 console=tty1", for example:
kernel /boot/vmlinuz-2.4.21-9.30AXsmp ro root=LABEL=/1 console=ttyS0 console=tty1
2. On the server, edit /etc/sysconfig/syslog and add the klogd option "-c 7" so that more kernel messages are output, for example:
KLOGD_OPTIONS="-x -c 7"
3. Reboot the server.
4. Connect the two machines with the serial cable and test:
1) Run "cat /dev/ttyS0" on the PC and "echo hi > /dev/ttyS0" on the server, and check that "hi" appears on the PC.
2) Run "cat /dev/ttyS0" on the PC and "echo w > /proc/sysrq-trigger" on the server, and check that the corresponding kernel messages appear on the PC.
3) Run "cat /dev/ttyS0" on the PC and "modprobe loop" on the server, and check that the corresponding kernel messages appear on the PC.
5. If the tests pass, run on the PC: cat /dev/ttyS0 | tee /tmp/result
Alternatively, you can use Windows HyperTerminal to capture the serial output.
That's it.
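Because a crash may happen hours after the capture starts, it can help to record when each line arrived on the monitoring PC. The sketch below is one possible way to do that; `stamp_lines` is a hypothetical helper name, not part of any standard tool, and it simply prefixes each line from standard input with the PC's local time.

```shell
#!/bin/sh
# Sketch: prefix every line read from standard input with the local
# date and time at which it was read.
stamp_lines() {
    while IFS= read -r line; do
        printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$line"
    done
}

# Usage on the monitoring PC:
#   cat /dev/ttyS0 | stamp_lines | tee /tmp/result
```

The timestamps come from the monitoring PC's clock, not the server's, so they are only as accurate as that clock.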
In addition, some kernels support debugging facilities such as LKCD and netdump, which are also worth trying.
Beyond that, you have to rely on experience and luck. The common causes of Linux system crashes are:
System hardware problems (SCSI card, motherboard, RAID card, NIC, hard drive ...)
Peripheral hardware problems (terminal switcher, network ...)
Software problems
  Driver bugs (try to find an updated driver)
  Kernel bugs (search LKML, or try a different kernel)
System configuration
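As a rough first pass over a captured console log such as /tmp/result, one can search for a few strings that the kernel commonly prints when it fails. This is only a sketch under assumptions: `scan_crash_log` is a hypothetical helper name, and the pattern list is a small illustrative selection, not a complete catalogue of failure messages.

```shell
#!/bin/sh
# Sketch: print the numbered lines of a log file that contain common
# kernel failure markers (case-insensitive match).
scan_crash_log() {
    grep -n -i -E 'kernel panic|oops|bug:|unable to handle kernel|machine check' "$1"
}

# Usage:
#   scan_crash_log /tmp/result
```

A match only tells you where to start reading; the lines around it (call trace, module list, the driver named in the message) usually say more about which of the causes above is involved.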


Finally, Google it. Sometimes you only need to type "Linux system crash" or "PE6650 often crashes" to see whether anyone has run into the same problem as you. Even if no one has, that is information too: it helps your analysis by at least suggesting that your system may differ from everyone else's.


Investigating Linux system crashes is both a science and an art. It draws on a great deal of hardware and software knowledge and experience, and it is a continuous learning process.
