Red Hat Linux Fault Location Techniques Explained, with Examples (3)

Source: Internet
Author: User

Online fault location applies when the operating system is still accessible after a fault occurs. The person handling the fault can log in through the console, SSH, or similar means, then run commands or test programs in the shell to observe, analyze, and test the faulty environment and locate the cause of the fault.

5. Example: locating a kernel fault with the kdump tool

A) Deploying kdump

The steps for deploying kdump to collect fault information are as follows:

(1) Set the relevant kernel boot parameters.

Add the following to the kernel line in /boot/grub/menu.lst:

    crashkernel=[email protected] nmi_watchdog=1

The crashkernel parameter reserves memory for the kdump kernel; its value takes the form size@offset. nmi_watchdog=1 enables the NMI watchdog. Because we cannot know in advance whether a fault will leave interrupts disabled, the NMI watchdog is needed to make sure a panic is still triggered in that case. Reboot the system for the settings to take effect.
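
As a quick sanity check after the reboot, the sketch below tests whether both parameters reached the running kernel by looking at /proc/cmdline. The helper name has_boot_param is hypothetical and not part of any kdump tooling.

```shell
#!/bin/sh
# has_boot_param CMDLINE NAME
# Succeeds if the kernel command line CMDLINE contains NAME, either as a
# bare flag ("NAME") or with a value ("NAME=...").
has_boot_param() {
    case " $1 " in
        *" $2 "*|*" $2="*) return 0 ;;
    esac
    return 1
}

# Report on the two parameters used above, if /proc/cmdline is available.
if [ -r /proc/cmdline ]; then
    for param in crashkernel nmi_watchdog; do
        if has_boot_param "$(cat /proc/cmdline)" "$param"; then
            echo "$param: present"
        else
            echo "$param: missing"
        fi
    done
fi
```

If crashkernel is reported as missing, the menu.lst entry that was edited is probably not the one the system booted from.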

(2) Set the relevant sysctl kernel parameters.

Add the following line at the end of /etc/sysctl.conf:

    kernel.softlockup_panic = 1

This setting makes the kernel call panic when a soft lockup occurs, which in turn triggers kdump. Run #>sysctl -p to make the setting take effect.
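
To confirm the setting, one option is to compare the value configured in the file with the value the running kernel reports. The sketch below assumes the usual "key = value" layout of sysctl.conf; the function name sysctl_conf_value is hypothetical.

```shell
#!/bin/sh
# sysctl_conf_value FILE KEY
# Print the last value assigned to KEY in a sysctl.conf-style FILE.
# Prints nothing if KEY is not set in the file.
sysctl_conf_value() {
    awk -F= -v key="$2" '
        /^[ \t]*[#;]/ { next }                  # skip comment lines
        {
            k = $1; v = $2
            gsub(/[ \t]/, "", k)                # strip blanks around the key
            gsub(/^[ \t]+|[ \t]+$/, "", v)      # trim the value
            if (k == key) last = v              # later lines override earlier ones
        }
        END { if (last != "") print last }
    ' "$1"
}

# Compare the configured value with the live one where both are readable.
if [ -r /etc/sysctl.conf ]; then
    echo "configured: $(sysctl_conf_value /etc/sysctl.conf kernel.softlockup_panic)"
fi
if [ -r /proc/sys/kernel/softlockup_panic ]; then
    echo "running:    $(cat /proc/sys/kernel/softlockup_panic)"
fi
```

If the two values differ, sysctl -p has probably not been run since the file was edited.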

(3) Configure /etc/kdump.conf

Add the following lines to /etc/kdump.conf:

    ext3 /dev/sdb1
    core_collector makedumpfile -c --message-level 7 -d 31 -i /mnt/vmcoreinfo
    path /var/crash
    default reboot

Here /dev/sdb1 is the file system that holds the dump file, and the dump file is written under /var/crash, so the /var/crash directory must be created on the /dev/sdb1 partition in advance. "-d 31" sets the filtering level for the dump contents; this matters when the dump partition cannot hold the full contents of memory, or when the user does not want dumping to interrupt service for too long. The vmcoreinfo file is placed in the / directory of the /dev/sdb1 partition and must be generated with the following command:

#>makedumpfile -g //vmcoreinfo -x /usr/lib/debug/lib/modules/2.6.18-128.el5.x86_64/vmlinux

The "vmlinux" file is provided by the kernel-debuginfo package. Before running makedumpfile, install the kernel-debuginfo and kernel-debuginfo-common packages that match the running kernel; both can be downloaded from http://ftp.redhat.com. "default reboot" tells kdump to restart the system after it has collected the dump information.
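
The value given to -d is a bitmask, and each bit excludes one class of pages from the dump. The sketch below decodes a level into the page types it filters out, following the bit assignments documented in the makedumpfile man page; the function name describe_dump_level is hypothetical.

```shell
#!/bin/sh
# describe_dump_level LEVEL
# Print one line per page type that makedumpfile -d LEVEL leaves out.
describe_dump_level() {
    level=$1
    if [ $((level & 1)) -ne 0 ];  then echo "zero pages"; fi
    if [ $((level & 2)) -ne 0 ];  then echo "cache pages"; fi
    if [ $((level & 4)) -ne 0 ];  then echo "cache pages (private)"; fi
    if [ $((level & 8)) -ne 0 ];  then echo "user process data pages"; fi
    if [ $((level & 16)) -ne 0 ]; then echo "free pages"; fi
}

# Level 31 sets all five bits, so every filterable page type is excluded.
describe_dump_level 31
```

A lower level keeps more memory in the dump at the cost of a larger file and a longer dump time.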

(4) Activate kdump

Run #>service kdump start. When it completes successfully, an initrd-2.6.18-128.el5.x86_64kdump.img file has been generated in the /boot/ directory. This file is the initrd of the kernel that kdump loads, and the work of collecting the dump information is carried out in that initrd's boot environment. The /etc/init.d/kdump script shows that it calls the mkdumprd command to create the initrd used for dumping.
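
One way to check that the capture kernel was really loaded is to read the kernel's kexec_crash_loaded flag. The sketch below assumes the running kernel exposes /sys/kernel/kexec_crash_loaded, which may not be the case on every release; the function name crash_kernel_loaded is hypothetical.

```shell
#!/bin/sh
# crash_kernel_loaded [FLAG_FILE]
# Succeeds if FLAG_FILE (default: /sys/kernel/kexec_crash_loaded) is
# readable and contains "1", meaning a crash kernel is loaded.
crash_kernel_loaded() {
    flag_file=${1:-/sys/kernel/kexec_crash_loaded}
    [ -r "$flag_file" ] && [ "$(cat "$flag_file")" = "1" ]
}

if crash_kernel_loaded; then
    echo "kdump kernel is loaded"
else
    echo "kdump kernel is NOT loaded"
fi
```

If the kernel is reported as not loaded, #>service kdump status and the crashkernel boot parameter are the first things to re-check.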

1. Testing the kdump deployment

To test whether the kdump deployment works, I wrote the kernel module below. Loading it with insmod creates a kernel thread that, after about 10 seconds, takes up 100% of a CPU; after about 20 seconds, kdump is triggered. After the system restarts, check the /var/crash directory on the /oracle partition to confirm that a vmcore file was generated.

zqfthread.c:

    #include <linux/module.h>
    #include <linux/kernel.h>
    #include <linux/init.h>
    #include <linux/kthread.h>
    #include <linux/delay.h>
    #include <linux/err.h>

    MODULE_AUTHOR("[email protected]");
    MODULE_DESCRIPTION("A module to test ....");
    MODULE_LICENSE("GPL");

    static struct task_struct *zqf_thread;

    static int zqfd_thread(void *data)
    {
        int i = 0;

        while (!kthread_should_stop()) {
            i++;
            /* For the first 10 iterations, sleep one second and report. */
            if (i < 10) {
                msleep_interruptible(1000);
                printk("%d seconds\n", i);
            }
            /*
             * After that, keep running in the kernel without sleeping.
             * Reset i before it can overflow; 1000 is an assumed
             * threshold (any value above 11 behaves the same way).
             */
            if (i == 1000)
                i = 11;
        }
        return 0;
    }

    static int __init zqfinit(void)
    {
        struct task_struct *p;

        /* kthread_create() returns an ERR_PTR() value on failure, never NULL. */
        p = kthread_create(zqfd_thread, NULL, "%s", "zqfd");
        if (IS_ERR(p))
            return PTR_ERR(p);

        zqf_thread = p;
        wake_up_process(zqf_thread); /* actually start it up */
        return 0;
    }

    static void __exit zqffini(void)
    {
        kthread_stop(zqf_thread);
    }

    module_init(zqfinit);
    module_exit(zqffini);

Makefile:

    obj-m += zqfthread.o

Build:

    #>make -C /usr/src/kernels/2.6.32-71.el6.x86_64/ M=`pwd` modules
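
After the reboot, the check for a generated vmcore can be scripted. The sketch below searches a crash directory and its subdirectories for files named vmcore; it assumes /var/crash as the default location, and the function name list_vmcores is hypothetical.

```shell
#!/bin/sh
# list_vmcores [CRASH_DIR]
# Print the path of every file named "vmcore" under CRASH_DIR
# (default: /var/crash). Prints nothing if the directory is missing.
list_vmcores() {
    crash_dir=${1:-/var/crash}
    [ -d "$crash_dir" ] || return 0
    find "$crash_dir" -type f -name vmcore | sort
}

found=$(list_vmcores)
if [ -n "$found" ]; then
    echo "vmcore files found:"
    echo "$found"
else
    echo "no vmcore found under /var/crash"
fi
```

If the dump partition is mounted somewhere other than /, pass the full path of the crash directory on that mount as the argument.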

2. Analyzing the vmcore file with the crash tool

The command line for opening a vmcore with crash is shown below. Once the vmcore is open, use mainly the dmesg and bt commands to print the call trace of the failing execution path, use dis to disassemble the code, identify the C source location that the call trace corresponds to, and then analyze the logic.

    #>crash /usr/lib/debug/lib/modules/2.6.18-128.el5.x86_64/vmlinux /boot/System.map-2.6.18-128.el5.x86_64 ./vmcore
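
The same commands can be prepared ahead of time and fed to crash in one go. The sketch below only generates the command list; the function name crash_commands is hypothetical, and zqfd_thread is used as the symbol simply because it is the thread function of the test module above. Inside crash, log prints the kernel message buffer (dmesg is an alias for it), and dis -l adds source line numbers when debuginfo is available.

```shell
#!/bin/sh
# crash_commands SYMBOL
# Print a list of crash commands: kernel log, backtrace of the panicking
# task, disassembly of SYMBOL with source line numbers, then exit.
crash_commands() {
    symbol=$1
    printf '%s\n' "log" "bt" "dis -l $symbol" "exit"
}

crash_commands zqfd_thread
```

Saved to a file, for example with crash_commands zqfd_thread > cmds.txt, the list can be passed to crash through its -i option ahead of the vmlinux, System.map, and vmcore arguments. Resolving a symbol that lives in a module may first require loading that module's symbols with crash's mod command.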
