Add Kdump Manually

Source: Internet
Author: User
Tags touch command

Background:
When the kernel on an embedded Linux device hangs or crashes, the device cannot restart automatically and has to be restarted by hand. Without a serial port there is also no way to capture the stack trace of the crash, so another mechanism is needed to record the crash information for later debugging. With Kdump added to the device, the stack information from a kernel crash can be recorded for later analysis.

Operation Steps:
1. Add the programs and configuration files Kdump requires
Method: As far as is currently known, the following files are required: /sbin/kdump, /sbin/kexec, /bin/kdumpctl, /etc/kdump.conf and /etc/sysconfig/kdump. Edit the two configuration files under /etc. In /etc/sysconfig/kdump set KDUMP_BOOTDIR="/mnt", KDUMP_COMMANDLINE_APPEND="1 irqpoll maxcpus=1" and KDUMP_IMG="kernel.img"; change the content of kdump.conf to path /mnt/workdir/log. The paths to kdump and kexec inside kdumpctl also need to be modified. Place the files in the /mnt directory of the device and restart. After rebooting, run kexec -l /mnt/kernel.img --initrd=/mnt/rootfs.img --command-line="`cat /proc/cmdline` 1 irqpoll maxcpus=1"; kexec -e to check whether the addition succeeded.
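The two configuration files from step 1 can be sketched as a small shell helper. The settings are the ones listed above, with the variable names in upper case as in the stock sysconfig file; the function name write_kdump_config and its staging-directory argument are illustrative assumptions, so the files can be generated off-device and copied over afterwards.

```shell
# Sketch: generate the two kdump configuration files under a staging root.
# write_kdump_config and its argument are hypothetical names, not part of kdump.
write_kdump_config() {
    root=${1:-.}                     # staging directory, e.g. ./staging
    mkdir -p "$root/etc/sysconfig" || return 1

    # /etc/sysconfig/kdump: where kdumpctl looks for the kernel and
    # which options it appends to the crash kernel's command line
    cat > "$root/etc/sysconfig/kdump" <<'EOF'
KDUMP_BOOTDIR="/mnt"
KDUMP_COMMANDLINE_APPEND="1 irqpoll maxcpus=1"
KDUMP_IMG="kernel.img"
EOF

    # /etc/kdump.conf: directory the crash log is written to
    cat > "$root/etc/kdump.conf" <<'EOF'
path /mnt/workdir/log
EOF
}
```

Calling write_kdump_config ./staging leaves ./staging/etc/sysconfig/kdump and ./staging/etc/kdump.conf ready to be copied into the device's /etc directory.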

2. Problem: All the tools and configuration files were added successfully, but running kdumpctl start still fails with BUG: unable to handle kernel paging request at ffff880001497000, or the process is simply reported as Killed without any stack information being printed.
Workaround: After testing, change the crashkernel parameter in grub.cfg from the crashkernel=X@Y form to crashkernel=128M, omitting the "@Y" part completely, so that the kernel chooses a start address automatically. This avoids problems such as an invalid start address. When testing kexec, it is best to add the -d option to print debug information.
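As a rough illustration of the crashkernel change, the filter below drops the "@offset" part of a crashkernel= parameter and leaves the rest of the line alone. The function name strip_crashkernel_offset is hypothetical, and the sketch keeps whatever size is already present rather than forcing 128M.

```shell
# Sketch: drop the "@offset" part of a crashkernel= parameter.
# Reads kernel command-line text (for example grub.cfg) on stdin and
# writes the result to stdout. The size before "@" is kept unchanged.
strip_crashkernel_offset() {
    sed -E 's/(crashkernel=[0-9]+[KMGkmg]?)@[0-9]+[KMGkmg]?/\1/g'
}
```

A possible use is strip_crashkernel_offset < grub.cfg > grub.cfg.new, followed by a manual review of the new file before it replaces the original.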

3. Problem: After following the procedure in step 2, once kexec -e has been executed the kernel prints the stack but then hangs instead of restarting.
Workaround: Analysis showed that the Kdump kernel could not restart because reset_devices had been added to the KDUMP_COMMANDLINE_APPEND field in the /etc/sysconfig/kdump configuration file. Delete reset_devices from the value of KDUMP_COMMANDLINE_APPEND.
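Removing the option by hand is simple enough, but for a scripted install the edit might look like the sketch below. It assumes the value is enclosed in double quotes and that the variable is spelled KDUMP_COMMANDLINE_APPEND; the function name remove_reset_devices is hypothetical.

```shell
# Sketch: delete the word reset_devices from KDUMP_COMMANDLINE_APPEND.
# Assumes the value is double-quoted; other lines are left untouched.
remove_reset_devices() {
    conf=$1                          # e.g. /etc/sysconfig/kdump
    tmp=$(mktemp) || return 1
    sed -E '/^KDUMP_COMMANDLINE_APPEND=/ {
        s/([" ])reset_devices([" ])/\1\2/
        s/  +/ /g
        s/=" /="/
        s/ "$/"/
    }' "$conf" > "$tmp" && cat "$tmp" > "$conf"
    status=$?
    rm -f "$tmp"
    return $status
}
```

The three trailing substitutions only tidy up the spaces left behind after the word is removed.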

4. Problem: When a kernel panic occurs the kernel restarts, but it only enters the newly unpacked rootfs; the system inittab is not processed, so the subsequent scripts never run.
Workaround: According to the material consulted, after the kernel is loaded Linux unpacks the initrd as the root file system (rootfs), then executes linuxrc or init in that root file system, which reads the inittab configuration file and then runs a series of initialization scripts. Since the symptom was that inittab was not processed, the investigation focused on why. Changing the value of KDUMP_COMMANDLINE_APPEND in /etc/sysconfig/kdump to "1 irqpoll maxcpus=1 init=/linuxrc" resolved the problem.
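Before packing a rootfs, it can help to confirm that the tree actually contains the program the kernel is told to run. The check below is only a sketch: the function name find_rootfs_init is hypothetical, and it looks for the two entry points mentioned above, /linuxrc and /init, in a staging directory.

```shell
# Sketch: report which init entry point a staged rootfs tree provides.
# Prints the first of /linuxrc and /init that exists and is executable.
# Note: an absolute symlink is resolved against the host, not the tree.
find_rootfs_init() {
    rootfs=$1                        # staging directory of the rootfs
    for candidate in /linuxrc /init; do
        if [ -x "$rootfs$candidate" ]; then
            echo "$candidate"
            return 0
        fi
    done
    echo "no init program found in $rootfs" >&2
    return 1
}
```

If this prints /linuxrc, the matching kernel option is init=/linuxrc.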

5. Problem: Using the original rootfs makes the device start many unnecessary programs and services, so a dedicated rootfs should be built for Kdump.
Workaround: Download a recent version of BusyBox, compile it and install it; this produces the _install directory, which forms part of the file system. Then create the directories the system needs (dev, etc, lib, lib64, var, proc, tmp, mnt, sys) and the workdir and mnt directories the device needs. Static compilation of BusyBox failed with errors, so it was built dynamically; use ldd busybox to list the dynamic libraries BusyBox requires and copy them into the newly created /lib64 directory. Because linuxrc has to initialize the system, three files must be added under /etc: fstab, inittab and init.d/rcS. fstab and inittab depend only on the system, so the corresponding files from the original rootfs can be reused. rcS is the system initialization script; add the commands you need to it. The relevant device nodes also have to be added in the dev directory: create console, null and so on with the mknod command, and create the shm and pts directories as mount points. The vmcore produced by Kdump is too large to store on the CF card, so only the kernel crash log is extracted, using the command vmcore-dmesg /proc/vmcore. rcS therefore simply mounts the file systems and the CF card, extracts the kernel crash information, and restarts the system. Because the new rootfs used during Kdump is independent of the original rootfs, the program that extracts the kernel crash log must be added to the appropriate directory of the new rootfs in advance. Once the whole rootfs tree is in place, run find . | cpio -H newc --quiet -o | gzip -9 > ~/rootfs.img in the _install directory to pack and compress these directories into rootfs.img in the current user's home directory.
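The rootfs build described above can be sketched as three small helpers. The function names are hypothetical, the directory list follows the text, and copy_busybox_libs assumes BusyBox was installed to bin/busybox inside the tree. Creating the device nodes with mknod needs root and is left as a comment.

```shell
# Sketch: build the kdump rootfs skeleton around BusyBox's _install tree
# and pack it as a gzip-compressed newc cpio archive.
make_rootfs_skeleton() {
    tree=$1                          # e.g. the BusyBox _install directory
    for d in dev dev/shm dev/pts etc etc/init.d lib lib64 var proc tmp mnt sys workdir; do
        mkdir -p "$tree/$d" || return 1
    done
    # Device nodes need root, for example:
    #   mknod "$tree/dev/console" c 5 1
    #   mknod "$tree/dev/null"    c 1 3
}

copy_busybox_libs() {
    tree=$1
    # ldd prints lines such as "libm.so.6 => /lib/libm.so.6 (0x...)";
    # keep every field that is an absolute path and copy it to lib64.
    ldd "$tree/bin/busybox" |
        awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^\//) print $i }' |
        while read -r lib; do
            cp -L "$lib" "$tree/lib64/"
        done
}

pack_rootfs() {
    tree=$1
    out=$2                           # must lie outside $tree
    ( cd "$tree" && find . | cpio -H newc --quiet -o | gzip -9 ) > "$out"
}
```

A typical sequence would be make_rootfs_skeleton _install, copy_busybox_libs _install, then pack_rootfs _install ~/rootfs.img once fstab, inittab, rcS and vmcore-dmesg are in place.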

6. Problem: kdumpctl requires specific name formats for the kernel and initrd in order to start the Kdump service, and the boot.cfg boot configuration file needs to be modified.
Workaround: The first attempt was to add the kernel and initrd files Kdump needs to the installation package by modifying the package. At first the files could not be packed into Vsos.bin; it later turned out that this is controlled by the Package.ini file, and after modifying Package.ini the files could be added to Vsos.bin. However, the newly added files were still not installed when the device was installed. Because the device's unpacking process was not well understood, a compromise was adopted: place the new initrd (the rootfs file Kdump requires) in the /mnt/system directory, then have the boot script start.sh modify boot.cfg, move the new rootfs file to the /mnt directory, and create a symbolic link to the original kernel file for Kdump to use. kdumpctl checks the timestamps of the kernel file, the initrd file and the Kdump configuration files; if the initrd is older than any of the others, kdumpctl tries to regenerate it, which is likely to fail. The initrd's timestamp is therefore refreshed with the touch command when the file is moved.
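The move, link and touch sequence performed by start.sh might look roughly like this. It is a sketch only: install_kdump_files and all four arguments are hypothetical, since the source does not give the exact file names kdumpctl expects on this device.

```shell
# Sketch: move the kdump initrd into the boot directory, link the kernel
# under the name kdumpctl looks for, and refresh the initrd timestamp so
# that kdumpctl does not try to regenerate it.
install_kdump_files() {
    src_initrd=$1        # e.g. /mnt/system/rootfs.img
    boot_dir=$2          # e.g. /mnt
    kernel=$3            # existing kernel file name inside $boot_dir
    link_name=$4         # name kdumpctl should find (assumption)

    mv "$src_initrd" "$boot_dir/" || return 1
    ln -sf "$kernel" "$boot_dir/$link_name" || return 1
    # Make the initrd newer than the kernel and the kdump config files.
    touch "$boot_dir/$(basename "$src_initrd")"
}
```

The touch must come after the move, because mv preserves the old modification time.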

7. Problem: The CF card's device node is sometimes not /dev/sda1, so mounting /dev/sda1 fails after the Kdump reboot.
Workaround: Use the command fdisk -l | grep "83" | grep "\*" | grep dev | awk '{print $1}', which prints the first column of the matching line, to obtain the CF card's device node.
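Wrapped as a function that reads the fdisk listing on standard input, the lookup can be tried against saved output before it goes into rcS. The function name find_cf_partition is hypothetical, the $1 field is an assumption (the awk program is incomplete in the source), and the final head -n 1 is an addition that keeps only the first match.

```shell
# Sketch: pick the bootable Linux (type 83) partition from `fdisk -l` output.
# Same filter chain as the command above, reading the listing on stdin.
find_cf_partition() {
    grep "83" | grep "\*" | grep dev | awk '{print $1}' | head -n 1
}
```

In rcS this could be used as cf=$(fdisk -l 2>/dev/null | find_cf_partition) followed by mount "$cf" /mnt. Note that grep "83" matches the digits anywhere on the line, so the filter relies on the boot flag and the device prefix to narrow the result.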

Content Summary
kexec -l loads a kernel directly, and kexec -e then boots into the loaded kernel, whereas kexec -p loads a kernel that is booted only when the running kernel panics. To trigger a kernel panic manually, run echo c > /proc/sysrq-trigger.
In fact, kexec can be used directly instead of kdumpctl. That allows the old kernel name to be used as-is and removes the timestamp concerns for the kernel, initrd and configuration files. Since a new rootfs is used anyway, the simplest approach would be to have the rcS script extract the kernel crash information itself, with no dependency on the Kdump configuration files at all. However, because kdumpctl is the standard way of starting Kdump on Linux and is easier for later maintainers to work with, kdumpctl is used to start the Kdump service.
What is committed to SVN is a packed img file. To modify its contents, copy the img file to a new folder and run mv rootfs.img rootfs.img.gz; gzip -d rootfs.img.gz; then run cpio -id, reading the decompressed rootfs.img on standard input, to restore the whole file system. Make the required changes and run find . | cpio -H newc --quiet -o | gzip -9 > ~/rootfs.img to regenerate the img file. In that command the output img file must not be written to the current directory, because find would then pack the half-written archive into itself.
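To make the difference between the two load modes concrete, the helper below only prints the kexec command line for each mode so it can be reviewed before being run. The function name kexec_load_cmd and the mode words reboot and panic are hypothetical; the paths and appended options are the ones used earlier in this article.

```shell
# Sketch: print the kexec command for a direct reboot (-l, followed by
# kexec -e) or for a crash kernel that starts on panic (-p).
kexec_load_cmd() {
    mode=$1              # "reboot" or "panic"
    cmdline=$2           # command line of the running kernel
    case $mode in
        reboot) flag=-l ;;
        panic)  flag=-p ;;
        *) echo "usage: kexec_load_cmd reboot|panic CMDLINE" >&2; return 1 ;;
    esac
    printf 'kexec %s /mnt/kernel.img --initrd=/mnt/rootfs.img --command-line="%s 1 irqpoll maxcpus=1"\n' \
        "$flag" "$cmdline"
}
```

For example, kexec_load_cmd panic "$(cat /proc/cmdline)" prints the command that arms the crash kernel; after running it, echo c > /proc/sysrq-trigger forces a panic to test the setup.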
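The unpacking half of this round trip can be sketched as a single function. The name unpack_rootfs is hypothetical, and the sketch decompresses from standard input instead of renaming the file, which is a substitution for the mv and gzip -d steps rather than the exact procedure above.

```shell
# Sketch: unpack a gzip-compressed cpio rootfs image into a work directory.
unpack_rootfs() {
    img=$1               # path to rootfs.img
    dest=$2              # work directory, created if missing
    mkdir -p "$dest" || return 1
    # Reading from stdin avoids gzip's check on the .gz suffix.
    gzip -dc < "$img" | ( cd "$dest" && cpio -id --quiet )
}
```

After editing the files under the work directory, the image is regenerated from inside that directory with the find, cpio and gzip pipeline, writing the output somewhere outside it.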
Gains and Losses:
Because I was not very familiar with kernel internals, in the early stages I could only search the Internet for information and proceed by trial and error. The process was difficult, but I learned a lot, such as the details of the Linux boot process and how to trim down the kernel and BusyBox to build a minimal Linux system.
Due to time constraints, extracting the full vmcore was not implemented. It should, however, be easy to add on top of the current work.
