Summary of Linux kernel crash debugging methods

Source: Internet
Author: User
Tags python script

Null pointer dereferences and buffer overflows are the two most common causes of an oops.

1. Read the oops message directly. First find the place in the source code where the oops occurred: the value of the instruction pointer register (EIP) is the address of the faulting instruction, so it gives the location. Then examine the function call stack for more information; local variables, global variables, and function parameters can be distinguished in it.

For example, in the call stack of the oops message for the function faulty_read, the value at the top of the stack is ffffffff. A valid value there should be less than ffffffff, and no calling function can be found at this address, which suggests that a pointer was corrupted by a buffer overflow.

If the oops message shows that the address that triggered the oops is 0xa5a5a5a5, the most likely cause is dynamic memory that was never initialized.
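
The rules of thumb above can be collected into a small helper. This is a minimal Python sketch and not part of the original article: the function name classify_fault_address and the one-page threshold used for "close to NULL" are assumptions made for illustration, and the result is only a hint, not a diagnosis.

```python
def classify_fault_address(addr: int, page_size: int = 4096) -> str:
    """Return a rough hint about what a 32-bit faulting address suggests."""
    addr &= 0xFFFFFFFF  # treat the value as a 32-bit address
    if addr < page_size:
        # NULL itself, or a small offset from NULL (a field of a NULL struct pointer)
        return "null pointer dereference"
    if addr == 0xA5A5A5A5:
        # pattern the article associates with uninitialized dynamic memory
        return "uninitialized dynamic memory"
    if addr == 0xFFFFFFFF:
        # no function lives here; the article reads this as a corrupted pointer
        return "possible buffer overflow (corrupted pointer)"
    return "unknown"


if __name__ == "__main__":
    for value in (0x00000000, 0xA5A5A5A5, 0xFFFFFFFF, 0x800A73B8):
        print(f"{value:#010x}: {classify_fault_address(value)}")
```

Anything reported as "unknown" still has to be traced through the call stack as described above.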

2. Use the \prebuilts\gcc\linux-x86\arm\arm-eabi-4.7\bin\arm-eabi-addr2line command to map an address to its location in the program; it prints the corresponding source file name and line number.

1. arm-eabi-addr2line translates hexadecimal call-stack values such as libxxx.so 0x00012345 into file names and function names

arm-eabi-addr2line -e libxxx.so 0x00012345

2. arm-eabi-nm lists the symbol information of a file

arm-eabi-nm -l -C -n -S libdvm.so > dvm.data

3. arm-eabi-objdump lists the details of a file

arm-eabi-objdump -C -d libc.so > libc.s

With these tools we can obtain a more complete call stack and the assembly code of the call logic.

addr2line -f -e libc.so 0001173c

objdump -S -D libc.so > deassmble_libc.txt

Open the redirected disassembly file and search for the offset address 1173c. Somewhere in the mass of output you will find:

00011684 <pthread_create>:
   11684: e92d4ff0 push {r4, r5, r6, r7, r8, r9, sl, fp, lr}
   11688: e24dd01c sub sp, sp, #28 ; 0x1c
   1168c: e1a06001 mov r6, r1
   11690: e1a08002 mov r8, r2
   11694: e1a09003 mov r9, r3
   11698: e3a04001 mov r4, #1 ; 0x1
   1169c: e59f521c ldr r5, [pc, #540] ; 118c0 <pthread_create+0x23c>
   116a0: e58d000c str r0, [sp, #12]
   116a4: eb009a35 bl 37f80 <strncmp+0x20>
   116a8: e59f2214 ldr r2, [pc, #532] ; 118c4 <pthread_create+0x240>
   116ac: e1a03000 mov r3, r0
   116b0: e1a01004 mov r1, r4
   116b4: e593c000 ldr ip, [r3]
   116b8: e3a0003c mov r0, #60 ; 0x3c
   116bc: e08f3005 add r3, pc, r5
   116c0: e7933002 ldr r3, [r3, r2]
   116c4: e5834000 str r4, [r3]
   116c8: e58dc010 str ip, [sp, #16]
   116cc: eb009a3b bl 37fc0 <strncmp+0x60>
...

   1173c: ebffec2b bl c7f0 <__pthread_clone>   <-- this is the instruction we were looking for

...
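
Searching a large disassembly by hand is tedious, so the lookup can be scripted. The following Python sketch is an illustration only: find_instruction and the two regular expressions are hypothetical names, and the code assumes the usual objdump layout of a symbol header line followed by "offset: encoding mnemonic" lines. It reports which function contains a given offset and how far into the function it lies.

```python
import re

# Symbol header, for example "00011684 <pthread_create>:"
SYMBOL_RE = re.compile(r"^([0-9a-fA-F]+)\s+<([^>]+)>:\s*$")
# Instruction line, for example "   1173c: ebffec2b bl c7f0 <__pthread_clone>"
INSN_RE = re.compile(r"^\s*([0-9a-fA-F]+):\s*(.*)$")


def find_instruction(listing: str, offset: int):
    """Return (symbol, offset_in_symbol, instruction_text), or None if not found."""
    symbol, symbol_addr = None, 0
    for line in listing.splitlines():
        header = SYMBOL_RE.match(line)
        if header:
            # remember the most recent function header seen so far
            symbol_addr, symbol = int(header.group(1), 16), header.group(2)
            continue
        insn = INSN_RE.match(line)
        if insn and symbol is not None and int(insn.group(1), 16) == offset:
            return symbol, offset - symbol_addr, insn.group(2).strip()
    return None


# A shortened copy of the listing above, used as sample input.
SAMPLE = """\
00011684 <pthread_create>:
   11684: e92d4ff0 push {r4, r5, r6, r7, r8, r9, sl, fp, lr}
   11688: e24dd01c sub sp, sp, #28
   1173c: ebffec2b bl c7f0 <__pthread_clone>
"""

if __name__ == "__main__":
    print(find_instruction(SAMPLE, 0x1173C))
```

For the sample, the offset 1173c falls inside pthread_create at 0xb8 bytes from its start, which matches the listing above.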

3. Before analyzing the dump image, reboot into a stable kernel. GDB can then be used for a limited analysis of the copied dump. Compile vmlinux with the -g option so that it contains debugging symbols, then debug it with the following command:

gdb vmlinux <dump-file>
First run objdump -d vmlinux to disassemble your kernel.
You can then use the following registers to work out what happened:
1. EPC: the function in which the crash occurred.
2. RA: the return address of that function.
3. Cause: this register tells you what type of exception occurred.
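
On MIPS, the exception type is held in the ExcCode field of the Cause register (bits 6..2). The Python sketch below decodes that field; the function name decode_cause is an assumption for illustration, and only the most common MIPS32 exception codes are listed.

```python
# ExcCode values defined by the MIPS32 architecture (Cause register bits 6..2).
EXC_CODES = {
    0: "Int (interrupt)",
    1: "Mod (TLB modification)",
    2: "TLBL (TLB exception on load or instruction fetch)",
    3: "TLBS (TLB exception on store)",
    4: "AdEL (address error on load or instruction fetch)",
    5: "AdES (address error on store)",
    6: "IBE (bus error on instruction fetch)",
    7: "DBE (bus error on data access)",
    8: "Sys (syscall)",
    9: "Bp (breakpoint)",
    10: "RI (reserved instruction)",
    11: "CpU (coprocessor unusable)",
    12: "Ov (arithmetic overflow)",
    13: "Tr (trap)",
}


def decode_cause(cause: int) -> str:
    """Extract ExcCode from a raw Cause register value and name it."""
    exc_code = (cause >> 2) & 0x1F
    return EXC_CODES.get(exc_code, f"unknown ExcCode {exc_code}")


if __name__ == "__main__":
    # 0x00000008 is used here as sample input
    print(decode_cause(0x00000008))
```

A Cause value of 00000008 therefore decodes to ExcCode 2, a TLB exception on a load, which is what a read through a null pointer produces.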

4. Use cat /proc/modules to obtain the load base address of each module, subtract that base address from the crash address to get the offset within the module, then disassemble the module and find the function that contains the offset.

You can disassemble a module file with objdump -d test.ko > test.asm. Because a module is a relocatable object file, the addresses of its text section start at 0, whereas the module's code is obviously not at address 0 while the kernel is running. In the disassembly you will also see that calls between functions and jumps inside a function are branch instructions beginning with b (this discussion applies to MIPS and ARM only), and that branch instructions jump forwards or backwards relative to the PC. A 32-bit jump is needed only when the target symbol does not belong to this module. When the module is loaded, its code section is linked into the kernel as a whole, so all we need is the module's load base address: subtracting it from the crash address gives the offset within the module's code section, and looking that offset up in the disassembly shows which function the crash occurred in. Getting the load address of each module is simple: run cat /proc/modules. Note that module initialization functions marked with __init are placed in a separate section, usually called .init.text, and the memory of this section is freed once module initialization has finished.
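
The subtraction described above can be sketched in Python as follows. This is an illustration under stated assumptions: the function names are hypothetical, the sample /proc/modules content and its addresses are invented, and the code simply picks the module with the highest base address that is not above the crash address, without checking the module size.

```python
def parse_proc_modules(text: str) -> dict:
    """Map module name -> load base address from the content of /proc/modules."""
    bases = {}
    for line in text.splitlines():
        fields = line.split()
        # Fields: name size refcount dependencies state address [taint flags]
        if len(fields) >= 6:
            bases[fields[0]] = int(fields[5], 16)
    return bases


def module_offset(crash_addr: int, bases: dict):
    """Return (module, offset) for the nearest module base at or below crash_addr."""
    candidates = [(base, name) for name, base in bases.items() if base <= crash_addr]
    if not candidates:
        return None
    base, name = max(candidates)
    return name, crash_addr - base


# Hypothetical sample content; real values come from `cat /proc/modules`.
SAMPLE_MODULES = """\
test 12345 0 - Live 0xc0d1d000
other 4096 1 test, Live 0xc0d10000
"""

if __name__ == "__main__":
    name, offset = module_offset(0xC0D1D488, parse_proc_modules(SAMPLE_MODULES))
    print(f"{name}+{offset:#x}")
```

The resulting offset is the value to search for in the output of objdump -d test.ko. On recent kernels the address column may read as zero for unprivileged users, so the file may need to be read as root.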

5. Google provides a Python script, which can be downloaded from http://code.google.com/p/android-ndk-stacktrace-analyzer/. To use it:
Export the crash log with adb logcat -d > logfile.
Use arm-eabi-objdump (located under build/prebuilt/linux-x86/arm-eabi-4.2.1/bin) to convert the .so or executable to assembly code, for example:
arm-eabi-objdump -S mylib.so > mylib.asm
Then run the script:
python parse_stack.py <asm-file> <logcat-file>

http://blog.csdn.net/lickylin/article/details/19172725

According to the crash message, the function that crashed is rb_init_debugfs and the crash address is 0x804386f8.

1> On Linux, go to the project's kernel/linux directory, find the vmlinux file, and run gdb vmlinux.

At the GDB prompt, run the following command to find the file and line number of the faulting code:

(gdb) b *0x804386f8

2> If you are unsure whether the crash address is 0x804386f8, you can confirm it in the System.map file.

Look up the function rb_init_debugfs to get its address, then add the offset (here 0x14, taken from rb_init_debugfs+0x14/0x70).

3> The function name plus the offset can also be used directly:

(gdb) b *rb_init_debugfs+0x14

The steps above apply when the faulting code is compiled into the kernel; in that case gdb vmlinux can be used to find the file and line number of the faulting function.
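
The System.map lookup in item 2> is simple arithmetic and can be sketched in Python. The function names are hypothetical, and the sample System.map line is invented for illustration: its address 804386e4 is simply the crash address 0x804386f8 minus the offset 0x14, not a value taken from a real build.

```python
import re


def parse_location(location: str):
    """Split 'symbol+0xoff/0xsize' into (symbol, offset, size)."""
    match = re.fullmatch(r"([\w.]+)\+0x([0-9a-fA-F]+)/0x([0-9a-fA-F]+)", location.strip())
    if match is None:
        raise ValueError(f"unrecognized location: {location!r}")
    return match.group(1), int(match.group(2), 16), int(match.group(3), 16)


def resolve(location: str, system_map: str) -> int:
    """Return the absolute address of 'symbol+0xoff/0xsize' using System.map text."""
    symbol, offset, size = parse_location(location)
    if offset >= size:
        raise ValueError("offset lies outside the function")
    for line in system_map.splitlines():
        fields = line.split()
        # System.map lines have the form: <address> <type> <symbol>
        if len(fields) == 3 and fields[2] == symbol:
            return int(fields[0], 16) + offset
    raise KeyError(symbol)


# Hypothetical System.map excerpt (address derived from the example above).
SAMPLE_MAP = """\
80438600 t rb_some_other_function
804386e4 t rb_init_debugfs
"""

if __name__ == "__main__":
    print(hex(resolve("rb_init_debugfs+0x14/0x70", SAMPLE_MAP)))
```

If the computed address matches the address reported in the crash message, the address passed to GDB is the right one.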

What if the faulting module is dynamically loaded into the kernel?

In that case, disassemble the module with objdump. The following command displays the C source and the assembly together (the object must have been built with the -g option):

# objdump -S **.o -g

If the command above still shows only assembly and no C source, do not worry: in the Makefile used to build the driver module, add the -g option to the compilation of the .o file.

For the newly generated .o file, the command above shows the assembly interleaved with the C source. The faulting line can then be found by adding the offset to the function name shown in the kernel panic message.

http://blog.csdn.net/wuruixn/article/details/38320643

1. To test the GDB workflow, deliberately add a null pointer dereference to the do_vfs_ioctl function in kernel/linux/fs/ioctl.c, then build the image, flash it to the board, and boot the board. The kernel crashes; part of the log follows:

CPU 0 Unable to handle kernel paging request at virtual address 00000000, epc == 800a73b8, ra == 800a793c
Oops[#1]:
Cpu 0
$ 0   : 00000000 10008d00 00000000 ffffffea
$ 4   : fffffdfd 10008d01 00000001 00000000
$ 8   : 00000000 7fed2e40 00001cb2 00000b3b
$12   : 00031c7f 2ab5ead7 2aaac7c9 15010000
$16   : 7fed2e18 878ca5a0 00000000 00000001
$20   : 2ab84980 00000000 00000007 00000000
$24   : 00000000 2ab62090
$28   : 8782c000 8782fe98 7fed2fc8 800a793c
Hi    : 0000002a
Lo    : 000311fc
epc   : 800a73b8 do_vfs_ioctl+0x88/0x5c8
    Not tainted
ra    : 800a793c sys_ioctl+0x44/0x98
Status: 10008d03    KERNEL EXL IE
Cause : 00000008
BadVA : 00000000
PrId  : 0002a080 (Broadcom4350)
Modules linked in:

Process init (pid: 1, threadinfo=8782c000, task=8782bb68, tls=00000000)
Stack : 878ca1a0 00000004 00000000 10008d00 00000603 2aabcfff 87b0bea8 00000001
        87beeaf0 2aabc000 2aabd000 87aa9cb0 2aabd000 2aabd000 8782ff08 fffffff8
        00000001 7fed3258 00000007 00000000 878ca5a0 0000540d 00000000 00000001
        2ab84980 800a793c 08100871 00000001 87bea41c 0000fff2 00000000 2abbdff0
        7fed2e18 00000001 7fed2e60 2abbdff0 00000120 8001ba7c 00000000 00000000
        ...
Call Trace: (--raw--
[<800a793c>] sys_ioctl+0x44/0x98
[<8001ba7c>] stack_done+0x20/0x3c

Call Trace:
[<800a73b8>] do_vfs_ioctl+0x88/0x5c8
[<800a793c>] sys_ioctl+0x44/0x98
[<8001ba7c>] stack_done+0x20/0x3c

Code: 0c029c9f 02003021 8fbf0064 <8c020000> 8fb40060 8fb3005c 8fb20058 8fb10054 8fb00050
Disabling lock debugging due to kernel taint
Kernel panic - not syncing: Attempted to kill init!
Rebooting in 1 seconds. <6>
Stopping CPU 1
Kersysmipssoftreset:called on CPU 0
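
The Call Trace entries in a log like the one above follow a regular "[<address>] symbol+0xoffset/0xsize" pattern, so they can be pulled out with a short script before being fed to GDB. The Python sketch below is illustrative only; parse_call_trace and TRACE_RE are hypothetical names, and the sample text is copied from the final Call Trace of this log.

```python
import re

# One trace entry, for example "[<800a73b8>] do_vfs_ioctl+0x88/0x5c8"
TRACE_RE = re.compile(
    r"\[<([0-9a-fA-F]+)>\]\s+([\w.]+)\+0x([0-9a-fA-F]+)/0x([0-9a-fA-F]+)"
)


def parse_call_trace(log: str):
    """Return a list of (address, symbol, offset) tuples in log order."""
    frames = []
    for match in TRACE_RE.finditer(log):
        address = int(match.group(1), 16)
        offset = int(match.group(3), 16)
        frames.append((address, match.group(2), offset))
    return frames


SAMPLE_LOG = """\
Call Trace:
[<800a73b8>] do_vfs_ioctl+0x88/0x5c8
[<800a793c>] sys_ioctl+0x44/0x98
[<8001ba7c>] stack_done+0x20/0x3c
"""

if __name__ == "__main__":
    for address, symbol, offset in parse_call_trace(SAMPLE_LOG):
        # address - offset is the start address of the function
        print(f"{symbol}: starts at {address - offset:#x}, fault/return at +{offset:#x}")
```

Each tuple gives both forms accepted by GDB's list command: the raw address and the symbol plus offset.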

2. Start GDB. Run the gdb tool (or ./mips-linux-uclibc-gdb) directly on the host (development) console, then type file .../vmlinux to load the kernel with debug information. Note that the kernel debug switch must be enabled and the kernel recompiled to generate this vmlinux file (it is about 10 times larger than the previous file, 50M or more). Configure the kernel debug switch as follows:

Kernel hacking --->
[*] Kernel debugging
[*] Compile the kernel with debug info

Command to compile the kernel: make kernelbuild, run from the root of the project.

Then start GDB as described in step 2.

3. Locating the crash with GDB

Check the call trace log. The most important part of the log is the EPC (exception program counter), which is where the crash happened. In this example the EPC is 0x800a73b8. For X1000 and X3500, addresses like 0x8XXXXXXX are in the kernel and addresses like 0xcXXXXXXX (for example 0xc0d1d488) are in a module. The address 0x800a73b8 corresponds to do_vfs_ioctl+0x88, that is, offset 0x88 within the function do_vfs_ioctl. The debugging session is as follows:

(gdb) list *(0x800a73b8)
0x800a73b8 is in do_vfs_ioctl (fs/ioctl.c:569).
564             error = vfs_ioctl(filp, cmd, arg);
565             break;
566     }
567     error = *test;
568     return error;
569 }
570
571 SYSCALL_DEFINE3(ioctl, unsigned int, fd, unsigned int, cmd, unsigned long, arg)
572 {
573     struct file *filp;
(gdb) list *(do_vfs_ioctl+0x88)
0x800a73b8 is in do_vfs_ioctl (fs/ioctl.c:569).
564             error = vfs_ioctl(filp, cmd, arg);
565             break;
566     }
567     error = *test;
568     return error;
569 }
570
571 SYSCALL_DEFINE3(ioctl, unsigned int, fd, unsigned int, cmd, unsigned long, arg)
572 {
573     struct file *filp;
(gdb) list *(sys_ioctl+0x44)

By reading the code around the reported location you can find the error at line 567, where test is a null pointer (it was deliberately set to NULL beforehand).

Note: if the list *(0x80xxxxxx) command reports "No source file for address 0x80xxxxxx.", the reason is that the corresponding debug switch was not enabled in make menuconfig when the kernel was compiled.
