Analyzing and Troubleshooting ARM Linux Kernel Driver Exceptions

Source: Internet
Author: User

Original work. If you repost it, please credit the source with a hyperlink.

 

Link: http://blog.csdn.net/hunhunzi/article/details/7052032

Recently, we have been working on a Linux system for industrial devices based on Atmel's SAM9X25 platform. This is the first time I have taken part in developing industrial equipment. While debugging the serial port driver on the SAM9X25 under Linux, I found that both reads and writes triggered a kernel exception. The exception output was as follows:

========================================================== ========================================================== ========================================

Unable to handle kernel NULL pointer dereference at virtual address 00000000
pgd = c0004000
[00000000] *pgd=00000000
Internal error: Oops: 17 [#1]
last sysfs file: /sys/devices/virtual/vc/vcsa1/dev
Modules linked in:
CPU: 0    Not tainted  (2.6.39 #1)
PC is at atmel_tasklet_func+0x110/0x69c
LR is at atmel_tasklet_func+0x10/0x69c
pc : [<c01a4f30>]    lr : [<c01a4e30>]    psr: 20000013
sp : c7825f50  ip : c045e0bc  fp : 00000000
r10: c0456a80  r9 : 0000000a  r8 : 00000000
r7 : c7874568  r6 : c045e0a8  r5 : 00000100  r4 : c045dfb4
r3 : 00000002  r2 : 00000ffc  r1 : 00000001  r0 : 00000001
Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
Control: 0005317f  Table: 27aec000  DAC: 00000017
Process ksoftirqd/0 (pid: 3, stack limit = 0xc7824270)
Stack: (0xc7825f50 to 0xc7826000)
5f40:                                     00000100 c7824000 00000001 00000018
5f60: 0000000a c0456a80 c7825f84 00000000 00000100 c7824000 00000001 00000018
5f80: c0456a80 c0047b70 00000006 c0047650 c0432e50 00000000 c7824000 00000000
5fa0: 00000000 c0047938 00000000 00000000 00000000 c00479a0 c7825fd4 c7819f60
5fc0: 00000000 c0058c64 c00335f4 00000000 00000000 00000000 c7825fd8 c7825fd8
5fe0: 00000000 c7819f60 c0058be0 c00335f4 00000013 c00335f4 0c200050 fc3b9beb
[<c01a4f30>] (atmel_tasklet_func+0x110/0x69c) from [<c0047b70>] (tasklet_action+0x80/0xe4)
[<c0047b70>] (tasklet_action+0x80/0xe4) from [<c0047650>] (__do_softirq+0x74/0x104)
[<c0047650>] (__do_softirq+0x74/0x104) from [<c00479a0>] (run_ksoftirqd+0x68/0x108)
[<c00479a0>] (run_ksoftirqd+0x68/0x108) from [<c0058c64>] (kthread+0x84/0x8c)
[<c0058c64>] (kthread+0x84/0x8c) from [<c00335f4>] (kernel_thread_exit+0x0/0x8)
Code: 1a000002 e59f057c e59f157c ebfa416c (e5983000)
---[ end trace 6b8e1841ba3a56c9 ]---
Kernel panic - not syncing: Fatal exception in interrupt
[<c0037784>] (unwind_backtrace+0x0/0xf0) from [<c00429f4>] (panic+0x54/0x178)
[<c00429f4>] (panic+0x54/0x178) from [<c0035a18>] (die+0x17c/0x1bc)
[<c0035a18>] (die+0x17c/0x1bc) from [<c000000c4>] (__do_kernel_fault+0x64/0x84)
[<c000000c4>] (__do_kernel_fault+0x64/0x84) from [<c003889c>] (do_page_fault+0x1b8/0x1cc)
[<c003889c>] (do_page_fault+0x1b8/0x1cc) from [<c002c2f0>] (do_DataAbort+0x38/0x9c)
[<c002c2f0>] (do_DataAbort+0x38/0x9c) from [<c003234c>] (__dabt_svc+0x4c/0x60)
Exception stack(0xc7825f08 to 0xc7825f50)
5f00:                   00000001 00000001 00000ffc 00000002 c045dfb4 00000100
5f20: c045e0a8 c7874568 00000000 0000000a c0456a80 00000000 c045e0bc c7825f50
5f40: c01a4e30 c01a4f30 20000013 ffffffff
[<c003234c>] (__dabt_svc+0x4c/0x60) from [<c01a4f30>] (atmel_tasklet_func+0x110/0x69c)
[<c01a4f30>] (atmel_tasklet_func+0x110/0x69c) from [<c0047b70>] (tasklet_action+0x80/0xe4)
[<c0047b70>] (tasklet_action+0x80/0xe4) from [<c0047650>] (__do_softirq+0x74/0x104)
[<c0047650>] (__do_softirq+0x74/0x104) from [<c00479a0>] (run_ksoftirqd+0x68/0x108)
[<c00479a0>] (run_ksoftirqd+0x68/0x108) from [<c0058c64>] (kthread+0x84/0x8c)
[<c0058c64>] (kthread+0x84/0x8c) from [<c00335f4>] (kernel_thread_exit+0x0/0x8)

========================================================== ========================================================== ========================================

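Strictly speaking, the instruction that triggered the exception is at the address in the PC register, c01a4f30 (atmel_tasklet_func+0x110). The LR register typically holds the return address of the last function call, so it also points into the code that was running shortly before the fault. In this analysis I started from the LR value, which the exception output above gives as c01a4e30.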
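
The symbol+offset/size notation on the "PC is at" and "LR is at" lines can be sanity-checked with a little hex arithmetic: subtracting the offset from either register should give the same function start, and adding the size gives the function end. A minimal shell sketch, using only values printed in the exception output (the variable names are illustrative):

```shell
#!/bin/sh
# Values copied from the oops:
#   PC is at atmel_tasklet_func+0x110/0x69c   pc : [<c01a4f30>]
#   LR is at atmel_tasklet_func+0x10/0x69c    lr : [<c01a4e30>]
pc=$((0xc01a4f30));  pc_off=$((0x110))
lr=$((0xc01a4e30));  lr_off=$((0x10))
func_size=$((0x69c))

# symbol+offset means: address = function start + offset,
# so function start = address - offset.
start_from_pc=$(printf '%08x' $((pc - pc_off)))
start_from_lr=$(printf '%08x' $((lr - lr_off)))
# function end (exclusive) = function start + size
func_end=$(printf '%08x' $((pc - pc_off + func_size)))

echo "function start (from PC): $start_from_pc"
echo "function start (from LR): $start_from_lr"
echo "function end (exclusive): $func_end"
```

Both subtractions should agree on the same start address; if they did not, the log would be inconsistent (or PC and LR would belong to different functions).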

Next, we can locate this address by disassembling the kernel image. After the kernel is built, a vmlinux file is generated in the root directory of the kernel source tree. It can be disassembled with the following command:

arm-none-eabi-objdump -Dz -S vmlinux > linux.dump
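
The command above disassembles the entire image, which can be slow and produces a very large file. GNU objdump also accepts --start-address and --stop-address, so one possible shortcut is to disassemble only the function named in the exception output. The sketch below assumes the same arm-none-eabi- toolchain prefix and a vmlinux in the current directory; the address range is derived from "atmel_tasklet_func+0x110/0x69c", and the output file name is made up for illustration.

```shell
#!/bin/sh
# Address range of atmel_tasklet_func, derived from
# "PC is at atmel_tasklet_func+0x110/0x69c" with pc = c01a4f30.
start=$(( 0xc01a4f30 - 0x110 ))   # PC minus offset = function start
stop=$(( start + 0x69c ))         # function start plus function size
start_hex=$(printf '0x%08x' "$start")
stop_hex=$(printf '0x%08x' "$stop")
echo "range to disassemble: $start_hex .. $stop_hex"

# Only invoke the toolchain if it is installed and vmlinux is present.
if command -v arm-none-eabi-objdump >/dev/null 2>&1 && [ -f vmlinux ]; then
    # -d: disassemble code sections; -S: interleave source (needs debug info).
    # atmel_tasklet_func.dump is a hypothetical output file name.
    arm-none-eabi-objdump -d -S \
        --start-address="$start_hex" --stop-address="$stop_hex" \
        vmlinux > atmel_tasklet_func.dump
fi
```

For mapping a single address to a source line, arm-none-eabi-addr2line -f -e vmlinux c01a4f30 is another option, again assuming the image contains debug information.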

Note that the -S option of arm-none-eabi-objdump interleaves the original source code with the disassembly wherever possible. For -S to have any source to show, the kernel must be compiled with the -g option of arm-linux-gcc. I therefore added -g to KBUILD_CFLAGS in the Makefile in the root directory of the kernel source tree:

KBUILD_CFLAGS   := -g -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs \
                   -fno-strict-aliasing -fno-common \
                   -Werror-implicit-function-declaration \
                   -Wno-format-security \
                   -fno-delete-null-pointer-checks

After modifying the Makefile, rebuild the kernel. The vmlinux file generated in the root directory now contains debug information that refers back to the source code, so it is about twice the size of the original file.

Finally, run "arm-none-eabi-objdump -Dz -S vmlinux > linux.dump". Because the kernel was built with -g, this disassembly takes a long time (nearly 6 hours on my virtual machine!), and the resulting linux.dump file grows from 44 MB to 503 MB.

Next, open linux.dump in UltraEdit and search for the string "c01a4e30".
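
For a dump of this size, a command-line search may be easier than a GUI editor. As a sketch, assuming the dump file name used above and a grep that supports the common -A/-B context options, a small helper (the name find_addr is made up here) can print the matching instruction together with a few surrounding lines:

```shell
#!/bin/sh
# find_addr ADDRESS FILE
# Show the disassembly line for ADDRESS plus some context.
# objdump prints each instruction as "<address>:<tab><encoding> ...", so the
# pattern anchors on the address followed by a colon at the start of a line.
find_addr() {
    grep -n -i -B 8 -A 4 "^[[:space:]]*$1:" "$2"
}

# Usage, if the dump exists in the current directory:
if [ -f linux.dump ]; then
    find_addr c01a4e30 linux.dump
fi
```

The -n option prefixes each line with its line number in the dump, which makes it easy to jump to the same place in an editor afterwards.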

The search leads to the following code:

========================================================== ========================================================== ========================================

/*
 * tasklet handling tty stuff outside the interrupt handler.
 */
static void atmel_tasklet_func(unsigned long data)
{
c01a4e20:       e92d45f0        push    {r4, r5, r6, r7, r8, sl, lr}
c01a4e24:       e24dd01c        sub     sp, sp, #28     ; 0x1c
c01a4e28:       e1a04000        mov     r4, r0
        /* The interrupt handler does not take the lock */
        spin_lock(&port->lock);

        if (atmel_use_pdc_tx(port))
                atmel_tx_pdc(port);
        else if (atmel_use_dma_tx(port))
c01a4e2c:       ebfffda1        bl      c01a44b8 <atmel_use_dma_tx>
c01a4e30:       e3500000        cmp     r0, #0  ; 0x0
c01a4e34:       e5921334        ldr     r3, [r4, #52]
c01a4e38:       0a00007b        beq     c01a502c <atmel_tasklet_func+0x20c>

========================================================== ========================================================== ========================================

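The address in LR therefore falls on the line "else if (atmel_use_dma_tx(port))" in atmel_tasklet_func, which places the failure in this function, some time after that call returned.

My guess is that the "port" argument of atmel_use_dma_tx(port) is a null pointer. This still needs to be confirmed: the faulting instruction itself is at the PC address, c01a4f30 (atmel_tasklet_func+0x110), further down in the same function, so the disassembly around that address should be checked as well.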
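
One way to cross-check this guess is to decode the word in parentheses on the "Code:" line of the exception output, which is the instruction at PC. The sketch below is not a general ARM disassembler: it only handles the simple load/store form with an immediate offset and no writeback, which happens to be the form of e5983000. The field positions follow the ARM single data transfer encoding; the variable names are illustrative.

```shell
#!/bin/sh
# The word in parentheses on the "Code:" line is the instruction at PC.
insn=$((0xe5983000))

# Bits 27..20 of a word load/store with immediate offset and no writeback:
#   bits 27..25 = 010, P (bit 24) = 1, B (bit 22) = 0, W (bit 21) = 0.
# Mask 0xf6 selects those bits; 0x50 is their expected value.
form=$(( (insn >> 20) & 0xf6 ))
load=$(( (insn >> 20) & 0x1 ))   # L bit: 1 = ldr, 0 = str
up=$((   (insn >> 23) & 0x1 ))   # U bit: 1 = add offset, 0 = subtract
rn=$((   (insn >> 16) & 0xf ))   # base register number
rd=$((   (insn >> 12) & 0xf ))   # destination/source register number
imm=$((  insn & 0xfff ))         # 12-bit immediate offset

if [ "$form" -eq $((0x50)) ]; then
    if [ "$load" -eq 1 ]; then op=ldr; else op=str; fi
    if [ "$up" -eq 1 ]; then sign=""; else sign="-"; fi
    decoded="$op r$rd, [r$rn, #$sign$imm]"
else
    decoded="not handled by this sketch"
fi
echo "$decoded"
```

The decoded instruction is a load through r8, and the register dump shows r8 = 00000000, which matches the "virtual address 00000000" on the first line of the exception output. This suggests also checking which pointer the compiler keeps in r8 around atmel_tasklet_func+0x110, in addition to the call site found through LR.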

 

Finally, as a workaround, I disabled DMA for the serial port and switched to direct (non-DMA) transfers. This is less efficient, but the exception no longer occurs.

I will analyze the root cause of this exception later and fix the problem properly.

 

Keywords: ARM, Atmel SAM9X25, Linux kernel, driver exception, debugging, disassembly, objdump
