Oops in Linux Kernel

Source: Internet
Author: User

What is an oops in the Linux kernel? Linguistically, "oops" is an interjection that people use after a minor accident or an embarrassing slip: "Oops, sorry, I didn't mean to break your cup." That is what oops means.
What is an oops in Linux kernel development? It means essentially the same thing, except that the speaker is now Linux. When a serious fault occurs, the kernel apologizes to us: "Oh, sorry, I screwed up." At that point the kernel prints an oops message showing the current register state, the stack contents, and the call trace, all of which help us locate the error.
Next, let's look at an example. To keep the focus on the oops itself, the module below does only one thing: it dereferences a null pointer.
#include <linux/kernel.h>
#include <linux/module.h>

static int __init hello_init(void)
{
    int *p = 0;    /* p is a null pointer */

    *p = 1;        /* writing through it triggers the oops */
    return 0;
}

static void __exit hello_exit(void)
{
    return;
}

module_init(hello_init);
module_exit(hello_exit);

MODULE_LICENSE("GPL");

Obviously, the error is caused by line 8, the statement *p = 1;.
Next, we compile this module and load it into the kernel with insmod. As expected, an oops appears.
[100.243737] BUG: unable to handle kernel NULL pointer dereference at (null)
[100.244985] IP: [<f82d2005>] hello_init+0x5/0x11 [hello]
[100.262266] *pde = 00000000
[100.288395] Oops: 0002 [#1] SMP
[100.305468] last sysfs file: /sys/devices/virtual/sound/timer/uevent
[100.325955] Modules linked in: hello(+) vmblock vsock merge vmhgfs merge into gameport merge into several snd_pcm merge into snd_rawmidi merge snd_seq merge into ppdev psmouse merge fbcon into font bitblit softcursor snd merge soundcore merge vmci merge into vgastate merge into limit lp parport floppy pcnet32 mii mptspi mptscsih mptbase scsi_transport_spi vmxnet
[100.472178]
[100.494931] Pid: 1586, comm: insmod Not tainted (2.6.32-21-generic #32-Ubuntu) VMware Virtual Platform
[100.540018] EIP: 0060:[<f82d2005>] EFLAGS: 00010246 CPU: 0
[100.562844] EIP is at hello_init+0x5/0x11 [hello]
[100.584351] EAX: 00000000 EBX: fffffffc ECX: f82cf040 EDX: 00000001
[100.609358] ESI: f82cf040 EDI: 00000000 EBP: f1b9ff5c ESP: f1b9ff5c
[100.631467]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[100.657664] Process insmod (pid: 1586, ti=f1b9e000 task=f137b340 task.ti=f1b9e000)
[100.706083] Stack:
[100.731783]  f1b9ff88 c0101131 f82cf040 c076d240 fffffffc f82cf040 0072cff4 f82d2000
[100.759324] <0> fffffc f82cf040 0072cff4 f1b9ffac c0182340 f19638f8 f137b340 f19638c0
[100.811396] <0> 00000004 09cc9018 09cc9018 00020000 f1b9e000 c01033ec 09cc9018 00015324
[100.891922] Call Trace:
[100.916257]  [<c0101131>] ? do_one_initcall+0x31/0x190
[100.943670]  [<f82d2000>] ? hello_init+0x0/0x11 [hello]
[100.970905]  [<c0182340>] ? sys_init_module+0xb0/0x210
[100.995542]  [<c01033ec>] ? syscall_call+0x7/0xb
[101.024087] Code: <c7> 05 00 00 00 01 00 00 5d c3 00 00 00 00 00 00 00 00 00 00 00
[101.079592] EIP: [<f82d2005>] hello_init+0x5/0x11 [hello] SS:ESP 0068:f1b9ff5c
[101.134682] CR2: 0000000000000000
[101.158929] ---[ end trace e294b69a66d752cb ]---
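The oops first describes the bug and then points out where it occurred: "IP: [<f82d2005>] hello_init+0x5/0x11 [hello]". This says that the faulting instruction lies at offset 0x5 inside hello_init, a function 0x11 bytes long, in the module hello.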
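The "symbol+offset/size" notation is regular enough to pick apart mechanically. The following is a minimal user-space sketch, not kernel code; the names struct oops_location and parse_oops_location are hypothetical and exist only for this illustration.

```c
#include <stdio.h>

/* Parsed form of a "symbol+0xOFFSET/0xSIZE" location from an oops.
 * (Hypothetical helper type for illustration only.) */
struct oops_location {
    char symbol[64];       /* function name, e.g. "hello_init" */
    unsigned long offset;  /* bytes from the start of the function */
    unsigned long size;    /* total size of the function in bytes */
};

/*
 * Parse a location such as "hello_init+0x5/0x11".
 * Returns 0 on success, -1 if the text does not have the expected shape.
 */
int parse_oops_location(const char *text, struct oops_location *loc)
{
    /* %63[^+] reads the symbol name up to, but not including, the '+'.
     * %lx accepts hexadecimal digits with an optional "0x" prefix. */
    int fields = sscanf(text, "%63[^+]+%lx/%lx",
                        loc->symbol, &loc->offset, &loc->size);

    return fields == 3 ? 0 : -1;
}
```

Given the text "hello_init+0x5/0x11", the function fills in the symbol hello_init, the offset 0x5, and the size 0x11. The offset is the number to look for in the left-hand column of a disassembly of that function.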
To analyze the problem we need a helper tool, objdump, which can disassemble object files. The command is:
objdump -S hello.o
The following is the disassembly of hello.o. Because the C source is interleaved with the instructions (this requires the module to be built with debug information), the output is easy to read.

hello.o:     file format elf32-i386

Disassembly of section .init.text:

00000000 <init_module>:
#include <linux/kernel.h>
#include <linux/module.h>

static int __init hello_init(void)
{
   0:   55                      push   %ebp
    int *p = 0;

    *p = 1;

    return 0;
}
   1:   31 c0                   xor    %eax,%eax
#include <linux/kernel.h>
#include <linux/module.h>

static int __init hello_init(void)
{
   3:   89 e5                   mov    %esp,%ebp
    int *p = 0;

    *p = 1;
   5:   c7 05 00 00 00 00 01    movl   $0x1,0x0
   c:   00 00 00
    return 0;
}
   f:   5d                      pop    %ebp
  10:   c3                      ret

Disassembly of section .exit.text:

00000000 <cleanup_module>:
static void __exit hello_exit(void)
{
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   e8 fc ff ff ff          call   4 <cleanup_module+0x4>
    return;
}
   8:   5d                      pop    %ebp
   9:   c3                      ret

Following the oops, we can see that the instruction at the faulting location, hello_init+0x5, is:
   5:   c7 05 00 00 00 00 01    movl   $0x1,0x0
This instruction stores the value 1 at address 0, which is of course an invalid operation.
We can also see that the corresponding C code is:
*p = 1;
Bingo! With the help of the oops, we quickly located the problem.
 
Let's go back to the oops above and see what other useful information the kernel has left for us.
Oops: 0002 [#1]
Here, 0002 is the oops error code (a write to a page that was not found, occurring in kernel mode), and #1 is the oops counter: this is the first oops since boot.
The meaning of the error code depends on the kind of fault. For the page fault in this article, the bits are defined as follows (if the oops you encounter does not match this definition, it is best to search the kernel source):
* error_code:
* bit 0 = 0 means no page found, 1 means protection fault
* bit 1 = 0 means read, 1 means write
* bit 2 = 0 means kernel, 1 means user-mode
* bit 3 = 1 means use of a reserved bit was detected
* bit 4 = 1 means the fault was an instruction fetch
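To make the bit layout concrete, here is a small user-space sketch that decodes the three low bits of the error code. It deliberately ignores the higher bits, and the names PF_PROT, PF_WRITE, PF_USER, and describe_oops_code are chosen for this illustration rather than taken from any particular kernel header.

```c
#include <stdio.h>

/* Masks for the three low bits of the x86 page-fault error code
 * that is printed after "Oops:". Illustrative names. */
#define PF_PROT  (1UL << 0)  /* 0: no page found, 1: protection fault */
#define PF_WRITE (1UL << 1)  /* 0: read,          1: write            */
#define PF_USER  (1UL << 2)  /* 0: kernel mode,   1: user mode        */

/*
 * Write a human-readable description of the error code into buf.
 * Returns the number of characters snprintf would have written.
 */
int describe_oops_code(unsigned long code, char *buf, size_t len)
{
    return snprintf(buf, len, "%s, %s, %s mode",
                    (code & PF_PROT)  ? "protection fault" : "no page found",
                    (code & PF_WRITE) ? "write" : "read",
                    (code & PF_USER)  ? "user" : "kernel");
}
```

For the code 0002 from the oops in this article, only bit 1 is set, so the description reads "no page found, write, kernel mode", which matches a kernel-mode write through a null pointer.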
Sometimes an oops also prints "Tainted" information, which indicates why the kernel is considered tainted. The flags are defined as follows:
1: 'G' if all modules loaded have a GPL or compatible license, 'P' if any proprietary module has been loaded. Modules without a MODULE_LICENSE or with a MODULE_LICENSE that is not recognized by insmod as GPL compatible are assumed to be proprietary.
2: 'F' if any module was force loaded by "insmod -f", ' ' if all modules were loaded normally.
3: 'S' if the oops occurred on an SMP kernel running on hardware that hasn't been certified as safe to run multiprocessor. Currently this occurs only on various Athlons that are not SMP capable.
4: 'R' if a module was force unloaded by "rmmod -f", ' ' if all modules were unloaded normally.
5: 'M' if any processor has reported a Machine Check Exception, ' ' if no Machine Check Exceptions have occurred.
6: 'B' if a page-release function has found a bad page reference or some unexpected page flags.
7: 'U' if a user or user application specifically requested that the Tainted flag be set, ' ' otherwise.
8: 'D' if the kernel has died recently, i.e. there was an OOPS or BUG.
9: 'A' if the ACPI table has been overridden.
10: 'W' if a warning has previously been issued by the kernel. (Though some warnings may set more specific taint flags.)
11: 'C' if a staging driver has been loaded.
12: 'I' if the kernel is working around a severe bug in the platform firmware (BIOS or similar).
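The flag letters above can be read as a fixed-width string with one position per flag. The sketch below builds such a string from a bitmask in user space. It assumes that item n of the list corresponds to bit n-1 of the kernel's taint mask, and the names taint_letters and format_taint_flags are hypothetical. The real kernel prints "Not tainted" instead when no bit is set, which this sketch does not reproduce.

```c
#include <stdio.h>

/*
 * Taint letters in the order of the list above. Entry i is ASSUMED to
 * correspond to bit i of the taint mask.
 */
static const char taint_letters[] = "PFSRMBUDAWCI";

/*
 * Build the flag string for a taint mask: the letter for each set bit
 * and a space for each clear bit, except position 0, which shows 'G'
 * when no proprietary module is loaded.
 * buf must have room for at least 13 bytes.
 */
void format_taint_flags(unsigned long mask, char *buf)
{
    size_t i;

    for (i = 0; taint_letters[i] != '\0'; i++) {
        if (mask & (1UL << i))
            buf[i] = taint_letters[i];
        else
            buf[i] = (i == 0) ? 'G' : ' ';
    }
    buf[i] = '\0';
}
```

For example, a mask with bits 0 and 7 set (a proprietary module was loaded and the kernel has already died once) produces a string with 'P' in the first position and 'D' in the eighth, with spaces elsewhere.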
This tainted information is mainly intended for kernel developers. If you encounter an oops while using Linux, you can send its contents to the kernel developers for debugging; from the tainted information they can determine the environment the kernel was running in when the oops occurred. If we are only debugging our own driver, this information is of little use.
 
The example in this article is very simple: the oops does not bring the system down, so we can read the complete message from dmesg. In many cases, however, the system goes down when the oops occurs. The error messages then cannot be written to a log file in time, and after the power is turned off they are gone. We can only record them some other way: copying them by hand or photographing the screen.
Even worse, if the oops message is long, a single screen cannot show all of it. How can we see the complete content? The first method is to use the vga parameter in GRUB to select a higher resolution so that more content fits on the screen; obviously this only helps up to a point. The second method is to use two machines and send the oops message of the machine being debugged to the screen of a host machine over a serial port; but most laptops no longer have serial ports, so this solution also has serious limitations. The third method is to use the kernel dump tool kdump to dump the memory and CPU register contents into a file when the oops occurs, and then analyze the problem with gdb.
 
The problems that arise during kernel driver development can be strange, and the debugging methods are just as varied. An oops is a hint from the Linux kernel, and we should make good use of it.
