What is an oops? Linguistically, "oops" is an onomatopoeic interjection. When you have a small accident or do something embarrassing, you say "Oops"; the closest Chinese equivalent is "aiyo". "Oops, sorry, I didn't mean to break your cup." That is what oops means.
What is an oops in Linux kernel development? It is essentially the same thing, except that the speaker is now Linux. When a serious problem occurs, the kernel apologizes to us too: "Oops, sorry, I messed things up." When the kernel hits such an error (which may or may not end in a kernel panic), it prints an oops message showing the current register state, the stack contents, and the full call trace, which helps us locate the error.
Let's look at an example. To keep the focus on the protagonist of this article, the oops, the only thing this example does is dereference a null pointer.
#include <linux/kernel.h>
#include <linux/module.h>

static int __init hello_init(void)
{
    int *p = 0;

    *p = 1;    /* deliberate write through a null pointer: triggers the oops */

    return 0;
}

static void __exit hello_exit(void)
{
    return;
}

module_init(hello_init);
module_exit(hello_exit);
MODULE_LICENSE("GPL");
Clearly, the bug is on line 8: the assignment *p = 1 writes through a null pointer.
Next, we compile the module and load it into the kernel with insmod. As expected, an oops appears.
[100.243737] BUG: unable to handle kernel NULL pointer dereference at (null)
[100.244985] IP: [<f82d2005>] hello_init+0x5/0x11 [hello]
[100.262266] *pde = 00000000
[100.288395] Oops: 0002 [#1] SMP
[100.305468] last sysfs file: /sys/devices/virtual/sound/timer/uevent
[100.325955] Modules linked in: hello(+) vmblock vsock vmmemctl vmhgfs acpiphp snd_ens1371 gameport snd_ac97_codec ac97_bus snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_dummy snd_seq_oss snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer snd_seq_device ppdev psmouse serio_raw fbcon tileblit font bitblit softcursor snd parport_pc soundcore snd_page_alloc vmci i2c_piix4 vga16fb vgastate intel_agp agpgart shpchp lp parport floppy pcnet32 mii mptspi mptscsih mptbase scsi_transport_spi vmxnet
[100.472178]
[100.494931] Pid: 1586, comm: insmod Not tainted (2.6.32-21-generic #32-Ubuntu) VMware Virtual Platform
[100.540018] EIP: 0060:[<f82d2005>] EFLAGS: 00010246 CPU: 0
[100.562844] EIP is at hello_init+0x5/0x11 [hello]
[100.584351] EAX: 00000000 EBX: fffffffc ECX: f82cf040 EDX: 00000001
[100.609358] ESI: f82cf040 EDI: 00000000 EBP: f1b9ff5c ESP: f1b9ff5c
[100.631467]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[100.657664] Process insmod (pid: 1586, ti=f1b9e000 task=f137b340 task.ti=f1b9e000)
[100.706083] Stack:
[100.731783]  f1b9ff88 c0101131 f82cf040 c076d240 fffffffc f82cf040 0072cff4 f82d2000
[100.759324] <0> fffffffc f82cf040 0072cff4 f1b9ffac c0182340 f19638f8 f137b340 f19638c0
[100.811396] <0> 00000004 09cc9018 09cc9018 00020000 f1b9e000 c01033ec 09cc9018 00015324
[100.891922] Call Trace:
[100.916257]  [<c0101131>] ? do_one_initcall+0x31/0x190
[100.943670]  [<f82d2000>] ? hello_init+0x0/0x11 [hello]
[100.970905]  [<c0182340>] ? sys_init_module+0xb0/0x210
[100.995542]  [<c01033ec>] ? syscall_call+0x7/0xb
[101.024087] Code: <c7> xx xx 5d c3 00 00 00 00 00 00 00 00 00 00
[101.079592] EIP: [<f82d2005>] hello_init+0x5/0x11 [hello] SS:ESP 0068:f1b9ff5c
[101.134682] CR2: 0000000000000000
[101.158929] ---[ end trace e294b69a66d752cb ]---
The oops first describes what kind of bug this is, and then points out where it occurred: "IP: [<f82d2005>] hello_init+0x5/0x11 [hello]".
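The location string reads as symbol+offset/size, followed by the module name in brackets. To make that concrete, here is a minimal user-space C sketch (not kernel code) that splits such a string into its parts. The struct and function names are made up for illustration, and the sketch assumes the "symbol+0xOFFSET/0xSIZE [module]" layout seen in this oops.

```c
#include <stdio.h>
#include <string.h>

/* Parsed form of an oops location such as "hello_init+0x5/0x11 [hello]". */
struct oops_location {
    char symbol[64];      /* function containing the faulting instruction */
    unsigned long offset; /* byte offset of the instruction in the function */
    unsigned long size;   /* total size of the function in bytes */
    char module[64];      /* module name, or "" when none is given */
};

/* Returns 0 on success, -1 if the text does not look like symbol+off/size. */
int parse_oops_location(const char *text, struct oops_location *loc)
{
    int fields;

    memset(loc, 0, sizeof(*loc));
    /* %lx accepts an optional 0x prefix; the module part is optional. */
    fields = sscanf(text, " %63[^+]+%lx/%lx [%63[^]]]",
                    loc->symbol, &loc->offset, &loc->size, loc->module);
    return fields >= 3 ? 0 : -1;
}
```

For "hello_init+0x5/0x11 [hello]" this yields the symbol hello_init, offset 0x5, size 0x11 and module hello. In other words, the faulting instruction sits 5 bytes into a 17-byte function.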
Here we need a helper tool, objdump, to analyze the problem. objdump can disassemble an object file; the command is:
objdump -S hello.o
Below is the disassembly of hello.o, interleaved with the C source (this requires the module to be built with debug information), which makes it very intuitive.
hello.o:     file format elf32-i386

Disassembly of section .init.text:

00000000 <init_module>:
#include <linux/kernel.h>
#include <linux/module.h>

static int __init hello_init(void)
{
   0:   55                      push   %ebp
    int *p = 0;

    *p = 1;

    return 0;
}
   1:   31 c0                   xor    %eax,%eax
#include <linux/kernel.h>
#include <linux/module.h>

static int __init hello_init(void)
{
   3:   89 e5                   mov    %esp,%ebp
    int *p = 0;

    *p = 1;
   5:   c7 05 00 00 00 00 01    movl   $0x1,0x0
   c:   00 00 00

    return 0;
}
   f:   5d                      pop    %ebp
  10:   c3                      ret

Disassembly of section .exit.text:

00000000 <cleanup_module>:

static void __exit hello_exit(void)
{
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   e8 fc ff ff ff          call   4 <cleanup_module+0x4>
    return;
}
   8:   5d                      pop    %ebp
   9:   c3                      ret
Comparing this with the oops message, we can see that the assembly instruction at hello_init+0x5 is:
   5:   c7 05 00 00 00 00 01    movl   $0x1,0x0
This instruction stores the value 1 at address 0, which is of course illegal.
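The machine code bytes tell the same story. As a small user-space C sketch (the helper names are hypothetical, and it recognizes only this one encoding on 32-bit x86), we can decode the ten bytes of the instruction: c7 05 is the opcode and ModRM byte for "store a 32-bit immediate at an absolute 32-bit address", followed by the address and then the value, both little-endian.

```c
#include <stddef.h>
#include <stdint.h>

/* Result of decoding "movl $imm32, disp32" (opcode c7, ModRM 05). */
struct movl_abs {
    uint32_t address; /* absolute destination address */
    uint32_t value;   /* 32-bit immediate being stored */
};

/* Reads a 32-bit little-endian value from four bytes. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Returns 0 and fills *out if the bytes encode that instruction, else -1. */
int decode_movl_abs(const uint8_t *code, size_t len, struct movl_abs *out)
{
    if (len < 10 || code[0] != 0xc7 || code[1] != 0x05)
        return -1;
    out->address = read_le32(code + 2); /* bytes 2..5: destination address */
    out->value = read_le32(code + 6);   /* bytes 6..9: immediate value */
    return 0;
}
```

For the bytes c7 05 00 00 00 00 01 00 00 00 from the listing, this gives address 0x0 and value 0x1: a write of 1 to address 0.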
We can also see that the corresponding C code is:
*p = 1;
Bingo! With the help of the oops, we located the problem very quickly.
Let's go back to the oops above and see whether the Linux kernel has left us any other useful information.
Oops: 0002 [#1]
Here, 0002 is the oops error code (a write fault that occurred in kernel space), and #1 means that this is the first time the error has occurred.
The error code differs depending on the cause of the error. For the example in this article, refer to the following definition (if the oops you encounter does not match it, it is best to look it up in the kernel source):
 * error_code:
 *     bit 0 == 0 means no page found, 1 means protection fault
 *     bit 1 == 0 means read, 1 means write
 *     bit 2 == 0 means kernel, 1 means user-mode
 *     bit 3 == 1 means use of reserved bit detected
 *     bit 4 == 1 means fault was an instruction fetch
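As a rough illustration of how these bits combine, the following user-space C sketch decodes the three lowest bits of an error code. The names are invented for this example, and the higher bits are left out on purpose because their meaning depends on the architecture and kernel version.

```c
#include <stdio.h>

/* The three lowest bits of the page-fault error code. */
struct pf_info {
    int protection_fault; /* bit 0: 0 = no page found, 1 = protection fault */
    int write;            /* bit 1: 0 = read, 1 = write */
    int user_mode;        /* bit 2: 0 = kernel mode, 1 = user mode */
};

/* Extracts bits 0..2 of the error code into separate fields. */
struct pf_info decode_oops_code(unsigned int code)
{
    struct pf_info info;

    info.protection_fault = (code >> 0) & 1;
    info.write = (code >> 1) & 1;
    info.user_mode = (code >> 2) & 1;
    return info;
}

/* Prints the error code followed by a readable summary of bits 0..2. */
void print_oops_code(unsigned int code)
{
    struct pf_info info = decode_oops_code(code);

    printf("Oops: %04x -> %s, %s, %s\n", code,
           info.protection_fault ? "protection fault" : "no page found",
           info.write ? "write" : "read",
           info.user_mode ? "user mode" : "kernel mode");
}
```

Calling print_oops_code(0x0002) prints "Oops: 0002 -> no page found, write, kernel mode", which matches the explanation of our example.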
Sometimes the oops also prints taint information, which indicates what has caused the kernel to be tainted (literally, "contaminated"). The flags are defined as follows:
1: 'G' if all modules loaded have a GPL or compatible license, 'P' if any proprietary module has been loaded. Modules without a MODULE_LICENSE or with a MODULE_LICENSE that is not recognised by insmod as GPL compatible are assumed to be proprietary.
2: 'F' if any module was force loaded by "insmod -f", ' ' if all modules were loaded normally.
3: 'S' if the oops occurred on an SMP kernel running on hardware that hasn't been certified as safe to run multiprocessor. Currently this occurs only on various Athlons that are not SMP capable.
4: 'R' if a module was force unloaded by "rmmod -f", ' ' if all modules were unloaded normally.
5: 'M' if any processor has reported a Machine Check Exception, ' ' if no Machine Check Exceptions have occurred.
6: 'B' if a page-release function has found a bad page reference or some unexpected page flags.
7: 'U' if a user or user application specifically requested that the Tainted flag be set, ' ' otherwise.
8: 'D' if the kernel has died recently, i.e. there was an OOPS or BUG.
9: 'A' if the ACPI table has been overridden.
10: 'W' if a warning has previously been issued by the kernel. (Though some warnings may set more specific taint flags.)
11: 'C' if a staging driver has been loaded.
12: 'I' if the kernel is working around a severe bug in the platform firmware (BIOS or similar).
This taint information is mainly intended for kernel developers. If a user encounters an oops while using Linux, they can send its content to the kernel developers for debugging, and from the taint information the developers can roughly determine the environment the kernel was running in when the oops occurred. If we are only debugging our own driver, this information is of little use.
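For quick reference while reading an oops, the flag list above can be turned into a small lookup table. This is only a user-space C sketch with made-up names, and it covers just the flags listed in this article; newer kernels define additional ones.

```c
#include <stdio.h>

struct taint_flag {
    char letter;
    const char *meaning;
};

/* Short summaries of the taint flags described in the list above. */
static const struct taint_flag taint_flags[] = {
    { 'G', "all loaded modules have a GPL or compatible license" },
    { 'P', "a proprietary module has been loaded" },
    { 'F', "a module was force loaded" },
    { 'S', "SMP kernel on hardware not certified for SMP" },
    { 'R', "a module was force unloaded" },
    { 'M', "a machine check exception was reported" },
    { 'B', "bad page reference or unexpected page flags" },
    { 'U', "taint requested by a user or user application" },
    { 'D', "the kernel has died recently (oops or BUG)" },
    { 'A', "the ACPI table has been overridden" },
    { 'W', "a warning was issued earlier" },
    { 'C', "a staging driver has been loaded" },
    { 'I', "working around a severe platform firmware bug" },
};

/* Returns the summary for one flag letter, or NULL if it is not in the table. */
const char *taint_meaning(char letter)
{
    size_t i;

    for (i = 0; i < sizeof(taint_flags) / sizeof(taint_flags[0]); i++)
        if (taint_flags[i].letter == letter)
            return taint_flags[i].meaning;
    return NULL;
}

/* Prints one line per recognized flag; returns how many were recognized. */
int explain_taint(const char *flags)
{
    int known = 0;

    for (; *flags != '\0'; flags++) {
        const char *meaning = taint_meaning(*flags);

        if (meaning != NULL) {
            printf("'%c': %s\n", *flags, meaning);
            known++;
        }
    }
    return known;
}
```

Passing the flag string from a tainted oops, for example "P  D", to explain_taint prints one line for each letter it knows and skips the blanks.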
The example in this article is very simple: the oops does not bring the system down, so we can read the complete message with dmesg. More often, though, the system goes down at the same time as the oops occurs. In that case the error messages cannot be written to a file in time, and after a power cycle they are gone. We can only record them in some other way, such as copying them by hand or taking a photo.
There are worse cases: if the oops message is too long to fit on one screen, how can we see all of it? The first method is to use the vga parameter in GRUB to select a higher resolution so that the screen can show more text; obviously, this only helps so much. The second method uses two machines: the oops output of the machine being debugged is sent over a serial port to the host's screen. However, most laptops today have no serial port, so this solution is also quite limited. The third method uses the kernel dump tool kdump to dump the contents of memory and the CPU registers to a file when the oops occurs, which we can then analyze with gdb.
The problems encountered in kernel driver development can be all sorts of strange, and there are just as many debugging methods. The oops is a hint that the Linux kernel gives us, and we should make good use of it.
#################################
Amendment:
2010-11-19: Oops error code
Reposted from: http://www.cnblogs.com/wwang/archive/2010/11/14/1876735.html
Oops of the Linux kernel