Oops of the Linux kernel

Source: Internet
Author: User

Original: http://www.cnblogs.com/wwang/archive/2010/11/14/1876735.html

What is an oops? Linguistically, "oops" is an interjection. When you have a small accident or do something embarrassing, you say "Oops"; the closest Chinese equivalent is "哎哟". "Oops, sorry, I didn't mean to break your cup." That is what oops means.

What is an oops in Linux kernel development? It is essentially the same thing, except that the speaker is now Linux. When a serious problem occurs, the kernel apologizes to us: "Oops, sorry, I messed things up." When the kernel hits such an error it prints oops information showing the current register state, the stack contents, and the full call trace, which helps us locate the error.

Let's look at an example. To keep the focus on the oops itself, the only thing this example does is dereference a null pointer.

    #include <linux/kernel.h>
    #include <linux/module.h>

    static int __init hello_init(void)
    {
        int *p = 0;

        *p = 1;

        return 0;
    }

    static void __exit hello_exit(void)
    {
        return;
    }

    module_init(hello_init);
    module_exit(hello_exit);
    MODULE_LICENSE("GPL");

The bug is obviously on line 8, the assignment *p = 1;.

Next we compile the module and load it into the kernel with insmod. As expected, an oops appears.

[  100.243737] BUG: unable to handle kernel NULL pointer dereference at (null)
[  100.244985] IP: [<f82d2005>] hello_init+0x5/0x11 [hello]
[  100.262266] *pde = 00000000
[  100.288395] Oops: 0002 [#1] SMP
[  100.305468] last sysfs file: /sys/devices/virtual/sound/timer/uevent
[  100.325955] Modules linked in: hello(+) vmblock vsock vmmemctl vmhgfs acpiphp snd_ens1371 gameport snd_ac97_codec ac97_bus snd_pcm_oss snd_mixer_oss snd_pcm snd_seq_dummy snd_seq_oss snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq snd_timer snd_seq_device ppdev psmouse serio_raw fbcon tileblit font bitblit softcursor snd parport_pc soundcore snd_page_alloc vmci i2c_piix4 vga16fb vgastate intel_agp agpgart shpchp lp parport floppy pcnet32 mii mptspi mptscsih mptbase scsi_transport_spi vmxnet
[  100.472178]
[  100.494931] Pid: 1586, comm: insmod Not tainted (2.6.32-21-generic #32-Ubuntu) VMware Virtual Platform
[  100.540018] EIP: 0060:[<f82d2005>] EFLAGS: 00010246 CPU: 0
[  100.562844] EIP is at hello_init+0x5/0x11 [hello]
[  100.584351] EAX: 00000000 EBX: fffffffc ECX: f82cf040 EDX: 00000001
[  100.609358] ESI: f82cf040 EDI: 00000000 EBP: f1b9ff5c ESP: f1b9ff5c
[  100.631467]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[  100.657664] Process insmod (pid: 1586, ti=f1b9e000 task=f137b340 task.ti=f1b9e000)
[  100.706083] Stack:
[  100.731783]  f1b9ff88 c0101131 f82cf040 c076d240 fffffffc f82cf040 0072cff4 f82d2000
[  100.759324] <0> fffffffc f82cf040 0072cff4 f1b9ffac c0182340 f19638f8 f137b340 f19638c0
[  100.811396] <0> 00000004 09cc9018 09cc9018 00020000 f1b9e000 c01033ec 09cc9018 00015324
[  100.891922] Call Trace:
[  100.916257]  [<c0101131>] ? do_one_initcall+0x31/0x190
[  100.943670]  [<f82d2000>] ? hello_init+0x0/0x11 [hello]
[  100.970905]  [<c0182340>] ? sys_init_module+0xb0/0x210
[  100.995542]  [<c01033ec>] ? syscall_call+0x7/0xb
[  101.024087] Code: <c7> xx xx 5d c3 00 00 00 00 00 00 00 00 00 00
[  101.079592] EIP: [<f82d2005>] hello_init+0x5/0x11 [hello] SS:ESP 0068:f1b9ff5c
[  101.134682] CR2: 0000000000000000
[  101.158929] ---[ end trace e294b69a66d752cb ]---

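The oops first describes what kind of bug this is and then points out where it occurred: "IP: [<f82d2005>] hello_init+0x5/0x11 [hello]". That is, the faulting instruction sits at offset 0x5 inside hello_init, a function 0x11 bytes long that belongs to the module hello.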

Here we need an auxiliary tool, objdump, to help analyze the problem. objdump can disassemble an object file; the command is:

    objdump -S hello.o

The following is the disassembly of hello.o. It is interleaved with the C source, which makes it easy to read.

    hello.o:     file format elf32-i386

    Disassembly of section .init.text:

    00000000 <init_module>:
    #include <linux/kernel.h>
    #include <linux/module.h>

    static int __init hello_init(void)
    {
       0:   55                      push   %ebp
        int *p = 0;

        *p = 1;

        return 0;
    }
       1:   31 c0                   xor    %eax,%eax
    #include <linux/kernel.h>
    #include <linux/module.h>

    static int __init hello_init(void)
    {
       3:   89 e5                   mov    %esp,%ebp
        int *p = 0;

        *p = 1;
       5:   c7 05 00 00 00 00 01    movl   $0x1,0x0
       c:   00 00 00

        return 0;
    }
       f:   5d                      pop    %ebp
      10:   c3                      ret

    Disassembly of section .exit.text:

    00000000 <cleanup_module>:

    static void __exit hello_exit(void)
    {
       0:   55                      push   %ebp
       1:   89 e5                   mov    %esp,%ebp
       3:   e8 fc ff ff ff          call   4 <cleanup_module+0x4>
        return;
    }
       8:   5d                      pop    %ebp
       9:   c3                      ret

Comparing this with the location given in the oops, we can see that the instruction at hello_init+0x5 is:

       5:   c7 05 00 00 00 00 01    movl   $0x1,0x0

This instruction stores the value 1 at address 0, which is of course illegal.

We can also see that the corresponding C code is:

        *p = 1;

Bingo! With the help of oops, we solved the problem very quickly.

Let's go back and check the oops above to see if the Linux kernel has left us any other useful information.

Oops: 0002 [#1]

Here, 0002 is the oops error code (a write fault that occurred in kernel space), and #1 means the error has occurred once.

The error code differs depending on the cause of the error. For the example in this article, the following definition applies (if an oops you encounter does not match it, it is best to look in the kernel source):

* error_code:
* bit 0 == 0 means no page found, 1 means protection fault
* bit 1 == 0 means read, 1 means write
* bit 2 == 0 means kernel, 1 means user-mode
* bit 3 == 0 means data, 1 means instruction

Sometimes an oops also prints taint information, which indicates why the kernel is considered tainted. The flags are defined as follows:

1: 'G' if all modules loaded have a GPL or compatible license, 'P' if any proprietary module has been loaded. Modules without a MODULE_LICENSE or with a MODULE_LICENSE that is not recognised by insmod as GPL compatible are assumed to be proprietary.
2: 'F' if any module was force loaded by "insmod -f", ' ' if all modules were loaded normally.
3: 'S' if the oops occurred on an SMP kernel running on hardware that hasn't been certified as safe to run multiprocessor. Currently this occurs only on various Athlons that are not SMP capable.
4: 'R' if a module was force unloaded by "rmmod -f", ' ' if all modules were unloaded normally.
5: 'M' if any processor has reported a Machine Check Exception, ' ' if no Machine Check Exceptions have occurred.
6: 'B' if a page-release function has found a bad page reference or some unexpected page flags.
7: 'U' if a user or user application specifically requested that the tainted flag be set, ' ' otherwise.
8: 'D' if the kernel has died recently, i.e. there was an OOPS or BUG.
9: 'A' if the ACPI table has been overridden.
10: 'W' if a warning has previously been issued by the kernel. (Though some warnings may set more specific taint flags.)
11: 'C' if a staging driver has been loaded.
12: 'I' if the kernel is working around a severe bug in the platform firmware (BIOS or similar).

This taint information is mainly meant for kernel developers. Users who hit an oops can send its contents to the kernel developers for debugging, and from the taint flags the developers can roughly tell what environment the kernel was running in when the oops occurred. If we are only debugging our own driver, this information is of little use.

The example in this article is very simple: the oops does not bring the system down, so we can read the complete output from dmesg. More often, though, the system goes down together with the oops. The error messages are then never written to a file, and they are gone once the power is turned off. We can only record them some other way: copying them by hand or taking a photo.

There are worse cases. If the oops output is too long to fit on one screen, how can we see all of it? The first method is to use the vga= parameter in GRUB to select a higher resolution so that the screen shows more lines; obviously this only helps so much. The second method uses two machines: the oops output of the machine being debugged is sent over a serial port to the host's screen. Most laptops today have no serial port, however, so this approach also has its limits. The third method uses the kernel dump tool kdump to dump the contents of memory and the CPU registers to a file when the oops occurs, which we can then analyze with gdb.

The problems you meet while developing kernel drivers are strange and varied, and so are the debugging methods. The oops is a hint the Linux kernel gives us, and we should make good use of it.

==========================================================================

Kernel Source Learning: out-of-bounds access
An out-of-bounds access uses an invalid address, so the mapping fails and a page fault exception is raised.
The main body of the page fault handler is do_page_fault(), which proceeds as follows:
1. Check whether the address lies beyond the process's virtual address space.
2. Check whether the exception is unrelated to the current process.
3. Check whether the address falls in system space.
4. Check whether the address falls inside a region that is already mapped.
5. Check whether the region above the hole is the stack region.
6. Check whether the address is right next to where the stack pointer points.
7. Send a forced SIGSEGV signal to the process.

Each time the kernel is about to return from an interrupt or exception, it checks whether the current process has signals waiting to be handled. What happens next depends on the nature of those pending signals and on the choices the process itself has made. For SIGSEGV, the usual outcome is that "Segmentation fault" is shown on the process's terminal and the process is terminated.

arch/i386/mm/fault.c (lines beginning with // are annotations, not part of the kernel source):

    /*
     * This routine handles page faults.  It determines the address,
     * and the problem, and then passes it off to one of the appropriate
     * routines.
     *
     * error_code:
     *      bit 0 == 0 means no page found, 1 means protection fault
     *      bit 1 == 0 means read, 1 means write
     *      bit 2 == 0 means kernel, 1 means user-mode
     */
    asmlinkage void do_page_fault(struct pt_regs *regs, unsigned long error_code)
    // regs points to a copy of the CPU registers as they were just before the
    // exception, the "scene" saved by the kernel's interrupt response mechanism.
    // error_code indicates the specific cause of the mapping failure.
    {
        struct task_struct *tsk;
        struct mm_struct *mm;
        struct vm_area_struct * vma;
        unsigned long address;
        unsigned long page;
        unsigned long fixup;
        int write;
        siginfo_t info;

        /* get the address */
        __asm__("movl %%cr2,%0":"=r" (address));
        // When an i386 CPU raises a page fault exception, it places the linear
        // address whose mapping failed in control register CR2.

        tsk = current;

        /*
         * We fault-in kernel-space virtual memory on-demand. The
         * 'reference' page table is init_mm.pgd.
         *
         * NOTE! We MUST NOT take any locks for this case. We may
         * be in an interrupt or a critical region, and should
         * only copy the information from the master page table,
         * nothing more.
         */
        if (address >= TASK_SIZE)
            // Check whether the address lies beyond the process's virtual
            // address space.
            goto vmalloc_fault;

        mm = tsk->mm;
        info.si_code = SEGV_MAPERR;

        /*
         * If we're in an interrupt or have no user
         * context, we must not take the fault..
         */
        if (in_interrupt() || !mm)
            // A non-zero in_interrupt() means the mapping failure happened in
            // an interrupt service routine. A null mm pointer means the
            // process's mappings have not been set up yet.
            goto no_context;

        down(&mm->mmap_sem);

        vma = find_vma(mm, address);
        // find_vma() looks for the first region in the user's virtual address
        // space whose end address is greater than the given address.
        if (!vma)
            // If there is none, the address is above the stack, i.e. above
            // 3 GB, which belongs to system space.
            goto bad_area;
        if (vma->vm_start <= address)
            // Check whether the address falls inside a region that is
            // already mapped.
            goto good_area;
        if (!(vma->vm_flags & VM_GROWSDOWN))
            // Check whether the region above the hole is a stack region,
            // i.e. has the VM_GROWSDOWN flag set.
            goto bad_area;
        if (error_code & 4) {
            /*
             * accessing the stack below %esp is always a bug.
             * The "+ 32" is there due to some instructions (like
             * pusha) doing post-decrement on the stack and that
             * doesn't show up until later..
             */
            if (address + 32 < regs->esp)
                goto bad_area;
        }
        if (expand_stack(vma, address))
            // Extend the user stack region and check whether the extension
            // succeeded.
            goto bad_area;


    /*
     * Something tried to access memory that isn't in our memory map..
     * Fix it, but check if it's kernel or user first..
     */
    bad_area:
        up(&mm->mmap_sem);

    bad_area_nosemaphore:
        /* User mode accesses just cause a SIGSEGV */
        if (error_code & 4) {
            tsk->thread.cr2 = address;
            tsk->thread.error_code = error_code;
            tsk->thread.trap_no = 14;
            info.si_signo = SIGSEGV;
            info.si_errno = 0;
            /* info.si_code has been set above */
            info.si_addr = (void *)address;
            force_sig_info(SIGSEGV, &info, tsk);
            return;
        }
