Most bugs show up as NULL pointer dereferences or as the use of other incorrect pointer values. Such bugs usually produce an oops message.
Almost every address used by the processor is a virtual address, which is mapped to a physical address through a complex structure of page tables (the exception being physical addresses used by the memory management subsystem itself). When an invalid pointer is dereferenced, the paging mechanism cannot map the pointer to a physical address, and the processor signals a page fault to the operating system. If the address is not valid, the kernel cannot "page in" the missing address; it (usually) generates an oops if this happens while the processor is in supervisor mode.
An oops displays the processor status at the time of the fault, including the contents of the CPU registers and other seemingly incomprehensible information. The message is generated by printk statements in the fault handler (arch/*/kernel/traps.c) and is dispatched as described in the earlier "printk" section.
Let's look at one such message. This is what results from dereferencing a NULL pointer on a PC running version 2.6 of the kernel. The most relevant piece of information here is the instruction pointer (EIP), the address of the faulty instruction.
Unable to handle kernel NULL pointer dereference at virtual address 00000000
 printing eip:
d083a064
Oops: 0002 [#1]
SMP
CPU:    0
EIP:    0060:[<d083a064>]    Not tainted
EFLAGS: 00010246   (2.6.6)
EIP is at faulty_write+0x4/0x10 [faulty]
eax: 00000000   ebx: 00000000   ecx: 00000000   edx: 00000000
esi: cf8b2460   edi: cf8b2480   ebp: 00000005   esp: c31c5f74
ds: 007b   es: 007b   ss: 0068
Process bash (pid: 2086, threadinfo=c31c4000 task=cfa0a6c0)
Stack: c0150558 cf8b2460 080e9408 00000005 cf8b2480 00000000 cf8b2460 cf8b2460
       fffffff7 080e9408 c31c4000 c0150682 cf8b2460 080e9408 00000005 cf8b2480
       00000000 00000001 00000005 c0103f8f 00000001 080e9408 00000005 00000005
Call Trace:
 [<c0150558>] vfs_write+0xb8/0x130
 [<c0150682>] sys_write+0x42/0x70
 [<c0103f8f>] syscall_call+0x7/0xb
Code: 89 15 00 00 00 00 c3 90 8d 74 26 00 83 ec 0c b8 00 a6 83 d0
This message was generated by writing to a device owned by the faulty module, a module built deliberately to demonstrate failures. The implementation of the write method in faulty.c is trivial:
ssize_t faulty_write (struct file *filp, const char __user *buf, size_t count,
        loff_t *pos)
{
    /* make a simple fault by dereferencing a NULL pointer */
    *(int *)0 = 0;
    return 0;
}
As you can see, all we do here is dereference a NULL pointer. Because 0 is never a valid pointer value, a fault occurs, which the kernel turns into the oops message shown earlier. The calling process is then killed.
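The `Oops: 0002` field in the message above is the page-fault error code. The following user-space sketch decodes it, assuming the x86 page-fault error code layout (bit 0: protection violation versus page not present, bit 1: write versus read, bit 2: user versus kernel mode). That layout comes from the architecture, not from this text, and the names `oops_code_info`, `decode_oops_code`, and `describe_oops_code` are hypothetical helpers introduced only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper type: the low three bits of an x86 page-fault
 * error code, as printed after "Oops:" (assumed layout, see above). */
struct oops_code_info {
    bool protection_fault; /* bit 0: 1 = protection violation, 0 = page not present */
    bool write_access;     /* bit 1: 1 = write access, 0 = read access */
    bool user_mode;        /* bit 2: 1 = fault in user mode, 0 = kernel mode */
};

/* Split the numeric error code into its three flag bits. */
struct oops_code_info decode_oops_code(unsigned int code)
{
    struct oops_code_info info;

    info.protection_fault = (code & 0x1) != 0;
    info.write_access = (code & 0x2) != 0;
    info.user_mode = (code & 0x4) != 0;
    return info;
}

/* Write a short human-readable summary into buf; returns snprintf's result. */
int describe_oops_code(unsigned int code, char *buf, size_t len)
{
    struct oops_code_info info = decode_oops_code(code);

    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "%s, %s access, %s mode",
                    info.protection_fault ? "protection violation"
                                          : "page not present",
                    info.write_access ? "write" : "read",
                    info.user_mode ? "user" : "kernel");
}
```

Under that assumption, the code 0002 decodes to a write to a page that is not present, performed in kernel mode, which matches the store through a NULL pointer in faulty_write.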
The faulty module has a different fault condition in its read implementation:
ssize_t faulty_read(struct file *filp, char __user *buf,
                    size_t count, loff_t *pos)
{
    int ret;
    char stack_buf[4];
    /* Let's try a buffer overflow */
    memset(stack_buf, 0xff, 20);
    if (count > 4)
        count = 4; /* copy 4 bytes to the user */
    ret = copy_to_user(buf, stack_buf, count);
    if (!ret)
        return count;
    return ret;
}
This method copies a string into a local variable; unfortunately, the string is longer than the destination array. The resulting buffer overflow causes an oops when the function returns. Because the return instruction sends the instruction pointer off to nowhere, this kind of fault is much harder to trace, and you can get something like the following:
EIP:    0010:[<00000000>]
Unable to handle kernel paging request at virtual address ffffffff
 printing eip:
ffffffff
Oops: 0000 [#5]
SMP
CPU:    0
EIP:    0060:[<ffffffff>]    Not tainted
EFLAGS: 00010296   (2.6.6)
EIP is at 0xffffffff
eax: 0000000c   ebx: ffffffff   ecx: 00000000   edx: bfffda7c
esi: cf434f00   edi: ffffffff   ebp: 00002000   esp: c27fff78
ds: 007b   es: 007b   ss: 0068
Process head (pid: 2331, threadinfo=c27fe000 task=c3226150)
Stack: ffffffff bfffda70 00002000 cf434f20 00000001 00000286 cf434f00 fffffff7
       bfffda70 c27fe000 c0150612 cf434f00 bfffda70 00002000 cf434f20 00000000
       00000003 00002000 c0103f8f 00000003 bfffda70 00002000 00002000 bfffda70
Call Trace:
 [<c0150612>] sys_read+0x42/0x70
 [<c0103f8f>] syscall_call+0x7/0xb
Code:  Bad EIP value.
In this case, we see only part of the call stack (vfs_read and faulty_read are missing), and the kernel complains about a "bad EIP value." That complaint and the offending address (ffffffff) listed at the beginning both suggest that the kernel stack has been corrupted.
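The corruption comes from asking memset to fill 20 bytes of a 4-byte array. For contrast, here is a minimal user-space sketch of a fill that is clamped to the size of its destination. `bounded_fill` is a hypothetical helper, not part of the faulty module or of any kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Fill at most dst_size bytes of dst with the given byte value, even if
 * the caller asks for more. Returns the number of bytes actually written.
 */
size_t bounded_fill(void *dst, size_t dst_size, int byte, size_t requested)
{
    size_t n = requested < dst_size ? requested : dst_size;

    assert(dst != NULL);
    memset(dst, byte, n);
    return n;
}
```

Called as `bounded_fill(stack_buf, sizeof(stack_buf), 0xff, 20)`, this writes only 4 bytes, so the saved return address that follows the array on the stack is left alone.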
In general, when you are confronted with an oops, the first thing to do is look at the location where the problem occurred, which is usually listed separately from the call stack. In the first oops shown above, the relevant line is:
EIP is at faulty_write+0x4/0x10 [faulty]
Here we see that we were in the function faulty_write, which is located in the faulty module (listed in square brackets). The hexadecimal numbers indicate that the instruction pointer was 4 bytes into the function, which appears to be 10 (hexadecimal) bytes long. Often this is enough to figure out what the problem is.
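To make the layout of that line concrete, the following user-space sketch splits it into symbol, offset, function size, and module. It assumes exactly the `symbol+0xOFFSET/0xSIZE [module]` shape shown above; `oops_location` and `parse_eip_line` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the fields of an "EIP is at" line. */
struct oops_location {
    char symbol[64];      /* function name, e.g. faulty_write */
    unsigned long offset; /* bytes into the function */
    unsigned long size;   /* apparent length of the function */
    char module[64];      /* module name; empty if none was listed */
};

/*
 * Parse a line such as "EIP is at faulty_write+0x4/0x10 [faulty]".
 * Returns 0 on success, -1 if the line carries no symbol+offset/size
 * part (for example "EIP is at 0xffffffff").
 */
int parse_eip_line(const char *line, struct oops_location *loc)
{
    int fields;

    assert(line != NULL && loc != NULL);
    memset(loc, 0, sizeof(*loc));
    fields = sscanf(line, "EIP is at %63[^+]+0x%lx/0x%lx [%63[^]]]",
                    loc->symbol, &loc->offset, &loc->size, loc->module);
    /* The module in brackets is optional; the first three fields are not. */
    return fields >= 3 ? 0 : -1;
}
```

For the line from the first oops, this yields the symbol faulty_write, offset 0x4, size 0x10, and module faulty. The line from the second oops has no symbol at all, so the parse fails, which is itself a hint that the instruction pointer was no longer inside any known function.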
If you need more information, the call stack shows how you got to the point where things fell apart. The stack itself is printed in hexadecimal form; with a bit of work, you can often determine the values of local variables and function parameters from the stack listing. Experienced kernel developers can benefit from a certain amount of pattern recognition here. For example, consider the stack listing from the faulty_read oops:
Stack: ffffffff bfffda70 00002000 cf434f20 00000001 00000286 cf434f00 fffffff7
       bfffda70 c27fe000 c0150612 cf434f00 bfffda70 00002000 cf434f20 00000000
       00000003 00002000 c0103f8f 00000003 bfffda70 00002000 00002000 bfffda70
The ffffffff at the top of the stack is part of the string that broke things. On x86 systems, by default, the user-space stack starts just below 0xc0000000, so the recurring value 0xbfffda70 is probably a user-space stack address. It is, in fact, the address of the buffer passed to the read system call, replicated each time it is passed down the kernel call chain. On x86 (again, by default), kernel space starts at 0xc0000000, so values above that are almost certainly kernel-space addresses, and so on.
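This kind of pattern recognition can be written down as a coarse heuristic. The user-space sketch below assumes the default 32-bit x86 split at 0xc0000000 described above; the cutoff for "small" values and all of the names (`classify_stack_word`, `count_word`, `word_kind`) are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Default 32-bit x86 split described in the text: kernel space starts here. */
#define KERNEL_SPACE_START 0xc0000000u
/* Assumption for this sketch: words below this limit are treated as plain
 * integers (counts, flags, descriptors) rather than addresses. */
#define SMALL_VALUE_LIMIT 0x00010000u

enum word_kind {
    WORD_SMALL_VALUE,  /* probably a count or a flag */
    WORD_USER_ADDR,    /* below 0xc0000000: probably user space */
    WORD_KERNEL_ADDR,  /* at or above 0xc0000000: probably kernel space */
    WORD_FILL_PATTERN  /* 0xffffffff, the 0xff bytes written by faulty_read */
};

/* Guess what a single word from the stack listing represents. */
enum word_kind classify_stack_word(uint32_t word)
{
    if (word == 0xffffffffu)
        return WORD_FILL_PATTERN;
    if (word < SMALL_VALUE_LIMIT)
        return WORD_SMALL_VALUE;
    if (word < KERNEL_SPACE_START)
        return WORD_USER_ADDR;
    return WORD_KERNEL_ADDR;
}

/* Count how often a value recurs in the listing. */
size_t count_word(const uint32_t *stack, size_t n, uint32_t value)
{
    size_t i, hits = 0;

    assert(stack != NULL);
    for (i = 0; i < n; i++)
        if (stack[i] == value)
            hits++;
    return hits;
}
```

Applied to the listing above, 0xbfffda70 is classified as a likely user-space address and turns up five times, which fits a buffer pointer handed down the call chain. The classification is only a guess: a word such as fffffff7 lands in the kernel range even though it may just be a small negative number.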
Finally, when looking at oops listings, always be on the lookout for the "slab poisoning" values discussed earlier in this chapter. For example, if you get a kernel oops where the offending address is 0xa5a5a5a5, you are almost certainly forgetting to initialize dynamic memory somewhere.
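A check for that value is easy to automate when sifting through register and stack dumps. The sketch below is a user-space illustration that takes 0xa5a5a5a5 from the paragraph above as the poison word; the helper names and the idea of allowing a small offset (for a field accessed through a poisoned pointer) are assumptions of this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Poison word as it appears in the offending address mentioned in the text. */
#define SLAB_POISON_WORD 0xa5a5a5a5u

/* True if the value is exactly the poison word. */
bool is_poison_word(uint32_t value)
{
    return value == SLAB_POISON_WORD;
}

/*
 * True if the value is the poison word plus a small offset, as happens
 * when a structure member is accessed through a poisoned pointer.
 */
bool near_poison_word(uint32_t value, uint32_t max_offset)
{
    return value >= SLAB_POISON_WORD &&
           value - SLAB_POISON_WORD <= max_offset;
}

/* Return the index of the first poison word in a dump, or -1 if none. */
long find_poison_word(const uint32_t *words, size_t n)
{
    size_t i;

    assert(words != NULL);
    for (i = 0; i < n; i++)
        if (is_poison_word(words[i]))
            return (long)i;
    return -1;
}
```

A faulting address such as 0xa5a5a5b1 would pass `near_poison_word` with a modest `max_offset`, pointing at the same root cause: a pointer read from memory that was allocated but never initialized.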
Note that you see a symbolic call stack (as shown above) only if your kernel is built with the CONFIG_KALLSYMS option enabled. Otherwise, you see a bare hexadecimal listing, which is far less useful until you have decoded it in other ways.
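Decoding a bare address by hand amounts to finding the symbol with the highest start address that does not exceed it. The user-space sketch below illustrates this with a tiny table whose start addresses are derived from the two call traces above (for example, c0150558 minus 0xb8 gives c01504a0 for vfs_write). In practice the table would come from a symbol map of the running kernel; `ksym`, `lookup_symbol`, and `format_symbol` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct ksym {
    uint32_t addr;    /* start address of the function */
    const char *name;
};

/* Start addresses derived from the call traces shown earlier
 * (trace address minus offset); must stay sorted by address. */
static const struct ksym symtab[] = {
    { 0xc0103f88u, "syscall_call" },
    { 0xc01504a0u, "vfs_write" },
    { 0xc01505d0u, "sys_read" },
    { 0xc0150640u, "sys_write" },
};

/* Return the last symbol starting at or below addr, or NULL if none.
 * Note: addresses past the end of the last function still match it. */
const struct ksym *lookup_symbol(uint32_t addr)
{
    const struct ksym *best = NULL;
    size_t i;

    for (i = 0; i < sizeof(symtab) / sizeof(symtab[0]); i++) {
        if (symtab[i].addr > addr)
            break;
        best = &symtab[i];
    }
    return best;
}

/* Format addr as "symbol+0xoffset", or as plain hex if nothing matches. */
int format_symbol(uint32_t addr, char *buf, size_t len)
{
    const struct ksym *sym = lookup_symbol(addr);

    assert(buf != NULL && len > 0);
    if (sym == NULL)
        return snprintf(buf, len, "0x%08x", (unsigned int)addr);
    return snprintf(buf, len, "%s+0x%x", sym->name,
                    (unsigned int)(addr - sym->addr));
}
```

With this table, the bare stack word c0150558 formats as vfs_write+0xb8, the same entry the symbolic call trace prints.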
4.5.2. System Hangs
Although most bugs in kernel code end up as oops messages, sometimes they can hang the system completely. If the system hangs, no message is printed. For example, if the code enters an endless loop, the kernel stops scheduling,[15] and the system does not respond to any action, including the magic Ctrl-Alt-Del combination. You have two choices for dealing with system hangs: either prevent them beforehand or be able to debug them after the fact.
You can prevent an endless loop by inserting schedule invocations at strategic points. The schedule call (as you might guess) invokes the scheduler and therefore allows other processes to steal CPU time from the current process. If a process is looping in kernel space because of a bug in your driver, the schedule calls enable you to kill the process after tracing what is happening.
You should be aware, of course, that any call to schedule may create an additional source of reentrant calls to your driver, because it allows other processes to run. This reentrancy should not normally be a problem, assuming you have used suitable locking in your driver. Be sure, however, never to call schedule while your driver is holding a spinlock.
If your driver really hangs the system and you do not know where to insert schedule calls, the best way to go may be to add some print messages and write them to the console (by changing the console_loglevel value if need be).
Sometimes the system may appear to be hung when it is not. This can happen, for example, if the keyboard remains locked in some strange way. These false hangs can be detected by looking at the output of a program you keep running for just this purpose. A clock or system load meter on your display is a good status monitor; as long as it continues to update, the scheduler is working.
An indispensable tool for many lockups is the "magic SysRq key," which is available on most architectures. Magic SysRq is invoked with the combination of the Alt and SysRq keys on the PC keyboard, or with other special keys on other platforms (see Documentation/sysrq.txt for details), and it is available on the serial console as well. A third key, pressed along with these two, performs one of a number of useful actions:
r Turns off keyboard raw mode; useful when a crashed application (such as the X server) may have left your keyboard in a strange state.
k Invokes the "secure attention key" (SAK) function. SAK kills all processes running on the current console, leaving you with a clean terminal.
s Performs an emergency synchronization of all disks.
u Umount. Attempts to remount all disks in read-only mode. This operation, usually invoked immediately after s, can save a lot of filesystem checking time when the system is in serious trouble.
b Boot. Immediately reboots the system. Be sure to synchronize and remount the disks first.
p Prints processor register information.
t Prints the current task list.
m Prints memory information.
Other magic SysRq functions exist; see sysrq.txt in the Documentation directory of the kernel source for the full list. Note that magic SysRq must be explicitly enabled in the kernel configuration and that most distributions do not enable it, for obvious security reasons. For a system used to develop drivers, however, enabling magic SysRq is worth the trouble of building a new kernel in itself. Magic SysRq may be disabled at runtime with a command such as the following:
echo 0 > /proc/sys/kernel/sysrq
You should consider disabling it if unprivileged users can reach your system keyboard, to prevent accidental or deliberate damage. Some previous kernel versions had sysrq disabled by default, so you needed to enable it at runtime by writing 1 to that same /proc/sys file.
The sysrq operations are exceedingly useful, so they have been made available to system administrators who cannot reach the console. The file /proc/sysrq-trigger is a write-only entry point, where you can trigger a specific sysrq action by writing the associated command character; you can then collect any output from the kernel logs. This entry point to sysrq always works, even if sysrq is disabled on the console.
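Writing a command character to that file can also be done from a small program. The sketch below is a minimal user-space helper that takes the path as a parameter, so it can be tried against an ordinary file first. `sysrq_trigger` is a hypothetical name, and writing to the real entry point requires root privileges.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write a single sysrq command character (for example 't' for the task
 * list) to the given entry point. Returns 0 on success, -1 on failure.
 */
int sysrq_trigger(const char *path, char command)
{
    FILE *f;
    int ok;

    assert(path != NULL);
    f = fopen(path, "w");
    if (f == NULL)
        return -1;
    ok = (fputc(command, f) != EOF);
    /* The data reaches the file on close, so its result counts too. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

A call such as `sysrq_trigger("/proc/sysrq-trigger", 't')` would request the task list, which can then be read from the kernel log.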
If you are experiencing a "live hang," in which your driver is stuck in a loop but the system as a whole is still functioning, there are a couple of techniques worth knowing. Often, the SysRq p function points directly at the guilty routine. Failing that, you can use the kernel profiling function. Build a kernel with profiling enabled, and boot it with profile=2 on the command line. Reset the profile counters with the readprofile utility, then send your driver into its loop. After a little while, use readprofile again to see where the kernel is spending its time. A more advanced alternative is oprofile, which you may consider as well. The file Documentation/basic_profiling.txt tells you everything you need to know to get started with the profilers.
One precaution worth taking when chasing system hangs is to mount all your disks read-only (or unmount them). If the disks are read-only or unmounted, there is no risk of damaging the filesystem or leaving it in an inconsistent state. Another possibility is to use a computer that mounts all of its filesystems via NFS, the network file system. The "NFS-Root" capability must be enabled in the kernel, and special parameters must be passed at boot time. In this case, you avoid filesystem corruption without even resorting to SysRq, because filesystem coherence is managed by the NFS server, which is not brought down by your device driver.
From http://oss.org.cn/kernel-book/ldd3/ch04s05.html