Linux Kernel Analysis: Chapter 18, Debugging

Last Update: 2016-04-02 Source: Internet

Author: User

Tags: repetition, syslog, system log


Chapter 18 Debugging

18.1 Getting Started

1. In user-space programs, bugs usually manifest in obvious ways; in the kernel, the symptoms are often much less clear.

2. Debugging kernel code is therefore far more difficult than debugging user-space code.

3. Before you start, you need:

(1) A bug

(2) A kernel version that contains the bug

(3) Knowledge of the relevant kernel code, and some luck

18.2 Bugs in the Kernel

1. Kernel bugs come in many varieties.

2. Dereferencing a NULL pointer produces an oops, and garbage data can bring down the whole system.

3. Timing constraints and race conditions arise because the kernel allows multiple threads to run concurrently.

18.3 Debugging by Printing

I. Robustness

1. printk() is robust: it can be called at any time, from anywhere.

2. Specifically, printk() can be called:

(1) From interrupt context as well as from process context

(2) While any lock is held

(3) Simultaneously on multiple processors, without the caller having to take a lock

3. The only exception is very early in the boot process, before the console has been initialized; apart from that, you can assume printk() works in any circumstance.

II. Log Levels

1. The main difference between printk() and printf() is that printk() lets you specify a log level, which the kernel uses to decide whether to print the message on the console. The kernel prints every message whose level is numerically lower (that is, more important) than the console log level.

2. KERN_WARNING and KERN_DEBUG are simple macros defined in <linux/kernel.h>. The kernel compares a message's level with the console log level, console_loglevel, to decide whether to print it on the console.

3. If no level is given, the default is KERN_WARNING.

4. The most important level, KERN_EMERG, is defined as "<0>", and the least important, KERN_DEBUG, as "<7>".

5. There are two ways to assign levels to debug messages:

(1) Leave the console's default log level unchanged and give all debug messages a high-priority level such as KERN_CRIT.

(2) Give all debug messages the level KERN_DEBUG and raise the console's log level so that they are displayed.

III. The Log Buffer

1. Kernel messages are stored in a ring buffer of size LOG_BUF_LEN. On uniprocessor systems its default size is 16 KB.

2. Advantages: the buffer is easy to use even from interrupt context, and it keeps the log easy to maintain.

Disadvantage: messages may be lost, because once the buffer is full, new messages overwrite the oldest ones.

IV. syslogd and klogd

1. The user-space daemon klogd retrieves kernel messages from the log buffer and passes them to the syslogd daemon, which saves them in the system log file.

2. klogd can read these messages either from /proc/kmsg or through the syslog() system call; by default it uses /proc. In either case, klogd blocks until a new kernel message is available to read. When it wakes up, its default action is to pass the message to syslogd. The -c flag changes the console log level.

3. syslogd appends every message it receives to a file, /var/log/messages by default. A different file can be specified in the /etc/syslog.conf configuration file.

V. Moving from printf() to printk()

1. At first you will probably type printf() in kernel code by mistake; a few repetitions of that mistake will quickly build the new habit.

18.4 Oops

1. An oops is the most common way the kernel tells the user that something bad has happened.

2. The kernel cannot easily repair itself, nor can it simply kill itself the way a misbehaving user process can be killed. Instead it issues an oops, which involves:

(1) Printing an error message to the console

(2) Dumping the contents of the registers

(3) Printing a backtrace that can be used to trace the problem

3. After an oops, the kernel is usually left in an unstable state.

4. What happens next depends on where the oops occurred:

(1) In interrupt context: the kernel cannot continue and panics.

(2) In the idle task (PID 0) or the init task (PID 1): the system panics as well, because the kernel cannot continue without these processes.

(3) In any other process: the kernel kills that process and tries to continue executing.

5. An oops can have many causes, such as an out-of-bounds memory access or an illegal instruction.

6. The most important information in an oops is the register context and the backtrace:

(1) The backtrace shows the chain of function calls that led to the error.

(2) The register context is also useful, for example to help reconstruct the situation that caused the problem.

I. ksymoops

1. To be useful, the addresses in the backtrace must be translated into symbol names. This is done with the ksymoops command, which needs the System.map file generated when the kernel was compiled; if you use modules, it also needs some module information. It is usually invoked as:

ksymoops saved_oops.txt

II. kallsyms

1. The kallsyms feature is enabled by the CONFIG_KALLSYMS configuration option. It stores symbol names in the kernel image, so the kernel can decode backtraces itself without ksymoops.

2. CONFIG_KALLSYMS_ALL stores the names of all symbols, not just function names.

3. CONFIG_KALLSYMS_EXTRA_PASS makes the kernel build process take an extra pass over the kernel's object code.

18.5 Kernel Debugging Options

1. The debugging options live under the Kernel hacking menu of the kernel configuration editor, and they all depend on CONFIG_DEBUG_KERNEL.

2. Sleeping while holding a lock is a common culprit behind deadlocks.

18.6 Asserting Bugs and Dumping Information

1. The most common calls are BUG() and BUG_ON().

2. Calling them triggers an oops, which produces a stack backtrace and an error message. They can serve as assertions that a condition should never occur:

if (bad_thing)
        BUG();

Or:

BUG_ON(bad_thing);

3. Calling panic() not only prints an error message but also halts the entire system.

4. dump_stack() just prints the register contents and a function backtrace to the console.

18.7 The Magic SysRq Key

1. This feature is enabled by the CONFIG_MAGIC_SYSRQ configuration option. The SysRq (system request) key is a standard key on most keyboards.

2. When the feature is enabled, special key combinations let you communicate with the kernel regardless of what else it is doing.

3. Besides the configuration option, a sysctl turns the feature on and off. To enable it:

echo 1 > /proc/sys/kernel/sysrq

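4. Some SysRq commands:

SysRq-b: reboot the machine immediately
SysRq-e: send SIGTERM to all processes except init
SysRq-h: display help
SysRq-i: send SIGKILL to all processes except init
SysRq-m: dump memory information to the console
SysRq-p: dump the registers to the console
SysRq-s: sync all mounted filesystems
SysRq-t: dump the task list to the console
SysRq-u: remount all filesystems read-only

5. Pressing SysRq-s, SysRq-u, and SysRq-b in sequence can restart a dying system more safely than simply pressing the machine's reset button.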

18.8 The Saga of a Kernel Debugger

I. gdb

1. You can use the standard GNU debugger to inspect a running kernel. Starting the debugger on the kernel is much the same as starting it on a process:

gdb vmlinux /proc/kcore

2. vmlinux is the uncompressed kernel image, not the compressed zImage or bzImage; it is located in the root of the source tree after the kernel is built.

3. The /proc/kcore argument acts as the core file and gives gdb access to the memory of the running kernel. Only the superuser can read this file.

4. You can use any of gdb's commands for reading information. For example, to print the value of a variable:

p global_variable

To disassemble a function:

disassemble function

If the kernel was compiled with the -g flag, gdb can provide even more information.

5. Limitations:

(1) There is no way to modify kernel data.

(2) You cannot single-step through kernel code or set breakpoints.

II. kgdb

1. kgdb is a patch that lets you use all of gdb's features to debug the kernel remotely over a serial line.

2. It requires two computers: the first runs a kernel patched with kgdb, and the second uses gdb to debug the first over the serial line.

3. With kgdb, all of gdb's features are available: reading and modifying variables, setting breakpoints, setting watches on variables, single-stepping, and so on.

18.9 Probing the System

I. Using the UID as a Condition

1. When developing a new feature, it is usually safe to keep the original algorithm intact and add the new algorithm alongside it.

2. The user ID (UID) can serve as the condition that selects which algorithm runs. For example:

if (current->uid != 7777) {
        /* old algorithm ... */
} else {
        /* new algorithm ... */
}

That is, every user except UID 7777 runs the old algorithm, so the user with UID 7777 can be dedicated to testing the new one.

II. Using Condition Variables

1. If the code is not tied to a process, or you want a mechanism that controls a feature in every situation, you can use a condition variable, which here simply means a global flag.

2. This is even simpler than using the UID: create a global variable to act as the switch. If the variable is zero, take one branch; otherwise, take the other.

3. The variable can be controlled through some kind of interface, or from a debugger.

III. Using Statistics

1. To get a feel for how often a particular event occurs, or to compare several events, collect statistics and provide some mechanism for reading the results.

2. Define two global variables:

unsigned long foo_stat = 0;

unsigned long bar_stat = 0;

Whenever an event occurs, increment the corresponding variable, and then export the values wherever seems appropriate, for example through a file in /proc or a new system call. The simplest approach is to read them directly with a debugger.

3. Note that this implementation is not SMP-safe; a better approach is to use atomic operations.

IV. Rate and Occurrence Limiting

1. When a debug message is emitted too often, there are two ways to keep it from flooding the log:

(1) Rate limiting

(2) Occurrence limiting

2. With rate limiting, the action (such as printing a message) is performed at most once every few seconds; with occurrence limiting, it is performed only a fixed number of times.

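18.10 Binary Searching to Find the Offending Change

1. It is often useful to know when a bug was introduced into the kernel source. If you know the last version that works and the first that fails, you can binary-search the changes in between to find the one responsible.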

18.11 Binary Searching with Git

1. If a specific commit introduced the bug, git can perform the binary search for you with git bisect:

$ git bisect start              # tell git to start a binary search
$ git bisect bad <revision>     # the earliest revision known to be buggy
$ git bisect bad                # or: the currently checked-out version is buggy
$ git bisect good v2.6.28       # the latest revision known to work

Git then performs a binary search over the history between the known-good and known-bad versions and checks out a revision from the middle of that range. Build, boot, and test that revision.

If this version works: $ git bisect good

If this version fails: $ git bisect bad

2. After each command, git narrows the range and checks out the next revision to test, until there is nothing left to search; it then prints the commit that introduced the bug.

18.12 When All Else Fails: The Community

1. Send an e-mail to the kernel mailing list with a complete and concise description of the bug.

2. The most important forum of the kernel community is the Linux Kernel Mailing List (LKML).

Summary:

This chapter discussed kernel debugging. Debugging is really a process of finding where the actual behavior deviates from the intended behavior. The chapter covered several techniques, from the kernel's built-in debugging infrastructure to debuggers, and from logging to bisection with git.

The material in this chapter is essential for anyone who wants to work on kernel code.


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
