A Summary of Linux Kernel Debugging Methods


One reason kernel development is harder than user-space development is that the kernel is difficult to debug. Kernel errors often bring the whole system down, and it is hard to preserve the state of the system at the moment the error occurred. The key to debugging the kernel is a deep understanding of the kernel itself.

One. Preparation before debugging

Before debugging a bug, we need the following:

A confirmed bug.

The kernel version that contains the bug. Working out in which version the bug was introduced is a great help in solving the problem. You can use binary search to narrow the bug down to a specific version.

An understanding of the kernel code, the deeper the better, plus a little luck.

A bug that can be reproduced. Once you find the pattern that makes it recur, the cause is usually not far away.

A minimized system. Rule out, one by one, the factors that could be causing the bug.
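The binary search over versions mentioned above can be sketched in user-space C as follows. This is only a minimal illustration under stated assumptions: the versions are kept in an ordered list, every version after the first bad one is also bad, and `first_bad_version`, `is_bad_fn`, and `bad_from` are hypothetical names invented for this example. In real use the predicate would mean "build and boot this version, then run the reproducer".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical predicate: nonzero if version number `i` in an ordered
 * list of kernel versions shows the bug. */
typedef int (*is_bad_fn)(size_t i, void *ctx);

/* Return the index of the first bad version in [0, n), or n if none is bad.
 * Assumes that once a version is bad, every later version is bad too. */
size_t first_bad_version(size_t n, is_bad_fn is_bad, void *ctx)
{
    size_t lo = 0, hi = n;

    assert(is_bad != NULL);
    /* Invariant: every index below lo is good, every index from hi up is bad. */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (is_bad(mid, ctx))
            hi = mid;        /* the bug is already present at mid */
        else
            lo = mid + 1;    /* the bug was introduced after mid */
    }
    return lo;
}

/* Illustrative predicate: the bug first appears at index *ctx. */
int bad_from(size_t i, void *ctx)
{
    return i >= *(size_t *)ctx;
}
```

Each step halves the range of suspect versions, so about log2(n) builds are enough to isolate the version that introduced the bug.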

Two. Bugs in the kernel

Kernel bugs come in many varieties. They arise for countless reasons and show up in many different forms. Between the error hidden in the source code and the bug that finally appears in front of you, there is often a whole chain of events that sets it off. Kernel debugging has its difficulties, but with effort and understanding you may come to enjoy the challenge.

Three. Kernel debugging configuration options

Anyone learning to write drivers should build their own kernel (the standard mainline kernel). One of the most important reasons is that the kernel developers have built in several features for debugging. However, because these features produce extra output and can degrade performance, distribution vendors typically disable them in release kernels.

1 To enable kernel debugging, turn on the following options in the kernel configuration:

Kernel hacking --->
    [*] Magic SysRq key
    [*] Kernel debugging
    [*] Debug slab memory allocations
    [*] Spinlock and rw-lock debugging: basic checks
    [*] Spinlock debugging: sleep-inside-spinlock checking
    [*] Compile the kernel with debug info

Device Drivers --->
    Generic Driver Options --->
        [*] Driver Core verbose debug messages

General setup --->
    [*] Configure standard kernel features (for small systems) --->
        [*] Load all symbols for debugging/ksymoops

Options worth enabling include, for example:

Slab layer debugging

High-memory debugging

I/O mapping debugging

Spin-lock debugging

Stack-overflow checking

Sleep-inside-spinlock checking

2 Debugging atomic operations

Starting with the 2.5 development kernel, the kernel provides an excellent facility for detecting problems caused by atomic operations.

The kernel maintains an atomicity counter and can be configured to print a warning and a stack trace whenever code sleeps, or does something that may sleep, while in atomic context.

This catches many potential bugs, including calling schedule() while holding a lock and making a blocking memory allocation while holding a lock.

The following options make the most of this feature:

CONFIG_PREEMPT=y

CONFIG_DEBUG_KERNEL=y

CONFIG_KALLSYMS=y

CONFIG_SPINLOCK_SLEEP=y
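The idea behind the atomicity counter can be modelled in a few lines of user-space C. This is a simplified sketch and not the kernel's implementation: all `model_` names are hypothetical, the counter stands in for the kernel's notion of being in atomic context, and the warning goes to stderr rather than the kernel log.

```c
#include <assert.h>
#include <stdio.h>

static int model_atomic_depth;    /* > 0 means "in atomic context" in this model */
static int model_sleep_warnings;  /* number of violations detected so far */

/* Taking a lock enters atomic context; releasing it leaves again. */
void model_lock(void)   { model_atomic_depth++; }
void model_unlock(void) { assert(model_atomic_depth > 0); model_atomic_depth--; }

/* Called at the start of anything that may sleep.
 * Returns 1 (and prints a warning) if called in atomic context. */
int model_might_sleep(const char *file, int line)
{
    if (model_atomic_depth > 0) {
        model_sleep_warnings++;
        fprintf(stderr, "warning: sleeping call in atomic context at %s:%d\n",
                file, line);
        return 1;
    }
    return 0;
}

/* Stand-in for a blocking memory allocation. */
int model_blocking_alloc(void)
{
    return model_might_sleep(__FILE__, __LINE__);
}

int model_warning_count(void) { return model_sleep_warnings; }
```

In this model, calling `model_blocking_alloc()` between `model_lock()` and `model_unlock()` produces exactly the kind of warning that the options above enable in a real kernel.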

Four. Raising bugs and printing information

1 Some kernel calls make it easy to flag bugs, provide assertions, and output information. The two most commonly used are BUG() and BUG_ON().

Defined in <include/asm-generic>:

    #ifndef HAVE_ARCH_BUG
    #define BUG() do { \
        printk("BUG: failure at %s:%d/%s()!\n", __FILE__, __LINE__, __FUNCTION__); \
        panic("BUG!"); /* more serious than printing an error: the whole system halts */ \
    } while (0)
    #endif

    #ifndef HAVE_ARCH_BUG_ON
    #define BUG_ON(condition) do { if (unlikely(condition)) BUG(); } while (0)
    #endif

When invoked, these two macros trigger an oops, which prints a stack backtrace and error information.

※ Both calls can be used as assertions, for example: BUG_ON(bad_thing);
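To show the assertion pattern outside the kernel, here is a hedged user-space analogue. The `MODEL_BUG()` and `MODEL_BUG_ON()` macros, the replaceable `model_bug_handler`, and `checked_read()` are all hypothetical names made up for this sketch. The default handler prints the location and calls abort(), loosely mirroring printk() followed by panic().

```c
#include <assert.h>   /* assert() is the closest standard user-space analogue */
#include <stdio.h>
#include <stdlib.h>

/* Default handler: report the location and abort the program. */
static void model_bug_abort(const char *file, int line, const char *func)
{
    fprintf(stderr, "BUG: failure at %s:%d/%s()!\n", file, line, func);
    abort();
}

/* Replaceable so that a test harness can observe hits without dying. */
void (*model_bug_handler)(const char *, int, const char *) = model_bug_abort;

#define MODEL_BUG()        model_bug_handler(__FILE__, __LINE__, __func__)
#define MODEL_BUG_ON(cond) do { if (cond) MODEL_BUG(); } while (0)

/* Alternative handler that only counts how often a bug was flagged. */
static int model_bug_hits;
static void model_bug_count(const char *file, int line, const char *func)
{
    (void)file; (void)line; (void)func;
    model_bug_hits++;
}

/* Example: assert a precondition before dereferencing a pointer. */
int checked_read(const int *p)
{
    MODEL_BUG_ON(p == NULL);
    return p ? *p : -1;   /* only reached with p == NULL if the handler returns */
}
```

With the default handler, `checked_read(NULL)` terminates the program at the point of the violation, which is the behaviour that makes BUG_ON() useful as an assertion.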

2 dump_stack()

Sometimes all you need is a stack backtrace printed on the console to help you debug. In that case you can use dump_stack(). This function simply prints the register context and the function call trace to the console.

    if (!debug_check) {
        printk(KERN_DEBUG "provide some information...\n");
        dump_stack();
    }

Five. printk()

printk() is the formatted print function provided by the kernel.

1 Robustness of printk()

Robustness is one of printk()'s most welcome qualities: it can be called from almost anywhere in the kernel (interrupt context, process context, while holding a lock, on several processors at once, and so on).

2 Fragility of printk()

There are places where it cannot be called: early in system boot, before the console has been initialized. If you really need to debug the earliest stages of the boot process, you can use the following methods:

Use serial-port debugging to send the debugging output to another terminal device.

Use early_printk(), which can print early in the boot process. It is supported only on some hardware architectures.

3 Log levels

A major difference between printk() and printf() is that the former can specify a log level. The kernel uses this level to decide whether to print the message on the console. It prints every message whose priority is higher than the current console level, that is, whose numeric level is lower than console_loglevel.

You can specify a log level as follows:

    printk(KERN_CRIT "Hello, world!\n");

Note that the log level is not a separate argument: there is no comma between the level (KERN_CRIT) and the format string. KERN_CRIT is just an ordinary string literal (it stands for the string "<2>"; the full list of log levels appears below). C automatically joins adjacent string literals during translation, a feature known as string literal concatenation, so the log level and the user's format string end up in a single string.

The kernel compares this level with the current console log level, console_loglevel, to decide whether to print the message to the console.
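Both points, the literal concatenation and the comparison against console_loglevel, can be tried out in user space. The sketch below follows the "<N>" prefix form this article describes (newer kernels encode the prefix differently), and every `model_`/`MODEL_` name is hypothetical.

```c
#include <assert.h>
#include <string.h>

#define MODEL_KERN_CRIT     "<2>"  /* the string this article gives for KERN_CRIT */
#define MODEL_DEFAULT_LEVEL 4      /* KERN_WARNING, the default named in this article */

/* Adjacent string literals are joined by the compiler into "<2>Hello, world!\n". */
const char *model_msg = MODEL_KERN_CRIT "Hello, world!\n";

/* Extract the level from a leading "<N>" prefix; use the default if it is absent. */
int model_msg_level(const char *msg)
{
    assert(msg != NULL);
    if (msg[0] == '<' && msg[1] >= '0' && msg[1] <= '7' && msg[2] == '>')
        return msg[1] - '0';
    return MODEL_DEFAULT_LEVEL;
}

/* A lower number is more severe. The message reaches the console only
 * if its level is numerically below the console log level. */
int model_goes_to_console(const char *msg, int console_loglevel)
{
    return model_msg_level(msg) < console_loglevel;
}
```

With a console log level of 4, a "<2>" message is shown on the console, while a "<7>" message or a message without a prefix (which falls back to level 4) is not.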

Here are the log levels that you can use:

    #define KERN_EMERG   "<0>" /* system is unusable */
    #define KERN_ALERT   "<1>" /* action must be taken immediately */
    #define KERN_CRIT    "<2>" /* critical conditions */
    #define KERN_ERR     "<3>" /* error conditions */
    #define KERN_WARNING "<4>" /* warning conditions */
    #define KERN_NOTICE  "<5>" /* normal but significant condition */
    #define KERN_INFO    "<6>" /* informational */
    #define KERN_DEBUG   "<7>" /* debug-level messages */
    #define KERN_DEFAULT "<d>" /* use the default kernel loglevel */

Note that if the caller does not pass a log level to printk(), the kernel uses the default, KERN_WARNING "<4>". With the console log level also at 4, as in the output below, only messages more severe than KERN_WARNING reach the console. Because the default can change, it is best to always specify a log level. One benefit of log levels is that the output can be filtered. For example, in normal operation we may only want the important messages above KERN_WARNING, while during debugging we can choose to print KERN_DEBUG and everything more severe. None of this requires changing the code. Only the log output levels need to change, and they are exposed under /proc:

    mtj@ubuntu:~$ cat /proc/sys/kernel/printk
    4 4 1 7
    mtj@ubuntu:~$ cat /proc/sys/kernel/printk_delay
    0
    mtj@ubuntu:~$ cat /proc/sys/kernel/printk_ratelimit
    5
    mtj@ubuntu:~$ cat /proc/sys/kernel/printk_ratelimit_burst
    10

The first file holds the log levels currently used by the printk API. Its four values are, in order, the console log level, the default message log level, the minimum console log level, and the default console log level. The printk_delay value is the number of milliseconds to wait between printk messages, which can make the output easier to read in some scenarios. Here it is 0, meaning no delay. printk_ratelimit defines the length of the rate-limiting interval (here 5 seconds), and printk_ratelimit_burst defines how many messages are allowed within that interval (here 10). This is useful if you have a chatty kernel and a bandwidth-constrained console device, such as a serial port. Note that in the kernel, rate limiting is controlled by the caller and is not applied inside printk() itself. A printk user that wants rate limiting must call the printk_ratelimit() function.
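The interval-and-burst behaviour described above can be modelled as follows. This is a simplified user-space sketch, not the kernel's code: `struct model_ratelimit` and `model_ratelimit_ok()` are hypothetical names, and the current time is passed in explicitly (in seconds) so that the logic is easy to follow.

```c
#include <assert.h>

struct model_ratelimit {
    long interval;  /* window length in seconds (printk_ratelimit) */
    int  burst;     /* messages allowed per window (printk_ratelimit_burst) */
    long begin;     /* start time of the current window */
    int  printed;   /* messages already allowed in this window */
    int  started;   /* 0 until the first call opens a window */
};

/* Return 1 if the caller may print at time `now`, 0 if it should stay quiet. */
int model_ratelimit_ok(struct model_ratelimit *rl, long now)
{
    assert(rl->interval > 0 && rl->burst > 0);

    /* Open a new window on first use or once the old one has expired. */
    if (!rl->started || now - rl->begin >= rl->interval) {
        rl->begin = now;
        rl->printed = 0;
        rl->started = 1;
    }
    if (rl->printed < rl->burst) {
        rl->printed++;
        return 1;
    }
    return 0;  /* burst used up, suppress until the window expires */
}
```

In kernel code the caller-side pattern is a guard around the print call, along the lines of `if (printk_ratelimit()) printk(...)`, so callers that do not ask for limiting are never throttled.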

4 The log buffer

Kernel messages are stored in a ring buffer of size LOG_BUF_LEN.

LOG_BUF_LEN is defined as follows:

    #define __LOG_BUF_LEN (1 << CONFIG_LOG_BUF_SHIFT)

※ CONFIG_LOG_BUF_SHIFT is set in the configuration file at kernel compile time. For the i386 platform, the value is defined as follows (in linux26/arch/i386/defconfig):

    CONFIG_LOG_BUF_SHIFT=18

Log buffer operation:

① When a message is read out to user space, it is removed from the ring buffer.

② When the buffer is full and printk() is called again, the new message overwrites the oldest message in the buffer.

③ Synchronization between readers and writers of the ring buffer is easy to handle.

※ The log buffer is called a ring because reads and writes wrap around it as in a circular queue.
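A ring buffer with these properties can be sketched in user-space C. The sketch is illustrative only: the `model_` names are hypothetical, the buffer is deliberately tiny (a shift of 4 gives 16 bytes, whereas the shift of 18 quoted above gives 1 << 18 = 262144 bytes, or 256 KiB), and no locking is shown.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_LOG_BUF_SHIFT 4
#define MODEL_LOG_BUF_LEN   (1u << MODEL_LOG_BUF_SHIFT)  /* 16 bytes for the demo */
#define MODEL_LOG_BUF_MASK  (MODEL_LOG_BUF_LEN - 1)

static char model_log_buf[MODEL_LOG_BUF_LEN];
static unsigned long model_log_start;  /* position of the next byte to read */
static unsigned long model_log_end;    /* position of the next byte to write */

/* Store one byte; when the ring is full the oldest byte is overwritten. */
void model_emit_char(char c)
{
    model_log_buf[model_log_end & MODEL_LOG_BUF_MASK] = c;
    model_log_end++;
    if (model_log_end - model_log_start > MODEL_LOG_BUF_LEN)
        model_log_start = model_log_end - MODEL_LOG_BUF_LEN;
}

void model_emit(const char *s)
{
    while (*s)
        model_emit_char(*s++);
}

/* Copy up to len unread bytes into out and consume them from the ring. */
size_t model_read(char *out, size_t len)
{
    size_t n = 0;

    assert(out != NULL);
    while (n < len && model_log_start != model_log_end) {
        out[n++] = model_log_buf[model_log_start & MODEL_LOG_BUF_MASK];
        model_log_start++;
    }
    return n;
}
```

Because the length is a power of two, wrapping an ever-increasing position into the buffer is a single mask operation, which is one reason the size is configured as a shift rather than as a byte count.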

5 syslogd/klogd

On a standard Linux system, the user-space daemon klogd retrieves kernel messages from the log buffer and saves them to the system log file through the syslogd daemon. klogd can read these messages either from the /proc/kmsg file or through the syslog() system call. By default it reads /proc/kmsg. The klogd daemon blocks until there is a new message in the buffer. Once a new kernel message is available, klogd is woken up, reads the message, and processes it. By default, processing means passing the kernel message on to the syslogd daemon. syslogd typically writes the messages it receives to the /var/log/messages file. This can be changed in the /etc/syslog.conf file, where you can select other output files.

6 dmesg

The dmesg command can also be used to print and control the kernel ring buffer. It uses the klogctl system call to read the kernel ring buffer and writes it to standard output (stdout). It can also clear the kernel ring buffer (the -c option), set the console log level (the -n option), and define the size of the buffer used to read kernel log messages (the -s option). Note that if you do not specify a buffer size, dmesg uses the klogctl SYSLOG_ACTION_SIZE_BUFFER operation to determine it.

7 Notes:

a) Although printk() is very robust, a look at the source code shows that it is quite inefficient: it copies characters one byte at a time, and console output may also generate interrupts. So when you performance-test or release your driver after functional debugging, remember to keep printk() output to a minimum and print only a small amount of information when an error occurs. Otherwise, unneeded output to the console will hurt performance.

b) printk()'s temporary buffer, printk_buf, is only 1 KB, so a single printk() call can record less than 1 KB of information in the log buffer. printk() then writes into the ring buffer described above.
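The effect of a fixed 1 KB formatting buffer can be reproduced in user space with vsnprintf(). This is a hedged sketch under the assumption stated above (a 1024-byte temporary buffer); `model_format()` and `MODEL_PRINTK_BUF` are hypothetical names and the code does not reproduce the kernel's printk path.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

#define MODEL_PRINTK_BUF 1024  /* size this article gives for printk_buf */

static char model_printk_buf[MODEL_PRINTK_BUF];

/* Format into the fixed buffer and return the number of bytes actually
 * stored (excluding the terminating NUL). Longer output is truncated. */
int model_format(const char *fmt, ...)
{
    va_list args;
    int n;

    va_start(args, fmt);
    n = vsnprintf(model_printk_buf, sizeof model_printk_buf, fmt, args);
    va_end(args);

    assert(n >= 0);  /* vsnprintf reports encoding errors as a negative value */
    if (n >= (int)sizeof model_printk_buf)
        n = (int)sizeof model_printk_buf - 1;  /* the output was cut off */
    return n;
}
```

A message longer than the buffer loses its tail silently, so long diagnostic dumps are better split across several calls.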

8 Overall structure of kernel printk() and the logging system: [figure not included]

9 Dynamic debugging

Dynamic debugging lets you enable and disable pieces of kernel debug output at run time in order to obtain additional kernel information.

First, the kernel option CONFIG_DYNAMIC_DEBUG must be set. Every message printed through pr_debug()/dev_dbg() can then be switched on or off dynamically.
