The difficulty of debugging is a significant feature that distinguishes kernel-level development from user-level development.
※ Mastering kernel debugging depends to a large extent on experience and on an understanding of the entire operating system.
I. Preparations before debugging
Kernel-level bugs often behave unreliably, are poorly defined, or are hard to reproduce. These characteristics make tracking and debugging kernel-level bugs very difficult.
※ For a bug with an unclear definition, the key is to find the source of the bug. In many cases, once you can reproduce a bug precisely, you are not far from success.
II. Bugs in the kernel
Between an error hidden in the source code and a bug that shows itself in front of a witness, there is often a chain of events that triggers the failure. Kernel debugging is difficult, but with effort and understanding you may come to enjoy the challenge.
III. printk()
printk() is the formatted print function provided by the kernel.
1. Robustness
Robustness is one of the most valued properties of printk(): the kernel can call it almost anywhere and at any time (in interrupt context, in process context, while holding a lock, on several processors at once, and so on).
※ The exception is early in system startup: before the console is initialized, there are some places where printk() cannot be called.
2. Record levels
printk() can specify a record level (log level), which the kernel uses to decide whether a message is printed on the console. The record levels are defined in <linux/kernel.h>:
    #define KERN_EMERG   "<0>"  /* system is unusable */
    #define KERN_ALERT   "<1>"  /* action must be taken immediately */
    #define KERN_CRIT    "<2>"  /* critical conditions */
    #define KERN_ERR     "<3>"  /* error conditions */
    #define KERN_WARNING "<4>"  /* warning conditions */
    #define KERN_NOTICE  "<5>"  /* normal but significant condition */
    #define KERN_INFO    "<6>"  /* informational */
    #define KERN_DEBUG   "<7>"  /* debug-level messages */
Call it like this:
    printk(KERN_DEBUG "this is a debug notice!\n");
The kernel compares the specified record level with the current console record level, console_loglevel, to decide whether to print the message on the console.
console_loglevel is defined in <linux/kernel.h>:
    #define console_loglevel (console_printk[0])
console_printk is defined in printk.c:
    int console_printk[4] = {
        DEFAULT_CONSOLE_LOGLEVEL,   /* console_loglevel */
        DEFAULT_MESSAGE_LOGLEVEL,   /* default_message_loglevel */
        MINIMUM_CONSOLE_LOGLEVEL,   /* minimum_console_loglevel */
        DEFAULT_CONSOLE_LOGLEVEL,   /* default_console_loglevel */
    };
3. The record buffer
Kernel messages are stored in a circular queue (ring buffer) of size LOG_BUF_LEN, which is defined as:
    #define __LOG_BUF_LEN (1 << CONFIG_LOG_BUF_SHIFT)
※ CONFIG_LOG_BUF_SHIFT is set by the configuration file when the kernel is compiled. For the i386 platform the value is set as follows (in linux26/arch/i386/defconfig), which gives a buffer of 1 << 18 bytes, or 256 KB:
    CONFIG_LOG_BUF_SHIFT=18
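To make the record-level comparison from part 2 concrete, here is a minimal user-space sketch. It is not kernel code: the `demo_` names are hypothetical, and the default level of 4 for a message without a prefix is an assumption made for illustration. The sketch parses the "<n>" prefix and reports whether a message would reach the console, assuming that a message is shown when its level is numerically lower than the console level.

```c
/* Assumed level for a message that carries no "<n>" prefix (illustrative). */
#define DEMO_DEFAULT_MESSAGE_LOGLEVEL 4

/*
 * Parse an optional "<n>" prefix (n = 0..7).
 * Returns the level and points *body at the text after the prefix.
 */
int demo_parse_level(const char *msg, const char **body)
{
    if (msg[0] == '<' && msg[1] >= '0' && msg[1] <= '7' && msg[2] == '>') {
        *body = msg + 3;
        return msg[1] - '0';
    }
    *body = msg;
    return DEMO_DEFAULT_MESSAGE_LOGLEVEL;
}

/*
 * Returns 1 if the message would be shown on the console, 0 otherwise.
 * Lower numbers are more urgent, so a message is shown only when its
 * level is numerically lower than the console level.
 */
int demo_reaches_console(const char *msg, int console_loglevel)
{
    const char *body;

    return demo_parse_level(msg, &body) < console_loglevel;
}
```

With an assumed console level of 7, `demo_reaches_console("<3>disk error\n", 7)` returns 1, while a "<7>" debug message returns 0 and would only be kept in the record buffer.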
Record buffer operations:
① When a message is read into user space, it is removed from the ring queue.
② When the buffer is full, a further printk() call overwrites the oldest message in the queue.
③ Because reads and writes go through a circular queue, synchronization is easy to handle.
※ The record buffer is called a ring because its reads and writes operate on a circular queue.
4. syslogd and klogd
On a standard Linux system, the user-space daemon klogd retrieves kernel messages from the record buffer and hands them to the syslogd daemon, which saves them to the system log file. klogd can read messages either from the /proc/kmsg file or through the syslog() system call; by default it reads /proc/kmsg. klogd blocks until a new message appears in the buffer. When a new kernel message arrives, klogd is woken up, reads the message, and processes it; by default, processing means passing the message to syslogd. syslogd normally writes the messages it receives to /var/log/messages, but you can select other output files through /etc/syslog.conf. Figure 1 shows this process.
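The three record-buffer rules in part 3 (a read consumes data, a write into a full buffer overwrites the oldest data, and indices wrap around) can be sketched in user space as follows. This is a simplified, single-threaded model, not the kernel's implementation; all `demo_` names are hypothetical, and the buffer is deliberately tiny (shift of 3 instead of 18) so the overwrite is easy to see.

```c
#include <stddef.h>

#define DEMO_LOG_BUF_SHIFT 3                          /* tiny on purpose */
#define DEMO_LOG_BUF_LEN   (1 << DEMO_LOG_BUF_SHIFT)  /* 8 bytes */
#define DEMO_LOG_BUF_MASK  (DEMO_LOG_BUF_LEN - 1)

struct demo_log {
    char buf[DEMO_LOG_BUF_LEN];
    unsigned long start;  /* position of the oldest unread byte */
    unsigned long end;    /* position one past the newest byte */
};

/* Append a string; when the buffer is full, the oldest bytes are overwritten. */
void demo_log_write(struct demo_log *log, const char *s)
{
    for (; *s; s++) {
        log->buf[log->end & DEMO_LOG_BUF_MASK] = *s;
        log->end++;
        if (log->end - log->start > DEMO_LOG_BUF_LEN)
            log->start = log->end - DEMO_LOG_BUF_LEN;  /* drop the oldest byte */
    }
}

/* Copy up to max unread bytes into out and remove them from the buffer. */
size_t demo_log_read(struct demo_log *log, char *out, size_t max)
{
    size_t n = 0;

    while (n < max && log->start != log->end) {
        out[n++] = log->buf[log->start & DEMO_LOG_BUF_MASK];
        log->start++;
    }
    return n;
}
```

Because the length is a power of two, wrapping is a single mask operation. Writing the ten bytes "abcdefghij" into a zero-initialized `struct demo_log` and then reading returns the eight bytes "cdefghij"; a second read returns nothing, because reading consumed the data.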
IV. Oops
An oops (loosely also called a panic) message contains the details of a system error, such as the contents of the CPU registers. It is the most common way for the kernel to tell the user that something bad has happened. All the kernel can do is issue an oops: it prints an error message on the console, dumps the contents of the registers, and prints a back trace that can be used for tracking. After an oops, the kernel is usually in an unstable state. An oops can have many causes, including out-of-bounds memory accesses and illegal instructions.
※ As a kernel developer, you will deal with oopses all the time.
※ The important information in an oops is the same on every architecture: the register context and the back trace (the back trace shows the chain of function calls that led to the error).
1. ksymoops
On Linux, the traditional way to debug a system crash is to analyze the oops message sent to the system console when the crash occurs. Once you have captured the details, you can pass the message to the ksymoops utility, which tries to convert the code into instructions and map the stack values to kernel symbols.
※ For example, ksymoops converts the addresses in the back trace into readable function names. Figure 2 shows how an oops message is formatted.
ksymoops needs several inputs: the oops message output, the System.map file from the running kernel, /proc/ksyms, vmlinux, and /proc/modules. For details on using ksymoops, see /usr/src/linux/Documentation/oops-tracing.txt in the kernel source tree or the ksymoops manual page. ksymoops disassembles the code section to identify the faulting instruction and displays a trace section that shows how the code was called.
2. kallsyms
The 2.5 development kernel introduced the kallsyms feature, which is enabled with the CONFIG_KALLSYMS build option. With this option, the symbol names (that is, the function names) that correspond to memory addresses are loaded with the kernel image, so the kernel can print a back trace that is already decoded. System.map and ksymoops are then no longer needed to decode an oops. The cost is a larger kernel, because the symbol names that correspond to the addresses must stay resident in kernel memory.
    # cat /proc/kallsyms
    c0100240 T _stext
    c0100240 t run_init_process
    c0100240 t stext
    c0100269 t init...
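Decoding a back trace comes down to one lookup: given an address, find the symbol with the largest start address that is not above it, and report the offset into that symbol. The following user-space sketch illustrates the idea with a small sorted table. It is not how the kernel stores its symbols; the `demo_` names and all table entries except the first address and name are hypothetical.

```c
#include <stddef.h>

struct demo_sym {
    unsigned long addr;   /* start address of the symbol */
    const char *name;
};

/* Sorted by address. Only the first entry is taken from the listing above. */
static const struct demo_sym demo_syms[] = {
    { 0xc0100240UL, "run_init_process" },
    { 0xc0100300UL, "demo_func_a" },      /* hypothetical */
    { 0xc0100480UL, "demo_func_b" },      /* hypothetical */
};

/*
 * Return the name of the symbol that contains addr and store the offset
 * from the symbol start in *offset. Returns NULL if addr lies below the
 * first symbol.
 */
const char *demo_lookup(unsigned long addr, unsigned long *offset)
{
    size_t lo = 0;
    size_t hi = sizeof demo_syms / sizeof demo_syms[0];

    if (addr < demo_syms[0].addr)
        return NULL;

    /* Binary search for the last entry whose address is <= addr. */
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;

        if (demo_syms[mid].addr <= addr)
            lo = mid;
        else
            hi = mid;
    }
    *offset = addr - demo_syms[lo].addr;
    return demo_syms[lo].name;
}
```

Looking up 0xc0100250 in this table yields "run_init_process" with an offset of 0x10, which is the "function plus offset" form that a decoded back trace displays.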
V. Kernel debugging configuration options
The kernel provides many configuration options at build time to make debugging and testing code easier.
※ The available options include slab layer debugging, high-memory debugging, I/O mapping debugging, spin-lock debugging, stack-overflow checking, and sleep-inside-spinlock checking.
1. Debugging atomic operations
Starting with the 2.5 kernel, the kernel provides excellent tools for checking all kinds of problems caused by atomic operations. The kernel maintains an atomicity counter and can be configured so that, if code goes to sleep or does something that might sleep while it is atomic, the kernel prints a warning and a back trace. This catches many potential bugs, such as calling schedule() while holding a lock or requesting a blocking memory allocation while holding a lock. The following options make the fullest use of this feature:
    CONFIG_PREEMPT=y
    CONFIG_DEBUG_KERNEL=y
    CONFIG_KALLSYMS=y
    CONFIG_SPINLOCK_SLEEP=y
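The idea behind the atomicity counter can be modeled in a few lines of user-space C. This is only a sketch of the mechanism described above, not the kernel's code: every `demo_` name is hypothetical, the "lock" does no real locking, and the warning goes to stderr instead of the kernel log. Taking a lock raises the counter, releasing it lowers the counter, and any operation that might sleep checks the counter first.

```c
#include <stdio.h>

static int demo_atomic_depth;  /* greater than 0 while inside an atomic section */
static int demo_warnings;      /* number of sleep-while-atomic warnings issued */

/* Model of taking and releasing a lock: only the counter is maintained. */
void demo_spin_lock(void)   { demo_atomic_depth++; }
void demo_spin_unlock(void) { demo_atomic_depth--; }

/* Called at the start of any operation that may sleep. */
void demo_might_sleep(const char *what)
{
    if (demo_atomic_depth > 0) {
        demo_warnings++;
        fprintf(stderr, "demo: %s may sleep while atomic (depth %d)\n",
                what, demo_atomic_depth);
    }
}

/* Model of a blocking memory allocation: it may sleep, so it checks first. */
void demo_blocking_alloc(void)
{
    demo_might_sleep("blocking allocation");
}

int demo_warning_count(void)
{
    return demo_warnings;
}
```

Calling `demo_blocking_alloc()` between `demo_spin_lock()` and `demo_spin_unlock()` produces one warning, while the same call outside the locked region produces none. That is the class of bug the options above are meant to expose.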
VI. Raising bugs and printing information
1. BUG() and BUG_ON()
Some kernel calls make it easy to flag bugs, provide assertions, and print information. The two most common are BUG() and BUG_ON(), defined in <include/asm-generic>:
    #ifndef HAVE_ARCH_BUG
    #define BUG() do { \
        printk("BUG: failure at %s:%d/%s()!\n", __FILE__, __LINE__, __FUNCTION__); \
        panic("BUG!"); /* a more serious error: prints the message and also halts the whole system */ \
    } while (0)
    #endif
    #ifndef HAVE_ARCH_BUG_ON
    #define BUG_ON(condition) do { if (unlikely(condition)) BUG(); } while (0)
    #endif
Calling either macro triggers an oops, which prints a stack trace and an error message.
※ These two calls can be used as assertions, for example:
    BUG_ON(bad_thing);
2. dump_stack()
Sometimes you only need a stack back trace printed to help with debugging. In that case, use dump_stack(), which simply prints the register context and the function back trace on the console.
    if (!debug_check) {
        printk(KERN_DEBUG "provide some information...\n");
        dump_stack();
    }
Remarks: most of this content is drawn from Linux Kernel Design and Implementation, 2nd edition.
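The BUG_ON() assertion pattern from part 1 of this section can be tried outside the kernel with a user-space analogue. The sketch below is an illustration only: the `DEMO_` and `demo_` names are hypothetical, `fprintf` stands in for printk(), and a replaceable handler that defaults to `abort()` stands in for panic(). The handler pointer and the counter are additions of this sketch so that the behavior can be observed without ending the process.

```c
#include <stdio.h>
#include <stdlib.h>

/* What to do after reporting; defaults to abort(), standing in for panic(). */
void (*demo_fatal_handler)(void) = abort;

/* Number of times demo_bug() has fired (an addition made for this sketch). */
int demo_bug_count;

/* Report where the failure happened, then invoke the fatal handler if set. */
void demo_bug(const char *file, int line, const char *func)
{
    demo_bug_count++;
    fprintf(stderr, "BUG: failure at %s:%d/%s()!\n", file, line, func);
    if (demo_fatal_handler)
        demo_fatal_handler();
}

#define DEMO_BUG() demo_bug(__FILE__, __LINE__, __func__)

/* Assertion form: fire only when the condition is true. */
#define DEMO_BUG_ON(condition) do { if (condition) DEMO_BUG(); } while (0)
```

Used as an assertion, `DEMO_BUG_ON(ptr == NULL);` reports the file, line, and function and then aborts. Setting `demo_fatal_handler` to NULL turns the macro into a report-only check, which is closer in spirit to printing diagnostics with dump_stack() than to halting the system.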