First, what you need before you start
1. A bug, ideally one that is well defined and reliably reproducible, although most bugs are neither.
2. A kernel version on which the bug exists.
3. Some knowledge of the relevant kernel code, and some luck.
Second, bugs in the kernel
1. How bugs show up:
Blatantly incorrect code, synchronization errors, mismanaged hardware, degraded performance for all programs, corrupted data, and a deadlocked system.
2. Dereferencing a NULL pointer produces an oops, and garbage data can crash the system.
Third, debugging by printing
1. printk() is the kernel's formatted print function. It works almost identically to the printf() function provided by the C library.
2. printk() can be called from almost anywhere at any time. The exception is very early in the boot process, before the console is initialized, where it cannot be used.
early_printk() can print to the console that early in boot, but it is not portable.
3. The most important difference between printk() and printf():
printk() lets the caller specify a log level, which the kernel uses to decide whether to print the message on the console.
The kernel compares the specified log level with the current console log level, console_loglevel, and prints the message on the console only if the message's level is numerically lower (more important).
If no log level is specified, the function uses the default, DEFAULT_MESSAGE_LOGLEVEL.
The kernel defines the most important log level, KERN_EMERG, as "<0>" and the least important, KERN_DEBUG, as "<7>".
Example: printk(KERN_WARNING "This is a warning!\n");
Once preprocessing is complete, the level macro is concatenated with the format string, so the call above is compiled as: printk("<4>This is a warning!\n");
4. Two ways to assign log levels to printk() calls:
(1) Leave the console's default log level unchanged and give all debugging messages KERN_CRIT or a numerically lower (more severe) level.
(2) Give all debugging messages the KERN_DEBUG level and adjust the console's default log level instead.
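The level comparison described above can be sketched in user space. This is only an illustration, not kernel code: the sketch_ names are hypothetical, the "<n>" prefixes follow the form given in these notes (newer kernels encode the level differently), and the default level of 4 is an assumption taken from 2.6-era kernels.

```c
/* Level prefixes in the "<n>" form described in the notes. */
#define KERN_EMERG   "<0>"
#define KERN_CRIT    "<2>"
#define KERN_WARNING "<4>"
#define KERN_DEBUG   "<7>"

/* Assumed default for messages without a prefix (hypothetical name). */
#define SKETCH_DEFAULT_LOGLEVEL 4

/* Return the level encoded in msg, or the default if there is no prefix. */
int sketch_msg_level(const char *msg)
{
    if (msg[0] == '<' && msg[1] >= '0' && msg[1] <= '7' && msg[2] == '>')
        return msg[1] - '0';
    return SKETCH_DEFAULT_LOGLEVEL;
}

/* A message reaches the console only if its level is numerically
 * lower than the console's current level. */
int sketch_reaches_console(const char *msg, int console_loglevel)
{
    return sketch_msg_level(msg) < console_loglevel;
}
```

With a console level of 4, a KERN_CRIT message is shown and a KERN_DEBUG message is only stored in the log buffer; raising the console level to 8 shows both, which is the second of the two approaches listed above.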
5. Kernel messages are stored in a circular buffer (ring queue) of size LOG_BUF_LEN.
The buffer size can be adjusted at compile time by setting CONFIG_LOG_BUF_SHIFT. The default on a uniprocessor system is 16KB.
Advantages and disadvantages of the ring queue: reading and writing can happen at the same time and the log is simple to maintain, but when the buffer is full new messages overwrite the oldest ones, so messages can be lost.
6. The user-space daemon klogd retrieves kernel messages from the log buffer and passes them to the syslogd daemon, which saves them in the system log file.
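The overwrite behavior of the ring queue can be sketched as follows. This is a minimal user-space illustration, not the kernel's implementation: the sketch_ names and the 16-byte size are hypothetical, and the real log buffer also needs locking, which is left out here.

```c
#include <stddef.h>

#define SKETCH_LOG_BUF_SHIFT 4   /* hypothetical, tiny on purpose */
#define SKETCH_LOG_BUF_LEN   (1u << SKETCH_LOG_BUF_SHIFT)

struct sketch_log {
    char buf[SKETCH_LOG_BUF_LEN];
    unsigned long head;          /* total number of bytes ever written */
};

/* Append a string; once the buffer is full, the oldest bytes are overwritten. */
void sketch_log_write(struct sketch_log *log, const char *s)
{
    for (; *s; s++)
        log->buf[log->head++ & (SKETCH_LOG_BUF_LEN - 1)] = *s;
}

/* Copy the surviving bytes, oldest first, into out, which must hold
 * SKETCH_LOG_BUF_LEN + 1 bytes. Returns the number of bytes copied. */
size_t sketch_log_read(const struct sketch_log *log, char *out)
{
    unsigned long start = log->head > SKETCH_LOG_BUF_LEN
                        ? log->head - SKETCH_LOG_BUF_LEN : 0;
    size_t n = 0;

    for (unsigned long i = start; i < log->head; i++)
        out[n++] = log->buf[i & (SKETCH_LOG_BUF_LEN - 1)];
    out[n] = '\0';
    return n;
}
```

Because the length is a power of two (which is what a shift-based setting gives), the index wraps with a single mask. Writing 20 bytes into this 16-byte buffer keeps only the last 16, which is the message loss mentioned as the drawback.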
Fourth, oops
1. An oops is the usual way the kernel tells the user that something has gone wrong.
This involves printing an error message to the console, dumping the contents of the registers, and printing a back trace.
2. If the oops occurs in interrupt context, the kernel cannot continue and panics.
If the oops occurs in the idle task (PID 0) or the init task (PID 1), the result is also a panic. In any other process, the kernel kills the process and tries to continue.
3. Important information in an oops: the register context and the back trace.
The addresses in the back trace need to be translated into meaningful symbol names: run the ksymoops command and supply the System.map generated when the kernel was compiled. If you are using modules, some module information is required as well.
ksymoops saved_oops.txt
4. The kallsyms feature is enabled with the CONFIG_KALLSYMS configuration option. It stores the symbol name mappings in the kernel image, so the kernel can print a decoded back trace itself, without System.map or ksymoops.
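Translating a back trace address into a symbol name amounts to finding the last symbol whose start address is not greater than the address. The sketch below shows that lookup over a sorted table; the table contents and all sketch_ names are made up for illustration and do not come from a real System.map.

```c
#include <stddef.h>

struct sketch_sym {
    unsigned long addr;
    const char *name;
};

/* Hypothetical symbol table, sorted by address as System.map is. */
static const struct sketch_sym sketch_map[] = {
    { 0x1000UL, "sketch_init"  },
    { 0x1040UL, "sketch_read"  },
    { 0x10c0UL, "sketch_write" },
    { 0x1200UL, "sketch_exit"  },
};

/* Return the symbol containing addr and store the offset into it,
 * or return NULL if addr lies below the first symbol. */
const struct sketch_sym *sketch_resolve(unsigned long addr, unsigned long *offset)
{
    size_t lo = 0;
    size_t hi = sizeof(sketch_map) / sizeof(sketch_map[0]);

    if (addr < sketch_map[0].addr)
        return NULL;

    /* Invariant: sketch_map[lo].addr <= addr, and every entry at or after hi is above addr. */
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;

        if (sketch_map[mid].addr <= addr)
            lo = mid;
        else
            hi = mid;
    }
    *offset = addr - sketch_map[lo].addr;
    return &sketch_map[lo];
}
```

An address of 0x1052 resolves to sketch_read with offset 0x12, which is the "name plus offset" form a decoded back trace shows.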
Fifth, kernel debugging configuration options
Atomic: refers to the ability to execute without being divided. The code either runs to completion without interruption or does not run at all.
Sixth, asserting bugs and dumping information
1. Several kernel routines make it easy to flag bugs, provide assertions, and dump information. The two most common are BUG() and BUG_ON().
When called, they cause an oops. Use them as assertions for conditions that should never occur, for example: if (bad_thing) BUG(); or, more compactly, BUG_ON(bad_thing);
BUG_ON() is clearer and more readable than BUG(), and it wraps its condition in unlikely().
BUILD_BUG_ON() serves the same purpose as BUG_ON(), but it is evaluated at compile time: if the condition is true, the build fails.
A more critical error can be signaled with panic(), which prints an error message and then halts the whole system, so use it only in the worst case.
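The following user-space sketch imitates these assertion macros so the pattern can be tried outside the kernel. It is an approximation under stated assumptions: the SKETCH_ macros are hypothetical stand-ins that call abort() instead of raising an oops, __builtin_expect requires GCC or Clang, and the negative-array-size trick is only one classic way to force a compile-time failure.

```c
#include <stdio.h>
#include <stdlib.h>

/* Branch hint in the style of unlikely() (GCC/Clang builtin). */
#define sketch_unlikely(x) __builtin_expect(!!(x), 0)

/* Stand-in for BUG(): report the location and abort. */
#define SKETCH_BUG() do {                                        \
        fprintf(stderr, "BUG at %s:%d\n", __FILE__, __LINE__);   \
        abort();                                                 \
    } while (0)

/* Stand-in for BUG_ON(): the condition is expected to be false. */
#define SKETCH_BUG_ON(cond) \
    do { if (sketch_unlikely(cond)) SKETCH_BUG(); } while (0)

/* Compile-time check: a true condition yields a negative array size,
 * which the compiler rejects. */
#define SKETCH_BUILD_BUG_ON(cond) ((void)sizeof(char[1 - 2 * !!(cond)]))

/* Hypothetical header layout, assumed to occupy exactly 4 bytes. */
struct sketch_hdr {
    unsigned char type;
    unsigned char len;
    unsigned short id;
};

unsigned sketch_payload_len(const struct sketch_hdr *h)
{
    SKETCH_BUILD_BUG_ON(sizeof(struct sketch_hdr) != 4); /* checked by the compiler */
    SKETCH_BUG_ON(h == NULL);                            /* checked at run time */
    return h->len;
}
```

The compile-time check costs nothing at run time, while the run-time check sits on a path the compiler is told to treat as cold.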
Seventh, the Magic SysRq key
1. The Magic SysRq key is enabled with the CONFIG_MAGIC_SYSRQ configuration option. When it is enabled, a special key combination lets you communicate with the kernel regardless of what state the kernel is in.
2. Besides the configuration option, a sysctl turns the feature on or off at run time:
echo 1 > /proc/sys/kernel/sysrq
Eighth, the saga of the kernel debugger
1. gdb
A running kernel can be inspected with the standard GNU debugger: gdb vmlinux /proc/kcore
vmlinux is the uncompressed kernel image, stored in the root directory of the source tree.
The optional /proc/kcore argument acts as a core file that gives gdb access to the memory of the running kernel.
If the kernel is compiled with the -g option, gdb can provide much more information.
2. kgdb
kgdb is a patch that lets you debug the kernel remotely over a serial line with the full feature set of gdb.
Ninth, probing the system
1. Using the UID as a conditional
2. Using condition variables
Use a condition variable if the code in question is not in process context, or if you want a more global mechanism for controlling a feature.
3. Using statistics
Sometimes you need to get a feel for how often a particular event occurs, or to compare several events and derive a pattern from them.
4. Rate limiting: an event occurs very frequently, yet you still want to watch its overall progress.
Occurrence limiting: verify that a piece of code is actually executed under certain circumstances.
In either case, the variables used should be static and local to the function, so that their values persist across function calls.
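The two limits in item 4 can be sketched with static local variables. This is a user-space illustration with hypothetical names: the current time is passed in as a plain number of seconds so the behavior is easy to follow, whereas kernel code would compare against jiffies, and the limits of 5 reports and 2 seconds are arbitrary.

```c
#include <stdio.h>

/* Occurrence limiting: report only the first five times this path runs.
 * Returns 1 if a report was printed, 0 otherwise. */
int sketch_report_limited(void)
{
    static unsigned long count = 0;   /* static: keeps its value between calls */

    if (count < 5) {
        count++;
        fprintf(stderr, "reached the suspicious path (%lu)\n", count);
        return 1;
    }
    return 0;
}

/* Rate limiting: report at most once every two seconds.
 * now is the current time in seconds. */
int sketch_report_ratelimited(long now)
{
    static long prev = 0;             /* time of the last report */
    static int first = 1;             /* always allow the first report */

    if (first || now - prev >= 2) {
        first = 0;
        prev = now;
        fprintf(stderr, "event still occurring at t=%ld\n", now);
        return 1;
    }
    return 0;
}
```

Both variables live inside their functions, so no other code can touch them, yet they keep their values from one call to the next.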
Tenth, binary searching to find the culprit change
First, you need a reliably reproducible bug, preferably one that can be verified as soon as the system boots.
Next, you need a known-good kernel.
Then you need a known-bad kernel. Testing the version halfway between the two, and repeating on the half that still contains the change, narrows the search down to the change that introduced the bug.
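The search between a known-good and a known-bad kernel can be sketched as a binary search over version indices. The names are hypothetical, and the predicate below merely pretends that version 37 introduced the bug; in practice each test means building and booting that kernel.

```c
/* Counts how many versions had to be tested. */
static int sketch_builds_tested = 0;

/* Hypothetical test: stands in for "build, boot, and try to reproduce". */
int sketch_is_bad_since_37(int version)
{
    sketch_builds_tested++;
    return version >= 37;
}

/* Return the first bad version. Requires good < bad, with version
 * good known to work and version bad known to fail. */
int sketch_first_bad(int good, int bad, int (*is_bad)(int version))
{
    while (bad - good > 1) {
        int mid = good + (bad - good) / 2;

        if (is_bad(mid))
            bad = mid;    /* the bug is already present at mid */
        else
            good = mid;   /* the bug appears after mid */
    }
    return bad;
}
```

With 100 candidate versions, at most 7 kernels need to be tested, because each test halves the remaining range.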
Eleventh, binary searching with Git
To begin, tell Git that you want to start a binary search: git bisect start.
If the version under test works correctly, run: git bisect good.
If the version under test shows the bug, run: git bisect bad.
If you already know where the bug originates (for example, in the x86 boot code), you can tell Git to consider only commits that touch a given list of directories: git bisect start -- arch/x86
Summary:
This chapter mainly covers system calls. It complements the MOOC video lectures and adds many details, especially on how to add a new system call. Also worth noting are the concepts and kernel interface conventions to follow when writing a well-formed, optimized, and secure system call.
System calls are implemented with the software interrupt mechanism and are exposed through the API, POSIX, and the C library working together. Although system calls are convenient to use, avoid adding a new system call for every new abstraction. The very low rate at which new system calls are added also shows that Linux is a relatively stable and feature-complete operating system.
"Linux kernel design and implementation" Learning summary CHAP18