Chapter 18: Debugging
I. Bugs in the kernel
Kernel bugs may be caused by:
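- Incorrect code
- Synchronization errors, such as improper locking of a shared variable
- Incorrectly managed hardware
- ...
Symptoms of a kernel bug may include:
- Degraded performance of every running program
- Corrupted data
- A deadlocked system
- ...
Kernel development has to deal with issues that user-space development does not, such as:
- Timing constraints
- Race conditions
- ...
These arise because multiple threads are allowed to run in the kernel at the same time.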
II. Debugging by printing
1. Robustness
- printk() is a very robust function: it can be called from almost anywhere, at any time
- It can be called from interrupt context and from process context
- It can be called while any lock is held
- It can be called on multiple processors at the same time
- The exception: it cannot be used early in the boot process, before the console has been initialized
2. Log level
- The main difference between printk() and printf() is that printk() can specify a log level. The kernel uses this level to decide whether the message is printed on the console.
- The kernel displays on the console every message whose level is numerically lower (more important) than a given value.
- If you do not specify a log level, the function uses the default, DEFAULT_MESSAGE_LOGLEVEL.
- The kernel converts the log levels to the strings "<n>", with n from 0 to 7 in the order of the table below; the smaller the number, the more important the message. 0 is KERN_EMERG (most important) and 7 is KERN_DEBUG (least important).
- There are two ways to assign log levels to debugging messages:
  - Leave the console's default log level unchanged and give every debugging message KERN_CRIT or a numerically lower level.
  - Give every debugging message KERN_DEBUG and adjust the console's default log level instead.
The printk() log levels are as follows:
| Level | Description |
| --- | --- |
| KERN_EMERG | An emergency condition |
| KERN_ALERT | A problem that requires immediate attention |
| KERN_CRIT | A critical condition |
| KERN_ERR | An error |
| KERN_WARNING | A warning |
| KERN_NOTICE | A normal condition that may still deserve attention |
| KERN_INFO | An informational message |
| KERN_DEBUG | A debug message, typically superfluous |
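To make the filtering rule concrete, here is a small user-space sketch of it in C. It is not kernel code: `model_printk`, `parse_level`, `console_level` and `default_level` are names made up for these notes, and the values chosen for them are assumptions. It only models the "<n>" prefix and the "lower number wins" comparison described above.

```c
#include <assert.h>
#include <stdio.h>

/* Log levels in the order of the table above: 0 is the most important. */
enum log_level {
    LVL_EMERG, LVL_ALERT, LVL_CRIT, LVL_ERR,
    LVL_WARNING, LVL_NOTICE, LVL_INFO, LVL_DEBUG
};

/* Hypothetical stand-ins for the kernel's settings (assumed values). */
static int console_level = LVL_DEBUG;        /* display levels 0..6 */
static const int default_level = LVL_WARNING;

/*
 * Parse an optional "<n>" prefix. Returns the level and advances *msg
 * past the prefix; falls back to default_level when there is none.
 */
int parse_level(const char **msg)
{
    const char *s = *msg;

    if (s[0] == '<' && s[1] >= '0' && s[1] <= '7' && s[2] == '>') {
        *msg = s + 3;
        return s[1] - '0';
    }
    return default_level;
}

/* Returns 1 if the message was shown on the "console", 0 otherwise. */
int model_printk(const char *msg)
{
    int level = parse_level(&msg);

    assert(level >= LVL_EMERG && level <= LVL_DEBUG);
    if (level >= console_level)
        return 0;       /* a real kernel would still log it, just not display it */
    fputs(msg, stdout);
    return 1;
}
```

With these assumed settings, a message prefixed "<3>" is displayed, one prefixed "<7>" is not, and a message with no prefix is treated as level 4.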
3. Buffers
- Kernel messages are stored in a circular buffer (a ring queue); both reading and writing are queue operations. Its size can be adjusted by setting CONFIG_LOG_BUF_SHIFT.
- On a uniprocessor machine the buffer defaults to 16KB, so once it is full, new messages overwrite the oldest ones.
- Advantages:
  - Reader/writer synchronization is easy to handle
  - The log is easy to maintain
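The overwrite behaviour of such a ring buffer can be sketched in user-space C as follows. This is only a model under stated assumptions: `log_write`, `log_read` and the tiny 16-byte size (`LOG_BUF_SHIFT` of 4) are chosen for illustration and are not the kernel's actual code or default.

```c
#include <assert.h>
#include <stddef.h>

#define LOG_BUF_SHIFT 4                     /* assumed tiny size for the demo */
#define LOG_BUF_LEN   (1u << LOG_BUF_SHIFT) /* buffer length is a power of two */
#define LOG_BUF_MASK  (LOG_BUF_LEN - 1)

static char log_buf[LOG_BUF_LEN];
static unsigned int log_start;  /* position of the oldest unread byte */
static unsigned int log_end;    /* position one past the newest byte */

/* Append a string; when the buffer is full the oldest bytes are lost. */
void log_write(const char *s)
{
    assert(s != NULL);
    for (; *s; s++) {
        log_buf[log_end & LOG_BUF_MASK] = *s;
        log_end++;
        if (log_end - log_start > LOG_BUF_LEN)
            log_start = log_end - LOG_BUF_LEN;
    }
}

/* Copy up to len unread bytes into out; returns how many were copied. */
size_t log_read(char *out, size_t len)
{
    size_t n = 0;

    assert(out != NULL);
    while (n < len && log_start != log_end) {
        out[n++] = log_buf[log_start & LOG_BUF_MASK];
        log_start++;
    }
    return n;
}
```

Because the length is a power of two, the positions can simply keep counting upward and be masked on each access, which is what makes the bookkeeping so cheap.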
4. Related daemons
- The user-space daemon klogd reads kernel messages from the log buffer and saves them to the system log file through the syslogd daemon.
- klogd blocks until a new kernel message is available to read. Once woken, it reads the new messages and processes them (by default, it passes them to syslogd).
- syslogd appends every message it receives to a file (by default, /var/log/messages).
III. Oops
An oops is the usual way the kernel tells the user that something has gone wrong.
The kernel can hardly repair itself or kill itself, so all it can do is issue an oops. This involves:
- Printing an error message to the console
- Dumping the contents of the registers
- Printing a back trace that can be followed
After an oops the kernel is usually in an unstable state.
What happens next depends on when the oops occurred:
- In interrupt context: the kernel cannot continue and panics.
- In the idle task or the init task (pid 0 and pid 1): likewise, the kernel panics.
- While any other process is running: the kernel kills that process and tries to continue executing.
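The three cases above amount to a small decision rule, sketched here in user-space C. The struct and function names (`oops_site`, `oops_outcome_for`) are hypothetical and only restate the rule from these notes; this is not how the kernel itself is written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum oops_outcome { OOPS_PANIC, OOPS_KILL_TASK };

/* Hypothetical snapshot of where the oops happened. */
struct oops_site {
    bool in_interrupt;  /* the oops was raised in interrupt context */
    int  pid;           /* pid of the task that was running */
};

enum oops_outcome oops_outcome_for(const struct oops_site *site)
{
    assert(site != NULL);
    if (site->in_interrupt)
        return OOPS_PANIC;      /* the kernel cannot continue */
    if (site->pid == 0 || site->pid == 1)
        return OOPS_PANIC;      /* idle task or init: same result */
    return OOPS_KILL_TASK;      /* kill the task and try to carry on */
}
```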
Kernel debugging configuration options:
- Configuration option: CONFIG_DEBUG_KERNEL
- The sleep-inside-spinlock checking option: code that holds a spin lock or has disabled preemption is atomic and must not sleep
- What it detects:
  - Calling schedule() while holding a lock
  - Requesting a blocking memory allocation while holding a lock
  - Sleeping while holding a reference to per-CPU data
Deliberately raising a bug and printing information:
- Use BUG() and BUG_ON(). Most architectures define them as some kind of illegal operation, which triggers an oops.
  - They are used as assertions or as conditional checks
- Call panic(), which prints an error message and then halts the system:
  - panic("terrible_thing is %ld!\n", terrible_thing);
- Call dump_stack() to just print the register context and a function back trace on the console
Magic SysRq key
- Enabled with the CONFIG_MAGIC_SYSRQ configuration option; the feature can also be switched on and off through /proc/sys/kernel/sysrq
- Advantage: whatever state the kernel is in, you can communicate with it through special key combinations
- Common commands:
  - SysRq-o: shut down the machine
  - SysRq-s: sync all mounted file systems to disk
Important information in an oops: the register context and the back trace
- Back trace: shows the chain of function calls that led to the error.
- The register context is also useful, for example in reconstructing the state that caused the problem.
The addresses in the back trace need to be translated into meaningful symbol names. This is done with the ksymoops command.
You must also supply the System.map generated when the kernel was compiled. If you use modules, some module information is needed as well.
    ksymoops saved_oops.txt
Current kernels no longer need ksymoops, which was a frequent source of problems. Newer versions introduce the kallsyms feature, which is enabled with the CONFIG_KALLSYMS configuration option.
IV. Kernel debuggers
1. gdb
A running kernel can be inspected with the standard GNU debugger.
Starting the debugger on the kernel is much the same as starting it on a process:
    gdb vmlinux /proc/kcore
- vmlinux: the uncompressed kernel image, as opposed to zImage or bzImage. It is stored in the root of the source tree.
- /proc/kcore: passed as the core file argument. Through it gdb can access the memory of the running kernel. Only the superuser can read this file.
You can use all of gdb's commands to read information. For example:
- Print the value of a variable: p global_variable
- Disassemble a function: disassemble function
Compiling the kernel with the -g flag gives gdb much more information.
Limitations:
- There is no way to modify kernel data
- You cannot single-step through kernel code
2. kgdb
kgdb is a patch that lets us debug the kernel with the full power of gdb from a remote host over a serial line.
It requires two computers: the first runs a kernel with the kgdb patch, and the second uses gdb to debug the first over the serial line.
With kgdb, all of gdb's features are available:
- Reading and modifying variable values
- Setting breakpoints
- Setting watchpoints
- Single-stepping
V. Probing the system
1. Using the UID as a condition
When adding a feature, it is usually safer to keep the original algorithm in place and add the new algorithm alongside it.
The user ID (UID) can serve as the condition that decides which of the two algorithms is executed.
For example:
    if (current->uid != 7777) {
        /* old algorithm ... */
    } else {
        /* new algorithm ... */
    }
That is, every user except the one with UID 7777 gets the old algorithm, so user 7777 can be dedicated to testing the new one.
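The same idea can be tried out in user space. The sketch below is only a model: `struct task` stands in for the kernel's task structure, and `sum_old`, `sum_new` and `sum_to` are placeholder names invented for these notes. Both algorithms compute 1 + 2 + ... + n, so the two paths can be compared against each other.

```c
#include <assert.h>
#include <stddef.h>

#define TEST_UID 7777u   /* the dedicated test user from the text */

/* Hypothetical stand-in for the kernel's task structure. */
struct task { unsigned int uid; };

enum algo { ALGO_OLD, ALGO_NEW };

/* Old algorithm: add the numbers one by one. */
unsigned int sum_old(unsigned int n)
{
    unsigned int total = 0;

    for (unsigned int i = 0; i < n; i++)
        total += i + 1;
    return total;
}

/* New algorithm: closed form (fine for the small n used here). */
unsigned int sum_new(unsigned int n)
{
    return n * (n + 1) / 2;
}

/* Dispatch on the caller's UID and report which algorithm ran. */
unsigned int sum_to(const struct task *current, unsigned int n,
                    enum algo *used)
{
    assert(current != NULL && used != NULL);
    if (current->uid != TEST_UID) {
        *used = ALGO_OLD;
        return sum_old(n);
    }
    *used = ALGO_NEW;
    return sum_new(n);
}
```

Only the task with UID 7777 reaches the new code, so a mistake in `sum_new` would not affect anyone else.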
2. Using condition variables
If the code is not in process context, or if you want one mechanism that controls a feature in every situation, you can use a condition variable.
This is even simpler than using the UID: just create a global variable that acts as a conditional switch:
- If the variable is 0, the code on one branch is used;
- Otherwise, the other branch is taken.
The variable is set through some kind of interface, or through a debugger.
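A user-space sketch of such a switch is shown below. The variable `use_binary_search` and the function `find_index` are hypothetical names: the old branch is a linear scan and the new branch is a binary search over the same sorted array, so flipping the variable changes the code path without changing the result.

```c
#include <assert.h>
#include <stddef.h>

/* Global switch: 0 keeps the old linear scan, nonzero selects the
 * new binary search. In these notes it would be flipped through some
 * interface or from a debugger. */
int use_binary_search;

/* Returns the index of key in the sorted array a, or -1 if absent. */
long find_index(const int *a, size_t n, int key)
{
    assert(a != NULL || n == 0);

    if (use_binary_search == 0) {
        for (size_t i = 0; i < n; i++)      /* old branch */
            if (a[i] == key)
                return (long)i;
        return -1;
    }

    size_t lo = 0, hi = n;                  /* new branch: search [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (a[mid] == key)
            return (long)mid;
        if (a[mid] < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return -1;
}
```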
3. Using statistics
This method is useful when you need a feel for how often a particular event occurs.
The approach is to create counters and provide some mechanism for reading their values.
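A minimal user-space sketch of the counters follows. The event names and the functions `count_event` and `read_count` are invented for illustration, and the reading mechanism is just a function call here. Note the assumption in the comment: plain increments like these are not safe against concurrent updates, so a real kernel counter would need atomic operations.

```c
#include <assert.h>

/* Hypothetical events we want to count. */
enum event { EV_FOO, EV_BAR, EV_COUNT };

/* One counter per event. Plain increments are NOT safe if several
 * CPUs update them at once; this sketch assumes a single thread. */
static unsigned long event_count[EV_COUNT];

/* Call this at the point in the code where the event happens. */
void count_event(enum event ev)
{
    assert((unsigned int)ev < EV_COUNT);
    event_count[ev]++;
}

/* The "access mechanism": return the current value of one counter. */
unsigned long read_count(enum event ev)
{
    assert((unsigned int)ev < EV_COUNT);
    return event_count[ev];
}
```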
4. Rate and occurrence limiting
When debugging output is heavy enough to swamp the system, there are two ways to keep it in check:
- Rate limiting: print the message at most once per time interval
- Occurrence limiting: print the message at most a fixed number of times
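Both limits can be sketched in a few lines of user-space C. The struct and function names below are hypothetical, and the current time is passed in as a plain tick count so the example stays self-contained; the caller would print its debug message only when the check returns true.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Rate limit: allow at most one message per `interval` ticks. */
struct rate_limit {
    unsigned long interval;
    unsigned long last;     /* tick of the last allowed message */
    bool          used;     /* false until the first message */
};

bool rate_limit_ok(struct rate_limit *rl, unsigned long now)
{
    assert(rl != NULL);
    if (rl->used && now - rl->last < rl->interval)
        return false;       /* too soon after the previous message */
    rl->used = true;
    rl->last = now;
    return true;
}

/* Occurrence limit: allow only the first `limit` messages. */
struct count_limit {
    unsigned long limit;
    unsigned long seen;
};

bool count_limit_ok(struct count_limit *cl)
{
    assert(cl != NULL);
    if (cl->seen >= cl->limit)
        return false;       /* quota used up */
    cl->seen++;
    return true;
}
```

The rate limit keeps a steady trickle of messages for as long as the event keeps happening, while the occurrence limit goes silent for good once its quota is spent.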
20135302 Wei Quiet--textbook 18 study notes