"Linux kernel design and implementation" Fifth week reading Notes (Chapter 18th)

Last Update:2016-04-05 Source: Internet

Author: User


Chapter 18: Debugging

20135307 together

18.1 Getting Started

18.2 Bugs in the Kernel

Bugs in the kernel are many and varied. They arise for countless reasons and show up in just as many forms: plainly incorrect code (such as failing to store the correct value in the proper place), synchronization errors (such as improperly sharing a variable), and mismanaged hardware (for example, sending the wrong command to the wrong control register). The symptoms range from degraded performance for every program, to corrupted data, to a deadlocked system.

18.3 Debugging by Printing

18.3.1 Robustness

Robustness is one of the most readily appreciated traits of printk(). It can be called from almost anywhere in the kernel at any time, which is why it is used so widely. This resilience matters: printk() is useful precisely because it is always available. There is one chink in its armor, however. Early in the boot process, before the console has been initialized, printk() cannot be used. Then again, if the console is not initialized, where would the output go? This is normally not a problem unless you are debugging the very first steps of boot (for example, a function that performs architecture-specific hardware initialization). Debugging at that stage without any way to print is challenging and makes the problem even trickier.

18.3.2 Log Level

The main difference between printk() and printf() is that printk() lets the caller specify a log level.
The kernel uses this level to decide whether to print the message on the console.
It displays every message whose level is numerically lower (that is, more urgent) than the current console log level.
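
The filtering rule above can be sketched in user-space C. This is only an illustration, not kernel code: the enum names, console_threshold, shown_on_console() and log_msg() are hypothetical names chosen for this sketch (in the kernel the threshold is held in console_loglevel). The level numbers 0 through 7 follow the kernel's ordering, where a smaller number means a more urgent message.

```c
#include <stdio.h>

/* Log levels numbered as in the kernel: a smaller number is more urgent. */
enum log_level {
    LVL_EMERG = 0, LVL_ALERT, LVL_CRIT, LVL_ERR,
    LVL_WARNING, LVL_NOTICE, LVL_INFO, LVL_DEBUG   /* LVL_DEBUG == 7 */
};

/* Hypothetical stand-in for the kernel's console log level threshold. */
static int console_threshold = LVL_WARNING;

/* A message reaches the console only if its level is numerically
 * lower (more urgent) than the threshold. */
static int shown_on_console(int level)
{
    return level < console_threshold;
}

/* Print the message if it passes the filter; return 1 if it was printed. */
static int log_msg(int level, const char *msg)
{
    if (!shown_on_console(level))
        return 0;
    printf("<%d>%s\n", level, msg);
    return 1;
}
```

With the threshold set to LVL_WARNING, a call such as log_msg(LVL_ERR, "disk failure") is printed, while log_msg(LVL_DEBUG, "entering probe") is dropped.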

18.3.3 The Log Buffer

Kernel messages are stored in a circular buffer of size LOG_BUF_LEN. The size can be adjusted at compile time through the CONFIG_LOG_BUF_SHIFT option. On a uniprocessor system the default is 16KB, which means the kernel can hold at most 16KB of kernel messages at once. Once the buffer is full, each further printk() call overwrites the oldest message with the new one. The log buffer is called circular because it is read and written in a circular pattern.
A circular buffer has several advantages. Reading and writing it simultaneously is easy to synchronize, so printk() can be used without trouble even in interrupt context. It also makes log maintenance simple. If a large number of messages arrive at once, new messages just overwrite old ones. If a problem produces a flood of messages, the log overwrites itself instead of consuming memory without bound. The only drawback of a circular buffer, the possibility of losing messages, is a small price to pay for its simplicity and robustness.
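
The overwrite-when-full behavior described above can be sketched as a small user-space circular buffer. This is a simplified model under stated assumptions, not the kernel's implementation: struct ring, RING_SIZE, ring_put(), ring_write() and ring_read() are hypothetical names, the buffer is deliberately tiny, and no locking is shown.

```c
#include <stddef.h>

#define RING_SIZE 8   /* tiny on purpose; the kernel's buffer is far larger */

struct ring {
    char data[RING_SIZE];
    size_t head;    /* index where the next byte will be written */
    size_t count;   /* number of valid bytes, at most RING_SIZE */
};

/* Append one byte; when the buffer is full the oldest byte is overwritten. */
static void ring_put(struct ring *r, char c)
{
    r->data[r->head] = c;
    r->head = (r->head + 1) % RING_SIZE;
    if (r->count < RING_SIZE)
        r->count++;
}

/* Append a whole string (without its terminating NUL). */
static void ring_write(struct ring *r, const char *s)
{
    while (*s)
        ring_put(r, *s++);
}

/* Copy the buffered bytes, oldest first, into out and NUL-terminate it.
 * out must have room for RING_SIZE + 1 bytes. Returns the bytes copied. */
static size_t ring_read(const struct ring *r, char *out)
{
    size_t start = (r->head + RING_SIZE - r->count) % RING_SIZE;

    for (size_t i = 0; i < r->count; i++)
        out[i] = r->data[(start + i) % RING_SIZE];
    out[r->count] = '\0';
    return r->count;
}
```

Writing more than RING_SIZE bytes never fails and never grows memory use; a later ring_read() simply returns the most recent RING_SIZE bytes, which is the trade-off the section describes.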

18.3.4 syslogd and klogd

On a standard Linux system, the user-space daemon klogd retrieves kernel messages from the log buffer and feeds them into the system log file through the syslogd daemon. klogd can read the messages either from the /proc/kmsg file or through the syslog() system call; by default it uses the /proc approach. Either way, klogd blocks until new kernel messages are available to read. When it is woken up, it reads the new messages and processes them; by default it passes them to the syslogd daemon.

18.3.5 Transposing printf() and printk()

18.4 Oops

An oops is the usual way the kernel tells the user that something has gone wrong.
The important information in an oops is the same on every architecture: the register context and the back trace.
The back trace shows the chain of function calls that led to the error. In the example examined in the chapter, the trace tells the whole story: the machine was idle and running the idle loop, cpu_idle(), which calls default_idle() in a loop. A timer interrupt then fired and triggered timer processing, which ran the timer handler tulip_timer(); that handler dereferenced a NULL pointer. The offsets in the trace can even be used to locate the statement that caused the problem.

18.4.1 ksymoops

The addresses in the back trace must be translated into meaningful symbolic names before they are useful; the ksymoops command does this. You must supply the System.map file generated when the kernel was compiled. If you use modules, some module information is needed as well.

18.4.2 kallsyms

The configuration option CONFIG_KALLSYMS_ALL stores the symbolic names of all symbols, not only those of functions.

18.5 Kernel Debugging Options

The kernel provides many compile-time configuration options that help with debugging and testing kernel code. They are found in the Kernel Hacking menu of the kernel configuration editor, and all of them depend on CONFIG_DEBUG_KERNEL.

18.6 Asserting Bugs and Dumping Information

Several kernel routines make it easy to flag bugs, provide assertions, and dump information. The two most common are BUG() and BUG_ON(). When called, they trigger an oops, which prints a stack trace and an error message. How they trigger the oops is architecture-specific; most architectures define BUG() and BUG_ON() as an illegal instruction, which naturally produces the desired oops. Use these calls as assertions, to state that a given situation should never occur.

18.7 Magic SysRq Key

The Magic SysRq key is enabled through the CONFIG_MAGIC_SYSRQ configuration option.
When it is enabled, a special key combination lets you communicate with the kernel regardless of what state the kernel is in, so you can still get useful work done on a dying system. In addition to the configuration option, a sysctl turns the feature on or off.
To enable it, run: echo 1 > /proc/sys/kernel/sysrq

18.8 The Saga of a Kernel Debugger

Many kernel developers have long wanted an in-kernel debugger. Unfortunately, Linus does not want a debugger in his kernel source tree. He believes that debuggers lead developers to introduce bad fixes, and it is hard to argue with his logic: a fix that comes from truly understanding the code is more likely to be correct. Nevertheless, many kernel developers still want an official in-kernel debugger. Because that wish is unlikely to be granted soon, a number of patches have appeared that add debugging support to the standard kernel. Although they are not officially sanctioned, they are full-featured and powerful. Before looking at these solutions, it is worth seeing how much help the standard debugger, gdb, can offer.

18.8.1 gdb

18.8.2 kgdb

kgdb is a patch that lets gdb debug the kernel, with its full feature set, from a remote host over a serial line. It requires two computers: the first runs a kernel patched with kgdb, and the second uses gdb to debug the first over the serial line. All of gdb's features are available: reading and modifying variables, setting breakpoints, setting watchpoints, single-stepping, and so on. Some versions of gdb can even execute functions. Setting up kgdb and the serial line can be tricky, but once it is done, debugging becomes easy.

18.9 Poking and Probing the System

18.9.1 Using UID as a Conditional

Suppose you have rewritten the fork() system call to add an exciting new feature. Unless the first attempt is flawless, debugging will be a nightmare: if fork() is broken, you cannot expect the rest of the system to work. As always, there is hope. A safe approach is to leave the original algorithm in place and add the new one alongside it. You can then use the user ID (UID) as the condition that decides which algorithm runs, so that only processes owned by a chosen test user take the new path.
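
A minimal user-space sketch of this selection follows. Everything in it is illustrative: TEST_UID, old_algorithm(), new_algorithm() and do_work() are hypothetical names, and the two algorithms are trivial placeholders. In the kernel the check would be made against the UID of the current task instead of a function argument.

```c
#include <sys/types.h>

#define TEST_UID 7777   /* hypothetical UID reserved for testing the new code */

/* Placeholder for the existing, trusted implementation. */
static int old_algorithm(int x) { return x + 1; }

/* Placeholder for the experimental implementation under test. */
static int new_algorithm(int x) { return x + 2; }

/* Every user except the test user keeps running the old algorithm. */
static int do_work(uid_t uid, int x)
{
    if (uid != TEST_UID)
        return old_algorithm(x);
    return new_algorithm(x);
}
```

A user-space caller could pass getuid() as the first argument. The point of the design is that a bug in the new path can only affect processes owned by the test user, so the rest of the system keeps working while you debug.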

18.9.2 Using Condition Variables

If the code in question is not in process context, or if you want a single, more global mechanism for controlling a feature, you can use a condition variable. This is even simpler than using the UID: create a global variable and use it as a conditional switch. If the variable is zero, one code path runs; if it is nonzero, the other runs. The variable can be set through some interface or poked directly with a debugger.
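
The switch can be sketched in a few lines of C. The names use_new_path and pick_path() are hypothetical, and the strings returned stand in for whatever the two real code paths would do.

```c
/* Global switch: zero selects the old path, nonzero selects the new one.
 * It could be changed through some interface or from a debugger. */
static int use_new_path = 0;

/* Report which branch would run for the current switch setting. */
static const char *pick_path(void)
{
    if (use_new_path)
        return "new";
    return "old";
}
```

Because the switch is an ordinary global, flipping it requires no rebuild, which makes it convenient for comparing the two paths on a running system.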

18.9.3 Using Statistics

Sometimes you want a feel for how often a particular event occurs. Sometimes you want to compare several events and draw conclusions from the comparison. Both are easy to do by keeping statistics and providing a mechanism to export them. For example, to track how often foo and bar occur, define two global variables, ideally in the file where the events take place, and increment each one when its event occurs.
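
As a rough user-space sketch of this idea, the counters and an export helper might look like the following. The names foo_count, bar_count and format_stats() are hypothetical, foo() and bar() stand in for the real events, and the plain increments are not safe against concurrent updates, which is a simplification accepted here for brevity.

```c
#include <stdio.h>

/* One counter per event, defined in the file where the events occur. */
static unsigned long foo_count;
static unsigned long bar_count;

/* Each event bumps its counter before doing its real work. */
static void foo(void) { foo_count++; }
static void bar(void) { bar_count++; }

/* Export the statistics as text; returns what snprintf() returns. */
static int format_stats(char *buf, size_t len)
{
    return snprintf(buf, len, "foo: %lu\nbar: %lu\n", foo_count, bar_count);
}
```

The export step is what makes the counters useful: any interface that can hand the formatted text to the developer will do.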

18.9.4 Rate Limiting

18.10 Binary Searching to Find the Culprit Change

18.11 Binary Searching with Git

Git provides a useful mechanism for binary searching. If you use Git to manage your copy of the Linux source tree, it can automate the search (the git bisect command). Git performs the binary search over revisions, so it can pinpoint the specific commit that introduced the bug. Many Git tasks are cumbersome, but binary searching with Git is not particularly difficult.
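
The search itself can be sketched in C. This is a simplified model, not what Git does internally: it assumes revisions form a straight line numbered by integers, that one revision is known to be good and a later one known to be bad, and that every revision after the first bad one is also bad. The names first_bad_revision(), is_bad_fn and example_is_bad() are hypothetical.

```c
/* Hypothetical test callback: returns nonzero if revision rev shows the bug. */
typedef int (*is_bad_fn)(int rev);

/* Given a known-good revision and a later known-bad one, return the first
 * bad revision. Each step halves the range that still needs testing. */
static int first_bad_revision(int good, int bad, is_bad_fn is_bad)
{
    while (bad - good > 1) {
        int mid = good + (bad - good) / 2;

        if (is_bad(mid))
            bad = mid;      /* the bug was introduced at mid or earlier */
        else
            good = mid;     /* the bug was introduced after mid */
    }
    return bad;
}

/* Example callback for illustration: pretend the bug appeared in revision 42. */
static int example_is_bad(int rev)
{
    return rev >= 42;
}
```

With 100 candidate revisions the loop needs about seven tests instead of up to a hundred, which is why bisecting is practical even when each test means building and booting a kernel.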

18.12 When All Else Fails: The Community

Perhaps you have tried everything you can think of and slaved over the keyboard for hours, or even days, and the solution still eludes you. If the bug is in the mainline Linux kernel, you can turn to the other developers in the kernel community for help. Send a complete yet concise email describing the bug and your findings to the kernel mailing list; it may help lead you to the answer.

18.13 Summary

This chapter discussed debugging the kernel. Debugging is the process of finding where an implementation deviates from its intended goal. We looked at several techniques, from the kernel's built-in debugging infrastructure to debuggers, and from logging to binary searching with Git. Debugging the Linux kernel is difficult, far more so than debugging user programs, so the material in this chapter is crucial for anyone trying to work on kernel code.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
