Introduction to Linux Debugging Technology


One of the most pressing questions for anyone writing kernel code is how to approach debugging. Because the kernel is a set of functions not associated with any particular process, its code cannot easily be run under a debugger or traced.

This chapter describes the techniques you can use to monitor kernel code and track errors.

Debugging with print information

The most common debugging technique is monitoring, which in application programming is done by adding printf calls at suitable points. When you debug kernel code, you can use printk for the same purpose.

printk

In the previous chapters, we simply assumed that printk worked like printf. Now it is time to introduce the differences between them.

One difference is that printk lets you classify messages by severity by attaching a "log level", or priority, to each message. You indicate the log level with a macro. For example, KERN_INFO, which we saw earlier at the front of a print statement, is one of the possible log levels. The log-level macro expands to a string, which the compiler concatenates with the message text at compile time; this is why there is no comma between the priority and the format string in the following examples. Here are two examples of printk, one for a debugging message and one for a critical message:

Code
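As a minimal user-space sketch of the concatenation just described (not the original listing), the macros below are assumed to expand to "<N>" strings, in line with the <1> tag the text mentions for hello.c. The helper fake_printk and both functions are hypothetical names; kernel code would call printk directly and take the macros from the kernel headers.

```c
#include <stdio.h>
#include <stddef.h>

/* Assumed definitions for illustration: each level is a string literal
 * of the form "<N>". In kernel code these come from the kernel headers. */
#define KERN_CRIT  "<2>"
#define KERN_DEBUG "<7>"

/* Hypothetical stand-in for printk: formats into buf instead of the kernel log. */
#define fake_printk(buf, size, ...) snprintf((buf), (size), __VA_ARGS__)

/* Adjacent string literals are merged by the compiler, so
 * KERN_DEBUG "Here I am: line %d\n" becomes "<7>Here I am: line %d\n".
 * That is why no comma separates the level from the format string. */
int log_debug_line(char *buf, size_t size, int line)
{
    return fake_printk(buf, size, KERN_DEBUG "Here I am: line %d\n", line);
}

int log_critical(char *buf, size_t size, const char *what)
{
    return fake_printk(buf, size, KERN_CRIT "Giving up on %s\n", what);
}
```

Calling log_debug_line(buf, sizeof buf, 42) leaves the level tag at the very start of the formatted message, which is where the logging code looks for it.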

Eight log-level strings are defined in <linux/kernel.h>. A printk statement that does not specify a priority defaults to DEFAULT_MESSAGE_LOGLEVEL, an integer defined in kernel/printk.c. The default log level has changed several times during Linux development, so I recommend that you always specify a suitable log level.

Based on the log level, the kernel may print the message to the current text console: if the priority value is numerically less than console_loglevel, the message is displayed on the console. If both klogd and syslogd are running, kernel messages are appended to /var/log/messages regardless of the value of console_loglevel.
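The console rule can be sketched in a few lines of plain C. This is only an illustration of the comparison described above; the function names are hypothetical and the "<N>" tag format is assumed.

```c
/* Parse a leading "<N>" tag; return N, or default_level if the tag is absent. */
int message_level(const char *msg, int default_level)
{
    if (msg[0] == '<' && msg[1] >= '0' && msg[1] <= '7' && msg[2] == '>')
        return msg[1] - '0';
    return default_level;
}

/* A message reaches the console only if its level is numerically
 * lower (that is, more urgent) than console_loglevel. */
int shown_on_console(const char *msg, int default_level, int console_loglevel)
{
    return message_level(msg, default_level) < console_loglevel;
}
```

With a console level of 7, a message tagged <1> is shown while one tagged <7> is not; an untagged message falls back to the default level.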

The variable console_loglevel is initialized to DEFAULT_CONSOLE_LOGLEVEL, but it can be modified through the sys_syslog system call. As the klogd manual page shows, you can modify this variable by specifying the -c switch when you start klogd. You can also write a program to change the console log level. You can find a program I wrote for this purpose, misc-progs/setlevel.c, among the source files on the O'Reilly site. The new priority is specified as an integer value from 1 to 8.
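A program of this kind might look roughly like the sketch below. It is not the author's setlevel.c; it assumes glibc's klogctl() wrapper for the syslog system call, where command 8 sets the console log level, and the function names are hypothetical.

```c
#include <stdlib.h>
#include <sys/klog.h>

/* Return the level if it is an integer in 1..8, or -1 otherwise. */
int parse_console_level(const char *arg)
{
    char *end;
    long level = strtol(arg, &end, 10);

    if (end == arg || *end != '\0' || level < 1 || level > 8)
        return -1;
    return (int)level;
}

/* Command 8 of the syslog system call sets console_loglevel.
 * The call needs superuser privileges; it returns -1 on failure. */
int set_console_level(int level)
{
    return klogctl(8, NULL, level);
}
```

A main function would pass argv[1] through parse_console_level and, if the result is valid, hand it to set_console_level.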

You may need to lower the log level after a kernel fault (see "Debug System Failure"), because the fault-handling code raises console_loglevel to 15, after which every message appears on the console. Conversely, to see your debugging messages on a 2.0.x kernel, you need to raise the log level. The 2.0 kernels lowered MINIMUM_CONSOLE_LOGLEVEL, while older versions of klogd try by default to suppress console messages. If you happen to run one of these old daemons, a 2.0 kernel will print fewer messages than you expect unless you raise the log level. This is why the <1> tag is used in hello.c: it ensures that the messages are displayed on the console.

Since version 1.3.43, the kernel has offered a flexible logging policy by allowing you to send messages to a specific virtual console. By default, the "console" is the current virtual terminal. To select a different virtual terminal to receive messages, you simply invoke ioctl(TIOCLINUX) on the chosen virtual terminal. The following program, setconsole, can be used to choose which virtual terminal receives kernel messages; it must be run by the superuser. If you are not yet comfortable with ioctl, you can skip to the next section and come back to this code after reading the "ioctl" section of Chapter 5, "Enhanced Char Driver Operations".

Code

setconsole uses the special ioctl command TIOCLINUX, which implements Linux-specific functions. To use TIOCLINUX, you pass it a pointer to a byte array. The first byte of the array is a number identifying the requested subcommand, and the following bytes depend on the subcommand. setconsole uses subcommand 11, and the next byte (stored in bytes[1]) identifies the virtual console. A complete description of TIOCLINUX can be found in the kernel source file drivers/char/tty_io.c.
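The request described here can be sketched as follows. This is an illustrative reconstruction under the assumptions stated in the text (subcommand 11, console number in bytes[1]), not the original setconsole source; the function names are hypothetical.

```c
#include <sys/ioctl.h>
#include <unistd.h>

/* Fill the two-byte TIOCLINUX request: bytes[0] is the subcommand (11),
 * bytes[1] is the number of the virtual console that should get messages. */
void build_setconsole_request(char bytes[2], int console)
{
    bytes[0] = 11;
    bytes[1] = (char)console;
}

/* Ask the kernel to route its messages to the given virtual console.
 * Must be run by the superuser with stdin attached to a Linux console;
 * returns whatever ioctl returns (-1 on failure). */
int set_message_console(int console)
{
    char bytes[2];

    build_setconsole_request(bytes, console);
    return ioctl(STDIN_FILENO, TIOCLINUX, bytes);
}
```

Calling set_message_console(4) would, under these assumptions, direct kernel messages to the fourth virtual console.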

How messages are logged

The printk function writes messages into a circular buffer LOG_BUF_LEN bytes long. It then wakes any process that is waiting for messages, that is, any process sleeping in the syslog system call or reading /proc/kmsg. These two interfaces to the logging engine are equivalent. However, /proc/kmsg behaves more like a FIFO file, which makes it easier to read data from: a simple cat command is enough to read the messages.

If the circular buffer fills up, printk wraps around and writes new data at the beginning of the buffer, overwriting the oldest data, so the logging process loses the oldest messages. This problem is negligible compared with the advantages of a circular buffer. For example, a circular buffer lets the system run even without a logging process, without wasting memory. Another feature of the Linux approach to messaging is that printk can be invoked from anywhere, even from an interrupt handler, with no limit on how much data it prints. The only disadvantage is the possibility of losing some data.
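The wrap-around behavior can be illustrated with a small user-space ring buffer. This is a sketch of the general technique, not the kernel's implementation; the structure, the function names, and the tiny buffer size are all chosen for the example.

```c
#include <stddef.h>

#define LOG_BUF_LEN 8   /* tiny on purpose; the kernel's buffer is far larger */

struct log_ring {
    char   data[LOG_BUF_LEN];
    size_t head;            /* total number of bytes ever written */
};

/* Append bytes; once the buffer is full, new data overwrites the oldest. */
void ring_write(struct log_ring *r, const char *s, size_t len)
{
    for (size_t i = 0; i < len; i++)
        r->data[r->head++ % LOG_BUF_LEN] = s[i];
}

/* Copy the bytes still held, oldest first, into out (NUL-terminated).
 * out must have room for LOG_BUF_LEN + 1 bytes. Returns the byte count. */
size_t ring_snapshot(const struct log_ring *r, char *out)
{
    size_t count = r->head < LOG_BUF_LEN ? r->head : LOG_BUF_LEN;
    size_t start = r->head - count;

    for (size_t i = 0; i < count; i++)
        out[i] = r->data[(start + i) % LOG_BUF_LEN];
    out[count] = '\0';
    return count;
}
```

Writing ten bytes into the eight-byte buffer keeps only the last eight: the writer never blocks and never allocates, at the cost of losing the two oldest bytes.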

If klogd is running, it retrieves kernel messages and dispatches them to syslogd, which in turn checks /etc/syslog.conf to find out how to deal with them. syslogd sorts messages by "facility" and "priority"; the allowed values are defined in <sys/syslog.h>. Kernel messages are logged by the LOG_KERN facility, at the priority specified in the corresponding printk. If klogd is not running, the data stays in the circular buffer until some process reads it or the buffer overflows.

If you don't want to clutter your system log with the monitoring messages from your driver, you can either give klogd the -f (file) option or modify /etc/syslog.conf to write the log to a different file. Another, more drastic approach is to kill klogd and either print messages on an unused virtual terminal or run cat /proc/kmsg in an unused xterm.

Using the preprocessor to ease monitoring

Early in driver development, printk can be very helpful for debugging and testing new code. When you officially release the driver, however, you should remove, or at least disable, these print statements. Unfortunately, you are likely to find that as soon as you think you no longer need the messages and remove them, you add a new feature and need them again. There are several ways to solve both problems: how to enable and disable the messages globally, and how to turn individual messages on and off.

Here is most of the code I use to handle messages. It has the following features:

Each print statement can be enabled or disabled by removing or adding a single letter in the macro's name.

All the messages can be disabled at once by changing the value of the CFLAGS variable before compiling.

The same print statement can be used in kernel code (the driver) and in user-level code (demonstration or test programs).

These features are implemented by the following code fragment, which comes directly from scull.h.

Code

The definitions of PDEBUG and PDEBUGG depend on whether SCULL_DEBUG is defined, and both are called like printf.

To simplify the process further, add the following lines to your Makefile.

Code

The code in this section relies on a gcc extension to the ANSI C preprocessor that supports macros with a variable number of arguments. This dependence on gcc is not a problem, because the kernel itself relies heavily on gcc features. In addition, the Makefile depends on GNU make (gmake); by the same reasoning, this is not a problem either.
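Macros with these properties might be written along the following lines. This is a user-space sketch, not the original scull.h fragment: it uses the ", ## __VA_ARGS__" form of the gcc extension, prints to stderr, and defines SCULL_DEBUG inside the file only to stay self-contained. The function trace_open is a hypothetical caller.

```c
#include <stdio.h>

/* Normally SCULL_DEBUG would come from the compiler flags (-DSCULL_DEBUG);
 * it is defined here only so that the sketch is self-contained. */
#ifndef SCULL_DEBUG
#  define SCULL_DEBUG
#endif

#undef PDEBUG
#ifdef SCULL_DEBUG
/* User-space variant: print to stderr with a driver prefix.
 * ", ## __VA_ARGS__" is the gcc extension that drops the comma
 * when no extra arguments are given. */
#  define PDEBUG(fmt, ...) fprintf(stderr, "scull: " fmt, ## __VA_ARGS__)
#else
#  define PDEBUG(fmt, ...) 0          /* debugging off: nothing is printed */
#endif

/* The extra letter turns a single statement off without deleting it. */
#undef PDEBUGG
#define PDEBUGG(fmt, ...) ((void)0)

int trace_open(int minor)
{
    int printed = PDEBUG("open minor %d\n", minor);    /* enabled  */

    PDEBUGG("verbose detail for minor %d\n", minor);   /* disabled */
    return printed;   /* characters printed, or 0 when debugging is off */
}
```

Renaming a PDEBUG call to PDEBUGG silences that one statement, while building without SCULL_DEBUG silences all of them.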

If you are familiar with the C preprocessor, you can expand on the definitions above to support the concept of a "debug level", assigning an integer (or bitmask) value to each level to indicate how verbose it is.

But every driver has its own features and monitoring needs. Good programming means finding the best trade-off between flexibility and efficiency, and I can't say which is best for you. Remember that preprocessor conditionals (and constant expressions in the code) are evaluated at compile time, so you must recompile to turn messages on or off. An alternative is to use C conditionals, which are evaluated at run time and therefore let you turn messages on and off while the program runs. This is a nice feature, but it requires extra processing every time the code is executed, which affects performance even when the messages are disabled. Sometimes this performance loss is unacceptable.
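The run-time alternative can be sketched as a function guarded by an integer debug level. The names debug_level, dbg, and set_debug_level are hypothetical, and the sketch prints to stderr in user space rather than calling printk.

```c
#include <stdio.h>
#include <stdarg.h>

/* Hypothetical run-time switch: 0 = silent, higher values = more verbose. */
static int debug_level = 1;

/* Print only if the message's level is within the current threshold.
 * Returns 1 if the message was emitted, 0 if it was filtered out.
 * The comparison runs on every call, even when nothing is printed,
 * which is the run-time cost the text warns about. */
int dbg(int level, const char *fmt, ...)
{
    va_list ap;

    if (level > debug_level)
        return 0;
    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    return 1;
}

void set_debug_level(int level)
{
    debug_level = level;
}
```

Here set_debug_level(0) silences every message without recompiling, whereas the preprocessor approach removes the test from the compiled code altogether.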

Personally, I have found the macros shown above useful, even though they force you to recompile and reload the module every time you want to add or remove a message.

Debugging by Query

The previous section described how printk works and how to use it. What it did not discuss is its drawbacks.

Because syslogd keeps flushing its output files, every line printed causes a disk operation, so heavy use of printk can slow the system down noticeably. This is the right behavior from syslogd's point of view: it tries to write everything to disk in case the system crashes right after printing the message. You, however, don't want to slow the system down just for the sake of debugging messages. This problem can be solved by prefixing the name of your log file in /etc/syslog.conf with a dash, but sometimes you don't want to change your configuration files. Alternatively, you can run a program other than klogd (such as cat /proc/kmsg, as described earlier), but this may not provide a suitable environment for normal system operation.

More often than not, the best way to get relevant information is to query the system when you need it, rather than to produce data continuously. In fact, every Unix system provides many tools for obtaining system information: ps, netstat, vmstat, and so on.

Two main techniques are available to driver developers for querying the system: creating a file in /proc and using the ioctl driver method.

Using the /proc file system

The /proc file system in Linux is not associated with any device: the files in /proc are generated by the kernel when they are read. These files are usually plain text files, understandable by humans as well as by utility programs. For example, most Linux implementations of ps obtain process-table information by reading the /proc file system. The idea of a /proc virtual file system has been adopted by several modern operating systems and works quite successfully.

The current implementation of /proc supports the dynamic creation of inodes, which allows user modules to create entry points for easy retrieval of information.

To create a full-featured file node in /proc (one that supports read, write, seek, and so on), you need to define both a file_operations structure and an inode_operations structure, which has a similar role and size to the former. Creating such an inode is not much different from creating a whole character device. We won't deal with this issue here; if you are interested, you can find further details in the fs/proc directory of the source tree.

If the file node is meant only to be read, as most /proc files are, creating it is easier, and that is the technique I'll show here. Unfortunately, this technique is available only in Linux 2.0 and later.

Here is the scull code that creates a file called /proc/scullmem, which is used to report the memory scull is using.

Code

Filling in a /proc file is easy. Your function receives a free page to fill with data; it writes into the buffer and returns the length of the data written. Everything else is handled by the /proc file system. The only limitation is that the data written cannot exceed PAGE_SIZE bytes (the PAGE_SIZE macro is defined in the header file <asm/page.h>; its value is architecture-dependent, but you can count on at least 4 KB).
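The contract described above (receive a page, fill it with text, return the length) can be sketched in user space as follows. The function name, the dev_stats structure, and the output format are invented for this example and do not reproduce the kernel's actual function signature or the scull code.

```c
#include <stdio.h>

#define SKETCH_PAGE_SIZE 4096   /* assumed; the real PAGE_SIZE is per-architecture */

/* Hypothetical per-device statistics, invented for this sketch. */
struct dev_stats {
    int  quantum;
    int  qset;
    long size;
};

/* Fill buf (one page) with text and return the number of bytes written.
 * The loop stops early so that the output never exceeds one page:
 * a single line is at most 85 bytes, so 100 bytes of slack is enough. */
int fill_procmem_page(char *buf, const struct dev_stats *devs, int ndevs)
{
    int len = 0;

    for (int i = 0; i < ndevs && len < SKETCH_PAGE_SIZE - 100; i++)
        len += snprintf(buf + len, SKETCH_PAGE_SIZE - len,
                        "Device %d: qset %d, quantum %d, size %ld\n",
                        i, devs[i].qset, devs[i].quantum, devs[i].size);
    return len;
}
```

The caller owns the page and only needs the returned length; anything that does not fit in one page is simply not reported, which is the limitation the text mentions.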

If you need to write more than one page of data, you must implement a full-featured file.

Note that if a process reading your /proc file issues several read calls, each retrieving new data, your driver rewrites the entire buffer on every call, even when only a small amount of data is being read. This extra work can hurt performance, and if the file produces different data from one call to the next, successive read calls may splice together unrelated parts, so the data ends up misaligned. In practice, performance is not a problem, because every application that uses the C library reads data in large chunks. Misalignment, however, is worth considering, because it does sometimes show up. After retrieving the data, the library calls read at least once more, since end-of-file is reported only when read returns 0. If the driver happens to produce more data than before, the extra bytes are returned to user space and are misaligned with the preceding block of data. We will encounter the misalignment problem again when we look at /proc/jiq* in the "Task Queues" section of Chapter 6, "Flow of Time".

cleanup_module should unregister the /proc node with the following statement:

Code

The arguments passed to the function are the directory containing the file to be removed and the file's inode number. Because the inode number is allocated automatically, it is not known at compile time and must be read from the data structure.

The ioctl method

ioctl, which is discussed in detail in the next chapter, is a system call that acts on a file descriptor; it receives a "command" number and (optionally) another argument, usually a pointer.

As an alternative to the /proc file system, you can implement a few ioctl commands dedicated to debugging. These commands copy relevant data from the driver into user space, where you can examine it.

Using ioctl to get information is somewhat harder than using /proc, because you need a program that issues the ioctl call and displays the results. This program must be written, compiled, and kept in sync with the module you are testing.

Sometimes, however, this is the best way to get information, because it is much faster than reading /proc. If some processing must be done on the data before it is written to the screen, retrieving the data in binary form is more efficient than reading a text file. In addition, ioctl does not limit the size of the data returned.
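As a rough illustration of what a binary debugging command could look like, the sketch below defines a hypothetical structure and command number using the _IOR macro from <linux/ioctl.h>. The structure layout, the magic character 'k', and the sequence number 0x80 are arbitrary choices for this example and are not taken from scull or any real driver.

```c
#include <linux/ioctl.h>

/* Hypothetical debugging snapshot a driver could hand back in binary form. */
struct scull_debug_info {
    int  quantum;
    int  qset;
    long size;
};

/* Hypothetical command number: magic 'k' and sequence 0x80 are arbitrary.
 * _IOR records that data flows from the driver to the caller and encodes
 * the size of the structure in the command number itself. */
#define SCULL_DEBUG_MAGIC 'k'
#define SCULL_IOCGDEBUG   _IOR(SCULL_DEBUG_MAGIC, 0x80, struct scull_debug_info)

/* Decode the fields packed into the command number. */
unsigned int debug_cmd_size(void) { return _IOC_SIZE(SCULL_IOCGDEBUG); }
unsigned int debug_cmd_nr(void)   { return _IOC_NR(SCULL_IOCGDEBUG); }
```

A test program would declare a struct scull_debug_info, call ioctl(fd, SCULL_IOCGDEBUG, &info) on the open device, and print the fields; because both sides share the header, the program must be rebuilt whenever the structure changes.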

One advantage of the ioctl approach is that the debugging commands can be left in the driver even after debugging is otherwise disabled. Unlike a /proc file, which is visible to anyone who looks in the directory, ioctl commands usually go unnoticed. They are also still there should something strange happen to the driver later on. The only drawback is that the module will be slightly bigger.

Debugging by watching

Sometimes the problems you run into are minor, and running an application in user space to watch how the driver interacts with the system can help you track them down and confirm that the driver is working correctly. For example, I felt more confident about scull after seeing how its read implementation handled read requests for different amounts of data.

There are several ways to watch a user-space program at work. You can run a debugger on it to step through its functions, add print statements, or run the program under strace. The last technique is the most useful when the real goal is to examine kernel code.

The strace command is a powerful tool that shows all the system calls issued by a user-space program. It shows not only the calls but also their arguments and, in symbolic form, their return values. When a system call fails, both the symbolic value of the error (for example, ENOMEM) and the corresponding string (Out of memory) are displayed. strace has many command-line options; the most common are -t, which displays the time at which each call is made; -T, which displays the time spent in the call; and -o, which redirects the output to a file. By default, strace prints all trace information on stderr.

strace receives its information from the kernel itself. This means that a program can be traced regardless of whether it was compiled with debugging support (the -g option to gcc) and whether its symbol information has been stripped. You can also attach the trace to a process that is already running, much as a debugger can connect to a running process and control it.

Trace information is often used to support bug reports sent to application developers, but it is also invaluable to kernel programmers. We can see how driver code executes in response to system calls, and strace lets us check the consistency of the input and output data of each call.

For example, the following screen dump shows the last lines of running the command ls /dev > /dev/scull0:

Code

It is apparent that in the first write call after ls finishes reading the target directory, it attempts to write 4 KB. Oddly, only 4,000 bytes are written, and the operation is then retried. We know, however, that the write implementation in scull writes only one quantum at a time, so a partial write is what I expected to see. After a few steps, everything has been flushed and the program exits normally.
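The retry behavior seen from the application side is the usual loop around write. The sketch below shows one common way to write it; write_all is a hypothetical helper, not code taken from ls or the C library.

```c
#include <unistd.h>
#include <errno.h>

/* Keep calling write() until all len bytes are accepted or an error occurs.
 * A driver that accepts one quantum at a time returns short counts,
 * so the caller has to retry with the remaining data.
 * Returns the number of bytes written, or -1 on error. */
ssize_t write_all(int fd, const char *buf, size_t len)
{
    size_t done = 0;

    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);

        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }
        if (n == 0)
            break;              /* no progress: give up rather than spin */
        done += (size_t)n;
    }
    return (ssize_t)done;
}
```

Run under strace against a device like scull, such a loop would show up as a sequence of write calls, each returning less than was requested, until the whole buffer has been accepted.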

As another example, let's read the scull device:

Code

As expected, read retrieves only 4,000 bytes at a time, but the total amount of data is unchanged. Note how the retries are organized in this example, in contrast with the write trace above. wc is optimized for fast reading and bypasses the standard library to read more data with each system call. You can see from the read lines in the trace that wc tries to read 16 KB at a time.

Unix experts can find a lot of useful information in the output of strace. If you are put off by all the symbols, you can limit yourself to watching how the file methods (open, read, and so on) work.

Personally, I find tracing tools most useful for pinpointing run-time errors from system calls. Often the perror call in an application or demo program is not verbose enough for debugging, and it helps to see exactly which arguments triggered the error in the system call.

Debug System Failure

Even if you use all of the monitoring and debugging techniques, sometimes bugs remain in the driver, and the system fails when the driver is executed. When this happens, it is important to collect as much information as you can to solve the problem.

Note that "failure" does not mean "panic". Linux code is robust enough to respond gracefully to most errors: a failure usually results in the termination of the current process, while the system keeps running. The system may panic if a failure happens outside a process's context or if some vital part of the system is compromised. When the problem is a driver error, however, it usually results only in the termination of the failing process, the one using the driver. The only unrecoverable damage is that memory allocated to the process's context is lost when the process is terminated; for instance, dynamic lists allocated by the driver through kmalloc may be lost. However, because the kernel calls close for any device still open, your driver can release whatever was allocated by the open method.

As we have said, when the kernel misbehaves, an informative message is printed on the console. The next section explains how to decode and use such messages. Although they look rather obscure to the novice, the processor dumps are full of interesting information, often enough to pinpoint a bug without additional testing.

Oops Message

Most bugs show up as NULL pointer dereferences or the use of other incorrect pointer values. These errors usually result in an oops message.

Any address used by the processor is a "virtual" address, which is mapped to a physical address through a complex structure called page tables (see Chapter 13, "Mmap and DMA"). When an invalid pointer is dereferenced, the paging mechanism fails to map the address to a physical address, and the processor signals a "page fault" to the operating system. If the address is indeed invalid, the kernel cannot "page in" the missing address, and if this happens while the processor is in supervisor mode, the system generates an "oops". It is worth noting that version 2.1 of the kernel changed the way faults are handled so that it can cope with invalid address references in supervisor mode. The new implementation is described in the section "Handling Kernel-Space Faults" in Chapter 17, "Recent Developments".

An oops displays the processor state at the time of the failure, including the contents of the CPU registers, the location of the page descriptor tables, and other seemingly incomprehensible information. The message is generated by printk statements in the fault handler (arch/*/kernel/traps.c) and is dispatched as described earlier in the "printk" section.

Let's take a look at one such message. Here is how an oops appears on a conventional personal computer (an x86 platform) running Linux 2.0 or newer; version 1.2
