Use systemtap to debug the kernel in Linux

Source: Internet
Author: User

Systemtap is a new Linux kernel diagnostic tool that lets you obtain information from a running Linux kernel quickly and safely. It is good news for kernel developers and system administrators: they can collect real-time kernel data by writing or reusing simple scripts, without the long cycle of modifying source code, recompiling the kernel, and rebooting. This article introduces the installation, usage, and basic principles of systemtap, and uses a few interesting examples to show what it can do.

Before systemtap appeared, debugging the kernel was often a nightmare for Linux programmers and system administrators. For example, suppose you suspect a problem with the fd parameter passed to the read system call and want to print it. You would first obtain the kernel source, find sys_read(), insert a printk() statement into the function body, recompile the kernel, and then reboot into the new kernel. At last you see what you wanted to see, but you soon run into a new problem: unless you reboot into the original kernel, printk() keeps printing endlessly.

The purpose of systemtap is to rescue people from this quagmire. Systemtap provides a simple command-line interface and a powerful scripting language, together with a rich library of predefined scripts. Built on the kernel's kprobes facility, systemtap lets you freely collect debugging information and performance data from a running kernel for later analysis. You can start or stop the collection at any time, with no lengthy code changes, kernel builds, or reboots. With systemtap, the problem above comes down to a single command:


    stap -e 'probe syscall.read { printf("fd = %d\n", fd) }'

Systemtap is similar in function to Sun's DTrace and IBM's dprobes. Unlike them, however, systemtap is an open-source project released under the GPL. Its arrival gave the Linux community a powerful and easy-to-use dynamic kernel debugging tool. Currently, the main contributors to systemtap are Red Hat, IBM, Intel, and Hitachi, including engineers from the IBM China development center.

Install systemtap

Before installing systemtap, make sure the following two packages are installed:

kernel-debuginfo RPM: systemtap uses kernel debugging information to locate kernel functions and variables. Most distributions do not install the kernel-debuginfo RPM by default; it can be downloaded from the distribution's download site. For Fedora Core 6 on my ThinkPad, the address is:

elfutils RPM: systemtap needs the libraries in the elfutils package to analyze debugging information. The current systemtap requires elfutils-0.123 or later; the latest version is 0.124-0.1. If necessary, the RPM or source code can be downloaded from the systemtap site to upgrade. The address is:

Then install systemtap itself, either from an RPM or from source:

1. Install from RPM. On Fedora Core 6, systemtap is installed by default.

2. Install from source. Download the latest source code from the systemtap FTP site.

The installation steps are as follows:


    /root> tar -jxf SystemTap-20061104.tar.bz2
    /root> cd src
    /root/src> ./configure
    /root/src> make
    /root/src> make install
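A quick sanity check after installation might look like the sketch below. It assumes an RPM-based system and that the two prerequisite packages carry the names used in this article; `check_pkg` is a hypothetical helper introduced only for illustration.

```shell
#!/bin/sh
# check_pkg is a hypothetical helper: report whether an RPM package is installed.
check_pkg() {
    if rpm -q "$1" >/dev/null 2>&1; then
        echo "$1: installed"
    else
        echo "$1: missing"
    fi
}

# Only query the RPM database when rpm itself is available.
if command -v rpm >/dev/null 2>&1; then
    check_pkg kernel-debuginfo
    check_pkg elfutils
fi

# Print the installed systemtap version, if stap is on the PATH.
if command -v stap >/dev/null 2>&1; then
    stap -V 2>&1 | head -n 1
else
    echo "stap: not found in PATH"
fi
```

If either package is reported as missing, install it before running scripts that probe kernel functions.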

Run systemtap

Root privileges are required to run systemtap.

There are four ways to run systemtap:

1. Read the script from a file (usually with the .stp suffix) and run it: stap [options] filename

2. Read the script from standard input and run it: stap [options] -

3. Run a script given on the command line: stap [options] -e script

4. Run the script file directly (the file must be executable and its first line must be #!/usr/bin/stap): ./filename

Press Ctrl-C to stop systemtap.
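The four invocation styles can be tried with a trivial script, as in the following sketch. The file name hello.stp and the temporary directory are illustrative assumptions, and the runs are skipped unless stap is installed and the shell is running as root.

```shell
#!/bin/sh
# Write a tiny script to a scratch directory (hello.stp is an illustrative name).
dir=$(mktemp -d)
script="$dir/hello.stp"
cat > "$script" <<'EOF'
#!/usr/bin/stap
probe begin {
    printf("hello from systemtap\n")
    exit()
}
EOF
chmod +x "$script"

if command -v stap >/dev/null 2>&1 && [ "$(id -u)" -eq 0 ]; then
    stap "$script"          # 1. read the script from a file
    stap - < "$script"      # 2. read the script from standard input
    stap -e 'probe begin { printf("hello from systemtap\n"); exit() }'  # 3. inline
    "$script"               # 4. run directly; assumes stap lives at /usr/bin/stap
else
    echo "stap not available or not root; skipping the runs"
fi
```

Because the script calls exit() in its begin probe, each run ends on its own and no Ctrl-C is needed.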

Systemtap's options are continually being expanded and updated. The most commonly used ones are:

-v -- print verbose information about each pass;

-p NUM -- stop after pass NUM (by default systemtap runs through pass 5);

-k -- keep the temporary files instead of deleting them when the run finishes;

-b -- use the relayfs file system to transfer data from kernel space to user space;

-M -- only meaningful with -b: do not merge the per-CPU data files at the end of the run;

-o FILE -- write output to FILE instead of standard output;

-c CMD -- start the probes, run CMD, and exit when CMD finishes;

-g -- use guru mode, which allows embedded C in scripts;


For more options, see the stap manual page.
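As an illustration of combining several options, the sketch below counts read system calls while a short command runs. The `sleep 2` workload and the temporary output file are assumptions chosen for the example, and the run is skipped unless stap is installed and the shell is running as root.

```shell
#!/bin/sh
# Script body: count read() calls, then report the total when probing ends.
probe_script='global n
probe syscall.read { n++ }
probe end { printf("read calls: %d\n", n) }'

out=$(mktemp)   # illustrative output file for the -o option

if command -v stap >/dev/null 2>&1 && [ "$(id -u)" -eq 0 ]; then
    # -v: verbose progress, -o: write output to a file,
    # -c: run the command and exit when it finishes.
    stap -v -o "$out" -c 'sleep 2' -e "$probe_script"
    cat "$out"
else
    echo "stap not available or not root; skipping"
fi
```

Using -c ties the lifetime of the probes to the command, so the end probe fires as soon as the command exits.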

Systemtap syntax

We use a simple systemtap script to introduce the syntax of systemtap:


    #!/usr/local/bin/stap

    global count

    function report(stat) {
        printf("stat=%d\n", stat)
    }

    # count every call to sys_read()
    probe kernel.function("sys_read") {
        count++
    }

    # print the total when the script exits
    probe end {
        report(count)
    }

Probe: every systemtap script must define at least one probe point, that is, a location in the kernel to probe. The braces that follow the probe point name enclose the operations to run each time the kernel reaches that point. When those operations finish, control returns to the probe point and the kernel continues with the instructions that follow it. For the probe point types systemtap currently supports, see the stapprobes manual page.

Global: defines global variables. A local variable used within a single probe body does not need to be declared, but a variable used in more than one probe body must be declared global.

Function: defines functions used in probe bodies. Besides writing functions in the scripting language, you can also write them in C (this requires guru mode, -g), in which case the braces after the function name are replaced with %{ and %}. For example, the report() function above can be written as follows:


    function report(stat) %{
        _stp_printf("stat=%d\n", THIS->stat);
    %}

Systemtap examples

Now that we know the basic usage of systemtap, let's look at a few interesting examples.

Count the 10 most frequently called system calls

During performance analysis, we often need to know which functions are called most often so that the analysis can be targeted. The following simple example prints the most frequently called system calls over the past five seconds.


    #!/usr/bin/env stap
    #
    # display the top 10 syscalls called in last 5 seconds
    #
    global syscalls

    function print_top () {
        cnt = 0
        log("SYSCALL\t\t\t\tCOUNT")
        # iterate in descending order of call count
        foreach ([name] in syscalls-) {
            printf("%-20s\t\t%5d\n", name, syscalls[name])
            if (++cnt == 10)    # stop after ten entries
                break
        }
        printf("--------------------------------------\n")
        delete syscalls
    }

    probe syscall.* {
        syscalls[probefunc()]++
    }

    # fire every 5 seconds
    probe timer.ms(5000) {
        print_top()
    }

The output result is clear at a glance:


See who is touching my files

Sometimes, for example when malware is present, we find that files have been modified inexplicably. The following example helps you monitor who is accessing your files.


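    #!/usr/bin/env stap
    #
    # monitor who is messing with my file of secrets
    #
    # NOTE: the probe point is missing from this copy of the script;
    # generic.fop.open (from the vfs tapset) is assumed here.
    probe generic.fop.open {
        if (filename == "secrets")
            printf("%s is opening my file: %s\n", execname(), filename)
    }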

Run this script and perform some operations in another window to view the output result:


Print ANSI strings


Systemtap is not just a simple debugging tool; its powerful scripting language also lets it do some fun things. The following example shows how to colorize output:


    #!/usr/bin/env stap
    #
    # print colorful ANSI strings
    #
    probe begin {
        # header row: background color codes 40-47
        printf("a \\ b |");
        for (c = 40; c < 48; c++)
            printf("   %d   ", c);
        printf("\12");
        for (l = 0; l < 71; l++)
            printf("-");
        printf("\12");

        # one normal and one bold row per foreground color code 30-37
        for (r = 30; r < 38; r++)
            for (t = 0; t < 2; t++) {
                printf("%d    |", r);
                for (c = 40; c < 48; c++)
                    printf("\033[%d;%d%s %s \033[0;0m",
                           r, c, !t ? "m" : ";1m", !t ? "Normal" : "Bold  ");
                printf("\12");
            }
        exit();
    }

Let's take a look at its output:


Basic Principles of systemtap


Now that you are familiar with the basic usage of systemtap, let's look at its basic principles and workflow to deepen our understanding before we finish.

A systemtap run is divided into five stages, usually called pass 1 through pass 5. As mentioned above, adding the -p NUM option on the command line makes systemtap stop after pass NUM instead of running through pass 5. This lets you inspect systemtap's output at each stage and is especially useful for debugging scripts.

The following describes the main functions of each stage:

Pass 1 - parse: checks the input script for syntax errors, such as unmatched braces or malformed variable definitions.

Pass 2 - elaborate: expands the probe points and functions used in the input script. This involves both pulling in systemtap's predefined script library and analyzing the debugging information of the kernel or kernel modules.

Pass 3 - translate: converts the elaborated script into C source. The first three passes together work like a compiler, turning the .stp file into a complete .c file, so they are collectively called the translator.

Pass 4 - build: compiles the C source into a kernel module, using systemtap's runtime library in the process.

Pass 5 - run: inserts the compiled kernel module into the kernel and starts collecting and transferring data.
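To see what each stage produces, one can stop after every pass in turn, as in this sketch. The trivial one-probe script is an assumption chosen for the example. Pass 5 is omitted because it loads the module and needs root. Typically pass 3 prints the generated C source and pass 4 prints the path of the built module, though the exact output depends on the systemtap version.

```shell
#!/bin/sh
# Trivial script used only to exercise the translator passes.
demo='probe begin { exit() }'
passes="1 2 3 4"

if command -v stap >/dev/null 2>&1; then
    for pass in $passes; do
        echo "== output after pass $pass =="
        # Show only the first few lines; pass 3 emits a whole C file.
        stap -p "$pass" -e "$demo" 2>&1 | head -n 5
    done
else
    echo "stap not found; skipping"
fi
```

Adding -k keeps the temporary directory, so the generated C file and module can be inspected afterwards.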


Systemtap is a brand-new tool, but it has already demonstrated powerful capabilities and wide applicability. It makes dynamically collecting Linux kernel information and performance data easy, freeing people from tedious data collection so they can focus on processing and analyzing the data. This is undoubtedly good news for kernel developers and system administrators. As more people use it, more bugs will be reported and fixed and more features will be added, and systemtap will become increasingly stable and complete.


