I. Introduction
SystemTap is open source software for diagnosing performance and functional problems on Linux systems. It makes it much easier to diagnose a Linux system at run time: developers and debuggers no longer need to go through tedious steps such as recompiling, installing a new kernel, and rebooting.
To diagnose a functional or performance problem, a developer only needs to write a script and run it through the command-line interface that SystemTap provides; the running kernel is then probed directly. The chores that used to be necessary (modifying the source or inserting debug code, recompiling the kernel, installing it, and rebooting) are no longer needed. Currently the tool does not support diagnosing user-space applications, but that support will be added later. The main developers of the project are engineers from Red Hat, IBM, Intel, and Hitachi. Red Hat is primarily responsible for the script translator and the runtime library, IBM for kprobes and relayfs, and Intel for the translator's safety checks and the performance monitoring tapset.
II. How SystemTap works
SystemTap uses a scripting language similar to awk and C (and to DTrace's D language). It has only three data types: integers, strings, and associative arrays. It has a full set of control structures, including blocks, conditionals, loops, and functions. Statement separators are optional, and types need not be declared: they are inferred from context and checked automatically.
SystemTap implements probing through the interface provided by kprobes. For each probe you define a probe point and a corresponding handler. In kprobes, a probe point is the address of a function or instruction (also called a kernel event); in SystemTap, the user can additionally specify a source file, a particular line of source code, or an asynchronous event such as a periodic timer. Probe points are named hierarchically. A probe handler can output data immediately, much as printk does, and it can also inspect kernel data.
The script is converted by a translator into C code and compiled into a kernel module. Probe points are mapped to kernel virtual addresses using the kernel's DWARF debug information, which is why SystemTap requires kernel debug information to be available. The whole script is checked strictly at translation time and checked again at run time (for infinite loops, memory use, recursion, invalid pointers, and so on), so it is safe to use and does not disrupt the running system, which matters a great deal on production systems. SystemTap also keeps a blacklist of functions that it must not probe, because probing them would cause problems such as infinite probe recursion or lock re-entry.
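As a rough illustration of the language features described above, the following shell sketch writes a small script to a temporary file. The file name lang-demo.stp, the function name label, and the threshold of 100 are made up for this example; syscall.*, timer.s, and execname() are assumed to be available from the installed tapsets. It is a sketch only, not a script taken from the SystemTap distribution.

```shell
#!/bin/sh
# Write an illustrative SystemTap script; the file name is arbitrary.
script="${TMPDIR:-/tmp}/lang-demo.stp"

cat > "$script" <<'EOF'
# An associative array indexed by strings and holding integers.
# No types are declared; the translator infers them from use.
global calls

# A function with an inferred integer argument and string result.
function label(n) {
    if (n > 100) return "busy"
    return "quiet"
}

# Hierarchical probe point name: every system call entry.
probe syscall.* {
    calls[execname()]++
}

# Asynchronous probe point: fires once after five seconds.
probe timer.s(5) {
    # Walk the array in descending order of value, at most 5 entries.
    foreach (proc in calls- limit 5)
        printf("%-16s %6d %s\n", proc, calls[proc], label(calls[proc]))
    exit()
}
EOF

echo "wrote $script; run it as root with: stap $script"
```

Note that the script contains no semicolons and no type declarations, and that the two probe points use different branches of the hierarchical naming scheme.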
The following figure gives an intuitive picture of how SystemTap works:
Figure 1. How SystemTap works
A SystemTap script file has the suffix .stp and is written in the scripting language described above. A script names the probe points to be probed and defines a handler for each of them; each probe point corresponds to a kernel function, an event, or a location inside a function. When the kernel reaches a probe point, the associated handler is executed.
Tapsets form a script library. The library contains many tapsets; each tapset is a set of probe points, auxiliary functions, or global variables written for a kernel subsystem or a specific functional block, for use by user scripts or by other tapsets. Some of the data a tapset defines can be used by any probe handler or script. Such data is usually exported through a handler statement block (HSB), and the variables in the HSB are the exported data. A tapset is generally written by the developer of the kernel subsystem, or by a developer who knows the subsystem well, using both the scripting language and C, and it has been tested and validated as safe to use. Tapsets are part of the SystemTap release package.
SystemTap implements a script translator. When a user runs a SystemTap script, SystemTap first parses it and performs safety checks. If the script references functions from SystemTap's predefined script library, SystemTap reads the library to obtain the corresponding code; references to kernel variables or symbols are resolved to addresses using the kernel debug information. The script is then converted to C code, and SystemTap adds locking and safety code where needed. Variables shared between probes are converted to suitable static declarations and protected by locks, and each set of local variables is converted into a synthetic call-frame structure so that it does not consume kernel stack space. The handler associated with each probe point is wrapped in an interface function, and the appropriate kprobes interface function is called to register the probe point.
The generated C code contains references to the runtime tapset library, which provides many SystemTap interface functions, such as generic lookup tables, constrained memory management, startup, shutdown, and I/O operations. The generated C code is compiled and linked into a loadable kernel module. To deliver results quickly, SystemTap uses relayfs. When the generated kernel module is loaded, its initialization function initializes the module and then calls the kprobes interface functions to register the probe points defined in the script. When the kernel reaches a registered probe point, the corresponding handler is called; output statements in the handler call the relayfs interface functions to emit the result data, and the handler can also call kernel performance measurement functions. When the user stops the script, or when a condition set in the script is met, the module calls its exit function to unregister the probe points, performs cleanup, and is unloaded.
At run time, SystemTap starts a process dedicated to reading the module's output through relayfs and presenting it to the user promptly.
III. Comparison of SystemTap and DTrace
Project aspects:
Figure 2. Systemtap and DTrace comparison: project aspects
Languages used:
Figure 3. Systemtap vs. DTrace: Languages used
Detection capability:
Figure 4. Systemtap vs. DTrace: Detection capability
Security:
Figure 5. Systemtap and DTrace comparison: security
Graphical user interface:
Figure 6. Systemtap and DTrace comparison: graphical user interface
IV. Installing SystemTap
The prerequisites for running Systemtap are:
- A kernel that supports kprobes and has it enabled (2.6.11 and later)
- A kernel module build environment (the kernel header files and module configuration information needed to build kernel modules; on Fedora Core or Red Hat this means the kernel-devel or kernel-smp-devel RPM package)
- Kernel debug information (on Fedora Core or Red Hat this means the kernel-debuginfo RPM package)
- A C build environment (the libc header files and a compilation toolchain)
- An elfutils that provides libdwfl (SystemTap works only with an elfutils that includes libdwfl; if the elfutils on your system is too old, you must download the elfutils source package, and SystemTap can then be built together with elfutils)
- Root privileges (you must have root privileges to run SystemTap)
If you are using Fedora Core 4 or a later Fedora Core release, installing SystemTap is easy:
# yum install kernel-devel
# yum --enablerepo=core-debuginfo --enablerepo=updates-debuginfo install kernel-debuginfo
# yum install systemtap
Then run the following commands to verify that the installation succeeded.
# stap -ve 'probe begin { log("hello world") exit() }'
# stap -c df -e 'probe syscall.* { if (target()==pid()) log(name." ".argstr) }'
If you want to install the latest SystemTap, you can build it yourself from a source package, as follows:
1. Make sure elfutils provides libdwfl
If your elfutils does not include libdwfl, you need to download a version that does. SystemTap can build elfutils automatically, so in this case you only need to download the latest elfutils and its patch from
ftp://sources.redhat.com/pub/systemtap/elfutils/elfutils-NNNN.tar.gz
ftp://sources.redhat.com/pub/systemtap/elfutils/elfutils-portability.patch
then unpack elfutils-NNNN.tar.gz and apply elfutils-portability.patch. The commands are as follows:
# cd /home/yangyi
# wget ftp://sources.redhat.com/pub/systemtap/elfutils/elfutils-NNNN.tar.gz
# wget ftp://sources.redhat.com/pub/systemtap/elfutils/elfutils-portability.patch
# tar zxvf elfutils-NNNN.tar.gz
# cd elfutils-NNNN
# patch -p0 < ../elfutils-portability.patch
2. Download the SystemTap source package and unpack it
# cd /home/yangyi
# wget ftp://sources.redhat.com/pub/systemtap/snapshots/systemtap-YYYYMMDD.tar.bz2
# tar -jxvf systemtap-YYYYMMDD.tar.bz2
# cd systemtap-YYYYMMDD/src
Or
# cvs -d :pserver:[email protected]:/cvs/systemtap login
Note: the password is anoncvs
# cvs -d :pserver:[email protected]:/cvs/systemtap co src
# cd src
3. Installation
# ./configure [--with-elfutils=/home/yangyi/elfutils-NNNN] [other autoconf options]
# make all check
# make install
Note that the argument of the --with-elfutils option is the path to the elfutils source tree; the option is unnecessary if an up-to-date elfutils is already installed.
Other Linux distributions offer no such convenient installation method: you must build the kernel yourself and set up the prerequisites that SystemTap requires. The same approach also works if you want to use the latest kernel.
To build a kernel that supports SystemTap, you must enable these kernel options:
Kernel hacking --->
    [*] Kernel debugging
    [*] Compile the kernel with debug info
Instrumentation Support --->
    [*] Kprobes (EXPERIMENTAL)
General setup --->
    [*] Kernel->user space relay support (formerly relayfs)
To confirm that these options were set, grep the generated configuration file .config for the following symbols:
CONFIG_DEBUG_INFO
CONFIG_KPROBES
CONFIG_RELAY
If the configuration succeeded, all three should be set to y.
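The check described above can be scripted. The following is a minimal sketch: the helper name check_stap_config is hypothetical, and the default path /boot/config-`uname -r` is an assumption that holds only on distributions that install the kernel configuration there. Pass the path of your own .config otherwise.

```shell
#!/bin/sh
# Report whether the three kernel options SystemTap needs are set to y.
# check_stap_config is a hypothetical helper; the default config path is
# an assumption and may not exist on every distribution.
check_stap_config() {
    config_file="${1:-/boot/config-$(uname -r)}"
    status=0
    for opt in CONFIG_DEBUG_INFO CONFIG_KPROBES CONFIG_RELAY; do
        # Match the whole line so that "CONFIG_X=m" or
        # "# CONFIG_X is not set" does not count as enabled.
        if grep -q "^${opt}=y\$" "$config_file" 2>/dev/null; then
            echo "$opt=y"
        else
            echo "$opt missing"
            status=1
        fi
    done
    return "$status"
}
```

For the build tree used later in this article, the call would be check_stap_config /home/yangyi/kernel-build/.config; the function returns a non-zero status if any option is missing.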
After booting the system with the kernel you built, you must make sure that SystemTap can locate the kernel image file (vmlinux) for that kernel. It must be an uncompressed kernel image from which debug and symbol information has not been stripped (that is, the vmlinux file in the root of the kernel build directory). SystemTap looks for the kernel image in the following three locations:
/boot/vmlinux-`uname -r`
/usr/lib/debug/lib/modules/`uname -r`/vmlinux
/lib/modules/`uname -r`/vmlinux
so you must make sure the image is available at one of them. Any of the three may be a symbolic link.
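To see which of the three locations currently holds a kernel image, a lookup along the following lines may help. This is a sketch only: find_vmlinux and its optional root-prefix argument are hypothetical, and checking the paths in the order listed is an assumption rather than documented SystemTap behaviour.

```shell
#!/bin/sh
# Print the first kernel image found in the three locations listed above.
# find_vmlinux and its optional root prefix are hypothetical; the prefix
# exists only so the lookup can be tried against a scratch directory.
find_vmlinux() {
    release="${1:-$(uname -r)}"
    root="${2:-}"
    for candidate in \
        "$root/boot/vmlinux-$release" \
        "$root/usr/lib/debug/lib/modules/$release/vmlinux" \
        "$root/lib/modules/$release/vmlinux"
    do
        # -e follows symbolic links, so a link into the build tree counts.
        if [ -e "$candidate" ]; then
            echo "$candidate"
            return 0
        fi
    done
    echo "no vmlinux found for $release" >&2
    return 1
}
```

Called without arguments, find_vmlinux checks the running kernel's release on the real file system and returns a non-zero status if none of the three paths exists.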
You also need to create the following two symbolic links pointing to the source tree of your kernel.
/usr/src/kernels/`uname -r`
/lib/modules/`uname -r`/source
You also need to create the following symbolic link pointing to your kernel's build tree.
/lib/modules/`uname -r`/build
For example, suppose your kernel source tree is /home/yangyi/linux-2.6.20 and your kernel build tree is /home/yangyi/kernel-build.
(Note that with a 2.6 kernel the build directory can differ from the source directory. Specifically:
cd /home/yangyi/linux-2.6.20
make O=/home/yangyi/kernel-build menuconfig
make O=/home/yangyi/kernel-build
sudo make O=/home/yangyi/kernel-build modules_install install
This way the same source tree can serve several builds, and each build can be rebuilt as needed without affecting the others, so we recommend building the kernel this way.)
Note in particular that the system must currently be running the kernel you have just built; if it is not, boot into that kernel before performing the following steps.
# ln -s /home/yangyi/linux-2.6.20/vmlinux /boot/vmlinux-`uname -r`
# mkdir -p /usr/src/kernels
# ln -s /home/yangyi/linux-2.6.20 /usr/src/kernels/`uname -r`
# mkdir -p /lib/modules/`uname -r`
# ln -s /home/yangyi/kernel-build /lib/modules/`uname -r`/build
# ln -s /home/yangyi/linux-2.6.20 /lib/modules/`uname -r`/source
Readers using Debian Linux can install SystemTap as follows:
# apt-get build-dep systemtap
# apt-get --compile source systemtap
# dpkg -i systemtap*deb
To use a new kernel with SystemTap on Debian Linux, follow these steps:
1. Download and configure the latest kernel source package
# apt-get install linux-source-2.6.20 kernel-package fakeroot
# cd /usr/src
# tar jxvf linux-source-2.6.20.tar.bz2
# cd linux-source-2.6.20
# cp /boot/config-2.6-xxxxxxxx .config
# make menuconfig
Configure the following kernel options:
Kernel hacking --->
    [*] Kernel debugging
    [*] Compile the kernel with debug info
Instrumentation Support --->
    [*] Kprobes (EXPERIMENTAL)
General setup --->
    [*] Kernel->user space relay support (formerly relayfs)
2. Add the following line to the file /etc/kernel-pkg.conf
install_vmlinux = YES
3. Build the kernel
# fakeroot make-kpkg --initrd --append-to-version=-systemtap-1.0 kernel_image kernel_headers
Note two of the options: --initrd makes kernel-package build an initrd, and --append-to-version changes the kernel name that appears in the output of uname -a; you can specify any name you like. The target kernel_image builds the kernel image package, and the target kernel_headers builds the kernel headers package.
4. Install the custom kernel
# dpkg -i ../kernel-image-2.6.20-systemtap-1.0_10.00.custom_i386.deb
# dpkg -i ../kernel-headers-2.6.20-systemtap-1.0_10.00.custom_i386.deb
Note that the names of the .deb packages you build depend on the kernel you chose and on the build parameters.
5. Copy your kernel build directory to /lib/modules/<your new kernel version>/build
V. A detailed usage example
The simple example below shows in detail how SystemTap works.
This stp script prints, every 5 seconds, the 20 most frequently called system calls on the system.
#!/usr/bin/env stap
#
# This script continuously lists the top 20 systemcalls on the system
#
global syscalls
function print_top () {
    cnt=0
    log ("SYSCALL\t\t\t\tCOUNT")
    # iterate over syscalls in descending order of call count
    foreach ([name] in syscalls-) {
        printf("%-20s\t\t%5d\n", name, syscalls[name])
        # pre-increment so that the loop stops after exactly 20 entries
        if (++cnt == 20)
            break
    }
    printf("--------------------------------------\n")
    # reset the counters for the next interval
    delete syscalls
}
probe kernel.function("sys_*") {
    syscalls[probefunc()]++
}
# print top syscalls every 5 seconds
probe timer.ms(5000) {
    print_top ()
}
The first line of the script specifies the script interpreter, stap. stap first parses the script, translates it into C code, compiles it, and links it against the library installed with SystemTap, producing a kernel module that is loaded into the system. Finally it starts a user-space process that reads data from the relayfs interface provided by SystemTap and displays it on the screen.
# introduces a comment, as in shell syntax.
The statement global syscalls declares syscalls as a global variable.
Note that stp does not require statement delimiters such as semicolons.
function print_top () defines a function named print_top, with a syntax similar to that of the shell.
SystemTap provides powerful output support, including log and printf; printf uses the same syntax as printf in C. The loop statement foreach ([name] in syscalls-) sorts the syscalls array in descending order of value and then iterates over its elements. name receives the index of each element, which here is the system call name, so syscalls is in fact an associative array. It is easy to see that print_top prints the 20 largest elements of the syscalls array.
probe kernel.function("sys_*") defines a kprobe probe point and a corresponding handler for every kernel function whose name begins with sys_. Readers familiar with the kernel know that kernel functions beginning with sys_ implement system calls. The handler increments the counter of the corresponding system call by 1.
probe timer.ms(5000) declares a probe point on a 5000-millisecond timer (SystemTap now supports timer probe points), and its handler calls print_top to print the names and call counts of the 20 most frequently called system calls.
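The ranking that print_top performs can be imitated in user space with standard shell tools, which may help in reading the foreach loop. This is only an analogy and not part of SystemTap: the helper name top_names and its input format (one system call name per line on standard input) are assumptions made for the sketch.

```shell
#!/bin/sh
# top_names [N]: read one name per line on stdin and print the N most
# frequent names with their counts, most frequent first (default N=20).
# Hypothetical helper that mirrors print_top; it is not a SystemTap API.
top_names() {
    limit="${1:-20}"
    sort |                   # group identical names together
        uniq -c |            # count each name, like syscalls[name]++
        sort -rn |           # descending by count, like "syscalls-"
        head -n "$limit" |   # keep the top N, like the cnt check
        awk '{ printf "%-20s\t\t%5d\n", $2, $1 }'
}
```

For instance, piping the lines sys_read, sys_write, sys_read into top_names 2 lists sys_read with a count of 2 before sys_write with a count of 1. In the real script the counting happens inside the kernel, and only the formatted result crosses into user space.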
A fuller reference for the stp scripting language can be found in the latest SystemTap source package, and interested readers may consult it. Keep in mind that the script is compiled and linked into a kernel module, so it runs in kernel mode. All the output a user sees after running the script is read from the kernel by a user-space process that stap starts, through the relayfs interface provided by SystemTap, and then displayed on the screen.
Summary
This article explained in detail how SystemTap works and, by comparing it with DTrace, showed where SystemTap comes from and how it resembles and differs from existing tools. So that readers can install SystemTap on the Linux distribution they use, it described several installation methods in detail. Finally, a practical example showed the relationship between SystemTap and kprobes and how the whole mechanism operates. This article is the third in the series "A new performance measurement and debugging diagnostic tool under Linux: SystemTap"; interested readers may also read the first and second parts.
Reprint: https://www.ibm.com/developerworks/cn/linux/l-cn-systemtap3/
Performance measurement and debugging diagnostic tool under Linux: SystemTap