Use systemtap to obtain local variables of kernel functions



While reading the kernel's cgroup source code over the past two days, I wanted a tool that could give me debugging information such as backtraces (bt), parameter values, and return values, much as GDB does when debugging an application. So I put systemtap to real use.
Note: the machine I tested on runs kernel 2.6.32-220.23.1.tb704.el6.x86_64 (our company's kernel). The source code I was reading is 2.6.32-60.

I will not describe installing systemtap here, but note that to use it you must install the matching kernel symbol information (debuginfo).

Systemtap supports embedded C code, which is the key to reading local variables inside a function. Notes: the script must be run with the -g option; all embedded C code must be enclosed in %{ ... %} (not %{ ... }%); #include lines must also sit inside this pair of markers (leaving them outside %{ %} cost me a lot of debugging time); and embedded C code must not call blocking functions.

First, I wanted to read one field of a function parameter. The parameter is a pointer to a struct. The parameter itself is available as $variable, but how do we get the value of one of its fields? With an embedded C function:

function root_name:long(arg:long) %{
    struct cgroup_sb_opts *opts = (struct cgroup_sb_opts *)(THIS->arg);
    THIS->__retvalue = opts->subsys_bits;
%}

In systemtap a pointer value has type long, and our pointer points to a struct cgroup_sb_opts, so we cast it first and then read the field in plain C. Note: if the type (here struct cgroup_sb_opts) is not in a kernel header file, you must define it yourself, and you may need to recursively define every type it uses that is also missing from the headers. To return a string, use strlcpy(THIS->__retvalue, opts->name, MAXSTRINGLEN).

Tip: when the embedded C has an error or warning, the location stap reports is in the temporary C file it generated, not in our script, which makes debugging awkward. Keep the temporary files (the -k option), find the reported location there, and from that it is easy to find the corresponding place in the script.

That was the simple case, because we read the value directly from a parameter. What if we want the value of a local variable while the function is running?
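Before moving on, the cast-then-read pattern of root_name and the bounded string copy can be tried in user space. The following is a minimal sketch in plain C, not systemtap code. The trimmed struct, the function names read_subsys_bits, read_name and bounded_copy, and the value picked for MAXSTRINGLEN are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAXSTRINGLEN 128  /* assumed buffer size, chosen for this sketch */

/* Trimmed, hypothetical stand-in for the kernel's struct cgroup_sb_opts. */
struct cgroup_sb_opts {
    unsigned long subsys_bits;
    const char *name;
};

/* Same contract as strlcpy: copy at most size-1 bytes, always terminate,
 * and return the length of src so the caller can detect truncation. */
static size_t bounded_copy(char *dst, const char *src, size_t size)
{
    size_t len = strlen(src);
    if (size != 0) {
        size_t n = len < size - 1 ? len : size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return len;
}

/* Mirrors root_name(): the pointer arrives as an integer and is cast back. */
static long read_subsys_bits(intptr_t arg)
{
    assert(arg != 0);
    const struct cgroup_sb_opts *opts = (const struct cgroup_sb_opts *)arg;
    return (long)opts->subsys_bits;
}

/* Mirrors the string case: copy the name field into a fixed-size buffer. */
static void read_name(intptr_t arg, char *out, size_t size)
{
    assert(arg != 0);
    const struct cgroup_sb_opts *opts = (const struct cgroup_sb_opts *)arg;
    bounded_copy(out, opts->name ? opts->name : "", size);
}
```

Passing (intptr_t)&opts for a struct whose subsys_bits is 4 returns 4 from read_subsys_bits, and read_name fills the caller's buffer with the name, truncated if the buffer is too small.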

In cgroup, each subsystem has a static global variable that holds the subsystem's basic operations. For example, the cpu subsystem uses cpu_cgroup_subsys, cpuset uses cpuset_subsys, and memory uses mem_cgroup_subsys. But when I looked for the blkio subsystem's variable under the same naming convention, it was not there. How could I find it? During a cgroup mount, the populate function of each mounted subsystem is called to create that subsystem's files. For example, when we run mount -t cgroup -o cpu cpu0 /cgroup/cpu, the backtrace of this process is as follows (obtained with systemtap's print_backtrace() function):

74447240 8388 (mkdir) call trace:
 0xffffffff81054e60 : cpu_cgroup_populate+0x0/0x30 [kernel]
 0xffffffff810c007a : cgroup_populate_dir+0x7a/0x110 [kernel]
 0xffffffff810c11fc : cgroup_mkdir+0x33c/0x540 [kernel]
 0xffffffff811850a7 : vfs_mkdir+0xa7/0x100 [kernel]
 0xffffffff8118816e : sys_mkdirat+0xfe/0x120 [kernel]
 0xffffffff811881a8 : sys_mkdir+0x18/0x20 [kernel]
 0xffffffff8100b0f2 : system_call_fastpath+0x16/0x1b [kernel]
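Each frame in this trace has the form address : symbol+offset/size, so the start address of a function is simply the printed address minus the offset. As a rough illustration, the C sketch below parses one such line and recovers the start address. The struct frame type and the helpers parse_frame, frame_start and frame_in are hypothetical names introduced here, not part of systemtap.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* One parsed backtrace line (hypothetical helper type). */
struct frame {
    uint64_t addr;     /* address printed at the start of the line */
    char     sym[64];  /* symbol name */
    uint64_t offset;   /* distance of addr from the start of the symbol */
    uint64_t size;     /* size of the function in bytes */
};

/* Parse "0xADDR : symbol+0xOFF/0xSIZE [module]"; returns 1 on success. */
static int parse_frame(const char *line, struct frame *f)
{
    assert(line != NULL && f != NULL);
    int n = sscanf(line,
                   " 0x%" SCNx64 " : %63[^+]+0x%" SCNx64 "/0x%" SCNx64,
                   &f->addr, f->sym, &f->offset, &f->size);
    return n == 4;
}

/* Start address of the function the frame belongs to. */
static uint64_t frame_start(const struct frame *f)
{
    return f->addr - f->offset;
}

/* True if the frame lies inside the named function. */
static int frame_in(const struct frame *f, const char *name)
{
    return strcmp(f->sym, name) == 0;
}
```

Applied to the cgroup_populate_dir frame above, frame_start yields 0xffffffff810c0000, which is 0xffffffff810c007a minus the 0x7a offset.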

This populate function is a member of the static variable we are looking for, so we can start from here. Let's take a look at the cgroup_populate_dir code:

static int cgroup_populate_dir(struct cgroup *cgrp)
{
    int err;
    struct cgroup_subsys *ss;

    /* First clear out any existing files */
    cgroup_clear_directory(cgrp->dentry);

    err = cgroup_add_files(cgrp, NULL, files, ARRAY_SIZE(files));
    if (err < 0)
        return err;

    if (cgrp == cgrp->top_cgroup) {
        if ((err = cgroup_add_file(cgrp, NULL, &cft_release_agent)) < 0)
            return err;
    }

    for_each_subsys(cgrp->root, ss) {
        if (ss->populate && (err = ss->populate(ss, cgrp)) < 0)
            return err;
    }
    /* This cgroup is ready now */
    for_each_subsys(cgrp->root, ss) {
        struct cgroup_subsys_state *css = cgrp->subsys[ss->subsys_id];
        /*
         * Update id->css pointer and make this css visible from
         * CSS ID functions. This pointer will be dereferened
         * from RCU-read-side without locks.
         */
        if (css->id)
            rcu_assign_pointer(css->id->css, css);
    }

    return 0;
}

We know that systemtap can place a probe on a specific line inside a function, so the value could be printed as $ss->populate. But because my source code and the running kernel are different versions, the only option is to probe an exact address, which means disassembling the function. (At school we used objdump to disassemble a VxWorks kernel and nm to find the address of a function symbol.) Linux has the crash tool, which can be run directly without naming a kernel file. I ran crash> dis cgroup_populate_dir and went through the assembly code it printed.

I did not understand most of it, but one line is certainly the call to ss->populate(ss, cgrp) shown above, and at that point the function pointer is held in the rax register. So I wrote a probe on that address and read the register directly:

probe kernel.function(0xffffffff810c0078) {
    printf("r12= %p, rax=%p\n", register("r12"), register("rax"));
    print_backtrace();
}

I thought I was about to find it. The probe printed: r12= 0xffff880473878030, rax=0xffff88062986ecc0

I looked this address up in the symbol file and could not match it. (A symbol file can be generated with nm, but with the systemtap prerequisites installed there is already a System.map under /usr/src/kernels/2.6.32-220.23.1.tb704.el6.x86_64/ that contains all the symbol information.) Besides, the printed address is not a code-segment address; it looks like a data-segment address. After several wrong attempts it occurred to me that the probe address itself might be the problem, so I printed addr() (the probe address) and found that it was still 0xffffffff810c0000, the start of cgroup_populate_dir, rather than the expected 0xffffffff810c0078.

So probing 0xffffffff810c0078 directly had to be bypassed for now. (I believe there must be a way to do it that I do not know yet; kernel.statement(ADDRESS).absolute, which the stapprobes man page documents for guru mode, may be it.) The function cgroup_populate_dir has a single parameter, struct cgroup *cgrp, and the populate field can be reached through it as well. So, once we have that parameter, can we simply run this loop ourselves?

    for_each_subsys(cgrp->root, ss) {
        if (ss->populate && (err = ss->populate(ss, cgrp)) < 0)
            return err;
    }

Expanding for_each_subsys gives the following embedded C function:

function get_populate_addr:long(arg:long) %{ /* pure */ /* unprivileged */
    struct cgroup *cgrp = (struct cgroup *)(THIS->arg);
    struct cgroup_subsys *ss;

    /* Each iteration overwrites the result, so the value returned is the
     * populate pointer of the last subsystem on the list (0 if it has none). */
    list_for_each_entry(ss, &cgrp->root->subsys_list, sibling) {
        if (ss->populate)
            THIS->__retvalue = (long)ss->populate;
        else
            THIS->__retvalue = 0;
    }
%}
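For readers unfamiliar with the kernel's intrusive lists, the traversal in get_populate_addr can be imitated in user space. The sketch below is plain C with simplified re-implementations of list_head and container_of. The struct subsys type, the populate_fn typedef, demo_populate and last_populate are hypothetical names, and the real kernel macros differ in detail.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal user-space imitation of the kernel's intrusive list. */
struct list_head { struct list_head *next, *prev; };

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

typedef int (*populate_fn)(void);

/* Hypothetical, trimmed-down subsystem; only the fields needed here. */
struct subsys {
    const char *name;
    populate_fn populate;
    struct list_head sibling;  /* links the subsystem into the root's list */
};

/* Placeholder callback standing in for a subsystem's populate function. */
static int demo_populate(void) { return 0; }

/* Walk the list as get_populate_addr() does: the result is overwritten on
 * every iteration, so the last subsystem on the list decides the value. */
static populate_fn last_populate(struct list_head *subsys_list)
{
    assert(subsys_list != NULL);
    populate_fn found = NULL;
    for (struct list_head *p = subsys_list->next; p != subsys_list; p = p->next) {
        struct subsys *ss = container_of(p, struct subsys, sibling);
        found = ss->populate;
    }
    return found;
}
```

With a single subsystem on the list the function returns that subsystem's populate pointer. With several, only the last one counts, which is why this approach relies on exactly one subsystem being mounted.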

Because only the blkio subsystem is mounted, the body of list_for_each_entry runs exactly once. Running sudo stap -v -g cgroup.stp prints populate_addr=0xffffffff81245fd0, which I then looked up in System.map.
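Looking an address up in a symbol file amounts to finding the last symbol whose address is less than or equal to it. A minimal C sketch of that search follows. The table holds only a handful of addresses quoted or derived from this article (function starts computed as address minus offset from the backtrace), whereas a real System.map has many thousands of entries. The names struct ksym, symtab, NSYMS and lookup are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct ksym { uint64_t addr; const char *name; };

/* Sorted by address, as a symbol file sorted with "sort" would be. */
static const struct ksym symtab[] = {
    { 0xffffffff81054e60ULL, "cpu_cgroup_populate" },
    { 0xffffffff810c0000ULL, "cgroup_populate_dir" },
    { 0xffffffff810c0ec0ULL, "cgroup_mkdir" },
    { 0xffffffff81185000ULL, "vfs_mkdir" },
    { 0xffffffff81245fd0ULL, "blkiocg_populate" },
};
#define NSYMS (sizeof symtab / sizeof symtab[0])

/* Binary search for the last entry with entry.addr <= addr.
 * Returns NULL when addr lies below every symbol in the table. */
static const struct ksym *lookup(const struct ksym *tab, size_t n, uint64_t addr)
{
    assert(tab != NULL || n == 0);
    size_t lo = 0, hi = n;               /* search window is [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (tab[mid].addr <= addr)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo ? &tab[lo - 1] : NULL;     /* lo counts the entries <= addr */
}
```

In this small table, 0xffffffff81245fd0 resolves to blkiocg_populate at offset 0 and 0xffffffff810c0078 resolves to cgroup_populate_dir at offset 0x78, while the rax value 0xffff88062986ecc0 seen earlier matches no function symbol.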

Success? With the function name blkiocg_populate finally in hand, I searched the code for it and, damn, it was not there! What was going on? I then searched http://lxr.linux.no/linux on the internet and did find it. The code I had been reading is 2.6.32-60, while the kernel I tested is 2.6.32-220.23.1.tb704.el6.x86_64. The next day I asked colleagues in the company for the source code of the test kernel and finally confirmed it: the static global variable is blkio_subsys. Lesson: the source code you read must match the kernel you test!

Appendix: cgroup.stp

#!/usr/bin/stap

%{
#include <linux/cgroup.h>
#include <linux/ctype.h>
#include <linux/list.h>

struct cgroupfs_root {
    struct super_block *sb;
    unsigned long subsys_bits;
    int hierarchy_id;
    unsigned long actual_subsys_bits;
    struct list_head subsys_list;
    struct cgroup top_cgroup;
    int number_of_cgroups;
    struct list_head root_list;
    unsigned long flags;
    char release_agent_path[4096];
    char name[64];
};

struct cgroup_sb_opts {
    unsigned long subsys_bits;
    unsigned long flags;
    char *release_agent;
    char *name;
    bool none;
    struct cgroupfs_root *new_root;
};
%}

function root_name:long(arg:long) %{ /* pure */ /* unprivileged */
    struct cgroup_sb_opts *opts = (struct cgroup_sb_opts *)(THIS->arg);
    THIS->__retvalue = opts->subsys_bits;
%}

function get_populate_addr:long(arg:long) %{ /* pure */ /* unprivileged */
    struct cgroup *cgrp = (struct cgroup *)(THIS->arg);
    struct cgroup_subsys *ss;
    list_for_each_entry(ss, &cgrp->root->subsys_list, sibling) {
        if (ss->populate)
            THIS->__retvalue = (long)ss->populate;
        else
            THIS->__retvalue = 0;
    }
%}

function proc:string() {
    return sprintf("%d (%s)", pid(), execname())
}

probe begin {
    printf("starting...")
}

probe kernel.function("cpu_cgroup_populate") {
    printf("%s cpu_cgroup_populate call trace:\n", proc());
    print_backtrace();
}

probe kernel.function("cgroup_root_from_opts") {
    printf("%s cgroup_root_from_opts subsys bit= %u\n", proc(), root_name($opts));
    print_backtrace();
}

probe kernel.function("cgroup_populate_dir") {
    //printf("addr=%p, r12= %p, rax=%p, populate_addr=%p\n", addr(), register("r12"), register("rax"), kernel_long(register("rax")));
    printf("populate_addr=%p\n", get_populate_addr($cgrp));
    print_backtrace();
}

probe kernel.function(0xffffffff810c0078) {
    printf("probe addr=%p, r12= %p, rax=%p\n", addr(), register("r12"), register("rax"));
    print_backtrace();
}

probe kernel.function("cgroup_test_super").return {
    printf("cgroup_test_super ret:%u\n", $return);
}


Reference:

/usr/share/doc/systemtap-1.6/examples
