Using kprobe to inspect variables in the kernel

Source: Internet
Author: User

Today I needed to find out the block size of the buffer cache in the kernel. kprobe came to mind: it is a remarkably handy tool for inspecting the values of kernel variables, so I am sharing the approach here.

When a tool such as dd writes to a block device, the write goes through the buffer cache of the block device layer. If the request size is smaller than the buffer cache block size, Linux first loads the block from disk into the buffer cache and then copies the newly written partial data into it. Once this step is complete, the whole buffer is marked dirty and attached to the radix tree of the device it belongs to; the background writeback thread is then woken periodically to flush dirty blocks to disk. Today I ran a comparison test of linux-3.2 and linux-2.6.23 and found that, for request sizes between 512 and 2048 bytes, linux-3.2 performs worse than linux-2.6.23. The performance pattern seen in the test appears to be related to the buffer cache block size, so I used kprobe to read and verify the block size in both versions.
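The read-before-write rule described above can be sketched in user space. The helper below is a simplified model, not kernel code: it assumes the cached block is not yet up to date, and it reports whether a write of `len` bytes at offset `pos` only partially covers some block of `blocksize` bytes. The function name and signature are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Simplified user-space model (hypothetical helper, not a kernel API).
 * Returns true when the byte range [pos, pos + len) starts or ends in the
 * middle of a block, i.e. when at least one block is only partially
 * overwritten and would have to be read from disk first (assuming it is
 * not already up to date in the cache).
 * blocksize must be a non-zero power of two; len must be non-zero.
 */
static bool write_needs_read(uint64_t pos, uint64_t len, uint32_t blocksize)
{
    uint64_t mask = (uint64_t)blocksize - 1;
    uint64_t end = pos + len;

    return (pos & mask) != 0 || (end & mask) != 0;
}
```

Under this model, a 512-byte write at offset 0 needs a read when the block size is 4096 bytes, but not when the block size is 512 bytes, which is why the block size matters for small requests.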

To read this value, we first need a suitable probe point. Based on code analysis, I chose the call to create_empty_buffers inside the __block_write_begin function: kprobe is used to insert a piece of code there that prints the buffer cache block size. The source code around the probe point is as follows:

int __block_write_begin(struct page *page, loff_t pos, unsigned len,
        get_block_t *get_block)
{
    ...

    blocksize = 1 << inode->i_blkbits;
    if (!page_has_buffers(page))
        create_empty_buffers(page, blocksize, 0);
    head = page_buffers(page);

    bbits = inode->i_blkbits;
    block = (sector_t)page->index << (PAGE_CACHE_SHIFT - bbits);
}
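The arithmetic in this excerpt can be tried out in user space. The sketch below assumes 4 KiB pages (a page shift of 12, as on typical x86_64 kernels); the helper names are invented for illustration and are not kernel functions.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages, as on typical x86_64 kernels. */
#define PAGE_CACHE_SHIFT 12

/* Block size in bytes for a given inode->i_blkbits value. */
static uint32_t blocksize_from_bits(unsigned int blkbits)
{
    return (uint32_t)1 << blkbits;
}

/* Number of buffers needed to cover one page (blkbits <= PAGE_CACHE_SHIFT). */
static uint32_t buffers_per_page(unsigned int blkbits)
{
    return (uint32_t)1 << (PAGE_CACHE_SHIFT - blkbits);
}

/* Index of the first block backed by the page with the given index. */
static uint64_t first_block_of_page(uint64_t page_index, unsigned int blkbits)
{
    return page_index << (PAGE_CACHE_SHIFT - blkbits);
}
```

For example, with i_blkbits equal to 10 the block size is 1024 bytes, each page holds four buffers, and page index 3 starts at block 12.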

From the function above we know that blocksize is the buffer cache block size, so we can intercept the create_empty_buffers function and print its second argument to obtain that value. Intercepting create_empty_buffers is simple: the address of the function can be obtained with the kallsyms_lookup_name function or from /proc/kallsyms. The key question is how to get the second argument once the function has been intercepted, which comes down to how function arguments are passed.
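As a rough illustration of the /proc/kallsyms route, the user-space sketch below scans a file in kallsyms format ("address type name", optionally followed by a module name) for a given symbol. The function names are hypothetical, and the code only shows the lookup idea; it is not what kallsyms_lookup_name does inside the kernel.

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse one line in /proc/kallsyms format ("<hex address> <type> <name>",
 * optionally followed by a module name) and, if the symbol name matches,
 * store its address in *addr. Returns 1 on a match, 0 otherwise.
 */
static int match_kallsyms_line(const char *line, const char *symbol,
                               unsigned long long *addr)
{
    unsigned long long value;
    char type;
    char name[256];

    if (sscanf(line, "%llx %c %255s", &value, &type, name) != 3)
        return 0;
    if (strcmp(name, symbol) != 0)
        return 0;
    *addr = value;
    return 1;
}

/* Scan a kallsyms-style file for a symbol; returns 0 if it is not found. */
static unsigned long long lookup_symbol(const char *path, const char *symbol)
{
    char line[512];
    unsigned long long addr = 0;
    FILE *fp = fopen(path, "r");

    if (!fp)
        return 0;
    while (fgets(line, sizeof(line), fp)) {
        if (match_kallsyms_line(line, symbol, &addr))
            break;
    }
    fclose(fp);
    return addr;
}
```

A call such as lookup_symbol("/proc/kallsyms", "create_empty_buffers") would return the address to probe. Note that on recent kernels the addresses in /proc/kallsyms may read as zero for unprivileged users, so the lookup may need to run as root.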

On the x86_64 platform, Linux follows the System V AMD64 calling convention: the first six integer or pointer arguments of a function are passed in registers, in the order rdi, rsi, rdx, rcx, r8, r9. (rax, r10 and r11 are saved alongside them in the register set, but they do not carry ordinary C function arguments.) The pre_handler function receives the saved register set, and through it we can read the second argument passed to create_empty_buffers from the rsi register. In Linux-2.6.23, the register set saved on the stack is defined as follows:

struct pt_regs {
    unsigned long r15;
    unsigned long r14;
    unsigned long r13;
    unsigned long r12;
    unsigned long rbp;
    unsigned long rbx;
/* arguments: non interrupts/non tracing syscalls only save upto here */
    unsigned long r11;
    unsigned long r10;
    unsigned long r9;
    unsigned long r8;
    unsigned long rax;
    unsigned long rcx;
    unsigned long rdx;
    unsigned long rsi;
    unsigned long rdi;
    unsigned long orig_rax;
/* end of arguments */
/* cpu exception frame or undefined */
    unsigned long rip;
    unsigned long cs;
    unsigned long eflags;
    unsigned long rsp;
    unsigned long ss;
/* top of stack page */
};

In Linux-3.2 the register set has the same layout, but the field names differ. The newer definition is as follows:

struct pt_regs {
    unsigned long r15;
    unsigned long r14;
    unsigned long r13;
    unsigned long r12;
    unsigned long bp;
    unsigned long bx;
/* arguments: non interrupts/non tracing syscalls only save up to here */
    unsigned long r11;
    unsigned long r10;
    unsigned long r9;
    unsigned long r8;
    unsigned long ax;
    unsigned long cx;
    unsigned long dx;
    unsigned long si;
    unsigned long di;
    unsigned long orig_ax;
/* end of arguments */
/* cpu exception frame or undefined */
    unsigned long ip;
    unsigned long cs;
    unsigned long flags;
    unsigned long sp;
    unsigned long ss;
/* top of stack page */
};
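To make the register access concrete, here is a user-space mock-up rather than a real kernel module. It copies the Linux-3.2 field layout into a stand-in structure (the name mock_pt_regs and the helper probed_blocksize are invented for this sketch) and shows which field a pre_handler would read to get the second argument of create_empty_buffers.

```c
#include <stddef.h>

/* User-space copy of the Linux-3.2 x86_64 layout, for illustration only. */
struct mock_pt_regs {
    unsigned long r15, r14, r13, r12, bp, bx;
    unsigned long r11, r10, r9, r8;
    unsigned long ax, cx, dx, si, di, orig_ax;
    unsigned long ip, cs, flags, sp, ss;
};

/*
 * What a pre_handler for create_empty_buffers(page, blocksize, b_state)
 * would read at function entry: the second argument is in rsi, which is
 * the si field here (rsi in the 2.6.23 naming).
 */
static unsigned long probed_blocksize(const struct mock_pt_regs *regs)
{
    return regs->si;
}
```

In a real module, a function with the kprobes handler signature int handler(struct kprobe *p, struct pt_regs *regs) would print regs->si (regs->rsi on 2.6.23) with printk. It would be assigned to the pre_handler field of a struct kprobe whose symbol_name or addr field identifies create_empty_buffers, and the probe would be registered with register_kprobe() and removed with unregister_kprobe().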
