Principles and insights for building a mirror protocol stack using Jprobe

Source: Internet
Author: User

It suddenly came back to me: it was a Friday in the winter of 2007. I watched my old mentor debug the IP layer of the Linux protocol stack. He changed the routing-lookup logic, ran make install, and the change took effect immediately. As far as I knew, changing that logic meant recompiling the kernel, yet he didn't recompile anything; it was as if the file had just been compiled... Tedious rebuild work had held back my progress on the Linux kernel, and to this day I still have a real fear of compiling the kernel. Not of build errors, but of running out of disk space, packing and unpacking the initrd, and so on; it's all too cumbersome. I know that day in 2007 was a Friday because I worked overtime the next day. Nobody forced me; I volunteered, because I wanted to know how the master did it: changing the processing logic of non-module kernel code without recompiling the kernel. That second day was very rewarding. Not only did I learn that he had used a "mirror stack", I also earned an extra day's overtime pay. I remember that on that Saturday my wife and I went to eat lamb chops at a hotpot restaurant called Masonry Square, and they gave us a green bunny doll.

We still have that doll, and my little one especially loves it. These are a bunch of seemingly unrelated things that happened to come together, and they made me want to write something down this weekend.
All right, let's start with Kprobe.

Suppose I were interviewing a Linux kernel candidate and asked how he debugs the kernel. If he answered that he first adds printk calls, recompiles, boots the new kernel and then checks dmesg, I would ask him to wait a few minutes, and then HR would tell him to go home and wait to be notified. Fortunately, I have never had such a candidate to interview and never got to show off my half-full-bottle sloshing style (a little knowledge, a lot of noise). Nor have I ever met such an unkind interviewer, although that is exactly what I answered when I was job hunting. They really did tell me to go home and wait for a notice, and I really did get one: a notice of my start date and medical examination... The point of all this is to introduce a tool for debugging the kernel: Kprobe.

It can dynamically change the binary instructions of code in the kernel address space and then run whatever code snippet you want, which should probably be called binary dynamic programming (run-time binary patching)! What a dark art: it completely ignores the logic of the source code and the painstaking effort of the compiler, and changes the binary machine code directly.
Kprobe works very simply. For example, given a function func, you can insert a piece of code before and after func is called. Suppose func's instructions are:
Begin
Go
End

What kprobe does is replace begin, changing it to:
jmp prefunc
Of course, the original instruction is saved before it is replaced, so that after our hook function prefunc has run, execution can jump back to the original logic. As for the complicated jmp details (short jumps, relative vs. absolute jumps and so on) and Intel's int 3 debug and single-step modes, I won't repeat them here; it is enough to use them well. All these details are tedious; move to a non-Intel platform and you will see just how tedious. For those who will never change platforms, though, understanding these details is real capital. If you want to learn them, go to the Kanxue forum: find the experts' threads, ask good questions politely, or just lurk. I think Kanxue already has more than enough information, and you can basically find ready-made answers there.
Although I don't dwell on Intel details in this article, there is one exception: the parameters of the prefunc hook function. For example, suppose I want to hook vfs_write, which is declared as follows:
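The "save the original, overwrite, restore later" idea can be modeled in user space on a plain byte array. This is a toy sketch, not kernel code: struct toy_probe, toy_arm and toy_disarm are hypothetical names, and the sample bytes 0x55 0x48 0x89 (the start of push rbp; mov rbp, rsp) are just illustration. Note that real x86 kprobes plant the one-byte int 3 breakpoint (0xCC) rather than a jmp, so "jmp prefunc" above is a simplification; the toy also skips the part where the saved instruction is later executed out of line.

```c
#include <stdint.h>

#define TOY_INT3 0xCC /* x86 one-byte breakpoint instruction, "int 3" */

/* Hypothetical, heavily simplified probe record; the kernel's struct kprobe
   holds much more (handlers, the out-of-line instruction copy, flags...). */
struct toy_probe {
    uint8_t *addr;  /* probed location in the simulated code */
    uint8_t saved;  /* original byte, kept so it can be restored */
};

/* Arm: remember the original byte, then plant the breakpoint over it. */
static void toy_arm(struct toy_probe *p, uint8_t *addr)
{
    p->addr = addr;
    p->saved = *addr;
    *addr = TOY_INT3;
}

/* Disarm: put the original byte back, leaving the code as it was. */
static void toy_disarm(const struct toy_probe *p)
{
    *p->addr = p->saved;
}
```

Calling toy_arm on a buffer holding 0x55 0x48 0x89 turns the first byte into 0xCC and keeps 0x55 in the record; toy_disarm puts it back.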

ssize_t vfs_write (struct file *file, const char __user *buf, size_t count, loff_t *pos);
Assuming the prefunc hook function had the same parameters as vfs_write, the whole logic would become:
ssize_t prefunc(struct file *file, const char __user *buf, size_t count, loff_t *pos)
{
    todo_something(...);
    return vfs_write(file, buf, count, pos);
}
Unfortunately, kprobe can't do that. It works through the int 3 exception/interrupt, and Intel's exception/interrupt handling has a specific discipline: the entire context is preserved. So, apart from the kprobe itself, the handler gets only one parameter, struct pt_regs *regs, i.e. the complete register state.

To recover vfs_write's arguments, you have to "deeply parse" regs, and that drags you back into platform-dependent hell. Suppose you are on x86: you have to understand its register and calling conventions precisely before you can recover the hooked function's arguments. On 32-bit x86 the arguments may be on the stack (or, depending on how the kernel was compiled, in registers), so to reconstruct the hooked function's state you have to analyze regs->sp; on x86_64 the first six arguments arrive in registers (rdi, rsi, rdx, rcx, r8, r9). I won't go further than that.
Unfortunate as that is, fortunately the Linux kernel provides a mechanism built on kprobe that does this work for you: jprobe. Roughly speaking, a jprobe is really just a kprobe with a particular prefunc, implemented like this:
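For the x86_64 case, here is a user-space sketch of what "parsing regs" means for a handler hit at the first instruction of the probed function. struct toy_regs and toy_regs_arg are hypothetical; the field names mirror the argument registers in the kernel's x86_64 struct pt_regs, which has many more fields. The mapping follows the x86_64 System V calling convention.

```c
#include <stdint.h>

/* Hypothetical stand-in for the argument registers of x86_64 struct pt_regs. */
struct toy_regs {
    uint64_t di, si, dx, cx, r8, r9;
};

/* Return argument n (0-based) of the probed function, following the x86_64
   System V convention: rdi, rsi, rdx, rcx, r8, r9. Arguments beyond the
   sixth live on the stack at regs->sp and are not handled in this sketch. */
static uint64_t toy_regs_arg(const struct toy_regs *regs, unsigned int n)
{
    switch (n) {
    case 0: return regs->di;
    case 1: return regs->si;
    case 2: return regs->dx;
    case 3: return regs->cx;
    case 4: return regs->r8;
    case 5: return regs->r9;
    default: return 0; /* would need to read the stack */
    }
}
```

For vfs_write this would give file = argument 0, buf = argument 1, count = argument 2 and pos = argument 3, each cast back to its real type. Jprobe exists precisely so that you don't have to write this per-platform code.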

prefunc(kprobe, regs)
{
    save the contents of regs;
    save the stack contents;   // jprobe's handler uses the same stack as the hooked function and may change it
    replace the IP in regs with the address of the jprobe handler;
    return;
}
That's it: a kprobe prefunc hook that returns from int 3 to the normal flow. But notice that in this prefunc the IP in regs has been changed to the jprobe entry function, while the stack is left completely untouched. So after returning to the normal flow, the argument information on the stack is unchanged; only the function being run has changed, to entry. When the jprobe entry finishes, it calls jprobe_return to restore things. This "return" actually traps into int 3 again, and another kprobe hook function restores the scene: the regs and stack contents saved by prefunc are put back. Doesn't that look a lot like setjmp and longjmp? Yes, it is almost the same!
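The setjmp/longjmp analogy can be made concrete in user space. This is only an analogy, not how jprobe is implemented: jprobe does the save and restore through int 3 traps and also copies the stack, which setjmp does not. All names here (probe_ctx, hooked, entry_handler, call_probed) are hypothetical.

```c
#include <setjmp.h>

static jmp_buf probe_ctx;      /* plays the role of the regs saved by prefunc */
static int last_seen_arg = -1; /* what the "entry" handler observed */

/* The original, "hooked" function. */
static int hooked(int arg)
{
    return arg * 2;
}

/* Plays the role of the jprobe entry: same parameters as hooked(). It runs,
   then jumps back to the saved context, much as jprobe_return() does. */
static void entry_handler(int arg)
{
    last_seen_arg = arg;
    longjmp(probe_ctx, 1);
}

/* Save the context, detour into the handler, then resume the original call. */
static int call_probed(int arg)
{
    if (setjmp(probe_ctx) == 0)
        entry_handler(arg); /* does not return normally */
    return hooked(arg);
}
```

call_probed(21) lets the handler see 21 and then still returns 42 from the original function, just as the hooked function runs normally after the jprobe entry.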

At this point the program enters the hooked function. The whole process is:
1. Trap into int 3.
2. Enter prefunc, which saves the context and replaces the IP with entry.
3. Return to the modified normal flow and run entry on the same stack.
4. Trap into int 3 again (via jprobe_return).
5. Restore the original registers, IP included, and the original stack contents.
6. Return to the original flow and run the hooked function.
The jprobe entry handler has the same parameters as the original hooked function, because their stack and register contents are identical. That is all there is to jprobe, details aside.
Beyond the general principle, one notable detail: the jprobe handler actually runs in the normal flow of execution (just a redirected one), whereas a kprobe hook function essentially still runs inside the int 3 exception/interrupt handler. Even so, the kernel's kprobes documentation says probe handlers run with preemption disabled and must not sleep or otherwise yield the CPU, so a process switch should not be allowed to happen inside the jprobe handler either.
So what can we do with jprobe? If you have really read my intention, you are probably thinking the same thing I am: using jprobe to implement a mirror stack. Let me paste a code snippet first:

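int steal_ip_local_deliver(struct sk_buff *skb)
{
    if (skb && skb->mark == 1004) {
        /* ip_local_deliver_finish() is static in net/ipv4/ip_input.c; the
         * complete module later calls it through an address from /proc/kallsyms */
        ip_local_deliver_finish(skb);
    }
    jprobe_return();
    return 0;
}

static struct jprobe steal_jprobe = {
    .entry = steal_ip_local_deliver,
    .kp = {
        .symbol_name = "ip_local_deliver",
    },
};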
This code probably expresses my purpose: from ip_local_deliver onward, packets are no longer processed by the native Linux protocol stack. Instead, they are stolen by my steal_ip_local_deliver, inside which I can implement my own protocol stack logic. For simplicity, I just call ip_local_deliver_finish to pass the packet on directly, bypassing NF_HOOK.


However, when you actually run the code above, you get a merciless panic!

Since the steal function calls ip_local_deliver_finish, the packet travels all the way up to the socket layer, where the skb is freed. The stack is shared and skb is just a pointer to the sk_buff, so once execution returns to the normal ip_local_deliver, the skb it uses points at freed memory. What we need is to stop this flow of execution inside the steal function. But a von Neumann machine is a serial processor, and on Unix/Linux new flows of execution come from fork, which means you cannot simply stop a flow of execution unless you call exit, and you cannot call exit in softirq context, because you don't know whose task_struct you would be borrowing! To avoid the panic, all you can do is:

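int steal_ip_local_deliver(struct sk_buff *skb)
{
    if (skb && skb->mark == 1004) {
        /* deliver a private copy; the original skb stays valid for the normal path */
        struct sk_buff *copy = skb_copy(skb, GFP_ATOMIC);
        if (copy)   /* skb_copy() returns NULL on allocation failure */
            ip_local_deliver_finish(copy);
    }
    jprobe_return();
    return 0;
}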
After this change, the ip_local_deliver_finish inside steal only receives a copy of the skb, so the original skb is still usable after returning to the normal ip_local_deliver. However, this forks one data stream into two. For TCP, the TCP logic itself discards the duplicates, but for UDP or ICMP the data is received twice: once from the normal protocol stack and once from the steal stack. Now the question is how to stop the normal protocol stack from processing the packet.
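The ownership problem behind the panic, and why copying fixes it, can be modeled in user space. struct toy_skb, toy_deliver, toy_copy and toy_steal are hypothetical stand-ins: toy_deliver frees what it is given, as the upper layers eventually free an skb, and toy_copy plays the role of skb_copy, including the possibility of returning NULL.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical packet buffer standing in for struct sk_buff. */
struct toy_skb {
    size_t len;
    char data[64];
};

static int delivered = 0;

/* Like the upper layers: consumes (frees) the buffer it is handed. */
static void toy_deliver(struct toy_skb *skb)
{
    delivered++;
    free(skb);
}

/* Like skb_copy(): returns an independent duplicate, or NULL on failure. */
static struct toy_skb *toy_copy(const struct toy_skb *skb)
{
    struct toy_skb *c = malloc(sizeof(*c));
    if (c)
        memcpy(c, skb, sizeof(*c));
    return c;
}

/* The "steal" path: deliver a copy, so the caller's buffer remains valid. */
static void toy_steal(struct toy_skb *skb)
{
    struct toy_skb *c = toy_copy(skb);
    if (c)
        toy_deliver(c);
}
```

After toy_steal the caller can still read its own buffer and hand it to the normal path, which is exactly why the data now arrives twice.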
The obvious approach is to make the normal ip_local_deliver simply return 0, and that is in fact the right thing to do. Now we are back where we started, worshipping that dark art, binary dynamic programming! Can I make the hooked function go away? The idea is clear; the next step is to find a way to do it. I define a stub function:
int stub(struct sk_buff *skb)
{
    return 0;
}
All I have to do is change the instructions so that, after returning to the original normal flow, stub runs instead of ip_local_deliver, i.e. patch the binary instructions at run time. If you dig into kprobe's details, you will find that struct kprobe contains this field:
    /* Copy of the original instruction */
    struct arch_specific_insn ainsn;
I have even included the comment, since it saves me explaining. Pay attention to the name: the "a" in ainsn stands for arch; this extra layer hides platform-specific details from the upper layers. On x86, it contains:
kprobe_opcode_t *insn;    /* kprobe_opcode_t is a u8 on x86 */
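Yes: a series of binary instructions. Note, though, that what is stored here is not a jmp back to ip_local_deliver but a copy of the probed instruction itself, the first instruction of ip_local_deliver. After the probe handlers finish, kprobes executes this copy out of line and then continues with the next instruction of the original function, so this slot is effectively how execution gets back into the original flow. If I replace it with a jmp to stub, stub runs instead of the rest of ip_local_deliver, and because the probe sits on the function's first instruction, stub's return goes straight back to ip_local_deliver's caller. So, in the jprobe entry handler, I change the kprobe's ainsn.insn to a jmp to stub; then, so that unrelated flows of execution are not affected, stub changes ainsn.insn back before returning 0 in place of ip_local_deliver.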
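The redirect-and-restore logic can be modeled in user space with a function pointer instead of machine code. Everything here is hypothetical: a packet is just an int, slot stands in for the out-of-line instruction copy (where execution continues after the handler), and original_continue stands in for the rest of ip_local_deliver. Like the original module, this uses a single shared saved_slot, so it is not safe if two CPUs hit the probe at the same time; the model only shows the control flow.

```c
/* Where execution continues once the probe handler has finished. */
typedef int (*cont_fn)(int pkt);

/* Stands in for "the rest of ip_local_deliver". */
static int original_continue(int pkt)
{
    return pkt + 100;
}

static cont_fn slot = original_continue;
static cont_fn saved_slot;

/* Like stub(): restore the slot for the next packet, then report 0. */
static int stub(int pkt)
{
    (void)pkt;
    slot = saved_slot;
    return 0;
}

/* Like the jprobe entry: for selected packets, redirect the continuation. */
static void steal_handler(int pkt, int steal)
{
    (void)pkt; /* a real handler would process the packet here */
    if (steal) {
        saved_slot = slot;
        slot = stub;
    }
}

/* The probed function as its caller sees it: handler, then continuation. */
static int probed(int pkt, int steal)
{
    steal_handler(pkt, steal);
    return slot(pkt);
}
```

A stolen packet makes probed() return 0 without running original_continue, and the very next packet takes the normal path again because stub has already restored the slot.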


The next task is finding the instructions. As I said earlier, rather than reading the all-English tome that is the Intel manual, go straight to Kanxue. I'm not against reading Intel's manual, but it is overkill to dive into it for such a simple question, and Kanxue's content is very rich and complete. I tried two ways:
Attempt 1: a jump encoded as 0xFF 0x04 followed by the stub function's address in little-endian order.
It failed! I didn't dig into why (0xFF /4 is an indirect jmp through memory, so the bytes after it are not taken as an immediate target address), because my goal was not to master Intel's instruction set.

I was just a little puzzled: everything runs in a flat memory model now, so who still uses far jumps? Anyway, I tried another way.
Attempt 2: jump through a register.

That is, mov rax, <stub address>; jmp rax. The machine code is 0x48 0xB8, then the stub function's address as 8 little-endian bytes, then 0xFF 0xE0.
This worked.
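The 12-byte sequence from attempt 2 can be built with a small helper. build_abs_jmp is a hypothetical name; the encoding is the standard x86_64 one (REX.W + B8 for mov rax, imm64, then FF with ModRM E0 for jmp rax). It clobbers rax, which is a scratch register at a function-entry boundary on x86_64.

```c
#include <stddef.h>
#include <stdint.h>

#define JMP_CODE_SIZE 12

/* Encode "movabs rax, target ; jmp rax" for x86_64:
   48 B8 <8-byte little-endian imm64> FF E0 */
static void build_abs_jmp(uint8_t out[JMP_CODE_SIZE], uint64_t target)
{
    out[0] = 0x48;  /* REX.W prefix: 64-bit operand size */
    out[1] = 0xB8;  /* MOV rax, imm64 */
    for (size_t i = 0; i < 8; i++)
        out[2 + i] = (uint8_t)(target >> (8 * i)); /* least significant byte first */
    out[10] = 0xFF; /* JMP r/m64 (opcode extension /4) ... */
    out[11] = 0xE0; /* ... ModRM selecting register rax */
}
```

For a target of 0xffffffff812b70f3 this produces 48 B8 F3 70 2B 81 FF FF FF FF FF E0, the same layout the author's shifting loop writes into jmpcode[2..9].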

No cheering, no celebration, because this is just one link in the chain.


The complete code is as follows:

/* Assumes x86_64, a kernel that still has jprobes (they were removed in 4.15),
 * and a writable kprobe instruction slot, as on the kernels this was written for. */
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/kprobes.h>
#include <linux/hardirq.h>
#include <linux/skbuff.h>
#include <linux/string.h>

/* Address of ip_local_deliver_finish, found in /proc/kallsyms (machine-specific).
 * I just want to call finish directly in the jprobe function, skipping NF_HOOK. */
#define FUNC 0xffffffff812b70f3UL
int (*f)(struct sk_buff *);

/* Global, because the steal hook function has no way to reach its struct kprobe */
struct kprobe *k = NULL;

#define JMP_CODE_SIZE 12
#define ADDR_SIZE sizeof(void *)

u8 saved[MAX_INSN_SIZE] = {0};

/* Don't worry too much about the exact bytes below; the gist is: load an
 * address into a register, then jmp there. mov rax, imm64 ; jmp rax */
u8 jmpcode[JMP_CODE_SIZE] = {0x48, 0xb8, 0x00, 0x00, 0x00, 0x00,
                             0x00, 0x00, 0x00, 0x00, 0xff, 0xe0};

/* Reached through the patched instruction slot instead of the rest of
 * ip_local_deliver; its return goes straight to ip_local_deliver's caller. */
int stub(struct sk_buff *skb)
{
    /* put the original instruction copy back for the next packet */
    memcpy(k->ainsn.insn, saved, MAX_INSN_SIZE);
    return 0;
}

int steal_ip_local_deliver(struct sk_buff *skb)
{
    if (skb && skb->mark == 1234) {
        /* Save the original instruction copy first. */
        memcpy(saved, k->ainsn.insn, MAX_INSN_SIZE);
        /* Replace it with the jump to stub. */
        memcpy(k->ainsn.insn, jmpcode, JMP_CODE_SIZE);
        /* Call your own function; for simplicity I just call ip_local_deliver_finish. */
        (*f)(skb);
        /* Since the slot now jumps to stub, execution will not
         * return to the normal ip_local_deliver. */
    }
    jprobe_return();
    return 0;
}

static struct jprobe steal_jprobe = {
    .entry = steal_ip_local_deliver,
    .kp = {
        .symbol_name = "ip_local_deliver",
    },
};

static int __init jprobe_init(void)
{
    int ret;
    unsigned int i;
    int j = 9;
    unsigned long addr = (unsigned long)&stub;

    f = (int (*)(struct sk_buff *))FUNC;

    /* Fill jmpcode[2..9] with stub's address, least significant byte first.
     * Done before registering, so no packet can hit a half-built jump. */
    for (i = 0; i < ADDR_SIZE; i++, j--) {
        jmpcode[j] = (addr & 0xff00000000000000UL) >> 56;
        addr <<= 8;
    }

    k = &steal_jprobe.kp;
    ret = register_jprobe(&steal_jprobe);
    if (ret < 0) {
        printk(KERN_ERR "register_jprobe failed: %d\n", ret);
        return ret;
    }
    return 0;
}

static void __exit jprobe_exit(void)
{
    unregister_jprobe(&steal_jprobe);
}

module_init(jprobe_init)
module_exit(jprobe_exit)
MODULE_LICENSE("GPL");
This is the principle of the mirrored protocol stack I saw a few years ago. Although Linux makes it very hard to build the whole network stack as a module through make config, we can build a protocol stack module by hand, simply by compiling the net/ipv4 directory as a module, and then use jprobe to hook a low-level function such as netif_receive_skb and hand control to our own stack module. On a serial von Neumann machine, the contest is over control: whoever holds the CPU holds control. Once you have control, you can modify not only the data in memory but also the code in memory, because both data and code live in memory...
The best documentation on Kprobe is Documentation/kprobes.txt, which ships with the Linux kernel.


This article explains how a mirror protocol stack is implemented, but it also demonstrates a Linux kernel debugging method: debugging with jprobe, which sits on top of kprobe. There are in fact many debugging tools built on kprobe, such as SystemTap, but personally I think that until you have written a native jprobe module by hand, step by step, you are better off not using those tools. Just getting familiar with how a tool is used takes a lot of time and effort, and if you don't understand the underlying principle, you will quickly forget the tool even after learning it. Maybe I'm too old-fashioned, but I always remember what my programming teacher said: don't use an IDE until you have built a complete program by hand on the command line.

Once you understand kprobe and jprobe, learning the tools built on top of them is much easier, and once learned, they are much harder to forget.


PostScript: About Panic
Compared with life, the joy of programming lies in resetting after a panic! No matter how big a mistake you made (a segfault? a stack overflow? got hacked? stepped in it?), no matter how much you regret (adding kernel.panic = 1 to /etc/sysctl.conf and then forgetting to run sysctl -p... and the point is it was a remote machine at the company... with no IPMI support, damn it!!), after a reset everything turns to mist!

If there is something you can't get past, panic, then reset!


Time flies: from 2007 to now has passed in the blink of an eye. How I wish I could be someone like my old mentor. In fact, it was simple admiration and curiosity about networking that carried me here step by step. My level is no pinnacle, but at least I have come up step by step from a rookie; now I am at best a fat veteran.

For no reason, all of a sudden I'm lost in memories of the past. The years, ah!

Copyright notice: This is an original article by the blog author and may not be reproduced without permission.
