Stack-Buffer Overflow Vulnerability
I decided to write this article after reading a paper online about exploiting buffer overflows and stack smashing to execute malicious code; see reference 1 for that paper. This article assumes some basic knowledge of assembly and of virtual memory. The system used to debug the programs is Linux, and the compiler is GCC. I have not worked with assembly or C for a long time, so please correct me if there are any mistakes.
1. Summary
Since the title mentions stacks and buffers, we start with the definitions of these terms. A buffer here is a contiguous area of memory that can hold multiple instances of the same data type. The buffer most familiar to C programmers is the character array. Like other variables in C, arrays can be declared static or dynamic. Static variables are placed in the data segment when the program is loaded, while dynamic variables are allocated on the stack at run time. (This is easy to verify with a small program, see example1.c: compile it into 32-bit assembly with the command gcc -m32 -S example1.c, then look at example1.s to see that the data of array a is placed in the data segment, while the data of array b is placed on the stack.) This article discusses only the overflow of dynamic buffers, that is, stack-based buffer overflow.
example1.c
-------------------------------
int main() {
    static int a[4] = {1, 2, 3, 4};
    int b[4] = {5, 6, 7, 8};
}
2. Basic Knowledge
2.1 Process Memory Organization
Since this article discusses stack-based buffer overflow, we first look at how a process is laid out in memory. A process in memory can be roughly divided into a code segment, a data segment, and a stack segment. The code segment sits at low memory addresses, while the stack sits at high memory addresses. The addresses here are virtual addresses; they must be translated into physical addresses by the MMU (Memory Management Unit). The following figure shows the memory structure of a process:
Figure 2.1 Process Memory Structure
As figure 2.1 shows, besides the basic code segment and data segment, a process also has an uninitialized data segment (BSS), a heap, and memory mapping areas. Note that the notion of a segment used here differs from the segments seen when the program is loaded. For the specific differences, see Section 18.5, ELF file format, in Linux C one-stop programming.
2.2 Stack
A stack is a commonly used abstract data type in computing. Its defining property is last-in-first-out (LIFO): the last element placed on the stack is the first one removed. The main operations are push and pop. Push places an element on the top of the stack, while pop removes the element at the top of the stack.
Why a stack is used has to do with modern computer design. Programs in high-level languages such as C, Java, and Python are built from functions or procedures. A function call changes the flow of execution much like a jump instruction does, but unlike a jump, when the function finishes, control must return to the instruction that follows the call. This is implemented with the stack. The stack is also used for the function's local variables, for passing parameters to it, and for its return value.
The stack is a contiguous area of memory. It can grow up or down, depending on the implementation. On most processors, such as Intel, Motorola, SPARC, and MIPS, the stack grows downward, toward lower addresses. The stack pointer SP points to the top of the stack, while the bottom of the stack is at a fixed address. The size of the stack is adjusted dynamically by the kernel at run time. The CPU provides the push and pop instructions to add elements to the stack and remove them from it.
In addition to the stack pointer SP, it is convenient to have a second pointer, BP, that points to a fixed location within the stack frame. In theory, local variables could be referenced by adding an offset to SP. However, every time a word is pushed onto or popped off the stack, these offsets change. In some cases the compiler can keep track of the stack operations and correct the offsets, but in many cases it cannot, and doing so adds management overhead. Therefore, many compilers use the second register, BP, to reference both local variables and function parameters, because their distance from BP is not affected by push and pop operations.
2.3 Stack Frame Analysis in Function Calls
To exploit a buffer overflow, you need to know how the stack frame is laid out and how it changes during a function call. I will not analyze this here, since it has already been covered well elsewhere; for more information, see Section 19.1 in Linux C one-stop programming.
3. Buffer Overflow
With the preparation done, let's turn to the buffer overflow problem itself. First, look at the following code, example1.c, and analyze the layout of the function's stack frame.
example1.c
--------------------------------------------------
void function(int a, int b, int c) {
    char buffer1[5];
    char buffer2[10];
}

void main() {
    function(1, 2, 3);
}
Run the command gcc -S -fno-stack-protector example1.c. Analyzing the generated example1.s file gives the following stack frame layout for function (my environment is 32-bit Ubuntu 11.04):
| Stack frame distribution |
| ------------------------ |
| c (high address) |
| b |
| a |
| RET (return address) |
| EBP |
| buffer1 |
| buffer2 (low address) |
Next we look at a case where overwriting the return address causes a segmentation fault. See example2.c.
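example2.c
---------------------------------------
#include <string.h>

void function(char *str) {
    char buffer[16];

    strcpy(buffer, str);        /* no bounds check: copies until the '\0' */
}

int main(void) {
    char large_string[256];
    int i;

    for (i = 0; i < 255; i++)
        large_string[i] = 'A';
    large_string[255] = '\0';   /* terminate the string so strcpy stops */

    function(large_string);
}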
example2.c is a typical buffer overflow. strcpy copies far more than 16 bytes into buffer, so the excess data overwrites the saved EBP value and the return address RET on the stack. When the function returns, it takes the return address RET from the stack and continues execution there. That address is now invalid, which results in a segmentation fault. If the return address is overwritten with a valid address instead, the execution flow of the program can be changed.
Next, we modify example1.c so that it changes the return address RET and thereby the execution flow of the program, as shown in example3.c.
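example3.c
--------------------------------
#include <stdio.h>

void function(int a, int b, int c) {
    int *ret;
    char buffer1[5];
    char buffer2[10];

    ret = (int *)(buffer1 + 13);    /* point at the saved return address */
    (*ret) += 8;                    /* skip the x = 1 instruction in main */
}

int main(void) {
    int x = 0;

    function(1, 2, 3);
    x = 1;
    printf("%d\n", x);
}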
Compile with the command gcc -o example3 -fno-stack-protector example3.c. The stack frame layout is as follows:
| Stack frame distribution |
| ------------------------ |
| c (high address) |
| b |
| a |
| RET (return address) |
| EBP |
| ret (local variable ret) |
| buffer1 |
| buffer2 (low address) |

Therefore, the location of the return address RET can be obtained with ret = buffer1 + 13. The 13 is the 5 bytes of buffer1 + 4 bytes of the local variable ret + 4 bytes of the saved EBP. After the call to function, execution would normally return to the x = 1 instruction. The statement (*ret) += 8 adds 8 to the return address RET, so the x = 1 instruction is skipped and the compiled example3.c prints 0. These offsets are the ones observed in my environment; they depend on the compiler version and options.
Note that -fno-stack-protector is required, because GCC enables stack protection by default, which guards against the return address being overwritten. If the protection detects that the stack has been tampered with, the program is terminated with an error. For more on the stack protection technology of the GCC compiler, see reference 2.
The next step is to execute shellcode through a buffer overflow. That is left for the next article: the material is long and I have not finished reading it yet.
4. References
- Stack smashing
- GCC compiler stack protection technology