Why is the strcpy function in C Language vulnerable?


Source: chinaitlab collection
I spent several days studying the principle behind DoS overflows and finally understood that it is very simple. The key is to understand why the strcpy function in C is vulnerable, and why using it carelessly causes an overflow.
  
Section 1: Introducing the strcpy function. Anyone reading this article probably already knows that it is the cause of many problems.

Look at string.h in standard C for the declaration of this function: char * __cdecl strcpy(char *dest, const char *src); The code follows (this is Microsoft's description of the function):

(%VC%/vc7/crt/src/intel/strcat.asm)
  
;***
;char *strcpy(dst, src) - copy one string over another
;
;Purpose:
;       Copies the string src into the spot specified by
;       dest; assumes enough room.
;
;Algorithm:
;       char * strcpy (char * dst, char * src)
;       {
;           char * cp = dst;
;           while( *cp++ = *src++ ) ;    /* Copy src over dst */
;           return( dst );
;       }
;
;Entry:
;       char * dst - string over which "src" is to be copied
;       const char * src - string to be copied over "dst"
;
;Exit:
;       The address of "dst" in eax
;
;Uses:
;       eax, ecx
;
;Exceptions:
;*******************************************************************************
  
I wanted to remove some of the comments, but decided it would be nicer to keep them :)
From the above we can see that this code has the following problems:
1. It does not check whether the two input pointers are valid.
2. It does not check whether the two strings end with a null terminator.
3. It does not check whether the space behind the destination pointer is at least as large as the source string.
  
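Okay, here is a small test program (c4.c):
main() { j(); }
j()
{
    char a[] = {'a', 'b', 0};   /* source string: 'a', 'b' and the terminator */
    char b[1];                  /* destination: deliberately too small */
    char *c = a;
    char *d = b;
    while (*d++ = *c++);        /* the strcpy loop: copy until the 0 byte */
    printf("%s\n", b);
}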
  
  
Section 2: Analyzing our c4.exe
Tools: W32Dasm, DEBUG, TCC, TC
Step 1: Use TC2 to generate the executable file c4.exe.
Step 2: Use TCC -B to generate the assembly source code of the C code.
Step 3: Use W32Dasm and DEBUG to perform static and dynamic analysis of c4.exe.
  
The c4.asm code generated by TCC is analyzed below.
First, note that this is a complete C program, including the main function. The code segment is separate from the data and stack segments, but while the j function executes, the stack segment and the data segment are the same segment (DS and SS both refer to DGROUP). Pay special attention to this.
  
        ifndef  ??version
?debug  macro
        endm
        endif
        ?debug  S "c4.c"

_TEXT   segment byte public 'CODE'
DGROUP  group   _DATA,_BSS
        assume  cs:_TEXT,ds:DGROUP,ss:DGROUP
_TEXT   ends

_DATA   segment word public 'DATA'
d@      label   byte
d@w     label   word
_DATA   ends

_BSS    segment word public 'BSS'
b@      label   byte
b@w     label   word
        ?debug  C e930a68d2e0463342e63
_BSS    ends

_TEXT   segment byte public 'CODE'
;       ?debug  L 1
_main   proc    near
;       ?debug  L 3
        call    near ptr _j             ; run our j function here
@1:
;       ?debug  L 4
        ret
_main   endp
_TEXT   ends

_DATA   segment word public 'DATA'      ; the source string 'a', 'b' and its terminator are defined first, in the data segment
        db      97
        db      98
        db      0
_DATA   ends
  
_TEXT   segment byte public 'CODE'
;       ?debug  L 6
_j      proc    near
        push    bp                      ; entry of function j
        mov     bp,sp
        sub     sp,6
        push    si
        push    di
        push    ss
        lea     ax,word ptr [bp-6]
        push    ax
        push    ds
        mov     ax,offset DGROUP:d@     ; note: this is the offset of the source string in the data segment
        push    ax                      ; everything above reserves stack space for the source and destination strings and pushes the arguments for scopy@
        mov     cx,3                    ; CX = 3: the number of bytes to copy
        call    far ptr scopy@          ; another function copies the source string from the data segment to the stack
;       ?debug  L 10
        lea     si,word ptr [bp-6]
;       ?debug  L 11
        lea     di,word ptr [bp-2]
;       ?debug  L 12
        jmp     short @3
@5:
@3:
;       ?debug  L 12
        mov     bx,si
        inc     si
        mov     al,byte ptr [bx]
        mov     bx,di
        inc     di
        mov     byte ptr [bx],al
        or      al,al
        jne     @5
@4:
;       ?debug  L 13
        lea     ax,word ptr [bp-2]
        push    ax
        mov     ax,offset DGROUP:s@     ; the format string argument for printf
        push    ax
        call    near ptr _printf
        pop     cx
        pop     cx
@2:
;       ?debug  L 14
        pop     di
        pop     si
        mov     sp,bp
        pop     bp
        ret
_j      endp
_TEXT   ends
        ?debug  C E9
  
_DATA   segment word public 'DATA'
s@      label   byte
        db      37                      ; '%'
        db      115                     ; 's'
        db      10                      ; line feed :)
        db      0
_DATA   ends
        extrn   scopy@:far
_TEXT   segment byte public 'CODE'
        extrn   _printf:near
_TEXT   ends
        public  _main
        public  _j
        end
  
Section 3: Analyze the static disassembly produced by W32Dasm, which is the final code of the program, and step through it in the debugger at the same time.
Watch what happens to the stack.
By this point you can see that this is quite a lot of code. Let's break it down first.
The code execution can be divided into three parts:

1. From 01fe to 020b, space is reserved on the stack: six bytes, following from the definitions in the C code. There is no problem with the amount reserved.
2. The far call to 0000:1395 copies the source string from the data segment into the stack. Because the byte count is passed in CX, there is no problem here either.
3. The problem lies in copying the source string into the memory of the destination string on the stack!
  
  
:0001.01fa e80100        call 01fe          // execute our j function
:0001.01fd c3            ret

:0001.01fe 55            push bp
:0001.01ff 8bec          mov bp, sp
:0001.0201 83ec06        sub sp, 0006
:0001.0204 56            push si
:0001.0205 57            push di
:0001.0206 16            push ss
:0001.0207 8d46fa        lea ax, [bp-06]
:0001.020a 50            push ax
:0001.020b 1e            push ds
:0001.020c b89401        mov ax, 0194
:0001.020f 50            push ax
:0001.0210 b90300        mov cx, 0003
:0001.0213 9a95130000    call 0000:1395     // far call: execution goes to offset 1395 first (the segment field reads 0000)

:0001.0218 8d76fa        lea si, [bp-06]
:0001.021b 8d7efe        lea di, [bp-02]
:0001.021e eb00          jmp 0220
:0001.0220 8bde          mov bx, si
:0001.0222 46            inc si
:0001.0223 8a07          mov al, [bx]
:0001.0225 8bdf          mov bx, di
:0001.0227 47            inc di
:0001.0228 8807          mov [bx], al
:0001.022a 0ac0          or al, al
:0001.022c 75f2          jne 0220
:0001.022e 8d46fe        lea ax, [bp-02]
:0001.0231 50            push ax
:0001.0232 b89701        mov ax, 0197
:0001.0235 50            push ax
:0001.0236 e8bc08        call 0af5          // execute the print output
:0001.0239 59            pop cx
:0001.023a 59            pop cx
:0001.023b 5f            pop di
:0001.023c 5e            pop si
:0001.023d 8be5          mov sp, bp
:0001.023f 5d            pop bp
:0001.0240 c3            ret
// The following is our scopy@
:0001.1395 55            push bp
:0001.1396 8bec          mov bp, sp
:0001.1398 56            push si
:0001.1399 57            push di
:0001.139a 1e            push ds
:0001.139b c57606        lds si, [bp+06]
:0001.139e c47e0a        les di, [bp+0a]
:0001.13a1 fc            cld
:0001.13a2 d1e9          shr cx, 01
:0001.13a4 f3            repz
:0001.13a5 a5            movsw
:0001.13a6 13c9          adc cx, cx
:0001.13a8 f3            repz
:0001.13a9 a4            movsb
:0001.13aa 1f            pop ds
:0001.13ab 5f            pop di
:0001.13ac 5e            pop si
:0001.13ad 5d            pop bp
:0001.13ae ca0800        retf 0008
  
Now we start dynamic debugging with DEBUG:
D:\turboc2>debug c4.exe
-g 01fe      (using the address found in W32Dasm, we run straight to the entry of j)

AX=0000  BX=0566  CX=000E  DX=067F  SP=FFE8  BP=FFF4  SI=00D8  DI=054B
DS=13DB  ES=13DB  SS=13DB  CS=129F  IP=01FE   NV UP EI PL ZR NA PE NC
129F:01FE 55            PUSH    BP
-t

AX=0000  BX=0566  CX=000E  DX=1193  SP=FFE6  BP=FFF4  SI=00D8  DI=054B
DS=13DB  ES=13DB  SS=13DB  CS=129F  IP=01FF   NV UP EI PL ZR NA PE NC
129F:01FF 8BEC          MOV     BP,SP
  
  
Because the preceding instruction was call 01fe, the return address 01fd was pushed first, and then push bp was executed.
-d ss:ffe0
13DB:FFE0  FF 01 9F 12 F3 0B F4 FF-FD 01 1D 01 01 00 F2 FF   ................
13DB:FFF0  54 05 F6 FF 00 00 43 35-2E 45 58 45 00 FB 00      T ....

Now let's take a look at the stack.
After mov bp, sp executes, BP becomes ffe6.
  
Low
ffe0 |    ----> sub sp, 0006 (six bytes are reserved on the stack for the strings)
ffe1 |
ffe2 |
ffe3 |
ffe4 |
ffe5 |
ffe6 | F4 ----> saved original BP (fff4); BP now points here
ffe7 | FF
ffe8 | FD ----> return address (01fd)
ffe9 | 01
ffea |
High
  
  
Next, SI, DI and SS are pushed onto the stack, and SP becomes ffda.
Then these instructions run:
lea ax, [bp-06]
push ax
push ds
This pushes the address of the reserved space onto the stack, followed by DS.
Then execute:
mov ax, 0194 (mov ax, offset DGROUP:d@), which gets the offset of the string in the data segment
push ax
mov cx, 03
Okay. Now we are about to execute our scopy@.
Because call 0000:1395 is a far call, both CS and IP are pushed onto the stack.
Let's take a look at the stack.
  
Low memory address

ffd0 | 18 ----> IP: the return offset in the CS segment, where lea si, [bp-06] is located
ffd1 | 02
ffd2 | 9F ----> CS is pushed first
ffd3 | 12
ffd4 | 94 ----> offset of the string in the data segment
ffd5 | 01
ffd6 | DB ----> DS
ffd7 | 13
ffd8 | E0 ----> address of the space reserved for the string
ffd9 | FF
ffda | DB ----> SS
ffdb | 13
ffdc | DI ----> DI and SI are pushed because they are needed to move the data
ffdd |          from the data segment to the stack, so their old values must be saved first
ffde | SI
ffdf |
ffe0 | 1  ----> sub sp, 0006 (six bytes reserved on the stack for the source and destination strings)
ffe1 | 2
ffe2 | 3
ffe3 | 4
ffe4 | 5
ffe5 | 6
ffe6 | F4 ----> saved BP: the original BP was fff4, and BP is now ffe6
ffe7 | FF
ffe8 | FD ----> return address after j finishes
ffe9 | 01
ffea |

High memory address
Our analysis is now one third complete. Next, let's see how scopy@ copies the string from the data segment into the stack.

push bp           pushes the previous BP (ffe6) onto the stack
mov bp, sp        now SP = BP = ffce
push si
push di
push ds           after this push, SP is ffc8
Then execute:
lds si, [bp+06]   the operand address is ffce + 06 = ffd4, and the word at ffd4 is 0194, the offset of the string in the data segment
les di, [bp+0a]   the operand address is ffce + 0a = ffd8, and the word at ffd8 is ffe0, the start address of the string's space on the stack
After these two instructions execute, SI = 0194 and DI = ffe0.
The top of the stack now looks like this:

Low memory address

ffc8 | DB ----> saved DS   <-- SP = ffc8
ffc9 | 13
ffca | DI
ffcb |
ffcc | SI
ffcd |
ffce | E6 ----> saved BP (ffe6)
ffcf | FF

High memory address
  
The following seven lines of code simply move the data at the address pointed to by SI to the address pointed to by DI.
cld
shr cx, 01 (CX is 3 on entry)
repz
movsw
adc cx, cx
repz
movsb

This way of copying is fairly efficient. First, movsw moves two bytes at a time; when only one byte is left, movsb moves it.
After the preceding instructions are executed:
ffe0 is 'a'
ffe1 is 'b'
ffe2 is 0
  
After the data has been moved, DS, DI, SI and BP must be restored.
pop ds
pop di
pop si
pop bp
After these four instructions are executed, SP = ffd0 and BP is restored to its previous value, ffe6.
Finally, the return instruction:
retf 8
Take a good look at this instruction. Because it returns from a far call, both IP and CS are popped, so SP first increases by 4 (ffd0 + 4 = ffd4).
Then 8 is added (ffd4 + 8 = ffdc) to discard the parameters that were pushed before the call.
  
At this point the stack becomes:
SP = ffdc, BP = ffe6

ffdc | DI ----> DI and SI are pushed because they are needed to move the data
ffdd |          from the data segment to the stack, so their old values must be saved first
ffde | SI
ffdf |
ffe0 | 1  'a' ----> sub sp, 0006 (six bytes reserved on the stack for the source and destination strings)
ffe1 | 2  'b'
ffe2 | 3  0   (note that the source string has been correctly placed on the stack!)
ffe3 | 4
ffe4 | 5
ffe5 | 6
ffe6 | F4 ----> saved BP: the original BP was fff4, and BP is now ffe6
ffe7 | FF
ffe8 | FD ----> return address after j finishes
ffe9 | 01
  
  
Now, let's make a summary:
As we have seen, nothing has gone wrong up to this point.
Why are six bytes of space reserved?
Let's take a look at how I defined the arrays in the C program:
char a[] = {'a', 'b', 0};
char b[1];
In fact, the allocation rule seen in this compiler's output is very simple: the size of each array must be even,
so 1 is added to an odd size.
As shown above, the source is 3 + 1 = 4,
the destination is 1 + 1 = 2, and source + destination = 4 + 2 = 6.
  
I personally think this point is very important for getting an overflow right, because some articles say
that overflowing by one byte is harmless. Is that really true? Not necessarily, as my example shows!
  
Now read the code that follows:
lea si, [bp-06]    at this point BP = ffe6, so SI = ffe6 - 06 = ffe0
lea di, [bp-02]    likewise, DI = ffe4
After SI and DI are set, a loop copies the source string into the destination string one byte at a time.
The following code is the most important part, and the problem lies right here!

jmp 0220
0220 mov bx, si (copy the address in SI to BX)
inc si (SI plus 1)
mov al, [bx] (load into AL the byte at the address held in BX; the first time, this fetches 'a')
mov bx, di (copy the address in DI to BX)
inc di (DI plus 1)
mov [bx], al (store the character in AL at the address pointed to by BX)
or al, al
jne 0220 (jump back if the byte is not 0)
  
------------ Let's take a look at the stack -------- BP = ffe6, SP = ffdc
ffdc | DI
ffdd |
ffde | SI
ffdf |
ffe0 | 'a'
ffe1 | 'b'
ffe2 | 0
ffe3 |
ffe4 | 'a' (on the first pass DI = ffe4, so the value stored at ffe4 is 'a')
ffe5 |

From the code above, I think you have seen the problem. The only condition for continuing the copy is whether the byte just copied is 0.
The size of the source string is never compared with the size of the destination string!
  
  
ffe0 | 'a'
ffe1 | 'b'
ffe2 | 0
ffe3 |
ffe4 | 'a' (on the first pass DI = ffe4, so the value stored at ffe4 is 'a')
ffe5 | 'b'
ffe6 | F4 ----> saved original BP (note that in my example ffe6 is overwritten with 0,
ffe7 | FF       but this is only the saved BP of the calling function, so the program did not run into any error)
ffe8 | FD ----> return address into main
ffe9 | 01
  
Let's look at the subsequent code:

lea ax, [bp-02]
push ax
mov ax, 0197
push ax
call 0af5 // execute the print output
pop cx
pop cx // the lines above print the destination string

pop di
pop si // restore DI and SI
mov sp, bp
pop bp
ret
