Disassembly and Analysis of the Startup Process of a C Program
0x01 Tool preparation
1. The simplest possible C program:
int main() {
    return 0;
}
2. OllyDbg
3. VC ++ 6.0
4. GCC (MinGW)
0x02 Code analysis
int main()
{
    return 0;
}
Compiling with gcc and the -nostdlib option, which tells the linker not to link the standard library, produces the following error message:
D:\Backup\My document\src>gcc main.c -nostdlib -o main.exe
C:\DOCUME~1\ADMINI~1\LOCALS~1\Temp\ccmSU3wr.o:main.c:(.text+0x9): undefined reference to '_main'
collect2.exe: error: ld returned 1 exit status
With -nostdlib, only the items specified on the command line are passed to the linker; neither the standard startup files nor the standard libraries are. The option implies both -nostartfiles and -nodefaultlibs, and it can also be written as --no-standard-libraries.
After gcc has finished assembling, linking with only -nostartfiles succeeds and prints no error message, whereas linking with -nodefaultlibs reports many errors.
The main function depends on code in the system's standard library files, and the link step needs functions such as pre_cpp_init, check_managed_app, pre_c_init, _tmainCRTStartup, _InterlockedCompareExchangePointer, duplicate_ppstrings, WinMainCRTStartup, mainCRTStartup, _timeout ....
The _main in the assembly output is main in the C source: the C compiler and the assembler name the symbol differently, by one leading underscore.
The linker looks up the _start symbol in the system's startup files, such as /lib/crt2.o. The code at _start creates the heap and the stack, opens the devices the system provides in advance, passes the argc and argv parameters to main, and then calls main.
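To make that sequence concrete, here is a minimal sketch in plain C of what startup code does around main. It is an illustration only, not the real CRT source: simulated_start, init_runtime, runtime_ready and user_main are hypothetical names, and the heap, stack, and device setup is reduced to setting a flag.

```c
#include <stddef.h>

/* Hypothetical flag: set once the simulated runtime has been initialized. */
int runtime_ready = 0;

/* Stand-in for the program's main(); renamed so the sketch can call it
   explicitly. Returns 0 when it receives exactly one argument. */
int user_main(int argc, char **argv)
{
    (void)argv;
    return argc == 1 ? 0 : 1;
}

/* Hypothetical helper standing in for the heap, stack, and standard
   device setup that real startup code performs. */
void init_runtime(void)
{
    runtime_ready = 1;
}

/* Simulated entry point: initialize the runtime, build argc/argv,
   call main, and hand back its return value. Real startup code would
   pass that value to exit() instead of returning it. */
int simulated_start(void)
{
    static char program_name[] = "main.exe";
    char *argv[] = { program_name, NULL };
    int argc = 1;

    init_runtime();
    return user_main(argc, argv);
}
```

Calling simulated_start() runs the three steps in order, and its return value is whatever user_main returned.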
0x03 Disassembly analysis of the main function in VC
1:    int main()
2:    {
00401010 push ebp // save the caller's EBP on the stack
00401011 mov ebp,esp // copy ESP into EBP; EBP is now the frame base used to address values on the stack
00401013 sub esp,40h // reserve 0x40 bytes of stack space for local variables
00401016 push ebx // save EBX (a register the function must preserve, not a segment register)
00401017 push esi // save ESI (source index register)
00401018 push edi // save EDI (destination index register)
00401019 lea edi,[ebp-40h] // load the effective address EBP-0x40 into EDI; this is the lowest address of the 0x40 bytes just reserved for local variables
0040101C mov ecx,10h // ECX = 0x10, the repeat count for rep stos
00401021 mov eax,0CCCCCCCCh // EAX = fill pattern (0xCC is the opcode of int 3)
00401026 rep stos dword ptr [edi] // store EAX to es:[edi] ECX times, filling 0x10 * 4 = 0x40 bytes of local-variable space
3:    return 0;
00401028 xor eax,eax // EAX = 0, the return value
4:    }
0040102A pop edi // restore the saved registers
0040102B pop esi
0040102C pop ebx
0040102D mov esp,ebp // restore the stack pointer
0040102F pop ebp // restore the caller's EBP
00401030 ret // pop the return address into EIP and return to the caller
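The fill done by the debug-build prologue can be imitated in plain C to check the arithmetic: 0x10 dword stores of 4 bytes each cover exactly the 0x40 bytes reserved by sub esp,40h. This is only a sketch. rep_stos_dword and fill_local_area are hypothetical helper names, an ordinary array stands in for the stack area, and the direction flag is assumed clear so that addresses go upward.

```c
#include <stddef.h>
#include <stdint.h>

#define FILL_COUNT   0x10u         /* from: mov ecx,10h */
#define FILL_PATTERN 0xCCCCCCCCu   /* from: mov eax,0CCCCCCCCh */

/* Imitates "rep stos dword ptr [edi]": stores `pattern` into `count`
   consecutive 32-bit slots starting at `dst`, moving upward in memory. */
void rep_stos_dword(uint32_t *dst, uint32_t pattern, size_t count)
{
    while (count-- > 0) {
        *dst++ = pattern;
    }
}

/* Fills a stand-in for the local-variable area and returns the number
   of bytes written. `area` must hold at least FILL_COUNT elements. */
size_t fill_local_area(uint32_t *area)
{
    rep_stos_dword(area, FILL_PATTERN, FILL_COUNT);
    return FILL_COUNT * sizeof(uint32_t);
}
```

With a 16-element uint32_t array as the area, fill_local_area returns 0x40 and every element holds 0xCCCCCCCC.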
Checking the call stack in VC shows that mainCRTStartup runs before main. This is the startup function for console programs that use multi-byte character encoding. mainCRTStartup is in turn called from the code around address 7c816fd7 in kernel32.dll.
main() line 2
mainCRTStartup() line 206 + 25 bytes
KERNEL32! 7c816fd7()
0x04 Disassembly analysis with OllyDbg
Load the program into OllyDbg and look at the stack window.
The stack shows that kernel32 calls the entry function (mainCRTStartup). For OllyDbg, the entry point is not main but mainCRTStartup.
Single-step to 00401146. OllyDbg identifies a call to GetVersion here, which obtains the version number of the platform the program is running on. Because this is a console program, the version obtained is said to be the MS-DOS version number.
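As far as I know from the Windows documentation, the DWORD that GetVersion returns packs the major version into the low byte, the minor version into the next byte, and on NT-based systems the build number into the high-order word. The sketch below decodes such a value in portable C, so it works on a plain integer and does not call the API. struct version_info and decode_version are hypothetical names, and the sample value used with it is made up for illustration.

```c
#include <stdint.h>

/* Hypothetical container for the decoded fields. */
struct version_info {
    unsigned major;  /* low byte of the low-order word */
    unsigned minor;  /* high byte of the low-order word */
    unsigned build;  /* high-order word with the top (platform) bit masked off */
};

/* Splits a GetVersion-style DWORD into its three fields. */
struct version_info decode_version(uint32_t raw)
{
    struct version_info v;
    v.major = raw & 0xFFu;
    v.minor = (raw >> 8) & 0xFFu;
    v.build = (raw >> 16) & 0x7FFFu;
    return v;
}
```

For example, the made-up value 0x0A280105 decodes to major 5, minor 1, build 2600.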
Continue single-stepping to 0040119E and step into the call. Inside is a call to HeapCreate, which requests heap space whose size is determined by the arguments passed in; the same routine also contains a call to HeapDestroy, which destroys the heap. The call at 0040119E is therefore the one that initializes the heap.
At 004011C0, OllyDbg identifies a call to GetCommandLineA, which returns the address of the start of the command-line string.
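GetCommandLineA only hands back one string, so the startup code still has to turn it into the argc and argv that main receives. The sketch below shows the idea with a deliberately simplified splitter. split_command_line is a hypothetical name, and it splits on whitespace only; the real runtime also handles quotes and backslash escapes, which are left out here.

```c
#include <ctype.h>

/* Splits `cmdline` in place on whitespace and stores up to `max_args`
   pointers to the pieces in `argv`. Returns the number stored (argc).
   Simplification: quoting and escaping are NOT handled. */
int split_command_line(char *cmdline, char **argv, int max_args)
{
    int argc = 0;
    char *p = cmdline;

    while (*p != '\0' && argc < max_args) {
        while (isspace((unsigned char)*p)) {
            p++;                      /* skip separators */
        }
        if (*p == '\0') {
            break;                    /* only trailing whitespace was left */
        }
        argv[argc++] = p;             /* start of one argument */
        while (*p != '\0' && !isspace((unsigned char)*p)) {
            p++;                      /* scan to the end of the argument */
        }
        if (*p != '\0') {
            *p++ = '\0';              /* terminate it and move on */
        }
    }
    return argc;
}
```

The function modifies the buffer it is given, so it needs a writable copy of the command line rather than a string literal.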
Stepping into the next call shows GetEnvironmentStringsW and GetEnvironmentStrings, which return the address of the start of the environment-variable block. The strings come back in Unicode form, and WideCharToMultiByte is then used to convert them from Unicode to multi-byte strings.
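The environment block those functions return is a run of NUL-terminated "NAME=value" strings, closed by one extra NUL. A small sketch of walking such a block follows. It uses a plain char block so that it runs anywhere, and it leaves out the Unicode conversion step; count_env_entries and find_env_value are hypothetical helper names.

```c
#include <stddef.h>
#include <string.h>

/* Counts the entries in a block laid out as "A=1\0B=2\0\0". */
size_t count_env_entries(const char *block)
{
    size_t count = 0;
    while (*block != '\0') {
        block += strlen(block) + 1;   /* jump past this string and its NUL */
        count++;
    }
    return count;
}

/* Returns a pointer to the value of `name`, or NULL if it is absent.
   Simplification: the comparison is case-sensitive, although Windows
   treats environment variable names as case-insensitive. */
const char *find_env_value(const char *block, const char *name)
{
    size_t name_len = strlen(name);
    while (*block != '\0') {
        if (strncmp(block, name, name_len) == 0 && block[name_len] == '=') {
            return block + name_len + 1;
        }
        block += strlen(block) + 1;
    }
    return NULL;
}
```

Both functions stop at the empty string that marks the end of the block, so they never need to be told its length.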
While reading C++ Disassembly and Reverse Analysis Technology Secrets, I went through chapter 3 to understand the startup function, how to find the user entry point, and the preparation that happens before main runs. Last semester I also learned that the entry function of a C program is not main but _start, which made me wonder what the compiler does during compilation and what the system does when the program executes, so I wanted to analyze a concrete example.
Some of this involves compiler knowledge, including how the compiler works and how linking is done after compilation. I am not very familiar with that part; it requires mastering compilation principles and studying compilers, which I have not done yet, so this analysis has some shortcomings. I did come away with a better understanding of disassembly and of some lower-level details, including how data moves between the heap, the stack, and the registers. I also did not use IDA, which is a very powerful tool; static analysis of some of these functions with IDA would have been better.