Learning to Write a Compression Shell (Packer), Part 4: The Practical Procedure

Last Update:2013-11-20 Source: Internet

Author: User


1. Data, commands, and the basic loading concepts (completed): Learning to Write a Compression Shell, Part 1
2. Structure analysis of PE files (completed): Part 2, mastering the PE structure
3. The PE file loading process (completed): Part 3, simulating the loader step by step
4. Shell processing (completed)

First we need to understand how a compression shell processes a PE file: it loads the PE the way the system normally would, performs its operations on the image in memory, and then writes the result back to disk. The process is described briefly below.

Load the PE file to be compressed

There are two ways to work on a PE file the way the system would:

1. Read the file directly and copy it structure by structure. The problem, as described in the first article, is that the file alignment granularity on disk differs from the section alignment granularity in memory. The structures in a PE file locate each other by relative virtual addresses (RVAs), which are offsets from the image base in memory, so every RVA has to be converted to a file offset before it is used. The first article provides a function for converting between file offsets and memory offsets; call it before locating each structure.

2. Simulate the system loader directly, in the way described in article 3. Allocate a block of memory, read each part of the PE from the file, and copy it to the memory offset at which that part is to be loaded. Because the image is laid out exactly as it would be in memory, no conversion between file offsets and memory offsets is needed, which is clear and intuitive. An alignment function is still required, but unlike in method 1 it is used only to round sizes up to the alignment granularity, so that each part of the PE is padded out to the size it occupies in memory.
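Code:
// Round address up to the next multiple of Alignment.
// A value that is already aligned is returned unchanged.
DWORD AlignmentNum (DWORD address, DWORD Alignment)
{
DWORD align = address % Alignment;
return align == 0 ? address : address + Alignment - align;
}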
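As a small worked illustration of the two alignment granularities, the sketch below rounds a value up to an alignment and converts an RVA to a file offset by searching a section table. It is a portable approximation only: the Section struct and the names align_up and rva_to_file_offset are made up for this example and mirror just the four section header fields used in this series, not the real IMAGE_SECTION_HEADER.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for IMAGE_SECTION_HEADER. */
typedef struct {
    uint32_t virtual_address;  /* RVA of the section in memory */
    uint32_t virtual_size;     /* size of the section in memory */
    uint32_t pointer_to_raw;   /* file offset of the raw data */
    uint32_t size_of_raw;      /* size of the raw data on disk */
} Section;

/* Round value up to a multiple of alignment (alignment must be non-zero). */
static uint32_t align_up(uint32_t value, uint32_t alignment)
{
    assert(alignment != 0);
    uint32_t rest = value % alignment;
    return rest == 0 ? value : value + (alignment - rest);
}

/* Convert an RVA to a file offset.
 * Returns UINT32_MAX if no section has raw data covering the RVA. */
static uint32_t rva_to_file_offset(const Section *secs, size_t count, uint32_t rva)
{
    for (size_t i = 0; i < count; i++) {
        uint32_t start = secs[i].virtual_address;
        if (rva >= start && rva - start < secs[i].size_of_raw) {
            return secs[i].pointer_to_raw + (rva - start);
        }
    }
    return UINT32_MAX;
}
```

With a section at RVA 0x1000 whose raw data starts at file offset 0x400, the RVA 0x1010 maps to file offset 0x410. Method 2 avoids this lookup entirely, because data is addressed by RVA inside the simulated image.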

After obtaining the file handle hFile and the NT header NtHeader of the target file, read the following parameters from the PE header before simulating the load:

DWORD MemAlignment = NtHeader.OptionalHeader.SectionAlignment; // alignment granularity in memory
DWORD FileAlignment = NtHeader.OptionalHeader.FileAlignment; // alignment granularity on disk
DWORD PeSize = AlignmentNum (NtHeader.OptionalHeader.SizeOfImage, MemAlignment); // total size of the loaded image

Allocate memory for the image based on this total size:
char * pMemPointer = (char *) GlobalAlloc (GMEM_FIXED | GMEM_ZEROINIT, PeSize);
DWORD SecNum = NtHeader.FileHeader.NumberOfSections; // number of sections

DWORD dwRead = 0; // receives the byte count; must not be NULL for a synchronous read
DWORD SizeOfHeaders = NtHeader.OptionalHeader.SizeOfHeaders; // size of the PE headers
SetFilePointer (hFile, 0, NULL, FILE_BEGIN); // the headers start at file offset 0
ReadFile (hFile, pMemPointer, SizeOfHeaders, &dwRead, NULL); // read the PE headers from the file

After obtaining the pointer to the first section header, pSecHeader, load the sections one by one:

for (DWORD i = 0; i < SecNum; i++)
{
// Seek to the section's raw data on disk
SetFilePointer (hFile, pSecHeader->PointerToRawData, NULL, FILE_BEGIN);
// Read the raw data to the section's RVA in the image; the RVA is already a multiple of MemAlignment
ReadFile (hFile, pMemPointer + pSecHeader->VirtualAddress, pSecHeader->SizeOfRawData, &dwRead, NULL);
// Move on to the next section header
pSecHeader++;
}
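The same loading steps can be sketched without the Win32 file API by treating the file as a byte buffer. This is only an illustration under stated assumptions: Section is a hypothetical simplified section header, map_image is a made-up name, and the header size, image size, and section table are passed in rather than parsed from a real PE header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for IMAGE_SECTION_HEADER. */
typedef struct {
    uint32_t virtual_address;  /* RVA of the section in memory */
    uint32_t virtual_size;     /* size of the section in memory */
    uint32_t pointer_to_raw;   /* file offset of the raw data */
    uint32_t size_of_raw;      /* size of the raw data on disk */
} Section;

/* Build a zero-filled in-memory image from the file bytes.
 * Returns a buffer of image_size bytes (release with free) or NULL on error. */
static uint8_t *map_image(const uint8_t *file, size_t file_size,
                          uint32_t size_of_headers, uint32_t image_size,
                          const Section *secs, size_t count)
{
    if (size_of_headers > file_size || size_of_headers > image_size) {
        return NULL;
    }
    uint8_t *image = calloc(image_size, 1);
    if (image == NULL) {
        return NULL;
    }
    /* The headers go to the start of the image. */
    memcpy(image, file, size_of_headers);
    for (size_t i = 0; i < count; i++) {
        const Section *s = &secs[i];
        /* Reject sections that fall outside the file or the image. */
        if (s->pointer_to_raw > file_size ||
            s->size_of_raw > file_size - s->pointer_to_raw ||
            s->virtual_address > image_size ||
            s->size_of_raw > image_size - s->virtual_address) {
            free(image);
            return NULL;
        }
        /* Raw data from its file offset goes to its RVA in the image. */
        memcpy(image + s->virtual_address, file + s->pointer_to_raw, s->size_of_raw);
    }
    return image;
}
```

The bounds checks are an addition of this sketch; the fragment above trusts the section table, which is acceptable for a learning exercise but not for untrusted input.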
Once the PE is loaded, the layout of the whole file in memory is clear at a glance. The key job of the shell is to restore a PE file that has been processed in some way to its original layout, so that it runs exactly as it did before. In effect, the shell code we write does part of the system loader's work: the system loader maps the compressed PE, the shell code then restores the original image in memory, and after that the program runs normally. To the system loader, a compressed PE is no different from any other PE. The shell is therefore all about taking the compressed data stored on disk, laying it out in memory, and turning that layout back into the memory layout of the uncompressed PE.

Processing the import table
After compression, the import table of a PE file usually has to be rebuilt. Many compression shells keep only the imports that the shell itself needs in the new import table. At run time the shell simulates the loader and uses the saved or decompressed original import table to locate each entry and fill in the addresses.
There are two ways to handle the original import table:

1. Copy the original import table data, clear the original import table, and then compress the data. Each import descriptor has two key fields, Name and FirstThunk. In general these two fields are all you need to locate and fill the IAT.
2. Save the RVA of the original import table and compress the import table along with its section. After the section containing the import table has been decompressed, the shell simulates the system loader: it walks the original import descriptors (IIDs), loads each module by name with LoadLibraryA, which the shell imports for itself, and looks up each function named through FirstThunk with GetProcAddress to obtain the actual address of the imported function.

Processing the relocation table
Relocation data is generally not needed by an EXE, so the relocation information in an EXE can be deleted. For a DLL, the relocation information is critical. Once the relocation data has been compressed, the relocation directory recorded in the data directory of the compressed DLL is no longer valid. Therefore, after the PE has been loaded into memory, save the relocation directory entry and then clear it. When decompression is complete, the shell simulates the loader and applies the relocations itself, using the saved address.
Code:
// Save the original relocation directory entry (BASERELOC_VA and BASERELOC_SIZE are globals)
if (NtHeader.OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_BASERELOC].Size > 0)
{
BASERELOC_VA = NtHeader.OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_BASERELOC].VirtualAddress;
BASERELOC_SIZE = NtHeader.OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_BASERELOC].Size;
}
// Clear the relocation directory entry
NtHeader.OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_BASERELOC].VirtualAddress = 0;
NtHeader.OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_BASERELOC].Size = 0;
Processing resources
Some resources, such as version information and icons, cannot be compressed, because the system reads and displays them directly from the file, before the PE is ever loaded and decompressed. To keep things simple, you can copy the entire resource section into the compressed file as is: the resource section is not compressed, only transplanted. When the file is loaded into memory, the original resource directory information is used to locate it. Before compression, define a global variable BOOL IsRESOURCE that records whether a resource section exists.
Code:
// Save the original resource directory entry (RESOURCE_VA and RESOURCE_SIZE are globals)
if (NtHeader.OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_RESOURCE].Size > 0)
{
IsRESOURCE = TRUE;
RESOURCE_VA = NtHeader.OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_RESOURCE].VirtualAddress;
RESOURCE_SIZE = NtHeader.OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_RESOURCE].Size;
}
// RESOURCE_VA and RESOURCE_SIZE hold the RVA and size of the original resource directory. After the file is loaded into memory, the resource data is copied into the compressed file as is, without processing.
Compressing the sections

The placeholder section
The placeholder is itself a section: it is the place where the decompressed data is written to restore the original PE. After UPX compresses a PE, the first section, UPX0, is such a placeholder. The placeholder section can be thought of as the working area of the shell code. When compressing the original PE, the shell must save two key pieces of data: 1. the address of the data after compression, and 2. the address to decompress it to. The second address lies inside the placeholder section. In the PE structure, a section header has four important fields: VSize (virtual size), VOffset (virtual address), RSize (size of the raw data on disk), and ROffset (file offset of the raw data). For the placeholder section, VSize should be the sum of the VSize values of all sections to be compressed, counting each at its size after memory alignment. VOffset, following the layout described earlier, is the address of the first section to be compressed. So in what sense is it a placeholder? It reserves the memory that the original sections will occupy after loading. It only tells the loader that, when the PE is loaded, a region of VSize bytes must be allocated starting at VOffset. What about RSize and ROffset? RSize is set to 0 and ROffset can be set to the file offset of the next section, because this section maps no data from the file into memory; it only asks the loader to allocate memory. In other words, this section has "no body" of its own on disk.
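To make the four fields concrete, here is a minimal sketch that derives a placeholder header from the sections it has to cover. The SectionInfo struct and the function make_placeholder are illustrative names, and the sketch assumes the covered sections are sorted by address and contiguous in memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The four fields discussed above; a simplified, hypothetical section header. */
typedef struct {
    uint32_t vsize;    /* virtual size */
    uint32_t voffset;  /* virtual address (RVA) */
    uint32_t rsize;    /* size of raw data on disk */
    uint32_t roffset;  /* file offset of raw data */
} SectionInfo;

/* Round value up to a multiple of alignment (alignment must be non-zero). */
static uint32_t align_up(uint32_t value, uint32_t alignment)
{
    assert(alignment != 0);
    uint32_t rest = value % alignment;
    return rest == 0 ? value : value + (alignment - rest);
}

/* Build the placeholder header covering secs[0..count-1].
 * next_roffset is the file offset of the section that follows the placeholder. */
static SectionInfo make_placeholder(const SectionInfo *secs, size_t count,
                                    uint32_t mem_align, uint32_t next_roffset)
{
    assert(count > 0);
    SectionInfo ph = {0, secs[0].voffset, 0, next_roffset};
    for (size_t i = 0; i < count; i++) {
        /* Each section occupies its virtual size rounded up in memory. */
        ph.vsize += align_up(secs[i].vsize, mem_align);
    }
    return ph;  /* rsize stays 0: nothing is mapped from the file */
}
```

For two sections of virtual size 0x1800 and 0x200 starting at RVA 0x1000, with a memory alignment of 0x1000, the placeholder reserves 0x3000 bytes at RVA 0x1000 and occupies no bytes on disk.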

Encapsulating the compression function
The compression here uses the aPLib library, which provides the compression function aP_pack. Because some preparation is needed before this function can be called, we follow the wrapper function given in the book "computer virus secrets":
Code:
// pSource: data to compress; lInLength: size of the data; lOutLenght: receives the compressed size
PVOID Compress (PVOID pSource, long lInLength, OUT long & lOutLenght)
{
// packed receives the compressed data; workmem is the scratch space aP_pack needs
BYTE * packed = (BYTE *) malloc (aP_max_packed_size (lInLength));
BYTE * workmem = (BYTE *) malloc (aP_workmem_size (lInLength));
if (packed == NULL || workmem == NULL)
{
free (packed); // free (NULL) is a no-op
free (workmem);
return NULL;
}
// Call the aP_pack compression function
lOutLenght = aP_pack (pSource, packed, lInLength, workmem, NULL, NULL);
free (workmem); // the scratch space is no longer needed
if (lOutLenght == APLIB_ERROR)
{
free (packed);
return NULL;
}
return packed; // the caller owns the compressed buffer
}
Processing the sections
To record the information about each compressed section, define a struct:
typedef struct _CompessSection
{
DWORD VA; // RVA the section is decompressed to
DWORD CompessVA; // RVA of the compressed data once loaded
DWORD CompessSize; // size of the compressed data
LPVOID lpCompessData; // pointer to the compressed data
} CompessSection, * PCompessSection;

First set aside the sections that cannot be compressed. Then, following the layout of the compression shell, determine the address at which the compressed sections are written, which is immediately after the placeholder range. Here the global variable pMemPointer is the base address of the image loaded earlier.
// PeSectionHeader is the section header array; its last entry marks the end of the placeholder range
DWORD LastSecRva = PeSectionHeader[SecNum - 1].VirtualAddress;
// Size of the last section after memory alignment
DWORD LastSecSize = AlignmentNum (PeSectionHeader[SecNum - 1].Misc.VirtualSize, MemAlignment);
// RVA at which the compressed data is stored
DWORD CompressRva = LastSecRva + LastSecSize;
// Number of sections to be compressed
DWORD CompressSecNum = SecNum;
// A resource section, if present, is not compressed, so there is one section fewer to compress
if (IsRESOURCE == TRUE)
{
CompressSecNum--;
}

// Set up the compression information
m_pComSec = new CompessSection[CompressSecNum];
int iPos = 0; // index of the current section header
int j = 0; // index of the next free entry in m_pComSec
for (unsigned int i = 0; i < SecNum; i++, iPos++)
{
// Skip the section that is not compressed (in this example the resource section is at index 2)
if (i == 2)
{
continue;
}
// Empty sections are not compressed
if (PeSectionHeader[iPos].SizeOfRawData == 0)
{
continue;
}
long lCompressSize = 0;
PVOID pInData = (BYTE *) pMemPointer + PeSectionHeader[iPos].VirtualAddress;
// Call the Compress function
PVOID pCompressData = Compress (pInData, PeSectionHeader[iPos].Misc.VirtualSize, lCompressSize);
// Pointer to the compressed data
m_pComSec[j].lpCompessData = pCompressData;
// RVA the section is decompressed to
m_pComSec[j].VA = PeSectionHeader[iPos].VirtualAddress;
// RVA of the compressed data once loaded
m_pComSec[j].CompessVA = CompressRva;
// Size of the compressed data
m_pComSec[j].CompessSize = lCompressSize;
// RVA for the compressed data of the next section
CompressRva += AlignmentNum (lCompressSize, MemAlignment);
j++;
}
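The address bookkeeping in this loop can be looked at in isolation. The sketch below assigns consecutive, aligned RVAs to a list of compressed blobs, starting right after the placeholder range. PackedSection and assign_compressed_rvas are hypothetical names; the compressed sizes are taken as given rather than produced by aPLib.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record, loosely modeled on the struct used in this article. */
typedef struct {
    uint32_t va;               /* RVA the section is decompressed to */
    uint32_t compressed_va;    /* RVA where the compressed data is placed */
    uint32_t compressed_size;  /* size of the compressed data */
} PackedSection;

/* Round value up to a multiple of alignment (alignment must be non-zero). */
static uint32_t align_up(uint32_t value, uint32_t alignment)
{
    assert(alignment != 0);
    uint32_t rest = value % alignment;
    return rest == 0 ? value : value + (alignment - rest);
}

/* Give each compressed blob its RVA, one after another, each starting on
 * an alignment boundary. Returns the first free RVA after the last blob. */
static uint32_t assign_compressed_rvas(PackedSection *secs, size_t count,
                                       uint32_t first_rva, uint32_t mem_align)
{
    uint32_t rva = first_rva;
    for (size_t i = 0; i < count; i++) {
        secs[i].compressed_va = rva;
        rva += align_up(secs[i].compressed_size, mem_align);
    }
    return rva;
}
```

The returned value is a natural place to put whatever follows the compressed data, for example the shell code section.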
Creating an import table for the shell
Because the compressed PE file is an entirely new file compared with the original, only the import table used by the shell code itself has to be constructed. As mentioned above, when simulating the loader the shell uses LoadLibraryA, which it imports for itself, to load each module by name, and GetProcAddress to look up each function named through FirstThunk and obtain its actual address. After the IAT has been filled in, the relevant data in the original PE header has to be changed back; since the memory holding the loaded PE header is not writable, VirtualProtect is needed to change its protection. All three functions are exported by kernel32.dll, and together they make up the new import table. Fill in the address of this import table when generating the compressed PE.
The IMAGE_IMPORT_DESCRIPTOR for kernel32.dll:
Code:
DWORD OriginalFirstThunk // reserved
DWORD TimeDateStamp // reserved
DWORD ForwarderChain // reserved
DWORD Name1 // RVA of the string "kernel32.dll"
DWORD FirstThunk // RVA of the address slots for LoadLibraryA, GetProcAddress, and VirtualProtect
The RVA stored in FirstThunk can be fixed at construction time. Since the three required functions are known in advance, the structure of the whole import table never changes. When filling it in, we lay out the data according to Microsoft's definition of the import table:
Code:
IAT: DWORD (LoadLibraryA) DWORD (GetProcAddress) DWORD (VirtualProtect) DWORD (0, terminator)
IID: sizeof (IMAGE_IMPORT_DESCRIPTOR) * 2 (the second, all-zero descriptor terminates the list)
INT: DWORD * 4 (same structure as the IAT)
Strings: sizeof ("LoadLibraryA") + sizeof ("GetProcAddress") + sizeof ("VirtualProtect") + 3 * 2 + sizeof ("kernel32.dll") // each function name is preceded by a WORD-sized hint, hence the extra 3 * 2 bytes
With this layout you can construct the new import table directly and save the memory address at which it starts.
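The sizes in this layout are easy to get wrong by a byte or two, so here is a small sketch that computes the offset of each part and the total size. It assumes 32-bit thunks (4 bytes), a 20-byte IMAGE_IMPORT_DESCRIPTOR, and strings packed back to back with no padding; ImportLayout and layout_shell_imports are names invented for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum {
    DESCRIPTOR_SIZE = 20,  /* sizeof (IMAGE_IMPORT_DESCRIPTOR): five DWORDs */
    THUNK_SIZE = 4,        /* one 32-bit thunk */
    HINT_SIZE = 2          /* WORD hint in front of each function name */
};

/* Offsets of each part, relative to the start of the import blob. */
typedef struct {
    uint32_t iat;      /* import address table */
    uint32_t iid;      /* import descriptors */
    uint32_t int_;     /* import name table */
    uint32_t strings;  /* DLL name and hint/name entries */
    uint32_t total;    /* total size of the blob */
} ImportLayout;

static ImportLayout layout_shell_imports(const char *dll,
                                         const char *const *funcs, uint32_t count)
{
    ImportLayout l;
    uint32_t thunks = (count + 1) * THUNK_SIZE;  /* entries plus zero terminator */
    l.iat = 0;
    l.iid = l.iat + thunks;
    l.int_ = l.iid + 2 * DESCRIPTOR_SIZE;        /* descriptor plus null descriptor */
    l.strings = l.int_ + thunks;
    l.total = l.strings + (uint32_t)strlen(dll) + 1;
    for (uint32_t i = 0; i < count; i++) {
        l.total += HINT_SIZE + (uint32_t)strlen(funcs[i]) + 1;
    }
    return l;
}
```

For kernel32.dll with the three functions above, the string area comes to 62 bytes and the whole blob to 134 bytes. A real packer may prefer to align each hint/name entry to an even address, which this sketch does not do.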

Saving the original PE information
The shell code ultimately has to restore the original PE. After it has correctly decompressed the PE into memory, it uses the information saved below to put the original settings back and then jumps to the original entry point. The following data needs to be saved:

1. Where each section's compressed data is stored before decompression
2. Where each section belongs in memory after decompression
3. Original import table address
4. Original relocation directory
5. Original resource directory
6. Original image base
7. Original entry point

Merging sections
After compression as described above, the file has more sections than the original. The sections can then be merged into three groups: the placeholder (decompression target) section, the compressed data section, and the shell code section. Merging must not move the compressed sections, because all the addresses saved earlier are used by the shell to locate data at load time, and moving anything would invalidate them. The merge only tells the system that the compressed PE has fewer sections. On the surface the number of sections is "reduced", but nothing actually changes in where the loader places the data.
To merge a group of sections, use the virtual address and file offset of the first section as those of the merged section, and use the sum of the virtual sizes and the sum of the raw sizes of all sections in the group as its virtual size and raw size. Then update the section name in the section header and delete the redundant section headers. Detailed code can be found in chapter 15 of the book "Encryption and Decryption". At this point all the data in memory is in place; all that remains is to write the in-memory data to disk in order.
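The merge rule can be sketched as a pure header operation, since no data moves. The code below is an illustration with assumed names (SectionHdr, merge_sections), not the code from the book mentioned above. One deliberate difference: the merged virtual size is computed as the span from the start of the first section to the end of the last, which equals the sum of the virtual sizes when every size is already aligned and the sections are contiguous.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical simplified section header. */
typedef struct {
    char name[8];      /* PE section names are 8 bytes, not always NUL-terminated */
    uint32_t vsize;    /* virtual size */
    uint32_t voffset;  /* virtual address (RVA) */
    uint32_t rsize;    /* raw size on disk */
    uint32_t roffset;  /* file offset of raw data */
} SectionHdr;

/* Merge secs[first .. first+n-1] into secs[first] and close the gap in the
 * header array. The sections must be adjacent and in address order.
 * Returns the new number of section headers. */
static size_t merge_sections(SectionHdr *secs, size_t count,
                             size_t first, size_t n, const char *new_name)
{
    assert(n >= 1 && first + n <= count);
    SectionHdr *m = &secs[first];
    const SectionHdr *last = &secs[first + n - 1];
    uint32_t rsize = 0;
    for (size_t i = first; i < first + n; i++) {
        rsize += secs[i].rsize;
    }
    /* voffset and roffset of the first section are kept unchanged. */
    m->vsize = last->voffset + last->vsize - m->voffset;
    m->rsize = rsize;
    size_t len = strlen(new_name);
    if (len > sizeof m->name) {
        len = sizeof m->name;
    }
    memset(m->name, 0, sizeof m->name);
    memcpy(m->name, new_name, len);
    /* Drop the now redundant headers. */
    memmove(&secs[first + 1], &secs[first + n],
            (count - first - n) * sizeof *secs);
    return count - (n - 1);
}
```

The caller still has to write the new count into NumberOfSections in the file header; that step is outside this sketch.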

The shell loader
The shell loader works on the compressed PE file. Its task is to decompress the PE data and simulate the system loader to set the PE up in memory. When that is done, it writes the saved original PE information back into the loaded image and jumps to the original entry point (OEP).
The shell code can be written in C, compiled, and its binary code then copied into the shell code section. The shell code mainly deals with two things:
1. Restoring the original import table. The code is given the saved RVA of the original import table, written here as IMPORT_VA (item 3 in the list of saved data). The detailed source code was provided by the moderator linxer:
Code:
IMAGE_IMPORT_DESCRIPTOR * pIID = (PIMAGE_IMPORT_DESCRIPTOR) (pMemPointer + IMPORT_VA);

for (; pIID->Name != 0; pIID++)
{
// Locate the thunk array through FirstThunk
IMAGE_THUNK_DATA * pITD = (IMAGE_THUNK_DATA *) (pMemPointer + pIID->FirstThunk);
// Call LoadLibraryA to load the DLL
HINSTANCE hInstance = LoadLibraryA (pMemPointer + pIID->Name);

for (; pITD->u1.Ordinal != 0; pITD++)
{
FARPROC fpFun;
if (pITD->u1.Ordinal & IMAGE_ORDINAL_FLAG32)
{
// The function is imported by ordinal
fpFun = GetProcAddress (hInstance, (LPCSTR) (pITD->u1.Ordinal & 0x0000ffff));
}
else
{
// The function is imported by name; the thunk holds the RVA of an IMAGE_IMPORT_BY_NAME
IMAGE_IMPORT_BY_NAME * pIIBN = (IMAGE_IMPORT_BY_NAME *) (pMemPointer + pITD->u1.Ordinal);
fpFun = GetProcAddress (hInstance, (LPCSTR) pIIBN->Name);
}

if (fpFun == NULL)
{
return false;
}
pITD->u1.Ordinal = (DWORD) fpFun;
}
}
2. Fixing up the relocations.
To apply the relocations we need to know the actual load address, the preferred load address, and the data to be fixed up. The latter two are recorded in the PE itself, so what remains is to find the actual load address, which can be obtained from the stack.
Code:
pushad // save the registers
call Getaddr
Getaddr:
pop eax // eax now holds the address of this instruction and can serve as the reference address for later addressing
eax now serves as the reference address. We also know that when the entry code of a DLL starts running, the load address of the DLL is the first argument on the stack, at [esp + 4]. pushad pushes 32 (20h) bytes and so lowers esp by 32, which means the module address that was at [esp + 4] is now at [esp + 24h]. Read the actual load address from [esp + 24h] and store it in dwInfactBase. With the actual load address known, the fix-up can be done by passing in the saved relocation directory as pRelocateTable.
Code:
// dwRelocOffset = actual load address - preferred image base
while (pRelocateTable->VirtualAddress != 0)
{
// Locate the array of relocation entries that follows the block header
parrOffset = (WORD *) ((PBYTE) pRelocateTable + sizeof (IMAGE_BASE_RELOCATION));
// Number of entries in this block; each entry is a WORD
dwRvaCount = (pRelocateTable->SizeOfBlock - sizeof (IMAGE_BASE_RELOCATION)) / 2;
for (DWORD i = 0; i < dwRvaCount; i++)
{
dwRva = parrOffset[i];
// The top 4 bits hold the type; only type 3 is processed, because a PE normally uses only types 3 and 0
if ((dwRva & 0xf000) != 0x3000)
{
continue;
}
dwRva &= 0x0fff;
dwRva = (DWORD) (dwInfactBase + pRelocateTable->VirtualAddress + dwRva);
* (DWORD *) dwRva += dwRelocOffset;
}
// Move to the next relocation block
pRelocateTable = (PIMAGE_BASE_RELOCATION) ((PBYTE) pRelocateTable + pRelocateTable->SizeOfBlock);
}
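The same walk over the relocation blocks can be tried out on an ordinary byte buffer, which makes it testable outside a packed executable. The sketch below is a portable approximation with invented names (apply_relocations and the read/write helpers). Unlike the loop above, it stops at the saved directory size rather than at a zero VirtualAddress, it bounds-checks each target, and it reads values in host byte order, which matches the PE format on little-endian machines such as x86.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum {
    RELOC_HEADER_SIZE = 8,  /* VirtualAddress + SizeOfBlock */
    RELOC_TYPE_HIGHLOW = 3  /* the only type handled here */
};

static uint32_t read_u32(const uint8_t *p) { uint32_t v; memcpy(&v, p, 4); return v; }
static uint16_t read_u16(const uint8_t *p) { uint16_t v; memcpy(&v, p, 2); return v; }
static void write_u32(uint8_t *p, uint32_t v) { memcpy(p, &v, 4); }

/* Apply all type-3 relocations found in the table at reloc_rva.
 * delta is (actual load address - preferred image base).
 * Returns the number of DWORDs that were fixed up. */
static unsigned apply_relocations(uint8_t *image, uint32_t image_size,
                                  uint32_t reloc_rva, uint32_t reloc_size,
                                  uint32_t delta)
{
    unsigned applied = 0;
    if (image_size < 4 || reloc_rva > image_size ||
        reloc_size > image_size - reloc_rva) {
        return 0;
    }
    uint32_t pos = reloc_rva;
    uint32_t end = reloc_rva + reloc_size;
    while (end - pos >= RELOC_HEADER_SIZE) {
        uint32_t page_rva = read_u32(image + pos);
        uint32_t block_size = read_u32(image + pos + 4);
        if (block_size < RELOC_HEADER_SIZE || block_size > end - pos) {
            break;  /* malformed block */
        }
        uint32_t count = (block_size - RELOC_HEADER_SIZE) / 2;
        for (uint32_t i = 0; i < count; i++) {
            uint16_t entry = read_u16(image + pos + RELOC_HEADER_SIZE + 2 * i);
            if ((entry >> 12) != RELOC_TYPE_HIGHLOW) {
                continue;  /* type 0 is padding; other types are ignored */
            }
            uint32_t target = page_rva + (entry & 0x0FFFu);
            if (target > image_size - 4) {
                continue;  /* would write outside the image */
            }
            write_u32(image + target, read_u32(image + target) + delta);
            applied++;
        }
        pos += block_size;
    }
    return applied;
}
```

In the shell itself the image pointer is the actual load address and delta is the difference between that address and the saved original image base.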
After all the loader simulation is complete, the PE is laid out in memory as the original file would be after a normal load.

When processing is finished, the shell restores the stack balance (undoing the earlier pushad), takes the saved OEP of the original file, and jumps to the original entry point with jmp OEP. At this point a simple compression shell has completed its mission :)

Postscript: This article draws on many books, on code and examples from experts, and on many open-source shells. Thanks to those who came before for their selfless contributions. The notes above record my own experience of writing a compression shell. My skill is limited and many details are not covered, although those details are precisely what matters in finishing a compression shell. I recommend that anyone who wants to learn more read the source code of open-source shells several times over; there is much more to gain there. There are bound to be shortcomings and errors above, and I hope readers will point them out. Finally, thank you for your patience.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
