Hands-on implementation of code virtual machines
0x00 What is code virtualization
Code virtualization, as I understand it, replaces the native instructions in a program with a custom bytecode, which an interpreter embedded in the program interprets and executes at run time. Only that interpreter understands the custom bytecode, so general-purpose tools cannot recognize it, and protection based on a virtual machine is harder to crack than other protection methods. The interpreter itself is usually native code, which is what allows it to run and interpret the bytecode. The relationship resembles that of many interpreted languages: a Python script, for example, is not a system executable and cannot run directly on the system; it needs the Python interpreter.
0x01 Why study code virtualization
Virtualization technologies are used in many places, for example in sandboxes and in protective shells (packers). To keep malicious code from damaging our system, we often run a program inside a sandbox: even if the malicious code is destructive, it only destroys the sandbox and does not affect the real system. Protective shells such as VMP and Shielden build a virtual machine into the program to protect its code. Protection based on virtual machines is harder to crack than other protection methods because existing tools cannot recognize the virtual machine's bytecode. After seeing how powerful this kind of protective shell is, I felt the urge to write one myself, which led to this article.
0x02 Virtual machine-based code obfuscation
Virtual machine-based code protection can be regarded as a kind of code obfuscation. The purpose of code obfuscation is to prevent code from being reverse engineered. No obfuscation technique makes analysis impossible; it only raises the difficulty or lengthens the time the analysis takes. Although these techniques protect code effectively, they also have side effects, such as slowing the program down to some degree. This is especially pronounced in virtual machine-based protection, so most such schemes protect only the most important parts of a program. Virtual machine-based code protection can be roughly divided into two types:
1. The virtual machine interprets and executes the shell code. This kind of obfuscation hides how the original code is encrypted and how the shell code decrypts it. It is effective against static analysis but not against dynamic debugging: during debugging, the original code can be dumped in full once it has been decrypted. The protection is relatively strong only when combined with other protection techniques.
2. The code of the program to be protected is converted into custom bytecode, and the virtual machine interprets and executes that bytecode. The original code never appears in the program. This approach is effective against both static and dynamic analysis.
The difference between the two is that the first protects only the shell code and not the original code, while the second protects the original code directly. The first is therefore weaker than the second. This article implements the second approach, that is, protecting the original code itself.
In virtual machine-based protection, the custom bytecode and the native instructions usually have a mapping relationship: one or more bytecodes correspond to one native instruction. The reason for mapping several bytecodes to the same native instruction is to make the protection harder to crack. When the protected code is converted, one of several equivalent bytecodes can be chosen at random, so programs with the same behavior end up with different bytecode, which increases the difficulty of reverse analysis.
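The one-to-many mapping described above can be illustrated with a minimal sketch. The alias values 0xb7 and 0xc3 and the names XOR_ALIASES, is_xor_opcode, and emit_xor_opcode are hypothetical and are not part of the demo in this article; only 0xa1 is the xor opcode the demo actually uses.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical alias table: three different bytes that all mean "xor".
 * Only 0xa1 comes from the article's demo; the others are made up. */
enum { XOR_ALIAS_COUNT = 3 };
static const unsigned char XOR_ALIASES[XOR_ALIAS_COUNT] = { 0xa1, 0xb7, 0xc3 };

/* Interpreter side: every alias decodes to the same operation. */
int is_xor_opcode(unsigned char op)
{
    for (int i = 0; i < XOR_ALIAS_COUNT; i++) {
        if (op == XOR_ALIASES[i]) {
            return 1;
        }
    }
    return 0;
}

/* Converter side: pick one of the aliases at random for each xor emitted. */
unsigned char emit_xor_opcode(void)
{
    unsigned char op = XOR_ALIASES[rand() % XOR_ALIAS_COUNT];
    assert(is_xor_opcode(op)); /* whatever is emitted must decode as xor */
    return op;
}
```

Two conversions of the same protected code would then generally produce different byte sequences, while the interpreter treats them identically.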
0x03 What needs to be implemented
Now that the principle of code virtualization is clear, the task is to define a custom set of bytecode and then use an interpreter to interpret and run it. The work therefore splits into two parts:
Defining the bytecode
A bytecode is just an identifier and can be defined however you like. The bytecode I defined is listed below. Each instruction identifier occupies one byte.
    /** opcode enum */
    enum OPCODES {
        MOV = 0xa0,       // the mov instruction bytecode is 0xa0
        XOR = 0xa1,       // the xor instruction bytecode is 0xa1
        CMP = 0xa2,       // the cmp instruction bytecode is 0xa2
        RET = 0xa3,       // the ret instruction bytecode is 0xa3
        SYS_READ = 0xa4,  // the read system call bytecode is 0xa4
        SYS_WRITE = 0xa5, // the write system call bytecode is 0xa5
        JNZ = 0xa6        // the jnz instruction bytecode is 0xa6
    };
My demo is only a simple crackme, so I defined just a few common instructions. If necessary, you can define more bytecode to extend what the virtual machine can do.
Implementing the interpreter
Once the bytecode for each instruction is defined, we can implement an interpreter for it. Before implementing the interpreter, we must first work out what needs to be virtualized. A virtual machine is a virtual environment in which a program (the custom bytecode) runs, and in interpreting and executing bytecode it behaves much like a real processor. A program on a physical machine needs an environment to run in: a processor that executes instructions, plus a stack, a heap, and so on. So a virtual processor is required first, and the processor needs some registers to assist with computation. The virtual processor I defined is shown below.
    /** opcode struct (defined first because the processor struct uses it) */
    typedef struct opcode_t {
        unsigned char opcode;   // bytecode
        void (*func)(void *);   // handler function for this bytecode
    } vm_opcode;

    /** virtual processor */
    typedef struct processor_t {
        int r1;                 // virtual register r1
        int r2;                 // virtual register r2
        int r3;                 // virtual register r3
        int r4;                 // virtual register r4
        int flag;               // virtual flag register, similar to eflags
        unsigned char *eip;     // virtual register eip, points to the bytecode being interpreted
        vm_opcode op_table[OPCODE_NUM]; // bytecode table: every bytecode and its handler function
    } vm_processor;
In the structures above, r1 to r4 are general-purpose registers used to pass arguments and return values. eip points to the address of the bytecode currently being executed. op_table stores the handler functions of all bytecode instructions. These two structures are the core of the virtual machine, and the interpreter works around them when interpreting bytecode. Because the program logic is simple, only the processor needs to be virtualized; a heap and a stack are not necessary. I used a single buffer to store the program's data, and you can think of that buffer as either a heap or a stack.
With these two structures in place, we can write the interpreter. Its job is to check whether the bytecode currently being interpreted is one it knows. If so, it hands control to the corresponding handler function, which interprets and executes the instruction. The interpreter code is shown below.
    void vm_interp(vm_processor *proc)
    {
        /* eip points to the first byte of the protected code;
         * target_func + 4 skips the function-entry code generated by the compiler */
        proc->eip = (unsigned char *)target_func + 4;

        // Loop until the bytecode eip points to is the RET instruction;
        // otherwise call exec_opcode to interpret and execute it
        while (*proc->eip != RET) {
            exec_opcode(proc);
        }
    }
target_func is the target function written in the custom bytecode. eip is set to point at the first bytecode byte of the target function, ready for interpretation. Until a RET instruction is encountered, exec_opcode is called to execute the current bytecode. The exec_opcode function is shown below.
    void exec_opcode(vm_processor *proc)
    {
        int flag = 0;
        int i = 0;

        // Look up the handler function for the bytecode eip points to
        while (!flag && i < OPCODE_NUM) {
            if (*proc->eip == proc->op_table[i].opcode) {
                flag = 1;
                // Found it: call the handler, which interprets the instruction
                proc->op_table[i].func((void *)proc);
            } else {
                i++;
            }
        }
    }
When interpreting a bytecode, the interpreter first determines which instruction it is and then calls that instruction's handler function. The pseudo-code of target_func is shown below. Its logic is to read 0x12 bytes from standard input, XOR each of the first 8 bytes with 0x29, and compare the results byte by byte with 8 bytes stored in memory. If they all match, it prints success; otherwise it prints error. The code could have been written as a loop, but I was lazy here and simply copied and pasted.
    /*
        mov r1, 0x00000000
        mov r2, 0x12
        call vm_read        ; read the input
        mov r1, input[0]
        mov r2, 0x29
        xor r1, r2          ; xor
        cmp r1, flag[0]     ; compare
        jnz ERROR           ; jump to the code that prints error if they differ
        ; same as above
        mov r1, input[1]
        xor r1, r2
        cmp r1, flag[1]
        jnz ERROR
        mov r1, input[2]
        xor r1, r2
        cmp r1, flag[2]
        jnz ERROR
        mov r1, input[3]
        xor r1, r2
        cmp r1, flag[3]
        jnz ERROR
        mov r1, input[4]
        xor r1, r2
        cmp r1, flag[4]
        jnz ERROR
        mov r1, input[5]
        xor r1, r2
        cmp r1, flag[5]
        jnz ERROR
        mov r1, input[6]
        xor r1, r2
        cmp r1, flag[6]
        jnz ERROR
        mov r1, input[7]
        xor r1, r2
        cmp r1, flag[7]
        jnz ERROR
    */
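The pseudo-code repeats the same four instructions eight times. As a rough sketch of the loop-based alternative, the bytecode could be generated by a small helper that also computes each jnz offset instead of having it counted by hand. The names emit_mov, emit_message, and build_check are hypothetical, and the sketch loads 0x29 into r2 once before the loop, which is an assumption that slightly reorders the pseudo-code. Encodings follow the demo: mov is 6 bytes, xor 1, cmp 2, jnz 2, and a jnz lands at its own address plus the offset plus 2.

```c
#include <assert.h>
#include <stddef.h>

enum { MOV = 0xa0, XOR = 0xa1, CMP = 0xa2, RET = 0xa3,
       SYS_READ = 0xa4, SYS_WRITE = 0xa5, JNZ = 0xa6 };
enum { R1 = 0x10, R2 = 0x11, BUF_TO_R1 = 0x14 };

/* Append a 6-byte mov: opcode, destination id, 32-bit little-endian operand. */
static size_t emit_mov(unsigned char *out, size_t pos,
                       unsigned char dest, unsigned int imm)
{
    out[pos++] = MOV;
    out[pos++] = dest;
    for (int i = 0; i < 4; i++) {
        out[pos++] = (unsigned char)(imm >> (8 * i));
    }
    return pos;
}

/* Append "mov r1, off; mov r2, len; sys_write; ret" (14 bytes). */
static size_t emit_message(unsigned char *out, size_t pos,
                           unsigned int off, unsigned int len)
{
    pos = emit_mov(out, pos, R1, off);
    pos = emit_mov(out, pos, R2, len);
    out[pos++] = SYS_WRITE;
    out[pos++] = RET;
    return pos;
}

/* Build the whole check program; returns the number of bytes written. */
size_t build_check(unsigned char *out, size_t cap)
{
    enum { CHECKS = 8, CHECK_SIZE = 11, HEAD_SIZE = 19, MSG_SIZE = 14 };
    /* The error block starts right after the success block. */
    const size_t error_pos = HEAD_SIZE + CHECKS * CHECK_SIZE + MSG_SIZE;
    size_t pos = 0;

    assert(cap >= error_pos + MSG_SIZE);

    pos = emit_mov(out, pos, R1, 0x00);   /* buffer offset for the input */
    pos = emit_mov(out, pos, R2, 0x12);   /* number of bytes to read */
    out[pos++] = SYS_READ;
    pos = emit_mov(out, pos, R2, 0x29);   /* xor key, loaded once */

    for (int i = 0; i < CHECKS; i++) {
        pos = emit_mov(out, pos, BUF_TO_R1, (unsigned int)i); /* r1 = input[i] */
        out[pos++] = XOR;
        out[pos++] = CMP;
        out[pos++] = (unsigned char)(0x20 + i);               /* flag[i] */
        out[pos++] = JNZ;
        /* pos is the operand byte; the jump is relative to the byte after it */
        out[pos] = (unsigned char)(error_pos - (pos + 1));
        pos++;
    }

    pos = emit_message(out, pos, 0x30, 9);  /* "success!\n" */
    assert(pos == error_pos);
    pos = emit_message(out, pos, 0x40, 7);  /* "error!\n" */
    return pos;
}
```

Generating the offsets this way avoids the off-by-one mistakes that are easy to make when counting bytes manually.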
The handler functions are included in the complete code below. With the key functions above, a simple virtual machine can already run. You could also give the virtual machine a stack and a fuller set of registers to support more instructions. Because this program is relatively simple, it does not use a stack: all arguments are passed through registers or embedded in the bytecode. If you are interested, you can extend it yourself.
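As an illustration of what such an extension might look like, here is a minimal sketch of a virtual stack with push and pop. The type vm_stack, the capacity VM_STACK_SIZE, and the functions vm_push and vm_pop are all hypothetical; the demo in this article has no stack.

```c
#include <assert.h>
#include <stddef.h>

enum { VM_STACK_SIZE = 64 };    /* hypothetical capacity, in int slots */

typedef struct {
    int slots[VM_STACK_SIZE];
    int sp;                     /* number of slots currently in use */
} vm_stack;

/* Push a value; returns 0 on success, -1 if the stack is full. */
int vm_push(vm_stack *s, int value)
{
    assert(s != NULL);
    if (s->sp >= VM_STACK_SIZE) {
        return -1;
    }
    s->slots[s->sp++] = value;
    return 0;
}

/* Pop the top value into *out; returns 0 on success, -1 if the stack is empty. */
int vm_pop(vm_stack *s, int *out)
{
    assert(s != NULL && out != NULL);
    if (s->sp <= 0) {
        return -1;
    }
    *out = s->slots[--s->sp];
    return 0;
}
```

A push or pop bytecode handler would then call these helpers with a register value, in the same style as the existing handlers.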
0x04 How the interpreter interprets and executes bytecode
Here we use the first bytecode in the demo to walk through how the interpreter in the virtual machine works. As shown above, when the interpreter vm_interp runs, eip is set to target_func + 4, which is 0xa0, the first byte defined in the inline assembly of target_func. The interpreter then checks whether the bytecode eip points to is the ret instruction. The ret instruction is 0xa3, so eip does not point to ret, and the interpreter enters exec_opcode to interpret the bytecode.
Inside exec_opcode, the bytecode that eip points to, currently 0xa0, is looked up in the op_table of the virtual processor. Once it is found, its handler function is called.
The bytecodes and their handler functions are registered in init_vm_proc.
There, 0xa0 corresponds to the mov instruction, so when the interpreter encounters 0xa0 it calls the vm_mov function to interpret the mov instruction.
In vm_mov, the addresses eip + 1 and eip + 2 are saved in dest and src respectively. dest holds the register identifier, and the switch that follows determines which register it is. In this example dest is 0x10, that is, register r1, so in the case 0x10 branch *src is assigned to r1. Overall, the first six bytes form the first mov instruction, mov r1, xxxx, where xxxx is the last four of the six bytes. In this example it is 0x00000000.
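The six-byte layout of a mov instruction can be made concrete with a small decoding sketch. The type mov_operands and the function decode_mov are hypothetical helpers, not part of the demo. The sketch reads the operand explicitly as little-endian, which is an assumption that matches how the demo's int pointer cast behaves on x86, and it assumes a 32-bit int.

```c
#include <assert.h>

/* Hypothetical helper type: the two operands packed into a mov instruction. */
typedef struct {
    unsigned char dest;   /* register identifier, e.g. 0x10 for r1 */
    int imm;              /* 32-bit operand stored after the register id */
} mov_operands;

/* Decode the six bytes starting at ip: opcode, dest, 4-byte operand. */
mov_operands decode_mov(const unsigned char *ip)
{
    mov_operands ops;
    unsigned int imm;

    assert(ip[0] == 0xa0);          /* must be a mov instruction */
    ops.dest = ip[1];
    imm = (unsigned int)ip[2]
        | (unsigned int)ip[3] << 8
        | (unsigned int)ip[4] << 16
        | (unsigned int)ip[5] << 24;
    ops.imm = (int)imm;
    return ops;
}
```

For the first instruction of the demo, the bytes 0xa0, 0x10, 0x00, 0x00, 0x00, 0x00 decode to dest 0x10 (r1) and operand 0.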
This example gives a rough picture of how the interpreter interprets and executes bytecode. It is really quite simple: the handler function is found through the mapping between bytecodes and handler functions. Alternatively, a long switch statement can test each bytecode and call the corresponding function. The interpreter simulates an instruction by performing the equivalent operation, and chaining these instructions together executes a complete piece of logic.
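The switch-based alternative mentioned above might look like the following minimal sketch. It supports only mov into r1 or r2, xor, and ret, and the names mini_vm, read_imm, and run_switch are hypothetical. Unlike the table lookup in the demo, this sketch returns an error on an unknown opcode instead of leaving eip where it is.

```c
#include <assert.h>
#include <stddef.h>

enum { MOV = 0xa0, XOR = 0xa1, RET = 0xa3 };
enum { R1 = 0x10, R2 = 0x11 };

/* Hypothetical cut-down processor: two registers and an instruction pointer. */
typedef struct {
    int r1;
    int r2;
    const unsigned char *eip;
} mini_vm;

/* Read a 32-bit little-endian operand. */
static int read_imm(const unsigned char *p)
{
    unsigned int v = (unsigned int)p[0]
                   | (unsigned int)p[1] << 8
                   | (unsigned int)p[2] << 16
                   | (unsigned int)p[3] << 24;
    return (int)v;
}

/* Run until RET. Returns 0 on RET, -1 on an unknown opcode or register. */
int run_switch(mini_vm *vm, const unsigned char *code)
{
    assert(vm != NULL && code != NULL);
    vm->eip = code;
    for (;;) {
        switch (*vm->eip) {
        case MOV:
            if (vm->eip[1] == R1) {
                vm->r1 = read_imm(vm->eip + 2);
            } else if (vm->eip[1] == R2) {
                vm->r2 = read_imm(vm->eip + 2);
            } else {
                return -1;
            }
            vm->eip += 6;   /* mov occupies 6 bytes */
            break;
        case XOR:
            vm->r1 ^= vm->r2;
            vm->eip += 1;   /* xor occupies 1 byte */
            break;
        case RET:
            return 0;
        default:
            return -1;
        }
    }
}
```

A switch keeps all instruction semantics in one place, while the table used in the demo makes it easier to add or remap opcodes at run time; which is preferable depends on the goals of the protection.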
0x05 Code running effect
0x06 Virtual machine protection
Static analysis
When code protected by a virtual machine is analyzed statically, common tools are ineffective, because the bytecode is defined by us and only the interpreter can recognize it. When IDA is used for analysis, the bytecode is just a block of unrecognizable data.
IDA cannot make sense of the target_func code, so the defense against static analysis works. The interpreter itself can still be analyzed statically, but its control flow is much more complex than that of the original program, which increases the difficulty of analysis.
Dynamic debugging
During dynamic debugging the bytecode is still not recognized, and the real processor never actually executes those unrecognized bytes. The bytecode is executed by our virtual processor through the interpreter, and the interpreter consists entirely of native instructions, so the interpreter can be analyzed statically or debugged dynamically. During dynamic debugging, however, it is only the interpreter that is being debugged, and the handler for each instruction is called over and over. To truly recover the original code, the analyst has to work out, during debugging, the mapping between every bytecode and its native instructions. With this mapping the bytecode can finally be converted back into native instructions. It is also possible to rebuild a fully unpacked, executable native program, but the process is complicated.
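Once the mapping between bytecodes and operations has been recovered, an analyst could write a small disassembler for the custom bytecode. The following is a minimal sketch for the demo's seven opcodes; the function name disasm_one and the mnemonic spellings are hypothetical. It writes one instruction as text and returns the instruction's length in bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Disassemble the instruction at ip into out; returns its length in bytes. */
size_t disasm_one(const unsigned char *ip, char *out, size_t cap)
{
    assert(ip != NULL && out != NULL && cap > 0);

    switch (ip[0]) {
    case 0xa0: {
        unsigned int imm = (unsigned int)ip[2]
                         | (unsigned int)ip[3] << 8
                         | (unsigned int)ip[4] << 16
                         | (unsigned int)ip[5] << 24;
        if (ip[1] >= 0x10 && ip[1] <= 0x13) {
            /* destinations 0x10..0x13 are r1..r4 */
            snprintf(out, cap, "mov r%u, 0x%x", (unsigned int)(ip[1] - 0x10 + 1), imm);
        } else if (ip[1] == 0x14) {
            /* destination 0x14 loads one byte of the buffer into r1 */
            snprintf(out, cap, "mov r1, buf[0x%x]", imm);
        } else {
            snprintf(out, cap, "mov ?, 0x%x", imm);
        }
        return 6;
    }
    case 0xa1: snprintf(out, cap, "xor r1, r2"); return 1;
    case 0xa2: snprintf(out, cap, "cmp r1, buf[0x%x]", (unsigned int)ip[1]); return 2;
    case 0xa3: snprintf(out, cap, "ret"); return 1;
    case 0xa4: snprintf(out, cap, "sys_read"); return 1;
    case 0xa5: snprintf(out, cap, "sys_write"); return 1;
    case 0xa6: snprintf(out, cap, "jnz +0x%x", (unsigned int)ip[1]); return 2;
    default:   snprintf(out, cap, ".byte 0x%02x", (unsigned int)ip[0]); return 1;
    }
}
```

Calling it repeatedly and advancing by the returned length walks a linear stretch of bytecode; following jnz targets would need additional bookkeeping.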
0x07 Complete code
The following is the complete demo code, which has been tested on Linux.
xvm.h
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>   // read() and write(), used by the system call handlers

    #define OPCODE_NUM 7          // number of opcodes
    #define HEAP_SIZE_MAX 1024

    char *heap_buf;               // vm heap

    /** opcode enum */
    enum OPCODES {
        MOV = 0xa0,       // the mov instruction bytecode is 0xa0
        XOR = 0xa1,       // the xor instruction bytecode is 0xa1
        CMP = 0xa2,       // the cmp instruction bytecode is 0xa2
        RET = 0xa3,       // the ret instruction bytecode is 0xa3
        SYS_READ = 0xa4,  // the read system call bytecode is 0xa4
        SYS_WRITE = 0xa5, // the write system call bytecode is 0xa5
        JNZ = 0xa6        // the jnz instruction bytecode is 0xa6
    };

    enum REGISTERS {
        R1 = 0x10,
        R2 = 0x11,
        R3 = 0x12,
        R4 = 0x13,
        EIP = 0x14,
        FLAG = 0x15
    };

    /** opcode struct */
    typedef struct opcode_t {
        unsigned char opcode;   // bytecode
        void (*func)(void *);   // handler function for this bytecode
    } vm_opcode;

    /** virtual processor */
    typedef struct processor_t {
        int r1;                 // virtual register r1
        int r2;                 // virtual register r2
        int r3;                 // virtual register r3
        int r4;                 // virtual register r4
        int flag;               // virtual flag register, similar to eflags
        unsigned char *eip;     // virtual register eip, points to the bytecode being interpreted
        vm_opcode op_table[OPCODE_NUM]; // bytecode table: every bytecode and its handler function
    } vm_processor;
xvm.c
    #include "xvm.h"

    void target_func()
    {
        __asm__ __volatile__(
            ".byte 0xa0, 0x10, 0x00, 0x00, 0x00, 0x00\n"  /* mov r1, 0x00000000 */
            ".byte 0xa0, 0x11, 0x12, 0x00, 0x00, 0x00\n"  /* mov r2, 0x12 */
            ".byte 0xa4\n"                                /* call vm_read ; read the input */
            ".byte 0xa0, 0x14, 0x00, 0x00, 0x00, 0x00\n"  /* mov r1, input[0] */
            ".byte 0xa0, 0x11, 0x29, 0x00, 0x00, 0x00\n"  /* mov r2, 0x29 */
            ".byte 0xa1\n"                                /* xor r1, r2 */
            ".byte 0xa2, 0x20\n"                          /* cmp r1, flag[0] */
            ".byte 0xa6, 0x5b\n"                          /* jnz ERROR */
            ".byte 0xa0, 0x14, 0x01, 0x00, 0x00, 0x00\n"  /* mov r1, input[1] */
            ".byte 0xa1\n"                                /* xor r1, r2 */
            ".byte 0xa2, 0x21\n"                          /* cmp r1, flag[1] */
            ".byte 0xa6, 0x50\n"                          /* jnz ERROR */
            ".byte 0xa0, 0x14, 0x02, 0x00, 0x00, 0x00\n"  /* mov r1, input[2] */
            ".byte 0xa1\n"
            ".byte 0xa2, 0x22\n"                          /* cmp r1, flag[2] */
            ".byte 0xa6, 0x45\n"
            ".byte 0xa0, 0x14, 0x03, 0x00, 0x00, 0x00\n"  /* mov r1, input[3] */
            ".byte 0xa1\n"
            ".byte 0xa2, 0x23\n"                          /* cmp r1, flag[3] */
            ".byte 0xa6, 0x3a\n"
            ".byte 0xa0, 0x14, 0x04, 0x00, 0x00, 0x00\n"  /* mov r1, input[4] */
            ".byte 0xa1\n"
            ".byte 0xa2, 0x24\n"                          /* cmp r1, flag[4] */
            ".byte 0xa6, 0x2f\n"
            ".byte 0xa0, 0x14, 0x05, 0x00, 0x00, 0x00\n"  /* mov r1, input[5] */
            ".byte 0xa1\n"
            ".byte 0xa2, 0x25\n"                          /* cmp r1, flag[5] */
            ".byte 0xa6, 0x24\n"
            ".byte 0xa0, 0x14, 0x06, 0x00, 0x00, 0x00\n"  /* mov r1, input[6] */
            ".byte 0xa1\n"
            ".byte 0xa2, 0x26\n"                          /* cmp r1, flag[6] */
            ".byte 0xa6, 0x19\n"
            ".byte 0xa0, 0x14, 0x07, 0x00, 0x00, 0x00\n"  /* mov r1, input[7] */
            ".byte 0xa1\n"
            ".byte 0xa2, 0x27\n"                          /* cmp r1, flag[7] */
            ".byte 0xa6, 0x0e\n"                          /* jnz ERROR */
            ".byte 0xa0, 0x10, 0x30, 0x00, 0x00, 0x00\n"  /* mov r1, 0x30 ; "success!\n" */
            ".byte 0xa0, 0x11, 0x09, 0x00, 0x00, 0x00\n"  /* mov r2, 0x09 */
            ".byte 0xa5\n"                                /* call vm_write */
            ".byte 0xa3\n"                                /* ret */
            ".byte 0xa0, 0x10, 0x40, 0x00, 0x00, 0x00\n"  /* ERROR: mov r1, 0x40 ; "error!\n" */
            ".byte 0xa0, 0x11, 0x07, 0x00, 0x00, 0x00\n"  /* mov r2, 0x07 */
            ".byte 0xa5\n"                                /* call vm_write */
            ".byte 0xa3\n"                                /* ret */
        );
    }

    /** xor instruction handler */
    void vm_xor(vm_processor *proc)
    {
        // The two values to XOR are stored in r1 and r2
        int arg1 = proc->r1;
        int arg2 = proc->r2;

        // The XOR result is stored in r1
        proc->r1 = arg1 ^ arg2;

        // The xor instruction occupies one byte, so eip advances one byte
        proc->eip += 1;
    }

    /** cmp instruction handler */
    void vm_cmp(vm_processor *proc)
    {
        // The values to compare are stored in r1 and in the buffer
        int arg1 = proc->r1;
        // The bytecode contains the buffer offset
        char *arg2 = *(proc->eip + 1) + heap_buf;

        // Compare and set the flag register: 1 means equal, 0 means not equal
        if (arg1 == *arg2) {
            proc->flag = 1;
        } else {
            proc->flag = 0;
        }

        // The cmp instruction occupies two bytes, so eip advances two bytes
        proc->eip += 2;
    }

    /** jnz instruction handler */
    void vm_jnz(vm_processor *proc)
    {
        // Get the jump offset, relative to the current eip, from the bytecode
        unsigned char arg1 = *(proc->eip + 1);

        // Check the flag to see the result of the previous instruction.
        // If flag is zero, the previous comparison was not equal, so jnz jumps
        if (proc->flag == 0) {
            // The jump is done by modifying eip directly with the offset
            proc->eip += arg1;
        } else {
            proc->flag = 0;
        }

        // The jnz instruction occupies two bytes, so eip advances two bytes
        proc->eip += 2;
    }

    /** ret instruction handler */
    void vm_ret(vm_processor *proc)
    {
    }

    /** read system call handler */
    void vm_read(vm_processor *proc)
    {
        // The read system call takes two arguments, stored in r1 and r2:
        // r1 is the buffer offset where the data is stored, r2 is the length to read
        char *arg2 = heap_buf + proc->r1;
        int arg3 = proc->r2;

        // Call read directly
        read(0, arg2, arg3);

        // The read system call occupies one byte, so eip advances one byte
        proc->eip += 1;
    }

    /** write system call handler */
    void vm_write(vm_processor *proc)
    {
        // Same as the read system call: r1 is the buffer offset of the data
        // to write, r2 is the length to write
        char *arg2 = heap_buf + proc->r1;
        int arg3 = proc->r2;

        // Call write directly
        write(1, arg2, arg3);

        // The write system call occupies one byte, so eip advances one byte
        proc->eip += 1;
    }

    /** mov instruction handler */
    void vm_mov(vm_processor *proc)
    {
        // Both operands of mov are embedded in the bytecode. The first byte after
        // the opcode is the register identifier; the second to fifth bytes are the
        // immediate value. Only two forms are implemented: moving an immediate
        // into a register, and moving a byte of the buffer into r1
        unsigned char *dest = proc->eip + 1;
        int *src = (int *)(proc->eip + 2);

        // The first four cases correspond to r1 to r4. In the last case, *src is
        // an offset into the buffer, and the byte at that offset is assigned to r1
        switch (*dest) {
        case 0x10:
            proc->r1 = *src;
            break;
        case 0x11:
            proc->r2 = *src;
            break;
        case 0x12:
            proc->r3 = *src;
            break;
        case 0x13:
            proc->r4 = *src;
            break;
        case 0x14:
            proc->r1 = *(heap_buf + *src);
            break;
        }

        // The mov instruction occupies six bytes, so eip advances six bytes
        proc->eip += 6;
    }

    /** execute one bytecode */
    void exec_opcode(vm_processor *proc)
    {
        int flag = 0;
        int i = 0;

        // Look up the handler function for the bytecode eip points to
        while (!flag && i < OPCODE_NUM) {
            if (*proc->eip == proc->op_table[i].opcode) {
                flag = 1;
                // Found it: call the handler, which interprets the instruction
                proc->op_table[i].func((void *)proc);
            } else {
                i++;
            }
        }
    }

    /** virtual machine interpreter */
    void vm_interp(vm_processor *proc)
    {
        /* eip points to the first byte of the protected code;
         * target_func + 4 skips the function-entry code generated by the
         * compiler (the size of that code depends on the compiler and target) */
        proc->eip = (unsigned char *)target_func + 4;

        // Loop until the bytecode eip points to is the RET instruction;
        // otherwise call exec_opcode to interpret and execute it
        while (*proc->eip != RET) {
            exec_opcode(proc);
        }
    }

    /** initialize the virtual machine processor */
    void init_vm_proc(vm_processor *proc)
    {
        proc->r1 = 0;
        proc->r2 = 0;
        proc->r3 = 0;
        proc->r4 = 0;
        proc->flag = 0;

        // Associate each instruction bytecode with its handler function
        proc->op_table[0].opcode = MOV;
        proc->op_table[0].func = (void (*)(void *))vm_mov;

        proc->op_table[1].opcode = XOR;
        proc->op_table[1].func = (void (*)(void *))vm_xor;

        proc->op_table[2].opcode = CMP;
        proc->op_table[2].func = (void (*)(void *))vm_cmp;

        proc->op_table[3].opcode = SYS_READ;
        proc->op_table[3].func = (void (*)(void *))vm_read;

        proc->op_table[4].opcode = SYS_WRITE;
        proc->op_table[4].func = (void (*)(void *))vm_write;

        proc->op_table[5].opcode = RET;
        proc->op_table[5].func = (void (*)(void *))vm_ret;

        proc->op_table[6].opcode = JNZ;
        proc->op_table[6].func = (void (*)(void *))vm_jnz;

        // Create the buffer
        heap_buf = (char *)malloc(HEAP_SIZE_MAX);

        // Initialize the buffer
        memcpy(heap_buf + 0x20, "syclover", 8);
        memcpy(heap_buf + 0x30, "success!\n", 9);
        memcpy(heap_buf + 0x40, "error!\n", 7);
    }

    // flag: ZPJEF_L[
    int main()
    {
        vm_processor proc = {0};

        // Initialize the vm processor
        init_vm_proc(&proc);

        // Execute the target function
        vm_interp(&proc);
        return 0;
    }
0x08 Summary
The code above summarizes what I learned while studying code virtualization, and my understanding is certainly wrong in places. This is only the simplest possible implementation, intended for learning. Studying virtualization technology in depth is still very complicated and requires much more background knowledge; treat this article as a starting point. Several problems also remain unsolved from my own study. For example, a real virtual machine-based protective shell must first convert the native instructions of the original program into custom bytecode, and I do not yet know which conversion method works best.
Many articles from abroad also describe another kind of virtual machine protection, based on LLVM IR. If you are interested, it is worth studying further.