Reversing binaries protected by virtual machines

Source: Internet
Author: User


0x00 Introduction

In code obfuscation, a virtual machine is used to run a program expressed in a different machine instruction set. For example, a virtual machine could run the ARM instruction set on a 32-bit x86 machine. The virtual machines used for code obfuscation are quite different from ordinary virtual machines that can run an operating system (such as VMware): they execute only a limited set of instructions for a specific task.

When a code obfuscator's virtual machine uses a known instruction set, reverse engineering the protected program is relatively easy: it only takes a little time to study the opcodes of that architecture. However, most current virtual machine obfuscators use custom instruction sets. In other words, each instruction is assigned a custom (usually random) opcode and a custom format, and the reverse engineer has to work out the meaning of every opcode. This is tedious work! For example, let's look at the differences between the 32-bit x86 instruction set and the custom instruction set introduced in this article:

Clearly, both instructions load the memory byte specified by the second operand into the register given by the first operand. However, the two instructions have different binary opcodes, and the 0x56 opcode of the second instruction is a random number. The second byte of each instruction indicates the registers it uses, with each 4-bit field identifying one register.
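
As a small illustration of the operand byte described above, the following C sketch splits one byte into two 4-bit register fields. The article does not say which half holds which operand, so the high-nibble-first order and the names `operand_regs` and `decode_operand_byte` are assumptions made for illustration only.

```c
#include <stdint.h>

/* Hypothetical pair of register indices packed into one operand byte. */
typedef struct {
    uint8_t first;   /* assumed: high nibble = first operand  */
    uint8_t second;  /* assumed: low nibble  = second operand */
} operand_regs;

/* Split an operand byte into its two 4-bit register fields. */
static operand_regs decode_operand_byte(uint8_t b)
{
    operand_regs r;
    r.first  = (uint8_t)(b >> 4);    /* upper four bits */
    r.second = (uint8_t)(b & 0x0F);  /* lower four bits */
    return r;
}
```

With this layout, the byte 0x12 would decode to register 1 as the first operand and register 2 as the second.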

Before turning to the reverse engineering example, we first need to understand how virtual-machine-based code obfuscation works behind the scenes. After the virtual machine starts, the first thing it does is set up an "address space" inside its process's virtual address space; in other words, it allocates the memory, stack, and registers it needs. The VM then loads and executes the opcode file. Execution is driven by a VM loop: in this loop, the virtual machine's processor parses each predefined opcode and its operands and executes them one after another according to the instruction set, until the loop encounters a designated exit opcode.
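
The fetch, decode, and execute cycle described above can be sketched in C as follows. This is a toy loop and not the VM analyzed in this article: the opcodes `OP_NOP`, `OP_INC_R0`, and `OP_EXIT`, the `toy_vm` structure, and the function `toy_vm_run` are all hypothetical names chosen for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes; a real obfuscator would assign random values. */
enum { OP_NOP = 0x01, OP_INC_R0 = 0x02, OP_EXIT = 0xFF };

typedef struct {
    uint16_t r[4];  /* general-purpose registers */
    uint16_t ip;    /* index of the next opcode in the code buffer */
} toy_vm;

/* Runs until the exit opcode. Returns 0 on a clean exit, -1 on an
 * unknown opcode or when execution runs past the end of the code. */
static int toy_vm_run(toy_vm *vm, const uint8_t *code, size_t len)
{
    for (;;) {
        if (vm->ip >= len)
            return -1;                 /* ran off the end of the code */
        uint8_t op = code[vm->ip++];   /* fetch and advance */
        switch (op) {                  /* decode and execute */
        case OP_NOP:    break;
        case OP_INC_R0: vm->r[0]++; break;
        case OP_EXIT:   return 0;      /* designated exit opcode */
        default:        return -1;     /* unknown opcode */
        }
    }
}
```

Running the four-byte program `{OP_INC_R0, OP_NOP, OP_INC_R0, OP_EXIT}` on a zeroed `toy_vm` leaves R0 equal to 2.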

0x01 Example

To demonstrate this, I spent some time writing a virtual machine with a custom instruction set in C. The complete source code is available at the end of this article. As you might guess, a virtual machine on its own does nothing, which is why I also wrote a small CrackMe program for it. You are welcome to add more features to it!

As mentioned in the introduction, this virtual machine uses a custom instruction set, and after initialization it loads the opcode file into its "address space".

Make sure that the opcode file and the virtual machine are in the same directory, then run the virtual machine and enter a password, as shown in the following figure:

Password Verification Failed!

Our goal is to find the correct password for this program. First, let's take a look at the opcode file (vm_file) by opening it in a hex editor:

We can see that vm_file contains the strings "Right pass!", "Wrong pass!" and "Password:". Next, we start reversing the virtual machine by opening it in IDA.

After IDA has loaded the virtual machine, we can go straight to the virtual address of the VM loop: 0x00401334. The function is quite large, but we can certainly work through it once we find the right entry point.

Let's take a look at the instructions executed at the entry of this function:

    push    ebp
    push    edi
    push    esi
    push    ebx
    sub     esp, 2Ch
    mov     esi, [esp+3Ch+arg_0]
    mov     ebx, [esp+3Ch+arg_4]
    mov     ax, [ebx+0Ah]
    lea     ebp, [esi+1200h]
loc_40134D: ; This is where the loop starts
    movzx   edx, ax
    mov     cl, [esi+edx]
    lea     edx, [eax+1]
    mov     [ebx+0Ah], dx
    sub     ecx, 10h
    cmp     cl, 0E1h     ; switch 226 cases
    jbe     short loc_40136C

The instruction "mov cl, [esi+edx]" reads one byte into CL, so CL holds the opcode. The opcode is located through the ESI and EDX registers. EDX holds only a WORD (16 bits, zero-extended from AX), while ESI holds a DWORD (32 bits). Therefore, ESI points to the VM's code segment, while DX is the VM's instruction pointer (the index of the current opcode within the file).

After the byte has been read, the incremented value in DX is saved to [EBX+0Ah], which lies in the register space allocated by the virtual machine. We now know that EBX points to the VM's registers, while ESI points to the file data loaded into memory.

Before the comparison, note that the compiler applied an optimization: 0x10 is subtracted from every opcode before it is used to index the switch table.

loc_40136C:
    movzx   ecx, cl
    jmp     ds:switchTable[ecx*4] ; switch jump

Although the switch table is large, the jump targets can be resolved more quickly by debugging dynamically. You can use OllyDbg or IDA's Win32 debugger to run and debug this program.
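
The range check the compiler emitted ("sub ecx, 10h" followed by "cmp cl, 0E1h" and "jbe") can be expressed in C as below. The constants come from the disassembly above; the function name `switch_index` and the -1 return value for the default case are assumptions of this sketch.

```c
#include <stdint.h>

#define OPCODE_BASE 0x10u  /* subtracted from every opcode ("sub ecx, 10h") */
#define MAX_INDEX   0xE1u  /* highest table index ("cmp cl, 0E1h")          */

/* Returns the jump-table index for an opcode, or -1 when the opcode
 * falls outside the 226 cases and would take the default path. */
static int switch_index(uint8_t opcode)
{
    /* 8-bit subtraction: opcodes below 0x10 wrap around to large values,
     * so a single unsigned comparison rejects both ends of the range. */
    uint8_t idx = (uint8_t)(opcode - OPCODE_BASE);
    return idx <= MAX_INDEX ? (int)idx : -1;
}
```

For the three opcodes examined in this article, 0x18, 0xAF, and 0xC2, this gives table indices 0x08, 0x9F, and 0xB2.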

0x02 First instruction

The first switch case takes us to a small routine:

We are now in the handler for opcode 0x18 (switch index 0x08 once the compiler's subtraction of 0x10 is taken into account). If you go back and check vm_file, you will find that its first byte is 0x18. This opcode appears to take operands, so the VM reads one more byte into the DX register. Next, the VM instruction pointer at [EBX+0Ah] is updated to EAX+2, which means that the IP (instruction pointer) points to the next byte. The byte that was read is then compared with 3. If it is greater than 3, the VM exits the loop and raises an exception. In our case the operand in the binary file is 0x01, so no exception is raised and the jump is not taken. Next we arrive here:

Recall that EBX is a pointer to the virtual machine's register array, so the first instruction begins by initializing [EBX+1*2] (the second register) to 0.

We now have enough information to conclude that the VM has four registers, which we can call R0, R1, R2, and R3.

The remaining code loads 2 bytes of data from the file (0x250, in big-endian order) and stores them in R1. The VM instruction pointer then points to the next instruction, at file offset 0x04. Finally, a jmp returns to the VM loop at loc_40134D to execute the next instruction.

So far we have decoded only the first instruction, and it is just a simple mov. It can be rewritten in the following form:

    MOV R1, 250H
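
The handler just analyzed can be approximated in C as follows. The register offsets (R0 to R3 at 0x00 to 0x06, IP at 0x0A, SP at 0x0C) are taken from this article; the field at offset 0x08 is not covered here, and the names `vm_regs` and `exec_mov_imm`, along with the explicit length check, are assumptions of this sketch rather than the original source.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint16_t r[4];     /* R0..R3 at offsets 0x00..0x06 */
    uint16_t unknown;  /* offset 0x08: purpose not covered in the article */
    uint16_t ip;       /* offset 0x0A: VM instruction pointer */
    uint16_t sp;       /* offset 0x0C: see the next section */
} vm_regs;

/* Executes the 4-byte "MOV Rn, imm16" instruction (opcode 0x18) located
 * at mem[regs->ip]. Returns 0 on success, -1 on a VM exception. */
static int exec_mov_imm(vm_regs *regs, const uint8_t *mem, size_t len)
{
    uint16_t ip = regs->ip;
    if ((size_t)ip + 3 >= len)
        return -1;                 /* instruction would run past memory */
    uint8_t reg = mem[ip + 1];     /* register operand */
    if (reg > 3)
        return -1;                 /* only R0..R3 exist */
    /* The immediate is stored in big-endian order. */
    regs->r[reg] = (uint16_t)((mem[ip + 2] << 8) | mem[ip + 3]);
    regs->ip = (uint16_t)(ip + 4); /* advance to the next instruction */
    return 0;
}
```

Feeding it the bytes 18 01 02 50 with IP at 0 sets R1 to 0x250 and moves IP to 0x04, matching the analysis above.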

0x03 Second instruction

Let's take a look at the next opcode (0xAF):

The first code block is the same as in the previous mov instruction: it is the typical code for an instruction that takes a register as its operand. In our example it uses R1 (0x01). Next, it accesses the register at [EBX+0Ch]. We know that this register is not one of R0, R1, R2, or R3, because the last of them, R3, is stored at [EBX+6]. It is not the instruction pointer either, because that is located at [EBX+0Ah]. To find out what this register is, we need to go back and check how it is initialized in the main function:

.text:00402703 mov     word ptr [eax+0Ch], 256

Back to our analysis: after the value of this register is fetched, it is decremented by one and then compared with 0xFFFF. Because the register is initialized to 256, the result can only equal 0xFFFF when the register is 0 before the decrement. If the result equals 0xFFFF, the VM exits the loop. Since this is the first execution, [EBX+0Ch] must now equal 255.

The next two instructions read the value of R1 (0x250) and save it in the DX register. Then we see an interesting instruction:

mov [esi+eax*2+1000h], dx

As you may recall, ESI points to the base address of the code and data area, and ESI+1000h lies 4 KB beyond that base. Therefore, we can assume that ESI+1000h points to a different "section" of the VM's "address space".

We can express this operation in pseudocode:

    WORD section[256];
    […]
    section[--reg] = R1;

This looks like a stack: the value of R1 is stored at the position given by the stack pointer minus one. We can confidently assume that opcode 0xAF represents a PUSH instruction, so this instruction reads as: PUSH R1.

Now we know that [EBX+0Ch] is the VM's stack pointer and that the stack occupies 256 * sizeof(uint16_t) bytes. Comparing the VM's stack pointer with that of the x86 architecture, the VM stack pointer is merely an array index, whereas the x86 stack pointer is a register (ESP) holding an address.
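
The PUSH behavior described in this section can be sketched in C as shown below. The stack size of 256 words, the initial stack pointer of 256, and the comparison with 0xFFFF come from the analysis above. The names `vm_stack`, `vm_stack_init`, and `vm_push` are hypothetical, and leaving the stack pointer untouched on overflow is an assumption of this sketch.

```c
#include <stdint.h>

#define STACK_WORDS 256

typedef struct {
    uint16_t stack[STACK_WORDS];  /* the section at ESI+1000h */
    uint16_t sp;                  /* array index, not an address */
} vm_stack;

/* An empty stack has sp == 256, matching the initialization in main. */
static void vm_stack_init(vm_stack *s)
{
    s->sp = STACK_WORDS;
}

/* Pushes one word. Returns 0 on success, -1 when the stack is full. */
static int vm_push(vm_stack *s, uint16_t value)
{
    uint16_t sp = (uint16_t)(s->sp - 1);  /* decrement first */
    if (sp == 0xFFFF)
        return -1;                        /* sp was 0: the stack is full */
    s->sp = sp;
    s->stack[sp] = value;                 /* section[--reg] = value */
    return 0;
}
```

After `vm_stack_init`, pushing 0x250 stores the value at index 255 and leaves the stack pointer at 255, as observed in the debugger.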

0x04 Third instruction

Next comes the third opcode (0xC2):

This opcode appears to read a WORD from the top of the stack. Before reading, it checks whether the stack is empty and raises a VM exception if it is. We know the stack is not empty because a value has just been pushed. After the value at the top of the stack is loaded into the DX register, the stack pointer is incremented by one. We also know that DX is now 0x250 (an offset into the code and data area). The handler then checks that this value does not exceed 0x1000 (the size of the address space). Finally, it calls printf with the string at [ESI+DX] as its argument. In our example, the string at offset 0x250 of vm_file is "Password:", which is printed to the screen.

We can conclude that the 0xC2 instruction expects a string offset to have been pushed onto the stack: it pops that offset and passes the corresponding string to printf.

As you can see, by reversing this opcode we have reached the code that prints "Password:". You may have noticed that the actions performed by each opcode can be summarized as a single instruction. We will not analyze the remaining opcodes step by step, but you should now be able to reverse engineer a program protected by a virtual machine, or even build your own virtual machine protector.
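
A minimal C sketch of this handler follows. The empty-stack check, the pop, and the bound of 0x1000 follow the analysis above. The names `pop_string` and `exec_print` are hypothetical, rejecting an offset of exactly 0x1000 is an assumption, and the sketch prints through a "%s" format, which may differ from how the original VM calls printf.

```c
#include <stdint.h>
#include <stdio.h>

#define STACK_WORDS 256
#define MEM_SIZE    0x1000

/* Pops a string offset and returns a pointer to the string inside mem,
 * or NULL on a VM exception (empty stack or offset out of range). */
static const char *pop_string(const uint8_t *mem, const uint16_t *stack,
                              uint16_t *sp)
{
    if (*sp >= STACK_WORDS)
        return NULL;                /* stack is empty */
    uint16_t off = stack[(*sp)++];  /* read the top word, then sp + 1 */
    if (off >= MEM_SIZE)
        return NULL;                /* offset outside the address space */
    return (const char *)&mem[off];
}

/* Opcode 0xC2 as understood here: pop an offset and print the string. */
static int exec_print(const uint8_t *mem, const uint16_t *stack,
                      uint16_t *sp)
{
    const char *s = pop_string(mem, stack, sp);
    if (s == NULL)
        return -1;
    printf("%s", s);
    return 0;
}
```

With 0x250 on top of the stack and vm_file loaded into `mem`, `exec_print` would print the string stored at offset 0x250.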

0x05 Password cracking

The following describes how to quickly find the correct password:

Open vm_file in a hex editor and extract the 256 random bytes at offsets 0x80 to 0x17F. We can call this array Random. The VM checks each byte of the password entered by the user against Random, and compares the result with the predefined array at offset 0x240 of vm_file.

I have provided a password generator in the reference section below. Compile and execute it to get the correct password:
