Introduction to code virtualization [translation]


by Nooby

Translation: FadeC

This article describes how code can be protected with "virtual machines" and how popular virtual machine protectors apply this technique.

It aims to take you from beginner to mastery. :)

Translator's note: apologies for the many mistakes in this translation. If anything is doubtful or unclear, treat the original text as authoritative.

Why is it called a virtual machine?

Early software protection relied mostly on obfuscation and mutation (confusion and expansion). This approach inserts junk code into the original code flow, replaces instructions with equivalent ones, or replaces constants with calculations. It also inserts conditional and unconditional branches and places random bytes between key pieces of executable code, so that the transformation is hard to undo.
Example_obfuscated.exe is protected this way.

Over time, reverse engineers became more skilled and debuggers more capable. Mature reverse engineering tools and methods made code protected this way readable and reversible again. Expanding the code further still slows reversing down, but it increases the size of the software. As a result, people started looking for a new approach that protects code without making it grow.

The result is a piece of loop code that acts like an "emulator" (or interpreter) of the original code.
It reads a data stream (also known as pseudo-code or P-code) and executes a micro-operation (handler) for each item, just as a "virtual machine" executes its instruction set. This approach eventually became known as code virtualization (instruction virtualization).

How does a virtual machine work?

A real processor has registers, an instruction decoder, and execution logic. A virtual machine has the same parts.
The entry code of the virtual machine collects the context of the real processor, the execution loop reads P-code and dispatches it to the appropriate handler, and when the virtual machine exits, it updates the real processor's registers from the virtual machine's context.

Here is a simple example. Assume the following function is to be executed by a virtual machine.

Original instructions:
add eax, ebx
retn

Converted to virtualized code:
push address_of_pcode                    ; push the P-code address onto the stack
jmp vmentry

vmentry:
push all register values                 ; save all registers on the stack
jmp vmloop

vmloop:
fetch P-code from vmeip
dispatch to handler

vminit:
pop all register values into vmcontext   ; move the saved registers into vmcontext
pop address_of_pcode into vmeip          ; pop the P-code address into vmeip
jmp vmloop

add_eax_ebx_handler:
do "add eax, ebx" on vmcontext           ; perform the operation on vmcontext
jmp vmloop

vmretn:
restore register values from vmcontext   ; restore the real registers from vmcontext
do "retn"                                ; return

Note that a virtual machine does not need to emulate every x86 instruction. An instruction that the real processor can execute directly may be handled by exiting the virtual machine at that point and entering it again afterwards; if this is not done carefully, execution can get stuck or behave unpredictably. (Translator's note: this sentence was difficult to translate.)
Real virtual machine handlers are usually designed more generically than the handler in the example above.
Usually the P-code also determines the operands.
"add_eax_ebx_handler" can be defined as "add_handler", which takes two parameters and produces one result.
Separate handlers load and store registers, and so fetch the parameters and save the results.
This increases the reusability of the handlers, so that tracing the handlers by themselves does not directly reveal the original code.

Now let's take a look at how a stack-based virtual machine works:

add_handler:
pop REG                  ; REG = parameter 2
add [STACK], REG         ; [STACK] points to parameter 1

getreg_handler:
fetch P-code for operand
push vmcontext[operand]  ; push value of the register onto the stack

setreg_handler:
fetch P-code for operand
pop vmcontext[operand]   ; pop value of the register from the stack

The P-code of the above function would be:
init
getreg EBX
getreg EAX
add
setreg EAX
retn
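
The stack-based design above can be sketched as a small interpreter. This is only an illustration under assumptions that are not in the original: the name `run_vm`, the encoding of P-code as a Python list with operands stored inline, and the use of dictionaries for the saved registers and vmcontext are all hypothetical.

```python
MASK = 0xFFFFFFFF  # emulate 32-bit register wraparound


def run_vm(pcode, real_regs):
    """Interpret a P-code list and return the register values on exit."""
    saved = dict(real_regs)   # vmentry: "push all register values"
    vmcontext = {}            # virtual register file
    stack = []                # operand stack of the virtual machine
    vmeip = 0                 # index of the next P-code item

    while True:               # vmloop: fetch P-code, dispatch to a handler
        op = pcode[vmeip]
        vmeip += 1
        if op == "init":      # vminit: move the saved registers into vmcontext
            vmcontext = dict(saved)
        elif op == "getreg":  # the operand follows in the P-code stream
            stack.append(vmcontext[pcode[vmeip]])
            vmeip += 1
        elif op == "setreg":
            vmcontext[pcode[vmeip]] = stack.pop()
            vmeip += 1
        elif op == "add":     # pop parameter 2, add it to parameter 1 in place
            reg = stack.pop()
            stack[-1] = (stack[-1] + reg) & MASK
        elif op == "retn":    # vmretn: hand vmcontext back as the real registers
            return dict(vmcontext)
        else:
            raise ValueError(f"unknown P-code: {op!r}")


# P-code for "add eax, ebx / retn", as listed above
PCODE = ["init", "getreg", "EBX", "getreg", "EAX", "add", "setreg", "EAX", "retn"]

print(run_vm(PCODE, {"EAX": 5, "EBX": 7}))  # {'EAX': 12, 'EBX': 7}
```

Note that the `add` handler knows nothing about EAX or EBX. The P-code alone decides which registers are loaded and where the result is stored.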
What modern virtual machines do against reverse engineering

For virtual machines, obfuscation and mutation remain important, because the interpreter of the virtual machine is directly exposed and a reverser can use automated tools to analyze the underlying architecture of the virtual machine.
Because some processor registers are not in use (vmcontext is stored separately, and the interpreter only needs a small number of registers), they can be used for additional obfuscation.
Handlers can be designed with as few operand/context dependencies as possible.
In addition, the real stack pointer can be tracked in vmcontext, so the stack itself can be trashed inside the interpreter loop.

With these properties, it is not hard to see that obfuscation and mutation can be very effective here.

An example of an obfuscated virtual machine can be found in Example_virtualized.exe.

Now that we know how to protect the execution part of a virtual machine, let's look at how instructions are converted to P-code, which is the most interesting part of code virtualization.

Instruction decomposition

Logical instructions

Here is a way to increase both handler reusability and complexity.

Logical operations can be decomposed into NAND/NOR operations using the following identities:

NOT(x) = NAND(x, x) = NOR(x, x)
AND(x, y) = NOT(NAND(x, y)) = NOR(NOT(x), NOT(y))
OR(x, y) = NAND(NOT(x), NOT(y)) = NOT(NOR(x, y))
XOR(x, y) = NAND(NAND(NOT(x), y), NAND(x, NOT(y))) = NOR(AND(x, y), NOR(x, y))
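
These identities are easy to check mechanically. The sketch below builds NOT, AND, OR and XOR once from a single NAND primitive and once from a single NOR primitive, on 32-bit values. The function names are chosen for this illustration only and do not come from any particular virtual machine.

```python
MASK = 0xFFFFFFFF  # work on 32-bit values


def nand(x, y):
    return ~(x & y) & MASK


def nor(x, y):
    return ~(x | y) & MASK


# Everything below is built from nand() only.
def not_nand(x):
    return nand(x, x)


def and_nand(x, y):
    return not_nand(nand(x, y))


def or_nand(x, y):
    return nand(not_nand(x), not_nand(y))


def xor_nand(x, y):
    return nand(nand(not_nand(x), y), nand(x, not_nand(y)))


# Everything below is built from nor() only.
def not_nor(x):
    return nor(x, x)


def and_nor(x, y):
    return nor(not_nor(x), not_nor(y))


def or_nor(x, y):
    return not_nor(nor(x, y))


def xor_nor(x, y):
    return nor(and_nor(x, y), nor(x, y))


def identities_hold(x, y):
    """Compare both decompositions with Python's native operators."""
    return (
        not_nand(x) == not_nor(x) == (~x & MASK)
        and and_nand(x, y) == and_nor(x, y) == (x & y)
        and or_nand(x, y) == or_nor(x, y) == (x | y)
        and xor_nand(x, y) == xor_nor(x, y) == (x ^ y)
    )


print(identities_hold(0x12345678, 0x0F0F0F0F))  # True
```

A virtual machine built this way needs only one logical handler, and every logical instruction of the original code turns into a longer sequence of the same handler.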
Arithmetic Instructions

Subtraction can be converted into an addition combined with an EFLAGS calculation:

SUB(x, y) = NOT(ADD(NOT(x), y))
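
The identity can be checked with a few lines of 32-bit arithmetic. This is a minimal sketch with hypothetical helper names (`not32`, `add32`, `sub_via_add`). It also returns the carry out of the inner ADD, which equals the borrow that a real SUB would report in CF.

```python
MASK = 0xFFFFFFFF  # 32-bit operands


def not32(x):
    return ~x & MASK


def add32(x, y):
    """Return (result, carry) of a 32-bit addition."""
    total = x + y
    return total & MASK, total >> 32


def sub_via_add(x, y):
    """Compute x - y as NOT(ADD(NOT(x), y)); also return the carry of the ADD."""
    inner, carry = add32(not32(x), y)
    return not32(inner), carry


print(sub_via_add(5, 3))  # (2, 0)
print(sub_via_add(3, 5))  # (4294967294, 1), i.e. 0xFFFFFFFE with a borrow
```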

Let A be the EFLAGS value before the final NOT and B the EFLAGS value after it. The resulting EFLAGS value is then:
EFLAGS = OR(AND(A, 0x815), AND(B, NOT(0x815)))    ; 0x815 masks OF, AF, PF and CF

Register abstraction

Since a virtual machine can have more registers than a real x86 processor, real processor registers can be mapped dynamically to virtual machine registers, and the extra registers can be used to store intermediate values or simply for obfuscation. This also allows instructions to be further obfuscated and optimized by the techniques described below.

Context rotation

Because registers are abstracted, different pieces of P-code can use different register mappings, and the mapping can be changed from time to time, which makes reversing more difficult.

When the next piece of P-code uses a different register mapping, the virtual machine simply swaps the values in its context.

When converting an instruction such as XCHG, the converter can simply change the register mapping without generating any P-code. Look at the following example:

Original instructions:

xchg ebx, ecx
add eax, ecx

P-code without context rotation:

Current register mappings

Real registers  Virtual registers
EAX             R0
EBX             R1
ECX             R2

getreg R2        ; R2 = ECX
getreg R1        ; R1 = EBX
setreg R2        ; ECX = value of EBX
setreg R1        ; EBX = value of ECX
getreg R2
getreg R0        ; R0 = EAX
add
setreg R0

P-code with context rotation (the exchange is done at P-code generation time):

Before exchange

Real registers  Virtual registers
EAX             R0
EBX             R1
ECX             R2

After exchange

Real registers  Virtual registers
EAX             R0
EBX             R2
ECX             R1

[Map R1 = ECX, R2 = EBX]   ; exchange
getreg R1        ; R1 = ECX
getreg R0        ; R0 = EAX
add
setreg R0        ; R0 = EAX

Such a rotation can also be applied to the last setreg operation: the result is written to another, unused virtual machine register (for example R3), and R0 is discarded with stale data in it. This piece of P-code then operates on three registers, which makes it harder to restore the original instructions.

P-code with context rotation 2:
[Map R1 = ECX, R2 = EBX]        ; exchange
getreg R1   ; R1 = ECX
getreg R0   ; R0 = EAX
add
[Map R0 = unused, R3 = EAX]     ; rotation
setreg R3   ; R3 = EAX
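
Both rotations can be reproduced with a small converter that keeps the real-to-virtual mapping in a dictionary. This is a sketch under assumptions: the tuple format `(mnemonic, dst, src)`, the function name `convert` and the `spare` parameter are invented for this illustration, and only `xchg` and `add` are handled.

```python
def convert(instructions, mapping, spare=None):
    """Translate (mnemonic, dst, src) tuples into a list of P-code lines.

    mapping: real register name -> virtual register name (updated in place)
    spare:   optional list of currently unused virtual registers
    """
    spare = list(spare or [])
    pcode = []
    for mnemonic, dst, src in instructions:
        if mnemonic == "xchg":
            # Context rotation: swap the mapping entries, emit no P-code.
            mapping[dst], mapping[src] = mapping[src], mapping[dst]
        elif mnemonic == "add":
            pcode.append(f"getreg {mapping[src]}")
            pcode.append(f"getreg {mapping[dst]}")
            pcode.append("add")
            if spare:
                # Second rotation: store the result in an unused register
                # and retire the old one, which now holds stale data.
                spare.append(mapping[dst])
                mapping[dst] = spare.pop(0)
            pcode.append(f"setreg {mapping[dst]}")
        else:
            raise ValueError(f"unsupported instruction: {mnemonic}")
    return pcode


program = [("xchg", "EBX", "ECX"), ("add", "EAX", "ECX")]

mapping = {"EAX": "R0", "EBX": "R1", "ECX": "R2"}
print(convert(program, mapping))
# ['getreg R1', 'getreg R0', 'add', 'setreg R0']
print(mapping)
# {'EAX': 'R0', 'EBX': 'R2', 'ECX': 'R1'}

mapping = {"EAX": "R0", "EBX": "R1", "ECX": "R2"}
print(convert(program, mapping, spare=["R3"]))
# ['getreg R1', 'getreg R0', 'add', 'setreg R3']
```

The bracketed `[Map ...]` lines in the listings above correspond to the dictionary updates here. They exist only in the converter and never appear in the generated P-code.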
Register Aliases:

When converting an instruction that assigns one register to another, an alias can be created between the source register and the destination register. As long as the source register is not changed (which forces a remapping or a getreg/setreg pair), reads of the destination register are redirected to the source, and the assignment itself is never actually performed.

Take the following code as an example:

Original instructions:
mov eax, ecx
add eax, ebx
mov ecx, eax
mov eax, ebx

P-code:

Current register mappings

Real registers  Virtual registers
EAX             R0
EBX             R1
ECX             R2

[Make alias R0 = R2]
getreg R1               ; R1 = EBX
getreg R2               ; reading of R0 redirects to R2
add
[R0 (EAX) is being changed; since R0 is the destination of an alias, just clear its alias]
[Map R0 = unused, R3 = EAX]     ; rotation
setreg R3               ; R3 = EAX
[Make alias R2 = R3]
getreg R1
[R3 (EAX) is being changed; since R3 is the source of an alias, we need to do the assignment]
[Map R3 = ECX, R2 = EAX]        ; we can simplify the R2 = R3 assignment by rotation
[Map R0 = EAX, R3 = unused]     ; another rotation
setreg R0               ; R0 = EAX
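
The alias bookkeeping can be sketched as follows. This is a deliberately simplified illustration and not the scheme used in the listing above: it performs no rotation, so a pending assignment is materialized with a plain getreg/setreg pair, and the instruction format and all names are assumptions made for the example.

```python
def convert_with_aliases(instructions, mapping):
    """Translate (mnemonic, dst, src) tuples, deferring register-to-register moves.

    Returns (pcode, alias), where alias holds the assignments still pending.
    """
    alias = {}   # destination real register -> source real register
    pcode = []

    def read(reg):
        # Reads of an aliased destination are redirected to its source.
        return mapping[alias.get(reg, reg)]

    def before_write(reg):
        # reg is about to change. If it is the destination of an alias,
        # just clear the alias.
        alias.pop(reg, None)
        # If it is the source of an alias, the deferred assignment has to
        # be performed now, while the old value is still available.
        for dst, src in list(alias.items()):
            if src == reg:
                pcode.append(f"getreg {mapping[src]}")
                pcode.append(f"setreg {mapping[dst]}")
                del alias[dst]

    for mnemonic, dst, src in instructions:
        if mnemonic == "mov":
            source = alias.get(src, src)   # resolve through an existing alias
            if source == dst:
                continue                   # dst already holds this value
            before_write(dst)
            alias[dst] = source            # no P-code is emitted for the move
        elif mnemonic == "add":
            pcode.append(f"getreg {read(src)}")
            pcode.append(f"getreg {read(dst)}")
            pcode.append("add")
            before_write(dst)
            pcode.append(f"setreg {mapping[dst]}")
        else:
            raise ValueError(f"unsupported instruction: {mnemonic}")
    return pcode, alias


program = [
    ("mov", "EAX", "ECX"),
    ("add", "EAX", "EBX"),
    ("mov", "ECX", "EAX"),
    ("mov", "EAX", "EBX"),
]
pcode, pending = convert_with_aliases(program, {"EAX": "R0", "EBX": "R1", "ECX": "R2"})
print(pcode)
# ['getreg R1', 'getreg R2', 'add', 'setreg R0', 'getreg R0', 'setreg R2']
print(pending)
# {'EAX': 'EBX'}
```

Of the three `mov` instructions, only one ends up as actual P-code. The last one stays pending as an alias, so it must be taken into account when the virtual machine exits or when the next piece of P-code is converted.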
Register usage analysis:

Given the context of a sequence of instructions, it can be determined that at certain points the values of some registers can be changed without affecting the program logic, and that some EFLAGS calculations can be skipped.

For example, here is a section of code at 0x4069a8 in example.exe:

PUSH EBP
MOV EBP, ESP                         ; EAX|ECX|EBP|OF|SF|ZF|PF|CF
SUB ESP, 0x10                        ; EAX|ECX|OF|SF|ZF|PF|CF
MOV ECX, DWORD PTR [EBP+0x8]         ; EAX|ECX|OF|SF|ZF|PF|CF
MOV EAX, DWORD PTR [ECX+0x10]        ; EAX|OF|SF|ZF|PF|CF
PUSH ESI                             ; OF|SF|ZF|PF|CF
MOV ESI, DWORD PTR [EBP+0xC]         ; ESI|OF|SF|ZF|PF|CF
PUSH EDI                             ; OF|SF|ZF|PF|CF
MOV EDI, ESI                         ; EDI|OF|SF|ZF|PF|CF
SUB EDI, DWORD PTR [ECX+0xC]         ; OF|SF|ZF|PF|CF
ADD ESI, -0x4                        ; ECX|OF|SF|ZF|PF|CF
SHR EDI, 0xF                         ; ECX|OF|SF|ZF|PF|CF
MOV ECX, EDI                         ; ECX|OF|SF|ZF|PF|CF
IMUL ECX, ECX, 0x204                 ; OF|SF|ZF|PF|CF
LEA ECX, DWORD PTR [ECX+EAX+0x144]   ; OF|SF|ZF|PF|CF
MOV DWORD PTR [EBP-0x10], ECX        ; OF|SF|ZF|PF|CF
MOV ECX, DWORD PTR [ESI]             ; ECX|OF|SF|ZF|PF|CF
DEC ECX                              ; OF|SF|ZF|PF|CF
TEST CL, 0x1                         ; OF|SF|ZF|PF|CF
MOV DWORD PTR [EBP-0x4], ECX
JNZ 0x406cb8

The comments show which registers and flags are unused before each instruction executes, that is, whose values can be changed at that point without affecting the program. This information is used when generating register rotations.
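
Annotations of this kind can be produced by a backward pass over the instructions, which is a standard liveness analysis. The sketch below assumes straight-line code without branches and hand-written read/write sets instead of a real disassembler. The function name, the tuple format and the three-instruction sample are all hypothetical.

```python
def free_before(instructions, tracked, live_at_exit):
    """Return, for each instruction, the tracked names that are unused before it.

    instructions: list of (text, reads, writes) tuples
    tracked:      register and flag names to report, in output order
    live_at_exit: names whose values are still needed after the last instruction
    """
    live = set(live_at_exit)
    result = []
    for text, reads, writes in reversed(instructions):
        # A value is needed before an instruction if the instruction reads it,
        # or if it is needed afterwards and the instruction does not overwrite it.
        live = (live - set(writes)) | set(reads)
        result.append((text, [name for name in tracked if name not in live]))
    result.reverse()
    return result


TRACKED = ["EAX", "EBX", "ECX", "CF"]
CODE = [
    ("mov ecx, eax", {"EAX"}, {"ECX"}),
    ("add ecx, ebx", {"ECX", "EBX"}, {"ECX", "CF"}),
    ("mov eax, ecx", {"ECX"}, {"EAX"}),
]

# Conservative assumption: everything is still needed after the last instruction.
for text, free in free_before(CODE, TRACKED, live_at_exit=TRACKED):
    print(f"{text:<14}; {'|'.join(free)}")
```

For the sample, ECX and CF are reported as free before the first instruction, EAX and CF before the second, and EAX before the third. A real implementation would also have to follow branches and model every flag that each instruction reads and writes.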
Register usage information also allows lengthy EFLAGS calculations to be skipped and junk instructions to be mixed in, which makes reversing more complicated.

Other P-code tricks and optimizations

Constant encryption

Constants in the original instructions are converted into calculations, so that the constants only appear at run time and are never directly exposed.

Stack obfuscation

A virtual machine can obfuscate the stack by pushing or writing random values, while the real ESP can still be computed or tracked from vmcontext.

Interpreting with multiple virtual machines

It is possible to use multiple virtual machines to execute one series of P-code. At certain points, a special handler that leads to another interpreter loop is executed. The P-code data after such a point is processed by a different virtual machine. These virtual machines only need to share intermediate run-time information, such as register mappings, at the switch points. Tracing such P-code requires analyzing all of the virtual machine instances, which is considerably more work.

References

VMProtect
http://vmpsoft.com/
Code Virtualizer
http://www.oreans.com/
Safengine
http://www.safengine.com/
ReWolf's x86 Virtualizer
http://rewolf.pl/stuff/x86.virt.pdf
OllyDbg
http://www.ollydbg.de/
VMSweeper
http://forum.tuts4you.com/topic/25077-vmsweeper/
