Since QEMU 0.10.0, TCG has been QEMU's translation engine. It removes QEMU's dependency on GCC 3.x and achieves "real" dynamic translation (in a sense, older versions copied binary instructions out of precompiled object files). TCG stands for "Tiny Code Generator". Fabrice Bellard, the creator of QEMU, wrote in the TCG README that TCG began as a C compiler backend and was later simplified into QEMU's dynamic code generator (Fabrice Bellard had previously written a very good compiler called TinyCC). In fact, TCG serves the same purpose as a real compiler backend: it analyzes and optimizes the target code and generates the host code.
Target instruction ----> TCG ----> host instruction
The following uses the x86 platform as an example (both host and target are x86).
As I mentioned in the previous article, the basic idea of dynamic translation is to split each target instruction into several micro-operations, each implemented by a small piece of C code. At run time, a dynamic code generator combines these micro-operations into a function, and executing that function is equivalent to executing the target instruction.
This idea rests on the fact that CPU instructions are highly regular. The length, opcode, and operands of each instruction follow a fixed format and can be derived from the bytes already read. You therefore only need a disassembly engine to work out the opcode, the input operands, and the output operands of an instruction. The remaining work is to emit the corresponding host instructions.
So how do we decide which micro-operations all these CPU instructions should be split into? CPU instructions appear to come in a huge variety, with complicated names, but most of them fall into a few categories:
Data transfer, arithmetic operations, logical operations, program control.
For example, data transfer includes move instructions (such as mov) and stack operations (push and pop).
Program control includes call and jump instructions (such as jmp).
Based on this, TCG defines micro-operations for the categories above (see tcg/i386/tcg-target.c). One of the simplest functions, tcg_out_movi, is as follows:
// tcg/tcg.c
static inline void tcg_out8(TCGContext *s, uint8_t v)
{
    *s->code_ptr++ = v;
}

static inline void tcg_out32(TCGContext *s, uint32_t v)
{
    *(uint32_t *)s->code_ptr = v;
    s->code_ptr += 4;
}

// tcg/i386/tcg-target.c
static inline void tcg_out_movi(TCGContext *s, TCGType type,
                                int ret, int32_t arg)
{
    if (arg == 0) {
        /* xor r0, r0 */
        tcg_out_modrm(s, 0x01 | (ARITH_XOR << 3), ret, ret);
    } else {
        tcg_out8(s, 0xb8 + ret);  // emit the opcode; ret is a register index
        tcg_out32(s, arg);        // emit the 32-bit immediate operand
    }
}
0xb8 to 0xbf are the opcodes of the x86 instruction mov r32, imm32 (written mov r, Iv in the opcode tables); the low three bits select the destination register. So tcg_out_movi simply writes the machine code of a mov instruction into the code buffer. As you can see, TCG hard-codes instruction encodings when it generates host code. To run TCG on a different host platform, the micro-operation functions must therefore be written separately for that platform.
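The encoding described above can be tried outside QEMU. The following is a minimal sketch, not QEMU code: the Emitter type and the emit8, emit32, and emit_movi names are hypothetical stand-ins for TCGContext and the tcg_out_* helpers, and the immediate is written byte by byte so the result is little-endian on any host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for TCGContext: an output buffer and a cursor. */
typedef struct {
    uint8_t buf[32];
    size_t len;
} Emitter;

static void emit8(Emitter *e, uint8_t v)
{
    assert(e->len < sizeof(e->buf));
    e->buf[e->len++] = v;
}

/* x86 immediates are little-endian; emit the low byte first. */
static void emit32(Emitter *e, uint32_t v)
{
    for (int i = 0; i < 4; i++) {
        emit8(e, (uint8_t)(v >> (8 * i)));
    }
}

/* Emit "mov r32, imm32", or the shorter "xor r32, r32" when imm is 0.
   reg is the x86 register number (0 = eax, 1 = ecx, ...). */
static void emit_movi(Emitter *e, int reg, uint32_t imm)
{
    assert(reg >= 0 && reg < 8);
    if (imm == 0) {
        emit8(e, 0x31);                                /* xor r/m32, r32 */
        emit8(e, (uint8_t)(0xc0 | (reg << 3) | reg));  /* ModRM: reg, reg */
    } else {
        emit8(e, (uint8_t)(0xb8 + reg));               /* mov r32, imm32 */
        emit32(e, imm);
    }
}
```

Calling emit_movi(&e, 0, 0xf000) on an empty Emitter leaves the five bytes B8 00 F0 00 00 in the buffer, and emit_movi(&e, 0, 0) appends 31 C0.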
Next, I will explain how the target instruction jmp f000:e05b is translated into host instructions. The key variables are:
gen_opc_buf: opcode buffer
gen_opparam_buf: parameter buffer
gen_code_buf: buffer that holds the translated code
The pointer variables gen_opc_ptr, gen_opparam_ptr, and gen_code_ptr point into these buffers respectively.
jmp f000:e05b is encoded as: EA 5B E0 00 F0.
First, the disas_insn() function translates the instruction. On reading the first byte, EA, it determines that this is a 16-bit unconditional far jump, so it reads the offset and the selector from the following bytes in turn and then splits the instruction into the following micro-operations:
gen_op_movl_T0_im(selector);
gen_op_movl_T1_imu(offset);
gen_op_movl_seg_T0_vm(R_CS);
gen_op_movl_T0_T1();
gen_op_jmp_T0();
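The decoding step just described (recognize EA, then read the offset and the selector) can be sketched on its own. This is only an illustration: FarJump16 and decode_far_jmp16 are hypothetical names, and the sketch handles only the 16-bit operand-size form, whereas disas_insn() covers far more cases.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type for this sketch; not a QEMU structure. */
typedef struct {
    uint16_t offset;
    uint16_t selector;
    size_t length;   /* number of bytes consumed */
} FarJump16;

/* Decode a 16-bit "jmp ptr16:16": opcode EA, then the offset, then the
   selector, both little-endian. Returns 1 on success, 0 otherwise. */
static int decode_far_jmp16(const uint8_t *code, size_t size, FarJump16 *out)
{
    assert(code != NULL && out != NULL);
    if (size < 5 || code[0] != 0xea) {
        return 0;
    }
    out->offset = (uint16_t)(code[1] | (code[2] << 8));
    out->selector = (uint16_t)(code[3] | (code[4] << 8));
    out->length = 5;
    return 1;
}
```

For the bytes EA 5B E0 00 F0 this yields offset 0xe05b and selector 0xf000, the two values passed to the first two micro-operations.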
These micro-operation functions are defined as follows (the comments describe what each call does):
static inline void gen_op_movl_T0_im(int32_t val)
{
    tcg_gen_movi_tl(cpu_T[0], val);  // cpu_T[0] = val
}

static inline void gen_op_movl_T1_imu(uint32_t val)
{
    tcg_gen_movi_tl(cpu_T[1], val);  // cpu_T[1] = val
}

static inline void gen_op_movl_seg_T0_vm(int seg_reg)
{
    tcg_gen_andi_tl(cpu_T[0], cpu_T[0], 0xffff);  // cpu_T[0] = cpu_T[0] & 0xffff
    tcg_gen_st32_tl(cpu_T[0], cpu_env,
                    offsetof(CPUX86State, segs[seg_reg].selector));  // store cpu_T[0] at this offset in cpu_env
    tcg_gen_shli_tl(cpu_T[0], cpu_T[0], 4);  // cpu_T[0] = cpu_T[0] << 4
    tcg_gen_st_tl(cpu_T[0], cpu_env,
                  offsetof(CPUX86State, segs[seg_reg].base));  // store cpu_T[0] at this offset in cpu_env
}

static inline void gen_op_movl_T0_T1(void)
{
    tcg_gen_mov_tl(cpu_T[0], cpu_T[1]);  // cpu_T[0] = cpu_T[1]
}

static inline void gen_op_jmp_T0(void)
{
    tcg_gen_st_tl(cpu_T[0], cpu_env, offsetof(CPUState, eip));  // store cpu_T[0] at the offset of eip in cpu_env
}
cpu_T[0] and cpu_T[1] play the same role as the T0 and T1 described earlier: both are variables used for temporary storage. On a 32-bit target, tcg_gen_movi_tl is tcg_gen_movi_i32, which in turn calls tcg_gen_op2i_i32. Both are defined as follows:
static inline void tcg_gen_op2i_i32(int opc, TCGv_i32 arg1, TCGArg arg2)
{
    *gen_opc_ptr++ = opc;
    *gen_opparam_ptr++ = GET_TCGV_I32(arg1);
    *gen_opparam_ptr++ = arg2;
}

static inline void tcg_gen_movi_i32(TCGv_i32 ret, int32_t arg)
{
    tcg_gen_op2i_i32(INDEX_op_movi_i32, ret, arg);
}
gen_opparam_buf is the buffer that stores the operands. They are laid out as follows: the first 4 bytes hold an index into s->temps (the array that describes the destination values, that is, the output argument), and the second 4 bytes onward hold the input arguments. For details of how they are parsed, see the tcg_reg_alloc_movi function. Sample code:
TCGTemp *ots;
tcg_target_ulong val;

ots = &s->temps[args[0]];
val = args[1];
ots->val_type = TEMP_VAL_CONST;
ots->val = val;  // record the input value in the ots structure for now
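The split between an opcode stream and a parameter stream can be modelled in a few lines. The sketch below is a toy, not the TCG implementation: OpStream, gen_movi, process_ops, and the opcode numbers are all hypothetical, and each temporary is reduced to a single constant value. It only shows the front end appending one opcode plus two parameters, and the back end walking both buffers in step.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcode numbers for this sketch only. */
enum { OP_END = 0, OP_MOVI = 1 };

#define MAX_OPS    16
#define MAX_PARAMS 32
#define NUM_TEMPS  4

typedef struct {
    uint16_t opc_buf[MAX_OPS];         /* plays the role of gen_opc_buf */
    uint32_t opparam_buf[MAX_PARAMS];  /* plays the role of gen_opparam_buf */
    uint16_t *opc_ptr;
    uint32_t *opparam_ptr;
    uint32_t temps[NUM_TEMPS];         /* constant recorded for each temp */
} OpStream;

static void stream_init(OpStream *s)
{
    s->opc_ptr = s->opc_buf;
    s->opparam_ptr = s->opparam_buf;
    for (int i = 0; i < NUM_TEMPS; i++) {
        s->temps[i] = 0;
    }
}

/* Front end: record "temp[ret] = value" as one opcode and two parameters. */
static void gen_movi(OpStream *s, uint32_t ret, uint32_t value)
{
    assert(ret < NUM_TEMPS);
    assert(s->opc_ptr < s->opc_buf + MAX_OPS - 1);  /* keep room for OP_END */
    assert(s->opparam_ptr + 2 <= s->opparam_buf + MAX_PARAMS);
    *s->opc_ptr++ = OP_MOVI;
    *s->opparam_ptr++ = ret;    /* output: index of the destination temp */
    *s->opparam_ptr++ = value;  /* input: the constant */
}

/* Back end: terminate the opcode list, then walk both buffers in step.
   Returns the number of operations processed. */
static int process_ops(OpStream *s)
{
    const uint32_t *args = s->opparam_buf;
    int count = 0;

    *s->opc_ptr = OP_END;
    for (const uint16_t *op = s->opc_buf; *op != OP_END; op++) {
        if (*op == OP_MOVI) {
            s->temps[args[0]] = args[1];  /* args[0] = output, args[1] = input */
            args += 2;
        }
        count++;
    }
    return count;
}
```

After stream_init, two calls such as gen_movi(&s, 0, 0xf000) and gen_movi(&s, 1, 0xe05b) leave two opcodes and four parameters queued, and process_ops then records the constants in temps[0] and temps[1].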
Next, the tcg_gen_code_common function uses the opcode list saved in gen_opc_buf, the parameter list saved in gen_opparam_buf, and the TCGContext structure to generate the final host code. For jmp f000:e05b the result is:
099d0040  B8 00 F0 00 00     mov  eax, 0f000h
099d0045  81 E0 FF FF 00 00  and  eax, 0ffffh
099d004b  89 45 50           mov  dword ptr [ebp+50h], eax
099d004e  C1 E0 04           shl  eax, 4
099d0051  89 45 54           mov  dword ptr [ebp+54h], eax
099d0054  B8 5B E0 00 00     mov  eax, 0e05bh
099d0059  89 45 20           mov  dword ptr [ebp+20h], eax
099d005c  31 C0              xor  eax, eax
099d005e  E9 25 5D CA 06     jmp  _code_gen_prologue+8 (12775d88h)  /* return */
As the listing shows, the generated host code is very concise. For the target's jmp, the host does not execute a real jump instruction; it simply stores the destination address in EIP.
QEMU maintains a CPUState data structure that holds all the registers of the target CPU, such as EAX, EBP, ESP, CS, EIP, and EFLAGS. It always represents the current state of the target machine. I use the variable env to denote the CPUState structure. QEMU always uses env.cs + env.eip as the starting address.
As mentioned above, the jmp f000:e05b instruction is split into the following micro-operations:
gen_op_movl_T0_im(selector);
gen_op_movl_T1_imu(offset);
gen_op_movl_seg_T0_vm(R_CS);
gen_op_movl_T0_T1();
gen_op_jmp_T0();
The meaning of these micro-operations is very simple: put selector into env.cs and offset into env.eip. When debugging, it is very interesting to compare how QEMU executes a target instruction with how Bochs does it. Of course, these are just different design philosophies, and neither is technically better or worse.
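To make the effect of the five micro-operations concrete, here is a small model that applies them to a toy CPU state. Everything in it is an assumption made for illustration: ToyEnv and ToySeg are hypothetical, heavily reduced stand-ins for CPUState, the op_* functions execute directly instead of generating code, and next_pc simply adds the CS base to EIP.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily reduced model of the guest CPU state. */
typedef struct {
    uint32_t selector;
    uint32_t base;
} ToySeg;

typedef struct {
    ToySeg cs;
    uint32_t eip;
    uint32_t t0, t1;  /* stand-ins for cpu_T[0] and cpu_T[1] */
} ToyEnv;

static void op_movl_T0_im(ToyEnv *env, uint32_t v)  { env->t0 = v; }
static void op_movl_T1_imu(ToyEnv *env, uint32_t v) { env->t1 = v; }

/* Load CS as in gen_op_movl_seg_T0_vm: mask the selector to 16 bits,
   store it, then store selector << 4 as the segment base. */
static void op_movl_seg_T0_vm(ToyEnv *env)
{
    env->t0 &= 0xffff;
    env->cs.selector = env->t0;
    env->t0 <<= 4;
    env->cs.base = env->t0;
}

static void op_movl_T0_T1(ToyEnv *env) { env->t0 = env->t1; }
static void op_jmp_T0(ToyEnv *env)     { env->eip = env->t0; }

/* Apply the five micro-operations for "jmp selector:offset" in order. */
static void far_jmp(ToyEnv *env, uint16_t selector, uint16_t offset)
{
    assert(env != NULL);
    op_movl_T0_im(env, selector);
    op_movl_T1_imu(env, offset);
    op_movl_seg_T0_vm(env);
    op_movl_T0_T1(env);
    op_jmp_T0(env);
}

/* Address at which execution continues: CS base plus EIP. */
static uint32_t next_pc(const ToyEnv *env)
{
    return env->cs.base + env->eip;
}
```

After far_jmp(&env, 0xf000, 0xe05b), the model holds selector 0xf000, base 0xf0000, and EIP 0xe05b, so next_pc(&env) returns 0xfe05b. No jump is performed anywhere; only the state changes.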