Implementation of a C Language Interpreter: Running the Script (6)

Source: Internet
Author: User

Contents:

1. Script execution elements

2. Stack simulation

3. Variable address calculation in the stack

4. Function call process

5. Command parsing

6. C library function calls

In the previous article I mainly explained the parsing part of the language, which ended with the script's intermediate code. Next comes the hardest part: parsing and executing that intermediate code!
The code that is actually executed is another representation of the intermediate code, obtained after some processing. As mentioned earlier, our intermediate code takes the form of triples. For example, c = a + b * c; can be expressed as @1 = b * c; @2 = a + @1; @3 = c = @2;. However, this intermediate code must be converted into a form that is easier to parse and execute. Below I explain, step by step, each stage involved in executing the intermediate code.
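To make the triple form concrete, here is a minimal sketch that stores the three triples for c = a + b * c; as data and runs them in order. The names triple_t, run_triples, and the slot numbering are assumptions made for this illustration, not the interpreter's actual data structures; the destination @n is stored explicitly in each entry for simplicity.

```c
#include <assert.h>

/* Slots for the named variables and the temporaries @1..@3 (hypothetical layout). */
enum { VAR_A, VAR_B, VAR_C, TMP_1, TMP_2, TMP_3, SLOT_COUNT };

typedef struct {
    char op;   /* '*', '+', or '=' */
    int  dst;  /* temporary that receives the result */
    int  lhs;  /* first operand (for '=', the assignment target) */
    int  rhs;  /* second operand */
} triple_t;

/* c = a + b * c; lowered to three triples. */
static const triple_t program[] = {
    { '*', TMP_1, VAR_B, VAR_C },   /* @1 = b * c  */
    { '+', TMP_2, VAR_A, TMP_1 },   /* @2 = a + @1 */
    { '=', TMP_3, VAR_C, TMP_2 },   /* @3 = c = @2 */
};

/* Run the triples in order and return the final value of c. */
int run_triples(int a, int b, int c)
{
    int slot[SLOT_COUNT] = { 0 };
    slot[VAR_A] = a;
    slot[VAR_B] = b;
    slot[VAR_C] = c;

    for (unsigned i = 0; i < sizeof program / sizeof program[0]; i++) {
        const triple_t *t = &program[i];
        switch (t->op) {
        case '*': slot[t->dst] = slot[t->lhs] * slot[t->rhs]; break;
        case '+': slot[t->dst] = slot[t->lhs] + slot[t->rhs]; break;
        case '=': slot[t->lhs] = slot[t->rhs];      /* assign to the target   */
                  slot[t->dst] = slot[t->lhs];      /* value of the expression */
                  break;
        default:  assert(!"unknown operator");
        }
    }
    return slot[VAR_C];
}
```

Calling run_triples(2, 3, 4) evaluates 2 + 3 * 4 and leaves the result in c. Looking operands up by slot number is still a simplification; the sections below replace the slots with stack addresses.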

1. Script execution elements
To execute a script, you must create an environment for it, just as a process is created for a program.
A C program has only a few kinds of elements: operators, variables, and functions. To execute a C script we therefore need three things: first, a way to interpret the intermediate-code commands; second, memory space for variables; and third, a mechanism for function calls, which means simulating the function call stack. The most important parts of executing a script are thus variable memory allocation, stack maintenance, and command execution.

2. Stack simulation
If you are familiar with C's call stack, this is easy to understand. Let's set aside for now how the stack changes when a function is called, and first look at how a single function executes. For example:

int add(int a, int b)
{
    int c, d, e;
    c = a + b;
}

The intermediate code is as follows:

@1 = a + b;
@2 = c = @1;

During execution we cannot look up variables by name, which would be both cumbersome and inefficient. Instead, we access variables by address. Where the variables are stored, and how their addresses are computed, is exactly why the stack is introduced. First, let's look at the stack corresponding to the function above:

offset   var
------------------------------
 -20     a
 -16     b
 -12     EIP
  -8     ESP
  -4     return-address
   0     <----------------- ESP
   0     c
   4     d
   8     e
  12     @1
  16     @2
------------------------------

EIP is the position of the current command at the moment the function is called. When the function returns, we pop this EIP and continue with the command that follows it.
ESP is the start of the caller's variable space at the moment the function is called, that is, the caller's ESP. When the function returns, we need to restore it. ESP is the base address of a function's variable space in the stack. Each function has a fixed ESP while it executes, every variable has a specific position in the stack, and the distance between each variable and ESP is fixed.
Return-address stores the address where the function's return value goes, that is, the temporary variable generated for the call. When the function returns, the return value is written to this address, so the caller can read the result from that temporary variable. For example, in int a = add(3, 4); return-address would be the address of a, or the address of some other temporary variable. In short, which variable receives the value is determined by return-address.

With this stack, our intermediate code is processed as follows:

@1 = a + b   corresponds to [esp+12] = [esp-20] + [esp-16];
@2 = c = @1  corresponds to [esp+16] = [esp+0] = [esp+12];

In the code above, "[xx]" denotes the value at address xx. Because ESP is fixed while a function executes, we can leave it out, so the code becomes:

[12] = [-20] + [-16];
[16] = [0] = [12];

For ease of processing, we also keep the temporaries in the stack. Their addresses can be reused, however: once a statement has been executed, its temporaries are no longer needed, so the slots used by the previous statement's temporaries can be recycled.
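The ESP-relative form can be tried out on a simulated stack. The following is only a sketch under stated assumptions: stack_mem, load_int, store_int, and run_add_body are names invented here, the frame base of 64 is arbitrary, and every slot is assumed to hold a 4-byte integer as in the layout above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STACK_SIZE 256

static unsigned char stack_mem[STACK_SIZE];   /* the simulated stack */

/* Read the 4-byte integer stored at [esp + off]. */
static int32_t load_int(int esp, int off)
{
    int32_t v;
    assert(esp + off >= 0 && esp + off + (int)sizeof v <= STACK_SIZE);
    memcpy(&v, &stack_mem[esp + off], sizeof v);
    return v;
}

/* Write a 4-byte integer to [esp + off]. */
static void store_int(int esp, int off, int32_t v)
{
    assert(esp + off >= 0 && esp + off + (int)sizeof v <= STACK_SIZE);
    memcpy(&stack_mem[esp + off], &v, sizeof v);
}

/* Execute the body of add() for one frame and return the value of c. */
int32_t run_add_body(int32_t a, int32_t b)
{
    const int esp = 64;                 /* arbitrary frame base for this sketch */

    store_int(esp, -20, a);             /* parameter a */
    store_int(esp, -16, b);             /* parameter b */

    /* [12] = [-20] + [-16] */
    store_int(esp, 12, load_int(esp, -20) + load_int(esp, -16));
    /* [16] = [0] = [12] */
    store_int(esp, 0, load_int(esp, 12));
    store_int(esp, 16, load_int(esp, 0));

    return load_int(esp, 0);
}
```

Because every access goes through esp plus a constant offset, the same two statements work unchanged for any frame base, which is what lets each call of the function use its own region of the stack.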

3. Variable address calculation in the stack
First, each function has a fixed ESP, which can be regarded as the function's starting position in the stack. Every variable is then expressed as a distance from ESP, that is, an offset. As in the example above, when we process the intermediate code of a function we know all of its local variables and its parameter list, together with their types, so we can compute each variable's position in the stack from its type.
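As a rough illustration of that calculation, the sketch below assigns offsets from a list of variable sizes. It assumes the frame layout shown earlier (parameters, then 12 bytes for the saved EIP, saved ESP, and return-address, then locals starting at offset 0) and ignores alignment; layout_params, layout_locals, and FRAME_LINK_SIZE are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Saved EIP + saved ESP + return-address, 4 bytes each (assumed layout). */
#define FRAME_LINK_SIZE 12

/* Fill offs[] with the ESP-relative (negative) offsets of the parameters. */
void layout_params(const int *sizes, size_t n, int *offs)
{
    int total = 0;
    for (size_t i = 0; i < n; i++) {
        assert(sizes[i] > 0);
        total += sizes[i];
    }

    /* The first parameter lies furthest below ESP. */
    int off = -(FRAME_LINK_SIZE + total);
    for (size_t i = 0; i < n; i++) {
        offs[i] = off;
        off += sizes[i];
    }
}

/* Fill offs[] with the offsets of the locals, starting at 0.
 * Returns the first free offset, where the temporaries can begin. */
int layout_locals(const int *sizes, size_t n, int *offs)
{
    int off = 0;
    for (size_t i = 0; i < n; i++) {
        assert(sizes[i] > 0);
        offs[i] = off;
        off += sizes[i];
    }
    return off;
}
```

For add(int a, int b) with locals c, d, e, all of size 4, this yields -20 and -16 for the parameters, 0, 4, and 8 for the locals, and 12 as the first temporary slot, matching the table in section 2.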

4. Function call process
Consider the following code:

int add(int a, int b)
{
    int c, d, e;
    c = a + b;
    return c;
}
int main() {
    add(4, 5); <--------- ①
}

When execution reaches ①, the stack looks like this:

address  offset  var
------------------------------------------
....
15988    -12     EIP
15992     -8     ESP
15996     -4     return-address
16000      0     <------------------- (assume main's ESP is 16000)

16000    -20     4
16004    -16     5
16008    -12     EIP (points to the add(4, 5) call)
16012     -8     main's ESP (16000)
16016     -4     return-address
16020      0     <------------------- (add's ESP = 16000 + 20 = 16020)
16020      0     c
16024      4     d
16028      8     e
16032     12     @1
16036     16     @2
....
------------------------------------------

When add returns, its stack frame is reclaimed and the stack goes back to:

address  offset  var
------------------------------------------
....
15988    -12     EIP
15992     -8     ESP
15996     -4     return-address
16000      0     <------------------- (assume main's ESP is 16000)
------------------------------------------
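The call and return steps can be sketched on a simulated stack as follows. This is an illustration under assumptions, not the interpreter's code: enter_frame, leave_frame, and demo_call are invented names, all values are 4 bytes wide, and in demo_call main is given one temporary at its offset 0 to receive the result, so add's frame starts 4 bytes higher than in the table above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STACK_BYTES 16384

static unsigned char stk[STACK_BYTES];   /* simulated stack, byte addressed */

static int32_t rd(int addr)
{
    int32_t v;
    assert(addr >= 0 && addr + 4 <= STACK_BYTES);
    memcpy(&v, &stk[addr], sizeof v);
    return v;
}

static void wr(int addr, int32_t v)
{
    assert(addr >= 0 && addr + 4 <= STACK_BYTES);
    memcpy(&stk[addr], &v, sizeof v);
}

/* Push two arguments and the bookkeeping words starting at 'top',
 * and return the callee's ESP. */
int enter_frame(int top, int32_t a, int32_t b,
                int32_t eip, int32_t caller_esp, int32_t ret_addr)
{
    wr(top +  0, a);            /* callee offset -20 */
    wr(top +  4, b);            /* callee offset -16 */
    wr(top +  8, eip);          /* callee offset -12 */
    wr(top + 12, caller_esp);   /* callee offset  -8 */
    wr(top + 16, ret_addr);     /* callee offset  -4 */
    return top + 20;
}

/* Store the result through return-address, report the saved EIP,
 * and return the caller's ESP. */
int leave_frame(int esp, int32_t result, int32_t *resume_eip)
{
    wr(rd(esp - 4), result);
    *resume_eip = rd(esp - 12);
    return rd(esp - 8);
}

/* Simulate main calling add(4, 5) and reading the result back. */
int32_t demo_call(void)
{
    const int main_esp = 16000;
    const int result_slot = main_esp + 0;   /* main's temporary for the result */
    const int top = main_esp + 4;           /* first free byte above main's frame */

    int esp = enter_frame(top, 4, 5, /* eip = */ 1, main_esp, result_slot);

    wr(esp + 0, rd(esp - 20) + rd(esp - 16));   /* c = a + b */
    int32_t c = rd(esp + 0);

    int32_t resume_eip;
    esp = leave_frame(esp, c, &resume_eip);     /* return c */
    assert(esp == main_esp && resume_eip == 1);

    return rd(result_slot);
}
```

With top equal to 16000, as in the table where main has no variables of its own, enter_frame returns 16020 for add's ESP.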

5. Command parsing
Each intermediate-code command consists of one operator and several operands. We cannot go through the handling of every operator here, so let's look at the simplest case:
@1 = a + b corresponds to [esp+12] = [esp-20] + [esp-16];
In this command the operator is "+", the source operands are [-20] and [-16], and the destination operand is [12]. Executing it is therefore quite simple, and the C code looks like this:
*(int *)(esp + 12) = *(int *)(esp - 20) + *(int *)(esp - 16);
This is in fact what I do, except that my real code looks more cumbersome because it has to handle all kinds of commands; the principle is the same. The int type here comes from the type information carried by the operands. This is required: while processing the intermediate code, the type of every variable must be determined; otherwise, at execution time there is no way to know how much memory it occupies.
Below is the definition of a command. Commands form a doubly linked list, which makes it easy for jump statements to move to their targets.

typedef struct _cmd {
    char cmd;              /* operator */
    struct {
        char type;         /* operand type */
        int  size;         /* operand size */
        union {
            int64  i;      /* int64: assumed to be a 64-bit integer typedef defined elsewhere */
            double d;
        } d;
    } d[3];                /* operands */
    int ex;                /* additional information for some commands */
    struct _cmd *next;     /* next command */
    struct _cmd *pre;      /* previous command */
} cmd_t;
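To show how such a command list might be executed, here is a small dispatch loop over a simplified command structure. It is a sketch only: the opcodes CMD_ADD and CMD_MOV, the tag TYPE_INT, the operand_t layout, and the convention that d[0] is the destination holding an ESP-relative offset are all assumptions made for this example, not the author's actual definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { CMD_ADD = 1, CMD_MOV = 2 };   /* hypothetical opcodes */
enum { TYPE_INT = 1 };               /* hypothetical type tag */

typedef struct {
    char    type;   /* type tag of the operand */
    int     size;   /* size in bytes */
    int64_t off;    /* ESP-relative offset of the operand */
} operand_t;

typedef struct cmd {
    char        cmd;    /* operator */
    operand_t   d[3];   /* d[0] = destination, d[1] and d[2] = sources */
    struct cmd *next;
    struct cmd *pre;
} cmd_t;

static unsigned char stack_mem[256];   /* simulated stack */

static int32_t load(int esp, const operand_t *o)
{
    int32_t v;
    int addr = esp + (int)o->off;
    assert(o->type == TYPE_INT && o->size == (int)sizeof v);
    assert(addr >= 0 && addr + o->size <= (int)sizeof stack_mem);
    memcpy(&v, &stack_mem[addr], sizeof v);
    return v;
}

static void store(int esp, const operand_t *o, int32_t v)
{
    int addr = esp + (int)o->off;
    assert(o->type == TYPE_INT && o->size == (int)sizeof v);
    assert(addr >= 0 && addr + o->size <= (int)sizeof stack_mem);
    memcpy(&stack_mem[addr], &v, sizeof v);
}

/* Walk the command list and execute each command against the frame at esp. */
void run(const cmd_t *pc, int esp)
{
    for (; pc != NULL; pc = pc->next) {
        switch (pc->cmd) {
        case CMD_ADD:
            store(esp, &pc->d[0], load(esp, &pc->d[1]) + load(esp, &pc->d[2]));
            break;
        case CMD_MOV:
            store(esp, &pc->d[0], load(esp, &pc->d[1]));
            break;
        default:
            assert(!"unknown command");
        }
    }
}

/* Build and run: [12] = [-20] + [-16]; [0] = [12]; [16] = [0]. */
int32_t demo_run(int32_t a, int32_t b)
{
    const int esp = 64;   /* arbitrary frame base */
    const operand_t A = { TYPE_INT, 4, -20 }, B  = { TYPE_INT, 4, -16 },
                    C = { TYPE_INT, 4, 0 },   T1 = { TYPE_INT, 4, 12 },
                    T2 = { TYPE_INT, 4, 16 };

    cmd_t add = {0}, mov_c = {0}, mov_t2 = {0};
    add.cmd = CMD_ADD;    add.d[0] = T1;    add.d[1] = A;    add.d[2] = B;
    mov_c.cmd = CMD_MOV;  mov_c.d[0] = C;   mov_c.d[1] = T1;
    mov_t2.cmd = CMD_MOV; mov_t2.d[0] = T2; mov_t2.d[1] = C;
    add.next = &mov_c;    mov_c.pre = &add;
    mov_c.next = &mov_t2; mov_t2.pre = &mov_c;

    store(esp, &A, a);
    store(esp, &B, b);
    run(&add, esp);
    return load(esp, &C);
}
```

A jump command would simply set pc to another node instead of following next, which is where the doubly linked list becomes convenient.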

6. C library function calls
The C language comes with its library functions. If our interpreter had to implement them all itself, the workload would grow enormously. Is there a way to call the system's library functions directly? If so, the interpreter can also offer its users a more powerful way to extend it: users can register their own functions for scripts to call. After considering many approaches, I concluded that this requires assembly. The method is as follows.
Suppose the script contains the line fopen("test", "r");
We take the function name fopen and find that it is a registered function, so we fetch its function pointer, call it fptr. Executing the statement then amounts to the following (under the usual 32-bit C calling convention, arguments are pushed from right to left):
push 0x894982 ; address of "r"
push 0x123243 ; address of "test"
call fptr ; call the system's fopen function
...
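The lookup step described above, finding a registered function by name and obtaining its pointer, might look like the following sketch. The names native_fn, register_native, find_native, and the sample function add2 are hypothetical. The sketch calls the function through a cast to its known signature; it does not cover calling a function whose signature is only known at run time, which is the part that needs assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void (*native_fn)(void);   /* generic function-pointer type */

typedef struct {
    const char *name;
    native_fn   fn;
} native_entry;

#define MAX_NATIVES 64

static native_entry natives[MAX_NATIVES];
static size_t native_count;

/* Register a native function under a script-visible name.
 * Returns 0 on success, -1 if the table is full. */
int register_native(const char *name, native_fn fn)
{
    if (native_count == MAX_NATIVES)
        return -1;
    natives[native_count].name = name;
    natives[native_count].fn = fn;
    native_count++;
    return 0;
}

/* Look a function up by name; returns NULL if it is not registered. */
native_fn find_native(const char *name)
{
    for (size_t i = 0; i < native_count; i++)
        if (strcmp(natives[i].name, name) == 0)
            return natives[i].fn;
    return NULL;
}

/* A sample user function to register (hypothetical). */
static int add2(int a, int b)
{
    return a + b;
}

/* Register add2, find it again by name, and call it. */
int demo_lookup(void)
{
    int rc = register_native("add2", (native_fn)add2);
    assert(rc == 0);
    (void)rc;

    native_fn f = find_native("add2");
    assert(f != NULL);

    /* Cast back to the real signature before calling. */
    return ((int (*)(int, int))f)(4, 5);
}
```

Storing every function as a generic pointer keeps the table uniform; the price is that the interpreter must build the argument list itself at call time.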
 

I wrote the assembly code with NASM (I used to use MASM) so that it can be ported smoothly to Linux:

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; nasm -f coff call.asm -o OUTFILE
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

[bits 32]                        ; 32-bit processor
[section .text]

%define WIN32
%ifdef WIN32
%define _funptr _asm_funptr      ; saved function pointer
%define _argtab _asm_argtab      ; argument list
%define _argtye _asm_argtye      ; argument type list
%define _argnum _asm_argnum      ; number of arguments
%define _call   _asm_call
%else
%define _funptr asm_funptr
%define _argtab asm_argtab
%define _argtye asm_argtye
%define _argnum asm_argnum
%define _call   asm_call
%endif

extern _funptr
extern _argtab
extern _argtye
extern _argnum
global _call

_call:
    xor  edx, edx                ; edx = bytes pushed so far
    xor  ecx, ecx                ; ecx = byte offset into the tables
    mov  ebx, [_argnum]          ; ebx = arguments left to push
    cmp  ebx, 0
    jz   end
beg:
    cmp  dword [_argtye + ecx], 1
    jz   ft                      ; type 1: floating-point argument
    push dword [_argtab + ecx]   ; otherwise push a 4-byte argument
    add  edx, 4
    jmp  fe
ft:
    fld  dword [_argtab + ecx]   ; load the floating-point argument
    sub  esp, 8
    fstp qword [esp]             ; store it on the stack as a double
    add  edx, 8
fe:
    add  ecx, 8                  ; next table entry
    sub  ebx, 1
    jnz  beg
end:
    mov  [_argnum], edx          ; remember how many bytes were pushed
    mov  eax, [_funptr]
    call eax
    add  esp, [_argnum]          ; pop the arguments
    ret

 
