Chibi-scheme Source Code Analysis 3: Compilation Principles

Source: Internet
Author: User
Code Process

The sexp_eval_string function shows the overall flow of the code.

sexp_eval_string = sexp_read_from_string + sexp_eval

The former reads the source text into a Scheme object, and the latter compiles and executes it.

sexp_eval = sexp_compile + sexp_apply

The compile step generates bytecode, and the apply step then executes that bytecode on the virtual machine.

sexp_compile = sexp_analyze + generate

Compilation consists of syntax analysis followed by code generation.

Lexical Analysis

Lexical analysis is mainly done by the sexp_read function in the sexp.c file. This function receives input from an external data stream and reads one Scheme object. Lisp's distinctive syntax (all parentheses) makes lexical analysis of this language very simple. In the source code, sexp_read calls sexp_read_raw to do the real work.

The general structure of sexp_read_raw is as follows:

scan_loop:
  Read one byte
  If it is EOF, return SEXP_EOF
  If it is a newline, a space, \t, or \r, goto scan_loop
  If it is ', return the list (quote xxx)
  If it is ", call sexp_read_string
  If it is (
    Recursively call sexp_read_raw to read a datum
    While (not SEXP_EOF, not SEXP_CLOSE, not SEXP_RAWDOT)
      Append it to the list with sexp_cons
  If it is #
    Read another byte
    If it is b or B, read a binary number
    If it is o or O, read an octal number
    If it is d or D, read a decimal number
    If it is x or X, read a hexadecimal number
    If it is e or E, read an exact number
    If it is f, F, t, or T, return SEXP_FALSE or SEXP_TRUE
    ...
    If it is ;, read the comment and discard it
    If it is |, read the comment and discard it
    ...
  If it is a digit 0-9, call sexp_read_number
  Otherwise, call sexp_read_symbol

In general, the function is a loop that inspects the first few characters in a switch-case. Once the type is determined, it calls sexp_read_number, sexp_read_symbol, sexp_read_string, and so on, or calls sexp_read_raw recursively. This is straightforward and can be followed against the source code.

Through lexical analysis, the raw external input is transformed into Scheme's internal objects, such as flonum, string, char, vector, and, most importantly, list objects. These are then passed on to syntax analysis.
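The loop-plus-switch shape described above can be sketched in a few dozen lines of C. This is only a toy under stated assumptions: the obj type, the tags, and toy_read are hypothetical names invented for illustration, the input is a C string rather than a port, and only lists, quote, #t/#f, decimal integers, and symbols are handled. It is not the code of sexp_read_raw.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy object model, far simpler than Chibi-scheme's sexp. */
typedef enum { T_EOF, T_CLOSE, T_NIL, T_BOOL, T_INT, T_SYMBOL, T_PAIR } obj_tag;

typedef struct obj {
  obj_tag tag;
  long num;               /* T_INT value, or 0/1 for T_BOOL */
  char name[32];          /* T_SYMBOL text, zero-terminated */
  struct obj *car, *cdr;  /* T_PAIR fields */
} obj;

/* Allocations are never freed or checked; acceptable only in a sketch. */
static obj *make(obj_tag tag) {
  obj *o = calloc(1, sizeof *o);
  o->tag = tag;
  return o;
}

static obj *cons(obj *a, obj *d) {
  obj *p = make(T_PAIR);
  p->car = a;
  p->cdr = d;
  return p;
}

static obj *symbol(const char *s, size_t n) {
  obj *o = make(T_SYMBOL);
  if (n >= sizeof o->name) n = sizeof o->name - 1;  /* truncate long names */
  memcpy(o->name, s, n);
  return o;
}

/* Read one datum from *in and advance the cursor past it. */
static obj *toy_read(const char **in) {
  const char *p = *in;
  obj *res;
scan_loop:
  switch (*p) {
  case '\0':
    res = make(T_EOF);
    break;
  case ' ': case '\t': case '\n': case '\r':
    p++;
    goto scan_loop;
  case ')':
    p++;
    res = make(T_CLOSE);
    break;
  case '\'': {                       /* 'x becomes (quote x) */
    p++;
    obj *quoted = toy_read(&p);
    res = cons(symbol("quote", 5), cons(quoted, make(T_NIL)));
    break;
  }
  case '(': {                        /* read items until ')' or end of input */
    obj *head = NULL, **tail = &head, *item;
    p++;
    while ((item = toy_read(&p))->tag != T_EOF && item->tag != T_CLOSE) {
      *tail = cons(item, NULL);
      tail = &(*tail)->cdr;
    }
    *tail = make(T_NIL);
    res = head;
    break;
  }
  case '#':                          /* only #t and #f in this sketch */
    p++;
    res = make(T_BOOL);
    res->num = (*p == 't' || *p == 'T');
    if (*p) p++;
    break;
  default:
    if (isdigit((unsigned char)*p)) {
      char *end;
      res = make(T_INT);
      res->num = strtol(p, &end, 10);
      p = end;
    } else {                         /* anything else is a symbol */
      const char *start = p;
      while (*p && !isspace((unsigned char)*p) && *p != '(' && *p != ')')
        p++;
      res = symbol(start, (size_t)(p - start));
    }
    break;
  }
  *in = p;
  return res;
}
```

Calling toy_read on a cursor pointing at "(if (> a 3) b #t)" yields a nested list of pairs, and a second call on the same cursor yields the end-of-input marker.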

Environment

"Environment" is a very important concept in the Scheme language. "Context" may be a concept specific to the implementation of Chibi-scheme, but "environment" is a concept shared by the compiler or interpreter of any Scheme/Lisp language.

The environment can be seen as an associative container that stores all the symbol-value bindings. For each symbol, the corresponding Scheme object, that is, the value bound to the symbol, can be found through the environment.

In an interpreter, the environment is always present at run time. The interpreter looks up the bound value in the environment every time it encounters a symbol, so it is not very efficient. A compiler such as Chibi-scheme uses the environment during compilation to find the binding for each symbol. Once the bytecode has been generated, the value can be used directly without looking up the binding again.

The environment is also a sexp in Chibi-scheme, and its data structure is as follows:

 
struct {
  sexp parent, lambda, bindings;
} env;

The bindings field is an association list of the form ((symbol1 . value1) (symbol2 . value2) ... (symboln . valuen)).

The env structures are linked together through the parent field. Every time a new lambda expression (the equivalent of a function in other languages) is entered, a new environment is created. The parent field of the new environment points to the environment in which the lambda expression was defined.

When looking up the binding of a symbol, if it is found in the bindings of the current env, the symbol is a local variable. If it is not found there, the search continues along the parent chain until it is found. A binding found in an env further up the parent chain is called a free variable. If the binding cannot be found anywhere in the parent chain, an error is returned.
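The lookup rule above can be sketched as follows. This is a minimal model under stated assumptions: the cell and env structs, the var_kind enum, and env_lookup are hypothetical names for illustration, bindings are a plain linked list with int values, and the lambda field is omitted. Chibi-scheme's own lookup works on sexp objects.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical simplified model of one (symbol . value) binding. */
typedef struct cell {
  const char *symbol;
  int value;
  struct cell *next;
} cell;

typedef struct env {
  struct env *parent;  /* environment where the lambda was defined */
  cell *bindings;      /* bindings introduced by this frame */
} env;

typedef enum { VAR_LOCAL, VAR_FREE, VAR_UNBOUND } var_kind;

/* Search e first, then each parent in turn, and report how the
   symbol was resolved through *kind. Returns NULL when unbound. */
static cell *env_lookup(env *e, const char *symbol, var_kind *kind) {
  for (env *frame = e; frame != NULL; frame = frame->parent) {
    for (cell *c = frame->bindings; c != NULL; c = c->next) {
      if (strcmp(c->symbol, symbol) == 0) {
        *kind = (frame == e) ? VAR_LOCAL : VAR_FREE;
        return c;
      }
    }
  }
  *kind = VAR_UNBOUND;
  return NULL;
}
```

With an inner frame that binds y and a parent frame that binds x, looking up y from the inner frame reports a local variable, x a free variable, and any other name is unbound.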

Next, we will analyze how the standard environment is created. The corresponding function is sexp_make_standard_env_op in the eval.c file.

sexp_make_standard_env_op = sexp_make_primitive_env + sexp_load_standard_env

The latter mainly loads the initialization file init.scm. The former calls sexp_make_null_env and then loads the opcodes table, which is in the opcodes.c file. This table contains the opcode structures for the basic procedures of the Scheme language, such as car and cdr.

sexp_make_null_env_op calls sexp_make_env to allocate an env structure and then initializes it from the core_forms table. Note that core forms and opcodes are both Scheme objects bound to basic symbols. Basic forms (such as if, quote, lambda, set!) are bound to core forms, and basic procedures (such as car, cdr, cons) are bound to opcodes.

Syntax Analysis

Syntax analysis is mainly done by the analyze function in the eval.c file. Its input is a Scheme object, which it converts into an abstract syntax tree and returns.

The pseudocode of analyze is given below, which can be viewed against the source file:

loop:
  If x is a pair
    If it is not a proper list, error
    If car(x) is a symbol
      Look up the cell for car(x) in the environment
      If it is not found, handle the syntactic closure (macro system)
      Otherwise, cdr(cell) gives the actual op
        If op is a core form
          If define, generate the AST of the define form
          If set!, generate the AST of the set! form
          If lambda, generate the AST of the lambda form
          If if, generate the AST of the if form
          ...
        If op is a macro, expand the macro and jump to loop to continue processing
        If op is an opcode, process it and generate the AST of an app
    Otherwise, generate the AST of an app
  If x is a symbol
    Generate a ref AST
  If x is a syntactic closure (macro system)
    Expand the macro and call analyze again
  If x is null, return an error
  Otherwise, return the object itself
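The central dispatch in this pseudocode, where the kind of object bound to the head symbol decides what is built, can be sketched as below. Everything here is an assumption made for illustration: the kind_t and action_t enums, the toy_env table, toy_dispatch, and the macro name my-macro are hypothetical, and the real analyze works on sexp objects and handles syntactic closures as well.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical binding kinds and the action each one leads to. */
typedef enum { K_CORE, K_MACRO, K_OPCODE } kind_t;
typedef enum { A_DEFINE, A_SET, A_LAMBDA, A_CND, A_APP, A_REANALYZE } action_t;

typedef struct {
  const char *symbol;
  kind_t kind;
  action_t core_action;  /* meaningful only when kind == K_CORE */
} binding_t;

/* A miniature stand-in for the standard environment. */
static const binding_t toy_env[] = {
  {"define", K_CORE, A_DEFINE}, {"set!", K_CORE, A_SET},
  {"lambda", K_CORE, A_LAMBDA}, {"if", K_CORE, A_CND},
  {"car", K_OPCODE, A_APP},     {"cons", K_OPCODE, A_APP},
  {"my-macro", K_MACRO, A_REANALYZE},
};

/* Decide what to do with a form (head arg ...) from the binding of head. */
static action_t toy_dispatch(const char *head) {
  size_t n = sizeof toy_env / sizeof toy_env[0];
  for (size_t i = 0; i < n; i++) {
    if (strcmp(toy_env[i].symbol, head) != 0) continue;
    switch (toy_env[i].kind) {
    case K_CORE:   return toy_env[i].core_action;  /* one AST type per core form */
    case K_MACRO:  return A_REANALYZE;             /* expand, then analyze again */
    case K_OPCODE: return A_APP;                   /* call of a primitive */
    }
  }
  return A_APP;  /* no special binding: an ordinary application */
}
```

In this sketch a form headed by if maps to a cnd node, a form headed by car or by an unknown procedure name maps to an application node, and a macro use is sent back through analysis after expansion.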

Scheme's macro system is completely different from C's pure text substitution. It is a deep topic and will not be discussed here. The following looks at several abstract syntax trees (ASTs) to study how syntax analysis works.

First, consider the AST of a variable reference (every AST in Chibi-scheme is represented as a sexp). Its definition can be found in the union inside sexp_struct in sexp.h:

 
struct {
  sexp name, cell, source;
} ref;

The function that generates the AST of a variable reference is analyze_var_ref in eval.c. Given the Scheme code a, syntax analysis first obtains the environment env and then finds the cell bound to the symbol a in env. The returned AST is a ref whose name is the symbol a and whose cell is the cell that was found. With this AST in hand, code generation needs only the AST and no longer has to consult the environment.

Now a more complicated example: the AST of the if form. The AST generated for an if form is a cnd, whose definition can also be found in the union inside sexp_struct in sexp.h:
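The point of storing the cell in the ref can be illustrated with a small sketch. The names cell_t, ref_t, env_find, toy_analyze_var_ref, and ref_value are hypothetical, the environment is modelled as a flat array, and the lookups counter exists only to make the effect visible. This is not the code of analyze_var_ref.

```c
#include <stddef.h>
#include <string.h>

typedef struct { const char *name; int value; } cell_t;     /* a (symbol . value) binding */
typedef struct { const char *name; cell_t *cell; } ref_t;   /* toy counterpart of the ref node */

static int lookups = 0;  /* counts environment searches, for illustration only */

/* Linear search of a flat toy environment. */
static cell_t *env_find(cell_t *env, size_t n, const char *name) {
  lookups++;
  for (size_t i = 0; i < n; i++)
    if (strcmp(env[i].name, name) == 0) return &env[i];
  return NULL;
}

/* Analysis time: search the environment once and remember the cell. */
static ref_t toy_analyze_var_ref(cell_t *env, size_t n, const char *name) {
  ref_t r = { name, env_find(env, n, name) };
  return r;
}

/* Later use: follow the stored cell directly, with no search.
   Assumes the symbol was bound, so r->cell is not NULL. */
static int ref_value(const ref_t *r) {
  return r->cell->value;
}
```

Because the ref points at the cell rather than copying the value, a later change to the binding is still visible through the ref, while the environment is searched only once.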

 
struct {
  sexp test, pass, fail, source;
} cnd;

The source field is kept by the compiler to support debugging. The three fields that matter are test, pass, and fail.

The AST of the if form is generated by the analyze_if function in the eval.c file. The code is as follows:

static sexp analyze_if (sexp ctx, sexp x) {
  ...
  test = analyze(ctx, sexp_cadr(x));
  pass = analyze(ctx, sexp_caddr(x));
  fail_expr = sexp_pairp(sexp_cdddr(x)) ? sexp_cadddr(x) : SEXP_VOID;
  fail = analyze(ctx, fail_expr);
  res = (sexp_exceptionp(test) ? test : sexp_exceptionp(pass) ? pass :
         sexp_exceptionp(fail) ? fail : sexp_make_cnd(ctx, test, pass, fail));
  ...
  return res;
}

Assume that the Scheme source code is (if (> a 3) b #t). After syntax analysis, the test field of the generated cnd node holds the AST of (> a 3), and the pass and fail fields hold the ASTs of b and #t respectively. b is a symbol, so its AST is a ref structure, and the AST of #t is the object itself.
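The same three-field construction can be tried out in a self-contained toy. The node type, the helper names, and toy_analyze_if are hypothetical, lists end in NULL rather than a nil object, and the exception checks of the real analyze_if are left out. The sketch shows only the recursive analysis of the three branches and the void default for a missing else branch.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node type covering both input forms and the cnd result. */
typedef struct node {
  enum { N_VOID, N_ATOM, N_PAIR, N_CND } kind;
  const char *text;                 /* N_ATOM */
  struct node *car, *cdr;           /* N_PAIR, list ends with NULL */
  struct node *test, *pass, *fail;  /* N_CND */
} node;

static node void_node = { N_VOID };

static node *atom(const char *text) {
  node *n = calloc(1, sizeof *n);
  n->kind = N_ATOM;
  n->text = text;
  return n;
}

static node *cons(node *a, node *d) {
  node *n = calloc(1, sizeof *n);
  n->kind = N_PAIR;
  n->car = a;
  n->cdr = d;
  return n;
}

static node *cadr(node *x)  { return x->cdr->car; }
static node *caddr(node *x) { return x->cdr->cdr->car; }
static node *cdddr(node *x) { return x->cdr->cdr->cdr; }

static node *toy_analyze(node *x);

/* Build a cnd node from (if test pass [fail]); assumes at least
   a test and a pass expression are present. */
static node *toy_analyze_if(node *x) {
  node *res = calloc(1, sizeof *res);
  res->kind = N_CND;
  res->test = toy_analyze(cadr(x));
  res->pass = toy_analyze(caddr(x));
  /* A missing else branch is analyzed as the void object. */
  res->fail = toy_analyze(cdddr(x) != NULL ? cdddr(x)->car : &void_node);
  return res;
}

static node *toy_analyze(node *x) {
  if (x->kind == N_PAIR && x->car->kind == N_ATOM &&
      strcmp(x->car->text, "if") == 0)
    return toy_analyze_if(x);
  return x;  /* every other form is returned unchanged in this sketch */
}
```

Analyzing a two-branch form such as (if c b) gives a cnd whose fail field is the void node, and an if nested in the pass position is itself turned into a cnd, since each branch goes back through toy_analyze.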

Summary

The typical steps of compilation are lexical analysis, syntax analysis, intermediate code generation, code optimization, and target code generation. This article started with lexical analysis. Because Lisp's syntax is so simple, lexical analysis is very easy.

After lexical analysis, an important concept of Scheme/Lisp was introduced: the environment. The environment provides symbol-value bindings. During syntax analysis, the environment is used to find the Scheme object corresponding to each symbol.

Syntax analysis takes the output of lexical analysis as its input and generates an abstract syntax tree. It is a little more complex. This article used the variable reference and the if form as examples. As a single article this would be rather long, so it is split in two. The next article continues with the compilation principles, gives a more complex example of generating an abstract syntax tree, and then moves on to code generation.
