Source: http://blog.sina.com.cn/s/blog_5d90e82f01018ge9.html
Interpreters are a fairly deep topic. Although I start from the basic principles and try to keep this article independent of other knowledge, it is not an introduction to functional programming, so I assume you already know the basics of Scheme and functional programming. If you do not, you can read Chapters 1 and 2 of SICP first. Of course, you can also just keep reading and look things up when you get stuck. I will also explain the principles of recursion and pattern matching along the way; if you already know them, the discussion here may deepen your understanding.
Writing an interpreter is not difficult, but many people never manage it because in their minds an interpreter is something as complicated as a Python interpreter. If you set out to write a Python interpreter right away, you will probably never finish. You have to start with the simplest language and increase its complexity step by step in order to build a correct interpreter. This article shows how to write an interpreter for the simplest language, the lambda calculus, extended with basic arithmetic so that it can serve as an advanced calculator.
Compiler courses usually start with syntax analysis (parsing) and spend a lot of time fiddling with tools such as lex and yacc. The job of parsing is to decode a string into the program's abstract syntax tree (AST). Only after all that effort to get an AST do the real difficulties begin! Many people give up right after writing the parser. For this reason, I use S-expressions to represent the program's syntax tree (AST) here. S-expressions let us skip the parsing step and go straight to the key topic: semantics.
The Scheme implementation used here is Racket. To keep the program concise, I use Racket's pattern matching (match). All code below assumes a module that begins with #lang racket, which provides match. If you use another Scheme implementation, you may need to make some adjustments yourself.
What is an interpreter?
First, let's talk about what an interpreter is. An interpreter is similar to a calculator: both accept an "expression" and output a "result". For example, given '(+ 1 2), it outputs 3. However, an interpreter's expressions are more complex than a calculator's. The expressions an interpreter accepts are called "programs", not just simple arithmetic expressions. In essence, every program is a "description" of a machine, and the interpreter "simulates" the operation of that machine, that is, it performs "computation". In a sense, the interpreter is the essence of computation. Of course, different interpreters give rise to different computations.
Note that the argument our interpreter accepts is a data structure representing an expression, not a string. Here we use a data structure called the "S-expression" to represent expressions. For example, the expression '(+ 1 2) contains three elements, '+, '1 and '2, rather than being the string "(+ 1 2)". Extracting information from structured data is convenient, whereas extracting it from strings is troublesome and error-prone.
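A quick way to convince yourself that the input is structured data is to take '(+ 1 2) apart in Racket. The snippet below is only an illustration; the name e is my own and is not part of the interpreter.

```racket
#lang racket
;; '(+ 1 2) is a three-element list, not a string.
(define e '(+ 1 2))

(list? e)            ; #t
(string? e)          ; #f
(length e)           ; 3
(symbol? (first e))  ; #t, the operator is the symbol +
(number? (second e)) ; #t, the operands are numbers
(third e)            ; 2
```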
In a broad sense, "interpreter" is a very general concept. A calculator is really a kind of interpreter, only the language it processes is much simpler than that of a program interpreter. You may come to see that CPUs and human brains are essentially interpreters too, because an interpreter is essentially "any machine that processes a language".
Recursive definition
An interpreter is generally a "recursive program". It is recursive because the data structure it processes (the program) is itself "recursively defined". An arithmetic expression is such a structure, for example '(* (+ 1 2) (* (- 9 6) 4)). Each expression can contain subexpressions, and those can contain subexpressions of their own, nesting without end. It looks complicated, but its definition is just:
An "arithmetic expression" has two forms:
1) a number
2) a structure of the form '(op e1 e2), where e1 and e2 are themselves "arithmetic expressions"
Do you see where the "recursion" is? We are defining the concept of "arithmetic expression", and the definition itself uses the concept of "arithmetic expression"! This forms a "loop" that lets us generate expressions of any depth.
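The two-case definition can be turned almost word for word into a recursive checker. This is a sketch of my own, not part of the interpreter; the name arith-expr? and the restriction of op to the four operators + - * / are assumptions made for illustration.

```racket
#lang racket
;; One match clause per case of the definition.
(define (arith-expr? e)
  (match e
    [(? number?) #t]                 ; case 1: a number
    [`(,op ,e1 ,e2)                  ; case 2: (op e1 e2)
     (and (memq op '(+ - * /))       ; assumed operator set
          (arith-expr? e1)           ; recursion on e1
          (arith-expr? e2))]         ; recursion on e2
    [_ #f]))                         ; anything else is not an expression

(arith-expr? '(* (+ 1 2) (* (- 9 6) 4))) ; #t
(arith-expr? '(+ 1))                     ; #f
```

Notice that the recursive calls sit exactly where the definition mentions "arithmetic expression" again.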
Many other kinds of data, including the natural numbers, can be defined recursively. For example, a common definition of the natural numbers is:
A "natural number" has two forms:
1) zero
2) the successor of a "natural number"
Do you see it? The definition of "natural number" mentions itself! This is why there are infinitely many natural numbers.
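To make this concrete, here is a small sketch that writes natural numbers as S-expressions built from the symbol zero and the form (succ n), and converts them to ordinary numbers. The representation and the name nat->number are my own choices for illustration.

```racket
#lang racket
;; A natural number is either 'zero or `(succ ,n) for a natural number n.
(define (nat->number n)
  (match n
    ['zero 0]                               ; case 1: zero
    [`(succ ,m) (+ 1 (nat->number m))]))    ; case 2: successor, recur on m

(nat->number 'zero)                       ; 0
(nat->number '(succ (succ (succ zero))))  ; 3
```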
So one can say that recursion is everywhere; some people even say recursion is the ultimate principle of nature. Recursive data is always processed by recursive programs. Although recursion sometimes shows up in another form, such as a loop, the concept of "recursion" is broader than "loop". Many recursive programs cannot be expressed as loops, and the interpreter we write today is one of them. Writing correct recursive programs is therefore crucial to designing any system. In fact, the concept of recursion is not limited to programming. Mathematical proofs have a concept called "induction", as in "mathematical induction", and induction is really the same thing as recursion.
Our interpreter today is a recursive program. It accepts an expression, calls itself recursively on each subexpression, and then combines the results of those recursive calls into the final result. This is a bit like traversing a binary tree, except that our data structure (the program) is more complex than a binary tree.
Pattern Matching and Recursion: A simple calculator
Since a calculator is the simplest interpreter, why not start with one? Below is a calculator for the four basic arithmetic operations. Its expressions can be nested arbitrarily, for example '(* (+ 1 2) (+ 3 4)). I want to use this simple example to explain the principles of pattern matching and recursion.
Here is the code for the calculator. It accepts an expression and outputs a number as the result, as described in the previous section.
(define calc
  (lambda (exp)
    (match exp                          ; match on the expression
      [(? number? x) x]                 ; a number: return it as is
      [`(,op ,e1 ,e2)                   ; match and extract the operator op and the operands e1, e2
       (let ([v1 (calc e1)]             ; call calc recursively to get the value of e1
             [v2 (calc e2)])            ; call calc recursively to get the value of e2
         (match op                      ; branch on the four possible operators
           ['+ (+ v1 v2)]               ; plus sign: the result is (+ v1 v2)
           ['- (- v1 v2)]               ; minus, times and divide are handled similarly
           ['* (* v1 v2)]
           ['/ (/ v1 v2)]))])))
The match form here performs pattern matching. Its general shape is:
(match exp
  [pattern result]
  [pattern result]
  ...)
It branches on the "structure" of the expression exp. Each branch has two parts: a "pattern" on the left and a result on the right. When the pattern on the left matches, it may bind some variables, which can then be used in the expression on the right.
In general, a match has as many patterns as the data definition has cases. For example, an arithmetic expression has two forms: a number, or (op e1 e2). So the match that processes it has two patterns. "For every case you can have, I have a way to handle it": this is the idea of exhaustiveness. Exhaustiveness is very important; any case you miss may cause trouble. Mathematical induction is the same exhaustive method applied to the recursive definition of the natural numbers. Because you have covered both forms from which every natural number can be constructed, you can be sure the theorem holds for "any natural number".
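To see what a missed case looks like in practice, here is a deliberately incomplete match. The function operator-of is a made-up example, not part of the calculator; it handles only the (op e1 e2) form and forgets the number case.

```racket
#lang racket
;; Incomplete on purpose: there is no clause for numbers.
(define (operator-of exp)
  (match exp
    [`(,op ,e1 ,e2) op]))

(operator-of '(+ 1 2))   ; '+

;; A bare number matches no clause, so match raises an exception.
;; We catch it here only to show that it happens.
(with-handlers ([exn:fail? (lambda (e) 'no-matching-clause)])
  (operator-of 7))       ; 'no-matching-clause
```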
How do patterns work? For example, `(,op ,e1 ,e2) is a pattern that is matched against the input exp. The basic principle of pattern matching is to match data that has the same structure. For example, if exp is '(+ 1 2), then `(,op ,e1 ,e2) binds op to '+, e1 to 1, and e2 to 2. This works because the two have the same structure:
`(,op ,e1 ,e2)
'( +   1   2)
Put plainly, a pattern is a "data structure" that may contain "names" (such as op, e1 and e2), like `(,op ,e1 ,e2). We take this structure with names in it and "match" it against actual data (like '(+ 1 2)). When the two line up piece by piece, the names are automatically bound to the values at the corresponding positions in the data. A pattern can contain concrete data as well as names. For example, the pattern `(,op ,e1 42) matches only expressions whose second operand is fixed at 42.
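Here is a small sketch of a pattern that mixes names with concrete data. The function describe and the tags it returns are invented for this illustration.

```racket
#lang racket
(define (describe exp)
  (match exp
    [`(,op ,e1 42)  (list 'second-operand-is-42 op e1)] ; 42 must appear literally
    [`(,op ,e1 ,e2) (list 'general op e1 e2)]           ; any three-element list
    [_ 'no-match]))

(describe '(+ 1 42))  ; '(second-operand-is-42 + 1)
(describe '(+ 1 2))   ; '(general + 1 2)
(describe '(+ 1))     ; 'no-match
```

The clauses are tried from top to bottom, so the more specific pattern has to come first.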
Looking at the pattern on the left, you can directly "see" the shape of the input data and then operate on the elements inside it. A pattern lets us take a data structure apart (destructure it) in one step, binding the value of each component (field) to a variable without calling a series of accessor functions. Pattern matching is therefore a very intuitive style of programming that every language would do well to borrow. Many functional languages, such as ML and Haskell, have similar features.
Note that e1 and e2 are not yet values; they are expressions. We call calc recursively to obtain the values v1 and v2 of e1 and e2, respectively. These should be numbers.
Did you notice where we used recursion? Look again at the definition of "arithmetic expression":
An "arithmetic expression" has two forms:
1) a number
2) a structure of the form '(op e1 e2), where e1 and e2 are themselves "arithmetic expressions"
You will find that the "recursion" in this definition occurs at e1 and e2, so calc calls itself recursively on e1 and e2. If you recur at every recursive position in the data definition, your recursive program will cover all the cases.
Then we operate on the two values v1 and v2 according to the operator op. If op is the plus sign '+, we apply Scheme's addition to v1 and v2 and return the result. If it is a minus, multiplication or division sign, we perform the corresponding operation and return its value.
So you can get the following test results:
(calc '(+ 1 2))
;; => 3
(calc '(* 2 3))
;; => 6
(calc '(* (+ 1 2) (+ 3 4)))
;; => 21
That is all there is to a calculator. Try these examples, then make up some new ones of your own.
What is the lambda calculus?
Now let's move on to a more powerful language: the lambda calculus. Its name looks scary, but it is actually very simple. It has three elements: variables, functions, and calls. In traditional notation they look like this:
Variable: x
Function: λx.t
Call: t1 t2
Every programming language has these three elements; only the concrete syntax differs, so you are in fact using the lambda calculus every day. In Scheme, for example, the three elements look like this:
Variable: x
Function: (lambda (x) e)
Call: (e1 e2)
Programming languages generally have many other constructs, but these three are indispensable. So the most important step in building an interpreter is to get these three things right. An interpreter for any language is generally built by starting with these three elements and adding others only after they are completely correct.
There is a simple way of thinking that lets you see the essence of these three elements directly. Remember that every program is a "description of a machine"? Every lambda calculus expression is therefore also a description of a machine, and such a machine is very much like an electronic circuit. Lambda calculus programs correspond to machines one to one: a variable is a wire. A function is a "model" of an electronic component, with its own input and output terminals and its own logic. A call inserts an "instance" of a component into the design and connects its input terminal to some existing wire, which is called the "argument". So a lambda calculus interpreter is really a circuit simulator. It should therefore come as no surprise to hear that some chip companies have started designing hardware in Haskell-like languages (such as Bluespec SystemVerilog).
Note that, unlike in ordinary languages, a lambda calculus function takes exactly one parameter. This is not a serious limitation, because lambda calculus functions can be passed around as values (they are "first-class functions"), so nested function definitions can represent functions of two or more parameters. For example, (lambda (x) (lambda (y) y)) represents a function of two parameters that returns its second argument. Calling it, however, takes two levels of calls, like this:
(((lambda (x) (lambda (y) y)) 1) 2)
;; => 2
It looks ugly, but it gives our interpreter the utmost simplicity. Simplicity is critical for people who design programming languages. A design that is complicated from the start tends to produce a heap of tangled problems.
The lambda calculus also differs from ordinary languages in that it has no basic data types such as numbers, so you cannot directly use it to compute an expression like (+ 1 2). Interestingly, though, numbers can be encoded with the three basic elements of the lambda calculus. Such an encoding can represent natural numbers, booleans, pairs, lists, and indeed all data structures. It can also represent more complex constructs such as if conditionals. A common encoding of this kind is called the Church encoding. The lambda calculus can therefore express the features of almost every programming language. The old Chinese saying "three gives birth to all things" may mean exactly this.
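As a taste of the Church encoding, here is a minimal sketch of Church numerals written with nothing but one-parameter lambdas and calls. The names zero, succ, plus, two and three are my own, and church->number is a helper that steps outside the pure lambda calculus (it uses Racket's add1 and 0) only so that we can inspect the result.

```racket
#lang racket
;; The Church numeral n means "apply f to x, n times".
(define zero (lambda (f) (lambda (x) x)))
(define succ
  (lambda (n)
    (lambda (f) (lambda (x) (f ((n f) x))))))
(define plus
  (lambda (m)
    (lambda (n)
      (lambda (f) (lambda (x) ((m f) ((n f) x)))))))

;; Inspection helper only: convert a numeral to an ordinary Racket number.
(define church->number
  (lambda (n) ((n add1) 0)))

(define two (succ (succ zero)))
(define three (succ two))
(church->number ((plus two) three)) ; 5
```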
Order of evaluation, call-by-name, call-by-value
When interpreting a program, we can choose among several different "evaluation orders". This is a bit like traversing a binary tree in different orders (in-order, pre-order, post-order), except that the orders here are more complex. Take the following program, for example:
((lambda (x) (* x x)) (+ 1 2))
We can perform the outermost call first and pass (+ 1 2) into the function, giving (* (+ 1 2) (+ 1 2)). The order of evaluation is then:
((lambda (x) (* x x)) (+ 1 2))
=> (* (+ 1 2) (+ 1 2))
=> (* 3 (+ 1 2))
=> (* 3 3)
=> 9
But we can also compute the result of (+ 1 2) first and then pass it into the function. The order of evaluation is then:
((lambda (x) (* x x)) (+ 1 2))
=> ((lambda (x) (* x x)) 3)
=> (* 3 3)
=> 9
The first approach is called call-by-name (CBN), because it passes the "name" of the argument (that is, the expression itself) into the function. The second is called call-by-value (CBV), because it first evaluates the argument's name to obtain its "value" and only then passes that value into the function.
The two approaches differ in efficiency. In the example above, CBN takes one more step than CBV. Why? Because the function (lambda (x) (* x x)) contains two occurrences of x, (+ 1 2) is copied when it is passed in. Each copy then has to be evaluated separately, so (+ 1 2) is computed twice!
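Racket itself is call-by-value, but call-by-name can be simulated by passing a zero-argument function (a thunk) and calling it at every use. The sketch below counts how often (+ 1 2) is really computed; counter, one-plus-two, square-by-name and square-by-value are names invented for this illustration.

```racket
#lang racket
;; counter records how many times (+ 1 2) is actually computed.
(define counter 0)
(define (one-plus-two)
  (set! counter (+ counter 1))
  (+ 1 2))

;; Simulated call-by-name: x is a thunk that is run at every use.
(define (square-by-name x) (* (x) (x)))
;; Call-by-value: x is already a number when the body runs.
(define (square-by-value x) (* x x))

(define by-name-result (square-by-name one-plus-two))
(define by-name-count counter)        ; 2 evaluations

(set! counter 0)
(define by-value-result (square-by-value (one-plus-two)))
(define by-value-count counter)       ; 1 evaluation

(list by-name-result by-name-count by-value-result by-value-count)
;; => '(9 2 9 1)
```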
For this reason, almost all programming languages use CBV rather than CBN. CBV is often called "strict" or "applicative order" evaluation. Although CBN is inefficient, an equivalent evaluation order called call-by-need does not have this problem. The basic idea of call-by-need is to "share" and "remember" the copies of an expression that CBN would make. Once one copy of an expression has been evaluated, all the other copies automatically receive its value, which avoids repeated evaluation. Call-by-need is also called "lazy evaluation"; it is the semantics Haskell uses.
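Call-by-need can be approximated in Racket with promises: delay wraps an expression without running it, and force runs it at most once and remembers the result. The following is a sketch under that approximation; counter, arg and square-by-need are illustrative names of my own.

```racket
#lang racket
;; counter records how many times (+ 1 2) is actually computed.
(define counter 0)
(define arg
  (delay (begin (set! counter (+ counter 1))
                (+ 1 2))))

;; x is a promise; both uses share one evaluation.
(define (square-by-need x) (* (force x) (force x)))

(define result (square-by-need arg))
(list result counter)
;; => '(9 1)
```

The argument is used twice but evaluated only once, which is exactly the sharing described above.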
Evaluation orders are not limited to call-by-name, call-by-value and call-by-need. People have designed many others, although most of them are not as practical as call-by-value and call-by-need.
Complete lambda calculus interpreter
Below is the interpreter we will complete today. It is only 39 lines long (not counting blank lines and comments). You may want to look at the comments on each part first; they name each part and explain it briefly. This interpreter implements the CBV lambda calculus plus basic arithmetic. Basic arithmetic is included so that beginners can write interesting programs without being forced to learn the Church encoding at the very beginning.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;; The following three definitions, env0, ext-env and lookup, are the basic operations on environments:

;; The empty environment
(define env0 '())

;; Extension: extend the environment env with a binding of x to v, giving a new environment
(define ext-env
  (lambda (x v env)
    (cons `(,x . ,v) env)))

;; Lookup: find the value of x in the environment env
(define lookup
  (lambda (x env)
    (let ([p (assq x env)])
      (cond
        [(not p) x]
        [else (cdr p)]))))

;; Data structure for closures: a function definition f and the environment env in which it was defined
(struct closure (f env))

;; The recursive definition of the interpreter (takes two arguments: the expression exp and the environment env)
;; There are five cases: variables, functions, calls, numbers, and arithmetic expressions
(define interp1
  (lambda (exp env)
    (match exp                                   ; pattern match on exp, one branch per case
      [(? symbol? x) (lookup x env)]             ; variable
      [(? number? x) x]                          ; number
      [`(lambda (,x) ,e)                         ; function
       (closure exp env)]
      [`(,e1 ,e2)                                ; call
       (let ([v1 (interp1 e1 env)]
             [v2 (interp1 e2 env)])
         (match v1
           [(closure `(lambda (,x) ,e) env1)
            (interp1 e (ext-env x v2 env1))]))]
      [`(,op ,e1 ,e2)                            ; arithmetic expression
       (let ([v1 (interp1 e1 env)]
             [v2 (interp1 e2 env)])
         (match op
           ['+ (+ v1 v2)]
           ['- (- v1 v2)]
           ['* (* v1 v2)]
           ['/ (/ v1 v2)]))])))

;; The interpreter's "user interface". It wraps interp1 and hides the second argument, whose initial value is env0
(define interp
  (lambda (exp)
    (interp1 exp env0)))
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
Test examples
Here are some test examples. It is best to play with them first, or write some new ones yourself. The best way to learn a program is to play with it: give it some inputs and observe its behavior. Sometimes this is more intuitive and clearer than any description in words.
(interp '(+ 1 2))
;; => 3
(interp '(* 2 3))
;; => 6
(interp '(* 2 (+ 3 4)))
;; => 14
(interp '(* (+ 1 2) (+ 3 4)))
;; => 21
(interp '(((lambda (x) (lambda (y) (* x y))) 2) 3))
;; => 6
(interp '((lambda (x) (* 2 x)) 3))
;; => 6
(interp '((lambda (y) (((lambda (y) (lambda (x) (* y 2))) 3) 0)) 4))
;; => 6
;; (interp '(1 2))
;; => match: no matching clause for 1
In the following sections, we look one by one at the cases handled by the branches of this interpreter.
Interpretation of basic arithmetic operations
Arithmetic operations are the simplest and most basic things in the interpreter, because they cannot be broken down into smaller elements. So before dealing with functions, calls and other complex constructs, let's look at how arithmetic is handled. The following is the part of the interpreter that handles basic arithmetic. It is the last branch of interp1.
(match exp
  ...
  [`(,op ,e1 ,e2)
   (let ([v1 (interp1 e1 env)]        ; call interp1 recursively to get the value of e1
         [v2 (interp1 e2 env)])       ; call interp1 recursively to get the value of e2
     (match op                        ; branch on the four possible operators
       ['+ (+ v1 v2)]                 ; plus sign: the result is (+ v1 v2)
       ['- (- v1 v2)]                 ; minus, times and divide are handled similarly
       ['* (* v1 v2)]
       ['/ (/ v1 v2)]))])
You can see that it is almost the same as the calculator we just wrote, except that calls to interp1 now take one more argument, env. What is this env? We will get to it shortly.
Variables and functions
I will use two short sections to introduce variables, functions and environments. After that, we will look at how they are implemented.
The invention of variables was one of the greatest breakthroughs in the history of mathematics. Because a variable can be bound to different values, functions become possible. Take the mathematical function f(x) = x * 2: x is a variable that carries the input value into the function body "x * 2". Without variables, functions could not be implemented.
The most basic operations on a variable are binding it and taking its value (evaluating it). What is binding? Take the function f(x) above. When x equals 1, the value of f(x) is 2; when x equals 2, the value of f(x) is 4. In that sentence we bound x twice: the first time to 1, the second time to 2. You can think of "binding" as an action, like the moment you push a plug into a power outlet. The pins of the plug are the x in f(x), and the x in x * 2 is the other end of the wire. When you push the plug into the outlet, current flows through the wire to the other end. If the wire conducts well, the voltage at the two ends should be almost equal. I digress a little... Just remember that binding is the "action" of plugging into the outlet.
What about "taking the value"? Go back to the same example. When we use a meter to measure the voltage at the other end of the wire, we are taking the value of the variable. Sometimes the process is less obvious, for example when the current drives the motor of a fan. Although nothing displays the voltage at the other end of the wire, the current is already acting on the motor's input terminals and flowing into its coils. So you can say that the motor is in fact taking the variable's value.
Environment
Our interpreter is a rather dumb program that can only do things one step at a time. For example, to find the value of f(1) it performs two steps: 1) bind x to 1; 2) enter the body of f and evaluate x * 2. This is like a person doing two things: 1) push the plug into the outlet; 2) walk to the other end of the wire, measure the voltage, and multiply the result by 2. Between step 1 and step 2, how do we remember the value of x? It has to be passed along to the recursive call of the interpreter that processes the function body. This is why we need an "environment", the second argument env of interp1.
The environment records the values of variables and carries them to the regions where they are "visible", called their "scope" in technical terms. Normally the scope is the whole function body, with one exception: when the function body contains a nested function definition whose parameter has the same name, the outer parameter is "shadowed". The inner function body then sees its own parameter rather than the outer one. For example, in (lambda (x) (lambda (x) (* x 2))), the x in (* x 2) refers to the x of the inner function, not the outer one.
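Shadowing is easy to observe directly in Racket, which follows the same rule. In this small illustration (the name f is mine), the outer x is bound to 1 and the inner x to 5:

```racket
#lang racket
(define f (lambda (x) (lambda (x) (* x 2))))

;; The outer x is 1, the inner x is 5; (* x 2) sees the inner one.
((f 1) 5) ; 10
```

If the body saw the outer x instead, the result would be 2.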
In our interpreter, the main components that deal with the environment are the following:
;; The empty environment
(define env0 '())

;; Extension: extend the environment env with a binding of x to v
(define ext-env
  (lambda (x v env)
    (cons `(,x . ,v) env)))

;; Lookup: find the value of x in the environment env
(define lookup
  (lambda (x env)
    (let ([p (assq x env)])
      (cond
        [(not p) x]
        [else (cdr p)]))))
Here we use a Scheme association list to represent the environment. An association list looks like this: ((x . 1) (y . 2) (z . 5)). That is, it is a linked list of two-element groups (pairs), where the element on the left is the key and the element on the right is the value. Written out more intuitively:
((x . 1)
 (y . 2)
 (z . 5))
Lookup searches this table from beginning to end. If the key on the left of a pair is the variable we are looking for, assq returns the whole pair, and lookup then returns the pair's right-hand element with cdr. Simple, isn't it?
ext-env extends an environment. For example, if the original environment is ((y . 2) (z . 5)), then (ext-env 'x 1 '((y . 2) (z . 5))) returns ((x . 1) (y . 2) (z . 5)). That is, it puts (x . 1) at the front. Note that extending an environment produces a new environment and does not "change" the original one. The ((y . 2) (z . 5)) part of the result above is the original data structure, merely placed inside another, larger structure. This is called a "functional data structure". This property is crucial in our interpreter, because after we extend an environment, other parts of the code can still access the old environment as it was before the extension. When we get to function calls, you will see why this is useful.
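The claim that extension leaves the original environment untouched can be checked directly. The sketch below repeats the article's ext-env; the names old-env and new-env are mine.

```racket
#lang racket
(define ext-env
  (lambda (x v env)
    (cons `(,x . ,v) env)))

(define old-env '((y . 2) (z . 5)))
(define new-env (ext-env 'x 1 old-env))

new-env                     ; '((x . 1) (y . 2) (z . 5))
old-env                     ; '((y . 2) (z . 5)), unchanged
(eq? (cdr new-env) old-env) ; #t, the old list is shared, not copied
```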
You can also use a more efficient data structure (such as a splay tree) to represent the environment. You can even use functions to represent it. The only requirement is that it be a "mapping" from variables to values. If you map x to 1 and later ask for the value of x, the answer should still be 1; it should not disappear or become some other value. In other words, these functions must satisfy the following "interface contract": if e is the environment returned by (ext-env 'x 1 env), then (lookup 'x e) should return 1. Any functions that satisfy this contract can be called ext-env and lookup and can completely replace the functions here without requiring changes to any other code. This is called "abstraction", which is the essence of "object-oriented languages".
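As one possible illustration of this interface contract, here is a sketch in which the environment is itself a function from names to values. The -f suffix on the names is mine, used only to distinguish these definitions from the association-list versions; returning the name itself for an unbound variable mirrors the behavior of the article's lookup.

```racket
#lang racket
;; The empty environment: an unbound name is returned as is.
(define env0-f (lambda (y) y))

;; Extension: a new function that answers for x and defers to env otherwise.
(define ext-env-f
  (lambda (x v env)
    (lambda (y) (if (eq? x y) v (env y)))))

;; Lookup: just ask the environment.
(define lookup-f
  (lambda (x env) (env x)))

(define e (ext-env-f 'x 1 (ext-env-f 'y 2 env0-f)))
(lookup-f 'x e) ; 1
(lookup-f 'y e) ; 2
(lookup-f 'z e) ; 'z
```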
Interpretation of variables
Having learned about variables, functions and environments, let's look at what the interpreter does with variables, that is, the first case of the match in interp1. It is very simple: look up the variable's value in the environment. Here (? symbol? x) is a special pattern that uses the Scheme function symbol? to decide whether the input matches. If it does, the input is bound to x, its value is looked up, and that value is returned.
[(? symbol? x) (lookup x env)]
Note that because our interpreter is recursive, this value may be returned to an enclosing expression, such as (* x 2).
Interpretation of numbers
Interpreting numbers is also very simple. Because in Scheme the name '2 is just the number 2 (I consider this a small mistake in Scheme's design), we do not need to treat a number's name specially and simply return it unchanged.
[(? number? x) x]
Interpretation of functions
Functions are harder to explain clearly. The body of a function may contain parameters of an outer function. For example, the y in (lambda (y) (lambda (x) (* y 2))) is the parameter of the outer function, yet it appears in the definition of the inner function. If the inner function is returned as a value, (* y 2) escapes the scope of y. We therefore have to turn the function into a "closure". A closure is a special data structure made of two elements: the definition of the function and the current environment. So our interpretation of a function (lambda (x) e) is as follows:
[`(lambda (,x) ,e)
 (closure exp env)]
Note that exp here is `(lambda (,x) ,e) itself. We merely wrap it, unchanged, together with the current environment in a data structure (the closure), without any complicated operations. Our closure uses a Racket struct, that is, a record type. You can also represent closures in other ways; some interpreter tutorials, for example, advocate using functions to represent closures. The representation does not really matter, as long as it can store the values of exp and env. I prefer the struct because its interface is simple and clear.
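To see how little a closure contains, here is a sketch that builds one by hand and reads its two fields back. The struct definition is the article's; the value c and the hand-written environment ((y . 1)) are made up for this illustration. The accessors closure-f and closure-env and the predicate closure? are generated automatically by struct.

```racket
#lang racket
(struct closure (f env))

;; A closure for (lambda (x) (* y 2)) created where y was bound to 1.
(define c (closure '(lambda (x) (* y 2)) '((y . 1))))

(closure? c)     ; #t
(closure-f c)    ; '(lambda (x) (* y 2))
(closure-env c)  ; '((y . 1))
```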
Why do we need to save the current environment? Because when this function is returned as a value, we must remember the binding of the outer function's parameter. Take (lambda (y) (lambda (x) (* y 2))), for example. When it is applied to 1, we get the inner function (lambda (x) (* y 2)). When that function is finally called, after some twists and turns, what should the value of y be? The correct answer is 1. This way of recording the values of outer parameters in the closure of the inner function is called "lexical scoping" or "static scoping".
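Racket itself is lexically scoped, so the same behavior can be observed without our interpreter. In this sketch the names make-inner and inner are mine, and the module-level y is added only to show that an unrelated y at the call site is not the one the function body sees.

```racket
#lang racket
(define make-inner (lambda (y) (lambda (x) (* y 2))))
(define inner (make-inner 1))   ; the closure remembers y = 1

(define y 100)                  ; an unrelated y at the call site
(inner 0)                       ; 2, not 200
```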
If you do not build a closure but simply return the function body, then at the place where (lambda (x) (* y 2)) is called you may find a different y and end up using its value. This way of resolving variables "dynamically" at call time is called "dynamic scoping". Dynamic scoping has turned out to be a serious mistake; it caused all kinds of hard-to-find bugs in early languages. Many early languages were dynamically scoped because they saved only the function's code and not the environment at its point of definition. That is simpler, but it causes too much trouble. Early Lisps, today's Emacs Lisp, and TeX are languages that use dynamic scoping.
To see the difference between lexical scoping and dynamic scoping, you can run the following code in our interpreter:
(interp '((lambda (y) (((lambda (y) (lambda (x) (* y 2))) 3) 0)) 4))
The inner part, (lambda (y) (lambda (x) (* y 2))), is the example from above. The y in (* y 2) is really the parameter of that inner (lambda (y) ...). When this part is applied to 3, (lambda (x) (* y 2)) is returned as a value. That value is then applied to 0 (x is bound to 0 and ignored), so (* y 2) should equal 6. If our interpreter used dynamic scoping, however, the final result would be 8. This is because the outermost y was bound to 4 at the start, and dynamic scoping does not remember the value of the inner y, so the value of the outer y is used.
Why is lexical scoping better? Simple intuition is enough to see it. When you construct an inner function that refers to an outer variable, such as y in this example, a "channel" appears that leads from the outer y to the inside of the function. You can think of the inner function as a circuit component with a node y connected to a wire y coming from outside. When the component is returned, it is as if it were dug out to be used somewhere else. But when it is used (called), where should the y node get its input? Clearly it should not use whatever y happens to exist at the call site, because that y and the original y, although both named y, are not "the same y": they merely share a name. They may even stand for different types of things. So this y should still be connected to the original y wire. When the inner component is moved, it is as if that wire were stretched indefinitely while always remaining connected to the original node.
Interpretation of function calls
Good, we have finally reached the last piece: function calls. A function call has the form (e1 e2), so we first need the values of e1 and e2. This is similar to basic arithmetic, where we must first obtain the values of the two operands.
A function call is like pushing an appliance's plug into an outlet so that it starts running. For example, when (lambda (x) (* x 2)) is applied to 1, we bind x to 1 and then interpret its body (* x 2). But there is a question here: if the function body contains variables that are not bound by its own parameter, what values should they take? From the discussion of closures above, you already know: after the operator e1 is evaluated, it should be a closure, and the values of those variables should be stored inside it. So we take out the environment saved in the closure (env1), extend it by binding x to v2, and then interpret the function body in this extended environment.
The code for function calls is as follows:
[`(,e1 ,e2)
 (let ([v1 (interp1 e1 env)]
       [v2 (interp1 e2 env)])
   (match v1
     [(closure `(lambda (,x) ,e) env1)     ; use pattern matching to take the closure apart
      (interp1 e (ext-env x v2 env1))]))]  ; interpret the body with x bound to v2 in the closure's environment
You may wonder: is the interpreter's environment env not used here at all? It is. We use env to compute the values of e1 and e2, because the variables in e1 and e2 live in the "current environment". We use the environment env1 stored in the closure to evaluate the function body, because the function body was not defined in the current environment; its code comes from somewhere else. If we used env to interpret the function body, we would get dynamic scoping.
Experiment: change env1 in (interp1 e (ext-env x v2 env1)) to env. Then try the code we discussed earlier, and the output will be 8:
(interp '((lambda (y) (((lambda (y) (lambda (x) (* y 2))) 3) 0)) 4))
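For readers who would rather not edit the original, here is a self-contained sketch of the experiment. It repeats the interpreter under the hypothetical names interp1-dyn and interp-dyn, with the single change marked in a comment; lookup is written with if instead of cond but behaves the same.

```racket
#lang racket
;; Same environment operations and closure struct as in the article.
(define env0 '())
(define ext-env (lambda (x v env) (cons `(,x . ,v) env)))
(define lookup
  (lambda (x env)
    (let ([p (assq x env)])
      (if p (cdr p) x))))
(struct closure (f env))

;; interp1-dyn: the interpreter with dynamic scoping (hypothetical name).
(define interp1-dyn
  (lambda (exp env)
    (match exp
      [(? symbol? x) (lookup x env)]
      [(? number? x) x]
      [`(lambda (,x) ,e) (closure exp env)]
      [`(,e1 ,e2)
       (let ([v1 (interp1-dyn e1 env)]
             [v2 (interp1-dyn e2 env)])
         (match v1
           [(closure `(lambda (,x) ,e) env1)
            ;; The one change: extend the caller's env and ignore env1.
            (interp1-dyn e (ext-env x v2 env))]))]
      [`(,op ,e1 ,e2)
       (let ([v1 (interp1-dyn e1 env)]
             [v2 (interp1-dyn e2 env)])
         (match op
           ['+ (+ v1 v2)]
           ['- (- v1 v2)]
           ['* (* v1 v2)]
           ['/ (/ v1 v2)]))])))

(define interp-dyn
  (lambda (exp) (interp1-dyn exp env0)))

(interp-dyn '((lambda (y) (((lambda (y) (lambda (x) (* y 2))) 3) 0)) 4))
;; => 8
```

When (* y 2) finally runs, the environment is the caller's ((x . 0) (y . 4)), so y is 4 rather than the 3 that was in effect where the function was defined.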
In addition, we can now see the benefit of using a "functional data structure" for the environment. When a closure is called, its environment is extended, but this does not affect the original environment; what we get is a new environment. So after the function returns, the binding of its parameter is automatically "cancelled". If you used a non-functional data structure and, instead of creating a new environment when binding a parameter, assigned into the existing environment, the assignment would change the original environment permanently. You would then have to delete the parameter's binding after the function returns. That is not only tedious but almost impossible to get right in complex situations. Every time I have used assignment to modify an environment, unexpected trouble followed. So when writing interpreters and compilers, I use only functional data structures to represent environments.
Next steps
Once you understand the basic interpreter structure described here, what can you do next? Starting from this basic prototype, you can develop a lot more, for example:
Add some constructs to this interpreter, such as recursion and state, and you get an interpreter for a complete programming language such as Scheme or Python.
Abstract the interpreter to infer the types of programs. If you are interested, see the Hindley-Milner system I implemented, or my type inference for Python.
Make some changes to this interpreter and you get a very powerful online partial evaluator, which can be used for compiler optimization.
If you have any questions, please contact me: shredderyin@gmail.com. It should also be pointed out that learning this interpreter does not mean you understand programming language theory. So after learning this, you should still read some books on semantics, such as the one I recommend in this blog.