This article has two purposes: to describe a general approach to implementing a computer language interpreter, and to show how to implement a subset of the Scheme dialect of Lisp in Python. I call my interpreter Lispy (lis.py). A few years ago I showed how to write a Scheme interpreter in Java, and I also wrote a version in Common Lisp. This time my goal is to demonstrate, as simply as possible, what Alan Kay called the "Maxwell's Equations of Software".
Syntax and semantics of the Scheme subset supported by Lispy
Most computer languages have many syntactic conventions (keywords, infix operators, brackets, operator precedence, dot notation, semicolons, and so on), but as a member of the Lisp family, Scheme bases all of its syntax on parenthesized lists in prefix notation. This form may look strange at first, but it has the advantage of being simple and consistent. (Some people joke that "Lisp" stands for "Lots of Irritating Silly Parentheses"; I think it stands for "Lisp Is Syntactically Pure".) Consider the following example:
/* Java */
if (x.val() > 0) {
    z = f(a * x.val() + b);
}

/* Scheme */
(if (> (val x) 0)
    (set! z (f (+ (* a (val x)) b))))
Note that the exclamation point above is not a special character in Scheme; it is simply part of the name set!. Only parentheses are special characters. A list that starts with a special keyword, such as (set! x y), is called a special form in Scheme. The beauty of Scheme is that we need only six special forms, plus three other syntactic constructs: variables, constants, and procedure calls.
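The nine forms are:
variable reference: var
constant literal: number
quotation: (quote exp)
conditional: (if test conseq alt)
assignment: (set! var exp)
definition: (define var exp)
procedure: (lambda (var...) exp)
sequencing: (begin exp...)
procedure call: (proc exp...)
In this table, var must be a symbol (an identifier such as x or square), number must be an integer or floating-point number, and the other placeholder words can be any expression. The notation exp... means zero or more repetitions of exp.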
For more on Scheme, consult some of the excellent books (by Friedman and Felleisen, Dybvig, Queinnec, Harvey and Wright, or Sussman and Abelson), videos (by Abelson and Sussman), tutorials (by Dorai, PLT, or Neller), or the reference manual.
The responsibility of the language interpreter
A language interpreter has two parts:
1. Parsing: The parsing component takes an input program in the form of a sequence of characters, verifies it according to the syntactic rules of the language, and translates the program into an internal representation. In a simple interpreter the internal representation is a tree structure that closely mirrors the nested structure of statements or expressions in the source program. In a language translator called a compiler, the internal representation is a sequence of instructions that can be executed directly by the computer. As Steve Yegge said, "If you don't know how compilers work, then you don't know how computers work." Yegge describes 8 problems that can be solved with compilers (or equally with interpreters, or with Yegge's typical heavy dose of irony). The Lispy parser is implemented by the function parse.
2. Execution: The internal representation is then processed according to the semantic rules of the language, thereby carrying out the computation of the source program. Lispy's execution component is implemented by the function eval (note that this shadows Python's built-in function of the same name).
The following picture describes the interpreter's interpretation process, and the interactive session (after the picture) shows how the parse and eval functions operate on a small program:
Here we use the simplest possible internal representation: Scheme lists, numbers, and symbols are represented as Python lists, numbers, and strings, respectively.
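As a small illustration of this representation, here is a hand-written sketch (not output produced by Lispy) of how one Scheme program could look as nested Python data. The helper count_atoms is hypothetical and exists only to show that the structure can be walked like any other Python list.

```python
# Hand-built internal representation of the Scheme program
#   (begin (define r 3) (* 3.141592653 (* r r)))
# Scheme lists -> Python lists, numbers -> int/float, symbols -> str.
program = ['begin',
           ['define', 'r', 3],
           ['*', 3.141592653, ['*', 'r', 'r']]]

def count_atoms(exp):
    """Count the numbers and symbols (the leaves) in a nested expression.

    Hypothetical helper for this illustration; it is not part of lis.py.
    """
    if isinstance(exp, list):
        return sum(count_atoms(e) for e in exp)
    return 1

print(program[0])             # the symbol at the head of the outer list
print(count_atoms(program))   # number of leaves in the whole tree
```

Because the representation is ordinary Python data, no special tree classes are needed; the evaluator can dispatch on isinstance checks and on the first element of each list.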
Execution: eval
Here is the definition of the eval function. Each of the nine cases in the table above takes one to three lines of code, and eval needs nothing beyond these nine cases:
def eval(x, env=global_env):
    "Evaluate an expression in an environment."
    if isa(x, Symbol):             # variable reference
        return env.find(x)[x]
    elif not isa(x, list):         # constant literal
        return x
    elif x[0] == 'quote':          # (quote exp)
        (_, exp) = x
        return exp
    elif x[0] == 'if':             # (if test conseq alt)
        (_, test, conseq, alt) = x
        return eval((conseq if eval(test, env) else alt), env)
    elif x[0] == 'set!':           # (set! var exp)
        (_, var, exp) = x
        env.find(var)[var] = eval(exp, env)
    elif x[0] == 'define':         # (define var exp)
        (_, var, exp) = x
        env[var] = eval(exp, env)
    elif x[0] == 'lambda':         # (lambda (var*) exp)
        (_, vars, exp) = x
        return lambda *args: eval(exp, Env(vars, args, env))
    elif x[0] == 'begin':          # (begin exp*)
        for exp in x[1:]:
            val = eval(exp, env)
        return val
    else:                          # (proc exp*)
        exps = [eval(exp, env) for exp in x]
        proc = exps.pop(0)
        return proc(*exps)

isa = isinstance

Symbol = str
That's all there is to eval ... except, of course, for environments. An environment is simply a mapping from symbols to the values they stand for. A new symbol/value binding is added by a define or by a procedure definition (a lambda expression).
Let's look at an example of what happens when we define and then call a Scheme procedure (the lis.py> prompt means we are talking to the Lisp interpreter, not Python):
lis.py> (define area (lambda (r) (* 3.141592653 (* r r))))
lis.py> (area 3)
28.274333877
When we evaluate (lambda (r) (* 3.141592653 (* r r))), we take the elif x[0] == 'lambda' branch of eval, assigning the three variables (_, vars, exp) to the corresponding elements of the list x (and signaling an error if x does not have length 3). We then create a new procedure that, when called, evaluates the expression ['*', 3.141592653, ['*', 'r', 'r']] in an environment formed by binding the formal parameters of the procedure (here just one parameter, r) to the arguments supplied in the call, plus any variables of the current environment that are not in the parameter list (for example, the variable *). The newly created procedure is then assigned as the value of area in global_env.
So what happens when we evaluate (area 3)? Because area is not one of the symbols that marks a special form, this must be a procedure call (the final else: branch of eval), so the whole list of expressions is evaluated, one element at a time. Evaluating area yields the procedure we just created; evaluating 3 yields 3. We then (per the last line of eval) call the newly created procedure with the argument list [3]. This means evaluating exp, which is ['*', 3.141592653, ['*', 'r', 'r']], in an environment where r is 3 and the outer environment is the global environment, so * is the multiplication procedure.
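The walk-through above can be replayed with a deliberately reduced evaluator. This is a minimal Python 3 sketch, not the article's eval: it handles only symbols, numbers, lambda, define, and procedure calls, and it assumes collections.ChainMap as a stand-in for the Env class. The name mini_eval is made up for this illustration.

```python
import operator
from collections import ChainMap

# Assumption: ChainMap stands in for the article's Env class. Lookups fall
# through from the innermost mapping to the outer ones.
global_env = ChainMap({'*': operator.mul})

def mini_eval(x, env=global_env):
    """Evaluate symbols, numbers, lambda, define, and procedure calls only."""
    if isinstance(x, str):                      # variable reference
        return env[x]
    if not isinstance(x, list):                 # constant literal
        return x
    if x[0] == 'lambda':                        # (lambda (var...) exp)
        _, params, body = x
        # The closure remembers env, the environment where it was created,
        # and extends it with parameter bindings on every call.
        return lambda *args: mini_eval(
            body, env.new_child(dict(zip(params, args))))
    if x[0] == 'define':                        # (define var exp)
        _, var, exp = x
        env[var] = mini_eval(exp, env)          # writes to the innermost map
        return None
    proc, *args = [mini_eval(e, env) for e in x]    # (proc exp...)
    return proc(*args)

# (define area (lambda (r) (* 3.141592653 (* r r))))
mini_eval(['define', 'area',
           ['lambda', ['r'], ['*', 3.141592653, ['*', 'r', 'r']]]])
# (area 3)
result = mini_eval(['area', 3])
print(result)   # approximately 28.2743
```

During the call, r is found in the child mapping created for the call, while * is found one level out, in the global mapping, which mirrors the lookup order described in the text.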
Now, we can explain the details of the Env class:
class Env(dict):
    "An environment: a dict of {'var':val} pairs, with an outer Env."
    def __init__(self, parms=(), args=(), outer=None):
        self.update(zip(parms, args))
        self.outer = outer
    def find(self, var):
        "Find the innermost Env where var appears."
        return self if var in self else self.outer.find(var)
Note that Env is a subclass of dict, so the ordinary dictionary operations also work on Env objects. In addition, the class has two methods: the constructor __init__ and find, which locates the right environment for a variable. The key to understanding this class (and the reason we need a class at all rather than a plain dict) is the notion of an outer environment. Consider the following program:
(define make-account
  (lambda (balance)
    (lambda (amt)
      (begin (set! balance (+ balance amt)) balance))))
(define a1 (make-account 100.00))
(a1 -20.00)
Each rectangular box represents an environment, and the color of the box matches the color of the variables newly defined in that environment. In the last two lines of the program we define a1 and call (a1 -20.00); this represents creating a bank account with an opening balance of $100 and then withdrawing $20. While evaluating (a1 -20.00), we evaluate the expression highlighted in yellow, which contains three variables. amt is found immediately in the innermost (green) environment. But balance is not defined there: we have to look in the green environment's outer environment, the blue one. Finally, the variable + is not defined in either of those; we have to go one more step outward, to the global (red) environment. This process of looking first in inner environments and then in successively outer ones is called lexical scoping. Env.find finds the right environment according to lexical scoping rules.
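The three nested environments of the make-account example can be built by hand to watch the lookup happen. The following is a Python 3 sketch of the Env class with one addition that is not in the article: find raises LookupError for an unbound variable instead of failing on a None outer environment. The variable names global_env, balance_env, and amt_env are chosen for this illustration.

```python
class Env(dict):
    """An environment: a dict of {'var': val} pairs, with an outer Env."""
    def __init__(self, parms=(), args=(), outer=None):
        super().__init__(zip(parms, args))
        self.outer = outer

    def find(self, var):
        """Return the innermost Env in which var appears."""
        if var in self:
            return self
        if self.outer is None:
            # Added for this sketch; the article's version has no such check.
            raise LookupError('unbound variable: %s' % var)
        return self.outer.find(var)

# The three environments from the walk-through, innermost last.
global_env = Env(['+'], [lambda a, b: a + b])                 # defines +
balance_env = Env(['balance'], [100.00], outer=global_env)    # (make-account 100.00)
amt_env = Env(['amt'], [-20.00], outer=balance_env)           # (a1 -20.00)

# Evaluate (set! balance (+ balance amt)) by hand, starting in the innermost Env.
plus = amt_env.find('+')['+']
new_balance = plus(amt_env.find('balance')['balance'],
                   amt_env.find('amt')['amt'])
amt_env.find('balance')['balance'] = new_balance   # updates the middle Env

print(balance_env['balance'])   # 80.0
```

Note that the assignment goes through find, so it changes the binding in the environment where balance already lives rather than creating a new binding in the innermost one. That is what lets the account keep its state between calls.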
All that remains is to define the global environment. It needs to contain + and all the other Scheme built-in procedures. We won't implement them all, but by importing Python's math module we get a good number of them, and then we explicitly add about 20 common ones:
def add_globals(env):
    "Add some Scheme standard procedures to an environment."
    import math, operator as op
    env.update(vars(math)) # sin, sqrt, ...
    env.update(
     {'+':op.add, '-':op.sub, '*':op.mul, '/':op.div, 'not':op.not_,  # op.div is Python 2 only
      '>':op.gt, '<':op.lt, '>=':op.ge, '<=':op.le, '=':op.eq,
      'equal?':op.eq, 'eq?':op.is_, 'length':len, 'cons':lambda x,y:[x]+y,
      'car':lambda x:x[0], 'cdr':lambda x:x[1:], 'append':op.add,
      'list':lambda *x:list(x), 'list?':lambda x:isa(x,list),
      'null?':lambda x:x==[], 'symbol?':lambda x:isa(x, Symbol)})
    return env

global_env = add_globals(Env())
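The listing above relies on operator.div, which exists only in Python 2. For readers on Python 3, a roughly equivalent table can be sketched as follows. The function name standard_env and the use of a plain dict are assumptions made for this illustration, and op.truediv changes the meaning of / slightly: it always performs true division, so (/ 7 2) gives 3.5 rather than 3.

```python
import math
import operator as op

def standard_env():
    """Build a plain dict of Scheme built-ins (Python 3 sketch)."""
    env = {}
    env.update(vars(math))            # sin, sqrt, pi, ...
    env.update({
        '+': op.add, '-': op.sub, '*': op.mul,
        '/': op.truediv,              # op.div does not exist in Python 3
        'not': op.not_,
        '>': op.gt, '<': op.lt, '>=': op.ge, '<=': op.le, '=': op.eq,
        'equal?': op.eq, 'eq?': op.is_, 'length': len,
        'cons': lambda x, y: [x] + y,
        'car': lambda x: x[0],
        'cdr': lambda x: x[1:],
        'append': op.add,
        'list': lambda *x: list(x),
        'list?': lambda x: isinstance(x, list),
        'null?': lambda x: x == [],
        'symbol?': lambda x: isinstance(x, str),
    })
    return env

env = standard_env()
print(env['+'](1, 2))            # 3
print(env['cons'](1, [2, 3]))    # [1, 2, 3]
print(env['sqrt'](16))           # 4.0
```

Every entry is just a Python callable stored under a Scheme name, which is why the procedure-call branch of eval can apply built-ins and user-defined lambdas in exactly the same way.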
PS1 (translator's note): Maxwell's equations are often described as a set of equations that can explain essentially any electromagnetic phenomenon in the universe. Alan Kay's point is that Lisp can be defined in terms of itself ("Lisp is defined in terms of Lisp"), and that this kind of bottom-up design is broadly instructive for software design.
Parsing: read and parse
Next comes the parse function. Parsing is traditionally divided into two parts: lexical analysis, in which the input character string is broken into a sequence of tokens, and syntactic analysis, in which the tokens are assembled into an internal representation. The tokens Lispy supports are parentheses, symbols (such as set! or x), and numbers (such as 2). It works like this:
>>> program = "(set! x*2 (* x 2))"
>>> tokenize(program)
['(', 'set!', 'x*2', '(', '*', 'x', '2', ')', ')']
>>> parse(program)
['set!', 'x*2', ['*', 'x', 2]]
There are many tools for lexical analysis (such as Mike Lesk and Eric Schmidt's lex). But we'll use a very simple tool: Python's str.split. We just add spaces around each parenthesis in the source program and then call str.split to get a list of tokens.
Next comes syntactic analysis. We've seen that Lisp syntax is very simple. Moreover, some Lisp interpreters make the parser's job even simpler by accepting any string that represents a list as a program. In other words, the string (set! 1 2) would be accepted as a syntactically valid program, and only at execution time would the interpreter complain that the first argument of set! should be a symbol, not a number. In Java or Python, the equivalent statement 1 = 2 would be flagged as an error at compile time. On the other hand, Java and Python are not required to detect the error in the expression x/0 at compile time, so there is no hard rule about when an error must be caught. Lispy implements parse using the function read, which reads any expression (number, symbol, or nested list).
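The point about when errors are detected can be checked in Python itself. This short Python 3 sketch uses the built-in compile and exec functions; the variable names are chosen for the illustration.

```python
# Python rejects "1 = 2" while compiling, before any code runs.
try:
    compile("1 = 2", "<example>", "exec")
    caught_at_compile_time = False
except SyntaxError:
    caught_at_compile_time = True

# "x / 0" compiles without complaint; the error appears only on execution.
code = compile("x = 1\ny = x / 0", "<example>", "exec")
try:
    exec(code, {})
    caught_at_run_time = False
except ZeroDivisionError:
    caught_at_run_time = True

print(caught_at_compile_time, caught_at_run_time)   # True True
```

A Lisp reader that accepts (set! 1 2) simply moves the first kind of check to the same place as the second: both are reported when the expression is evaluated.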
The function read works by calling read_from on the tokens obtained by tokenize. Given a list of tokens, we start by looking at the first token; if it is a ')' that's a syntax error. If it is a '(', then we start building up a list of expressions until we hit a matching ')'. Anything else must be a symbol or a number, which forms a complete expression by itself. The only remaining trick is knowing that '2' represents an integer, 2.0 represents a float, and x represents a symbol. We'll let Python make this distinction: for each token that is not a parenthesis, first try to interpret it as an int, then as a float, and finally as a symbol. Following these rules, we get the following program:
def read(s):
    "Read a Scheme expression from a string."
    return read_from(tokenize(s))

parse = read

def tokenize(s):
    "Convert a string into a list of tokens."
    return s.replace('(',' ( ').replace(')',' ) ').split()

def read_from(tokens):
    "Read an expression from a sequence of tokens."
    if len(tokens) == 0:
        raise SyntaxError('unexpected EOF while reading')
    token = tokens.pop(0)
    if '(' == token:
        L = []
        while tokens[0] != ')':
            L.append(read_from(tokens))
        tokens.pop(0) # pop off ')'
        return L
    elif ')' == token:
        raise SyntaxError('unexpected )')
    else:
        return atom(token)

def atom(token):
    "Numbers become numbers; every other token is a symbol."
    try: return int(token)
    except ValueError:
        try: return float(token)
        except ValueError:
            return Symbol(token)
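To see how the parser behaves on both well-formed and malformed input, here is a self-contained Python 3 sketch of the same three functions. It differs from the listing above in one respect, which is an assumption of this sketch: read_from checks for a missing closing parenthesis and raises SyntaxError, where the original would raise IndexError on tokens[0].

```python
def tokenize(s):
    """Split a string into tokens by padding parentheses with spaces."""
    return s.replace('(', ' ( ').replace(')', ' ) ').split()

def atom(token):
    """Numbers become int or float; every other token stays a str symbol."""
    try:
        return int(token)
    except ValueError:
        try:
            return float(token)
        except ValueError:
            return token

def read_from(tokens):
    """Read one expression from a list of tokens, consuming them."""
    if not tokens:
        raise SyntaxError('unexpected EOF while reading')
    token = tokens.pop(0)
    if token == '(':
        items = []
        # Added guard (not in the article): stop if the tokens run out.
        while tokens and tokens[0] != ')':
            items.append(read_from(tokens))
        if not tokens:
            raise SyntaxError('unexpected EOF while reading')
        tokens.pop(0)                      # discard the matching ')'
        return items
    if token == ')':
        raise SyntaxError('unexpected )')
    return atom(token)

def parse(s):
    """Parse a string containing one Scheme expression."""
    return read_from(tokenize(s))

print(tokenize("(set! x*2 (* x 2))"))
print(parse("(set! x*2 (* x 2))"))   # ['set!', 'x*2', ['*', 'x', 2]]
print(parse("(+ 1 2.5)"))            # ['+', 1, 2.5]
```

Because tokens.pop(0) mutates the shared list, each recursive call to read_from picks up exactly where the previous one stopped, so no explicit position index is needed.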
Finally, we'll add a function, to_string, to convert an expression back into a Lisp-readable string, and a function repl, which stands for read-eval-print loop, to form an interactive Lisp interpreter:
def to_string(exp):
    "Convert a Python object back into a Lisp-readable string."
    return '('+' '.join(map(to_string, exp))+')' if isa(exp, list) else str(exp)

def repl(prompt='lis.py> '):
    "A prompt-read-eval-print loop."
    while True:
        val = eval(parse(raw_input(prompt)))      # raw_input is Python 2
        if val is not None: print to_string(val)  # print statement is Python 2
Here it is at work:
>>> repl()
lis.py> (define area (lambda (r) (* 3.141592653 (* r r))))
lis.py> (area 3)
28.274333877
lis.py> (define fact (lambda (n) (if (<= n 1) 1 (* n (fact (- n 1))))))
lis.py> (fact 10)
3628800
lis.py> (fact 100)
9332621544394415268169923885626670049071596826438162146859296389521759999322991
5608941463976156518286253697920827223758251185210916864000000000000000000000000
lis.py> (area (fact 10))
4.1369087198e+13
lis.py> (define first car)
lis.py> (define rest cdr)
lis.py> (define count (lambda (item L) (if L (+ (equal? item (first L)) (count item (rest L))) 0)))
lis.py> (count 0 (list 0 1 2 3 0 0))
3
lis.py> (count (quote the) (quote (the more the merrier the bigger the better)))
4
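The printing half of the loop can be tried on its own. Below is a Python 3 sketch of to_string together with a repl variant; Python 3 replaces raw_input with input and makes print a function. Passing evaluate and parse in as parameters is a choice made for this sketch so that the block stays self-contained, not how the article's repl is written.

```python
def to_string(exp):
    """Convert a nested Python list back into a Lisp-readable string."""
    if isinstance(exp, list):
        return '(' + ' '.join(map(to_string, exp)) + ')'
    return str(exp)

def repl(evaluate, parse, prompt='lis.py> '):
    """A prompt-read-eval-print loop (Python 3 sketch; runs until interrupted).

    evaluate and parse are supplied by the caller; these parameters are
    assumptions of this sketch.
    """
    while True:
        val = evaluate(parse(input(prompt)))
        if val is not None:
            print(to_string(val))

print(to_string(['set!', 'x*2', ['*', 'x', 2]]))   # (set! x*2 (* x 2))
print(to_string([]))                               # ()
print(to_string(3.5))                              # 3.5
```

Since to_string is the inverse of the reader for lists, symbols, and numbers, printing a parsed expression gives back text that the parser accepts again.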
How small, fast, complete, and good is Lispy?
We judge Lispy on the following criteria:
* Small: Lispy is very small: 90 lines excluding comments and blank lines, and under 4K of source code. (An earlier version had 96 lines; following Eric Cooper's suggestion, I removed the Procedure class definition and used a Python lambda instead.) The smallest version of Jscheme, my Scheme interpreter in Java, was 1664 lines and 57K of source. Jscheme was originally called SILK (Scheme in Fifty Kilobytes), but I only stayed under that limit by counting bytecode rather than source code. Lispy does much better; I think it meets Alan Kay's 1972 claim that you could define the "most powerful language in the world" in "a page of code".
bash$ grep "^\s*[^#\s]" lis.py | wc
      90     398    3423
* Fast: Lispy computes (fact 100) in 0.004 seconds. That's fast enough for me (although far slower than most other ways of computing it).
* Complete: Lispy is not very complete compared to the Scheme standard. The main shortcomings are:
(1) Syntax: missing comments, the quote/quasiquote notation (that is, ' and `), # literals (such as #\a), derived expression types (such as cond, derived from if, or let, derived from lambda), and dotted list notation.
(2) Semantics: missing call/cc and tail recursion.
(3) Data types: missing strings, characters, booleans, ports, vectors, and exact/inexact numbers. Python lists are actually closer to Scheme vectors than to the Scheme pairs and lists we implement with them.
(4) Procedures: missing more than 100 primitive procedures: all the ones for the missing data types, plus some others (such as set-car! and set-cdr!, because we can't fully implement set-cdr! using Python lists).
(5) Error recovery: Lispy does not attempt to detect errors, report them reasonably, or recover from them. Lispy expects the programmer to be perfect.
* Good: That's up to the readers to decide. I found it was good enough for my purpose of explaining Lisp interpreters.
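Point (4) in the list above, that set-cdr! cannot be fully implemented on Python lists, follows from how cdr is defined. The sketch below reuses the article's cons, car, and cdr definitions and contrasts them with a hypothetical Pair class that is not part of Lispy.

```python
# The article's list primitives, built on Python lists.
cons = lambda x, y: [x] + y
car = lambda x: x[0]
cdr = lambda x: x[1:]          # slicing builds a NEW list on every call

lst = cons(1, [2, 3])
tail = cdr(lst)
tail[0] = 99                   # mutate the "tail"
print(lst)                     # [1, 2, 3]: the original list is unaffected
print(tail is cdr(lst))        # False: each cdr call returns a fresh copy

# A hypothetical pair-based representation (not in Lispy) shares structure,
# which is what Scheme's set-car! and set-cdr! rely on.
class Pair:
    def __init__(self, first, rest):
        self.car, self.cdr = first, rest

p = Pair(1, Pair(2, None))
rest = p.cdr
rest.car = 99                  # the change is visible through p
print(p.cdr.car)               # 99
```

With slices, the tail of a list has no identity of its own, so there is nothing for set-cdr! to modify in place; a linked pair representation would be needed for that.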
The True Story
To back up the idea that it can be very helpful to know how interpreters work, here is a story. Back in 1984 I was writing my Ph.D. thesis. This was before LaTeX and before Microsoft Word; we used troff. Unfortunately, troff had no facility for forward references to symbolic labels: I wanted to be able to write "As we will see on page @theoremx" and then write something like "@(set theoremx \n%)" in the appropriate place (the troff register \n% holds the page number). My fellow graduate student Tony DeRose felt the same need, and together we sketched out a simple Lisp program to handle this as a preprocessor. However, it turned out that the Lisp we had at the time was good at reading Lisp expressions but so slow at reading non-Lisp text one character at a time that our program was annoying to use.
From there Tony and I took different paths. He reasoned that the hard part was the interpreter for expressions; he needed Lisp for that, but he knew how to write a short C routine to handle the non-Lisp characters and how to link it into the Lisp program. I didn't know how to do that linking, but I reasoned that writing an interpreter for this trivial language (all it had was setting a variable, fetching a variable, and string concatenation) was easy, so I wrote an interpreter in C. So, ironically, Tony wrote a Lisp program because he was a C programmer, and I wrote a C program because I was a Lisp programmer.
In the end, we both got our theses done.
Entire interpreter
To recap, here is all the code for Lispy (also available for download as lis.py):
# -*- coding: utf-8 -*-
# Source code omitted here. For the full listing, see the lis.py download mentioned above.