This article has two purposes: to describe in general how to implement an interpreter for a computer language, and to show in particular how to implement a subset of Scheme, a dialect of Lisp, using Python. I call my interpreter Lispy (lis.py). A few years ago I showed how to write a Scheme interpreter in Java, and I also wrote a version in Common Lisp. This time, my goal is to demonstrate as simply and concisely as possible what Alan Kay called "Maxwell's Equations of Software."
Syntax and Semantics of the Scheme Subset Supported by Lispy
Most computer languages have a variety of syntactic conventions (keywords, infix operators, brackets, operator precedence, dot notation, semicolons, and so on), but as a member of the Lisp family of languages, all of Scheme's syntax is based on lists written in parenthesized prefix notation. This form may seem unfamiliar, but it has the virtues of simplicity and consistency. (Some have joked that "Lisp" stands for "Lots of Irritating Silly Parentheses"; I think it stands for "Lisp Is Syntactically Pure.") Consider the following example:
    /* Java */
    if (x.val() > 0) {
        z = f(a * x.val() + b);
    }

    ;; Scheme
    (if (> (val x) 0)
        (set! z (f (+ (* a (val x)) b))))
Note that the exclamation point above is not a special character in Scheme; it is simply part of the name "set!". In Scheme, only parentheses are special characters. A list such as (set! x y) that begins with a special keyword is called a special form. The beauty of Scheme is that we need only six special forms, plus three other syntactic constructions: variables, constants, and procedure calls.
In this table, var must be a symbol (an identifier such as x or square), number must be an integer or floating-point number, and the other italicized words can be any expression. The notation exp... means zero or more repetitions of exp.
For more on Scheme, consult some of the fine books (by Friedman and Felleisen, Dybvig, Queinnec, Harvey and Wright, or Sussman and Abelson), videos (by Abelson and Sussman), tutorials (by Dorai, PLT, or Neller), or the reference manual.
The role of the language interpreter
A language interpreter consists of two parts:
1. Parsing: The parsing component takes an input program in the form of a sequence of characters, verifies it according to the syntactic rules of the language, and translates the program into an internal representation. In a simple interpreter, the internal representation is a tree structure that closely mirrors the nested structure of statements or expressions in the source program. In a language translator called a compiler, the internal representation is a sequence of instructions that can be executed directly by the computer (translator's note: the author presumably means the runtime system). As Steve Yegge said, "If you don't know how compilers work, then you don't know how computers work." Yegge describes 8 scenarios that can be solved with compilers (or equally with interpreters, or alternatively with Yegge's typical heavy dose of cynicism). The Lispy parser is implemented by the function parse.
2. Execution: The internal representation is then processed according to the semantic rules of the language, thereby carrying out the computation described by the source program. Lispy's execution component is implemented by the function eval (note that this shadows Python's built-in function of the same name).
The following picture describes the interpreter flow, and the interactive session (after the picture) shows how the parse and eval functions operate on a small program:
Here we use the simplest possible internal representation: Scheme lists, numbers, and symbols are represented as Python lists, numbers, and strings, respectively.
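To make this representation concrete, here is a small sketch written for Python 3. The sample program and the helper names describe and symbols_in are illustrative assumptions, not part of Lispy; the tree is written out by hand in the shape that parse is described as producing.

```python
# Hypothetical sample program and its hand-written internal representation:
#   (begin (define r 3) (* 3.141592653 (* r r)))
tree = ['begin', ['define', 'r', 3], ['*', 3.141592653, ['*', 'r', 'r']]]

def describe(node):
    """Name the Scheme kind that a Python value stands for (illustrative helper)."""
    if isinstance(node, str):
        return 'symbol'
    if isinstance(node, list):
        return 'list'
    return 'number'

def symbols_in(node):
    """Collect every symbol in the tree, depth first, left to right."""
    if isinstance(node, str):
        return [node]
    if isinstance(node, list):
        return [sym for child in node for sym in symbols_in(child)]
    return []

print(describe(tree), describe(tree[0]), describe(tree[1][2]))
print(symbols_in(tree))
```

Because the tree is made of ordinary Python lists, strings, and numbers, walking it needs nothing beyond isinstance checks, which is the same approach the eval function takes.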
Execution: eval
Here is the definition of the eval function. Each of the nine cases in the table above takes one to three lines of code, and eval needs nothing more than these nine cases:
    def eval(x, env=global_env):
        "Evaluate an expression in an environment."
        if isa(x, Symbol):             # variable reference
            return env.find(x)[x]
        elif not isa(x, list):         # constant literal
            return x
        elif x[0] == 'quote':          # (quote exp)
            (_, exp) = x
            return exp
        elif x[0] == 'if':             # (if test conseq alt)
            (_, test, conseq, alt) = x
            return eval((conseq if eval(test, env) else alt), env)
        elif x[0] == 'set!':           # (set! var exp)
            (_, var, exp) = x
            env.find(var)[var] = eval(exp, env)
        elif x[0] == 'define':         # (define var exp)
            (_, var, exp) = x
            env[var] = eval(exp, env)
        elif x[0] == 'lambda':         # (lambda (var*) exp)
            (_, vars, exp) = x
            return lambda *args: eval(exp, Env(vars, args, env))
        elif x[0] == 'begin':          # (begin exp*)
            for exp in x[1:]:
                val = eval(exp, env)
            return val
        else:                          # (proc exp*)
            exps = [eval(exp, env) for exp in x]
            proc = exps.pop(0)
            return proc(*exps)

    isa = isinstance

    Symbol = str
That is all there is to eval, except for environments. An environment is simply a mapping from symbols to the values they stand for. New symbol/value bindings are added by a define or by a procedure (a lambda expression).
Let's look at what happens when we define and then call a Scheme procedure (the lis.py> prompt means we are talking to the Lisp interpreter, not to Python):
    lis.py> (define area (lambda (r) (* 3.141592653 (* r r))))
    lis.py> (area 3)
    28.274333877
When we evaluate (lambda (r) (* 3.141592653 (* r r))), we take the elif x[0] == 'lambda' branch of eval, assigning the three variables (_, vars, exp) to the corresponding elements of the list x (and signaling an error if x is not of length 3). We then create a new procedure that, when called, evaluates the expression ['*', 3.141592653, ['*', 'r', 'r']] in an environment formed by binding the procedure's formal parameters (in this case just one, r) to the arguments supplied in the call, plus any variables in the current environment that are not in the parameter list (for example, the variable *). The newly created procedure is then assigned as the value of area in global_env.
So what happens when we evaluate (area 3)? Because area is not one of the special form symbols, this must be a procedure call (the final else: branch of eval), so every expression in the list is evaluated in turn. Evaluating area yields the procedure we just created, and evaluating 3 yields 3. We then call the newly created procedure with the argument list [3] (per the last line of eval). That means evaluating exp, which is ['*', 3.141592653, ['*', 'r', 'r']], in an environment where r is 3 and whose outer environment is the global environment, so * is the multiplication procedure.
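The call described above can be traced with a stripped-down sketch in Python 3. It is not Lispy's code: tiny_eval and make_procedure are hypothetical names, only variable lookup, constants, and procedure calls are handled, and the standard library's collections.ChainMap stands in for the Env class introduced below.

```python
from collections import ChainMap
import operator as op

global_env = {'*': op.mul}                      # just enough for this example
body = ['*', 3.141592653, ['*', 'r', 'r']]      # parsed body of the lambda

def tiny_eval(x, env):
    """Evaluate symbols, constants, and procedure calls only (no special forms)."""
    if isinstance(x, str):                      # variable reference
        return env[x]
    if not isinstance(x, list):                 # constant literal
        return x
    values = [tiny_eval(part, env) for part in x]
    proc = values.pop(0)                        # first value is the procedure
    return proc(*values)

def make_procedure(params, body, env):
    """Build a closure: each call layers a fresh binding frame over env."""
    return lambda *args: tiny_eval(body, ChainMap(dict(zip(params, args)), env))

area = make_procedure(['r'], body, global_env)
print(area(3))
```

Looking up r succeeds in the innermost frame, while * falls through to global_env, which mirrors the lookup order described in the paragraph above.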
Now, we can explain the details of the Env class:
    class Env(dict):
        "An environment: a dict of {'var': val} pairs, with an outer Env."
        def __init__(self, parms=(), args=(), outer=None):
            self.update(zip(parms, args))
            self.outer = outer
        def find(self, var):
            "Find the innermost Env where var appears."
            return self if var in self else self.outer.find(var)
Note that Env is a subclass of dict, so the usual dictionary operations also work on Env objects. In addition, the class has two methods: the constructor __init__, and find, which locates the right environment for a variable. The key to understanding this class (and the reason we need a class at all, rather than just a dict) is the notion of an outer environment. Consider the following program:
    (define make-account
      (lambda (balance)
        (lambda (amt)
          (begin (set! balance (+ balance amt)) balance))))

    (define a1 (make-account 100.00))
    (a1 -20.00)
Each rectangle represents an environment, and the color of the rectangle matches the color of the variables newly defined in that environment. In the last two lines of the program we define a1 and call (a1 -20.00), which creates a bank account with a $100 opening balance and then withdraws $20. While evaluating (a1 -20.00), we evaluate the expression highlighted in yellow, which contains three variables. amt is found immediately in the innermost (green) environment. But balance is not defined there: we have to look in the green environment's outer environment, the blue one. Finally, the variable + is not found in either of those; we need one more outer step, to the global (red) environment. This process of looking first in inner environments and then in successively outer ones is called lexical scoping. Env.find is responsible for locating the right environment according to the lexical scoping rules.
All that remains is to define the global environment. It needs to contain + and all the other Scheme built-in procedures. We won't implement every one of them, but importing Python's math module gives us several, and we can then explicitly add 20 common ones:
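The three nested environments can be built by hand to watch the lookup happen. This is a sketch in Python 3 that repeats the Env class from above so it runs on its own; the variable names account_env and call_env are assumptions chosen to match the blue and green rectangles.

```python
import operator as op

class Env(dict):
    "An environment: a dict of {'var': val} pairs, with an outer Env."
    def __init__(self, parms=(), args=(), outer=None):
        self.update(zip(parms, args))
        self.outer = outer
    def find(self, var):
        "Find the innermost Env where var appears."
        return self if var in self else self.outer.find(var)

global_env = Env(['+'], [op.add])                      # red: built-ins
account_env = Env(['balance'], [100.00], global_env)   # blue: (make-account 100.00)
call_env = Env(['amt'], [-20.00], account_env)         # green: (a1 -20.00)

# (set! balance (+ balance amt)), carried out step by step
plus = call_env.find('+')['+']
total = plus(call_env.find('balance')['balance'], call_env.find('amt')['amt'])
call_env.find('balance')['balance'] = total

print(account_env['balance'])
```

The assignment lands in the blue environment rather than creating a new balance in the green one, which is why the account keeps its state between calls. Note that in this class, looking up a name that is bound nowhere eventually calls find on None and fails with an AttributeError.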
    def add_globals(env):
        "Add some Scheme standard procedures to an environment."
        import math, operator as op
        env.update(vars(math)) # sin, sqrt, ...
        # op.div is Python 2 only; on Python 3 use op.truediv
        env.update(
         {'+':op.add, '-':op.sub, '*':op.mul, '/':op.div, 'not':op.not_,
          '>':op.gt, '<':op.lt, '>=':op.ge, '<=':op.le, '=':op.eq,
          'equal?':op.eq, 'eq?':op.is_, 'length':len, 'cons':lambda x,y:[x]+y,
          'car':lambda x:x[0], 'cdr':lambda x:x[1:], 'append':op.add,
          'list':lambda *x:list(x), 'list?':lambda x:isa(x,list),
          'null?':lambda x:x==[], 'symbol?':lambda x:isa(x, Symbol)})
        return env

    global_env = add_globals(Env())
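As a quick check of how a few of these entries behave on Python lists, here is a reduced sketch for Python 3. The name sketch_env is an assumption, a plain dict is used in place of Env, and op.truediv replaces the Python 2 op.div.

```python
import math
import operator as op

sketch_env = {}
sketch_env.update(vars(math))          # sin, sqrt, pi, ...
sketch_env.update({
    '+': op.add,
    '/': op.truediv,                   # Python 3 replacement for op.div
    'cons': lambda x, y: [x] + y,
    'car': lambda x: x[0],
    'cdr': lambda x: x[1:],
    'append': op.add,                  # list concatenation
    'null?': lambda x: x == [],
})

print(sketch_env['cons'](1, [2, 3]))
print(sketch_env['cdr']([1, 2, 3]))
print(sketch_env['sqrt'](16))
```

Since op.add works on both numbers and lists, the same Python function can serve as Scheme's + and as its append.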
PS1: It has been said of Maxwell's equations that, broadly speaking, any electromagnetic phenomenon in the universe can be explained by this set of equations. Alan Kay's point is that Lisp is defined in terms of itself ("Lisp was defined in terms of Lisp"), and that this bottom-up design is a broadly useful reference for software design. (Translator's note)
Parsing: read and parse
Next comes the parse function. Parsing is traditionally split into two parts: lexical analysis and syntactic analysis. The former breaks the input string into a sequence of tokens; the latter assembles the tokens into an internal representation. The tokens Lispy supports are parentheses, symbols (such as set! or x), and numbers (such as 2). Here is how it works:
    >>> program = "(set! x*2 (* x 2))"
    >>> tokenize(program)
    ['(', 'set!', 'x*2', '(', '*', 'x', '2', ')', ')']
    >>> parse(program)
    ['set!', 'x*2', ['*', 'x', 2]]
There are many tools for lexical analysis (such as Mike Lesk and Eric Schmidt's lex). We will use a very simple one: Python's str.split. We just add spaces around each parenthesis in the source program and then call str.split to get a list of tokens.
Next comes syntactic analysis. We have seen that Lisp syntax is very simple. Some Lisp interpreters make the job of syntactic analysis easier still by accepting as a program any string of characters that represents a list. In other words, the string (set! 1 2) is accepted as a syntactically valid program, and only at execution time does the interpreter complain that the first argument of set! should be a symbol, not a number. In Java or Python, the equivalent statement 1 = 2 is reported as an error at compile time. On the other hand, Java and Python are not required to detect at compile time that the expression x/0 is an error, so, as you can see, there are no strict rules about when an error should be recognized. Lispy implements parse using the read function, which reads any expression (a number, a symbol, or a nested list).
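The point about when errors are recognized can be observed directly in Python 3 with the built-in compile and exec functions. This is a sketch; the helper name when_does_it_fail and the sample sources are assumptions made for illustration.

```python
def when_does_it_fail(source):
    """Report whether a snippet is rejected at compile time or fails at run time."""
    try:
        code = compile(source, '<sketch>', 'exec')
    except SyntaxError:
        return 'compile time'
    try:
        exec(code, {})                 # run in a fresh, empty namespace
    except ZeroDivisionError:
        return 'run time'
    return 'never'

print(when_does_it_fail('1 = 2'))              # assigning to a literal
print(when_does_it_fail('y = 1\nz = y / 0'))   # division by zero
```

The first snippet never gets as far as running, while the second compiles without complaint and only fails once it executes. A parser as permissive as the one described here would defer both kinds of error to eval.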
The tokenize function produces a list of tokens, and read works by calling read_from on those tokens. Given a list of tokens, we start by looking at the first one; if it is a ')', that is a syntax error. If it is a '(', we start building a list of expressions until we reach the matching ')'. Every other token must be a symbol or a number, each of which forms a complete expression by itself. The only remaining subtlety is knowing that '2' represents an integer, '2.0' represents a floating-point number, and 'x' represents a symbol. We let Python make that distinction: for each token that is not a parenthesis, we first try to interpret it as an int, then as a float, and finally as a symbol. Following these rules, we get the following program:
    def read(s):
        "Read a Scheme expression from a string."
        return read_from(tokenize(s))

    parse = read

    def tokenize(s):
        "Convert a string into a list of tokens."
        return s.replace('(',' ( ').replace(')',' ) ').split()

    def read_from(tokens):
        "Read an expression from a sequence of tokens."
        if len(tokens) == 0:
            raise SyntaxError('unexpected EOF while reading')
        token = tokens.pop(0)
        if '(' == token:
            L = []
            while tokens[0] != ')':
                L.append(read_from(tokens))
            tokens.pop(0) # pop off ')'
            return L
        elif ')' == token:
            raise SyntaxError('unexpected )')
        else:
            return atom(token)

    def atom(token):
        "Numbers become numbers; every other token is a symbol."
        try: return int(token)
        except ValueError:
            try: return float(token)
            except ValueError:
                return Symbol(token)
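To see how this parser treats well-formed and malformed input, the sketch below repeats the three functions in Python 3 (with str in place of Symbol) and adds a hypothetical wrapper, try_parse, that returns either the parsed expression or the name of the exception raised.

```python
def tokenize(s):
    return s.replace('(', ' ( ').replace(')', ' ) ').split()

def atom(token):
    try:
        return int(token)
    except ValueError:
        try:
            return float(token)
        except ValueError:
            return str(token)          # a symbol

def read_from(tokens):
    if len(tokens) == 0:
        raise SyntaxError('unexpected EOF while reading')
    token = tokens.pop(0)
    if token == '(':
        items = []
        while tokens[0] != ')':
            items.append(read_from(tokens))
        tokens.pop(0)                  # pop off ')'
        return items
    elif token == ')':
        raise SyntaxError('unexpected )')
    return atom(token)

def try_parse(source):
    """Illustrative wrapper: the parsed expression, or the exception's name."""
    try:
        return read_from(tokenize(source))
    except (SyntaxError, IndexError) as err:
        return type(err).__name__

for source in ['(a (b 2.0) 3)', ')', '', '(+ 1']:
    print(repr(source), '->', try_parse(source))
```

One detail worth noticing: an unterminated list such as (+ 1 runs out of tokens inside the while loop, so with the code as written it surfaces as an IndexError rather than the SyntaxError raised for empty input.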
Finally, we add a function to_string, which converts an expression back into a Lisp-readable string, and a function repl, which stands for read-eval-print loop, to form an interactive Lisp interpreter:
    def to_string(exp):
        "Convert a Python object back into a Lisp-readable string."
        return '('+' '.join(map(to_string, exp))+')' if isa(exp, list) else str(exp)

    def repl(prompt='lis.py> '):
        "A prompt-read-eval-print loop."
        # Python 2 syntax; on Python 3 use input() and print(...)
        while True:
            val = eval(parse(raw_input(prompt)))
            if val is not None: print to_string(val)
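The conversion back to text can be tried on its own. This Python 3 sketch restates to_string with isinstance written out, so it does not depend on the isa alias.

```python
def to_string(exp):
    "Convert a Python object back into a Lisp-readable string."
    if isinstance(exp, list):
        return '(' + ' '.join(map(to_string, exp)) + ')'
    return str(exp)

print(to_string(['set!', 'x*2', ['*', 'x', 2]]))
print(to_string([]))
print(to_string(True))
```

The last line shows a rough edge: because Lispy borrows Python's values directly, a true result prints as True rather than in Scheme's own notation for booleans.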
Here is an example of how the function works:
    >>> repl()
    lis.py> (define area (lambda (r) (* 3.141592653 (* r r))))
    lis.py> (area 3)
    28.274333877
    lis.py> (define fact (lambda (n) (if (<= n 1) 1 (* n (fact (- n 1))))))
    lis.py> (fact 10)
    3628800
    lis.py> (fact 100)
    93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000
    lis.py> (area (fact 10))
    4.1369087198e+13
    lis.py> (define first car)
    lis.py> (define rest cdr)
    lis.py> (define count (lambda (item L) (if L (+ (equal? item (first L)) (count item (rest L))) 0)))
    lis.py> (count 0 (list 0 1 2 3 0 0))
    3
    lis.py> (count (quote the) (quote (the more the merrier the bigger the better)))
    4
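The count procedure in this session leans on two Python behaviors that Lispy inherits: an empty list is false in a conditional, and a boolean can be added to an integer. The sketch below transcribes the same logic into Python 3, using the operator functions that the global environment binds to equal? and +.

```python
import operator as op

def count(item, L):
    # (if L (+ (equal? item (first L)) (count item (rest L))) 0)
    if L:                                    # an empty list is false
        return op.add(op.eq(item, L[0]),     # True counts as 1, False as 0
                      count(item, L[1:]))
    return 0

print(count(0, [0, 1, 2, 3, 0, 0]))
print(count('the', 'the more the merrier the bigger the better'.split()))
```

In standard Scheme, adding the result of equal? to a number would be an error, so this definition works only because of the host language.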
How small, how fast, how complete, how good is Lispy?
We use the following criteria to evaluate Lispy:
* Small: Lispy is very small: not counting comments and blank lines, its source is only 90 lines and under 4K. (An earlier version had 96 lines; at Eric Cooper's suggestion I removed the Procedure class definition and used Python's lambda instead.) The smallest version of Jscheme, my Scheme interpreter in Java, was 1664 lines and 57K of source. Jscheme was originally called SILK (Scheme in Fifty Kilobytes), but I only stayed under that limit by counting bytecode rather than source code. Lispy does much better; I think it meets Alan Kay's 1972 claim that you could define the "most powerful language in the world" in "a page of code."
    bash$ grep "^\s*[^#\s]" lis.py | wc
          90     398    3423
* Fast: Lispy computes (fact 100) in only 0.004 seconds. That is fast enough for me (although far slower than most other ways of computing it).
* Complete: Lispy is not very complete compared to the Scheme standard. The major shortcomings are:
(1) Syntax: missing comments, quote/quasiquote notation (that is, ' and `; translator's note), # literals (for example #\a; translator's note), derived expression types (such as cond, derived from if, or let, derived from lambda), and dotted list notation.
(2) Semantics: missing call/cc and tail recursion.
(3) Data types: missing strings, characters, booleans, ports, vectors, and exact/inexact numbers. Python lists are actually closer to Scheme vectors than to Scheme pairs and lists.
(4) Procedures: missing more than 100 primitive procedures: all those for the missing data types, plus some others (such as set-car! and set-cdr!, because set-cdr! cannot be fully implemented on top of Python lists).
(5) Error recovery: Lispy does not attempt to detect errors, report them reasonably, or recover from them. Lispy expects the programmer to be perfect.
* Good: That is for the reader to decide. I found it good enough for my purpose of explaining Lisp interpreters.
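The remark about set-cdr! can be illustrated with the list-based cons and cdr from the global environment. This is a sketch in Python 3 showing only the sharing problem; it is not an attempt at a fix.

```python
cons = lambda x, y: [x] + y
cdr = lambda x: x[1:]

numbers = [1, 2, 3]
tail = cdr(numbers)             # slicing builds a fresh list, not a view
extended = cons(0, numbers)     # concatenation also copies

tail[0] = 99                    # mutate the "cdr"
print(numbers, tail, extended)
```

With Scheme pairs, the cdr of a list shares structure with the list itself, so changing one is visible through the other. Here every cdr and cons produces an independent copy, so there is no shared tail for a set-cdr! to modify.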
The True Story
Knowing how interpreters work can be very helpful, and I have a story to back that up. In 1984 I was writing my doctoral dissertation. This was before LaTeX and before Microsoft Word; we used troff. Unfortunately, troff had no facility for forward references to symbolic labels: I wanted to be able to write "As we will see on page @theoremx" and then write something like "@(set theoremx \n%)" in the appropriate place (the troff register \n% holds the page number). My fellow graduate student Tony DeRose had the same need, and together we sketched out a simple Lisp program to handle this as a preprocessor. However, it turned out that the Lisp we had at the time was good at reading Lisp expressions but so slow at reading non-Lisp text one character at a time that our program was painful to use.
Tony and I went different ways from there. He thought the hard part was the interpreter for expressions; he needed Lisp for that, but he knew how to write a short C routine to handle the non-Lisp characters and how to link it into the Lisp program. I did not know how to do that kind of linking, but I figured that writing an interpreter for this trivial language (all it had was setting a variable, fetching a variable's value, and string concatenation) was easy, so I wrote an interpreter in C. So, ironically, Tony wrote a Lisp program because he was a C programmer, and I wrote a C program because I was a Lisp programmer.
In the end, we both finished our dissertations.
Entire interpreter
To summarize, here is all the code for Lispy (also available for download as lis.py):
    # -*- coding: utf-8 -*-
    # Source code omitted here. For the complete source, see the download address above.