[Translation] Writing a virtual machine interpreter in Python


Original article: Making a simple VM interpreter in Python

Update: I made minor changes to the code based on your comments. Thanks to Robin-gvx, bs4h and Dagur; see here for the full code.

A stack machine has no registers of its own. It puts every value it needs to process on a stack and works on them there. Stack machines are simple but powerful, which is why Python, Java, PostScript, Forth and other languages use one as their virtual machine.

First, a bit more about stacks. We need a stack of instruction pointers to hold return addresses, so that when we call a subroutine (like calling a function) we can return to where the call was made. We could do this with self-modifying code instead, as Donald Knuth's original MIX did, but then you would have to maintain a stack yourself to make recursion work. In this article I don't actually implement subroutine calls, but they are not hard to add (consider it an exercise).

The data stack gives you a lot for free. For example, consider an expression like (2+3)*4. On a stack machine, the equivalent code is 2 3 + 4 *. First, 2 and 3 are pushed onto the stack. The next instruction is the operator +, which pops the two values and pushes their sum. Then 4 is pushed, the top two values are popped, and their product is pushed back. Simple!
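To make the walkthrough above concrete, here is a minimal sketch that evaluates 2 3 + 4 * using a plain Python list as the stack and prints the stack after every step. The function name eval_rpn is made up for this illustration and is not part of the article's virtual machine; it only knows + and *.

```python
def eval_rpn(tokens):
    """Evaluate postfix tokens such as [2, 3, "+", 4, "*"] on a list-based stack."""
    stack = []
    for tok in tokens:
        if isinstance(tok, int):
            stack.append(tok)              # operands are simply pushed
        elif tok == "+":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)            # pop two values, push their sum
        elif tok == "*":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)            # pop two values, push their product
        else:
            raise ValueError("unknown token: %r" % (tok,))
        print("%-3s -> %s" % (tok, stack)) # trace the stack after each step
    return stack[-1]


result = eval_rpn([2, 3, "+", 4, "*"])
```

The trace shows the stack growing to [2, 3], shrinking to [5], growing to [5, 4] and ending at [20].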

Let's start by writing a simple stack class that inherits from collections.deque:

    from collections import deque

    class Stack(deque):
        push = deque.append

        def top(self):
            return self[-1]

Now we have three methods: push, pop and top. The top method looks at the top element of the stack without removing it.

Next, we implement the virtual machine class. The virtual machine needs two stacks and some memory to store the program itself (Translator's note: what "program" means here will become clear below). Thanks to Python's dynamic typing we can put values of any type into a list. The only problem is that we can't tell which strings are plain strings and which are names of built-in functions. The proper approach would be to put only real Python functions into the list. I may do that in the future.
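As a rough sketch of that "only real Python functions" idea, the program below is a list of callables, so a string literal can never be mistaken for an operation. All names here (TinyMachine, lit, add, mul) are hypothetical and are not used in the rest of the article.

```python
class TinyMachine:
    """Hypothetical variant of the machine whose code list holds only callables."""

    def __init__(self, code):
        self.stack = []
        self.code = code
        self.ip = 0

    def run(self):
        while self.ip < len(self.code):
            op = self.code[self.ip]
            self.ip += 1
            op(self)            # every item is a function: no lookup, no type checks
        return self.stack


def lit(value):
    """Build an instruction that pushes a constant (number or string)."""
    def push_value(machine):
        machine.stack.append(value)
    return push_value


def add(machine):
    machine.stack.append(machine.stack.pop() + machine.stack.pop())


def mul(machine):
    machine.stack.append(machine.stack.pop() * machine.stack.pop())


final_stack = TinyMachine([lit(2), lit(3), add, lit(4), mul]).run()
```

With this layout, lit("+") pushes the string "+" while add performs the addition, so the ambiguity described above disappears, at the cost of programs that are less readable when printed.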

We also need an instruction pointer that points to the next item of code to execute.

    class Machine:
        def __init__(self, code):
            self.data_stack = Stack()
            self.return_addr_stack = Stack()
            self.instruction_pointer = 0
            self.code = code

Next, we add some convenience methods to save typing later.

    def pop(self):
        return self.data_stack.pop()

    def push(self, value):
        self.data_stack.push(value)

    def top(self):
        return self.data_stack.top()

Then we add a dispatch function that does the work for each opcode (we don't use real numeric opcodes here, just Python values, but you get the idea). First, though, here is the loop every interpreter needs:

    def run(self):
        while self.instruction_pointer < len(self.code):
            opcode = self.code[self.instruction_pointer]
            self.instruction_pointer += 1
            self.dispatch(opcode)

As you can see, it does one thing and does it well: fetch the next instruction, increment the instruction pointer, and then dispatch on the opcode. The code for the dispatch function is a little longer.

    def dispatch(self, op):
        dispatch_map = {
            "%":        self.mod,
            "*":        self.mul,
            "+":        self.plus,
            "-":        self.minus,
            "/":        self.div,
            "==":       self.eq,
            "cast_int": self.cast_int,
            "cast_str": self.cast_str,
            "drop":     self.drop,
            "dup":      self.dup,
            "if":       self.if_stmt,
            "jmp":      self.jmp,
            "over":     self.over,
            "print":    self.print,    # defined further down as print()
            "println":  self.println,
            "read":     self.read,
            "stack":    self.dump_stack,
            "swap":     self.swap,
        }

        if op in dispatch_map:
            dispatch_map[op]()
        elif isinstance(op, int):
            # push numbers on the data stack
            self.push(op)
        elif isinstance(op, str) and op[0]==op[-1]=='"':
            # push quoted strings on the data stack
            self.push(op[1:-1])
        else:
            raise RuntimeError("Unknown opcode: '%s'" % op)

Basically, this code just looks up the handler for an opcode: * maps to self.mul, drop maps to self.drop, dup maps to self.dup. By the way, the code you see here is essentially a simple version of Forth. Forth is a language well worth looking at.

In short, once it sees that the opcode is *, it calls self.mul directly. Like this:

    def mul(self):
        self.push(self.pop() * self.pop())

The other functions are similar. If we don't find an operation in dispatch_map, we first check whether the opcode is a number, and if so push it straight onto the stack. A quoted string is handled the same way: it is pushed onto the stack.

And with that, congratulations: the virtual machine is complete.

Let's define more operations and then write programs using the virtual machine and p-code language we just built.

    # Allow us to use "print" as a name for our own method:
    from __future__ import print_function
    import sys

    # ...

    def plus(self):
        self.push(self.pop() + self.pop())

    def minus(self):
        last = self.pop()
        self.push(self.pop() - last)

    def mul(self):
        self.push(self.pop() * self.pop())

    def div(self):
        last = self.pop()
        self.push(self.pop() / last)

    def print(self):
        sys.stdout.write(str(self.pop()))
        sys.stdout.flush()

    def println(self):
        sys.stdout.write("%s\n" % self.pop())
        sys.stdout.flush()

Let's use our virtual machine to write a program equivalent to print((2+3)*4).

    Machine([2, 3, "+", 4, "*", "println"]).run()

You can try running it.

Now let's introduce a new operation, jmp, which is a go-to:

    def jmp(self):
        addr = self.pop()
        if isinstance(addr, int) and 0 <= addr < len(self.code):
            self.instruction_pointer = addr
        else:
            raise RuntimeError("JMP address must be a valid integer.")

It only changes the value of the instruction pointer. Now let's see how branching is done.

    def if_stmt(self):
        false_clause = self.pop()
        true_clause = self.pop()
        test = self.pop()
        self.push(true_clause if test else false_clause)

This is also very straightforward. If you want a conditional jump, you simply write test-value true-value false-value if jmp. (Branching is a very common operation, and many virtual machines provide operations like JNE for it. JNE is short for "jump if not equal".)
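The following sketch shows the test-value true-value false-value if jmp pattern end to end. It is a stripped-down stand-in for the article's Machine, not the real class: the run function, the emit opcode (which records a value instead of printing it) and the halt opcode are all invented here so the example can be checked without any input or output.

```python
def run(code):
    """Tiny interpreter: literals, ==, if, jmp, plus hypothetical emit/halt opcodes."""
    stack, out, ip = [], [], 0
    while ip < len(code):
        op = code[ip]
        ip += 1
        if op == "if":
            false_clause = stack.pop()
            true_clause = stack.pop()
            test = stack.pop()
            stack.append(true_clause if test else false_clause)
        elif op == "jmp":
            ip = stack.pop()            # jump: overwrite the instruction pointer
        elif op == "==":
            stack.append(stack.pop() == stack.pop())
        elif op == "emit":              # hypothetical: record the top value
            out.append(stack.pop())
        elif op == "halt":              # hypothetical: stop the program
            break
        else:
            stack.append(op)            # anything else is treated as a literal
    return out


def branch_program(n):
    return [
        n, 0, "==",          # addresses 0-2: the test value
        7, 10,               # addresses 3-4: true address, false address
        "if", "jmp",         # addresses 5-6: pick one address, then jump to it
        "zero", "emit",      # addresses 7-8: true branch
        "halt",              # address 9: do not fall through into the false branch
        "nonzero", "emit",   # addresses 10-11: false branch
    ]


when_zero = run(branch_program(0))
when_five = run(branch_program(5))
```

Note that if never jumps by itself: it only leaves one of the two addresses on the stack, and the following jmp consumes it.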

The following program asks the user to enter two numbers and then prints their sum and product.

    Machine([
        '"Enter a number: "', "print", "read", "cast_int",
        '"Enter another number: "', "print", "read", "cast_int",
        "over", "over",
        '"Their sum is: "', "print", "+", "println",
        '"Their product is: "', "print", "*", "println"
    ]).run()

The three operations over, read and cast_int look like this:

    def cast_int(self):
        self.push(int(self.pop()))

    def over(self):
        b = self.pop()
        a = self.pop()
        self.push(a)
        self.push(b)
        self.push(a)

    def read(self):
        # raw_input() is Python 2; on Python 3 use input() instead
        self.push(raw_input())

The following program asks the user to enter a number and then prints whether the number is odd or even.

    Machine([
        '"Enter a number: "', "print", "read", "cast_int",
        '"The number "', "print", "dup", "print", '" is "', "print",
        2, "%", 0, "==", '"even."', '"odd."', "if", "println",
        0, "jmp" # loop forever!
    ]).run()

Here's a little exercise for you: add two opcodes, call and return. The call opcode pushes the current address onto the return stack and then calls self.jmp(). The return opcode pops the return stack and assigns the popped value to the instruction pointer (this jumps back, that is, returns from the call). Once you have these two instructions, your virtual machine can call subroutines.
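One possible solution to this exercise is sketched below. It is written against a cut-down machine so that it runs on its own; the class name CallMachine, the method name return_ and the use of an exit opcode to stop the main program are choices made for this sketch, not the author's implementation.

```python
class CallMachine:
    """Hypothetical cut-down machine: literals, dup, *, jmp, call, return, exit."""

    def __init__(self, code):
        self.data_stack = []
        self.return_stack = []
        self.instruction_pointer = 0
        self.code = code

    def jmp(self):
        self.instruction_pointer = self.data_stack.pop()

    def call(self):
        # run() has already advanced the pointer, so it holds the address
        # of the instruction that follows "call".
        self.return_stack.append(self.instruction_pointer)
        self.jmp()

    def return_(self):      # "return" is a Python keyword, hence the underscore
        self.instruction_pointer = self.return_stack.pop()

    def dup(self):
        self.data_stack.append(self.data_stack[-1])

    def mul(self):
        self.data_stack.append(self.data_stack.pop() * self.data_stack.pop())

    def run(self):
        ops = {"jmp": self.jmp, "call": self.call, "return": self.return_,
               "dup": self.dup, "*": self.mul}
        while self.instruction_pointer < len(self.code):
            op = self.code[self.instruction_pointer]
            self.instruction_pointer += 1
            if op == "exit":
                break                        # stop before running into the subroutine
            elif op in ops:
                ops[op]()
            else:
                self.data_stack.append(op)   # anything else is a literal
        return self.data_stack


program = [12, 4, "call", "exit",    # main program: call the subroutine at address 4
           "dup", "*", "return"]     # subroutine: square the top of the stack
result = CallMachine(program).run()
```

Because the return address lives on its own stack, the subroutine is free to use the data stack for its arguments and result.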

A simple parser

Let's create a small language that mimics the programs above. We will compile it to our machine code.

    import tokenize
    from StringIO import StringIO   # Python 2; on Python 3 use: from io import StringIO

    # ...

    def parse(text):
        tokens = tokenize.generate_tokens(StringIO(text).readline)
        for toknum, tokval, _, _, _ in tokens:
            if toknum == tokenize.NUMBER:
                yield int(tokval)
            elif toknum in [tokenize.OP, tokenize.STRING, tokenize.NAME]:
                yield tokval
            elif toknum in [tokenize.NEWLINE, tokenize.NL]:
                # Newer Python 3 versions emit NEWLINE at the end of the input
                continue
            elif toknum == tokenize.ENDMARKER:
                break
            else:
                raise RuntimeError("Unknown token %s: '%s'" %
                        (tokenize.tok_name[toknum], tokval))

A simple optimization: constant folding

Constant folding is an example of a peephole optimization, which means doing work at compile time for small, obvious snippets of code whose result is predictable. For example, mathematical expressions that involve only constants, such as 2 3 +, are easy to optimize this way.

    def constant_fold(code):
        """Constant-folds simple mathematical expressions like 2 3 + to 5."""
        while True:
            # Find two consecutive numbers and an arithmetic operator
            for i, (a, b, op) in enumerate(zip(code, code[1:], code[2:])):
                if isinstance(a, int) and isinstance(b, int) \
                        and op in {"+", "-", "*", "/"}:
                    m = Machine((a, b, op))
                    m.run()
                    code[i:i+3] = [m.top()]
                    print("Constant-folded %s%s%s to %s" % (a,op,b,m.top()))
                    break
            else:
                break
        return code

The only problem with constant folding is that we have to update jump addresses, and in many cases that is hard to do (for example: test cast_int jmp). There are several ways around this problem. A simple one is to allow jumps only to named labels in the program and to resolve the labels to real addresses after optimization.
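A minimal sketch of the named-label approach follows. The representation is an assumption made for this example: a ("label", name) tuple marks a position and an ("addr", name) tuple stands for "the address of that label". Neither exists in the article's code, and the folded program is written out by hand rather than produced by constant_fold.

```python
def resolve_labels(code):
    """Strip ("label", name) markers and turn ("addr", name) items into addresses."""
    addresses = {}
    stripped = []
    for item in code:
        if isinstance(item, tuple) and item[0] == "label":
            # A label names the address of the next real instruction.
            addresses[item[1]] = len(stripped)
        else:
            stripped.append(item)
    return [addresses[item[1]]
            if isinstance(item, tuple) and item[0] == "addr" else item
            for item in stripped]


# The same program before and after folding 2 3 + into 5 (folded by hand here).
unfolded = [("addr", "end"), "jmp", 2, 3, "+", "println",
            ("label", "end"), '"done"', "println"]
folded = [("addr", "end"), "jmp", 5, "println",
          ("label", "end"), '"done"', "println"]

resolved_unfolded = resolve_labels(unfolded)
resolved_folded = resolve_labels(folded)
```

The jump target is 6 in the unfolded program and 4 in the folded one, and neither number had to be maintained by hand. Since the tuples are not integers, a folder that only looks for two consecutive integers followed by an operator leaves them alone.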

If you implement Forth words, that is, functions, you can do more optimizations, such as deleting code that can never be executed (dead code elimination).

REPL

We can create a simple REPL, like this:

    def repl():
        print('Hit CTRL+D or type "exit" to quit.')

        while True:
            try:
                source = raw_input("> ")   # Python 2; use input() on Python 3
                code = list(parse(source))
                code = constant_fold(code)
                Machine(code).run()
            except (RuntimeError, IndexError) as e:
                print("IndexError: %s" % e)
            except KeyboardInterrupt:
                print("\nKeyboardInterrupt")

Let's test our REPL with a few simple programs.

    > 2 3 + 4 * println
    Constant-folded 2+3 to 5
    Constant-folded 5*4 to 20
    20
    > 12 dup * println
    144
    > "Hello, world!" dup println println
    Hello, world!
    Hello, world!

As you can see, constant folding seems to work. In the first example, it optimizes the entire program down to 20 println.

Next steps

Once you have added call and return, you can let users define their own functions. In Forth, functions are called words. A definition begins with a colon followed by a name and ends with a semicolon. For example, a word that squares an integer looks like this:

    : square dup * ;

In fact, you can try this in a real Forth such as Gforth.

    $ gforth
    Gforth 0.7.3, Copyright (C) 1995-2008 Free Software Foundation, Inc.
    Gforth comes with ABSOLUTELY NO WARRANTY; for details type `license'
    Type `bye' to exit
    : square dup * ;  ok
    12 square . 144  ok

You can support this by catching : in the parser. Once you see a colon, record the word's name and its address (that is, its position in the program) and insert them into a symbol table. For simplicity, you can even put the entire code of the function (up to the semicolon) in a dictionary, for example:

    symbol_table = {
        "square": ["dup", "*"]
        # ...
    }

When you have finished parsing, you can link your program: go through the whole main program and look up user-defined functions in the symbol table. Whenever you find one that has not yet been placed after the main program, append it there. Then replace square with <address> call, where <address> is the address at which the function was inserted.
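The linking step described above might be sketched as follows. The function name link, the ("addr", name) placeholder tuples and the decision to end the main program with exit and every word with return are assumptions of this sketch, not part of the article's code.

```python
def link(main, symbol_table):
    """Append each used word after the main program and patch in its address."""

    def expand(body):
        # Each use of a word becomes two slots: an address placeholder, then "call".
        out = []
        for tok in body:
            if isinstance(tok, str) and tok in symbol_table:
                out += [("addr", tok), "call"]
            else:
                out.append(tok)
        return out

    program = expand(main) + ["exit"]   # keep main from running into the words
    addresses = {}
    i = 0
    while i < len(program):             # the list grows while words are appended
        item = program[i]
        if isinstance(item, tuple) and item[1] not in addresses:
            addresses[item[1]] = len(program)
            program += expand(symbol_table[item[1]]) + ["return"]
        i += 1

    # Second pass: every placeholder now has a known address.
    return [addresses[item[1]] if isinstance(item, tuple) else item
            for item in program]


symbol_table = {"square": ["dup", "*"]}
linked = link([12, "square", "println"], symbol_table)
```

Placeholders are used because replacing a one-token word with two tokens shifts everything after it, so final addresses are only known once the whole layout is fixed. Words that use other words are picked up by the same loop when it reaches their appended bodies.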

To make sure the program runs correctly, you should consider removing the jmp operation. Otherwise you have to resolve jump addresses as well. That works, but you then have to keep the code in the order in which the user wrote it. If you want to be able to move subroutines around, for example, you need to be extra careful. You may also want to add an exit function that stops the program (possibly returning a value to the operating system) so that the main program does not run on into the subroutines.

In fact, a good program layout would be to treat the main program as a subroutine named main. Or you can decide on a layout of your own.

As you can see, this is all great fun, and along the way you learn a lot about code generation, linking and program space layout.

More things to do

You can use a Python bytecode generation library to try compiling the virtual machine code to native Python bytecode. Or you can implement the machine in Java and run it on the JVM, which gives you JIT compilation for free.

Similarly, you can also try a register machine. You could implement the call stack with stack frames and build a calling convention on top of it.

Finally, if you don't like Forth-like languages, you can create a custom language that runs on top of this virtual machine. For example, you can convert an infix expression like (2+3)*4 to 2 3 + 4 * and generate code from that. You can also allow C-style blocks of code { ... } so that a statement like if ( test ) { ... } else { ... } is translated into

    <true/false test>
    <address of true block>
    <address of false block>
    if
    jmp

    <true block>
    <address of end of entire if-statement> jmp

    <false block>
    <address of end of entire if-statement> jmp

Example

    Address  Code
    -------  ----
     0       2 3 >
     3       7        # Address of true-block
     4       11       # Address of false-block
     5       if
     6       jmp      # Conditional jump based on test

    # True-block
     7       "Two is greater than three."
     8       println
     9       15       # Continue main program
    10       jmp

    # False-block ("else { ... }")
    11       "Two is less than three."
    12       println
    13       15       # Continue main program
    14       jmp

    # If-statement finished, main program continues here
    15       ...
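The addresses in a layout like this can be computed mechanically. Below is a small sketch of such a code generator. The function compile_if and its base parameter are hypothetical, and it assumes that the test and both blocks contain no absolute addresses of their own.

```python
def compile_if(test, true_block, false_block, base=0):
    """Lay out an if/else statement whose first item sits at address `base`."""
    header = len(test) + 4                           # test, two addresses, "if", "jmp"
    true_addr = base + header
    false_addr = true_addr + len(true_block) + 2     # block, end address, "jmp"
    end_addr = false_addr + len(false_block) + 2
    return (list(test)
            + [true_addr, false_addr, "if", "jmp"]
            + list(true_block) + [end_addr, "jmp"]
            + list(false_block) + [end_addr, "jmp"])


code = compile_if(
    [2, 3, ">"],
    ['"Two is greater than three."', "println"],
    ['"Two is less than three."', "println"],
)
```

For these inputs the function produces the addresses 7, 11 and 15, the same ones used in the table. The main program would continue at index 15, directly after the generated code.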

Of course, you also need to add the comparison operators != < <= > >=.
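For the infix conversion mentioned earlier, a common technique is the shunting-yard algorithm. The sketch below handles integers, parentheses, the four arithmetic operators and the comparison operators. The precedence table and the name to_postfix are choices made for this example, and there is no error handling for unbalanced parentheses.

```python
import re

# Higher numbers bind more tightly; all operators are left-associative.
PRECEDENCE = {"==": 1, "!=": 1, "<": 1, "<=": 1, ">": 1, ">=": 1,
              "+": 2, "-": 2, "*": 3, "/": 3}

# Two-character operators are listed before the one-character ones.
TOKEN = re.compile(r"\s*(\d+|==|!=|<=|>=|[-+*/<>()])")


def to_postfix(text):
    """Convert an infix expression such as "(2+3)*4" into postfix code."""
    output, operators = [], []
    text = text.rstrip()
    pos = 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise ValueError("unexpected input at position %d" % pos)
        tok = match.group(1)
        pos = match.end()
        if tok.isdigit():
            output.append(int(tok))
        elif tok == "(":
            operators.append(tok)
        elif tok == ")":
            while operators[-1] != "(":
                output.append(operators.pop())
            operators.pop()                 # discard the "("
        else:
            # Emit pending operators that bind at least as tightly as this one.
            while (operators and operators[-1] != "("
                   and PRECEDENCE[operators[-1]] >= PRECEDENCE[tok]):
                output.append(operators.pop())
            operators.append(tok)
    while operators:
        output.append(operators.pop())
    return output


postfix = to_postfix("(2+3)*4")
```

The resulting list has the same shape as the programs used throughout the article, so it could be handed to the machine once the comparison opcodes exist.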

I have implemented these things in my C++ stack machine, which you can use as a reference.

I have also turned the code presented here into a project called Crianza, which has more optimizations and an experimental mode that compiles programs to Python bytecode.

Good luck!

The complete code

Here is the full code, compatible with both Python 2 and Python 3.

You can get it through here.

    #!/usr/bin/env python
    # coding: utf-8

    """
    A simple VM interpreter.

    Code from the post at http://csl.name/post/vm/
    This version should work on both Python 2 and 3.
    """

    from __future__ import print_function
    from collections import deque
    from io import StringIO
    import sys
    import tokenize


    def get_input(*args, **kw):
        """Read a string from standard input."""
        if sys.version[0] == "2":
            return raw_input(*args, **kw)
        else:
            return input(*args, **kw)


    class Stack(deque):
        push = deque.append

        def top(self):
            return self[-1]


    class Machine:
        def __init__(self, code):
            self.data_stack = Stack()
            self.return_stack = Stack()
            self.instruction_pointer = 0
            self.code = code

        def pop(self):
            return self.data_stack.pop()

        def push(self, value):
            self.data_stack.push(value)

        def top(self):
            return self.data_stack.top()

        def run(self):
            while self.instruction_pointer < len(self.code):
                opcode = self.code[self.instruction_pointer]
                self.instruction_pointer += 1
                self.dispatch(opcode)

        def dispatch(self, op):
            dispatch_map = {
                "%":        self.mod,
                "*":        self.mul,
                "+":        self.plus,
                "-":        self.minus,
                "/":        self.div,
                "==":       self.eq,
                "cast_int": self.cast_int,
                "cast_str": self.cast_str,
                "drop":     self.drop,
                "dup":      self.dup,
                "exit":     self.exit,
                "if":       self.if_stmt,
                "jmp":      self.jmp,
                "over":     self.over,
                "print":    self.print,
                "println":  self.println,
                "read":     self.read,
                "stack":    self.dump_stack,
                "swap":     self.swap,
            }

            if op in dispatch_map:
                dispatch_map[op]()
            elif isinstance(op, int):
                self.push(op) # push numbers on stack
            elif isinstance(op, str) and op[0]==op[-1]=='"':
                self.push(op[1:-1]) # push quoted strings on stack
            else:
                raise RuntimeError("Unknown opcode: '%s'" % op)

        # OPERATIONS FOLLOW:

        def plus(self):
            self.push(self.pop() + self.pop())

        def exit(self):
            sys.exit(0)

        def minus(self):
            last = self.pop()
            self.push(self.pop() - last)

        def mul(self):
            self.push(self.pop() * self.pop())

        def div(self):
            last = self.pop()
            self.push(self.pop() / last)

        def mod(self):
            last = self.pop()
            self.push(self.pop() % last)

        def dup(self):
            self.push(self.top())

        def over(self):
            b = self.pop()
            a = self.pop()
            self.push(a)
            self.push(b)
            self.push(a)

        def drop(self):
            self.pop()

        def swap(self):
            b = self.pop()
            a = self.pop()
            self.push(b)
            self.push(a)

        def print(self):
            sys.stdout.write(str(self.pop()))
            sys.stdout.flush()

        def println(self):
            sys.stdout.write("%s\n" % self.pop())
            sys.stdout.flush()

        def read(self):
            self.push(get_input())

        def cast_int(self):
            self.push(int(self.pop()))

        def cast_str(self):
            self.push(str(self.pop()))

        def eq(self):
            self.push(self.pop() == self.pop())

        def if_stmt(self):
            false_clause = self.pop()
            true_clause = self.pop()
            test = self.pop()
            self.push(true_clause if test else false_clause)

        def jmp(self):
            addr = self.pop()
            if isinstance(addr, int) and 0 <= addr < len(self.code):
                self.instruction_pointer = addr
            else:
                raise RuntimeError("JMP address must be a valid integer.")

        def dump_stack(self):
            print("Data stack (top first):")

            for v in reversed(self.data_stack):
                print(" - type %s, value '%s'" % (type(v), v))


    def parse(text):
        # Note that the tokenizer module is intended for parsing Python source
        # code, so if you're going to expand on the parser, you may have to use
        # another tokenizer.

        if sys.version[0] == "2":
            stream = StringIO(unicode(text))
        else:
            stream = StringIO(text)

        tokens = tokenize.generate_tokens(stream.readline)

        for toknum, tokval, _, _, _ in tokens:
            if toknum == tokenize.NUMBER:
                yield int(tokval)
            elif toknum in [tokenize.OP, tokenize.STRING, tokenize.NAME]:
                yield tokval
            elif toknum in [tokenize.NEWLINE, tokenize.NL]:
                # Newer Python 3 versions emit NEWLINE at the end of the input
                continue
            elif toknum == tokenize.ENDMARKER:
                break
            else:
                raise RuntimeError("Unknown token %s: '%s'" %
                        (tokenize.tok_name[toknum], tokval))


    def constant_fold(code):
        """Constant-folds simple mathematical expressions like 2 3 + to 5."""
        while True:
            # Find two consecutive numbers and an arithmetic operator
            for i, (a, b, op) in enumerate(zip(code, code[1:], code[2:])):
                if isinstance(a, int) and isinstance(b, int) \
                        and op in {"+", "-", "*", "/"}:
                    m = Machine((a, b, op))
                    m.run()
                    code[i:i+3] = [m.top()]
                    print("Constant-folded %s%s%s to %s" % (a,op,b,m.top()))
                    break
            else:
                break
        return code


    def repl():
        print('Hit CTRL+D or type "exit" to quit.')

        while True:
            try:
                source = get_input("> ")
                code = list(parse(source))
                code = constant_fold(code)
                Machine(code).run()
            except (RuntimeError, IndexError) as e:
                print("IndexError: %s" % e)
            except KeyboardInterrupt:
                print("\nKeyboardInterrupt")


    def test(code = [2, 3, "+", 5, "*", "println"]):
        print("Code before optimization: %s" % str(code))
        optimized = constant_fold(code)
        print("Code after optimization: %s" % str(optimized))

        print("Stack after running original program:")
        a = Machine(code)
        a.run()
        a.dump_stack()

        print("Stack after running optimized program:")
        b = Machine(optimized)
        b.run()
        b.dump_stack()

        result = a.data_stack == b.data_stack
        print("Result: %s" % ("OK" if result else "FAIL"))
        return result


    def examples():
        print("** Program 1: Runs the code for `print((2+3)*4)`")
        Machine([2, 3, "+", 4, "*", "println"]).run()

        print("\n** Program 2: Ask for numbers, computes sum and product.")
        Machine([
            '"Enter a number: "', "print", "read", "cast_int",
            '"Enter another number: "', "print", "read", "cast_int",
            "over", "over",
            '"Their sum is: "', "print", "+", "println",
            '"Their product is: "', "print", "*", "println"
        ]).run()

        print("\n** Program 3: Shows branching and looping (use CTRL+D to exit).")
        Machine([
            '"Enter a number: "', "print", "read", "cast_int",
            '"The number "', "print", "dup", "print", '" is "', "print",
            2, "%", 0, "==", '"even."', '"odd."', "if", "println",
            0, "jmp" # loop forever!
        ]).run()


    if __name__ == "__main__":
        try:
            if len(sys.argv) > 1:
                cmd = sys.argv[1]
                if cmd == "repl":
                    repl()
                elif cmd == "test":
                    test()
                    examples()
                else:
                    print("Commands: repl, test")
            else:
                repl()
        except EOFError:
            print("")

This article was compiled and collated by ONEAPM engineers.