[Translation] Writing a virtual machine interpreter in Python

Original address: [Making a simple VM interpreter in Python](https://csl.name/post/vm/)

**Update: I made slight changes to the code based on the comments. Thanks to Robin-gvx, bs4h and Dagur. For the full code, see [here](https://github.com/cslarsen/python-simple-vm).**

A stack machine has no registers. It puts all the values it needs to work on onto a stack and then operates on them. Stack machines are simple but powerful, which is why languages such as Python, Java, PostScript and Forth chose one as their virtual machine.

First, let's talk about stacks. We need an instruction pointer stack to hold return addresses, so that when we call a subroutine (for example, a function) we can return to the place the call was made from. We could use [self-modifying code](https://en.wikipedia.org/wiki/Self-modifying_code) for this instead, as Donald Knuth's [MIX](https://en.wikipedia.org/wiki/MIX) did, but then you would have to maintain a stack yourself anyway to make recursion work. In this article I don't actually implement subroutine calls, but doing so is not hard (consider it an exercise).

Having a data stack saves you a lot of work. For example, consider the expression `(2+3)*4`. The equivalent code on a stack machine is `2 3 + 4 *`. First `2` and `3` are pushed onto the stack. Then comes the operator `+`, which pops the two values and pushes their sum. Then `4` is pushed, and `*` pops two values and pushes their product. How simple!

Let's start by writing a simple stack class that inherits from `collections.deque`:

    from collections import deque

    class Stack(deque):
        push = deque.append

        def top(self):
            return self[-1]

Now we have the three methods `push`, `pop` and `top`.
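
To make the evaluation order concrete, here is a small sketch that walks `2 3 + 4 *` through the `Stack` class by hand. The class is repeated so that the snippet runs on its own; the variable names `after_add` and `result` are only there for illustration.

```python
from collections import deque


class Stack(deque):
    push = deque.append

    def top(self):
        return self[-1]


stack = Stack()

# 2 3 +
stack.push(2)
stack.push(3)
stack.push(stack.pop() + stack.pop())  # pops 3 and 2, pushes their sum
after_add = stack.top()

# 4 *
stack.push(4)
stack.push(stack.pop() * stack.pop())  # pops 4 and 5, pushes their product
result = stack.top()

print(after_add, result)
```

Every operator consumes its operands and leaves exactly one value behind, which is why no parentheses are needed in the postfix form.
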
The `top` method looks at the top element of the stack without removing it.

Next, we implement the virtual machine class. The virtual machine needs two stacks and some memory to store the program itself (translator's note: what "program" means here will become clear below). Thanks to Python's dynamic typing we can put values of any type into a list. The only problem is that we cannot tell plain strings apart from the names of built-in functions. The correct approach would be to put only real Python functions into the list. I may do that in the future.

We also need an instruction pointer that points to the next item in the program to execute.

    class Machine:
        def __init__(self, code):
            self.data_stack = Stack()
            self.return_addr_stack = Stack()
            self.instruction_pointer = 0
            self.code = code

Now we add some convenience functions to save typing later:

        def pop(self):
            return self.data_stack.pop()

        def push(self, value):
            self.data_stack.push(value)

        def top(self):
            return self.data_stack.top()

Then we add a `dispatch` function that handles each opcode (we don't use real numeric opcodes, the instructions are simply spelled out by name). First, though, the loop that every interpreter needs:

        def run(self):
            while self.instruction_pointer < len(self.code):
                opcode = self.code[self.instruction_pointer]
                self.instruction_pointer += 1
                self.dispatch(opcode)

As you can see, it does one thing and does it well: it fetches the next instruction, increments the instruction pointer, and then dispatches on the opcode. The code for `dispatch` is a little longer.
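
Before moving on to `dispatch`, it may help to see why the pointer is incremented before the instruction is executed. The sketch below is not part of the article's code: `trace` is a hypothetical helper that understands only integers (which it pushes) and a jump named `jmp`, like the one introduced later in the article, and records which addresses are visited.

```python
def trace(code, max_steps=10):
    """Run the fetch/increment/execute cycle and record (address, opcode) pairs.

    Hypothetical helper for illustration only: integers are pushed on a
    stack, and "jmp" pops an address and jumps to it.
    """
    stack = []
    ip = 0
    visited = []
    while ip < len(code) and len(visited) < max_steps:
        opcode = code[ip]                 # fetch
        ip += 1                           # increment first ...
        visited.append((ip - 1, opcode))
        if opcode == "jmp":
            ip = stack.pop()              # ... so a jump can simply overwrite ip
        else:
            stack.append(opcode)
    return visited


steps = trace([3, "jmp", 99, 42])
print(steps)
```

Because the increment happens before the instruction runs, a jump only has to assign a new value to the pointer; in this run the cell at address 2 is never visited.
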
Here it is:

        def dispatch(self, op):
            dispatch_map = {
                "%":        self.mod,
                "*":        self.mul,
                "+":        self.plus,
                "-":        self.minus,
                "/":        self.div,
                "==":       self.eq,
                "cast_int": self.cast_int,
                "cast_str": self.cast_str,
                "drop":     self.drop,
                "dup":      self.dup,
                "if":       self.if_stmt,
                "jmp":      self.jmp,
                "over":     self.over,
                "print":    self.print,
                "println":  self.println,
                "read":     self.read,
                "stack":    self.dump_stack,
                "swap":     self.swap,
            }

            if op in dispatch_map:
                dispatch_map[op]()
            elif isinstance(op, int):
                # push numbers on the data stack
                self.push(op)
            elif isinstance(op, str) and op[0]==op[-1]=='"':
                # push quoted strings on the data stack
                self.push(op[1:-1])
            else:
                raise RuntimeError("Unknown opcode: '%s'" % op)

Basically, this code just looks up the handler that belongs to each opcode: `*` maps to `self.mul`, `drop` to `self.drop`, `dup` to `self.dup`. By the way, the code you see here is essentially a simple version of Forth, and Forth itself is well worth a look.

In short, when it finds the opcode `*`, it calls `self.mul` directly, which looks like this:

        def mul(self):
            self.push(self.pop() * self.pop())

The other functions are similar. If the operation is not found in `dispatch_map`, we first check whether it is an integer and, if so, push it on the data stack. A quoted string is handled the same way: it is pushed on the data stack, without its quotes.

Congratulations, at this point the virtual machine is complete.

Let's define more operations and then write programs using our newly finished virtual machine and its [p-code](https://en.wikipedia.org/wiki/P-code_machine) language.

    # Allow to use "print" as a name for our own method:
    from __future__ import print_function

    # ...

        def plus(self):
            self.push(self.pop() + self.pop())

        def minus(self):
            last = self.pop()
            self.push(self.pop() - last)

        def mul(self):
            self.push(self.pop() * self.pop())

        def div(self):
            last = self.pop()
            self.push(self.pop() / last)

        def print(self):
            sys.stdout.write(str(self.pop()))
            sys.stdout.flush()

        def println(self):
            sys.stdout.write("%s\n" % self.pop())
            sys.stdout.flush()

Let's use our virtual machine to write the equivalent of `print((2+3)*4)`:

    Machine([2, 3, "+", 4, "*", "println"]).run()

You can try running it.

Now let's introduce a new operation, `jmp`, which is the go-to operation:

        def jmp(self):
            addr = self.pop()
            if isinstance(addr, int) and 0 <= addr < len(self.code):
                self.instruction_pointer = addr
            else:
                raise RuntimeError("JMP address must be a valid integer.")

It only changes the value of the instruction pointer. Now let's see how branching is done:

        def if_stmt(self):
            false_clause = self.pop()
            true_clause = self.pop()
            test = self.pop()
            self.push(true_clause if test else false_clause)

This is also very straightforward. If you want a conditional jump, you simply execute `test-value true-value false-value if jmp`. (Branching is a very common operation, and many virtual machines provide operations such as `jne`, which is short for "jump if not equal".)

The following program asks the user for two numbers and then prints their sum and product:

    Machine([
        '"Enter a number: "', "print", "read", "cast_int",
        '"Enter another number: "', "print", "read", "cast_int",
        "over", "over",
        '"Their sum is: "', "print", "+", "println",
        '"Their product is: "', "print", "*", "println"
    ]).run()

The three operations `over`, `read` and `cast_int` look like this:

        def cast_int(self):
            self.push(int(self.pop()))

        def over(self):
            b = self.pop()
            a = self.pop()
            self.push(a)
            self.push(b)
            self.push(a)

        def read(self):
            self.push(raw_input())

The following program asks the user for a number and then prints whether it is odd or even:

    Machine([
        '"Enter a number: "', "print", "read", "cast_int",
        '"The number "', "print", "dup", "print", '" is "', "print",
        2, "%", 0, "==", '"even."', '"odd."', "if", "println",
        0, "jmp" # loop forever!
    ]).run()

Here is a little exercise for you: add the two opcodes `call` and `return`. The `call` opcode pushes the current address onto the return stack and then invokes `self.jmp()`. The `return` opcode pops the return stack and assigns the popped value to the instruction pointer (that is, it jumps back, or returns from the call). Once you have these two instructions, your virtual machine can call subroutines.

## A simple parser

Let's create a small language that mimics the programs above. We will compile it to our machine code.

    import tokenize
    from StringIO import StringIO

    # ...

    def parse(text):
        tokens = tokenize.generate_tokens(StringIO(text).readline)
        for toknum, tokval, _, _, _ in tokens:
            if toknum == tokenize.NUMBER:
                yield int(tokval)
            elif toknum in [tokenize.OP, tokenize.STRING, tokenize.NAME]:
                yield tokval
            elif toknum == tokenize.ENDMARKER:
                break
            else:
                raise RuntimeError("Unknown token %s: '%s'" %
                        (tokenize.tok_name[toknum], tokval))

## A simple optimization: constant folding

[Constant folding](https://en.wikipedia.org/wiki/Constant_folding) is an example of [peephole optimization](https://en.wikipedia.org/wiki/Peephole_optimization): it computes the results of obvious snippets of code ahead of time, during compilation.
For example, a mathematical expression that involves only constants, such as `2 3 +`, is easy to optimize this way.

    def constant_fold(code):
        """Constant-folds simple mathematical expressions like 2 3 + to 5."""
        while True:
            # Find two consecutive numbers and an arithmetic operator
            for i, (a, b, op) in enumerate(zip(code, code[1:], code[2:])):
                if isinstance(a, int) and isinstance(b, int) \
                        and op in {"+", "-", "*", "/"}:
                    m = Machine((a, b, op))
                    m.run()
                    code[i:i+3] = [m.top()]
                    print("Constant-folded %s%s%s to %s" % (a, op, b, m.top()))
                    break
            else:
                break
        return code

The only problem with constant folding is that we have to update the jump addresses, and in many cases that is hard to do (for example, `test cast_int jmp`). There are several ways around this. A simple one is to allow jumps only to named labels in the program and to resolve the labels to real addresses after optimizing.

If you implement Forth words, that is, functions, you can do more optimizations, such as removing code that can never be executed ([dead code elimination](https://en.wikipedia.org/wiki/Dead_code_elimination)).

## REPL

We can create a simple REPL like this:

    def repl():
        print('Hit CTRL+D or type "exit" to quit.')

        while True:
            try:
                source = raw_input("> ")
                code = list(parse(source))
                code = constant_fold(code)
                Machine(code).run()
            except (RuntimeError, IndexError) as e:
                print("IndexError: %s" % e)
            except KeyboardInterrupt:
                print("\nKeyboardInterrupt")

Let's test our REPL with a few simple programs:

    > 2 3 + 4 * println
    Constant-folded 2+3 to 5
    Constant-folded 5*4 to 20
    20
    > 12 dup * println
    144
    > "Hello, world!" dup println println
    Hello, world!
    Hello, world!

You can see that constant folding works. In the first example, it reduces the whole program to `20 println`.

## Next steps

Once you have added `call` and `return`, you can let users define their own functions. In [Forth](https://en.wikipedia.org/wiki/Forth_(programming_language)) functions are called words. A definition begins with a colon followed by a name and ends with a semicolon. For example, a word that squares an integer looks like this:

    : square dup * ;

You can actually try this out in a Forth implementation such as gforth:

    $ gforth
    Gforth 0.7.3, Copyright (C) 1995-2008 Free Software Foundation, Inc.
    Gforth comes with ABSOLUTELY NO WARRANTY; for details type `license'
    Type `bye' to exit
    : square dup * ;  ok
    12 square . 144  ok

You can support this by detecting `:` in the parser. Once you find a colon, you record the word's name and its address (that is, its position in the program) in a [symbol table](https://en.wikipedia.org/wiki/Symbol_table). For simplicity, you can even put the entire code of the function (up to the semicolon) in a dictionary, for example:

    symbol_table = {
        "square": ["dup", "*"]
        # ...
    }

When parsing is done, you can [link](https://en.wikipedia.org/wiki/Linker_(computing)) your program: go through the main program and look up each custom function in the symbol table. Whenever you find one that has not yet been placed after the main program, append it there. Then replace `square` with `<address> call`, where `<address>` is the address at which the function was inserted.

To make sure the program executes correctly, you should consider removing the `jmp` operation. Otherwise you have to resolve the jump addresses as well. That is doable, but you have to keep them in the order in which the user wrote them. For example, if you move code between subroutines, you need to be extra careful. You may also need to add an `exit` function to stop the program (possibly returning a value to the operating system) so that the main program does not run on into the subroutines. In fact, a good program layout would probably treat the main program as a subroutine called `main`.
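
The linking steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: `link` and `run` are hypothetical helpers, the layout (main program, then `exit`, then each word followed by `return`) is one possible choice, and `run` is a tiny stand-alone interpreter that supports only the operations needed here, including `call` and `return` from the earlier exercise.

```python
def link(main, symbol_table):
    """Link `main` with the words it uses and return a flat program.

    Assumed layout: main program, "exit", then the body of each used
    word followed by "return".
    """
    addresses, pending = {}, []

    def expand(code):
        # Replace each word by a placeholder address followed by "call".
        out = []
        for op in code:
            if isinstance(op, str) and op in symbol_table:
                out += [("addr", op), "call"]
                if op not in pending and op not in addresses:
                    pending.append(op)
            else:
                out.append(op)
        return out

    program = expand(main) + ["exit"]
    while pending:
        name = pending.pop(0)
        addresses[name] = len(program)      # the word starts here
        program += expand(symbol_table[name]) + ["return"]

    # Second pass: patch the placeholders with the real addresses.
    return [addresses[op[1]] if isinstance(op, tuple) else op
            for op in program]


def run(code):
    """Tiny interpreter with just enough operations for this demo."""
    data, returns, ip = [], [], 0
    while ip < len(code):
        op = code[ip]
        ip += 1
        if op == "call":
            returns.append(ip)              # address right after "call"
            ip = data.pop()
        elif op == "return":
            ip = returns.pop()
        elif op == "exit":
            break
        elif op == "dup":
            data.append(data[-1])
        elif op == "*":
            data.append(data.pop() * data.pop())
        else:
            data.append(op)                 # numbers are pushed as-is
    return data


symbol_table = {"square": ["dup", "*"]}
linked = link([12, "square"], symbol_table)
print(linked)
print(run(linked))
```

Addresses are patched in a second pass because replacing one cell (`square`) with two (`<address> call`) shifts everything that follows, so a word's final position is only known once the code before it has been laid out. The `exit` after the main program keeps execution from running on into the word bodies.
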
Or you can decide on a layout of your own.

As you can see, this is all great fun, and along the way you learn a lot about code generation, linking, and program space layout.

## More things to do

You can use a Python bytecode generation library to try to compile the virtual machine code to native Python bytecode. Or you can implement it in Java and run it on the JVM, where you get [JITing](https://en.wikipedia.org/wiki/Just-in-time_compilation) for free.

Similarly, you could try a [register machine](https://en.wikipedia.org/wiki/Register_machine). You could also implement the [call stack](https://en.wikipedia.org/wiki/Call_stack) with [stack frames](https://en.wikipedia.org/wiki/Call_stack#STACK-FRAME) and build a calling convention on top of it.

Finally, if you don't like Forth-like languages, you can create a custom language that runs on top of this virtual machine. For example, you can convert infix expressions like `(2+3)*4` to `2 3 + 4 *` and generate code for them. You could also allow C-style code blocks `{ ... }`, so that the statement `if ( test ) { ... } else { ... }` is translated into

    <true/false test>
    <address of true block>
    <address of false block>
    if
    jmp

    <true block>
    <address of end of entire if-statement> jmp

    <false block>
    <address of end of entire if-statement> jmp

For example,

    Address  Code
    -------  ----
     0       2 3 >
     3       7        # Address of true-block
     4       11       # Address of false-block
     5       if
     6       jmp      # Conditional jump based on test

    # True-block
     7       "Two is greater than three."
     8       println
     9       15       # Continue main program
    10       jmp

    # False-block ("else { ... }")
    11       "Two is less than three."
    12       println
    13       15       # Continue main program
    14       jmp

    # If-statement finished, main program continues here
    15       ...

Yes, you also need to add the comparison operators `!= < <= > >=`. I have implemented these in my [C++ stack machine](https://github.com/cslarsen/stack-machine), which you can use as a reference.
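
Returning to the infix idea mentioned above, one way to turn `(2+3)*4` into `2 3 + 4 *` is to let Python's own `ast` module do the parsing and then walk the tree in post-order. This is only a sketch under stated assumptions: `to_postfix` and `OPERATORS` are hypothetical names, only integer constants and five binary operators are handled, and it relies on Python 3.8 or newer, where numeric literals are parsed as `ast.Constant`.

```python
import ast

# Map Python AST operator nodes to the VM's operator names.
OPERATORS = {
    ast.Add: "+",
    ast.Sub: "-",
    ast.Mult: "*",
    ast.Div: "/",
    ast.Mod: "%",
}


def to_postfix(expression):
    """Convert an infix arithmetic expression to a list of VM instructions."""
    def emit(node):
        if isinstance(node, ast.BinOp) and type(node.op) in OPERATORS:
            # Post-order walk: left operand, right operand, then the operator.
            return (emit(node.left) + emit(node.right)
                    + [OPERATORS[type(node.op)]])
        if isinstance(node, ast.Constant) and type(node.value) is int:
            return [node.value]
        raise ValueError("Unsupported expression: %s" % ast.dump(node))

    return emit(ast.parse(expression, mode="eval").body)


print(to_postfix("(2+3)*4"))
```

The resulting list has the same shape as the programs passed to `Machine` in the article, so appending `"println"` to it would give a runnable program. Operator precedence and parentheses are already resolved by the parser, so they never appear in the output.
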
I have also turned the code presented here into a project called [Crianza](https://github.com/cslarsen/crianza), which has more optimizations and an experimental model for compiling to Python bytecode.

Good luck!

## Complete code

Below is the complete code, compatible with both Python 2 and Python 3. You can also get it [here](https://github.com/cslarsen/python-simple-vm).

    #!/usr/bin/env python
    # coding: utf-8

    """
    A simple VM interpreter.

    Code from the post at http://csl.name/post/vm/
    This version should work on both Python 2 and 3.
    """

    from __future__ import print_function
    from collections import deque
    from io import StringIO
    import sys
    import tokenize


    def get_input(*args, **kw):
        """Read a string from standard input."""
        if sys.version[0] == "2":
            return raw_input(*args, **kw)
        else:
            return input(*args, **kw)


    class Stack(deque):
        push = deque.append

        def top(self):
            return self[-1]


    class Machine:
        def __init__(self, code):
            self.data_stack = Stack()
            self.return_stack = Stack()
            self.instruction_pointer = 0
            self.code = code

        def pop(self):
            return self.data_stack.pop()

        def push(self, value):
            self.data_stack.push(value)

        def top(self):
            return self.data_stack.top()

        def run(self):
            while self.instruction_pointer < len(self.code):
                opcode = self.code[self.instruction_pointer]
                self.instruction_pointer += 1
                self.dispatch(opcode)

        def dispatch(self, op):
            dispatch_map = {
                "%":        self.mod,
                "*":        self.mul,
                "+":        self.plus,
                "-":        self.minus,
                "/":        self.div,
                "==":       self.eq,
                "cast_int": self.cast_int,
                "cast_str": self.cast_str,
                "drop":     self.drop,
                "dup":      self.dup,
                "exit":     self.exit,
                "if":       self.if_stmt,
                "jmp":      self.jmp,
                "over":     self.over,
                "print":    self.print,
                "println":  self.println,
                "read":     self.read,
                "stack":    self.dump_stack,
                "swap":     self.swap,
            }

            if op in dispatch_map:
                dispatch_map[op]()
            elif isinstance(op, int):
                self.push(op) # push numbers on stack
            elif isinstance(op, str) and op[0]==op[-1]=='"':
                self.push(op[1:-1]) # push quoted strings on stack
            else:
                raise RuntimeError("Unknown opcode: '%s'" % op)

        # OPERATIONS FOLLOW:

        def plus(self):
            self.push(self.pop() + self.pop())

        def exit(self):
            sys.exit(0)

        def minus(self):
            last = self.pop()
            self.push(self.pop() - last)

        def mul(self):
            self.push(self.pop() * self.pop())

        def div(self):
            last = self.pop()
            self.push(self.pop() / last)

        def mod(self):
            last = self.pop()
            self.push(self.pop() % last)

        def dup(self):
            self.push(self.top())

        def over(self):
            b = self.pop()
            a = self.pop()
            self.push(a)
            self.push(b)
            self.push(a)

        def drop(self):
            self.pop()

        def swap(self):
            b = self.pop()
            a = self.pop()
            self.push(b)
            self.push(a)

        def print(self):
            sys.stdout.write(str(self.pop()))
            sys.stdout.flush()

        def println(self):
            sys.stdout.write("%s\n" % self.pop())
            sys.stdout.flush()

        def read(self):
            self.push(get_input())

        def cast_int(self):
            self.push(int(self.pop()))

        def cast_str(self):
            self.push(str(self.pop()))

        def eq(self):
            self.push(self.pop() == self.pop())

        def if_stmt(self):
            false_clause = self.pop()
            true_clause = self.pop()
            test = self.pop()
            self.push(true_clause if test else false_clause)

        def jmp(self):
            addr = self.pop()
            if isinstance(addr, int) and 0 <= addr < len(self.code):
                self.instruction_pointer = addr
            else:
                raise RuntimeError("JMP address must be a valid integer.")

        def dump_stack(self):
            print("Data stack (top first):")

            for v in reversed(self.data_stack):
                print(" - type %s, value '%s'" % (type(v), v))


    def parse(text):
        # Note that the tokenizer module is intended for parsing Python source
        # code, so if you're going to expand on the parser, you may have to use
        # another tokenizer.

        if sys.version[0] == "2":
            stream = StringIO(unicode(text))
        else:
            stream = StringIO(text)

        tokens = tokenize.generate_tokens(stream.readline)

        for toknum, tokval, _, _, _ in tokens:
            if toknum == tokenize.NUMBER:
                yield int(tokval)
            elif toknum in [tokenize.OP, tokenize.STRING, tokenize.NAME]:
                yield tokval
            elif toknum in [tokenize.NEWLINE, tokenize.NL]:
                # Recent Python 3 versions emit a NEWLINE token at the end of
                # the input even when it has no trailing newline; skip it.
                continue
            elif toknum == tokenize.ENDMARKER:
                break
            else:
                raise RuntimeError("Unknown token %s: '%s'" %
                        (tokenize.tok_name[toknum], tokval))


    def constant_fold(code):
        """Constant-folds simple mathematical expressions like 2 3 + to 5."""
        while True:
            # Find two consecutive numbers and an arithmetic operator
            for i, (a, b, op) in enumerate(zip(code, code[1:], code[2:])):
                if isinstance(a, int) and isinstance(b, int) \
                        and op in {"+", "-", "*", "/"}:
                    m = Machine((a, b, op))
                    m.run()
                    code[i:i+3] = [m.top()]
                    print("Constant-folded %s%s%s to %s" % (a, op, b, m.top()))
                    break
            else:
                break
        return code


    def repl():
        print('Hit CTRL+D or type "exit" to quit.')

        while True:
            try:
                source = get_input("> ")
                code = list(parse(source))
                code = constant_fold(code)
                Machine(code).run()
            except (RuntimeError, IndexError) as e:
                print("IndexError: %s" % e)
            except KeyboardInterrupt:
                print("\nKeyboardInterrupt")


    def test(code = [2, 3, "+", 5, "*", "println"]):
        print("Code before optimization: %s" % str(code))
        # constant_fold() modifies its argument in place, so fold a copy
        # to keep the original program for comparison
        optimized = constant_fold(list(code))
        print("Code after optimization: %s" % str(optimized))

        print("Stack after running original program:")
        a = Machine(code)
        a.run()
        a.dump_stack()

        print("Stack after running optimized program:")
        b = Machine(optimized)
        b.run()
        b.dump_stack()

        result = a.data_stack == b.data_stack
        print("Result: %s" % ("OK" if result else "FAIL"))
        return result


    def examples():
        print("** Program 1: Runs the code for `print((2+3)*4)`")
        Machine([2, 3, "+", 4, "*", "println"]).run()

        print("\n** Program 2: Ask for numbers, computes sum and product.")
        Machine([
            '"Enter a number: "', "print", "read", "cast_int",
            '"Enter another number: "', "print", "read", "cast_int",
            "over", "over",
            '"Their sum is: "', "print", "+", "println",
            '"Their product is: "', "print", "*", "println"
        ]).run()

        print("\n** Program 3: Shows branching and looping (use CTRL+D to exit).")
        Machine([
            '"Enter a number: "', "print", "read", "cast_int",
            '"The number "', "print", "dup", "print", '" is "', "print",
            2, "%", 0, "==", '"even."', '"odd."', "if", "println",
            0, "jmp" # loop forever!
        ]).run()


    if __name__ == "__main__":
        try:
            if len(sys.argv) > 1:
                cmd = sys.argv[1]
                if cmd == "repl":
                    repl()
                elif cmd == "test":
                    test()
                    examples()
                else:
                    print("Commands: repl, test")
            else:
                repl()
        except EOFError:
            print("")

**This article was compiled and translated by engineers at [OneAPM](http://oneapm.com/index.html?utm_source=common&utm_medium=articles&utm_campaign=technicalarticles&from=matefijuno).**
