Implementing a Python Interpreter in Python

Source: Internet
Author: User

Allison is an engineer at Dropbox, where she helps maintain one of the largest networks of Python clients in the world. Before Dropbox, she was a facilitator at the Recurse Center, a New York-based retreat for programmers. She has spoken at PyCon North America about Python internals, and she enjoys tracking down strange bugs. She blogs at akaptur.com.

Introduction

Byterun is a Python interpreter implemented in Python. While working on Byterun, I was pleasantly surprised to find that the fundamental structure of a Python interpreter fits in about 500 lines of code. This chapter walks through the structure of the interpreter and gives you enough background to explore it further. The goal is not to cover every detail of interpreters: as with many interesting areas of programming and computer science, you could spend years developing a deep understanding of the topic.

Byterun was written by Ned Batchelder and me, building on the work of Paul Swartz. Its structure is similar to that of the primary Python implementation, CPython, so understanding Byterun will help you understand interpreters in general and the CPython interpreter in particular. (If you don't know which Python you are using, it is probably CPython.) Although Byterun is small, it can run most simple Python programs. (This chapter is based on the bytecode generated by Python 3.5 and earlier; the bytecode generated by Python 3.6 differs in some respects.)

Python Interpreter

Before we begin, let's narrow down what we mean by "Python interpreter". The word "interpreter" is used in several different ways when discussing Python. Sometimes it refers to the Python REPL, the interactive prompt you get by typing python at the command line. Sometimes people use "the Python interpreter" more or less interchangeably with "Python" to describe the whole process of executing Python code from start to finish. In this chapter, "interpreter" has a narrower meaning: it is the last step in the process of executing a Python program.

Before the interpreter takes over, Python performs three other steps: lexing, parsing, and compiling. Together, these steps transform the source code into a code object, which contains instructions that the interpreter can understand. The interpreter's job is to take the code object and follow those instructions.
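The compile step can be observed from within Python itself. The sketch below uses the built-in compile() to turn a source string into a code object, then passes that object to eval(), which is the point where the interpreter takes over. The expression, the names x and y, and the "<demo>" filename label are chosen purely for illustration.

```python
import types

# Lexing, parsing, and compiling all happen inside compile().
# "<demo>" is an arbitrary label that would appear in tracebacks.
source = "x + y"
code = compile(source, "<demo>", "eval")

# The result is a code object: bytecode plus data the interpreter needs.
print(type(code))
print(code.co_names)                      # names the bytecode refers to
print(len(code.co_code), "bytes of bytecode")

# Only now does the interpreter run, following the code object's instructions.
result = eval(code, {"x": 7, "y": 5})
print(result)
```

Nothing in the code object says what x and y will be; that is only known when the interpreter executes it.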

You may be surprised that compiling is a step in executing Python code at all. Python is often called an interpreted language, like Ruby or Perl, as opposed to a compiled language like C or Rust. However, this terminology is less precise than it seems. Most interpreted languages, including Python, do involve a compilation step. Python is called interpreted because its compilation step does relatively little work (and the interpreter does relatively more) compared with a compiled language. As we will see later in the chapter, the Python compiler has much less information about the behavior of a program than a C compiler does.

A Python Interpreter Written in Python

Byterun is a Python interpreter written in Python. This may strike you as odd, but it is no stranger than writing a C compiler in C. (Indeed, the widely used compiler GCC was for many years written in C itself.) You could write a Python interpreter in almost any language.

Writing a Python interpreter in Python has both advantages and disadvantages. The biggest disadvantage is speed: executing code with Byterun is much slower than executing it with CPython, whose interpreter is implemented in C and carefully optimized. However, Byterun was designed as a learning exercise, so speed is not important to us. The biggest advantage of using Python is that we can implement just the interpreter without worrying about the rest of the Python runtime, particularly the object system. For example, when Byterun needs to create a class, it can fall back to "real" Python. Another advantage is that Byterun is easy to understand, partly because it is written in a high-level language (Python!) that many people find easy to read. (We also leave interpreter optimizations out of Byterun, once again favoring clarity and simplicity over speed.)

Building an interpreter

Before we examine the Byterun code, we need to have some understanding of the structure of the interpreter from a high level. How does the Python interpreter work?

The Python interpreter is a virtual machine, meaning that it is software that emulates a physical computer. This particular virtual machine is a stack machine: it manipulates several stacks to perform its operations (in contrast to a register machine, which writes to and reads from particular memory locations).

The Python interpreter is a bytecode interpreter: its input is an instruction set called bytecode. When you write Python, the lexer, parser, and compiler generate code objects for the interpreter to operate on. Each code object contains a set of instructions to be executed (that is the bytecode) plus other information that the interpreter needs. Bytecode is an intermediate representation of Python code: it expresses the source code in a way that the interpreter can understand. It is analogous to the way assembly language serves as an intermediate representation between C code and machine code.
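To see real bytecode, the standard library's dis module can list the instructions stored in a function's code object. This is only an illustrative sketch: the function add is made up for the example, and the exact instruction names vary between Python versions, so no expected output is shown.

```python
import dis


def add(a, b):
    return a + b


# Each instruction has a name (opname) and, optionally, an argument.
instructions = list(dis.get_instructions(add))
for instr in instructions:
    print(instr.opname, instr.argrepr)

# Collect just the instruction names for inspection.
opnames = [instr.opname for instr in instructions]
```

Whatever the version, the listing shows the same overall shape: load the operands, perform the addition, return the result.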

Micro Interpreter

To make this concrete, let's start with a very minimal interpreter. It can only add numbers, and it understands just three instructions. All the code it can execute consists of these three instructions in different combinations. The three instructions are:

    • LOAD_VALUE
    • ADD_TWO_VALUES
    • PRINT_ANSWER

Since we are not concerned with lexing, parsing, or compiling in this chapter, it does not matter how the instruction sets are produced. You can imagine writing 7 + 5 and having a compiler emit a combination of these three instructions. With a suitable compiler, you could even write Lisp syntax, as long as it is turned into the same combination of instructions.

Suppose that

    7 + 5

produces this instruction set:

    what_to_execute = {
        "instructions": [("LOAD_VALUE", 0),  # the first number
                         ("LOAD_VALUE", 1),  # the second number
                         ("ADD_TWO_VALUES", None),
                         ("PRINT_ANSWER", None)],
        "numbers": [7, 5] }

The Python interpreter is a stack machine, so it must manipulate a stack to add two numbers. The interpreter begins by executing the first instruction, LOAD_VALUE, which pushes the first number onto the stack. Next it pushes the second number onto the stack. For the third instruction, ADD_TWO_VALUES, it pops both numbers off the stack, adds them together, and pushes the result back onto the stack. Finally, it pops the answer off the stack and prints it.
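The four steps just described can be traced by hand with a plain Python list standing in for the stack. This is only an illustration of the stack discipline, not interpreter code; the variable names are arbitrary.

```python
numbers = [7, 5]
stack = []

stack.append(numbers[0])    # LOAD_VALUE 0 -> stack holds [7]
stack.append(numbers[1])    # LOAD_VALUE 1 -> stack holds [7, 5]

# ADD_TWO_VALUES: pop two values, push their sum.
right = stack.pop()
left = stack.pop()
stack.append(left + right)  # stack holds [12]

# PRINT_ANSWER: pop the result and print it.
answer = stack.pop()
print(answer)  # 12
```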

Figure: A stack machine

The LOAD_VALUE instruction tells the interpreter to push a number onto the stack, but the instruction alone does not say which number. Each instruction needs an extra piece of information telling the interpreter where to find the number to load. So our instruction set has two parts: the instructions themselves and a list of constants that the instructions need. (In Python, what we are calling "instructions" is the bytecode, and the "what to execute" object as a whole is the code object.)

Why not put the numbers directly in the instructions? Imagine that we were adding strings instead of numbers. We would not want the strings embedded in the instructions, since they could be arbitrarily long. This design also means that we keep only one copy of each object we need: to add 7 + 7, for example, the constant table "numbers" could be just [7].
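As a small sketch of that last point, here is what the instruction set for 7 + 7 might look like. Both LOAD_VALUE instructions carry the same index, so the constant table holds a single entry. The list comprehension is only a helper for this illustration, not part of the interpreter.

```python
what_to_execute = {
    "instructions": [("LOAD_VALUE", 0),   # index 0 in "numbers"
                     ("LOAD_VALUE", 0),   # the same constant again
                     ("ADD_TWO_VALUES", None),
                     ("PRINT_ANSWER", None)],
    "numbers": [7],
}

# Resolve each LOAD_VALUE argument through the constant table.
loaded = [what_to_execute["numbers"][arg]
          for name, arg in what_to_execute["instructions"]
          if name == "LOAD_VALUE"]
print(loaded)  # [7, 7]
```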

You may wonder why any instructions other than ADD_TWO_VALUES are needed at all. Indeed, for the simple case of adding two numbers, the example is a little contrived. However, this instruction is a building block for more complex programs. For example, with just the three instructions defined so far, we can already add three numbers, or any number of values, given the right sequence of instructions. The stack also provides a clean way to keep track of the interpreter's state, and it will support more complexity as we go along.

Now let's start writing the interpreter itself. The interpreter object needs a stack, which we can represent with a list. It also needs a method describing how to execute each instruction. For example, for LOAD_VALUE, the interpreter pushes the value onto the stack.
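A minimal sketch of such an interpreter follows, assuming the what_to_execute layout shown earlier. The method names mirror the three instructions; the run_code name, its if/elif dispatch, and the list of answers it returns are choices made for this illustration (the return value is there only to make the result easy to check).

```python
class Interpreter:
    def __init__(self):
        self.stack = []  # the stack is a plain list

    def LOAD_VALUE(self, number):
        self.stack.append(number)

    def ADD_TWO_VALUES(self):
        first_num = self.stack.pop()
        second_num = self.stack.pop()
        self.stack.append(first_num + second_num)

    def PRINT_ANSWER(self):
        answer = self.stack.pop()
        print(answer)
        return answer

    def run_code(self, what_to_execute):
        """Execute each instruction in order; return the printed answers."""
        numbers = what_to_execute["numbers"]
        answers = []
        for name, argument in what_to_execute["instructions"]:
            if name == "LOAD_VALUE":
                # The argument is an index into the constant table.
                self.LOAD_VALUE(numbers[argument])
            elif name == "ADD_TWO_VALUES":
                self.ADD_TWO_VALUES()
            elif name == "PRINT_ANSWER":
                answers.append(self.PRINT_ANSWER())
            else:
                raise ValueError("unknown instruction: %r" % name)
        return answers


two_numbers = {
    "instructions": [("LOAD_VALUE", 0), ("LOAD_VALUE", 1),
                     ("ADD_TWO_VALUES", None), ("PRINT_ANSWER", None)],
    "numbers": [7, 5],
}
# The same three instructions, recombined, add three numbers.
three_numbers = {
    "instructions": [("LOAD_VALUE", 0), ("LOAD_VALUE", 1),
                     ("ADD_TWO_VALUES", None), ("LOAD_VALUE", 2),
                     ("ADD_TWO_VALUES", None), ("PRINT_ANSWER", None)],
    "numbers": [7, 5, 8],
}

Interpreter().run_code(two_numbers)    # prints 12
Interpreter().run_code(three_numbers)  # prints 20
```

The second instruction set uses no new instructions: the intermediate sum simply stays on the stack until the next ADD_TWO_VALUES consumes it.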
