Python Virtual Machine Framework

Source: Internet
Author: User
Tags: case statement

Python bytecode

Python compiles source code into a sequence of bytecode before executing it, and the Python virtual machine then carries out a series of operations driven by that bytecode to run the program. Python 2.5 defines a total of 104 bytecode instructions:

opcode.h

    #define STOP_CODE       0
    #define POP_TOP         1
    #define ROT_TWO         2
    #define ROT_THREE       3
    #define DUP_TOP         4
    #define ROT_FOUR        5
    #define NOP             9
    #define UNARY_POSITIVE  10
    #define UNARY_NEGATIVE  11
    #define UNARY_NOT       12
    #define UNARY_CONVERT   13
    #define UNARY_INVERT    15
    #define LIST_APPEND     18
    #define BINARY_POWER    19
    ...
    #define CALL_FUNCTION_KW        141  /* #args + (#kwargs<<8) */
    #define CALL_FUNCTION_VAR_KW    142  /* #args + (#kwargs<<8) */
    /* Support for opargs more than 16 bits long */
    #define EXTENDED_ARG    143

  

Looking closely at the bytecode definitions above, we find that although the opcodes are numbered from 0 to 143, there are gaps in the numbering: 5 jumps directly to 9, 13 jumps to 15, and 15 jumps to 18. That is why Python 2.5 actually defines only 104 bytecode instructions.

Of the 104 instructions in Python 2.5, some take an argument and the rest do not. Every instruction that takes an argument has an opcode greater than or equal to 90. Python provides a dedicated macro to test whether a bytecode instruction takes an argument:

opcode.h

    #define HAVE_ARGUMENT 90  /* Opcodes from here have an argument: */
    #define HAS_ARG(op) ((op) >= HAVE_ARGUMENT)
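As a rough illustration, the macro can be mirrored in Python. This is only a sketch: `has_arg` is a hypothetical helper name, and the opcode numbers are the ones quoted in the opcode.h excerpt above.

```python
# Threshold taken from the opcode.h excerpt above.
HAVE_ARGUMENT = 90


def has_arg(op):
    """Mirror of the HAS_ARG macro: True when the opcode carries an argument."""
    return op >= HAVE_ARGUMENT


# Opcode numbers quoted from the listing above.
POP_TOP = 1
BINARY_POWER = 19
CALL_FUNCTION_KW = 141

print(has_arg(POP_TOP))           # False
print(has_arg(BINARY_POWER))      # False
print(has_arg(CALL_FUNCTION_KW))  # True
```

Note that the comparison is inclusive, so opcode 90 itself counts as taking an argument.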

  

The earlier articles Python Code Objects and pyc Files (I), (II), and (III) described the PyCodeObject object. It is a static object that Python builds in memory after compiling source code, so it naturally contains the compiled bytecode, which we can inspect with the code object disassembly tool that Python provides.

    # cat demo.py
    i = 1
    s = "Python"
    d = {}
    l = []

    # python
    ......
    >>> source = open("demo.py").read()
    >>> co = compile(source, "demo.py", "exec")
    >>> import dis
    >>> dis.dis(co)
      1           0 LOAD_CONST               0 (1)
                  3 STORE_NAME               0 (i)

      2           6 LOAD_CONST               1 ('Python')
                  9 STORE_NAME               1 (s)

      3          12 BUILD_MAP                0
                 15 STORE_NAME               2 (d)

      4          18 BUILD_LIST               0
                 21 STORE_NAME               3 (l)
                 24 LOAD_CONST               2 (None)
                 27 RETURN_VALUE

  

The leftmost column is the line number in the source code that the bytecode instruction corresponds to; the second column is the offset of the instruction within co_code; the third column is the instruction itself; the fourth column is the instruction's argument; and the last column is the actual value that the argument resolves to.
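To see where the offset column comes from, the following sketch walks a hand-assembled byte sequence for demo.py. The opcode numbers are recalled from Python 2.x's opcode.h and are not quoted in this article, so treat them as assumptions; the closing RETURN_VALUE is likewise an assumption about how a module's code normally ends. `walk` and `DEMO` are hypothetical names.

```python
HAVE_ARGUMENT = 90

# Opcode numbers as recalled from Python 2.x's opcode.h; they are not
# quoted in this article, so treat them as assumptions.
OPNAMES = {83: "RETURN_VALUE", 90: "STORE_NAME", 100: "LOAD_CONST",
           103: "BUILD_LIST", 104: "BUILD_MAP"}


def walk(code):
    """Yield (offset, opcode name, oparg or None) for each instruction."""
    i = 0
    while i < len(code):
        op = code[i]
        if op >= HAVE_ARGUMENT:
            # Two argument bytes follow the opcode, low byte first.
            yield i, OPNAMES[op], code[i + 1] | (code[i + 2] << 8)
            i += 3
        else:
            yield i, OPNAMES[op], None
            i += 1


# Hand-assembled bytes for: i = 1; s = "Python"; d = {}; l = []
DEMO = bytes([100, 0, 0, 90, 0, 0,    # LOAD_CONST 0, STORE_NAME 0
              100, 1, 0, 90, 1, 0,    # LOAD_CONST 1, STORE_NAME 1
              104, 0, 0, 90, 2, 0,    # BUILD_MAP 0,  STORE_NAME 2
              103, 0, 0, 90, 3, 0,    # BUILD_LIST 0, STORE_NAME 3
              100, 2, 0, 83])         # LOAD_CONST 2, RETURN_VALUE

for offset, name, arg in walk(DEMO):
    print(offset, name, "" if arg is None else arg)
```

Every instruction with an argument occupies three bytes, which is why the offsets advance in steps of three.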

Execution framework of the Python virtual machine

When Python starts, it first initializes the Python runtime environment. Note that the runtime environment is different from the execution environment discussed in the previous chapter, Python Code Objects and pyc Files. The runtime environment is a global concept, whereas an execution environment is actually a stack frame, a concept that corresponds to a code block. The Python virtual machine itself is implemented in a single function. The source is listed here in trimmed form, with parts of the actual code omitted:

ceval.c

    PyObject *
    PyEval_EvalFrameEx(PyFrameObject *f, int throwflag)
    {
        ......
        co = f->f_code;
        names = co->co_names;
        consts = co->co_consts;
        fastlocals = f->f_localsplus;
        freevars = f->f_localsplus + co->co_nlocals;
        first_instr = (unsigned char *)PyString_AS_STRING(co->co_code);
        next_instr = first_instr + f->f_lasti + 1;
        stack_pointer = f->f_stacktop;
        assert(stack_pointer != NULL);
        f->f_stacktop = NULL;
        ......
    }

  

PyEval_EvalFrameEx first initializes a number of variables, pulling out the important information held by the PyCodeObject inside the PyFrameObject. Another important step is initializing the stack top pointer stack_pointer from f->f_stacktop. The co_code field of the PyCodeObject holds the bytecode instructions together with their arguments. The virtual machine executes the instruction sequence by traversing co_code from beginning to end and executing each instruction in turn.

The Python virtual machine uses three variables to carry out this traversal. co_code is actually a PyStringObject, and the part of it that matters is its character array: the entire bytecode instruction sequence is just a character array in C. Accordingly, first_instr and next_instr are unsigned char * pointers into that array, while f_lasti is an integer offset stored in the frame. first_instr always points to the beginning of the bytecode instruction sequence, next_instr always points to the next bytecode instruction to be executed, and f_lasti records the position of the most recently executed instruction.

Figure 1-1 Traversing the byte-code instruction sequence

Figure 1-1 shows the state of these three variables at some point during the traversal.

The architecture with which the Python virtual machine executes bytecode instructions is really just a for loop wrapped around a huge switch/case structure:

ceval.c

    PyObject *
    PyEval_EvalFrameEx(PyFrameObject *f, int throwflag)
    {
        ......
        why = WHY_NOT;
        for (;;) {
            ......
        fast_next_opcode:
            f->f_lasti = INSTR_OFFSET();
            /* Fetch the bytecode instruction */
            opcode = NEXTOP();
            oparg = 0;
            /* If the instruction takes an argument, fetch it */
            if (HAS_ARG(opcode))
                oparg = NEXTARG();
        dispatch_opcode:
            switch (opcode) {
            case NOP:
                goto fast_next_opcode;
            case LOAD_FAST:
                ......
            }
            ......
        }
        ......
    }

  

The code above is an extremely simplified view of the Python virtual machine; the complete code is in the PyEval_EvalFrameEx function in ceval.c.

In this execution architecture, stepping through the bytecode one instruction at a time is done by a few macros:

ceval.c

    #define INSTR_OFFSET()  ((int)(next_instr - first_instr))
    #define NEXTOP()        (*next_instr++)
    #define NEXTARG()       (next_instr += 2, (next_instr[-1]<<8) + next_instr[-2])

  

When analyzing the PyCodeObject object, we said that some Python bytecode instructions take an argument and some do not; the HAS_ARG macro decides which is the case. Because instructions differ in whether they need an argument, the amount by which next_instr advances differs from instruction to instruction. Either way, next_instr always points to the next bytecode instruction that Python will execute.
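The effect of the three macros can be imitated in Python with an index in place of a C pointer. This is a sketch only: the `Cursor` class and its method names are hypothetical, and 100 is used simply as an opcode that is at least 90 and therefore takes an argument.

```python
class Cursor:
    """Python stand-in for the first_instr / next_instr pair in ceval.c."""

    def __init__(self, code):
        self.code = code        # plays the role of first_instr
        self.next_instr = 0     # an index instead of a C pointer

    def instr_offset(self):
        # INSTR_OFFSET(): next_instr - first_instr
        return self.next_instr

    def nextop(self):
        # NEXTOP(): *next_instr++
        op = self.code[self.next_instr]
        self.next_instr += 1
        return op

    def nextarg(self):
        # NEXTARG(): advance by 2, then combine (high << 8) + low
        self.next_instr += 2
        return (self.code[self.next_instr - 1] << 8) + self.code[self.next_instr - 2]


# One argument-taking opcode (100) with argument bytes 0x34 0x12, then POP_TOP (1).
cur = Cursor(bytes([100, 0x34, 0x12, 1]))
f_lasti = cur.instr_offset()   # offset of the instruction about to run
opcode = cur.nextop()
oparg = cur.nextarg()
print(f_lasti, opcode, hex(oparg), cur.instr_offset())  # 0 100 0x1234 3
```

The argument is stored low byte first, so the bytes 0x34 0x12 decode to 0x1234, and after the fetch the cursor already sits on the following instruction.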

After fetching a bytecode instruction and any argument it needs, Python uses a switch statement to examine the instruction and selects a case according to the result. Each bytecode instruction corresponds to one case, and that case contains Python's implementation of the instruction.

After an instruction executes successfully, control jumps back to fast_next_opcode or to the top of the for loop. Either way, Python's next step is to fetch the next bytecode instruction and its argument and execute it. By traversing every instruction in co_code in this way, Python completes the execution of the program.
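The fetch, decode, and dispatch cycle described above can be sketched as a toy interpreter. It is not the real ceval.c logic: an if/elif chain stands in for the C switch, only five instructions are handled, the opcode numbers are recalled from Python 2.x's opcode.h (assumptions here), and `eval_code` is a hypothetical name.

```python
HAVE_ARGUMENT = 90
# Opcode numbers recalled from Python 2.x's opcode.h (assumptions here).
RETURN_VALUE, STORE_NAME, LOAD_CONST, BUILD_LIST, BUILD_MAP = 83, 90, 100, 103, 104


def eval_code(code, consts, names):
    """Toy counterpart of the for(;;) + switch loop; returns (retval, namespace)."""
    stack, namespace = [], {}
    next_instr = 0
    while True:
        opcode = code[next_instr]           # NEXTOP()
        next_instr += 1
        oparg = 0
        if opcode >= HAVE_ARGUMENT:         # HAS_ARG(opcode)
            oparg = code[next_instr] | (code[next_instr + 1] << 8)
            next_instr += 2
        # The if/elif chain stands in for the C switch statement.
        if opcode == LOAD_CONST:
            stack.append(consts[oparg])
        elif opcode == STORE_NAME:
            namespace[names[oparg]] = stack.pop()
        elif opcode == BUILD_MAP:
            stack.append({})
        elif opcode == BUILD_LIST:
            # Pop oparg items and restore their original order.
            stack.append([stack.pop() for _ in range(oparg)][::-1])
        elif opcode == RETURN_VALUE:
            return stack.pop(), namespace
        else:
            raise ValueError("unknown opcode %d" % opcode)


# Hand-assembled bytes for: i = 1; s = "Python"; d = {}; l = []
code = bytes([100, 0, 0, 90, 0, 0, 100, 1, 0, 90, 1, 0,
              104, 0, 0, 90, 2, 0, 103, 0, 0, 90, 3, 0, 100, 2, 0, 83])
retval, ns = eval_code(code, consts=(1, "Python", None), names=("i", "s", "d", "l"))
print(retval, ns)  # None {'i': 1, 's': 'Python', 'd': {}, 'l': []}
```

Each pass through the loop handles exactly one instruction, so the loop ends only when a case returns or raises.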

One more variable deserves mention: why. It records the state of the Python execution engine when it leaves this huge for loop. The engine does not always run to completion without incident; an error may well occur while a bytecode instruction executes, which is the exception mechanism we are all familiar with. So when Python leaves the execution engine, it needs to know why execution ended: did it finish normally, or did an error make it impossible to continue? The why variable carries this responsibility.

The possible values of why are defined in ceval.c; they are the states in which Python can finish executing bytecode:

ceval.c

    enum why_code {
        WHY_NOT =       0x0001, /* No error */
        WHY_EXCEPTION = 0x0002, /* Exception occurred */
        WHY_RERAISE =   0x0004, /* Exception re-raised by 'finally' */
        WHY_RETURN =    0x0008, /* 'return' statement */
        WHY_BREAK =     0x0010, /* 'break' statement */
        WHY_CONTINUE =  0x0020, /* 'continue' statement */
        WHY_YIELD =     0x0040  /* 'yield' operator */
    };
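Each value in the enum occupies its own bit, which suggests that several exit states can be checked with a single mask. The sketch below illustrates that idea with Python's `enum.IntFlag`; the `Why` class and the `carries_value` helper are hypothetical and are not taken from ceval.c.

```python
from enum import IntFlag


class Why(IntFlag):
    """Same names and values as the why_code enum above."""
    NOT = 0x0001
    EXCEPTION = 0x0002
    RERAISE = 0x0004
    RETURN = 0x0008
    BREAK = 0x0010
    CONTINUE = 0x0020
    YIELD = 0x0040


def carries_value(why):
    """Hypothetical helper: one mask test covers several exit reasons."""
    return bool(why & (Why.RETURN | Why.YIELD))


print(carries_value(Why.RETURN))     # True
print(carries_value(Why.YIELD))      # True
print(carries_value(Why.EXCEPTION))  # False
```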

  

A first look at the Python runtime environment

As we said earlier, a PyFrameObject corresponds to the stack frame of an executable at run time. But stack frames alone are not enough for an executable to run on an operating system. So far we have ignored two concepts that are critical to executables: processes and threads. Python creates a main thread during initialization, so its runtime environment contains a main thread. Because the later analysis of Python's exception mechanism relies on Python's internal threading model, we need an overall conceptual understanding of that model first.

Take the Win32 platform as an example. A native Win32 executable runs inside a process. The process is not the active entity that corresponds to the sequence of machine instructions; that active entity is abstracted by the concept of a thread, and the process is the environment in which threads are active.

For a typical single-threaded executable, the operating system creates a process when execution begins, and that process contains one main thread. For a multithreaded executable, the operating system creates one process and multiple threads, and these threads can share global variables in the process address space, which naturally leads to thread synchronization issues. When the CPU switches tasks, it is really switching between threads: it must save the environment of the current thread, and after switching to the new thread it must restore that thread's environment.

The execution framework of the Python virtual machine that we saw earlier is really an abstraction of the CPU. It can be regarded as a soft CPU, and all threads in Python use this soft CPU to do their computation. The task-switching mechanism of a real machine corresponds, in Python, to the mechanism by which different threads take turns using the virtual machine.

A CPU has to save the running environment of a thread when it switches tasks. Likewise, Python must save information about the current thread before switching threads. In Python, this abstraction of thread state is implemented by the PyThreadState object, and each thread has one PyThreadState object. In that sense, a PyThreadState object can also be seen as an abstraction of the thread itself. In fact the two are quite different: PyThreadState does not simulate the thread, because Python threads are still native operating system threads. PyThreadState is only an abstraction of thread state.

Under Win32, a thread cannot exist on its own; it lives inside a process, and multiple threads can share some of the process's resources. The same holds in Python. Suppose two threads in a Python program both perform the same action, import sys. How many copies of the sys module should exist? Is it shared globally, or does each thread get its own sys module? If every thread had its own copy, Python's memory consumption would be enormous, so in Python modules are shared globally, just as if they were shared resources of a process. Python implements this process-like concept with the PyInterpreterState object.
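The sharing described above can be observed directly from Python code. In this small sketch, two threads each run import sys and record the identity of the module object they receive; `worker` and `seen` are names chosen for the example.

```python
import threading

seen = []


def worker():
    import sys              # each thread performs its own import
    seen.append(id(sys))    # record the identity of the module it got


threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(set(seen)) == 1)  # True: both threads received the same module object
```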

Under Win32 there are usually many processes, and Python can likewise host multiple logical interpreters. Normally Python has only one interpreter, which maintains one or more PyThreadState objects, and the threads corresponding to those PyThreadState objects take turns using a single bytecode execution engine.

The PyInterpreterState and PyThreadState objects, which represent the process and thread concepts just described, are shown below:

pystate.h

    typedef struct _is {
        struct _is *next;
        struct _ts *tstate_head;  /* the set of threads in the simulated process environment */
        PyObject *modules;
        PyObject *sysdict;
        PyObject *builtins;
        PyObject *modules_reloading;
        PyObject *codec_search_path;
        PyObject *codec_search_cache;
        PyObject *codec_error_registry;
        ......
    } PyInterpreterState;

    typedef struct _ts {
        /* See Python/ceval.c for comments explaining most fields */
        struct _ts *next;
        PyInterpreterState *interp;
        struct _frame *frame;  /* simulates the function call stack of the thread */
        int recursion_depth;
        int tracing;
        int use_tracing;
        Py_tracefunc c_profilefunc;
        Py_tracefunc c_tracefunc;
        PyObject *c_profileobj;
        PyObject *c_traceobj;
        PyObject *curexc_type;
        PyObject *curexc_value;
        PyObject *curexc_traceback;
        PyObject *exc_type;
        PyObject *exc_value;
        PyObject *exc_traceback;
        PyObject *dict;  /* Stores per-thread state */
        int tick_counter;
        int gilstate_counter;
        PyObject *async_exc;  /* Asynchronous exception to raise */
        long thread_id;  /* Thread id where this tstate was created */
    } PyThreadState;

  

In the PyThreadState object we see the familiar PyFrameObject (_frame). In other words, each PyThreadState object maintains a list of stack frames that models the function call mechanism of the thread it belongs to. Win32 works the same way: every thread has its own function call stack.

When the Python virtual machine starts executing, it sets the frame in the current thread state object to the current execution environment (frame):

    PyObject *
    PyEval_EvalFrameEx(PyFrameObject *f, int throwflag)
    {
        ......
        /* Get the thread state object of the currently active thread
           via PyThreadState_GET */
        PyThreadState *tstate = PyThreadState_GET();
        tstate->frame = f;  /* set the frame in the thread state object */
        co = f->f_code;
        names = co->co_names;
        consts = co->co_consts;
        ......
        /* Virtual machine main loop */
        for (;;) {
            opcode = NEXTOP();
            oparg = 0;  /* allows oparg to be stored in a register because
                           it doesn't have to be remembered across a full loop */
            if (HAS_ARG(opcode))
                oparg = NEXTARG();
            /* Instruction dispatch */
            switch (opcode) {
                ......
            }
            ......
        }
        ......
    }

  

When a new PyFrameObject is created, the old frame is retrieved from the state object of the current thread and linked in, building the PyFrameObject linked list:

    PyFrameObject *
    PyFrame_New(PyThreadState *tstate, PyCodeObject *code, PyObject *globals,
                PyObject *locals)
    {
        /* Get the current thread's current execution environment
           from PyThreadState */
        PyFrameObject *back = tstate->frame;
        PyFrameObject *f;
        ......
        /* Create the new execution environment */
        f = PyObject_GC_Resize(PyFrameObject, f, extras);
        ......
        /* Link to the current execution environment */
        f->f_back = back;
        f->f_tstate = tstate;
        return f;
    }
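The f_back link set up here is visible from Python code as the f_back attribute of frame objects. The sketch below follows that chain from the running frame back through its callers. It relies on `sys._getframe`, a CPython-specific accessor; `call_chain`, `outer`, and `inner` are names invented for the example.

```python
import sys


def call_chain():
    """Follow f_back from the current frame and collect function names."""
    names = []
    frame = sys._getframe()      # CPython-specific accessor for the running frame
    while frame is not None:
        names.append(frame.f_code.co_name)
        frame = frame.f_back     # the caller's frame, or None at the bottom
    return names


def outer():
    return inner()


def inner():
    return call_chain()


print(outer()[:3])  # ['call_chain', 'inner', 'outer']
```

Only the first three entries are printed because the frames beneath outer depend on how the script is launched.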

  

Figure 1-2 Python runtime environment
