A Preliminary Exploration of How Python Programs Are Executed
1. Process Overview
Python first compiles the source code (the .py file) into bytecode and hands it to the bytecode virtual machine. The virtual machine then executes the bytecode instructions one by one to run the program.
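The two phases can be observed separately from within Python itself. The following is a minimal sketch, assuming Python 3 (the snippets in this article are Python 2); the source string and the names source, code and namespace are made up for illustration.

```python
# Phase 1: compile source text into a code object (the container for bytecode).
source = "x = 6 * 7\n"
code = compile(source, "<example>", "exec")

# Phase 2: hand the code object to the virtual machine for execution.
namespace = {}
exec(code, namespace)

print(type(code).__name__)  # the compiled result is a code object
print(namespace["x"])       # the value bound while executing the bytecode
```

Nothing runs during the compile step; the assignment only takes effect once exec passes the code object to the virtual machine.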
2. Bytecode
Inside the Python virtual machine, bytecode is represented by the PyCodeObject object.
The .pyc file is the on-disk representation of bytecode.
3. pyc File
A PyCodeObject is created when a module is loaded, that is, on import.
Running python test.py compiles test.py into bytecode and interprets it, but does not generate test.pyc.
If test.py loads another module, for example with import util, Python compiles util.py into bytecode, generates util.pyc, and then interprets the bytecode.
To generate test.pyc, we can compile it with the built-in module py_compile.
When a module is loaded and both the .py and the .pyc file exist, Python tries to use the .pyc. If the .pyc was compiled before the .py was last modified, Python recompiles the .py and updates the .pyc.
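As a rough illustration of compiling a script explicitly with py_compile, the sketch below assumes Python 3 and writes a throwaway test.py into a temporary directory. The cfile argument is passed so the .pyc lands next to the source, as described above; without it, Python 3 places the file in a __pycache__ directory.

```python
import os
import py_compile
import tempfile

# Write a throwaway module so the example is self-contained.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "test.py")
with open(src_path, "w") as f:
    f.write('s = "hello"\nprint(s)\n')

# Compile without running the script; cfile chooses where the .pyc is written.
pyc_path = py_compile.compile(src_path, cfile=src_path + "c", doraise=True)

print(pyc_path)
print(os.path.exists(pyc_path))
```

py_compile.compile only compiles; the print statement inside test.py is not executed.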
4. PyCodeObject
The Python code compilation result is a PyCodeObject object.
    typedef struct {
        PyObject_HEAD
        int co_argcount;         /* number of positional arguments */
        int co_nlocals;          /* number of local variables */
        int co_stacksize;        /* stack size */
        int co_flags;
        PyObject *co_code;       /* bytecode instruction sequence */
        PyObject *co_consts;     /* set of all constants */
        PyObject *co_names;      /* set of all symbol names */
        PyObject *co_varnames;   /* set of local variable names */
        PyObject *co_freevars;   /* set of variable names used by closures */
        PyObject *co_cellvars;   /* set of variable names referenced by nested functions */
        /* The rest doesn't count for hash/cmp */
        PyObject *co_filename;   /* name of the file the code came from */
        PyObject *co_name;       /* module name | function name | class name */
        int co_firstlineno;      /* first line number of the code block in the file */
        PyObject *co_lnotab;     /* mapping between bytecode instructions and line numbers */
        void *co_zombieframe;    /* for optimization only (see frameobject.c) */
    } PyCodeObject;
5. pyc File Format
When a module is loaded, the PyCodeObject corresponding to the module is written to the .pyc file. The format is as follows:
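The layout can be inspected by reading a .pyc file back by hand. This is a sketch under stated assumptions: it targets CPython 3.7 or later, where the header is 16 bytes (magic number, flags, source modification time, source size) followed by the marshalled code object. The Python 2 files discussed in this article use a shorter 8-byte header (magic number and modification time). The module util.py and its contents are made up for the example.

```python
import importlib.util
import marshal
import os
import py_compile
import struct
import tempfile

# Create and compile a throwaway module.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "util.py")
with open(src_path, "w") as f:
    f.write("answer = 42\n")
pyc_path = py_compile.compile(
    src_path,
    cfile=src_path + "c",
    doraise=True,
    # Force the timestamp-based header so the fields below are well defined.
    invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP,
)

with open(pyc_path, "rb") as f:
    data = f.read()

# Assumed layout (CPython 3.7+): magic, flags, mtime, source size, marshal data.
magic = data[:4]
flags, mtime, size = struct.unpack("<III", data[4:16])
code = marshal.loads(data[16:])

print(magic == importlib.util.MAGIC_NUMBER)
print(flags, mtime, size)
print(code.co_names)
```

The stored modification time is what Python compares against the .py file when it decides whether the .pyc is stale.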
6. Analyze bytecode
6.1 parse PyCodeObject
Python provides the built-in function compile to compile Python code and view PyCodeObject objects, as follows:
Python code [test.py]
    s = "hello"

    def func():
        print s

    func()
Compile the code in the Python Interactive shell to get the PyCodeObject object:
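A session along these lines might look as follows. This is a sketch assuming Python 3, so print is written as a function; the variable name co matches the one used in the rest of this section, and "test.py" is passed only as a label for the code object.

```python
source = '''s = "hello"

def func():
    print(s)

func()
'''

# "exec" mode compiles a whole module body.
co = compile(source, "test.py", "exec")

print(type(co))
print([name for name in dir(co) if name.startswith("co_")])
print(co.co_names)
print(co.co_filename)
```

The exact set of co_ fields and the contents of co_code differ between Python versions, so the values printed here will not match the Python 2 output in this article byte for byte.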
dir(co) lists the co_ fields. To view a specific field, evaluate it directly in the terminal:
PyCodeObject of test.py
    co.co_argcount    0
    co.co_nlocals     0
    co.co_names       ('s', 'func')
    co.co_varnames    ('s', 'func')
    co.co_consts      ('hello', <code object func at 0x2aaeeec57110, file "test.py", line 3>, None)
    co.co_code        'd\x00\x00Z\x00\x00d\x01\x00\x84\x00\x00Z\x01\x00e\x01\x00\x83\x00\x00\x01d\x02\x00S'
The Python interpreter also generates a PyCodeObject for the function func. It is stored in co_consts[1].
PyCodeObject of func
    func.co_argcount    0
    func.co_nlocals     0
    func.co_names       ('s',)
    func.co_varnames    ()
    func.co_consts      (None,)
    func.co_code        't\x00\x00GHd\x00\x00S'
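The function's code object can be pulled out of the module's co_consts programmatically. The sketch below assumes Python 3 and selects the nested code object by type instead of a fixed index, since its position inside co_consts is not guaranteed to be the same in every version; the name func_co is chosen for this example.

```python
import types

source = 's = "hello"\ndef func():\n    print(s)\nfunc()\n'
co = compile(source, "test.py", "exec")

# Nested code objects live among the constants of the enclosing code object.
nested = [c for c in co.co_consts if isinstance(c, types.CodeType)]
func_co = nested[0]

print(func_co.co_name)      # name of the function the code object belongs to
print(func_co.co_names)     # global names the function body refers to
print(func_co.co_varnames)  # empty, because func has no local variables
```

In Python 3, co_names of func also contains print, because print is an ordinary global function there and no longer a statement.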
co_code is the instruction sequence, stored as a binary stream. For its format and how to parse it, see 6.2.
6.2 parsing command sequence
Command Sequence co_code format
Python's built-in dis module can parse co_code, for example:
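A minimal sketch of such a call, assuming Python 3 and the same test.py source as above (with print written as a function):

```python
import dis

source = 's = "hello"\ndef func():\n    print(s)\nfunc()\n'
co = compile(source, "test.py", "exec")

# Disassemble the module-level code object into a readable listing.
dis.dis(co)
```

The opcode names and offsets printed by a current interpreter differ from the Python 2 listing this article describes, but the overall shape of the listing is the same.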
Instruction sequence of test.py
Instruction sequence of the func function
The first column is the line number in the .py file of the instructions that follow;
The second column is the offset of the instruction within the instruction sequence co_code;
The third column is the opcode name. Instructions come in two kinds, those with an operand and those without. In the instruction sequence, an opcode is a one-byte integer;
The fourth column is the operand oparg, which occupies two bytes in the instruction sequence and is usually an index into co_consts or co_names. For example, the first three bytes of the co_code of test.py, d\x00\x00, are the opcode 100 (LOAD_CONST) followed by the two-byte operand 0;
The fifth column describes the operand.
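The same columns are available as structured data through dis.get_instructions. The sketch below assumes Python 3, where the encoding differs from the Python 2 format described above (since Python 3.6 every instruction occupies two bytes); the sample function and the name rows are made up for illustration.

```python
import dis

def func():
    return len("hello")

# Each Instruction object mirrors one row of the dis listing.
rows = [(ins.offset, ins.opname, ins.arg, ins.argrepr)
        for ins in dis.get_instructions(func)]

for offset, opname, arg, argrepr in rows:
    # offset: position in co_code, opname: opcode name,
    # arg: raw operand (None if absent), argrepr: readable form of the operand
    print(offset, opname, arg, argrepr)
```

Here arg corresponds to the fourth column and argrepr to the fifth.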
7. Execute the bytecode
The Python virtual machine works by simulating the way a native executable runs on an x86 machine. The x86 runtime stack frame looks like this:
If test.py were implemented in C, it would look like the following:
    #include <stdio.h>

    const char *s = "hello";

    void func() {
        printf("%s\n", s);
    }

    int main() {
        func();
        return 0;
    }
The Python virtual machine simulates this behavior. When a function call occurs, it creates a new stack frame. In the Python implementation, a stack frame is a PyFrameObject object.
7.1 PyFrameObject
    typedef struct _frame {
        PyObject_VAR_HEAD
        struct _frame *f_back;      /* caller's frame */
        PyCodeObject *f_code;       /* bytecode object corresponding to the frame */
        PyObject *f_builtins;       /* built-in namespace */
        PyObject *f_globals;        /* global namespace */
        PyObject *f_locals;         /* local namespace */
        PyObject **f_valuestack;    /* bottom of the runtime stack */
        PyObject **f_stacktop;      /* top of the runtime stack */
        .......
    } PyFrameObject;
The corresponding Python runtime stack is like this:
7.2 execute commands
When the bytecode of test.py runs, a stack frame is created first. Below, f denotes the current stack frame. The execution process is annotated as follows:
Symbols and constants of test.py
    co.co_names     ('s', 'func')
    co.co_consts    ('hello', <code object func at 0x2aaeeec57110, file "test.py", line 3>, None)
Instruction sequence of test.py
When the CALL_FUNCTION instruction above is executed, a new stack frame is created and the bytecode of func is executed. Below, f denotes the current stack frame. The bytecode of func executes as follows:
Symbols and constants of the func function
    func.co_names     ('s',)
    func.co_consts    (None,)
Instruction sequence of the func function
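As a rough illustration of the process described in this section, the toy stack machine below runs hand-written instruction lists for test.py and func. It is not CPython's implementation. The Code and Frame classes and the run function are hypothetical; the opcode names follow Python 2; MAKE_FUNCTION is left out, with the code object itself standing in for the function; and PRINT_ITEM appends to a list instead of writing to the terminal.

```python
class Code:
    """Stand-in for PyCodeObject: names, constants and instructions."""
    def __init__(self, names, consts, instructions):
        self.names = names
        self.consts = consts
        self.instructions = instructions


class Frame:
    """Stand-in for PyFrameObject."""
    def __init__(self, code, globals_, back=None):
        self.code = code          # plays the role of f_code
        self.globals = globals_   # plays the role of f_globals
        self.back = back          # plays the role of f_back
        self.stack = []           # runtime value stack


def run(frame, out):
    """Execute the frame's instructions one by one; return the result."""
    for op, arg in frame.code.instructions:
        if op == "LOAD_CONST":
            frame.stack.append(frame.code.consts[arg])
        elif op == "STORE_NAME":
            frame.globals[frame.code.names[arg]] = frame.stack.pop()
        elif op in ("LOAD_NAME", "LOAD_GLOBAL"):
            frame.stack.append(frame.globals[frame.code.names[arg]])
        elif op == "PRINT_ITEM":
            out.append(frame.stack.pop())
        elif op == "CALL_FUNCTION":
            callee = frame.stack.pop()
            # A call creates a new frame whose back pointer is the caller.
            new_frame = Frame(callee, frame.globals, back=frame)
            frame.stack.append(run(new_frame, out))
        elif op == "POP_TOP":
            frame.stack.pop()
        elif op == "RETURN_VALUE":
            return frame.stack.pop()
        else:
            raise ValueError("unknown opcode: " + op)


func_code = Code(("s",), (None,), [
    ("LOAD_GLOBAL", 0), ("PRINT_ITEM", None),
    ("LOAD_CONST", 0), ("RETURN_VALUE", None),
])
module_code = Code(("s", "func"), ("hello", func_code, None), [
    ("LOAD_CONST", 0), ("STORE_NAME", 0),    # s = "hello"
    ("LOAD_CONST", 1), ("STORE_NAME", 1),    # def func (simplified)
    ("LOAD_NAME", 1), ("CALL_FUNCTION", 0),  # func()
    ("POP_TOP", None),
    ("LOAD_CONST", 2), ("RETURN_VALUE", None),
])

output = []
result = run(Frame(module_code, {}), output)
print(output)
```

The operands are indexes into names and consts, just as oparg indexes co_names and co_consts, and each call to run with a fresh Frame corresponds to the virtual machine creating a new stack frame.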
7.3 View stack frames
To inspect the current stack frame, Python provides the sys._getframe() function, which returns it. You only need to add the following to the code:
    def func():
        import sys
        frame = sys._getframe()
        print frame.f_locals
        print frame.f_globals
        print frame.f_back.f_locals
        # any field of the frame can be printed here
        print s
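A Python 3 variant of the same idea is sketched below. It collects a few frame fields into a dictionary so they can be examined after the call returns. The helper function caller and the keys of the dictionary are made up for this example, and sys._getframe is a CPython implementation detail that other interpreters may not provide.

```python
import sys

s = "hello"


def func():
    frame = sys._getframe()  # the frame currently executing func
    return {
        "name": frame.f_code.co_name,               # code object of this frame
        "caller": frame.f_back.f_code.co_name,      # code object of the caller
        "locals": sorted(frame.f_locals),           # local names of func
        "caller_locals": sorted(frame.f_back.f_locals),
    }


def caller():
    marker = 1  # a local variable that lives in caller's frame
    return func()


info = caller()
print(info)
```

Following f_back repeatedly walks the chain of frames from the current call back to the module level.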