A Deep Dive into the Python Interpreter: Understanding Python Bytecode
I recently worked on a project involving Python bytecode and want to share some of what I learned. More precisely, I have been working with the bytecode of the CPython interpreter, versions 2.6 to 2.7.
Python is a dynamic language. When you run it from the command line, it essentially performs the following steps:
- The first time a piece of code is executed (for example, when it is loaded as a module or executed directly), it is compiled. For imported modules, this step also writes a binary file with the suffix .pyc, or .pyo when optimization (-O) is enabled.
- The interpreter reads the binary file and executes its instructions (opcodes) in sequence.
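The two steps above can be reproduced by hand. The following is a minimal sketch rather than what the command line tool does internally: the source string and the names `code_obj` and `namespace` are made up for illustration, and the built-in `exec` stands in for the interpreter's own execution step.

```python
# Hypothetical one-line program, used only for illustration.
source = "result = 6 * 7\n"

# Step 1: compile the source text into a code object
# (this is the structure a .pyc file stores).
code_obj = compile(source, "<example>", "exec")

# Step 2: execute the code object's bytecode in a fresh namespace.
namespace = {}
exec(code_obj, namespace)

# co_code holds the raw bytecode of the compiled module.
print(type(code_obj).__name__, len(code_obj.co_code), namespace["result"])
```

The raw instruction stream lives in `co_code`; the rest of the code object carries the constants and names those instructions refer to.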
The Python interpreter is stack-based. To follow the flow of data, we need to know the stack effect of each instruction, which depends on its opcode and its arguments.
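To make "stack effect" concrete, here is a small sketch that asks the standard `dis` module how many stack slots an instruction adds or removes. It assumes Python 3.4 or later, where `dis.stack_effect` is available; that function does not exist in the 2.6 and 2.7 interpreters discussed here. The helper name `stack_effect_of` is hypothetical.

```python
import dis

def stack_effect_of(opname, oparg=None):
    """Return how many stack slots the named instruction adds (+) or removes (-)."""
    return dis.stack_effect(dis.opmap[opname], oparg)

# LOAD_CONST takes an argument (an index into the constant pool) and pushes one value.
push = stack_effect_of("LOAD_CONST", 0)

# POP_TOP takes no argument and discards the value on top of the stack.
pop = stack_effect_of("POP_TOP")

print("LOAD_CONST:", push, "POP_TOP:", pop)
```

Summing these values along a path through the bytecode gives the stack depth at each point, which is what you need to trace where every value comes from.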
Explore Python Binaries
The simplest way to get at the bytecode in a binary file is to unmarshal the CodeType structure it contains:
import marshal
fd = open('path/to/my.pyc', 'rb')
magic = fd.read(4)  # magic number, related to the Python version
date = fd.read(4)  # compile date
code_object = marshal.load(fd)
fd.close()
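The header layout depends on the interpreter version. The snippet above matches the 8-byte header of Python 2.6 and 2.7; the sketch below assumes CPython 3.7 or later, where the header is 16 bytes. The file name `demo.py` and the helper `load_code_object` are hypothetical, and the file is compiled into a temporary directory so that the example is self-contained.

```python
import importlib.util
import marshal
import os
import py_compile
import tempfile
import types

# Assumption: CPython 3.7+ header (magic, flags, then mtime/size or a hash).
# The 2.6/2.7 files discussed in this article use an 8-byte header instead.
HEADER_SIZE = 16

def load_code_object(pyc_path):
    """Return (magic, code object) read from a compiled file."""
    with open(pyc_path, "rb") as fd:
        header = fd.read(HEADER_SIZE)
        return header[:4], marshal.load(fd)

with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "demo.py")
    with open(src_path, "w") as src:
        src.write("answer = 42\n")
    # py_compile.compile returns the path of the file it wrote.
    pyc_path = py_compile.compile(src_path, cfile=os.path.join(tmp, "demo.pyc"))
    magic, code_object = load_code_object(pyc_path)

print(magic == importlib.util.MAGIC_NUMBER, code_object.co_name)
```

Comparing the magic number first is a cheap way to detect a file written by a different interpreter version before handing it to `marshal.load`.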
code_object now holds a CodeType object that represents the entire module loaded from the file. To see all the nested code objects for the class definitions and methods of this module, we need to walk the CodeType constant pool recursively, as in the following code:
import types

def inspect_code_object(co_obj, indent=''):
    # print the name and first line number of this code object
    print indent, "%s(lineno:%d)" % (co_obj.co_name, co_obj.co_firstlineno)
    # nested code objects are stored in the constant pool
    for c in co_obj.co_consts:
        if isinstance(c, types.CodeType):
            inspect_code_object(c, indent + '  ')

inspect_code_object(code_object)  # starts from the first object
This prints a tree of code objects, where each code object is a child node of its parent. Take the following code:
class A:
    def __init__(self):
        pass
    def __repr__(self):
        return 'A()'
a = A()
print a
The resulting tree is:
<module>(lineno:2)
  A(lineno:2)
    __init__(lineno:3)
    __repr__(lineno:5)
For testing, we can pass a string of Python source code to the built-in compile function to obtain a code object:
co_obj = compile(python_source_code, '<string>', 'exec')
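Combining compile with a recursive walk of the constant pool gives a test that needs no file on disk. This is a sketch under a few assumptions: it is written to run on Python 3 as well, so it returns the tree as data instead of using the print statement, the generator name `code_tree` is hypothetical, and the sample source leaves out the final print line because compile would reject the Python 2 print statement on Python 3.

```python
import types

def code_tree(co_obj, depth=0):
    """Yield (depth, name, first line) for co_obj and every nested code object."""
    yield depth, co_obj.co_name, co_obj.co_firstlineno
    # Nested code objects (classes, functions) live in the constant pool.
    for const in co_obj.co_consts:
        if isinstance(const, types.CodeType):
            for item in code_tree(const, depth + 1):
                yield item

# Sample source used only for this illustration.
python_source_code = """\
class A:
    def __init__(self):
        pass
    def __repr__(self):
        return 'A()'
a = A()
"""

co_obj = compile(python_source_code, '<string>', 'exec')
tree = list(code_tree(co_obj))
for depth, name, lineno in tree:
    print("%s%s(lineno:%d)" % ("  " * depth, name, lineno))
```

Returning tuples rather than printing directly makes the nesting easy to assert on in a test.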
To obtain