Python Source Code Analysis Notes 3: Python Execution Principles

Source: Internet
Author: User
Tags builtin


Original post (Jianshu): http://www.jianshu.com/p/03af86845c95

I have written several source code analysis notes before, but I felt that I had not yet understood how Python executes programs from a macro perspective. Analyzing from the bottom up without that overview is confusing; it is better to first form a basic picture of the execution principles and then explore the details. That is why I have not updated these notes for so long: I have been reading source code analysis books and the source code itself, hoping to clarify Python's execution principles at a high level. As the saying goes, you first read a thin book until it becomes thick, then read the thick book until it becomes thin again, and only then grasp its real meaning. I hope to reach that stage, so I will keep at it.

1. Python runtime environment initialization

Before looking at how Python executes a program, a brief description of how the Python runtime environment is initialized is needed. Python has an interpreter state object, PyInterpreterState, that simulates a process (hereinafter the process object), and a thread state object, PyThreadState, that simulates a thread (hereinafter the thread object). The PyInterpreterState structures are chained in a linked list to simulate multiple processes in an operating system. A process object holds a pointer to its set of threads, and each thread object holds a pointer back to its process object, which ties threads and processes together. There is also a global, _PyThreadState_Current, that tracks the currently running thread object.

1.1 Process and thread initialization

Python calls the Py_Initialize() function to initialize the runtime environment. The initialization function creates the process object interp and a thread object, associates them with each other, and sets the newly created thread object as the currently running thread. Next comes initialization of the type system, covering int, str, bool, list, and other types; that is left for a later analysis. Then comes another major step: system module initialization. The process object interp has a modules variable that maintains all module objects. It is a dictionary object holding (name, module) mappings, and it corresponds to sys.modules in Python.
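The last point can be checked from Python code. The following is a minimal sketch that assumes a Python 3 interpreter rather than the Python 2 analyzed in these notes; the variable names are only illustrative.

```python
import sys
import types

# sys.modules is the Python-level view of the interpreter's module table:
# a plain dict that maps module names to module objects.
modules = sys.modules

is_dict = isinstance(modules, dict)
sys_entry_is_module = isinstance(modules["sys"], types.ModuleType)
same_object = modules["sys"] is sys  # the entry is the very module we imported

print(is_dict, sys_entry_is_module, same_object)
```

Importing a module that is already in this dictionary simply returns the stored object, which is why the identity check holds.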

1.2 module initialization

System module initialization covers __builtin__, sys, __main__, site, and other modules. In Python, a module object exists as a PyModuleObject struct. Apart from the common object header, it has only one field, the dictionary md_dict. The contents of md_dict are familiar: for example __name__, __doc__, and the functions defined in the module.

During initialization of the __builtin__ module, md_dict is filled with the built-in functions and the system type objects, such as len, dir, and getattr, and the int, str, and list type objects. This is why len can be used directly in code: following the LEGB rule, the name is eventually found in the __builtin__ module. The sys module and the __main__ module are created by an almost identical process. After creation, the process object's interp->builtins is set to the md_dict field of the __builtin__ module, that is, the dictionary of the module object, and interp->sysdict is set to the md_dict field of the sys module.
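The fall-through to the builtin namespace can be observed directly. This is a small sketch under the assumption of Python 3, where the __builtin__ module described here is named builtins; lookup_demo is a hypothetical helper used only for illustration.

```python
import builtins

# In Python 3 the module is called builtins (Python 2: __builtin__).
# Its dictionary holds len, dir, getattr, int, str, list, and so on.
builtin_namespace = vars(builtins)

def lookup_demo():
    # 'len' is neither a local nor a global name here, so the lookup
    # ends in the builtin namespace.
    return len

found_in_builtins = builtin_namespace["len"] is lookup_demo()
int_is_builtin = builtin_namespace["int"] is int

print(found_in_builtins, int_is_builtin)
```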

After initialization, the sys module contains the modules dictionary mentioned earlier, attributes such as path, version, stdin, stdout, and maxint, and functions such as exit, getrefcount, and _getframe. Note that only the basic sys.path (that is, the lib path under the Python installation directory) is set at this point. The paths of third-party modules are added when the site module is initialized.

Note that __main__ is a special module. The "__main__" in the test __name__ == "__main__", which we write in our very first Python programs, is the name of this module. When a program is run with python xxx.py, the source file is executed as the module named __main__; if it is imported by another module instead, its name is the name of the source file itself. The reason is explained later in the example of running a Python program. Also note that when the __main__ module is created, the entry ("__builtins__", __builtin__ module) is inserted into its module dictionary. This entry matters later, because the f_builtins field of the stack frame object PyFrameObject will be set to the dictionary of the __builtin__ module, and the locals and globals fields of the frame will be set to the dictionary of the __main__ module.
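The effect of the module name can be simulated without creating any files. The sketch below assumes Python 3 and uses a made-up file name xxx.py and a made-up variable role; it runs the same code object once in a namespace named "__main__" and once in a module named after the file.

```python
import types

source = 'role = "script" if __name__ == "__main__" else "imported module"'
code = compile(source, "xxx.py", "exec")

# Simulate `python xxx.py`: the code runs in a namespace named "__main__".
as_script = {"__name__": "__main__"}
exec(code, as_script)

# Simulate `import xxx`: the same code runs in a module named after the file.
as_module = types.ModuleType("xxx")
exec(code, as_module.__dict__)

print(as_script["role"], "/", as_module.role)

# exec() also inserts a "__builtins__" entry into the globals it is given,
# similar to the entry described for the __main__ module dictionary.
print("__builtins__" in as_script)
```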

In addition, initialization of the site module mainly sets up the search paths for third-party Python modules. The third-party entries of the sys.path we often use are added by this module: it appends the site-packages path to sys.path, and also every path listed in the .pth files under the site-packages directory.

The following is some verification code. It shows that sys.modules really does contain __builtin__, sys, __main__, and other modules, and that the system type objects all live in the dictionary of the __builtin__ module.

In [13]: import sys

In [14]: sys.modules['__builtin__'].__dict__['int']
Out[14]: int

In [15]: sys.modules['__builtin__'].__dict__['len']
Out[15]:

In [16]: sys.modules['__builtin__'].__dict__['__name__']
Out[16]: '__builtin__'

In [17]: sys.modules['__builtin__'].__dict__['__doc__']
Out[17]: "Built-in functions, exceptions, and other objects.\n\nNoteworthy: None is the `nil' object; Ellipsis represents `...' in slices."

In [18]: sys.modules['sys']
Out[18]:

In [19]: sys.modules['__main__']
Out[19]:

With the groundwork done, the Python program can now be run. There are two ways: interactively at the command line, or with python xxx.py. Before describing them, several data structures involved in running a Python program need to be introduced.

1.3 Python running-related data structures

The data structures involved at runtime are PyCodeObject, PyFrameObject, and PyFunctionObject. PyCodeObject is the storage structure for Python bytecode: a compiled pyc file holds a serialized PyCodeObject, which is loaded and deserialized back into a PyCodeObject at runtime. PyFrameObject simulates a stack frame: whenever a new function is entered, a PyFrameObject is created to simulate the stack frame operations. PyFunctionObject is the function object. A function corresponds to one PyCodeObject, and the PyFunctionObject is created when a statement such as def test(): is executed. PyCodeObject can be regarded as a static structure: once the Python source file is fixed, the compiled PyCodeObject does not change. PyFrameObject and PyFunctionObject are dynamic structures whose contents change at runtime.

PyCodeObject object

A Python source file must be compiled into PyCodeObject objects before execution. Each code block is one PyCodeObject. In Python, a class, a function, and a module are each a code block, so each gets its own PyCodeObject after compilation. Compiling one Python file can therefore produce several PyCodeObject objects. For example, compiling the following sample program yields two: one for the module test.py and one for the function test. For details on parsing PyCodeObject, see my earlier article, Python pyc Format Parsing.

# Sample code test.py
def test():
    print "hello world"

if __name__ == "__main__":
    test()

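The two code objects can be located with the built-in compile() function. This is a sketch assuming Python 3, so the sample's print statement is written as a function call; module_code and nested are illustrative names.

```python
source = '''
def test():
    print("hello world")

if __name__ == "__main__":
    test()
'''

# Compile the whole file: this yields the module's code object.
module_code = compile(source, "test.py", "exec")
CodeType = type(module_code)

# The code object of the function is stored as a constant of the module's
# code object, so it can be found by scanning co_consts.
nested = [c for c in module_code.co_consts if isinstance(c, CodeType)]

print(module_code.co_name, [c.co_name for c in nested])
```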
PyFrameObject object

The bytecode instructions of a Python program, together with static information such as constants, are stored in the PyCodeObject. A PyCodeObject alone is clearly not enough at runtime, because much of the state changes dynamically. In test2.py below, for example, the statement print i at #1 and at #2 is the same in the source, yet the results are obviously different. Such information cannot live in the PyCodeObject; it is obtained through the PyFrameObject, the stack frame object. A PyFrameObject has three fields, locals, globals, and builtins (f_locals, f_globals, and f_builtins), corresponding to the local, global, and builtin namespaces. This is the LGB rule; with closures added it becomes the LEGB rule. The file of a module defines a global scope, a function defines a local scope, and Python itself defines the top-level builtin scope. These three scopes correspond to the three fields of the PyFrameObject, and this is how a name reference is resolved. For example, the i at #1 in test2.py refers to the local variable i of the function test, which is the string "hello world", while the i at #2 refers to the name i in the module's local scope, which is the integer 123 (note that the local scope of a module is the same as its global scope).

Note that accessing a local variable inside a function does not go through the locals namespace: the set of local variables of a function is fixed, so the storage slot each one uses can be determined at compile time.

# Sample code test2.py
i = 123

def test():
    i = 'hello world'
    print i  # 1

test()
print i  # 2

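The fields of a frame can be inspected from Python through sys._getframe(), which is specific to CPython. The sketch below assumes Python 3; the dictionary keys are illustrative labels, not part of any API.

```python
import sys

def test():
    i = "hello world"
    frame = sys._getframe()  # the frame object executing this call
    return {
        "code_name": frame.f_code.co_name,             # code block being run
        "local_i": frame.f_locals["i"],                # local namespace view
        "same_globals": frame.f_globals is globals(),  # global namespace
        "len_in_builtins": "len" in frame.f_builtins,  # builtin namespace
        "has_caller": frame.f_back is not None,        # link to calling frame
    }

info = test()
for key, value in info.items():
    print(key, "=", value)
```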
PyFunctionObject object

PyFunctionObject is the function object. It is built by the MAKE_FUNCTION instruction when a function is created. Its func_code field points to the PyCodeObject of the function, and func_globals points to the global namespace. Note that no local namespace is stored here. When the function is called, a new stack frame object, PyFrameObject, is created to execute it, and the call chain is linked through the f_back field of the frame objects. By the time the call actually executes, the PyFunctionObject itself no longer plays a role. What matters are its PyCodeObject and its global namespace, because these two are passed to the PyFrameObject when the function's stack frame is created.
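These relationships are visible from Python. The sketch below assumes Python 3, where func_code and func_globals are spelled __code__ and __globals__; make, inner, callee, and caller are hypothetical names used only for the demonstration.

```python
import sys

def make():
    # Each execution of this def statement builds a new function object
    # from the same compiled code object.
    def inner():
        return 1
    return inner

a, b = make(), make()
distinct_functions = a is not b           # two function objects
shared_code = a.__code__ is b.__code__    # one static code object
module_globals = a.__globals__ is globals()

def callee():
    # f_back links this frame to the frame of the caller.
    return sys._getframe().f_back.f_code.co_name

def caller():
    return callee()

caller_name = caller()
print(distinct_functions, shared_code, module_globals, caller_name)
```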

1.4 Python program running process analysis

Having covered these basic objects, we return to the earlier topic: preparing and executing a Python program. Interactive mode and python xxx.py differ in some details, but both end up starting the virtual machine to execute Python bytecode. Take python xxx.py as the example. Before a Python program runs, the source file must be compiled into bytecode, producing a PyCodeObject. This is done by the PyAST_Compile function. For the compilation process itself, refer to the Dragon Book on compiler principles; it is treated as a black box here, because I cannot explain that part clearly in a short time (admittedly, I did not learn compiler principles well). Once the PyCodeObject has been obtained, PyEval_EvalCode(co, globals, locals) is called to create a PyFrameObject and execute the bytecode. The co argument is the PyCodeObject. Because the frame created in PyEval_EvalCode is the first PyFrameObject Python creates, its f_back is NULL, and both its globals and locals are the dictionary of the __main__ module. If a module is imported rather than run directly, the PyCodeObject produced by compiling its source is saved to a pyc file. If the module has not changed the next time it is loaded, its contents are read directly from the pyc file without recompiling.
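The compile-then-evaluate sequence has a rough Python-level counterpart in compile() and exec(). The sketch below assumes Python 3 and uses a hand-made dictionary as a stand-in for the __main__ module dictionary; the marshal round trip only approximates what a pyc file stores after its header.

```python
import marshal

source = "i = 1\ns = 'hello world'\n"
code = compile(source, "test3.py", "exec")  # source text -> code object

# Stand-in for the __main__ module dictionary (an assumption of this sketch).
namespace = {"__name__": "__main__"}

# The same dict is passed as globals and as locals, as for a top-level script.
exec(code, namespace, namespace)
print(namespace["i"], namespace["s"])

# Serialize the code object and load it back, as importing from a cached
# pyc file would, then run the restored copy in a fresh namespace.
payload = marshal.dumps(code)
restored = marshal.loads(payload)
second_namespace = {"__name__": "test3"}
exec(restored, second_namespace)
print(second_namespace["s"])
```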

Executing bytecode resembles a CPU executing instructions. The virtual machine first points at the co_code field of the PyCodeObject referenced by the frame's f_code field, which is where the bytecode is stored, then fetches the first instruction, then the second, and so on, executing the instructions in order. In Python 2, which these notes analyze, an instruction is either 1 byte or 3 bytes long: an instruction without an argument takes 1 byte, and an instruction with an argument takes 3 bytes (1 byte of opcode plus 2 bytes of argument).
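The instruction stream can be walked with the dis module. The sketch below assumes Python 3.6 or later, where the encoding differs from the one just described: every instruction occupies a 2-byte unit. The exact opcode names printed depend on the interpreter version.

```python
import dis

code = compile("i = 1\n", "<demo>", "exec")

# Decode co_code into instructions, in the order the interpreter fetches them.
instructions = list(dis.get_instructions(code))
for ins in instructions:
    print(ins.offset, ins.opname, ins.argrepr)

raw = code.co_code  # the raw bytecode string
opnames = [ins.opname for ins in instructions]

# In Python 3.6+ all offsets are even because instructions are 2-byte units.
two_byte_units = len(raw) % 2 == 0 and all(
    ins.offset % 2 == 0 for ins in instructions
)
print(two_byte_units)
```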

[Figure: the process, thread, and stack frame objects of the Python virtual machine]

2. Python program running example

Programmers often begin a new language with "hello world", greeting the world as they step into the unknown territory of a programming language. I started learning Python that way too, but I never looked into how it is executed. This time there is no escaping it. Consider the following example.

# Sample code test3.py
i = 1
s = 'hello world'

def test():
    k = 5
    print k
    print s

if __name__ == "__main__":
    test()

The example is short, but it touches on most aspects of how Python runs (except the class mechanism, which I do not understand well yet, so it is ignored for now). As described in the previous section, executing python test3.py first initializes the Python process and thread, then the system modules and the type system, and then runs test3.py. Every run of a Python program starts a Python virtual machine. Because the file is run directly, it must first be compiled into bytecode to obtain the PyCodeObject, and execution then starts from the first instruction of that code object. Also because it is run directly, the PyCodeObject is not serialized and saved to a pyc file. Next, let's look at the PyCodeObject of test3.py and use Python's dis module to view its bytecode instructions.

In [1]: source = open('test3.py').read()

In [2]: co = compile(source, 'test3.py', 'exec')

In [3]: co.co_consts
Out[3]: (1, 'hello world', <code object test ...>, '__main__', None)

In [4]: co.co_names
Out[4]: ('i', 's', 'test', '__name__')

In [5]: dis.dis(co)  # The module's own bytecode. The integers and strings mentioned below are Python objects such as PyIntObject and PyStringObject.
  1           0 LOAD_CONST               0 (1)              # Push constant 0 of the constant table, the integer 1, onto the stack.
              3 STORE_NAME               0 (i)              # Take the name i, pop the integer 1 just pushed, and store the (name, value) pair in f->f_locals, the local namespace used for name lookup.

  2           6 LOAD_CONST               1 ('hello world')
              9 STORE_NAME               1 (s)              # Likewise: take the name s, pop the string 'hello world', and store the pair in the local namespace.

  4          12 LOAD_CONST               2 (<code object test ...>)
             15 MAKE_FUNCTION            0                  # Pop the PyCodeObject of the function test just pushed, build a PyFunctionObject from it and the frame's f_globals, and push the function object.
             18 STORE_NAME               2 (test)           # Take the name test, pop the PyFunctionObject just pushed, and store the pair in the local namespace.

  9          21 LOAD_NAME                3 (__name__)       # LOAD_NAME searches the local, global, and builtin namespaces in order; here __name__ is found in the local namespace.
             24 LOAD_CONST               3 ('__main__')
             27 COMPARE_OP               2 (==)             # Comparison; the result is left on top of the stack.
             30 JUMP_IF_FALSE           11 (to 44)          # If the operands are not equal, jump to offset 44, the POP_TOP below. Here they are equal, so execution continues at offset 33, which is also a POP_TOP.
             33 POP_TOP

 10          34 LOAD_NAME                2 (test)           # Load the function object.
             37 CALL_FUNCTION            0                  # Call the function.
             40 POP_TOP                                     # Pop the function's return value.
             41 JUMP_FORWARD             1 (to 45)          # Jump forward by 1, counted from the address of the next instruction: 44 + 1 = 45.
        >>   44 POP_TOP
        >>   45 LOAD_CONST               4 (None)
             48 RETURN_VALUE                                # Return None.

In [6]: dis.dis(co.co_consts[2])  # The bytecode of the function test.
  5           0 LOAD_CONST               1 (5)
              3 STORE_FAST               0 (k)              # Unlike STORE_NAME, STORE_FAST stores into f_localsplus of the PyFrameObject, not into the local namespace.

  6           6 LOAD_FAST                0 (k)              # Correspondingly, LOAD_FAST reads the value from f_localsplus.
              9 PRINT_ITEM
             10 PRINT_NEWLINE                               # Print the output.

  7          11 LOAD_GLOBAL              0 (s)              # A function does not use the local namespace, so this is LOAD_GLOBAL rather than LOAD_NAME. Despite its name, it searches the global and then the builtin namespace.
             14 PRINT_ITEM
             15 PRINT_NEWLINE
             16 LOAD_CONST               0 (None)
             19 RETURN_VALUE

As analyzed earlier, compiling test3.py actually produces two PyCodeObject objects: one for the whole module test3.py and one for the function test. From the PyCodeObject structure we can see that the bytecode of test3.py has five constants in co_consts: the integer 1, the string 'hello world', the PyCodeObject of the function test, the string '__main__', and the None object returned by the module. So a module actually has a return value. The dis module can likewise be used to view the bytecode of test.

The comments in the listing above explain the bytecode instructions. Note that local variables in a function, such as k, use LOAD_FAST: they are read directly from the f_localsplus field of the PyFrameObject instead of being searched for in the local, global, and builtin namespaces as LOAD_NAME does. This follows from the nature of functions. The function's runtime stack also lives in the memory of f_localsplus: the front part stores function parameters and local variables, and the rest is used by the runtime stack. The runtime stack is thus logically separate from parameters and locals even though they are physically contiguous. Note also that Python uses an instruction prediction mechanism. For example, COMPARE_OP is often followed by JUMP_IF_FALSE or JUMP_IF_TRUE, so when the instruction after COMPARE_OP is indeed JUMP_IF_FALSE, the interpreter can jump straight to the code that handles it, which improves efficiency.
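The split between fast local slots and global lookups survives in Python 3, although the exact opcode names vary between versions. The sketch below therefore only checks for opcode families; it assumes Python 3 and does not call the function, it only disassembles it.

```python
import dis

s = "hello world"

def test():
    k = 5
    return k, s

code = test.__code__
opnames = {ins.opname for ins in dis.get_instructions(code)}

local_slots = code.co_varnames  # names resolved to fixed slots at compile time
global_names = code.co_names    # names looked up in globals, then builtins

uses_fast_locals = any("FAST" in name for name in opnames)  # e.g. LOAD_FAST
reads_global = "LOAD_GLOBAL" in opnames
uses_load_name = "LOAD_NAME" in opnames  # module-level lookup, not used here

print(local_slots, global_names, sorted(opnames))
```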

In addition, when test3.py is run, f_locals and f_globals in the module's stack frame object are the same: both are the dictionary of the __main__ module. Adding the following code to the end of test3.py verifies this.

import sys

...  # test3.py code

if __name__ == "__main__":
    test()
    print locals() == sys.modules['__main__'].__dict__   # True
    print globals() == sys.modules['__main__'].__dict__  # True
    print globals() == locals()                          # True

Because of this, the order of function definitions in Python does not matter, and a function does not have to be declared before the function that calls it, as it would in C. The following test4.py works fine, for example. The definition order does not affect the call, because executing a def statement runs MAKE_FUNCTION and stores the function object in the local namespace. At module level the local and global namespaces are the same dictionary, so the function is also added to the global namespace, and f can be found when g runs. Note also that a function's declaration and its implementation are in fact separate: the bytecode for the declaration is executed in the module's PyCodeObject, while the bytecode for the implementation is in the function's own PyCodeObject.

# test4.py
def g():
    print 'function g'
    f()

def f():
    print 'function f'

g()
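What matters is that f exists in the global namespace by the time g is called, not where it appears in the file. The sketch below, written for Python 3 with illustrative variable names, calls g once before f is defined and once after.

```python
def g():
    # f is looked up in the global namespace only when g actually runs.
    return "g -> " + f()

try:
    g()  # f has not been defined yet, so the lookup fails
    early_call = "ok"
except NameError:
    early_call = "NameError"

def f():
    return "f"

late_call = g()  # f is now in the global namespace
print(early_call, late_call)
```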
