Python source code analysis notes 3: how Python executes code

Source: Internet
Author: User
Tags: builtin


Original article: http://www.jianshu.com/p/03af86845c95

I have written a few source code analysis notes before, but I gradually felt that I was not understanding Python's implementation from a macro point of view. Analyzing from the bottom up makes it too easy to get lost, so it is probably better to first get a basic, high-level understanding of how Python executes code and then explore the details step by step. That is also why I have not updated these notes for so long: I have been reading source code analysis books and the source itself, hoping to get a clear macro-level picture of Python's implementation. People say that real understanding comes from first reading a book from thin to thick and then from thick back to thin. I hope to get there; keep going.

1 Python Runtime Environment initialization

Before looking at how a Python program is executed, it is worth briefly explaining how Python's runtime environment is initialized. Python has an interpreter state object, PyInterpreterState, which simulates a process (hereafter the process object), and a thread state object, PyThreadState, which simulates a thread (hereafter the thread object). The PyInterpreterState structures are connected in a linked list to simulate the operating system's multiple processes. The process object holds a pointer to its collection of threads, and each thread object holds a pointer back to its process object, which is how threads and processes are associated. There is also a variable, _PyThreadState_Current, that tracks the currently running thread object.
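
As a rough mental model only, the relationships just described can be sketched in Python. The class and attribute names below (InterpreterState, ThreadState, tstate_head, and so on) are stand-ins chosen for this sketch; the real structures are C structs, and this is not their definition.

```python
# Toy model of the interpreter/thread bookkeeping described above.
# All names are illustrative assumptions, not the real C definitions.

class ThreadState:
    def __init__(self, interp):
        self.interp = interp      # back-pointer to the owning "process" object
        self.next = None          # next thread in the same interpreter
        self.frame = None         # currently executing stack frame, if any


class InterpreterState:
    def __init__(self):
        self.next = None          # next interpreter in the linked list
        self.tstate_head = None   # head of this interpreter's thread list
        self.modules = {}         # name -> module, in the spirit of sys.modules

    def new_thread(self):
        tstate = ThreadState(self)
        tstate.next = self.tstate_head   # push onto the front of the list
        self.tstate_head = tstate
        return tstate


interp = InterpreterState()
main_thread = interp.new_thread()
current_thread = main_thread   # plays the role of _PyThreadState_Current
```

From any thread object you can reach its process object through interp, and from the process object you can walk every thread through tstate_head and next.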

1.1 Process Thread Initialization

Python calls the Py_Initialize() function to initialize the runtime environment. The initialization function creates the process object interp and a thread object, associates the two, and sets the currently running thread object to the thread object just created. Next comes type system initialization, covering types such as int, str, bool, and list; I will leave that for a later, more careful analysis. After that comes another big piece: system module initialization. The process object interp has a modules variable that maintains all module objects. It is a dictionary of (name, module) pairs and corresponds to sys.modules in Python.
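
The (name, module) mapping can be observed from Python itself. The sketch below assumes Python 3 and uses a made-up module name, demo_mod; it only shows that import consults sys.modules first, not how the C-level initialization is done.

```python
import sys
import types

# Build a module object by hand; "demo_mod" is a made-up name.
fake = types.ModuleType("demo_mod")
fake.answer = 42

# Registering it in sys.modules is enough for import to find it,
# because import consults this dictionary before searching sys.path.
sys.modules["demo_mod"] = fake

import demo_mod

same_object = demo_mod is fake
```

No file named demo_mod.py exists anywhere; the import succeeds purely because of the dictionary entry.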

1.2 Module initialization

System module initialization sets up the __builtin__, sys, __main__, and site modules. In Python, a module object is a PyModuleObject structure which, apart from the generic object header, has only one field, the dictionary md_dict. The contents of md_dict are things we are already familiar with, such as the __name__ and __doc__ attributes and the functions defined in the module.
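
A quick way to see that a module is little more than a dictionary, assuming Python 3 and a throwaway module named demo created just for this sketch:

```python
import types

mod = types.ModuleType("demo", "A throwaway module.")
mod.greet = lambda: "hi"

# vars(mod) returns the module's dictionary, the Python-level view
# of what the text calls md_dict.
namespace = vars(mod)
```

Setting an attribute on the module and adding a key to this dictionary are the same operation seen from two sides.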

During __builtin__ module initialization, md_dict is filled with the built-in functions and the system type objects, for example the functions len, dir, and getattr and the type objects int, str, and list. This is why we can use len directly in code: under the LEGB rule, the name len is eventually found in the __builtin__ module. The sys and __main__ modules are created by almost the same process. Once creation is complete, the process object's interp->builtins is set to the md_dict field of the __builtin__ module, that is, the dictionary field of the module object, and interp->sysdict is set to the md_dict field of the sys module.

Initializing the sys module sets up attributes such as the previously mentioned modules, as well as path, version, stdin, stdout, and maxint, and functions such as exit, getrefcount, and _getframe. Note that at this point only the basic sys.path is set (the lib directory of the Python installation and so on); the paths for third-party modules are added when the site module is initialized.

The __main__ module deserves special mention. When we write our first Python program, the "__main__" in __name__ == "__main__" refers to this module's name. When we run a program with python xxx.py, the source file can be regarded as a module named __main__; if it is instead imported from another module, its name is the name of the source file itself. Why this is so is explained in detail in the worked example later in this article. One more thing to note: when the __main__ module is created, the entry ("__builtins__", __builtin__ module) is inserted into its dictionary. This entry becomes particularly important later, because at run time the f_builtins field of the stack frame object PyFrameObject is set to the dictionary of the __builtin__ module, and the frame's f_locals and f_globals fields are initially set to the dictionary of the __main__ module.
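
The effect on __name__ can be reproduced with the standard runpy module. This is a sketch assuming Python 3; the file name xxx.py and the variable who_am_i are made up for the demonstration, and runpy only approximates what the interpreter does at startup.

```python
import os
import runpy
import tempfile

source = "who_am_i = __name__\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "xxx.py")       # hypothetical script name
    with open(path, "w") as fh:
        fh.write(source)

    # Run the file under the name "python xxx.py" would give it ...
    as_script = runpy.run_path(path, run_name="__main__")
    # ... and under a module-like name, as an import would.
    as_module = runpy.run_path(path, run_name="xxx")
```

The same source file sees a different __name__ depending on how it was started, which is all that the familiar if __name__ == "__main__": test relies on.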

The site module's initialization mainly sets up the search paths for third-party modules; the sys.path we use all the time is completed by this module. It not only adds the site-packages path to sys.path, but also adds every path listed in the .pth files under the site-packages directory.
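
The .pth handling can be tried out with site.addsitedir, which processes one directory the same way. This is a sketch assuming Python 3; the temporary directory stands in for site-packages, and the names extra_pkgs and demo.pth are made up.

```python
import os
import site
import sys
import tempfile

tmp = tempfile.mkdtemp()                 # stands in for site-packages
extra = os.path.join(tmp, "extra_pkgs")  # hypothetical extra directory
os.mkdir(extra)

# A .pth file lists one path per line, relative to its own directory.
with open(os.path.join(tmp, "demo.pth"), "w") as fh:
    fh.write("extra_pkgs\n")

before = list(sys.path)
site.addsitedir(tmp)                     # adds tmp and processes its .pth files
added = [p for p in sys.path if p not in before]
```

Both the directory itself and the path named in demo.pth end up in sys.path. The sketch leaves the temporary directory and the extra sys.path entries in place, so run it in a scratch interpreter.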

Here is some verification code. You can see the __builtin__, sys, and __main__ modules in sys.modules, and the system type objects are already in the __builtin__ module's dictionary.

    In [1]: import sys

    In [2]: sys.modules['__builtin__'].__dict__['int']
    Out[2]: int

    In [3]: sys.modules['__builtin__'].__dict__['len']
    Out[3]: <function len>

    In [4]: sys.modules['__builtin__'].__dict__['__name__']
    Out[4]: '__builtin__'

    In [5]: sys.modules['__builtin__'].__dict__['__doc__']
    Out[5]: "Built-in functions, exceptions, and other objects.\n\nNoteworthy: None is the `nil' object; Ellipsis represents `...' in slices."

    In [6]: sys.modules['sys']
    Out[6]: <module 'sys' (built-in)>

    In [7]: sys.modules['__main__']
    Out[7]: <module '__main__' (built-in)>
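
The session above is from Python 2. For readers on Python 3, the same check looks roughly like this; the only assumption is the rename of __builtin__ to builtins.

```python
import builtins
import sys

# In Python 3 the module is registered under the name "builtins".
builtin_dict = sys.modules["builtins"].__dict__

checks = {
    "int": builtin_dict["int"] is int,     # the type object lives in the dict
    "len": builtin_dict["len"] is len,     # so does the built-in function
    "name": builtin_dict["__name__"],
}
```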

With that groundwork in place, the Python program can be run. There are two ways: interactively at the command line, or with python xxx.py. Before explaining either, I need to introduce a few data structures involved in running a Python program.

1.3 Data structures involved in running Python

The main data structures involved in running Python are PyCodeObject, PyFrameObject, and PyFunctionObject. PyCodeObject is the storage structure for Python bytecode: a compiled pyc file stores a serialized PyCodeObject, which is loaded and deserialized back into a PyCodeObject at run time. PyFrameObject simulates a stack frame: each time a new function is entered, a PyFrameObject is created to play the role of its stack frame. PyFunctionObject is the function object. Each function corresponds to one PyCodeObject, and a PyFunctionObject is created when a statement such as def test(): is executed. PyCodeObject can be thought of as a static structure: once the Python source file is fixed, the compiled PyCodeObject does not change. PyFrameObject and PyFunctionObject are dynamic structures whose contents change at run time.
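
The serialize/deserialize step can be tried with the standard marshal module. This is a sketch assuming Python 3; a real pyc file also has a header in front of the marshalled code object, which is not shown here.

```python
import marshal

source = "def test():\n    return 'hello world'\n"
code = compile(source, "test.py", "exec")

blob = marshal.dumps(code)        # bytes, like the body of a pyc file
restored = marshal.loads(blob)    # back to a code object

# The restored code object can be executed like the original one.
namespace = {}
exec(restored, namespace)
result = namespace["test"]()
```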

PyCodeObject object

A Python program file must be compiled into PyCodeObject objects before it is executed. Each code block is one PyCodeObject; in Python, classes, functions, and modules are all code blocks, so each gets its own PyCodeObject after compilation. A single Python file may therefore compile to more than one PyCodeObject. The example program below compiles to two: one for the whole test.py file and one for the function test. For details on parsing PyCodeObject, see my earlier article on the Python pyc format; I will not repeat it here.

    # Example code test.py
    def test():
        print "hello world"

    if __name__ == "__main__":
        test()

PyFrameObject object

The bytecode instructions of a Python program, along with static information such as constants, are stored in PyCodeObject. Clearly the program cannot run on the PyCodeObject alone, because a lot of state changes dynamically at run time. Take test2.py below: the print statements at #1 and #2 look identical, yet they obviously produce different results. That information is not stored in PyCodeObject; it comes from PyFrameObject, the stack frame object. PyFrameObject has three fields, f_locals, f_globals, and f_builtins, corresponding to the local, global, and builtin namespaces. This is the LGB rule we often talk about, or the LEGB rule once closures are included. A module corresponds to a file and defines a global scope, a function defines a local scope, and Python itself defines a top-level builtin scope; these three scopes correspond to the three fields of PyFrameObject, which is how a name reference gets resolved. For example, in test2.py the i at #1 refers to the local variable i of function test, whose value is the string 'hello world', while the i at #2 refers to the name i in the module's local scope, whose value is the integer 123 (note that a module's local scope is the same as its global scope). Note also that accessing a function's local variables does not go through the locals namespace: the set of local variables of a function is fixed, so the memory slot each one uses can be determined at compile time.
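
The three namespaces of a frame can be inspected from Python code. This sketch assumes Python 3 on CPython, since sys._getframe is a CPython-specific helper; it mirrors the idea of test2.py rather than reproducing it.

```python
import sys

i = 123

def test():
    i = "hello world"
    frame = sys._getframe()              # the frame currently running test()
    return {
        "local_i": frame.f_locals["i"],          # the function's own i
        "global_i": frame.f_globals["i"],        # the module-level i
        "has_len": "len" in frame.f_builtins,    # builtin names such as len
    }

seen = test()
```

The same name i resolves to two different objects depending on which namespace of the frame is consulted, and len is found only in the builtin namespace.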

    # Example code test2.py
    i = 123

    def test():
        i = 'hello world'
        print i  # 1

    test()
    print i  # 2

PyFunctionObject object

PyFunctionObject is the function object, built by the MAKE_FUNCTION instruction that creates a function. PyFunctionObject has a func_code field pointing to the function's PyCodeObject, and a func_globals field pointing to the global namespace; note that no local namespace is involved here. When the function is called, a new stack frame object PyFrameObject is created to execute it, and the call relationship is recorded through the frame's f_back field. By the time the call actually executes, the PyFunctionObject itself has served its purpose: what really does the work is its PyCodeObject and its global namespace, because these two are what get passed to the PyFrameObject when the function's stack frame is created.
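
That a function is essentially "code object plus globals" can be checked by building a second function from the same code object. This is a sketch assuming Python 3 on CPython, where func_code and func_globals are spelled __code__ and __globals__; the names whoami, clone, outer, and prefix are made up.

```python
import sys
import types

def whoami():
    caller = sys._getframe().f_back          # frame of whoever called us
    return prefix + caller.f_code.co_name    # prefix is looked up in globals

# Same code object, different globals dictionary.
clone = types.FunctionType(
    whoami.__code__,
    {"prefix": "called from ", "sys": sys},
)

def outer():
    return clone()

message = outer()
```

whoami itself would fail here because this module never defines prefix, while clone works: the behaviour comes from the pair (code, globals), and f_back gives the link to the calling frame.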

1.4 A brief look at how a Python program runs

With the basic objects covered, let us return to the earlier topic and get ready to execute a Python program. Whether interactive or run directly with python xxx.py, both approaches end up in the same place: starting the virtual machine to execute Python bytecode. Take python xxx.py as the example. Before the program runs, the source file has to be compiled to bytecode, producing a PyCodeObject. This is done by the PyAST_Compile function. As for the compilation process itself, you would need to read the Dragon Book on compiler principles; treat it as a black box for now, since compilation alone cannot be explained in a short space (and, to be honest, I never studied compilers properly). Once the PyCodeObject has been compiled, PyEval_EvalCode(co, globals, locals) is called to create a PyFrameObject and execute the bytecode. The co argument is the PyCodeObject. Because the frame created by this PyEval_EvalCode call is the first PyFrameObject Python creates, its f_back is NULL, and its globals and locals are both the dictionary of the __main__ module. If a module is imported rather than run directly, the PyCodeObject compiled from its source is also saved to a pyc file; the next time the module is loaded, if the source has not changed, its content can be read straight from the pyc file without compiling again.
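
The compile-then-evaluate sequence has a close Python-level analogue in the built-ins compile and exec. This is a sketch assuming Python 3; the dictionary named namespace only plays the role of the __main__ module's dictionary, and xxx.py is just a label for the code object.

```python
source = "i = 1\ns = 'hello world'\n"

# Source text -> code object (the compilation step).
code = compile(source, "xxx.py", "exec")

# One dictionary serves as both globals and locals, as for __main__.
namespace = {"__name__": "__main__"}
exec(code, namespace, namespace)

# exec inserts a "__builtins__" entry when the globals dict lacks one.
has_builtins = "__builtins__" in namespace
```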

Executing bytecode works much like simulating a CPU executing instructions. Execution starts from the co_code field of the PyCodeObject referenced by the PyFrameObject's f_code field, which is where the bytecode is stored: the first instruction is fetched and executed, then the second, and so on through all the instructions. In the Python 2 bytecode format discussed here, an instruction is either 1 byte or 3 bytes long: an instruction without an argument is 1 byte, and an instruction with an argument is 3 bytes (1 byte of opcode plus 2 bytes of argument).
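
To make the fetch-and-execute loop concrete, here is a toy interpreter for a four-instruction program using the 1-byte / 3-byte layout just described. It is written for Python 3 and is only a sketch: the opcode numbers are toy values, and the real loop in CPython is a large C switch statement.

```python
# Toy fetch-decode-execute loop. Opcode numbers are toy values.
HAVE_ARGUMENT = 90      # opcodes >= this carry a 2-byte argument
BINARY_ADD = 23
RETURN_VALUE = 83
LOAD_CONST = 100


def run(code, consts):
    stack = []
    pc = 0                                  # offset of the next instruction
    while True:
        opcode = code[pc]
        pc += 1
        if opcode >= HAVE_ARGUMENT:
            # 2-byte argument, low byte first
            arg = code[pc] | (code[pc + 1] << 8)
            pc += 2
        if opcode == LOAD_CONST:
            stack.append(consts[arg])
        elif opcode == BINARY_ADD:
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif opcode == RETURN_VALUE:
            return stack.pop()
        else:
            raise ValueError("unknown opcode %d" % opcode)


program = bytes([
    LOAD_CONST, 0, 0,      # push consts[0]          (3 bytes)
    LOAD_CONST, 1, 0,      # push consts[1]          (3 bytes)
    BINARY_ADD,            # pop two, push the sum   (1 byte)
    RETURN_VALUE,          # pop and return          (1 byte)
])
result = run(program, consts=(1, 2))
```

The program is 8 bytes long: two 3-byte instructions and two 1-byte instructions.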

The relationships among the Python virtual machine's process, thread, and stack frame objects are shown below:

[Figure: relationships among the process, thread, and stack frame objects]

2 A worked example of running a Python program

Programmers usually start learning a new language with Hello World, a way of saying hello before facing the unknown world of that language. That is where I started with Python too, but I never looked into how it was executed, and this time I cannot skip over it. Consider the example below.

Example code test3.py:

    i = 1
    s = 'hello world'

    def test():
        k = 5
        print k
        print s

    if __name__ == "__main__":
        test()

There is not much code in this example, but it touches on almost every aspect of how Python runs (except the class mechanism, which I have not figured out yet and will ignore for now). As described in the previous section, when python test3.py is executed, the Python process and thread are initialized first, then the system modules and the type system, and then the program test3.py is run. Every run of a Python program starts a Python virtual machine. Because the file is run directly, it must first be compiled to bytecode to obtain a PyCodeObject, and execution starts from the first instruction of that bytecode. Also because it is run directly, the PyCodeObject is not serialized to a pyc file. Below you can see the PyCodeObject of test3.py, with its bytecode instructions displayed using Python's dis module.

    In [1]: source = open('test3.py').read()

    In [2]: co = compile(source, 'test3.py', 'exec')

    In [3]: co.co_consts
    Out[3]: (1, 'hello world', <code object test at 0x1108eaaf8, file "run.py", line 4>, '__main__', None)

    In [4]: co.co_names
    Out[4]: ('i', 's', 'test', '__name__')

    In [5]: dis.dis(co)  ## Bytecode of the module itself (dis must already be imported). The integers, strings, etc. below are Python objects: PyIntObject, PyStringObject, and so on.
      1           0 LOAD_CONST               0 (1)    # Push constant 0 of the constant table, the integer 1, onto the stack.
                  3 STORE_NAME               0 (i)    # Get the name i, pop the integer 1 just pushed, and store the pair in f->f_locals, the local namespace used for name lookup.

      2           6 LOAD_CONST               1 ('hello world')
                  9 STORE_NAME               1 (s)    # Likewise: get the name s, pop the string 'hello world', and store the pair in the local namespace.

      4          12 LOAD_CONST               2 (<code object test at 0xb744bd10, file "test3.py", line 4>)
                 15 MAKE_FUNCTION            0        # Pop the PyCodeObject of function test, build a PyFunctionObject from it and the frame's f_globals, and push the result.
                 18 STORE_NAME               2 (test) # Get the name test, pop the PyFunctionObject just pushed, and store it in the local namespace.

      9          21 LOAD_NAME                3 (__name__)  ## LOAD_NAME searches the local, global, and builtin namespaces in turn; here __name__ is found in the local namespace.
                 24 LOAD_CONST               3 ('__main__')
                 27 COMPARE_OP               2 (==)   ## Comparison instruction.
                 30 JUMP_IF_FALSE           11 (to 44)
                                                      ## If not equal, jump straight to the instruction at 44, the POP_TOP below.
                                                      ## COMPARE_OP leaves the comparison result on top of the stack, so it has to be popped.
                                                      ## Here the values are equal, so execution continues at 33, which is also a POP_TOP.
                 33 POP_TOP

     10          34 LOAD_NAME                2 (test) ## Load the function object.
                 37 CALL_FUNCTION            0        ## Call the function.
                 40 POP_TOP                           ## Pop the function's return value.
                 41 JUMP_FORWARD             1 (to 45)  ## Jump forward 1 byte, counted from the address of the next instruction: 44+1=45.
            >>   44 POP_TOP
            >>   45 LOAD_CONST               4 (None)
                 48 RETURN_VALUE                      # Return None.

    In [6]: dis.dis(co.co_consts[2])  ## Bytecode of function test.
      5           0 LOAD_CONST               1 (5)
                  3 STORE_FAST               0 (k)    # Unlike STORE_NAME, STORE_FAST stores into the frame's f_localsplus, not the local namespace.

      6           6 LOAD_FAST                0 (k)    # Correspondingly, LOAD_FAST reads the value from f_localsplus.
                  9 PRINT_ITEM
                 10 PRINT_NEWLINE                     # Print the output.

      7          11 LOAD_GLOBAL              0 (s)    # The function does not use the local namespace, so this is LOAD_GLOBAL rather than LOAD_NAME. Do not be misled by the name: it searches the global and then the builtin namespace.
                 14 PRINT_ITEM
                 15 PRINT_NEWLINE
                 16 LOAD_CONST               0 (None)
                 19 RETURN_VALUE

As analyzed earlier, the test3.py file actually compiles to two PyCodeObjects: one for the test3.py module as a whole and one for the function test. From the PyCodeObject structure we can see that the bytecode of test3.py has five constants: the integer 1, the string 'hello world', the PyCodeObject of function test, the string '__main__', and the module's return value, the None object. So a module has a return value too. The dis module can likewise be used to view the bytecode of function test.
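
The "two code objects" claim is easy to check on Python 3 as well. In this sketch the source is test3.py with print rewritten as a function call, which is my adaptation; the exact contents of co_consts vary between Python versions, so only the nested code object and the names are inspected.

```python
import types

source = (
    "i = 1\n"
    "s = 'hello world'\n"
    "\n"
    "def test():\n"
    "    k = 5\n"
    "    print(k)\n"
    "    print(s)\n"
    "\n"
    "if __name__ == '__main__':\n"
    "    test()\n"
)
module_code = compile(source, "test3.py", "exec")

# Code objects for nested blocks are stored among the module's constants.
nested = [c for c in module_code.co_consts if isinstance(c, types.CodeType)]
```

There is exactly one nested code object, the one for test, which together with the module's own code object makes two.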

The bytecode instructions are explained in the comments in the listing. Note that the value of a local variable such as k inside the function is read with LOAD_FAST, which fetches it directly from the f_localsplus field of the PyFrameObject, rather than with LOAD_NAME, which searches the local, global, and builtin namespaces in turn; this follows from the nature of functions. The function's run-time stack lives in the same block of memory as f_localsplus: the front part stores the function's arguments and local variables, and the rear part is used as the run-time stack. Logically the run-time stack is separate from the arguments and local variables, even though physically they are contiguous. Note also that Python uses an instruction prediction mechanism. For example, COMPARE_OP is very often followed by JUMP_IF_FALSE or JUMP_IF_TRUE, so if the instruction after COMPARE_OP is indeed JUMP_IF_FALSE, the interpreter can jump straight to the code that handles it, which saves a little time.

Also note that when test3.py is run, f_locals and f_globals in the module's stack frame object have the same value: both are the dictionary of the __main__ module. This can be verified by appending the following code to test3.py.

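    import sys  # needed for the checks below
    # ... the code of test3.py ...
    if __name__ == "__main__":
        test()
        print locals() == sys.modules['__main__'].__dict__   # True
        print globals() == sys.modules['__main__'].__dict__  # True
        print locals() == globals()                          # True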
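
The same observation can be made on Python 3 without touching __main__, by executing a small piece of module-level code in a dictionary of our own. The names module_source, inner, and namespace are made up for this sketch.

```python
module_source = """
same_at_module_level = locals() is globals()

def inner():
    return locals() is globals()

same_inside_function = inner()
"""

namespace = {}
exec(module_source, namespace)   # one dict acts as both globals and locals
```

At module level locals() and globals() are the very same dictionary, while inside a function they are not.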

Because of this, the order of function definitions in Python does not matter; unlike in C, a function does not have to be declared before the point where it is called. For example, test4.py below is perfectly valid code, and the order of the definitions does not affect the calls. When a def statement is executed, the MAKE_FUNCTION instruction creates the function object and STORE_NAME adds it to the local namespace; since local and global are the same dictionary at this point, that is equivalent to adding it to the global namespace, so a function called from inside another function's body can be found when that call actually runs. Notice too that a function's declaration and implementation are effectively separate: the bytecode for the declaration is executed as part of the module's PyCodeObject, while the bytecode for the implementation lives in the function's own PyCodeObject.

    # test4.py
    def g():
        print 'function g'
        f()  # f is defined further down, but is only looked up when g runs

    def f():
        print 'function f'

    g()
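
A Python 3 sketch of the same point, including the one case where order does matter: calling a name before its def statement has executed. The list calls and the function h are made up for the demonstration.

```python
calls = []

def g():
    calls.append("g")
    f()               # f does not exist yet when this def statement runs

def f():
    calls.append("f")

g()                   # by now both names are bound in the module namespace

# Calling a name before its def statement has run fails at run time.
try:
    h()
except NameError:
    calls.append("h missing")

def h():
    calls.append("h")
```

The call inside g is resolved when g runs, not when g is defined, so f only needs to exist by the time g() is called.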
