Python's execution mechanism: opcode analysis

Source: Internet
Author: User

The previous article, "Python operation mechanism--PYC file analysis", gave a preliminary look at the PyCodeObject structure, the basic unit of Python execution. To really understand how Python runs code, however, we also need to analyze Python's opcodes. Opcodes are interpreted by the following function in ceval.c:


PyObject *PyEval_EvalFrameEx(PyFrameObject *f, int throwflag)


This function is therefore the focus of any study of opcodes. If anything about an opcode is unclear, the answer can be found in the branch of this function that handles that opcode.
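As a rough mental model, the function is a large loop that fetches the next opcode and switches on it. The sketch below imitates that fetch-and-dispatch pattern in Python. It is not CPython's code: the run() helper, the tuple encoding of instructions, and the tiny opcode subset are assumptions made purely for illustration.

```python
# Toy stack machine: a hypothetical, simplified model of an eval loop.
# Instruction names mirror CPython 2.7 opcodes, but the (name, arg) tuple
# encoding and the run() helper are assumptions made for this sketch.

def run(instructions, consts, names):
    stack = []        # value stack, playing the role of the frame's value stack
    local_vars = {}   # stands in for f_locals
    pc = 0            # index of the next instruction to execute
    while True:
        opname, arg = instructions[pc]
        pc += 1
        if opname == "LOAD_CONST":
            stack.append(consts[arg])              # push consts[arg]
        elif opname == "STORE_NAME":
            local_vars[names[arg]] = stack.pop()   # pop into locals[names[arg]]
        elif opname == "LOAD_NAME":
            stack.append(local_vars[names[arg]])   # push locals[names[arg]]
        elif opname == "POP_TOP":
            stack.pop()                            # discard the top of the stack
        elif opname == "RETURN_VALUE":
            return stack.pop(), local_vars
        else:
            raise ValueError("unknown opcode: %s" % opname)

# Roughly "myglobal = True", followed by returning the stored value.
program = [
    ("LOAD_CONST", 0),
    ("STORE_NAME", 0),
    ("LOAD_NAME", 0),
    ("RETURN_VALUE", None),
]
result, local_vars = run(program, consts=(True,), names=("myglobal",))
print(result, local_vars)
```

The real function handles far more opcodes, plus exceptions, tracing and block stacks, but each case follows the same shape: read the argument, manipulate the value stack, continue with the next instruction.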


Before analyzing opcodes, we first need to understand the PyFrameObject data structure:

typedef struct _frame {
    PyObject_VAR_HEAD
    struct _frame *f_back;      /* previous frame, or NULL */
    PyCodeObject *f_code;       /* code segment */
    PyObject *f_builtins;       /* builtin symbol table (PyDictObject) */
    PyObject *f_globals;        /* global symbol table (PyDictObject) */
    PyObject *f_locals;         /* local symbol table (any mapping) */
    PyObject **f_valuestack;    /* points after the last local */
    /* Next free slot in f_valuestack.  Frame creation sets to f_valuestack.
       Frame evaluation usually NULLs it, but a frame that yields sets it
       to the current stack top. */
    PyObject **f_stacktop;
    PyObject *f_trace;          /* Trace function */

    /* If an exception is raised in this frame, the next three are used to
     * record the exception info (if any) originally in the thread state.  See
     * comments before set_exc_info() -- it's not obvious.
     * Invariant:  if _type is NULL, then so are _value and _traceback.
     * Desired invariant:  all three are NULL, or all three are non-NULL.  That
     * one isn't currently true, but "should be".
     */
    PyObject *f_exc_type, *f_exc_value, *f_exc_traceback;

    PyThreadState *f_tstate;
    int f_lasti;                /* Last instruction if called */
    /* Call PyFrame_GetLineNumber() instead of reading this field
       directly.  As of 2.3 f_lineno is only valid when tracing is
       active (i.e. when f_trace is set).  At other times we use
       PyCode_Addr2Line to calculate the line from the current
       bytecode index. */
    int f_lineno;               /* Current line number */
    int f_iblock;               /* index in f_blockstack */
    PyTryBlock f_blockstack[CO_MAXBLOCKS]; /* for try and loop blocks */
    PyObject *f_localsplus[1];  /* locals+stack, dynamically sized */
} PyFrameObject;




Here we focus on the following members:

    PyObject **f_stacktop;
    PyCodeObject *f_code;       /* code segment */
    PyObject *f_globals;        /* global symbol table (PyDictObject) */
    PyObject *f_locals;         /* local symbol table (any mapping) */



As you can see, PyFrameObject contains a pointer to a PyCodeObject, f_code.
The two dict objects f_globals and f_locals in PyFrameObject hold the global objects and the local objects, respectively.
You could say that a PyFrameObject is the execution environment of a PyCodeObject. The PyCodeObject is static and generally does not change after it is created, whereas the PyFrameObject is dynamic:
after it is created, its f_globals and f_locals dicts change frequently, and one PyCodeObject may correspond to several PyFrameObjects.
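The one-to-many relationship can be observed from Python itself. The following is a minimal sketch, assuming CPython, where sys._getframe() returns the frame object of the current call. The countdown function and the snapshots list are hypothetical names introduced only for this example.

```python
import sys

snapshots = []   # one (frame, copy of its locals) pair per call

def countdown(n):
    frame = sys._getframe()                 # frame object of this call (CPython-specific)
    snapshots.append((frame, dict(frame.f_locals)))
    if n > 0:
        countdown(n - 1)

countdown(2)

frames = [frame for frame, _ in snapshots]

# Three calls produced three distinct frames ...
distinct_frames = len(set(id(f) for f in frames))
# ... but every frame points at the same, unchanging code object.
shares_code = all(f.f_code is countdown.__code__ for f in frames)
# Each frame had its own local value of n.
local_values = [local_copy["n"] for _, local_copy in snapshots]

print(distinct_frames, shares_code, local_values)
```

Each recursive call gets its own frame with its own locals, while f_code stays the same object throughout, which is the static/dynamic split described above.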


Now let's start analyzing opcodes, using the same example file as in the previous article. With the showfile.py script we can see that the whole PYC file stores a total of 5 PyCodeObject structures:
The outermost one is the PyCodeObject of the module test; its code is executed when the module is imported or run directly.
The PyCodeObject of the add function and the PyCodeObject of the world class are nested inside the module's PyCodeObject.
The PyCodeObjects of the __init__ method and the SayHello method are in turn nested inside the PyCodeObject of the world class.
Let's start by analyzing the PyCodeObject of the test module:
<code>
   <argcount> 0 </argcount>
   <nlocals> 0</nlocals>
   <stacksize> 3</stacksize>
   <flags> 0040</flags>
   <code>
      6400006401006c00005a00006501005a02006402008400005a0300640300
      640500640400840000830000595a04006504008300005a05006505006a06
      008300000164010053
   </code>
   <dis>
      .....
   </dis>
   <names> ('dis', 'True', 'myglobal', 'add', 'world', 'w', 'SayHello')</names>
   <varnames> ()</varnames>
   <freevars> ()</freevars>
   <cellvars> ()</cellvars>
   <filename> 'test.py'</filename>
   <name> '<module>'</name>
   <firstlineno> 1</firstlineno>
   <consts>
      -1
      None...
   </consts>
</code>
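The nesting of the five code objects can also be checked without a PYC file by compiling source text and walking co_consts. The sketch below is written under assumptions: SOURCE is a hypothetical stand-in for test.py (the original source is not reproduced in this article), and walk() is a helper invented for this example. Field contents such as co_names differ between Python versions, so only the nesting is shown.

```python
import types

# Hypothetical stand-in for test.py; only the names follow the dump above.
SOURCE = '''
import dis
myglobal = True

def add(a):
    return a + 1

class world:
    def __init__(self):
        pass
    def SayHello(self):
        return "hello"

w = world()
w.SayHello()
'''

def walk(code, depth=0, found=None):
    """Collect (depth, co_name) for a code object and all code objects nested in it."""
    if found is None:
        found = []
    found.append((depth, code.co_name))
    for const in code.co_consts:
        # Nested functions and class bodies are stored as constants of the parent.
        if isinstance(const, types.CodeType):
            walk(const, depth + 1, found)
    return found

module_code = compile(SOURCE, "test.py", "exec")
tree = walk(module_code)
for depth, name in tree:
    print("  " * depth + name)
```

The indentation of the printed names mirrors the nesting: the function and the class body sit inside the module's code object, and the two methods sit inside the class body's code object.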



Next, let's analyze the resulting opcodes.
In the comments below, locals refers to PyFrameObject.f_locals, names refers to PyCodeObject.co_names, and consts refers to PyCodeObject.co_consts.

  1           0 LOAD_CONST               0 (-1)          # load the value at index 0 of consts, here -1
              3 LOAD_CONST               1 (None)        # load the value at index 1 of consts, here None
              6 IMPORT_NAME              0 (dis)         # import the dis module; names[0] is "dis"
              9 STORE_NAME               0 (dis)         # save the module into locals, the dict that holds local variables; the key is names[0], i.e. "dis"

  2          12 LOAD_NAME                1 (True)        # push the value of names[1], i.e. True, onto the stack
             15 STORE_NAME               2 (myglobal)    # pop the top of the stack (True) and save it to locals['myglobal']; names[2] is the string 'myglobal'

  4          18 LOAD_CONST               2 (<code object add at 024E3B60, file "test.py", line 4>)    # push consts[2], the PyCodeObject of the add function
             21 MAKE_FUNCTION            0               # create a function from add's PyCodeObject and push it onto the stack
             24 STORE_NAME               3 (add)         # pop the new function and save it to locals['add']

  9          27 LOAD_CONST               3 ('world')     # push consts[3], i.e. "world"
             30 LOAD_CONST               5 (())          # push consts[5], i.e. the empty tuple
             33 LOAD_CONST               4 (<code object world at 024E3650, file "test.py", line 9>)  # push consts[4], the PyCodeObject of the world class
             36 MAKE_FUNCTION            0               # create a function
             39 CALL_FUNCTION            0               # call the function just created to initialize the class; it leaves a dict on top of the stack that holds the class's methods and other class-level names
                                                         # for the details, see the opcodes in the PyCodeObject of the world class
             42 BUILD_CLASS                              # create the class; BUILD_CLASS uses three objects on the stack: the dict holding the class information, the tuple of base classes (here the empty tuple ()), and the class name "world"
             43 STORE_NAME               4 (world)       # save the class to locals['world']
             46 LOAD_NAME                4 (world)
             49 CALL_FUNCTION            0
             52 STORE_NAME               5 (w)           # the three instructions above create a world object and save it to locals['w']
             55 LOAD_NAME                5 (w)
             58 LOAD_ATTR                6 (SayHello)
             61 CALL_FUNCTION            0               # the three instructions above call w.SayHello()
             64 POP_TOP
             65 LOAD_CONST               1 (None)
             68 RETURN_VALUE
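The offsets and opcode names in the listing above can be recovered directly from the hex string stored in the code object. Below is a minimal decoder sketch. The opcode table is a hand-written subset of the Python 2.7 numbering, covering only the opcodes that occur in this listing, and decode() is a hypothetical helper, not part of the dis module. In Python 2.7, opcodes numbered 90 or higher are followed by a two-byte little-endian argument.

```python
import binascii

# Subset of the Python 2.7 opcode numbering, limited to the opcodes used above.
# Other Python versions use different numbers and a different encoding.
OPNAMES_27 = {
    1: "POP_TOP",
    83: "RETURN_VALUE",
    89: "BUILD_CLASS",
    90: "STORE_NAME",
    100: "LOAD_CONST",
    101: "LOAD_NAME",
    106: "LOAD_ATTR",
    108: "IMPORT_NAME",
    131: "CALL_FUNCTION",
    132: "MAKE_FUNCTION",
}
HAVE_ARGUMENT = 90  # opcodes >= 90 carry a 2-byte little-endian argument

# co_code of the test module, copied from the dump of the code object.
CO_CODE_HEX = (
    "6400006401006c00005a00006501005a02006402008400005a0300640300"
    "640500640400840000830000595a04006504008300005a05006505006a06"
    "008300000164010053"
)

def decode(hex_string):
    """Return a list of (offset, opname, arg) tuples; arg is None if absent."""
    code = bytearray(binascii.unhexlify(hex_string))
    i = 0
    result = []
    while i < len(code):
        offset = i
        op = code[i]
        i += 1
        arg = None
        if op >= HAVE_ARGUMENT:
            arg = code[i] | (code[i + 1] << 8)   # low byte first
            i += 2
        result.append((offset, OPNAMES_27[op], arg))
    return result

instructions = decode(CO_CODE_HEX)
for offset, name, arg in instructions:
    print("%3d %-14s %s" % (offset, name, "" if arg is None else arg))
```

The argument of each instruction is only an index. The dis output resolves it against consts or names to show the value in parentheses, which this sketch leaves out.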



The above is a brief commentary on the opcodes in the disassembly. It covers only the more common ones; Python 2.7 defines more than a hundred opcodes in opcode.h, numbered up to 147.
You can study the code of PyEval_EvalFrameEx to understand the rest.
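For a quick lookup without opening the C headers, the standard dis module exposes the same table through dis.opmap (name to number) and dis.opname (number to name). The snippet below is a small sketch. The numbers and the total count it prints belong to whichever interpreter runs it, so they will generally not match Python 2.7.

```python
import dis

# Look up an opcode number by name, then map the number back to its name.
number = dis.opmap["RETURN_VALUE"]
name = dis.opname[number]

print("RETURN_VALUE is opcode", number)
print(len(dis.opmap), "opcode names known to this interpreter")
```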
After this analysis of Python's opcodes, you should have a deeper understanding of how Python executes code. Of course, there are still more questions to answer, such as how a PyFrameObject is created,
when it is created, how it is maintained, and so on.
