Python code object and pyc file (iii)

Source: Internet
Author: User

Previous section: Python code object and pyc file (ii)

Writing strings to a pyc file

Before looking at how Python writes strings to a pyc file, let's look at the WFILE struct:

marshal.c

typedef struct {
    FILE *fp;
    int error;
    int depth;
    /* If fp == NULL, the following are valid: */
    PyObject *str;
    char *ptr;
    char *end;
    PyObject *strings; /* dict on marshal, list on unmarshal */
    int version;
} WFILE;

  

WFILE can be seen as a thin wrapper around FILE *, but it has one peculiar field, strings, which is the key to how Python writes strings to and reads strings from a pyc file. While a pyc file is being written, strings points to a PyDictObject; while a pyc file is being read, strings points to a PyListObject.

Let's go back to the PyMarshal_WriteObjectToFile function.

marshal.c

void
PyMarshal_WriteObjectToFile(PyObject *x, FILE *fp, int version)
{
    WFILE wf;
    wf.fp = fp;
    wf.error = 0;
    wf.depth = 0;
    wf.strings = (version > 0) ? PyDict_New() : NULL;
    wf.version = version;
    w_object(x, &wf);
    Py_XDECREF(wf.strings);
}

  

As you can see, wf.strings is set to a new PyDictObject before any object is written, provided that version > 0. For version 0 it is NULL, and the string-sharing logic below is skipped.
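The version check above decides whether repeated strings can be shared at all. A similar trade-off is visible in Python 3's marshal module, although its format differs from the Python 2 code shown here (newer marshal versions use a general object back-reference mechanism instead of a dedicated strings table). A quick experiment, assuming a CPython 3 interpreter: serializing a tuple that repeats a string at format version 0 should produce more bytes than at the default version.

```python
import marshal

# A tuple in which the same string value appears twice.
t = ("hello", "world", "hello")

# Version 0 writes every string in full; the default (newest) version
# uses more compact encodings and can refer back to earlier objects.
size_v0 = len(marshal.dumps(t, 0))
size_default = len(marshal.dumps(t))

print(size_v0, size_default)
print(marshal.loads(marshal.dumps(t)) == t)  # the round trip preserves the value
```

The exact byte counts depend on the interpreter version, so only the comparison is meaningful here.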

marshal.c

else if (PyString_Check(v)) {
    if (p->strings && PyString_CHECK_INTERNED(v)) {
        /* <1> look up this string's ordinal number in strings */
        PyObject *o = PyDict_GetItem(p->strings, v);
        /* <2> interned string, not the first write */
        if (o) {
            long w = PyInt_AsLong(o);
            w_byte(TYPE_STRINGREF, p);
            w_long(w, p);
            goto exit;
        }
        /* <3> interned string, first write */
        else {
            int ok;
            o = PyInt_FromSsize_t(PyDict_Size(p->strings));
            ok = o && PyDict_SetItem(p->strings, v, o) >= 0;
            Py_XDECREF(o);
            if (!ok) {
                p->depth--;
                p->error = 1;
                return;
            }
            w_byte(TYPE_INTERNED, p);
        }
    }
    /* <4> ordinary string */
    else {
        w_byte(TYPE_STRING, p);
    }
    n = PyString_GET_SIZE(v);
    if (n > INT_MAX) {
        /* huge strings are not supported */
        p->depth--;
        p->error = 1;
        return;
    }
    /* <5> write the length, then the string itself */
    w_long((long)n, p);
    w_string(PyString_AS_STRING(v), (int)n, p);
}

  

When a string is written to a pyc file, there are 3 possible cases.

The first is an ordinary string: w_object writes the type tag TYPE_STRING, calls w_long to write the string's length, and finally writes the characters themselves with w_string. All of this happens at <4> and <5>. Besides ordinary strings, Python also encounters strings that must be interned again when the pyc file is later loaded. These are split into a first write and later writes. On the first write (<3>), the string is recorded in strings and tagged TYPE_INTERNED, then execution falls through to <5> to write the length and characters. On every later write (<2>), only the tag TYPE_STRINGREF and an index are emitted.
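The three cases can be modeled in a few lines of Python. This is a sketch, not CPython's code: the string tags and the helper names write_string and write_all are hypothetical, records are symbolic tuples rather than bytes, and every string is assumed to be interned. The numbered comments match <2> to <5> in the C code above.

```python
from typing import Optional

# Hypothetical symbolic tags standing in for the C constants in marshal.c.
TYPE_STRING = "TYPE_STRING"
TYPE_INTERNED = "TYPE_INTERNED"
TYPE_STRINGREF = "TYPE_STRINGREF"


def write_string(s: str, interned: bool, strings: Optional[dict], out: list) -> None:
    """Append the record w_object would emit for string s (symbolic model)."""
    if strings is not None and interned:
        index = strings.get(s)
        if index is not None:
            # <2> interned string seen before: emit only a back-reference.
            out.append((TYPE_STRINGREF, index))
            return
        # <3> first write of an interned string: its value is the current dict size.
        strings[s] = len(strings)
        tag = TYPE_INTERNED
    else:
        # <4> ordinary string, or sharing disabled because strings is None.
        tag = TYPE_STRING
    # <5> the length followed by the characters themselves.
    out.append((tag, len(s), s))


def write_all(values, version: int = 1):
    """Mimic PyMarshal_WriteObjectToFile: strings is a dict only if version > 0."""
    strings = {} if version > 0 else None
    out = []
    for s in values:
        write_string(s, interned=True, strings=strings, out=out)
    return out, strings


records, table = write_all(["hello", "world", "hello"])
print(records)
print(table)
```

With the default version=1 the third record becomes a back-reference with index 0; with version=0, strings is None and all three strings are written in full as TYPE_STRING records.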

Here is a brief introduction to interning. Python keeps a pool of interned strings. When a string that is eligible for interning is created, Python checks whether an equal string is already in the pool: if so, it returns the existing object; if not, it adds the new string to the pool. Python does not intern every string, however.
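The pool described above behaves much like a dictionary that maps each string value to the single object that represents it. A toy model follows; the class InternPool is hypothetical, since CPython's real pool lives in C and is consulted only for the strings it chooses to intern.

```python
class InternPool:
    """Toy string pool: hands back one canonical object per distinct value."""

    def __init__(self):
        self._pool = {}

    def intern(self, s: str) -> str:
        # setdefault stores s on first sight and returns the stored object afterwards.
        return self._pool.setdefault(s, s)


pool = InternPool()
x = pool.intern("".join(["Hello", "World"]))
y = pool.intern("".join(["Hello", "World"]))
print(x is y)  # True: both names refer to the first stored object
```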

Let's declare two variables, a and b, assign them the same string, and look at their ids. The ids are identical, precisely because of interning:

>>> a = "HelloWorld"
>>> b = "HelloWorld"
>>> id(a)
139894353361584
>>> id(b)
139894353361584

  

Let's look at another example:

>>> c = "Hello World"
>>> d = "Hello World"
>>> id(c)
139894353361536
>>> id(d)
139894353361728

  

Compared with a and b, c and d differ only by a space, yet their addresses are clearly different. Why? Because by default the intern mechanism only handles strings made up of "simple" characters, that is, strings consisting entirely of characters from "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_".
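The "simple characters" rule is easy to express as a predicate. The function below is a Python re-implementation, modeled on the check CPython 2 applies to string constants when it creates a code object (the C helper there is all_name_chars in Objects/codeobject.c); it is a sketch, not that code.

```python
# The character set used by the "simple character" rule.
NAME_CHARS = (
    "0123456789"
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "_"
)


def all_name_chars(s: str) -> bool:
    """Return True if every character of s belongs to NAME_CHARS."""
    return all(c in NAME_CHARS for c in s)


for literal in ("HelloWorld", "Hello World"):
    print(repr(literal), all_name_chars(literal))
```

"HelloWorld" passes the check, while the space in "Hello World" makes it fail, which matches the different ids seen for a, b and c, d.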

In addition, strings computed at run time are not interned:

>>> lst = ["A", "B", "C"]
>>> e = "".join(lst)
>>> f = "".join(lst)
>>> id(e)
139894353351824
>>> id(f)
139894353353104
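A computed string can still be interned explicitly. In Python 3 this is done with sys.intern (Python 2, which the C code above comes from, exposes the same operation as the built-in intern). A short example reusing the list from the session above:

```python
import sys

lst = ["A", "B", "C"]

# Strings built at run time are equal in value...
e = "".join(lst)
f = "".join(lst)
print(e == f)  # True

# ...and sys.intern maps equal values onto a single shared object.
e = sys.intern(e)
f = sys.intern(f)
print(e is f)  # True
```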

  

Now that we know what interning is, let's return to the string-writing branch of w_object. If the string is interned, writing it falls into either the first-write or the non-first-write case.

As mentioned earlier, while writing, the strings field of WFILE points to a PyDictObject. This dict maintains a (PyStringObject, PyIntObject) mapping: the key is the string, and the value is the order in which that string was added to wf.strings, or, more precisely, its position among the interned strings written to the pyc file so far (0 for the first, 1 for the second, and so on).

Why does Python need this PyIntObject value? Suppose we want to write 3 strings to the pyc file: "hello", "world", "hello". If there were no intern mechanism and no strings dict, we would simply write each string into the pyc file in turn. What would the pyc file look like? As follows:

pyc file
(TYPE_STRING, 5, hello)
(TYPE_STRING, 5, world)
(TYPE_STRING, 5, hello)

The pyc file above stores 3 values. The first element of each value is its type (TYPE_STRING for a string), the second is the length of the string, and the third is the string itself. Notice that the first and third values are identical. If the source code contained many duplicate strings, the pyc file would be full of redundant data. Python, being an elegant language, does not tolerate this, which is where the intern mechanism and the dict pointed to by strings come in.
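The saving is easy to quantify. Assuming the Python 2 marshal layout (a one-byte type tag, with TYPE_STRING as 's', TYPE_INTERNED as 't' and TYPE_STRINGREF as 'R', followed by a 4-byte little-endian integer written by w_long), each 5-character string costs 1 + 4 + 5 = 10 bytes, while a back-reference costs only 1 + 4 = 5 bytes. The sketch below encodes both tables; the helpers encode_string and encode_ref are hypothetical.

```python
import struct

# Type tags as defined in Python 2's marshal.c.
TYPE_STRING = b"s"
TYPE_INTERNED = b"t"
TYPE_STRINGREF = b"R"


def encode_string(tag: bytes, s: str) -> bytes:
    # One tag byte, a 4-byte little-endian length (w_long), then the bytes (w_string).
    data = s.encode("ascii")
    return tag + struct.pack("<i", len(data)) + data


def encode_ref(index: int) -> bytes:
    # One tag byte followed by the 4-byte index into the strings table.
    return TYPE_STRINGREF + struct.pack("<i", index)


naive = b"".join(encode_string(TYPE_STRING, s) for s in ("hello", "world", "hello"))
shared = (
    encode_string(TYPE_INTERNED, "hello")
    + encode_string(TYPE_INTERNED, "world")
    + encode_ref(0)
)
print(len(naive), len(shared))  # 30 25
```

Every further repetition of "hello" would cost another 10 bytes in the naive layout but only 5 bytes as a back-reference.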

With interning and the strings dict in place, let's write the same 3 strings, "hello", "world" and "hello", to the pyc file. All 3 are eligible for interning, so let's first look at what the strings dict ends up containing:

strings
hello  0
world  1

As said before, strings records the interned strings written to the pyc file. Given that strings ends up as shown above, what does the pyc file itself contain?

pyc file
(TYPE_INTERNED, 5, hello)
(TYPE_INTERNED, 5, world)
(TYPE_STRINGREF, 0)

Compared with the previous version, this pyc file differs in two ways. The first occurrence of each string is now tagged TYPE_INTERNED (step <3> above), and the third entry is now (TYPE_STRINGREF, 0). That third entry no longer stores a string type, the length, or the string itself. TYPE_STRINGREF means that, when the pyc file is parsed, the value must be looked up in strings, at index 0. But wait, doesn't something seem off? The strings dict shown above is keyed by string, so where would an index 0 come from? Don't worry, read on.

As we have seen, during writing the PyDictObject stores mappings of the form (string, order in which it was written to the pyc file). Loading a pyc file uses the same WFILE structure and also needs the strings field, but this time strings is a PyListObject rather than a PyDictObject. That is the interesting part: strings is a dict while objects are being written and a list while a pyc file is being read. Each time the loader reads a TYPE_INTERNED string, it appends that string to the list, so the integer values of the writer's dict become indices into the reader's list. When Python then reads an element of type TYPE_STRINGREF together with an index, it knows to fetch the string stored at that index in the strings list.
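The reading side can be sketched the same way. This is a simplified model, not r_object itself: the input is the symbolic records from the table above, and the function name read_records is hypothetical. The key step, appending each TYPE_INTERNED string to the list as it is read, is what makes the writer's dict values line up with the reader's list indices.

```python
def read_records(records):
    """Rebuild string values from symbolic records, using a list as strings."""
    strings = []  # on load, strings is a list (a PyListObject in C)
    values = []
    for record in records:
        tag = record[0]
        if tag == "TYPE_STRINGREF":
            # Back-reference: the payload is an index into the list.
            values.append(strings[record[1]])
            continue
        _, length, text = record
        if len(text) != length:
            raise ValueError("length field does not match the string")
        if tag == "TYPE_INTERNED":
            # A first-time interned string is appended, so its list index
            # equals the integer the writer stored in its dict.
            strings.append(text)
        values.append(text)
    return values, strings


records = [
    ("TYPE_INTERNED", 5, "hello"),
    ("TYPE_INTERNED", 5, "world"),
    ("TYPE_STRINGREF", 0),
]
print(read_records(records))
```

Reading the three records yields the original "hello", "world", "hello", while the list ends up holding only the two distinct interned strings.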

Now let's see how a pyc file is loaded.

marshal.c

typedef WFILE RFILE;

PyObject *
PyMarshal_ReadObjectFromFile(FILE *fp)
{
    RFILE rf;
    PyObject *result;
    rf.fp = fp;
    rf.strings = PyList_New(0);
    rf.depth = 0;
    rf.ptr = rf.end = NULL;
    result = r_object(&rf);
    Py_DECREF(rf.strings);
    return result;
}

  

As we can see, when a pyc file is loaded, rf is still a WFILE (RFILE is just an alias for it), but strings is now a PyListObject instead of a PyDictObject, and r_object is the inverse of w_object.

The lost PyCodeObjects

In Python code object and pyc file (i), we said that demo.py produces 3 PyCodeObjects:

demo.py

class A:
    pass


def func():
    pass


a = A()
func()

  

In PyMarshal_WriteObjectToFile, we saw that only one PyCodeObject is passed in and written. So what happened to the other two? They are in fact contained in that one: demo.py as a whole corresponds to one large PyCodeObject, and the PyCodeObjects for class A and def func are stored inside it. We can use co_consts to see the other two:

>>> source = open("demo.py").read()
>>> co = compile(source, "demo.py", "exec")
>>> co.co_consts
('A', <code object A at 0x7f966910ae30, file "demo.py", line 1>, <code object func at 0x7f96690aa930, file "demo.py", line 5>, None, ())

  

Sure enough, as we said, co_consts is a tuple that contains the two PyCodeObjects for class A and def func. The code type has many more attributes worth exploring, such as co_nlocals (the number of local variables in the code block, including positional parameters) and co_varnames (a tuple of the names of the local variables in the code block). Interested readers can explore further; we won't demonstrate them here.
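Because code objects nest through co_consts, a short recursive walk recovers all three PyCodeObjects from the single module-level one. A sketch, assuming Python 3 (where the code type is available as types.CodeType); the helper iter_code_objects is hypothetical, and the SOURCE string reproduces demo.py inline so the example runs without the file.

```python
import types

SOURCE = '''\
class A:
    pass


def func():
    pass


a = A()
func()
'''


def iter_code_objects(code: types.CodeType):
    """Yield code and, recursively, every code object nested in its co_consts."""
    yield code
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            yield from iter_code_objects(const)


module_code = compile(SOURCE, "demo.py", "exec")
for co in iter_code_objects(module_code):
    print(co.co_name, co.co_firstlineno)
```

For demo.py this should visit three code objects: the module itself, A (starting at line 1) and func (starting at line 5). This nesting is why marshalling the top-level PyCodeObject alone is enough to get all three into the pyc file.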

  
