A detailed introduction to Python persistence management with the pickle module

Last Update:2017-05-14 Source: Internet

Author: User

Tags object serialization

This article introduces Python persistence management and the pickle module in detail. It explains what persistence is, works through some pickled Python, and provides 18 examples. Persistence is all about keeping objects around, even between executions of the same program. Through this article, you will gain a general understanding of the various persistence mechanisms for Python objects, from relational databases to Python's pickle and other mechanisms. It will also give you a deeper understanding of Python's object serialization capabilities.

What is persistence?

The basic idea of persistence is simple. Suppose you have a Python program, perhaps one for managing daily to-do items, and you want to save the application objects (the to-do items) between multiple executions of this program. In other words, you want to store the objects on disk and retrieve them later. This is persistence. There are several ways to achieve it, and each has its own advantages and disadvantages.

For example, object data can be stored in a text file of some format, such as a CSV file. You can also use a relational database, such as Gadfly, MySQL, PostgreSQL, or DB2. These file formats and databases are well established, and Python has robust interfaces for all of these storage mechanisms.

These storage mechanisms have one thing in common: the stored data is independent of the objects and programs that operate on it. The advantage is that the data can be used as a shared resource by other applications. The disadvantage is that other programs can then access the object data directly, which violates the object-oriented principle of encapsulation, that is, that an object's data should be accessible only through its own public interface.

In addition, the relational database approach may not be ideal for some applications. In particular, relational databases do not understand objects. Instead, they impose their own type systems and relational data model (tables): each table contains a set of tuples (rows), and each row contains a fixed number of statically typed fields (columns). If the object model of the application does not translate easily into the relational model, you will have difficulty mapping objects to tuples and back again. This difficulty is often referred to as the impedance-mismatch problem.

Some pickled Python

The pickle module and its cousin, cPickle, provide pickling support for Python. The latter is coded in C and performs better, so it is the recommended module for most applications. We will continue to talk about pickle, but the examples in this article actually use cPickle. Most of the examples are shown in a Python shell, so we will first show how to import cPickle and refer to it as pickle:

The code is as follows:

>>> import cPickle as pickle

Now that the module has been imported, let's take a look at the pickle interface. The pickle module provides the following pairs of functions: dumps(object) returns a string containing the object in pickle format; loads(string) returns the object contained in the pickle string; dump(object, file) writes the object to a file, which can be an actual physical file or any file-like object that has a write() method accepting a single string argument; load(file) returns the object contained in the pickle file.

By default, dumps() and dump() create pickles using a printable ASCII representation. Both take a final, optional argument that, if True, specifies that the pickle should be created using a faster and smaller binary representation. The loads() and load() functions automatically detect whether a pickle is in binary or text format.

Listing 1 shows an interactive session that uses the dumps() and loads() functions just described:

Listing 1. Demonstration of dumps() and loads()

The code is as follows:

Welcome To PyCrust 0.7.2 - The Flakiest Python Shell
Sponsored by Orbtech - Your source for Python programming expertise.
Python 2.2.1 (#1, Aug 27 2002, 10:22:32)
[GCC 3.2 (Mandrake Linux 9.0 3.2-1mdk)] on linux-i386
Type "copyright", "credits" or "license" for more information.
>>> import cPickle as pickle
>>> t1 = ('this is a string', 42, [1, 2, 3], None)
>>> t1
('this is a string', 42, [1, 2, 3], None)
>>> p1 = pickle.dumps(t1)
>>> p1
"(S'this is a string'\nI42\n(lp1\nI1\naI2\naI3\naNtp2\n."
>>> print p1
(S'this is a string'
I42
(lp1
I1
aI2
aI3
aNtp2
.
>>> t2 = pickle.loads(p1)
>>> t2
('this is a string', 42, [1, 2, 3], None)
>>> p2 = pickle.dumps(t1, True)
>>> p2
'(U\x10this is a stringK*]q\x01(K\x01K\x02K\x03eNtq\x02.'
>>> t3 = pickle.loads(p2)
>>> t3
('this is a string', 42, [1, 2, 3], None)

Note: The text pickle format is quite simple and is not explained here. In fact, all the conventions used are documented in the pickle module. We should also point out that our examples use simple objects, so the binary pickle format does not show much of a space saving. In real systems with complex objects, however, the binary format can give a noticeable improvement in both size and speed.
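
For readers on Python 3, the same round trip can be sketched as follows. This is a minimal sketch rather than part of the original listings, and it rests on a few assumptions: cPickle no longer exists as a separate module there, dumps() returns bytes instead of a string, and the final argument is an integer protocol number rather than a True flag. The variable names text_pickle and binary_pickle are illustrative.

```python
import pickle

t1 = ('this is a string', 42, [1, 2, 3], None)

# Protocol 0 is the printable text format shown in Listing 1.
text_pickle = pickle.dumps(t1, protocol=0)

# Higher protocols are binary; HIGHEST_PROTOCOL selects the newest available.
binary_pickle = pickle.dumps(t1, protocol=pickle.HIGHEST_PROTOCOL)

# loads() detects the protocol on its own, so no flag is needed when reading.
t2 = pickle.loads(text_pickle)
t3 = pickle.loads(binary_pickle)

print(t2 == t1, t3 == t1)
print(len(text_pickle), len(binary_pickle))
```

Printing the two lengths is a quick way to compare the formats for your own data; for a tuple this small the difference is modest, as the note above suggests.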

Next, let's look at some examples that use dump() and load(), which work with files and file-like objects. These functions operate much like the dumps() and loads() we just saw, with one additional capability: dump() can write several objects, one after another, to the same file. Subsequent calls to load() then retrieve those objects in the same order. Listing 2 shows this capability in action:

Listing 2. dump() and load() examples

The code is as follows:

>>> a1 = 'apple'
>>> b1 = {1: 'One', 2: 'Two', 3: 'Three'}
>>> c1 = ['fee', 'fie', 'foe', 'fum']
>>> f1 = file('temp.pkl', 'wb')
>>> pickle.dump(a1, f1, True)
>>> pickle.dump(b1, f1, True)
>>> pickle.dump(c1, f1, True)
>>> f1.close()
>>> f2 = file('temp.pkl', 'rb')
>>> a2 = pickle.load(f2)
>>> a2
'apple'
>>> b2 = pickle.load(f2)
>>> b2
{1: 'One', 2: 'Two', 3: 'Three'}
>>> c2 = pickle.load(f2)
>>> c2
['fee', 'fie', 'foe', 'fum']
>>> f2.close()
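
When the number of objects in a file is not known in advance, one common approach is to keep calling load() until the stream is exhausted. The following is a minimal Python 3 sketch of that idea, not something taken from the original article. It assumes an in-memory io.BytesIO buffer as a stand-in for a file opened in binary mode, and the name restored is illustrative.

```python
import io
import pickle

a1 = 'apple'
b1 = {1: 'One', 2: 'Two', 3: 'Three'}
c1 = ['fee', 'fie', 'foe', 'fum']

# BytesIO stands in for a file opened in binary mode ('wb' / 'rb').
buffer = io.BytesIO()
for obj in (a1, b1, c1):
    pickle.dump(obj, buffer)

# Rewind and read the pickles back in the order they were written.
buffer.seek(0)
restored = []
while True:
    try:
        restored.append(pickle.load(buffer))
    except EOFError:
        # load() raises EOFError once no pickles remain in the stream.
        break

print(restored)
```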

The power of pickle

So far, we have covered the basics of pickling. In this section, we discuss some advanced issues that arise when you start pickling complex objects, including instances of custom classes. Fortunately, Python handles these situations easily.

Portability

Pickles are portable over space and time. In other words, the pickle file format is independent of machine architecture, which means that you can, for example, create a pickle under Linux and send it to a Python program running under Windows or Mac OS. In addition, when you upgrade to a newer version of Python, you do not have to worry about losing your existing pickles. The Python developers have ensured that the pickle format is backward compatible across Python versions. In fact, the pickle module provides details about the current format and the formats it supports:

Listing 3. Retrieving the supported formats

The code is as follows:

>>> pickle.format_version
'1.3'
>>> pickle.compatible_formats
['1.0', '1.1', '1.2']
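
In Python 3, the format is usually identified by an integer protocol number, and the module exposes the constants DEFAULT_PROTOCOL and HIGHEST_PROTOCOL. The sketch below is an illustration under that assumption, not part of the original listings: it writes the same object with every protocol the running interpreter supports and checks that each one loads back. The sample dictionary and the names pickles and round_trips are made up for the example.

```python
import pickle

data = {'todo': ['write', 'test', 'ship'], 'done': 2}

# Pickle the same object with every protocol this interpreter supports,
# from the original text format (0) up to the newest binary format.
pickles = {
    proto: pickle.dumps(data, protocol=proto)
    for proto in range(pickle.HIGHEST_PROTOCOL + 1)
}

# The current interpreter can still read all of the older formats.
round_trips = {proto: pickle.loads(blob) for proto, blob in pickles.items()}

print(pickle.DEFAULT_PROTOCOL, pickle.HIGHEST_PROTOCOL)
```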

Multiple references, same object

In Python, a variable is a reference to an object, and several variables can refer to the same object. Python has no trouble preserving this behavior with pickled objects, as Listing 4 shows:

Listing 4. Preserving object references

The code is as follows:

>>> a = [1, 2, 3]
>>> b = a
>>> a
[1, 2, 3]
>>> b
[1, 2, 3]
>>> a.append(4)
>>> a
[1, 2, 3, 4]
>>> b
[1, 2, 3, 4]
>>> c = pickle.dumps((a, b))
>>> d, e = pickle.loads(c)
>>> d
[1, 2, 3, 4]
>>> e
[1, 2, 3, 4]
>>> d.append(5)
>>> d
[1, 2, 3, 4, 5]
>>> e
[1, 2, 3, 4, 5]
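
The same behavior can be checked directly with the is operator instead of by inspecting printed values. This is a small Python 3 sketch of the idea, using the same variable names as the listing:

```python
import pickle

a = [1, 2, 3]
b = a  # a second name for the same list

# Pickling both names together lets pickle record that they share one object.
d, e = pickle.loads(pickle.dumps((a, b)))

# d and e are one restored list, so a change through d is visible through e.
d.append(5)

print(d is e, e)
```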

Circular and recursive references

The support for object references just demonstrated extends to circular references (where two objects each contain a reference to the other) and recursive references (where an object contains a reference to itself). The following two listings highlight this capability. Let's look at the recursive reference first:

Listing 5. Recursive reference

The code is as follows:

>>> l = [1, 2, 3]
>>> l.append(l)
>>> l
[1, 2, 3, [...]]
>>> l[3]
[1, 2, 3, [...]]
>>> l[3][3]
[1, 2, 3, [...]]
>>> p = pickle.dumps(l)
>>> l2 = pickle.loads(p)
>>> l2
[1, 2, 3, [...]]
>>> l2[3]
[1, 2, 3, [...]]
>>> l2[3][3]
[1, 2, 3, [...]]

Now, let's look at an example of circular reference:

Listing 6. Circular reference

The code is as follows:

>>> a = [1, 2]
>>> b = [3, 4]
>>> a.append(b)
>>> a
[1, 2, [3, 4]]
>>> b.append(a)
>>> a
[1, 2, [3, 4, [...]]]
>>> b
[3, 4, [1, 2, [...]]]
>>> a[2]
[3, 4, [1, 2, [...]]]
>>> b[2]
[1, 2, [3, 4, [...]]]
>>> a[2] is b
1
>>> b[2] is a
1
>>> f = file('temp.pkl', 'w')
>>> pickle.dump((a, b), f)
>>> f.close()
>>> f = file('temp.pkl', 'r')
>>> c, d = pickle.load(f)
>>> f.close()
>>> c
[1, 2, [3, 4, [...]]]
>>> d
[3, 4, [1, 2, [...]]]
>>> c[2]
[3, 4, [1, 2, [...]]]
>>> d[2]
[1, 2, [3, 4, [...]]]
>>> c[2] is d
1
>>> d[2] is c
1
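
Both kinds of reference can be verified in a few lines of Python 3. The sketch below is illustrative rather than part of the original listings, and it uses dumps() and loads() in place of a temporary file to keep the example self-contained.

```python
import pickle

# Recursive reference: the list contains itself.
l = [1, 2, 3]
l.append(l)
l2 = pickle.loads(pickle.dumps(l))

# Circular reference: a contains b, and b contains a.
a = [1, 2]
b = [3, 4]
a.append(b)
b.append(a)
c, d = pickle.loads(pickle.dumps((a, b)))

# The restored objects are copies, but they have the same shape as the originals.
print(l2[3] is l2, c[2] is d, d[2] is c)
```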

Note: If you pickle each object separately, rather than pickling all the objects together in a single tuple, the result is slightly (but importantly) different, as Listing 7 shows:

Listing 7. Pickling separately versus together in a tuple

The code is as follows:

>>> f = file('temp.pkl', 'w')
>>> pickle.dump(a, f)
>>> pickle.dump(b, f)
>>> f.close()
>>> f = file('temp.pkl', 'r')
>>> c = pickle.load(f)
>>> d = pickle.load(f)
>>> f.close()
>>> c
[1, 2, [3, 4, [...]]]
>>> d
[3, 4, [1, 2, [...]]]
>>> c[2]
[3, 4, [1, 2, [...]]]
>>> d[2]
[1, 2, [3, 4, [...]]]
>>> c[2] is d
0
>>> d[2] is c
0
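
The difference is easy to miss because the printed values look identical. A hedged Python 3 sketch of the same experiment follows, with an in-memory io.BytesIO buffer assumed in place of temp.pkl. Note that it compares slices rather than whole lists, because comparing two circular structures with == can recurse without end.

```python
import io
import pickle

a = [1, 2]
b = [3, 4]
a.append(b)
b.append(a)

buffer = io.BytesIO()
pickle.dump(a, buffer)  # each module-level dump() starts with an empty memo
pickle.dump(b, buffer)  # so b is written again in full, not as a reference

buffer.seek(0)
c = pickle.load(buffer)
d = pickle.load(buffer)

# Within one pickle the cycle survives, but c and d are not linked to each other.
print(c[2][2] is c, c[2] is d, d[2] is c)
```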

Equal, but not always the same

As the previous example hinted, objects are identical only if they refer to the same object in memory. In the case of pickling, each object is restored to an object that is equal to the original, but not the same object. In other words, each unpickled object is a copy of the original:

Listing 8. Restored objects as copies of the original

The code is as follows:

>>> j = [1, 2, 3]
>>> k = j
>>> k is j
1
>>> x = pickle.dumps(k)
>>> y = pickle.loads(x)
>>> y
[1, 2, 3]
>>> y == k
1
>>> y is k
0
>>> y is j
0
>>> k is j
1
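
One practical consequence is that changing the restored copy never touches the original. A minimal Python 3 sketch, with the illustrative names same_value and same_object added to hold the two comparison results:

```python
import pickle

j = [1, 2, 3]
k = j
y = pickle.loads(pickle.dumps(k))

same_value = (y == k)   # compares contents
same_object = (y is k)  # compares identity

# Changing the copy leaves the original list as it was.
y.append(4)

print(same_value, same_object, j, y)
```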

We have seen that Python preserves references between objects when those objects are pickled as a unit. However, we have also seen that separate calls to dump() leave Python unable to preserve references to objects pickled outside that unit. Instead, Python makes a copy of the referenced object and stores the copy with the object being pickled. This is not a problem for applications that pickle and restore a single object hierarchy, but it is something to be aware of in other situations.

It is worth noting that there is an option that lets you pickle objects separately and still preserve the references between them, as long as the objects are all pickled to the same file. The pickle and cPickle modules provide a Pickler (and a corresponding Unpickler) that keeps track of the objects it has already pickled. With such a Pickler, shared and circular references are pickled by reference rather than by value:

Listing 9. Preserving references between separately pickled objects

The code is as follows:

>>> f = file('temp.pkl', 'w')
>>> pickler = pickle.Pickler(f)
>>> pickler.dump(a)
>>> pickler.dump(b)
>>> f.close()
>>> f = file('temp.pkl', 'r')
>>> unpickler = pickle.Unpickler(f)
>>> c = unpickler.load()
>>> d = unpickler.load()
>>> c[2]
[3, 4, [1, 2, [...]]]
>>> d[2]
[1, 2, [3, 4, [...]]]
>>> c[2] is d
1
>>> d[2] is c
1
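
The same pattern can be sketched in Python 3, where Pickler and Unpickler are classes in the pickle module. This is an illustration under the assumption that one Pickler writes every object and one Unpickler reads them all back in the same order; an in-memory io.BytesIO buffer stands in for temp.pkl.

```python
import io
import pickle

a = [1, 2]
b = [3, 4]
a.append(b)
b.append(a)

buffer = io.BytesIO()
pickler = pickle.Pickler(buffer)
pickler.dump(a)  # the Pickler remembers every object it has written
pickler.dump(b)  # so b is written as a reference to the earlier copy

buffer.seek(0)
unpickler = pickle.Unpickler(buffer)
c = unpickler.load()
d = unpickler.load()

print(c[2] is d, d[2] is c)
```

The references survive only when the matching Unpickler is reused for every load() call. Reading the second pickle with a fresh Unpickler would not work, because the reference it contains points into the first one's memo.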

Objects that cannot be pickled

Some object types cannot be pickled. For example, Python cannot pickle a file object (or any object holding a reference to a file object), because it cannot guarantee that it can re-create the state of the file when unpickling. (Other examples are obscure enough that they are not worth mentioning in an article like this one.) Attempting to pickle a file object results in the following error:

Listing 10. The result of trying to pickle a file object

The code is as follows:

>>> f = file('temp.pkl', 'w')
>>> p = pickle.dumps(f)
Traceback (most recent call last):
  File "", line 1, in ?
  File "/usr/lib/python2.2/copy_reg.py", line 57, in _reduce
    raise TypeError, "can't pickle %s objects" % base.__name__
TypeError: can't pickle file objects
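
In a program, as opposed to an interactive session, you would normally catch this error rather than let it propagate. The following Python 3 sketch shows one way to do that. It is an illustration, not the article's code: it assumes open() in place of the old file() built-in, uses os.devnull so that no real file is created, and the exact wording of the message varies between Python versions, so it is printed rather than compared.

```python
import os
import pickle

f = open(os.devnull, 'w')
try:
    pickle.dumps(f)
    error = None
except TypeError as exc:
    # Open file objects carry operating-system state that pickle cannot save.
    error = exc
finally:
    f.close()

print(type(error).__name__, error)
```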

Class instances

Pickling class instances requires more care than pickling simple object types. This is mainly because Python pickles the instance data (usually the __dict__ attribute) and the name of the class, but not the code of the class. When Python unpickles a class instance, it tries to import the module containing the class definition, using the exact class name and module name (including any package path prefix) that were in effect when the instance was pickled. Note also that class definitions must appear at the top level of a module, which means they cannot be nested classes (classes defined inside other classes or functions).

When a class instance is unpickled, its __init__() method is normally not called. Instead, Python creates a generic class instance, applies the pickled instance attributes, and sets the instance's __class__ attribute to point to the original class.

The mechanism for unpickling the new-style classes introduced in Python 2.2 differs slightly from the original one. Although the result is effectively the same as for old-style classes, Python uses the _reconstructor() function of the copy_reg module to restore instances of new-style classes.

If you want to modify the default pickling behavior for instances of either new-style or old-style classes, you can define the special class methods __getstate__() and __setstate__(), which Python calls when saving and restoring the state information of a class instance. We will see examples that use these special methods in the sections that follow.

Now let's look at a simple class instance. First, create a Python module named persist.py containing the following new-style class definition:

Listing 11. New-style class definition

The code is as follows:

class Foo(object):

    def __init__(self, value):
        self.value = value

Now you can pickle a Foo instance and look at its representation:

Listing 12. Pickling a Foo instance

The code is as follows:

>>> import cPickle as pickle
>>> from Orbtech.examples.persist import Foo
>>> foo = Foo('What is a Foo?')
>>> p = pickle.dumps(foo)
>>> print p
ccopy_reg
_reconstructor
p1
(cOrbtech.examples.persist
Foo
p2
c__builtin__
object
p3
NtRp4
(dp5
S'value'
p6
S'What is a Foo?'
sb.
>>>

The name of the class, Foo, and the fully qualified name of its module, Orbtech.examples.persist, are both stored in the pickle. If we pickled this instance to a file and unpickled it later, or on another machine, Python would try to import the Orbtech.examples.persist module and would raise an exception if it could not. Similar errors occur if you rename the class or the module, or move the module to another directory.
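
Two of the points made in this section, that the pickle holds names and data rather than code, and that __init__() is not run again on unpickling, can be checked with a short Python 3 sketch. This is an illustration under stated assumptions: Foo is defined in the script itself rather than in persist.py, and the init_calls counter is a hypothetical addition used only to observe how often __init__() runs.

```python
import pickle


class Foo:
    init_calls = 0  # class-level counter, added for illustration only

    def __init__(self, value):
        Foo.init_calls += 1
        self.value = value


foo = Foo('What is a Foo?')
p = pickle.dumps(foo, protocol=0)

# The pickle holds the class name and the instance data, not the class code.
has_class_name = b'Foo' in p
has_value = b'What is a Foo?' in p

# Unpickling restores the instance attributes directly; __init__ is not run again.
foo2 = pickle.loads(p)

print(has_class_name, has_value, Foo.init_calls, foo2.value)
```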

Here is an example of the error Python reports when we rename the Foo class and then try to load a previously pickled Foo instance:

Listing 13. Trying to load a pickled instance of a renamed Foo class

The code is as follows:

>>> import cPickle as pickle
>>> f = file('temp.pkl', 'r')
>>> foo = pickle.load(f)
Traceback (most recent call last):
  File "", line 1, in ?
AttributeError: 'module' object has no attribute 'Foo'

A similar error occurs after renaming the persist.py module:

Listing 14. Trying to load a pickled instance after renaming the persist.py module

The code is as follows:

>>> import cPickle as pickle
>>> f = file('temp.pkl', 'r')
>>> foo = pickle.load(f)
Traceback (most recent call last):
  File "", line 1, in ?
ImportError: No module named persist
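
Both failures can be reproduced in a single Python 3 script without touching any files on disk. The sketch below simulates the renames and is not the article's procedure: it assumes a throwaway in-memory module, registered under the hypothetical name persist_demo, as a stand-in for persist.py, and it removes the class and then the module to mimic the two renames.

```python
import pickle
import sys
import types

# A throwaway module standing in for persist.py (the name is hypothetical).
persist = types.ModuleType('persist_demo')
sys.modules['persist_demo'] = persist


class Foo:
    def __init__(self, value):
        self.value = value


# Make the class look as if it had been defined inside persist_demo.
Foo.__module__ = 'persist_demo'
persist.Foo = Foo

p = pickle.dumps(Foo('What is a Foo?'))

# Simulate renaming the class: the old name disappears from the module.
del persist.Foo
try:
    pickle.loads(p)
except AttributeError as exc:
    rename_class_error = exc

# Simulate renaming the module: the old module can no longer be imported.
del sys.modules['persist_demo']
try:
    pickle.loads(p)
except ImportError as exc:
    rename_module_error = exc

print(type(rename_class_error).__name__, type(rename_module_error).__name__)
```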

Later in this article, we present techniques for managing these kinds of changes without breaking existing pickles.

Special state methods

As mentioned earlier, some object types (such as file objects) cannot be pickled. When an instance has attributes that cannot be pickled, you can use the special methods __getstate__() and __setstate__() to modify the state of the class instance. Here is an example of our Foo class, modified to handle a file object attribute:

Listing 15. Handling an instance attribute that cannot be pickled

The code is as follows:

class Foo(object):

    def __init__(self, value, filename):
        self.value = value
        self.logfile = file(filename, 'w')

    def __getstate__(self):
        """Return state values to be pickled."""
        f = self.logfile
        return (self.value, f.name, f.tell())

    def __setstate__(self, state):
        """Restore state from the unpickled state values."""
        self.value, name, position = state
        f = file(name, 'w')
        f.seek(position)
        self.logfile = f
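
A runnable Python 3 variation on this idea is sketched below. It is an illustration rather than a translation of Listing 15, and it differs in two ways that are our own assumptions: it saves only the file name, not the position, and it reopens the log in append mode so that existing contents are kept, since opening with 'w' would truncate the file. The class name LoggedFoo and the file name foo.log are made up for the example.

```python
import os
import pickle
import tempfile


class LoggedFoo:
    """Illustrative variant of Foo; the class and file names are made up."""

    def __init__(self, value, filename):
        self.value = value
        self.logfile = open(filename, 'w')

    def __getstate__(self):
        # Save the file's name instead of the unpicklable file object.
        return (self.value, self.logfile.name)

    def __setstate__(self, state):
        self.value, name = state
        # Reopen in append mode so the existing log contents are kept.
        self.logfile = open(name, 'a')


log_path = os.path.join(tempfile.mkdtemp(), 'foo.log')
foo = LoggedFoo('What is a Foo?', log_path)
foo.logfile.write('created\n')
foo.logfile.flush()

foo2 = pickle.loads(pickle.dumps(foo))
foo2.logfile.write('restored\n')

foo.logfile.close()
foo2.logfile.close()
with open(log_path) as f:
    log_lines = f.read().splitlines()

print(foo2.value, log_lines)
```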

Schema evolution

Over time, you will find that you have to change your class definitions. If you have already pickled instances of a class and now need to change it, you may want to retrieve and update those instances so that they continue to work properly under the new class definition. We have already seen some of the errors that occur when classes or modules are changed. Fortunately, the pickling and unpickling processes provide hooks that we can use to support this need for schema evolution.

In this section, we look at ways to anticipate common problems and how to solve them. Because the code of a class is not pickled with its instances, you can add, change, and remove methods without affecting existing pickled instances. For the same reason, you do not have to worry about class attributes. You do have to make sure that the code module containing the class definition is available in the unpickling environment. You must also plan for the changes that can cause unpickling problems: changing the class name, adding or removing instance attributes, and changing the name or location of the module that defines the class.

Class name changes

To change the name of a class without breaking previously pickled instances, follow these steps. First, make sure the original class definition stays unchanged, so that it can be found when existing instances are unpickled. Rather than changing the original name, create a copy of the class definition in the same module as the original and give the copy the new class name. Then add the following method to the original class definition, replacing NewClassName with the actual new class name:

Listing 16. Changing a class name: the method to add to the original class definition

The code is as follows:

def __setstate__(self, state):
    self.__dict__.update(state)
    self.__class__ = NewClassName

When existing instances are unpickled, Python finds the original class definition and calls the instance's __setstate__() method, which reassigns the instance's __class__ attribute to the new class definition. Once you are sure that all existing instances have been unpickled, updated, and re-pickled, you can remove the old class definition from the source code module.
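
The whole procedure can be followed end to end in a short Python 3 sketch. It is illustrative only: OldClassName is a placeholder name for the original class, both classes are defined in the script itself, and old_pickle stands in for a pickle that was written before the rename.

```python
import pickle


class NewClassName:
    def __init__(self, value):
        self.value = value


class OldClassName:
    def __init__(self, value):
        self.value = value

    def __setstate__(self, state):
        self.__dict__.update(state)
        # Hand the freshly unpickled instance over to the new class.
        self.__class__ = NewClassName


# Stands in for a pickle written before the rename.
old_pickle = pickle.dumps(OldClassName('legacy'))

migrated = pickle.loads(old_pickle)

# Re-pickling now records the new class name instead of the old one.
new_pickle = pickle.dumps(migrated)

print(type(migrated).__name__, migrated.value)
```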

Attribute addition and removal

The special state methods __getstate__() and __setstate__() again give us control over the state of each instance, and with it the opportunity to handle changes in instance attributes. Let's look at a simple class definition to which we will add and remove some attributes. Here is the initial definition:

Listing 17. Initial class definition

The code is as follows:

class Person(object):

    def __init__(self, firstname, lastname):
        self.firstname = firstname
        self.lastname = lastname

Suppose we have created and pickled instances of Person, and then decide that we really want to store just a single name attribute rather than separate first and last names. Here is one way to change the class definition so that previously pickled instances are migrated to the new definition:

The code is as follows:

class Person(object):

    def __init__(self, fullname):
        self.fullname = fullname

    def __setstate__(self, state):
        if 'fullname' not in state:
            first = ''
            last = ''
            if 'firstname' in state:
                first = state['firstname']
                del state['firstname']
            if 'lastname' in state:
                last = state['lastname']
                del state['lastname']
            self.fullname = " ".join([first, last]).strip()
        self.__dict__.update(state)

In this example, we added a new attribute, fullname, and removed two existing attributes, firstname and lastname. When a previously pickled instance is unpickled, its pickled state is passed to __setstate__() as a dictionary that includes the values of the firstname and lastname attributes. We then combine those two values and assign the result to the new fullname attribute, deleting the old attributes from the state dictionary along the way. Once all the previously pickled instances have been updated and re-pickled, you can remove the __setstate__() method from the class definition.
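
The migration can be exercised in one Python 3 script by defining the old class, pickling an instance, and then redefining the class under the same name before unpickling. The sketch below is an illustration under those assumptions. The sample name is made up, and state.pop() is used as a shorter equivalent of the membership tests and del statements in the listing.

```python
import pickle


class Person:  # original definition
    def __init__(self, firstname, lastname):
        self.firstname = firstname
        self.lastname = lastname


# Stands in for a pickle written while the original definition was in use.
old_pickle = pickle.dumps(Person('Ada', 'Lovelace'))


class Person:  # new definition, replacing the old one under the same name
    def __init__(self, fullname):
        self.fullname = fullname

    def __setstate__(self, state):
        if 'fullname' not in state:
            # Old-format state: merge the two name fields into one.
            first = state.pop('firstname', '')
            last = state.pop('lastname', '')
            self.fullname = ' '.join([first, last]).strip()
        self.__dict__.update(state)


# pickle looks the class up by name at load time, so it finds the new definition.
person = pickle.loads(old_pickle)

print(person.fullname)
```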

Module modifications

Conceptually, changing the name or location of a module is similar to changing the name of a class, but it is handled quite differently. That is because the module information is stored in the pickle itself and is not an attribute that can be modified through the standard pickle interface. In fact, the only way to change the module information is to perform a find-and-replace operation on the pickle file itself. Exactly how you do this depends on your operating system and the tools you have available. Obviously, this is a situation in which you will want to back up your files first, in case you make a mistake. Otherwise, the change should be fairly straightforward, and it works equally well with the binary pickle format and the text pickle format.
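
Readers on Python 3 have an alternative that leaves the pickle file untouched: the Unpickler class has a find_class() method that can be overridden to redirect module lookups while loading. The sketch below shows that different technique, not the find-and-replace approach described above. The module names old_persist and new_persist, the RENAMED_MODULES table, and the RenamingUnpickler class are all hypothetical, and the modules are simulated in memory so that the example is self-contained.

```python
import io
import pickle
import sys
import types


class Foo:
    def __init__(self, value):
        self.value = value


# Pretend Foo was pickled while it lived in a module called old_persist...
Foo.__module__ = 'old_persist'
old_module = types.ModuleType('old_persist')
old_module.Foo = Foo
sys.modules['old_persist'] = old_module
old_pickle = pickle.dumps(Foo('What is a Foo?'))

# ...and that the module has since been renamed to new_persist.
del sys.modules['old_persist']
Foo.__module__ = 'new_persist'
new_module = types.ModuleType('new_persist')
new_module.Foo = Foo
sys.modules['new_persist'] = new_module

RENAMED_MODULES = {'old_persist': 'new_persist'}


class RenamingUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Redirect lookups for moved modules before the normal import happens.
        return super().find_class(RENAMED_MODULES.get(module, module), name)


foo = RenamingUnpickler(io.BytesIO(old_pickle)).load()

print(type(foo).__name__, foo.value)
```

Re-pickling the loaded instances afterwards records the new module name, so the redirect table is only needed until the existing pickles have been rewritten.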

Conclusion

Object persistence depends on the object serialization capabilities of the underlying programming language. For Python objects, that means pickling. Python's pickles provide a robust and reliable foundation for effective persistence management of Python objects.

This article is an English version of an article originally published in Chinese on aliyun.com.