Python Persistence Management: A Detailed Introduction to the pickle Module

Source: Internet
Author: User
Tags object serialization
Persistence means keeping objects alive between separate executions of the same program. This article gives you a general overview of the persistence mechanisms available for Python objects, from relational databases to Python's pickle and beyond, along with a closer look at Python's object serialization capabilities.
What is persistence?

The basic idea of persistence is simple. Suppose you have a Python program, perhaps one that manages your daily to-do list, and you want to keep the application's objects (the to-do items) between executions of the program. In other words, you want to store objects on disk and retrieve them later. That is persistence. There are several ways to achieve it, each with its own advantages and disadvantages.

For example, you can store object data in a formatted text file, such as a CSV file. Or you can use a relational database, such as Gadfly, MySQL, PostgreSQL, or DB2. These file formats and databases are well established, and Python has robust interfaces to all of these storage mechanisms.

These storage mechanisms have one thing in common: the stored data is independent of the objects and programs that operate on it. The benefit is that the data is available as a shared resource to other applications. The drawback is that other programs can reach an object's data directly, which violates the object-oriented principle of encapsulation, under which an object's data is accessible only through the object's own public interface.

Also, for some applications the relational database approach may not be ideal. In particular, relational databases do not understand objects. Instead, they impose their own type system and the relational data model: tables, each containing a set of tuples (rows), each of which holds a fixed number of statically typed fields (columns). If an application's object model does not translate easily to the relational model, mapping objects to tuples and back again can be difficult. This difficulty is often referred to as the impedance mismatch problem.

Some pickled Python

The pickle module and its close relative cPickle provide pickling support in Python. The latter is written in C and performs better, so it is the recommended choice for most applications. We will keep talking about pickle, but the examples in this article actually use cPickle. Since most of the examples are shown in a Python shell, we start by importing cPickle under the name pickle:

>>> import cPickle as pickle


Now that the module has been imported, let's look at the pickle interface. The pickle module provides the following pairs of functions: dumps(object) returns a string containing the object in pickle format; loads(string) returns the object contained in a pickle string; dump(object, file) writes the object to a file, which can be an actual physical file or any file-like object that has a write() method accepting a single string argument; load(file) returns the object contained in a pickle file.

By default, dumps() and dump() create pickles using a printable ASCII representation. Both take an optional final argument that, if true, selects a faster and smaller binary representation instead. The loads() and load() functions detect automatically whether a pickle is in binary or text format.

Listing 1 shows an interactive session that uses the dumps() and loads() functions just described:


Listing 1. Demonstration of dumps() and loads()

Welcome To PyCrust 0.7.2 - The Flakiest Python Shell
Sponsored by Orbtech - Your source for Python programming expertise.
Python 2.2.1 (#1, 27 2002, 10:22:32)
[GCC 3.2 (Mandrake Linux 9.0 3.2-1mdk)] on linux-i386
Type "copyright", "credits" or "license" for more information.
>>> import cPickle as pickle
>>> t1 = ('this is a string', 42, [1, 2, 3], None)
>>> t1
('this is a string', 42, [1, 2, 3], None)
>>> p1 = pickle.dumps(t1)
>>> p1
"(S'this is a string'\nI42\n(lp1\nI1\naI2\naI3\naNtp2\n."
>>> print p1
(S'this is a string'
I42
(lp1
I1
aI2
aI3
aNtp2
.
>>> t2 = pickle.loads(p1)
>>> t2
('this is a string', 42, [1, 2, 3], None)
>>> p2 = pickle.dumps(t1, True)
>>> p2
'(U\x10this is a stringK*]q\x01(K\x01K\x02K\x03eNtq\x02.'
>>> t3 = pickle.loads(p2)
>>> t3
('this is a string', 42, [1, 2, 3], None)

Note: The text pickle format is fairly simple and is not explained here. In fact, all of its conventions are documented in the pickle module itself. We should also point out that our examples use simple objects, so the binary pickle format shows little advantage in space savings here. In a real system with complex objects, however, the binary format can bring noticeable improvements in both size and speed.
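
The size difference is easy to measure. The following is a minimal sketch for Python 3, where importing pickle picks up the C implementation automatically and the format is selected with a numeric protocol argument rather than a true/false flag. The data object is made up for illustration.

```python
import pickle

# Made-up sample object, large enough for the difference to be visible.
data = {"numbers": list(range(1000)), "label": "this is a string", "nothing": None}

# Protocol 0 is the original printable-ASCII format.
text_pickle = pickle.dumps(data, protocol=0)
# HIGHEST_PROTOCOL selects the newest binary format this interpreter supports.
binary_pickle = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)

print(len(text_pickle), len(binary_pickle))

# loads() works out the format by itself; no protocol argument is needed.
assert pickle.loads(text_pickle) == data
assert pickle.loads(binary_pickle) == data
```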

Next, let's look at some examples that use dump() and load(), which work with files and file-like objects. These functions behave much like the dumps() and loads() we have just seen, with one additional capability: dump() can write several objects to the same file, one after another, and subsequent calls to load() retrieve them in the same order. Listing 2 shows this capability in action:

Listing 2. dump() and load() example

>>> a1 = 'apple'
>>> b1 = {1: 'One', 2: 'Two', 3: 'Three'}
>>> c1 = ['fee', 'fie', 'foe', 'fum']
>>> f1 = file('temp.pkl', 'wb')
>>> pickle.dump(a1, f1, True)
>>> pickle.dump(b1, f1, True)
>>> pickle.dump(c1, f1, True)
>>> f1.close()
>>> f2 = file('temp.pkl', 'rb')
>>> a2 = pickle.load(f2)
>>> a2
'apple'
>>> b2 = pickle.load(f2)
>>> b2
{1: 'One', 2: 'Two', 3: 'Three'}
>>> c2 = pickle.load(f2)
>>> c2
['fee', 'fie', 'foe', 'fum']
>>> f2.close()
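
The same pattern can be sketched in Python 3, where the file() built-in no longer exists. This version assumes an in-memory io.BytesIO buffer in place of temp.pkl, and it reads objects back until load() raises EOFError, which is one common way to retrieve an unknown number of pickles from a single stream.

```python
import io
import pickle

buffer = io.BytesIO()  # stands in for temp.pkl; any object with write() works

for obj in ("apple", {1: "One", 2: "Two", 3: "Three"}, ["fee", "fie", "foe", "fum"]):
    pickle.dump(obj, buffer)  # each call appends one pickle to the stream

buffer.seek(0)
restored = []
while True:
    try:
        restored.append(pickle.load(buffer))  # objects come back in write order
    except EOFError:
        break  # nothing left to read

print(restored)
```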

The power of pickle

So far, we have covered the basics of pickling. In this section, we discuss some of the more advanced issues that come up when you start pickling complex objects, including instances of custom classes. Fortunately, Python handles these situations easily.

Portability

Pickles are portable across both space and time. In other words, the pickle file format is independent of machine architecture, so you can, for example, create a pickle under Linux and send it to a Python program running under Windows or Mac OS. And when you upgrade to a newer version of Python, you need not worry about losing your existing pickles: the Python developers have ensured that the pickle format remains backward compatible across Python versions. In fact, the pickle module provides details about the current and supported formats:


Listing 3. Retrieving the supported formats

>>> pickle.format_version
'1.3'
>>> pickle.compatible_formats
['1.0', '1.1', '1.2']
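
In Python 3 the formats are numbered protocols, and pickle.HIGHEST_PROTOCOL and pickle.DEFAULT_PROTOCOL report what the running interpreter supports. As a rough illustration of the backward compatibility described above, the sketch below writes one arbitrary sample value with every protocol the interpreter knows and reads each pickle back.

```python
import pickle

value = ("this is a string", 42, [1, 2, 3], None)  # arbitrary sample value

# Every protocol, from the oldest (0) to the newest this interpreter knows,
# can still be written and read back.
results = {}
for protocol in range(pickle.HIGHEST_PROTOCOL + 1):
    results[protocol] = pickle.loads(pickle.dumps(value, protocol))

print(sorted(results), pickle.DEFAULT_PROTOCOL)
```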

Multiple references, same object

In Python, a variable is a reference to an object, and several variables can reference the same object. As it turns out, Python has no difficulty preserving this behavior for pickled objects, as Listing 4 shows:

Listing 4. Preserving object references

>>> a = [1, 2, 3]
>>> b = a
>>> a
[1, 2, 3]
>>> b
[1, 2, 3]
>>> a.append(4)
>>> a
[1, 2, 3, 4]
>>> b
[1, 2, 3, 4]
>>> c = pickle.dumps((a, b))
>>> d, e = pickle.loads(c)
>>> d
[1, 2, 3, 4]
>>> e
[1, 2, 3, 4]
>>> d.append(5)
>>> d
[1, 2, 3, 4, 5]
>>> e
[1, 2, 3, 4, 5]
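
For readers on Python 3, a compact version of the same experiment might look like the sketch below. It uses an explicit identity test in place of the append-and-inspect step; the variable same_after is introduced here purely for illustration.

```python
import pickle

a = [1, 2, 3]
b = a  # two names, one list

d, e = pickle.loads(pickle.dumps((a, b)))

same_after = d is e  # the sharing survives the round trip
d.append(4)          # so a change made through d is visible through e

print(same_after, e)
```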

Circular references and recursive references

The support for object references just demonstrated extends to circular references (two objects that each contain a reference to the other) and recursive references (an object that contains a reference to itself). The following two listings highlight this capability. Let's look at the recursive reference first:

Listing 5. Recursive references

>>> l = [1, 2, 3]
>>> l.append(l)
>>> l
[1, 2, 3, [...]]
>>> l[3]
[1, 2, 3, [...]]
>>> l[3][3]
[1, 2, 3, [...]]
>>> p = pickle.dumps(l)
>>> l2 = pickle.loads(p)
>>> l2
[1, 2, 3, [...]]
>>> l2[3]
[1, 2, 3, [...]]
>>> l2[3][3]
[1, 2, 3, [...]]
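
Because the [...] in the printed output hides what the nested element really is, an identity check makes the point more directly. A short Python 3 sketch, with variable names chosen for illustration:

```python
import pickle

l = [1, 2, 3]
l.append(l)  # the list now contains itself

l2 = pickle.loads(pickle.dumps(l))

points_to_itself = l2[3] is l2  # the restored list refers to itself
is_a_copy = l2 is not l         # ...and not to the original list

print(points_to_itself, is_a_copy)
```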

Now, look at an example of a circular reference:

Listing 6. Circular references

>>> a = [1, 2]
>>> b = [3, 4]
>>> a.append(b)
>>> a
[1, 2, [3, 4]]
>>> b.append(a)
>>> a
[1, 2, [3, 4, [...]]]
>>> b
[3, 4, [1, 2, [...]]]
>>> a[2]
[3, 4, [1, 2, [...]]]
>>> b[2]
[1, 2, [3, 4, [...]]]
>>> a[2] is b
1
>>> b[2] is a
1
>>> f = file('temp.pkl', 'w')
>>> pickle.dump((a, b), f)
>>> f.close()
>>> f = file('temp.pkl', 'r')
>>> c, d = pickle.load(f)
>>> f.close()
>>> c
[1, 2, [3, 4, [...]]]
>>> d
[3, 4, [1, 2, [...]]]
>>> c[2]
[3, 4, [1, 2, [...]]]
>>> d[2]
[1, 2, [3, 4, [...]]]
>>> c[2] is d
1
>>> d[2] is c
1

Note that if you pickle each object individually, rather than pickling all the objects together in a tuple, you get slightly different (but important) results, as shown in Listing 7:


Listing 7. Pickling separately vs. pickling together in a tuple

>>> f = file('temp.pkl', 'w')
>>> pickle.dump(a, f)
>>> pickle.dump(b, f)
>>> f.close()
>>> f = file('temp.pkl', 'r')
>>> c = pickle.load(f)
>>> d = pickle.load(f)
>>> f.close()
>>> c
[1, 2, [3, 4, [...]]]
>>> d
[3, 4, [1, 2, [...]]]
>>> c[2]
[3, 4, [1, 2, [...]]]
>>> d[2]
[1, 2, [3, 4, [...]]]
>>> c[2] is d
0
>>> d[2] is c
0
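
The difference between the two approaches can be checked with identity tests. The following Python 3 sketch assumes an in-memory buffer in place of temp.pkl; the names together and separately are ours.

```python
import io
import pickle

a = [1, 2]
b = [3, 4]
a.append(b)
b.append(a)

# Pickled together: one dumps() call sees both lists and records the cycle once.
c, d = pickle.loads(pickle.dumps((a, b)))
together = (c[2] is d, d[2] is c)

# Pickled separately: each dump() call starts from scratch, so each pickle
# carries its own private copy of the other list.
buffer = io.BytesIO()
pickle.dump(a, buffer)
pickle.dump(b, buffer)
buffer.seek(0)
c = pickle.load(buffer)
d = pickle.load(buffer)
separately = (c[2] is d, d[2] is c)

print(together, separately)
```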

Equal, but not always identical

As the previous example implies, objects are identical only if they refer to the same object in memory. With pickling, each object is restored to an object that is equal to the original but is not the same object. In other words, each pickle is a copy of the original object:


Listing 8. Restored objects as copies of the original objects

>>> j = [1, 2, 3]
>>> k = j
>>> k is j
1
>>> x = pickle.dumps(k)
>>> y = pickle.loads(x)
>>> y
[1, 2, 3]
>>> y == k
1
>>> y is k
0
>>> y is j
0
>>> k is j
1

At the same time, we have seen that Python preserves references between objects that are pickled as a unit. However, we have also seen that separate calls to dump() prevent Python from preserving references to objects pickled outside that unit. Instead, Python makes a copy of the referenced object and stores the copy with the object being pickled. This is not a problem for applications that pickle and restore a single object hierarchy, but be aware that other situations exist.

It is worth noting that there is an option that lets you pickle objects separately and still preserve their references to each other, as long as the objects are pickled to the same file. The pickle and cPickle modules provide a Pickler (and a corresponding Unpickler) that keeps track of the objects that have already been pickled. With this Pickler, shared and circular references are pickled by reference rather than by value:


Listing 9. Preserving references between objects pickled separately

>>> f = file('temp.pkl', 'w')
>>> pickler = pickle.Pickler(f)
>>> pickler.dump(a)
>>> pickler.dump(b)
>>> f.close()
>>> f = file('temp.pkl', 'r')
>>> unpickler = pickle.Unpickler(f)
>>> c = unpickler.load()
>>> d = unpickler.load()
>>> c[2]
[3, 4, [1, 2, [...]]]
>>> d[2]
[1, 2, [3, 4, [...]]]
>>> c[2] is d
1
>>> d[2] is c
1
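
A Python 3 sketch of the same idea follows, again assuming an in-memory buffer instead of temp.pkl. The important detail is that one Pickler writes both objects and one Unpickler reads both back, since each keeps its own record of the objects it has already seen.

```python
import io
import pickle

a = [1, 2]
b = [3, 4]
a.append(b)
b.append(a)

buffer = io.BytesIO()  # stands in for temp.pkl
pickler = pickle.Pickler(buffer)
pickler.dump(a)  # the pickler remembers every object it has written
pickler.dump(b)  # so b is written as a reference to what is already there

buffer.seek(0)
unpickler = pickle.Unpickler(buffer)  # the same unpickler must read both
c = unpickler.load()
d = unpickler.load()

print(c[2] is d, d[2] is c)
```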

Unpicklable objects

Some object types cannot be pickled. For example, Python cannot pickle a file object (or any object that holds a reference to a file object), because Python cannot guarantee that it can recreate the state of the file when unpickling (other examples are more obscure and not worth raising in an article like this one). Attempting to pickle a file object results in the following error:


Listing 10. The result of trying to pickle a file object

>>> f = file('temp.pkl', 'w')
>>> p = pickle.dumps(f)
Traceback (most recent call last):
  File " ", line 1, in ?
  File "/usr/lib/python2.2/copy_reg.py", line page, in _reduce
    raise TypeError, "can't pickle %s objects" % base.__name__
TypeError: can't pickle file objects
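
Python 3 behaves the same way, although the wording of the message differs between versions. The sketch below opens os.devnull only so that it leaves no file behind; that choice is ours, not the article's.

```python
import os
import pickle

error = None
with open(os.devnull, "w") as f:  # os.devnull is used only to avoid leaving a file behind
    try:
        pickle.dumps(f)
    except TypeError as exc:
        error = exc  # file objects refuse to be pickled

print("cannot pickle a file object:", error)
```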


Class instances

Pickling class instances requires more care than pickling simple object types. This is mainly because Python pickles the instance data (usually the __dict__ attribute) and the name of the class, but not the code of the class. When Python unpickles a class instance, it tries to import the module containing the class definition, using the exact class name and module name (including any package path prefix) that were in effect when the instance was pickled. Note also that class definitions must appear at the top level of a module, which means they cannot be nested classes (classes defined inside other classes or functions).

When class instances are unpickled, their __init__() method is normally not called again. Instead, Python creates a generic class instance, applies the pickled instance attributes, and sets the instance's __class__ attribute to point to the original class.
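
The fact that __init__() is skipped during unpickling can be observed with a simple counter. This is a Python 3 sketch; the Tracked class and its init_calls counter are hypothetical and exist only for the demonstration.

```python
import pickle


class Tracked(object):
    init_calls = 0  # class-level counter, for illustration only

    def __init__(self, value):
        Tracked.init_calls += 1
        self.value = value


original = Tracked("some value")  # __init__() runs once here
restored = pickle.loads(pickle.dumps(original))  # ...but not here

print(Tracked.init_calls, restored.value, type(restored) is Tracked)
```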

The mechanism for unpickling the new-style classes introduced in Python 2.2 differs slightly from the original one. Although the end result is effectively the same as for old-style classes, Python uses the _reconstructor() function of the copy_reg module to restore instances of new-style classes.

If you want to modify the default pickling behavior for instances of either new-style or old-style classes, you can define the special methods __getstate__() and __setstate__(), which Python calls while saving and restoring the state of class instances. We will see some examples that use these special methods in the sections that follow.

Now, let's look at a simple class instance. To begin, create a Python module named persist.py that contains the following new-style class definition:

Listing 11. A new-style class definition

class Foo(object):

    def __init__(self, value):
        self.value = value


You can now pickle the Foo instance and look at its representation:

Listing 12. Pickling a Foo instance

>>> import cPickle as pickle
>>> from Orbtech.examples.persist import Foo
>>> foo = Foo('What is a Foo?')
>>> p = pickle.dumps(foo)
>>> print p
ccopy_reg
_reconstructor
p1
(cOrbtech.examples.persist
Foo
p2
c__builtin__
object
p3
NtRp4
(dp5
S'value'
p6
S'What is a Foo?'
sb.
>>>

As you can see, both the class name Foo and the fully qualified module name Orbtech.examples.persist are stored in the pickle. If you pickle this instance to a file and unpickle it later, or unpickle it on another machine, Python will try to import the Orbtech.examples.persist module and raise an exception if it cannot. A similar error occurs if you rename the class or the module, or move the module to another directory.
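
A quick way to confirm what a pickle does and does not contain is to search its bytes. The Python 3 sketch below pickles with protocol 0 so that the result is printable; the describe() method is hypothetical and is there only to show that method code is not stored.

```python
import pickle


class Foo(object):
    def __init__(self, value):
        self.value = value

    def describe(self):  # hypothetical method, added for illustration
        return "Foo(%r)" % self.value


p = pickle.dumps(Foo("What is a Foo?"), protocol=0)

has_class_name = b"Foo" in p
has_module_name = Foo.__module__.encode("ascii") in p
has_method_name = b"describe" in p  # the class's code is not in the pickle

print(p.decode("ascii"))
print(has_class_name, has_module_name, has_method_name)
```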

Here is an example of the error message Python produces when we rename the Foo class and then try to load a previously pickled Foo instance:


Listing 13. Trying to load a pickled instance of a renamed Foo class

>>> import cPickle as pickle
>>> f = file('temp.pkl', 'r')
>>> foo = pickle.load(f)
Traceback (most recent call last):
  File " ", line 1, in ?
AttributeError: 'module' object has no attribute 'Foo'

A similar error occurs after renaming the persist.py module:

Listing 14. Trying to load a pickled instance from a renamed persist.py module

>>> import cPickle as pickle
>>> f = file('temp.pkl', 'r')
>>> foo = pickle.load(f)
Traceback (most recent call last):
  File " ", line 1, in ?
ImportError: No module named persist
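
Both failures can be reproduced in one Python 3 script without touching any files on disk. The sketch below builds a throwaway module at run time; the module name persist_demo is made up, and removing it from sys.modules is only a stand-in for renaming or moving a real module file.

```python
import pickle
import sys
import types

SOURCE = """
class Foo(object):
    def __init__(self, value):
        self.value = value
"""

# Build a throwaway module at run time; "persist_demo" is a made-up name.
module = types.ModuleType("persist_demo")
exec(SOURCE, module.__dict__)
sys.modules["persist_demo"] = module

p = pickle.dumps(module.Foo("What is a Foo?"))

# Rename the class: the pickle still asks for persist_demo.Foo.
module.Bar = module.Foo
del module.Foo
class_error = None
try:
    pickle.loads(p)
except AttributeError as exc:
    class_error = exc

# Put the class back, then make the module itself unavailable.
module.Foo = module.Bar
del sys.modules["persist_demo"]
module_error = None
try:
    pickle.loads(p)
except ImportError as exc:
    module_error = exc

print(type(class_error).__name__, type(module_error).__name__)
```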

We provide some techniques for managing changes like these without breaking existing pickles in a later section on evolving class definitions.

Special state methods

As mentioned earlier, some object types (file objects, for example) cannot be pickled. When an instance has attributes that hold such unpicklable objects, you can use the special methods __getstate__() and __setstate__() to modify the state of the class instance. Here is an example of our Foo class, modified to handle a file object attribute:

Listing 15. Handling unpicklable instance attributes

class Foo(object):

    def __init__(self, value, filename):
        self.value = value
        self.logfile = file(filename, 'w')

    def __getstate__(self):
        """Return state values to be pickled."""
        f = self.logfile
        return (self.value, f.name, f.tell())

    def __setstate__(self, state):
        """Restore state from the unpickled state values."""
        self.value, name, position = state
        # Note: mode 'w' truncates an existing file when it is reopened.
        f = file(name, 'w')
        f.seek(position)
        self.logfile = f
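
The following Python 3 sketch exercises the same pair of methods end to end. It departs from the listing in two labeled ways: it uses open() because file() no longer exists, and it reopens the log with mode "r+" instead of "w" so that text written before pickling is kept. The temporary directory and the log contents are made up for the demonstration.

```python
import os
import pickle
import tempfile


class Foo(object):
    def __init__(self, value, filename):
        self.value = value
        self.logfile = open(filename, "w")

    def __getstate__(self):
        """Return state values to be pickled (no file object among them)."""
        f = self.logfile
        return (self.value, f.name, f.tell())

    def __setstate__(self, state):
        """Reopen the log file and move back to the saved position."""
        self.value, name, position = state
        f = open(name, "r+")  # assumption: "r+" keeps existing contents
        f.seek(position)
        self.logfile = f


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "foo.log")  # made-up file name
    foo = Foo("What is a Foo?", path)
    foo.logfile.write("first line\n")
    data = pickle.dumps(foo)  # works although foo holds an open file
    foo.logfile.close()

    restored = pickle.loads(data)
    restored.logfile.write("second line\n")
    restored.logfile.close()

    with open(path) as f:
        contents = f.read()

print(repr(contents))
```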

Schema evolution

Over time, you will find yourself having to change the definition of a class. If you have already pickled instances of the class and now need to change it, you will probably want to retrieve and update those instances so that they keep working properly under the new class definition. And we have already seen some of the errors that occur when certain changes are made to a class or module. Fortunately, the pickling and unpickling processes provide hooks that we can use to support this kind of schema evolution.

In this section, we explore ways to anticipate common problems and how to work around them. Because the code of a class is not pickled with its instances, you can add, change, and remove methods without affecting existing pickled instances. For the same reason, you do not have to worry about class attributes. You must make sure that the module containing the class definition is available in the unpickling environment. You must also plan for the changes that can cause unpickling problems: changing the class name, adding or removing instance attributes, and changing the name or location of the module that defines the class.

Class name changes

To change a class name without breaking previously pickled instances, follow these steps. First, leave the definition of the original class unchanged so that it can be found when existing instances are unpickled. Instead of changing the original name, create a copy of the class definition in the same module as the original and give the copy the new class name. Then add the following method to the original class definition, replacing NewClassName with the actual new class name:

Listing 16. Class name change: the method added to the original class definition

def __setstate__(self, state):
    self.__dict__.update(state)
    self.__class__ = NewClassName

When an existing instance is unpickled, Python finds the original class definition and calls the instance's __setstate__() method, which reassigns the instance's __class__ attribute to the new class definition. Once you are sure that all existing instances have been unpickled, updated, and re-pickled, you can remove the old class definition from the source module.
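
The whole migration can be sketched in a few lines of Python 3. OldClassName is a placeholder of ours for the original class, and the value attribute is sample data; NewClassName plays the role it has in the method described above.

```python
import pickle


class NewClassName(object):  # the copy of the class under its new name
    def __init__(self, value):
        self.value = value


class OldClassName(object):  # placeholder name for the original class
    def __init__(self, value):
        self.value = value

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.__class__ = NewClassName  # hand the instance over to the new class


old_pickle = pickle.dumps(OldClassName("spam"))  # stands in for an existing pickle

migrated = pickle.loads(old_pickle)  # found via the old name, becomes the new class
new_pickle = pickle.dumps(migrated)  # re-pickled under the new name

print(type(migrated).__name__, migrated.value)
```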

Adding and removing attributes

Once again, the special state methods __getstate__() and __setstate__() give us control over the state of each instance and an opportunity to handle changes in instance attributes. Let's look at a simple class definition to which we will add and remove some attributes. Here is the initial definition:


Listing 17. The initial class definition

class Person(object):

    def __init__(self, firstname, lastname):
        self.firstname = firstname
        self.lastname = lastname

Assume that instances of Person have been created and pickled, and that we now decide we really want to store just a single name attribute rather than separate first and last names. Here is a way to change the class definition that migrates previously pickled instances to the new definition:

class Person(object):

    def __init__(self, fullname):
        self.fullname = fullname

    def __setstate__(self, state):
        if 'fullname' not in state:
            first = ''
            last = ''
            if 'firstname' in state:
                first = state['firstname']
                del state['firstname']
            if 'lastname' in state:
                last = state['lastname']
                del state['lastname']
            self.fullname = " ".join([first, last]).strip()
        self.__dict__.update(state)

In this example, we added a new attribute, fullname, and removed two existing attributes, firstname and lastname. When a previously pickled instance is unpickled, its pickled state is passed to __setstate__() as a dictionary that includes the values of the firstname and lastname attributes. We then combine those two values and assign the result to the new fullname attribute, deleting the old attributes from the state dictionary along the way. Once all previously pickled instances have been updated and re-pickled, the __setstate__() method can be removed from the class definition.
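
To watch the migration happen, the old and new definitions can be placed in one Python 3 script, with a pickle created in between. This is only a sketch: the sample name is made up, and the __setstate__() body is condensed with dict.pop() but follows the same logic as the version described above.

```python
import pickle


class Person(object):  # the initial definition
    def __init__(self, firstname, lastname):
        self.firstname = firstname
        self.lastname = lastname


old_pickle = pickle.dumps(Person("Ada", "Lovelace"))  # made-up sample data


class Person(object):  # the new definition replaces the old one
    def __init__(self, fullname):
        self.fullname = fullname

    def __setstate__(self, state):
        if 'fullname' not in state:
            # Old-style state: fold the two names into one and drop them.
            first = state.pop('firstname', '')
            last = state.pop('lastname', '')
            self.fullname = " ".join([first, last]).strip()
        self.__dict__.update(state)


person = pickle.loads(old_pickle)  # old state, new class definition

print(person.fullname, sorted(vars(person)))
```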

Module modifications

Conceptually, a change to a module's name or location is similar to a change to a class name, but it has to be handled quite differently. That is because the module information is stored in the pickle itself and is not an attribute that can be modified through the standard pickle interface. In fact, the only way to change the module information is to perform a search and replace on the actual pickle file. How exactly you do this depends on your operating system and the tools at your disposal. Obviously, this is a situation in which you will want to back up your files first in case something goes wrong. Still, the change should be fairly straightforward, and it should work as well on the binary pickle format as on the text pickle format.
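
The search and replace can also be done from Python itself. The Python 3 sketch below applies it to a protocol 0 (text) pickle, where names are stored as newline-terminated text. Be aware that protocol 4 and later store names as length-prefixed strings, so a byte-level replace is only safe there when the old and new names have the same length. As an alternative that the article does not cover, the sketch also shows a pickle.Unpickler subclass that overrides find_class() to redirect the lookup at load time. The module names old_persist and new_persist and the helper make_module() are made up.

```python
import io
import pickle
import sys
import types


def make_module(name):
    """Create and register a throwaway module that defines class Foo."""
    module = types.ModuleType(name)
    exec("class Foo(object):\n    pass\n", module.__dict__)
    sys.modules[name] = module
    return module


old = make_module("old_persist")  # made-up module name
foo = old.Foo()
foo.value = "What is a Foo?"
text_pickle = pickle.dumps(foo, protocol=0)

# "Move" the module: the old name disappears and a new one takes its place.
del sys.modules["old_persist"]
new = make_module("new_persist")

# Approach from the text: search and replace the module name in the pickle.
patched = text_pickle.replace(b"old_persist", b"new_persist")
via_replace = pickle.loads(patched)


# Alternative: leave the pickle alone and redirect the lookup while loading.
class RenamingUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        if module == "old_persist":
            module = "new_persist"
        return super().find_class(module, name)


via_find_class = RenamingUnpickler(io.BytesIO(text_pickle)).load()

print(type(via_replace).__module__, type(via_find_class).__module__)
```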

Conclusion

Object persistence depends on the object serialization capabilities of the underlying programming language. For Python objects, that means pickle. Python's pickle provides a robust and reliable foundation for the effective persistence management of Python objects. In the resources below, you will find information about systems built on top of Python's pickle capabilities.
