Persistence is all about keeping objects around, even between executions of the same program. This article gives you a general understanding of the various persistence mechanisms for Python objects, from relational databases to Python pickles and beyond. It also takes an in-depth look at Python's object serialization capabilities.
What is persistence?
The basic idea of persistence is simple. Suppose you have a Python program, perhaps one that manages your day-to-day to-do items, and you want to save the application's objects (the to-do items) between runs of the program. In other words, you want to store objects on disk and retrieve them later. That is persistence. There are several ways to achieve it, each with its own advantages and disadvantages.
For example, you can store object data in some kind of formatted text file, such as a CSV file. Or you can use a relational database, such as Gadfly, MySQL, PostgreSQL, or DB2. These file formats and databases are well established, and Python has robust interfaces for all of these storage mechanisms.
These storage mechanisms have one thing in common: the stored data is independent of the objects and programs that operate on it. The advantage is that the data is available as a shared resource for other applications. The disadvantage is that other programs can reach the object's data directly, which violates the object-oriented principle of encapsulation, under which an object's data should be accessed only through the object's own public interface.
In addition, a relational database may not be ideal for some applications. In particular, relational databases do not understand objects. Instead, they impose their own type system and their own relational data model: tables, each containing a set of tuples (rows), each containing a fixed number of statically typed fields (columns). If the application's object model does not translate easily into the relational model, it can be difficult to map objects to tuples and tuples back to objects. This difficulty is often referred to as the impedance-mismatch problem.
Some pickled Python
The pickle module and its cousin, cPickle, provide pickle support in Python. The latter is coded in C, so it performs better and is the recommended choice for most applications. We'll continue to talk about pickle, but the examples in this article actually use cPickle. Since most of the examples are shown in a Python shell, let's start by importing cPickle and referring to it as pickle:
>>> import cPickle as pickle
Now that the module has been imported, let's take a look at the pickle interface. The pickle module provides the following pairs of functions: dumps(object) returns a string containing the object in pickle format, and loads(string) returns the object contained in a pickle string; dump(object, file) writes the object to a file, and load(file) returns the object contained in a pickle file. The file can be an actual physical file, but it can also be any file-like object that has a write() method accepting a single string argument.
By default, dumps() and dump() create pickles using a printable ASCII representation. Both take an optional final argument that, if true, specifies that the pickle be created using a faster and smaller binary representation. The loads() and load() functions automatically detect whether a pickle is in the binary or the text format.
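As a rough illustration of the four functions in current Python, here is a minimal sketch. It assumes Python 3, where cPickle has been folded into pickle, pickles are bytes rather than strings, and the optional final argument is an integer protocol number instead of a true/false flag. The sample tuple is only an illustration.

```python
import io
import pickle

data = ('this is a string', 42, [1, 2, 3], None)

# dumps()/loads(): pickle to and from an in-memory bytes object.
text_like = pickle.dumps(data, protocol=0)   # protocol 0 is the printable format
binary = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)

# loads() detects the format on its own; no protocol argument is needed.
from_text = pickle.loads(text_like)
from_binary = pickle.loads(binary)

# dump()/load(): the same round trip through any file-like object.
buffer = io.BytesIO()
pickle.dump(data, buffer)
buffer.seek(0)
restored = pickle.load(buffer)
print(restored == data)
```

An in-memory io.BytesIO stands in for a real file here so that the sketch leaves nothing behind on disk.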
Listing 1 shows an interactive session that uses the dumps() and loads() functions just described:
Listing 1. Demonstration of dumps() and loads()
Welcome To PyCrust 0.7.2 - The Flakiest Python Shell
Sponsored by Orbtech - Your source for Python programming expertise.
Python 2.2.1 (#1, Aug 27 2002, 10:22:32)
[GCC 3.2 (Mandrake Linux 9.0 3.2-1mdk)] on linux-i386
Type "copyright", "credits" or "license" for more information.
>>> import cPickle as pickle
>>> t1 = ('this is a string', 42, [1, 2, 3], None)
>>> t1
('this is a string', 42, [1, 2, 3], None)
>>> p1 = pickle.dumps(t1)
>>> p1
"(S'this is a string'\nI42\n(lp1\nI1\naI2\naI3\naNtp2\n."
>>> print p1
(S'this is a string'
I42
(lp1
I1
aI2
aI3
aNtp2
.
>>> t2 = pickle.loads(p1)
>>> t2
('this is a string', 42, [1, 2, 3], None)
>>> p2 = pickle.dumps(t1, True)
>>> p2
'(U\x10this is a stringK*]q\x01(K\x01K\x02K\x03eNtq\x02.'
>>> t3 = pickle.loads(p2)
>>> t3
('this is a string', 42, [1, 2, 3], None)
Note: The text pickle format is quite simple, so we won't explain it here. In fact, all the conventions used are documented in the pickle module. We should also point out that our examples use simple objects, so the binary pickle format doesn't show much of a space saving. In real systems with complex objects, however, the binary format can yield noticeable improvements in both size and speed.
Next, let's look at some examples that use dump() and load(), which work with files and file-like objects. These functions operate much like the dumps() and loads() we have just seen, with one additional capability: dump() can write several objects to the same file, one after another. Subsequent calls to load() then retrieve the objects in the same order. Listing 2 shows this capability in action:
Listing 2. dump() and load() example
>>> a1 = 'apple'
>>> b1 = {1: 'One', 2: 'Two', 3: 'Three'}
>>> c1 = ['fee', 'fie', 'foe', 'fum']
>>> f1 = file('temp.pkl', 'wb')
>>> pickle.dump(a1, f1, True)
>>> pickle.dump(b1, f1, True)
>>> pickle.dump(c1, f1, True)
>>> f1.close()
>>> f2 = file('temp.pkl', 'rb')
>>> a2 = pickle.load(f2)
>>> a2
'apple'
>>> b2 = pickle.load(f2)
>>> b2
{1: 'One', 2: 'Two', 3: 'Three'}
>>> c2 = pickle.load(f2)
>>> c2
['fee', 'fie', 'foe', 'fum']
>>> f2.close()
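The same sequence in Python 3 might look like the sketch below. It assumes a temporary directory for the file (the file name is hypothetical) and reads objects back until load() raises EOFError, which is how the end of a pickle stream shows up when you do not know in advance how many objects were written.

```python
import os
import pickle
import tempfile

items = ['apple', {1: 'One', 2: 'Two', 3: 'Three'}, ['fee', 'fie', 'foe', 'fum']]

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'temp.pkl')  # hypothetical file name

    # Write the objects one after another into the same binary file.
    with open(path, 'wb') as f:
        for item in items:
            pickle.dump(item, f)

    # Read them back in the same order until the stream is exhausted.
    loaded = []
    with open(path, 'rb') as f:
        while True:
            try:
                loaded.append(pickle.load(f))
            except EOFError:
                break

print(loaded == items)
```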
The power of pickle
So far, we've covered the basics of pickling. In this section, we'll discuss some of the advanced issues that come up when you start to pickle complex objects, including instances of custom classes. Fortunately, Python handles these situations easily.
Portability
Pickles are portable across both space and time. In other words, the pickle file format is independent of machine architecture, which means, for example, that you can create a pickle under Linux and send it to a Python program running under Windows or Mac OS. And when you upgrade to a newer version of Python, you don't have to worry about losing your existing pickles: the Python developers have guaranteed that the pickle format will remain backwards compatible across Python versions. In fact, the pickle module provides details about the current and supported formats:
Listing 3. Retrieving supported formats
>>> pickle.format_version
'1.3'
>>> pickle.compatible_formats
['1.0', '1.1', '1.2']
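In Python 3 these informational attributes still exist, alongside integer protocol numbers. The sketch below assumes Python 3 and only prints the values, since the exact version strings differ between releases. It also shows a pickle written in the oldest protocol (0) being read back without any extra arguments; the sample dictionary is an illustration.

```python
import pickle

# Format information exposed by the module; exact values depend on the release.
print(pickle.format_version)
print(pickle.compatible_formats)
print(pickle.DEFAULT_PROTOCOL, pickle.HIGHEST_PROTOCOL)

# A pickle written in the oldest protocol is still readable.
old_style = pickle.dumps({'answer': 42}, protocol=0)
roundtrip = pickle.loads(old_style)
print(roundtrip)
```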
Multiple references, same object
In Python, a variable is a reference to an object, and several variables can refer to the same object. As it turns out, Python has no trouble preserving this behavior for pickled objects, as Listing 4 shows:
Listing 4. Preservation of object references
>>> a = [1, 2, 3]
>>> b = a
>>> a
[1, 2, 3]
>>> b
[1, 2, 3]
>>> a.append(4)
>>> a
[1, 2, 3, 4]
>>> b
[1, 2, 3, 4]
>>> c = pickle.dumps((a, b))
>>> d, e = pickle.loads(c)
>>> d
[1, 2, 3, 4]
>>> e
[1, 2, 3, 4]
>>> d.append(5)
>>> d
[1, 2, 3, 4, 5]
>>> e
[1, 2, 3, 4, 5]
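A compact way to check the same behavior is to test identity directly. This is a minimal Python 3 sketch; the variable names follow Listing 4.

```python
import pickle

a = [1, 2, 3]
b = a                      # b is a second reference to the same list

d, e = pickle.loads(pickle.dumps((a, b)))

# The restored names still share one list object, separate from the original.
d.append(4)
print(d is e, d is a, e)
```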
Circular references and recursive references
The support for object references just demonstrated extends to circular references (two objects that each contain a reference to the other) and recursive references (an object that contains a reference to itself). The next two listings highlight this capability. Let's look at the recursive reference first:
Listing 5. Recursive references
>>> l = [1, 2, 3]
>>> l.append(l)
>>> l
[1, 2, 3, [...]]
>>> l[3]
[1, 2, 3, [...]]
>>> l[3][3]
[1, 2, 3, [...]]
>>> p = pickle.dumps(l)
>>> l2 = pickle.loads(p)
>>> l2
[1, 2, 3, [...]]
>>> l2[3]
[1, 2, 3, [...]]
>>> l2[3][3]
[1, 2, 3, [...]]
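The printed output alone cannot show what the nested element really points to, so this small Python 3 sketch checks it with an identity test. The variable names are illustrative.

```python
import pickle

nested = [1, 2, 3]
nested.append(nested)      # the list now contains itself

copy = pickle.loads(pickle.dumps(nested))

# The copy is self-referential too, and it points at itself, not at the original.
print(copy[3] is copy, copy[3] is nested)
```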
Now, look at an example of a circular reference (the Python 2.2 shell prints 1 for true and 0 for false):
Listing 6. Circular references
>>> a = [1, 2]
>>> b = [3, 4]
>>> a.append(b)
>>> a
[1, 2, [3, 4]]
>>> b.append(a)
>>> a
[1, 2, [3, 4, [...]]]
>>> b
[3, 4, [1, 2, [...]]]
>>> a[2]
[3, 4, [1, 2, [...]]]
>>> b[2]
[1, 2, [3, 4, [...]]]
>>> a[2] is b
1
>>> b[2] is a
1
>>> f = file('temp.pkl', 'w')
>>> pickle.dump((a, b), f)
>>> f.close()
>>> f = file('temp.pkl', 'r')
>>> c, d = pickle.load(f)
>>> f.close()
>>> c
[1, 2, [3, 4, [...]]]
>>> d
[3, 4, [1, 2, [...]]]
>>> c[2]
[3, 4, [1, 2, [...]]]
>>> d[2]
[1, 2, [3, 4, [...]]]
>>> c[2] is d
1
>>> d[2] is c
1
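The circular case can be condensed into a few lines. This is a Python 3 sketch that pickles both lists together in one tuple, as Listing 6 does, but in memory rather than through a file.

```python
import pickle

a = [1, 2]
b = [3, 4]
a.append(b)
b.append(a)                # a and b now refer to each other

# Pickled as a unit, the two lists keep pointing at each other.
c, d = pickle.loads(pickle.dumps((a, b)))
print(c[2] is d, d[2] is c)
```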
Note that if you pickle each object separately, rather than pickling them all together in a tuple, you get slightly different (but important) results, as Listing 7 shows (the identity tests now print 0, meaning false):
Listing 7. Pickling separately versus pickling together in a tuple
>>> f = file('temp.pkl', 'w')
>>> pickle.dump(a, f)
>>> pickle.dump(b, f)
>>> f.close()
>>> f = file('temp.pkl', 'r')
>>> c = pickle.load(f)
>>> d = pickle.load(f)
>>> f.close()
>>> c
[1, 2, [3, 4, [...]]]
>>> d
[3, 4, [1, 2, [...]]]
>>> c[2]
[3, 4, [1, 2, [...]]]
>>> d[2]
[1, 2, [3, 4, [...]]]
>>> c[2] is d
0
>>> d[2] is c
0
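To make the difference explicit, the following Python 3 sketch pickles the two lists in two independent calls and then compares identities. It uses dumps() and loads() in memory instead of a file, which is an assumption made for brevity; separate dump() calls to one file behave the same way.

```python
import pickle

a = [1, 2]
b = [3, 4]
a.append(b)
b.append(a)

# Two independent pickles: each one carries its own copy of the other list.
c = pickle.loads(pickle.dumps(a))
d = pickle.loads(pickle.dumps(b))

print(c[2] is d, d[2] is c)    # the link between c and d is lost
print(c[2][:2] == d[:2])       # yet the copied contents still match
print(c[2][2] is c)            # and the cycle inside each pickle survives
```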
Equal, but not always the same
As the last example hinted, two objects are the same only if they refer to the same object in memory. In the case of pickles, each object is restored as an object that is equal to the original but is not the same object. In other words, each pickle is a copy of the original object (in the listing, 1 means true and 0 means false):
Listing 8. Restored objects as copies of the originals
>>> j = [1, 2, 3]
>>> k = j
>>> k is j
1
>>> x = pickle.dumps(k)
>>> y = pickle.loads(x)
>>> y
[1, 2, 3]
>>> y == k
1
>>> y is k
0
>>> y is j
0
>>> k is j
1
At the same time, we've seen that Python can maintain references between objects that are pickled as a unit. However, we've also seen that separate calls to dump() prevent Python from maintaining references to objects pickled outside that unit. Instead, Python makes a copy of the referenced object and stores the copy with the item being pickled. This is not a problem for applications that pickle and restore a single object hierarchy, but it is something to be aware of in other situations.
It is worth noting that there is an option that does let you pickle objects separately and still maintain the references among them, as long as they are all pickled to the same file. The pickle and cPickle modules provide a Pickler (and a corresponding Unpickler) that keeps track of the objects it has already pickled. With such a Pickler, shared and circular references are pickled by reference rather than by value (in the listing, the identity tests print 1, meaning true):
Listing 9. Maintaining references among separately pickled objects
>>> f = file('temp.pkl', 'w')
>>> pickler = pickle.Pickler(f)
>>> pickler.dump(a)
<cPickle.Pickler object at 0x89b0bb8>
>>> pickler.dump(b)
<cPickle.Pickler object at 0x89b0bb8>
>>> f.close()
>>> f = file('temp.pkl', 'r')
>>> unpickler = pickle.Unpickler(f)
>>> c = unpickler.load()
>>> d = unpickler.load()
>>> c[2]
[3, 4, [1, 2, [...]]]
>>> d[2]
[1, 2, [3, 4, [...]]]
>>> c[2] is d
1
>>> d[2] is c
1
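A Python 3 version of the same idea is sketched below, assuming an in-memory io.BytesIO stream in place of temp.pkl. The key point is that one Pickler and one Unpickler are reused across calls, so each keeps its memo of objects already seen.

```python
import io
import pickle

a = [1, 2]
b = [3, 4]
a.append(b)
b.append(a)

stream = io.BytesIO()
pickler = pickle.Pickler(stream)
pickler.dump(a)            # the pickler remembers every object it has written
pickler.dump(b)            # so b is written as a reference to the earlier copy

stream.seek(0)
unpickler = pickle.Unpickler(stream)
c = unpickler.load()
d = unpickler.load()
print(c[2] is d, d[2] is c)
```

Note that the second pickle is only meaningful to an Unpickler that has already read the first one, so the two objects must always be loaded together and in order.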
Unpicklable objects
Some object types cannot be pickled. For example, Python cannot pickle a file object (or any object that holds a reference to a file object), because Python cannot guarantee that it can recreate the state of the file upon unpickling. (Other examples are more obscure and not worth mentioning in an article like this.) Attempting to pickle a file object results in the following error:
Listing 10. The result of trying to pickle a file object
>>> f = file('temp.pkl', 'w')
>>> p = pickle.dumps(f)
Traceback (most recent call last):
  File "<input>", line 1, in ?
  File "/usr/lib/python2.2/copy_reg.py", line, in _reduce
    raise TypeError, "can't pickle %s objects" % base.__name__
TypeError: can't pickle file objects
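The restriction still applies in Python 3, although the wording of the message differs between versions. The sketch below assumes Python 3 and opens os.devnull only so that there is some open file to try; it catches the TypeError rather than relying on the exact message text.

```python
import os
import pickle

# os.devnull is used only so the sketch has an open file object to work with.
with open(os.devnull, 'w') as f:
    try:
        pickle.dumps(f)
    except TypeError as exc:
        error = exc
        print('cannot pickle:', exc)
    else:
        error = None
```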
Class instances
Pickling class instances requires a bit more care than pickling simple object types. This is primarily because Python pickles the instance data (usually the __dict__ attribute) and the name of the class, but not the code for the class. When Python unpickles a class instance, it attempts to import the module containing the class definition, using the exact class name and module name (including any package path prefixes) recorded when the instance was pickled. Also note that class definitions must appear at the top level of a module, meaning they cannot be nested classes (classes defined inside other classes or functions).
When class instances are unpickled, their __init__() method is normally not called. Instead, Python creates a generic class instance, applies the instance attributes that were pickled, and sets the instance's __class__ attribute to point to the original class.
The mechanism for unpickling the new-style classes introduced in Python 2.2 differs slightly from the original one. While the end result is effectively the same as for old-style classes, Python uses the copy_reg module's _reconstructor() function to restore instances of new-style classes.
If you want to modify the default pickling behavior for instances of either new-style or old-style classes, you can define the special methods __getstate__() and __setstate__(), which Python calls while saving and restoring the state information of class instances. We'll see some examples that make use of these special methods in the sections that follow.
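The claim that __init__() is not called during unpickling is easy to verify. The following Python 3 sketch uses a hypothetical Counter class, defined at module top level, that counts how often its initializer runs.

```python
import pickle

class Counter(object):          # hypothetical class, defined at module top level
    init_calls = 0              # class attribute: counts how often __init__ runs

    def __init__(self, value):
        Counter.init_calls += 1
        self.value = value

original = Counter(10)          # __init__ runs once here
restored = pickle.loads(pickle.dumps(original))

# The instance data came back, but __init__ was not called a second time.
print(restored.value, Counter.init_calls)
```

A practical consequence is that any setup done only in __init__(), such as opening files or connections, has to be redone by other means after unpickling.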
Now, let's look at a simple class instance. First, create a Python module named persist.py containing the following new-style class definition:
Listing 11. New-style class definition
class Foo(object):
    def __init__(self, value):
        self.value = value
Now you can pickle an instance of Foo and take a look at its representation:
Listing 12. Pickling a Foo instance
>>> import cPickle as pickle
>>> from Orbtech.examples.persist import Foo
>>> foo = Foo('What is a Foo?')
>>> p = pickle.dumps(foo)
>>> print p
ccopy_reg
_reconstructor
p1
(cOrbtech.examples.persist
Foo
p2
c__builtin__
object
p3
NtRp4
(dp5
S'value'
p6
S'What is a Foo?'
sb.
>>>
You can see that the class name, Foo, and the fully qualified module name, Orbtech.examples.persist, are both stored in the pickle. If you pickled this instance to a file and unpickled it later, or on another machine, Python would attempt to import the Orbtech.examples.persist module and would raise an exception if it could not. Similar errors occur if you rename the class or the module, or move the module to another directory.
Here is an example of the error message Python emits when we rename the Foo class and then try to load a previously pickled Foo instance:
Listing 13. Attempting to load a pickled instance of a renamed Foo class
>>> import cPickle as pickle
>>> f = file('temp.pkl', 'r')
>>> foo = pickle.load(f)
Traceback (most recent call last):
  File "<input>", line 1, in ?
AttributeError: 'module' object has no attribute 'Foo'
A similar error occurs after renaming the persist.py module:
Listing 14. Attempting to load a pickled instance after renaming the persist.py module
>>> import cPickle as pickle
>>> f = file('temp.pkl', 'r')
>>> foo = pickle.load(f)
Traceback (most recent call last):
  File "<input>", line 1, in ?
ImportError: No module named persist
We'll present some techniques for managing changes like these, without breaking existing pickles, in the later section on changing class definitions.
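The failure for a renamed class can be reproduced in a single Python 3 script. The sketch below defines a stand-in Foo class in the running module rather than in persist.py, and simulates the rename by deleting the name after pickling; both choices are assumptions made to keep the example self-contained.

```python
import pickle

class Foo(object):              # stands in for the Foo class from persist.py
    def __init__(self, value):
        self.value = value

data = pickle.dumps(Foo('What is a Foo?'), protocol=0)

# The pickle records the class by name; the class body itself is not stored.
print(b'Foo' in data)

# Simulate renaming the class: the old name disappears from the module.
del Foo

try:
    pickle.loads(data)
except AttributeError as exc:
    error = exc
    print('unpickling failed:', exc)
else:
    error = None
```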
Special state methods
We mentioned earlier that some object types, such as file objects, cannot be pickled. When an instance has attributes holding such unpicklable objects, you can use the special methods __getstate__() and __setstate__() to modify the state of the instance that gets pickled. Here is an example of our Foo class, modified to handle a file object attribute:
Listing 15. Handling unpicklable instance attributes
class Foo(object):
    def __init__(self, value, filename):
        self.value = value
        self.logfile = file(filename, 'w')
    def __getstate__(self):
        """Return state values to be pickled."""
        f = self.logfile
        return (self.value, f.name, f.tell())
    def __setstate__(self, state):
        """Restore state from the unpickled state values."""
        self.value, name, position = state
        # Caution: mode 'w' truncates an existing file when it is reopened.
        f = file(name, 'w')
        f.seek(position)
        self.logfile = f
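Here is how the pattern from Listing 15 might be exercised in Python 3. This is a sketch under a few assumptions: the class is renamed Logger to keep it apart from the earlier Foo, the file is reopened in 'r+' mode (a deliberate change from the listing, so that existing log contents are not truncated), and a temporary directory with a hypothetical file name holds the log.

```python
import os
import pickle
import tempfile

class Logger(object):           # hypothetical class modelled on Listing 15
    def __init__(self, value, filename):
        self.value = value
        self.logfile = open(filename, 'w')

    def __getstate__(self):
        """Return picklable state: the value, the file name and the position."""
        f = self.logfile
        f.flush()
        return (self.value, f.name, f.tell())

    def __setstate__(self, state):
        """Reopen the log file and restore the saved position."""
        self.value, name, position = state
        f = open(name, 'r+')    # 'r+' keeps the existing contents
        f.seek(position)
        self.logfile = f

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'demo.log')   # hypothetical file name
    log = Logger('demo', path)
    log.logfile.write('first entry\n')

    clone = pickle.loads(pickle.dumps(log))   # reopens the same file
    clone.logfile.write('second entry\n')

    log.logfile.close()
    clone.logfile.close()
    with open(path) as f:
        contents = f.read()

print(contents)
```

The file object itself never enters the pickle; only the three plain values returned by __getstate__() do.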
Schema evolution
Over time, you'll find yourself having to change your class definitions. If you have already pickled instances of a class and now need to change that class, you'll probably want to retrieve and update those instances so that they continue to work properly under the new definition. We've already seen some of the errors that can occur when a class or module changes. Fortunately, the pickling and unpickling processes provide hooks that we can use to support this kind of schema evolution.
In this section, we'll look at ways to anticipate common problems and how to resolve them. Because the code of a class is not pickled with its instances, you can add, change, and remove methods without affecting existing pickled instances. For the same reason, you don't have to worry about class attributes. You do have to make sure that the module containing the class definition is available in the unpickling environment. You also have to plan for the changes that can cause unpickling problems: changing the class name, adding or removing instance attributes, and changing the name or location of the module that defines the class.
Class name changes
To change a class name without breaking previously pickled instances, follow these steps. First, leave the original class definition in place so that it can be found when existing instances are unpickled. Rather than changing the original name, create a copy of the class definition, under the new class name, in the same module as the original. Then add the following method to the original class definition, substituting the actual new class name for NewClassName:
Listing 16. Class name change: the method added to the original class definition
def __setstate__(self, state):
    self.__dict__.update(state)
    self.__class__ = NewClassName
When an existing instance is unpickled, Python finds the original class definition, calls the instance's __setstate__() method, and thereby reassigns the instance's __class__ attribute to the new class definition. Once you are sure that all existing instances have been unpickled, updated, and re-pickled, you can remove the old class definition from the source module.
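The steps above can be tried end to end in a short Python 3 sketch. The class names OldClassName and NewClassName are placeholders, and both classes live in the running script instead of a separate module.

```python
import pickle

class NewClassName(object):     # the renamed class
    def __init__(self, value):
        self.value = value

class OldClassName(object):     # original definition, kept so old pickles load
    def __init__(self, value):
        self.value = value

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.__class__ = NewClassName   # hand the instance over to the new class

old_pickle = pickle.dumps(OldClassName('legacy'))
migrated = pickle.loads(old_pickle)
print(type(migrated).__name__, migrated.value)
```

Pickling migrated again records NewClassName, which is what eventually lets the old definition be retired.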
Attribute addition and removal
Once again, the special state methods __getstate__() and __setstate__() give us control over the state of each instance and an opportunity to handle changes in instance attributes. Let's look at a simple class definition to which we will add and remove attributes. Here is the initial definition:
Listing 17. Initial class definition
class Person(object):
    def __init__(self, firstname, lastname):
        self.firstname = firstname
        self.lastname = lastname
Suppose that instances of Person have already been created and pickled, and we now decide that we really want to store a single name attribute rather than separate first and last names. Here is one way to change the class definition so that previously pickled instances are migrated to the new definition:
class Person(object):
    def __init__(self, fullname):
        self.fullname = fullname
    def __setstate__(self, state):
        if 'fullname' not in state:
            first = ''
            last = ''
            if 'firstname' in state:
                first = state['firstname']
                del state['firstname']
            if 'lastname' in state:
                last = state['lastname']
                del state['lastname']
            self.fullname = " ".join([first, last]).strip()
        self.__dict__.update(state)
In this example, we added a new attribute, fullname, and removed two existing attributes, firstname and lastname. When a previously pickled instance is unpickled, its previously pickled state is passed to __setstate__() as a dictionary that includes values for the firstname and lastname attributes. We combine these two values and assign the result to the new fullname attribute, removing the old attributes from the state dictionary along the way. Once all previously pickled instances have been updated and re-pickled, the __setstate__() method can be removed from the class definition.
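To see the migration run, the Python 3 sketch below fakes an instance saved under the old definition by bypassing __init__() and filling in the old attributes by hand. That shortcut, the sample name, and the use of dict.pop() in place of the explicit membership tests are assumptions of this sketch, not part of the approach described above.

```python
import pickle

class Person(object):
    def __init__(self, fullname):
        self.fullname = fullname

    def __setstate__(self, state):
        if 'fullname' not in state:
            # Old-format state: merge the two name fields into one.
            first = state.pop('firstname', '')
            last = state.pop('lastname', '')
            self.fullname = ' '.join([first, last]).strip()
        self.__dict__.update(state)

# Fake an instance saved under the old definition by bypassing __init__.
legacy = Person.__new__(Person)
legacy.__dict__.update({'firstname': 'Ada', 'lastname': 'Lovelace'})
old_pickle = pickle.dumps(legacy)

person = pickle.loads(old_pickle)
print(person.fullname, sorted(person.__dict__))
```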
Module modifications
Conceptually, changing the name or location of a module is similar to changing a class name, but it is handled quite differently. That's because the module information is stored in the pickle itself and is not an attribute that can be modified through the standard pickle interface. In fact, the only way to change the module information is to perform a search-and-replace operation on the actual pickle file. Exactly how you do this depends on your operating system and the tools you have available. Obviously, this is a situation in which you will want to back up your files first, in case you make a mistake. Still, the change should be fairly straightforward, and it should work equally well on the binary and the text pickle formats.
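The search-and-replace approach can be simulated in Python 3 without touching real files. The sketch below registers two throwaway modules under the hypothetical names persist_old and persist_new, pickles with protocol 0 (where module and class names are stored as plain newline-terminated text), and patches the raw bytes. In newer binary protocols names are length-prefixed, so a replacement of a different length would likely corrupt the pickle; the two names here have equal length in any case.

```python
import pickle
import sys
import types

class Foo(object):
    def __init__(self, value):
        self.value = value

# Pretend Foo lives in a module called 'persist_old' (hypothetical name).
old_module = types.ModuleType('persist_old')
old_module.Foo = Foo
Foo.__module__ = 'persist_old'
sys.modules['persist_old'] = old_module

data = pickle.dumps(Foo('What is a Foo?'), protocol=0)

# "Rename" the module: the old name vanishes, the new one takes its place.
del sys.modules['persist_old']
new_module = types.ModuleType('persist_new')
new_module.Foo = Foo
Foo.__module__ = 'persist_new'
sys.modules['persist_new'] = new_module

try:
    pickle.loads(data)
    failed = False
except ImportError:
    failed = True               # the pickle still names persist_old

# Search and replace on the raw pickle bytes, then load again.
patched = data.replace(b'persist_old', b'persist_new')
foo = pickle.loads(patched)
print(failed, foo.value)
```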
Conclusion
Object persistence depends on the object serialization capabilities of the underlying programming language. For Python objects, that means pickling. Python's pickles provide a robust and reliable foundation for effective persistence management of Python objects. In the Resources that follow, you'll find information about systems that build on Python's pickling capabilities.