LINUX application tips: serialize and store Python objects

Source: Internet
Author: User
What is persistence?
The basic idea of persistence is simple. Suppose you have a Python program, perhaps one that manages a daily to-do list, and you want to save the application objects (the to-do items) between runs of the program. In other words, you want to store the objects on disk and retrieve them later. That is persistence. There are several ways to achieve it, and each has its own advantages and disadvantages.
  
For example, object data can be stored in a formatted text file, such as a CSV file, or in a relational database such as Gadfly, MySQL, PostgreSQL, or DB2. These file formats and databases are well established, and Python has robust interfaces for all of these storage mechanisms.
  
These storage mechanisms have one thing in common: the stored data is independent of the objects and programs that operate on it. The advantage is that the data can serve as a shared resource for other applications. The disadvantage is that other programs can then access the object data directly, which violates the object-oriented principle of encapsulation, under which an object's data should be accessed only through its public interface.
  
In addition, a relational database may not be ideal for some applications. In particular, relational databases do not understand objects. Instead, they impose their own type system and their own relational data model (tables), in which each table contains a set of tuples (rows) and each row contains a fixed number of statically typed fields (columns). If the application's object model does not translate easily into the relational model, mapping objects to tuples and back again is difficult. This difficulty is often called the impedance-mismatch problem.
  
Object Persistence
If you want to store Python objects transparently, without losing their identity and type information, you need some form of object serialization: the process of converting an arbitrarily complex object into a text or binary representation. Likewise, you must be able to restore the serialized form to the original object. In Python, this serialization process is called pickling. Objects can be pickled to a string, to a file on disk, or to any file-like object, and those strings, files, or file-like objects can be unpickled back into the original objects. We discuss pickling in detail later in this article.
  
If you like to keep everything as an object and want to avoid the overhead of converting objects into some non-object-based storage, pickle files provide those benefits. Sometimes, however, you need something more robust and scalable than simple pickle files. For example, pickling alone does not solve the problem of naming and locating pickle files, nor does it support concurrent access to persistent objects. For those features you need to turn to something like ZODB (the Z Object Database for Python). ZODB is a robust, multi-user, object-oriented database system that can store and manage arbitrarily complex Python objects, with support for transactions and concurrency control. (See References to download ZODB.) Interestingly, even ZODB relies on Python's native serialization capability, and to use ZODB effectively you must understand pickling well.
  
Another interesting approach to the persistence problem is Prevayler, which was originally implemented in Java (see References for a developerWorks article on Prevayler). A group of Python programmers later ported Prevayler to Python under the name PyPerSyst, hosted on SourceForge (see References for a link to the PyPerSyst project). The Prevayler/PyPerSyst concept also builds on the native serialization capabilities of Java and Python. PyPerSyst keeps the entire object system in memory and provides disaster recovery by pickling a snapshot of the system to disk from time to time and by maintaining a log of commands that can be reapplied to the latest snapshot. So although applications that use PyPerSyst are limited by available memory, the advantage is that a native object system held entirely in memory is extremely fast and is much simpler to implement than a database such as ZODB, which allows more objects than can be held in memory at once.
  
Now that we have briefly surveyed the ways to store persistent objects, it is time to look at pickling in detail. Although we are mainly interested in ways to save Python objects without converting them into some other format, several issues still need attention: how to pickle and unpickle simple and complex objects effectively, including instances of custom classes; how to maintain object references, including circular and recursive references; and how to handle changes to class definitions so that previously pickled instances do not cause problems. We cover all of these issues in the discussion of Python's pickling capabilities that follows.
  
Some pickled Python
The pickle module and its cousin, cPickle, provide pickling support in Python. The latter is coded in C and performs better, so it is the recommended choice for most applications. We will continue to talk about pickle, but the examples in this article actually use cPickle. Because most of the examples are shown in a Python shell, we start by showing how to import cPickle and refer to it as pickle:
  
>>> import cPickle as pickle
Now that the module has been imported, let's look at the pickle interface. The pickle module provides the following pairs of functions: dumps(object) returns a string containing the object in pickle format; loads(string) returns the object contained in the pickle string; dump(object, file) writes the object to a file, which can be an actual physical file or any file-like object that has a write() method accepting a single string argument; load(file) returns the object contained in the pickle file.
  
By default, dumps() and dump() create pickles in a printable ASCII representation. Both take an optional final argument that, if True, specifies that the pickle be created in a faster and smaller binary representation. The loads() and load() functions detect automatically whether a pickle is in the binary or the text format.

Listing 1 shows an interactive session that uses the dumps() and loads() functions just described:
  
Listing 1. Demonstration of dumps() and loads()
Welcome To PyCrust 0.7.2 - The Flakiest Python Shell
Sponsored by Orbtech - Your source for Python programming expertise.
Python 2.2.1 (#1, Aug 27 2002, 10:22:32)
[GCC 3.2 (Mandrake Linux 9.0 3.2-1mdk)] on linux-i386
Type "copyright", "credits" or "license" for more information.
>>> import cPickle as pickle
>>> t1 = ('this is a string', 42, [1, 2, 3], None)
>>> t1
('this is a string', 42, [1, 2, 3], None)
>>> p1 = pickle.dumps(t1)
>>> p1
"(S'this is a string'\nI42\n(lp1\nI1\naI2\naI3\naNtp2\n."
>>> print p1
(S'this is a string'
I42
(lp1
I1
aI2
aI3
aNtp2
.
>>> t2 = pickle.loads(p1)
>>> t2
('this is a string', 42, [1, 2, 3], None)
>>> p2 = pickle.dumps(t1, True)
>>> p2
'(U\x10this is a stringK*]q\x01(K\x01K\x02K\x03eNtq\x02.'
>>> t3 = pickle.loads(p2)
>>> t3
('this is a string', 42, [1, 2, 3], None)
  
Note: The text pickle format is very simple and is not explained here. In fact, all of the conventions used are documented in the pickle module. We should also point out that our examples use only simple objects, so the binary pickle format shows little benefit in space savings. In real systems that use complex objects, however, the binary format can bring a noticeable improvement in size and speed.
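For readers on a current interpreter, the same round trip can be sketched as follows. This is a minimal sketch that assumes Python 3, where cPickle no longer exists as a separate module and the optional flag is spelled `protocol`; those details are not part of the original Python 2.2 session.

```python
import pickle

t1 = ('this is a string', 42, [1, 2, 3], None)

# Protocol 0 is the printable ASCII format that the article calls "text".
text_pickle = pickle.dumps(t1, protocol=0)

# Higher protocols are binary; HIGHEST_PROTOCOL selects the newest available.
binary_pickle = pickle.dumps(t1, protocol=pickle.HIGHEST_PROTOCOL)

# loads() detects the format by itself, so no protocol argument is needed.
t2 = pickle.loads(text_pickle)
t3 = pickle.loads(binary_pickle)

print(t2 == t1, t3 == t1)
print(len(text_pickle), len(binary_pickle))
```

The two lengths printed at the end give a rough feel for the space difference between the formats on this small tuple.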
  
Next, let's look at some examples that use dump() and load(), which work with files and file-like objects. These functions operate much like the dumps() and loads() we just saw, with one additional capability: dump() can write several objects, one after another, to the same file. Subsequent calls to load() then retrieve the objects in the same order. Listing 2 shows this capability in action:
  
Listing 2. dump() and load() examples
>>> a1 = 'apple'
>>> b1 = {1: 'One', 2: 'Two', 3: 'Three'}
>>> c1 = ['fee', 'fie', 'foe', 'fum']
>>> f1 = file('temp.pkl', 'wb')
>>> pickle.dump(a1, f1, True)
>>> pickle.dump(b1, f1, True)
>>> pickle.dump(c1, f1, True)
>>> f1.close()
>>> f2 = file('temp.pkl', 'rb')
>>> a2 = pickle.load(f2)
>>> a2
'apple'
>>> b2 = pickle.load(f2)
>>> b2
{1: 'One', 2: 'Two', 3: 'Three'}
>>> c2 = pickle.load(f2)
>>> c2
['fee', 'fie', 'foe', 'fum']
>>> f2.close()
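The same sequence of dump() and load() calls might look like this on Python 3. It is only a sketch: the temporary directory and the use of `open()` in place of `file()` are assumptions made so the example runs on its own.

```python
import os
import pickle
import tempfile

a1 = 'apple'
b1 = {1: 'One', 2: 'Two', 3: 'Three'}
c1 = ['fee', 'fie', 'foe', 'fum']

with tempfile.TemporaryDirectory() as tmp:
    # The file name mirrors the listing; the directory is a throwaway.
    path = os.path.join(tmp, 'temp.pkl')

    # Pickle data is bytes, so the file is opened in binary mode.
    with open(path, 'wb') as f1:
        for obj in (a1, b1, c1):
            pickle.dump(obj, f1)

    # Each load() call returns the next object, in the order it was written.
    with open(path, 'rb') as f2:
        a2 = pickle.load(f2)
        b2 = pickle.load(f2)
        c2 = pickle.load(f2)

print(a2, b2, c2)
```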
  
The power of pickle
So far we have covered the basics of pickling. In this section we discuss some more advanced issues that arise when you start pickling complex objects, including instances of custom classes. Fortunately, Python handles these situations easily.
  
Portability
Pickles are portable across space and time. In other words, the pickle file format is independent of machine architecture, which means, for example, that you can create a pickle under Linux and send it to a Python program running under Windows or Mac OS. And when you upgrade to a newer version of Python, you need not worry about losing your existing pickles: the Python developers have ensured that the pickle format is backward compatible across Python versions. In fact, the pickle module provides details about the current format and the formats it supports:
  
Listing 3. Retrieving supported formats
>>> pickle.format_version
'1.3'
>>> pickle.compatible_formats
['1.0', '1.1', '1.2']
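As a small illustration of this backward compatibility, the text pickle printed in Listing 1 can be fed to a much newer interpreter. The sketch below assumes Python 3 and copies the Python 2.2 pickle in by hand as a bytes literal; `DEFAULT_PROTOCOL` and `HIGHEST_PROTOCOL` are the Python 3 way of asking which formats the interpreter writes.

```python
import pickle

# Text pickle produced by Python 2.2 in Listing 1, copied here as bytes.
old_pickle = b"(S'this is a string'\nI42\n(lp1\nI1\naI2\naI3\naNtp2\n."

# The newer interpreter still understands the old format.
restored = pickle.loads(old_pickle)
print(restored)

# Protocol numbers describe the formats this interpreter writes.
print(pickle.DEFAULT_PROTOCOL, pickle.HIGHEST_PROTOCOL)
```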
  
Multiple references, same object
In Python, a variable is a reference to an object, and several variables can refer to the same object. As it turns out, Python has no trouble maintaining this behavior with pickled objects, as Listing 4 shows:
  
Listing 4. Maintaining object references
>>> a = [1, 2, 3]
>>> b = a
>>> a
[1, 2, 3]
>>> b
[1, 2, 3]
>>> a.append(4)
>>> a
[1, 2, 3, 4]
>>> b
[1, 2, 3, 4]
>>> c = pickle.dumps((a, b))
>>> d, e = pickle.loads(c)
>>> d
[1, 2, 3, 4]
>>> e
[1, 2, 3, 4]
>>> d.append(5)
>>> d
[1, 2, 3, 4, 5]
>>> e
[1, 2, 3, 4, 5]
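The behavior in Listing 4 can be checked directly with an identity test. This is a short sketch assuming Python 3; the variable names follow the listing.

```python
import pickle

a = [1, 2, 3]
b = a  # a second name for the same list

# Both references travel inside one tuple, so they are pickled as a unit.
d, e = pickle.loads(pickle.dumps((a, b)))

# The restored names still share one list, which is a copy of the original.
d.append(5)
print(d is e, d is a)
print(e)
```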
  
Circular and recursive references
The object reference support just demonstrated extends to circular references (where two objects each contain a reference to the other) and recursive references (where an object contains a reference to itself). The following two listings highlight this capability. Let's look at a recursive reference first:
  
Listing 5. Recursive reference
>>> l = [1, 2, 3]
>>> l.append(l)
>>> l
[1, 2, 3, [...]]
>>> l[3]
[1, 2, 3, [...]]
>>> l[3][3]
[1, 2, 3, [...]]
>>> p = pickle.dumps(l)
>>> l2 = pickle.loads(p)
>>> l2
[1, 2, 3, [...]]
>>> l2[3]
[1, 2, 3, [...]]
>>> l2[3][3]
[1, 2, 3, [...]]
  
Now, let's look at an example of circular reference:
  
Listing 6. Circular reference
>>> a = [1, 2]
>>> b = [3, 4]
>>> a.append(b)
>>> a
[1, 2, [3, 4]]
>>> b.append(a)
>>> a
[1, 2, [3, 4, [...]]]
>>> b
[3, 4, [1, 2, [...]]]
>>> a[2]
[3, 4, [1, 2, [...]]]
>>> b[2]
[1, 2, [3, 4, [...]]]
>>> a[2] is b
1
>>> b[2] is a
1
>>> f = file('temp.pkl', 'w')
>>> pickle.dump((a, b), f)
>>> f.close()
>>> f = file('temp.pkl', 'r')
>>> c, d = pickle.load(f)
>>> f.close()
>>> c
[1, 2, [3, 4, [...]]]
>>> d
[3, 4, [1, 2, [...]]]
>>> c[2]
[3, 4, [1, 2, [...]]]
>>> d[2]
[1, 2, [3, 4, [...]]]
>>> c[2] is d
1
>>> d[2] is c
1
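Both kinds of reference can be verified in a few lines. The sketch below assumes Python 3 and pickles to an in-memory string rather than to temp.pkl; the name `items` stands in for the list called `l` in Listing 5.

```python
import pickle

# Recursive reference: a list that contains itself.
items = [1, 2, 3]
items.append(items)
items2 = pickle.loads(pickle.dumps(items))

# Circular reference: two lists that contain each other.
a = [1, 2]
b = [3, 4]
a.append(b)
b.append(a)
c, d = pickle.loads(pickle.dumps((a, b)))

# The restored structures point back at themselves, not at the originals.
print(items2[3] is items2)
print(c[2] is d, d[2] is c)
```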
  
Note: If you pickle each object separately, rather than pickling all the objects together in one tuple, the results are slightly (but significantly) different, as Listing 7 shows:
  
Listing 7. Pickling separately vs. together in one tuple
>>> f = file('temp.pkl', 'w')
>>> pickle.dump(a, f)
>>> pickle.dump(b, f)
>>> f.close()
>>> f = file('temp.pkl', 'r')
>>> c = pickle.load(f)
>>> d = pickle.load(f)
>>> f.close()
>>> c
[1, 2, [3, 4, [...]]]
>>> d
[3, 4, [1, 2, [...]]]
>>> c[2]
[3, 4, [1, 2, [...]]]
>>> d[2]
[1, 2, [3, 4, [...]]]
>>> c[2] is d
0
>>> d[2] is c
0
  
Equal, but not always identical
As the previous example hinted, objects are identical only when they refer to the same object in memory. In the case of pickling, each object is restored to an object that is equal to the original but is not the same object. In other words, each pickle is a copy of the original object:
  
Listing 8. Restored objects as copies of the original
>>> j = [1, 2, 3]
>>> k = j
>>> k is j
1
>>> x = pickle.dumps(k)
>>> y = pickle.loads(x)
>>> y
[1, 2, 3]
>>> y == k
1
>>> y is k
0
>>> y is j
0
>>> k is j
1
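One practical consequence of getting a copy is that changing the restored object leaves the original untouched. A brief sketch, assuming Python 3 (where the comparisons print True and False rather than 1 and 0):

```python
import pickle

j = [1, 2, 3]
k = j
y = pickle.loads(pickle.dumps(k))

print(y == k)  # equality compares contents
print(y is k)  # identity compares the objects themselves

# Mutating the restored copy does not affect the original list.
y.append(4)
print(j, y)
```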
  
As we saw earlier, Python can maintain references between objects when they are pickled as a unit. However, we also saw that separate calls to dump() take away Python's ability to maintain references to objects pickled outside that unit. Instead, Python makes a copy of the referenced object and stores the copy with the object being pickled. This is not a problem for an application that pickles and restores a single object hierarchy, but it is something to be aware of in other situations.
  
It is worth noting that there is an option that lets you pickle objects separately and still maintain their references to one another, provided that the objects are all pickled to the same file. The pickle and cPickle modules provide a Pickler (and a corresponding Unpickler) that keeps track of the objects that have already been pickled. With such a Pickler, shared and circular references are pickled by reference rather than by value:
  
Listing 9. Maintaining references among separately pickled objects
>>> f = file('temp.pkl', 'w')
>>> pickler = pickle.Pickler(f)
>>> pickler.dump(a)

>>> pickler.dump(b)

>>> f.close()
>>> f = file('temp.pkl', 'r')
>>> unpickler = pickle.Unpickler(f)
>>> c = unpickler.load()
>>> d = unpickler.load()
>>> c[2]
[3, 4, [1, 2, [...]]]
>>> d[2]
[1, 2, [3, 4, [...]]]
>>> c[2] is d
1
>>> d[2] is c
1
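The contrast between Listing 7 and Listing 9 can be put side by side. The sketch below assumes Python 3 and uses an in-memory `io.BytesIO` buffer in place of temp.pkl; it relies on a single Pickler and a single Unpickler keeping their memo of seen objects between calls.

```python
import io
import pickle

a = [1, 2]
b = [3, 4]
a.append(b)
b.append(a)

# Case 1: independent dump() calls, each starting with an empty memo.
buf = io.BytesIO()
pickle.dump(a, buf)
pickle.dump(b, buf)
buf.seek(0)
c1 = pickle.load(buf)
d1 = pickle.load(buf)

# Case 2: one Pickler and one Unpickler, whose memos persist between calls.
buf = io.BytesIO()
pickler = pickle.Pickler(buf)
pickler.dump(a)
pickler.dump(b)
buf.seek(0)
unpickler = pickle.Unpickler(buf)
c2 = unpickler.load()
d2 = unpickler.load()

print(c1[2] is d1)               # separate pickles: the link is lost
print(c2[2] is d2, d2[2] is c2)  # shared memo: the link survives
```

The objects must be read back with one Unpickler in the same order they were written, because later pickles refer to entries recorded by earlier ones.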
  
Objects that cannot be pickled
A few object types cannot be pickled. For example, Python cannot pickle a file object (or any object that holds a reference to a file object), because Python cannot guarantee that it can re-create the state of the file when unpickling. (The other examples are obscure enough that they are not worth mentioning in an article like this.) Attempting to pickle a file object results in the following error:
  
Listing 10. Result of trying to pickle a file object
>>> f = file('temp.pkl', 'w')
>>> p = pickle.dumps(f)
Traceback (most recent call last):
  File "", line 1, in ?
  File "/usr/lib/python2.2/copy_reg.py", line 57, in _reduce
    raise TypeError, "can't pickle %s objects" % base.__name__
TypeError: can't pickle file objects
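A program that might encounter such objects can catch the error instead of crashing. The following sketch assumes Python 3, where the exception is still a TypeError although the message wording differs from the Python 2.2 traceback above; `os.devnull` is used only to obtain a real file object without creating a file.

```python
import os
import pickle

error_message = None

with open(os.devnull, 'w') as f:
    try:
        pickle.dumps(f)
    except TypeError as exc:
        # Keep the message; the exact text varies between Python versions.
        error_message = str(exc)

print(error_message)
```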
  
Class instances
Pickling class instances requires more care than pickling simple object types. This is mainly because Python pickles the instance data (usually the __dict__ attribute) and the name of the class, but not the code for the class. When Python unpickles a class instance, it tries to import the module containing the class definition, using the exact class name and module name (including any package path prefix) that were in effect when the instance was pickled. Also note that class definitions must appear at the top level of a module, which means they cannot be nested classes (classes defined inside other classes or functions).

When a class instance is unpickled, its __init__() method is normally not called. Instead, Python creates a generic class instance, applies the pickled instance attributes, and sets the instance's __class__ attribute to point to the original class.

The unpickling mechanism for the new-style classes introduced in Python 2.2 differs slightly from the original one. Although the end result is effectively the same as for old-style classes, Python uses the _reconstructor() function of the copy_reg module to restore instances of new-style classes.

If you want to change the default pickling behavior for instances of either new-style or old-style classes, you can define the special class methods __getstate__() and __setstate__(), which Python calls when saving and restoring the state information of a class instance. We will see examples that use these special methods in the sections that follow.

Now let's look at a simple class instance. To begin, we create a Python module named persist.py that contains the following new-style class definition:
  
Listing 11. New-style class definition
class Foo(object):

    def __init__(self, value):
        self.value = value
  
Now you can pickle an instance of Foo and look at its representation:
  
Listing 12. Pickling a Foo instance
>>> import cPickle as pickle
>>> from Orbtech.examples.persist import Foo
>>> foo = Foo('What is a Foo?')
>>> p = pickle.dumps(foo)
>>> print p
ccopy_reg
_reconstructor
p1
(cOrbtech.examples.persist
Foo
p2
c__builtin__
object
p3
NtRp4
(dp5
S'value'
p6
S'What is a Foo?'
sb.
>>>
  
As you can see, the class name, Foo, and the fully qualified module name, Orbtech.examples.persist, are both stored in the pickle. If we pickled this instance to a file and unpickled it later, or on another machine, Python would try to import the Orbtech.examples.persist module and would raise an exception if it could not. Similar errors occur if you rename the class or the module, or move the module to another directory.
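Two of the points above, that the pickle records names rather than code and that __init__() is not run again on unpickling, can be observed with a small experiment. This sketch assumes Python 3 and defines Foo in the running script instead of in persist.py; the `init_calls` counter is a hypothetical addition used only to make the behavior visible.

```python
import pickle


class Foo(object):
    init_calls = 0  # hypothetical counter: how many times __init__ has run

    def __init__(self, value):
        Foo.init_calls += 1
        self.value = value


foo = Foo('What is a Foo?')
data = pickle.dumps(foo)

# The pickle names the defining module but carries no class code or
# class attributes.
print(Foo.__module__.encode() in data)
print(b'init_calls' in data, b'__init__' in data)

# Unpickling restores the instance attributes without calling __init__ again.
foo2 = pickle.loads(data)
print(foo2.value, Foo.init_calls)
```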
  
Here is an example of the error Python reports when we rename the Foo class and then try to load a previously pickled Foo instance:
  
Listing 13. Trying to load a pickled instance of the renamed Foo class
>>> import cPickle as pickle
>>> f = file('temp.pkl', 'r')
>>> foo = pickle.load(f)
Traceback (most recent call last):
  File "", line 1, in ?
AttributeError: 'module' object has no attribute 'Foo'
  
A similar error occurs after renaming the persist.py module:
  
Listing 14. Trying to load a pickled instance after renaming the persist.py module
>>> import cPickle as pickle
>>> f = file('temp.pkl', 'r')
>>> foo = pickle.load(f)
Traceback (most recent call last):
  File "", line 1, in ?
ImportError: No module named persist
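Both failures can be reproduced without touching any files by building a module in memory. The sketch below assumes Python 3; the module name `persist_demo` is hypothetical and merely stands in for persist.py, and the renames are simulated by editing the module object and `sys.modules`.

```python
import pickle
import sys
import types

# Build a throwaway module in memory; "persist_demo" is a hypothetical
# stand-in for persist.py.
module = types.ModuleType('persist_demo')
exec(
    'class Foo(object):\n'
    '    def __init__(self, value):\n'
    '        self.value = value\n',
    module.__dict__,
)
sys.modules['persist_demo'] = module

data = pickle.dumps(module.Foo('What is a Foo?'))

class_error = None
module_error = None

# Simulate renaming the class: the old name no longer exists in the module.
module.Bar = module.Foo
del module.Foo
try:
    pickle.loads(data)
except AttributeError as exc:
    class_error = exc

# Simulate renaming the module: the old name can no longer be imported.
del sys.modules['persist_demo']
try:
    pickle.loads(data)
except ImportError as exc:
    module_error = exc

print(type(class_error).__name__, type(module_error).__name__)
```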
  
Later in this article we present some techniques for managing such changes without breaking existing pickles.
  
Special state methods
As mentioned earlier, some object types, such as file objects, cannot be pickled. To handle instance attributes that hold such objects, you can use the special methods __getstate__() and __setstate__() to modify the state of the class instance. Here is an example of our Foo class, modified to handle a file object attribute:
  
Listing 15. Handling instance attributes that cannot be pickled
class Foo(object):

    def __init__(self, value, filename):
        self.value = value
        self.logfile = file(filename, 'w')

    def __getstate__(self):
        """Return state values to be pickled."""
        f = self.logfile
        return (self.value, f.name, f.tell())

    def __setstate__(self, state):
        """Restore state from the unpickled state values."""
        self.value, name, position = state
        f = file(name, 'w')
        f.seek(position)
        self.logfile = f
  
When an instance of Foo is pickled, Python pickles only the values returned by the instance's __getstate__() method. Likewise, during unpickling, Python passes the unpickled values as an argument to the instance's __setstate__() method. Inside __setstate__() we can re-create the file object from the pickled name and position information and assign it to the instance's logfile attribute.
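A runnable variant of this idea is sketched below for Python 3. It departs from the listing in one labeled respect: the file is reopened in 'r+' mode rather than 'w', an assumption made here so that text already written to the log is kept. The log file name and the log entries are hypothetical.

```python
import os
import pickle
import tempfile


class Foo(object):

    def __init__(self, value, filename):
        self.value = value
        self.logfile = open(filename, 'w')

    def __getstate__(self):
        """Return state values to be pickled."""
        f = self.logfile
        return (self.value, f.name, f.tell())

    def __setstate__(self, state):
        """Restore state from the unpickled state values."""
        self.value, name, position = state
        # Assumption: 'r+' keeps the existing log contents; 'w' would erase them.
        f = open(name, 'r+')
        f.seek(position)
        self.logfile = f


with tempfile.TemporaryDirectory() as tmp:
    foo = Foo('What is a Foo?', os.path.join(tmp, 'foo.log'))
    foo.logfile.write('first entry\n')
    data = pickle.dumps(foo)  # stores the name and position, not the file
    foo.logfile.close()

    foo2 = pickle.loads(data)  # reopens the log at the saved position
    foo2.logfile.write('second entry\n')
    foo2.logfile.close()

    with open(foo2.logfile.name) as f:
        log_text = f.read()

print(foo2.value)
print(log_text)
```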
  
Schema evolution
Over time, you will find that you need to change your class definitions. If you have already pickled instances of a class and then want to change that class, you will probably want to retrieve and update those instances so that they keep working properly under the new class definition. We have already seen some of the errors that occur when a class or module is changed. Fortunately, the pickling and unpickling processes provide hooks that we can use to support this kind of schema evolution.

In this section we look at ways to anticipate common problems and how to work around them. Because the code for a class is not pickled with its instances, you can add, change, and remove methods without affecting existing pickled instances. For the same reason, you need not worry about class attributes. You must make sure that the code module containing the class definition is available in the unpickling environment. You must also plan for the changes that can cause unpickling problems: changing the class name, adding or removing instance attributes, and changing the name or location of the module that defines the class.
  
Class name change
To change a class name without breaking previously pickled instances, follow these steps. First, make sure that the original class definition remains unchanged so that it can be found when existing instances are unpickled. Rather than changing the original name, create a copy of the class definition in the same module as the original and give the copy the new class name. Then add the following method to the original class definition, replacing NewClassName with the actual name of the new class:
  
Listing 16. Changing a class name: method to add to the original class definition
def __setstate__(self, state):
    self.__dict__.update(state)
    self.__class__ = NewClassName
  
When existing instances are unpickled, Python finds the original class definition and calls the instance's __setstate__() method, which reassigns the instance's __class__ attribute to the new class definition. Once you are sure that all existing instances have been unpickled, updated, and re-pickled, you can remove the old class definition from the source code module.
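The steps above can be exercised end to end in a single script. This is a sketch assuming Python 3; Foo plays the role of the original class, and NewClassName is kept as the placeholder name from Listing 16 rather than a real class from the article.

```python
import pickle


class NewClassName(object):
    """Copy of the class under its new name (placeholder from Listing 16)."""

    def __init__(self, value):
        self.value = value


class Foo(object):
    """Original class, kept so that existing pickles can still find it."""

    def __init__(self, value):
        self.value = value

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.__class__ = NewClassName


# A pickle that still refers to the old class name.
old_pickle = pickle.dumps(Foo('What is a Foo?'))

# Unpickling finds Foo, then __setstate__ switches the instance's class.
migrated = pickle.loads(old_pickle)
print(type(migrated).__name__, migrated.value)

# Re-pickling records the new class name from now on.
repickled = pickle.dumps(migrated)
print(type(pickle.loads(repickled)).__name__)
```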
  
Adding and removing attributes
Once again, the special state methods __getstate__() and __setstate__() give us control over the state of each instance and a chance to handle changes in instance attributes. Let's look at a simple class definition to which we will add and remove attributes. Here is the initial definition:
  
Listing 17. Initial class definition
class Person(object):

    def __init__(self, firstname, lastname):
        self.firstname = firstname
        self.lastname = lastname
  
Suppose that instances of Person have been created and pickled, and we then decide that we really want to store a single name attribute rather than separate first and last names. Here is one way to change the class definition so that previously pickled instances migrate to the new definition:
  
Listing 18. New class definition
class Person(object):

    def __init__(self, fullname):
        self.fullname = fullname

    def __setstate__(self, state):
        if 'fullname' not in state:
            first = ''
            last = ''
            if 'firstname' in state:
                first = state['firstname']
                del state['firstname']
            if 'lastname' in state:
                last = state['lastname']
                del state['lastname']
            self.fullname = " ".join([first, last]).strip()
        self.__dict__.update(state)
  
In this example, we added a new attribute, fullname, and removed two existing attributes, firstname and lastname. When a previously pickled instance is unpickled, its previously pickled state is passed to __setstate__() as a dictionary that includes the values of the firstname and lastname attributes. We then combine those two values and assign the result to the new fullname attribute, deleting the old attributes from the state dictionary along the way. After all previously pickled instances have been updated and re-pickled, the __setstate__() method can be removed from the class definition.
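To try the migration without keeping an old pickle file around, an old-style state can be simulated. The sketch below assumes Python 3 and condenses the body of __setstate__() with dict.pop(); the old instance is faked by bypassing __init__(), and the sample name is purely illustrative.

```python
import pickle


class Person(object):

    def __init__(self, fullname):
        self.fullname = fullname

    def __setstate__(self, state):
        if 'fullname' not in state:
            # Old-style state: merge the two names and drop the old keys.
            first = state.pop('firstname', '')
            last = state.pop('lastname', '')
            self.fullname = ' '.join([first, last]).strip()
        self.__dict__.update(state)


# Stand-in for an instance pickled under the old definition: create it
# without calling __init__ and give it the old attributes.
old_style = Person.__new__(Person)
old_style.__dict__.update({'firstname': 'Ada', 'lastname': 'Lovelace'})
old_pickle = pickle.dumps(old_style)

person = pickle.loads(old_pickle)
print(person.fullname)
print(sorted(person.__dict__))
```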
  
Module modification
Conceptually, changing the name or location of a module is similar to changing a class name, but it must be handled quite differently. That is because the module information is stored in the pickle itself and is not an attribute that can be modified through the standard pickle interface. In fact, the only way to change the module information is to perform a search-and-replace operation on the pickle file itself. Exactly how you do that depends on your operating system and the tools available. Obviously, this is a situation in which you will want to back up your files first in case of mistakes. The change itself should be fairly straightforward, though, and it should work just as well with the binary pickle format as with the text pickle format.
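The search-and-replace advice reflects the Python 2.2 interface the article covers. On Python 3 there is an alternative that the article does not describe: subclassing pickle.Unpickler and overriding its find_class() method to redirect the module lookup while loading. The sketch below is offered under that assumption; the module names `persist_old` and `persist_new` and the class `RenamingUnpickler` are hypothetical, and the modules are built in memory only so the example is self-contained.

```python
import io
import pickle
import sys
import types

CLASS_SOURCE = (
    'class Foo(object):\n'
    '    def __init__(self, value):\n'
    '        self.value = value\n'
)


def make_module(name):
    """Create and register an in-memory module that defines Foo."""
    module = types.ModuleType(name)
    exec(CLASS_SOURCE, module.__dict__)
    sys.modules[name] = module
    return module


# Pickle an instance while the class lives in the old module.
old_module = make_module('persist_old')
data = pickle.dumps(old_module.Foo('What is a Foo?'))

# Simulate the move: the old module disappears and a new one replaces it.
del sys.modules['persist_old']
new_module = make_module('persist_new')


class RenamingUnpickler(pickle.Unpickler):
    """Unpickler that redirects lookups from the old module to the new one."""

    def find_class(self, module, name):
        if module == 'persist_old':
            module = 'persist_new'
        return super().find_class(module, name)


foo = RenamingUnpickler(io.BytesIO(data)).load()
print(type(foo).__module__, foo.value)
```

Unlike editing the pickle file, this leaves the stored data untouched, so re-pickling the loaded instances is still needed before the redirect can be retired.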
  
Conclusion
Object persistence depends on the object serialization capabilities of the underlying programming language. For Python objects, that means pickling. Python's pickles provide a robust and reliable foundation for effective persistence management of Python objects. In the References below you will find information about systems that build on Python's pickling capability.
