What is persistence?
Concept
The basic idea of persistence is simple. Suppose you have a Python program, perhaps one that manages your daily to-do list, and you want to save its application objects (the to-do items) between executions of the program. In other words, you want to store objects on disk and retrieve them later. That is persistence. There are several ways to achieve it, and each has advantages and disadvantages.
For example, you can store object data in a formatted text file, such as a CSV file. Or you can use a relational database, such as Gadfly, MySQL, PostgreSQL, or DB2. These file formats and databases are excellent, and Python has robust interfaces to all of these storage mechanisms.
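To make the text-file option concrete, here is a minimal sketch that writes a few to-do items as CSV and reads them back. It uses Python 3 syntax, an in-memory buffer stands in for a file on disk, and the field names "title" and "done" are assumptions made for illustration.

```python
import csv
import io

# Hypothetical to-do items; the field names are illustrative only.
todos = [
    {"title": "write report", "done": False},
    {"title": "buy milk", "done": True},
]

# Write the items as CSV text (an in-memory buffer stands in for a file).
buffer = io.StringIO(newline="")
writer = csv.DictWriter(buffer, fieldnames=["title", "done"])
writer.writeheader()
writer.writerows(todos)

# Read them back: every field comes back as a plain string.
buffer.seek(0)
restored = list(csv.DictReader(buffer))
print(restored[0]["done"])  # the string 'False', not the bool False
```

The round trip keeps the data but loses the types, which is one reason a format like this needs conversion code on both sides.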
Characteristics
These storage mechanisms have one thing in common: the stored data is independent of the objects and programs that manipulate it. The benefit is that the data can be used as a shared resource by other applications. The disadvantage is that other programs can then access an object's data directly, which violates the object-oriented principle of encapsulation: an object's data should be accessible only through the object's own public interface.
Also, for some applications the relational database approach may not be ideal. In particular, relational databases do not understand objects. Instead, they impose their own type system and their own relational data model: tables, each containing a set of tuples (rows), each of which contains a fixed number of statically typed fields (columns). If an application's object model cannot easily be converted to a relational model, it can be difficult to map objects to tuples and tuples back to objects. This difficulty is often referred to as the impedance mismatch problem.
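The mapping effort described above can be seen in a small sketch using the standard sqlite3 module (Python 3 syntax). The Todo class, the table layout, and the comma-joined tag column are all assumptions chosen for illustration, not a recommended schema.

```python
import sqlite3

class Todo(object):
    """Hypothetical application object used only for this illustration."""
    def __init__(self, title, tags):
        self.title = title
        self.tags = tags  # a list, which has no direct column equivalent

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE todo (title TEXT, tags TEXT)")

# Object -> tuple: the list must be flattened by hand to fit a column.
item = Todo("write report", ["work", "urgent"])
conn.execute("INSERT INTO todo VALUES (?, ?)",
             (item.title, ",".join(item.tags)))

# Tuple -> object: the mapping has to be reversed by hand as well.
title, tags = conn.execute("SELECT title, tags FROM todo").fetchone()
restored = Todo(title, tags.split(","))
conn.close()
```

Even with one class and one list attribute, both directions of the conversion have to be written and kept in sync manually.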
Object Persistence
If you want to store Python objects transparently, without losing information such as their identity and type, you need some form of object serialization: the process of turning arbitrarily complex objects into a text or binary representation. Likewise, you must be able to restore the original object from its serialized form. In Python, this serialization process is called pickling. You can pickle objects into strings, files on disk, or any file-like object, and you can unpickle those strings, files, or file-like objects back into the original objects. We will discuss pickle in detail later in this article.
Note:
Pickle
- British English: [ˈpɪk(ə)l]
- American English: [ˈpɪkl]
- n. pickles; brine; pickled food
- vt. to soak or marinate
Suppose you like keeping everything as objects and want to avoid the overhead of converting objects to some kind of non-object-based storage. Pickle files provide these benefits, but sometimes you need something more robust and scalable than a simple pickle file. For example, pickle alone does not solve the problem of naming and locating pickle files, nor does it support concurrent access to persistent objects. If you need these features, turn to a database such as ZODB (the Z Object Database for Python). ZODB is a robust, multiuser, object-oriented database system capable of storing and managing arbitrarily complex Python objects, with support for transactions and concurrency control. (See Resources to download ZODB.) Interestingly, even ZODB relies on Python's native serialization capabilities, so to use ZODB effectively you need a solid understanding of pickle.
Another interesting approach to the persistence problem is Prevayler, which was originally implemented in Java (see Resources for a developerWorks article on Prevayler). More recently, a group of Python programmers ported Prevayler to Python under the name PyPerSyst, hosted on SourceForge (see Resources for a link to the PyPerSyst project). The Prevayler/PyPerSyst concept also builds on the native serialization capabilities of the Java and Python languages. PyPerSyst keeps the entire object system in memory and provides disaster recovery by pickling a snapshot of the system to disk and maintaining a log of commands that can be reapplied to the latest snapshot. So, although applications that use PyPerSyst are limited by available memory, the advantage is that the native object system is fully loaded in memory, which makes it extremely fast and simpler to implement than a database such as ZODB. ZODB, in turn, allows for more objects than can be held in memory at one time.
Now that we've briefly discussed the various ways to store persistent objects, it's time to explore the pickle process in detail. While we are primarily interested in ways to save Python objects without having to convert them to some other format, a few areas still need attention: how to effectively pickle and unpickle simple and complex objects, including instances of custom classes; how to maintain object references, including circular and recursive references; and how to handle changes in class definitions so that previously pickled instances can still be used without problems. We will cover all of these issues in the discussion of Python's pickle capabilities that follows.
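The snapshot-plus-command-log idea can be sketched in a few lines (Python 3 syntax). This is not the PyPerSyst or Prevayler API: the TinyStore class, its method names, and the in-memory buffer standing in for a log file are all hypothetical, and a real system would also need to handle concurrency and disk failures.

```python
import io
import pickle

class TinyStore(object):
    """Hypothetical in-memory store; not the PyPerSyst API."""
    def __init__(self):
        self.state = {}          # the whole object system lives in memory
        self.log = io.BytesIO()  # stands in for a command log file on disk
        self.snapshot = pickle.dumps(self.state)

    def execute(self, key, value):
        # Log the command before applying it, so it can be replayed later.
        pickle.dump((key, value), self.log)
        self.state[key] = value

    def take_snapshot(self):
        # Pickle the whole state and start a fresh log.
        self.snapshot = pickle.dumps(self.state)
        self.log = io.BytesIO()

    def recover(self):
        # Rebuild the state: load the snapshot, then replay logged commands.
        state = pickle.loads(self.snapshot)
        replay = io.BytesIO(self.log.getvalue())
        while True:
            try:
                key, value = pickle.load(replay)
            except EOFError:
                break
            state[key] = value
        return state

store = TinyStore()
store.execute("a", 1)
store.take_snapshot()
store.execute("b", 2)       # logged after the snapshot
recovered = store.recover()
```

Recovery yields a state that includes both the snapshotted change and the one that was only logged, which is the property the approach relies on.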
Pickle
Purpose: Object serialization.
Python version: pickle in 1.4 and later, cPickle in 1.5 and later
The pickle module implements an algorithm for turning an arbitrary Python object into a series of bytes. This process is also called serializing the object. The byte stream representing the object can then be transmitted or stored, and later reconstructed to create a new object with the same characteristics.
The cPickle module implements the same algorithm in C instead of Python. It is several times faster than the Python implementation, so it is generally used in place of the pure-Python implementation.
The pickle documentation clearly states that it offers no security guarantees. In fact, unpickling data can execute arbitrary code. Be careful when using pickle for inter-process communication or data storage, and do not trust data that cannot be verified as secure. See the hmac module for an example of a secure way to verify the source of pickled data.
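One way to act on this warning is to sign the pickled bytes and check the signature before unpickling. The following is a minimal sketch in Python 3 syntax, not the example from the hmac section referred to above; SECRET_KEY, sign(), and safe_loads() are hypothetical names, and the sketch assumes both sides share the key over a trusted channel.

```python
import hashlib
import hmac
import pickle

SECRET_KEY = b"shared-secret"  # hypothetical key; keep real keys private

def sign(payload):
    """Return a hex digest that authenticates the pickled bytes."""
    return hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()

def safe_loads(payload, signature):
    """Unpickle only if the signature matches the payload."""
    if not hmac.compare_digest(sign(payload), signature):
        raise ValueError("signature mismatch; refusing to unpickle")
    return pickle.loads(payload)

payload = pickle.dumps({"job": 42})
signature = sign(payload)

restored = safe_loads(payload, signature)   # accepted

try:
    safe_loads(payload + b"tampered", signature)
    tamper_detected = False
except ValueError:
    tamper_detected = True
```

The signature only proves that the data came from a holder of the key; it does not make the pickle format itself safe for data from unknown sources.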
Import
Because cPickle is faster than pickle, it is common to first try to import cPickle under the alias "pickle", and to fall back on the native Python implementation in pickle if that import fails. The program then always uses the faster implementation when it is available and the portable implementation otherwise.
    try:
        import cPickle as pickle
    except ImportError:
        import pickle
The C and Python versions have the same API, and data can be exchanged between programs that use the C and Python libraries.
Encode and decode string data
The first example uses dumps() to encode a data structure as a string and then prints the string to the console. It uses a data structure made up entirely of built-in types. Instances of any class can also be pickled, as a later example shows.
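    try:
        import cPickle as pickle
    except ImportError:
        import pickle
    import pprint

    # A data structure made up entirely of built-in types
    data = [{'a': 'A', 'b': 2, 'c': 3.0}]
    print 'DATA:',
    pprint.pprint(data)

    # Serialize the structure to a string
    data_string = pickle.dumps(data)
    print 'PICKLE: %r' % data_string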
By default, the pickle contains only ASCII characters. A more efficient binary pickle format is also available, but all of the examples here use the ASCII output because it is easier to understand in print.
    DATA:[{'a': 'A', 'b': 2, 'c': 3.0}]
    PICKLE: "(lp1\n(dp2\nS'a'\nS'A'\nsS'c'\nF3\nsS'b'\nI2\nsa."
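The ASCII and binary formats mentioned above can be compared directly by passing the protocol argument to dumps(). This sketch uses Python 3 syntax and requests each protocol explicitly, so it does not depend on which protocol a given Python version uses by default.

```python
import pickle

data = [{'a': 'A', 'b': 2, 'c': 3.0}]

# Protocol 0 is the printable ASCII format described above.
text_form = pickle.dumps(data, protocol=0)

# HIGHEST_PROTOCOL selects the newest binary format this Python supports.
binary_form = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)

print(len(text_form), len(binary_form))

# Both forms load back to an equal object; the protocol is auto-detected.
assert pickle.loads(text_form) == pickle.loads(binary_form) == data
```

Note that loads() needs no protocol argument, so readers do not have to know which format the writer chose.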
After the data is serialized, it can be written to a file, socket, pipe, and so on. Later, you can read the file and unpickle the data to construct a new object with the same values.
    try:
        import cPickle as pickle
    except ImportError:
        import pickle
    import pprint

    data1 = [{'a': 'A', 'b': 2, 'c': 3.0}]
    print 'BEFORE:',
    pprint.pprint(data1)

    # Serialize, then rebuild a new object from the string
    data1_string = pickle.dumps(data1)

    data2 = pickle.loads(data1_string)
    print 'AFTER:',
    pprint.pprint(data2)

    print 'SAME?:', (data1 is data2)
    print 'EQUAL?:', (data1 == data2)
The newly constructed object is equal to the original object, but not the same object.
    BEFORE:[{'a': 'A', 'b': 2, 'c': 3.0}]
    AFTER:[{'a': 'A', 'b': 2, 'c': 3.0}]
    SAME?: False
    EQUAL?: True
Working with streams
In addition to dumps() and loads(), pickle provides convenience functions for working with file-like streams. It is possible to write multiple objects to a stream and then read them back without knowing in advance how many objects were written or how big they are.
    try:
        import cPickle as pickle
    except ImportError:
        import pickle
    import pprint
    from StringIO import StringIO

    class SimpleObject(object):

        def __init__(self, name):
            self.name = name
            self.name_backwards = name[::-1]
            return

    data = []
    data.append(SimpleObject('pickle'))
    data.append(SimpleObject('cPickle'))
    data.append(SimpleObject('last'))

    # Simulate a file with StringIO
    out_s = StringIO()

    # Write to the stream
    for o in data:
        print 'WRITING: %s (%s)' % (o.name, o.name_backwards)
        pickle.dump(o, out_s)
        out_s.flush()

    # Set up a read-able stream
    in_s = StringIO(out_s.getvalue())

    # Read the data
    while True:
        try:
            o = pickle.load(in_s)
        except EOFError:
            break
        else:
            print 'READ: %s (%s)' % (o.name, o.name_backwards)
This example simulates streams using two StringIO buffers. The first buffer receives the pickled objects, and its value is fed to a second buffer from which load() reads. A simple database format could use pickles to store objects, too (see shelve).
    WRITING: pickle (elkcip)
    WRITING: cPickle (elkciPc)
    WRITING: last (tsal)
    READ: pickle (elkcip)
    READ: cPickle (elkciPc)
    READ: last (tsal)
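As a brief illustration of the shelve module mentioned above, the sketch below stores picklable values under string keys and reads one back after reopening the shelf. It uses Python 3 syntax; the temporary directory and the file name "todo_shelf" are assumptions made so the example cleans up after itself.

```python
import os
import shelve
import shutil
import tempfile

# Keep the shelf in a temporary directory so nothing is left behind.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "todo_shelf")  # hypothetical file name

# Store picklable values under string keys.
db = shelve.open(path)
try:
    db["first"] = {"title": "write report", "done": False}
    db["second"] = ["any", "picklable", "value"]
finally:
    db.close()

# Reopen the shelf and read a value back by key.
db = shelve.open(path)
try:
    restored = db["first"]
finally:
    db.close()

shutil.rmtree(tmpdir)
```

The shelf behaves like a dictionary whose values are pickled behind the scenes, which addresses the naming and lookup that a bare stream of pickles leaves to the caller.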
In addition to storing data, pickles are handy for inter-process communication. For example, os.fork() and os.pipe() can be used to establish worker processes that read job instructions from one pipe and write the results to another pipe. The core code for managing the worker pool, sending jobs in, and receiving responses can be reused, because the job and response objects do not have to be of a particular class. When using pipes or sockets, do not forget to flush after dumping each object, to push the data through the connection to the other end. See the multiprocessing module for a reusable worker pool manager.
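The dump-then-flush pattern over a pipe can be sketched as follows (Python 3 syntax). To keep the example short and portable, both ends of the pipe live in one process instead of using os.fork(); the job tuples are hypothetical, and the data is small enough to fit in the pipe buffer.

```python
import os
import pickle

# One pipe carries pickled job objects from a writer to a reader. With
# os.fork(), the read end would belong to the worker process instead.
read_fd, write_fd = os.pipe()
jobs = [("add", 1, 2), ("add", 3, 4)]  # hypothetical job tuples

with os.fdopen(write_fd, "wb") as out_s:
    for job in jobs:
        pickle.dump(job, out_s)
        out_s.flush()  # push each object through the pipe right away

received = []
with os.fdopen(read_fd, "rb") as in_s:
    while True:
        try:
            received.append(pickle.load(in_s))
        except EOFError:  # writer closed its end; no more jobs
            break
```

Closing the write end is what lets the reader see EOFError and stop; a long-lived worker would instead block on load() until the next job arrives.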
Problems reconstructing objects
When working with custom classes, the class being pickled must appear in the namespace of the process that reads the pickle. Only the data for the instance is pickled, not the class definition. The class name is used to find the constructor that creates the new object during unpickling. The following example writes instances of a class to a file.
    try:
        import cPickle as pickle
    except ImportError:
        import pickle

    class SimpleObject(object):

        def __init__(self, name):
            self.name = name
            l = list(name)
            l.reverse()
            self.name_backwards = ''.join(l)
            return

    if __name__ == '__main__':
        data = []
        data.append(SimpleObject('pickle'))
        data.append(SimpleObject('cPickle'))
        data.append(SimpleObject('last'))

        filename = "test.dat"

        with open(filename, 'wb') as out_s:
            # Write to the stream
            for o in data:
                print 'WRITING: %s (%s)' % (o.name, o.name_backwards)
                pickle.dump(o, out_s)
Output:
    WRITING: pickle (elkcip)
    WRITING: cPickle (elkciPc)
    WRITING: last (tsal)
A simple attempt to load the resulting pickled objects fails.
    try:
        import cPickle as pickle
    except ImportError:
        import pickle
    import pprint
    from StringIO import StringIO
    import sys

    filename = "test.dat"

    with open(filename, 'rb') as in_s:
        # Read the data
        while True:
            try:
                o = pickle.load(in_s)
            except EOFError:
                break
            else:
                print 'READ: %s (%s)' % (o.name, o.name_backwards)
It fails because no SimpleObject class is available in the reading script.
    Traceback (most recent call last):
      ...
        o = pickle.load(in_s)
    AttributeError: 'module' object has no attribute 'SimpleObject'
A corrected version, which imports SimpleObject from the original script, succeeds. Adding the following import statement to the end of the import list allows the script to find the class and construct the objects.
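    from <module> import SimpleObject
(Here <module> is the file name of the script that defines SimpleObject, without the .py extension.)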
Running the modified script now produces the desired result.
    READ: pickle (elkcip)
    READ: cPickle (elkciPc)
    READ: last (tsal)
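The lookup-by-name behavior described in this section can also be observed inside a single script. The sketch below uses Python 3 syntax and assumes it is run as a top-level script; deleting the class from the namespace is only a stand-in for a reader that never imported it.

```python
import pickle

class SimpleObject(object):
    def __init__(self, name):
        self.name = name

dumped = pickle.dumps(SimpleObject('pickle'))

# The pickle records the class by name; the class body is not stored.
print(b'SimpleObject' in dumped)

# Remove the class from the namespace to mimic a reader that never defined it.
saved_class = SimpleObject
del SimpleObject
try:
    pickle.loads(dumped)
    load_failed = False
except AttributeError:
    load_failed = True

# Put the class back, as an import would, and loading works again.
SimpleObject = saved_class
restored = pickle.loads(dumped)
```

The same bytes fail or succeed depending only on whether the name can be resolved at load time, which is why the reading script needs the import.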
Unpicklable objects
Not all objects can be pickled. Sockets, file handles, database connections, and other objects with runtime state that depends on the operating system or another process may not be saved in a meaningful way. Objects that have unpicklable attributes can define __getstate__() and __setstate__() to return a subset of the instance's state to be pickled. New-style classes can also define __getnewargs__(), which should return the arguments to be passed to the class memory allocator (C.__new__()). The use of these features is described in more detail in the standard library documentation.
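A minimal sketch of the state hooks follows, in Python 3 syntax and assuming it runs as a top-level script. The Counter class is hypothetical; it holds a threading.Lock, which cannot be pickled, so the lock is left out of the saved state and recreated on load.

```python
import pickle
import threading

class Counter(object):
    """Hypothetical class that owns a lock, which cannot be pickled."""
    def __init__(self, count):
        self.count = count
        self.lock = threading.Lock()

    def __getstate__(self):
        # Copy the instance dictionary and drop the unpicklable attribute.
        state = self.__dict__.copy()
        del state['lock']
        return state

    def __setstate__(self, state):
        # Restore the saved attributes, then recreate the runtime-only one.
        self.__dict__.update(state)
        self.lock = threading.Lock()

original = Counter(3)
restored = pickle.loads(pickle.dumps(original))
```

Copying the dictionary before deleting from it matters: removing the key from self.__dict__ directly would strip the lock from the live object as well.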
Circular references
The pickle protocol automatically handles circular references between objects, so complex data structures do not need any special handling. Consider the directed graph in the figure. It includes several cycles, yet the correct structure can be pickled and then reloaded.
    import pickle

    class Node(object):
        """A simple digraph
        """
        def __init__(self, name):
            self.name = name
            self.connections = []

        def add_edge(self, node):
            "Create an edge between this node and the other."
            self.connections.append(node)

        def __iter__(self):
            return iter(self.connections)

    def preorder_traversal(root, seen=None, parent=None):
        """Generator function to yield the edges in a graph.
        """
        if seen is None:
            seen = set()
        yield (parent, root)
        if root in seen:
            return
        seen.add(root)
        for node in root:
            for parent, subnode in preorder_traversal(node, seen, root):
                yield (parent, subnode)

    def show_edges(root):
        "Print all the edges in the graph."
        for parent, child in preorder_traversal(root):
            if not parent:
                continue
            print '%5s -> %2s (%s)' % \
                (parent.name, child.name, id(child))

    # Set up the nodes.
    root = Node('root')
    a = Node('a')
    b = Node('b')
    c = Node('c')

    # Add edges between them.
    root.add_edge(a)
    root.add_edge(b)
    a.add_edge(b)
    b.add_edge(a)
    b.add_edge(c)
    a.add_edge(a)

    print 'ORIGINAL GRAPH:'
    show_edges(root)

    # Pickle and unpickle the graph to create
    # a new set of nodes.
    dumped = pickle.dumps(root)
    reloaded = pickle.loads(dumped)

    print '\nRELOADED GRAPH:'
    show_edges(reloaded)
The reloaded nodes are not the same objects, but the relationships between the nodes are maintained, and if an object has multiple references, only one copy of it is reloaded. Both points can be verified by examining the id() values of the nodes before and after they pass through pickle.
    ORIGINAL GRAPH:
     root ->  a (83389632)
        a ->  b (83714736)
        b ->  a (83389632)
        b ->  c (83714792)
        a ->  a (83389632)
     root ->  b (83714736)

    RELOADED GRAPH:
     root ->  a (83714904)
        a ->  b (83714960)
        b ->  a (83714904)
        b ->  c (83715296)
        a ->  a (83714904)
     root ->  b (83714960)
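The same two properties can be checked with the `is` operator instead of comparing id() values by eye. This sketch uses Python 3 syntax and plain lists and dictionaries in place of the Node class, purely to keep it short; the structure and names are illustrative.

```python
import pickle

shared = ['b']
graph = {'root': [shared, shared]}   # two references to one list
shared.append(shared)                # and a cycle: the list contains itself

reloaded = pickle.loads(pickle.dumps(graph))
first, second = reloaded['root']

print(first is shared)    # False: a new object was created
print(first is second)    # True: still a single shared copy
print(first[1] is first)  # True: the cycle survived the round trip
```

Identity is preserved only within a single dumps() call; objects pickled in separate calls are reloaded as separate copies.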