Serialization
From https://www.liaoxuefeng.com/
While a program is running, all of its variables live in memory. For example, define a dict:
d = dict(name='Bob', age=20, score=88)
A variable can be modified at any time, for example by changing name to 'Bill', but once the program ends, the memory used by its variables is fully reclaimed by the operating system. If the modified 'Bill' is not saved to disk, the next time the program runs the variable is initialized to 'Bob' again.
The process of converting a variable from its in-memory form into something that can be stored or transmitted is called serialization. In Python it is called pickling; other languages call it serialization, marshalling, flattening, and so on.
After serialization, the serialized content can be written to disk or transferred over the network to another machine.
Conversely, reading the variable contents back from the serialized form into memory is called deserialization, i.e. unpickling.
Python provides two modules for serialization: cPickle and pickle. The two modules do the same thing; the difference is that cPickle is written in C and is fast, while pickle is written in pure Python and is slow, the same relationship as between cStringIO and StringIO. When using them, try to import cPickle first and fall back to pickle if that fails:
try:
    import cPickle as pickle
except ImportError:
    import pickle
First, we try to serialize an object and write it to a file:
>>> d = dict(name='Bob', age=20, score=88)
>>> pickle.dumps(d)
"(dp0\nS'age'\np1\nI20\nsS'score'\np2\nI88\nsS'name'\np3\nS'Bob'\np4\ns."
The pickle.dumps() method serializes an arbitrary object into a str, which can then be written to a file. Alternatively, the pickle.dump() method serializes the object and writes it directly to a file-like object:
>>> f = open('dump.txt', 'wb')
>>> pickle.dump(d, f)
>>> f.close()
Look at the dump.txt file that was written: it is a jumble of unreadable content, which is the object's internal information as saved by Python.
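The same write can be sketched with a with block, which closes the file even if an error occurs. This is a minimal sketch assuming Python 3, where a plain import pickle already uses the C implementation when it is available; the scratch directory is a hypothetical location chosen for illustration.

```python
import os
import pickle
import tempfile

d = dict(name='Bob', age=20, score=88)

# Hypothetical location: a scratch directory instead of the working directory.
path = os.path.join(tempfile.mkdtemp(), 'dump.txt')

# 'wb' because pickle produces binary data; the with block closes the file.
with open(path, 'wb') as f:
    pickle.dump(d, f)

# Read the raw bytes back to confirm that something was written.
with open(path, 'rb') as f:
    raw = f.read()

print(len(raw) > 0)
```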
To read the object from disk back into memory, we can first read the content into a str and then deserialize it with the pickle.loads() method, or call the pickle.load() method to deserialize directly from a file-like object. We open another Python command line to deserialize the object we just saved:
>>> f = open('dump.txt', 'rb')
>>> d = pickle.load(f)
>>> f.close()
>>> d
{'age': 20, 'score': 88, 'name': 'Bob'}
The contents of the variable are back!
Of course, this variable and the original one are completely unrelated objects; they merely have the same content.
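This can be checked directly. A minimal sketch, assuming Python 3 and an in-memory round trip instead of a file; the name restored is illustrative:

```python
import pickle

d = dict(name='Bob', age=20, score=88)

# Serialize and immediately deserialize, without touching the disk.
restored = pickle.loads(pickle.dumps(d))

print(restored == d)   # True: the content is the same
print(restored is d)   # False: it is a separate object

# Changing the restored dict does not affect the original.
restored['name'] = 'Bill'
print(d['name'])       # Bob
```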
The problem with pickle is the same as with every other language-specific serialization mechanism: it can only be used with Python, and different versions of Python may even be incompatible with each other. It is therefore best to use pickle only for unimportant data, where a failed deserialization does not matter.
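One way to reduce the version problem, though not to remove it, is the protocol argument of pickle.dumps() and pickle.dump(). The sketch below assumes Python 3 and picks protocol 2, which was introduced in Python 2.3 and is the highest protocol that Python 2 understands; whether that matters depends on which interpreters need to read the data.

```python
import pickle

d = dict(name='Bob', age=20, score=88)

# Protocol 2 can be read by both Python 2 and Python 3.
portable = pickle.dumps(d, protocol=2)

# The highest protocol is the newest format, readable only by recent versions.
newest = pickle.dumps(d, protocol=pickle.HIGHEST_PROTOCOL)

# Both byte strings decode to the same content in this interpreter.
print(pickle.loads(portable) == pickle.loads(newest))
```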
JSON
If we need to pass objects between different programming languages, we have to serialize them into a standard format such as XML. A better choice is JSON, because JSON is represented as a string that can be read by all languages and is easy to store on disk or send over a network. JSON is not only a standard format but also faster than XML, and it can be read directly in a web page, which is very convenient.
The objects that JSON represents are standard JavaScript objects. JSON types correspond to Python's built-in data types as follows:
JSON type | Python type
{} | dict
[] | list
"string" | 'str' or u'unicode'
1234.56 | int or float
true/false | True/False
null | None
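The correspondence in the table can be observed by encoding one value of each type. A minimal sketch, assuming Python 3; the sample values are arbitrary:

```python
import json

samples = [
    {'name': 'Bob'},   # dict  -> {}
    [1, 2, 3],         # list  -> []
    'text',            # str   -> "string"
    1234.56,           # float -> number
    20,                # int   -> number
    True,              # True  -> true
    None,              # None  -> null
]

# Encode each Python value to its JSON text.
encoded = [json.dumps(value) for value in samples]
for value, text in zip(samples, encoded):
    print(repr(value), '->', text)

# Decoding each JSON text gives back the original Python value.
decoded = [json.loads(text) for text in encoded]
print(decoded == samples)
```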
Python's built-in json module provides a very complete conversion from Python objects to JSON. Let's look at how to turn a Python object into JSON:
>>> import json
>>> d = dict(name='Bob', age=20, score=88)
>>> json.dumps(d)
'{"age": 20, "score": 88, "name": "Bob"}'
The dumps() method returns a str whose content is standard JSON. Similarly, the dump() method writes the JSON directly to a file-like object.
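As an illustration of dump(), the sketch below writes to an in-memory io.StringIO buffer, which stands in for a real file. It assumes Python 3; the buffer is a choice made here for brevity, and an object returned by open() in text mode would be used the same way.

```python
import io
import json

d = dict(name='Bob', age=20, score=88)

# io.StringIO is a file-like object that keeps its content in memory.
buf = io.StringIO()
json.dump(d, buf)

# getvalue() returns everything written to the buffer so far.
text = buf.getvalue()
print(text)
```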
To deserialize JSON into a Python object, use loads() or the corresponding load() method. The former deserializes a JSON string; the latter reads the string from a file-like object and deserializes it:
>>> json_str = '{"age": 20, "score": 88, "name": "Bob"}'
>>> json.loads(json_str)
{u'age': 20, u'score': 88, u'name': u'Bob'}
Note that all string objects produced by deserialization are unicode by default, not str. Because the JSON standard specifies UTF-8 as the JSON encoding, we can always convert correctly between Python's str or unicode and JSON strings.
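The sketch below shows loads() and load() side by side, assuming Python 3. There the distinction above disappears, because str is already a Unicode type, so decoded strings print without the u prefix. The io.StringIO buffer again stands in for a file.

```python
import io
import json

json_str = '{"age": 20, "score": 88, "name": "Bob"}'

# loads() takes the JSON text itself; load() reads it from a file-like object.
from_string = json.loads(json_str)
from_file = json.load(io.StringIO(json_str))
print(from_string == from_file)            # True
print(type(from_string['name']).__name__)  # str

# Non-ASCII text survives the round trip; by default dumps() escapes it.
word = '\u4e2d\u6587'
escaped = json.dumps(word)
print(escaped)
print(json.loads(escaped) == word)         # True
```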
Advanced JSON
A Python dict can be serialized directly into a JSON {}. Often, however, we prefer to represent objects with a class, for example by defining a Student class and then serializing an instance of it:
import json

class Student(object):
    def __init__(self, name, age, score):
        self.name = name
        self.age = age
        self.score = score

s = Student('Bob', 20, 88)
print(json.dumps(s))
Running the code mercilessly produces a TypeError:
Traceback (most recent call last):
  ...
TypeError: <__main__.Student object at 0x10aabef50> is not JSON serializable
The reason for the error is that the Student object is not an object that can be serialized as JSON.
If even instances of a class cannot be serialized as JSON, that is surely unreasonable!
Don't worry. Take a closer look at the parameter list of the dumps() method: besides the required first parameter obj, dumps() also provides a whole range of optional parameters:
https://docs.python.org/2/library/json.html#json.dumps
These optional parameters let us customize JSON serialization. The previous code could not serialize the Student instance to JSON because, by default, the dumps() method does not know how to turn a Student instance into a JSON {} object.
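To give a feel for these optional parameters, the sketch below uses two of them, sort_keys and indent, on a plain dict. It assumes Python 3; both parameters are part of the documented dumps() signature, and the choice of these two is only illustrative.

```python
import json

d = dict(name='Bob', age=20, score=88)

# sort_keys orders the keys alphabetically, which makes the output reproducible.
compact = json.dumps(d, sort_keys=True)
print(compact)  # {"age": 20, "name": "Bob", "score": 88}

# indent pretty-prints the same data across several lines.
pretty = json.dumps(d, sort_keys=True, indent=2)
print(pretty)
```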
The optional parameter default exists to turn an arbitrary object into one that can be serialized as JSON. We only need to write a conversion function for Student and pass it in:
def student2dict(std):
    return {
        'name': std.name,
        'age': std.age,
        'score': std.score
    }

print(json.dumps(s, default=student2dict))
This way, the Student instance is first converted into a dict by the student2dict() function and then serialized into JSON.
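The default function is called for every object that dumps() cannot serialize on its own, including objects nested inside a list. A minimal, self-contained sketch assuming Python 3; the second student is invented for the example:

```python
import json

class Student(object):
    def __init__(self, name, age, score):
        self.name = name
        self.age = age
        self.score = score

def student2dict(std):
    # Called by dumps() once for each Student it encounters.
    return {'name': std.name, 'age': std.age, 'score': std.score}

# The list itself is serializable; only its elements need the default function.
students = [Student('Bob', 20, 88), Student('Bill', 21, 95)]
text = json.dumps(students, default=student2dict)
print(text)
```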
However, the next time we encounter an instance of a Teacher class, it still cannot be serialized to JSON. We can take a shortcut and turn an instance of any class into a dict:
print(json.dumps(s, default=lambda obj: obj.__dict__))
This works because an instance of a class usually has a __dict__ attribute, which is a dict that stores the instance variables. There are a few exceptions, such as classes that define __slots__.
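The __slots__ exception can be handled with a different conversion function. The sketch below assumes Python 3; the Point class and the slots2dict helper are hypothetical names introduced only for this example.

```python
import json

class Point(object):
    # Declaring __slots__ means instances get no per-instance __dict__.
    __slots__ = ('x', 'y')

    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)
print(hasattr(p, '__dict__'))  # False, so obj.__dict__ would fail here

def slots2dict(obj):
    # Build the dict from the names listed in __slots__ instead.
    return {name: getattr(obj, name) for name in obj.__slots__}

text = json.dumps(p, default=slots2dict)
print(text)  # {"x": 1, "y": 2}
```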
Similarly, to deserialize JSON into a Student instance, the loads() method first converts the JSON into a dict, and then the object_hook function we pass in converts that dict into a Student instance:
def dict2student(d):
    return Student(d['name'], d['age'], d['score'])

json_str = '{"age": 20, "score": 88, "name": "Bob"}'
print(json.loads(json_str, object_hook=dict2student))
The output is as follows:
<__main__.Student object at 0x10cd3c190>
What is printed is the deserialized Student instance.
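Putting default and object_hook together gives a full round trip. A minimal, self-contained sketch assuming Python 3. Note that object_hook is called for every JSON object in the input, so this dict2student is only suitable when the JSON contains nothing but student objects.

```python
import json

class Student(object):
    def __init__(self, name, age, score):
        self.name = name
        self.age = age
        self.score = score

def student2dict(std):
    return {'name': std.name, 'age': std.age, 'score': std.score}

def dict2student(d):
    return Student(d['name'], d['age'], d['score'])

s = Student('Bob', 20, 88)

# Student -> JSON text -> new Student with the same field values.
text = json.dumps(s, default=student2dict)
s2 = json.loads(text, object_hook=dict2student)

print(s2.name, s2.age, s2.score)  # Bob 20 88
print(s2 is s)                    # False: a new instance
```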
Summary
Python's language-specific serialization module is pickle, but to make serialization more general and more web-friendly you can use the json module.
The dumps() and loads() functions of the json module are examples of very well-designed interfaces. In ordinary use we only need to pass in one required parameter. When the default serialization or deserialization behavior does not meet our needs, we can pass in more parameters to customize the rules. The interface is therefore simple to use while remaining fully extensible and flexible.