Many tutorials describe serialization like this:
Object 1 --serialization--> byte string --deserialization--> Object 2
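A minimal sketch of that round trip with the standard pickle module (the dictionary contents are made up for illustration). It shows that Object 2 is equal to Object 1 but is a separate object in memory:

```python
import pickle

# Object 1: an arbitrary Python value (illustrative data)
obj1 = {"name": "task-1", "args": [1, 2, 3], "done": False}

# serialization: object -> byte string
payload = pickle.dumps(obj1)
print(type(payload))  # <class 'bytes'>

# deserialization: byte string -> Object 2
obj2 = pickle.loads(payload)

# Object 2 has the same value as Object 1 but is a new object
print(obj2 == obj1)  # True
print(obj2 is obj1)  # False
```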
As a result, many people never learn why serialization is needed in the first place.
You have probably heard that Python performs poorly on compute-intensive tasks: in CPython, the global interpreter lock lets only one thread run Python bytecode at a time, so a single process generally cannot take full advantage of a multi-core CPU. The usual optimization in that case is to use multiple processes.
One common multi-process design splits the processes into a master and workers: the master schedules tasks, and the workers do nothing but compute. The Celery library (a distributed task queue) works this way.
This raises a problem. A task created in the master has to be handed to a worker for computation, but processes do not share memory, so the worker cannot access the master's task object directly.
So the master has to describe the object in a form it can send to the worker, and the worker then builds its own copy of the object from that description. That is exactly what serialization and deserialization are.
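The hand-off can be sketched in a single process by pretending that only bytes travel between the two functions. The names `master_make_task`, `worker_run` and the `"sum_of_squares"` task format are hypothetical; in a real setup the bytes would cross a pipe or a message broker, a step that libraries such as multiprocessing, or Celery configured with the pickle serializer, perform for you:

```python
import pickle

def master_make_task():
    # The master builds a task object in its own memory (hypothetical format)...
    task = {"func": "sum_of_squares", "numbers": list(range(10))}
    # ...but can only hand over bytes, not the object itself.
    return pickle.dumps(task)

def worker_run(payload: bytes):
    # The worker rebuilds its own copy of the task from the bytes.
    task = pickle.loads(payload)
    if task["func"] == "sum_of_squares":
        return sum(n * n for n in task["numbers"])
    raise ValueError(f"unknown task: {task['func']}")

result = worker_run(master_make_task())
print(result)  # 285
```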
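pickle is Python's built-in serialization format, and it supports Python objects very well. That is also why Celery used pickle as its default serializer in older versions. Does Celery depend on pickle? Not strictly: the serializer is configurable, and since Celery 4.0 the default is JSON.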
From a serialization point of view, pickle is not fundamentally different from JSON, YAML, XML and so on.
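The practical difference is which types each format can carry. A small sketch using only the standard json and pickle modules (the sample data is illustrative): JSON rejects a set and a bytes value, while pickle restores both with their exact types:

```python
import json
import pickle

data = {"tags": {"a", "b"}, "raw": b"\x00\x01"}

# JSON only knows objects, arrays, strings, numbers, booleans and null,
# so a set (or bytes) value raises TypeError.
json_error = None
try:
    json.dumps(data)
except TypeError as exc:
    json_error = exc
print("json failed:", json_error)

# pickle handles the same value and restores the exact Python types.
restored = pickle.loads(pickle.dumps(data))
print(restored == data)        # True
print(type(restored["tags"]))  # <class 'set'>
```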
However, pickle is not secure: never deserialize pickle byte strings from untrusted sources. That makes pickle unsuitable for exchanging data over the network with parties you do not trust. pickle can turn almost any Python value (all built-in types plus instances of picklable classes) into a byte string, and it is typically used to store intermediate results.
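Why is loading untrusted pickles dangerous? A pickle can tell the loader to call an arbitrary callable with arbitrary arguments, through the object's `__reduce__` method. The sketch below uses a harmless `eval("6 * 7")` as a stand-in (the class name `Payload` is made up); a real attacker could just as well call something like `os.system`:

```python
import pickle

class Payload:
    # __reduce__ tells pickle how to rebuild the object:
    # "call this callable with these arguments". The author of the
    # bytes controls both.
    def __reduce__(self):
        # Harmless stand-in for a malicious call.
        return (eval, ("6 * 7",))

malicious = pickle.dumps(Payload())

# Merely loading the bytes runs eval("6 * 7"); no Payload is created.
result = pickle.loads(malicious)
print(result)  # 42
```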
What does this mean? Suppose you have written a program that takes a long time to run, so you add a "save current progress to a file" feature: if the program cannot finish today, tomorrow it can read the save file and continue where it left off.
The problem is that the "current progress" is not necessarily a string. It may be a list, a dictionary, a set, or even an instance of a class. How do you write something like that to a file?
This is where pickle comes in.
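A minimal sketch of such a save/continue feature, assuming the progress fits in one picklable object. The helper names `save_progress`/`load_progress` and the file location are hypothetical:

```python
import os
import pickle
import tempfile

# Hypothetical archive location, for illustration only.
SAVE_PATH = os.path.join(tempfile.gettempdir(), "progress.pkl")

def save_progress(state, path=SAVE_PATH):
    # pickle writes bytes, so the file is opened in binary mode ("wb").
    with open(path, "wb") as f:
        pickle.dump(state, f)

def load_progress(path=SAVE_PATH):
    # Return None when there is no archive yet (first run).
    if not os.path.exists(path):
        return None
    with open(path, "rb") as f:
        return pickle.load(f)

# Today: the "current progress" is a mix of containers, not a string.
progress = {"done_ids": {1, 2, 3}, "queue": [4, 5, 6], "step": 3}
save_progress(progress)

# Tomorrow: read the archive back and continue.
restored = load_progress()
print(restored == progress)  # True
```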
>>> import pickle
>>> data = {
...     '1': True,
...     23.45: str,
...     print: set(),
...     b'hello': [0,0,0],
... }
>>> pickle.dumps(data)
b'\x80\x03}q\x00(G@7s33333cbuiltins\nstr\nq\x01cbuiltins\nprint\nq\x02cbuiltins\nset\nq\x03]q\x04\x85q\x05Rq\x06X\x01\x00\x00\x001q\x07\x88C\x05helloq\x08]q\t(K\x00K\x00K\x00eu.'
>>> pickle.loads(_)
{23.45: <class 'str'>, <built-in function print>: set(), '1': True, b'hello': [0, 0, 0]}
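One detail in that session is worth checking: functions and classes such as `str` and `print` are pickled by reference (module plus name, which is why the string builtins appears in the bytes), so loading returns the very same objects, while plain data such as the set is rebuilt as a new, equal object. A small sketch:

```python
import pickle

data = {23.45: str, print: set()}
restored = pickle.loads(pickle.dumps(data))

# Classes and functions are stored as "module + name" and looked up
# again on load, so these are the identical objects, not copies.
print(restored[23.45] is str)  # True
print(print in restored)       # True

# Plain data is rebuilt as a new object with the same value.
print(restored[print] == set())  # True
```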
Have you ever played a game? Then you know how Save/Load works. Python's built-in file objects can only write and read strings (or raw bytes, in binary mode).
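To make that limitation concrete, here is a sketch with made-up save data and hypothetical file names in a temporary directory: writing a dict to a text-mode file raises `TypeError`, while `pickle.dump`/`pickle.load` on a binary-mode file saves and restores it:

```python
import os
import pickle
import tempfile

save_data = {"level": 3, "items": ["sword", "potion"]}  # hypothetical game state

with tempfile.TemporaryDirectory() as tmp:
    txt_path = os.path.join(tmp, "save.txt")
    pkl_path = os.path.join(tmp, "save.pkl")

    # A text-mode file only accepts str, so writing a dict fails.
    write_error = None
    with open(txt_path, "w") as f:
        try:
            f.write(save_data)
        except TypeError as exc:
            write_error = exc
    print("plain write failed:", write_error)

    # pickle turns the dict into bytes for a binary-mode ("wb") file...
    with open(pkl_path, "wb") as f:
        pickle.dump(save_data, f)

    # ...and rebuilds it, nested list included, when the save is loaded.
    with open(pkl_path, "rb") as f:
        loaded = pickle.load(f)

print(loaded == save_data)  # True
```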
pickle can store and read data in other forms too, such as lists and dicts. Converting your data structure into a byte string that can be saved to a file and quickly restored next time is what serialization and deserialization mean. The same byte string can also be sent over the network.
Crawlers are a good example. A crawler I wrote would occasionally crash partway through, and back then I was not using a database, so saving its state with pickle let it recover automatically after a restart.
Machine learning models are another. Take decision trees: you usually build the tree first, then prune it, then use it for predictions. The annoying part is that the test data always runs on the same tree, yet the tree is rebuilt on every run, and building it takes most of the time. With pickle, I save the whole tree during the first full run, and on later test runs I load it for prediction or pruning. That saves a lot of time.
Sessions are another common use of serialization.
In some other high-level languages, serializing objects is tedious: you have to encode the object and split it into strings before writing it to a file, and deserializing (reading it back) means decoding, slicing, and reassembling the object by hand. With pickle, dump and load do this for you.
To put it bluntly, pickle is a tool for storing and retrieving data: it writes a Python data structure to a file in a simple form that is easy to transfer and share, and then restores it the same way.
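The "build the tree once, load it afterwards" idea can be sketched as a load-or-build cache. Everything here is hypothetical and for illustration only: `build_tree` stands in for the expensive training step, and the tree is a nested dict (leaves are class labels) rather than a real model object:

```python
import os
import pickle
import tempfile

def build_tree():
    # Stand-in for an expensive decision-tree training step (hypothetical).
    # Inner nodes split on one feature; leaves are plain string labels.
    return {
        "feature": 0, "threshold": 2.5,
        "left": "A",
        "right": {"feature": 1, "threshold": 1.0, "left": "B", "right": "C"},
    }

def predict(tree, x):
    # Walk down the tree until a leaf (a string label) is reached.
    while isinstance(tree, dict):
        branch = "left" if x[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree

def load_or_build(path):
    # Later runs: load the saved tree instead of rebuilding it.
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    # First run: build the tree and save it for next time.
    tree = build_tree()
    with open(path, "wb") as f:
        pickle.dump(tree, f)
    return tree

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "tree.pkl")
    first = load_or_build(path)   # builds and saves
    second = load_or_build(path)  # loads from disk

print(second == first)              # True
print(predict(second, [3.0, 0.5]))  # B
```

With a real model, `build_tree` would be replaced by the actual training code; the caching pattern stays the same as long as the model object is picklable.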