Python multiprocessing Usage Notes [3]: About Queue


Original article: http://blog.ftofficer.com/2009/12/python-multiprocessing-3-about-queue/


Continuing the discussion of Python multiprocessing, this installment focuses on Queue, one of the core components of the MP library.

 

Queue is the mechanism the MP library provides for exchanging objects between processes. Object exchange differs from the object sharing discussed in the previous part: object sharing lets multiple processes access the same object, while object exchange transfers an object from one process to another.

 

The Queue in multiprocessing is used much like Python's built-in threading.Queue object. It supports a put operation to place an object on the queue and a get operation to read an object from it. Unlike threading.Queue, mp.Queue does not support join() and task_done() by default; for those you need the mp.JoinableQueue object.
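As a rough sketch (the worker layout and item counts here are illustrative, not taken from the original article), a JoinableQueue is used like this:

import multiprocessing

def worker(q):
    while True:
        item = q.get()
        # ... process the item ...
        q.task_done()      # mark this item as fully handled

q = multiprocessing.JoinableQueue()
w = multiprocessing.Process(target=worker, args=(q,), daemon=True)
w.start()

for i in range(10):
    q.put(i)
q.join()                   # blocks until task_done() has been called for every item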

 

Because the Queue object is responsible for transferring objects between processes, the first problem is how to share the Queue object itself between two processes. Of the three sharing methods described in the previous part, only inheritance works here, because Queue is implemented on top of the UNIX pipe object, and sharing a pipe requires inheritance. So in the typical application pattern, the parent process creates the Queue and then creates child processes that share it; parent and child then read and write it respectively. For example:

 

import multiprocessing

q = multiprocessing.Queue()

def reader_proc():
    print(q.get())

reader = multiprocessing.Process(target=reader_proc)
reader.start()

q.put(100)
reader.join()

 

Another pattern is for the parent process to create the Queue and then create multiple child processes, some of which read the queue and some of which write it. For example:

 

import multiprocessing

q = multiprocessing.Queue()

def writer_proc():
    q.put(100)

def reader_proc():
    print(q.get())

reader = multiprocessing.Process(target=reader_proc)
reader.start()
writer = multiprocessing.Process(target=writer_proc)
writer.start()

reader.join()
writer.join()

 

Because the Queue is shared by inheritance, the code never explicitly passes the Queue object itself; it looks as if simply swapping threading for multiprocessing would keep the program working. So, conversely, can an existing multi-threaded program work just by changing threading to multiprocessing? It can, but it is more likely that you will run into a number of problems.

 

The first problem is that mp.Queue has to transfer objects between processes, so any object put on the queue must be picklable; otherwise a PicklingError is raised.
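For instance (a hypothetical snippet, not from the original article), a lambda cannot be pickled:

import multiprocessing
import pickle

q = multiprocessing.Queue()

# A lambda cannot be pickled:
pickle.dumps(lambda x: x + 1)   # raises pickle.PicklingError
# Putting the same object on an mp.Queue fails for the same reason; the
# error surfaces from the queue's feeder thread, which pickles the object
# while moving it into the pipe.
q.put(lambda x: x + 1)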

 

Other differences show up in technical details that are not hidden behind any higher-level abstraction, and they can lead to subtle errors such as deadlocks. While summarizing these potential pitfalls, we will also take a brief look at how Queue is implemented in the MP library, which makes it easier to see why it behaves this way. The implementation details and problems discussed here apply only to Linux; the Windows implementation and its issues are not covered.

 

mp.Queue is built on top of a system pipe, but a process does not write objects directly into the pipe. Instead, a put first places the object in a local buffer, and a dedicated feeder thread then moves it into the pipe; the reading end reads objects directly from the pipe. This feeder thread is what allows the queue to offer timeout control on put. Because of the feeder thread, however, mp.Queue also provides a couple of extra functions to control it: close() stops the thread, and join_thread() joins it. close() is also responsible for flushing all objects remaining in the buffer into the pipe.
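A minimal sketch of how these two calls are typically used on the writing side (illustrative only):

import multiprocessing

q = multiprocessing.Queue()
q.put('some data')   # lands in the local buffer; the feeder thread moves it into the pipe
q.close()            # no more puts from this process; flush whatever is still buffered
q.join_thread()      # wait for the feeder thread to finish before continuing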

 

However, this feeder thread is also a troublemaker. To ensure that everything put on the queue reaches the other end, the MP library registers an atexit handler that automatically closes and joins the feeder thread when the process exits. This join brings problems of its own, including potential deadlocks. Consider the following situation: a parent process creates two child processes, one reading and one writing. When the processes need to stop, if the parent terminates the reading process while the writing process has already put so many objects on the queue that some are still waiting in its buffer, the writing process will never be able to terminate: its atexit handler waits to flush everything in the buffer into the pipe, but the pipe is full, and the process hangs in a deadlock.
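A sketch of how this can play out (do not expect this to finish; under the assumptions above, writer.join() blocks forever):

import multiprocessing

def writer_proc(q):
    # Put far more data than the pipe can hold; the excess sits in the
    # queue's local buffer waiting for the feeder thread.
    for i in range(10000):
        q.put('x' * 1000)
    # When this function returns, the process tries to exit. Its atexit
    # handler joins the feeder thread, which is blocked writing into a
    # full pipe that nobody reads, so the process never terminates.

if __name__ == '__main__':
    q = multiprocessing.Queue()
    writer = multiprocessing.Process(target=writer_proc, args=(q,))
    writer.start()
    writer.join()   # hangs: the deadlock described above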

 

One might object that you could simply always stop the processes in the order of the data flow. The problem is that in many complex systems the data flow may be circular, in which case some process will always end up in this situation no matter which order you choose.

 

Fortunately, the Queue object also provides a member function, cancel_join_thread(), which suppresses that join when the process stops and thereby avoids the deadlock. The cost is that any objects not yet flushed into the pipe are lost at that point. And since objects left in the pipe can be lost even when join_thread() is called, once you choose to use mp.Queue you should never assume that objects cannot be lost in transit.
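A sketch of the trade-off, using a hypothetical writer process (the cost is that most of the items below are simply dropped):

import multiprocessing

def writer_proc(q):
    # Allow this process to exit without waiting for the feeder thread;
    # whatever is still in the local buffer at exit is lost.
    q.cancel_join_thread()
    for i in range(100000):
        q.put(i)

if __name__ == '__main__':
    q = multiprocessing.Queue()
    writer = multiprocessing.Process(target=writer_proc, args=(q,))
    writer.start()
    writer.join()   # returns even though nobody ever read the queue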

 

Another possible workaround is the SimpleQueue object in the MP library. It is not mentioned in the documentation and is defined in the multiprocessing.queues module. SimpleQueue is essentially a Queue with the buffer and feeder thread removed, so it can avoid the problem described above. However, SimpleQueue provides no timeout handling for put and get; both operations simply block.
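A quick sketch of SimpleQueue in place of Queue (one assumption here: recent Python versions also expose it directly as multiprocessing.SimpleQueue(); in older versions you construct it from multiprocessing.queues):

import multiprocessing

# No feeder thread, no buffer, no timeouts: put() and get() just block.
q = multiprocessing.SimpleQueue()

def reader_proc():
    print(q.get())

reader = multiprocessing.Process(target=reader_proc)
reader.start()
q.put(100)
reader.join()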

 

Besides multiprocessing.Queue, you can also communicate through multiprocessing.Pipe. mp.Pipe is the structure underlying Queue, but without the feeder thread and without timeout control on put/get, which makes it quite similar to SimpleQueue. Note that Pipe takes a parameter, duplex: when it is True (the default), the pipe is implemented not with a system pipe but with a socketpair, i.e., a Unix domain socket, which performs slightly differently from a pipe.
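A short sketch of Pipe used the same way as the queues above (illustrative; with duplex=False the first end can only receive and the second can only send):

import multiprocessing

recv_conn, send_conn = multiprocessing.Pipe(duplex=False)

def reader_proc():
    print(recv_conn.recv())

reader = multiprocessing.Process(target=reader_proc)
reader.start()
send_conn.send(100)   # with duplex=True (the default) both ends could send and receive
reader.join()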

 

There is one more way to use a queue that is not built into the MP library: use the server process described in the previous article to share a Queue object. The actual Queue object lives in the server process, and every child process connects to the server process through a socket and operates on a proxy of that queue. At this point some readers will recall that the MP library has a built-in SyncManager object, obtainable through the multiprocessing.Manager function, whose Queue method returns a proxy for a queue. Unfortunately, that is not the correct way to get a shared queue, for the reason described in the previous article: SyncManager.Queue returns a proxy for a newly created object, not for a shared one. The correct way to use a queue in a server process is as follows:

 

Common part (used by both processes):

 

import multiprocessing.managers as mpm
import queue

class SharedQueueManager(mpm.BaseManager): pass

q = queue.Queue()
SharedQueueManager.register('Queue', lambda: q)

 

Server process:

 

mgr = SharedQueueManager(address=('', 12345))
server = mgr.get_server()
server.serve_forever()

 

Client process:

 

mgr = SharedQueueManager(address=('localhost', 12345))
mgr.connect()
q = mgr.Queue()   # q is a proxy for the shared queue object
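From there the client uses the proxy like an ordinary queue (illustrative):

q.put('hello')    # every operation is forwarded over the socket to the queue in the server process
print(q.get())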

 

Compared with the MP library's built-in Queue, this approach costs some performance, since every operation goes through network communication. The advantage is that none of the trouble caused by the feeder thread exists, and in theory no data is lost unless the server process crashes. But, as mentioned in the previous article, the server process itself is not very reliable, so the data is safe only "in theory".

 

Speaking of performance, here are two figures I posted earlier on Twitter (if the links are inaccessible, contact me):

 

For objects of about 512 bytes after pickling, queue operations through a proxy run at roughly 7000 operations per second on a single host, or 1100 per second across hosts; with multiprocessing.Queue, throughput can reach about 54000 operations per second.

 
