Python multiprocessing Usage Notes [2]: Cross-process object sharing


Original article: http://blog.ftofficer.com/2009/12/python-multiprocessing-2-object-sharing-across-process/

 

This continues my usage notes on Python's multiprocessing library. Following the previous note on the process model, this one discusses cross-process object sharing in multiprocessing.

 

The MP library offers three ways to share objects across processes. The first is Shared Memory, which applies only to native machine types, that is, the types in ctypes; such objects are shared through shared memory. The second is Server Process: one server process maintains all the objects, the other processes connect to it, and they operate on the objects in the server process through proxy objects. The last one is not presented separately in the MP documentation, yet it is the most important sharing method in the MP library: Inheritance. An object created in the parent process is automatically inherited by every child process created afterwards through multiprocessing.Process, and for some objects, operations on them are reflected in the same underlying object.

 

The three sharing methods have their own characteristics. Here we will make some simple comparisons.

 

First, the object types that can be shared. See this table:

Sharing mode     Supported types
Shared Memory    Types in ctypes, provided through wrapper classes such as RawValue and RawArray.
Inheritance      System kernel objects and objects implemented on top of them, including Pipe, Queue, JoinableQueue, and the synchronization objects (Semaphore, Lock, RLock, Condition, Event, etc.).
Server Process   All objects (you may need to provide the proxy manually).

 

This table summarizes the types supported by three different sharing methods.

 

The simplest of the three is Shared Memory, which can share only the data types in ctypes. The MP library itself has no naming mechanism, so an object created in one process cannot be referenced by name from another process. This sharing method therefore depends on inheritance: the object must be created by the parent process and then referenced by the child process. For an example of this mechanism, see the test_sharedvalues function in the example "Synchronization types like locks, conditions and queues" in the Python documentation.
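The shared memory mode described above can be sketched roughly as follows. This is a minimal illustration, not the documentation's test_sharedvalues example: the names worker and run_demo are made up here, and it assumes a POSIX platform where the "fork" start method is available, so that the children inherit the objects created by the parent.

```python
import multiprocessing as mp

# Assumption: POSIX platform with the "fork" start method available.
ctx = mp.get_context("fork")


def worker(counter, samples):
    """Runs in a child process and mutates the shared ctypes objects in place."""
    with counter.get_lock():      # a synchronized Value carries its own lock
        counter.value += 1
    with samples.get_lock():      # hold the lock for the whole read-modify-write
        for i in range(len(samples)):
            samples[i] *= 2.0


def run_demo():
    # The parent creates the shared objects; the children only reference them.
    counter = ctx.Value("i", 0)                  # shared C int
    samples = ctx.Array("d", [1.0, 2.0, 3.0])    # shared C double array
    children = [ctx.Process(target=worker, args=(counter, samples))
                for _ in range(2)]
    for child in children:
        child.start()
    for child in children:
        child.join()
    return counter.value, list(samples)


if __name__ == "__main__":
    print(run_demo())
```

Both children see the same memory, so the parent observes their combined changes after joining them.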

 

Next, the inheritance method. First, a clarification: inheritance is not an object sharing mechanism by nature; object sharing is only a side effect. The objects a child process inherits from its parent are not necessarily shared. Inheritance essentially means that a child process created by the parent's fork automatically inherits the parent's memory state and descriptors. The child therefore holds a copy of each parent object, not the object itself. Only when an object wraps descriptors of system kernel objects does copying the object (and the descriptors it wraps) achieve sharing. That is why, in the table above, only system kernel objects and objects implemented on top of them can be shared through inheritance. On Linux, sharing objects through inheritance has no restrictions. Windows has no fork implementation, so additional restrictions apply there, and the inheritance method is almost unusable on Windows.
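The difference between a copied object and a truly shared one can be seen in a small sketch. The names child and run_demo are illustrative only, and the code assumes a POSIX platform with the "fork" start method, since that is the case the paragraph above describes.

```python
import multiprocessing as mp

# Assumption: POSIX platform with the "fork" start method available.
ctx = mp.get_context("fork")


def child(plain_list, queue):
    plain_list.append("from child")   # changes only the child's private copy
    queue.put("from child")           # goes through the inherited kernel pipe


def run_demo():
    plain_list = []        # ordinary Python object: copied by fork, not shared
    queue = ctx.Queue()    # built on kernel objects: shared through inheritance
    proc = ctx.Process(target=child, args=(plain_list, queue))
    proc.start()
    message = queue.get(timeout=10)   # read before join so the child can flush
    proc.join()
    return plain_list, message


if __name__ == "__main__":
    print(run_demo())
```

The parent's list stays empty while the message still arrives, which is the "side effect" kind of sharing described above.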

 

Finally, the server process method. It supports more types than the other two because of its model:

[Figure: Server Process Model]

 

In this model, a manager process manages the actual objects, and the real objects live in the memory space of the manager process. Every process that needs to access such an object first connects to the manager process and obtains a proxy object for it. Normally, the proxy object provides the public functions of the actual object. When one is called, the proxy pickles the function arguments and sends them over the connection to the manager process; the manager process unpickles them and forwards the call to the corresponding actual object. The manager process then pickles the return value (or exception) and passes it back to the client process over the connection, where the proxy object unpickles it and returns it to the caller or raises the exception.

 

Clearly, this is a typical RPC (Remote Procedure Call) model. Because every client process actually accesses the objects inside the manager process, objects can be shared this way.
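As a rough illustration of this RPC-style sharing, the sketch below uses the ready-made dict proxy of the standard manager. The names record and run_demo are made up for the example, and the "fork" start method (POSIX) is assumed.

```python
import multiprocessing as mp

# Assumption: POSIX platform with the "fork" start method available.
ctx = mp.get_context("fork")


def record(stats, name):
    # Each proxy call is a separate round trip to the manager process,
    # so get-then-set is not atomic; every worker uses its own key here.
    stats[name] = stats.get(name, 0) + 1


def run_demo():
    with ctx.Manager() as manager:     # starts the manager (server) process
        stats = manager.dict()         # a proxy; the real dict lives in the manager
        workers = [ctx.Process(target=record, args=(stats, "w%d" % i))
                   for i in range(3)]
        for proc in workers:
            proc.start()
        for proc in workers:
            proc.join()
        return dict(stats)             # copy the contents out before shutdown


if __name__ == "__main__":
    print(run_demo())
```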

 

The connection between manager and proxy can be a socket-based network connection or a Unix pipe. With a socket-based connection, you must call the connect function of the manager object to establish a connection to the remote manager process before using a proxy. Because the manager process opens a port to accept connections, authentication is required; otherwise anyone could connect to the manager and mess up your shared objects. The MP library does this authentication through the authkey.
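A socket-based connection with an authkey might look like the sketch below. StatsManager, get_stats, and the key b"secret" are hypothetical names chosen for illustration; for brevity the server and the client run from the same script, whereas in practice the client would be a different process or machine. The "fork" start method (POSIX) is assumed.

```python
import multiprocessing as mp
from multiprocessing.managers import BaseManager

# Assumption: POSIX platform with the "fork" start method available.
ctx = mp.get_context("fork")

shared_stats = {}   # after start(), the copy in the manager process is the real one


def get_stats():
    return shared_stats


class StatsManager(BaseManager):
    pass


StatsManager.register("get_stats", callable=get_stats)


def run_demo():
    # Port 0 lets the OS pick a free port; server.address holds the real one.
    server = StatsManager(address=("127.0.0.1", 0), authkey=b"secret", ctx=ctx)
    server.start()
    try:
        client = StatsManager(address=server.address, authkey=b"secret")
        client.connect()                  # must happen before any proxy is used
        stats = client.get_stats()        # proxy to the dict in the manager
        stats.update({"hits": 1})
        hits = stats.get("hits")

        intruder = StatsManager(address=server.address, authkey=b"wrong")
        try:
            intruder.connect()
            rejected = False
        except mp.AuthenticationError:    # a wrong key fails the handshake
            rejected = True
        return hits, rejected
    finally:
        server.shutdown()


if __name__ == "__main__":
    print(run_demo())
```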

 

In implementation, the manager process is created through multiprocessing.Manager or a subclass of BaseManager. BaseManager provides the register function, which registers a function used to obtain a proxy for the shared object. That function is called by the client process but executed in the manager process. It can return one shared object (the same object for every call) or create a new object on each call; the former lets multiple processes share one object. For details, see the example "Demonstration of how to create and use customized managers and proxies" in the Python documentation.

 

The typical code for exporting a shared object is:

 

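    import multiprocessing.managers

    # The single instance to share; ObjectType stands for your own class.
    object_ = ObjectType()

    class ObjectManager(multiprocessing.managers.BaseManager):
        pass

    # Every call to manager.object() returns a proxy to the same object_.
    ObjectManager.register("object", lambda: object_)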

 

Note the phrase "public functions" used above when introducing the proxy object. Each proxy object exports only the public functions of the actual object. This has two parts: one is "public", meaning all members whose names do not start with an underscore, and the other is "function", meaning all callable members. This imposes some restrictions: attributes cannot be exported, and some commonly used special functions, such as __get__ and __next__, cannot be exported either. The MP library has a mechanism to handle this, namely custom proxy objects. First, the register function of BaseManager accepts a proxy type (proxytype) as its third parameter, which specifies which members are exported. For detailed usage, see the first example in the documentation.
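A custom proxy that exports a special function could be sketched like this. Counter, CounterProxy, and CounterManager are hypothetical classes written for the illustration; the sketch relies on BaseProxy's _exposed_ attribute and _callmethod, and assumes the "fork" start method (POSIX).

```python
import multiprocessing as mp
from multiprocessing.managers import BaseManager, BaseProxy

# Assumption: POSIX platform with the "fork" start method available.
ctx = mp.get_context("fork")


class Counter:
    """Hypothetical shared type whose __len__ a default proxy would not export."""

    def __init__(self):
        self._count = 0

    def increment(self):
        self._count += 1

    def __len__(self):
        return self._count


class CounterProxy(BaseProxy):
    # Names listed here may be called remotely, underscore or not.
    _exposed_ = ("increment", "__len__")

    def increment(self):
        return self._callmethod("increment")

    def __len__(self):
        return self._callmethod("__len__")


class CounterManager(BaseManager):
    pass


# The third argument is the proxy type used for objects of this typeid.
CounterManager.register("Counter", Counter, CounterProxy)


def run_demo():
    manager = CounterManager(ctx=ctx)
    manager.start()
    try:
        counter = manager.Counter()   # the object is created in the manager process
        counter.increment()
        counter.increment()
        return len(counter)           # forwarded to Counter.__len__ remotely
    finally:
        manager.shutdown()


if __name__ == "__main__":
    print(run_demo())
```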

 

The manager also has some details to watch. Because a proxy object is not thread-safe, when proxies are used in a multi-threaded program the MP library creates a proxy object for each thread, each proxy object creates its own connection to the manager process, and the manager creates a separate thread to serve each connection. The problem is that if the client process has many threads, the number of file descriptors in the manager process can easily hit the ulimit. Even below the limit, the manager process ends up with so many threads that its performance suffers badly. One solution is an in-process cache: a single dedicated thread creates the proxy object and accesses the shared object, while all other threads access only the in-process cache.
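One possible shape for such an in-process cache is sketched below. CachedReader is a hypothetical helper, not part of the MP library; fetch stands for any callable that reads from the proxy (for example, lambda: dict(stats_proxy)). Error handling is left out, so a failing fetch would stop the refresh thread.

```python
import threading


class CachedReader:
    """One background thread calls fetch(); other threads read a local snapshot."""

    def __init__(self, fetch, interval=1.0):
        self._fetch = fetch                # the only code path that uses the proxy
        self._interval = interval
        self._lock = threading.Lock()
        self._snapshot = None
        self._ready = threading.Event()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._poll, daemon=True)

    def _poll(self):
        while not self._stop.is_set():
            value = self._fetch()          # runs only in this dedicated thread
            with self._lock:
                self._snapshot = value
            self._ready.set()
            self._stop.wait(self._interval)

    def start(self):
        self._thread.start()
        self._ready.wait()                 # block until the first snapshot exists

    def get(self):
        # Called from any thread; never touches the proxy or the manager.
        with self._lock:
            return self._snapshot

    def stop(self):
        self._stop.set()
        self._thread.join()
```

With this arrangement the manager sees a single connection from the client process, at the cost of readers seeing data that can be up to one interval old.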

 

Once the manager hits the ulimit or encounters some other exception, it exits directly. Unfortunately, the already-established proxies then try to reconnect to the manager, which no longer exists. As a result the client process hangs inside the proxy function call, and at that point there is no way out other than killing the process.

 

In addition, the way the proxy uses sockets is rather tricky, so it conflicts with the built-in socket library in several places, for example socket.setdefaulttimeout (Python issue 6056). After setdefaulttimeout is called, every socket created through the socket module in the process is set to non-blocking mode. The MP library does not know this and always assumes its sockets are in blocking mode, so once setdefaulttimeout has been called, every proxy function call raises OSError with error code 11. The error message, "Resource temporarily unavailable", is very misleading; it is actually EAGAIN. This error can be remedied with a patch I provided (the patch also contains some other fixes, so please review and adapt it yourself).
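The effect of setdefaulttimeout on newly created sockets can be observed directly at the operating system level. The sketch below is POSIX-only (it uses fcntl), the function name is made up, and it only shows the non-blocking flag; it does not reproduce the MP library failure itself.

```python
import fcntl
import os
import socket


def new_socket_is_nonblocking(default_timeout):
    """Create a socket under the given default timeout and inspect O_NONBLOCK."""
    previous = socket.getdefaulttimeout()
    socket.setdefaulttimeout(default_timeout)
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            flags = fcntl.fcntl(sock.fileno(), fcntl.F_GETFL)
            return bool(flags & os.O_NONBLOCK)
    finally:
        socket.setdefaulttimeout(previous)   # restore the process-wide setting


if __name__ == "__main__":
    print(new_socket_is_nonblocking(None))   # no default timeout
    print(new_socket_is_nonblocking(5.0))    # default timeout set
```

Code that reads such a descriptor directly, expecting it to block, gets EAGAIN instead, which matches the error described above.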

 

For the reasons above, the server process mode offers the most flexible way to share objects but also has the most problems, so weigh it for yourself when using it. In our current system, data reliability requirements are not high and losing data is acceptable; even so, we use this mode only to maintain statistics and do not dare to maintain anything more with it.

 

That is all for cross-process object sharing; more content to follow.
