Tutorial on implementing distributed processes in a Python program
This article explains how to implement distributed processes in a Python program, a technique that is very useful in multi-process programming. The sample code is based on Python 2.x.
Between Thread and Process, Process should be preferred because it is more stable and because processes can be distributed across multiple machines, whereas threads can at most be distributed across the CPUs of a single machine.
Python's multiprocessing module not only supports multiple processes; its managers submodule also supports distributing those processes across multiple machines. A service process can act as a scheduler, distributing tasks to several other processes and relying on network communication. Because the managers module is well encapsulated, you can easily write distributed multi-process programs without knowing the details of the network communication.
For example, suppose we already have a multi-process program that communicates through a Queue and runs on a single machine. Now, because the task-processing workload is heavy, we want to put the process that sends tasks and the processes that handle them on two separate machines. How can distributed processes do this?
The original Queue can continue to be used. By exposing the Queue over the network through the managers module, processes on other machines can access it.
Let's first look at the service process. It is responsible for starting the Queue, registering the Queue on the network, and then writing tasks into the Queue:
```python
# taskmanager.py

import random, time, Queue
from multiprocessing.managers import BaseManager

# Queue for sending tasks:
task_queue = Queue.Queue()
# Queue for receiving results:
result_queue = Queue.Queue()

# QueueManager inherits from BaseManager:
class QueueManager(BaseManager):
    pass

# Register both Queues on the network; the callable parameter
# associates each name with a Queue object:
QueueManager.register('get_task_queue', callable=lambda: task_queue)
QueueManager.register('get_result_queue', callable=lambda: result_queue)
# Bind port 5000 and set the authkey 'abc':
manager = QueueManager(address=('', 5000), authkey='abc')
# Start the manager:
manager.start()
# Obtain the Queue objects as accessed through the network:
task = manager.get_task_queue()
result = manager.get_result_queue()
# Put several tasks in:
for i in range(10):
    n = random.randint(0, 10000)
    print('put task %d...' % n)
    task.put(n)
# Read results from the result queue:
print('try get results...')
for i in range(10):
    r = result.get(timeout=10)
    print('result: %s' % r)
# Shut down:
manager.shutdown()
```
Note that when we write a multi-process program on a single machine, the Queue we create can be used directly. In a distributed multi-process environment, however, adding a task must not operate on the original task_queue directly, which would bypass the QueueManager encapsulation; it must go through the Queue interface obtained via manager.get_task_queue().
Then start the task process on another machine (or on the same machine):
```python
# taskworker.py

import time, sys, Queue
from multiprocessing.managers import BaseManager

# Create a similar QueueManager:
class QueueManager(BaseManager):
    pass

# Since this QueueManager only obtains Queues from the network,
# provide only the names when registering:
QueueManager.register('get_task_queue')
QueueManager.register('get_result_queue')

# Connect to the server, i.e. the machine running taskmanager.py:
server_addr = '127.0.0.1'
print('connect to server %s...' % server_addr)
# Make sure the port and authkey are consistent with taskmanager.py:
m = QueueManager(address=(server_addr, 5000), authkey='abc')
# Connect over the network:
m.connect()
# Get the Queue objects:
task = m.get_task_queue()
result = m.get_result_queue()
# Take tasks from the task queue and write results into the result queue:
for i in range(10):
    try:
        n = task.get(timeout=1)
        print('run task %d * %d...' % (n, n))
        r = '%d * %d = %d' % (n, n, n * n)
        time.sleep(1)
        result.put(r)
    except Queue.Empty:
        print('task queue is empty.')
# Processing is done:
print('worker exit.')
```
To connect a task process to a service process over the network, the task process must know the IP address of the service process.
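The sample session below passes the server address on the command line, while the listing above hardcodes 127.0.0.1. A small sketch of reading the address from sys.argv with a localhost fallback (the helper name and argument handling are illustrative assumptions, not part of the original listing):

```python
import sys

def parse_server_addr(argv):
    # Take the manager's address as the first command-line argument,
    # falling back to localhost:
    return argv[1] if len(argv) > 1 else '127.0.0.1'

if __name__ == '__main__':
    server_addr = parse_server_addr(sys.argv)
    print('connect to server %s...' % server_addr)
```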
Now you can try out the distributed processes. First, start the taskmanager.py service process:
```
$ python taskmanager.py
put task 3411...
put task 1605...
put task 1398...
put task 4729...
put task 5300...
put task 7471...
put task 68...
put task 4219...
put task 339...
put task 7866...
try get results...
```
After the taskmanager process has sent out the tasks, it starts waiting for results in the result queue. Now start taskworker.py:
```
$ python taskworker.py 127.0.0.1
connect to server 127.0.0.1...
run task 3411 * 3411...
run task 1605 * 1605...
run task 1398 * 1398...
run task 4729 * 4729...
run task 5300 * 5300...
run task 7471 * 7471...
run task 68 * 68...
run task 4219 * 4219...
run task 339 * 339...
run task 7866 * 7866...
worker exit.
```
When the taskworker process finishes, the results are printed in the taskmanager process:
```
result: 3411 * 3411 = 11634921
result: 1605 * 1605 = 2576025
result: 1398 * 1398 = 1954404
result: 4729 * 4729 = 22363441
result: 5300 * 5300 = 28090000
result: 7471 * 7471 = 55815841
result: 68 * 68 = 4624
result: 4219 * 4219 = 17799961
result: 339 * 339 = 114921
result: 7866 * 7866 = 61873956
```
What is the use of this simple Manager/Worker model? It is in fact a simple but genuine form of distributed computing. By slightly modifying the code and starting several workers, you can distribute tasks to several or even dozens of machines. For example, if you replace the code that computes n * n with code that sends e-mail, you have implemented asynchronous sending of a mail queue.
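One way to sketch that "slight modification": factor the per-task work into a handler function, so the same worker loop can compute squares today and, say, send mail tomorrow. The function name and format string here are illustrative, not from the original listings:

```python
def handle_task(n):
    # Today: compute a square, formatted like the article's worker does.
    # To repurpose the worker, replace only this body, e.g. with code
    # that sends the e-mail described by the task.
    return '%d * %d = %d' % (n, n, n * n)

print(handle_task(68))
```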
Where are the Queue objects stored? Note that taskworker.py creates no Queue at all, so the Queue objects are stored in the taskmanager.py process:
A Queue is made accessible over the network by the QueueManager. Since a QueueManager may manage more than one Queue, each Queue's network interface is given a name, such as get_task_queue.
What is authkey for? It ensures that the two machines communicate properly without malicious interference from other machines. If taskworker.py's authkey is inconsistent with taskmanager.py's authkey, the connection will fail.
Summary
Python's distributed process interface is simple and well encapsulated. It is suitable for distributing heavy tasks to multiple machines.
Note that the role of the Queue is to pass tasks and receive results; the data describing each task should be kept as small as possible. For example, when sending a task to process log files, do not send the log files themselves, which may be hundreds of megabytes; send the full paths where the log files are stored, and let the Worker process read the files from a shared disk.
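The advice above can be sketched as a small task descriptor; the field names and the path are hypothetical, chosen only to illustrate the shape of what goes into the Queue:

```python
# Send a small description of the work, not the data itself.
task = {
    'job_id': 42,                       # hypothetical task identifier
    'logfile': '/shared/logs/app.log',  # full path on a shared disk
}

# The worker opens task['logfile'] locally instead of receiving
# hundreds of megabytes of file content through the Queue.
print('task descriptor: %d fields' % len(task))
```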