Between threads and processes, processes should be preferred: a process is more stable, and processes can be distributed across multiple machines, whereas threads can at most be distributed across multiple CPUs on the same machine.
Python's multiprocessing module not only supports multiple processes; its managers sub-module also supports distributing processes across multiple machines. A service process can act as a scheduler, distributing tasks to other processes and relying on network communication. Because the managers module is well encapsulated, you can write distributed multi-process programs without understanding the details of network communication.
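Before walking through the full master/worker example below, here is a minimal, self-contained sketch of the idea: a BaseManager exposes an ordinary Queue over the network, and a second manager connects to it as a client. The names ServerManager/ClientManager and the use of port 0 (letting the OS pick a free port) are illustrative choices, not part of the original code; the sketch assumes a Unix-like fork start method — on Windows/macOS the registry must be picklable and the code needs an `if __name__ == '__main__'` guard.

```python
# Minimal sketch: expose a queue over the network with BaseManager.
# authkey must be bytes in Python 3; port 0 lets the OS pick a free port.
import queue
from multiprocessing.managers import BaseManager

task_queue = queue.Queue()

def get_task_queue():          # module-level function (picklable), not a lambda
    return task_queue

class ServerManager(BaseManager):
    pass

ServerManager.register('get_task_queue', callable=get_task_queue)

server = ServerManager(address=('127.0.0.1', 0), authkey=b'abc')
server.start()                 # runs the manager server in a child process

class ClientManager(BaseManager):
    pass

ClientManager.register('get_task_queue')   # client side registers the name only

client = ClientManager(address=server.address, authkey=b'abc')
client.connect()
q = client.get_task_queue()    # a proxy for the queue living in the server
q.put(42)
value = q.get()
server.shutdown()
```

All access to the queue goes through the manager's server process, which is exactly the mechanism the two scripts below rely on.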
For example, suppose we already have a multi-process program communicating through a Queue running on one machine. Now, because the task-processing workload is heavy, we want to split the process that sends tasks and the processes that handle tasks across two machines. How can this be done with a distributed process?
The existing Queue can continue to be used; by exposing it over the network through the managers module, processes on other machines can access it.
Let's look at the service process first. It is responsible for starting the Queue, registering it on the network, and then writing tasks into it:
# taskmanager.py

import random, time, queue
from multiprocessing.managers import BaseManager

# queue for sending tasks:
task_queue = queue.Queue()
# queue for receiving results:
result_queue = queue.Queue()

# QueueManager inherited from BaseManager:
class QueueManager(BaseManager):
    pass

# register both queues on the network; the callable parameter
# associates the queue object:
QueueManager.register('get_task_queue', callable=lambda: task_queue)
QueueManager.register('get_result_queue', callable=lambda: result_queue)
# bind port 5000, set authkey 'abc' (must be bytes in Python 3):
manager = QueueManager(address=('', 5000), authkey=b'abc')
# start the Queue server:
manager.start()
# obtain the queue objects accessed over the network:
task = manager.get_task_queue()
result = manager.get_result_queue()
# put a few tasks in:
for i in range(10):
    n = random.randint(0, 10000)
    print('Put task %d...' % n)
    task.put(n)
# read results from the result queue:
print('Try get results...')
for i in range(10):
    r = result.get(timeout=10)
    print('Result: %s' % r)
# close:
manager.shutdown()
Note that when we write multi-process programs on a single machine, the created Queue can be used directly. In a distributed multi-process environment, however, adding tasks to the Queue must not manipulate the original task_queue directly, which would bypass the QueueManager encapsulation; tasks must be added through the Queue interface obtained from manager.get_task_queue().
Then, start the task process on another machine (it can also be started on this machine):
# taskworker.py

import time, sys, queue
from multiprocessing.managers import BaseManager

# create a similar QueueManager:
class QueueManager(BaseManager):
    pass

# since this QueueManager only gets queues from the network,
# provide only the names when registering:
QueueManager.register('get_task_queue')
QueueManager.register('get_result_queue')

# connect to the server, i.e. the machine running taskmanager.py:
server_addr = '127.0.0.1'
print('Connect to server %s...' % server_addr)
# the port and authkey must match the taskmanager.py settings exactly:
m = QueueManager(address=(server_addr, 5000), authkey=b'abc')
# connect over the network:
m.connect()
# get the queue objects:
task = m.get_task_queue()
result = m.get_result_queue()
# fetch tasks from the task queue and write results to the result queue:
for i in range(10):
    try:
        n = task.get(timeout=1)
        print('run task %d * %d...' % (n, n))
        r = '%d * %d = %d' % (n, n, n*n)
        time.sleep(1)
        result.put(r)
    except queue.Empty:
        print('task queue is empty.')
# processing done:
print('worker exit.')
Since the task process connects to the service process over the network, the IP of the service process must be specified.
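The sample session below passes 127.0.0.1 on the command line, while the script above hard-codes it. A small, hypothetical variation (not part of the original code) reads the server address from sys.argv so the same worker can target different machines:

```python
# Sketch: let taskworker.py accept the server IP as an optional argument.
import sys

def parse_server_addr(argv, default='127.0.0.1'):
    # argv[1], if given, is the IP of the machine running taskmanager.py
    return argv[1] if len(argv) > 1 else default

server_addr = parse_server_addr(sys.argv)
```

The rest of taskworker.py stays unchanged; only the `server_addr = '127.0.0.1'` line is replaced.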
Now you can try out the effects of the distributed process. Start the taskmanager.py service process first:
$ python taskmanager.py
Put task 3411...
Put task 1605...
Put task 1398...
Put task 4729...
Put task 5300...
Put task 7471...
Put task 68...
Put task 4219...
Put task 339...
Put task 7866...
Try get results...
After the taskmanager process sends its tasks, it starts waiting for results on the result queue. Now start the taskworker.py process:
$ python taskworker.py 127.0.0.1
Connect to server 127.0.0.1...
run task 3411 * 3411...
run task 1605 * 1605...
run task 1398 * 1398...
run task 4729 * 4729...
run task 5300 * 5300...
run task 7471 * 7471...
run task 68 * 68...
run task 4219 * 4219...
run task 339 * 339...
run task 7866 * 7866...
worker exit.
The taskworker process ends, and in the taskmanager process the results continue to print:
Result: 3411 * 3411 = 11634921
Result: 1605 * 1605 = 2576025
Result: 1398 * 1398 = 1954404
Result: 4729 * 4729 = 22363441
Result: 5300 * 5300 = 28090000
Result: 7471 * 7471 = 55815841
Result: 68 * 68 = 4624
Result: 4219 * 4219 = 17799961
Result: 339 * 339 = 114921
Result: 7866 * 7866 = 61873956
What is the use of this simple Manager/worker model? In fact, it is a simple but genuine piece of distributed computing. By slightly modifying the code and starting several workers, you can distribute tasks to several or even dozens of machines — for example, replacing the code that computes n*n with code that sends e-mail, thereby implementing asynchronous delivery for a mail queue.
Where are the Queue objects stored? Note that taskworker.py contains no code that creates a Queue, so the Queue objects are stored in the taskmanager.py process:
The reason the Queue can be accessed over the network is QueueManager. Because QueueManager manages more than one Queue, each Queue's network invocation interface is given a name, such as get_task_queue.
What is authkey for? It ensures that the two machines communicate properly, without malicious interference from other machines. If taskworker.py's authkey does not match taskmanager.py's authkey, the connection is guaranteed to fail.
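This mismatch behavior can be demonstrated directly: connecting with a wrong authkey raises multiprocessing.AuthenticationError. The sketch below (manager names and port 0 are illustrative assumptions; it assumes a Unix-like fork start method) starts a server with authkey b'abc' and then attempts to connect with b'wrong':

```python
# Sketch: a mismatched authkey makes connect() fail with AuthenticationError.
import queue
from multiprocessing import AuthenticationError
from multiprocessing.managers import BaseManager

q = queue.Queue()

def get_q():                   # module-level so the registry stays picklable
    return q

class ServerMgr(BaseManager):
    pass

ServerMgr.register('get_q', callable=get_q)
server = ServerMgr(address=('127.0.0.1', 0), authkey=b'abc')
server.start()

class ClientMgr(BaseManager):
    pass

ClientMgr.register('get_q')

failed = False
try:
    bad = ClientMgr(address=server.address, authkey=b'wrong')
    bad.connect()              # challenge/response with the wrong key
except AuthenticationError:
    failed = True
server.shutdown()
```

Under the hood the manager uses an HMAC challenge-response handshake keyed on authkey, which is why it must be identical (and a bytes object) on both sides.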
Summary
Python's distributed process interface is simple and well encapsulated, suitable for environments where heavy tasks need to be distributed across multiple machines.
Note that the Queue is used to transfer tasks and receive results, so the amount of data describing each task should be as small as possible. For example, when sending a task to process log files, do not send the several-hundred-megabyte log file itself; send the full path where the log file is stored, and let the worker process read the file from a shared disk.
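The principle above can be sketched in a few lines: the task queue carries only paths, and the worker opens each file itself. The file names and line counts here are made up for illustration; in a real deployment the files would live on shared storage visible to the workers.

```python
# Sketch: put lightweight task descriptions (paths) on the queue,
# never the file contents themselves.
import os
import queue
import tempfile

task_queue = queue.Queue()

# Stand-in "log files" created locally for the example:
logdir = tempfile.mkdtemp()
for name in ('a.log', 'b.log'):
    path = os.path.join(logdir, name)
    with open(path, 'w') as f:
        f.write('line\n' * 3)
    task_queue.put(path)        # enqueue the path, not the megabytes of data

# Worker side: read each file from (shared) disk using the received path.
results = []
while not task_queue.empty():
    path = task_queue.get()
    with open(path) as f:
        results.append((os.path.basename(path), sum(1 for _ in f)))
```

Only a short string crosses the network per task, regardless of how large the underlying file is.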