This article looks at distributed processes in Python: what they are and what they can do in Python programs.
When choosing between threads and processes, processes should generally be preferred: they are more stable, and they can be distributed across multiple machines, whereas threads can at most be spread across multiple CPUs of the same machine.
Python's multiprocessing module supports multiple processes, and its managers submodule additionally supports distributing those processes across multiple machines. A service process can act as a dispatcher, handing tasks to other processes over the network. Because the managers module is well encapsulated, it is easy to write distributed multi-process programs without having to understand the details of network communication.
For example, suppose we already have a multi-process program that communicates through a Queue and runs on a single machine. Because the processes that handle tasks carry a heavy workload, we now want to put the process that sends tasks and the processes that handle tasks on two different machines. How can this be done with distributed processes?
The existing Queue can still be used: once it is exposed over the network through the managers module, processes on other machines can access it.
Let's look at the service process first. It is responsible for starting the queues, registering them on the network, and then writing tasks to the task queue:
# task_master.py

import random
import queue
from multiprocessing.managers import BaseManager

# Queue for sending tasks:
task_queue = queue.Queue()
# Queue for receiving results:
result_queue = queue.Queue()

# QueueManager inherits from BaseManager:
class QueueManager(BaseManager):
    pass

# Register both queues on the network; the callable parameter ties each name to a queue object.
# Note: where multiprocessing starts processes with "spawn" (e.g. Windows), lambdas cannot be
# pickled, so module-level functions have to be used as the callables instead.
QueueManager.register('get_task_queue', callable=lambda: task_queue)
QueueManager.register('get_result_queue', callable=lambda: result_queue)
# Bind port 5000 and set the authentication key to 'abc':
manager = QueueManager(address=('', 5000), authkey=b'abc')
# Start the manager:
manager.start()
# Get the queue objects that are accessed over the network:
task = manager.get_task_queue()
result = manager.get_result_queue()
# Put a few tasks into the task queue:
for i in range(10):
    n = random.randint(0, 10000)
    print('Put task %d...' % n)
    task.put(n)
# Read the results from the result queue:
print('Try get results...')
for i in range(10):
    r = result.get(timeout=10)
    print('Result: %s' % r)
# Shut down:
manager.shutdown()
Note that in a multi-process program on a single machine, the Queue we create can be used directly. In a distributed multi-process environment, however, tasks must not be added by operating on the original task_queue directly, because that bypasses the QueueManager encapsulation. Tasks must be added through the queue interface obtained from manager.get_task_queue().
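The difference between the original queue and the interface returned by get_task_queue() can be seen in a small sketch. To keep it self-contained, it serves the queue from a background thread of the same process through get_server() instead of calling start(), binds to port 0 so the operating system picks a free port, and uses a module-level function in place of the lambda. These choices, and the helper name get_task_queue, are assumptions made for the demonstration only.

```python
import queue
import threading
from multiprocessing.managers import BaseManager

task_queue = queue.Queue()  # the real queue, owned by the serving side


def get_task_queue():
    # Module-level callable (instead of a lambda) returning the real queue.
    return task_queue


class QueueManager(BaseManager):
    pass


QueueManager.register('get_task_queue', callable=get_task_queue)

# Demo-only server: port 0 lets the OS choose a free port, and the server
# runs in a daemon thread so the whole example fits in one process.
server = QueueManager(address=('127.0.0.1', 0), authkey=b'abc').get_server()
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side: connect over the network and ask for the queue by name.
client = QueueManager(address=server.address, authkey=b'abc')
client.connect()
task = client.get_task_queue()

# What comes back is a proxy object, not a queue.Queue instance.
is_plain_queue = isinstance(task, queue.Queue)

# Method calls on the proxy are forwarded to the real queue on the server.
task.put(42)
received = task.get(timeout=5)
print(is_plain_queue, received)
```

When the manager is launched with start(), as in task_master.py, the queue lives in a separate server process, so the task_queue variable in the master script is only a local copy and putting items into it would never reach a worker.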
Then, start the task process on another machine (it can also be started on this machine):
# task_worker.py
# (Reconstructed from the worker output shown below; variable names are illustrative.)

import time
import queue
from multiprocessing.managers import BaseManager

# Create a similar QueueManager:
class QueueManager(BaseManager):
    pass

# This QueueManager only fetches the queues from the network, so only the names are registered:
QueueManager.register('get_task_queue')
QueueManager.register('get_result_queue')

# Connect to the server, i.e. the machine running task_master.py:
server_addr = '127.0.0.1'
print('Connect to server %s...' % server_addr)
# The port and authentication key must match the settings in task_master.py exactly:
m = QueueManager(address=(server_addr, 5000), authkey=b'abc')
# Connect over the network:
m.connect()
# Get the queue objects:
task = m.get_task_queue()
result = m.get_result_queue()
# Take tasks from the task queue and write the results to the result queue:
for i in range(10):
    try:
        n = task.get(timeout=1)
        print('run task %d * %d...' % (n, n))
        r = '%d * %d = %d' % (n, n, n * n)
        time.sleep(1)
        result.put(r)
    except queue.Empty:
        print('task queue is empty.')
# Done:
print('worker exit.')
The task process connects to the service process over the network, so the IP address of the service process must be specified.
Now you can try out the effects of the distributed process. Start the task_master.py service process first:
$ python3 task_master.py
Put task 3411...
Put task 1605...
Put task 1398...
Put task 4729...
Put task 5300...
Put task 7471...
Put task 68...
Put task 4219...
Put task 339...
Put task 7866...
Try get results...
After the task_master.py process sends the task, it begins to wait for the result queue. Now start the task_worker.py process:
$ python3 task_worker.py
Connect to server 127.0.0.1...
run task 3411 * 3411...
run task 1605 * 1605...
run task 1398 * 1398...
run task 4729 * 4729...
run task 5300 * 5300...
run task 7471 * 7471...
run task 68 * 68...
run task 4219 * 4219...
run task 339 * 339...
run task 7866 * 7866...
worker exit.
As the task_worker.py process works through the tasks and exits, the results are printed in the task_master.py process:
Result: 3411 * 3411 = 11634921
Result: 1605 * 1605 = 2576025
Result: 1398 * 1398 = 1954404
Result: 4729 * 4729 = 22363441
Result: 5300 * 5300 = 28090000
Result: 7471 * 7471 = 55815841
Result: 68 * 68 = 4624
Result: 4219 * 4219 = 17799961
Result: 339 * 339 = 114921
Result: 7866 * 7866 = 61873956
What is this simple Master/Worker model good for? It is in fact a simple but genuine form of distributed computing. With slight changes to the code you can start several workers and distribute the tasks to a handful or even dozens of machines. For example, replace the code that computes n*n with code that sends email, and you have asynchronous delivery through a message queue.
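Starting several workers against one master could look roughly like the sketch below. It is only an illustration under a few assumptions: the workers run as threads in the same process as stand-ins for worker processes on other machines, the server is served from a background thread on an OS-chosen port, and the names ServerManager, ClientManager and run_worker are hypothetical.

```python
import queue
import threading
from multiprocessing.managers import BaseManager

task_queue = queue.Queue()
result_queue = queue.Queue()


def get_task_queue():
    return task_queue


def get_result_queue():
    return result_queue


class ServerManager(BaseManager):
    """Owns the queues and serves them over the network."""


class ClientManager(BaseManager):
    """Only knows the registered names; used by the master and the workers."""


ServerManager.register('get_task_queue', callable=get_task_queue)
ServerManager.register('get_result_queue', callable=get_result_queue)
ClientManager.register('get_task_queue')
ClientManager.register('get_result_queue')


def run_worker(address, authkey):
    """Connect to the server and square numbers until no task is left."""
    m = ClientManager(address=address, authkey=authkey)
    m.connect()
    task = m.get_task_queue()
    result = m.get_result_queue()
    while True:
        try:
            n = task.get(timeout=1)
        except queue.Empty:
            break  # nothing left to do, so this worker exits
        result.put((n, n * n))


# Demo-only server in a daemon thread; port 0 means "any free port".
server = ServerManager(address=('127.0.0.1', 0), authkey=b'abc').get_server()
threading.Thread(target=server.serve_forever, daemon=True).start()

# Master side: put the tasks in through the network interface.
master = ClientManager(address=server.address, authkey=b'abc')
master.connect()
task = master.get_task_queue()
result = master.get_result_queue()
numbers = [3, 5, 8, 13, 21, 34]
for n in numbers:
    task.put(n)

# Three workers share the same task queue; each task goes to exactly one of them.
workers = [threading.Thread(target=run_worker, args=(server.address, b'abc'))
           for _ in range(3)]
for w in workers:
    w.start()
for w in workers:
    w.join()

# Collect one result per task; the order depends on which worker finished first.
results = dict(result.get(timeout=5) for _ in numbers)
print(results)
```

Because every worker pulls from the same queue, adding machines needs no change on the master side: each extra worker simply takes a share of the pending tasks.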
The Queue can be accessed over the network because of QueueManager. Since QueueManager manages more than one Queue, each Queue's network interface is given a name, such as get_task_queue.
What is authkey for? It ensures that the two machines can communicate normally without malicious interference from other machines. If the authkey in task_worker.py does not match the authkey in task_master.py, the connection will fail.
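A mismatched authkey can be tried out with a small sketch like the following. It assumes a demo server running in a background thread on an OS-chosen port, and the helper can_connect is a hypothetical name. The point it illustrates is that connect() raises multiprocessing.AuthenticationError when the keys differ.

```python
import queue
import threading
from multiprocessing import AuthenticationError
from multiprocessing.managers import BaseManager

task_queue = queue.Queue()


def get_task_queue():
    return task_queue


class QueueManager(BaseManager):
    pass


QueueManager.register('get_task_queue', callable=get_task_queue)

# Demo-only server protected by the key b'abc'.
server = QueueManager(address=('127.0.0.1', 0), authkey=b'abc').get_server()
threading.Thread(target=server.serve_forever, daemon=True).start()


def can_connect(authkey):
    """Return True if a client using this authkey is accepted by the server."""
    client = QueueManager(address=server.address, authkey=authkey)
    try:
        client.connect()
    except AuthenticationError:
        return False
    return True


right_key_ok = can_connect(b'abc')    # same key as the server
wrong_key_ok = can_connect(b'wrong')  # different key, rejected during the handshake
print(right_key_ok, wrong_key_ok)
```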
Python's distributed process interface is simple and well encapsulated, which makes it suitable for environments where heavy tasks need to be spread across multiple machines.
Note that the Queue is used to pass tasks and receive results, so the data describing each task should be as small as possible. For example, when sending a task that processes a log file, do not send the log file itself, which may be hundreds of megabytes. Send the full path where the log file is stored, and let the worker process read the file from a shared disk.
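As a rough illustration of keeping task descriptions small, the sketch below puts only a file path into the task and lets the worker side open the file. The helpers make_task and process_task, the dictionary layout, and the temporary directory standing in for a shared disk are all assumptions made for the example.

```python
import os
import tempfile


def make_task(log_path):
    """Describe the work with a short path instead of the file contents."""
    return {'log_path': log_path}


def process_task(task):
    """Worker side: open the file from the shared disk and count ERROR lines."""
    with open(task['log_path'], encoding='utf-8') as f:
        return sum(1 for line in f if 'ERROR' in line)


# A temporary directory stands in for a disk that master and worker both see.
with tempfile.TemporaryDirectory() as shared_dir:
    log_path = os.path.join(shared_dir, 'app.log')
    with open(log_path, 'w', encoding='utf-8') as f:
        f.write('INFO start\nERROR disk full\nINFO retry\nERROR timeout\n')

    task = make_task(log_path)        # this small dict is what would go on the queue
    error_count = process_task(task)  # the worker reads the file by itself

print(error_count)
```

The task that travels through the queue stays a few dozen bytes no matter how large the log file grows.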