This article introduces how to implement distributed processes in Python, a technique that is very useful in practice. The sample code is based on Python 2.x; readers who need it can use it as a reference.
When choosing between threads and processes, prefer processes: they are more stable, and they can be distributed across multiple machines, whereas threads can at most be spread over multiple CPUs on the same machine.
Python's multiprocessing module not only supports multiple processes; its managers submodule also supports distributing those processes across multiple machines. A service process can act as a dispatcher, distributing tasks to other processes and communicating with them over the network. Because the managers module is well encapsulated, you can easily write a distributed multi-process program without understanding the details of the network communication.
For example, suppose we already have a multi-process program that communicates through a Queue and runs on a single machine. Because the process that handles tasks carries a heavy workload, we now want to put the process that sends tasks and the process that handles tasks on two separate machines. How can this be done with distributed processes?
The original Queue can still be used; the managers module exposes it over the network so that processes on other machines can access it.
Let's look at the service process first. It creates the Queue, registers it on the network, and then writes tasks into it:
[Listing: taskmanager.py (not preserved)]
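The listing for the service process is not available here, so the following is only a minimal sketch of the steps just described, written for Python 3 (the article's samples target Python 2.x, where the `queue` module is named `Queue`). The name `get_result_queue`, the loopback address, the `authkey` value and the helper `collect_results` are assumptions made for illustration. To stay runnable as a single file, the sketch serves the queues from a background thread through `get_server()`; a standalone service script would more typically call `manager.start()`, which runs the server in a child process.

```python
import queue
import random
import threading
from multiprocessing.managers import BaseManager

# Queue for outgoing tasks and queue for incoming results.
task_queue = queue.Queue()
result_queue = queue.Queue()


def get_task_queue():
    return task_queue


def get_result_queue():
    return result_queue


class QueueManager(BaseManager):
    """Manager subclass that exposes the queues over the network."""


# Register each queue under a network-visible name.
QueueManager.register('get_task_queue', callable=get_task_queue)
QueueManager.register('get_result_queue', callable=get_result_queue)

AUTHKEY = b'abc'  # assumed shared secret; must be bytes in Python 3

# Demo-only server: loopback address, port 0 lets the OS pick a free port.
server = QueueManager(address=('127.0.0.1', 0), authkey=AUTHKEY).get_server()
threading.Thread(target=server.serve_forever, daemon=True).start()

# Obtain the queues through the manager interface, not as raw objects.
manager = QueueManager(address=server.address, authkey=AUTHKEY)
manager.connect()
task = manager.get_task_queue()
result = manager.get_result_queue()

# Put a few random numbers on the task queue.
numbers = [random.randint(0, 10000) for _ in range(10)]
for n in numbers:
    print('Put task %d...' % n)
    task.put(n)


def collect_results(count, timeout=10):
    """Block until `count` results arrive; raises queue.Empty on timeout."""
    print('Try get results...')
    return [result.get(timeout=timeout) for _ in range(count)]
```

In a real deployment the address would be one that workers on other machines can reach, and the service would call `collect_results(10)` once the tasks have been queued.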
Note that when we write a multi-process program on a single machine, the Queue we create can be used directly. In a distributed multi-process environment, however, tasks must not be added by operating on the original task_queue directly, because that bypasses the QueueManager encapsulation; they must be added through the Queue interface obtained from manager.get_task_queue().
Then start the task process on another machine (it can also be started on the same machine):
[Listing: taskworker.py (not preserved)]
The task process connects to the service process over the network, so it must specify the IP address of the service process.
Now we can try out the distributed processes. First start the taskmanager.py service process:
    $ python taskmanager.py
    Put task 3411...
    Put task 1605...
    Put task 1398...
    Put task 4729...
    Put task 5300...
    Put task 7471...
    Put task 68...
    Put task 4219...
    Put task 339...
    Put task 7866...
    Try get results...
After the taskmanager process has sent all the tasks, it starts waiting on the result queue. Now start the taskworker.py process:
    $ python taskworker.py 127.0.0.1
    Connect to server 127.0.0.1...
    Run task 3411 * 3411...
    Run task 1605 * 1605...
    Run task 1398 * 1398...
    Run task 4729 * 4729...
    Run task 5300 * 5300...
    Run task 7471 * 7471...
    Run task 68 * 68...
    Run task 4219 * 4219...
    Run task 339 * 339...
    Run task 7866 * 7866...
    Worker exit.
When the taskworker process finishes, the taskmanager process goes on to print the results:
    Result: 3411 * 3411 = 11634921
    Result: 1605 * 1605 = 2576025
    Result: 1398 * 1398 = 1954404
    Result: 4729 * 4729 = 22363441
    Result: 5300 * 5300 = 28090000
    Result: 7471 * 7471 = 55815841
    Result: 68 * 68 = 4624
    Result: 4219 * 4219 = 17799961
    Result: 339 * 339 = 114921
    Result: 7866 * 7866 = 61873956
What is this simple manager/worker model good for? It is in fact simple but genuine distributed computing. With slight changes to the code you can start several workers and spread the tasks over several or even dozens of machines. For example, replace the code that computes n*n with code that sends email, and you have asynchronous mail sending driven by a task queue.
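As a rough illustration of starting several workers, the sketch below (Python 3) lets three workers drain one task queue. Threads stand in for worker processes on other machines purely so the example fits in one file; the worker names, the task values and the result tuple layout are assumptions.

```python
import queue
import threading
from multiprocessing.managers import BaseManager

tasks, results = queue.Queue(), queue.Queue()


def get_task_queue():
    return tasks


def get_result_queue():
    return results


class QueueManager(BaseManager):
    pass


QueueManager.register('get_task_queue', callable=get_task_queue)
QueueManager.register('get_result_queue', callable=get_result_queue)

# Demo-only server on loopback; port 0 lets the OS pick a free port.
server = QueueManager(address=('127.0.0.1', 0), authkey=b'abc').get_server()
threading.Thread(target=server.serve_forever, daemon=True).start()

# The dispatcher queues twelve tasks through the manager interface.
boss = QueueManager(address=server.address, authkey=b'abc')
boss.connect()
task_proxy = boss.get_task_queue()
result_proxy = boss.get_result_queue()
for n in range(1, 13):
    task_proxy.put(n)


def worker(name):
    """Each worker opens its own connection and takes tasks until none remain."""
    m = QueueManager(address=server.address, authkey=b'abc')
    m.connect()
    task, result = m.get_task_queue(), m.get_result_queue()
    while True:
        try:
            n = task.get(timeout=0.5)
        except queue.Empty:
            return
        result.put((name, n, n * n))


threads = [threading.Thread(target=worker, args=('worker-%d' % i,))
           for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every task was handled exactly once, whichever worker took it.
handled = [result_proxy.get(timeout=1) for _ in range(12)]
squares = sorted(item[2] for item in handled)
print(squares)
```

Swapping the `n * n` line for a call that sends an email would turn the same structure into the asynchronous mail queue mentioned above.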
Where is the Queue object stored? Note that taskworker.py contains no code that creates a Queue, so the Queue object is stored in the taskmanager.py process.
The Queue can be accessed over the network thanks to QueueManager. Because QueueManager manages more than one Queue, each Queue's network interface is given a name, such as get_task_queue.
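The following Python 3 sketch illustrates the point that the worker side holds only a named proxy while the real Queue stays with the service. The class name `WorkerManager` and the variable names are assumptions, and the server runs in a background thread only so that both sides can be inspected from one file.

```python
import queue
import threading
from multiprocessing.managers import BaseManager

real_queue = queue.Queue()  # the only actual Queue object in this demo


def get_task_queue():
    return real_queue


class QueueManager(BaseManager):
    pass


# Service side: the name 'get_task_queue' is bound to a callable.
QueueManager.register('get_task_queue', callable=get_task_queue)
server = QueueManager(address=('127.0.0.1', 0), authkey=b'abc').get_server()
threading.Thread(target=server.serve_forever, daemon=True).start()


class WorkerManager(BaseManager):
    pass


# Worker side: only the name is registered, with no callable and no Queue.
WorkerManager.register('get_task_queue')

worker = WorkerManager(address=server.address, authkey=b'abc')
worker.connect()
proxy = worker.get_task_queue()

# The proxy forwards put() over the network to the real Queue.
proxy.put('hello')
is_real_queue = isinstance(proxy, queue.Queue)
received = real_queue.get_nowait()
print(is_real_queue, received)
```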
What is the authkey for? It ensures that the two machines communicate normally without malicious interference from other machines. If the authkey in taskworker.py does not match the authkey in taskmanager.py, the connection will certainly fail.
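A small Python 3 sketch of the authkey check: a client with the matching key connects, while a client with a different key is rejected with `multiprocessing.AuthenticationError`. The key values and the helper `can_connect` are assumptions, and the server runs in a background thread only to keep the example in one file.

```python
import multiprocessing
import threading
from multiprocessing.managers import BaseManager


class QueueManager(BaseManager):
    pass


# Demo-only server on loopback, protected by the key b'abc'.
server = QueueManager(address=('127.0.0.1', 0), authkey=b'abc').get_server()
threading.Thread(target=server.serve_forever, daemon=True).start()


def can_connect(authkey):
    """Return True if a client using `authkey` is accepted by the server."""
    m = QueueManager(address=server.address, authkey=authkey)
    try:
        m.connect()
    except multiprocessing.AuthenticationError:
        return False
    return True


matching = can_connect(b'abc')
mismatched = can_connect(b'wrong')
print(matching, mismatched)
```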
Python's distributed process interface is simple and well encapsulated, which makes it suitable when heavy tasks need to be distributed across multiple machines.
Note that the Queue exists to pass tasks and receive results, so the amount of data describing each task should be kept as small as possible. For example, to send a task that processes a log file, do not send the log file itself, which may be hundreds of megabytes; send the full path where the log file is stored and let the worker process read the file from a shared disk.
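To make the size difference concrete, here is a hedged Python 3 sketch in which the task carries only a file path. A temporary file stands in for the log on a shared disk and a plain `queue.Queue` stands in for the manager's queue proxy; the task layout (`action`, `path`) is hypothetical.

```python
import os
import pickle
import queue
import tempfile

# Hypothetical log file; a temp file stands in for a file on a shared disk.
with tempfile.NamedTemporaryFile('w', suffix='.log', delete=False) as f:
    f.write('ERROR disk full\n' * 1000)
    log_path = f.name

task_queue = queue.Queue()  # stands in for the manager's queue proxy

# The task describes the work with a path, not with the file contents.
task = {'action': 'count_lines', 'path': log_path}
task_queue.put(task)

# Worker side: read the file from the (shared) disk using the path.
job = task_queue.get()
with open(job['path']) as f:
    line_count = sum(1 for _ in f)

# Compare what would travel through the queue with the size of the file.
payload_size = len(pickle.dumps(task))
file_size = os.path.getsize(log_path)
print('task payload: %d bytes, log file: %d bytes' % (payload_size, file_size))

os.remove(log_path)
```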