# This is a learning note for Liao Xuefeng's Python tutorial
1. Overview
Between threads and processes, processes should be preferred: they are more stable, and they can be distributed across multiple machines, whereas threads can at most be spread over multiple CPUs of the same machine.
Python's multiprocessing module not only supports multiple processes; its managers submodule also supports distributing those processes across multiple machines. A service process can act as a dispatcher that hands tasks out to other processes over the network. Because the managers module is well encapsulated, it is easy to write distributed multi-process programs without knowing the details of the network communication.
Example:
Suppose we already have a multi-process program that communicates through a Queue and runs on a single machine. Now, because the processes that handle tasks carry a heavy workload, we want to put the process that sends tasks and the process that handles tasks on two different machines. How can this be done with distributed processes?
The original Queue can still be used. By exposing it over the network through the managers module, processes on other machines can access the Queue.
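The idea of exposing an existing Queue through the managers module can be sketched in a single file. This is only a minimal sketch under stated assumptions, not the tutorial's code: `DemoManager` and `get_queue` are hypothetical names, the server runs in a background thread of the same process purely to keep the example self-contained, and port 0 asks the operating system for any free port.

```python
import queue
import threading
from multiprocessing.managers import BaseManager, BaseProxy


class DemoManager(BaseManager):  # hypothetical name, for illustration only
    pass


# The real queue lives on the "server" side.
shared_queue = queue.Queue()
DemoManager.register('get_queue', callable=lambda: shared_queue)

# Server side: serve the registered queue from a background thread.
# Port 0 is an assumption of this sketch: the OS picks a free port.
server_manager = DemoManager(address=('127.0.0.1', 0), authkey=b'abc')
server = server_manager.get_server()
threading.Thread(target=server.serve_forever, daemon=True).start()

# Client side: connect over the network and ask for the queue by name.
client = DemoManager(address=server.address, authkey=b'abc')
client.connect()
remote_queue = client.get_queue()

remote_queue.put('hello')               # sent through the socket
received = shared_queue.get(timeout=5)  # arrives in the real queue
is_proxy = isinstance(remote_queue, BaseProxy)
print(received, is_proxy)
```

The client never holds a `queue.Queue` of its own. What it gets back is a proxy that forwards each method call to the process that owns the queue.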
1.1. Writing the service process
The service process is responsible for starting the Queue, registering it on the network, and then writing tasks into it.
# task_master.py
import random, time, queue
from multiprocessing.managers import BaseManager

# Queue for sending tasks:
task_queue = queue.Queue()
# Queue for receiving results:
result_queue = queue.Queue()

# Define functions that return the global variables task_queue and result_queue
def ret_task_queue():
    global task_queue
    return task_queue

def ret_result_queue():
    global result_queue
    return result_queue

# QueueManager inherits from BaseManager:
class QueueManager(BaseManager):
    pass

# Register the two queues on the network. Each name is associated with a callable
# whose return value is the queue object.
# get_task_queue is associated with ret_task_queue, so calling get_task_queue()
# gives access to what ret_task_queue() returns.
# In other words, callable attaches a callable object to the name get_task_queue.
# register() adds a method to the manager class. Later, dir(manager) will show it.
QueueManager.register('get_task_queue', callable=ret_task_queue)
QueueManager.register('get_result_queue', callable=ret_result_queue)

# Bind port 5000 and set the authentication key to b'abc'. The key ensures that the
# two machines communicate normally, without malicious interference from other machines.
# manager is a QueueManager instance. My understanding is that the two register()
# calls above gave it two methods.
manager = QueueManager(address=('', 5000), authkey=b'abc')

# Start the manager:
# Note: on platforms that start processes with spawn (e.g. Windows), the code from
# here on needs to sit under an `if __name__ == '__main__':` guard.
manager.start()

# Get the Queue objects accessed over the network:
# My understanding is that QueueManager has made get_task_queue() a method of manager.
task = manager.get_task_queue()  # task is a proxy object for the queue
result = manager.get_result_queue()

# Put a few tasks in:
for i in range(10):
    n = random.randint(0, 10000)
    print('Put task %d...' % n)
    task.put(n)

# Read results from the result queue:
print('Try get results...')
for i in range(10):
    r = result.get(timeout=10)
    print('Result: %s' % r)

# Shut down:
manager.shutdown()
print('Master exit.')
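The comments in task_master.py say that register() adds a method to the manager. That can be checked without any network traffic. The following is a small sketch, assuming hypothetical names (`DemoManager`, `get_demo_queue`, `ret_demo_queue`) that are not part of the tutorial:

```python
import queue
from multiprocessing.managers import BaseManager


class DemoManager(BaseManager):  # hypothetical name, for illustration only
    pass


demo_queue = queue.Queue()


def ret_demo_queue():
    """Return the queue that the manager should expose."""
    return demo_queue


had_method_before = hasattr(DemoManager, 'get_demo_queue')
DemoManager.register('get_demo_queue', callable=ret_demo_queue)
has_method_after = hasattr(DemoManager, 'get_demo_queue')

# The method is attached to the subclass, not to BaseManager itself.
base_has_method = hasattr(BaseManager, 'get_demo_queue')
print(had_method_before, has_method_after, base_has_method)
```

This is one reason for subclassing BaseManager instead of registering on it directly: the registered names stay local to the subclass.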
1.2. Start the task process
Note that in a distributed multi-process environment, tasks cannot be added by operating directly on the original task_queue, because that would bypass QueueManager's encapsulation. They must be added through the Queue interface obtained from manager.get_task_queue().
# task_worker.py
import time, sys, queue
from multiprocessing.managers import BaseManager

# Create a similar QueueManager:
class QueueManager(BaseManager):
    pass

# Since this QueueManager only gets the queues from the network, only the names are given when registering:
# register() adds a method to QueueManager
QueueManager.register('get_task_queue')
QueueManager.register('get_result_queue')

# Connect to the server, which is the machine running task_master.py:
server_addr = '127.0.0.1'
print('Connect to server %s...' % server_addr)
# The port and authentication key must be exactly the same as in task_master.py:
m = QueueManager(address=(server_addr, 5000), authkey=b'abc')
# Connect over the network:
m.connect()
# Get the Queue objects:
task = m.get_task_queue()
result = m.get_result_queue()

# Pull tasks from the task queue and write the results to the result queue:
for i in range(10):
    try:
        n = task.get(timeout=1)
        print('Run task %d * %d...' % (n, n))
        r = '%d * %d = %d' % (n, n, n*n)
        time.sleep(1)
        result.put(r)
    except queue.Empty:
        print('Task queue is empty.')

# Processing ends:
print('Worker exit.')
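The worker above runs a fixed ten iterations and relies on queue.Empty to detect a missing task. A possible variant, sketched here under assumptions, keeps pulling until the queue stays empty for a short timeout. `drain` is a hypothetical helper, and two local `queue.Queue` objects stand in for the network proxies so the sketch runs on its own:

```python
import queue


def drain(task_queue, result_queue, timeout=0.1):
    """Square every number in task_queue until no task arrives within `timeout` seconds."""
    handled = 0
    while True:
        try:
            n = task_queue.get(timeout=timeout)
        except queue.Empty:
            # No task arrived within the timeout: treat the queue as finished.
            break
        result_queue.put('%d * %d = %d' % (n, n, n * n))
        handled += 1
    return handled


# Local queues stand in for the objects returned by m.get_task_queue()
# and m.get_result_queue() in task_worker.py.
tasks, results = queue.Queue(), queue.Queue()
for n in (3, 68):
    tasks.put(n)

handled = drain(tasks, results)
print(handled)
```

With this shape the worker does not need to know in advance how many tasks the master will send.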
Results:
# task_master.py service process:
$ python3 task_master.py
Put task 3411...
Put task 1605...
Put task 1398...
Put task 4729...
Put task 5300...
Put task 7471...
Put task 68...
Put task 4219...
Put task 339...
Put task 7866...
Try get results...
# After sending the tasks, the task_master.py process starts waiting on the result queue. Now start the task_worker.py process:
$ python3 task_worker.py
Connect to server 127.0.0.1...
Run task 3411 * 3411...
Run task 1605 * 1605...
Run task 1398 * 1398...
Run task 4729 * 4729...
Run task 5300 * 5300...
Run task 7471 * 7471...
Run task 68 * 68...
Run task 4219 * 4219...
Run task 339 * 339...
Run task 7866 * 7866...
Worker exit.
# When the task_worker.py process ends, the task_master.py process goes on to print the results:
Result: 3411 * 3411 = 11634921
Result: 1605 * 1605 = 2576025
Result: 1398 * 1398 = 1954404
Result: 4729 * 4729 = 22363441
Result: 5300 * 5300 = 28090000
Result: 7471 * 7471 = 55815841
Result: 68 * 68 = 4624
Result: 4219 * 4219 = 17799961
Result: 339 * 339 = 114921
Result: 7866 * 7866 = 61873956
2. Summary
Where is the Queue object stored? Note that task_worker.py contains no code that creates a Queue, so the Queue object is stored in the task_master.py process.
The Queue can be accessed over the network thanks to QueueManager. Because QueueManager manages more than one Queue, each Queue's network interface is given a name, such as get_task_queue.
What is authkey for? It ensures that the two machines communicate normally and are not maliciously interfered with by other machines. If the authkey in task_worker.py differs from the one in task_master.py, the connection is certain to fail.
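The effect of a mismatched authkey can be seen in a small sketch. The assumptions here are mine, not the tutorial's: `DemoManager`, `get_queue` and `can_connect` are hypothetical names, the server runs in a background thread of the same process, and port 0 lets the operating system choose a free port.

```python
import queue
import threading
from multiprocessing import AuthenticationError
from multiprocessing.managers import BaseManager


class DemoManager(BaseManager):  # hypothetical name, for illustration only
    pass


demo_queue = queue.Queue()
DemoManager.register('get_queue', callable=lambda: demo_queue)

# Server side, protected by the key b'abc'.
server = DemoManager(address=('127.0.0.1', 0), authkey=b'abc').get_server()
threading.Thread(target=server.serve_forever, daemon=True).start()


def can_connect(authkey):
    """Return True if a client using `authkey` is accepted by the server."""
    client = DemoManager(address=server.address, authkey=authkey)
    try:
        client.connect()
    except AuthenticationError:
        return False
    return True


wrong_key_ok = can_connect(b'xyz')
right_key_ok = can_connect(b'abc')
print(wrong_key_ok, right_key_ok)
```

A client with the wrong key is rejected during the handshake, before it can reach any registered queue.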
Note that the Queue is used to pass tasks and receive results, so the data describing each task should be as small as possible. For example, for a task that processes a log file, do not send the log file itself, which may be hundreds of megabytes. Send the full path where the log file is stored and let the worker process read the file from a shared disk.
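The "send the path, not the file" advice might look like the sketch below. Everything in it is an assumption made for illustration: a temporary directory stands in for the shared disk, a local `queue.Queue` stands in for the network queue, and `count_lines` is a hypothetical piece of worker logic.

```python
import os
import queue
import tempfile


def count_lines(path):
    """Hypothetical worker logic: open the log by path and count its lines."""
    with open(path, encoding='utf-8') as f:
        return sum(1 for _ in f)


# A temporary directory stands in for the disk shared by master and worker.
with tempfile.TemporaryDirectory() as shared_dir:
    log_path = os.path.join(shared_dir, 'app.log')
    with open(log_path, 'w', encoding='utf-8') as f:
        f.write('line 1\nline 2\nline 3\n')

    task_queue = queue.Queue()  # stands in for the network queue
    task_queue.put(log_path)    # the task is just a short string

    # Worker side: take the path from the queue and read the file itself.
    line_count = count_lines(task_queue.get())

print(line_count)
```

Only the path travels through the queue, so the size of the task stays the same however large the log file grows.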