Python Distributed Process

Source: Internet
Author: User

Distributed processes:

A distributed process spreads worker processes across multiple machines, using the combined performance of several machines to complete complex tasks. Between threads and processes, processes should be preferred: a process is more stable, and processes can be distributed across multiple machines, while threads can at most be distributed across multiple CPUs of the same machine.

Python's multiprocessing module not only supports multiple processes; its managers submodule also supports distributing those processes across multiple machines. A service process can act as a dispatcher, distributing tasks to other processes over the network. Because the managers module is well encapsulated, distributed multi-process programs are easy to write without understanding the details of the network communication.
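The core pattern the managers module provides can be sketched in one self-contained script. This is a minimal sketch, not the article's code: the OS-assigned port (0), the auth key b'abc', and the 'hello' payload are illustrative assumptions.

```python
# minimal sketch of exposing a Queue over the network with
# multiprocessing.managers; the port (0 = OS-assigned), auth key
# and payload are illustrative assumptions
import queue
from multiprocessing.managers import BaseManager

task_queue = queue.Queue()

def get_task_queue():
    # a module-level function (not a lambda) so it can be registered portably
    return task_queue

class ServerManager(BaseManager):
    pass

ServerManager.register('get_task_queue', callable=get_task_queue)

class ClientManager(BaseManager):
    pass

# on the client side only the name is registered, with no callable
ClientManager.register('get_task_queue')

def demo():
    # server side: start() runs the manager in a child process
    server = ServerManager(address=('127.0.0.1', 0), authkey=b'abc')
    server.start()

    # client side: must use the same address and auth key as the server
    client = ClientManager(address=server.address, authkey=b'abc')
    client.connect()

    q = client.get_task_queue()   # proxy to the queue living in the server process
    q.put('hello')
    item = q.get()
    server.shutdown()
    return item

if __name__ == '__main__':
    print(demo())
```

Both `put` and `get` travel through the proxy to the queue held by the manager's server process, which is exactly the "networked queue" idea described above.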

For example, a common scenario when writing a crawler: we want to crawl image link addresses and put them into a queue, while other processes read link addresses from the queue, download the images, and store them locally. To make this distributed, a process on one machine is responsible for crawling the links, while processes on other machines are responsible for downloading and storing. The main problem then is exposing the queue over the network so that processes on other machines can access it. A distributed process encapsulates exactly this; we can call it networking the queue.

Creating a distributed process requires a service process and a task process:

Service process creation:
    • Establish the queues used for inter-process communication. The service process creates a task queue, task_queue, as the channel for passing tasks to the task process, and a result queue, result_queue, as the channel through which the task process replies after completing a task. In a distributed multi-process environment, tasks must be added through the queue interface obtained from the QueueManager.
    • Register the queues established in the first step on the network, exposing them to other processes (hosts); the networked queues obtained after registration are images of the original queues.
    • Create an instance of QueueManager (a subclass of BaseManager), then bind the port and set the authentication key.
    • Start the instance set up in the third step, that is, start the manager, which supervises the information channel.
    • Through the manager instance, obtain the Queue objects accessed over the network, materializing the networked queues as local queues that can be used directly.
    • Put tasks into the "local" queue; they are automatically uploaded to the networked queue and assigned to a task process for handling.

Note: the code here targets Windows; on Linux it will differ slightly.

```python
# coding:utf-8
# taskmanager.py for Windows
import queue
from multiprocessing.managers import BaseManager
from multiprocessing import freeze_support

# number of tasks
task_num = 10

# define the send and receive queues
task_queue = queue.Queue(task_num)
result_queue = queue.Queue(task_num)

def get_task():
    return task_queue

def get_result():
    return result_queue

# create a QueueManager of our own
class QueueManager(BaseManager):
    pass

def win_run():
    # on Windows the registered callable cannot be a lambda,
    # so module-level functions are defined and bound instead
    QueueManager.register('get_task_queue', callable=get_task)
    QueueManager.register('get_result_queue', callable=get_result)
    # bind the port and set the authentication key; on Windows the IP
    # address must be given explicitly, on Linux it may be left empty
    # (defaulting to the local host)
    manager = QueueManager(address=('127.0.0.1', 4000), authkey=b'qty')
    # start the manager
    manager.start()
    # obtain the task and result queues over the network
    task = manager.get_task_queue()
    result = manager.get_result_queue()
    try:
        # add tasks
        for i in range(10):
            print('put task %s...' % i)
            task.put(i)
        print('try get result...')
        for i in range(10):
            print('result is %s' % result.get(timeout=10))
    except Exception:
        print('manage error')
    finally:
        # always shut down, otherwise an error about the manager
        # not being closed is raised
        manager.shutdown()
        print('master exit!')

if __name__ == '__main__':
    # multiple processes on Windows can be problematic;
    # this call alleviates that
    freeze_support()
    win_run()
```
Task process creation:
    • Use QueueManager to register the method names used to obtain the queues; the task process can only obtain the queues on the network by name.
    • Connect to the server; the port and authentication key must match the service process exactly.
    • Obtain the queues from the network and localize them.
    • Fetch tasks from the task queue and write results to the result queue.
```python
# coding:utf-8
# taskworker.py
from multiprocessing.managers import BaseManager

# create a matching QueueManager:
class QueueManager(BaseManager):
    pass

# step 1: register the method names used to obtain the queues
QueueManager.register('get_task_queue')
QueueManager.register('get_result_queue')

# step 2: connect to the server
server_addr = '127.0.0.1'
print('Connect to server %s' % server_addr)
# the port and authentication key must match the service process exactly
m = QueueManager(address=(server_addr, 4000), authkey=b'qty')
# connect over the network
m.connect()

# step 3: obtain the Queue objects
task = m.get_task_queue()
result = m.get_result_queue()

# step 4: fetch tasks from the task queue and write results
# to the result queue
while not task.empty():
    index = task.get(True, timeout=10)
    print('run task download %s' % str(index))
    result.put('%s---->success' % str(index))

# done
print('worker exit.')
```

Execution results:

Run the service process first; it starts producing output:

```
put task 0...
put task 1...
put task 2...
put task 3...
put task 4...
put task 5...
put task 6...
put task 7...
put task 8...
put task 9...
try get result...
```

Then run the task process. Start it promptly, because the service process waits only a limited time (timeout=10) for results:

```
Connect to server 127.0.0.1
run task download 0
run task download 1
run task download 2
run task download 3
run task download 4
run task download 5
run task download 6
run task download 7
run task download 8
run task download 9
worker exit.
```

Finally, look back at the output in the service process window:

```
put task 0...
put task 1...
put task 2...
put task 3...
put task 4...
put task 5...
put task 6...
put task 7...
put task 8...
put task 9...
try get result...
result is 0---->success
result is 1---->success
result is 2---->success
result is 3---->success
result is 4---->success
result is 5---->success
result is 6---->success
result is 7---->success
result is 8---->success
result is 9---->success
master exit!
```

This is a simple but genuine piece of distributed computing. With minor changes to the code, you can start multiple workers and distribute the tasks across several, or even dozens, of machines, implementing a large-scale distributed crawler.
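As a hedged sketch of that scale-out, the service side and several workers can be exercised in one script. The worker count, task payloads, OS-assigned port, and helper names below are illustrative assumptions; in a real deployment each worker would run on its own machine and connect to the server's public IP.

```python
# sketch: one service process feeding several worker processes;
# worker count, payloads and the OS-assigned port are assumptions
import multiprocessing as mp
import queue
from multiprocessing.managers import BaseManager

task_queue = queue.Queue()
result_queue = queue.Queue()

def get_task_queue():
    return task_queue

def get_result_queue():
    return result_queue

class ServerManager(BaseManager):
    pass

ServerManager.register('get_task_queue', callable=get_task_queue)
ServerManager.register('get_result_queue', callable=get_result_queue)

class WorkerManager(BaseManager):
    pass

WorkerManager.register('get_task_queue')
WorkerManager.register('get_result_queue')

def worker(address, authkey):
    # same logic as the task process above, with the address as a parameter
    m = WorkerManager(address=address, authkey=authkey)
    m.connect()
    task = m.get_task_queue()
    result = m.get_result_queue()
    while True:
        try:
            index = task.get(timeout=1)
        except queue.Empty:
            # no more tasks arriving: this worker exits
            break
        result.put('%s---->success' % index)

def run_demo(n_workers=3, n_tasks=8):
    server = ServerManager(address=('127.0.0.1', 0), authkey=b'qty')
    server.start()
    task = server.get_task_queue()
    result = server.get_result_queue()
    for i in range(n_tasks):
        task.put(i)
    procs = [mp.Process(target=worker, args=(server.address, b'qty'))
             for _ in range(n_workers)]
    for p in procs:
        p.start()
    # collect one result per task, regardless of which worker produced it
    results = [result.get(timeout=10) for _ in range(n_tasks)]
    for p in procs:
        p.join()
    server.shutdown()
    return results

if __name__ == '__main__':
    for line in sorted(run_demo()):
        print(line)
```

Because every worker pulls from the same networked task queue, adding capacity is just starting more copies of the worker; the service process does not need to know how many there are.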
