Anatomy of a Python process pool (ii)


The previous article described the process Pool class provided by Python's multiprocessing module, with a brief analysis of the data structures inside the pool and the relationships among its threads. This section looks at how a client assigns tasks to the process pool and obtains the results.

We know that the worker processes are triggered to work when the task queue in the pool is non-empty. So how are tasks added to the task queue? The Pool class has two key pairs of task-creating methods: apply/apply_async and map/map_async. The apply and map methods of the Pool class behave much like Python's built-in functions of the same names, while apply_async and map_async are their non-blocking (asynchronous) counterparts.

First look at the apply_async method; the source code is as follows:

def apply_async(self, func, args=(), kwds={}, callback=None):
    assert self._state == RUN
    result = ApplyResult(self._cache, callback)
    self._taskqueue.put(([(result._job, None, func, args, kwds)], None))
    return result
func is the function that performs the task.
args and kwds are func's positional and keyword arguments.
callback is a single-parameter function; when the result is ready, callback is called with the result of the task execution as its argument.

Each call to apply_async actually adds one task to _taskqueue. Note that this is a non-blocking (asynchronous) call: the newly created task is merely added to the task queue, not executed, and no waiting is involved. The method returns the newly created ApplyResult object directly; note that when the ApplyResult object is created, it is also placed into the pool's cache, _cache.

Once the task queue holds a newly created task, then according to the processing flow analyzed in the previous section, the pool's _task_handler thread takes the task from _taskqueue and puts it into _inqueue, which triggers a worker process to call func with args and kwds. When the run finishes, the result is put into _outqueue; the pool's _handle_results thread then fetches the run result from _outqueue, finds the corresponding ApplyResult object in the _cache cache, and calls its _set method to store the result, where it waits for the caller to retrieve it.

Since apply_async is asynchronous, how does the caller know the task has finished and obtain the result? This requires understanding two key methods of the ApplyResult class:

def get(self, timeout=None):
    self.wait(timeout)
    if not self._ready:
        raise TimeoutError
    if self._success:
        return self._value
    else:
        raise self._value

def _set(self, i, obj):
    self._success, self._value = obj
    if self._callback and self._success:
        self._callback(self._value)
    self._cond.acquire()
    try:
        self._ready = True
        self._cond.notify()
    finally:
        self._cond.release()
    del self._cache[self._job]

As the method names suggest, get is provided for the client to obtain the worker process's run result, and that result is stored into the ApplyResult object by the _handle_results thread via the _set method.
The _set method saves the run result in ApplyResult._value and wakes up any get call blocked on the condition variable. The client obtains the run result by calling get.
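The wait/notify handshake between get and _set can be illustrated with a simplified, hypothetical stand-in for ApplyResult (MiniResult is not the real class, just the same condition-variable pattern):

```python
import threading

class MiniResult:
    # Simplified stand-in for ApplyResult: same wait/notify pattern.
    def __init__(self):
        self._cond = threading.Condition()
        self._ready = False
        self._success = None
        self._value = None

    def get(self, timeout=None):
        # Block on the condition variable until _set marks the result ready.
        with self._cond:
            if not self._ready:
                self._cond.wait(timeout)
        if not self._ready:
            raise RuntimeError('timed out waiting for result')
        if self._success:
            return self._value
        raise self._value

    def _set(self, obj):
        # Played by the result-handler thread: store the value, wake get().
        self._success, self._value = obj
        with self._cond:
            self._ready = True
            self._cond.notify()

result = MiniResult()
# Simulate the _handle_results thread delivering a result a bit later.
threading.Timer(0.1, result._set, args=((True, 42),)).start()
value = result.get(timeout=5)
```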

The apply method fetches the process's result in a blocking way. Its implementation is simple: it also calls apply_async, but instead of returning the ApplyResult it directly returns the worker process's run result:

def apply(self, func, args=(), kwds={}):
    assert self._state == RUN
    return self.apply_async(func, args, kwds).get()

The apply/apply_async methods above assign only one task to the process pool at a time. To assign multiple tasks at once, use the map/map_async methods. Let's first look at how map_async is defined:

def map_async(self, func, iterable, chunksize=None, callback=None):
    assert self._state == RUN
    if not hasattr(iterable, '__len__'):
        iterable = list(iterable)

    if chunksize is None:
        chunksize, extra = divmod(len(iterable), len(self._pool) * 4)
        if extra:
            chunksize += 1
    if len(iterable) == 0:
        chunksize = 0

    task_batches = Pool._get_tasks(func, iterable, chunksize)
    result = MapResult(self._cache, chunksize, len(iterable), callback)
    self._taskqueue.put((((result._job, i, mapstar, (x,), {})
                          for i, x in enumerate(task_batches)), None))
    return result

func is the function that performs each task.
iterable is the sequence of task arguments.
chunksize controls how the iterable is split: the sequence is divided into groups of chunksize elements, and each group is submitted to the pool as one task.
callback is a single-parameter function; when the result is ready, callback is called with the result of the task execution as its argument.

As the source shows, map_async is more complex than apply_async. First it groups the task argument sequence according to chunksize, which is the number of tasks per group. When chunksize is left at its default of None, it is computed from the length of the argument sequence and the number of processes in the pool: chunksize, extra = divmod(len(iterable), len(self._pool) * 4). Suppose the pool has len(self._pool) = 4 processes and the argument sequence is iterable = range(123); then chunksize = 7 and extra = 11, and after the next statement chunksize = 8, meaning the argument sequence is split into groups of 8 elements each. The actual grouping is done by:

task_batches = Pool._get_tasks(func, iterable, chunksize)

def _get_tasks(func, it, size):
    it = iter(it)
    while 1:
        x = tuple(itertools.islice(it, size))
        if not x:
            return
        yield (func, x)
Here yield is used, making _get_tasks a generator. For a sequence such as range(123), grouping with chunksize = 8 produces 16 groups, whose elements are as follows:

(func, (0, 1, 2, 3, 4, 5, 6, 7))
(func, (8, 9, 10, 11, 12, 13, 14, 15))
(func, (16, 17, 18, 19, 20, 21, 22, 23))
...
(func, (112, 113, 114, 115, 116, 117, 118, 119))
(func, (120, 121, 122))
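The chunk-size arithmetic and the grouping can be checked with a small standalone sketch that mirrors the pool's logic (get_tasks is a copy of _get_tasks for illustration, and abs stands in for an arbitrary task function):

```python
import itertools

def get_tasks(func, it, size):
    # Mirror of Pool._get_tasks: slice the iterable into size-element tuples.
    it = iter(it)
    while True:
        x = tuple(itertools.islice(it, size))
        if not x:
            return
        yield (func, x)

length, workers = 123, 4
chunksize, extra = divmod(length, workers * 4)   # 123 = 16 * 7 + 11
if extra:
    chunksize += 1                               # chunksize becomes 8

batches = list(get_tasks(abs, range(length), chunksize))
# 16 groups: fifteen of 8 elements each, plus a final group of 3
```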

After grouping, a MapResult object is created: result = MapResult(self._cache, chunksize, len(iterable), callback). It inherits from the ApplyResult class and likewise provides the get and _set method interfaces. The grouped tasks are then put into the task queue, and the newly created result object is returned.

self._taskqueue.put((((result._job, i, mapstar, (x,), {})
                      for i, x in enumerate(task_batches)), None))

Taking the task argument sequence range(123) as an example, what is put into the task queue is, schematically:

(((result._job, 0, mapstar, ((func, (0, 1, 2, 3, 4, 5, 6, 7)),), {}),
  (result._job, 1, mapstar, ((func, (8, 9, 10, 11, 12, 13, 14, 15)),), {}),
  ...
  (result._job, 15, mapstar, ((func, (120, 121, 122)),), {})), None)
Note the i in each tuple: it records the position of that group within the whole collection of task tuples. It is through this index that the _handle_results thread can fill the worker processes' run results into the MapResult object in the correct order.

Note that only one put call is made, with all 16 groups of tuples put into the task queue as a single sequence. Will the _task_handler thread then also pass the entire task sequence as a whole into _inqueue, just as map_async passed it into _taskqueue? That would mean only one worker process in the pool gets the whole task sequence, rather than the work being spread across multiple processes. Let's look at how the _task_handler thread actually handles it:

def _handle_tasks(taskqueue, put, outqueue, pool, cache):
    thread = threading.current_thread()

    for taskseq, set_length in iter(taskqueue.get, None):
        i = -1
        for i, task in enumerate(taskseq):
            if thread._state:
                debug('task handler found thread._state != RUN')
                break
            try:
                put(task)
            except Exception as e:
                job, ind = task[:2]
                try:
                    cache[job]._set(ind, (False, e))
                except KeyError:
                    pass
        else:
            if set_length:
                debug('doing set_length()')
                set_length(i + 1)
            continue
        break
    else:
        debug('task handler got sentinel')

Note the statement for i, task in enumerate(taskseq): the _task_handler thread does not put the task sequence it obtains from _taskqueue into _inqueue as a whole. Instead, it puts the tasks in the sequence into _inqueue one by one, following the earlier grouping, where each task in the loop is one of those task tuples: (result._job, 0, mapstar, ((func, (0, 1, 2, 3, 4, 5, 6, 7)),), {}). This then triggers the worker processes, each of which obtains a group of tasks and processes it:

job, i, func, args, kwds = task
try:
    result = (True, func(*args, **kwds))
except Exception as e:
    result = (False, e)
try:
    put((job, i, result))
except Exception as e:
    wrapped = MaybeEncodingError(e, result[1])
    debug("Possible encoding error while sending result: %s" % (wrapped,))
    put((job, i, (False, wrapped)))
Matching up the positions in the tuple, we can see that mapstar plays the role of func in the worker code above, while ((func, (0, 1, 2, 3, 4, 5, 6, 7)),) and {} are its args and kwds arguments respectively.
Note that the func inside the tuple is the task function the client specified when assigning the task. Now look at how mapstar is defined:

def mapstar(args):
    return map(*args)

So after the task arguments are grouped, each group of tasks is executed through the built-in map function.
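mapstar simply unpacks the (func, group) pair into the built-in map. A quick sketch (in Python 3, map returns an iterator, so the result is wrapped in list here; in the Python 2 source being analyzed, map already returns a list):

```python
def mapstar(args):
    # args is a (func, argument_tuple) pair, exactly as _get_tasks produced it
    return list(map(*args))

group = (lambda x: x * x, (0, 1, 2, 3, 4, 5, 6, 7))
partial_result = mapstar(group)   # one group's worth of results
```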
After a run finishes, the worker calls put((job, i, result)) to place the result into _outqueue. The _handle_results thread then takes the result out of _outqueue, finds the MapResult object in the _cache cache, and calls its _set method to store the run result.

Now let's summarize how the pool's map_async method works. Take range(123) as the task sequence and pass it to map_async. Assuming chunksize is not specified and the CPU has four cores, the method splits the sequence into 16 groups: groups 0 through 14 hold 8 elements each, and the last group holds 3. After the grouped tasks are put into the task queue, there are 16 groups in total, so each process needs to run about 4 times to handle them. On each run, a worker executes its group's 8 tasks through the built-in map function and puts the result into _outqueue, where it is matched to the MapResult object in the _cache cache and stored via _set, waiting for the client to fetch it.

Calling map_async thus engages multiple worker processes to handle the tasks. Each time a worker process finishes a run, it passes the result to _outqueue, and the _handle_results thread writes the result into the MapResult object. How, then, is the order of the result sequence kept consistent with the order of the task arguments passed to map_async? Let's look at the implementation of the MapResult constructor and its _set method.
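Putting it together, here is a minimal client-side sketch of the scenario above (square is a made-up task function; the explicit fork context assumes a Unix-like platform — on Windows, use the default context and an `if __name__ == '__main__'` guard):

```python
import multiprocessing

def square(x):
    return x * x

ctx = multiprocessing.get_context('fork')  # Unix only; see note above
pool = ctx.Pool(4)

# Behind the scenes the 123 arguments are split into 16 groups,
# distributed across the 4 workers, and reassembled in order.
async_result = pool.map_async(square, range(123))
values = async_result.get(timeout=30)

pool.close()
pool.join()
```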

def __init__(self, cache, chunksize, length, callback):
    ApplyResult.__init__(self, cache, callback)
    self._success = True
    self._value = [None] * length
    self._chunksize = chunksize
    if chunksize <= 0:
        self._number_left = 0
        self._ready = True
        del cache[self._job]
    else:
        self._number_left = length // chunksize + bool(length % chunksize)

def _set(self, i, success_result):
    success, result = success_result
    if success:
        self._value[i*self._chunksize:(i+1)*self._chunksize] = result
        self._number_left -= 1
        if self._number_left == 0:
            if self._callback:
                self._callback(self._value)
            del self._cache[self._job]
            self._cond.acquire()
            try:
                self._ready = True
                self._cond.notify()
            finally:
                self._cond.release()
    else:
        self._success = False
        self._value = result
        del self._cache[self._job]
        self._cond.acquire()
        try:
            self._ready = True
            self._cond.notify()
        finally:
            self._cond.release()

In the MapResult class, _value holds the result of map_async; at initialization it is a list whose elements are all None, with the same length as the task argument sequence. _chunksize records how many tasks each group holds after grouping, and _number_left records how many groups the whole task sequence was divided into. The _handle_results thread saves the worker processes' run results into _value through the _set method. So how do the results land in the correct positions in _value? Recall that when map_async fills the task queue, each group carries an index i, the group number of that task group. The _set method uses this group number, the parameter i, to write the group's result into the corresponding slice of _value, and decrements _number_left. When _number_left reaches 0, all tasks in the argument sequence have been processed by the worker processes and _value is fully computed, so the condition variable that get is blocking on is notified, and the client can obtain the run result.
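The slice arithmetic in _set is what restores the order even when groups finish out of order. A standalone sketch of just that bookkeeping (set_group is a made-up helper mirroring the success path of MapResult._set):

```python
length, chunksize = 11, 4
value = [None] * length
number_left = length // chunksize + bool(length % chunksize)  # 3 groups

def set_group(i, result):
    # Mirror of MapResult._set's success path: slice assignment by group index
    global number_left
    value[i * chunksize:(i + 1) * chunksize] = result
    number_left -= 1

# Groups may arrive in any order; the group index i puts each one in place.
set_group(2, [64, 81, 100])          # squares of 8, 9, 10
set_group(0, [0, 1, 4, 9])           # squares of 0..3
set_group(1, [16, 25, 36, 49])       # squares of 4..7
```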

The map method is the blocking version of map_async: on top of map_async, it calls get and blocks until the result is returned:

def map(self, func, iterable, chunksize=None):
    assert self._state == RUN
    return self.map_async(func, iterable, chunksize).get()

This section focused on the two pairs of interfaces for assigning tasks to a process pool: apply/apply_async and map/map_async. The apply method handles one task at a time, and different tasks may use different task functions and arguments; the map method handles a whole sequence of tasks at a time, with every task executing the same function.
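That difference can be seen in a short sketch (add and square are made-up task functions; the fork context assumes a Unix-like platform, as in the earlier examples):

```python
import multiprocessing

def add(a, b):
    return a + b

def square(x):
    return x * x

ctx = multiprocessing.get_context('fork')  # Unix only
pool = ctx.Pool(2)

# apply: one task at a time, each with its own function and arguments
total = pool.apply(add, (2, 3))

# map: one function applied over a whole argument sequence
squares = pool.map(square, range(5))

pool.close()
pool.join()
```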

To be Continued ...
