Anatomy of a Python process pool (i)


Two modules commonly used to work with processes in Python are subprocess and multiprocessing. subprocess is typically used to execute external programs, such as third-party applications, rather than Python code. If you need to invoke external programs, the third-party psutil package is often an even better choice: it covers the process-launching functionality of subprocess (psutil.Popen wraps subprocess.Popen) and can also monitor the current host or the launched program, reporting network, CPU, and memory usage, which makes it more comprehensive for automated operations and maintenance work. multiprocessing is Python's multi-process module; it works by starting new Python processes that invoke a target callback function to handle tasks. It mirrors Python's multithreading module, threading, and the two share a similar interface: you define a multiprocessing.Process or threading.Thread, specify its target callable, and call start() to run the process or thread.
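The parallel interfaces can be seen in a short sketch. The example below runs an external program with subprocess and then runs the same callback both as a process and as a thread; the second interpreter invocation is only an illustration:

```python
import subprocess
import sys
import threading
import multiprocessing

# subprocess: run an external program (here, a second Python interpreter).
proc = subprocess.run([sys.executable, "-c", "print('external program')"],
                      capture_output=True, text=True)

def greet(name):
    print("hello from", name)

# multiprocessing.Process and threading.Thread share a similar interface:
# specify a target callable, then call start() to run it and join() to wait.
p = multiprocessing.Process(target=greet, args=("a worker process",))
t = threading.Thread(target=greet, args=("a worker thread",))
p.start(); t.start()
p.join(); t.join()

print(proc.stdout.strip())
```

Swapping Process for Thread (or vice versa) requires no other code changes, which is exactly the symmetry the two modules were designed around.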

Because of the Global Interpreter Lock (GIL), multithreading in Python does not significantly improve a program's efficiency [1]. Therefore, when handling concurrency in Python, prefer multiple processes over multiple threads. The simplest pattern in concurrent programming is for the main process to wait for tasks and start a new process to handle each task as it arrives. With this one-process-per-task approach, every task carries the cost of creating, running, and destroying a process; if the tasks themselves are short-lived, creation and destruction take up a large share of the total running time. Clearly, we should try to avoid this extra overhead. A process pool improves efficiency by reusing process objects, cutting down the cost of repeatedly creating and destroying processes.

In fact, Python ships with a powerful process pool, multiprocessing.Pool. Here is a quick analysis of how Python's own process pool is implemented.

To create a process pool object, you call Pool, whose signature is as follows:

    Pool(processes=None, initializer=None, initargs=(), maxtasksperchild=None)

  • processes: the number of worker processes. Defaults to None, which means cpu_count() workers.
  • initializer: an initialization function called as each worker process starts. If initializer is not None, initializer(*initargs) is called at the start of every worker process.
  • initargs: the arguments passed to initializer.
  • maxtasksperchild: the number of tasks each worker process completes before exiting and being replaced by a fresh process. Defaults to None, meaning a worker process lives as long as the pool itself, i.e. it is never automatically replaced.

Pool returns a process pool object.
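A minimal usage sketch exercising these parameters (the function names init_worker and square are illustrative, not part of the API):

```python
import multiprocessing
import os

def init_worker(tag):
    # initializer(*initargs) runs once in each worker process as it starts.
    print("worker", os.getpid(), "initialized with tag", tag)

def square(x):
    return x * x

pool = multiprocessing.Pool(processes=2,
                            initializer=init_worker,
                            initargs=("demo",),
                            maxtasksperchild=5)
results = pool.map(square, range(10))
pool.close()
pool.join()
print(results)
```

Each of the two workers prints its initialization message once, and with maxtasksperchild=5 a worker is retired and replaced after completing five tasks.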

The process pool object returned by Pool contains several important data structures:

  • self._inqueue: the receive-task queue (a SimpleQueue), used by the main process to send tasks to the worker processes.
  • self._outqueue: the result queue (a SimpleQueue), used by the worker processes to send results back to the main process.
  • self._taskqueue: a synchronized queue holding the tasks the main process assigns to the pool.
  • self._cache = {}: the task cache.
  • self._processes: the number of worker processes.
  • self._pool = []: the list of worker processes.
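The roles of _inqueue and _outqueue can be mimicked with two SimpleQueues and a single hand-rolled worker. This is a stripped-down sketch of the pool's plumbing, not the real implementation; None is used as the exit sentinel, just as the pool does:

```python
import multiprocessing

def worker(inqueue, outqueue):
    # A stripped-down version of the pool's worker loop: read tasks from
    # inqueue, run them, and write results to outqueue; None is the sentinel.
    while True:
        task = inqueue.get()
        if task is None:
            break
        func, args = task
        outqueue.put(func(*args))

inq = multiprocessing.SimpleQueue()   # plays the role of self._inqueue
outq = multiprocessing.SimpleQueue()  # plays the role of self._outqueue
p = multiprocessing.Process(target=worker, args=(inq, outq))
p.start()

inq.put((pow, (2, 10)))  # main process sends a task to the worker
inq.put(None)            # sentinel: tell the worker to exit
result = outq.get()      # worker sends the result back
p.join()
print(result)
```

The real pool adds the _taskqueue/_cache layer on the main-process side, but the task and result traffic between processes flows through exactly this kind of queue pair.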

While a process pool is working, receiving tasks, dispatching them, and returning their results are all done by threads inside the process pool. Let's look at which threads run inside the pool:

  • The _worker_handler thread is responsible for making sure the pool always holds `processes` worker processes: whenever workers exit, it creates new worker processes and appends them to the process list (self._pool). Its callback is the Pool._handle_workers method, which loops over the _maintain_pool method while the pool's state == RUN, monitoring whether any process has exited and creating replacements.

        self._worker_handler = threading.Thread(target=Pool._handle_workers, args=(self,))

    Pool._handle_workers repeatedly calls _maintain_pool while the thread state is RUN:

        def _maintain_pool(self):
            if self._join_exited_workers():
                self._repopulate_pool()

    _join_exited_workers() checks whether any process in the self._pool list has ended; if so, it waits for the process to finish and removes it from the list. Whenever a process has exited, _repopulate_pool() is called to create a new one:

        w = self.Process(target=worker,
                         args=(self._inqueue, self._outqueue,
                               self._initializer, self._initargs,
                               self._maxtasksperchild))
        self._pool.append(w)

    w is the newly created process, which is the process used to handle the actual tasks, and worker is its callback function (Python 2 source):

        def worker(inqueue, outqueue, initializer=None, initargs=(), maxtasks=None):
            assert maxtasks is None or (type(maxtasks) == int and maxtasks > 0)
            put = outqueue.put
            get = inqueue.get
            if hasattr(inqueue, '_writer'):
                inqueue._writer.close()
                outqueue._reader.close()

            if initializer is not None:
                initializer(*initargs)

            completed = 0
            while maxtasks is None or (maxtasks and completed < maxtasks):
                try:
                    task = get()
                except (EOFError, IOError):
                    debug('worker got EOFError or IOError -- exiting')
                    break

                if task is None:
                    debug('worker got sentinel -- exiting')
                    break

                job, i, func, args, kwds = task
                try:
                    result = (True, func(*args, **kwds))
                except Exception, e:
                    result = (False, e)
                try:
                    put((job, i, result))
                except Exception as e:
                    wrapped = MaybeEncodingError(e, result[1])
                    debug("Possible encoding error while sending result: %s" % (
                        wrapped))
                    put((job, i, (False, wrapped)))
                completed += 1
            debug('worker exiting after %d tasks' % completed)

    All worker processes handle tasks uniformly with this worker callback. As the source shows, it reads a task from the receive-task queue (inqueue), calls the task's function with its arguments (result = (True, func(*args, **kwds))), and puts the result into the result queue (outqueue). If a maximum task limit (maxtasks) is set, the process exits once it has processed that number of tasks.
  • The _task_handler thread takes tasks out of the pool's _taskqueue and puts them into the receive-task queue (a Pipe):

        self._task_handler = threading.Thread(target=Pool._handle_tasks,
                                              args=(self._taskqueue, self._quick_put,
                                                    self._outqueue, self._pool))

    The Pool._handle_tasks method continually gets tasks from _taskqueue and puts them into the receive-task queue (_inqueue), which triggers the worker processes to process them. When it reads a None element from _taskqueue, this indicates the process pool is about to be terminated: the thread stops handling task requests and puts None elements into the receive-task queue and the result queue, notifying the other threads to finish.
  • The _result_handler thread, whose callback is Pool._handle_results, reads the results of completed tasks from the result queue (_outqueue, a Pipe) and places them into the task cache (self._cache):

        self._result_handler = threading.Thread(target=Pool._handle_results,
                                                args=(self._outqueue, self._quick_get,
                                                      self._cache))
  • _terminate: the _terminate here is not a thread, but a Finalize object.

        self._terminate = util.Finalize(
            self, self._terminate_pool,
            args=(self._taskqueue, self._inqueue, self._outqueue, self._pool,
                  self._worker_handler, self._task_handler,
                  self._result_handler, self._cache),
            exitpriority=15)

    The constructor of the Finalize class looks much like the Thread constructor: _terminate_pool is its callback function, and args holds the callback's arguments. The _terminate_pool function shuts down the process pool: it terminates the three threads above, terminates the worker processes in the pool, and clears the data in the queues. But if _terminate is an object rather than a thread, how does it execute the callback _terminate_pool the way a thread does after start()? Looking at the Pool source, we find the pool's termination function:

        def terminate(self):
            debug('terminating pool')
            self._state = TERMINATE
            self._worker_handler._state = TERMINATE
            self._terminate()

    The last line invokes the _terminate object as if it were a function. _terminate is a Finalize object, and looking at the definition of the Finalize class we discover that it implements the __call__ method:

        def __call__(self, wr=None):
            try:
                del _finalizer_registry[self._key]
            except KeyError:
                sub_debug('finalizer no longer registered')
            else:
                if self._pid != os.getpid():
                    res = None
                else:
                    res = self._callback(*self._args, **self._kwargs)
                self._weakref = self._callback = self._args = \
                                self._kwargs = self._key = None
                return res

    The statement res = self._callback(*self._args, **self._kwargs) is what executes the _terminate_pool function and terminates the process pool.
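The __call__ pattern used by Finalize is plain Python and easy to reproduce. Below is a deliberately simplified sketch (MiniFinalize is an illustrative name; the real multiprocessing.util.Finalize also maintains a module-level registry and supports weakrefs and exit priorities, all omitted here):

```python
import os

class MiniFinalize(object):
    # A stripped-down sketch of multiprocessing.util.Finalize: an object that
    # stores a callback plus its arguments and runs them when the object
    # itself is called, thanks to __call__.
    def __init__(self, callback, args=(), kwargs=None):
        self._callback = callback
        self._args = args
        self._kwargs = kwargs or {}
        self._pid = os.getpid()

    def __call__(self, wr=None):
        if self._callback is None:
            return None              # already finalized once
        if self._pid != os.getpid():
            return None              # never run in a different (forked) process
        res = self._callback(*self._args, **self._kwargs)
        # Drop the references so the callback can only ever run once.
        self._callback = self._args = self._kwargs = None
        return res

events = []
terminate = MiniFinalize(events.append, args=("pool terminated",))
terminate()   # invoked like a function -- this is what self._terminate() does
terminate()   # second call is a no-op
print(events)
```

Clearing the stored callback after the first call is what makes the finalizer idempotent, so terminate() can safely be called both explicitly and again at interpreter exit.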

The data structures in the process pool, and the relationships between its threads, are as follows:

"1" Here is for CPU-intensive programs, multithreading does not bring an increase in efficiency, but also may be due to frequent thread switching, resulting in efficiency degradation, if it is IO-intensive, multi-threaded process can take advantage of IO blocking waiting idle time to execute other threads, improve efficiency.

To be Continued ...

