Analysis of the Python process pool (1)

Last Update:2015-06-09 Source: Internet

Author: User

Python has two commonly used modules for working with processes: subprocess and multiprocessing. subprocess is usually used to run external programs, such as third-party applications, rather than Python code. For calling external programs, the psutil module can be a better choice: it supports what subprocess provides and can also monitor the current host or the programs it started, for example by reporting network, CPU, and memory usage, which gives more complete support for automated operations and maintenance (O&M).

multiprocessing is Python's multi-process module. It starts Python processes and calls a target callback function to handle a task. It mirrors threading, Python's multi-thread module, and the two have similar interfaces: create a multiprocessing.Process or a threading.Thread, specify the target function, and call start() to run the process or thread.

Because of the global interpreter lock (GIL), multithreading does not greatly improve running efficiency for CPU-bound Python code [1]. So when handling that kind of concurrency in Python, prefer multiple processes over multiple threads.

In concurrent programming, the simplest model is that the main process waits for tasks and starts a new process whenever a task arrives. Every task then gets its own process, which is created, run, and destroyed for that task alone. If the task runs only briefly, creation and destruction account for a large share of the total time. We should avoid this extra per-process overhead where we can. A process pool reduces the cost of creating and destroying processes by reusing the same process objects, and Python already ships a capable one: multiprocessing.Pool.
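The similarity between the two interfaces can be sketched as follows. The helper names greet and run_with are made up for this illustration; only Thread, Process, and the two queue types come from the standard library.

```python
import multiprocessing
import queue
import threading


def greet(name, results):
    # Runs inside the new thread or process and reports back through a queue.
    results.put("hello from %s" % name)


def run_with(runner_cls, queue_cls, name):
    """Start greet() via Thread or Process; the call shape is the same."""
    results = queue_cls()
    runner = runner_cls(target=greet, args=(name, results))
    runner.start()
    message = results.get(timeout=30)  # fetch the message before join()
    runner.join()
    return message


if __name__ == "__main__":
    print(run_with(threading.Thread, queue.Queue, "a thread"))
    print(run_with(multiprocessing.Process, multiprocessing.Queue, "a process"))
```

The `__main__` guard matters for the process variant on platforms that start child processes by re-importing the main module.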
Here we briefly analyze how Python's built-in process pool is implemented. To create a process pool object, call the Pool function, which is declared as follows:

    Pool(processes=None, initializer=None, initargs=(), maxtasksperchild=None)

processes: the number of worker processes. The default is None, which means the number of worker processes is cpu_count().
initializer: the initialization function called when a worker process starts.
initargs: the arguments for the initializer function. If initializer is not None, initializer(*initargs) is called when each worker process starts.
maxtasksperchild: the number of tasks each worker process completes before it exits and is replaced by a new process. The default is None, which means a worker process lives as long as the pool and is never automatically replaced.

The function returns a process pool object, which has the following data structures:

self._inqueue: the receiving task queue (a pipe), used by the main process to send tasks to the worker processes
self._outqueue: the result queue (a pipe), used by the worker processes to send results to the main process
self._taskqueue: the synchronized task queue, which holds the tasks that the main process submits to the pool
self._cache = {}: the task cache
self._processes: the number of worker processes
self._pool = []: the list of worker processes

While the process pool is working, receiving tasks, distributing tasks, and returning results are all handled by threads inside the pool.
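A minimal sketch of these parameters in use, assuming a script run directly with a standard CPython. The names init_worker, tag, and run_pool are hypothetical. Each task returns the PID of the worker that ran it, so the effect of maxtasksperchild becomes visible.

```python
import multiprocessing
import os

_prefix = None  # set in each worker process by the initializer


def init_worker(prefix):
    # Called once in every worker process, before it takes any task.
    global _prefix
    _prefix = prefix


def tag(n):
    # Return a label plus the PID of the worker that handled this task.
    return "%s-%d" % (_prefix, n), os.getpid()


def run_pool(maxtasksperchild=None):
    pool = multiprocessing.Pool(
        processes=2,
        initializer=init_worker,
        initargs=("job",),
        maxtasksperchild=maxtasksperchild,
    )
    try:
        # chunksize=1 makes every item count as one task.
        results = pool.map(tag, range(6), chunksize=1)
    finally:
        pool.close()
        pool.join()
    labels = [label for label, _ in results]
    pids = {pid for _, pid in results}
    return labels, pids


if __name__ == "__main__":
    print(run_pool())                    # at most two distinct PIDs
    print(run_pool(maxtasksperchild=1))  # workers are replaced, so more PIDs
```

With the default maxtasksperchild=None the two workers live as long as the pool. With maxtasksperchild=1 each worker exits after one task and the pool starts a replacement.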
Now let's look at the threads inside the process pool.

The _worker_handler thread makes sure that when a worker process exits, a new worker process is created and added to the process list (_pool), so that the number of worker processes in the pool always equals processes. Its thread callback is the Pool._handle_workers method:

    self._worker_handler = threading.Thread(target=Pool._handle_workers, args=(self,))

While the state of the _worker_handler thread is RUN, Pool._handle_workers calls the _maintain_pool method in a loop:

    def _maintain_pool(self):
        if self._join_exited_workers():
            self._repopulate_pool()

_join_exited_workers() checks whether any process in _pool has ended. If so, it waits for that process to finish and removes it from _pool. When a process has ended, _repopulate_pool() is called to create a new one:

    w = self.Process(target=worker, args=(self._inqueue, self._outqueue, self._initializer, self._initargs, self._maxtasksperchild))
    self._pool.append(w)

w is the newly created process that handles the actual tasks, and worker is its callback function:
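The bookkeeping described above can be modelled without starting any processes. The following is a toy sketch, not the library code: MiniPool and FakeWorker are invented names, and FakeWorker only imitates the exitcode attribute that a real process object exposes (None while the process is alive).

```python
class FakeWorker:
    """Stand-in for a worker process; exitcode stays None while it is alive."""

    def __init__(self):
        self.exitcode = None

    def join(self):
        pass  # a real process would be waited for here


class MiniPool:
    """Models only how the pool keeps its worker count constant."""

    def __init__(self, processes):
        self._processes = processes
        self._pool = []
        self._repopulate_pool()

    def _join_exited_workers(self):
        # Remove finished workers; report whether any were removed.
        cleaned = False
        for i in reversed(range(len(self._pool))):
            worker = self._pool[i]
            if worker.exitcode is not None:
                worker.join()
                del self._pool[i]
                cleaned = True
        return cleaned

    def _repopulate_pool(self):
        # Top the pool back up to the configured number of workers.
        for _ in range(self._processes - len(self._pool)):
            self._pool.append(FakeWorker())

    def _maintain_pool(self):
        if self._join_exited_workers():
            self._repopulate_pool()


if __name__ == "__main__":
    pool = MiniPool(processes=3)
    pool._pool[0].exitcode = 0   # pretend one worker has exited
    pool._maintain_pool()
    print(len(pool._pool))       # the pool is back to three workers
```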

    def worker(inqueue, outqueue, initializer=None, initargs=(), maxtasks=None):
        assert maxtasks is None or (type(maxtasks) == int and maxtasks > 0)
        put = outqueue.put
        get = inqueue.get
        if hasattr(inqueue, '_writer'):
            inqueue._writer.close()
            outqueue._reader.close()

        if initializer is not None:
            initializer(*initargs)

        completed = 0
        while maxtasks is None or (maxtasks and completed < maxtasks):
            try:
                task = get()
            except (EOFError, IOError):
                debug('worker got EOFError or IOError -- exiting')
                break

            if task is None:
                debug('worker got sentinel -- exiting')
                break

            job, i, func, args, kwds = task
            try:
                result = (True, func(*args, **kwds))
            except Exception as e:
                result = (False, e)
            try:
                put((job, i, result))
            except Exception as e:
                wrapped = MaybeEncodingError(e, result[1])
                debug("Possible encoding error while sending result: %s" % (
                    wrapped))
                put((job, i, (False, wrapped)))
            completed += 1
        debug('worker exiting after %d tasks' % completed)

All worker processes use the worker callback function to process tasks in the same way. From the source code we can see what worker does: it reads a task from the receiving task queue (inqueue), calls the task's function with its arguments (result = (True, func(*args, **kwds))), and puts the result into the result queue (outqueue). If maxtasks limits the number of tasks, the process exits once it has completed that many tasks.

The _task_handler thread takes tasks from the pool's task queue (_taskqueue) and puts them into the receiving task queue (a pipe):

    self._task_handler = threading.Thread(target=Pool._handle_tasks, args=(self._taskqueue, self._quick_put, self._outqueue, self._pool))

The Pool._handle_tasks method keeps fetching tasks from _taskqueue and putting them into the receiving task queue (inqueue), which triggers the worker processes to handle them. Reading a None element from _taskqueue means that the pool is being terminated and will handle no further task requests. The method then puts None into the receiving task queue and the result queue to stop the workers and the result-handling thread.
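The flow from task queue to worker to result queue can be tried out with plain threads and queue.Queue objects. This is a simplified sketch under stated assumptions, not the library code: it leaves out the initializer, the encoding-error handling, and the sentinel for the result handler, and run_pipeline is a hypothetical driver written for this example.

```python
import queue
import threading


def worker(inqueue, outqueue, maxtasks=None):
    # Simplified worker loop: run tasks until the None sentinel or the limit.
    completed = 0
    while maxtasks is None or completed < maxtasks:
        task = inqueue.get()
        if task is None:
            break
        job, i, func, args, kwds = task
        try:
            result = (True, func(*args, **kwds))
        except Exception as e:
            result = (False, e)
        outqueue.put((job, i, result))
        completed += 1


def handle_tasks(taskqueue, inqueue, n_workers):
    # Forward tasks until the None sentinel, then tell every worker to stop.
    for task in iter(taskqueue.get, None):
        inqueue.put(task)
    for _ in range(n_workers):
        inqueue.put(None)


def run_pipeline(calls, n_workers=2):
    taskqueue, inqueue, outqueue = queue.Queue(), queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=worker, args=(inqueue, outqueue))
               for _ in range(n_workers)]
    threads.append(threading.Thread(target=handle_tasks,
                                    args=(taskqueue, inqueue, n_workers)))
    for t in threads:
        t.start()
    for i, (func, args) in enumerate(calls):
        taskqueue.put((0, i, func, args, {}))  # (job, i, func, args, kwds)
    taskqueue.put(None)                        # no more tasks
    for t in threads:
        t.join()
    results = {}
    while not outqueue.empty():                # safe: all producers have exited
        job, i, result = outqueue.get()
        results[i] = result
    return results


if __name__ == "__main__":
    print(run_pipeline([(pow, (2, 5)), (divmod, (7, 0))]))
```

A failing call does not stop the worker: the exception travels back as (False, exception), the same shape the worker source above uses.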

The _result_handler thread reads the processed task results from the outqueue (a pipe) and stores them in the task cache:

    self._result_handler = threading.Thread(target=Pool._handle_results, args=(self._outqueue, self._quick_get, self._cache))

Finally there is _terminate. It is not a thread but a Finalize object:

    self._terminate = Finalize(self, self._terminate_pool, args=(self._taskqueue, self._inqueue, self._outqueue, self._pool, self._worker_handler, self._task_handler, self._result_handler, self._cache), exitpriority=15)
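From the caller's side, the result path described above surfaces as the object returned by apply_async: its get() method blocks until the result-handling thread has delivered the value. The sketch below uses multiprocessing.pool.ThreadPool, which offers the same interface as Pool, purely as an assumption of convenience so that it runs without pickling or `__main__` concerns. The names square and collect_async are hypothetical.

```python
from multiprocessing.pool import ThreadPool


def square(x):
    return x * x


def collect_async(values):
    pool = ThreadPool(processes=2)
    try:
        # Each call returns immediately with a handle for a pending result.
        handles = [pool.apply_async(square, (v,)) for v in values]
        # get() waits until the result for that handle has arrived.
        return [h.get(timeout=30) for h in handles]
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    print(collect_async([1, 2, 3]))
```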

The Finalize constructor is similar to the thread constructor: _terminate_pool is its callback function and args holds the callback's arguments. The _terminate_pool function shuts the process pool down: it terminates the three threads above, terminates the worker processes in the pool, and clears the data in the queues.

_terminate is an object rather than a thread, so how does its callback _terminate_pool get executed, the way a thread's callback runs after start() is called? The Pool source code contains the pool's termination function:

    def terminate(self):
        debug('terminating pool')
        self._state = TERMINATE
        self._worker_handler._state = TERMINATE
        self._terminate()

The last statement of the function calls the _terminate object as if it were a method, although _terminate is a Finalize object. The definition of the Finalize class shows that it implements the __call__ method:

    def __call__(self, wr=None):
        try:
            del _finalizer_registry[self._key]
        except KeyError:
            sub_debug('finalizer no longer registered')
        else:
            if self._pid != os.getpid():
                res = None
            else:
                res = self._callback(*self._args, **self._kwargs)
            self._weakref = self._callback = self._args = \
                            self._kwargs = self._key = None
            return res

In this method, the statement self._callback(*self._args, **self._kwargs) executes the _terminate_pool function and so terminates the process pool.
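The call-once behaviour of a Finalize object can be observed directly with multiprocessing.util.Finalize. In this small sketch, Resource, shut_down, and demo are hypothetical names standing in for the pool and its _terminate_pool callback.

```python
from multiprocessing.util import Finalize


class Resource:
    """Hypothetical owner object; Finalize holds only a weak reference to it."""


def shut_down(log, name):
    # Stand-in for a cleanup callback such as _terminate_pool.
    log.append("closed " + name)
    return "done"


def demo():
    log = []
    owner = Resource()
    finalizer = Finalize(owner, shut_down, args=(log, "pool"), exitpriority=15)
    first = finalizer()   # __call__ runs shut_down(log, "pool")
    second = finalizer()  # already unregistered, so the callback is skipped
    return first, second, log


if __name__ == "__main__":
    print(demo())
```

The second call takes the KeyError branch shown in the source above and returns None, so calling terminate() twice does not run the cleanup twice.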

This article is an English version of an article originally written in Chinese on aliyun.com and is provided for information purposes only.
