How to ensure that child processes exit at the same time, without becoming orphan processes, when the main process is killed (III)

Tags: signal handler

The previous two articles discussed how to kill the child processes when the main process exits unexpectedly. In this one we look at the process pool, multiprocessing.Pool, and ask the same question: when the main process quits unexpectedly, how do we make sure the worker processes in the pool exit with it, leaving no orphan processes? If you are not familiar with the standard library's process pool, the earlier articles cover it. Let's start by using a process pool in the main process and seeing whether the worker processes exit:

import time
import os
import signal
from multiprocessing import Pool


def fun(x):
    print 'current sub-process pid is %s' % os.getpid()
    while True:
        print 'args is %s' % x
        time.sleep(1)


def term(sig_num, addtion):
    print 'current pid is %s, group id is %s' % (os.getpid(), os.getpgrp())
    os.killpg(os.getpgid(os.getpid()), signal.SIGKILL)


if __name__ == '__main__':
    print 'current pid is %s' % os.getpid()
    mul_pool = Pool()
    signal.signal(signal.SIGTERM, term)

    for i in range(3):
        mul_pool.apply_async(func=fun, args=(str(i),))

Running the code above, the program exited before I even had time to send SIGTERM with the kill command; the main process and the worker processes in the pool all quit. Reasoning by analogy with threads, I guessed that worker processes are created in daemon mode by default. The source confirms it: each worker process is set to daemon = True before being started, which means the main process does not wait for the workers to exit, and as children of the main process they are terminated when it exits. The relevant part of the source:

w = self.Process(target=worker,
                 args=(self._inqueue, self._outqueue,
                       self._initializer,
                       self._initargs, self._maxtasksperchild)
                 )
self._pool.append(w)
w.name = w.name.replace('Process', 'PoolWorker')
w.daemon = True
w.start()
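As a side experiment, the effect of the daemon flag on a plain Process can be seen in isolation. Here is a minimal sketch of my own (not from the pool source): a daemonic child is terminated as soon as the main process exits, while with daemon = False the main process would wait for it.

import os
import time
from multiprocessing import Process


def child():
    while True:
        print 'child %s still running' % os.getpid()
        time.sleep(1)


if __name__ == '__main__':
    p = Process(target=child)
    p.daemon = True    # flip this to False and the main process waits forever
    p.start()
    time.sleep(3)
    # the main process exits here; the daemonic child is terminated with it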

I then edited the source by hand, set the worker daemon flag to False, and started the program again: the behavior was exactly the same, everything (main process and children) exited right after startup. Strange. Does daemon mean something different for processes than for threads? Going back to the pool internals analyzed in the previous two articles, I noticed that the pool's helper threads are also set daemon = True before starting. Modifying the source once more, setting the thread daemon flag to False, and starting again: this time the program kept running, the main process did not exit, and after sending SIGTERM with kill the whole process group exited. In real code, of course, we cannot patch the standard library. Pool provides a join method that waits for the pool's threads and worker processes; note that close must be called before join, which ensures the pool no longer accepts new tasks. Let's adjust the code above:

if __name__ == '__main__':
    print 'current pid is %s' % os.getpid()
    mul_pool = Pool()
    signal.signal(signal.SIGTERM, term)

    for i in range(3):
        mul_pool.apply_async(func=fun, args=(str(i),))

    mul_pool.close()
    mul_pool.join()

After this change the program no longer exits on its own, but a new problem appears: sending the kill command to the process does nothing; the signal is never caught and the program keeps running. A similar question on StackOverflow quotes the standard library documentation for signal:

    A Python signal handler does not get executed inside the low-level (C) signal handler. Instead, the low-level signal handler sets a flag which tells the virtual machine to execute the corresponding Python signal handler at a later point (for example at the next bytecode instruction). This has consequences:

    • It makes little sense to catch synchronous errors like SIGFPE or SIGSEGV that are caused by an invalid operation in C code. Python will return from the signal handler to the C code, which is likely to raise the same signal again, causing Python to apparently hang. From Python 3.3 onwards, you can use the faulthandler module to report on synchronous errors.
    • A long-running calculation implemented purely in C (such as regular expression matching on a large body of text) may run uninterrupted for an arbitrary amount of time, regardless of any signals received. The Python signal handlers will be called when the calculation finishes.

In short: the low-level C handler merely sets a flag, and the interpreter runs the Python-level handler later, at a bytecode boundary. That is why synchronous faults raised from C code, such as SIGFPE or SIGSEGV, are hard to catch (Python returns from the handler into the C code, which likely raises the same signal again, so Python appears to hang), and why a long computation implemented purely in C, such as a regex match over a large text, runs to completion regardless of pending signals, with the Python handler firing only once the computation finishes.
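To make the deferral concrete, here is a small sketch of my own illustrating the same mechanism under Python 2.7: while the main thread is blocked in Thread.join(), a C-level wait, sending SIGTERM only sets the flag; the Python handler runs only after join() returns.

import os
import signal
import threading
import time


def handler(sig_num, frame):
    print 'Python-level handler runs only now'


def worker():
    time.sleep(30)    # keeps the main thread blocked in join() below


if __name__ == '__main__':
    signal.signal(signal.SIGTERM, handler)
    t = threading.Thread(target=worker)
    t.start()
    print 'pid %s blocking in join(); send SIGTERM now' % os.getpid()
    t.join()    # blocking C-level call; the handler cannot fire until it returns
    print 'join() returned; the handler fired just before this line'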

Calling mul_pool.join blocks the main process (really its main thread) at the join, i.e., inside the C-level pthread_join call. pthread_join is not a long-running calculation but a blocking system call; either way, until it returns, the Python signal handler cannot execute. The workaround suggested in the post is to upgrade to Python 3.3, but I am using Python 2.7, and rather than moving to 3.3 I replaced the join with a sleep loop, with a small change to the code above:

if __name__ == '__main__':
    print 'current pid is %s' % os.getpid()
    mul_pool = Pool()
    signal.signal(signal.SIGTERM, term)

    for i in range(3):
        mul_pool.apply_async(func=fun, args=(str(i),))

    while True:
        time.sleep(60)

After receiving SIGTERM, the whole process group now exits and no orphan process is left behind. But this approach is somewhat crude: if a worker is running important business logic, forcing it to die may lose data or cause other consequences that are hard to recover from. Is there a more graceful way, letting each worker finish processing its current round of data before exiting? Yes. The standard library provides several tools for inter-process synchronization; here we use an Event object. First we obtain an Event through the multiprocessing.Manager class and use it to control when the workers exit, starting with the worker function:

def fun(x, event):
    while not event.is_set():
        print 'process %s running, args is %s' % (os.getpid(), x)
        time.sleep(3)
    print 'process %s, call fun finish' % os.getpid()

The Event object is what controls the worker; the code is of course just a simple example, and real workers are rarely a bare while loop. Since the worker exits on its own once event.is_set() == True, all we have to do is catch SIGTERM and set the event in the signal handler:

def terminate(pool, event, sig_num, addtion):
    print 'terminate process %d' % os.getpid()
    if not event.is_set():
        event.set()

    pool.close()
    pool.join()

    print 'exit ...'

In the main process, first create a Manager object and let it produce the Event. Note that once the Manager exists, ps shows one extra process in the background: creating a Manager actually starts a new process whose job is to hold and synchronize the shared data. We set the event and shut the pool down inside the signal handler; and since a signal.signal callback receives exactly two arguments, we again use functools.partial to bind the extra ones:

# additionally needed now: Manager for the shared Event, functools for partial
import functools
from multiprocessing import Manager

if __name__ == '__main__':
    print 'current pid is %s' % os.getpid()
    mul_pool = Pool()
    manager = Manager()
    event = manager.Event()

    handler = functools.partial(terminate, mul_pool, event)
    signal.signal(signal.SIGTERM, handler)

    for i in range(4):
        mul_pool.apply_async(func=fun, args=(str(i), event))

    while True:
        time.sleep(60)

Run the program and send SIGTERM with the kill command: the signal is caught, event.set() executes, the worker processes exit, and the pool closes. But ps then shows two processes still alive: the main process and, identified via its process id and the strace command, the Manager's synchronization process. The reason is in the code: the main process ends up in its sleep loop, so when the signal arrives the workers and the pool are shut down through the event, but the main process just goes back to sleeping and the Manager process is never shut down. A simple fix: add a manager parameter to terminate, call manager.shutdown() there to stop the synchronization process, and then force the exit. Alternatively, use the event in the main process to replace the while True loop (a sketch of this follows the code below). Here we take the first route and modify the code above:

def terminate(pool, event, manager, sig_num, addtion):
    print 'terminate process %d' % os.getpid()
    if not event.is_set():
        event.set()

    pool.close()
    pool.join()
    manager.shutdown()
    print 'exit ...'
    os._exit(0)


if __name__ == '__main__':
    print 'current pid is %s' % os.getpid()
    mul_pool = Pool()
    manager = Manager()
    event = manager.Event()

    handler = functools.partial(terminate, mul_pool, event, manager)
    signal.signal(signal.SIGTERM, handler)

    for i in range(4):
        mul_pool.apply_async(func=fun, args=(str(i), event))

    while True:
        time.sleep(60)
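For the second option, a possible sketch (my own rearrangement, reusing the fun worker from above) lets the main process watch the event instead of sleeping forever; the handler then only sets the event, and all cleanup moves into the main flow, so neither os._exit nor any cleanup inside the handler is needed:

import functools
import signal
import time
from multiprocessing import Pool, Manager


def terminate(event, sig_num, addtion):
    # only flag the shutdown here; cleanup happens in the main flow below
    if not event.is_set():
        event.set()


if __name__ == '__main__':
    mul_pool = Pool()
    manager = Manager()
    event = manager.Event()

    signal.signal(signal.SIGTERM, functools.partial(terminate, event))

    for i in range(4):
        mul_pool.apply_async(func=fun, args=(str(i), event))

    while not event.is_set():    # replaces the while True / time.sleep(60) loop
        time.sleep(1)

    mul_pool.close()     # workers have seen the event and finish their round
    mul_pool.join()
    manager.shutdown()   # stop the Manager's synchronization process
    print 'main process exits cleanly'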

 
