Then we tested it in the interactive Python prompt:
>>> import subprocess
>>> subprocess.check_call("false")
0
On another machine, the same code raises the error correctly:
>>> subprocess.check_call("false")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/subprocess.py", line 542, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'false' returned non-zero exit status 1
It appears that, on the first machine, subprocess mistakenly believed the child process had exited successfully.
In-depth analysis
At first glance, the problem seemed to lie in Python itself or in the operating system. How on earth did this happen? My colleague looked at subprocess's wait() method:
def wait(self):
    """Wait for child process to terminate.  Returns returncode
    attribute."""
    while self.returncode is None:
        try:
            pid, sts = _eintr_retry_call(os.waitpid, self.pid, 0)
        except OSError as e:
            if e.errno != errno.ECHILD:
                raise
            # This happens if SIGCLD is set to be ignored or waiting
            # for child processes has otherwise been disabled for our
            # process.  This child is dead, we can't get the status.
            pid = self.pid
            sts = 0
        # Check the pid and loop as waitpid has been known to return
        # 0 even without WNOHANG in odd situations.  issue14396.
        if pid == self.pid:
            self._handle_exitstatus(sts)
    return self.returncode
As you can see, if os.waitpid fails with ECHILD, no error is raised. Normally, when a process finishes, the system keeps a record of it until the parent process calls wait(). During that time the process is called a "zombie". If the record of the child process no longer exists, we have no way of knowing whether it succeeded or failed.
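The loss of exit information can be seen directly with os.waitpid. The following is a minimal sketch, assuming Python 3 on a POSIX system; reap_twice is a hypothetical helper name, not something from our code. The first waitpid() collects the finished child's status, and the second fails with ECHILD because the system no longer holds a record of that child.

```python
import errno
import os


def reap_twice(exit_code=3):
    """Fork a child, then wait for it twice (hypothetical helper)."""
    pid = os.fork()
    if pid == 0:
        # Child: exit immediately without running any cleanup handlers.
        os._exit(exit_code)

    # First wait: the system still holds the child's exit status.
    _, status = os.waitpid(pid, 0)
    first = os.WEXITSTATUS(status) if os.WIFEXITED(status) else None

    # Second wait: the child has already been reaped, so its status is gone.
    try:
        os.waitpid(pid, 0)
        second = None
    except OSError as exc:
        second = exc.errno

    return first, second
```

Calling reap_twice() should return the child's exit code followed by errno.ECHILD, which is the same error that wait() swallows in the excerpt.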
The wait() method also reveals something else: when it cannot obtain the status, Python assumes that the child process exited successfully. In most cases this assumption is fine. However, when a process explicitly ignores SIGCHLD, waitpid() fails with ECHILD every time, so wait() always reports success.
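To see this default in action, the sketch below runs the same failing command twice in a fresh interpreter, once with the default SIGCHLD handling and once with SIGCHLD ignored. It assumes Python 3 on a POSIX system, where subprocess appears to keep the same ECHILD fallback as the 2.7 code. CHILD_SCRIPT and observed_returncode are hypothetical names used only for illustration, and sys.exit(1) stands in for the false command.

```python
import subprocess
import sys

# Hypothetical helper script. It runs in a fresh interpreter so that the
# changed signal disposition does not leak into the calling process.
CHILD_SCRIPT = """
import signal, subprocess, sys
if sys.argv[1] == "ignore":
    signal.signal(signal.SIGCHLD, signal.SIG_IGN)
# Stand-in for the `false` command: a child that exits with status 1.
print(subprocess.call([sys.executable, "-c", "import sys; sys.exit(1)"]))
"""


def observed_returncode(mode):
    """Return the exit status that subprocess reports in the given mode."""
    out = subprocess.check_output([sys.executable, "-c", CHILD_SCRIPT, mode])
    return int(out.decode().strip())


if __name__ == "__main__":
    print("default:", observed_returncode("default"))
    print("ignore: ", observed_returncode("ignore"))
```

With the default handling the failure should be reported as status 1, while with SIGCHLD ignored the same failing child should be reported as 0.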
Back to the original code.
Were we explicitly ignoring SIGCHLD in our program? It seemed unlikely, because we use subprocess a lot, yet the problem occurred only in rare cases. After another git grep, we found that only a single piece of code ignored SIGCHLD. But that code was not part of the program at all; it was a standalone piece of code.
A week later
A week later, the error happened again. This time, with some simple debugging, we reproduced it in the debugger.
After some testing, we confirmed that the bug was caused by the program ignoring SIGCHLD. But how did that happen?
We looked at the standalone code, which included this line:
signal.signal(signal.SIGCHLD, signal.SIG_IGN)
Had we accidentally imported this code into the program? It turned out that our guess was correct. Because the statement above sits at the top level of the module rather than inside a function, it runs as soon as the module is imported. From then on SIGCHLD is ignored, so the child process's failure is never raised!
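The import-time effect is easy to reproduce in isolation. The sketch below, assuming Python 3 on a POSIX system, writes a hypothetical module named standalone_tool.py (our real file is not shown here) into a temporary directory, imports it in a fresh interpreter, and reports whether SIGCHLD ended up ignored.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical stand-in for the standalone code: the signal call sits at
# module top level, so it runs on import.
MODULE_SOURCE = (
    "import signal\n"
    "signal.signal(signal.SIGCHLD, signal.SIG_IGN)\n"
)

# Probe script: import the module, then report the SIGCHLD disposition.
PROBE = (
    "import signal\n"
    "import standalone_tool\n"
    "print(signal.getsignal(signal.SIGCHLD) == signal.SIG_IGN)\n"
)


def sigchld_ignored_after_import():
    """Return True if merely importing the module left SIGCHLD ignored."""
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "standalone_tool.py"), "w") as f:
            f.write(MODULE_SOURCE)
        # Make the temporary module importable in the fresh interpreter.
        env = dict(os.environ, PYTHONPATH=tmp)
        out = subprocess.check_output([sys.executable, "-c", PROBE], env=env)
    return out.decode().strip() == "True"
```

The probe never calls anything in the module; the import alone is enough to change the signal handling for the whole process.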
Summary
This bug taught us two lessons. First, when debugging, check the new code first, then the old code, and only then the Python library. New code is more likely to be wrong than old code, and the Python library is the least likely of the three to contain errors.
Second, do not put code that may cause side effects at the top level of a module; put it in a function instead. When a module is imported, all of its top-level code runs, which can cause all kinds of unexpected behavior.
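One way to apply the second lesson is sketched below, assuming Python 3 on a POSIX system. The function names are hypothetical and the standalone code's real work is not shown in this article; the point is only that the signal call runs when main() is called, not when the module is imported. Restoring the previous handler afterwards is an extra precaution added for this sketch.

```python
import signal


def ignore_sigchld():
    """Ignore SIGCHLD and return the previous handler so it can be restored."""
    return signal.signal(signal.SIGCHLD, signal.SIG_IGN)


def main():
    # The side effect now happens only when main() is called explicitly.
    previous = ignore_sigchld()
    try:
        pass  # the tool's real work would go here (not shown in the article)
    finally:
        # Put the earlier SIGCHLD handling back once the work is done.
        signal.signal(signal.SIGCHLD, previous)


if __name__ == "__main__":
    # Runs when the file is executed as a script, not when it is imported.
    main()
```

With this layout, a program that imports the module keeps its own SIGCHLD handling, and subprocess can report child failures as usual.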