I was once tormented by the system() function because I did not understand it well. Knowing that this function executes a system command is far from enough: what matters is its return value, the return value of the command it runs, and the reason a command fails. At the time I found the function too risky, so I gave it up and switched to another approach. I will not go into what I use instead; the important thing is to understand system(), because plenty of people still use it and sometimes you have to deal with it.
Let's start with a brief introduction to the system() function:
#include <stdlib.h>
int system(const char *command);
system() executes a command specified in command by calling /bin/sh -c command, and returns after the command has been completed. During execution of the command, SIGCHLD will be blocked, and SIGINT and SIGQUIT will be ignored.
The system() function calls /bin/sh to execute the command given by its argument. /bin/sh is usually a symbolic link to a specific shell such as bash, and the -c option tells the shell to read the command from the string command;
While the command runs, SIGCHLD is blocked. It is like telling the kernel: please do not deliver SIGCHLD to me right now, I am busy, hold it until later;
While the command runs, SIGINT and SIGQUIT are ignored, meaning the process takes no action when it receives either signal.
Now look at the return value of the system() function:
The value returned is -1 on error (e.g. fork(2) failed), and the return status of the command otherwise. This latter return status is in the format specified in wait(2). Thus, the exit code of the command will be WEXITSTATUS(status). In case /bin/sh could not be executed, the exit status will be that of a command that does exit(127).
If the value of command is NULL, system() returns nonzero if the shell is available, and zero if not.
To better understand the return value of system(), you need to understand how it executes. The system() function actually performs three steps:
1. fork a child process;
2. Call the exec function in the child process to execute the command;
3. Call wait in the parent process to wait for the child process to end.
If fork() fails, system() returns -1.
If exec succeeds and the command runs to completion, system() returns the termination status of the command; the value the command passed to exit or returned from main is obtained with WEXITSTATUS(status).
(Note that running to completion is not the same as succeeding. Take the command "rm debuglog.txt": whether or not the file exists, the command has run to completion.)
If exec fails because the shell could not be executed, or if the command does not exist at all, the exit code carried in the returned status is 127. A command killed by a signal is reported differently: WIFSIGNALED(status) is true for the returned status.
If command is NULL, system() returns a nonzero value, typically 1.
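To make these cases concrete, here is a minimal sketch of how the command's exit code can be separated from the value system() returns. The helper name exit_code_of is hypothetical, and the sketch assumes a Linux system where /bin/sh is available.

```c
#include <stdlib.h>
#include <sys/wait.h>

/* Hypothetical helper: run cmd through system() and return the command's
 * exit code (0..255), or -1 if system() itself failed or the command did
 * not terminate normally. */
int exit_code_of(const char *cmd)
{
    int status = system(cmd);

    if (status == -1)          /* fork() or waitpid() failed */
        return -1;
    if (!WIFEXITED(status))    /* for example, killed by a signal */
        return -1;
    return WEXITSTATUS(status);
}
```

With this helper, exit_code_of("exit 3") yields 3, while the raw value returned by system("exit 3") is the wait status and has to be decoded with the macros from sys/wait.h before it can be compared with 3.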
Both popen() and system() can execute external commands.
popen() is equivalent to creating a pipe, calling fork, closing one end of the pipe, calling exec, and returning a standard I/O file pointer.
system() is equivalent to calling fork, exec, and waitpid in sequence to execute the external command.
popen() itself does not block; the caller blocks only when it reads from the standard I/O stream.
system() itself blocks.
Look at the source code of the system() function
Even after reading all this, I am sure some readers are still unclear about the return value of system(). Reading the source is the clearest way to understand it, so here is an implementation of the system() function:
int system(const char *cmdstring)
{
    pid_t pid;
    int status;
    if (cmdstring == NULL)
        return (1);    /* If cmdstring is NULL, return a nonzero value, typically 1 */
    if ((pid = fork()) < 0) {
        status = -1;    /* fork failed: return -1 */
    } else if (pid == 0) {
        execl("/bin/sh", "sh", "-c", cmdstring, (char *)0);
        _exit(127);    /* exec failed: exit with 127. Note that exec returns to the current process only if it fails; if it succeeds, the current process image no longer exists */
    } else {    /* parent process */
        while (waitpid(pid, &status, 0) < 0)
            if (errno != EINTR) {
                status = -1;    /* waitpid failed with an error other than EINTR: return -1 */
                break;
            }
    }
    return status;    /* If waitpid succeeds, return the termination status of the child process */
}
After reading this simple implementation of system(), its return value should be clear. So when does system() return 0? Only when the command itself returns 0.
Now look at how to monitor the execution status of the system() function
Here is what I do:
if (NULL == cmdstring)    // Bail out early if cmdstring is NULL, although system() can handle a null pointer
    return;
status = system(cmdstring);
if (status < 0) {
    printf("cmd: %s\t error: %s", cmdstring, strerror(errno));    // Be sure to output or log the errno information here
    return;
}
if (WIFEXITED(status))
    printf("normal termination, exit status = %d\n", WEXITSTATUS(status));    // Get the result of executing cmdstring
else if (WIFSIGNALED(status))
    printf("abnormal termination, signal number = %d\n", WTERMSIG(status));    // If cmdstring was terminated by a signal, get the signal number
else if (WIFSTOPPED(status))
    printf("process stopped, signal number = %d\n", WSTOPSIG(status));    // If cmdstring was stopped by a signal, get the signal number
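For readers who want something they can compile, the same checks can be packaged as a small function. This is a sketch under my own assumptions: the name run_and_report and the labels it returns are hypothetical, not part of any library. The stopped case is left out because a waitpid() call without WUNTRACED does not report stopped children.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>

/* Hypothetical helper: run cmd via system(), log what happened, and return
 * a short label describing the outcome. */
const char *run_and_report(const char *cmd)
{
    int status;

    if (cmd == NULL)
        return "no command";

    status = system(cmd);
    if (status < 0) {
        /* system() itself failed: errno explains why */
        fprintf(stderr, "cmd: %s\terror: %s\n", cmd, strerror(errno));
        return "system failed";
    }
    if (WIFEXITED(status)) {
        printf("normal termination, exit status = %d\n", WEXITSTATUS(status));
        return WEXITSTATUS(status) == 0 ? "ok" : "command failed";
    }
    if (WIFSIGNALED(status)) {
        printf("abnormal termination, signal number = %d\n", WTERMSIG(status));
        return "killed by signal";
    }
    return "unknown";
}
```

A caller can log the returned label next to its own message, so that a failure is never reported without its cause.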
For a description of how to obtain the return value of the child process, see another article: http://my.oschina.net/renhc/blog/35116
The system() function is error-prone: it has too many possible return values, and its own return value is easily confused with the return value of the command. It is recommended to use the popen() function instead; the simple use of popen() is also covered in the link above.
The advantage of popen() over system() is that it is simple to use. popen() returns NULL on failure, and closing the stream with pclose() yields only two kinds of value:
On success, the status of the child process is returned, and the command's result can be obtained with WIFEXITED and the related macros;
On failure, -1 is returned, and perror() or strerror() gives useful error information.
This article covers only the simple use of system() and does not discuss the effects of SIGCHLD, SIGINT, and SIGQUIT on it. In fact, I wrote it today because someone's use of system() in our project caused a very serious incident: when system() is executed, it now fails with the error "No child processes".
For an analysis of this error, interested readers can take a look: http://my.oschina.net/renhc/blog/54582
End of the first article. The second article follows below the dividing line.
——————————————————————————————————————-
Today, a program that had been running for nearly a year suddenly failed, and the problem was traced to the system() function. The simple use of this function is described in my previous article:
http://my.oschina.net/renhc/blog/53580
First, the problem.
A simple wrapper around the system() function:
int pox_system(const char *cmd_line)
{
    return system(cmd_line);
}
Function call:
int ret = pox_system("gzip -c /var/opt/i00005.xml > /var/opt/i00005.z");
if (0 != ret) {
    Log("Zip file failed\n");
}
Symptom: every time execution reached this point, the zip failed. Yet copying the command out and running it directly in a shell always worked; in fact, this code had been running for a long time without ever having a problem.
A bad log
When we analyzed the log, all we could see was our own custom message, "Zip file failed"; there was no clue as to why it failed.
Well, let's try to get more clues first:
int ret = pox_system("gzip -c /var/opt/i00005.xml > /var/opt/i00005.z");
if (0 != ret) {
    Log("Zip file failed: %s\n", strerror(errno));    // Try to print the system error message
}
After adding the log, we got a very useful clue from the errno set by system(): it failed with "No child processes". We continued looking for the root cause.
Who changed errno?
From the clue above we know that system() set errno to ECHILD, yet the man page for system() says nothing about ECHILD. We know that system() executes as fork()->exec()->waitpid(), so waitpid() is obviously the prime suspect. Let's check its man page to see whether it can set ECHILD:
ECHILD
(for waitpid() or waitid()) The process specified by pid (waitpid()) or idtype and id (waitid()) does not exist or is not a child of the calling process. (This can happen for one's own child if the action for SIGCHLD is set to SIG_IGN. See also the Linux Notes section about threads.)
Sure enough, if the disposition of SIGCHLD is set to SIG_IGN, waitpid() may fail with ECHILD because the child process can no longer be found. It seems we have found the fix: before calling system(), reset SIGCHLD to its default disposition, that is, signal(SIGCHLD, SIG_DFL). In our excitement we barely glanced at the Linux Notes section and went straight to adding the code and testing it. Wonderful, the problem was solved!
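The failure is easy to reproduce in isolation. The sketch below is my own reproduction under the assumption of a Linux system with glibc; the helper name system_with_sigchld_ignored is hypothetical. It ignores SIGCHLD, calls system(), and reports both the return value and errno.

```c
#include <errno.h>
#include <signal.h>
#include <stdlib.h>

/* Hypothetical reproduction: call system() while SIGCHLD is ignored.
 * Returns what system() returned and stores the resulting errno in *err_out. */
int system_with_sigchld_ignored(const char *cmd, int *err_out)
{
    void (*old_handler)(int) = signal(SIGCHLD, SIG_IGN);
    int ret;

    errno = 0;
    ret = system(cmd);      /* the child is reaped automatically, so the
                               waitpid() inside system() finds nothing */
    *err_out = errno;

    if (old_handler != SIG_ERR)
        signal(SIGCHLD, old_handler);   /* put the previous disposition back */
    return ret;
}
```

On such a system the call is expected to return -1 with errno set to ECHILD even though the command itself ran, which matches the "No child processes" message in our log.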
How would you handle this problem?
As we hurried to check in the code, a question arose: "Why did this error never happen before?" Indeed, why would a program that had been running well suddenly fail? Our code had not changed, so the cause had to be external. At the thought of an external cause, we began to complain: "Another group's program must be affecting us!" But complaining is useless; if you think so, produce the evidence. In fact, a little static analysis shows that other programs cannot be responsible, because another process cannot change how our process handles signals.
system() had never failed before because it relies on one property of the system: when the kernel initializes a process, the disposition of SIGCHLD is SIG_DFL. What does this mean? When the kernel finds that a child of the process has terminated, it sends SIGCHLD to the process, and the process handles the signal in the SIG_DFL way. And what is SIG_DFL? It is a macro that selects the default action for a signal, and for SIGCHLD the default action does nothing with the signal while leaving the terminated child available to be waited for. This is exactly what system() requires: it first calls fork() to create a child process that executes the command, and when the command finishes it uses waitpid() to collect the child.
From this analysis we can conclude that the SIGCHLD disposition must have been changed before system() was executed and was no longer SIG_DFL. What it was changed to we did not yet know, and in fact we did not need to know. We only need to remember to explicitly set the SIGCHLD disposition to SIG_DFL before calling system(), record the original disposition, and restore it after system() returns. This shields us from system upgrades and from changes to the signal disposition.
Verifying the conjecture
Our company uses continuous integration with an agile development model. A dedicated team runs the automated test cases every day, and each run is called a build. We compared the system version used by this build with that of the previous build and found that the version had indeed been upgraded. So we contacted the relevant team for confirmation and described the problem in detail. They soon replied; here is the original message:
Libgen has added handling of SIGCHLD. It is now ignored, to avoid generating zombie processes.
It seems our guess was right! With the analysis complete, the solution was clear, so we modified our pox_system() function:
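typedef void (*sighandler_t)(int);
int pox_system(const char *cmd_line)
{
    int ret = 0;
    sighandler_t old_handler;
    old_handler = signal(SIGCHLD, SIG_DFL);    /* force the default disposition and remember the old one */
    ret = system(cmd_line);
    signal(SIGCHLD, old_handler);    /* restore the original disposition */
    return ret;
}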
I think this is the perfect way to call system(). Wrapping it in the pox_system() function also makes maintenance much easier: we only need to modify this one function, and none of the call sites have to change.
Later I looked at the other team's modified code, and sure enough found the answer there:
/* Ignore SIGCHLD to avoid zombie process */
if (signal(SIGCHLD, SIG_IGN) == SIG_ERR) {
Other considerations
Our company manages its code with SVN, and by now there are many branches. Gradually the problem above appeared in almost every branch, so I fixed it branch by branch, which kept me busy for nearly a whole day. Some branches were already locked, so merging the code meant finding the right people and explaining how serious the problem was, and the fix also had to be tested in different environments. While doing all this I kept wondering: was it appropriate to upgrade the system this way?
First, the system upgrade made problems surface in our code during testing, and we had to fix them in a hurry, which put us on the back foot. I think this was a mistake on their part. Shouldn't an upgrade take its impact on other teams into account? All the more so for a system upgrade. Before upgrading, you should do a risk assessment and inform everyone of the possible impact; that is the professional way.
Furthermore, according to them, the signal disposition was changed to avoid zombie processes. The intention is good, but such an upgrade affects the use of functions related to child processes, such as system(), wait(), waitpid(), and fork(). If you want to use wait() or waitpid() to collect a child, you must use the method described above: set SIGCHLD to SIG_DFL before the call (in fact, before fork()), and restore the previous disposition after the call (in fact, after wait()/waitpid()). Your system upgrade forces everyone to improve their code, which does raise code quality, but I do not quite agree with this upgrade. Just think: how much code have you seen that sets the SIGCHLD disposition before and after fork()->waitpid()?
Recommendations for using the system() function
A safer way of calling system() is given above, but using system() is still error-prone. Where do the mistakes come from? From the return value of system(), which is introduced in the previous article. The system() function is sometimes convenient, but it should not be abused!
1. It is recommended to use system() only for executing shell commands, because in general a nonzero return value from system() indicates an error;
2. It is recommended to check the value of errno after system() returns, so that more useful information can be given when an error occurs;
3. It is recommended to consider popen() as an alternative to system(); its usage is described in another article of mine.
PS:
1. If waitpid() returns a negative number because it was interrupted by a signal, call waitpid() again.
This includes SIGINT, and it does not violate the POSIX.1 definition.
2. Note on running system() in non-blocking mode: append '&' to put the command in the background, and redirect its output at the same time. Otherwise the call blocks.