Errors caused by the system() function under Linux

Source: Internet
Author: User
Tags: signal handler

Let's look at the problem first.

We wrap the system() function in a simple helper:

#include <stdlib.h>

int pox_system(const char *cmd_line)
{
    return system(cmd_line);
}

The function is called like this:

int ret = 0;
ret = pox_system("gzip -c /var/opt/I00005.xml > /var/opt/I00005.z");
if (0 != ret)
{
    Log("zip file failed\n");
}

Symptom: every time execution reaches this point, the call reports that the zip failed. Yet the same command always succeeds when it is copied out and run directly in a shell, and this code had in fact been running for a long time without ever causing a problem.

A poor log

When we analyzed the log, all we could see was "zip file failed", the message we had written ourselves. There was no clue as to why it failed.

Well, let's try to dig up more clues first:

int ret = 0;
ret = pox_system("gzip -c /var/opt/I00005.xml > /var/opt/I00005.z");
if (0 != ret)
{
    // Try to print the system error message (needs <errno.h> and <string.h>)
    Log("zip file failed: %s\n", strerror(errno));
}

With the extra logging in place, we got a very useful clue from the errno value set by system(): the call failed with "No child processes". We continued looking for the root cause.

Who changed errno?

From the clue above we know that system() set errno to ECHILD, yet the man page for system() says nothing about ECHILD. We know that system() works roughly as fork() -> exec() -> waitpid(), so waitpid() is clearly the prime suspect. Let's check its man page to see whether it can set ECHILD:

ECHILD (for waitpid() or waitid()) The process specified by pid (waitpid()) or idtype and id (waitid()) does not exist or is not a child of the calling process. (This can happen for one's own child if the action for SIGCHLD is set to SIG_IGN. See also the Linux Notes section about threads.)

Sure enough, if the disposition of SIGCHLD is set to SIG_IGN, waitpid() may fail with ECHILD because it cannot find the child process. It seems we have found a fix: reset SIGCHLD to its default disposition before calling system(), that is, call signal(SIGCHLD, SIG_DFL). Excited, we glanced at the Linux Notes section and went straight to adding the code and testing it. And indeed, the problem was solved!
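The failure described above can be reproduced in isolation. The following is a minimal sketch, assuming Linux with glibc; the helper name system_with_sigchld_ignored() is hypothetical and not part of the original code. It ignores SIGCHLD, calls system(), and reports the errno that system() leaves behind.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdlib.h>

/* Hypothetical helper (not from the original article): runs cmd_line
 * through system() while SIGCHLD is ignored, stores the errno left by
 * system() in *err_out, and restores the previous SIGCHLD disposition
 * before returning system()'s return value. */
int system_with_sigchld_ignored(const char *cmd_line, int *err_out)
{
    assert(cmd_line != NULL && err_out != NULL);

    void (*old_handler)(int) = signal(SIGCHLD, SIG_IGN);
    if (old_handler == SIG_ERR) {
        *err_out = errno;
        return -1;
    }

    errno = 0;
    int ret = system(cmd_line);   /* the command itself still runs */
    *err_out = errno;             /* capture before anything else touches it */

    signal(SIGCHLD, old_handler); /* put the original disposition back */
    return ret;
}
```

On a Linux system the command runs to completion, but the call is expected to return -1 with ECHILD stored in *err_out, because the terminated child is reaped automatically and waitpid() inside system() finds nothing to wait for.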

Is this how you deal with problems?

Just as we were rushing to check in the code, a question came up: why had this error never happened before? How could a program that had been running well suddenly break? Our code had not changed, so the cause had to be external. At the thought of external factors we started to complain: "Some other team's program must be affecting us!" But complaining is useless; if you really believe that, produce the evidence. A little static analysis shows that other programs cannot be the cause: another process cannot change how our process handles its signals.

system() never failed before because it relies on a property of the system: by default, the disposition of SIGCHLD in a process is SIG_DFL. What does that mean? When the kernel finds that a child process has terminated, it sends SIGCHLD to the parent, and the parent handles the signal according to SIG_DFL. And what is SIG_DFL? It is a macro, a special value of signal-handler function-pointer type, that selects the default action for a signal. For SIGCHLD the default action is to discard the signal, while the terminated child is still kept around until the parent waits for it. This is exactly what system() needs: it first calls fork() to create a child process that executes the command, and once the command has finished it uses waitpid() to collect the child.
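The difference between SIG_DFL and SIG_IGN for SIGCHLD can be observed directly with fork() and waitpid(). This is a minimal sketch, assuming Linux; reap_child_under() is a hypothetical name introduced only for illustration.

```c
#define _POSIX_C_SOURCE 200809L  /* expose fork()/waitpid() under strict -std= modes */
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: forks a child that exits immediately and tries to
 * collect it with waitpid() while SIGCHLD has the given disposition.
 * Returns 0 if the child was collected, otherwise the errno of the
 * call that failed. */
int reap_child_under(void (*disposition)(int))
{
    assert(disposition == SIG_DFL || disposition == SIG_IGN);

    void (*old_handler)(int) = signal(SIGCHLD, disposition);
    if (old_handler == SIG_ERR)
        return errno;

    int result = 0;
    pid_t pid = fork();
    if (pid == 0)
        _exit(0);                          /* child: terminate at once */

    if (pid < 0) {
        result = errno;                    /* fork() failed */
    } else {
        int status;
        pid_t waited;
        do {
            waited = waitpid(pid, &status, 0);
        } while (waited < 0 && errno == EINTR);
        if (waited != pid)
            result = errno;                /* ECHILD when SIGCHLD is ignored */
    }

    signal(SIGCHLD, old_handler);          /* restore the caller's setting */
    return result;
}
```

Calling reap_child_under(SIG_DFL) is expected to return 0, while reap_child_under(SIG_IGN) is expected to return ECHILD on Linux, which matches the man page passage about SIG_IGN.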

From the analysis above we can tell that the disposition of SIGCHLD must have changed before system() was called and was no longer SIG_DFL. We did not yet know what it had become, and in fact we do not need to know. We only need to remember to set SIGCHLD explicitly to SIG_DFL before calling system(), saving the original disposition, and to restore the original disposition after system() returns. This shields us from system upgrades and other changes to signal handling.

Verifying the conjecture

Our company uses continuous integration plus agile development. A dedicated team runs the automated test cases every day, and each run is called a build. We compared the system versions used by this build and the previous one and found that the system had indeed been upgraded. So we went to the relevant team to verify this and described the problem in detail. They replied quickly; here is their original message:

"Libgen has added new handling of SIGCHLD: it is now ignored, to avoid creating zombie processes."

It seems our guess was right! With the analysis done, the solution is clear, so we modified our pox_system() function:

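#include <signal.h>
#include <stdlib.h>

typedef void (*sighandler_t)(int);

int pox_system(const char *cmd_line)
{
    int ret = 0;
    sighandler_t old_handler;

    /* Give SIGCHLD its default disposition so that system() can wait
       for its child, then restore whatever was set before. */
    old_handler = signal(SIGCHLD, SIG_DFL);
    ret = system(cmd_line);
    signal(SIGCHLD, old_handler);

    return ret;
}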

I think this is the perfect way to call system(). Wrapping the call in pox_system() also makes the code much easier to maintain: we only had to modify this one function, and none of its callers needed to change.
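The same save-and-restore idea can also be written with sigaction(), which preserves the complete previous action (handler, mask and flags) rather than only the handler. This is a sketch of a possible variant, not the code that was actually deployed; the name pox_system_sigaction() is hypothetical. Note that the disposition is process-wide, so this approach is not safe if other threads fork children or change SIGCHLD at the same time.

```c
#define _POSIX_C_SOURCE 200809L  /* expose sigaction() under strict -std= modes */
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variant of pox_system(): sets SIGCHLD to SIG_DFL with
 * sigaction(), runs the command, then restores the previous action. */
int pox_system_sigaction(const char *cmd_line)
{
    assert(cmd_line != NULL);

    struct sigaction dfl, old;
    memset(&dfl, 0, sizeof dfl);
    dfl.sa_handler = SIG_DFL;
    sigemptyset(&dfl.sa_mask);

    if (sigaction(SIGCHLD, &dfl, &old) != 0)
        return -1;

    int ret = system(cmd_line);
    int saved_errno = errno;        /* keep system()'s errno for the caller */

    sigaction(SIGCHLD, &old, NULL); /* restore handler, mask and flags */
    errno = saved_errno;
    return ret;
}
```

With SIGCHLD set to SIG_IGN beforehand, pox_system_sigaction("true") is expected to return 0, and the disposition is SIG_IGN again once the call returns.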

Later we looked at the other team's modified code and, sure enough, found the answer there:

/* Ignore SIGCHLD to avoid zombie process */
if (signal(SIGCHLD, SIG_IGN) == SIG_ERR) {
    return -1;
} else {
    return 0;
}

Other considerations

Our company manages its code with SVN, and by now there are many branches. Gradually the problem above showed up on almost every branch, so I fixed it on each branch one by one, which kept me busy for almost a whole day. Some branches were already locked, so to merge the code I had to find the people responsible and explain how serious the problem was, and the fix also had to be tested in different environments. While doing all this I kept wondering: was it appropriate to upgrade the system this way?

First, the system upgrade caused problems in our code that surfaced during testing, and we had to rush out a fix, which put us on the back foot. I consider this one of their mistakes. Shouldn't an upgrade take its impact on other teams into account? All the more so for a system upgrade. Before upgrading, you need to assess the risks and tell everyone about the possible impact; that is the professional way to do it.

Furthermore, they say the signal handling was changed to avoid zombie processes. The intention is good, but such an upgrade affects the use of functions related to child processes, such as system(), wait(), waitpid() and fork(). If you want to use wait() or waitpid() to collect a child, you must use the approach described above: set SIGCHLD to SIG_DFL before the call (in fact, before fork()), and restore the previous disposition after the call (in fact, after wait()/waitpid()). Your system upgrade forces everyone to improve their code, and it does improve code quality, but I do not really agree with this upgrade. Just think: how much code have you seen that sets the SIGCHLD disposition before and after fork() -> waitpid()?

Recommendations for using the system() function

A safer way to call system() is given above, but system() is still easy to get wrong. Where? In its return value, which is covered in the previous article. system() is sometimes convenient, but it should not be abused!
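Part of what makes the return value easy to misread is that, on POSIX systems, it is a wait status rather than the command's exit code. A minimal sketch of decoding it follows; command_exit_code() and its -1/-2 return convention are assumptions made for this example only.

```c
#define _POSIX_C_SOURCE 200809L  /* expose the W* macros under strict -std= modes */
#include <assert.h>
#include <stdlib.h>
#include <sys/wait.h>

/* Hypothetical helper: runs cmd_line with system() and decodes the result.
 * Returns the command's exit code (0..255) if the shell exited normally,
 * -1 if system() itself failed (errno then describes why),
 * -2 if the shell was terminated abnormally, for example by a signal. */
int command_exit_code(const char *cmd_line)
{
    assert(cmd_line != NULL);

    int status = system(cmd_line);
    if (status == -1)
        return -1;                  /* no child created, or wait failed */
    if (WIFEXITED(status))
        return WEXITSTATUS(status); /* normal exit: extract the exit code */
    return -2;                      /* killed by a signal or otherwise abnormal */
}
```

For example, command_exit_code("exit 3") is expected to return 3, whereas the raw value returned by system() for the same command is the encoded wait status, not 3.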

1. It is recommended to use system() only to execute shell commands, because in general a return value other than 0 indicates an error;

2. It is recommended to check the value of errno after system() returns (it is meaningful when system() returns -1), so that more useful information can be given when an error occurs;

3. It is recommended to consider popen() as an alternative to system(); its usage is described in another article of mine.
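As a rough illustration of the third recommendation, here is a minimal popen() sketch that captures the first line of a command's output and its exit code. The helper name first_line_of_command() is hypothetical. Since pclose() also waits for a child process, it can presumably fail in the same way as system() when SIGCHLD is ignored.

```c
#define _POSIX_C_SOURCE 200809L  /* expose popen()/pclose() under strict -std= modes */
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>

/* Hypothetical helper: runs cmd_line with popen(), copies the first line
 * of its standard output (without the newline) into buf, and returns the
 * command's exit code, or -1 on any failure. */
int first_line_of_command(const char *cmd_line, char *buf, size_t size)
{
    assert(cmd_line != NULL && buf != NULL && size > 0);
    buf[0] = '\0';

    FILE *pipe = popen(cmd_line, "r");
    if (pipe == NULL)
        return -1;

    if (fgets(buf, (int)size, pipe) != NULL)
        buf[strcspn(buf, "\n")] = '\0';    /* strip the trailing newline */

    /* Drain the remaining output so the command can finish writing. */
    char scratch[256];
    while (fgets(scratch, sizeof scratch, pipe) != NULL) {
        /* discard */
    }

    int status = pclose(pipe);             /* waits for the child */
    if (status == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Calling first_line_of_command("echo hello", line, sizeof line) with a char line[64] buffer is expected to return 0 and leave "hello" in line.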
