What is a zombie process?
When a process terminates (by calling the exit system call), the kernel first releases the memory used by the process, closes all of its open files, and so on. However, the kernel keeps a small amount of information for each terminated child process. This information includes at least the process ID, the termination status of the process, and the CPU time it used, so that it is available when the parent of the terminated child calls wait or waitpid.
A zombie process is a process that has executed the exit system call but whose parent has not yet buried it (called wait or waitpid to collect its termination status).
No process (except init) disappears completely the moment it exits. It leaves behind a data structure called a zombie, which waits for the parent process to handle it. Every child process passes through this stage. In addition, when a child process exits, a SIGCHLD signal is sent to its parent.
The purpose of the zombie process?
The zombie state exists to preserve information about the child process so that the parent can retrieve it later by calling wait or waitpid: at least the process ID, the termination status, and the CPU time used. If a process terminates while it has children in the zombie state, the parent process ID of all those zombie children is reset to 1 (the init process). The init process that inherits these children cleans them up (that is, init calls wait for them, which removes their zombie state).
How to avoid the zombie process?
- Call signal(SIGCHLD, SIG_IGN) to tell the kernel that the parent is not interested in the termination of its children, so that the kernel reclaims them itself. If you do not want the parent process to block, add the statement signal(SIGCHLD, SIG_IGN) to the parent. The parent then ignores the SIGCHLD signal, which is sent to it whenever a child process exits.
- The parent process calls wait, waitpid, or a related function to wait for the child process to end. If no child has exited yet, wait blocks the parent. waitpid returns immediately when WNOHANG is passed, so the parent does not block.
- If the parent process is busy, it can register a signal handler with signal and have the handler call wait or waitpid to reap the exited child.
- Call fork twice. The parent first calls fork to create a child and then calls waitpid to wait for it. The child forks a grandchild and exits immediately, so the parent reaps the child right away. Because the grandchild's parent has exited, the grandchild becomes an orphan, is adopted by the init process, and is reaped by init when it finishes.
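The double-fork approach from the last item can be sketched as follows. This is a minimal illustration, not a definitive implementation: the names spawn_detached and sample_work are hypothetical and exist only for this example.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical placeholder for the job the grandchild should do. */
void sample_work(void)
{
}

/* Hypothetical helper: run work() in a grandchild that the caller never
 * has to reap. Returns 0 on success, -1 on failure. */
int spawn_detached(void (*work)(void))
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* First child: create the grandchild, then exit at once. */
        pid_t grandchild = fork();
        if (grandchild < 0)
            _exit(1);
        if (grandchild == 0) {
            /* Grandchild: its parent exits right away, so it is
             * orphaned and will be reaped by init when it finishes. */
            work();
            _exit(0);
        }
        _exit(0);
    }

    /* Parent: the first child exits immediately, so this returns quickly. */
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

After spawn_detached(sample_work) returns, the caller has no child left to wait for, so the grandchild cannot become a zombie of the caller.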
The first method, ignoring SIGCHLD, is often used to improve the performance of concurrent servers, because such a server typically forks many child processes and would otherwise have to wait for each terminated child in order to clean up its resources. When SIGCHLD is set to SIG_IGN, the kernel reclaims terminated children itself instead of leaving them as zombies, so large numbers of zombie processes no longer tie up system resources.
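A minimal sketch of the first method, assuming the behavior described in POSIX.1-2001 and the Linux wait(2) manual page: with SIGCHLD set to SIG_IGN, terminated children do not become zombies, and wait blocks until all children have terminated and then fails with ECHILD. The function name ignore_sigchld_demo is hypothetical.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork n children that exit at once while SIGCHLD
 * is ignored. Returns 0 if no child is left to wait for afterwards,
 * -1 on error. The previous SIGCHLD disposition is restored on return. */
int ignore_sigchld_demo(int n)
{
    int result = 0;
    pid_t r;
    void (*old)(int) = signal(SIGCHLD, SIG_IGN);

    if (old == SIG_ERR)
        return -1;

    for (int i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid < 0) {
            result = -1;
            break;
        }
        if (pid == 0)
            _exit(0);           /* child: terminate immediately */
    }

    /* Keep calling wait until it fails; retry if a signal interrupts it. */
    do {
        r = wait(NULL);
    } while (r > 0 || (r == -1 && errno == EINTR));

    if (errno != ECHILD)        /* expected failure: no children remain */
        result = -1;

    signal(SIGCHLD, old);
    return result;
}
```

A server would normally set the disposition once at startup and never call wait for these children at all. The wait loop here only confirms that nothing is left to reap.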
Zombie Process Handling Methods
1 wait() function
#include <sys/types.h>
#include <sys/wait.h>
pid_t wait(int *status);
When a process calls wait, the kernel checks whether any child of the calling process has already exited. If it finds a child that has become a zombie, wait collects that child's information, destroys the zombie, and returns. If no such child is found, wait blocks until one appears.
The status parameter is a pointer to an int that receives information about how the collected child exited. If we do not care how the child died and only want to get rid of the zombie (which is the most common case), we can pass NULL, like this:
pid = wait(NULL);
On success, wait returns the process ID of the child that was collected. If the calling process has no children, the call fails: wait returns -1 and errno is set to ECHILD.
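The two return values described above can be checked with a small sketch. It assumes the calling process has no other children. The function name demo_wait_return_values is hypothetical.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 if wait behaved as described in the
 * text, -1 otherwise. Assumes the caller has no other children. */
int demo_wait_return_values(void)
{
    pid_t child = fork();

    if (child < 0)
        return -1;
    if (child == 0)
        _exit(0);               /* child: terminate immediately */

    /* First call: blocks until the child exits, then returns its PID. */
    if (wait(NULL) != child)
        return -1;

    /* Second call: no children remain, so wait fails with ECHILD. */
    errno = 0;
    if (wait(NULL) != -1 || errno != ECHILD)
        return -1;

    return 0;
}
```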
- The wait system call suspends the parent process until one of its child processes has ended.
- wait returns the PID of the child process, which is usually a child that has terminated.
- The status information lets the parent determine the exit status of the child, that is, the value returned from the child's main function or the code passed to exit in the child.
- If status is not a null pointer, the status information is written to the location it points to.
Macros defined in <sys/wait.h>, such as WIFEXITED, WEXITSTATUS, WIFSIGNALED, and WTERMSIG, can be used to determine how a child process exited.
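As an illustration of the status macros, the sketch below forks a child that exits with a given code and decodes the status in the parent. The function name child_exit_code is hypothetical, and the sketch assumes the caller has no other children, since wait collects whichever child exits first.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits with `code` and return
 * the exit code observed by the parent, or -1 on error or if the child
 * did not terminate normally. */
int child_exit_code(int code)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(code);            /* child: exit with the requested code */

    if (wait(&status) != pid)
        return -1;

    if (WIFEXITED(status))      /* normal termination via exit or return */
        return WEXITSTATUS(status);

    /* Otherwise WIFSIGNALED(status) is true and WTERMSIG(status)
     * gives the number of the signal that killed the child. */
    return -1;
}
```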
2 waitpid() function
#include <sys/types.h>
#include <sys/wait.h>
pid_t waitpid (pid_t pid, int *status, int options);
Parameters:
status: if it is not a null pointer, the status information is written to the location it points to, as with wait.
options: changes the behavior of waitpid. The most useful option is WNOHANG, which prevents waitpid from suspending the execution of the caller.
The value of options is an OR of zero or more of the following constants:
WNOHANG     return immediately if no child has exited.
WUNTRACED   also return if a child has stopped (but not traced via ptrace(2)). Status for traced children which have stopped is provided even if this option is not specified.
WCONTINUED  (since Linux 2.6.10) also return if a stopped child has been resumed by delivery of SIGCONT.
Return value: on success, waitpid returns the process ID of the child whose state has changed. If WNOHANG was specified and no child has changed state yet, it returns 0. On failure it returns -1.
The interpretation of the pid parameter of waitpid depends on its value:
pid == -1 waits for any child process. In this case waitpid is equivalent to wait.
pid > 0 waits for the child whose process ID equals pid.
pid == 0 waits for any child whose process group ID equals that of the calling process. In other words, the child is in the same process group as the caller.
pid < -1 waits for any child whose process group ID equals the absolute value of pid.
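The pid > 0 case can be sketched as follows: two children are created, and the parent deliberately collects the second one first. The function name reap_in_chosen_order and the exit codes 1 and 2 are assumptions made only for this illustration.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 if both children were collected in
 * the order chosen by the parent, -1 otherwise. */
int reap_in_chosen_order(void)
{
    int status;
    pid_t first, second;

    first = fork();
    if (first < 0)
        return -1;
    if (first == 0)
        _exit(1);               /* first child exits with code 1 */

    second = fork();
    if (second < 0) {
        waitpid(first, NULL, 0);    /* do not leave the first child behind */
        return -1;
    }
    if (second == 0)
        _exit(2);               /* second child exits with code 2 */

    /* pid > 0: wait for exactly this child, even if the other child
     * terminated earlier. */
    if (waitpid(second, &status, 0) != second ||
        !WIFEXITED(status) || WEXITSTATUS(status) != 2) {
        waitpid(first, NULL, 0);
        return -1;
    }

    if (waitpid(first, &status, 0) != first ||
        !WIFEXITED(status) || WEXITSTATUS(status) != 1)
        return -1;

    return 0;
}
```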
wait differs from waitpid:
- Until a child process terminates, wait blocks its caller, whereas waitpid has an option that lets the caller avoid blocking.
- waitpid does not have to wait for the first child that terminates. Its arguments control which process it waits for.
- In fact, wait is a special case of waitpid: wait(&status) is equivalent to waitpid(-1, &status, 0).
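The non-blocking behavior can be illustrated with a sketch in which the child is held alive through a pipe, so that the first waitpid call is certain to find it still running. The pipe is only a device to make the example deterministic, and the function name poll_then_reap is hypothetical.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 if WNOHANG reported "still running"
 * and a later blocking waitpid collected the child, -1 otherwise. */
int poll_then_reap(void)
{
    int fds[2];
    pid_t pid, r;

    if (pipe(fds) < 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        char c;
        close(fds[1]);
        /* Child: block until the parent closes its write end. */
        if (read(fds[0], &c, 1) < 0)
            _exit(1);
        _exit(0);
    }

    close(fds[0]);

    /* The child is still running, so WNOHANG makes waitpid return 0
     * immediately instead of blocking. */
    r = waitpid(pid, NULL, WNOHANG);

    close(fds[1]);              /* end of file on the pipe lets the child exit */

    if (r != 0) {
        waitpid(pid, NULL, 0);
        return -1;
    }

    /* With options == 0, waitpid blocks until the child has exited. */
    return waitpid(pid, NULL, 0) == pid ? 0 : -1;
}
```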
Example:
The following code creates 100 child processes that exit immediately. The parent process does not wait for them, so 100 zombie processes remain until the parent exits.
#include <stdio.h>
#include <unistd.h>

int main()
{
    int i;
    pid_t pid;

    for (i = 0; i < 100; i++) {
        pid = fork();
        if (pid == 0)
            break;              /* child: leave the loop and exit */
    }
    if (pid > 0) {              /* parent: stay alive without reaping */
        printf("press Enter to exit ...");
        getchar();
    }
    return 0;
}
One solution is to write a SIGCHLD signal handler that calls wait or waitpid to reap the exited child.
#include <stdio.h>
#include <unistd.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>

/* SIGCHLD handler: reaps only one child per invocation */
void wait4children(int signo)
{
    int status;
    wait(&status);
}

int main()
{
    int i;
    pid_t pid;

    signal(SIGCHLD, wait4children);
    for (i = 0; i < 100; i++) {
        pid = fork();
        if (pid == 0)
            break;              /* child: leave the loop and exit */
    }
    if (pid > 0) {
        printf("press Enter to exit ...");
        getchar();
    }
    return 0;
}
However, if you run this program you will see that some zombie processes still remain, and that their number varies from run to run. Why? The main reason is that Linux does not queue standard signals. If many child processes exit at about the same time, each exit generates a SIGCHLD, but the parent cannot respond to them one by one: signals that arrive while one SIGCHLD is already pending are merged, so the parent may end up running the handler only once for several exits. Since the handler reaps only one child per invocation, some children are left as zombies.
However, a SIGCHLD does guarantee that at least one child has exited, so the handler can call waitpid in a loop to reap every child that has already exited. wait cannot be used in this loop, because after all the zombies have been cleaned up it would block for as long as other children are still running.
So the best solution is as follows:
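#include <stdio.h>
#include <unistd.h>
#include <signal.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>

/* SIGCHLD handler: reaps every child that has already exited */
void wait4children(int signo)
{
    int status;
    int saved_errno = errno;    /* waitpid may change errno */

    while (waitpid(-1, &status, WNOHANG) > 0)
        ;
    errno = saved_errno;
}

int main()
{
    int i;
    pid_t pid;

    signal(SIGCHLD, wait4children);
    for (i = 0; i < 100; i++) {
        pid = fork();
        if (pid == 0)
            break;              /* child: leave the loop and exit */
    }
    if (pid > 0) {
        printf("press Enter to exit ...");
        getchar();
    }
    return 0;
}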
waitpid is used here instead of wait because it is called in a loop to collect the status of every child that has terminated. The WNOHANG option must be specified: it tells waitpid not to block when some children are still running. wait cannot be called in the loop, because there is no way to prevent it from blocking while running children have not yet terminated.