Android: how the init process handles signals

Source: Internet
Author: User
Tags epoll signal handler


On Android, as on any Linux system, the kernel sends a SIGCHLD signal to the parent when a child process exits. The kernel releases most of the child's resources at that point, but it keeps the child's entry in the process table until the parent collects it by calling wait() or waitpid(). If the parent never does so, and did not declare during its initialization that it ignores the signal by calling signal(SIGCHLD, SIG_IGN), the child stays in this exited-but-not-collected state. Such a process can no longer be scheduled; it only occupies a slot in the process table, which holds its PID, termination status, CPU time used, and similar information. This is called a zombie process.
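The reaping step described above can be sketched as follows. This is a minimal illustration for a POSIX system, not code from init; the helper name run_child_and_reap is hypothetical.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: forks a child that exits with `code`, then reaps it.
// Between the child's exit and the parent's waitpid() call the child is a
// zombie: it no longer runs, but the kernel keeps its PID and exit status
// so that the parent can collect them.
// Returns the child's exit code, or -1 on error.
int run_child_and_reap(int code) {
    pid_t pid = fork();
    if (pid == -1) {
        return -1;  // fork failed
    }
    if (pid == 0) {
        _exit(code);  // child: terminate immediately
    }
    int status = 0;
    // Parent: block until the child has exited, then remove its table entry.
    if (waitpid(pid, &status, 0) != pid) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling run_child_and_reap(7) returns 7 once the child has been collected. If the waitpid() call were left out, the child would remain a zombie for as long as the parent kept running.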

In Linux, a zombie exists so that the parent can still query information about the child later. If a parent process terminates, all of its children, zombies included, are re-parented to the init process (PID 1), and init becomes responsible for reaping them: it calls wait()/waitpid() on them, which removes their entries from the process table.

Because a zombie still occupies a slot in the process table, and the number of processes that Linux supports is limited, no new process can be created once that limit is reached. Zombies therefore have to be cleaned up to keep the system running normally.

Next, we analyze how the init process handles the SIGCHLD signal.

In init.cpp, SIGCHLD handling is set up by calling signal_handler_init():

void signal_handler_init() {
    // Create a signalling mechanism for SIGCHLD.
    int s[2];
    // socketpair() creates a pair of unnamed, connected UNIX domain sockets.
    if (socketpair(AF_UNIX, SOCK_STREAM | SOCK_NONBLOCK | SOCK_CLOEXEC, 0, s) == -1) {
        ERROR("socketpair failed: %s\n", strerror(errno));
        exit(1);
    }

    signal_write_fd = s[0];
    signal_read_fd = s[1];

    // Write to signal_write_fd if we catch SIGCHLD.
    struct sigaction act;
    memset(&act, 0, sizeof(act));
    // Install the handler. When the signal arrives it writes to the socket
    // created above; epoll then sees the other end become readable and
    // calls the function registered for that event.
    act.sa_handler = SIGCHLD_handler;
    // Deliver SIGCHLD only when a child terminates.
    act.sa_flags = SA_NOCLDSTOP;
    sigaction(SIGCHLD, &act, 0);  // install the SIGCHLD disposition

    reap_any_outstanding_children();  // handle children that exited before this point

    register_epoll_handler(signal_read_fd, handle_signal);
}

The signal is set up with sigaction(). The act argument names the handler function, SIGCHLD_handler(), which is called whenever the signal arrives. It also sets the SA_NOCLDSTOP flag, so that SIGCHLD is delivered only when a child terminates, not when a child is stopped or resumed.

In Linux, a signal is a software interrupt: its arrival interrupts whatever the process is currently doing. A signal handler therefore must not call non-reentrant functions. Linux also does not queue standard signals. However many instances of a signal arrive while one is being handled, the kernel delivers only one more once the current handler returns, so signals can be lost. To limit such losses, a signal handler should do as little work as possible and return quickly.
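The fact that standard signals are not queued can be observed directly. The sketch below is an illustration, not part of init: it uses SIGUSR1 rather than SIGCHLD so that it does not interfere with child handling, it assumes SIGUSR1 is not already blocked by the caller, and the function name count_coalesced_deliveries is hypothetical.

```cpp
#include <signal.h>

// Number of times the handler ran. volatile sig_atomic_t is the type a
// signal handler is allowed to write.
static volatile sig_atomic_t g_deliveries = 0;

static void count_delivery(int) {
    g_deliveries = g_deliveries + 1;
}

// Hypothetical demo: raises SIGUSR1 `times` times while the signal is
// blocked, then unblocks it and returns how often the handler ran.
// Returns -1 if the handler or the signal mask cannot be set up.
int count_coalesced_deliveries(int times) {
    struct sigaction act = {};
    struct sigaction old_act;
    act.sa_handler = count_delivery;
    sigemptyset(&act.sa_mask);
    if (sigaction(SIGUSR1, &act, &old_act) == -1) {
        return -1;
    }

    sigset_t block;
    sigset_t old_mask;
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    if (sigprocmask(SIG_BLOCK, &block, &old_mask) == -1) {
        return -1;
    }

    g_deliveries = 0;
    for (int i = 0; i < times; ++i) {
        raise(SIGUSR1);  // stays pending; further instances are merged into it
    }

    // Restoring the old mask unblocks SIGUSR1; the single pending instance
    // is delivered before sigprocmask() returns.
    sigprocmask(SIG_SETMASK, &old_mask, nullptr);
    sigaction(SIGUSR1, &old_act, nullptr);
    return g_deliveries;
}
```

Raising the signal three times while it is blocked still produces a single handler call. This is why a SIGCHLD handler cannot assume one signal per exited child, and why the reaping code has to loop until no exited child is left.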

Handling SIGCHLD requires the parent to perform a wait, which can take a relatively long time. To keep that work out of the handler, the initialization code above creates a pair of unnamed, connected local sockets that carry the notification from the signal handler to init's main loop. The registered handler is SIGCHLD_handler():

static void SIGCHLD_handler(int) {
    if (TEMP_FAILURE_RETRY(write(signal_write_fd, "1", 1)) == -1) {
        ERROR("write(signal_write_fd) failed: %s\n", strerror(errno));
    }
}

#define TEMP_FAILURE_RETRY(exp) ({         \
    decltype(exp) _rc;                     \
    do {                                   \
        _rc = (exp);                       \
    } while (_rc == -1 && errno == EINTR); \
    _rc; })

When the signal arrives, the handler only writes one byte to the socket. This is fast, and it moves the real work to the code that responds to the socket becoming readable, so handling of the next signal is not delayed. The write() call is wrapped in a do...while loop whose condition is that write() failed with errno set to EINTR, meaning the call was interrupted by a signal. In that case the call is repeated; in every other case write() runs exactly once. Once the handler is installed, reap_any_outstanding_children() is called to deal with children that exited earlier:

static void reap_any_outstanding_children() {
    while (wait_for_one_process()) {
    }
}

wait_for_one_process() mainly calls waitpid() to collect an exited child. If the service that the child represents needs to be restarted, it also does some bookkeeping and cleanup.
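The retry-on-EINTR idiom and the reaping loop can be combined in a small standalone sketch. It is an illustration under assumptions, not init's code: retry_on_eintr is a hypothetical function-template counterpart of the macro, and reap_exited_children is a hypothetical stand-in for the loop above that does no per-service work.

```cpp
#include <cerrno>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Calls fn() again for as long as it fails with EINTR and returns the
// first result that is not such a failure.
template <typename Fn>
auto retry_on_eintr(Fn fn) -> decltype(fn()) {
    decltype(fn()) rc;
    do {
        rc = fn();
    } while (rc == -1 && errno == EINTR);
    return rc;
}

// Collects every child that has already exited and returns how many were
// reaped. With WNOHANG, waitpid() returns 0 instead of blocking when
// children exist but none has exited yet; it returns -1 (ECHILD) when
// there are no children at all. Both cases end the loop.
int reap_exited_children() {
    int reaped = 0;
    for (;;) {
        int status = 0;
        pid_t pid = retry_on_eintr([&] { return waitpid(-1, &status, WNOHANG); });
        if (pid <= 0) {
            break;
        }
        ++reaped;
    }
    return reaped;
}
```

Looping until waitpid() reports that nothing is left matters because one SIGCHLD may stand for several exited children.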

Finally, the read end of the socket pair is added to epoll_fd with epoll_ctl() so that it is watched for readability, and a handler function is registered for the event:

register_epoll_handler(signal_read_fd, handle_signal);

void register_epoll_handler(int fd, void (*fn)()) {
    epoll_event ev;
    ev.events = EPOLLIN;  // watch the file descriptor for readability
    ev.data.ptr = reinterpret_cast<void*>(fn);  // keep the function pointer for later event handling
    // Add fd to the set watched by epoll_fd; the property, keychord and
    // signal listeners are all registered this way.
    if (epoll_ctl(epoll_fd, EPOLL_CTL_ADD, fd, &ev) == -1) {
        ERROR("epoll_ctl failed: %s\n", strerror(errno));
    }
}
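Put together, the socket pair, the signal handler, and the epoll registration form a complete notification path. The sketch below shows that path in isolation, assuming Linux. All names in it (setup_signal_channel, poll_once, on_sigchld, on_readable and the g_ variables) are hypothetical, and the epoll instance is created with epoll_create1(), which is an assumption here.

```cpp
#include <cerrno>
#include <signal.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

static int g_epoll_fd = -1;
static int g_write_fd = -1;
static int g_read_fd = -1;
static int g_events_handled = 0;

// Signal context: only write() is called, which is async-signal-safe.
static void on_sigchld(int) {
    int saved_errno = errno;
    ssize_t ignored = write(g_write_fd, "1", 1);
    (void)ignored;
    errno = saved_errno;
}

// Main-loop context: drain the wake-up bytes, then do the real work.
static void on_readable() {
    char buf[32];
    while (read(g_read_fd, buf, sizeof(buf)) > 0) {
    }
    ++g_events_handled;  // a real handler would reap children here
}

// Creates the socket pair, registers its read end with epoll, and
// installs the SIGCHLD handler. Returns false on any failure.
bool setup_signal_channel() {
    int s[2];
    if (socketpair(AF_UNIX, SOCK_STREAM | SOCK_NONBLOCK | SOCK_CLOEXEC, 0, s) == -1) {
        return false;
    }
    g_write_fd = s[0];
    g_read_fd = s[1];

    g_epoll_fd = epoll_create1(EPOLL_CLOEXEC);
    if (g_epoll_fd == -1) {
        return false;
    }
    epoll_event ev = {};
    ev.events = EPOLLIN;
    ev.data.ptr = reinterpret_cast<void*>(on_readable);  // callback travels with the event
    if (epoll_ctl(g_epoll_fd, EPOLL_CTL_ADD, g_read_fd, &ev) == -1) {
        return false;
    }

    struct sigaction act = {};
    act.sa_handler = on_sigchld;
    act.sa_flags = SA_NOCLDSTOP;
    sigemptyset(&act.sa_mask);
    return sigaction(SIGCHLD, &act, nullptr) == 0;
}

// Waits up to timeout_ms for one event and dispatches it through the
// stored function pointer. Returns the epoll_wait() result: 1 if an
// event was handled, 0 on timeout, -1 on error.
int poll_once(int timeout_ms) {
    epoll_event ev;
    int nr;
    do {
        nr = epoll_wait(g_epoll_fd, &ev, 1, timeout_ms);
    } while (nr == -1 && errno == EINTR);
    if (nr == 1) {
        reinterpret_cast<void (*)()>(ev.data.ptr)();
    }
    return nr;
}
```

After setup_signal_channel() succeeds, a call to raise(SIGCHLD) followed by poll_once(1000) runs on_readable() once. The handler does nothing but write a byte; everything else happens in ordinary program context, where any function may be called safely.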
Take the exit of the zygote process as an example of the full SIGCHLD path. zygote is declared as a service in init.rc and is created by init. When zygote exits, SIGCHLD is sent to init. Signal handling has already been initialized by the code above, so the arrival of the signal invokes SIGCHLD_handler(), which writes one byte to the socket and returns immediately; the rest of the SIGCHLD processing moves to the socket event. The socket was registered with epoll_ctl() and is watched for readability. Because of the earlier write() call it now has data to read, so the registered handle_signal() function is called:

static void handle_signal() {
    // Clear outstanding requests.
    char buf[32];
    read(signal_read_fd, buf, sizeof(buf));

    reap_any_outstanding_children();
}

It reads the pending data out of the socket into buf and then calls reap_any_outstanding_children() to handle the exited child and the restart of its service:

static void reap_any_outstanding_children() {
    while (wait_for_one_process()) {
    }
}

static bool wait_for_one_process() {
    int status;
    // Collect an exited child and get its PID. WNOHANG makes the call
    // return immediately if no child has exited.
    pid_t pid = TEMP_FAILURE_RETRY(waitpid(-1, &status, WNOHANG));
    if (pid == 0) {
        return false;
    } else if (pid == -1) {
        ERROR("waitpid failed: %s\n", strerror(errno));
        return false;
    }

    service* svc = service_find_by_pid(pid);  // look up the service by PID

    std::string name;
    if (svc) {
        name = android::base::StringPrintf("Service '%s' (pid %d)", svc->name, pid);
    } else {
        name = android::base::StringPrintf("Untracked pid %d", pid);
    }

    NOTICE("%s %s\n", name.c_str(), DescribeStatus(status).c_str());

    if (!svc) {
        return true;
    }

    // TODO: all the code from here down should be a member function on service.

    // If SVC_ONESHOT is not set, or SVC_RESTART is set, kill what is left of
    // the old process group before a new process is created, so that the
    // later restart does not fail because the service already exists.
    if (!(svc->flags & SVC_ONESHOT) || (svc->flags & SVC_RESTART)) {
        NOTICE("Service '%s' (pid %d) killing any children in process group\n", svc->name, pid);
        kill(-pid, SIGKILL);
    }

    // Remove any sockets we may have created.
    for (socketinfo* si = svc->sockets; si; si = si->next) {
        char tmp[128];
        snprintf(tmp, sizeof(tmp), ANDROID_SOCKET_DIR"/%s", si->name);
        unlink(tmp);  // delete the socket file
    }

    // The service is gone for good: clear its data and remove it from the list.
    if (svc->flags & SVC_EXEC) {
        INFO("SVC_EXEC pid %d finished...\n", svc->pid);
        waiting_for_exec = false;
        list_remove(&svc->slist);
        free(svc->name);
        free(svc);
        return true;
    }

    svc->pid = 0;
    svc->flags &= (~SVC_RUNNING);

    // Oneshot processes go into the disabled state on exit,
    // except when manually restarted.
    if ((svc->flags & SVC_ONESHOT) && !(svc->flags & SVC_RESTART)) {
        svc->flags |= SVC_DISABLED;
    }

    // Disabled and reset processes do not get restarted automatically.
    // This check comes before all of the restart logic below.
    if (svc->flags & (SVC_DISABLED | SVC_RESET)) {
        svc->NotifyStateChange("stopped");
        return true;
    }

    // From here on, a service declared in init.rc is restarted when it dies,
    // unless it is oneshot or has SVC_RESET. A service with SVC_CRITICAL and
    // without SVC_RESTART that crashes more than 4 times makes the system
    // reboot into recovery mode.
    time_t now = gettime();
    if ((svc->flags & SVC_CRITICAL) && !(svc->flags & SVC_RESTART)) {
        if (svc->time_crashed + CRITICAL_CRASH_WINDOW >= now) {
            if (++svc->nr_crashed > CRITICAL_CRASH_THRESHOLD) {
                ERROR("critical process '%s' exited %d times in %d minutes; "
                      "rebooting into recovery mode\n", svc->name,
                      CRITICAL_CRASH_THRESHOLD, CRITICAL_CRASH_WINDOW / 60);
                android_reboot(ANDROID_RB_RESTART2, 0, "recovery");
                return true;
            }
        } else {
            svc->time_crashed = now;
            svc->nr_crashed = 1;
        }
    }

    svc->flags &= (~SVC_RESTART);
    svc->flags |= SVC_RESTARTING;  // mark the service for restart; checked later

    // Execute all onrestart commands for this service.
    struct listnode* node;
    list_for_each(node, &svc->onrestart.commands) {
        command* cmd = node_to_item(node, struct command, clist);
        cmd->func(cmd->nargs, cmd->args);
    }
    svc->NotifyStateChange("restarting");
    return true;
}

The main points of this function are:
    1. waitpid() is called to collect an exited child; its return value is the child's PID. Because the WNOHANG flag is set, waitpid() returns immediately instead of blocking when no child has exited. The surrounding TEMP_FAILURE_RETRY() has the meaning described earlier: waitpid() is called again whenever it fails with the error code EINTR.
    2. The service that corresponds to the PID is looked up in the service_list list. If the service's definition in init.rc does not set SVC_ONESHOT, or if SVC_RESTART is set, any processes left in the exited process's group are killed first, so that re-creating the service later does not fail because the old instance still exists.
    3. If sockets were created for the service, they are removed.
    4. If the service has SVC_ONESHOT set and SVC_RESTART is not set, it is marked SVC_DISABLED and does not need to be restarted.
    5. If the service has the SVC_RESET flag (or SVC_DISABLED), it does not need to be restarted.
    6. If a service has SVC_CRITICAL set and SVC_RESTART is not set, and it crashes more than 4 times within the CRITICAL_CRASH_WINDOW period, the system reboots automatically into recovery mode.
    7. If the service does need to be restarted, it is marked with the SVC_RESTARTING flag. This flag is what later triggers the restart.
    8. Finally, if the service has onrestart options, the list of commands to run on restart is walked and each command is executed.
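
The flag checks in points 4 to 7 amount to a small decision function, sketched below in isolation. This is an illustration under assumptions, not init's code: the flag values, the ExitAction and ServiceState types, and the 4-minute window are hypothetical; only the order of the checks and the threshold of 4 follow the function above.

```cpp
#include <time.h>

// Illustrative flag bits. Their numeric values are assumptions and are
// not taken from init's headers.
enum : unsigned {
    kOneshot = 1u << 0,
    kRestart = 1u << 1,
    kDisabled = 1u << 2,
    kReset = 1u << 3,
    kCritical = 1u << 4,
    kExec = 1u << 5,
};

enum class ExitAction { kRemove, kStop, kRebootRecovery, kRestartLater };

struct ServiceState {
    unsigned flags = 0;
    time_t time_crashed = 0;
    int nr_crashed = 0;
};

constexpr int kCrashThreshold = 4;             // "more than 4 times"
constexpr time_t kCrashWindowSeconds = 4 * 60; // assumed window length

// Decides what happens to a service after its process has exited,
// applying the checks in the same order as wait_for_one_process().
ExitAction decide_after_exit(ServiceState& svc, time_t now) {
    if (svc.flags & kExec) {
        return ExitAction::kRemove;
    }
    if ((svc.flags & kOneshot) && !(svc.flags & kRestart)) {
        svc.flags |= kDisabled;
    }
    if (svc.flags & (kDisabled | kReset)) {
        return ExitAction::kStop;
    }
    if ((svc.flags & kCritical) && !(svc.flags & kRestart)) {
        if (svc.time_crashed + kCrashWindowSeconds >= now) {
            if (++svc.nr_crashed > kCrashThreshold) {
                return ExitAction::kRebootRecovery;
            }
        } else {
            // First crash, or the previous window has expired: start over.
            svc.time_crashed = now;
            svc.nr_crashed = 1;
        }
    }
    svc.flags &= ~kRestart;
    return ExitAction::kRestartLater;
}
```

With these assumptions a critical service is restarted after each of its first four crashes inside one window, and the fifth crash yields kRebootRecovery. A oneshot service yields kStop and ends up with kDisabled set.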

In short, wait_for_one_process() does not restart the service itself. If the service represented by the exited child needs to be restarted, it is only marked with the SVC_RESTARTING flag.

As analyzed earlier in the walkthrough of init's startup, once initialization is complete init enters a loop and acts as a daemon, handling signal, property, and keychord events:

while (true) {
    if (!waiting_for_exec) {
        execute_one_command();  // run a command from the command list
        restart_processes();    // start processes from the service list
    }

    int timeout = -1;
    if (process_needs_restart) {
        timeout = (process_needs_restart - gettime()) * 1000;
        if (timeout < 0)
            timeout = 0;
    }

    if (!action_queue_empty() || cur_action) {
        timeout = 0;
    }

    // bootchart is a tool for visual performance analysis of the boot
    // process; it needs the process to wake up periodically.
    bootchart_sample(&timeout);

    epoll_event ev;
    // Start polling: epoll_wait() waits for an event to occur.
    int nr = TEMP_FAILURE_RETRY(epoll_wait(epoll_fd, &ev, 1, timeout));
    if (nr == -1) {
        ERROR("epoll_wait failed: %s\n", strerror(errno));
    } else if (nr == 1) {
        ((void (*)()) ev.data.ptr)();  // call the function pointer stored in the epoll_event
    }
}

Each iteration of the loop calls restart_processes(), which restarts every service in the service_list list that has the SVC_RESTARTING flag (the flag set in wait_for_one_process()):

static void restart_processes() {
    process_needs_restart = 0;
    service_for_each_flags(SVC_RESTARTING, restart_service_if_needed);
}

void service_for_each_flags(unsigned matchflags,
                            void (*func)(struct service* svc)) {
    struct listnode* node;
    struct service* svc;
    list_for_each(node, &service_list) {
        svc = node_to_item(node, struct service, slist);
        if (svc->flags & matchflags) {
            func(svc);
        }
    }
}

static void restart_service_if_needed(struct service* svc) {
    time_t next_start_time = svc->time_started + 5;

    if (next_start_time <= gettime()) {
        svc->flags &= (~SVC_RESTARTING);
        service_start(svc, NULL);
        return;
    }

    if ((next_start_time < process_needs_restart) ||
        (process_needs_restart == 0)) {
        process_needs_restart = next_start_time;
    }
}

In the end, service_start() is called to restart the service that exited. How service_start() works was analyzed in the walkthrough of init's startup and is not covered again here.
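
The interplay between the 5-second restart delay and the epoll_wait() timeout can be sketched as one function. This is a simplified illustration, not init's code: PendingRestart and process_restarts are hypothetical names, the current time is passed in instead of being read with gettime(), and clearing the restarting field stands in for the call to service_start().

```cpp
#include <time.h>
#include <vector>

constexpr time_t kRestartDelaySeconds = 5;  // mirrors "time_started + 5"

struct PendingRestart {
    time_t time_started;  // when the service was last started
    bool restarting;      // stands in for the SVC_RESTARTING flag
};

// Restarts every marked service whose delay has elapsed and returns the
// epoll_wait() timeout in milliseconds for the earliest service that is
// still waiting, or -1 (block indefinitely) if none is waiting.
int process_restarts(std::vector<PendingRestart>& services, time_t now) {
    time_t needs_restart = 0;  // 0 means that no service is waiting
    for (PendingRestart& svc : services) {
        if (!svc.restarting) {
            continue;
        }
        time_t next_start_time = svc.time_started + kRestartDelaySeconds;
        if (next_start_time <= now) {
            svc.restarting = false;  // the real code calls service_start() here
            continue;
        }
        if (needs_restart == 0 || next_start_time < needs_restart) {
            needs_restart = next_start_time;
        }
    }
    if (needs_restart == 0) {
        return -1;
    }
    return static_cast<int>(needs_restart - now) * 1000;
}
```

The delay keeps a service that crashes immediately from being respawned in a tight loop, and the computed timeout makes the main loop wake up exactly when the next restart is due.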










