Analysis of an unexpected Tomcat process exit


Original link: http://hongjiang.info/why-kill-2-cannot-stop-tomcat/

Before the Spring Festival, a department reported that Tomcat in its test environment would occasionally exit unexpectedly. We found that the JVM had not crashed. Instead, the log contained a record of an orderly shutdown, going all the way from pause to destroy:

    org.apache.coyote.AbstractProtocol pause
    Pausing ProtocolHandler
    org.apache.catalina.core.StandardService stopInternal
    Stopping service Catalina
    org.apache.coyote.AbstractProtocol stop
    Stopping ProtocolHandler
    org.apache.coyote.AbstractProtocol destroy
    Destroying ProtocolHandler

From this log we can conclude:

1) Tomcat was not shut down normally via the script (i.e. via the shutdown command sent to port 8005).

A normal shutdown via the port logs the following message before the pause:

    org.apache.catalina.core.StandardServer await
    A valid shutdown command was received via the shutdown port. Stopping the Server instance.

2) Tomcat's shutdown hook was triggered and ran the destruction logic.

That leaves two possible causes: either some application code calls System.exit to terminate the JVM, or the process received a signal from the system. The latter excludes kill -9: on SIGKILL the JVM gets no chance to run its shutdown hooks.
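The SIGKILL exception is enforced by the kernel, not by the JVM: no process can catch or ignore SIGKILL (or SIGSTOP), so no user-space cleanup code can run. A minimal C sketch of that rule, assuming a POSIX system; try_catch and on_signal are hypothetical names used only for illustration:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>

/* No-op handler standing in for "run the shutdown hooks". */
static void on_signal(int sig) { (void)sig; }

/* Tries to install on_signal for `sig`.
 * Returns 0 on success, or the errno value if sigaction() refuses. */
int try_catch(int sig) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(sig, &sa, NULL) != 0)
        return errno;
    return 0;
}
```

Called with SIGTERM it returns 0; called with SIGKILL or SIGSTOP, sigaction() fails and the function returns EINVAL.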

After reviewing the code, both the application team and the middleware team ruled out any use of System.exit in this application. That left signals. Further investigation showed that every unexpected Tomcat exit coincided with the moment an SSH session ended.

With this clue, our colleague Yinshi immediately examined the script the other team used in the test environment. Simplified, it looks like this:

    $ cat test.sh
    #!/bin/bash
    cd /data/server/tomcat/bin/
    ./catalina.sh start
    tail -f /data/server/tomcat/logs/catalina.out

After Tomcat starts, the shell process does not exit; it blocks on the tail process, which streams the log to the terminal. If the user then closes the SSH terminal window directly (with the mouse or a shortcut), the Java process exits. If the user first terminates test.sh with Ctrl-C and only then closes the SSH terminal, the Java process does not exit.

This is an interesting phenomenon. With catalina.sh start, the Java process that Tomcat launches ends up with init (PID 1) as its parent: it has left the parent-child chain of test.sh and has no relationship with the SSH process. So why would closing the SSH terminal window make the Java process exit?

Our guess was that closing the SSH window sends an exit signal to the current interactive shell and to its running children such as test.sh. To verify this, we found a machine with SystemTap installed and used a stap script borrowed from a colleague:

    function time_str: string () {
        return ctime(gettimeofday_s() + 8 * 60 * 60);
    }
    probe begin {
        printdln(" ", time_str(), "BEGIN");
    }
    probe end {
        printdln(" ", time_str(), "END");
    }
    probe signal.send {
        if (sig_name == "SIGHUP" || sig_name == "SIGQUIT" ||
            sig_name == "SIGINT" || sig_name == "SIGKILL" || sig_name == "SIGABRT") {
            printd(" ", time_str(), sig_name, "[", uid(), pid(), cmdline_str(),
                   "] -> [", task_uid(task), sig_pid, pid_name, "], ");
            task = pid2task(pid());
            while (task_pid(task) > 0) {
                printd(" ", "[", task_uid(task), task_pid(task), task_execname(task), "]");
                task = task_parent(task);
            }
            println("");
        }
    }

The process tree (pstree) during the reproduction looked roughly like this; after Tomcat starts, the Java process has left test.sh and hangs under init:

|-sshd(1622)-+-sshd(11681)---sshd(11699)---bash(11700)---test.sh(13285)---tail(13299)

With help from the kernel team, we found the following:

a) When test.sh is terminated with Ctrl-C, the events process sends SIGINT to java and tail (as well as to test.sh itself):

    SIGINT [ 0 11  ] -> [ 0 20629 tail ]
    SIGINT [ 0 11  ] -> [ 0 20628 java ]
    SIGINT [ 0 11  ] -> [ 0 20615 test.sh ]

Note: PID 11 is the events process.
b) When the SSH terminal window is closed, sshd sends SIGHUP to its descendant processes. But why does the Java process receive it?

However, our kernel-team colleague was very busy and could not keep helping with the analysis (he offered some guesses, but they turned out to be wrong).

Having established that a signal was the cause, my questions became:

1) Why does SIGINT (kill -2) not make the Tomcat process exit?
2) Why does SIGHUP (kill -1) make the Tomcat process exit?

My first thought was that the JVM might handle OS signals differently under certain parameters (or because of some JNI library). I checked the application's JVM parameters and saw nothing suspicious, and also ruled out Tomcat using APR/tcnative.

First, let's check what the JVM does by default with SIGINT and SIGHUP, simulated with the Scala REPL:

    scala> Runtime.getRuntime().addShutdownHook(
             new Thread() { override def run(): Unit = println("ok") })

Sending kill -2 or kill -1 to this Java process makes the JVM exit and triggers the shutdown hook. This matches Oracle's description of how the HotSpot VM handles signals: SIGTERM, SIGINT and SIGHUP all trigger shutdown hooks.
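As a rough C analogue (this is not how HotSpot is implemented), a process can route the same three signals to one handler that only records the request, and run its cleanup from ordinary code afterwards. The names request_shutdown, install_shutdown_handlers and run_hooks_if_requested are hypothetical:

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Written by the handler, read by normal code. */
static volatile sig_atomic_t shutdown_signal = 0;

static void request_shutdown(int sig) { shutdown_signal = sig; }

/* Routes SIGINT, SIGHUP and SIGTERM (the three signals the JVM treats
 * as shutdown requests) to one handler. Returns 0 on success, -1 on error. */
int install_shutdown_handlers(void) {
    const int sigs[] = { SIGINT, SIGHUP, SIGTERM };
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = request_shutdown;
    sigemptyset(&sa.sa_mask);
    for (size_t i = 0; i < sizeof sigs / sizeof sigs[0]; i++)
        if (sigaction(sigs[i], &sa, NULL) != 0)
            return -1;
    return 0;
}

/* Stand-in for running shutdown hooks: returns the signal that requested
 * the shutdown, or 0 if none has arrived yet. */
int run_hooks_if_requested(void) {
    int sig = shutdown_signal;
    if (sig != 0) {
        /* cleanup would go here, e.g. the equivalent of stop/destroy */
    }
    return sig;
}
```

The handler only sets a volatile sig_atomic_t flag because very little is async-signal-safe; the real cleanup runs outside the handler.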

Since the JVM did not seem to be the culprit, we next suspected the state of the process. The catalina.sh script does not start Java via something like start-stop-daemon; with the start argument, what the script executes boils down to:

    eval '"/pathofjdk/bin/java"' 'params' org.apache.catalina.startup.Bootstrap start '&'

In other words, it simply runs Java in the background. When the catalina.sh process itself exits, the Java process's PPID becomes 1.
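The reparenting is easy to reproduce outside the shell. The following C sketch (assuming Linux or another POSIX system; orphan_demo is a hypothetical helper) forks a "catalina.sh" process that starts a "java" child in the background and exits immediately; the child then reports which PPID it sees:

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Mimics "catalina.sh start": a middle process ("catalina.sh") forks a
 * background child ("java") and exits at once. Stores the middle process's
 * PID in *middle_pid and the PPID the background child sees once the middle
 * process is gone in *new_ppid. Returns 0 on success, -1 on error. */
int orphan_demo(pid_t *middle_pid, pid_t *new_ppid) {
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    pid_t middle = fork();
    if (middle < 0)
        return -1;
    if (middle == 0) {                               /* "catalina.sh" */
        pid_t self = getpid();
        pid_t bg = fork();
        if (bg == 0) {                               /* "java ... &" */
            struct timespec tick = { 0, 1000000L };  /* 1 ms */
            while (getppid() == self)                /* until our parent has exited */
                nanosleep(&tick, NULL);
            pid_t after = getppid();
            if (write(fds[1], &after, sizeof after) != (ssize_t)sizeof after)
                _exit(1);
            _exit(0);
        }
        _exit(bg < 0 ? 1 : 0);                       /* exit without waiting for "java" */
    }

    close(fds[1]);
    waitpid(middle, NULL, 0);
    ssize_t n = read(fds[0], new_ppid, sizeof *new_ppid);
    close(fds[0]);
    *middle_pid = middle;
    return n == (ssize_t)sizeof *new_ppid ? 0 : -1;
}
```

On a plain system the reported PPID is typically 1; under a systemd user session or inside a container it may instead be the PID of a subreaper. Either way it is no longer the process that started it.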

We spent a lot of time guessing at possible OS-level causes, to no avail. After the Spring Festival, I asked Shao Ming and Jian Quan to analyze the problem as well. With their C background they know the lower layers of the system better, and after most of a day of repeated guessing and verification they finally confirmed that the shell was the cause.

SIGINT (kill -2) does not make the background Java process exit

For simplicity, we simulate the process with sleep. First, in interactive mode:

    $ sleep 1000 &
    $ ps -opid,pgid,ppid,stat,cmd -C sleep
      PID  PGID  PPID STAT CMD
     9897  9897  9813 S    sleep 1000

Note that the PID of the sleep 1000 process equals its PGID (process group ID). Here, kill -2 does kill the sleep 1000 process.

Now let's put the sleep process into a script and run it in the background there:

    $ cat a.sh
    #!/bin/sh
    sleep 4400 &
    echo "shell exit"

After running a.sh, the PID of the sleep 4400 process differs from its PGID. The PGID is the ID of its parent process, i.e. the a.sh process, which has already exited:

    $ ps -opid,pgid,ppid,comm -p 63376
      PID  PGID  PPID COMM
    63376 63375     1 sleep

This time, kill -2 cannot kill the sleep 4400 process.
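The two experiments also differ in process group membership. The C sketch below (child_pgid is a hypothetical helper) forks a child that either stays in the parent's group, like sleep 4400 inside a.sh, or leads a new group via setpgid(), like sleep 1000 started from the interactive prompt, and reports the child's PGID:

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Forks a child and returns the process group ID it reports, or -1 on
 * error; the child's PID is stored in *child_pid.
 * own_group == 0: the child stays in the parent's group, like a
 *                 background job in a script without job control.
 * own_group != 0: the child leads a new group (PGID == PID), like a
 *                 background job started from an interactive shell. */
pid_t child_pgid(int own_group, pid_t *child_pid) {
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (own_group)
            setpgid(0, 0);
        pid_t pg = getpgrp();
        if (write(fds[1], &pg, sizeof pg) != (ssize_t)sizeof pg)
            _exit(1);
        _exit(0);
    }
    if (own_group)
        setpgid(pid, pid);   /* job-control shells set it in both processes */
    close(fds[1]);
    pid_t pg = -1;
    if (read(fds[0], &pg, sizeof pg) != (ssize_t)sizeof pg)
        pg = -1;
    close(fds[0]);
    waitpid(pid, NULL, 0);
    *child_pid = pid;
    return pg;
}
```

Calling setpgid() in both parent and child mirrors what job-control shells do, so the group is set no matter which process runs first.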

At this point we were very close to the cause: the shell must be tampering with the signal handling of background processes. We wrote a program with a custom SIGINT handler to see whether kill -2 still works on it:

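    #include <signal.h>
    #include <unistd.h>

    /* Custom SIGINT handler: if it runs, kill -2 reached the program. */
    static void my_handler(int sig) {
        (void)sig;
        /* printf() and exit() are not async-signal-safe; write() and _exit() are */
        static const char msg[] = "handler aaa\n";
        write(STDOUT_FILENO, msg, sizeof msg - 1);
        _exit(0);
    }

    int main(void) {
        signal(SIGINT, my_handler);  /* replaces any inherited disposition */
        for (;;)
            pause();                 /* sleep until a signal arrives */
    }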

We compiled it and ran the resulting a.out in the background from a script:

    $ cat a.sh
    #!/bin/sh
    /tmp/a.out &

This time kill -2 does kill the a.out process. This means the shell changes the signal disposition before the user's code runs, i.e. when it forks the child process, and the program's own signal() call then overrides that change. Following this lead, a Google search told us that in non-interactive mode the shell sets SIGINT to IGNORE for background processes.
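The a.out result follows from how dispositions are inherited: a signal set to SIG_IGN stays ignored across fork() and execve(), while a program that calls signal() itself replaces whatever it inherited. A C sketch of both cases, skipping the exec step; sigint_outcome and int_handler are hypothetical names, and exit code 42 is arbitrary:

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Like my_handler above: leave with a recognizable exit code. */
static void int_handler(int sig) {
    (void)sig;
    _exit(42);
}

/* Forks a child whose SIGINT starts out ignored, as the shell arranges for
 * a background job when job control is off. If install_handler is nonzero,
 * the child then installs its own handler, as a.out does in main(). The
 * child finally sends itself SIGINT. Returns the raw wait status, or -1. */
int sigint_outcome(int install_handler) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        signal(SIGINT, SIG_IGN);          /* done by the shell before exec */
        if (install_handler)
            signal(SIGINT, int_handler);  /* done later by the program itself */
        raise(SIGINT);
        _exit(0);                         /* reached only if SIGINT was ignored */
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return status;
}
```

Without its own handler the child shrugs off SIGINT and exits with 0; with a handler installed, SIGINT runs it and the child exits with 42, just as kill -2 now stops a.out.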

Interactive and non-interactive modes handle job control differently by default

Why does the shell not make background processes ignore SIGINT in interactive mode, when it does in non-interactive mode? The reason is easy to see. In interactive mode, a foreground process that runs too long can be suspended with Ctrl-Z and moved to the background with bg %n; likewise, a process started in the background with cmd & can be brought to the foreground with fg %n and then stopped with Ctrl-C. Such a process obviously must not ignore SIGINT.

Why does a background process in interactive mode get a process group ID of its own? If it kept its parent's process group ID, keyboard-generated signals such as the SIGINT from Ctrl-C, which are delivered to every member of the process group, would reach it as well. Because job control means it cannot ignore SIGINT, any Ctrl-C typed at the terminal could then make all background processes exit, which is clearly unreasonable. To avoid this interference, each background process is given its own PGID.

In non-interactive mode, job control is usually unnecessary, so it is off by default (a script can turn it on with set -m). Without job control, a background process in a script is shielded from signals sent to its parent's process group by having SIGINT ignored, since that signal is meaningless to it.

Back to Tomcat: when catalina.sh is run with the start argument, it runs non-interactively, so the shell makes the Java process ignore SIGINT. That is why the SIGINT the system sends to Java when Ctrl-C terminates test.sh has no effect.
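POSIX specifies this behavior: when job control is disabled, commands in an asynchronous list inherit SIGINT and SIGQUIT as ignored. The rule covers only those two signals, so SIGHUP keeps its default action. The C sketch below imitates such a launch (background_job and is_ignored are hypothetical names; the child resets SIGHUP to its default first, assuming an ordinary session rather than one started under nohup):

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* 1 if `sig` is currently ignored in the calling process, else 0. */
static int is_ignored(int sig) {
    struct sigaction old;
    if (sigaction(sig, NULL, &old) != 0)
        return 0;
    return old.sa_handler == SIG_IGN;
}

/* Starts a "background job" the way a shell without job control does
 * (SIGINT and SIGQUIT ignored, SIGHUP untouched), then lets the job send
 * itself `sig`. Stores a bitmask of the job's ignored signals in *ignored
 * (1 = SIGINT, 2 = SIGQUIT, 4 = SIGHUP) and returns the raw wait status. */
int background_job(int sig, int *ignored) {
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        signal(SIGHUP, SIG_DFL);    /* model a normal session, not nohup */
        signal(SIGINT, SIG_IGN);    /* what the shell does for "cmd &" */
        signal(SIGQUIT, SIG_IGN);
        int mask = is_ignored(SIGINT) | (is_ignored(SIGQUIT) << 1)
                 | (is_ignored(SIGHUP) << 2);
        if (write(fds[1], &mask, sizeof mask) != (ssize_t)sizeof mask)
            _exit(1);
        raise(sig);
        _exit(0);                   /* reached only if `sig` was ignored */
    }
    close(fds[1]);
    if (read(fds[0], ignored, sizeof *ignored) != (ssize_t)sizeof *ignored)
        *ignored = -1;
    close(fds[0]);
    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return status;
}
```

Sent SIGINT, the job exits normally with status 0; sent SIGHUP, it is terminated by the signal, which is the same asymmetry the Tomcat process showed.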

SIGHUP (kill -1) makes the Tomcat process exit

In non-interactive mode, the shell sets SIGINT and SIGQUIT to ignore for the Java process, but leaves SIGHUP alone. Now look again at the process tree at that time:

|-sshd(1622)-+-sshd(11681)---sshd(11699)---bash(11700)---test.sh(13285)---tail(13299)

When sshd passes SIGHUP to the bash process, bash forwards SIGHUP to its children; for its child test.sh, bash sends SIGHUP to every member of test.sh's process group. Because the background Java process inherited its process group from its parent catalina.sh (which in turn inherited it from test.sh), the Java process is still a member of test.sh's process group, so it receives SIGHUP and exits.
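The key mechanism is that a signal sent to a process group reaches every member, whoever its parent is. A C sketch of the hang-up path (hangup_demo is a hypothetical name, and the 100 ms delay is an arbitrary grace period): a "shell" process leads a group, starts one job inside that group and one in a group of its own, then calls killpg() with SIGHUP:

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* `same` stays in the shell's group (like java under test.sh); `own` gets
 * a group of its own (like java after set -m). Returns a bitmask:
 * 1 = `same` was killed by SIGHUP, 2 = `own` survived; -1 on error. */
int hangup_demo(void) {
    pid_t shell = fork();
    if (shell < 0)
        return -1;
    if (shell == 0) {
        signal(SIGHUP, SIG_DFL);             /* children must not inherit SIG_IGN */
        if (setpgid(0, 0) != 0)              /* never signal the caller's group */
            _exit(100);

        pid_t same = fork();
        if (same < 0)
            _exit(100);
        if (same == 0) { pause(); _exit(0); }

        pid_t own = fork();
        if (own < 0) { kill(same, SIGKILL); _exit(100); }
        if (own == 0) { setpgid(0, 0); pause(); _exit(0); }
        setpgid(own, own);                   /* also in the parent, to avoid a race */

        signal(SIGHUP, SIG_IGN);             /* the sender itself survives */
        killpg(getpgrp(), SIGHUP);

        int result = 0, st = 0;
        if (waitpid(same, &st, 0) == same && WIFSIGNALED(st) && WTERMSIG(st) == SIGHUP)
            result |= 1;
        struct timespec grace = { 0, 100000000L };   /* 100 ms */
        nanosleep(&grace, NULL);
        if (waitpid(own, &st, WNOHANG) == 0)         /* 0: still running */
            result |= 2;
        kill(own, SIGKILL);
        waitpid(own, NULL, 0);
        _exit(result);
    }
    int st = 0;
    if (waitpid(shell, &st, 0) != shell || !WIFEXITED(st) || WEXITSTATUS(st) == 100)
        return -1;
    return WEXITSTATUS(st);
}
```

A return value of 3 means the job sharing the group was killed by SIGHUP while the job with its own group kept running, which corresponds to Java in test.sh without and with set -m.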

If we enable job control in test.sh, the Java process no longer exits:

    #!/bin/bash
    set -m
    cd /home/admin/tt/tomcat/bin/
    ./catalina.sh start
    tail -f /home/admin/tt/tomcat/logs/catalina.out

In this case the background Java process still inherits the PGID of its parent catalina.sh, but catalina.sh no longer belongs to test.sh's process group: it uses its own PID as its PGID. Once catalina.sh finishes and exits, the Java process hangs under init, is completely detached from test.sh, and bash no longer sends signals to it.
