Before the holiday, one department reported that Tomcat in their test environment would exit unexpectedly. We investigated and found that it was not a JVM crash: the log recorded the process being shut down, going from pause to destroy:
org.apache.coyote.AbstractProtocol pause
Pausing ProtocolHandler
org.apache.catalina.core.StandardService stopInternal
Stopping service Catalina
org.apache.coyote.AbstractProtocol stop
Stopping ProtocolHandler
org.apache.coyote.AbstractProtocol destroy
Destroying ProtocolHandler
From this log we can determine two things:
1. Tomcat was not shut down normally through the script (via port, that is, by sending the shutdown command to port 8005).
A normal shutdown via port writes the following log entry before the pause:
org.apache.catalina.core.StandardServer await
A valid shutdown command was received via the shutdown port. Stopping the Server instance.
Only then does it go through pause -> stop -> destroy.
2. Tomcat's shutdown hook was triggered and ran the destroy logic.
There are two ways this can happen: either application code calls System.exit to exit the JVM, or the process receives a signal from the system (other than kill -9: on SIGKILL the JVM gets no chance to run its shutdown hooks).
By checking the code, both the application team and the middleware team ruled out the possibility that this application calls System.exit. That left only signals. After some investigation, we found that every unexpected Tomcat exit coincided with the end of an SSH session.
With this clue, a colleague immediately looked at the script the other team used in the test environment, simplified as follows:
$ cat test.sh
#!/bin/bash
cd /data/server/tomcat/bin/
./catalina.sh start
tail -f /data/server/tomcat/logs/catalina.out
After Tomcat starts, the current shell process does not exit. It blocks on the tail process, which prints the log to the terminal. In this situation, if the user closes the SSH terminal window directly (with the mouse or a shortcut key), the Java process exits as well. But if the user first terminates test.sh with ctrl-c and then closes the SSH terminal, the Java process does not exit.
This is an interesting phenomenon. When catalina.sh start launches Tomcat, the Java process ends up with init (process ID 1) as its parent. It no longer has a parent-child relationship with the test.sh process and is unrelated to the SSH process, so why does closing the SSH terminal window make the Java process exit?
Our guess was that when the SSH window closes, an exit signal is sent to the interactive shell and to the running test.sh. We found a machine with SystemTap installed to verify this. The stap script was borrowed from a colleague:
function time_str:string () {
    # offset to UTC+8 in seconds (multiplier reconstructed; garbled in the source)
    return ctime(gettimeofday_s() + 8 * 60 * 60);
}
probe begin {
    printdln(" ", time_str(), "begin");
}
probe end {
    printdln(" ", time_str(), "end");
}
probe signal.send {
    if (sig_name == "SIGHUP" || sig_name == "SIGQUIT"
        || sig_name == "SIGINT" || sig_name == "SIGKILL" || sig_name == "SIGABRT") {
        # sender [uid pid cmdline] -> receiver [uid pid name]
        printd(" ", time_str(), sig_name, "[", uid(), pid(), cmdline_str(),
               "] -> [", task_uid(task), sig_pid, pid_name, "],");
        # then walk up the sender's ancestors
        task = pid2task(pid());
        while (task_pid(task) > 0) {
            printd(" ", "[", task_uid(task), task_pid(task), task_execname(task), "]");
            task = task_parent(task);
        }
        println("");
    }
}
The process hierarchy (pstree) during the reproduction was roughly as follows. After Tomcat starts, the Java process has already detached from test.sh and hangs under init:
|-sshd(1622)-+-sshd(11681)---sshd(11699)---bash(11700)---test.sh(13285)---tail(13299)
With the help of Bo Yu from the kernel team, we found:
a) When the test.sh process is terminated with ctrl-c, the system's events process sends SIGINT to both the java and tail processes:
SIGINT [0] -> [0 20629 tail]
SIGINT [0] -> [0 20628 java]
Note: PID 11 is the events process.
b) When the SSH terminal window is closed, sshd sends SIGHUP to its downstream processes. But why does the Java process receive it as well?
SIGHUP [0 11681 sshd: hongjiang.wanghj [priv]] -> [57316 11700 bash]
SIGHUP [57316 11700 -bash] -> [57316 11700 bash]
SIGHUP [57316 11700] -> [0 13299 tail]
SIGHUP [57316 11700] -> [0 13298 java]
Bo Yu was too busy to help analyze the problem further (he offered a few guesses, but they turned out to be wrong).
Having established that a signal was the cause, my questions became:
1. Why does SIGINT (kill -2) not make the Tomcat process exit?
2. Why does SIGHUP (kill -1) make the Tomcat process exit?
My first thought was that the JVM might handle OS signals differently under certain parameters (or because of some JNI code). I looked at the application's JVM parameters and saw nothing suspicious, and I also ruled out Tomcat using APR/tcnative.
Next, let's look at what a JVM process does with SIGINT and SIGHUP by default, using the Scala REPL to simulate:
scala> Runtime.getRuntime().addShutdownHook(
  new Thread() { override def run() { println("ok") } })
Sending kill -2 and kill -1 to this Java process separately, we found that both make the JVM exit and both trigger the shutdown hook. This matches how Oracle's HotSpot virtual machine handles signals: SIGTERM, SIGINT, and SIGHUP all trigger shutdown hooks.
So it did not seem to be the JVM. Could it be something about the state of the process? catalina.sh does not launch the Java process through anything like start-stop-daemon. With the start argument, what it runs is, in simplified form, equivalent to:
eval '"/pathofjdk/bin/java"' 'params' org.apache.catalina.startup.Bootstrap start '&'
That is, it simply puts java in the background. When the catalina.sh process itself exits, the PPID of the Java process becomes 1.
I spent a lot of time guessing at possible OS-level causes, and later found that none of them mattered. After the Spring Festival I asked two colleagues to look at the problem with me. They have a C background and know more about the lower levels of the system. After most of a day of guessing and verifying, we finally confirmed that the cause was the shell.
SIGINT (kill -2) does not cause the background Java process to exit
For simplicity, we use sleep to simulate the process. First, in interactive mode:
$ sleep 1000 &
$ ps -o pid,pgid,ppid,stat,cmd -C sleep
PID PGID PPID STAT CMD
Note that the PID of the sleep 1000 process is the same as its PGID (process group ID). In this case, kill -2 does kill the sleep 1000 process.
Now we put the sleep process into a script and run it in the background:
$ cat a.sh
#!/bin/sh
sleep 4400 &
echo "shell exit"
After running a.sh, the PID of the sleep 4400 process differs from its PGID. The PGID is the ID of its parent process, that is, the a.sh process that has already exited:
$ ps -o pid,pgid,ppid,comm -p 63376
  PID  PGID  PPID COMMAND
63376 63375     1 sleep
This time, kill -2 cannot kill the sleep 4400 process.
At this point we were very close to the cause: the shell must be doing something to the signal handlers of background processes. One of my colleagues wrote a small program with a custom SIGINT handler to see whether kill -2 would take effect on it:
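#include <stdio.h>
#include <signal.h>
#include <stdlib.h>

/* Custom SIGINT handler: print a message and exit. */
void my_handler(int sig) {
    printf("Handler aaa\n");
    exit(0);
}

int main() {
    /* Installing a handler replaces any inherited "ignore" disposition. */
    signal(SIGINT, my_handler);
    for (;;) {}
    return 0;
}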
We run the compiled a.out from a script in the same way:
$ cat a.sh
#!/bin/sh
/tmp/a.out &
This time, kill -2 does kill the a.out process. This shows that the shell changes the signal handling before the user's program runs, that is, the script sets it up when it forks the child process. Following this clue, a Google search told us that in non-interactive mode the shell sets SIGINT to be ignored for background processes.
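The behaviour can be reproduced without Tomcat. The following is a minimal sketch, assuming bash and sleep are available; the variable names pid and result are illustrative and do not come from the original investigation. It starts a background sleep from a non-interactive bash (job control off), sends it SIGINT, and checks whether the process is still there.

```shell
#!/bin/bash
# Start sleep in the background from a non-interactive shell (job control off).
# Redirecting sleep's output lets the command substitution return immediately.
pid=$(bash -c 'sleep 30 >/dev/null 2>&1 & echo $!')

# Send SIGINT (the same signal as kill -2) and give it a moment to be delivered.
kill -INT "$pid"
sleep 1

# kill -0 sends no signal; it only checks that the process still exists.
if kill -0 "$pid" 2>/dev/null; then
    result="alive"
else
    result="dead"
fi
echo "after SIGINT: $result"

# Clean up: SIGTERM is not ignored, so this ends the sleep.
kill -TERM "$pid" 2>/dev/null
```

Because sleep installs no handler of its own, it keeps the ignored disposition it inherited from the shell, which is the same situation the Java process is in.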
By default, interactive and non-interactive mode differ in job control
Why does the shell leave SIGINT alone for background processes in interactive mode, but set it to ignored in non-interactive mode? This is easy to understand. For example, if a foreground process runs too long, we can suspend it with ctrl-z and then put it in the background with bg %n. Likewise, a background process started with cmd & can be brought back to the foreground with fg %n and then stopped with ctrl-c. So of course it cannot ignore SIGINT.
Why does a background process in interactive mode get a process group ID of its own? By default a process takes the process group ID of its parent, and keyboard-generated signals such as the SIGINT from ctrl-c are delivered to every member of the process group. Suppose the background process were also a member of the parent's process group. Since job control requires that it not ignore SIGINT, a ctrl-c at the terminal would make all background processes exit, which is clearly unreasonable. To avoid this interference, the background process is given its own PGID.
In non-interactive mode, job control is usually not needed, so it is off by default (it can of course be turned on in a script with set -m). With job control off, a background process in a script is shielded from signals delivered to the members of the group by having SIGINT set to ignored, since that signal is meaningless to it.
Back to the Tomcat example: when catalina.sh is run with the start argument, java is started as a background process in non-interactive mode, so the shell sets the Java process to ignore SIGINT. That is why the SIGINT sent by the system when ctrl-c ends test.sh has no effect on Java.
SIGHUP (kill -1) causes the Tomcat process to exit
In non-interactive mode, the shell sets SIGINT and SIGQUIT to ignored for the Java process, but it does not set SIGHUP to ignored. Look again at the process hierarchy at the time:
|-sshd(1622)-+-sshd(11681)---sshd(11699)---bash(11700)---test.sh(13285)---tail(13299)
After sshd delivers SIGHUP to the bash process, bash passes SIGHUP on to its child process test.sh, and it also propagates SIGHUP to the members of test.sh's process group. Because the Java background process inherited its PGID from its parent catalina.sh (which in turn inherited it from its parent test.sh), the Java process is still a member of test.sh's process group, so it receives SIGHUP and exits.
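On Linux, the signals a process currently ignores can be read from the SigIgn bitmask in /proc/<pid>/status, where signal N corresponds to bit N-1. The sketch below is an illustration under assumptions, not part of the original investigation: it uses sleep in place of the Java process, requires bash and /proc, and all variable names are made up for the example. It reports whether a background process started without job control ignores SIGHUP, SIGINT, and SIGQUIT.

```shell
#!/bin/bash
# Linux-only: relies on /proc/<pid>/status.
pid=$(bash -c 'sleep 30 >/dev/null 2>&1 & echo $!')
sleep 1

# Find the SigIgn line; its value is a hexadecimal bitmask.
mask=""
while read -r key value; do
    if [ "$key" = "SigIgn:" ]; then
        mask=$value
    fi
done < "/proc/$pid/status"

# Signal N is bit N-1: SIGHUP=1, SIGINT=2, SIGQUIT=3.
hup_ignored=$(( (16#$mask >> 0) & 1 ))
int_ignored=$(( (16#$mask >> 1) & 1 ))
quit_ignored=$(( (16#$mask >> 2) & 1 ))
echo "SIGHUP ignored:  $hup_ignored"
echo "SIGINT ignored:  $int_ignored"
echo "SIGQUIT ignored: $quit_ignored"

# Clean up the background sleep.
kill -TERM "$pid" 2>/dev/null
```

The SIGHUP value depends on how the script itself was launched (for example, it would be 1 under nohup), so only the SIGINT and SIGQUIT bits are attributable to the shell's handling of background processes.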
If we turn on job control in test.sh, the Java process no longer exits:
#!/bin/bash
set -m
cd /home/admin/tt/tomcat/bin/
./catalina.sh start
tail -f /home/admin/tt/tomcat/logs/catalina.out
Now the Java background process inherits the PGID of its parent catalina.sh, and catalina.sh no longer uses test.sh's process group: its PGID is its own PID. After catalina.sh finishes and exits, the Java process hangs under init. Java is then completely unrelated to the test.sh process, and bash will not send signals to it.
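The effect of set -m on process groups can be seen in isolation. This is a simplified sketch, assuming Linux (it reads /proc/<pid>/stat) and bash. It starts a background sleep directly instead of going through catalina.sh, and pgid_of is a helper invented for this example. Without job control, the background process stays in the process group of whatever started the shell; with set -m, it becomes the leader of its own group, so its PGID equals its PID.

```shell
#!/bin/bash
# Print the process group ID of a PID (field 5 of /proc/<pid>/stat, Linux-only).
pgid_of() {
    local _pid _comm _state _ppid pgid _rest
    read -r _pid _comm _state _ppid pgid _rest < "/proc/$1/stat"
    echo "$pgid"
}

# Background sleep started with job control off (the default in scripts).
plain_pid=$(bash -c 'sleep 30 >/dev/null 2>&1 & echo $!')
# Background sleep started after enabling job control with set -m.
jobctl_pid=$(bash -c 'set -m; sleep 30 >/dev/null 2>&1 & echo $!')
sleep 1

plain_pgid=$(pgid_of "$plain_pid")
jobctl_pgid=$(pgid_of "$jobctl_pid")
echo "job control off: pid=$plain_pid pgid=$plain_pgid"
echo "job control on:  pid=$jobctl_pid pgid=$jobctl_pgid"

# Clean up both background processes.
kill -TERM "$plain_pid" "$jobctl_pid" 2>/dev/null
```

In the article's test.sh, catalina.sh runs as a foreground job, but the principle is the same: under set -m each job gets its own process group, and the Java process inherits that group rather than the one test.sh belongs to.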