Because of recent team changes, I have taken over a bunch of unfamiliar applications. A transition period always brings its share of ups and downs. I have been busy from dawn to dusk and have barely read any books. What a waste.
This is a record of one fault diagnosis. I hope that after this article I never have to write on this topic again.
1. Symptom:
A homegrown ("shanzhai") database synchronization application failed.
2. Troubleshooting:
2.1. Detours
Following conf/log4j.xml, I found "the" log file (/home/admin/out/logs/automation_sys.log). I spent half a day combing through it and found nothing.
Because it is a distributed application, it is deployed across three machines (3 slaves and 1 master; the slaves and the master communicate via RMI).
Seeing no problem on the master, I checked the slave instances one by one. None of their logs showed any errors either.
2.2. The right way
However, running grep for "error" across the whole log folder immediately turned up the following exception (host1 is a sanitized machine name):
[0327 15:53:56 674 ERROR] [main] master.action.DefaultAction - host1 error is java.util.concurrent.ExecutionException: org.springframework.remoting.RemoteConnectFailureException: Cannot connect to remote service [rmi://host1:3099/RmiDispatcherService]; nested exception is java.rmi.ConnectIOException: error during JRMP connection establishment; nested exception is: java.net.SocketTimeoutException: Read timed out
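Grepping the whole folder found the error where reading the single file named in log4j.xml did not. As an illustration, here is a minimal Java sketch of the same case-insensitive, recursive search (roughly grep -rin error). The class name LogGrep, the default folder in main, and the demo file contents are illustrative assumptions, not part of the original application.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class LogGrep {
    /** Rough equivalent of `grep -rin needle dir`: each match as "file:lineNo:text". */
    static List<String> grep(Path dir, String needle) throws IOException {
        String lowerNeedle = needle.toLowerCase(Locale.ROOT);
        List<Path> files;
        try (Stream<Path> walk = Files.walk(dir)) {
            files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        List<String> hits = new ArrayList<>();
        for (Path file : files) {
            // ISO-8859-1 maps every byte to a char, so stray bytes in a log never throw.
            List<String> lines = Files.readAllLines(file, StandardCharsets.ISO_8859_1);
            for (int i = 0; i < lines.size(); i++) {
                String line = lines.get(i);
                if (line.toLowerCase(Locale.ROOT).contains(lowerNeedle)) {
                    hits.add(dir.relativize(file) + ":" + (i + 1) + ":" + line);
                }
            }
        }
        return hits;
    }

    /** Builds a throwaway log folder with hypothetical contents and greps it for "error". */
    static List<String> demo() {
        try {
            Path dir = Files.createTempDirectory("logs");
            Files.write(dir.resolve("automation_sys.log"),
                    List.of("[0327 15:53:50 001 INFO ] [main] sync started"));
            Files.write(dir.resolve("automation.log"),
                    List.of("[0327 15:53:55 100 INFO ] [main] polling",
                            "[0327 15:53:56 674 ERROR] [main] master.action.DefaultAction - host1 error is ..."));
            return grep(dir, "error");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default; pass the real log folder as the first argument.
        Path dir = Path.of(args.length > 0 ? args[0] : "/home/admin/out/logs");
        grep(dir, "error").forEach(System.out::println);
    }
}
```

Searching every file, rather than trusting the config to say where errors go, is the design point here.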
An RMI call timed out. Packets sent to the machine's own NIC never pass through a router (no different from localhost), so the network is unlikely to be the cause. My guess is that the machine was heavily loaded (a VM with 5 GB of memory running 5 JVMs), but with no monitoring in place, I could not confirm it.
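To see what "Read timed out" during connection establishment means, here is a minimal sketch that is independent of RMI and Spring: a listener whose kernel completes TCP handshakes but which never calls accept() or writes anything, roughly like a server too busy to respond. The class name and the 200 ms timeout are arbitrary assumptions. The TCP connect succeeds and only the following read times out, which is consistent with (though it does not prove) an overloaded host rather than a network fault.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

final class ReadTimeoutDemo {
    /**
     * Connects to a listener that never answers and waits for one byte.
     * Returns the timeout message, or null if data (or EOF) unexpectedly arrived.
     */
    static String readWithTimeout(int timeoutMillis) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // The backlog lets the kernel finish the TCP handshake even though accept() is
        // never called, loosely mimicking a server process that cannot keep up.
        try (ServerSocket silentServer = new ServerSocket(0, 50, loopback);
             Socket client = new Socket()) {
            client.connect(new InetSocketAddress(loopback, silentServer.getLocalPort()), 1_000);
            client.setSoTimeout(timeoutMillis); // read timeout on the client side
            InputStream in = client.getInputStream();
            try {
                in.read(); // blocks: the server side never writes anything
                return null;
            } catch (SocketTimeoutException e) {
                return e.getMessage();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("java.net.SocketTimeoutException: " + readWithTimeout(200));
    }
}
```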
The error log had gone to automation.log, not automation_sys.log. The reason is in conf/log4j.xml:
<appender name="activex_appender" class="org.apache.log4j.RollingFileAppender">
    <param name="File" value="/home/admin/out/logs/automation_sys.log" />
    <layout class="org.apache.log4j.PatternLayout">
        <param name="ConversionPattern" value="[%d{MMdd HH:mm:ss SSS} %-5p] [%t] %c{3} - %m%n" />
    </layout>
</appender>

<appender name="automation_appender" class="org.apache.log4j.RollingFileAppender">
    <param name="File" value="/home/admin/out/logs/automation.log" />
    <layout class="org.apache.log4j.PatternLayout">
        <param name="ConversionPattern" value="[%d{MMdd HH:mm:ss SSS} %-5p] [%t] %c{3} - %m%n" />
    </layout>
</appender>

<logger name="com.alibaba.wscrosstool" additivity="false">
    <level value="info" />
    <appender-ref ref="automation_appender" />
</logger>

<root>
    <priority value="info" />
    <appender-ref ref="activex_appender" />
</root>
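The additivity="false" attribute means that events from the com.alibaba.wscrosstool logger (and its child loggers) are written only to automation_appender. They are not also passed up to the root logger, so they never reach automation_sys.log.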
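The same routing can be sketched with java.util.logging, where Logger.setUseParentHandlers(false) plays roughly the role of log4j's additivity="false". This is an analogy, not log4j itself: the in-memory handlers stand in for the two appenders, and the full logger name com.alibaba.wscrosstool.master.action.DefaultAction is a hypothetical guess, since the log line only shows %c{3}, the last three components of the name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

final class AdditivityDemo {
    /** In-memory handler standing in for a log file appender. */
    static final class ListHandler extends Handler {
        final List<String> messages = new ArrayList<>();
        @Override public void publish(LogRecord record) { messages.add(record.getMessage()); }
        @Override public void flush() { }
        @Override public void close() { }
    }

    static final ListHandler sysLog = new ListHandler();        // plays automation_sys.log (root)
    static final ListHandler automationLog = new ListHandler(); // plays automation.log

    // Strong references: JUL holds loggers weakly, and a collected logger loses its setup.
    static final Logger ROOT = Logger.getLogger("");
    static final Logger APP = Logger.getLogger("com.alibaba.wscrosstool");

    static {
        ROOT.addHandler(sysLog);
        APP.addHandler(automationLog);
        APP.setUseParentHandlers(false); // JUL counterpart of additivity="false"
    }

    public static void main(String[] args) {
        // Child of APP: its record goes to APP's handler and stops there.
        Logger.getLogger("com.alibaba.wscrosstool.master.action.DefaultAction").severe("host1 error is ...");
        // Unrelated logger: its record reaches the root handlers only.
        Logger.getLogger("some.other.component").info("unrelated message");
        System.out.println("automation.log     : " + automationLog.messages);
        System.out.println("automation_sys.log : " + sysLog.messages);
    }
}
```

Anyone reading only the root logger's output (here sysLog) never sees the error, which mirrors reading only automation_sys.log.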
3. The error strikes again:
While I was requesting a new machine and preparing to move the application over, the application let me down again.
This time it was stranger: no file in the log folder contained any error at all. Worse, from a certain point in time onward, all logging had simply vanished.
jstack showed the application sitting at the spot where it writes an error log every 5 s, yet not a single line of log output appeared.
Since jstat -gcutil showed GC was still running, the JVM was clearly alive, so I concluded the program was not deadlocked. I was speechless.
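GC activity shows the JVM is alive, but by itself it does not rule out a deadlock among application threads. A more direct in-process check is ThreadMXBean.findDeadlockedThreads(); jstack reports the same kind of lock cycle under "Found one Java-level deadlock". The sketch below deliberately deadlocks two daemon threads so the check has something to find; the class, method, and thread names are illustrative assumptions.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

final class DeadlockCheck {
    private static final ThreadMXBean THREADS = ManagementFactory.getThreadMXBean();

    /** Number of threads stuck in a lock cycle; 0 if none. */
    static int deadlockedThreadCount() {
        long[] ids = THREADS.findDeadlockedThreads(); // null when there is no deadlock
        return ids == null ? 0 : ids.length;
    }

    /** Polls until a deadlock shows up or the timeout expires. */
    static int awaitDeadlock(long timeoutMillis) throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        int found = deadlockedThreadCount();
        while (found == 0 && System.nanoTime() < deadline) {
            Thread.sleep(50);
            found = deadlockedThreadCount();
        }
        return found;
    }

    /** Deliberately deadlocks two daemon threads (daemon so the JVM can still exit). */
    static void startDeadlock() {
        Object lockA = new Object();
        Object lockB = new Object();
        CountDownLatch bothHoldFirstLock = new CountDownLatch(2);
        startDaemon("worker-1", () -> lockInOrder(lockA, lockB, bothHoldFirstLock));
        startDaemon("worker-2", () -> lockInOrder(lockB, lockA, bothHoldFirstLock));
    }

    private static void lockInOrder(Object first, Object second, CountDownLatch latch) {
        synchronized (first) {
            latch.countDown();
            try {
                latch.await(); // ensure the other thread already holds its first lock
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            synchronized (second) {
                // never entered: the other thread holds `second` and waits for `first`
            }
        }
    }

    private static void startDaemon(String name, Runnable body) {
        Thread t = new Thread(body, name);
        t.setDaemon(true);
        t.start();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("before: " + deadlockedThreadCount());
        startDeadlock();
        System.out.println("after: " + awaitDeadlock(5_000));
    }
}
```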
I went downstairs for a run, came back, and searched for the logs again, this time with lsof:
$ /usr/sbin/lsof -p 18481 | grep -v jar$ | grep log
java 18481 admin 306w REG 202,17 12428618 2883725 /home/admin/out/logs/automation.log.1 (deleted)
java 18481 admin 307w REG 202,17 13431037 2883877 /home/admin/out/logs/automation_sys.log.1
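This turned up a deleted file: the process still held automation.log.1 open for writing (fd 306, mode w) even though it had been deleted.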
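On Linux, lsof gets this information from /proc/<pid>/fd: each descriptor is a symbolic link whose target ends in " (deleted)" once the file's name has been unlinked. Here is a minimal Linux-only sketch of that lookup. The class and method names are hypothetical, and inspecting another process (such as 18481 above) requires running as the same user or as root.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class DeletedFds {
    private static final String DELETED_SUFFIX = " (deleted)";

    /** Like `lsof -p PID | grep deleted`: fd number -> path the file had before it was unlinked. */
    static Map<Integer, String> deletedOpenFiles(String pid) throws IOException {
        List<Path> fds;
        try (Stream<Path> s = Files.list(Paths.get("/proc", pid, "fd"))) {
            fds = s.collect(Collectors.toList());
        }
        Map<Integer, String> result = new TreeMap<>();
        for (Path fd : fds) {
            String target;
            try {
                target = Files.readSymbolicLink(fd).toString();
            } catch (IOException e) {
                continue; // descriptor closed since listing (e.g. the one Files.list used)
            }
            if (target.endsWith(DELETED_SUFFIX)) {
                result.put(Integer.parseInt(fd.getFileName().toString()),
                           target.substring(0, target.length() - DELETED_SUFFIX.length()));
            }
        }
        return result;
    }

    /** Opens a temp file, deletes it, and checks that this process still shows it as deleted. */
    static boolean demo() throws IOException {
        Path p = Files.createTempFile("automation", ".log").toRealPath();
        try (OutputStream out = Files.newOutputStream(p)) {
            Files.delete(p);
            return deletedOpenFiles("self").containsValue(p.toString());
        }
    }

    public static void main(String[] args) throws IOException {
        String pid = args.length > 0 ? args[0] : "self"; // e.g. 18481 for the process above
        deletedOpenFiles(pid).forEach((fd, path) -> System.out.println(fd + " -> " + path));
    }
}
```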
On Linux, deleting a file does not immediately erase it. The name is removed from the directory, so the file no longer shows up and new processes cannot open it, but the data stays on disk until every process holding an open handle to it has closed it or exited (for details, refer to the test conducted by the monks).
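Here is a minimal sketch of that behavior. It is Linux/Unix only; on Windows the delete would fail while the file is open, and the class name is hypothetical. The name disappears on delete, yet writes and reads through the already-open handle keep working, which matches what lsof showed for automation.log.1.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class UnlinkWhileOpen {
    /** Returns what is still readable through a handle whose file name was deleted mid-way. */
    static String writeAcrossDelete() throws IOException {
        Path path = Files.createTempFile("automation", ".log");
        try (RandomAccessFile file = new RandomAccessFile(path.toFile(), "rw")) {
            file.write("written before delete\n".getBytes(StandardCharsets.UTF_8));
            Files.delete(path); // unlink: the name disappears, the inode survives while open
            if (Files.exists(path)) {
                throw new IllegalStateException("name should be gone after delete");
            }
            // Writes through the old handle still succeed, but nobody can find them by name.
            file.write("written after delete\n".getBytes(StandardCharsets.UTF_8));
            file.seek(0);
            byte[] data = new byte[(int) file.length()];
            file.readFully(data);
            return new String(data, StandardCharsets.UTF_8);
        } // closing the last handle is what finally lets the kernel free the data
    }

    public static void main(String[] args) throws IOException {
        System.out.print(writeAcrossDelete());
    }
}
```

Running main prints both lines, including the one written after the file was deleted.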
So I searched for how to recover a deleted file and, following this post, found it quite easily.
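The post itself is not reproduced here. A common Linux technique, not necessarily the one that post uses, is to copy the data back out through the descriptor link. For the file lsof showed, that means copying /proc/18481/fd/306 somewhere safe. Here is a minimal Java sketch of the same idea; the class, method names, and demo are hypothetical. If the process is still writing to the file, the copy is only a snapshot.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

final class RecoverDeletedFile {
    /** Same idea as copying /proc/PID/fd/FD to dest: reopen the unlinked inode via procfs. */
    static long recover(String pid, int fd, Path dest) throws IOException {
        Path handle = Paths.get("/proc", pid, "fd", Integer.toString(fd));
        try (InputStream in = Files.newInputStream(handle)) {
            return Files.copy(in, dest, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    /** Demo helper: finds this process's descriptor for a file unlinked while open. */
    static int findDeletedFd(Path originalPath) throws IOException {
        String wanted = originalPath + " (deleted)";
        try (DirectoryStream<Path> fds = Files.newDirectoryStream(Paths.get("/proc/self/fd"))) {
            for (Path fd : fds) {
                try {
                    if (Files.readSymbolicLink(fd).toString().equals(wanted)) {
                        return Integer.parseInt(fd.getFileName().toString());
                    }
                } catch (IOException e) {
                    // descriptor vanished between listing and readlink; ignore it
                }
            }
        }
        throw new FileNotFoundException("no open descriptor for " + wanted);
    }

    /** Writes a temp file, deletes it while open, recovers it, and returns the recovered text. */
    static String demo() throws IOException {
        Path log = Files.createTempFile("automation", ".log").toRealPath();
        Path recovered = Files.createTempFile("recovered", ".log");
        try (OutputStream out = Files.newOutputStream(log)) {
            out.write("lost log lines\n".getBytes(StandardCharsets.UTF_8));
            out.flush();
            Files.delete(log);
            recover("self", findDeletedFd(log), recovered);
        }
        try {
            return new String(Files.readAllBytes(recovered), StandardCharsets.UTF_8);
        } finally {
            Files.deleteIfExists(recovered);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.print(demo());
    }
}
```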
I have not yet tracked down all of the errors that followed, so I won't list them here.
4. Conclusion
The log4j configuration was wrong.