Another hapless troubleshooting session

Source: Internet
Author: User

After a recent team reshuffle I took over a bunch of unfamiliar applications. A handover period always has its ups and downs: I have been busy from morning until dark, have hardly read any books, and feel like the time is just slipping away.

This is a record of one fault diagnosis. After writing this article, I hope I never have to write about this topic again.

1. Symptom:

A homegrown ("shanzhai") database synchronization application keeps failing.

2. Troubleshooting:

2.1. Detours

From conf/log4j.xml I located a log file (/home/admin/out/logs/automation_sys.log). I dug through it for half a day and found no problem.

Because it is a distributed application, it is deployed on three machines (3 slaves, 1 master; slaves and master communicate through RMI).

Since I could not see any problem on the master, I went on to check the slave instances one by one. Their logs showed no errors either.

2.2. The right approach

However, running grep for ERROR across the whole log folder quickly turned up the following exception (host1 is the anonymized name of the machine).

[0327 15:53:56 674 ERROR] [main] master.action.DefaultAction - host1 error is java.util.concurrent.ExecutionException: org.springframework.remoting.RemoteConnectFailureException: Cannot connect to remote service [rmi://host1:3099/RmiDispatcherService]; nested exception is java.rmi.ConnectIOException: error during JRMP connection establishment; nested exception is:        java.net.SocketTimeoutException: Read timed out
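The lesson of this step is to search every file in the log folder rather than the one file you expect. A minimal sketch of that scan in Python follows; the function name find_error_lines is made up for illustration, and the only thing taken from the article is the idea of matching the string ERROR.

```python
import os


def find_error_lines(log_dir, needle="ERROR"):
    """Return (path, line_number, line) for each line containing `needle`
    in any file under log_dir, including files you did not know about."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(log_dir):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            # errors="replace" keeps the scan going over odd bytes in a log
            with open(path, "r", errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    if needle in line:
                        hits.append((path, number, line.rstrip("\n")))
    return hits
```

Calling find_error_lines("/home/admin/out/logs") would list matches from automation.log as well as automation_sys.log, which is how the exception above came to light.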

The RMI call timed out. Packets sent to the machine's own NIC address do not go through the router (it is no different from using localhost), so the network is unlikely to be the cause. My guess is that the machine was heavily loaded (a virtual machine with 5 GB of memory running 5 JVMs), but because there is no monitoring, this cannot be confirmed.
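The exception reports a read timeout during JRMP connection establishment, which happens after the TCP connection itself has been opened. One rough way to support the "not the network" reasoning is to time a plain TCP connect to the same port. The sketch below does only that; it does not speak JRMP, and the function name probe_tcp is hypothetical. The host and port (host1, 3099) come from the log line.

```python
import socket
import time


def probe_tcp(host, port, timeout=3.0):
    """Attempt a plain TCP connect and return (ok, seconds, detail)."""
    started = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - started, "connected"
    except OSError as exc:
        # Covers refused connections, timeouts and name-resolution failures.
        return False, time.monotonic() - started, repr(exc)
```

If probe_tcp("host1", 3099) connects quickly while the RMI call still times out, that points toward a slow or overloaded server process rather than an unreachable port. It is a hint, not proof.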

The error had been logged to automation.log instead. Reading conf/log4j.xml again explains why:

<appender name="activex_appender" class="org.apache.log4j.RollingFileAppender">
        <param name="File" value="/home/admin/out/logs/automation_sys.log" />
        <layout class="org.apache.log4j.PatternLayout">
                <param name="ConversionPattern" value="[%d{MMdd HH:mm:ss SSS} %-5p] [%t] %c{3} - %m%n" />
        </layout>
</appender>
<appender name="automation_appender" class="org.apache.log4j.RollingFileAppender">
        <param name="File" value="/home/admin/out/logs/automation.log" />
        <layout class="org.apache.log4j.PatternLayout">
                <param name="ConversionPattern" value="[%d{MMdd HH:mm:ss SSS} %-5p] [%t] %c{3} - %m%n" />
        </layout>
</appender>
<logger name="com.alibaba.wscrosstool" additivity="false">
        <level value="info" />
        <appender-ref ref="automation_appender" />
</logger>
<root>
        <priority value="info" />
        <appender-ref ref="activex_appender" />
</root>

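additivity="false" on the com.alibaba.wscrosstool logger means that its log events are written only to automation_appender (automation.log) and are not passed on to the root logger's appender, which is why nothing showed up in automation_sys.log.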
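To avoid reading the wrong file again, it can help to list which file each logger really writes to. The sketch below parses a log4j 1.x style XML config with Python's standard library and reports, per logger, the appender files and whether additivity is on. The function name logger_destinations and the plain <configuration> wrapper element in SAMPLE are assumptions made for illustration; the appenders and logger are trimmed copies of the ones shown above.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<configuration>
  <appender name="activex_appender" class="org.apache.log4j.RollingFileAppender">
    <param name="File" value="/home/admin/out/logs/automation_sys.log" />
  </appender>
  <appender name="automation_appender" class="org.apache.log4j.RollingFileAppender">
    <param name="File" value="/home/admin/out/logs/automation.log" />
  </appender>
  <logger name="com.alibaba.wscrosstool" additivity="false">
    <level value="info" />
    <appender-ref ref="automation_appender" />
  </logger>
  <root>
    <priority value="info" />
    <appender-ref ref="activex_appender" />
  </root>
</configuration>"""


def logger_destinations(config_xml):
    """Map each logger name (and "root") to (appender files, additivity)."""
    tree = ET.fromstring(config_xml)
    # appender name -> value of its "File" param (None if it has none)
    files = {}
    for appender in tree.iter("appender"):
        target = None
        for param in appender.findall("param"):
            if param.get("name") == "File":
                target = param.get("value")
        files[appender.get("name")] = target

    def targets(node):
        return [files.get(ref.get("ref")) for ref in node.findall("appender-ref")]

    result = {}
    for logger in tree.iter("logger"):
        # additivity defaults to true unless it is explicitly "false"
        additive = logger.get("additivity", "true").lower() != "false"
        result[logger.get("name")] = (targets(logger), additive)
    for root in tree.iter("root"):
        result["root"] = (targets(root), True)
    return result


if __name__ == "__main__":
    for name, (paths, additive) in logger_destinations(SAMPLE).items():
        print(name, paths, "additive" if additive else "not additive")
```

A logger reported as not additive writes only to its own files, so errors from that package will never appear in the root logger's file.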

3. Another failure:

While I was requesting a new machine and preparing to migrate the application, it let me down again.

This time it was trickier: none of the logs in the log folder contained any errors, and from a certain point in time all logging had simply vanished.

jstack showed that the application was sitting at a point where it writes an error log every 5 seconds, yet not a single line was showing up in the logs.

jstat -gcutil showed that GC was still running, which suggests the program was still doing work and was not deadlocked. I was at a loss.

I went downstairs for a run, came back up, and searched for the log files again, this time with lsof.

$ /usr/sbin/lsof -p 18481 | grep -v jar$ | grep log
java    18481 admin  306w   REG             202,17 12428618   2883725 /home/admin/out/logs/automation.log.1 (deleted)
java    18481 admin  307w   REG             202,17 13431037   2883877 /home/admin/out/logs/automation_sys.log.1

One of the open files is marked as deleted.

On Linux, a file is not erased the moment it is deleted. Its name is removed, so it no longer shows up in the directory and new processes cannot open it by path, but the data is released only after every process holding an open handle has closed it or exited (for details, refer to the test conducted by the monks).
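This behavior also explains why the logs seemed to evaporate: a process can keep writing happily to a file whose name is already gone. A small self-contained sketch of the effect, assuming a POSIX system (the function name write_after_delete is made up for this example):

```python
import os
import tempfile


def write_after_delete():
    """Delete a file while it is open and keep writing to it (POSIX only)."""
    fd, path = tempfile.mkstemp(suffix=".log")
    with os.fdopen(fd, "w+") as handle:
        handle.write("before delete\n")
        handle.flush()
        os.unlink(path)                    # remove the directory entry
        visible = os.path.exists(path)     # False: the name is gone
        handle.write("after delete\n")     # the open handle still works
        handle.flush()
        # Number of names still pointing at the data; typically 0 here.
        links = os.fstat(handle.fileno()).st_nlink
        handle.seek(0)
        content = handle.read()
    return visible, links, content
```

Both lines end up in the file even though no directory listing shows it any more, which matches a JVM that keeps logging into automation.log.1 (deleted).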

So I searched for how to recover a deleted file and, following this post, got it back quite easily.
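The article does not say which method the post describes. One common approach on Linux, shown here as an assumption rather than as the author's exact steps, is to copy the data out through /proc/<pid>/fd/<fd> while the owning process is still alive. The PID 18481 and descriptor 306 come from the lsof output above; the function name and any destination path are hypothetical.

```python
import os
import shutil


def recover_open_file(pid, fd, destination):
    """Copy the data behind /proc/<pid>/fd/<fd> to destination (Linux only).

    Returns the link text, which ends with " (deleted)" once the file's
    name has been removed.
    """
    source = f"/proc/{pid}/fd/{fd}"
    link_text = os.readlink(source)
    # Opening the /proc entry reaches the still-existing data directly.
    shutil.copyfile(source, destination)
    return link_text
```

For the case above, the call would look like recover_open_file(18481, 306, "/tmp/automation.log.1.recovered"), run as the same user as the Java process or as root. The copy is a snapshot; the process keeps writing to the deleted file until it reopens its log or restarts.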

The errors that came after this have not been fully tracked down yet, so they are not listed here.

4. Conclusion

The configuration of log4j is incorrect.
