Problem description:
A loadavg alarm always fires some time after the project goes live.
- The server runs normally for a while after going online. At a certain point (usually about two hours after going online), the load average suddenly spikes to more than 100 and then gradually drops back down.
- There are more than 2,000 timer threads (the exact count depends on the service; it is not necessarily this large).
Troubleshooting process:
After roughly ten days of investigation, we found that the problem was caused by a modified database connection pool configuration inside a jar package.
- Use ps to find the Java process ID. Specific command: ps -ef | grep java
- Use ps to list the threads in that process sorted by CPU usage. Specific command: ps p $target_pid -L -o pcpu,pid,tid,time,tname,stat,psr | sort -n -k 1 -r
- Use jstack to capture the thread states in the process. Specific command: jstack $target_pid
- From the CPU usage in step 2, the busiest thread has thread ID 751511.
- Convert 751511 to hexadecimal: b7797 (see the conversion sketch after this list).
- Search for b7797 in the jstack output (it appears as nid=0xb7797) to find the call stack of that thread.
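The decimal-to-hex conversion can be done with any tool; as a minimal sketch in Java (using the thread ID 751511 from this incident):

    // Minimal sketch: convert the OS thread ID reported by ps (decimal) into the
    // hexadecimal "nid" value that jstack prints for each thread.
    public class TidToNid {
        public static void main(String[] args) {
            long tid = 751511L; // LWP/thread ID from the ps output in step 2
            System.out.println("nid=0x" + Long.toHexString(tid)); // prints nid=0xb7797
        }
    }

The same conversion can also be done directly in the shell with printf's %x format.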
With the call stack in hand, the specific cause becomes easier to pin down:
The hot threads belong to the DBCP connection pool, which starts a timer to run a background task. The loadavg is high because the queue of work for that timer is too long.
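To make the mechanism concrete, here is a conceptual Java sketch, not the actual DBCP source, of how an evictor-style task scheduled on a java.util.Timer with a very short period behaves:

    import java.util.Timer;
    import java.util.TimerTask;

    // Conceptual sketch only: the real DBCP evictor is internal to the pool, but it
    // behaves like a periodic task that walks all idle connections on every run.
    public class EvictorSketch {
        public static void main(String[] args) throws InterruptedException {
            Timer evictionTimer = new Timer("connection-evictor", true);
            evictionTimer.scheduleAtFixedRate(new TimerTask() {
                @Override
                public void run() {
                    // In the real pool this would test every idle connection and close
                    // those idle longer than the configured minimum idle time.
                }
            }, 0L, 5L); // a 5 ms period means the scan effectively never stops
            Thread.sleep(1000L); // keep the daemon timer alive briefly for demonstration
        }
    }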
Finally, locate the code:
A DBCP configuration attribute had been modified: line 82 sets the minimum idle time to 5.
As a result, the evictor timer constantly scans all connections to check whether they are idle.
Because the project is database-heavy and may connect to hundreds of databases at the same time, a large number of connections queue up waiting to be scanned and reclaimed.
This drives the loadavg up.
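The offending configuration itself is not reproduced here; the following is a hypothetical Java sketch, assuming the commons-dbcp BasicDataSource API, of the kind of settings that produce this behavior (the value 5 mirrors the value found on line 82; the exact property that was changed is not shown in the source):

    import org.apache.commons.dbcp.BasicDataSource;

    // Hypothetical sketch of a DBCP data source configured so that the background
    // evictor runs almost continuously and treats every connection as evictable.
    public class SuspectDataSourceConfig {
        public static BasicDataSource build(String url, String user, String password) {
            BasicDataSource ds = new BasicDataSource();
            ds.setUrl(url);
            ds.setUsername(user);
            ds.setPassword(password);
            // Wake the evictor up every 5 ms and treat connections as evictable after
            // only 5 ms of idling, so each pool's connections are scanned non-stop.
            ds.setTimeBetweenEvictionRunsMillis(5);
            ds.setMinEvictableIdleTimeMillis(5);
            ds.setNumTestsPerEvictionRun(-1); // negative value: examine all idle connections per run
            return ds;
        }
    }

In a healthy configuration these intervals are typically seconds or minutes rather than milliseconds, so the evictor wakes up rarely and touches only a few connections per run.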
Summary:
In the future, ps and jstack can be used together to quickly locate the specific threads and call stacks behind this kind of problem.
Record: A loadavg Alarm and Its Troubleshooting