I was recently put in charge of maintaining and building out the company's SMS gateway. As the business has grown increasingly dependent on SMS, daily send volume climbed from a bit over 400k messages to roughly 2 million per day. The sending layer is written in Java, and as the pressure grew we inevitably ran into memory problems: with volume approaching 2 million a day, a memory leak appeared.
A check of the running system found:
1) The system became somewhat unstable 3-4 hours after each restart;
2) After those 3-4 hours, an out-of-memory error occurred: java.lang.OutOfMemoryError: GC overhead limit exceeded
When the problem surfaced, we pulled the monitoring data directly via JMX: memory reclamation was abnormal, GC pressure was very high, and a significant backlog of objects was accumulating on the heap.
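The same heap and GC figures the monitoring showed can be read programmatically through the standard platform MXBeans. This is a generic standalone sketch, not the gateway's actual monitoring code:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

public class GcMonitor {
    public static void main(String[] args) {
        // Heap occupancy: "used" staying near "max" across samples is a backlog signal.
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = mem.getHeapMemoryUsage();
        System.out.printf("heap used=%dMB committed=%dMB max=%dMB%n",
                heap.getUsed() >> 20, heap.getCommitted() >> 20, heap.getMax() >> 20);

        // Per-collector counts and accumulated pause time; collection counts that
        // climb rapidly while heap "used" stays high match the pattern we saw.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: collections=%d time=%dms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```

Sampling these two beans on an interval is enough to see the "GC overhead limit exceeded" pattern building before the error is thrown.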
We then took a heap dump of the system for analysis and confirmed the backlog:
The accumulating objects belonged to a timer task inside the MySQL JDBC driver whose job is enforcing query timeouts. When a statement is executed with a timeout set, the driver schedules a CancelTask as a safety net: if the SQL has not finished within the timeout, the task cancels and closes the statement. Because c3p0's default statement timeout is configured to 25 s (<setting name="defaultStatementTimeout" value="25000"/>), every statement's CancelTask can linger for up to 25 s; when a large volume of SQL executes within that window, the tasks pile up and eventually destabilize the system. (This turned out to be the symptom, not the root cause.)
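The mechanism works roughly as follows. This is a deliberately simplified sketch of the idea using java.util.Timer, with invented class names, not the driver's actual code:

```java
import java.util.Timer;
import java.util.TimerTask;

// Simplified sketch of a query-timeout mechanism: one Timer per connection;
// each statement execution schedules a cancel task that only acts if the
// statement is still running when the timeout expires.
public class QueryTimeoutSketch {
    private final Timer cancelTimer = new Timer("cancel-timer", true);

    static class FakeStatement {
        volatile boolean running = true;
        void cancel() { running = false; }
    }

    void execute(FakeStatement stmt, long timeoutMillis) {
        TimerTask cancelTask = new TimerTask() {
            @Override public void run() {
                if (stmt.running) stmt.cancel(); // only slow statements get cancelled
            }
        };
        // The task (and the statement it references) stays queued in the Timer
        // for the full timeout, even if the query finishes immediately.
        cancelTimer.schedule(cancelTask, timeoutMillis);
        stmt.running = false; // pretend the query completed quickly
    }
}
```

The key detail is in the comment: the Timer keeps a strong reference to every scheduled task until it fires, regardless of whether the statement has long since finished.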
But the system's send queue itself lives in MySQL, so the volume of MySQL operations is enormous; optimizing at the code level alone would be a drop in the bucket.
Under time pressure, quickly stabilizing the business was the most important task, so we were forced into an interim fix based on the findings above.
Temporary solution
From the above, we could be fairly sure that spikes of CancelTask objects were blowing up the JVM heap. Since we could not do much debugging against a live production system, we more than doubled the JVM's memory, hoping the system could ride out a memory spike. With the heap doubled, the system made it through the failure window and returned to normal operation, though CancelTask still consumed more than 2 GB at its peak. That memory was eventually reclaimed, but endlessly enlarging the heap is clearly not a real fix.
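Doubling the heap is done with the standard JVM sizing flags. The post does not give the actual values or the launch command, so the numbers and jar name here are purely illustrative:

```shell
# Illustrative only: actual heap sizes and launcher are not stated in the post.
# -Xms sets the initial heap, -Xmx the maximum; doubling -Xmx buys headroom
# for the CancelTask spike without changing the underlying leak.
java -Xms2g -Xmx4g -jar sms-gateway.jar
```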
Final solution
The system used MySQL Connector/J 5.1.6. Decompiling the driver immediately revealed the problem: the Timer that runs CancelTask is held statically by the connection, so even when a statement executes and closes normally, its CancelTask cannot be reclaimed until the timeout expires. This inability to reclaim CancelTask memory is the root cause of the failure.
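The retention effect can be reproduced in miniature with a plain java.util.Timer, independent of the driver: tasks scheduled for a future deadline stay strongly referenced by the timer until they fire, even when they are logically finished. The payload sizes below are invented stand-ins for a statement and its data:

```java
import java.util.Timer;
import java.util.TimerTask;

public class TimerRetentionDemo {
    public static void main(String[] args) {
        Timer timer = new Timer(true);
        // Simulate a burst of statements, each scheduling a 25 s cancel task
        // that captures a payload (standing in for the statement + its data).
        for (int i = 0; i < 5_000; i++) {
            final byte[] payload = new byte[10_000]; // ~10 KB per "statement"
            timer.schedule(new TimerTask() {
                @Override public void run() {
                    // would cancel the statement; keeps payload reachable
                    if (payload.length == 0) throw new IllegalStateException();
                }
            }, 25_000);
            // Even though the "query" is done here, the task and its payload
            // remain reachable through the Timer's queue for the full 25 s.
        }
        Runtime rt = Runtime.getRuntime();
        System.gc();
        System.out.printf("retained after GC: ~%d MB%n",
                (rt.totalMemory() - rt.freeMemory()) >> 20);
    }
}
```

At the gateway's real traffic levels, with statements and result data far larger than 10 KB, the same pattern is what produced the multi-gigabyte CancelTask footprint.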
With the problem traced to MySQL Connector/J, the fundamental fix was to check whether a newer driver release had resolved this bug.
The fix turned out to ship in version 5.1.11, and after upgrading, the memory-leak failure was resolved: http://dev.mysql.com/doc/relnotes/connector-j/en/news-5-1-11.html
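If the project manages dependencies with Maven (an assumption; the post does not say how the driver is pulled in), the upgrade is a one-line version bump:

```xml
<dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <!-- upgraded from 5.1.6, which retains CancelTask instances -->
    <version>5.1.11</version>
</dependency>
```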
Thanks to Guppo and Rui Kang for their discussion and assistance, which allowed this fault to be resolved satisfactorily!
In short: the com.mysql.jdbc.StatementImpl$CancelTask memory-leak problem in MySQL Connector/J, and its solution.