Phenomenon:
The flagship store freight service's CPU usage hit 100% and load climbed, causing subsequent requests to fail.
After the server was restarted, CPU and load returned to normal.
Trigger conditions:
(1) Linux kernel version 3.6 or below. (Most of our online machines run 2.6.32.)
(2) mysql-connector-java version 5.1.31 or below. (Each line of business needs to check its own.)
(3) No socketTimeout set on the MySQL client. (Each line of business needs to check its own.)
(4) The server-side thread of a mysql-server/mysql-client connection is killed on the mysql-server side. (DBAs often kill slow queries this way.)
When (1), (2), and (3) all hold, triggering (4) puts the client connection thread into an infinite loop: each mysql-server thread killed leaves one client thread busy-looping, pinning one CPU core.
Root cause:
Linux kernels version 3.6 and earlier (including 2.6.32) have a bug [1] which makes requests for the amount of available bytes to read in a socket in CLOSE_WAIT state return 1 even after the EOF has been read. This bug makes SocketInputStream.available return 1 for sockets in CLOSE_WAIT state and causes a seemingly infinite loop in MysqlIO.clearInputStream, where it attempts to read from the socket until the number of available bytes reaches 0, but there is nothing to read.
Source: https://bugs.mysql.com/bug.php?id=73053
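The kernel-level behavior can be probed directly from Java, without MySQL. The following is a minimal sketch (my own, not from the bug report): a local server accepts a connection and closes it at once, and the client polls available() after reading the EOF. On a fixed kernel the polls print 0; on an affected kernel (3.6 and below) they keep printing 1, which is exactly the value MysqlIO.clearInputStream keeps looping on.

    import java.io.InputStream;
    import java.net.ServerSocket;
    import java.net.Socket;

    public class AvailableAfterEof {
        public static void main(String[] args) throws Exception {
            try (ServerSocket server = new ServerSocket(0);
                 Socket client = new Socket("127.0.0.1", server.getLocalPort())) {
                server.accept().close(); // server side sends FIN immediately
                InputStream in = client.getInputStream();
                System.out.println("read: " + in.read()); // prints -1, EOF consumed
                // The client socket now sits in CLOSE_WAIT (we never close it here).
                for (int i = 0; i < 5; i++) {
                    // 0 on fixed kernels; 1 on affected kernels (FIONREAD bug)
                    System.out.println("available: " + in.available());
                    Thread.sleep(100);
                }
            }
        }
    }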
The change log for Connector/J version 5.1.32 on the MySQL website mentions this bug:
A bug in Linux kernel version 3.6 and earlier caused an endless loop in the MysqlIO.clearInputStream() method. This fix changes the way the looping condition is evaluated in order to avoid the problem. (Bug #19022745, Bug #73053)
Source: https://dev.mysql.com/doc/relnotes/connector-j/5.1/en/news-5-1-32.html
The fix in the higher Connector/J version (for the bug with lower Linux kernel versions) was attached; the attachment is omitted here, but a simplified sketch follows.
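This sketch is not the driver's literal source; it only illustrates, under my reading of the release note, how re-evaluating the looping condition on actual skip() progress breaks the spin:

    import java.io.ByteArrayInputStream;
    import java.io.IOException;
    import java.io.InputStream;

    public class ClearInputStreamSketch {

        // Pre-5.1.32 style: trusts available() alone. With the kernel bug,
        // available() keeps returning 1 after EOF while skip() skips nothing,
        // so this loop never terminates.
        static void clearOld(InputStream in) throws IOException {
            int len = in.available();
            while (len > 0) {
                in.skip(len);
                len = in.available();
            }
        }

        // Post-fix style: also stop when skip() makes no progress, so a
        // lying available() cannot keep the loop alive.
        static void clearFixed(InputStream in) throws IOException {
            int len = in.available();
            while (len > 0 && in.skip(len) > 0) {
                len = in.available();
            }
        }

        public static void main(String[] args) throws IOException {
            // On a well-behaved stream both variants drain the bytes and stop.
            clearOld(new ByteArrayInputStream(new byte[16]));
            clearFixed(new ByteArrayInputStream(new byte[16]));
            System.out.println("both loops terminated");
        }
    }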
Solution:
(1) Upgrade mysql-connector-java to version 5.1.32 or above; or
(2) Upgrade the Linux kernel to version 3.7 or above; or
(3) Set a socket read timeout on the MySQL client (see the configuration sketch below). (When the server-side thread is killed, the client thread is released immediately, so the timeout is never actually reached. Why this takes effect is not entirely clear; if anyone knows, please reply to this mail.)
Recommendation: use (1).
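For solution (3), Connector/J exposes the read timeout as the socketTimeout connection property (in milliseconds), settable on the JDBC URL or through Properties. A configuration sketch; the host, credentials, and values are illustrative:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.util.Properties;

    public class TimeoutConfig {
        public static void main(String[] args) throws Exception {
            // Variant 1: on the JDBC URL (values in milliseconds).
            Connection c1 = DriverManager.getConnection(
                    "jdbc:mysql://127.0.0.1:3306/test"
                    + "?connectTimeout=5000&socketTimeout=120000",
                    "test", "test");

            // Variant 2: via Properties; 120000 ms matches the 2-minute
            // timeout used in the verification step below.
            Properties props = new Properties();
            props.setProperty("user", "test");
            props.setProperty("password", "test");
            props.setProperty("socketTimeout", "120000");
            Connection c2 = DriverManager.getConnection(
                    "jdbc:mysql://127.0.0.1:3306/test", props);

            c1.close();
            c2.close();
        }
    }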
The thread stack captured while analyzing the problem was as follows:
at java.net.PlainSocketImpl.socketAvailable(Native Method)
at java.net.AbstractPlainSocketImpl.available(AbstractPlainSocketImpl.java:478)
- locked <0x000000070ed04a40> (a java.net.SocksSocketImpl)
at java.net.SocketInputStream.available(SocketInputStream.java:245)
at com.mysql.jdbc.util.ReadAheadInputStream.fill(ReadAheadInputStream.java:72)
at com.mysql.jdbc.util.ReadAheadInputStream.skip(ReadAheadInputStream.java:300)
at com.mysql.jdbc.MysqlIO.clearInputStream(MysqlIO.java:948)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2404)
at com.mysql.jdbc.ConnectionImpl.pingInternal(Unknown Source)
at com.mysql.jdbc.ConnectionImpl.execSQL(Unknown Source)
- locked <0x000000070ed04c10> (a com.mysql.jdbc.JDBC4Connection)
at com.mysql.jdbc.ConnectionImpl.execSQL(Unknown Source)
at com.mysql.jdbc.StatementImpl.execute(Unknown Source)
- locked <0x000000070ed04c10> (a com.mysql.jdbc.JDBC4Connection)
at com.mysql.jdbc.StatementImpl.execute(Unknown Source)
Troubleshooting process:
1. Restarted the machines and CPU usage returned to normal, but kept one machine aside to preserve the scene.
2. On the preserved machine, used top to find the threads with long execution time and high CPU usage, then used jstack to inspect them: they were spinning in an infinite loop in MysqlIO.clearInputStream().
3. Searched online and found this is caused by a bug in Linux kernel 3.6 and below, but it was still unclear what triggers it.
4. Asked the DBA, who confirmed that slow queries had been killed on the mysql-server side at that time. This led to the guess that one trigger condition is killing, on the mysql-server side, the server thread of a mysql-server/mysql-client connection.
Reproduction process:
1. Deployed the app service on Linux with kernel version 2.6.32; the app service used mysql-connector-java 5.1.21 and the MySQL client had no socketTimeout set (consistent with the online environment).
2. Modified the SQL statement in the app service to add a sleep, which makes the thread easy to spot on the mysql-server side: changed select ... to select sleep(10) ...
3. Invoked the HTTP interface to trigger the test query statement.
4. Logged in to mysql-server and ran show processlist; to find the test query and its thread ID, then ran kill <thread ID> to kill the executing thread.
5. On the application server, one CPU core reached 100% utilization and the HTTP interface never returned a response (the application server thread was stuck in a dead loop).
6. Repeated steps (3) and (4): each additional thread killed on the mysql-server side drove one more CPU core to 100% utilization.
Note: if the application server's kernel carries the patch "tcp: fix FIONREAD/SIOCINQ", the problem cannot be reproduced.
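Below is a minimal Java sketch of steps 2 through 4 of this reproduction, assuming a reachable test MySQL instance; the URL, user, and password are placeholders, not our real environment. One thread runs the sleeping query while a second connection plays the DBA's role and kills its server thread:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class KillQueryRepro {
        // Placeholders: point these at the test MySQL instance.
        private static final String URL  = "jdbc:mysql://127.0.0.1:3306/test"; // no socketTimeout
        private static final String USER = "test";
        private static final String PASS = "test";

        public static void main(String[] args) throws Exception {
            // Victim thread: runs a long query so its server thread is easy
            // to spot in SHOW PROCESSLIST (step 2 of the reproduction).
            Thread victim = new Thread(() -> {
                try (Connection c = DriverManager.getConnection(URL, USER, PASS);
                     Statement s = c.createStatement()) {
                    s.execute("select sleep(10)");
                } catch (Exception e) {
                    // On a fixed setup the kill surfaces here promptly; on a
                    // buggy setup we never get here - the thread spins at 100% CPU.
                    e.printStackTrace();
                }
            });
            victim.start();
            Thread.sleep(1000); // let the query reach the server

            // Admin connection: play the DBA from trigger condition (4) -
            // find the sleeping query's server thread id and kill it.
            long id = -1;
            try (Connection admin = DriverManager.getConnection(URL, USER, PASS);
                 Statement s = admin.createStatement()) {
                try (ResultSet rs = s.executeQuery("show processlist")) {
                    while (rs.next()) {
                        String info = rs.getString("Info");
                        if (info != null && info.contains("sleep(10)")) {
                            id = rs.getLong("Id");
                        }
                    }
                }
                if (id > 0) {
                    s.execute("kill " + id); // kill the executing server thread
                }
            }

            victim.join(); // with the bug triggered, this never returns
        }
    }

On a vulnerable kernel/driver combination the final join() never returns and one core pins at 100%; on any of the fixed setups verified below, it returns promptly with an exception from the killed query.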
Verifying the solutions:
1. Upgraded mysql-connector-java to 5.1.34, other conditions unchanged. After killing the thread on the mysql-server side, the application server's CPU stayed normal and the HTTP interface responded immediately.
2. Upgraded the Linux kernel to 3.18.48, other conditions unchanged. After killing the thread on the mysql-server side, the application server's CPU stayed normal and the HTTP interface responded immediately.
3. Set a 2-minute socket read timeout on the MySQL client, other conditions unchanged. After killing the thread on the mysql-server side, the application server's CPU stayed normal and the HTTP interface responded immediately, without waiting for the 2-minute timeout to expire.