Troubleshooting a MySQL connection exception in an online environment

Source: Internet
Author: User

MySQL is a common database and is widely used in Internet systems. Some of the faults you run into are bugs in MySQL itself, and some are not. Here is an example of a problem we ran into in the past.

Problem

The problem appeared after our application was deployed to the online environment for testing. During JMeter testing, we found that the first few HTTP requests at the start of each stress test timed out, even though the load at that point was still low, while subsequent requests under sustained load did not. On top of that, the Tomcat log did not report any errors.

The stress test consists of 200 threads continuously sending requests to a single HTTP page. The page logic is fairly simple: it inserts one row into the database on the back end. The connection pool is Alibaba's Druid (remember this; it turns out to be a pitfall later). Tomcat runs ordinary web applications, and the maximum number of connections in each application's JDBC connection pool is set to 30. In other words, even with four Tomcat servers connected to the database at once, the total number of connections is small.

Troubleshooting

Because there was no error in the Tomcat log, the first step was to reproduce the problem. That was not easy, because it looked random. Looking back over the occurrences, I noticed that each time the problem appeared, the application had been left idle overnight and the tester had started the stress test in the morning. I began to suspect it was related to idle time, so from then on I reproduced it that way: leave the application idle for half an hour, then run the test again.

Finding logs

With nothing in the logs, the next step was to look at the JVM thread stacks. After reproducing the fault, I ran jstack on that machine to capture the thread dump of the affected Tomcat process.

jps
Lists the process IDs of the Java processes on the machine.
jstack <pid>
Dumps the thread stacks of the Java process with that ID.

The stack information is useful:

"http-bio-8081-exec-4975" daemon prio=10 tid=0x00007f9d4c0000000 nid=0x65db runnable [0x00007f9cc4544000]
   java.lang.Thread.State: RUNNABLE
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:129)
    at com.mysql.jdbc.util.ReadAheadInputStream.fill(ReadAheadInputStream.java:114)
    at com.mysql.jdbc.util.ReadAheadInputStream.readFromUnderlyingStreamIfNecessary(ReadAheadInputStream.java:161)
    at com.mysql.jdbc.util.ReadAheadInputStream.read(ReadAheadInputStream.java:189)
    - locked <0x0000000684d608c8> (a com.mysql.jdbc.util.ReadAheadInputStream)
    at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:3014)
    at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3467)
    at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3456)
    at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:3997)
    at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2468)
    at com.mysql.jdbc.ConnectionImpl.pingInternal(ConnectionImpl.java:4092)
    at com.mysql.jdbc.ConnectionImpl.ping(ConnectionImpl.java:4069)
    at sun.reflect.GeneratedMethodAccessor94.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at com.alibaba.druid.pool.vendor.MySqlValidConnectionChecker.isValidConnection(MySqlValidConnectionChecker.java:98)
    at com.alibaba.druid.pool.DruidAbstractDataSource.testConnectionInternal(DruidAbstractDataSource.java:1235)
    at com.alibaba.druid.pool.DruidDataSource.getConnectionDirect(DruidDataSource.java:928)
    at com.alibaba.druid.pool.DruidDataSource.getConnection(DruidDataSource.java:882)
    at com.alibaba.druid.pool.DruidDataSource.getConnection(DruidDataSource.java:872)
    at com.alibaba.druid.pool.DruidDataSource.getConnection(DruidDataSource.java:97)
    at org.springframework.jdbc.datasource.DataSourceTransactionManager.doBegin(DataSourceTransactionManager.java:202)
    at org.springframework.transaction.support.AbstractPlatformTransactionManager.getTransaction(AbstractPlatformTransactionManager.java:372)
    at org.springframework.transaction.interceptor.TransactionAspectSupport.createTransactionIfNecessary(TransactionAspectSupport.java:417)
    at org.springframework.transaction.interceptor.TransactionAspectSupport.invokeWithinTransaction(TransactionAspectSupport.java:255)
    at org.springframework.transaction.interceptor.TransactionInterceptor.invoke(TransactionInterceptor.java:94)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
    at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:207)
    at com.sun.proxy.$Proxy27.insert(Unknown Source)

The dump shows that the HTTP request, traced from the front-end container all the way down to the database call, is stuck reading from the database. The thread is not stuck in the JDBC driver's own logic but at the socket read underneath it:

com.mysql.jdbc.util.ReadAheadInputStream.read(ReadAheadInputStream.java:189)
- locked <0x0000000684d608c8> (a com.mysql.jdbc.util.ReadAheadInputStream)
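When a dump is large, threads parked at the same socket read can be picked out mechanically. The following is a minimal sketch, assuming the usual HotSpot jstack layout in which each thread begins with a line containing its quoted name, followed by `at ...` frame lines. The class and method names (`StuckThreadFinder`, `threadsBlockedInSocketRead`) are hypothetical and not part of any tool mentioned in this article.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: scans jstack text for threads whose top frame is socketRead0.
class StuckThreadFinder {

    static List<String> threadsBlockedInSocketRead(String dump) {
        List<String> result = new ArrayList<>();
        String currentThread = null;
        boolean expectTopFrame = false;

        for (String raw : dump.split("\\R")) {
            String line = raw.trim();
            if (line.startsWith("\"")) {
                // Thread header line: "name" daemon prio=... tid=... nid=...
                int end = line.indexOf('"', 1);
                currentThread = end > 0 ? line.substring(1, end) : line;
                expectTopFrame = true;
            } else if (expectTopFrame && line.startsWith("at ")) {
                // Only the first frame after the header tells us where the thread is right now.
                if (line.contains("SocketInputStream.socketRead0")) {
                    result.add(currentThread);
                }
                expectTopFrame = false;
            }
        }
        return result;
    }
}
```

Feeding it the text captured from jstack gives the names of the request threads that are waiting on the network, which is usually quicker than reading every stack by eye.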

Searching for this error turned up only one valuable result: a bug reported on the MySQL website years ago for the same symptom, which had not been fixed. The workaround given there is to change the JDBC connection string:

useReadAheadInput=false&useUnbufferedInput=false

This only stops the driver from reading ahead into its buffer on the socket, but the MySQL connection has in fact already been broken at this point. What I did not know was whether the web application or MySQL had closed it.
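For reference, here is a small sketch of where those two driver properties would go in a connection string. The helper name `withReadAheadDisabled` and the host and schema in the usage note are placeholders invented for illustration; only the two property names come from the text above.

```java
// Hypothetical helper: appends the two driver properties to an existing JDBC URL.
class JdbcUrlSketch {

    static String withReadAheadDisabled(String baseUrl) {
        // Use '?' for the first property and '&' if the URL already carries properties.
        String separator = baseUrl.contains("?") ? "&" : "?";
        return baseUrl + separator + "useReadAheadInput=false&useUnbufferedInput=false";
    }
}
```

For example, `JdbcUrlSketch.withReadAheadDisabled("jdbc:mysql://db.example.internal:3306/app")` produces a URL ending in `/app?useReadAheadInput=false&useUnbufferedInput=false`. As noted above, this changes how the driver reads from the socket and does nothing about a connection that is already dead.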

Connection Pool timeout

Because the stack shows the application client unable to read anything from the socket, the TCP connection between the application and MySQL must be broken. So the next step was to check the connection timeout settings of both the application's connection pool and MySQL.

Application connection pool settings

name="maxWait" value="60000"
Wait at most 60 seconds to get a connection.
name="testWhileIdle" value="true"
Test idle connections.
name="minEvictableIdleTimeMillis" value="300000"
name="timeBetweenEvictionRunsMillis" value="60000"
The interval at which the Destroy thread checks connections.

So the connection pool is not configured in a way that would break connections on its own.

Mysql connection timeout settings

SHOW GLOBAL VARIABLES LIKE '%timeout%';

The timeout for which MySQL keeps an idle connection (wait_timeout) is 28800 seconds, that is, 8 hours, so the database is not closing the connection either.

At this point the question had narrowed: neither side closes the connection, so why is the client's connection broken after only a few minutes of idling?

There was another puzzle: with the same code and the database unchanged, the problem did not occur at all in a separate, pure test environment.

Looking for network problems

That pointed to the network in the online environment as the key issue. I asked my operations colleagues to check whether the Linux configuration on the database machine had anything abnormal. It did not.

During this time I also wondered: since I am using Alibaba's Druid connection pool and have it set to check connections every minute, why do I still get invalid connections when I borrow one?

Solution

After two days of on-and-off struggling, I asked colleagues in other departments for advice as a last resort. One of them asked: it gets stuck in a socket read after being idle? Then he asked whether the application and the database were on different network segments, and suggested checking the firewall's timeout setting for long-lived TCP connections.

At that point it was almost certainly a firewall configuration problem. On investigation, we found that the Huawei switch between the two network segments had a timeout of 3 minutes for long-lived connections. The Java application's connection pool keeps connections for as long as possible (several hours, below the database's 8-hour maximum), but the firewall treats a connection that has been idle for more than 3 minutes as a problem and drops it, and neither the application nor MySQL knows that the TCP connection has been cut.
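The reasoning here comes down to comparing the idle timeouts of every hop on the path: whichever is shortest decides how long an idle connection survives. A minimal sketch of that comparison follows, using the 3-minute firewall value and the 28800-second MySQL value from this article; the class and method names are hypothetical.

```java
import java.util.Map;

// Hypothetical model: each entry maps a hop on the path to its idle timeout in seconds.
class IdleTimeoutModel {

    /** Returns the hop with the shortest idle timeout, i.e. the one that drops an idle connection first. */
    static String firstToDrop(Map<String, Long> idleTimeoutSeconds) {
        String first = null;
        long shortest = Long.MAX_VALUE;
        for (Map.Entry<String, Long> entry : idleTimeoutSeconds.entrySet()) {
            if (entry.getValue() < shortest) {
                shortest = entry.getValue();
                first = entry.getKey();
            }
        }
        return first;
    }

    /** True if a connection idle for the given time is still below every hop's timeout. */
    static boolean isStillUsable(long idleSeconds, Map<String, Long> idleTimeoutSeconds) {
        for (long timeout : idleTimeoutSeconds.values()) {
            if (idleSeconds >= timeout) {
                return false;
            }
        }
        return true;
    }
}
```

With a firewall timeout of 180 seconds and a MySQL timeout of 28800 seconds, `firstToDrop` names the firewall, and a connection left idle for half an hour (1800 seconds) is reported as unusable even though both endpoints still consider it open. That matches the symptom of failures only after the application had been idle.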

The failure also exposed a weakness in the connection handling of Alibaba's open source Druid connection pool. From reading the code, we found that the pool does not use a separate thread to check whether its connections have been broken; a connection is only tested for validity when it is borrowed. This logic is less rigorous than that of the older c3p0. We then switched the application's connection pool to c3p0.
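The idea of validating idle connections in the background, rather than only when one is borrowed, can be sketched as follows. This is an illustration only and is not Druid's or c3p0's actual code; the class `IdleConnectionSweeper`, its methods, and the liveness predicate are all hypothetical names introduced here.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

// Hypothetical helper: periodically validates idle connections in a background thread.
class IdleConnectionSweeper<C> {
    private final List<C> idle = new CopyOnWriteArrayList<>();
    private final Predicate<C> isAlive;

    IdleConnectionSweeper(Predicate<C> isAlive) {
        this.isAlive = isAlive;
    }

    void addIdle(C connection) {
        idle.add(connection);
    }

    int idleCount() {
        return idle.size();
    }

    /** Tests every idle connection once, discards those that fail, and returns how many were discarded. */
    int sweepOnce() {
        int discarded = 0;
        for (C connection : idle) { // iterates over a snapshot, so removal inside the loop is safe
            if (!isAlive.test(connection)) {
                idle.remove(connection);
                discarded++;
            }
        }
        return discarded;
    }

    /** Schedules sweepOnce at a fixed delay; the period should be shorter than the shortest idle timeout on the path. */
    void start(ScheduledExecutorService scheduler, long periodSeconds) {
        scheduler.scheduleWithFixedDelay(this::sweepOnce, periodSeconds, periodSeconds, TimeUnit.SECONDS);
    }
}
```

With JDBC, the predicate could wrap `java.sql.Connection.isValid(timeoutSeconds)` and treat an `SQLException` as a failed check. In the situation described here the period would need to be well under the firewall's 3 minutes. Traffic generated by the check would normally also reset the firewall's idle timer, though that depends on the device.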
