Cause
It started with the logs frequently reporting lock wait timeout errors. These errors cascade: one lock wait directly triggers another, so the damage is serious and far-reaching. When I traced the cause, I found that C3P0 was reporting a DEADLOCK, as shown:
Analysis
The log shows that the ScatteredAcquireTask tasks, which are responsible for acquiring connections, were all stuck. Clearly, no new database connections could be obtained. Just the day before, we had changed the architecture: instead of the application connecting directly to MySQL, we added a Cobar layer in between (Cobar is a MySQL proxy middleware that handles database sharding). My guess was that the switch to Cobar caused the problem, but what exactly went wrong? Let's take a look:
First, the server topology before switching to Cobar:
N application servers –> 1 MySQL
And after adding Cobar:
N application servers –> 2 Cobar servers –> n MySQL
So what is the difference between the two? I ran a local test: I set C3P0's initial number of connections to 5000 to simulate a large number of connection requests hitting the database, and watched whether ScatteredAcquireTask would get stuck. All 5,000 connections were established, but unexpectedly, when I ran show processlist on MySQL, the thread count had not grown at all. At that point I had a good idea of what was going on. To confirm it, I opened 5,000 connections directly against the database, and this time the MySQL connection count shot straight up.

By now you have probably guessed it: establishing a connection to Cobar does not mean establishing a connection to MySQL. In fact, there are two "pools" on the path from our application to MySQL. One is the connection pool between our application and Cobar; the other is the connection pool between Cobar and MySQL. The number of connections between the application and Cobar is not a bottleneck, because they communicate over NIO. But what about between Cobar and MySQL? Cobar only implements NIO on half of the path, so it talks to MySQL over BIO.
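To make the "two pools" idea concrete, here is a toy model in Python. It is only a sketch under my own assumptions, not Cobar's actual code: BackendPool, simulate, and every number in it are made up for illustration. Front-end connections are merely counted, while each running query must hold one slot from a bounded, blocking back-end pool.

```python
import threading
import time


class BackendPool:
    """Bounded pool of blocking back-end connections (the proxy-to-MySQL side)."""

    def __init__(self, pool_size):
        self._slots = threading.Semaphore(pool_size)
        self._lock = threading.Lock()
        self.in_use = 0
        self.peak_in_use = 0

    def run_query(self, work_seconds, wait_timeout):
        """Hold one back-end connection for work_seconds.

        Returns False if no connection frees up within wait_timeout.
        """
        if not self._slots.acquire(timeout=wait_timeout):
            return False
        try:
            with self._lock:
                self.in_use += 1
                self.peak_in_use = max(self.peak_in_use, self.in_use)
            time.sleep(work_seconds)  # blocking I/O: the connection is tied up
            return True
        finally:
            with self._lock:
                self.in_use -= 1
            self._slots.release()


def simulate(front_connections, active_queries, pool_size,
             work_seconds=0.2, wait_timeout=0.05):
    """Accept front_connections at the proxy, then run active_queries at once."""
    backend = BackendPool(pool_size)
    # Accepting a front-end connection only records it in this model;
    # nothing is opened on the back end until a query actually runs.
    accepted = front_connections
    results = []
    results_lock = threading.Lock()

    def client():
        ok = backend.run_query(work_seconds, wait_timeout)
        with results_lock:
            results.append(ok)

    threads = [threading.Thread(target=client) for _ in range(active_queries)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return {
        "front_connections": accepted,
        "peak_backend_connections": backend.peak_in_use,
        "timed_out": results.count(False),
    }


if __name__ == "__main__":
    print(simulate(5000, 0, 4))    # idle front-end connections only
    print(simulate(5000, 20, 4))   # more concurrent queries than back-end slots
    print(simulate(5000, 20, 32))  # back-end pool large enough for the load
```

In this model the 5,000 idle front-end connections never show up on the back end, which matches what show processlist showed me. Once the number of concurrent queries exceeds the back-end pool size, the extra queries wait for a slot and some of them time out.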
It is also worth noting that our company's Cobar servers are built and maintained by the DBAs. All of the configuration is therefore opaque to us back-end developers, and since we are not familiar with Cobar, we did not know how it had been configured. But based on the conjecture above, it was fairly clear that the bottleneck was between Cobar and MySQL.
Solution
Following this conjecture, I searched Baidu for Cobar's configuration parameters and found one named PoolSize, which sets the size of Cobar's connection pool to the back-end data source. Turn it up, and the problem is solved.
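How far to turn it up depends on the load. As a rough starting point, Little's law says the average number of back-end connections in use is about the query rate multiplied by the average query time. The helper below is a hypothetical sketch of that arithmetic: the function name, the headroom factor, and the sample numbers are my own assumptions, not values from Cobar or from our production system.

```python
def estimate_pool_size(queries_per_second, avg_query_ms, headroom_percent=25):
    """Estimate the back-end connections needed, rounded up, via Little's law.

    Average connections in use ~= arrival rate * average time each is held.
    Integer arithmetic is used so the rounding is exact.
    """
    busy_times_1000 = queries_per_second * avg_query_ms  # in-use connections * 1000
    numerator = busy_times_1000 * (100 + headroom_percent)
    denominator = 1000 * 100
    return -(-numerator // denominator)  # ceiling division


if __name__ == "__main__":
    # Hypothetical load: 2000 queries/s, 100 ms each, 25% headroom.
    print(estimate_pool_size(2000, 100))
```

This gives only an average. Bursts and slow queries can need more, so treat the result as a lower bound to check against what the alarm log and show processlist actually report.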
Afterwards, when I looked through Cobar's alarm log, I found that the period in which the problem occurred was full of the following log entries, which further confirmed my guess!
Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission.
A pitfall I ran into while using Cobar