Just cut to the chase:
Generally speaking, the slow query we see is not likely to cause the hang station, at most the application response slows down
But this happened to me today, a slow query to get the whole site hung up
look at this SQL sheet first:
# query_time:70.472013 lock_time:0.000078 rows_sent:7915203 rows_examined:15984089 rows_affected:0
# bytes_sent:1258414478
Use Js_sku;
SET timestamp=1465850117;
SELECT
ss_id, ss_sa_id, ss_si_id, Ss_av_zid, Ss_av_fid, Ss_artno,
Ss_av_zvalue, Ss_av_fvalue, Ss_av_zpic, Ss_av_fpic, Ss_number,
Ss_sales, Ss_cprice, Ss_price, Ss_stock, Ss_orderid, Ss_status,
Ss_add_time, Ss_lastmodify
From Js_sgoods_sku
WHERE ss_si_id = 0 and ss_status > 0
ORDER BY
Ss_orderid DESC, Ss_av_fid ASC;
Posted here is the MySQL slow log information, query time with up to 70s!!
See slow query Our general first reaction is that this statement does not use the index? Or is the index unreasonable? Then we'll look at the execution plan:
Mysql> explain SELECT
-> ss_id, ss_sa_id, ss_si_id, Ss_av_zid, Ss_av_fid, Ss_artno,
-> Ss_av_zvalue, Ss_av_fvalue, Ss_av_zpic, Ss_av_fpic, Ss_number,
-> ss_sales, Ss_cprice, Ss_price, Ss_stock, Ss_orderid, Ss_status,
-> Ss_add_time, Ss_lastmodify
-> from Js_sgoods_sku
-> WHERE ss_si_id = 0 and ss_status > 0
-> ORDER BY
-> Ss_orderid DESC, Ss_av_fid ASC;
+----+-------------+---------------+------+---------------+----------+---------+-------+---------+------------- ----------------+
| ID | Select_type | Table | Type | Possible_keys | Key | Key_len | Ref | Rows | Extra |
+----+-------------+---------------+------+---------------+----------+---------+-------+---------+------------- ----------------+
| 1 | Simple | Js_sgoods_sku | Ref | ss_si_id | ss_si_id | 4 | Const | 9516091 | The Using where; Using Filesort |
+----+-------------+---------------+------+---------------+----------+---------+-------+---------+------------- ----------------+
1 row in Set (0.00 sec)
This seems to use the index, but why scan to the line still so much? So let's look at the structure of the table and expect to find something valuable:
We see the following available information:
KEY ' ss_si_id ' (' ss_si_id ', ' Ss_av_zid ', ' Ss_av_fid ') USING btree,
' ss_si_id ' int (one) unsigned not NULL DEFAULT ' 0 ' COMMENT ' corresponds to js_sgoods_info.si_id ',
We see that the index seems to be more acceptable, but we see the ss_si_id this field is actually the primary key of the Goods_info table, which means that its discrete degree should be very large, that is, the degree of differentiation is very large.
In fact, we can basically think of this as a result of a lot of ss_si_id=0 in this table, but we can further confirm our conjecture:
1. First we can make sure our statistics are okay.
2. Second, we can further verify our conjecture by how much data we have on the value of Count ss_si_id=0.
So let's look at the statistics for this index first:
Xiean@localhost:js_sku 03:27:42>show index from Js_sgoods_sku;
+---------------+------------+----------+--------------+-------------+-----------+-------------+----------+---- ----+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | index_comment |
+---------------+------------+----------+--------------+-------------+-----------+-------------+----------+---- ----+------+------------+---------+---------------+
| Js_sgoods_sku | 0 | PRIMARY | 1 | ss_id | A | 18115773 | NULL | NULL | | Btree | | |
| Js_sgoods_sku | 1 | ss_si_id | 1 | ss_si_id | A | 1811577 | NULL | NULL | | Btree | | |
| Js_sgoods_sku | 1 | ss_si_id | 2 | Ss_av_zid | A | 6038591 | NULL | NULL | | Btree | | |
| Js_sgoods_sku | 1 | ss_si_id | 3 | Ss_av_fid | A | 18115773 | NULL | NULL | | Btree | | |
| Js_sgoods_sku | 1 | idx_001 | 1 | ss_sa_id | A | 3623154 | NULL | NULL | | Btree | | |
+---------------+------------+----------+--------------+-------------+-----------+-------------+----------+---- ----+------+------------+---------+---------------+
Then you can see the following questions:
Our ss_si_id This field is not what we see on the surface because the primary key of a table is associated, its cardinality value should be close to the PRIMARY value. But the difference is relatively large, is not the statistical information of the index is accurate? Then we try to collect the statistics for the following index:
Xiean@localhost:js_sku 03:27:47>analyze table Js_sgoods_sku;
+----------------------+---------+----------+----------+
| Table | Op | Msg_type | Msg_text |
+----------------------+---------+----------+----------+
| Js_sku.js_sgoods_sku | Analyze | Status | OK |
+----------------------+---------+----------+----------+
But, we look at the statistics for these indexes again:
Xiean@localhost:js_sku 03:28:14>show index from Js_sgoods_sku;
+---------------+------------+----------+--------------+-------------+-----------+-------------+----------+---- ----+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | index_comment |
+---------------+------------+----------+--------------+-------------+-----------+-------------+----------+---- ----+------+------------+---------+---------------+
| Js_sgoods_sku | 0 | PRIMARY | 1 | ss_id | A | 18621349 | NULL | NULL | | Btree | | |
| Js_sgoods_sku | 1 | ss_si_id | 1 | ss_si_id | A | 1551779 | NULL | NULL | | Btree | | |
| Js_sgoods_sku | 1 | ss_si_id | 2 | Ss_av_zid | A | 6207116 | NULL | NULL | | Btree | | |
| Js_sgoods_sku | 1 | ss_si_id | 3 | Ss_av_fid | A | 18621349 | NULL | NULL | | Btree | | |
| Js_sgoods_sku | 1 | idx_001 | 1 | ss_sa_id | A | 3724269 | NULL | NULL | | Btree | | |
+---------------+------------+----------+--------------+-------------+-----------+-------------+----------+---- ----+------+------------+---------+---------------+
We can see that the ss_si_id dispersion (cardinality) does not increase but there is a downward trend, because this information is collected from a partial page, and each page top data distribution is not the same, resulting in our index collection of statistical information will change.
Well, here we can assume that our statistics are not invalidated, so we'll look at the difference in the data:
+--------------++----------++------------------+
| Ss_si_id=0; | | COUNT (*) | | 7994788/19048617 |
+--------------++----------++------------------+
| 7994788 | | 19048617 | | 0.4197 |
+--------------++----------++------------------+
Well, do not see do not know, a look startled: we this table inside there are a large number of ss_si_id=0, accounting for the entire table of data volume of 41%!!!
OK, so here's the question, and then we need to know, why does this SQL statement cause the hanging station?
We see some information by watching the application server's monitoring: Our Goods_service This service exception: The exception is as follows:
1. CPU long occupation of 100% +
2. JSTATCK pid cannot dump memory stack information, must force dump-f
3. Dump the memory information found that the process inside all the threads are in BLOCKED state
4. Through jstat-gcutil see FGC quite frequently, about 10s FGC once
5. Memory footprint exceeds allocated memory
So the ultimate reason is because the query query a large number of slow queries (up to 700w rows of data), resulting in goods_service memory surge, the presence of services can not respond, further deterioration is to hang up
OK, we know why the hang up, then how do we solve this problem?
Now that we know it is due to query ss_si_id=0, then we have to block out this SQL is not good. Shielding methods can have a variety of:
1. Our procedural logic to judge this type of query if there are inquiries ss_si_id=0 all blocked off
2. We change the SQL configuration file, modify the SQL statement
We found that the DB server has a large number of this slow query, and the DB server load has soared from 0.xx to 50 +, and the number of connections has also soared, if not processed in time, estimated DB server also hung up
Then we finally take the following approach:
1. Coordinate with research and development modify SQL statements We added a condition to this where condition: and ss_si_id <> 0, block the SQL in the MySQL line plan layer;
2.DBA to kill this query, to avoid the DB server down machine situation, of course, this use of our Pt-kill tools, we have to say that this tool is quite useful
Summary (experience and lessons learned):
1. Similar to this query default value of SQL, we should be from the source to eliminate such inquiries
2. Limit query result set size to avoid service death due to too large query result set