MySQL slow SQL query causes the entire website to go down.


Let's get straight to the point:

Normally a slow query doesn't take a site down; at worst the application just responds slowly.
Today, however, a single slow query crashed the entire website.
Let's take a look at this SQL statement:

# Query_time: 70.472013  Lock_time: 0.000078  Rows_sent: 7915203  Rows_examined: 15984089  Rows_affected: 0
# Bytes_sent: 1258414478
use js_sku;
SET timestamp=1465850117;
SELECT
    ss_id, ss_sa_id, ss_si_id, ss_av_zid, ss_av_fid, ss_artno,
    ss_av_zvalue, ss_av_fvalue, ss_av_zpic, ss_av_fpic, ss_number,
    ss_sales, ss_cprice, ss_price, ss_stock, ss_orderid, ss_status,
    ss_add_time, ss_lastmodify
FROM js_sgoods_sku
WHERE ss_si_id = 0 AND ss_status > 0
ORDER BY
    ss_orderid DESC, ss_av_fid ASC;
This is taken from the MySQL slow query log: the query took more than 70 seconds!
The first reaction to a slow query is usually: is the statement missing an index, or is the index a poor fit? So let's check the execution plan:

mysql> explain SELECT
    ->     ss_id, ss_sa_id, ss_si_id, ss_av_zid, ss_av_fid, ss_artno,
    ->     ss_av_zvalue, ss_av_fvalue, ss_av_zpic, ss_av_fpic, ss_number,
    ->     ss_sales, ss_cprice, ss_price, ss_stock, ss_orderid, ss_status,
    ->     ss_add_time, ss_lastmodify
    -> FROM js_sgoods_sku
    -> WHERE ss_si_id = 0 AND ss_status > 0
    -> ORDER BY
    ->     ss_orderid DESC, ss_av_fid ASC;
+----+-------------+---------------+------+---------------+----------+---------+-------+---------+-----------------------------+
| id | select_type | table         | type | possible_keys | key      | key_len | ref   | rows    | Extra                       |
+----+-------------+---------------+------+---------------+----------+---------+-------+---------+-----------------------------+
|  1 | SIMPLE      | js_sgoods_sku | ref  | ss_si_id      | ss_si_id | 4       | const | 9516091 | Using where; Using filesort |
+----+-------------+---------------+------+---------------+----------+---------+-------+---------+-----------------------------+
1 row in set (0.00 sec)

The query does use an index, so why does it examine so many rows? Let's look at the table structure and see whether it tells us anything. The useful pieces are:
KEY `ss_si_id` (`ss_si_id`, `ss_av_zid`, `ss_av_fid`) USING BTREE,
`ss_si_id` int(11) unsigned NOT NULL DEFAULT '0' COMMENT 'corresponds to js_sgoods_info.si_id',
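
Pieced together from this snippet and the SHOW INDEX output below, the relevant part of the table definition looks roughly like this (a sketch; every column type other than ss_si_id's is an assumption):

CREATE TABLE js_sgoods_sku (
  ss_id     int(11) unsigned NOT NULL AUTO_INCREMENT,               -- assumed
  ss_sa_id  int(11) unsigned NOT NULL DEFAULT '0',                  -- assumed
  ss_si_id  int(11) unsigned NOT NULL DEFAULT '0' COMMENT 'corresponds to js_sgoods_info.si_id',
  ss_av_zid int(11) unsigned NOT NULL DEFAULT '0',                  -- assumed
  ss_av_fid int(11) unsigned NOT NULL DEFAULT '0',                  -- assumed
  -- ... remaining columns omitted ...
  PRIMARY KEY (ss_id),
  KEY ss_si_id (ss_si_id, ss_av_zid, ss_av_fid) USING BTREE,
  KEY IDX_001 (ss_sa_id)
) ENGINE=InnoDB;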

The index itself looks reasonable, but notice that ss_si_id references the primary key of the js_sgoods_info table, so its values should be highly discrete, that is, very selective.
At this point we can already suspect that a large number of rows in the table have ss_si_id = 0, but let's confirm the conjecture in two steps:

1. First, confirm that the index statistics are accurate.
2. Then count the rows where ss_si_id = 0 to verify the conjecture.

Check the following index statistics:
xiean@localhost:js_sku 03:27:42> show index from js_sgoods_sku;
+---------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table         | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+---------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| js_sgoods_sku |          0 | PRIMARY  |            1 | ss_id       | A         |    18115773 |     NULL |   NULL |      | BTREE      |         |               |
| js_sgoods_sku |          1 | ss_si_id |            1 | ss_si_id    | A         |     1811577 |     NULL |   NULL |      | BTREE      |         |               |
| js_sgoods_sku |          1 | ss_si_id |            2 | ss_av_zid   | A         |     6038591 |     NULL |   NULL |      | BTREE      |         |               |
| js_sgoods_sku |          1 | ss_si_id |            3 | ss_av_fid   | A         |    18115773 |     NULL |   NULL |      | BTREE      |         |               |
| js_sgoods_sku |          1 | IDX_001  |            1 | ss_sa_id    | A         |     3623154 |     NULL |   NULL |      | BTREE      |         |               |
+---------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

One problem stands out:
Since ss_si_id references the primary key of another table, its Cardinality should be close to that of the PRIMARY key, yet here it is an order of magnitude lower. Could the index statistics simply be inaccurate? Let's re-collect them and see:
xiean@localhost:js_sku 03:27:47> analyze table js_sgoods_sku;
+----------------------+---------+----------+----------+
| Table                | Op      | Msg_type | Msg_text |
+----------------------+---------+----------+----------+
| js_sku.js_sgoods_sku | analyze | status   | OK       |
+----------------------+---------+----------+----------+

Now view the index statistics again:
xiean@localhost:js_sku 03:28:14> show index from js_sgoods_sku;
+---------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table         | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+---------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| js_sgoods_sku |          0 | PRIMARY  |            1 | ss_id       | A         |    18621349 |     NULL |   NULL |      | BTREE      |         |               |
| js_sgoods_sku |          1 | ss_si_id |            1 | ss_si_id    | A         |     1551779 |     NULL |   NULL |      | BTREE      |         |               |
| js_sgoods_sku |          1 | ss_si_id |            2 | ss_av_zid   | A         |     6207116 |     NULL |   NULL |      | BTREE      |         |               |
| js_sgoods_sku |          1 | ss_si_id |            3 | ss_av_fid   | A         |    18621349 |     NULL |   NULL |      | BTREE      |         |               |
| js_sgoods_sku |          1 | IDX_001  |            1 | ss_sa_id    | A         |     3724269 |     NULL |   NULL |      | BTREE      |         |               |
+---------------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

Notice that the Cardinality of ss_si_id did not go up; it actually drifted downward. That is because InnoDB collects these statistics by sampling pages, and the data distribution of the sampled pages differs from run to run, so the estimate changes every time the statistics are gathered.
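
If the estimates swing around too much, InnoDB can be told to sample more pages. A sketch (the variable name applies to MySQL 5.6+ persistent statistics; on 5.5 the equivalent is innodb_stats_sample_pages, and 64 is just an example value):

-- see the current sampling settings
SHOW VARIABLES LIKE 'innodb_stats%';

-- sample more pages per index so the Cardinality estimate is more stable (default is 20)
SET GLOBAL innodb_stats_persistent_sample_pages = 64;

-- re-collect the statistics with the new sample size
ANALYZE TABLE js_sgoods_sku;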

So the statistics themselves are not broken. Next, let's look at how the data is actually distributed:
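A query along these lines produces the numbers below (a sketch, not necessarily the exact statement used at the time):

SELECT
    SUM(ss_si_id = 0)            AS si_id_zero_rows,   -- rows stuck at the 0 default
    COUNT(*)                     AS total_rows,
    SUM(ss_si_id = 0) / COUNT(*) AS ratio
FROM js_sgoods_sku;

Note that this itself scans the whole 19-million-row table, so it is best run on a replica or during off-peak hours.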

+--------------+------------+---------------------+
| ss_si_id = 0 | count(*)   | 7994788 / 19048617  |
+--------------+------------+---------------------+
|      7994788 |   19048617 |              0.4197 |
+--------------+------------+---------------------+

I have to admit I was shocked at first glance: a huge number of rows in the table have ss_si_id = 0, roughly 41% of the whole table!


Okay, we've found the problem. Next: why did this SQL statement bring down the whole website?

From application-server monitoring we could see that our goods_service was abnormal, as follows:

1. CPU usage stayed above 100% for a long time.
2. jstack <pid> could not dump the thread stacks; a forced dump (jstack -F) was required.
3. The forced dump showed that all threads in the process were in the BLOCKED state.
4. jstat -gcutil showed very frequent full GC, roughly one full GC every 10 seconds.
5. Memory usage exceeded the memory allocated to the process.

The root cause: the slow query above returned an enormous result set (the slow log shows Rows_sent: 7915203 and Bytes_sent of roughly 1.2 GB), which made the memory of goods_service soar until the service could no longer respond; as things deteriorated further, the whole site stopped responding.


OK. Now that we know why the site went down, how do we fix it?
Since the trouble comes from queries with ss_si_id = 0, we can block that statement. There are several ways to do so:
1. In the application logic, intercept all queries of this type where ss_si_id = 0.
2. Modify the SQL statement itself (in the application's SQL configuration file).

Meanwhile, a large number of these slow queries had piled up on the DB server: load soared from 0.xx to 50+, and the connection count shot up right after. Had we not intervened, the DB server itself would probably have gone down.
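
Intervening here means finding and killing the runaway queries. A hand-rolled sketch of what pt-kill (mentioned below) automates, with an arbitrary 30-second threshold and a made-up thread id:

-- list statements that have been running for more than 30 seconds
SELECT id, user, db, time, LEFT(info, 80) AS query
FROM information_schema.processlist
WHERE command = 'Query' AND time > 30
ORDER BY time DESC;

-- kill only the offending statement, keeping the client connection open
KILL QUERY 12345;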

 

We finally adopted the following solutions:
1. Operations worked with development to modify the SQL statement, adding the condition AND ss_si_id <> 0 to the WHERE clause so that these requests are cut off at the MySQL layer (see the sketch below);
2. The DBA kept killing the offending queries so the DB server would not go down; the tool used for this was pt-kill, which, it has to be said, is quite handy.
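
The patched statement looks roughly like this (a sketch; the parameter placeholder assumes the application binds ss_si_id at runtime, and the added guard is the only change):

SELECT
    ss_id, ss_sa_id, ss_si_id, ss_av_zid, ss_av_fid, ss_artno,
    ss_av_zvalue, ss_av_fvalue, ss_av_zpic, ss_av_fpic, ss_number,
    ss_sales, ss_cprice, ss_price, ss_stock, ss_orderid, ss_status,
    ss_add_time, ss_lastmodify
FROM js_sgoods_sku
WHERE ss_si_id = ?        -- value supplied by the application
  AND ss_si_id <> 0       -- added guard: a 0 makes the WHERE self-contradictory
  AND ss_status > 0
ORDER BY ss_orderid DESC, ss_av_fid ASC;

When 0 is passed, the two predicates on ss_si_id contradict each other, so the query returns an empty result almost immediately instead of dragging eight million rows into the application.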


Summary (lessons learned):
1. For queries that can end up filtering on a column's default value like this, block them at the source.
2. Restrict the number of rows and the overall size of query result sets (see the sketch below), so that an oversized result set cannot kill the service.
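
For instance, if a page never needs more than a bounded number of SKUs, the query can be capped along these lines (a sketch; the 100-row page size is an arbitrary figure and the column list is abbreviated):

SELECT ss_id, ss_si_id, ss_orderid, ss_price, ss_stock
FROM js_sgoods_sku
WHERE ss_si_id = 123      -- a real, non-zero si_id from the application
  AND ss_status > 0
ORDER BY ss_orderid DESC, ss_av_fid ASC
LIMIT 100;                -- never hand the service more than one page of rows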
