Mysql tuning process
A few days ago, I made a database query, which was slow. I made a query and recorded it here to facilitate later reading,
1) first review the query Index
(Tue Jun 27 12:33:24 2017) db_1> show keys from xxxx ;=> (same as show index from xxxx) + -------------- + ------------ + hour + -------------- + hour + ----------- + hour + ---------- + -------- + ------ + ------------ + --------- + hour + | Table | Non_unique | Key_name | hour | Column_name | Collation cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | + -------------- + ---------- + metrics + ----------- + metrics + ---------- + -------- + ------------ + --------- + metrics --------------- + | xxxx | 0 | PRIMARY | 1 | iId | A | 1337960 | NULL | BTREE | xxxx | 1 | index_dPayTime | 1 | dPayTime | A | 1337960 | NULL | BTREE | xxxx | 1 | index_sUserName | 1 | sUserName | A | 445986 | NULL | BTREE | xxxx | 1 | index_sPlatName | 1 | iProxyId | A | 19 | NULL | BTREE | xxxx | 1 | index_sPlatName | 2 | iServerId | A | 1359 | NULL | BTREE | | xxxx | 1 | index_sPlatName | 3 | id | A | 1337960 | NULL | YES | BTREE | xxxx | 1 | index_dPayDate | 1 | dPayDate | A | 19 | NULL | BTREE | xxxx | 1 | Index_iPlatServerId_iPlayerId | 1 | iPlatServerId | A | 3126 | NULL | BTREE | xxxx | 1 | Index_iPlatServerId_iPlayerId | 2 | iPlayerId | A | 334490 | NULL | BTREE | xxxx | 1 | index_dPayDate_yue_iProxyId | 1 | dPayDate_yue | A | 19 | NULL | BTREE | | xxxx | 1 | index_dPayDate_yue_iProxyId | 2 | iProxyId | A | 545 | NULL | BTREE | xxxx | 1 | index_dPayDate_yue_iProxyId | 3 | iPayType | A | 545 | NULL | YES | BTREE |
Type description:
1. Table name.
2. Non_unique: if the index cannot contain duplicate words, the value is 0. If possible, the value is 1.
3. Name of the Key_name Index
4. The column serial number in the Seq_in_index index, starting from 1.
5. Column_name column name.
6. How is the Collation column stored in the index. In mysql, there are values 'A' (ascending) or NULL (unclassified ).
7. estimate the number of unique values in the Cardinality index. You can update analyze table or myisamchk-a by running analyze table. The base number is counted based on the statistical data stored as an integer. Therefore, this value is not required to be accurate even for small tables. The larger the base, the larger the chance for MySQL to use the index for union.
8. If the Sub_part column is partially indexed, it is the number of indexed characters. If the entire column is indexed, the value is NULL.
9. Packed indicates how the keywords are compressed. If it is not compressed, It is NULL.
10. Null: If the column contains NULL, YES is contained. If NO, the column contains NO.
11. Index_type used index methods (BTREE, FULLTEXT, HASH, RTREE ).
12. Comment index comments
Note:
1. The greater the cardinality value, the higher the chance of hitting this index. 2. Which index can be forcibly hit?
3 Cardinality is an estimate instead of an accurate value. Basically, you cannot get an accurate value. In actual applications, Cardinality/n_row_in_table should be as close as possible to 1,
If it is very small, you need to consider whether it is necessary to create this index. Therefore, it is necessary to add B + tree indexes to fields when accessing highly selective fields and extracting a small portion of data from the table.
If you want to view more detailed information about Cardinality, you can view it in: http://www.cnblogs.com/olinux/p/5140615.html
2) A colleague performs a query online:
select * from t_xxxx where pay_id in ( select pay_id from t_xxxx where dPayTime>="2017-06-21 12:12:31" AND dPayTime<="2017-06-28 12:12:31" group by pay_id having count(pay_id) > 1)
Then I ran a half-day failover and asked me what happened. I looked at it:
explain select * from t_xxxx where pay_id in ( select pay_id from t_xxxx where dPayTime>="2017-06-21 12:12:31" AND dPayTime<="2017-06-28 12:12:31" group by pay_id having count(pay_id) > 1)
Expected result:
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |+----+--------------------+--------------+-------+----------------+----------------+---------+------+---------+----------------------------------------------+| 1 | PRIMARY | t_xxxx | ALL | NULL | NULL | NULL | NULL | 1395826 | Using where || 2 | DEPENDENT SUBQUERY | t_xxx | range | index_dPayTime | index_dPayTime | 8 | NULL | 176284 | Using where; Using temporary; Using filesort |
This SQL statement contains subqueries and parent queries. If we query subqueries first, we can see that the execution is very fast:
Select pay_id from t_xxxx where dPayTime> = "12:12:31" AND dPayTime <= "12:12:31" group by pay_id having count (pay_id)> 1;
Empty set (0.42 sec)
Execution completed in 0.42 seconds
You can see the following in explain:
+----+-------------+--------------+-------+----------------+----------------+---------+------+--------+----------------------------------------------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |+----+-------------+--------------+-------+----------------+----------------+---------+------+--------+----------------------------------------------+| 1 | SIMPLE | t_xxxx | range | index_dPayTime | index_dPayTime | 8 | NULL | 176284 | Using where; Using temporary; Using filesort |+----+-------------+--------------+-------+----------------+----------------+---------+------+--------+----------------------------------------------+
Then, we try the parent query and replace the query with the result of the subquery:
Select * from t_xxxx where pay_id in (-> '20160301', '20160301', '20160301 ');
3 rows in set (0.50 sec)
Query completed in 0.56 seconds
Then it takes only one second to end the query.
However, this is not the case. After 10 minutes of execution, we still did not find (1 hundred and 0.5 million pieces of data). This is not a simple 1 + 1, but a multiplication. If we use in to operate, the number of rows affected by the subquery is multiplied by the number of rows affected by the parent query. The product quantity is very large, so the query takes a long time.
Therefore, we recommend that you change the table to a joined table ==>
select * from t_xxxx tpp1, (select pay_id from t_xxxx where dPayTime>="2017-06-21 12:12:31" AND dPayTime<="2017-06-28 12:12:31" group by pay_id
having count(pay_id) > 1) tpp2
where tpp1.pay_id=tpp2.pay_id and dPayTime>="2017-06-21 12:12:31" AND dPayTime<="2017-06-28 12:12:31"
If the table is joined, the table is joined only after the calculation of the two is completed. However, the subquery results are very few, with only a few results. Therefore, join tables are computed quickly and completed in 2 seconds.
Here, we should also note that if the query time span is large and the number of items is large, the index will not be hit and the common query will be automatically used.
explain select * from t_xxxx tpp1, (select pay_id from t_xxxx where dPayTime>="2017-06-01 12:12:31" AND dPayTime<="2017-06-28 12:12:31" group by pay_id having
count(pay_id) > 1) tpp2 where tpp1.pay_id=tpp2.pay_id and dPayTime>="2017-06-01 12:12:31" AND dPayTime<="2017-06-28 12:12:31"
+ ---- + ------------- + -------------- + -------- + ---------------- + ------ + --------- + Keys + | id | select_type | table | type | possible_keys | key | key_len | ref | rows | extra | + ---- + ------------- + -------------- + -------- + ---------------- + ------ + --------- + hour + | 1 | PRIMARY | <derived2> | system | NULL | NULL | 1 | 1 | PRIMARY | tpp1 | ALL | index_dPayTime | NULL | 1396694 | Using where | 2 | DERIVED | t_xxxx | ALL | index_dPayTime | NULL | NULL | 1396694 | Using where; using temporary; Using filesort | + ---- + ------------- + -------------- + -------- + ---------------- + ------ + --------- + hour +
No index hit
Therefore, we should try our best to optimize it. It is slow to query without hitting the index.