Some optimizations that MySQL can make when the association is complex.
Yesterday, an optimization of complex associated SQL statements was processed. The optimization of such SQL statements usually takes into account the following four points:
1. The result set returned by the query. Generally, there are few returned result sets in the query, and they are confident to be optimized;
2. The selection of the driver table is crucial. by viewing the execution plan, you can see the driver table selected by the optimizer. the rows in the execution plan can roughly reflect the problem;
3. Sort out the associations between tables and check whether appropriate indexes exist in the associated fields;
4. Use the straight_join keyword to strengthen the Association Sequence between tabulation, so that we can easily verify some conjecture;
SQL:
Execution time:
mysql> select c.yh_id,-> c.yh_dm,-> c.yh_mc,-> c.mm,-> c.yh_lx,-> a.jg_id,-> a.jg_dm,-> a.jg_mc,-> a.jgxz_dm,-> d.js_dm yh_js-> from a, b, c-> left join d on d.yh_id = c.yh_id-> where a.jg_id = b.jg_id-> and b.yh_id = c.yh_id-> and a.yx_bj = ‘Y'-> and c.sc_bj = ‘N'-> and c.yx_bj = ‘Y'-> and c.sc_bj = ‘N'-> and c.yh_dm = '006939748XX' ;1 row in set (0.75 sec)
In this SQL query, only one row of data is returned, but the execution takes 750 ms. view the execution plan:
mysql> explain-> select c.yh_id,-> c.yh_dm,-> c.yh_mc,-> c.mm,-> c.yh_lx,-> a.jg_id,-> a.jg_dm,-> a.jg_mc,-> a.jgxz_dm,-> d.js_dm yh_js-> from a, b, c-> left join d on d.yh_id = c.yh_id-> where a.jg_id = b.jg_id-> and b.yh_id = c.yh_id-> and a.yx_bj = ‘Y'-> and c.sc_bj = ‘N'-> and c.yx_bj = ‘Y'-> and c.sc_bj = ‘N'-> and c.yh_dm = '006939748XX' ;+—-+————-+——-+——–+——————+———+———+————–+——-+————-+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |+—-+————-+——-+——–+——————+———+———+————–+——-+————-+| 1 | SIMPLE | a | ALL | PRIMARY,INDEX_JG | NULL | NULL | NULL | 52616 | Using where || 1 | SIMPLE | b | ref | PRIMARY | PRIMARY | 98 | test.a.JG_ID | 1 | Using index || 1 | SIMPLE | c | eq_ref | PRIMARY | PRIMARY | 98 | test.b.YH_ID | 1 | Using where || 1 | SIMPLE | d | index | NULL | PRIMARY | 196 | NULL | 54584 | Using index |+—-+————-+——-+——–+——————+———+———+————–+——-+————-+
There are two prominent performance bottlenecks in the execution plan:
| 1 | SIMPLE | a | ALL | PRIMARY,INDEX_JG | NULL | NULL | NULL | 52616 | Using where || 1 | SIMPLE | d | index | NULL | PRIMARY | 196 | NULL | 54584 | Using index |
Because d is a left join table, the drive table does not select d. Let's take a look at the size of Table a, B, and Table c:
mysql> select count(*) from c;+———-+| count(*) |+———-+| 53731 |+———-+mysql> select count(*) from a;+———-+| count(*) |+———-+| 53335 |+———-+mysql> select count(*) from b;+———-+| count(*) |+———-+| 105809 |+———-+
Because table B has a larger data volume than the other two tables, and table B has no query and filtering conditions, B may be excluded from the drive table;
The optimizer actually chooses Table a as the driving table, but why not table c as the driving table? Let's analyze:
Stage 1: Table a as the driving table
A-> B-> c-> d:
(1): a. jg_id = B. jg_id-> (B Index: PRIMARY KEY ('jg _ id', 'yh _ id '))
(2): B. yh_id = c. yh_id-> (c Index: PRIMARY KEY ('yh _ id '))
(3): c. yh_id = d. yh_id-> (d index: primary key ('js _ dm', 'yh _ id '))
Because there is no yh_id index on Table d, the index is added to table d:
alter table d add index ind_yh_id(yh_id);
Execution Plan:
+—-+————-+——-+——–+——————+———–+———+————–+——-+————-+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |+—-+————-+——-+——–+——————+———–+———+————–+——-+————-+| 1 | SIMPLE | a | ALL | PRIMARY,INDEX_JG | NULL | NULL | NULL | 52616 | Using where || 1 | SIMPLE | b | ref | PRIMARY | PRIMARY | 98 | test.a.JG_ID | 1 | Using index || 1 | SIMPLE | c | eq_ref | PRIMARY | PRIMARY | 98 | test.b.YH_ID | 1 | Using where || 1 | SIMPLE | d | ref | ind_yh_id | ind_yh_id | 98 | test.b.YH_ID | 272 | Using index |+—-+————-+——-+——–+——————+———–+———+————–+——-+————-+
Execution time:
1 row in set (0.77 sec)
After adding an index to Table d, the number of rows scanned in table d dropped to 272 (initially: 54584)
| 1 | SIMPLE | d | ref | ind_yh_id | ind_yh_id | 98 | test.b.YH_ID | 272 | Using index |
Stage 2: Drive Table c
D
^
|
C-> B->
Because yh_dm has high filtering conditions on Table c, we create an index on yh_dm:
mysql> select count(*) from c where yh_dm = '006939748XX';+———-+| count(*) |+———-+| 2 |+———-+
Add an index:
alter table c add index ind_yh_dm(yh_dm)
View the execution plan:
+—-+————-+——-+——–+——————-+———–+———+————–+——-+————-+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |+—-+————-+——-+——–+——————-+———–+———+————–+——-+————-+| 1 | SIMPLE | a | ALL | PRIMARY,INDEX_JG | NULL | NULL | NULL | 52616 | Using where || 1 | SIMPLE | b | ref | PRIMARY | PRIMARY | 98 | test.a.JG_ID | 1 | Using index || 1 | SIMPLE | c | eq_ref | PRIMARY,ind_yh_dm | PRIMARY | 98 | test.b.YH_ID | 1 | Using where || 1 | SIMPLE | d | ref | ind_yh_id | ind_yh_id | 98 | test.b.YH_ID | 272 | Using index |+—-+————-+——-+——–+——————-+———–+———+————–+——-+————-+
Execution time:
1 row in set (0.74 sec)
After the index is added to Table c, the index is still not taken up, and the execution plan is still using Table a as the driving table. So let's analyze why Table a is used as the driving table?
1): c. yh_id = B. yh_id-> (PRIMARY KEY ('jg _ id', 'yh _ id '))
A. if table c is used as the driving table, table c is associated with table B because table B does not have the index of the yh_id field, because table B has a large amount of data, therefore, the optimizer considers that if Table c is used as the driving table, it will have a large association with table B (here we can use straight_join to force Table c as the driving table );
B. if table a is the driving table, table a is associated with table B because of the index of the jg_id field in table B, therefore, the optimizer considers that the cost of using a as the driving table is less than that of using c as the driving board;
Therefore, if we want to use table C as the driving table, we only need to add the yh_id index on Table B:
alter table b add index ind_yh_id(yh_id);
2): B. jg_id = a. jg_id-> (PRIMARY KEY ('jg _ id '))
3): c. yh_id = d. yh_id-> (KEY 'ind _ yh_id '('yh _ id '))
Execution Plan:
+—-+————-+——-+——–+——————-+———–+———+————–+——+————-+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |+—-+————-+——-+——–+——————-+———–+———+————–+——+————-+| 1 | SIMPLE | c | ref | PRIMARY,ind_yh_dm | ind_yh_dm | 57 | const | 2 | Using where || 1 | SIMPLE | d | ref | ind_yh_id | ind_yh_id | 98 | test.c.YH_ID | 272 | Using index || 1 | SIMPLE | b | ref | PRIMARY,ind_yh_id | ind_yh_id | 98 | test.c.YH_ID | 531 | Using index || 1 | SIMPLE | a | eq_ref | PRIMARY,INDEX_JG | PRIMARY | 98 | test.b.JG_ID | 1 | Using where |+—-+————-+——-+——–+——————-+———–+———+————–+——+————-+
Execution time:
1 row in set (0.00 sec)
We can see that the rows in the execution plan has been greatly reduced, and the execution time has also been reduced from ms to 0 ms;