About MySQL subquery and its Optimization _ MySQL

Source: Internet
Author: User
When talking about MySQL subqueries and their optimization, DBAs or developers who have used oracle or other relational databases have such experience. in subqueries, they all think that the database has been optimized, it is good to choose to drive table execution and port this experience to the mysql database. However, unfortunately, mysql may make you disappointed in the processing of subqueries, we have encountered some cases in our production system, such:

SELECT i_id, sum(i_sell) AS i_sellFROM table_dataWHERE i_id IN(SELECT i_id FROM table_data WHERE Gmt_create >= '2011-10-07 00:00:00')GROUP BY i_id;

(Note: The business logic of SQL can be used as an example: first, we can query the sales volume of the 100 new books sold on the 10-07, and then query the sales volume of the 100 new books sold throughout the year ).

The performance problem of this SQL statement lies in the weakness of the mysql Optimizer in processing subqueries.

The mysql Optimizer will rewrite the subquery when processing the subquery. In general, we want to complete the subquery results from the inside out, and then use the subquery to drive the external query tables to complete the query; however, mysql will first scan all the data in the external table, and each data entry will be uploaded to the subquery for association with the subquery. if the external table is large, performance issues will occur;

For the above query, because the table table_data contains million data, and there are a large number of duplicate data records in the subquery, this requires nearly connections, due to a large number of associations, this SQL statement has not been executed for several hours, so we need to rewrite the SQL statement:

SELECT t2.i_id, SUM(t2.i_sell) AS soldFROM(SELECT DISTINCT i_id FROM table_data WHERE gmt_create >= '2011-10-07 00:00:00') t1,table_data t2WHERE t1.i_id = t2.i_idGROUP BY t2.i_id;

We change the subquery to join, and add distinct to the subquery to reduce the number of times t1 is associated with t2;

After the transformation, the SQL execution time is reduced to less than Ms.

The optimization of mysql subqueries has not been very friendly and has been criticized by the industry. it is also one of the most frequently encountered problems in SQL optimization. when mysql is processing subqueries, it will rewrite the subquery. generally, we want to complete the subquery results from the inner to the outer, that is, to use the subquery to drive the external query table, the query is completed, but on the contrary, subqueries are not executed first. Today, we hope to introduce some practical cases to deepen our understanding of mysql subqueries. The following describes a complete case study and its analysis and optimization processes and ideas.

1. case:

User feedback: the database response is slow, and many service updates are stuck. log on to the database and observe the SQL statements that have been executed for a long time;

| 10437 | usr0321t9m9 | 10.242.232.50: 51201 | oms | Execute | 1179 | SendingSql: SELECT tradedto0 _. * FROM a1 tradedto0_WHERE tradedto0 _. tradestatus = '1' AND (tradedto0 _. tradeoid IN (SELECT orderdto1 _. tradeoidFROM a2 orderdto1_WHERE orderdto1 _. proname LIKE '% ?? % 'OR orderdto1 _. procode LIKE' % ?? % ') AND tradedto0 _. undefine4 = '1' AND tradedto0 _. invoicetype = '1' AND tradedto0 _. tradestep = '0' AND (tradedto0 _. orderCompany LIKE '200') order by tradedto0 _. tradesign ASC, tradedto0 _. makertime desc limit 15;
2. symptom: updates to other tables are blocked.
UPDATE a1SET tradesign='DAB67634-795C-4EAC-B4A0-78F0D531D62F',markColor=' #CD5555',memotime='2012-09- 22', markPerson='??'WHERE tradeoid IN ('gy2012092204495100032') ;

In order to restore the application as soon as possible, after the SQL statements executed for a long time are killed, the application returns to normal;

3. analyze the execution plan:
db@3306 :explainSELECT tradedto0_.*FROM a1 tradedto0_WHERE tradedto0_.tradestatus='1'AND (tradedto0_.tradeoid IN(SELECT orderdto1_.tradeoid FROM a2 orderdto1_ WHERE orderdto1_.proname LIKE '%??%' OR orderdto1_.procode LIKE '%??%'))AND tradedto0_.undefine4='1'AND tradedto0_.invoicetype='1'AND tradedto0_.tradestep='0'AND (tradedto0_.orderCompany LIKE '0002%')ORDER BY tradedto0_.tradesign ASC,tradedto0_.makertime DESC LIMIT 15;+----+--------------------+------------+------+---------------+------+---------+------+-------+-----| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |+----+--------------------+------------+------+---------------+------+---------+------+-------+-----| 1 | PRIMARY | tradedto0_ | ALL | NULL | NULL | NULL | NULL | 27454 | Using where; Using filesort || 2 | DEPENDENT SUBQUERY | orderdto1_ | ALL | NULL | NULL | NULL | NULL | 40998 | Using where |+----+--------------------+------------+------+---------------+------+---------+------+-------+-----

From the execution plan, we start to optimize it step by step:

First, let's take a look at the second row of the execution plan, that is, the part of the subquery. orderdto1 _ scans the entire table. let's see if we can add an appropriate index:

A. overwrite indexes:

db@3306:alter table a2 add index ind_a2(proname,procode,tradeoid);ERROR 1071 (42000): Specified key was too long; max key length is 1000 bytes

The maximum key length limit for adding a composite index is exceeded:

B. view the field definitions of the table:
db@3306 :DESCa2 ;+---------------------+---------------+------+-----+---------+-------+| FIELD | TYPE| NULL | KEY | DEFAULT | Extra |+---------------------+---------------+------+-----+---------+-------+| OID | VARCHAR(50) | NO | PRI | NULL| || TRADEOID| VARCHAR(50) | YES| | NULL| || PROCODE | VARCHAR(50) | YES| | NULL| || PRONAME | VARCHAR(1000) | YES| | NULL| || SPCTNCODE | VARCHAR(200)| YES| | NULL| |
C. view the average length of table fields:
db@3306 :SELECT MAX(LENGTH(PRONAME)),avg(LENGTH(PRONAME)) FROM a2;+----------------------+----------------------+| MAX(LENGTH(PRONAME)) | avg(LENGTH(PRONAME)) |+----------------------+----------------------+|95| 24.5588 |
D. narrow down the field length
ALTER TABLE MODIFY COLUMN PRONAME VARCHAR(156);

Then, analyze the execution plan:

db@3306 :explainSELECT tradedto0_.*FROM a1 tradedto0_WHERE tradedto0_.tradestatus='1'AND (tradedto0_.tradeoid IN(SELECT orderdto1_.tradeoid FROM a2 orderdto1_ WHERE orderdto1_.proname LIKE '%??%' OR orderdto1_.procode LIKE '%??%'))AND tradedto0_.undefine4='1'AND tradedto0_.invoicetype='1'AND tradedto0_.tradestep='0'AND (tradedto0_.orderCompany LIKE '0002%')ORDER BY tradedto0_.tradesign ASC,tradedto0_.makertime DESC LIMIT 15;+----+--------------------+------------+-------+-----------------+----------------------+---------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |+----+--------------------+------------+-------+-----------------+----------------------+---------+| 1 | PRIMARY | tradedto0_ | ref | ind_tradestatus | ind_tradestatus | 345 | const,const,const,const | 8962 | Using where; Using filesort || 2 | DEPENDENT SUBQUERY | orderdto1_ | index | NULL | ind_a2 | 777 | NULL | 41005 | Using where; Using index |+----+--------------------+------------+-------+-----------------+----------------------+---------+

The performance is still not good. The key is that the number of rows scanned in the two tables is not reduced (8962*41005). The index added above does not have much effect. now, view the execution result of the t table:

db@3306 :SELECT orderdto1_.tradeoidFROM t orderdto1_WHERE orderdto1_.proname LIKE '%??%'OR orderdto1_.procode LIKE '%??%'; EmptySET (0.05 sec)

The result set is empty. Therefore, the result set of table t must be used as the driving table;

4. rewrite the subquery:

Through the test and verification above, the performance of common mysql subqueries is poor, which is a natural weakness of mysql subqueries. you need to rewrite the SQL statement as the join statement:

SELECT tradedto0_.*FROM a1 tradedto0_ ,(SELECT orderdto1_.tradeoid FROM a2 orderdto1_ WHERE orderdto1_.proname LIKE '%??%' OR orderdto1_.procode LIKE '%??%')t2WHERE tradedto0_.tradestatus='1'AND (tradedto0_.tradeoid=t2.tradeoid)AND tradedto0_.undefine4='1'AND tradedto0_.invoicetype='1'AND tradedto0_.tradestep='0'AND (tradedto0_.orderCompany LIKE '0002%')ORDER BY tradedto0_.tradesign ASC, tradedto0_.makertime DESC LIMIT 15;
5. view the execution plan:
db@3306 :explainSELECT tradedto0_.*FROM a1 tradedto0_ ,(SELECT orderdto1_.tradeoid FROM a2 orderdto1_ WHERE orderdto1_.proname LIKE '%??%' OR orderdto1_.procode LIKE '%??%')t2WHERE tradedto0_.tradestatus='1'AND (tradedto0_.tradeoid=t2.tradeoid)AND tradedto0_.undefine4='1'AND tradedto0_.invoicetype='1'AND tradedto0_.tradestep='0'AND (tradedto0_.orderCompany LIKE '0002%')ORDER BY tradedto0_.tradesign ASC,tradedto0_.makertime DESC LIMIT 15;+----+-------------+------------+-------+---------------+----------------------+---------+------+| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |+----+-------------+------------+-------+---------------+----------------------+---------+------+| 1 | PRIMARY | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Impossible WHERE noticed after reading const tables || 2 | DERIVED | orderdto1_ | index | NULL | ind_a2 | 777 | NULL | 41005 | Using where; Using index |+----+-------------+------------+-------+---------------+----------------------+---------+------+
6. Execution time:
db@3306 :SELECT tradedto0_.*FROM a1 tradedto0_ ,(SELECT orderdto1_.tradeoid FROM a2 orderdto1_ WHERE orderdto1_.proname LIKE '%??%' OR orderdto1_.procode LIKE '%??%')t2WHERE tradedto0_.tradestatus='1'AND (tradedto0_.tradeoid=t2.tradeoid)AND tradedto0_.undefine4='1'AND tradedto0_.invoicetype='1'AND tradedto0_.tradestep='0'AND (tradedto0_.orderCompany LIKE '0002%')ORDER BY tradedto0_.tradesign ASC, tradedto0_.makertime DESC LIMIT 15; EmptySET (0.03 sec)

Reduced to milliseconds;

7. conclusion:

1. mysql subqueries have obvious weaknesses in the execution plan and must be rewritten.

Refer:

A. mysql subquery encountered in the production database: http://hidba.org /? P = 412

B. built-in builtin InnoDB, subquery blocking update: http://hidba.org /? P = 456

2. in table structure design, do not use the large varchar (N) fields at will, leading to the inability to use indexes.

Refer:

A. JDBC memory management-varchar2 (4000) impact: http://hidba.org /? P = 31

B. restrictions on large fields in innodb: http://hidba.org /? P = 144

C. innodb using the large field text, blob optimization recommendations: http://hidba.org /? P = 551

8. Refer:

[1] mysql subquery http://hidba.org /? P = 412

[2] talking about mysql subquery http://hidba.org /? P = 624

[3] weakness http://hidba.org for mysql subqueries /? P = 260

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.