A case of MySQL sub-query optimization

Source: Internet
Author: User
Tags mysql query

written in the previous words:
    1. In slow-scan optimization 1 and 2 have repeatedly emphasized the importance of explain, but sometimes the naked eye can not see how the explain Results to guide optimization, this time also need some other basic knowledge of the support, and even need to understand the MySQL implementation principle, such as sub-query slow-scan optimization .
    2. When you see the "DEPENDENT subquery" in the Select_type field in the SQL execution plan, you must be mentally!

Why--mysql's subqueries are sometimes bad--

Intro: Why is such a subquery so slow?

The following example is a slow check, and the online execution time is quite exaggerated. Why is it?

SELECT Gid,count (ID) as COUNT

From shop_goods G1

WHERE status =0 and GID in (

SELECT gid from shop_goods G2 WHERE SID in (1519066,1466114,1466110,1466102,1466071,1453929)

)

GROUP by GID;

Its implementation plan is as follows, please watch the keyword "DEPENDENT subquery":

ID select_type table Type possible_keys key Key_len ref r oWS Extra
------  ------------------  ------  --------------  --------------------------------------  ------------  -------  ------ ------  -----------
1 PRIMARYG1Index (NULL) Idx_gid 5 (NULL)850672Using where
2DEPENDENT subqueryG2Index_subquery id_shop_goods,idx_sid,idx_gid Idx_gid 5 func 1 Using where

the basics: What Dependent subquery means

The official meaning is:

subquery: The first select in a subquery;

DEPENDENT subquery: The first select in a subquery depends on the outside query .

In other words, the subquery's query to G2 depends on the outer G1 query .

What do you mean? It means two steps:

The first step, MySQL, is based on the select Gid,count (ID) from the shop_goods where status=0 GROUP by GID; gets a large result set T1, the amount of data is the rows in the It's =850672.

In the second step, each record in the above large result set T1 will consist of a new query statement with the subquery SQL: Select GID from Shop_goods where Sid in (15...blabla. ) and gid=%t1.gid%. is equal to saying that the subquery will be executed 850,000 times ... Even though the two-step query uses an index, it's not too slow to blame.

As a result, the execution efficiency of a subquery is constrained by the number of records in the outer query, which is not as good as splitting into two separate query sequences .

optimization Strategy 1:

If you do not want to split into two independent queries, you can also query with temporary table tables as follows:

SELECT g1.gid,count (1)

From Shop_goods G1,(select gid from Shop_goods WHERE SID in (1519066,1466114,1466110,1466102,1466071,1453929)) G2

where g1.status=0 and g1.gid=g2.gid

GROUP by G1.gid;

The same result can be obtained, and it is in the millisecond level.

Its plan of implementation is:

ID select_type table Type possible_keys key Key_len ref rows Extra
------  -----------  --------------  ------  -------------------------  -------------  -------  -----------  ------  ----- --------------------------
1 PRIMARY<derived2>All (NULL) (NULL) (NULL)-Using temporary; Using Filesort
1 PRIMARY G1 ref idx_gid Idx_gid 5 G2.gid 1 Using where
2 DERIVED Shop_goods Range Id_shop_goods,idx_sid id_shop_goods 5 (NULL) the Using where; Using Index

The official meaning of DERIVED is:

DERIVED: Used in cases where subqueries are found in the FROM clause. MySQL executes these subqueries recursively, putting the results in a temporary table.

dba Point of Reference: the weaknesses of MySQL subqueries

Hidba the Tao ( reference Resource 3):

When MySQL is processing a subquery, it overwrites the subquery.

Normally, we want to finish the query by the inner-to-outer, and then use the subquery to drive the outer query table.

For example:

SELECT * FROM Test where Tid in (select Fk_tid from Sub_test where gid=10)

Usually we feel that the order in which SQL is executed is:

the Fk_tid (2,3,4,5,6) record is obtained from the Sub_test table according to the GID,

and then to test, bring into the tid=2,3,4,5,6, get query data.

But the actual MySQL is handled in the following ways:

SELECT * FROM test where exists (

SELECT * from Sub_test where gid=10 and Sub_test.fk_tid=test.tid

)

MySQL will scan all the data in test, each data will be sent to the subquery associated with the sub_test, the subquery will not be executed first, so if the test table is large, then performance will be problematic.

"High-performance MySQL," a book of ideas cited

Section 4.4 of the high-performance MySQL "MySQL query optimizer limitations (limitations of the MySQL Optimizer)", section 4.4.1, "Associating subqueries (correlated subqueries)" There are similar discourses:

MySQL sometimes optimizes sub-queries very poorly, especially in the WHERE clause in the () subquery. ......

For example, in the Sakila database Sakila.film table to find all film, these film actoress including Penelope Guiness (actor_id = 1). It can be written like this:

Mysql> SELECT * from Sakila.film

, WHERE film_id in (

SELECT film_id from sakila.film_actor WHERE actor_id = 1);

Mysql> EXPLAIN SELECT * from sakila.film ...;

+----+--------------------+------------+--------+------------------------+

| ID | Select_type | Table | Type | Possible_keys |

+----+--------------------+------------+--------+------------------------+

| 1 | PRIMARY | Film | All | NULL |

| 2 | DEPENDENT subquery | Film_actor | Eq_ref | primary,idx_fk_film_id |

+----+--------------------+------------+--------+------------------------+

depending on the output of the explain, MySQL will scan the full table for film tables and perform subqueries on each row found, which is a bad performance . Fortunately, it's easy to rewrite it as a join query:

Mysql> SELECT film.* from Sakila.film

-INNER JOIN sakila.film_actor USING (film_id)

--WHERE actor_id = 1;

Another method is to manually produce the in () list by using Group_concat () to execute a subquery as a separate query. Sometimes it's faster than join . (Note: You may wish to try on our library SELECT Goods_id,group_concat (CAST (id as char))

From Bee_shop_goods

WHERE shop_id in (1519066,1466114,1466110,1466102,1466071,1453929)

GROUP by goods_id;)

MySQL has been criticized for this particular type of subquery execution plan.

when subqueries are good

MySQL is not always optimized for queries that are poorly tuned. And sometimes it's optimized. Here's an example:

Mysql> EXPLAIN SELECT film_id, language_id from Sakila.film

-WHERE Not EXISTS (

SELECT * from Sakila.film_actor

--WHERE film_actor.film_id = film.film_id

) G

...... (Note: Please read the "high-performance MySQL" for specific text.)

Yes, subqueries are not always optimized badly, specific problems specific analysis, but don't forget explain.

A case of MySQL sub-query optimization

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.