Nesting Queries Poor optimization
As I mentioned above, a table query is more efficient than a nested query, regardless of the particular situation. Although the two queries express the same meaning, although your plan is to tell the server what to do and then let it decide what to do, sometimes you have to tell it to change. Otherwise the optimizer might do something stupid. I've been in this situation lately. These tables are three-tiered relationships: category, subcategory, and item. There are thousands of records in the category table, hundreds of records in the subcategory table, and millions of entries in the Item table. You can ignore the category table, I just explain the background, the following query statement does not relate to it. This is the statement that created the table:
[SQL]
Copy Code code as follows:
CREATE TABLE subcategory (
ID int NOT NULL PRIMARY key,
category int NOT NULL,
Index (category)
) Engine=innodb;
CREATE TABLE Item (
ID int NOT NULL auto_increment primary key,
subcategory int NOT NULL,
Index (subcategory)
) Engine=innodb;
And I filled in some sample data into the table.
[SQL]
Copy Code code as follows:
INSERT into subcategory (ID, category)
Select I, i/100 from
where I <= 300000;
Insert into item (subcategory)
Select ID
From (
Select ID, rand () * as num_rows from subcategory
) as X
Cross Join number
where I <= num_rows;
Create temporary table T as
Select subcategory from Item
Group BY subcategory
Having count (*) = 19
Limit 100;
Insert into item (subcategory)
Select subcategory
From T
Cross Join number
where I < 2000;
Again, these statements take a little time to run and are not suitable to run in a production environment. The idea is to insert random rows of data into the item so that subcategory has 1 to 2018 item. This is not the actual full data, but the effect is the same.
I want to find out all the subcategory in a category with the item number greater than 2000. First, I find a subcategory item number greater than 2000, and then use its category in the next query. This is the specific query statement:
[SQL]
Copy Code code as follows:
Select C.id
From subcategory as C
Inner JOIN item as I on i.subcategory = C.id
GROUP BY C.id
Having count (*) > 2000;
--Choose one of the results, then
SELECT * FROM subcategory where id =????
--Result:category = 14
I got an appropriate value of 14, which is used in the following query. This is the statement used to query subcategory for all of the item numbers in category 14 greater than 2000:
[SQL]
Copy Code code as follows:
Select C.id
From subcategory as C
Inner JOIN item as I on i.subcategory = C.id
where c.category = 14
GROUP BY C.id
Having count (*) > 2000;
In my sample data, the result of the query is 10 rows, and it's done in more than 10 seconds. Explain shows good use of the index; from the size of the data, it's pretty good. The query plan is to traverse and compute the target record on the index. So far, very good.
This assumes that I want to remove all the fields from the subcategory. I can think of the above query as nested, and then use the join, or select MAX (since the grouping set corresponds to the value is unique), but also written in the same as the following, there are wood?
[SQL]
Copy Code code as follows:
SELECT * FROM subcategory
where ID in (
Select C.id
From subcategory as C
Inner JOIN item as I on i.subcategory = C.id
where c.category = 14
GROUP BY C.id
Having count (*) > 2000
);
After running this query, it is estimated that the sun will sink into the Earth from dawn to sunset. I don't know how long it will run, because I'm not going to let it run indefinitely. You might think that, in terms of the statement, it would: A to compute the query inside, find the 10 values, B, and continue to find the 10 records, and find it very fast in the primary index. Wrong, this is actually the query plan:
[SQL]
Copy Code code as follows:
1. Row ***************************
Id:1
Select_type:primary
Table:subcategory
Type:all
Possible_keys:null
Key:null
Key_len:null
Ref:null
rows:300783
Extra:using where
2. Row ***************************
Id:2
Select_type:dependent subquery
Table:c
Type:ref
Possible_keys:primary,category
Key:category
Key_len:4
Ref:const
rows:100
Extra:using where; Using index; Using temporary; Using Filesort
3. Row ***************************
Id:2
Select_type:dependent subquery
Table:i
Type:ref
Possible_keys:subcategory
Key:subcategory
Key_len:4
Ref:c.id
Rows:28
Extra:using Index
How do you not know how to analyze the MySQL statement query plan, please see the general meaning: MySQL plan to execute the query from the outside, not from the inside out. I'll introduce each part of the query one at a A.
The query outside simply becomes the select * from subcategory. Although the query inside has a constraint on subcategory (WHERE category = 14), for some reason MySQL does not apply it to the outside query. I didn't know it was God's horse. All I know is that it scans the entire table (that's what type:all means) and doesn't use any indexes. This is a scan on a table that is recorded on 10 tens of thousands of rows.
In the outside query, each row is executed once inside the query, although no value is used inside the query, because the query inside is "optimized" to refer to the outside query. According to this analysis, the query plan becomes a nested loop. Every time the query in the outside loop, the query is executed once inside. The following is a query plan after the optimizer overrides:
[SQL]
Copy Code code as follows:
SELECT * from subcategory as S
Where <in_optimizer> (
S.id,<exists> (
Select C.id
From subcategory as C
Join Item as I
where ((i.subcategory = c.id) and (c.category = 14))
GROUP BY C.id
Having ((count (0) > 2000)
and (<cache> (s.id) = <ref_null_helper> (c.id)))
)
You can get the optimized query by taking the show warnings behind the explain EXTENDED. Note the external domain that is pointed to in the HAVING clause.
I do not intend to criticize MySQL's optimization strategy for this example. It is well known that MySQL has not been able to optimize nested queries in some cases, a problem that has been widely reported. I want to point out that it is necessary for the developer to check the query statements to make sure they are not badly tuned. In most cases, it is not necessary to be safe to avoid nesting-especially where ... In () and WHERE ... Not in statement.
My own principle is "have a doubt, explain see". If the face is a large data table, I will naturally have questions.
How to force queries inside to execute first
The statement in the previous section crashes only because MySQL executes it as a related statement, not as an unrelated statement from inside to outside. It's also a way for MySQL to execute the query first, as a temporary table to avoid huge performance overhead.
MySQL implements nested queries from temporary tables (to some extent misinformation derived tables). This means that MySQL executes the query first, stores the results in a temporary table, and uses it in other tables. This is how I expected to write this query. The query statement is modified as follows:
[SQL]
Copy Code code as follows:
SELECT * FROM subcategory
where ID in (
Select ID from (
Select C.id
From subcategory as C
Inner JOIN item as I on i.subcategory = C.id
where c.category = 14
GROUP BY C.id
Having count (*) > 2000
) as X
);
All I do is embed the original nested query. MySQL will think that the innermost is a separate nested query to execute first, and now only wrap the outside nesting, it has been loaded into a temporary table, only a small number of records, so much faster. In this analysis, this is a rather stupid optimization method, rather than rewrite it as a join method. Again, lest someone else see it, clean it out as a redundant code.
There are situations where you can use this optimization method, such as MySQL throwing an error, and the table of nested queries is modified elsewhere (another article, MySQL Select, updates the same table at the same time). Unfortunately, this approach is helpless for temporary tables that can only be used once in a query.
SOURCE http://blog.csdn.net/afeiqiang/article/details/8620038