MySQL subquery and join optimization methods


Poorly optimized subqueries
As mentioned earlier, joins are generally more efficient than subqueries, all else being equal. Even when two queries mean the same thing, and even though the idea is that you tell the server what to do and let it decide how to do it, sometimes you have to tell it how. Otherwise, the optimizer may do something stupid.

I ran into such a case recently. The tables form a three-level hierarchy: category, subcategory, and item. There are thousands of rows in the category table, hundreds of thousands in the subcategory table, and millions in the item table. You can ignore the category table; I mention it only for background, and the queries below do not involve it. Here are the statements that create the tables:

[SQL]
create table subcategory (
    id int not null primary key,
    category int not null,
    index (category)
) engine = InnoDB;

create table item (
    id int not null auto_increment primary key,
    subcategory int not null,
    index (subcategory)
) engine = InnoDB;
 
I then filled the tables with some sample data.

[SQL]
insert into subcategory (id, category)
    select i, i/100 from number
    where i <= 300000;

insert into item (subcategory)
    select id
    from (
        select id, rand() * 20 as num_rows from subcategory
    ) as x
        cross join number
    where i <= num_rows;

create temporary table t as
    select subcategory from item
    group by subcategory
    having count(*) = 19
    limit 100;

insert into item (subcategory)
    select subcategory
    from t
        cross join number
    where i < 2000;
 
Be warned that these statements take a while to run and are not suitable for a production environment. The idea is to insert a random number of rows into item for each subcategory, so that each subcategory ends up with between 1 and 2018 items. This is not exactly what the real data looks like, but the effect is the same.
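The sample-data statements read from a helper table called number with an integer column i, which the article never defines. The following is a minimal sketch of one way to build it, not the author's own setup. The digit table and its column n are hypothetical names introduced only for this illustration.

```sql
-- Hypothetical helper table (not in the original article): the ten decimal digits.
create table digit (
    n int not null primary key
) engine = InnoDB;

insert into digit (n) values (0), (1), (2), (3), (4), (5), (6), (7), (8), (9);

-- The table the sample-data statements select from; the column name i
-- matches the one used in those statements.
create table number (
    i int not null primary key
) engine = InnoDB;

-- Combine six digit positions into the integers 1..300000.
-- Limiting the highest position to 0..2 caps the result at exactly 300000.
insert into number (i)
    select 1 + d0.n + 10 * d1.n + 100 * d2.n
             + 1000 * d3.n + 10000 * d4.n + 100000 * d5.n
    from digit as d0
        cross join digit as d1
        cross join digit as d2
        cross join digit as d3
        cross join digit as d4
        cross join digit as d5
    where d5.n <= 2;
```

A permanent table is used for digit rather than a temporary one because MySQL does not allow a temporary table to be referenced more than once in the same query.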

I want to find all subcategories in a given category that have more than 2000 items. First, I find a subcategory with more than 2000 items, then use its category in the queries that follow. Here are the queries:

[SQL]
select c.id
from subcategory as c
    inner join item as i on i.subcategory = c.id
group by c.id
having count(*) > 2000;

-- Choose one of the results, then
select * from subcategory where id = ????
-- Result: category = 14
 
That gave me a suitable value, 14, which I use in the queries below. This is the query that finds all subcategories in category 14 with more than 2000 items:

[SQL]
select c.id
from subcategory as c
    inner join item as i on i.subcategory = c.id
where c.category = 14
group by c.id
having count(*) > 2000;
 
In my sample data, this query returns 10 rows and takes a little over 10 seconds to complete. EXPLAIN shows that the indexes are used well, and the run time is reasonable given the size of the data. The query plan is to scan the index and count the matching rows. So far, so good.

Now suppose I want to retrieve all columns from subcategory. I could use the preceding query as a subquery and join against it, or select MAX() of each column (since the values within each group are unique), but isn't it just as natural to write it like this?

[SQL]
select * from subcategory
where id in (
    select c.id
    from subcategory as c
        inner join item as i on i.subcategory = c.id
    where c.category = 14
    group by c.id
    having count(*) > 2000
);
 
This query may well run from dawn until dusk. I don't know how long it takes, because I was not willing to let it run indefinitely. You might expect MySQL to a) evaluate the subquery and find the 10 values, then b) fetch those 10 rows, which would be very fast through the primary key. Wrong. This is the actual query plan:

[SQL]
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: subcategory
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 300783
        Extra: Using where
*************************** 2. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY
        table: c
         type: ref
possible_keys: PRIMARY,category
          key: category
      key_len: 4
          ref: const
         rows: 100
        Extra: Using where; Using index; Using temporary; Using filesort
*************************** 3. row ***************************
           id: 2
  select_type: DEPENDENT SUBQUERY
        table: i
         type: ref
possible_keys: subcategory
          key: subcategory
      key_len: 4
          ref: c.id
         rows: 28
        Extra: Using index
 
If you are not familiar with reading MySQL query plans, the summary is this: MySQL plans to execute the query from the outside in, rather than from the inside out. I will go through each part of the query in turn.

The outer query simply becomes SELECT * FROM subcategory. Although the subquery restricts subcategory (WHERE category = 14), MySQL for some reason does not apply that restriction to the outer query. I don't know why. I only know that it scans the entire table (that is what type: ALL means) and uses no index. That is a scan of about 300,000 rows.

The subquery is then executed once for each row of the outer query. Although the subquery as written uses no values from the outer query, the optimizer has "optimized" it so that it refers to the outer query. With this plan, the query becomes a nested loop: the subquery runs once per iteration of the outer query. This is the query after the optimizer has rewritten it:

[SQL]
select * from subcategory as s
where <in_optimizer>(
    s.id, <exists>(
    select c.id
    from subcategory as c
        join item as i
    where ((i.subcategory = c.id) and (c.category = 14))
    group by c.id
    having ((count(0) > 2000)
        and (<cache>(s.id) = <ref_null_helper>(c.id))))
)
 
You can obtain the rewritten query by running SHOW WARNINGS after EXPLAIN EXTENDED. Note the reference to the outer scope in the HAVING clause.
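As a sketch of that procedure, the two statements are run one after the other in the same session. The EXTENDED keyword applies to the older MySQL versions this article targets; it was deprecated in MySQL 5.7 and removed in 8.0, where a plain EXPLAIN followed by SHOW WARNINGS serves the same purpose. The \G terminator used to get vertical output is a feature of the mysql command-line client and is left out here.

```sql
-- Step 1: ask for the plan (nothing is executed, so this returns quickly).
explain extended
select * from subcategory
where id in (
    select c.id
    from subcategory as c
        inner join item as i on i.subcategory = c.id
    where c.category = 14
    group by c.id
    having count(*) > 2000
);

-- Step 2: in the same session, show the query as the optimizer rewrote it.
-- The rewritten text appears in the Message column of a Note-level row.
show warnings;
```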

I am not using this example to attack MySQL's optimization strategy. It is well known that MySQL cannot optimize subqueries well in some cases, and the problem has been widely reported. My point is that developers need to check their queries to make sure they are not poorly optimized. In most cases, avoid subqueries unless they are necessary, especially WHERE ... IN () and WHERE ... NOT IN ().

My own rule is "when in doubt, EXPLAIN." And when I am dealing with a large table, I am automatically in doubt.
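As an illustration of avoiding WHERE ... NOT IN, here is a hedged sketch of the usual rewrite as a LEFT JOIN that keeps only unmatched rows. The task itself (finding subcategories in category 14 that have no items) is an assumption chosen for this example and does not come from the original article. The two forms return the same rows here because item.subcategory and item.id are both declared NOT NULL.

```sql
-- Subquery form: subcategories in category 14 with no items at all.
select s.*
from subcategory as s
where s.category = 14
    and s.id not in (select i.subcategory from item as i);

-- Join form: keep only the subcategory rows that found no matching item.
-- i.id is NULL exactly when the left join produced no match.
select s.*
from subcategory as s
    left join item as i on i.subcategory = s.id
where s.category = 14
    and i.id is null;
```

Whichever form you choose, run EXPLAIN on it to confirm that the plan is what you expect.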

How to force the inner query to execute first
The query in the previous section performs so badly because MySQL executes it from the outside in, as a correlated subquery, rather than from the inside out, as an uncorrelated one. There is a way to make MySQL execute the inner query first and materialize it as a temporary table, which avoids the huge performance penalty.

MySQL materializes subqueries in the FROM clause (somewhat loosely called derived tables) as temporary tables. This means that MySQL executes the inner query, stores the result in a temporary table, and then uses that table in the rest of the query. This is the behavior I expected when I first wrote the query. The modified query looks like this:
[SQL]
select * from subcategory
where id in (
    select id from (
        select c.id
        from subcategory as c
            inner join item as i on i.subcategory = c.id
        where c.category = 14
        group by c.id
        having count(*) > 2000
    ) as x
);
 
All I did was wrap the original subquery in another subquery. MySQL treats the innermost query as independent and executes it first, so only the outer wrapper remains, and that wrapper reads from a temporary table holding just a few rows. The result is much faster. By this analysis it is a rather silly optimization; it is probably better to rewrite the query as a join. That also avoids the risk of someone else seeing the wrapper as redundant code and "cleaning it up".

The same trick is useful in some other cases, for example when MySQL raises an error because a table used in a subquery is also being modified elsewhere in the statement. Unfortunately, it does not help with the restriction that a temporary table can be referenced only once in a query.
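For comparison, here is a minimal sketch of the join form of the same query. The aliases s and x are illustrative. The grouped query appears in the FROM clause, so it is evaluated once, and because it returns each id at most once the join cannot duplicate subcategory rows.

```sql
select s.*
from subcategory as s
    inner join (
        -- Evaluated once: the ids of subcategories in category 14
        -- that have more than 2000 items.
        select c.id
        from subcategory as c
            inner join item as i on i.subcategory = c.id
        where c.category = 14
        group by c.id
        having count(*) > 2000
    ) as x on x.id = s.id;
```

Running EXPLAIN on this form should show the derived table being built first and the outer table being read by primary key, but verify that on your own server version.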
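The "table modified elsewhere" case can be sketched as follows. The DELETE task is an assumption made up for this illustration and is not from the original article. MySQL rejects the first statement with error 1093 ("You can't specify target table 'item' for update in FROM clause"); wrapping the subquery in a derived table makes MySQL materialize it first, so the second statement is accepted.

```sql
-- Rejected with error 1093: item is both the delete target
-- and the table read by the subquery.
delete from item
where subcategory in (
    select subcategory
    from item
    group by subcategory
    having count(*) > 2000
);

-- Accepted: the inner result is materialized into a temporary table
-- before the delete starts. The alias x is illustrative.
delete from item
where subcategory in (
    select subcategory from (
        select subcategory
        from item
        group by subcategory
        having count(*) > 2000
    ) as x
);
```

Try this on a copy of the data first, since the second statement really does delete rows.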

Source http://blog.csdn.net/afeiqiang/article/details/8620038
