Optimize mysql nested queries and table queries

Source: Internet
Author: User

Optimization of mysql nested queries and nested query of table queries is poorly optimized as I mentioned above. Without special considerations, table queries are more effective than nested queries. Although the two queries share the same meaning, even though your plan is to tell the server what to do and then let it decide how to do it, sometimes you have to tell it how to change it. Otherwise, the Optimizer may be stupid. I have encountered such a situation recently. These tables have three levels of hierarchical relationships: category, subcategory, and item. There are thousands of records in the category table, hundreds of records in the subcategory table, and millions of records in the item table. You can ignore the category table. I just want to explain the background and the following query statements do not involve it. This is the statement for creating a table: www.2cto.com [SQL] create table subcategory (id int not null primary key, category int not null, index (category) engine = InnoDB; create table item (id int not null auto_increment primary key, subcategory int not null, index (subcategory) engine = InnoDB; I also enter some sample data in the table www.2cto.com [SQL] insert into subcategory (id, category) select I, I/100 from number where I <= 300000; insert into item (Subcategory) select id from (select id, rand () * 20 as num_rows from subcategory) as x cross join number where I <= num_rows; create temporary table t as select subcategory from item group by subcategory having count (*) = 19 limit 100; insert into item (subcategory) select subcategory from t cross join number where I <2000; once again, it takes a little time to run these statements, which is not suitable for running in the product environment. The idea is to insert the data of the random number of rows into the item, so that subcategory has an item between 1 and 2018. This is not the actual complete data, but the effect is the same. I want to find all subcategory with more than 2000 items in a category. First, I found a subcategory item with a number greater than 2000 and used its category in the following query. This is a specific query statement: [SQL] select c. id from subcategory as c inner join item as I on I. subcategory = c. id group by c. id having count (*)> 2000; -- choose one of the results, then select * from subcategory where id = ???? -- Result: category = 14 I got a suitable value of 14, which will be used in the following queries. This is the statement used to query subcategory with more than 2000 Items in category 14: [SQL] select c. id from subcategory as c inner join item as I on I. subcategory = c. id where c. category = 14 group by c. id having count (*)> 2000; in my sample data, the query results have 10 rows of records, and it takes more than 10 seconds to complete. The EXPLAIN statement shows that the index is used well. The data size is quite good. The query plan is to traverse the index and calculate the target record. So far, very good. This assumes that I want to retrieve all fields from subcategory. I can use the preceding query as a nested query, and then use JOIN or SELECT MAX (since the value of the grouping set is unique), but it is also written as the same as the following? [SQL] select * from subcategory where id in (select c. id from subcategory as c inner join item as I on I. subcategory = c. id where c. category = 14 group by c. id having count (*)> 2000); after this query is run, it is estimated that it will sink from dawn to sunset. I don't know how long it will take, because I have no plans to keep it running endlessly. You may think that, in a statement, it will: a) Calculate the query and find the 10 values. B) continue to find the 10 records, in addition, it will be very fast to search for the primary index. Error: this is the actual query plan: [SQL] *************************** 1. row ************************** id: 1 select_type: PRIMARY table: subcategory type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 300783 Extra: using where **************************** 2. row ************************* id: 2 select_type: dependent subquery table: c type: ref possible_keys: PRIMARY, category key: category key_len: 4 Ref: const rows: 100 Extra: Using where; Using index; Using temporary; using filesort **************************** 3. row ************************* id: 2 select_type: dependent subquery table: I type: ref possible_keys: subcategory key: subcategory key_len: 4 ref: c. id rows: 28 Extra: Using index: you are not familiar with how to analyze mysql statement query plans. Please refer to the following description: mysql plans to execute queries from outside to outside, rather than from inside to outside. I will introduce each part of the query one by one. The external query is simply changed to SELECT * FROM subcategory. Although the query in it has a constraint on subcategory (WHERE category = 14), mysql does not apply it to external queries for some reason. I don't know why. I only know that it scans the entire table (this is what type: ALL means) and does not use any indexes. This is a scan of tens of thousands of rows of records on the table. In an external query, each row is queried once. Although no value is used in the query, the query is "optimized" to reference an external query. According to this analysis, the query plan becomes a nested loop. Each cycle of an external query is executed. The following is the query plan after the optimizer is rewritten: [SQL] select * from subcategory as s where <in_optimizer> (s. id, <exists> (select c. id from subcategory as c join item as I where (I. subcategory = c. id) and (c. category = 14) group by c. id having (count (0)> 2000) and (<cache> (s. id) = <ref_null_helper> (c. () You can use show warnings after the explain extended command to obtain the optimized query. Pay attention to the external domain pointed to in the HAVING clause. In this example, I did not intentionally attack mysql's optimization strategy. It is well known that mysql cannot optimize nested queries in some cases, and this issue has been widely reported. I would like to point out that it is necessary for developers to check query statements to ensure they are not poorly optimized. IN most cases, avoid using nested statements, especially WHERE... IN () and WHERE... not in, unless necessary. My own principle is "if you have any questions, please EXPLAIN it ". If you are dealing with a big data table, I will naturally have questions. How to force the query in the above section to execute the statement Hitting Board first, because mysql treats it as a related statement to execute from the outside, rather than as an irrelevant statement to execute from the inside out. There is also a way for mysql to execute the query first, which is implemented as a temporary table to avoid huge performance overhead. Mysql uses a temporary table to implement nested queries (derived tables that are transferred by a volume to some extent ). This means that mysql executes the query and stores the result in a temporary table, and then uses it in other tables. This is the execution method I expect when writing this query. Modify the query statement as follows: www.2cto.com [SQL] select * from subcategory where id in (select id from (select c. id from subcategory as c inner join item as I on I. subcategory = c. id where c. category = 14 group by c. id having count (*)> 2000) as x); all I do is wrap the nested query. Mysql considers that an independent nested query is executed first, and now only the nested outside is left. It has been loaded into a temporary table with only a few records, so it is much faster. According to this analysis, this is a stupid optimization method; it is better to rewrite it into the join method. In addition, it should not be seen by others as redundant code. In some cases, this optimization method can be used, for example, mysql throws an error, and the nested query table is modified elsewhere ). Unfortunately, this method is powerless if the temporary table can only be used once in the query statement. Source http://blog.csdn.net/afeiqiang/article/details/8620038

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.