In the MySQL in statement is the appearance and the inner table as a hash connection, and the EXISTS statement is the external loop loop, each loop loop and then query the internal table. exists and in performance can not be generalize, according to the size of the table, index to determine.
A
If the two table size of the query is equal, then the in and exists are not very different.
If one of the two tables is smaller and one is a large table, then the subquery table is large with exists, and the subquery table is small in:
Example: Table A (small table), table B (large table)
1:
SELECT * from A where CC in (select CC from B) is inefficient and uses the index of the CC column on table A.
The above query will first identify all cc from B, because B is a large table, so the select CC from B will produce many CC (and will be full table scan B), and then these CC values in order one from a table out the corresponding records, so only use the index of a table, The main overhead is on full table scan B.
SELECT * from A where exists (select cc from B where cc=a.cc) is efficient and uses the index of the CC column on table B.
The above query will first scan a full table, get all the records of a, and then use the CC value in each record as the query criteria in the B table query, because of the index, so the query quickly, avoid the large table full table scan, only scanned the small table A, so the efficiency is higher.
2:
SELECT * from B where cc in (select CC from A) is highly efficient and uses the index of the CC column on table B;
Similarly, the above query only scans the full table for a, and large table queries are indexed.
SELECT * from B where exists (select cc from A where cc=b.cc) is inefficient and uses the index of the CC column on table A.
When both A and B tables do not have an index on the CC column, both in and exists require a full table scan of table A and table B, with little difference in query speed.
Not-in and not-exists if the query statement uses not-in to perform a full-table scan of the outer surface, the index is not used, and the index on the table is still used by not Extsts's subquery. so no matter how big the table is, use Not exists is better than not in be quick .
Two Optimize GROUP BY and order by
Grouping and sorting is a CPU-intensive operation in MySQL, and when we use the Explain tool to view SQL, we often see that the extra column will appear using Temporary,using filesort (staging table and file sort). Because the resulting data is not already grouped and sorted, additional space is needed to sort the group, and we can optimize it by creating an appropriate index, because the index is already sorted. To illustrate:
Suppose there is a table A, there are several data: The field is a (primary key), B,c,d.
A B c D
1 1 2 3
2 2 3 1
3 1 2 2
4 1 2 5
5 1 2 4
We execute SELECT * from A where b<2 order by D Desc
Because the B field is not indexed, a full table scan is required, then the data is obtained after the scan
1 1 2 3
3 1 2 2
4 1 2 5
5 1 2 4
Now that the D field is out of order, MySQL needs to temporarily open a memory space (temporary table) to save the scanned results and then sort the D field (file sort).
If we add an index to the D field, we can scan the whole table in the order of the index, and the result of the scan is the result of sorting.
The index of the D field is this:
5 4
4 5
3 1
2 3
1 2
The previous number of each row above is the value of the D field, followed by the value of the primary key A of the row where the D field is located. Scans the entire table in the order of the D field index, first scans the a=4 rows, discovers the b<2 of the row, satisfies the condition, and then retains
4 1 2 5
Next scan a=5 line, find this line of b<2, satisfy the condition, then keep
4 1 2 5
5 1 2 4
。。。。。。。。。
Next scan a=2 line, find this line of b=2, not satisfied, then discard
4 1 2 5
5 1 2 4
1 1 2 3
3 1 2 2
Records that have been scanned are records that match the criteria and do not require temporary tables and sorting. For Gropup, you can also use this method to optimize, because the fields are sorted and can be grouped directly in the obtained records.
Three
Several principles:
- Small table (record within tens of thousands of) query, regardless of index, the query performance is not small.
- and large tables do join, the associated fields must be indexed
- Join operations generally use small tables to drive large tables
- Try not to use Like,regex,or
- Try not to calculate on the column of the query, such as where length (detail) >15 detail field is longer than 15, this condition will be full table scan, even if detail has an index.
- Try not to index large segments, such as detail varchar (256), which consumes CPU, consumes space, and has low query efficiency. If the business allows, you can build a prefix index.
MySQL statement optimization