Deepening my understanding of the query plan through a MySQL left join optimization example
Today I ran into a left join optimization problem. After an afternoon of reading, I came away with a better understanding of MySQL's query plans and query optimization, so here is a short record:
select c.* from hotel_info_original c
left join pai_info_collection h
on c.cmd_type = h.cmd_type and c.cmd_id = h.cmd_id
where h.Rule_id is null
This SQL statement finds the records in table c that have no match in table h. A left join fits the requirement: it returns every row of the left table, and where the right table has no row satisfying the join condition, the right-table columns of that row are null. However, this query is very slow. First, let's look at the query plan:
The rows column is MySQL's estimate of the number of rows it must examine at each step of the plan. For this SQL statement the estimates multiply to 35773*8134 row combinations, which is a very large number. Tables c and h hold just over 40,000 and just over 10,000 records respectively, so this is almost the cost of the Cartesian product of the two tables (select * from c, h).
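To make the arithmetic concrete, here is a minimal Python sketch that multiplies the per-table rows estimates quoted above. The function name `estimated_row_combinations` is made up for illustration; it is not part of MySQL, and the product is only a rough upper bound on the work a nested loop performs.

```python
from math import prod

def estimated_row_combinations(rows_per_table):
    """Multiply the per-table `rows` estimates taken from a query plan."""
    return prod(rows_per_table)

# The two estimates quoted in the text for tables c and h.
estimate = estimated_row_combinations([35773, 8134])
print(estimate)  # 290977582
```

That is close to three hundred million row combinations for two tables of a few tens of thousands of rows each.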
So I looked up how MySQL implements joins. Internally, MySQL uses an algorithm called nested loop join. A nested loop join takes the result set of the driving table as the outer loop, uses each of its rows in turn as the filter condition for a lookup in the next table, and then merges the results. If there is a third table in the join, the joined result of the first two tables becomes the outer loop, and the third table is queried again for each of its rows. In short, MySQL implements join with the easiest algorithm to understand. The choice of the driving table is therefore very important: a small driving table can significantly reduce the number of rows scanned.
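The nested loop idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not MySQL's actual implementation: the tables are hypothetical toy lists, the function names are invented, and a Python dict stands in for an index (MySQL's indexes are typically B-trees). The counters show how many inner-table rows each strategy has to read.

```python
from collections import defaultdict

def nested_loop_join(driving, inner, key):
    """Naive nested loop join: scan every inner row for each driving row."""
    result, inner_rows_read = [], 0
    for d in driving:
        for i in inner:
            inner_rows_read += 1
            if key(d) == key(i):
                result.append((d, i))
    return result, inner_rows_read

def index_nested_loop_join(driving, inner, key):
    """Same join, but the inner table is probed through a dict used as an index."""
    index = defaultdict(list)
    for i in inner:
        index[key(i)].append(i)
    result, inner_rows_read = [], 0
    for d in driving:
        matches = index.get(key(d), [])
        inner_rows_read += len(matches)  # only matching rows are read
        for i in matches:
            result.append((d, i))
    return result, inner_rows_read

# Hypothetical toy tables of (cmd_type, cmd_id) pairs.
big = [("a", n) for n in range(400)]
small = [("a", n) for n in range(0, 400, 4)]  # 100 rows

def key(row):
    return row  # join on the whole (cmd_type, cmd_id) pair

# Big table drives, no index: every small row is read for every big row.
naive_rows, naive_reads = nested_loop_join(big, small, key)
# Small table drives and probes an index on the big table.
indexed_rows, indexed_reads = index_nested_loop_join(small, big, key)

print(naive_reads, indexed_reads)
```

Both calls produce the same matches, but the first reads an inner row for every combination of rows while the second reads only the rows the index points at. That is the effect of choosing a small driving table and having an index on the inner table's join columns.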
Why is join generally more efficient than left join? Many people repeat this claim without being able to explain it. This afternoon I understood part of the reason. Usually one of the two tables in a join is large and the other is small. For a join with no other filter conditions, MySQL chooses the small table as the driving table. A left join, however, is usually written as a large table left joined to a small table, and the semantics of left join force MySQL to use the left (large) table as the driving table, which is much less efficient. If I change the preceding SQL statement to:
select c.* from hotel_info_original c
join pai_info_collection h
on c.cmd_type = h.cmd_type and c.cmd_id = h.cmd_id
The query plan is as follows:
MySQL now clearly chooses the small table as the driving table and then uses the index on (partition_id, partition_type), which immediately cuts the number of scanned rows by orders of magnitude.
I also learned a general rule about left join today: if the where clause contains a condition on a right-table column that rejects null values (that is, anything other than is null), the left join is equivalent to a join and can be rewritten directly as a join.
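The rule can be checked on a toy data set. The sketch below uses Python's built-in `sqlite3` module as a stand-in for MySQL, which is an assumption made purely for convenience: the join semantics shown are standard SQL, but the tables `c` and `h`, their rows, and the filter `h.rule_id > 0` are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE c (cmd_type TEXT, cmd_id INTEGER);
    CREATE TABLE h (cmd_type TEXT, cmd_id INTEGER, rule_id INTEGER NOT NULL);
    INSERT INTO c VALUES ('a', 1), ('a', 2), ('b', 3);
    INSERT INTO h VALUES ('a', 1, 10), ('b', 3, 30);
""")

join_cond = "ON c.cmd_type = h.cmd_type AND c.cmd_id = h.cmd_id"

# A WHERE condition on the right table that can never be true for NULL
# discards every row the left join padded with NULLs.
left_rows = conn.execute(
    f"SELECT c.cmd_type, c.cmd_id FROM c LEFT JOIN h {join_cond} "
    "WHERE h.rule_id > 0 ORDER BY c.cmd_id"
).fetchall()

inner_rows = conn.execute(
    f"SELECT c.cmd_type, c.cmd_id FROM c JOIN h {join_cond} "
    "WHERE h.rule_id > 0 ORDER BY c.cmd_id"
).fetchall()

print(left_rows == inner_rows)
```

The unmatched row ('a', 2) is produced by the left join with a null rule_id, and the where clause then filters it out, so both queries return the same rows.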
Postscript:
The MySQL reference manual sheds more light on this problem. When executing a join, MySQL classifies the way each table is accessed into the join types system, const, eq_ref, ref, range, index, and ALL, listed here from most efficient to least efficient.
For my first SQL statement, the join type was index, which costs almost as much as a full table scan. What puzzled me is that I had declared a unique key on the columns (partition_id, partition_type), so according to the official documentation the join type should have been eq_ref.
This problem plagued me for two days. I could not find any article on Google or Stack Overflow that explained it, and I kept wondering why I could not solve it. Determined to get to the bottom of it, I read the official MySQL documentation again today.
The query optimization section says that MySQL can use indexes more efficiently on columns that are declared with the same type and size. I felt I had found the cause, so I changed the character set and collation of the columns (character_type, character_id) in the original and collection tables to the same utf8_general_ci, ran the query plan for the first SQL statement again, and got the following result:
The join type improved from index to ref. Declaring hotel_type as not null would improve it further to eq_ref, but the gain is not significant. After this optimization, the SQL statement runs within 0.01ms.
How to optimize left join:
1. Filter rows in the conditions as much as possible to make the driving table smaller. Use the small table to drive the big table.
2. Add an index (primary key, unique index, or prefix index) on the join-condition columns of the right table. Aim for a join type of range or better (ref, eq_ref, const, system).
3. Regardless of the two points above, avoid left join where you can!
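As one possible way to follow point 3 for the "rows in c with no match in h" query, the same result can be written with NOT EXISTS instead of left join. The sketch below again uses Python's `sqlite3` module as a stand-in for MySQL, with hypothetical tables `c` and `h`. It only shows that the two forms return the same rows; whether one is faster in MySQL depends on the version and the available indexes, so check the query plan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE c (cmd_type TEXT, cmd_id INTEGER);
    CREATE TABLE h (cmd_type TEXT, cmd_id INTEGER, rule_id INTEGER NOT NULL);
    INSERT INTO c VALUES ('a', 1), ('a', 2), ('b', 3);
    INSERT INTO h VALUES ('a', 1, 10), ('b', 3, 30);
""")

# Anti-join written as LEFT JOIN ... IS NULL, as in the original query.
left_anti = conn.execute("""
    SELECT c.cmd_type, c.cmd_id FROM c
    LEFT JOIN h ON c.cmd_type = h.cmd_type AND c.cmd_id = h.cmd_id
    WHERE h.rule_id IS NULL
    ORDER BY c.cmd_id
""").fetchall()

# The same anti-join written with NOT EXISTS.
not_exists = conn.execute("""
    SELECT c.cmd_type, c.cmd_id FROM c
    WHERE NOT EXISTS (
        SELECT 1 FROM h
        WHERE h.cmd_type = c.cmd_type AND h.cmd_id = c.cmd_id
    )
    ORDER BY c.cmd_id
""").fetchall()

print(left_anti, not_exists)
```

Note that the IS NULL form is only a reliable anti-join when the tested column cannot be null in matched rows, which is why rule_id is declared NOT NULL in this toy schema.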