SQL statement optimization: optimization of JOIN, LEFT JOIN and RIGHT JOIN statements; sqljoin
In database applications, we often need to perform multi-table queries on the database. However, when the data volume is large, multi-table queries have a huge impact on the execution efficiency, therefore, pay special attention to the use of JOIN, left join, and right join statements;
Principles of SQL statement join:
The join Operation in the database is actually an association between a table and another table. Many errors are interpreted as: first put the two tables into a dikar product and then throw them to the memory, the where and having conditions are used to filter data slowly. In fact, the database is not so stupid, which occupies a large amount of memory and is inefficient. For example, we only need some rows in one table and some rows in another table. If the full table is used as the dikar product, this overhead is too large. The real practice is, based on the conditions in each table, traverse a table, traverse other tables, find the conditions that meet the final conditions, and send them to the client until all the final data is checked, it is called nested loop query.
1. left join and right join Optimization
In MySQL, the implementation of a left join B join_condition is as follows:
1. Table B depends on the tables on which table A and all Table A depends.
2. Table A depends on all tables except the left join table (B)
3. join_condition determines how to read Table B. The where condition is useless for B.
4. The standard where clause is jointly optimized with left join.
5. If A row in A meets the where and having conditions, if B does not, it will be filled with null.
Right join is similar to left join. This position can be exchanged.
The conversion principle between left join and normal JOIN is the where condition. If the generated null row returns false, it can be directly converted to normal join.
For example:
SELECT * FROM t1 LEFT JOIN t2 ON (column1) WHERE t2.column2=5;
Will be converted:
SELECT * FROM t1, t2 WHERE t2.column2=5 AND t1.column1=t2.column1;
Note: because the condition t2.column2 = 5 is set, all generated rows whose t2 is null are invalid.
This optimization will be very fast, because it is equivalent to converting external connections into equivalent connections, with less scanning and judgment for many rows.
Nested Loop JOIN algorithm ---- Nested-Loop Join
A simple nested loop algorithm is to find a row from a table based on the table's conditions, then find the data in the next table, and then return to the first table to find the rows that meet the conditions.
For example, there are three tables t1, t2, and t3. Their join types are:
Table Join Typet1 ranget2 reft3 ALL
The final pseudo code is
for each row in t1 matching range {
for each row in t2 matching reference key {
for each row in t3 {
if row satisfies join conditions,
send to client } } }
That is, table t1 is scanned by range, table t2 is associated with table t1, and table t3 is the full table scan.
Note: first, locate a row based on the condition range of table t1 and match it with table t2. Then, find the rows that meet the condition of table t3.
Block Nested Loop JOIN algorithm ---- Block Nested-Loop Join
The application of this algorithm is as follows: since the previous nested algorithm reads a row of a table, it reads The following table. As a result, the internal table is read many times, the database uses the join buffer to store intermediate results. when reading the internal table, it finds a row and compares it with the data in the cache to improve efficiency. For example, when I read 10 rows from the External table and then read the internal table, I compared the data with these 10 rows.
The conditions for MySQL to use join buffer are:
1. The system variable join_buffer_size determines the buffer size used by each join.
2. When the join type is index or all, the join buffer can be used.
3. A join buffer is allocated for each join, that is, an SQL statement may use multiple join buffers.
4. The join buffer is not allocated to the first subtable.
5. Only columns to be referenced will be placed in the join buffer, not the whole row.
The generated pseudo code is:
For each row in t1 matching range {
for each row in t2 matching reference key {
store used columns from t1, t2 in join buffer stores columns used by t1 and t2 in the join buffer if buffer is full {
for each row in t3 {for each t1, t2 combination in join buffer {
if row satisfies join conditions, send to client} empty buffer }}
if buffer is not empty {
for each row in t3 {for each t1, t2 combination in join buffer {
if row satisfies join conditions, send to client }}}
Note: The data is stored in the join buffer in the second loop, which exactly confirms the above 4th points.