Unconsciously played two years of MySQL, found that a lot of people say that MySQL compared to Oracle, the optimizer does a poor job, in fact, in a way, indeed, But after all, MySQL only to 5.7 version, Oracle has developed to 12c, today I looked at the MySQL connection algorithm, well, now still do not support hash join, only nested-loop join, then today summed up my learning experience it.
Nested-loop Join basic algorithm implementation, pseudo code is this:
For each row in T1 matching range {For each
row in T2 matching reference key {for each
row in T3 {
if row SA Tisfies join conditions,
Send to Client
}
}
This code is very simple, although I do not write code, but I can understand. Here, suppose there are three tables, T1, T2, T3, this code, which shows the range, ref, and all in the explain plan, is shown in the SQL Execution plan layer, T3 will perform a full table scan, and I saw a very evil optimization SQL method in this place today, straight-join:http://hidba.ga/2014/09/26/join-query-in-mysql/, which mentions the concept of a driver table, then the driver table is the T3 table in Pseudocode, Bovenri said MySQL will automatically select the smallest result set as the driver table, as the algorithm analysis, so select the driver table is really the least cost. So it's also mentioned here that by narrowing the drive table result set for connection optimization, the resulting set of smaller drive tables can actually reduce the number of loops.
Of course, MySQL itself on the basis of this algorithm, the evolution of the block Nested-loop join algorithm, in fact, basically and the above algorithm is no different, pseudo code as follows:
For each row in T1 matching range {
for each row in T2 matching reference key {
store used columns to T1, T2 in Join buffer
If buffer is full {
for each row in T3 {
for each T1, T2 combination in join buffer {
if row s Atisfies join conditions,
Send to Client
}
}
empty buffer
}
}
if buffer isn't Empty {For each
row in T3 {for each
T1, T2 combination in join buffer {
If row satisfies join conditions,
Send to Client
}
}
In this algorithm, the data in the outer loop is cached in the join buffer, and the data in the table round buffer in the inner loop is compared to reduce the number of cycles, which can increase the efficiency. The official website has a example, I a bit does not understand: If has 10 rows to be cached in the buffer, these 10 lines are passed to the inner loop, all the inner loop's line will compare with this 10 rows in the buffer. The original text is like this:
For example, if ten rows are read into a buffer and the ' buffer is ' passed to the next inner loop, each row read in the inner Loop can is compared against all rows in the buffer
If s refers to T1, the size of the T2 combination in the cache, and C is the number of these combinations in buffer, the number of T3 tables scanned should be:
(S * C)/join_buffer_size + 1
According to this formula, the larger the join_buffer_size, the smaller the number of scans, if the join_buffer_size to be able to cache all the previous row combinations, then it is the best performance, then the increase will have no effect.
In the case of indexing, MySQL will try to use the index Nested-loop join algorithm, in some cases, may join the column is no index, then MySQL's choice is definitely not the first introduction of the simple Nested-loop join algorithm , because the algorithm is too rough to look straight. Complex SQL with larger data may not run out of results for years, if you don't believe it, it's too young too. Or inside can give you some SQL run to see.
The disadvantage of the simple nested-loop join algorithm is that it scans the inner table too many times, resulting in too large a scan record. The improvement of the block Nested-loop join algorithm compared to simple nested-loop join is that it can reduce the number of scans in the inner table, and even like the hash join algorithm, it only needs to scan the table once.