MySQL Nested-Loop Join algorithm learning, nested-loopjoin
After playing MySQL for more than two years, I found that many people say that MySQL is inferior to Oracle in terms of optimizer. In fact, to some extent, it is true, however, after all, MySQL is only available in version 5.7, and Oracle has been developed to 12c. Today I have looked at the MySQL connection algorithm. Well, Hash Join is not supported yet, only Nested-Loop Join is available. Let's summarize my learning experience today.
The basic Algorithm Implementation of Nested-Loop Join is as follows:
for each row in t1 matching range { for each row in t2 matching reference key { for each row in t3 { if row satisfies join conditions, send to client } }}
This code is very simple. Although I do not write much code, I still understand it. Here we assume there are three tables, t1, t2, t3, and this Code respectively show the range, ref, and ALL in the explain Plan, which are shown in the SQL Execution Plan layer, t3 will perform a full table scan. Today I saw a cool SQL Optimization Method in this place, Straight-join: optimize. We also mentioned that, by narrowing down the result set of the driver table for connection optimization, we can see that the driver table with a small result set can indeed reduce the number of loops.
Of course, on the basis of this algorithm, MySQL introduced the Block Nested-Loop join algorithm. In fact, it is basically no different from the above algorithm. The pseudo code is as follows:
for each row in t1 matching range { for each row in t2 matching reference key { store used columns from t1, t2 in join buffer if buffer is full { for each row in t3 { for each t1, t2 combination in join buffer { if row satisfies join conditions, send to client } } empty buffer } }}if buffer is not empty { for each row in t3 { for each t1, t2 combination in join buffer { if row satisfies join conditions, send to client } }}
This algorithm caches the data of the outer loop in the join buffer, and compares the data in the buffer of the table round in the inner loop to reduce the number of cycles, thus improving the efficiency. There is a example on the official website, which I do not understand: If 10 rows are cached in the buffer, these 10 rows are passed to the inner loop, all rows in the inner loop are compared with the 10 rows in the buffer. The original article is as follows:
For example, if 10 rows are read into a buffer and the buffer is passed to the next inner loop, each row read in the inner loop can be compared against all 10 rows in the buffer
If S refers to the size of t1 and t2 combinations in the cache, and C is the number of these combinations in the buffer, the number of times the t3 table is scanned should be:
(S * C)/join_buffer_size + 1
According to this formula, the larger join_buffer_size, the smaller the number of scans. If join_buffer_size is used to cache all the previous row combinations, this is the best performance time, it will have no effect if it is increased later.
All the learning is done by reading the documentation.