Summary of the Nested-Loop Join algorithm in MySQL,

Source: Internet
Author: User

Summary of the Nested-Loop Join algorithm in MySQL,

After playing MySQL for more than two years, I found that many people say that MySQL is inferior to Oracle in terms of optimizer. In fact, to some extent, it is true, however, after all, MySQL is only available in version 5.7, and Oracle has been developed to 12c. Today I have looked at the MySQL connection algorithm. Well, Hash Join is not supported yet, only Nested-Loop Join is available. Let's summarize my learning experience today.

The basic Algorithm Implementation of Nested-Loop Join is as follows:

for each row in t1 matching range { for each row in t2 matching reference key {  for each row in t3 {   if row satisfies join conditions,   send to client  } }}

This code is very simple. Although I do not write much code, I still understand it. Here we assume there are three tables, t1, t2, t3, and this Code respectively show the range, ref, and ALL in the explain Plan, which are shown in the SQL Execution Plan layer, t3 will perform a full table scan. Today I saw a cool SQL Optimization Method in this place, Straight-join: optimize. We also mentioned that, by narrowing down the result set of the driver table for connection optimization, we can see that the driver table with a small result set can indeed reduce the number of loops.

Of course, on the basis of this algorithm, MySQL introduced the Block Nested-Loop join algorithm. In fact, it is basically no different from the above algorithm. The pseudo code is as follows:

for each row in t1 matching range { for each row in t2 matching reference key {  store used columns from t1, t2 in join buffer  if buffer is full {   for each row in t3 {    for each t1, t2 combination in join buffer {     if row satisfies join conditions,     send to client    }   }   empty buffer  } }}if buffer is not empty { for each row in t3 {  for each t1, t2 combination in join buffer {   if row satisfies join conditions,   send to client  } }}

This algorithm caches the data of the outer loop in the join buffer, and compares the data in the buffer of the table round in the inner loop to reduce the number of cycles, thus improving the efficiency. There is a example on the official website, which I do not understand: If 10 rows are cached in the buffer, these 10 rows are passed to the inner loop, all rows in the inner loop are compared with the 10 rows in the buffer. The original article is as follows:

For example, if 10 rows are read into a buffer and the buffer is passed to the next inner loop, each row read in the inner loop can be compared against all 10 rows in the buffer
If S refers to the size of t1 and t2 combinations in the cache, and C is the number of these combinations in the buffer, the number of times the t3 table is scanned should be:

(S * C)/join_buffer_size + 1

According to this formula, the larger join_buffer_size, the smaller the number of scans. If join_buffer_size is used to cache all the previous row combinations, this is the best performance time, it will have no effect if it is increased later.

When there is an Index, MySQL will try to use the Index Nested-Loop Join algorithm. In some cases, the Join column may not have an Index, at this time, MySQL's choice will never be the first Simple Nested-Loop Join algorithm introduced, because the algorithm is too rough to bear in the face. Complex SQL statements with a larger data volume may fail to run for several years. If you do not believe them, it is too young too simple. Or you can run some SQL statements.

The disadvantage of the Simple Nested-Loop Join algorithm is that it scans too many internal tables, resulting in too many scanned records. Compared with the Simple Nested-Loop Join algorithm, the Block Nested-Loop Join algorithm reduces the number of inner table scans. It can be the same as the Hash Join algorithm and only needs to scan the inner table once.

Articles you may be interested in:
  • Usage Analysis of mysql left join, right join, and inner join
  • Full usage of MySQL JOIN
  • How to update data during mysql multi-table join
  • Mysql not in, left join, is null, not exists efficiency issue records
  • Analysis of Mysql Join syntax and Performance Optimization

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.