Add an index for connections in MySQL

Last Update:2018-12-04 Source: Internet

Author: User

Tags mysql manual

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

I will first use a simple example to illustrate how to add an index to a connection in MySQL, and then look at a challenging example.

Simple 3-table join

The table structure is very simple. Three tables tbla, TBLB, and tblc have three fields: col1, col2, and col3.
Connect Three tables without Indexes

Select
*
From
Tbla,
TBLB,
Tblc
Where
Tbla. col1 = TBLB. col1
And tbla. col2 = tblc. col1;

The explain result is as follows:

Java code

+ ------- + ------ + ------------- + ------ + --------- + ------ + ------------- +
| Table | type | possible_keys | key | key_len | ref | rows | extra |
+ ------- + ------ + ------------- + ------ + --------- + ------ + ------------- +
| Tbla | all | null | 1000 |
| TBLB | all | null | 1000 | using where |
| Tblc | all | null | 1000 | using where |
+ ------- + ------ + ------------- + ------ + --------- + ------ + ------------- +

   +-------+------+---------------+------+---------+------+------+-------------+   | table | type | possible_keys | key  | key_len | ref  | rows | Extra       |   +-------+------+---------------+------+---------+------+------+-------------+   | tblA  | ALL  | NULL          | NULL |    NULL | NULL | 1000 |             |   | tblB  | ALL  | NULL          | NULL |    NULL | NULL | 1000 | Using where |   | tblC  | ALL  | NULL          | NULL |    NULL | NULL | 1000 | Using where |   +-------+------+---------------+------+---------+------+------+-------------+

Finally, in the MySQL Manual (7.2.1 ):
Tables are listed in the order they are read by MySQL during query processing. MySQL scans multiple connections once (Single-Sweep multi-join) to resolve all connections. This means that MySQL reads a row from the first table, finds a matching row in the second table, and then in 3rd tables. After all the tables are processed, It outputs the selected columns and returns the table list until it finds a table with more matching rows. Read the next row from the table and continue processing the next table.
As stated in the manual, MySQL reads the first table (tnla), then the second table (TBLB), and the third table (tblc), as output in explain. The values in the previous table are used to find the rows in the current table. In our example, the value in tbla is used to find the matching row in TBLB, and then the value of TBLB is used to find the row in tblc. When a complete scan ends (the results are found in the table tbla, TBLB, and tblc), MySQL does not return tbla, it goes to TBLB to check whether more rows match the value of the current tbla. If yes, it takes out this line and then finds the matching in tblc. Remember that the basic principle of MySQL connection is very important: the values in the previous table are used to find the rows in the current table.

Create an index based on the original principle

I understand that MySQL uses the value obtained from tbla to search for rows in TBLB. How do we create indexes to help MySQL? Therefore, we need to know what it needs. Consider connecting tbla and TBLB: they are connected through "tbla. col1 = TBLB. col1. We already have the tbla. col1 value, so MySQL needs a TBLB. col1 value to complete the equivalent operation. Therefore, if MySQL requires TBLB. col1, we will add the index to TBLB. col1. After the addition, this is the new explain result:

Java code

+ ------- + ------ + --------------- + ---------- + --------- + ----------- + ------ + ------------- +
| Table | type | possible_keys | key | key_len | ref | rows | extra |
+ ------- + ------ + --------------- + ---------- + --------- + ----------- + ------ + ------------- +
| Tbla | all | null | 1000 |
| TBLB | ref | ndx_col1 | ndx_col1 | 5 | tbla. col1 | 1 | using where |
| Tblc | all | null | 1000 | using where |
+ ------- + ------ + --------------- + ---------- + --------- + ----------- + ------ + ------------- +

+-------+------+---------------+----------+---------+-----------+------+-------------+| table | type | possible_keys | key      | key_len | ref       | rows | Extra       |+-------+------+---------------+----------+---------+-----------+------+-------------+| tblA  | ALL  | NULL          | NULL     |    NULL | NULL      | 1000 |             || tblB  | ref  | ndx_col1      | ndx_col1 |       5 | tblA.col1 |    1 | Using where || tblC  | ALL  | NULL          | NULL     |    NULL | NULL      | 1000 | Using where |+-------+------+---------------+----------+---------+-----------+------+-------------+

As shown above, MySQL now uses the ndx_col1 index to connect TBLB to tbla. That is to say, when MySQL is looking for a row in TBLB, it uses the ndx_col1 index to directly obtain the matched row through the value of tbla. col1, instead of performing a table scan as before. This is why the ref column of TBLB says "tabla. col1 ". Tblc still uses table scanning, which can be solved in the same way. View MySQL requirements: From the SQL statement "tbla. col2 = tblc. col1", we can see that it requires tblc. col1 because we already have tbla. col2. After adding an index to this column, explain:

Java code

+ ------- + ------ + --------------- + ---------- + --------- + ----------- + ------ + ------------- +
| Table | type | possible_keys | key | key_len | ref | rows | extra |
+ ------- + ------ + --------------- + ---------- + --------- + ----------- + ------ + ------------- +
| Tbla | all | null | 1000 |
| TBLB | ref | ndx_col1 | ndx_col1 | 5 | tbla. col1 | 1 | using where |
| Tblc | ref | ndx_col1 | ndx_col1 | 5 | tbla. col2 | 1 | using where |
+ ------- + ------ + --------------- + ---------- + --------- + ----------- + ------ + ------------- +

+-------+------+---------------+----------+---------+-----------+------+-------------+| table | type | possible_keys | key      | key_len | ref       | rows | Extra       |+-------+------+---------------+----------+---------+-----------+------+-------------+| tblA  | ALL  | NULL          | NULL     |    NULL | NULL      | 1000 |             || tblB  | ref  | ndx_col1      | ndx_col1 |       5 | tblA.col1 |    1 | Using where || tblC  | ref  | ndx_col1      | ndx_col1 |       5 | tblA.col2 |    1 | Using where |+-------+------+---------------+----------+---------+-----------+------+-------------+

More complex queries

In practice, we will not encounter the SQL statement we just encountered. So you may want to look at this:

Select
Count (TBLB. a_id) as correct,
Tbla. type,
Tbla. se_type
From
Tbla,
TBLB,
Tblc,
Tbld
Where
Tbla. ex_id = tblc. ex_id
And tblc. st_ex_id = TBLB. st_ex_id
And TBLB. q_num = tbla. q_num
And TBLB. se_num = tbla. se_num
And tbld. ex_id = tbla. ex_id
And tbld. Exp <> TBLB. se_num
And TBLB. ans = tbla. ans
And tbla. ex_id = 1001
And tblc. r_id = 542
Group
Tbla. type,
Tbla. se_type;

At first glance, it is very complicated: There are four tables, an aggregate function, 9 where conditions, and a group. The greatness of explain is that we can ignore this now. We can only read two tables at a time to determine what MySQL needs at each step. This is an actual query, but the field name has some changes. Explain result:

Java code

+ ------- + -------- + ------------- + --------- + --------------- + ------- + ------------------------------------------------ +
| Table | type | possible_keys | key | key_len | ref | rows | extra |
+ ------- + -------- + ------------- + --------- + --------------- + ------- + ------------------------------------------------ +
| Tbla | all | null | 1080 | using where; using temporary; using filesort |
| TBLB | all | null | 87189 | using where |
| Tblc | eq_ref | primary | 4 | TBLB. st_ex_id | 1 | using where |
| Tbld | eq_ref | primary | 4 | tbla. ex_id | 1 | using where |
+ ------- + -------- + ------------- + --------- + --------------- + ------- + ------------------------------------------------ +

+-------+--------+---------------+---------+---------+---------------+-------+----------------------------------------------+| table | type   | possible_keys | key     | key_len | ref           | rows  | Extra                                        |+-------+--------+---------------+---------+---------+---------------+-------+----------------------------------------------+| tblA  | ALL    | NULL          | NULL    |    NULL | NULL          |  1080 | Using where; Using temporary; Using filesort || tblB  | ALL    | NULL          | NULL    |    NULL | NULL          | 87189 | Using where                                  || tblC  | eq_ref | PRIMARY       | PRIMARY |       4 | tblB.st_ex_id |     1 | Using where                                  || tblD  | eq_ref | PRIMARY       | PRIMARY |       4 | tblA.ex_id    |     1 | Using where                                  |+-------+--------+---------------+---------+---------+---------------+-------+----------------------------------------------+

The impact of the connection depends on the result set. The result set is the query result. For connections, an estimation result set is used to multiply the number of rows that MySQL predicts to read from each table. As an estimate, this is more biased towards the bad situation, because the where condition usually reduces the number of rows. However, the query result set contains 94 million rows. This is why there is no index connection. Thousands of rows multiply by thousands of rows and you will have a result set of millions.
So what does the query need now? Start with tbla and TBLB. In SQL:

And TBLB. q_num = tbla. q_num
And TBLB. se_num = tbla. se_num
And TBLB. ans = tbla. ans

MySQL requires at least one of q_num, se_num, and ans. I chose to add indexes to se_num and q_num because I need them in almost all other queries. Compromise is part of optimization. Most people do not have time to find the optimal index solution for each query. They can only find the best solution for most cases. The result of adding indexes (se_num, q_num) and explain to TBLB:

Java code

+ ------- + -------- + --------------- + ------------- + --------- + ------------------------ + ------ + ---------------------------------------------- +
| Table | type | possible_keys | key | key_len | ref | rows | extra |
+ ------- + -------- + --------------- + ------------- + --------- + ------------------------ + ------ + ---------------------------------------------- +
| Tbla | all | null | 1080 | using where; using temporary; using filesort |
| TBLB | ref | ndx_secn_qn | 2 | tbla. se_num, tbla. q_num | 641 | using where |
| Tblc | eq_ref | primary | 4 | TBLB. st_ex_id | 1 | using where |
| Tbld | eq_ref | primary | 4 | tbla. ex_id | 1 | using where |
+ ------- + -------- + --------------- + ------------- + --------- + ------------------------ + ------ + ---------------------------------------------- +

+-------+--------+---------------+-------------+---------+------------------------+------+----------------------------------------------+| table | type   | possible_keys | key         | key_len | ref                    | rows | Extra                                        |+-------+--------+---------------+-------------+---------+------------------------+------+----------------------------------------------+| tblA  | ALL    | NULL          | NULL        |    NULL | NULL                   | 1080 | Using where; Using temporary; Using filesort || tblB  | ref    | ndx_secn_qn   | ndx_secn_qn |       2 | tblA.se_num,tblA.q_num |  641 | Using where                                  || tblC  | eq_ref | PRIMARY       | PRIMARY     |       4 | tblB.st_ex_id          |    1 | Using where                                  || tblD  | eq_ref | PRIMARY       | PRIMARY     |       4 | tblA.ex_id             |    1 | Using where                                  |+-------+--------+---------------+-------------+---------+------------------------+------+----------------------------------------------+

Now the result set is reduced by 99.3% to 692280 rows. But why stop here? We can easily solve tbla table scanning. Because it is the first table, we do not need to add an index for the join, which has been done on TBLB. In general, adding an index to the first table can treat it as a query only for this table. In this example, tbla is: "And tbla. ex_id = 1001 ". We only need to add the ex_id index:

Java code

+ ------- + -------- + --------------- + ------------- + --------- + ------------------------ + ------ + ---------------------------------------------- +
| Table | type | possible_keys | key | key_len | ref | rows | extra |
+ ------- + -------- + --------------- + ------------- + --------- + ------------------------ + ------ + ---------------------------------------------- +
| Tbla | ref | ndx_ex_id | 4 | const | 1 | using where; using temporary; using filesort |
| TBLB | ref | ndx_secn_qn | 2 | tbla. se_num, tbla. q_num | 641 | using where |
| Tblc | eq_ref | primary | 4 | TBLB. st_ex_id | 1 | using where |
| Tbld | eq_ref | primary | 4 | tbla. ex_id | 1 | using where |
+ ------- + -------- + --------------- + ------------- + --------- + ------------------------ + ------ + ---------------------------------------------- +

+-------+--------+---------------+-------------+---------+------------------------+------+----------------------------------------------+| table | type   | possible_keys | key         | key_len | ref                    | rows | Extra                                        |+-------+--------+---------------+-------------+---------+------------------------+------+----------------------------------------------+| tblA  | ref    | ndx_ex_id     | ndx_ex_id   |       4 | const                  |    1 | Using where; Using temporary; Using filesort || tblB  | ref    | ndx_secn_qn   | ndx_secn_qn |       2 | tblA.se_num,tblA.q_num |  641 | Using where                                  || tblC  | eq_ref | PRIMARY       | PRIMARY     |       4 | tblB.st_ex_id          |    1 | Using where                                  || tblD  | eq_ref | PRIMARY       | PRIMARY     |       4 | tblA.ex_id             |    1 | Using where                                  |+-------+--------+---------------+-------------+---------+------------------------+------+----------------------------------------------+

Now the result set is 641 rows. Compared with the first 94 million, it can be said that the decrease is 100%. If we continue to study this query, we can also remove temp table and filesort, but now the query is very fast and we have explained how to add an index for the connection. Although it was very troublesome to look at this query at first, we can see that the whole process is not difficult as long as we read two tables independently each time and add indexes for MySQL requirements.

Conclusion

Adding an index to a complex connection requires two things:

1. No matter how complicated the SQL statement is, read only two tables in the explain statement at a time.

2. The values in the previous table already exist. Our job is to use indexes to help MySQL use these values in the current table to find matching rows.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More