MySQL Index and Slow Query Optimization: A Tutorial Shared by the Meituan Technical Team

This article introduces the MySQL indexing and slow query optimization tutorial shared by the Meituan technical team, and analyzes several optimization approaches in light of how disk I/O actually behaves; it is well worth a read for anyone who needs it. MySQL has become the first choice of most Internet companies thanks to its outstanding performance, low cost, and rich ecosystem. But outstanding performance aside, as the saying goes, "a good horse deserves a good saddle": using MySQL well has become a required course for development engineers, and job descriptions routinely ask for "proficiency in MySQL", "SQL statement optimization", and "understanding of database principles". We know that a typical application system reads far more than it writes (a read/write ratio of roughly 10:1), and that inserts and ordinary updates rarely hit performance problems. The most common problems, and the most likely ones, still come from complex query operations, so optimizing query statements is clearly the top priority.
I have been optimizing slow queries in Meituan's core business systems department since July 2013, covering more than ten systems and resolving and accumulating hundreds of slow query cases. As business complexity grows, the problems you run into get stranger and harder to believe. This article aims to explain the principles of database indexing and how to optimize slow queries from a developer's perspective.

Thoughts on a slow query

select
  count(*)
from
  task
where
  status = 2
  and operator_id = 20839
  and operate_time > 1371169729
  and operate_time < 1371174603
  and type = 2;

A system user reported that a feature was getting slower and slower, so an engineer tracked down the SQL above.
Excited, the engineer came to me: "This SQL needs optimizing. I need to add an index to every field in the query."
Surprised, I asked: "Why does every field need its own index?"
"If every queried field is in an index, the query will be faster," the engineer said, full of confidence.
"In this case you should create a composite index. Because of leftmost prefix matching, operate_time needs to go at the end, and the other related queries on this table need to be examined too; a comprehensive evaluation is required."
"Composite index? Leftmost prefix matching? Comprehensive evaluation?" The engineer was lost in thought.
In most cases we know that indexes improve query efficiency, but how should we create them? In what order? Many people have only a vague idea. In fact these concepts are not hard to understand, and the principles behind indexing are far less complex than you might imagine.

MySQL indexing principles
Index purpose
The purpose of an index is to improve query efficiency. An analogy is a dictionary: to look up the word "mysql", you locate the letter m, then find y below it, and then the remaining "sql". Without an index, you might have to read every word to find what you want. And what if you want every word starting with m? Or with ze? Without an index, that task could simply never be completed.

Indexing principles
Besides dictionaries, examples of indexes are everywhere in daily life: train timetables at stations, tables of contents in books, and so on. They all work the same way: by continually narrowing the range of data to be examined until the final result is filtered out, turning random lookups into ordered ones; that is, we always lock onto the data through the same search method.
The same is true for databases, but they are far more complex, because besides equality queries there are range queries (>, <, between, in), fuzzy queries (like), and union queries (or), among others. How should a database handle all of these? Think back to the dictionary example: can we split the data into segments and then search by segment? In the simplest case, with 1,000 records, records 1-100 form the first segment, 101-200 the second, 201-300 the third, and so on; to find record 250 you only need to search the third segment, eliminating 90% of the irrelevant data at a stroke. But what about 10 million records? How should those be divided? A search tree gives an average complexity of lg N and good query performance. However, we have ignored a key point: the complexity model assumes each operation costs the same. In reality a database's implementation is complicated and its data lives on disk; to improve performance, part of the data is read into memory for computation each time. Since a disk access costs roughly 100,000 times as much as a memory access, a simple search tree cannot satisfy complex application scenarios.
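To make the lg N claim concrete (my arithmetic, not the article's): a balanced binary search tree over N = 10,000,000 records needs about log2(10,000,000) ≈ 23 comparisons per lookup. If each comparison required a disk access, that would still be 23 disk I/Os per query, which is why the cost of disk I/O, discussed next, dominates the design.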

Disk I/O and pre-read
Since disk access was mentioned above, here is a brief introduction to disk I/O and pre-reading. Reading data from a disk relies on mechanical motion, and the time spent can be divided into three parts: seek time, rotational delay, and transfer time. Seek time is the time needed to move the head to the specified track; on mainstream disks it is generally under 5 ms. Rotational delay comes from the disk speed we often hear about: a 7200 rpm disk rotates 7200 times per minute, i.e., 120 times per second, so the average rotational delay is 1/120/2 = 4.17 ms. Transfer time is the time to read data from or write data to the disk, generally a few tenths of a millisecond, negligible compared with the first two. So one disk access, that is, one disk I/O, takes about 5 + 4.17 = 9 ms. That sounds decent, until you realize that a 500-MIPS machine executes 500 million instructions per second, because instructions run at the speed of electricity. In other words, in the time of a single I/O the machine could execute about 4.5 million instructions. Databases routinely hold millions or even tens of millions of records, so 9 ms per access would clearly be a disaster. (The original article includes a chart comparing computer hardware latencies for reference.)

Since disk I/O is such an expensive operation, operating systems perform some optimizations: during one I/O, not only the data at the requested disk address but also adjacent data is read into the memory buffer, because the principle of locality tells us that when a computer accesses data at an address, nearby data will usually be accessed soon as well. The data read by one I/O is called a page. The exact size of a page depends on the operating system, generally 4 KB or 8 KB; in other words, reading any data within one page costs only a single I/O. This theory is very helpful for designing an index's data structure.
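As a side note beyond the original text: MySQL's InnoDB engine manages its own pages on top of the operating system's, with a default page size of 16 KB on modern versions. A minimal check, assuming a reasonably recent server:

show variables like 'innodb_page_size';
-- on a default installation this typically reports 16384 (16 KB)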

Index data structure
We have now covered everyday examples of indexes, the basic principles of indexing, the complexity of databases, and the relevant operating system knowledge. The point is that no data structure appears out of thin air; each has its background and application scenario. Let's summarize what we need this data structure to do; it is actually very simple: keep the number of disk I/Os per data lookup down to a very small order of magnitude, preferably a constant. Then we ask: could a multi-way search tree with a well-controlled height meet the need? Thus the B+ tree came into being.

B+ tree

The figure in the original article shows an example B+ tree (for the formal definition of a B+ tree, see the literature; only the key points are discussed here). The light blue blocks are disk blocks, and each disk block contains several data items (dark blue) and pointers (yellow). For example, disk block 1 contains the data items 17 and 35 and the pointers P1, P2, and P3: P1 points to the disk block holding values less than 17, P2 to the block holding values between 17 and 35, and P3 to the block holding values greater than 35. The real data exists only in the leaf nodes: 3, 5, 9, 10, 13, 15, 28, 29, 36, 60, 75, 79, 90, and 99. Non-leaf nodes do not store real data, only the data items that guide the search direction; for example, the data items 17 and 35 do not actually exist in the data table.

B+ tree search process
Suppose we want to find data item 29. Disk block 1 is first loaded from disk into memory: the first I/O. A binary search in memory determines that 29 lies between 17 and 35, so the P2 pointer of disk block 1 is selected; the memory time is negligible (compared with the disk I/O). The address in P2 is used to load disk block 3 from disk into memory: the second I/O. Another binary search places 29 between 26 and 30, so the corresponding pointer of disk block 3 is followed and disk block 8 is loaded into memory: the third I/O. A final binary search in memory locates 29 and the query ends, for a total of three I/Os. The real situation is that a 3-level B+ tree can represent millions of records; if finding any one of millions of records costs only three I/Os, the performance gain is huge. Without an index, if each data item cost one I/O, the query would need millions of I/Os in total, which is obviously prohibitively expensive.

Properties of the B+ tree
1. From the analysis above, the number of I/Os depends on the height h of the B+ tree. Suppose the table holds N records and each disk block holds m data items; then h = log(m+1) N, i.e., a logarithm of N with base m+1. For a given N, the larger m is, the smaller h is; and m = (size of a disk block) / (size of a data item). The size of a disk block, that is, of one data page, is fixed, so the smaller each data item is, the more items fit per block and the lower the tree. This is why each data item, i.e., the indexed field, should be as small as possible: an int takes 4 bytes, half of a bigint's 8 bytes. It is also why the B+ tree puts the real data in the leaf nodes rather than in the inner nodes: once placed in inner nodes, the number of data items per disk block would drop sharply and the tree would grow taller. When only one data item fits per block, the tree degenerates into a linked list.
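A rough back-of-the-envelope illustration (my numbers, not the article's): assume a 16 KB page and an inner-node data item consisting of a bigint key (8 bytes) plus a 6-byte child pointer, about 14 bytes per item. Then m ≈ 16384 / 14 ≈ 1170 items per inner node, so a tree of height 3 can address roughly 1170 × 1170 ≈ 1.37 million leaf pages; even at only 16 rows per leaf page, that is over 20 million rows reachable within three I/Os.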
2. When the data items in the B+ tree are composite, such as (name, age, sex), the B+ tree builds the search order from left to right. When retrieving a record like (Zhang San, 20, F), the tree first compares name to decide the search direction; if the names are equal, it compares age and then sex in turn, finally obtaining the target record. But for a query like (20, F) that lacks name, the tree does not know which node to examine next, because name is the first comparison factor when the tree was built; you must search by name first before knowing where to go next. For a query like (Zhang San, F), the tree can use name to fix the search direction, but the next field, age, is missing, so it can only find all records whose name equals Zhang San and then filter those whose sex is F. This is a very important property of indexes: the leftmost matching feature.
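A minimal SQL sketch of the leftmost matching property (the table and index names here are hypothetical, for illustration only):

create table person (
  id int primary key auto_increment,
  name varchar(50),
  age int,
  sex char(1),
  key idx_name_age_sex (name, age, sex)
);

-- uses all three index columns: name, age, sex form a full leftmost prefix
select * from person where name = 'Zhang San' and age = 20 and sex = 'F';

-- uses only the name part: age is missing, so sex cannot be matched within the index
select * from person where name = 'Zhang San' and sex = 'F';

-- cannot use the index at all: the leftmost column name is absent
select * from person where age = 20 and sex = 'F';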

Slow query optimization
The principles behind MySQL indexing are rather dry; an intuitive feel for them is enough, and a thorough, deep understanding is not required. Now look back at the slow query from the beginning: after learning the indexing principles, do you have any ideas? First, let's summarize the basic principles of creating indexes.

Index creation principles
1. The leftmost prefix matching principle is extremely important. MySQL keeps matching columns from left to right until it hits a range query (>, <, between, like) and stops there. For example, for a = 1 and b = 2 and c > 3 and d = 4, with an index built in the order (a, b, c, d), d cannot use the index; with an index (a, b, d, c) all four conditions can use it, and the order of a, b, and d in the where clause can be adjusted freely. (See the sketch after this list.)
2. Equality conditions (= and in) can appear in any order. For example, a = 1 and b = 2 and c = 3 can use an index (a, b, c) regardless of how the conditions are ordered; MySQL's query optimizer rewrites the query into a form the index can recognize.
3. Choose columns with high discrimination for indexes. The discrimination formula is count(distinct col)/count(*), the proportion of distinct values in the field; the larger it is, the fewer records we scan per lookup. A unique key has discrimination 1, while status or gender fields have discrimination close to 0 on large data sets. You may ask whether there is an empirical threshold for this ratio. Different use cases make a fixed value hard to give; in general, for a field used in joins we want at least 0.1, i.e., scanning 10 records on average per value.
4. Index columns must not take part in computation; keep the columns "clean". For example, with from_unixtime(create_time) = '2014-05-29' the index cannot be used. The reason is simple: the B+ tree stores the raw field values from the table, and applying a function to every element before comparing would obviously cost far too much. So the condition should instead be written as:

create_time = unix_timestamp('2014-05-29');

5. Extend an existing index rather than creating a new one. For example, if the table already has an index on a and you now want an index on (a, b), you only need to modify the original index.
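A few hedged SQL illustrations of rules 1, 4, and 5 (table and index names are hypothetical):

-- rule 1: with the index (a, b, d, c), all four conditions below can use the index,
-- and a, b, d may appear in any order in the where clause
alter table t add index idx_a_b_d_c (a, b, d, c);
select * from t where a = 1 and b = 2 and d = 4 and c > 3;

-- rule 4: move the computation to the constant side, not the indexed column
-- bad:  where from_unixtime(create_time) = '2014-05-29'
-- good: where create_time = unix_timestamp('2014-05-29')

-- rule 5: extend the existing index on (a) instead of adding a second index
alter table t drop index idx_a, add index idx_a_b (a, b);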

Back to the slow query at the start
By the leftmost matching principle, the index for the first SQL statement should be a composite index on status, operator_id, type, and operate_time, in which the order of status, operator_id, and type can be swapped freely. That is why I said all related queries on this table should be collected and evaluated together.
For example, suppose the table also serves the following queries:

select * from task where status = 0 and type = 12 limit 10;

select count(*) from task where status = 0;

Therefore, building the index as (status, type, operator_id, operate_time) is exactly right, because it covers all of these cases. This is the leftmost matching principle of indexes at work.
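For concreteness, a sketch of the resulting DDL (the index name is my own invention):

alter table task
  add index idx_status_type_operator_time (status, type, operator_id, operate_time);

-- the original slow query, the status + type lookup, and the status count
-- all walk a leftmost prefix of this single index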

A query optimization power tool: the explain command
You are probably familiar with the explain command; for its usage and the meaning of each field, see the official explain-output documentation. The point to stress here is that rows is the core metric: the great majority of statements with small rows values execute fast (with exceptions, discussed below). So optimizing a statement is largely a matter of optimizing rows.
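A minimal usage sketch against the task query from the opening (the output columns are: id, select_type, table, type, possible_keys, key, key_len, ref, rows, Extra):

explain
select count(*) from task
where status = 2
  and operator_id = 20839
  and operate_time > 1371169729
  and operate_time < 1371174603
  and type = 2;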

Basic steps for slow query optimization
0. Run the statement first to see whether it really is slow; remember to set SQL_NO_CACHE (see the sketch after this list).
1. Query each where condition against the single table, locking in the smallest set of returned records. That is, apply the where clause of the query to the table that returns the fewest records and start checking from there: query each field of that single table separately and see which field has the highest discrimination.
2. explain to check whether the execution plan matches the expectation from step 1 (the query should start from the table with the fewest locked records).
3. For SQL of the order by ... limit form, let the sorted table be searched first.
4. Understand the business side's application scenario.
5. Refer to the index creation principles when adding indexes.
6. If the observed result does not meet expectations, go back to step 0 and analyze again.
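A hedged sketch of steps 0 and 1 against the opening example (note: the query cache and the SQL_NO_CACHE hint exist in MySQL 5.x but were removed in MySQL 8.0):

-- step 0: time the statement while bypassing the query cache
select sql_no_cache count(*) from task
where status = 2
  and operator_id = 20839
  and operate_time > 1371169729
  and operate_time < 1371174603
  and type = 2;

-- step 1: measure each field's discrimination on the single table
select count(distinct status) / count(*) as status_sel,
       count(distinct operator_id) / count(*) as operator_sel,
       count(distinct type) / count(*) as type_sel
from task;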

Several slow query cases
The following examples illustrate how to analyze and optimize slow queries.

Complex statement writing
In many cases we write SQL merely to implement a feature, but that is only the first step. Different ways of writing a statement often differ fundamentally in efficiency, which requires a clear grasp of MySQL's execution plans and indexing principles. Consider the following statement:

select
  distinct cert.emp_id
from
  cm_log cl
inner join
  (
    select
      emp.id as emp_id,
      emp_cert.id as cert_id
    from
      employee emp
    left join
      emp_certificate emp_cert
        on emp.id = emp_cert.emp_id
    where
      emp.is_deleted = 0
  ) cert
    on (
      cl.ref_table = 'Employee'
      and cl.ref_oid = cert.emp_id
    )
    or (
      cl.ref_table = 'EmpCertificate'
      and cl.ref_oid = cert.cert_id
    )
where
  cl.last_upd_date >= '2013-11-07 15:03:00'
  and cl.last_upd_date <= '2013-11-08 16:00:00';

0. Run it first: 53 records in 1.87 seconds, with no aggregation involved; that is slow.

53 rows in set (1.87 sec)

1. explain

+----+-------------+------------+-------+---------------------------------+-----------------------+---------+-------------------+-------+--------------------------------+
| id | select_type | table      | type  | possible_keys                   | key                   | key_len | ref               | rows  | Extra                          |
+----+-------------+------------+-------+---------------------------------+-----------------------+---------+-------------------+-------+--------------------------------+
|  1 | PRIMARY     | cl         | range | cm_log_cls_id,idx_last_upd_date | idx_last_upd_date     | 8       | NULL              |   379 | Using where; Using temporary   |
|  1 | PRIMARY     | <derived2> | ALL   | NULL                            | NULL                  | NULL    | NULL              | 63727 | Using where; Using join buffer |
|  2 | DERIVED     | emp        | ALL   | NULL                            | NULL                  | NULL    | NULL              | 13317 | Using where                    |
|  2 | DERIVED     | emp_cert   | ref   | emp_certificate_empid           | emp_certificate_empid | 4       | meituanorg.emp.id |     1 | Using index                    |
+----+-------------+------------+-------+---------------------------------+-----------------------+---------+-------------------+-------+--------------------------------+

Briefly walking through the execution plan: MySQL first scans the cm_log table via the idx_last_upd_date index, obtaining 379 records; it then joins these against a table of 63,727 scanned records. DERIVED means a derived table, i.e., a table that does not physically exist, which can be roughly understood as a result set produced by a subquery; the trailing number is the ID of that subquery. So derived2 means the query with id = 2 built this virtual table, returning 63,727 records. Looking at what the id = 2 statement does to return so much data: it first full-table-scans the 13,317 records of the employee table, then joins emp_certificate through the emp_certificate_empid index; rows = 1 means each join locks only one record, which is efficient. Once this derived table is built, it is joined against the 379 cm_log records according to the on conditions. The execution reveals the problem: far too much data is returned, and most of it is never used, because cm_log locks only 379 records.
How to optimize it? We can see that the join with cm_log happens only after the derived table is fully built; can we join cm_log first? Careful analysis of the statement shows the basic idea: if cm_log's ref_table is EmpCertificate, join against the emp_certificate table; if ref_table is Employee, join against employee. So the query can be split into two parts connected with union. Note that union is used rather than union all, because the original statement has distinct to deduplicate the records, and union provides exactly that. If the original statement had no distinct and needed no deduplication, we could use union all directly, since union's deduplication step costs SQL performance.
The optimized statements are as follows:

select
  emp.id
from
  cm_log cl
inner join
  employee emp
    on cl.ref_table = 'Employee'
    and cl.ref_oid = emp.id
where
  cl.last_upd_date >= '2013-11-07 15:03:00'
  and cl.last_upd_date <= '2013-11-08 16:00:00'
  and emp.is_deleted = 0
union
select
  emp.id
from
  cm_log cl
inner join
  emp_certificate ec
    on cl.ref_table = 'EmpCertificate'
    and cl.ref_oid = ec.id
inner join
  employee emp
    on emp.id = ec.emp_id
where
  cl.last_upd_date >= '2013-11-07 15:03:00'
  and cl.last_upd_date <= '2013-11-08 16:00:00'
  and emp.is_deleted = 0;

2. No knowledge of the business scenario is needed here; we only modify the statement, keeping the result identical to the one before the rewrite.

3. Existing indexes suffice; no new index is needed.

4. Test the rewritten statement: it takes only 10 ms, a reduction of nearly 200x!

+------+--------------+------------+--------+---------------------------------+-------------------+---------+-----------------------+------+-------------+
| id   | select_type  | table      | type   | possible_keys                   | key               | key_len | ref                   | rows | Extra       |
+------+--------------+------------+--------+---------------------------------+-------------------+---------+-----------------------+------+-------------+
|    1 | PRIMARY      | cl         | range  | cm_log_cls_id,idx_last_upd_date | idx_last_upd_date | 8       | NULL                  |  379 | Using where |
|    1 | PRIMARY      | emp        | eq_ref | PRIMARY                         | PRIMARY           | 4       | meituanorg.cl.ref_oid |    1 | Using where |
|    2 | UNION        | cl         | range  | cm_log_cls_id,idx_last_upd_date | idx_last_upd_date | 8       | NULL                  |  379 | Using where |
|    2 | UNION        | ec         | eq_ref | PRIMARY,emp_certificate_empid   | PRIMARY           | 4       | meituanorg.cl.ref_oid |    1 |             |
|    2 | UNION        | emp        | eq_ref | PRIMARY                         | PRIMARY           | 4       | meituanorg.ec.emp_id  |    1 | Using where |
| NULL | UNION RESULT | <union1,2> | ALL    | NULL                            | NULL              | NULL    | NULL                  | NULL |             |
+------+--------------+------------+--------+---------------------------------+-------------------+---------+-----------------------+------+-------------+

53 rows in set (0.01 sec)
 

Understand the application scenario

The purpose of this example is to upend our understanding of column discrimination. Ordinarily we assume that the higher a column's discrimination, the more easily an index locks down a small number of records; but in some special situations the theory has its limits.

select
  *
from
  stage_poi sp
where
  sp.accurate_result = 1
  and (
    sp.sync_status = 0
    or sp.sync_status = 2
    or sp.sync_status = 4
  );

0. First, see how long it runs: 951 rows in 6.22 seconds, genuinely slow.

951 rows in set (6.22 sec)

1. explain first: rows reaches 3.61 million, and type = ALL indicates a full table scan.

+----+-------------+-------+------+---------------+------+---------+------+---------+-------------+
| id | select_type | table | type | possible_keys | key  | key_len | ref  | rows    | Extra       |
+----+-------------+-------+------+---------------+------+---------+------+---------+-------------+
|  1 | SIMPLE      | sp    | ALL  | NULL          | NULL | NULL    | NULL | 3613155 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+---------+-------------+

2. Apply each where condition against the single table and check the returned record counts; the single-table run in step 0 already showed 951 records returned.

3. Make explain's rows approach 951 as closely as possible.

First look at how many records have each accurate_result value:

select count(*), accurate_result from stage_poi group by accurate_result;

+----------+-----------------+
| count(*) | accurate_result |
+----------+-----------------+
|     1023 |              -1 |
|  2114655 |               0 |
|   972815 |               1 |
+----------+-----------------+

We can see that the discrimination of accurate_result is very low: the whole table has only three values (-1, 0, 1), so no index can lock down a very small slice of the data.

Let's take a look at the sync_status field.

select count(*), sync_status from stage_poi group by sync_status;

+----------+-------------+
| count(*) | sync_status |
+----------+-------------+
|     3080 |           0 |
|  3085413 |           3 |
+----------+-------------+

Its discrimination is just as low; in theory it is likewise unsuitable for an index.

With the problem analyzed this far, it seems the table cannot be optimized: both columns have very low discrimination, and even if an index were added it would only fit this particular data distribution rather than being a general optimization. If, for example, the values 0 and 3 of sync_status were evenly distributed, the locked records would still number in the millions.

4. Contact the business side to learn the application scenario. The business side uses this SQL every five minutes to scan for qualifying data and, after processing, changes sync_status to 1. The number of qualifying records in any five-minute window is small, about 1,000. Once the business scenario is understood, optimizing the SQL becomes simple, because the business side guarantees the skew in the data: an index can filter out the vast majority of unneeded records.

5. Following the index creation rules, build the index with the statement below:

alter table stage_poi add index idx_acc_status(accurate_result,sync_status);

6. Observe the result: the query now takes only 200 ms, more than a 30x speedup.

952 rows in set (0.20 sec)

Reviewing the analysis process: single-table queries are comparatively easy to optimize; most of the time it is enough to add the where-clause fields to an index according to the rules. But if optimization were always that "brainless", low-discrimination columns that should never be indexed would get indexed too, seriously hurting insert and update performance and possibly affecting other queries as well. That is why step 4, asking about the business scenario, is so critical: only by knowing the scenario could we judge that this query was worth indexing at all.

Statements that cannot be optimized

select
  c.id,
  c.name,
  c.position,
  c.sex,
  c.phone,
  c.office_phone,
  c.feature_info,
  c.birthday,
  c.creator_id,
  c.is_keyperson,
  c.giveup_reason,
  c.status,
  c.data_source,
  from_unixtime(c.created_time) as created_time,
  from_unixtime(c.last_modified) as last_modified,
  c.last_modified_user_id
from
  contact c
inner join
  contact_branch cb
    on c.id = cb.contact_id
inner join
  branch_user bu
    on cb.branch_id = bu.branch_id
    and bu.status in (1, 2)
inner join
  org_emp_info oei
    on oei.data_id = bu.user_id
    and oei.node_left >= 2875
    and oei.node_right <= 10802
    and oei.org_category = -1
order by
  c.created_time desc
limit 0, 10;

Again following the steps above:
0. See how long the statement runs first: 13 seconds for 10 records, which is intolerable.

10 rows in set (13.06 sec)

1. explain

+----+-------------+-------+--------+-------------------------------------+-------------------------+---------+--------------------------+------+-----------------------------------------------+
| id | select_type | table | type   | possible_keys                       | key                     | key_len | ref                      | rows | Extra                                         |
+----+-------------+-------+--------+-------------------------------------+-------------------------+---------+--------------------------+------+-----------------------------------------------+
|  1 | SIMPLE      | oei   | ref    | idx_category_left_right,idx_data_id | idx_category_left_right | 5       | const                    | 8849 | Using where; Using temporary; Using filesort  |
|  1 | SIMPLE      | bu    | ref    | PRIMARY,idx_userid_status           | idx_userid_status       | 4       | meituancrm.oei.data_id   |   76 | Using where; Using index                      |
|  1 | SIMPLE      | cb    | ref    | idx_branch_id,idx_contact_branch_id | idx_branch_id           | 4       | meituancrm.bu.branch_id  |    1 |                                               |
|  1 | SIMPLE      | c     | eq_ref | PRIMARY                             | PRIMARY                 | 108     | meituancrm.cb.contact_id |    1 |                                               |
+----+-------------+-------+--------+-------------------------------------+-------------------------+---------+--------------------------+------+-----------------------------------------------+

From the execution plan, MySQL first queries the org_emp_info table, scanning 8,849 records; then joins branch_user through the idx_userid_status index, joins contact_branch through the idx_branch_id index, and finally joins contact through its primary key.
The rows value of each join is very small, and nothing looks abnormal. Reading the statement again, we notice the order by + limit combination at the end: could a massive sort be the culprit? So we simplified the SQL, removing the order by and limit, to see how many records actually feed the sort.

select
  count(*)
from
  contact c
inner join
  contact_branch cb
    on c.id = cb.contact_id
inner join
  branch_user bu
    on cb.branch_id = bu.branch_id
    and bu.status in (1, 2)
inner join
  org_emp_info oei
    on oei.data_id = bu.user_id
    and oei.node_left >= 2875
    and oei.node_right <= 10802
    and oei.org_category = -1;

+----------+
| count(*) |
+----------+
|   778878 |
+----------+
1 row in set (5.19 sec)

We found that 778,878 records are locked before sorting; sorting a result set of 700,000-plus rows would be disastrous, no wonder it is so slow. Can we change the approach: first sort by contact's created_time, and then join? Would that be faster?
So the statement was rewritten as below; straight_join could also be used to optimize it.

select
  c.id,
  c.name,
  c.position,
  c.sex,
  c.phone,
  c.office_phone,
  c.feature_info,
  c.birthday,
  c.creator_id,
  c.is_keyperson,
  c.giveup_reason,
  c.status,
  c.data_source,
  from_unixtime(c.created_time) as created_time,
  from_unixtime(c.last_modified) as last_modified,
  c.last_modified_user_id
from
  contact c
where
  exists (
    select
      1
    from
      contact_branch cb
    inner join
      branch_user bu
        on cb.branch_id = bu.branch_id
        and bu.status in (1, 2)
    inner join
      org_emp_info oei
        on oei.data_id = bu.user_id
        and oei.node_left >= 2875
        and oei.node_right <= 10802
        and oei.org_category = -1
    where
      c.id = cb.contact_id
  )
order by
  c.created_time desc
limit 0, 10;

Verify that the effect meets expectations: under 1 ms, an improvement of more than 13,000x!

"sql10 rows in set (0.00 sec)

We thought the job was done, but a detail was missed in the earlier analysis: in theory the overhead of sort-then-join is the same as join-then-sort, so why the huge improvement? Because of the limit! Roughly, the execution goes like this: MySQL takes the first 10 records in index-sorted order and then runs the join filter on them; if fewer than 10 records survive, it fetches the next 10 and joins again. When the join filters out a great deal of data this is clearly a disaster; in the extreme case where the inner query matches nothing, MySQL still fetches 10 at a time and ends up traversing nearly the whole table!
Testing the same SQL with different parameters:

select
  sql_no_cache
  c.id,
  c.name,
  c.position,
  c.sex,
  c.phone,
  c.office_phone,
  c.feature_info,
  c.birthday,
  c.creator_id,
  c.is_keyperson,
  c.giveup_reason,
  c.status,
  c.data_source,
  from_unixtime(c.created_time) as created_time,
  from_unixtime(c.last_modified) as last_modified,
  c.last_modified_user_id
from
  contact c
where
  exists (
    select
      1
    from
      contact_branch cb
    inner join
      branch_user bu
        on cb.branch_id = bu.branch_id
        and bu.status in (1, 2)
    inner join
      org_emp_info oei
        on oei.data_id = bu.user_id
        and oei.node_left >= 2875
        and oei.node_right <= 2875
        and oei.org_category = -1
    where
      c.id = cb.contact_id
  )
order by
  c.created_time desc
limit 0, 10;

Empty set (2 min 18.99 sec)

2 min 18.99 sec! Far worse than before the rewrite. Because of MySQL's nested-loop mechanism, this situation basically cannot be optimized at the statement level; it can only be handed back to the application system to rework its own logic.
This example shows that not every statement can be optimized. Worse, a rewrite that passes our optimization regression cases can hit extreme conditions with consequences more serious than the original. So, first: do not expect every statement to be fixable through SQL optimization; second: do not be overconfident, optimizing only for specific cases while ignoring more complex ones.

That concludes the slow query case analysis; the above are only a few typical ones. In the course of optimization we have encountered "garbage SQL" of more than 1,000 lines joining 16 tables; slow queries and application crashes caused by database differences between online and offline environments; varchar equality comparisons written without single quotes; and Cartesian-product queries that outright killed a slave database. Many cases are really just accumulated experience: once you are familiar with the internals of the query optimizer and of indexes, analyzing them becomes very easy.

Afterword
This article used one slow query case to introduce MySQL indexing principles and some methodology for optimizing slow queries, and analyzed several typical cases in detail. In fact, after this long stretch of statement optimization we have found that no optimization at the database level compares with optimizing the application system itself: the same MySQL that supports applications at Google/Facebook/Taobao scale may fail to support your personal website. To borrow a recently popular phrase: "Querying is easy, optimizing is not; write with care and cherish it!"
