MySQL indexing principles and slow query optimization, shared by the Meituan technology team


With its excellent performance, low cost, and rich ecosystem, MySQL has become the relational database of choice for most Internet companies. But good performance alone is not enough; as the saying goes, a good horse deserves a good saddle. Knowing how to use MySQL well has become a required skill for development engineers, and job descriptions routinely ask for things like "proficient in MySQL", "SQL statement optimization", and "understands database internals". In a typical application system the read-to-write ratio is roughly 10:1, and inserts and ordinary updates rarely cause performance problems. The most common, and most troublesome, problems come from complex queries, so optimizing query statements is clearly the top priority.
Since July 2013 I have been doing slow query optimization on Meituan's core business systems, covering more than ten systems and accumulating hundreds of resolved slow query cases. As business complexity grew, the problems encountered became stranger, more varied, and at times scarcely believable. This article aims to explain the principles of database indexing and how to optimize slow queries from a development engineer's perspective.

Reflections prompted by a slow query

select
  count(*)
from
  task
where
  status = 2
  and operator_id = 20839
  and operate_time > 1371169729
  and operate_time < 1371174603
  and type = 2;

Users reported that a certain feature of the system was getting slower and slower, so an engineer found the SQL above.
He came to me eagerly: "This SQL needs to be optimized; build me an index on every field."
Surprised, I asked, "Why does every field need an index?"
"Indexing the queried fields makes the query faster," the engineer said with confidence.
"In this case a composite index will do; because of leftmost prefix matching, operate_time has to go last. The other related queries against the table also need to be gathered, and a comprehensive evaluation made."
"Composite index? Leftmost prefix matching? Comprehensive evaluation?" The engineer fell into deep thought.
Most of us know that an index improves query efficiency, but how should an index be built? In what order should its columns go? Many people have only a vague idea. These concepts are actually not hard to understand, and the principles behind indexes are far less complex than people imagine.

MySQL indexing principle
Indexing purpose
The purpose of an index is to improve query efficiency. An analogy is a dictionary: to look up the word "mysql", we first locate the letter m, then find y below it, and then the remaining sql. Without an index, we might have to scan every word to find what we want. And what if I wanted every word starting with m? Or every word starting with ze? Don't you feel that, without an index, these tasks simply could not be completed?

Indexing principle
Besides the dictionary, examples of indexes abound in everyday life, such as the train schedules in a station or a book's table of contents. They all work on the same principle: keep narrowing the range of candidate data until the desired result is reached, and at the same time turn random events into sequential ones; in other words, we always use the same method to lock down the data.
The same applies to databases, only it is far more complicated, because queries are not just equality lookups: there are range queries (>, <, between, in), fuzzy queries (like), union conditions (or), and so on. How should the database handle all of these? Recall the dictionary example. Could we split the data into segments and then search segment by segment? In the simplest scheme, with 1000 rows, rows 1 to 100 form the first segment, 101 to 200 the second, 201 to 300 the third, and so on; to find row 250 we only need to look at the third segment, eliminating 90% of the irrelevant data in one step. But what if there are 10 million records; how long should each segment be? Anyone with a little background in algorithms will think of a search tree, whose average lookup complexity is O(log n) and whose query performance is good. But we have overlooked a key point: this complexity model assumes every comparison has the same cost. A database's implementation is more involved: data is stored on disk, and to improve performance, batches of it are read into memory for computation. Since a disk access costs roughly 100,000 times as much as a memory access, a plain search tree cannot handle these complex application scenarios.

Disk IO and prefetching
Having mentioned disk access above, here is a brief introduction to disk IO and prefetching. A disk reads data by mechanical motion, and the time for each read breaks down into three parts: seek time, rotational latency, and transfer time. Seek time is the time for the arm to move to the specified track, generally under 5ms on mainstream disks. Rotational latency corresponds to the disk speed we often hear about: a 7200 rpm disk spins 7200 times per minute, i.e. 120 times per second, so the average rotational latency is 1/120/2 = 4.17ms. Transfer time is the time to read data from the disk or write to it, typically a few tenths of a millisecond, negligible next to the other two. So the time for one disk IO is about 5 + 4.17 = 9ms. That sounds acceptable until you consider that a 500-MIPS machine executes 500 million instructions per second; since instructions run at electronic speed, in the time of one IO the CPU could execute roughly 4.5 million instructions. A database holds millions, tens of millions, even hundreds of millions of rows, so 9ms per access would plainly be a disaster. (A comparison chart of computer hardware latencies appears here in the original article for reference.)

Given that disk IO is such an expensive operation, operating systems perform some optimization: on each IO they read not only the data at the requested disk address but also the adjacent data into the memory buffer, because the principle of locality tells us that when a program accesses data at one address, nearby data will usually be accessed soon after. The unit of data read by each IO is called a page. How large a page is depends on the operating system, commonly 4KB or 8KB; in other words, reading one page of data costs exactly one IO. This fact is very helpful for designing the data structure of an index.

Data structure of the index
The preceding sections gave everyday examples of indexes, the basic principle behind them, the complexity inside a database, and the relevant operating system knowledge. The purpose is to make clear that no data structure arises out of thin air; every one has its background and intended scenarios. Let us now summarize what we need this data structure to do. It is actually simple: every time we look up data, keep the number of disk IOs to a very small order of magnitude, ideally a constant. So we wonder: would a highly controlled, multi-way search tree meet this need? And that is precisely how the b+ tree came into being.

The b+ tree in detail

The figure (omitted here) shows a b+ tree; for the formal definition of the b+ tree, see any standard reference. Here we only highlight a few points. The light blue blocks are what we call disk blocks. Each disk block contains several data items (shown in dark blue) and pointers (shown in yellow); for example, disk block 1 contains data items 17 and 35 and pointers P1, P2, P3, where P1 points to the disk block holding values less than 17, P2 to the block holding values between 17 and 35, and P3 to the block holding values greater than 35. The real data exists only in the leaf nodes: 3, 5, 9, 10, 13, 15, 28, 29, 36, 60, 75, 79, 90, 99. Non-leaf nodes store no real data, only data items that guide the search, such as 17 and 35, which do not actually exist in the data table.

The b+ tree search process
As in the figure, suppose we are looking for data item 29. First disk block 1 is loaded from disk into memory, costing one IO. A binary search in memory determines that 29 lies between 17 and 35, selecting the P2 pointer of disk block 1; the in-memory time is so short (compared to a disk IO) that it can be ignored. Disk block 3 is then loaded via the disk address in P2, costing a second IO; 29 lies between 26 and 30, selecting the P2 pointer of disk block 3, through which disk block 8 is loaded, costing a third IO. A binary search in memory then finds 29 and the query ends, for a total of three IOs. The real situation is that a 3-level b+ tree can represent millions of rows. If finding any of millions of rows costs only three IOs, the performance gain is enormous; without an index, each data item might cost one IO, for millions of IOs in total, which is obviously prohibitively expensive.

Properties of the b+ tree
1. From the analysis above, we know the number of IOs depends on the height h of the b+ tree. Suppose the current table holds N rows of data and each disk block holds m data items; then h = log_(m+1) N. For a given amount of data N, the larger m is, the smaller h is; and m = (disk block size) / (data item size). The size of a disk block, i.e. one page, is fixed, so the smaller each data item, the more items per block and the lower the tree. This is why each data item, i.e. each indexed field, should be as small as possible; an int, for example, occupies 4 bytes, half of a bigint's 8 bytes. It is also why the b+ tree puts the real data in the leaf nodes rather than the interior nodes: once real data moves into interior nodes, the number of data items per disk block drops drastically and the tree grows taller. In the extreme, when each block holds only one data item, the structure degenerates into a linear list. (A worked example of the height calculation follows after this list.)
2. When the data item of a b+ tree is a composite structure, such as (name, age, sex), the b+ tree establishes its search order from left to right. When a record like (John, 20, F) is looked up, the b+ tree first compares name to determine the direction of the next search; if the names are equal it compares age and then sex in turn, finally obtaining the data. But when a query like (20, F), lacking name, arrives, the b+ tree does not know which node to examine next, because name is the very first comparison factor the tree was built on; a search must go through name first to know where to descend. Likewise, when a query like (John, F) is run, the b+ tree can use name to direct the search, but the next field, age, is missing, so it can only find all records whose name is John and then filter for sex = F. This is a very important property: the leftmost matching property of indexes.
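To make property 1 concrete, here is an illustrative back-of-the-envelope calculation. Every concrete number below (a 16KB page, an 8-byte bigint key, a 6-byte child pointer, 16 rows per leaf page) is an assumption chosen for the example rather than a figure from this article; InnoDB's defaults are in this ballpark, but engines and schemas differ.

\[
h \approx \log_{m+1} N, \qquad
m = \frac{\text{page size}}{\text{data item size}} \approx \frac{16 \times 1024}{8 + 6} \approx 1170
\]

\[
N_{\text{height 3}} \approx 1170 \times 1170 \times 16 \approx 2.2 \times 10^{7}
\]

So under these assumptions, a b+ tree of height 3 can index on the order of twenty million rows, consistent with the earlier claim that a 3-level tree represents millions of records at three IOs per lookup.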

Slow query optimization
The principles behind MySQL indexing are rather dry material; all we need is an intuitive grasp, not an exhaustive, in-depth understanding. Now look back at the slow query at the beginning of this article: having understood the indexing principles, do any ideas come to mind? First, let us summarize a few basic principles for building indexes.

Some principles for building indexes
1. The leftmost prefix matching principle, an extremely important one: MySQL keeps matching toward the right until it encounters a range condition (>, <, between, like), where matching stops. For a query like a = 1 and b = 2 and c > 3 and d = 4, an index built in the order (a, b, c, d) leaves d unable to use the index, while (a, b, d, c) lets all four predicates use it; and the order of a, b, d in the query can be adjusted freely. (A sketch illustrating this follows after the list.)
2. Equality conditions (= and in) can appear in any order; for example, a = 1 and b = 2 and c = 3 can use an (a, b, c) index however the predicates are arranged, because the MySQL query optimizer rewrites the query into a form the index recognizes.
3. Choose highly selective columns as index columns. The selectivity formula is count(distinct col)/count(*), the proportion of distinct values in the field; the larger the proportion, the fewer records we scan per lookup. A unique key has selectivity 1, whereas some status or gender fields have selectivity approaching 0 on large data sets. One may ask: is there an empirical threshold for this ratio? It depends on the scenario and is hard to fix; for fields used in joins we generally require at least 0.1, i.e. on average 10 records scanned per lookup.
4. Index columns must not take part in computation; keep the column "clean". A predicate like from_unixtime(create_time) = '2014-05-29' cannot use the index. The reason is simple: the b+ tree stores the field values as they exist in the data table, so comparing would require applying the function to every element first, which obviously costs far too much. The predicate should instead be written as:

create_time = unix_timestamp('2014-05-29');

5. Extend an existing index rather than creating a new one whenever possible. For example, if the table already has an index on a and you want an index on (a, b), modifying the existing index is usually enough.
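To illustrate principles 1 and 2 concretely, here is a minimal sketch; the table t, its columns, and the index name idx_abdc are hypothetical, invented purely for demonstration:

-- hypothetical table; the range-queried column c goes last in the index (principle 1)
create table t (
  a int,
  b int,
  c int,
  d int,
  key idx_abdc (a, b, d, c)
);

-- all four predicates can use idx_abdc: a, b, d match by equality,
-- then the range condition on c is still evaluated within the index
select * from t where a = 1 and b = 2 and d = 4 and c > 3;

-- equality predicates in any order behave the same (principle 2):
-- the optimizer rewrites them into the index's column order
select * from t where d = 4 and c > 3 and b = 2 and a = 1;

Had the index been built as (a, b, c, d) instead, matching would stop at the range condition on c, and d could not use the index.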

Back to the slow query at the beginning
By the leftmost matching principle, the index for the opening SQL statement should be a composite index on status, operator_id, type, and operate_time, where status, operator_id, and type can be in any order among themselves. That is why I said that all the related queries against this table should be collected and analyzed together.
For example, suppose the following queries also exist:

select * from task where status = 0 and type = 2 limit 10;
select count(*) from task where status = 0;

Then building the index as (status, type, operator_id, operate_time) is exactly right, because it covers all of these cases. This is what it means to apply the leftmost matching principle of indexes.
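As a sketch, the corresponding DDL might look like this; the table name task comes from the queries above, while the index name is made up for illustration:

alter table task
  add index idx_status_type_operator_time (status, type, operator_id, operate_time);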

A query optimization power tool: the explain command
The explain command should be familiar to everyone; for detailed usage and field meanings, refer to the official explain-output documentation. What deserves emphasis here is that rows is the core metric: most statements with a small rows value execute fast (with exceptions, discussed below). So optimizing a statement is largely a matter of optimizing rows.
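For example, prefixing the slow query from the beginning of this article with explain displays the chosen index and the estimated rows; the exact output will of course depend on your own data and indexes:

explain select
  count(*)
from
  task
where
  status = 2
  and operator_id = 20839
  and operate_time > 1371169729
  and operate_time < 1371174603
  and type = 2;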

Basic steps of slow query optimization
0. First run the statement to see whether it is really slow, and remember to set SQL_NO_CACHE (see the sketch after this list).
1. Query each table in the where clause individually, and lock in the table that returns the fewest records. That is: apply the query's where conditions to each single table, field by field, and check which field is the most selective.
2. explain to check whether the execution plan matches the expectation from step 1 (querying should start from the table with the fewest locked records).
3. For SQL statements of the form order by ... limit, let the sorted table be probed first.
4. Understand the business side's usage scenario.
5. Refer to the indexing principles above when adding indexes.
6. Observe the result; if it does not match expectations, return to step 0 and analyze again.
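A sketch of step 0: on MySQL versions that still have the query cache (it was removed in MySQL 8.0), SQL_NO_CACHE keeps the result from being served out of the cache, so the timing reflects real execution. The predicate below is just an illustrative fragment of the original query:

select sql_no_cache
  count(*)
from
  task
where
  status = 2
  and operator_id = 20839;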

Several slow query cases
The following examples show in detail how to analyze and optimize slow queries.

Writing complex statements
In many cases we write SQL just to implement a feature, but that is only the first step. Different ways of writing a statement often differ essentially in efficiency, which requires a clear understanding of MySQL's execution plans and indexing principles. Consider the following statement:

select
  distinct cert.emp_id
from
  cm_log cl
inner join
  (
    select
      emp.id as emp_id,
      emp_cert.id as cert_id
    from
      employee emp
    left join
      emp_certificate emp_cert
        on emp.id = emp_cert.emp_id
    where
      emp.is_deleted = 0
  ) cert
    on (
      cl.ref_table = 'Employee'
      and cl.ref_oid = cert.emp_id
    )
    or (
      cl.ref_table = 'empcertificate'
      and cl.ref_oid = cert.cert_id
    )
where
  cl.last_upd_date >= '2013-11-07 15:03:00'
  and cl.last_upd_date <= '2013-11-08 16:00:00';

0. Run it first: 53 records in 1.87 seconds, and no aggregation is involved, so this is relatively slow.

53 rows in set (1.87 sec)

1. explain

+----+-------------+------------+-------+---------------------------------+-----------------------+---------+-------------------+-------+--------------------------------+
| id | select_type | table      | type  | possible_keys                   | key                   | key_len | ref               | rows  | Extra                          |
+----+-------------+------------+-------+---------------------------------+-----------------------+---------+-------------------+-------+--------------------------------+
|  1 | PRIMARY     | cl         | range | cm_log_cls_id,idx_last_upd_date | idx_last_upd_date     | 8       | NULL              |   379 | Using where; Using temporary   |
|  1 | PRIMARY     | <derived2> | ALL   | NULL                            | NULL                  | NULL    | NULL              | 63727 | Using where; Using join buffer |
|  2 | DERIVED     | emp        | ALL   | NULL                            | NULL                  | NULL    | NULL              | 13317 | Using where                    |
|  2 | DERIVED     | emp_cert   | ref   | emp_certificate_empid           | emp_certificate_empid | 4       | meituanorg.emp.id |     1 | Using index                    |
+----+-------------+------------+-------+---------------------------------+-----------------------+---------+-------------------+-------+--------------------------------+

Briefly walking through the execution plan: MySQL first scans 379 records from the cm_log table via the idx_last_upd_date index, then probes a derived table and scans 63,727 records. "derived" means a constructed table, i.e. a table that does not physically exist; it can be understood simply as the result set of a subquery, and the number that follows is the id of the statement that produced it, so derived2 means the query with id = 2 built this virtual table, returning 63,727 records. What did the id = 2 statement actually do to return such a large amount of data? First it full-table-scanned the employee table, 13,317 records, then looked up the emp_certificate table via the emp_certificate_empid index; rows = 1 means each lookup locks only one record, which is efficient. The result is then joined against the 379 records of cm_log according to the join condition. From this process we can see that far too much data is returned, and most of it is discarded, because cm_log locks only 379 records.
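As a quick diagnostic (an addition for illustration, not part of the original analysis), the derived id = 2 query can be run on its own to confirm how many rows it materializes:

select count(*)
from employee emp
left join emp_certificate emp_cert
  on emp.id = emp_cert.emp_id
where emp.is_deleted = 0;

According to the execution plan above, the count should be on the order of 63,727.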
How do we optimize it? Notice that after the subquery runs we still have to join with cm_log, so could we join with cm_log first instead? Careful analysis of the statement shows the idea readily: if cm_log's ref_table is empcertificate, join the emp_certificate table; if ref_table is Employee, join the employee table. So we can split the statement into two parts and stitch them together with union. Note that union rather than union all is used here because the original statement has distinct to obtain unique records, and union happens to provide exactly that deduplication. If the original statement had no distinct, we could use union all directly, since union performs a deduplication step that costs SQL performance.
The optimized statement is as follows:

select
  emp.id
from
  cm_log cl
inner join
  employee emp
    on cl.ref_table = 'Employee'
    and cl.ref_oid = emp.id
where
  cl.last_upd_date >= '2013-11-07 15:03:00'
  and cl.last_upd_date <= '2013-11-08 16:00:00'
  and emp.is_deleted = 0
union
select
  emp.id
from
  cm_log cl
inner join
  emp_certificate ec
    on cl.ref_table = 'empcertificate'
    and cl.ref_oid = ec.id
inner join
  employee emp
    on emp.id = ec.emp_id
where
  cl.last_upd_date >= '2013-11-07 15:03:00'
  and cl.last_upd_date <= '2013-11-08 16:00:00'
  and emp.is_deleted = 0;

2. No understanding of the business scenario is needed for this change; it only has to keep the transformed statement's results consistent with the original.

3. Existing indexes already suffice; no new index is needed.

4. Testing the transformed statement: it now takes only 10ms, a nearly 200-fold improvement!

+------+--------------+------------+--------+---------------------------------+-------------------+---------+-----------------------+------+-------------+
| id   | select_type  | table      | type   | possible_keys                   | key               | key_len | ref                   | rows | Extra       |
+------+--------------+------------+--------+---------------------------------+-------------------+---------+-----------------------+------+-------------+
|    1 | PRIMARY      | cl         | range  | cm_log_cls_id,idx_last_upd_date | idx_last_upd_date | 8       | NULL                  |  379 | Using where |
|    1 | PRIMARY      | emp        | eq_ref | PRIMARY                         | PRIMARY           | 4       | meituanorg.cl.ref_oid |    1 | Using where |
|    2 | UNION        | cl         | range  | cm_log_cls_id,idx_last_upd_date | idx_last_upd_date | 8       | NULL                  |  379 | Using where |
|    2 | UNION        | ec         | eq_ref | PRIMARY,emp_certificate_empid   | PRIMARY           | 4       | meituanorg.cl.ref_oid |    1 |             |
|    2 | UNION        | emp        | eq_ref | PRIMARY                         | PRIMARY           | 4       | meituanorg.ec.emp_id  |    1 | Using where |
| NULL | UNION RESULT | <union1,2> | ALL    | NULL                            | NULL              | NULL    | NULL                  | NULL |             |
+------+--------------+------------+--------+---------------------------------+-------------------+---------+-----------------------+------+-------------+
53 rows in set (0.01 sec)

Understand the application scenario
The purpose of this example is to overturn our intuition about column selectivity. Generally we assume that the more selective a column, the easier it is to lock down a small number of records; but in some special situations this theory has its limits.

select
  *
from
  stage_poi sp
where
  sp.accurate_result = 1
  and (
    sp.sync_status = 0
    or sp.sync_status = 2
    or sp.sync_status = 4
  );

0. First see how long it runs: 951 rows in 6.22 seconds, really very slow.

951 rows in set (6.22 sec)

1. explain first: rows reaches 3.61 million, and type = ALL indicates a full table scan.

+----+-------------+-------+------+---------------+------+---------+------+---------+-------------+
| id | Select_type | Table | Type | Possible_keys | Key | Key_len | Ref | Rows  | Extra    |
+----+-------------+-------+------+---------------+------+---------+------+---------+-------------+
| 1 | Simple   | SP  | All | NULL     | NULL | NULL  | NULL | 3613155 | Using where |
+----+-------------+-------+------+---------------+------+---------+------+---------+-------------+

2. Check the number of records returned with all the where fields applied; since this is a single-table query, step 0 already measured it: 951 records.

3. Make explain's rows converge as closely as possible to 951.

First look at how the records are distributed over accurate_result:

select count(*), accurate_result from stage_poi group by accurate_result;

+----------+-----------------+
| count(*) | accurate_result |
+----------+-----------------+
|     1023 |              -1 |
|  2114655 |               0 |
|   972815 |               1 |
+----------+-----------------+

We see that the selectivity of accurate_result is very low: the whole table has only the three values -1, 0, 1, so an index cannot lock down a particularly small amount of data.

Now look at the sync_status field:

select count(*), sync_status from stage_poi group by sync_status;

+----------+-------------+
| count(*) | sync_status |
+----------+-------------+
|     3080 |           0 |
|  3085413 |           3 |
+----------+-------------+

Its selectivity is just as low; by the theory above, it too is unsuitable for an index.
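The selectivity ratio from principle 3 can be checked directly; this query is an illustrative addition rather than part of the original analysis:

select count(distinct sync_status) / count(*) from stage_poi;

With only 2 distinct values over roughly 3.09 million rows, the result is nearly 0, confirming how little this column discriminates.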

At this point in the analysis, we might conclude the table simply cannot be optimized: both columns have very low selectivity, and even if indexed they would only suit this particular data distribution; a universally valid optimization seems impossible. For instance, if sync_status were distributed evenly between 0 and 3, locking in the matching records would still leave millions.

4. Talk to the business side and look at the usage scenario. The business uses this SQL as follows: every five minutes a job scans the qualifying records and, once each is processed, sets its sync_status to 1; the number of records qualifying within any five minutes is small, around 1000. Once this usage scenario is understood, optimizing the SQL becomes simple, because the business guarantees the data is skewed, and adding the index filters out most of the unneeded data.

5. Following the indexing principles, build the index with the following statement:

alter table stage_poi add index idx_acc_status (accurate_result, sync_status);

6. Observe the result: the query now takes only 200ms, a more than 30-fold speedup.

952 rows in set (0.20 sec)

Let us revisit the analysis process. Single-table queries are comparatively easy to optimize; most of the time it is enough to index the fields in the where clause according to the principles above. But if optimization were always done that mechanically, some columns with very low selectivity, columns that should never be indexed, would get indexed anyway, seriously hurting insert and update performance and possibly affecting other queries as well. So step 4, understanding how the SQL is used, is absolutely critical: only by knowing the business scenario can we properly analyze and optimize the query.

Statements that cannot be optimized

select
  c.id,
  c.name,
  c.position,
  c.sex,
  c.phone,
  c.office_phone,
  c.feature_info,
  c.birthday,
  c.creator_id,
  c.is_keyperson,
  c.giveup_reason,
  c.status,
  c.data_source,
  from_unixtime(c.created_time) as created_time,
  from_unixtime(c.last_modified) as last_modified,
  c.last_modified_user_id
from
  contacts c
inner join
  contact_branch cb
    on c.id = cb.contact_id
inner join
  branch_user bu
    on cb.branch_id = bu.branch_id
    and bu.status in (1, 2)
inner join
  org_emp_info oei
    on oei.data_id = bu.user_id
    and oei.node_left >= 2875
    and oei.node_right <= 10802
    and oei.org_category = -1
order by
  c.created_time desc
limit 0, 10;

Again, follow the usual steps.
0. See how long the statement runs: 10 records in 13 seconds, already intolerable.

10 rows in set (13.06 sec)

1. explain

+----+-------------+-------+--------+-------------------------------------+-------------------------+---------+--------------------------+------+----------------------------------------------+
| id | select_type | table | type   | possible_keys                       | key                     | key_len | ref                      | rows | Extra                                        |
+----+-------------+-------+--------+-------------------------------------+-------------------------+---------+--------------------------+------+----------------------------------------------+
|  1 | SIMPLE      | oei   | ref    | idx_category_left_right,idx_data_id | idx_category_left_right | 5       | const                    | 8849 | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | bu    | ref    | PRIMARY,idx_userid_status           | idx_userid_status       | 4       | meituancrm.oei.data_id   |   76 | Using where; Using index                     |
|  1 | SIMPLE      | cb    | ref    | idx_branch_id,idx_contact_branch_id | idx_branch_id           | 4       | meituancrm.bu.branch_id  |    1 |                                              |
|  1 | SIMPLE      | c     | eq_ref | PRIMARY                             | PRIMARY                 | 108     | meituancrm.cb.contact_id |    1 |                                              |
+----+-------------+-------+--------+-------------------------------------+-------------------------+---------+--------------------------+------+----------------------------------------------+

From the execution plan, MySQL first scans 8849 records from the org_emp_info table, joins the branch_user table via the idx_userid_status index, joins the contact_branch table via the idx_branch_id index, and finally joins the contact table by primary key.
Very few rows are returned, and nothing looks anomalous. Looking at the statement again, we notice an order by + limit combination: could the amount of data being sorted be too large? So we simplify the SQL, removing the order by and limit, to see exactly how many records take part in the sort:

select
  count(*)
from
  contacts c
inner join
  contact_branch cb
    on c.id = cb.contact_id
inner join
  branch_user bu
    on cb.branch_id = bu.branch_id
    and bu.status in (1, 2)
inner join
  org_emp_info oei
    on oei.data_id = bu.user_id
    and oei.node_left >= 2875
    and oei.node_right <= 10802
    and oei.org_category = -1;

+----------+
| count(*) |
+----------+
|   778878 |
+----------+
1 row in set (5.19 sec)

It turns out that before sorting, an astonishing 778,878 records are locked in. Sorting a 700,000-row result set would be disastrous; no wonder it is slow. So let us change the approach: first sort by contact's created_time, then join; would that be faster?
The statement can be transformed as follows; straight_join could also be used for the same optimization:

select
  c.id,
  c.name,
  c.position,
  c.sex,
  c.phone,
  c.office_phone,
  c.feature_info,
  c.birthday,
  c.creator_id,
  c.is_keyperson,
  c.giveup_reason,
  c.status,
  c.data_source,
  from_unixtime(c.created_time) as created_time,
  from_unixtime(c.last_modified) as last_modified,
  c.last_modified_user_id
from
  contacts c
where
  exists (
    select
      1
    from
      contact_branch cb
    inner join
      branch_user bu
        on cb.branch_id = bu.branch_id
        and bu.status in (1, 2)
    inner join
      org_emp_info oei
        on oei.data_id = bu.user_id
        and oei.node_left >= 2875
        and oei.node_right <= 10802
        and oei.org_category = -1
    where
      c.id = cb.contact_id
  )
order by
  c.created_time desc
limit 0, 10;

Verified: the effect is as expected, finishing within 1ms, a more than 13,000-fold improvement!

10 rows in set (0.00 sec)

We thought the job was done here, but a detail was missed in the earlier analysis: sorting first and then joining has, in theory, the same cost as joining first and then sorting, so why such a huge improvement? Because of the limit! Roughly, the execution goes: MySQL takes the first 10 records in sorted index order, then joins and filters them; if fewer than 10 survive, it goes back for another 10 and joins again. If the inner join filters out a great deal of data, this is clearly a disaster; in the extreme case where the inner layer matches nothing at all, MySQL still doggedly fetches 10 records at a time, nearly traversing the entire table!
Testing the SQL with different parameters:

select sql_no_cache
  c.id,
  c.name,
  c.position,
  c.sex,
  c.phone,
  c.office_phone,
  c.feature_info,
  c.birthday,
  c.creator_id,
  c.is_keyperson,
  c.giveup_reason,
  c.status,
  c.data_source,
  from_unixtime(c.created_time) as created_time,
  from_unixtime(c.last_modified) as last_modified,
  c.last_modified_user_id
from
  contacts c
where
  exists (
    select
      1
    from
      contact_branch cb
    inner join
      branch_user bu
        on cb.branch_id = bu.branch_id
        and bu.status in (1, 2)
    inner join
      org_emp_info oei
        on oei.data_id = bu.user_id
        and oei.node_left >= 2875
        and oei.node_right <= 2875
        and oei.org_category = -1
    where
      c.id = cb.contact_id
  )
order by
  c.created_time desc
limit 0, 10;

Empty set (2 min 18.99 sec)

2 min 18.99 sec! Far worse than before. Given MySQL's nested-loop join mechanism, this situation is essentially impossible to optimize at the SQL level; in the end the statement had to be handed back to the application, which optimized its own logic instead.
This example shows that not every statement can be optimized. Worse, when we optimize for our usual cases, some extreme case dropped from the SQL regression suite can produce consequences far more severe than the original. So, first, do not expect every statement to be fixable through SQL tuning; and second, do not be overconfident and optimize only for specific cases while ignoring the more complex ones.

Our slow query case analysis ends here; these were just a few typical examples. In the course of optimization we have met "garbage SQL" over 1000 lines long joining 16 tables, configuration differences between online and offline databases that let a slow query drag an application straight down, varchar equality comparisons written without single quotes (so the index is never used), and a Cartesian-product query that killed a replica outright. However many cases one collects, they are really just accumulated experience; if you are familiar with the query optimizer and the internals of indexes, analyzing such cases becomes remarkably simple.

Afterword
This article introduced MySQL indexing principles and a slow query optimization methodology through a single slow query, and analyzed typical cases in detail. In truth, after such a long stretch of statement optimization, we found that no optimization at the database level matches optimization in the application system itself. It is the same MySQL that can support Google/Facebook/Taobao-scale applications yet may not even hold up your personal website. To borrow a recently popular saying: "Querying is easy, optimizing is not; write with care and cherish!"
