MySQL: Indexing principle and slow query optimization

Source: Internet
Author: User

    • One: Introduction
    • Two: How indexes work
        • Index data structures
    • Three: MySQL index management
    • Four: Testing indexes
    • Five: Using indexes correctly
    • Six: The query optimization tool EXPLAIN
    • Seven: Basic steps of slow query optimization
    • Eight: Slow log management
    • Nine: Reference blogs
One: Introduction

Why do we need indexes?

In a typical application the read/write ratio is about 10:1. Inserts and ordinary updates rarely cause performance problems; in production, the problems we meet most often, and the ones most likely to go wrong, are complex queries. Optimizing query statements is therefore the top priority, and any discussion of speeding up queries has to start with indexes.

What is an index?

An index, also called a "key" in MySQL, is a data structure that the storage engine uses to find records quickly. Indexes are critical for good performance, and as the amount of data in a table grows, their impact becomes more and more important.
Index optimization is probably the most effective way to optimize query performance; a good index can easily improve query performance by several orders of magnitude.
An index is like the phonetic lookup table of a dictionary: to look up a word without it, you would have to page through hundreds of pages.

Two: How indexes work

One: The indexing principle

The purpose of an index is to make queries more efficient, just like the table of contents we use in a book: first locate the chapter, then the section within that chapter, and then the page number. Looking up a word in a dictionary or checking a train or flight schedule works the same way.

In essence, an index narrows down the range of data to examine step by step until the final result is reached, and it turns random events into sequential ones. In other words, with an index we can always use the same search method to pin down the data.

Databases work the same way but are obviously more complex, because they face not only equality queries but also range queries (>, <, between, in), fuzzy queries (like), union queries (or), and so on. How should a database cope with all of these? Recall the dictionary example: can we divide the data into segments and then search segment by segment? In the simplest case, with 1000 records, records 1 to 100 form the first segment, 101 to 200 the second, 201 to 300 the third, and so on. To find record 250 we only need to look in the third segment, which eliminates 90% of the irrelevant data at a stroke. But what about 10 million records: how many segments should there be? Readers with some background in algorithms will think of a search tree, whose average complexity is O(log N) and which offers good query performance. But this overlooks a key point: the complexity model assumes that every operation costs the same. A database is more complicated. On the one hand the data lives on disk; on the other hand, to improve performance, part of the data is read into memory for processing each time. Since accessing disk costs roughly 100,000 times as much as accessing memory, a simple search tree cannot cope with complex application scenarios.

Two: Disk IO and read-ahead

Disk access was mentioned above, so here is a brief introduction to disk IO and read-ahead. A disk reads data by mechanical movement, and the time spent on each read can be divided into three parts: seek time, rotational delay, and transfer time. Seek time is the time needed for the arm to move to the specified track; on mainstream disks it is generally below 5 ms. Rotational delay depends on the disk speed we often hear about: a 7200 rpm disk rotates 7,200 times per minute, that is, 120 times per second, so the average rotational delay is 1/120/2 = 4.17 ms. Transfer time is the time to read data from or write data to the disk; it is typically a few tenths of a millisecond and is negligible compared with the first two. The time for one disk access, that is, one disk IO, is therefore about 5 + 4.17 ≈ 9 ms. That sounds pretty good, but a 500 MIPS (million instructions per second) machine can execute 500 million instructions per second, because instructions depend only on electrical properties. In other words, in the time of one IO the machine could execute about 4.5 million instructions. Databases routinely hold hundreds of thousands, millions, or even tens of millions of rows, so 9 ms per access is obviously a disaster. A comparison chart of computer hardware latencies accompanied this passage for reference.
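The arithmetic in this paragraph can be reproduced with a short Python sketch. The constants are the figures quoted above; the function names are made up for illustration, and transfer time is ignored as the text suggests.

```python
# Back-of-the-envelope cost of one disk IO, using the figures from the text.
SEEK_MS = 5.0   # seek time: upper bound quoted for mainstream disks
RPM = 7200      # rotational speed of the example disk
MIPS = 500      # example CPU: 500 million instructions per second


def rotational_delay_ms(rpm: int) -> float:
    """Average rotational delay: half of one full revolution, in ms."""
    revolutions_per_second = rpm / 60
    return 1000 / revolutions_per_second / 2


def io_time_ms(seek_ms: float, rpm: int) -> float:
    """One disk IO = seek time + rotational delay (transfer time ignored)."""
    return seek_ms + rotational_delay_ms(rpm)


def instructions_per_io(io_ms: float, mips: int) -> float:
    """Number of instructions the CPU could execute during one disk IO."""
    return io_ms / 1000 * mips * 1_000_000


delay = rotational_delay_ms(RPM)
io_ms = io_time_ms(SEEK_MS, RPM)
print(f"rotational delay: {delay:.2f} ms")
print(f"one disk IO:      {io_ms:.2f} ms")
print(f"instructions per IO at {MIPS} MIPS: {instructions_per_io(io_ms, MIPS):,.0f}")
```

With the unrounded 9.17 ms the instruction count comes out slightly above the 4.5 million quoted in the text, which rounds the IO time down to 9 ms.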

Because disk IO is so expensive, the operating system optimizes it: on each IO it reads not only the data at the requested disk address but also the adjacent data into the memory buffer, because the principle of locality tells us that when a computer accesses the data at one address, the data next to it is likely to be accessed soon as well. The data read by one IO is called a page. The page size depends on the operating system and is generally 4 KB or 8 KB, so reading any data within one page costs exactly one IO. This fact is very helpful when designing the data structure of an index.

Three: Index data structures

So far we have covered the basic principle of indexing, the complexity of databases, and the relevant operating system background. The aim is to show that no data structure comes out of thin air; each has its own background and use cases. Let us now summarize what we need this data structure to do. It is actually very simple: each lookup should keep the number of disk IOs at a very small order of magnitude, preferably a constant. Could a multi-way search tree with a tightly controlled height meet this need? This is how the B+ tree came about.

The figure referred to here shows a B+ tree (see the B+ tree definition for the formal details); only a few points are noted here. The light blue blocks are called disk blocks. Each disk block contains several data items (dark blue) and pointers (yellow). For example, disk block 1 contains data items 17 and 35 and pointers P1, P2, and P3: P1 points to the disk block holding values less than 17, P2 to the block holding values between 17 and 35, and P3 to the block holding values greater than 35. The real data lives in the leaf nodes: 3, 5, 9, 10, 13, 15, 28, 29, 36, 60, 75, 79, 90, 99. Non-leaf nodes store no real data, only data items that guide the search; 17 and 35, for instance, do not actually exist in the data table.

### B+ tree lookup process
To find data item 29 in that tree, disk block 1 is first loaded from disk into memory, which costs one IO. A binary search in memory determines that 29 lies between 17 and 35, which selects pointer P2 of disk block 1. The time spent in memory is so short compared with disk IO that it can be ignored. Following the disk address in P2, disk block 3 is loaded into memory, the second IO. 29 lies between 26 and 30, which selects pointer P2 of disk block 3, and disk block 8 is loaded into memory through that pointer, the third IO. A binary search in memory then finds 29 and the query ends, after three IOs in total. In practice, a 3-level B+ tree can represent millions of records. If finding one record among millions takes only three IOs, the performance gain is enormous; without an index, every data item would cost one IO, millions of IOs in total, which is obviously far too expensive.
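The lookup just described can be simulated with a minimal Python sketch. Only blocks 1, 3, and 8 and the list of leaf values come from the text; the contents of the other blocks, the block numbering, and the empty subtrees under block 3 are assumptions made so that the example is complete. Each call to `load_block` stands for one disk IO.

```python
from bisect import bisect_left, bisect_right

# Internal blocks hold separator keys and child block numbers; leaf blocks
# hold the real data. Blocks marked "assumed" are not described in the text.
BLOCKS = {
    1: {"keys": [17, 35], "children": [2, 3, 4]},
    2: {"keys": [8, 12], "children": [5, 6, 7]},        # assumed
    3: {"keys": [26, 30], "children": [None, 8, None]},  # only P2 is described
    4: {"keys": [65, 87], "children": [9, 10, 11]},      # assumed
    5: {"data": [3, 5]},      # assumed
    6: {"data": [9, 10]},     # assumed
    7: {"data": [13, 15]},    # assumed
    8: {"data": [28, 29]},
    9: {"data": [36, 60]},    # assumed
    10: {"data": [75, 79]},   # assumed
    11: {"data": [90, 99]},   # assumed
}


def load_block(block_no, io_log):
    """Simulate reading one block from disk; every call counts as one IO."""
    io_log.append(block_no)
    return BLOCKS[block_no]


def search(key, root=1):
    """Return (found, blocks_read) for a lookup starting at the root block."""
    io_log = []
    block_no = root
    while block_no is not None:
        block = load_block(block_no, io_log)
        if "children" not in block:
            # Leaf block: binary search in memory for the key itself.
            data = block["data"]
            i = bisect_left(data, key)
            return i < len(data) and data[i] == key, io_log
        # Internal block: binary search picks the pointer to follow.
        block_no = block["children"][bisect_right(block["keys"], key)]
    return False, io_log


print(search(29))  # (True, [1, 3, 8]): three IOs, as in the text
print(search(20))  # (False, [1, 3]): no leaf holds values below 26 under block 3
```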

### Properties of the B+ tree
1. Keep the index field as small as possible: from the analysis above, the number of IOs depends on the height h of the B+ tree. Suppose the table holds N records and each disk block holds m data items; then $h = \log_{m+1} N$. For a given N, the larger m is, the smaller h is, and m = disk block size / data item size. The disk block size is the size of one data page and is fixed, so the less space each data item takes, the more items fit in a block and the lower the tree. This is why each data item, that is, the index field, should be as small as possible: an int takes 4 bytes, half of the 8 bytes of a bigint. It is also why the B+ tree keeps the real data in the leaf nodes rather than the inner nodes: if it were placed in the inner nodes, the number of data items per disk block would drop sharply and the tree would grow taller. When each block holds only one data item, the tree degenerates into a linear list.
2. The leftmost matching feature of indexes: when the data items of a B+ tree are composite, such as (name, age, sex), the tree is built by comparing the fields from left to right. When a row such as (Zhang San, 20, F) is looked up, the B+ tree first compares name to decide the direction of the next step; if the names are equal, it compares age and then sex, and finally reaches the data. But when a search such as (20, F) arrives without a name, the B+ tree does not know which node to visit next, because name is the first comparison key used when the tree was built; it has to search by name first to know where to go next. When a search such as (Zhang San, F) arrives, the B+ tree can use name to choose the direction, but the next field, age, is missing, so it can only find all the rows whose name is Zhang San and then filter those for sex F. This very important property is the leftmost matching feature of indexes.
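To make the height formula from property 1 concrete, here is a rough Python sketch. It assumes a 4 KB block and ignores pointer and page-header overhead, so the item counts are optimistic; the function names and the 200-byte item are illustrative only.

```python
def items_per_block(block_bytes: int, item_bytes: int) -> int:
    """m = disk block size / data item size (overheads ignored)."""
    return block_bytes // item_bytes


def tree_height(n_rows: int, m: int) -> int:
    """Smallest h with (m + 1) ** h >= n_rows, i.e. ceil(log_{m+1} N)."""
    height = 0
    capacity = 1
    while capacity < n_rows:
        capacity *= m + 1
        height += 1
    return height


BLOCK_BYTES = 4096          # assumed 4 KB page
N_ROWS = 10_000_000

for label, item_bytes in [("int", 4), ("bigint", 8), ("200-byte item", 200)]:
    m = items_per_block(BLOCK_BYTES, item_bytes)
    print(f"{label:>14}: m = {m:4d}, height = {tree_height(N_ROWS, m)}")
```

Under these assumptions, small keys keep the tree at three levels for ten million rows, while a 200-byte item doubles the height, which is the effect the paragraph describes.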

Three: MySQL index management

One: Function

#1. The function of an index is to speed up lookups.
#2. In MySQL, primary key, unique, and composite unique are also indexes; besides speeding up lookups, they also act as constraints.

Two: Common MySQL indexes

Normal index INDEX: accelerated lookup

Unique indexes:
    - Primary key index PRIMARY KEY: accelerated lookup + constraint (not null, no duplicates)
    - Unique index UNIQUE: accelerated lookup + constraint (no duplicates)

Composite indexes:
    - PRIMARY KEY(id,name): composite primary key index
    - UNIQUE(id,name): composite unique index
    - INDEX(id,name): composite normal index

Application scenarios for each index

Suppose you are building a membership card system for a shopping mall. The system has a member table with the following fields:

    member number          int
    member name            varchar(10)
    member ID card number  varchar(18)
    member phone           varchar(10)
    member address         varchar(50)
    member notes           text

The member number serves as the primary key, so use PRIMARY.
If the member name needs an index, it is a normal INDEX.
If the member ID card number needs an index, choose UNIQUE (unique, no duplicates allowed).

# There is also the full-text index, FULLTEXT
If the member notes need an index, choose full-text search.
It works best for searching long articles.
For relatively short text, just one or two lines, a normal INDEX is enough.
In practice, though, we do not use MySQL's built-in full-text index for full-text search; we pick third-party software such as Sphinx that is dedicated to full-text search.

# Others, such as the spatial index SPATIAL, are worth knowing about but are almost never used

Three: The two index types, hash and btree

    # When creating the indexes above, we can specify an index type. There are two:
    hash index: fast single-row lookups, slow range queries
    btree index: a B+ tree; the more levels it has, the more data it can hold, growing exponentially (we use this one, because InnoDB supports it by default)

    # Different storage engines support different index types
    InnoDB: supports transactions and row-level locking; supports B-tree, Full-text, and other indexes; does not support Hash indexes;
    MyISAM: does not support transactions; supports table-level locking; supports B-tree, Full-text, and other indexes; does not support Hash indexes;
    Memory: does not support transactions; supports table-level locking; supports B-tree, Hash, and other indexes; does not support Full-text indexes;
    NDB: supports transactions and row-level locking; supports Hash indexes; does not support B-tree, Full-text, and other indexes;
    Archive: does not support transactions; supports table-level locking; does not support B-tree, Hash, Full-text, and other indexes;

Four: Syntax for creating and deleting indexes

    # Method 1: when creating the table
    CREATE TABLE table_name (
        field_name1 data_type [integrity constraints ...],
        field_name2 data_type [integrity constraints ...],
        [UNIQUE | FULLTEXT | SPATIAL] INDEX | KEY
        [index_name] (field_name[(length)] [ASC | DESC])
    );

    # Method 2: CREATE an index on an existing table
    CREATE [UNIQUE | FULLTEXT | SPATIAL] INDEX index_name
        ON table_name (field_name[(length)] [ASC | DESC]);

    # Method 3: ALTER TABLE to add an index to an existing table
    ALTER TABLE table_name ADD [UNIQUE | FULLTEXT | SPATIAL] INDEX
        index_name (field_name[(length)] [ASC | DESC]);

    # Drop an index
    DROP INDEX index_name ON table_name;

Four: Testing indexes

One: Preparation

    #1. Prepare the table
    create table s1(
        id int,
        name varchar(20),
        gender char(6),
        email varchar(50)
    );

    #2. Create a stored procedure that inserts records in bulk
    # declare $$ as the statement terminator while defining the procedure
    delimiter $$
    create procedure auto_insert1()
    BEGIN
        declare i int default 1;
        while(i<3000000)do
            insert into s1 values(i,'egon','male',concat('egon',i,'@oldboy'));
            set i=i+1;
        end while;
    END$$
    # restore the semicolon as the statement terminator
    delimiter ;

    #3. View the stored procedure
    show create procedure auto_insert1\G

    #4. Call the stored procedure
    call auto_insert1();

Two: Query speed without an index

    # No index: MySQL has no way of knowing whether a record with id 333333333 exists,
    # so it can only scan the table from start to finish. It performs as many IO
    # operations as there are disk blocks, so the query is very slow.
    mysql> select * from s1 where id=333333333;
    Empty set (0.33 sec)

Three: When the table already holds a large amount of data, building an index on a field is slow

Four: Once the index is built, queries that use that field as a condition become much faster


1. MySQL first searches the index using the B+ tree search principle and quickly finds that no record with id 333333333 exists. IO is greatly reduced, so the query is significantly faster.

2. In the MySQL data directory, we can see that the table now takes up more disk space.

3. It is important to note that

Five: Summary

#1. Always create an index on the field used in the search condition; for example, select * from s1 where id = 333; needs an index on id.
#2. When the table already holds a lot of data, building the index is slow and takes up disk space, but queries are faster afterwards. For example, create index idx on s1(id); scans all the data in the table, builds an index structure from the id values, and stores it on disk alongside the table. Once it is built, queries are very fast.
#3. Note that the indexes of an InnoDB table are stored in the s1.ibd file, while a MyISAM table has a separate index file, table1.MYI.

Five: Using indexes correctly

One: Index misses

Creating an index does not automatically speed up queries. For an index to deliver the expected speedup, we must observe the following points when adding it.

1 Range conditions, or conditions that are not precise, where these symbols or keywords appear: >, >=, <, <=, !=, between ... and ..., like

Greater than, less than

Not equal: !=

between ...


2 Choose columns with high selectivity for indexes. Selectivity is calculated as count(distinct col)/count(*), the proportion of values in the field that are not repeated. The larger the ratio, the fewer records we have to scan. A unique key has a selectivity of 1, while some status or gender fields may have a selectivity close to 0 on large data sets. What is a good empirical value for this ratio? It depends on the scenario and is hard to pin down; for fields used in joins we generally require more than 0.1, that is, on average 10 records scanned per lookup.

    # First drop all the indexes on the table, so that we can concentrate on selectivity
    mysql> desc s1;
    +--------+-------------+------+-----+---------+-------+
    | Field  | Type        | Null | Key | Default | Extra |
    +--------+-------------+------+-----+---------+-------+
    | id     | int(11)     | YES  | MUL | NULL    |       |
    | name   | varchar(20) | YES  |     | NULL    |       |
    | gender | char(5)     | YES  |     | NULL    |       |
    | email  | varchar(50) | YES  | MUL | NULL    |       |
    +--------+-------------+------+-----+---------+-------+
    4 rows in set (0.00 sec)

    mysql> drop index a on s1;
    Query OK, 0 rows affected (0.20 sec)
    Records: 0  Duplicates: 0  Warnings: 0

    mysql> drop index d on s1;
    Query OK, 0 rows affected (0.18 sec)
    Records: 0  Duplicates: 0  Warnings: 0

    mysql> desc s1;
    +--------+-------------+------+-----+---------+-------+
    | Field  | Type        | Null | Key | Default | Extra |
    +--------+-------------+------+-----+---------+-------+
    | id     | int(11)     | YES  |     | NULL    |       |
    | name   | varchar(20) | YES  |     | NULL    |       |
    | gender | char(5)     | YES  |     | NULL    |       |
    | email  | varchar(50) | YES  |     | NULL    |       |
    +--------+-------------+------+-----+---------+-------+
    4 rows in set (0.00 sec)

Analysis

The stored procedure we wrote to bulk-insert records into table s1 sets every name to egon, so the selectivity of the name field is very low (the same goes for gender, which we will come back to later).

Recall the structure of the B+ tree: query speed is inversely related to the height of the tree. To keep the tree low, the data items within a level must be ordered from left to right, from small to large, that is, left1 < left2 < left3 < ...

For a low-selectivity field no such ordering can be found, because the values are all equal. Storing these equal values in a B+ tree can only be done by making the tree taller; the lower the field's selectivity, the taller the tree. In the extreme case where every value of the indexed field is the same, the B+ tree is almost a stick. This example is exactly that extreme case: every value in the name field is 'egon'.

# We can now draw a conclusion: an index built on a low-selectivity field yields a tall index tree. What is the practical effect?

#1: If the condition is name='xxxx', the very first comparison shows that 'xxxx' is not in the index tree (because every value in the tree is 'egon'), so the query is very fast.

#2: If the condition happens to be name='egon', the query can never narrow down to a definite range anywhere in the tree; it can only keep going down, level after level. The IO count is then not much different from a full table scan, so the query is very slow.
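The selectivity formula count(distinct col)/count(*) from point 2 can be tried out on data shaped like table s1. This is a small Python sketch with a made-up helper name; like count(distinct col), it ignores NULL values (None here) in the numerator.

```python
def selectivity(values):
    """count(distinct col) / count(*) for one column, given as a list."""
    values = list(values)
    if not values:
        return 0.0
    distinct = {v for v in values if v is not None}
    return len(distinct) / len(values)


# A small sample with the same shape as the rows the stored procedure inserts.
rows = [
    {"id": i, "name": "egon", "gender": "male", "email": f"egon{i}@oldboy"}
    for i in range(1, 1001)
]

for column in ("id", "name", "gender", "email"):
    ratio = selectivity(row[column] for row in rows)
    print(f"{column:>6}: {ratio:.3f}")
```

On this sample id and email score 1.0, while name and gender score 0.001, far below the 0.1 rule of thumb mentioned above.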

3 = and in conditions can appear in any order. For example, a = 1 and b = 2 and c = 3 can use an (a,b,c) index whatever order the conditions are written in; the MySQL query optimizer rewrites them into a form the index can recognize.

4 Index columns must not take part in calculations; keep the column "clean". For example, from_unixtime(create_time) = '2014-05-29' cannot use the index. The reason is simple: the B+ tree stores the field values as they are in the table, so to evaluate this condition the function has to be applied to every element before comparing, which is obviously too expensive. The condition should instead be written create_time = unix_timestamp('2014-05-29').
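The cost difference in point 4 can be illustrated with a Python sketch in which a sorted list stands in for the index on create_time. The helper names mirror the MySQL functions but are reimplemented here, and UTC is assumed for simplicity (MySQL would use the session time zone).

```python
from bisect import bisect_left, bisect_right
from datetime import datetime, timezone


def unix_timestamp(date_str):
    """Seconds since the epoch for midnight of date_str, assuming UTC."""
    dt = datetime.strptime(date_str, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(dt.timestamp())


def from_unixtime(ts):
    """Format a timestamp the way a datetime column would print, assuming UTC."""
    return datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d %H:%M:%S")


def scan_with_function(index, date_str):
    """from_unixtime(create_time) = date: the function runs on every entry."""
    target = date_str + " 00:00:00"
    calls = 0
    matches = []
    for ts in index:
        calls += 1
        if from_unixtime(ts) == target:
            matches.append(ts)
    return matches, calls


def seek_with_constant(index, date_str):
    """create_time = unix_timestamp(date): convert once, then binary search."""
    target = unix_timestamp(date_str)
    lo = bisect_left(index, target)
    hi = bisect_right(index, target)
    return index[lo:hi], 1


midnight = unix_timestamp("2014-05-29")
index = sorted([midnight - 86400, midnight, midnight + 60, midnight + 86400])

print(scan_with_function(index, "2014-05-29"))  # one conversion per entry
print(seek_with_constant(index, "2014-05-29"))  # a single conversion
```

Both forms return the same row, but the first calls the function once per index entry while the second converts the constant once and lets the sorted order do the rest.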

5 and

Condition 1 and condition 2: when condition 1 does not hold, condition 2 is not evaluated at all. If the field in condition 1 is indexed and the field in condition 2 is not, the query is still fast.

When the left-hand condition holds but its indexed field has low selectivity (as with both name and gender), MySQL moves on in turn until it finds an indexed field with high selectivity, which speeds up the query.

Following this analysis, for the condition name='egon' and gender='male' and id>333 and email='xxx', there is no need to index the fields of the first three conditions, because only the index on the email field actually gets used; indexing the first three fields would only reduce query efficiency.

6 The leftmost prefix matching principle, which is very important: for a composite index, MySQL keeps matching to the right until it meets a range query (>, <, between, like) and then stops. For example, with a = 1 and b = 2 and c > 3 and d = 4, an index built in the order (a,b,c,d) cannot use d, whereas an index built as (a,b,d,c) can use all four columns, and the order of a, b, and d can be adjusted freely.
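The leftmost prefix rule in point 6 can be expressed as a short Python sketch. It is a simplified model of the rule as stated here, not of the MySQL optimizer; the function name and the predicate format (column name mapped to operator) are invented for illustration.

```python
RANGE_OPS = {">", "<", ">=", "<=", "between", "like"}


def usable_index_columns(index_columns, predicates):
    """Return the index columns a query can use, in index order.

    Matching walks the index from the left, stops at the first column with
    no predicate, and stops after the first column used in a range condition.
    """
    used = []
    for column in index_columns:
        operator = predicates.get(column)
        if operator is None:
            break                   # gap in the prefix: nothing further matches
        used.append(column)
        if operator in RANGE_OPS:
            break                   # a range condition ends the matching
    return used


query = {"a": "=", "b": "=", "c": ">", "d": "="}
print(usable_index_columns(["a", "b", "c", "d"], query))  # ['a', 'b', 'c']
print(usable_index_columns(["a", "b", "d", "c"], query))  # ['a', 'b', 'd', 'c']

# The (name, age, sex) example from the B+ tree section:
print(usable_index_columns(["name", "age", "sex"], {"age": "=", "sex": "="}))   # []
print(usable_index_columns(["name", "age", "sex"], {"name": "=", "sex": "="}))  # ['name']
```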

7 Other conditions

    - Functions applied to the column
        select * from tb1 where reverse(email) = 'wupeiqi';

    - or
        select * from tb1 where nid = 1 or name = '[email protected]';

        Special case: the index is bypassed only when the or condition includes an unindexed column; the following still use the index
            select * from tb1 where nid = 1 or name = 'seven';
            select * from tb1 where nid = 1 or name = '[email protected]' and email = 'alex'

    - Inconsistent types
        If the column is a string type, the value passed in must be quoted, otherwise ...
        select * from tb1 where email = 999;

    A normal index is not used for "not equal" conditions
    - !=
        select * from tb1 where email != 'alex'

        Special case: if the column is the primary key, the index is still used
            select * from tb1 where nid != 123
    - >
        select * from tb1 where email > 'alex'

        Special case: if the column is the primary key or the index is on an integer type, the index is used
            select * from tb1 where nid > 123
            select * from tb1 where num > 123

    # When the sort condition is an indexed column, the selected fields must also be indexed, otherwise the index is not hit
    - order by
        select name from s1 order by email desc;
        When sorting by an indexed column, the index is not used if the selected field is not indexed
        select email from s1 order by email desc;
        Special case: when sorting by the primary key, the index is still used:
            select * from tb1 order by nid desc;

    - Composite index leftmost prefix
        If the composite index is (name,email)
        name and email       -- uses the index
        name                 -- uses the index
        email                -- does not use the index

    - count(1) or count(column) instead of count(*): makes no difference in MySQL

    - create index xxxx on tb(title)  # for text columns, a length must be specified

Other precautions

- Avoid using select *
- count(1) or count(column) instead of count(*)
- When creating tables, prefer char over varchar where possible
- Put fixed-length fields first in the table's field order
- Use a composite index instead of several single-column indexes (when several conditions are frequently queried together)
- Use short indexes where possible
- Use join instead of sub-queries
- When joining tables, make sure the condition types are consistent
- Columns whose values are highly repetitive (few distinct values) are not suited to indexing, e.g. gender

Three: Covering indexes and index merge

# Covering index:
    - The data is obtained directly from the index file

Analysis:
    select * from s1 where id=123;
This SQL hits the index but is not covered by it. The index data structure is used with id=123 to locate the id's position on disk, that is, its position in the data table. But the selected fields are *, so fields other than id are needed as well. Getting the id from the index structure is therefore not enough; we still have to use it to fetch the other field values of the row that id belongs to, and that takes time. If we select only id, this extra work disappears:
    select id from s1 where id=123;
This is a covering index: the query hits the index and takes the id straight from the index data structure, which is fast.
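The difference between a covered and an uncovered query can be sketched in Python with two dictionaries: one stands in for the table on disk and one for a secondary index on id. All names and sample rows are hypothetical; the point is only to count the extra trip to the table.

```python
# ROWS plays the role of the table (row location -> row); ID_INDEX plays the
# role of an index on id (id -> row location). Both are illustrative only.
ROWS = {
    0: {"id": 122, "name": "egon", "gender": "male", "email": "egon122@oldboy"},
    1: {"id": 123, "name": "egon", "gender": "male", "email": "egon123@oldboy"},
    2: {"id": 124, "name": "egon", "gender": "male", "email": "egon124@oldboy"},
}
ID_INDEX = {row["id"]: location for location, row in ROWS.items()}
INDEXED_COLUMNS = {"id"}


def select_by_id(columns, id_value):
    """Return (result, table_lookups) for: select <columns> from s1 where id = ?"""
    location = ID_INDEX.get(id_value)
    if location is None:
        return None, 0
    if set(columns) <= INDEXED_COLUMNS:
        # Covered: everything requested is already in the index entry.
        return {"id": id_value}, 0
    # Not covered: follow the row location back to the table.
    row = ROWS[location]
    return {column: row[column] for column in columns}, 1


print(select_by_id(["id"], 123))
print(select_by_id(["id", "name", "gender", "email"], 123))
```

The first call answers from the index alone with zero table lookups; the second, which corresponds to select *, needs one additional lookup per matching row.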

# Index merge: combining several single-column indexes
# Analysis:
Whatever a composite index can do can also be handled with index merge. For example:
    create index ne on s1(name,email);  # composite index
Alternatively, we can create separate indexes on name and on email.

The composite index can be hit by:
    select * from s1 where name='egon';
    select * from s1 where name='egon' and email='adf';

Index merge can be hit by:
    select * from s1 where name='egon';
    select * from s1 where email='adf';
    select * from s1 where name='egon' and email='adf';

At first glance index merge looks better because it covers more cases, but it really depends on the situation: for name='egon' and email='adf', the composite index is more efficient than index merge; for single-condition lookups, index merge is the more reasonable choice.

Six: The query optimization tool EXPLAIN

Most readers will be familiar with the explain command; for detailed usage and the meaning of each field, see the EXPLAIN output documentation on the official site. The point to stress here is that rows is the core metric: in most cases a statement with a small rows value executes very fast (there are exceptions, discussed below). Optimizing a statement therefore basically means reducing rows.

    Execution plan: lets MySQL estimate how a statement will be executed (generally accurate)
        all < index < range < index_merge < ref_or_null < ref < eq_ref < system/const
        id,email

        Slow:
            select * from userinfo3 where name='alex'

            explain select * from userinfo3 where name='alex'
            type: ALL (full table scan)
                select * from userinfo3 limit 1;
        Fast:
            select * from userinfo3 where email='alex'
            type: const (uses the index)
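When comparing execution plans before and after a change, the ordering of access types listed above can be encoded directly. This Python sketch uses only that ordering; the helper names are made up, and it says nothing about the rows estimate, which still has to be checked separately.

```python
# Access types from slowest to fastest, as ordered in the text above.
# "system" and "const" share the top rank.
ACCESS_TYPE_RANK = {
    "all": 0,
    "index": 1,
    "range": 2,
    "index_merge": 3,
    "ref_or_null": 4,
    "ref": 5,
    "eq_ref": 6,
    "const": 7,
    "system": 7,
}


def rank(access_type: str) -> int:
    """Rank of an EXPLAIN access type; accepts any letter case."""
    return ACCESS_TYPE_RANK[access_type.lower()]


def is_improvement(before: str, after: str) -> bool:
    """True when the plan moved to a strictly better access type."""
    return rank(after) > rank(before)


print(is_improvement("ALL", "const"))   # True
print(is_improvement("ref", "range"))   # False
```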

Seven: Basic steps of slow query optimization

0. Run the query first to see whether it is really slow; remember to set SQL_NO_CACHE.
1. Query each table separately with its where conditions and lock onto the table that returns the fewest records. That is, apply the query's where clauses to the table that returns the fewest records and start from there; query each field of that single table separately to see which has the highest selectivity.
2. Use explain to view the execution plan and check whether it matches the expectation from step 1 (starting from the table with the fewest locked records).
3. For SQL of the form order by ... limit, let the sorted table be looked up first.
4. Understand how the business side uses the query.
5. When adding indexes, follow the index-building principles above.
6. Observe the results; if they do not meet expectations, start again from step 0.

Eight: Slow log management
    Slow log
        - Execution time >
        - Missed index
        - Log file path

    Configuration:
        - In memory
            show variables like '%query%';
            show variables like '%queries%';
            set global variable_name = value
        - Configuration file
            mysqld --defaults-file='E:\wupeiqi\mysql-5.7.16-winx64\mysql-5.7.16-winx64\my-default.ini'

            my.conf content:
                slow_query_log = ON
                slow_query_log_file = D:/....

            Note: after modifying the configuration file, the service must be restarted

MySQL log management

    ========================================================
    Error log: records MySQL server startup, shutdown, and runtime error information
    Binary log: also known as the binlog; records database operations other than SELECT in binary files
    Query log: records query information
    Slow query log: records operations whose execution time exceeds a specified threshold
    Relay log: the replica copies the primary's binary log into its own relay log and replays it locally
    General log: audits which account did what, and in which time period
    Transaction log or redo log: records InnoDB transaction information such as transaction execution time and checkpoints
    ========================================================

    1. bin-log
    1. Enable
    # vim /etc/my.cnf
    [mysqld]
    log-bin[=dir\[filename]]
    # service mysqld restart
    2. Pause
    // current session only
    SET SQL_LOG_BIN=0;
    SET SQL_LOG_BIN=1;
    3. View
    All:
    # mysqlbinlog mysql.000002
    By time:
    # mysqlbinlog mysql.000002 --start-datetime="2012-12-05 10:02:56"
    # mysqlbinlog mysql.000002 --stop-datetime="2012-12-05 11:02:54"
    # mysqlbinlog mysql.000002 --start-datetime="2012-12-05 10:02:56" --stop-datetime="2012-12-05 11:02:54"
    By bytes:
    # mysqlbinlog mysql.000002 --start-position=260
    # mysqlbinlog mysql.000002 --stop-position=260
    # mysqlbinlog mysql.000002 --start-position=260 --stop-position=930
    4. Truncate the bin-log (produces a new bin-log file)
    a. Restart the MySQL server
    b. # mysql -uroot -p123 -e 'flush logs'
    5. Delete bin-log files
    # mysql -uroot -p123 -e 'reset master'

    2. Query log
    Enable the general query log
    # vim /etc/my.cnf
    [mysqld]
    log[=dir\[filename]]
    # service mysqld restart

    3. Slow query log
    Enable the slow query log
    # vim /etc/my.cnf
    [mysqld]
    log-slow-queries[=dir\[filename]]
    long_query_time=n
    # service mysqld restart
    MySQL 5.6:
    slow-query-log=1
    slow-query-log-file=slow.log
    long_query_time=3
    View the slow query log
    Test: BENCHMARK(count,expr)
    SELECT BENCHMARK(50000000,2*3);

Nine: Reference blogs


Http:// Cbdk2wd2vtzlns2upkst3of4odqoluq2rqpomk8ap12rdnxbnns6gby8dxvvwmo9bmxjwgs_vkhyus22ghazyues
