MySQL indexing principle and query optimization

Last Update:2018-08-12 Source: Internet

Author: User

I. Introduction

1. Why do we need an index?

In a typical application the read/write ratio is about 10:1, and inserts and ordinary updates rarely cause performance problems. In production, the operations we meet most often, and the ones most likely to go wrong, are complex queries, so optimizing query statements is clearly the top priority. Any discussion of speeding up queries has to start with indexes.

2. What is an index?

An index, also called a "key" in MySQL, is a data structure that the storage engine uses to find records quickly. Indexes are critical for good performance, and the more data a table holds, the more they matter.
Index optimization is probably the most effective way to improve query performance; a good index can easily improve it by several orders of magnitude.
An index is like the phonetic index at the front of a dictionary: to look up a word without it, you would have to page through hundreds of pages.

II. How indexes work

1. Index principle

The purpose of an index is to make queries more efficient, just like the table of contents of a book: first locate the chapter, then the section within that chapter, and then the page number. Similar examples include looking up a word in a dictionary or checking train and flight schedules.

In essence, an index finds the desired result by repeatedly narrowing the range of data to examine, and it turns random events into sequential ones. In other words, with an index we can always locate data with the same search method.

A database works the same way, but it is clearly more complex, because it faces not only equality queries but also range queries (>, <, between, in), fuzzy queries (like), union queries (or), and so on. How should the database handle all of these? Recall the dictionary example: can we divide the data into segments and search segment by segment? In the simplest case, with 1000 records, records 1 to 100 form the first segment, 101 to 200 the second, 201 to 300 the third, and so on. To find record 250 we only need to look in the third segment, which eliminates 90% of the irrelevant data at once. But what about 10 million records? How many segments work best? Readers with some background in algorithms will think of a search tree, whose average complexity is O(log N) and which gives good query performance. But this overlooks a key problem: the complexity model assumes that every operation has the same cost. A database implementation is more complicated. Data is stored on disk, and to improve performance part of it is read into memory for computation each time. Since a disk access costs roughly 100,000 times as much as a memory access, a simple search tree can hardly meet the needs of such a complex scenario.
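The segmenting idea above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the records are the integers 1 to 1000, held in memory in sorted order, with a fixed segment size of 100. The function name `find_in_segments` is made up for this sketch and says nothing about how MySQL is implemented.

```python
from bisect import bisect_right

def find_in_segments(records, target, segment_size=100):
    """Locate target in a sorted list by first picking its segment.

    Returns (segment_number, records_scanned, found); segment_number is 1-based.
    """
    # First key of every segment: a tiny "directory" over the data.
    firsts = [records[i] for i in range(0, len(records), segment_size)]
    # Pick the last segment whose first key is <= target.
    seg = max(bisect_right(firsts, target) - 1, 0)
    segment = records[seg * segment_size:(seg + 1) * segment_size]
    # Only this one segment is scanned; all other segments are skipped.
    return seg + 1, len(segment), target in segment

records = list(range(1, 1001))  # records numbered 1..1000 (assumed data)
seg, scanned, found = find_in_segments(records, 250)
print(seg, scanned, found)      # third segment, 100 of 1000 records scanned
```

Scanning 100 of 1000 records is the 90% reduction described in the text; a search tree applies the same narrowing step repeatedly instead of once.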

2. Disk I/O and read-ahead

Because disk I/O is a very expensive operation, the operating system optimizes it: on each I/O it reads not only the data at the requested disk address but also the adjacent data into the memory buffer. The principle of locality tells us that when a program accesses the data at one address, the adjacent data is likely to be accessed soon. The data read by one I/O is called a page. The page size depends on the operating system and is generally 4 KB or 8 KB, so reading any data within one page costs only one I/O. This observation is very helpful for designing the index data structure.
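A rough way to see why pages matter is to count how many pages a run of adjacent records touches. The sketch below assumes fixed-size 100-byte records stored back to back from offset 0 and a 4 KB page; both numbers and the helper name `pages_touched` are assumptions made for illustration.

```python
PAGE_SIZE = 4096  # bytes; the text says pages are generally 4 KB or 8 KB

def pages_touched(first_record, count, record_size, page_size=PAGE_SIZE):
    """Number of pages (one I/O each) needed to read `count` adjacent records."""
    start_byte = first_record * record_size
    end_byte = (first_record + count) * record_size - 1  # last byte needed
    return end_byte // page_size - start_byte // page_size + 1

print(pages_touched(0, 1, 100))     # one record
print(pages_touched(0, 1000, 100))  # 1000 adjacent records
```

Under these assumptions 1000 adjacent records fit in 25 pages, so reading them costs 25 I/Os rather than one I/O per record.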

III. Index data structures

No data structure is created in a vacuum; each has its own background and use cases. Let us summarize what we need this data structure to do. It is actually simple: every lookup should keep the number of disk I/Os very small, ideally a constant. So could a multi-way search tree with a controllable height meet this need? This is how the B+ tree came about.

The figure shows a B+ tree; for the formal definition, see any reference on B+ trees. Only a few points are covered here. Each light blue block is called a disk block, and each disk block contains several data items (dark blue) and pointers (yellow). For example, disk block 1 contains data items 17 and 35 and pointers P1, P2, and P3. P1 points to the disk block holding values less than 17, P2 to the block holding values between 17 and 35, and P3 to the block holding values greater than 35. The real data lives in the leaf nodes: 3, 5, 9, 10, 13, 15, 28, 29, 36, 60, 75, 79, 90, 99. Non-leaf nodes store no real data, only data items that guide the search; for example, 17 and 35 do not actually exist in the data table.

B+ tree lookup process
To find data item 29, disk block 1 is first loaded from disk into memory, which costs one I/O. A binary search in memory determines that 29 lies between 17 and 35, which selects pointer P2 of disk block 1. The in-memory time is so short compared with disk I/O that it can be ignored. Following the disk address in P2, disk block 3 is loaded into memory, which is the second I/O. 29 lies between 26 and 30, which selects pointer P2 of disk block 3, and disk block 8 is loaded through that pointer, which is the third I/O. A binary search in memory then finds 29, and the query ends after three I/Os in total. In practice, a three-level B+ tree can represent millions of records. If finding one record among millions takes only three I/Os, the performance gain is huge; without an index, each data item would cost one I/O, for millions of I/Os in total, which is obviously very expensive.
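The walk described above can be replayed with a small in-memory model that counts one I/O per block loaded. Blocks 1, 3, and 8 and the leaf values follow the text; the keys of blocks 2 and 4 and the numbering of the other leaf blocks are assumptions, since the text does not list them. This is a sketch of the lookup only, not a full B+ tree.

```python
from bisect import bisect_right

# block id -> (keys, children); children is None for a leaf block.
# Blocks 1, 3 and 8 follow the description in the text. The keys of blocks 2
# and 4 and the ids of the other leaf blocks are assumptions.
BLOCKS = {
    1: ([17, 35], [2, 3, 4]),
    2: ([8, 12], [5, 6, 7]),
    3: ([26, 30], [None, 8, None]),
    4: ([65, 87], [9, 10, 11]),
    5: ([3, 5], None),
    6: ([9, 10], None),
    7: ([13, 15], None),
    8: ([28, 29], None),
    9: ([36, 60], None),
    10: ([75, 79], None),
    11: ([90, 99], None),
}

def lookup(key, root=1):
    """Return (found, io_count), counting one I/O per block loaded."""
    block_id, io_count = root, 0
    while block_id is not None:
        keys, children = BLOCKS[block_id]  # stands in for a disk read
        io_count += 1
        if children is None:               # leaf: the real data lives here
            return key in keys, io_count
        # Binary search among the separator keys picks pointer P1, P2 or P3.
        block_id = children[bisect_right(keys, key)]
    return False, io_count                 # followed an empty pointer

print(lookup(29))  # found after loading blocks 1, 3 and 8
```

Looking up 17 in this model returns not found, which matches the remark that separator keys such as 17 and 35 are not real rows.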

Properties of the B+ tree
1. Index fields should be as small as possible: from the analysis above, the number of I/Os depends on the height h of the B+ tree. Suppose the table holds N records and each disk block holds m data items; then $h = \log_{m+1} N$. For a given N, the larger m is, the smaller h is, and m = disk block size / data item size. The disk block size is the size of one data page and is fixed, so the less space each data item takes, the more items fit in a block and the lower the tree. This is why each data item, that is, the index field, should be as small as possible; for example, an int takes 4 bytes, half of the 8 bytes of a bigint. It is also why the B+ tree keeps the real data in the leaf nodes rather than the inner nodes: if real data were placed in the inner nodes, the number of data items per disk block would drop sharply and the tree would grow taller. When each block holds only one data item, the tree degenerates into a linear list.
2. Leftmost matching property of the index (matching from left to right): when the data items of the B+ tree are composite, such as (name, age, sex), the B+ tree builds the search tree by comparing from left to right. For example, when a record such as (Zhang San, 20, F) is searched for, the B+ tree first compares name to determine the direction of the search; if name is equal, it compares age and then sex in turn, and finally finds the data. But when a search such as (20, F) arrives without name, the B+ tree does not know which node to visit next, because name is the first comparison factor used when the tree was built, and the search has to start from name to know where to go next. When a search such as (Zhang San, F) arrives, the B+ tree can use name to determine the direction, but the next field, age, is missing, so it can only find all records whose name equals Zhang San and then filter those whose sex is F. This is a very important property: the leftmost matching property of the index.
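Property 1 can be checked with a little arithmetic. The sketch below computes the smallest h for which (m + 1)^h covers N records, using integer arithmetic to avoid floating-point rounding. The 4 KB page, the 8-byte pointer stored with each key, and the key sizes tried are assumptions for illustration; the 10 million records echo the earlier example in the text.

```python
def tree_height(n_records, key_bytes, pointer_bytes=8, page_bytes=4096):
    """Smallest h with (m + 1) ** h >= n_records, where m = items per block."""
    m = page_bytes // (key_bytes + pointer_bytes)  # data items per disk block
    if m < 1:
        raise ValueError("a data item does not fit into one page")
    height, capacity = 0, 1
    while capacity < n_records:
        capacity *= m + 1
        height += 1
    return height

for key_bytes in (4, 100):  # e.g. an int key versus a long string key
    print(key_bytes, tree_height(10_000_000, key_bytes))
```

With these assumed sizes the 4-byte key needs a tree of height 3 and the 100-byte key a tree of height 5, so the wider key costs two extra I/Os on every lookup.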

IV. MySQL index management

1. Function

#1. The function of an index is to speed up lookups.
#2. In MySQL, primary key, unique, and composite unique are also indexes; besides speeding up lookups, these indexes also enforce constraints.

2. MySQL index classification

Index classification
1. Ordinary index (INDEX): speeds up lookups
2. Unique indexes
    - Primary key index (PRIMARY KEY): speeds up lookups + constraint (not null and unique)
    - Unique index (UNIQUE): speeds up lookups + constraint (unique)
3. Composite indexes
    - PRIMARY KEY (id, name): composite primary key index
    - UNIQUE (id, name): composite unique index
    - INDEX (id, name): composite ordinary index
4. Full-text index (FULLTEXT): works best when searching long articles.
5. Spatial index (SPATIAL): enough to know it exists; it is almost never used.

For example, suppose you are building a membership card system for a shopping mall.

The system has a member table with the following fields:
member number INT
member name VARCHAR(10)
member ID card number VARCHAR(18)
member phone VARCHAR(10)
member address VARCHAR(50)
member notes TEXT

The member number serves as the primary key, so use PRIMARY.
If the member name needs an index, use an ordinary INDEX.
If the member ID card number needs an index, you can choose UNIQUE (unique, no duplicates allowed).

# Besides these there is the full-text index, FULLTEXT
If the member notes need an index, you can choose full-text search.
It works best when searching long articles.
For relatively short text, such as one or two lines, an ordinary INDEX also works.
In practice, however, we do not use MySQL's built-in index for full-text search; we choose third-party software such as Sphinx, which is dedicated to full-text search.

# Others, such as the spatial index SPATIAL, are enough to know about; they are almost never used

3. Index types: HASH and BTREE

# When creating the indexes above, we can specify an index type. There are two types:
HASH index: fast for single-value lookups, slow for range queries
BTREE index: a B+ tree; each additional level increases the amount of data it can hold exponentially (this is the one we use, because InnoDB supports it by default)

# Different storage engines support different index types
InnoDB: supports transactions and row-level locking; supports B-tree, Full-text, and other indexes; does not support Hash indexes
MyISAM: does not support transactions; supports table-level locking; supports B-tree, Full-text, and other indexes; does not support Hash indexes
Memory: does not support transactions; supports table-level locking; supports B-tree, Hash, and other indexes; does not support Full-text indexes
NDB: supports transactions and row-level locking; supports Hash indexes; does not support B-tree, Full-text, and other indexes
Archive: does not support transactions; supports table-level locking; does not support B-tree, Hash, Full-text, or other indexes

4. Syntax for creating and dropping indexes

# Method 1: when creating the table
CREATE TABLE table_name (
    column_name1 data_type [integrity constraint ...],
    column_name2 data_type [integrity constraint ...],
    [UNIQUE | FULLTEXT | SPATIAL] INDEX | KEY
    [index_name] (column_name[(length)] [ASC | DESC])
);

# Method 2: CREATE INDEX on an existing table
CREATE [UNIQUE | FULLTEXT | SPATIAL] INDEX index_name
    ON table_name (column_name[(length)] [ASC | DESC]);

# Method 3: ALTER TABLE on an existing table
ALTER TABLE table_name ADD [UNIQUE | FULLTEXT | SPATIAL] INDEX
    index_name (column_name[(length)] [ASC | DESC]);

# Dropping an index
DROP INDEX index_name ON table_name;

Making good use of the help documentation

help create
help create index
==================
1. Creating indexes
    - When creating the table (points to note)
    create table s1(
    id int, # primary key can be added here
    # id int index # an index cannot be added this way: index is only an index and carries no constraint,
    # so unlike a primary key or a unique constraint it cannot be declared as part of the column definition
    name char(20),
    age int,
    email varchar(...),
    # primary key(id) # can also be added here
    index(id) # an index can be added this way
    );
    - After the table has been created
    create index name on s1(name); # add an ordinary index
    create unique index age on s1(age); # add a unique index
    alter table s1 add primary key(id); # add a primary key index, i.e. a primary key constraint on the id column
    create index name on s1(id,name); # add an ordinary composite index
2. Dropping indexes
    drop index id on s1;
    drop index name on s1; # drop an ordinary index
    drop index age on s1; # drop a unique index; it is dropped like an ordinary index, with no unique before index
    alter table s1 drop primary key; # drop the primary key (it was added with alter, so it is dropped with alter too)

V. Testing the index

1. Preparation

#1. Prepare the table
create table s1(
id int,
name varchar(20), # length assumed; the source omits it
gender char(6),
email varchar(50) # length assumed; the source omits it
);

#2. Create a stored procedure that bulk-inserts records
# declare $$ as the statement terminator while defining the procedure
delimiter $$
create procedure auto_insert1()
BEGIN
    declare i int default 1;
    while(i<3000000)do
        insert into s1 values(i,concat('egon',i),'male',concat('egon',i,'@oldboy'));
        set i=i+1;
    end while;
END$$
# restore the semicolon as the statement terminator
delimiter ;

#3. View the stored procedure
show create procedure auto_insert1\G

#4. Call the stored procedure
call auto_insert1();

2. Test the query speed without indexing

# No index: the table is scanned from beginning to end, so the query is slow
mysql> select * from s1 where id=333;
+------+---------+--------+----------------+
| id   | name    | gender | email          |
+------+---------+--------+----------------+
|  333 | egon333 | Male   | [email protected] |
|  333 | egon333 | F      | [email protected] |
|  333 | egon333 | F      | [email protected] |
+------+---------+--------+----------------+
3 rows in set (0.32 sec)

mysql> select * from s1 where email='[email protected]';
...
rows in set (0.36 sec)

3. Adding an index

#1. Always create indexes on the columns used in search conditions; for example, a query of the form select * from t1 where <column> > 5; needs an index on that column.
#2. When the table already holds a large amount of data, building an index is slow and takes up disk space, and inserts, deletes, and updates all become slower; only queries get faster.
For example, create index idx on s1(id); scans all the data in the table, builds the index structure with id as the data item, and stores it on disk with the table. Once it is built, queries are very fast.
#3. Note that the indexes of an InnoDB table are stored in the s1.ibd file, whereas the indexes of a MyISAM table are kept in a separate index file, table1.MYI.

VI. Using indexes correctly

1. Covering index

# Analysis
select * from s1 where id=123;
This SQL hits the index, but the index does not cover it.
MySQL uses id=123 to search the index structure and locate where that id is stored on disk, that is, its position in the data table.
But the selected fields are *, so fields other than id are needed as well. Getting id from the index structure is not enough; MySQL has to use that id to fetch the other field values of the row, and that takes time. Obviously, if we select only id, this extra step disappears:
select id from s1 where id=123;
This is a covering index: the query hits the index, and the value of id is taken directly from the index structure, which is fast.

2. Composite index

3. Index merge

# Index merge: using several single-column indexes together
# Analysis:
Whatever a composite index can do, index merge can handle as well. For example, instead of
create index ne on s1(name,email); # composite index
we can create separate indexes on name and on email.

The composite index can be hit by:
select * from s1 where name='egon';
select * from s1 where name='egon' and email='adf';

Index merge can be hit by:
select * from s1 where name='egon';
select * from s1 where email='adf';
select * from s1 where name='egon' and email='adf';

At first glance index merge looks better because it can hit in more cases, but it depends on the situation: for name='egon' and email='adf', the composite index is more efficient than index merge; for single-condition queries, index merge is the more reasonable choice.

For an index to actually speed up queries as intended, follow these guidelines when adding one:

#1. Leftmost prefix matching, a very important principle.
create index ix_name_email on s1(name,email);
- Leftmost prefix matching: conditions must match the index columns in order from left to right
select * from s1 where name='egon'; # can use the index
select * from s1 where name='egon' and email='asdf'; # can use the index
select * from s1 where email='[email protected]'; # cannot use the index
MySQL keeps matching to the right until it meets a range condition (>, <, between, like) and then stops. For example, with a = 1 and b = 2 and c > 3 and d = 4, an index built in the order (a,b,c,d) cannot use d, whereas an index built as (a,b,d,c) can use all four columns, and the order of a, b, d in the index can be adjusted freely.

#2. = and in can appear in any order. For example, for a = 1 and b = 2 and c = 3, an (a,b,c) index can be built in any column order; the MySQL query optimizer rewrites the conditions into a form the index can recognize.

#3. Choose columns with high selectivity for indexes. Selectivity is computed as count(distinct col)/count(*), the proportion of distinct values in the column. The larger the ratio, the fewer records we scan. A unique key has a selectivity of 1, while status or gender columns may have a selectivity close to 0 on large tables. What is a good empirical value for this ratio? It depends on the scenario and is hard to pin down; for columns used in joins we generally require at least 0.1, that is, about 10 records scanned per lookup on average.

#4. Index columns must not take part in calculations; keep the column "clean". For example, from_unixtime(create_time) = '2014-05-29' cannot use the index. The reason is simple: the B+ tree stores the column values from the table, so to evaluate this condition the function would have to be applied to every element before comparing, which is obviously too expensive. The condition should instead be written as create_time = unix_timestamp('2014-05-29');
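The leftmost prefix rule in guideline 1 can be modeled as a short function: walk the index columns from left to right, stop at the first column with no condition, and stop after the first range condition. This is a simplified model of the rule as stated here, not the actual MySQL optimizer; the function name `usable_columns` and the dictionary format for conditions are assumptions made for the sketch.

```python
RANGE_OPS = {">", "<", "between", "like"}  # the range conditions named in the text

def usable_columns(index_columns, conditions):
    """Index columns a WHERE clause can use under the leftmost prefix rule.

    conditions maps column name -> operator, e.g. {"a": "=", "c": ">"}.
    """
    used = []
    for column in index_columns:
        op = conditions.get(column)
        if op is None:       # no condition on this column: matching stops
            break
        used.append(column)
        if op in RANGE_OPS:  # a range condition is the last column matched
            break
    return used

where = {"a": "=", "b": "=", "c": ">", "d": "="}
print(usable_columns(["a", "b", "c", "d"], where))        # d is left out
print(usable_columns(["a", "b", "d", "c"], where))        # all four are used
print(usable_columns(["name", "email"], {"email": "="}))  # nothing matches
```

The three calls mirror the examples in the guideline: the (a,b,c,d) index stops at the range condition on c, the (a,b,d,c) index uses every column, and a condition on email alone cannot use an index on (name,email).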

Leftmost prefix demonstration

mysql> select * from s1 where id>3 and name='egon' and email='[email protected]' and gender='male';
Empty set (0.39 sec)

mysql> create index idx on s1(id,name,email,gender); # does not follow the leftmost prefix
Query OK, 0 rows affected (15.27 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> select * from s1 where id>3 and name='egon' and email='[email protected]' and gender='male';
Empty set (0.43 sec)

mysql> drop index idx on s1;
Query OK, 0 rows affected (0.16 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> create index idx on s1(name,email,gender,id); # follows the leftmost prefix
Query OK, 0 rows affected (15.97 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> select * from s1 where id>3 and name='egon' and email='[email protected]' and gender='male';
Empty set (0.03 sec)

Leftmost prefix matching
index(id,age,email,name)
# id must appear in the condition (as long as id is present, the query is faster)
id
id age
id email
id name

email # no good: a condition that starts with email alone cannot be sped up

mysql> select count(*) from s1 where id=3000;
+----------+
| count(*) |
+----------+
|        1 |
+----------+
1 row in set (0.11 sec)

mysql> create index xxx on s1(id,name,age,email);
Query OK, 0 rows affected (6.44 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> select count(*) from s1 where id=3000;
+----------+
| count(*) |
+----------+
|        1 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from s1 where name='egon';
+----------+
| count(*) |
+----------+
|   299999 |
+----------+
1 row in set (0.16 sec)

mysql> select count(*) from s1 where email='[email protected]';
+----------+
| count(*) |
+----------+
|        1 |
+----------+
1 row in set (0.15 sec)

mysql> select count(*) from s1 where id=1000 and email='[email protected]';
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from s1 where email='[email protected]' and id=3000;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)

Be aware of the following cases in which the index is not used:

- like '%xx'
    select * from tb1 where email like '%CN';
- Using a function
    select * from tb1 where reverse(email) = 'Wupeiqi';
- or
    select * from tb1 where nid = 1 or name = '[email protected]';
    Special case: the index is bypassed only when the or condition includes an unindexed column; the following queries use the index
        select * from tb1 where nid = 1 or name = 'seven';
        select * from tb1 where nid = 1 or name = '[email protected]' and email = 'Alex'
- Type mismatch
    If the column is a string type, the value in the condition must be enclosed in quotation marks, otherwise ...
    select * from tb1 where email = 999;
- != (an ordinary index is not used for "not equal")
    select * from tb1 where email != 'Alex'
    Special case: if the column is the primary key, the index is still used
        select * from tb1 where nid != 123
- >
    select * from tb1 where email > 'Alex'
    Special case: if the primary key or indexed column is an integer type, the index is used
        select * from tb1 where nid > 123
        select * from tb1 where num > 123
- order by
    # When the sort column is indexed, the selected column must also be an indexed column, otherwise the index is not hit
    select name from s1 order by email desc;
    When sorting by an indexed column, if the selected column is not indexed, the index is not used
    select email from s1 order by email desc;
    Special case: sorting by the primary key still uses the index:
        select * from tb1 order by nid desc;
- Leftmost prefix of a composite index
    If the composite index is (name,email)
    name and email -- uses the index
    name -- uses the index
    email -- does not use the index
- count(1) or count(column) instead of count(*) makes no difference in MySQL
- create index xxxx on tb(title(length)) # for a text column, a length must be specified

- Avoid select *
- Use count(1) or count(column) instead of count(*)
- When creating a table, use char instead of varchar where possible
- Put fixed-length fields first in the column order
- Use a composite index instead of several single-column indexes (when queries frequently use multiple conditions)
- Use short indexes where possible
- Use join in place of subqueries
- When joining tables, make sure the types in the join condition are consistent
- Columns with few distinct values (many repeats) are not suitable for an index; for example, gender
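The last tip can be quantified with the selectivity formula given earlier, count(distinct col)/count(*). The sketch below computes it for two made-up columns, a unique id and a two-valued gender column of 1000 rows each; the data and the helper name `selectivity` are assumptions for illustration.

```python
def selectivity(values):
    """count(distinct col) / count(*) for one column given as a list of values."""
    return len(set(values)) / len(values) if values else 0.0

ids = list(range(1, 1001))          # a unique key
genders = ["male", "female"] * 500  # a two-valued column
print(selectivity(ids))      # a unique column has selectivity 1
print(selectivity(genders))  # far below the 0.1 rule of thumb from the text
```

On a real table the same ratio can be obtained with select count(distinct col)/count(*) from the table in question.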

VII. Basic steps for slow query optimization

0. Run the query first to see whether it is really slow; remember to set SQL_NO_CACHE.
1. Query each table on its own with the where conditions and lock down the table that returns the fewest records. That is, apply the query's where clauses to the table that returns the fewest records and start from there; query each field of a single table separately to see which field has the highest selectivity.
2. Use explain to view the execution plan and check whether it matches the expectation from step 1 (starting from the table with the fewest locked records).
3. For SQL statements of the form order by ... limit, let the sorted table be looked up first.
4. Understand the usage scenario on the business side.
5. When adding indexes, follow the index guidelines above.
6. Observe the results; if they do not meet expectations, continue the analysis from step 0.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
