mysql-index principle and slow query optimization

Source: Internet
Author: User
Tags create index mysql query mysql index

An introduction

Why should I have an index?

General application system, reading and writing ratio of about 10:1, and the insertion operation and the general update operation rarely appear performance problems, in the production environment, we encounter the most, is also the most prone to problems, or some complex query operations, so the optimization of query statements is obviously the most serious. Speaking of accelerating queries, you have to mention the index.

What is an index?

The index, also called a "key" in MySQL, is a data structure used by the storage engine to quickly find records. Indexes for good performance
Is critical, especially when the amount of data in the table is increasing, the impact of the index on performance becomes increasingly important.
Index optimization should be the most effective means of optimizing query performance. Indexes can easily improve query performance by several orders of magnitude.
The index is equivalent to the dictionary's Sequencer list, if you want to check a word, if you do not use a sequencer, you need to check from page hundreds of.

                              5-          661    6    each in 100
two principles of indexing

One index principle

The purpose of the index is to improve the efficiency of the query, which is the same as the directory we use to look up books: First locate the chapter, then navigate to a section under that chapter, and then find the number of pages. Similar examples include: Look up a dictionary, check train trips, plane flights, etc.

Essentially, by narrowing down the range of data you want to get to the final desired result, and turning random events into sequential events, that is, with this indexing mechanism, we can always use the same search method to lock the data.

The database is the same, but obviously more complex, because not only face the equivalent query, there are scope queries (>, <, between, in), Fuzzy query (like), the set query (or) and so on. How should the database choose the way to deal with all the problems? We recall the example of the dictionary, can we divide the data into segments and then query it in segments? The simplest if 1000 data, 1 to 100 is divided into the first paragraph, 101 to 200 is divided into the second paragraph, 201 to 300 is divided into the third paragraph ... This check No. 250 data, as long as the third paragraph can be, all of a sudden to remove 90% of invalid data. But what if it's a 10 million record and it's better to be divided into sections? A little algorithm based on the students will think of the search tree, its average complexity is LGN, with good query performance. But here we overlook a key problem, and the complexity model is based on the same cost per operation. The database implementation is more complex, on the one hand, the data is saved on disk, on the other hand, in order to improve performance, each time you can read some of the data into memory to calculate, because we know that the cost of accessing the disk is about 100,000 times times the amount of access to memory, so simple search tree difficult to meet the complex application scenario.

Two-disk IO and pre-read

Before referring to the access disk, then here is a brief introduction of disk IO and pre-reading, disk reading data by the mechanical movement, the time spent on each read data can be divided into the seek time, rotation delay, transmission time three parts, seek time refers to the magnetic arm moved to the specified track time, The main disk is generally below 5ms; rotation delay is what we often hear of disk speed, such as a disk 7200 rpm, that can be rotated 7,200 times per minute, that is, 1 seconds can go 120 times, rotation delay is 1/120/2 = 4.17ms Transmission time refers to the time that reads from disk or writes data to disk, typically in fraction milliseconds, and is negligible relative to the first two times. Then the time to access a disk, that is, a disk IO time is approximately equal to 5+4.17 = 9ms, sounds pretty good, but to know that a 500-mips (Million instructions per Second) machine can execute 500 million instructions per second, Because the instruction depends on the nature of the electricity, in other words, the time to execute an IO can execute about 4.5 million instructions, the database at every turn 1.001 billion or even tens data, each time 9 milliseconds, is obviously a disaster. Is the computer hardware delay comparison chart, for your reference:

Considering that disk IO is a very expensive operation, the computer operating system does some optimization, when an IO, not only the current disk address data, but also the adjacent data are read into the memory buffer , because the local pre-reading principle tells us that when the computer access to the data of an address, The data adjacent to it will also be accessed quickly. Each IO reads the data we call a page. The specific page of how big the data is related to the operating system, generally 4k or 8k, that is, when we read the data in a page, actually occurred once io, this theory is very helpful for the data structure design of the index.

Three-indexed data structures

The basic principle of the index, the complexity of the database, and the relevant knowledge of the operating system, the purpose is to let everyone understand that any kind of data structure is not produced in a vacuum, there will be its background and use of the scene, we now summarize, we need this data structure can do what, in fact, very simple, That is: The number of disk IO is controlled at a very small order of magnitude each time the data is found, preferably a constant order of magnitude. Then we think if a highly controllable multi-path search tree can meet the needs? In this way, the B + Tree was born.

For example, is a B + tree, the definition of B + tree can be seen in the B + tree, here is only a few points, the light blue block we call a disk block, you can see each disk block contains several data items (dark blue) and pointers (shown in yellow), such as disk Block 1 contains data items 17 and 35, including pointers P1, P3,P1 represents a disk block that is less than 17, P2 represents a disk block between 17 and 35, and P3 represents a disk block greater than 35. Real data exists at leaf nodes 3, 5, 9, 10, 13, 15, 28, 29, 36, 60, 75, 79, 90, 99. Non-leaf nodes do not store real data, only data items that guide the direction of the search, such as 17 and 35, do not exist in the data table.

# # #b + Tree discovery process
, if you want to find the data item 29, then the disk Block 1 is loaded into memory by disk, at this time Io, in memory with a binary lookup to determine 29 between 17 and 35, locking disk Block 1 P2 pointer, memory time because of very short (compared to the disk IO) can be negligible, Disk Block 1 through disk address of the P2 pointer to the disk block 3 is loaded into memory, the second io,29 between 26 and 30, locking disk block 3 of the P2 pointer, loading disk blocks 8 through the pointer to memory, a third Io, while in-memory binary find found 29, the end of the query, a total of three IO. The real situation is, the 3-tier B + tree can represent millions of data, if millions of data to find only three Io, the performance will be huge, if there is no index, each data item will occur once IO, then a total of millions of Io, it is obviously very expensive.

# # #b + Tree Nature
1. The index field should be as small as possible : through the above analysis, we know that the number of IO depends on the height of the B + number of H, assuming that the current data table data is N, the number of data items per disk block is M, there is H=㏒ (m+1) n, when the amount of data n is certain, m larger, h smaller = Size of disk block/data item size, disk block size is a data page size, is fixed, if the data items occupy less space, the more data items, the lower the height of the tree. This is why each data item, the index field, is as small as possible, such as an int accounting for 4 bytes, which is less than half the bigint8 byte. This is why the B + tree requires the real data to be placed on the leaf node instead of the inner node, and once placed in the inner node, the data items of the disk block will be greatly reduced, resulting in a higher tree. When the data item equals 1 o'clock, it will degenerate into a linear table.
2. The leftmost matching feature of the index : when the data items of the B + tree are composite data structures, such as (Name,age,sex), the B + number is set to the search tree in order from left to right, such as when the data (Zhang San, 20,f) is retrieved, B + The tree will first compare the name to determine the direction of the next search, if name is the same, then compare age and sex, and finally get the retrieved data, but when (20,f) such a data without name, B + Tree does not know which node to check next, Because name is the first comparison factor when creating a search tree, you must first search by name to know where to look next. For example, when (Zhang San, F) such data to retrieve, B + tree can use name to specify the direction of the search, but the next field of age is missing, so only the name equal to Zhang San data are found, and then match the gender is the data of F, this is very important property, that is, the index of the leftmost matching characteristics.

three MySQL index management

a function

#1. The function of indexing is to speed up finding # #. Primary Key,unique in MySQL, the union is the only index, these indexes, in addition to accelerating the search, there are constraints of the function

Two MySQL index classification

Normal index: Fast Find Unique index:    -PRIMARY key index PRIMARY key: Accelerated Find + constraint (not empty, cannot be duplicated)    -Unique index unique: Accelerated find + constraint (cannot be duplicated) Federated index:    -primary key ( Id,name): Federated primary Key index    -unique (id,name): Federated Unique index    -index (id,name): Federated Common Index

Two types of hash and btree of three indexes

#我们可以在创建上述索引的时候, for which the index type is specified, the index of two types of hash: Query single fast, range query slow btree type index: B + Tree, the more layers, the amount of data exponential growth (we use it, because InnoDB support it by default) # Different storage engines support the same type of index InnoDB support transactions, support row-level locking, Support B-tree, Full-text and other indexes, do not support Hash index; MyISAM does not support transactions, table-level locking, Support B-tree, Full-text and other indexes , does not support hash index; Memory does not support transactions, support table-level locking, Support B-tree, Hash and other indexes, do not support Full-text index; NDB support transactions, support row-level locking, Support Hash index, not support B-tree, Full-text and other indexes Archive does not support transactions, supports table-level locking, does not support B-tree, Hash, Full-text and other indexes;        

Four syntax for creating/deleting indexes

#方法一: Create table table name when creating Tables    (                field name 1  data type [integrity constraint ...],                field name 2  data type [integrity constraint ...],                [UNIQUE | Fulltext | SPATIAL]   INDEX | KEY                [index name]  (field name [(length)]  [ASC | DESC])                 ; #方法二: Create creates an index on an existing table creation  [UNIQUE | Fulltext | SPATIAL]  index  name on                      table name (field name [(length)]  [ASC | DESC]); #方法三: ALTER TABLE creates an index on an existing table        ALTER TABLE name ADD  [UNIQUE | Fulltext | SPATIAL] Index                             name (field name [(length)]  [ASC | DESC]);                             #删除索引: DROP index name on table name;
Four Test index

1 preparation

View Code

2 test query speed without indexing

#无索引: Scan from beginning to end, so the query is slow mysql> select * from S1 where id=333;+------+---------+--------+----------------+| ID |   name    | gender | Email          |+------+---------+--------+----------------+|  333 | egon333 | Male   | [Email protected] | |  333 | egon333 | F      | [Email protected] | |  333 | egon333 | F      | [Email protected] |+------+---------+--------+----------------+3 rows in Set (0.32 sec) mysql> select * from S1 where em ail= ' [email protected] ';. Rows in Set (0.36 sec)

3 Plus Index

#1. You must create an index for the fields of the search criteria, such as SELECT * from T1 where > 5; In the case of a large number of data in the table, the index will be very slow, and take up hard disk space, insert Delete Update is very slow, only query fast such as CREATE INDEX IDX on S1 (ID), scan all the data in the table, and then the ID as the data item, create the index structure, stored in the table of the hard disk. After the construction, the query will be very quick. It is important to note that the index of the InnoDB table is stored in the S1.ibd file, and the index of the MyISAM table has a separate index file table1.myi

PS: We can go to the MySQL data directory to find the table, you can see more hard disk space occupied

Five correct use index

It's not that we created an index that would definitely speed up the query, as follows index misses

Select Sql_no_cache * from S1 where email= ' xxx '; #命中索引, fast Select Sql_no_cache * from S1 where email like '%old% '; #无法使用索引, the speed is still slow

Two-covering index and index merging

#覆盖索引:    -Get data directly in the index file    http://blog.itpub.net/22664653/viewspace-774667/#分析select * from S1 where id=123; The SQL hit the index, but the index was not overwritten. Use the id=123 to index the data structure to locate the ID on the hard disk, or the location in the data table. But the field of our select is *, other fields are required in addition to the ID, which means that we are not enough to get the ID through the index structure, and we need to use that ID to find the other field values of the row where the ID is located, it takes time, obviously, if we just select the ID, Minus this distress, the following select ID from S1 where id=123; this is the overlay index, hit the index, and the data structure from the index is directly taken to the ID on the hard disk address, fast

#索引合并: Combine multiple single-column indexes with # Analysis: the things that the composite index can do, we can use the index merge to solve, such as CREATE index NE on S1 (name,email); Combined index We can create indexes for name and email individually. Index combination index can be hit: select * from S1 where name= ' Egon '; select * from S1 where name= ' Egon ' and email= ' ADF ' The index merge can be hit: select * from S1 where name= ' Egon '; select * from S1 where email= ' ADF '; select * from S1 where name= ' Egon ' and Em Ail= ' ADF '; At first glance it looks as if the index merge better: can hit more cases, but in fact, to see the situation, if it is name= ' Egon ' and email= ' ADF ', then the efficiency of the composite index is higher than the index merge, if it is a single condition check, then it is more reasonable to use the index merge

If you want to use an index to achieve the desired effect of increasing query speed, we must follow the following guidelines when adding an index

#1. Leftmost prefix matching principle, very important principle, create index ix_name_email on S1 (Name,email,)-leftmost prefix match: must match select * from S1 where Name= ' Egon in order from left to right ‘; #可以select * from S1 where name= ' Egon ' and email= ' asdf '; #可以select * from S1 where email= ' [email protected] '; #不可以mysql会一直向右匹配直到遇到范围查询 (>, <, between, like) stop matching, such as a = 1 and B = 2 and C > 3 and D = 4 If the index of the (A,B,C,D) order is established, D is not indexed , if the index of the Establishment (A,B,D,C) can be used, the order of a,b,d can be arbitrarily adjusted. #2. = And in can be disorderly, such as a = 1 and B = 2 and c = 3 build (a,b,c) indexes can be in any order, the MySQL query optimizer will help you to optimize the index can be identified in the form of # #. Try to choose a high-sensitivity column as the index, and the formula for the sensitivity is count (Distin CT col)/count (*), indicating the scale of the field is not repeated, the greater the proportion of the number of records we scan, the difference between the unique key is 1, and some states, sex fields may be in front of the big data to differentiate the degree is 0, then some people may ask, what is the ratio of experience value? Using different scenarios, this value is also difficult to determine, the general need to join the field we are required to be more than 0.1, that is, the average 1 scan 10 Records # #. The index column cannot participate in the calculation, keep the column "clean", such as from_unixtime (create_time) = ' 2014-05-29 ' can not be used to the index, the reason is very simple, B + tree is stored in the Data table field values, but for retrieval, all elements need to be applied to the function to compare, obviously the cost is too large. So the statement should be written as Create_time = Unix_timestamp (' 2014-05-29 '); #5. Expand the index as much as possible and do not create a new index. For example, the table already has an index of a, now to add (A, b) of the index, then only need to modify the original index

Leftmost prefix demonstration

Mysql> SELECT * from S1 where id>3 and Name= ' Egon ' and email= ' [email protected] ' and gender= ' male '; Empty Set (0.39 sec) mysql> CREATE index idx on S1 (Id,name,email,gender); #未遵循最左前缀Query OK, 0 rows affected (15.27 sec) records:0  duplicates:0  warnings:0mysql> select * from S1 where Id>3 and Name= ' Egon ' and email= ' [email protected] ' and gender= ' male '; Empty Set (0.43 sec) mysql> Drop index idx on S1; Query OK, 0 rows affected (0.16 sec) records:0  duplicates:0  warnings:0mysql> CREATE index idx on S1 (name,emai L,GENDER,ID); #遵循最左前缀Query OK, 0 rows affected (15.97 sec) records:0  duplicates:0  warnings:0mysql> select * from S1 where I D>3 and Name= ' Egon ' and email= ' [email protected] ' and gender= ' male '; Empty Set (0.03 sec)

You need to be aware of a situation where Index misses:

-Like '%xx ' select * from tb1 where email like '%CN ';        -Use function select * from TB1 where reverse (email) = ' Wupeiqi ';            -or select * from tb1 where nid = 1 or name = ' [email protected] ';            Special: When the OR condition has an unindexed columns invalidation, the following will go through the index select * from tb1 where nid = 1 or name = ' seven '; SELECT * from tb1 where nid = 1 or name = ' [email protected] ' and email = ' Alex '-inconsistent type if    The column is a string type, and the incoming condition must be enclosed in quotation marks, otherwise ... select * from tb1 where email = 999; Normal index does not equal not go index-! = SELECT * from tb1 where email! = ' Alex ' Special: If it is a primary key, it will still go index select * from TB1 where nid! = 123-> select * from tb1 where email > ' Alex ' Special: If the primary key or index is an integer type, then the index select * from TB1 whe    Re nid > 123 select * from TB1 where num > 123 #排序条件为索引, the Select field must also be an indexed field, otherwise it cannot be hit-order by    Select name from S1 order by email desc; If the field of the select query is not indexed when sorted by index, the index select email from s1 ORDER by email DEsc Special: If the primary key is sorted, then the index: SELECT * from Tb1 ORDER by nid desc;                -Combined index leftmost prefix if the combined index is: (name,email) name and email--Use index name--use index email --Do not use index-count (1) or count (column) instead of COUNT (*) There is no difference in MySQL-create index xxxx on TB (title) #text类型, you must set the length

Other precautions

-Avoid using SELECT *-count (1) or count (column) instead of COUNT (*)-CREATE table when possible char instead of varchar-table field order fixed Length field precedence-composite index instead of multiple single-column indexes (when multiple conditional queries are used frequently)-use short cables as much as possible Citation-use Join to replace subquery (sub-queries)-Liangui when the condition type needs to be consistent-index hash value (less repetition) does not fit index, example: gender inappropriate
Six search optimization artifact-explain

About explain command believe everyone is not unfamiliar, specific usage and field meaning can refer to official website Explain-output, here need to emphasize rows is the core indicator, most of the rows small statement execution must be very fast (with exception, as described below). So the optimization statements are basically optimizing rows.

Execution plan: Let MySQL estimate perform action (generally correct) all    < index < range < Index_merge < Ref_or_null < ref < Eq_ref < SYSTEM/C Onst    id,email        Slow:        select * from Userinfo3 where name= ' Alex '                explain select * from Userinfo3 where Name= ' al Ex '        type:all (full table Scan)            SELECT * from Userinfo3 limit 1;    Fast:        select * from Userinfo3 where email= ' Alex '        type:const (walk Index)

http://blog.itpub.net/29773961/viewspace-1767044/

Seven basic steps for slow query optimization
0. Run first to see if it is really slow, pay attention to setting the Sql_no_cache1.where condition sheet, and lock the minimum return record table. This sentence means to apply the where of the query to the table the smallest number of records returned from the table began to look up, a single table each field query, to see which field of the highest degree of sensitivity 2.explain View execution plan, is consistent with 1 expected (from a table with fewer locked records) 3.order by The SQL statement in the limit form gives priority to the sorted table 4. Understanding Business Party Usage Scenarios 5. Index-indexed reference to several principles 6. Observation results, non-conforming to the expected continuation from 0 analysis
Eight Slow log management
        Slow log            -Execution time >            -Miss Index            -log file path                    configuration:            -memory                show variables like '%query% ';                Show variables like '%queries% ';                Set global variable name = value            -config file                mysqld--defaults-file= ' E:\wupeiqi\mysql-5.7.16-winx64\mysql-5.7.16-winx64\ My-default.ini '                                my.conf content:                    slow_query_log = on                    slow_query_log_file = d:/....                                    Note: After you modify the configuration file, you need to restart the service
Log Management

Nine Reference Blogs

Https://tech.meituan.com/mysql-index.html

http://blog.itpub.net/29773961/viewspace-1767044/
Http://www.cnblogs.com/wupeiqi/articles/5716963.html

Http://www.cnblogs.com/hustcat/archive/2009/10/28/1591648.html
Http://www.cnblogs.com/mr-wid/archive/2013/05/09/3068229.html
Http://www.cnblogs.com/kissdodog/p/4159176.html
http://blog.csdn.net/ggxxkkll/article/details/7551766
http://blog.itpub.net/26435490/viewspace-1133659/
Http://pymysql.readthedocs.io/en/latest/user/examples.html
Http://www.cnblogs.com/lyhabc/p/3793524.html
Http://www.jianshu.com/p/ed32d69383d2
http://doc.mysql.cn/mysql5/refman-5.1-zh.html-chapter/
http://doc.mysql.cn/
Http://www.php100.com/html/webkaifa/database/Mysql/2013/0316/12223.html
http://blog.csdn.net/ltylove2007/article/details/21084809
Http://lib.csdn.net/base/mysql
http://blog.csdn.net/c_enhui/article/details/9021271
Http://www.cnblogs.com/edisonchou/p/3878135.html?utm_source=tuicool&utm_medium=referral
Http://www.cnblogs.com/ggjucheng/archive/2012/11/11/2765465.html
Http://www.cnblogs.com/cchust/p/3444510.html
Http://www.docin.com/p-705091183.html
http://www.open-open.com/doc/view/51f552745f514bbbaf0aaecf6c88509a
Http://www.open-open.com/doc/view/f80947a5c805458db8cf929834d241bf
Http://www.open-open.com/lib/view/open1435498096607.html
http://www.open-open.com/doc/view/48c510607ab84fd8b87b158c3fe9d177
Http://www.open-open.com/lib/view/open1448032294072.html
Http://www.open-open.com/lib/view/open1404887901263.html
Http://www.cnblogs.com/cchust/p/3426927.html
Http://wribao.php230.com/category/news/1138254.html
Http://www.iqiyi.com/w_19rqqds1ut.html
Http://wenku.baidu.com/link?url=7Grxv0cQ_a00Ni2ZEU_ Cbdk2wd2vtzlns2upkst3of4odqoluq2rqpomk8ap12rdnxbnns6gby8dxvvwmo9bmxjwgs_vkhyus22ghazyues
Http://www.cnblogs.com/edisonchou/p/3878135.html
Http://blog.chinaunix.net/uid-540802-id-3419311.html
http://my.oschina.net/scipio/blog/293052
http://blog.itpub.net/29773961/viewspace-1767044/
http://my.oschina.net/lionets/blog/407263

mysql-index principle and slow query optimization

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.