MySQL, Part 5: Index principles and slow query optimization

Source: Internet
Author: User
Tags mysql index

I. Introduction

1. Why do we need indexes?

In a typical application, the read-to-write ratio is about 10:1. Inserts and ordinary updates rarely cause performance problems; in production, what we run into most often, and what most often goes wrong, is complex queries. Optimizing query statements is therefore the top priority, and any discussion of speeding up queries has to start with indexes.

2. What is an index?

An index, also called a "key" in MySQL, is a data structure that the storage engine uses to find records quickly. Indexes are critical to good performance, and their impact grows as the amount of data in a table grows.

Index optimization is probably the most effective way to improve query performance; a good index can easily speed up a query by several orders of magnitude. An index works like the phonetic lookup table at the front of a dictionary: to look up a word without it, you would have to search through hundreds of pages.

Note:

Indexing is an important part of application design and development. Too many indexes hurt application performance, because every index has to be maintained on each write; too few hurt query performance. Finding the right balance is critical to the performance of your application.

II. Index principles

1. Index principle

The purpose of an index is to make queries more efficient, in the same way as the table of contents of a book: first locate the chapter, then a section within that chapter, and then the page number. Looking up a word in a dictionary, or finding a train or a flight in a timetable, works similarly.

In essence, an index repeatedly narrows the range of data to be searched until the desired result is found, and turns random events into sequential ones. In other words, with an index we can always use the same search method to locate data.

2. Disk IO and read-ahead

First, a brief introduction to disk IO and read-ahead. A disk reads data by mechanical movement, and the time spent on each read can be divided into three parts: seek time, rotational delay, and transfer time.

Seek time is the time the arm needs to move to the target track; on mainstream disks it is generally under 5 ms.

Rotational delay depends on the disk's rotation speed. A 7200 rpm disk rotates 7,200 times per minute, or 120 times per second, so the average rotational delay (half a rotation) is 1/120/2 s = 4.17 ms.

Transfer time is the time needed to read data from the disk or write data to it. It is typically a few tenths of a millisecond and is negligible compared with the other two. The time for one disk access, that is, one disk IO, is therefore approximately 5 + 4.17 = about 9 ms.

That sounds fast, but a 500 MIPS (million instructions per second) machine executes 500 million instructions per second, because instructions run at electronic speed. In other words, in the time of one IO the machine could execute about 4.5 million instructions. A database often holds millions or even tens of millions of rows; at 9 ms per access, that is clearly a disaster.
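The arithmetic above can be reproduced with a short Python sketch. The constants are the figures quoted in the text; the variable names are only illustrative, and transfer time is ignored as the text suggests.

```python
# Rough cost of one random disk IO, using the figures quoted in the text.
SEEK_MS = 5.0   # seek time, upper bound quoted for mainstream disks
RPM = 7200      # rotation speed of the example disk
MIPS = 500      # million instructions per second of the example machine

rotations_per_second = RPM / 60
# On average the platter has to turn half a rotation to reach the data.
rotational_delay_ms = 1000 / rotations_per_second / 2
# Transfer time is a fraction of a millisecond and is ignored here.
io_ms = SEEK_MS + rotational_delay_ms
instructions_per_io = MIPS * 1_000_000 * io_ms / 1000

print(f"rotational delay: {rotational_delay_ms:.2f} ms")
print(f"one disk IO:      {io_ms:.2f} ms")
print(f"instructions executed during one IO: {instructions_per_io:,.0f}")
```

Without rounding the IO time down to 9 ms, the instruction count comes out slightly above the 4.5 million quoted in the text.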

Because disk IO is such an expensive operation, the operating system optimizes it: on each IO it reads not only the data at the requested disk address but also the adjacent data into a memory buffer. The principle of locality tells us that when a program accesses the data at one address, the data next to it is likely to be accessed soon as well. The data read by one IO is called a page. The page size depends on the operating system and is generally 4 KB or 8 KB, which means that reading any data within one page costs only a single IO. This fact is very helpful when designing the data structure of an index.

III. Index data structures

1. B+ tree

The figure shows a B+ tree (see a B+ tree reference for the formal definition); only a few points are covered here. Each light blue block is called a disk block. Each disk block contains several data items (shown in dark blue) and pointers. For example, disk block 1 contains data items 17 and 35 and pointers P1, P2, and P3: P1 points to the disk block with values less than 17, P2 to the disk block with values between 17 and 35, and P3 to the disk block with values greater than 35. The real data lives in the leaf nodes: 3, 5, 9, 10, 13, 15, 28, 29, 36, 60, 75, 79, 90, 99. Non-leaf nodes do not store real data, only data items that guide the search; 17 and 35, for example, do not actually exist in the data table.

2. B+ tree search process

To find data item 29 in the figure, disk block 1 is first loaded from disk into memory, which costs one IO. A binary search in memory determines that 29 lies between 17 and 35, which selects pointer P2 of disk block 1. The in-memory time is so short compared with disk IO that it can be ignored. Disk block 3 is then loaded into memory through the disk address in pointer P2, the second IO. Since 29 lies between 26 and 30, pointer P2 of disk block 3 is selected, and disk block 8 is loaded through it, the third IO. A binary search in memory then finds 29 and the query ends, after three IOs in total. In practice, a 3-level B+ tree can hold millions of rows. Finding a row among millions with only three IOs is a huge performance gain; without an index, every data item could cost one IO, for millions of IOs in total, which is obviously very expensive.
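The search described above can be sketched in Python as follows. This is a minimal illustration, not how a storage engine implements it: the `Block` class and `search` function are hypothetical names, each visited block is counted as one IO, and only the path from block 1 through block 3 to block 8 is taken from the text (the other subtrees are left out as `None`).

```python
from bisect import bisect_left, bisect_right


class Block:
    """One disk block: internal blocks have children, leaf blocks do not."""

    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children  # None marks a leaf block


def search(root, key):
    """Return (found, io_count); every block visited costs one IO."""
    block, io_count = root, 0
    while block is not None:
        io_count += 1  # loading this block from disk
        if block.children is None:
            # Leaf block: binary search for the key itself.
            pos = bisect_left(block.keys, key)
            return pos < len(block.keys) and block.keys[pos] == key, io_count
        # Internal block: binary search picks the pointer to follow.
        block = block.children[bisect_right(block.keys, key)]
    return False, io_count


# Only the path 1 -> 3 -> 8 comes from the text; omitted subtrees are None.
block8 = Block([28, 29])
block3 = Block([26, 30], [None, block8, None])
block1 = Block([17, 35], [None, block3, None])

print(search(block1, 29))
```

Searching for 29 visits three blocks, which matches the three IOs counted in the walkthrough.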

3. B+ tree properties

1) Index fields should be as small as possible. From the analysis above, the number of IOs depends on the height h of the B+ tree. If the table holds N rows and each disk block holds m data items, then h = log_(m+1) N. For a given N, the larger m is, the smaller h is. And m = (size of a disk block) / (size of a data item). The size of a disk block, that is, of a data page, is fixed, so the less space each data item takes, the more items fit in a block and the lower the tree. This is why each data item, that is, the index field, should be as small as possible: an int takes 4 bytes, half of the 8 bytes of a bigint. It is also why a B+ tree keeps the real data in the leaf nodes rather than in the inner nodes: if the inner nodes held real data, the number of data items per disk block would drop sharply and the tree would grow taller. When each block holds only one data item, the tree degenerates into a linear list.
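To make the relation between m and h concrete, here is a small Python sketch. The function names, the 4 KB page, the 8-byte pointer, and the 64-byte key are assumptions for illustration only; real page layouts carry extra overhead.

```python
def items_per_block(page_bytes, key_bytes, pointer_bytes=8):
    """m: how many (key, pointer) items fit in one page (pointer size assumed)."""
    return page_bytes // (key_bytes + pointer_bytes)


def tree_height(n_rows, m):
    """Smallest h with (m + 1) ** h >= n_rows, i.e. h = ceil(log_(m+1) N)."""
    height, capacity = 0, 1
    while capacity < n_rows:
        capacity *= m + 1
        height += 1
    return height


PAGE = 4096        # 4 KB page, one of the sizes mentioned in the text
ROWS = 10_000_000  # hypothetical table size

for label, key_bytes in (("4-byte int key", 4), ("64-byte string key", 64)):
    m = items_per_block(PAGE, key_bytes)
    print(f"{label}: m = {m}, height = {tree_height(ROWS, m)}")
```

Under these assumptions the small key needs one level less than the large key, which means one IO less per lookup.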

2) The leftmost-prefix matching property of indexes. When the data items of a B+ tree are composite, such as (name, age, sex), the tree is built by comparing the fields from left to right. When a record such as (Zhang San, 20, F) is searched for, the B+ tree first compares name to decide which way to go; if name is equal, it compares age and then sex, and finally reaches the data. But for a search such as (20, F), which has no name, the B+ tree does not know which node to visit next, because name is the first comparison key used when the tree was built: it has to search by name first to know where to go. For a search such as (Zhang San, F), the B+ tree can use name to direct the search, but the next field, age, is missing, so it can only find all rows whose name is Zhang San and then filter them for sex = F. This is a very important property: the leftmost-prefix matching of indexes.
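The leftmost-prefix behaviour can be imitated with a sorted list of tuples, which is ordered the same way as a composite (name, age, sex) index. This is only a sketch: the sample rows and function names are made up, and a real B+ tree descends pages rather than bisecting a list.

```python
from bisect import bisect_left, bisect_right

# A composite index on (name, age, sex), kept sorted from left to right.
index = sorted([
    ("Li Si", 20, "F"),
    ("Wang Wu", 30, "M"),
    ("Zhang San", 20, "F"),
    ("Zhang San", 25, "M"),
])


def find_by_name_sex(name, sex):
    """name narrows the range by binary search; sex can only be filtered."""
    lo = bisect_left(index, (name,))
    hi = bisect_right(index, (name, float("inf")))
    candidates = index[lo:hi]
    return [row for row in candidates if row[2] == sex], len(candidates)


def find_by_age_sex(age, sex):
    """Without name the sort order is useless: every entry must be examined."""
    matches = [row for row in index if row[1] == age and row[2] == sex]
    return matches, len(index)


print(find_by_name_sex("Zhang San", "F"))  # examines only the Zhang San entries
print(find_by_age_sex(20, "F"))            # examines the whole index
```

The second element of each result is the number of entries examined, which shows why a condition that skips the leading column cannot use the ordering of the index.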


IV. Clustered index and secondary index

In a database, a B+ tree is generally 2 to 4 levels high, so finding the row record for a given key value takes at most 2 to 4 IOs. That is quite good: a typical mechanical hard disk can do at least 100 IOs per second, so 2 to 4 IOs means the lookup takes only 0.02 to 0.04 seconds. B+ tree indexes in a database are divided into clustered indexes and secondary indexes.

What the two have in common: both the clustered index and the secondary index are B+ trees internally, that is, height-balanced trees whose leaf nodes hold all the data.

Where they differ: whether the leaf node stores the entire row.

1. Clustered index

InnoDB tables are index-organized tables: the data in a table is stored in primary key order. The clustered index is a B+ tree built on the primary key of the table, and its leaf nodes store the full row records, so the leaf nodes of the clustered index are also called data pages. Because of this property, the data of an index-organized table is itself part of the index. As in any B+ tree, the data pages are linked to each other in a doubly linked list.

If no primary key is defined, InnoDB uses the first unique index whose columns are all NOT NULL as the clustered index.

If there is no such index either, InnoDB generates a hidden 6-byte ID value and uses it as the clustered index.

Because the actual data pages can be sorted by only one B+ tree, each table can have only one clustered index. In most cases the query optimizer prefers the clustered index, because it finds the data directly in the leaf nodes of the B+ tree. In addition, because the clustered index defines the logical order of the data, range queries on it are particularly fast.

1) The first benefit of a clustered index: sorted lookups and range lookups on the primary key are very fast, and the leaf node data is exactly what the user wants to query. For example, to query the last 10 users in a user table: because the leaf pages of the B+ tree index form a doubly linked list, the last data page can be found quickly and 10 records taken from it.

2) The second benefit of a clustered index: range queries. To find the rows whose primary key falls in a given range, the range of pages can be obtained from the intermediate nodes above the leaf nodes, and the data pages are then read directly.

2. Secondary index

All indexes on a table other than the clustered index are secondary indexes (also called nonclustered indexes). Unlike the clustered index, the leaf nodes of a secondary index do not contain all the data of the row record. Besides the key value, each index row in a leaf node contains a bookmark, which tells the InnoDB storage engine where to find the row data that corresponds to the index entry. Because InnoDB tables are index-organized, the bookmark in an InnoDB secondary index is the clustered index key of the corresponding row.

Secondary indexes do not affect how the data is organized in the clustered index, so a table can have multiple secondary indexes but only one clustered index. When looking up data through a secondary index, the InnoDB storage engine traverses the secondary index, obtains the primary key value from the leaf level, and then finds the complete row record through the primary key (clustered) index.

For example, in a secondary index tree of height 3, it takes 3 page reads to reach the specified primary key. If the clustered index tree also has height 3, it takes another 3 page reads in the clustered index to reach the page that holds the complete row. In total, 6 logical IOs are required to obtain the final data page.
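The two-step lookup can be modelled in a few lines of Python. This is a toy sketch under stated assumptions: dictionaries stand in for the two B+ trees, the tree heights are the values from the example above, and the table contents and the function name `lookup_by_name` are hypothetical.

```python
# Tree heights from the example in the text; one page read per level.
SECONDARY_HEIGHT = 3
CLUSTERED_HEIGHT = 3

# Clustered index: primary key -> full row (hypothetical data).
clustered = {
    1: {"id": 1, "name": "alice", "age": 30},
    2: {"id": 2, "name": "bob", "age": 25},
}
# Secondary index on name: key value -> bookmark (the primary key).
secondary_name = {"alice": 1, "bob": 2}


def lookup_by_name(name):
    """Return (row, logical_io) for a lookup through the secondary index."""
    logical_io = SECONDARY_HEIGHT          # descend the secondary index
    primary_key = secondary_name.get(name)
    if primary_key is None:
        return None, logical_io            # no bookmark, no second descent
    logical_io += CLUSTERED_HEIGHT         # descend the clustered index
    return clustered[primary_key], logical_io


print(lookup_by_name("bob"))
```

A hit costs 3 + 3 = 6 logical IOs, while a miss stops after the secondary index.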


V. MySQL index management

1. Function

1) The function of an index is to speed up lookups.

2) In MySQL, primary key, unique, and composite unique indexes are also indexes; besides speeding up lookups, they enforce constraints.

2. Common MySQL indexes

Ordinary index (INDEX): speeds up lookups

Unique indexes:

- Primary key index (PRIMARY KEY): speeds up lookups + constraint (not null, no duplicates)

- Unique index (UNIQUE): speeds up lookups + constraint (no duplicates)

Composite indexes:

- PRIMARY KEY (id, name): composite primary key index

- UNIQUE (id, name): composite unique index

- INDEX (id, name): composite ordinary index

3. The two index types, HASH and BTREE (the index type can be specified when creating the indexes above)

HASH index: fast for single-value lookups, slow for range queries

BTREE index: a B+ tree; the amount of data it can hold grows exponentially with the number of levels (we use it because it is what InnoDB supports by default)
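The difference between the two types for range queries can be illustrated in Python, with a dict standing in for a hash index and a sorted list for an ordered (B+ tree-like) index. This is only an analogy; the data and function names are invented for the example.

```python
from bisect import bisect_left, bisect_right

ages = [18, 21, 25, 30, 34, 42, 57]

hash_index = {age: pos for pos, age in enumerate(ages)}  # unordered lookup
sorted_index = sorted(ages)                              # ordered, like B+ tree leaves


def hash_range(lo, hi):
    """A hash index keeps no order, so a range query checks every key."""
    hits = [key for key in hash_index if lo <= key <= hi]
    return sorted(hits), len(hash_index)


def sorted_range(lo, hi):
    """An ordered index finds both ends by binary search and reads only the hits."""
    start = bisect_left(sorted_index, lo)
    end = bisect_right(sorted_index, hi)
    return sorted_index[start:end], end - start


print(30 in hash_index)      # single-value lookup: one hash probe
print(hash_range(20, 30))    # examines all keys
print(sorted_range(20, 30))  # examines only the matching keys
```

Both functions return the matching keys together with the number of keys examined, so the extra work of the hash-based range query is visible directly.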

Different storage engines support different index types:

InnoDB supports transactions and row-level locking; it supports B-tree and Full-text indexes, but not Hash indexes.

MyISAM does not support transactions and uses table-level locking; it supports B-tree and Full-text indexes, but not Hash indexes.

Memory does not support transactions and uses table-level locking; it supports B-tree and Hash indexes, but not Full-text indexes.

NDB supports transactions and row-level locking; it supports Hash indexes, but not B-tree or Full-text indexes.

Archive does not support transactions and uses table-level locking; it does not support B-tree, Hash, or Full-text indexes.

4. Syntax for creating and dropping indexes

Method 1: when creating the table

CREATE TABLE table_name (
    column_name1 data_type [integrity constraint ...],
    column_name2 data_type [integrity constraint ...],
    [UNIQUE | FULLTEXT | SPATIAL] INDEX | KEY
    [index_name] (column_name[(length)] [ASC | DESC])
);

Method 2: CREATE INDEX on an existing table

CREATE [UNIQUE | FULLTEXT | SPATIAL] INDEX index_name
ON table_name (column_name[(length)] [ASC | DESC]);

Method 3: ALTER TABLE on an existing table

ALTER TABLE table_name ADD [UNIQUE | FULLTEXT | SPATIAL] INDEX
index_name (column_name[(length)] [ASC | DESC]);

Dropping an index:

DROP INDEX index_name ON table_name;


VI. Index test


VII. Correct use of indexes


VIII. Composite index and covering index

1. Composite index

A composite index is an index built on multiple columns of a table. It is created in the same way as a single-column index, except that several columns are listed.

2. Covering index

The InnoDB storage engine supports covering indexes (also called index covering): the queried values can be obtained from a secondary index alone, without reading the records in the clustered index.
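The saving from a covering index can be sketched in Python by extending the toy model of a secondary index whose leaf entries hold the indexed column plus the primary key. Everything here is hypothetical (data, tree heights, and the `select_by_name` function); the point is only that the clustered index is skipped when the requested columns are all present in the secondary index.

```python
SECONDARY_HEIGHT = 3  # assumed tree heights, one page read per level
CLUSTERED_HEIGHT = 3

clustered = {
    1: {"id": 1, "name": "alice", "age": 30},
    2: {"id": 2, "name": "bob", "age": 25},
}
secondary_name = {"alice": 1, "bob": 2}  # name -> primary key (id)
INDEX_COLUMNS = {"name", "id"}           # columns available in the secondary index


def select_by_name(columns, name):
    """Return (result, logical_io) for SELECT columns ... WHERE name = ..."""
    logical_io = SECONDARY_HEIGHT
    primary_key = secondary_name.get(name)
    if primary_key is None:
        return None, logical_io
    if set(columns) <= INDEX_COLUMNS:
        # Covered: answer straight from the secondary index entry.
        entry = {"name": name, "id": primary_key}
        return {col: entry[col] for col in columns}, logical_io
    # Not covered: go back to the clustered index for the full row.
    logical_io += CLUSTERED_HEIGHT
    row = clustered[primary_key]
    return {col: row[col] for col in columns}, logical_io


print(select_by_name(["id", "name"], "alice"))  # covered by the index
print(select_by_name(["id", "age"], "alice"))   # needs the clustered index
```

In this model the covered query costs half the logical IOs of the uncovered one.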


IX. The query optimization tool: EXPLAIN


X. Basic steps of slow query optimization

0. Run the query first to see whether it is really slow; be sure to set SQL_NO_CACHE.

1. Check each WHERE condition on a single table to lock in the table that returns the fewest records. In other words, apply the WHERE conditions of the query to each table separately, start from the table that returns the fewest records, and query each field of that table on its own to see which field has the highest selectivity.

2. Use EXPLAIN to view the execution plan and check whether it matches the expectation from step 1 (the query should start from the table with the fewest locked-in records).

3. For SQL statements of the form ORDER BY ... LIMIT, let the sorted table be looked up first.

4. Understand how the business side uses the query.

5. When adding indexes, follow the index-building principles.

6. Observe the results; if they do not meet expectations, start the analysis again from step 0.


XI. Slow log management

Slow log

- Execution time > 10

- Missing index

- Log file path

Configuration:

- In memory

SHOW VARIABLES LIKE '%query%';

SHOW VARIABLES LIKE '%queries%';

SET GLOBAL variable_name = value;

- In the configuration file

mysqld --defaults-file='E:\wupeiqi\mysql-5.7.16-winx64\mysql-5.7.16-winx64\my-default.ini'

my.conf content:

slow_query_log = ON

slow_query_log_file = d:/....

Note: after modifying the configuration file, you need to restart the service.

