MySQL index details
I. Index basics
1.1 Introduction
In MySQL, indexes are also called keys. They are a data structure that the storage engine uses to quickly find records.
Indexes are critical to good performance. Especially when the amount of data in a table increases, the impact of indexes on performance becomes more and more important.
Index optimization is probably the most effective way to improve query performance. Creating a truly optimal index often requires rewriting the SQL query as well.
1.2 How indexes work
The simplest way to understand how indexes work in MySQL is to look at the index of a book. To find a topic in a book, you usually look it up in the index first; once you have the chapter and page number, you can quickly turn to the content you want.
In MySQL, the storage engine uses an index in a similar way. It first searches the index for the matching value, then finds the corresponding data row based on the matching index record, and finally returns the result set to the client.
1.3 Index types
In MySQL, we usually refer to the following index types:
Regular Index
A regular index, also called a common index (index or key), improves query efficiency. A table can have multiple regular indexes, and they are the most common index type. Unless a type is stated explicitly, "index" in this article means a regular index.
Primary Key Index
The primary key index, usually just called the primary key, improves query efficiency and provides a uniqueness constraint. A table can have only one primary key. A column marked auto_increment must be indexed and is normally the primary key, but a primary key does not have to be auto-increment. The primary key is generally defined on a field with no business meaning (such as a sequence number), and its data type is preferably an integer.
Unique Index
A unique index (Unique Key) improves query efficiency and provides a uniqueness constraint. A table can have multiple unique indexes.
Full-Text Index
A full-text index (Full Text) improves the efficiency of full-text search, but in practice it is usually replaced by Sphinx. However, Sphinx does not support Chinese search. Coreseek is a full-text search engine that supports Chinese; it is essentially Sphinx with a Chinese word segmentation function. In actual projects, we use Coreseek.
Foreign Key Index
A foreign key index (Foreign Key), referred to simply as a foreign key, improves query efficiency. A foreign key is associated with the primary key of another table and is mainly used to ensure the consistency and integrity of records.
Note: Only InnoDB tables support foreign keys. If no index name is specified for the foreign key field, one is generated automatically. To delete a record in a parent table (such as a category table), you must first delete the corresponding records in the child table (the table that holds the foreign key, such as an article table); otherwise an error occurs. When creating a table, you can set a foreign key on a field, for example foreign key (cate_id) references cms_cate (id). Because foreign keys add overhead, we do not recommend using them; however, the application should still follow the foreign key idea to keep data consistent and complete.
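As a rough sketch of how these index types look in a table definition, the following statements declare a primary key, a unique index, a regular index, and a foreign key. Only cms_cate (id) and cate_id come from the text; the cms_article table, the other columns, and the index names are assumptions made for illustration.

```sql
-- Hypothetical parent table (category table).
create table cms_cate (
  id   int unsigned not null auto_increment primary key,  -- primary key index
  name varchar(50)  not null default '',
  unique key uk_name (name)                               -- unique index
) engine = innodb default charset = utf8;

-- Hypothetical child table (article table) that references the parent.
create table cms_article (
  id      int unsigned not null auto_increment primary key,
  cate_id int unsigned not null,
  title   varchar(100) not null default '',
  key idx_title (title),                                  -- regular index
  foreign key (cate_id) references cms_cate (id)          -- foreign key; InnoDB creates an index on cate_id if none exists
) engine = innodb default charset = utf8;

-- While articles still reference category 1, this delete is rejected:
-- delete from cms_cate where id = 1;
```

The commented-out delete corresponds to the rule in the note: rows in the child table have to be removed before the parent row.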
1.4 Index Method
In MySQL, indexes are implemented at the storage engine layer, rather than on the server layer. The index methods supported by MySQL can also be referred to as the index type (in a broad sense), mainly including the following:
If no type is specified, the index is most likely a B-Tree index. Different storage engines use B-Tree indexes in different ways, and their performance varies. For example, MyISAM uses prefix compression to make the index smaller, whereas InnoDB stores the index in the original data format. As another example, MyISAM references an indexed row by the physical location of the data, while InnoDB references it by the primary key.
B-Tree stores index columns sequentially, so it is suitable for searching range data. It can speed up data access, because the storage engine no longer needs to perform full table scans to obtain the required data.
If an index contains the values of multiple fields (columns), it is a composite index. A composite index sorts the values of the fields according to the column order given when the index is created. For example:
create table people (
  id int unsigned not null auto_increment primary key comment 'Primary key id',
  last_name varchar(20) not null default '' comment 'surname',
  first_name varchar(20) not null default '' comment 'name',
  birthday date not null default '1970-01-01' comment 'birthdate',
  gender tinyint unsigned not null default 3 comment 'gender: 1 male, 2 female, 3 unknown',
  key (last_name, first_name, birthday)
) engine = innodb default charset = utf8;
The following data has been inserted into the people table:
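Any small data set is enough to try the queries in this section. The rows in this sketch are made-up sample values, assumed purely for illustration:

```sql
-- Illustrative rows only; the names and dates are assumptions.
insert into people (last_name, first_name, birthday, gender) values
  ('Allen',   'Cathy',  '1960-01-01', 2),
  ('Allen',   'Kim',    '1970-06-15', 2),
  ('Barrett', 'Tom',    '1982-03-09', 1),
  ('Clinton', 'Robert', '1975-11-30', 1);
```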
We created a composite index key (last_name, first_name, birthday). For each row in the table, the index contains the values of the last name, first name, and date of birth, and the index entries are sorted and stored in that order. If two people have the same last name and first name, their entries are ordered by date of birth.
A B-Tree index is suitable for full key value, key range, and key prefix lookups. Key prefix lookups work only on the leftmost prefix of the index.
The composite index is valid for the following types of queries:
Full value matching
Full value matching means matching all the columns in the index. For example, find a person whose last name is Allen, with a given first name, who was born on 1960-01-01.
The SQL statement is:
select id,last_name,first_name,birthday from people where last_name='Allen' and first_name='' and birthday='1960-01-01';
Match the leftmost prefix
For example, using only the first column of the index, you can find all the people whose last name is Allen. The SQL statement is:
select id,last_name,first_name,birthday from people where last_name='Allen';
Match column prefix
For example, you can match only the beginning of the value in the first column of the index, such as finding all the people whose last names start with the letter A. The SQL statement is:
select id,last_name,first_name,birthday from people where last_name like 'A%';
Matching range value
For example, find the people whose last names are between Allen and Clinton. The SQL statement is:
select id,last_name,first_name,birthday from people where last_name between 'Allen' and 'Clinton';
Only the first column of the index is used here.
Exact match of the first column and range match of the following column
For example, find the people whose last name is Allen and whose first name starts with the letter C. That is, the first column of the composite index is matched exactly and the second column is matched by range. The SQL statement is:
select id,last_name,first_name,birthday from people where last_name = 'Allen' and first_name like 'C%';
Index-only queries
B-Tree indexes generally support "index-only queries", that is, queries that need to access only the index, not the data rows. This is related to the "covering index" optimization, which is discussed later.
The composite index cannot be used in the following cases:
(1) If the search does not start from the leftmost column of the composite index, the index cannot be used. In the example above, the index cannot be used to find people by first name alone or by birth date alone, because neither column is the leftmost column of key (last_name, first_name, birthday). Similarly, the index cannot be used to find people whose last name ends with a particular letter: when the % wildcard of a like pattern comes first, the index becomes ineffective.
(2) If a column in the index is skipped, only the index columns before the gap are used. For example, to find people whose last name is Allen and who were born on a specific date, MySQL can use only the first column (last_name) of the composite index, because first_name is not specified.
(3) If the query contains a range condition on a column, the columns to its right cannot use the index to optimize the query. For example, with where last_name = 'Allen' and first_name like 'C%' and birthday = '2017-10-25', the query can use only the first two columns of the index, because like is a range condition. If the range column has only a limited number of values, you can replace the range condition with multiple equality conditions so that the column on the right can also use the index.
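The rewrite described in case (3) might look like the following sketch, which replaces the like 'C%' range with an in list of equality values. The three first names are made-up sample values, and the approach only makes sense when the set of possible values is small and known in advance.

```sql
-- The first names listed here are illustrative assumptions.
explain select id, last_name, first_name, birthday
from people
where last_name = 'Allen'
  and first_name in ('Carl', 'Cathy', 'Chris')   -- equality list instead of like 'C%'
  and birthday = '2017-10-25';
```

Comparing the key_len value in the explain output of this statement with that of the like version is one way to check how many columns of the composite index are actually used.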
Now we know how important the order of columns in a composite index is: all of these restrictions are related to the column order. When optimizing performance, you may need indexes on the same columns in different orders to serve different types of queries. For example, a table may need both key (last_name, first_name, birthday) and key (first_name, last_name, birthday).
B-Tree indexes are the most common index type. Unless stated otherwise, the indexes discussed in this article are B-Tree indexes.
1. Hash Index
A hash index is implemented on top of a hash table. It is effective only for queries that exactly match all the columns of the index. In MySQL, only the Memory engine explicitly supports hash indexes.
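A minimal sketch of a hash index on a Memory table is shown here. The session_lookup table, its columns, and the index name are hypothetical and chosen only for illustration.

```sql
-- Hypothetical lookup table stored in the Memory engine.
create table session_lookup (
  session_id char(32)     not null,
  user_id    int unsigned not null,
  key idx_session using hash (session_id)
) engine = memory;

-- Can use the hash index: exact match on the whole indexed column.
select user_id from session_lookup where session_id = 'abc123';

-- Cannot use the hash index: a prefix (range) condition needs ordered keys.
select user_id from session_lookup where session_id like 'abc%';
```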
2. Spatial Data Index (R-Tree)
The MyISAM engine supports spatial indexes, which can be used to store geographical data. Unlike B-Tree indexes, spatial indexes do not require queries to use a leftmost prefix.
3. Full-text index
A full-text index is a special type of index. It looks for keywords in the text rather than directly comparing values in the index. Full-text indexes match differently from other indexes: they work more like a search engine than like a simple where condition. You can create both a full-text index and a B-Tree index on the same column. Full-text indexes are used by match ... against operations, not by ordinary where conditions.
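The following sketch shows a full-text index and a match ... against query. The article table and its columns are hypothetical, and the example assumes MySQL 5.6 or later, where InnoDB supports full-text indexes; on older versions the table would need to use MyISAM.

```sql
-- Hypothetical table; names are assumptions for illustration.
create table article (
  id    int unsigned not null auto_increment primary key,
  title varchar(200) not null,
  body  text         not null,
  fulltext key ft_title_body (title, body)
) engine = innodb default charset = utf8;

-- The column list in match() has to be the same as in the full-text index.
select id, title
from article
where match(title, body) against('index optimization');
```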
An index can contain values of one or more columns. If an index contains multiple columns, it is generally called a composite index. At this time, the column order is very important, because MySQL can only efficiently use the leftmost prefix column of the index. Creating an index that contains two columns is very different from creating two indexes that only contain one column.
1.5 Advantages of indexes
The index allows MySQL to quickly find the data we need, but this is not the only function of the index.
The most common B-Tree index stores data in order, so MySQL can use it for order by and group by operations. Because the data is ordered, a B-Tree also stores related column values together. Finally, because the actual column values are stored in the index, some queries can obtain all their data from the index alone, without going back to the table. Based on these features, indexes have the following three advantages:
- The index greatly reduces the amount of data that the MySQL server needs to scan.
- Indexes can help the server avoid sorting and temporary tables.
- The index can change random I/O to sequential I/O.
In addition, some people use a "three-star system" to evaluate whether an index suits a query. An index earns one star if it places related records together, a second star if the order of the data in the index matches the order required by the query, and a third star if it contains all the columns that the query needs.
An index is not always the best tool, and more indexes are not always better. In general, an index is useful only when the benefit of helping the storage engine find records quickly outweighs the extra work the index causes.
For very small tables, a simple full table scan is more efficient in most cases, and there is no need to create an index. For medium to large tables, the benefits of indexes are very obvious.
II. High-performance index strategies
Creating and using indexes correctly is the basis for high-performance queries. We have already introduced various types of indexes and their advantages and disadvantages. Now let's take a look at how to truly take advantage of these indexes. The following sections help you understand how to use indexes efficiently.
2.1 Independent columns
We often see queries that use indexes improperly or that prevent MySQL from using existing indexes. If the columns in a query are not independent, MySQL will not use the index. An "independent column" means that the indexed column is not part of an expression and is not a function argument.
For example, the following SQL query statement cannot use the primary key index id:
select id,last_name,first_name,birthday from people where id+1=3;
It is easy to see that the where expression above is equivalent to where id = 2, but MySQL cannot resolve the expression automatically. We should develop the habit of simplifying where conditions and always placing the indexed column alone on one side of the comparison operator. Therefore, to use the primary key index, write the query as follows:
select id,last_name,first_name,birthday from people where id=2;
The following is another common mistake:
select ... from ... where to_days(current_date()) - to_days(date_col) <= 10;
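One possible rewrite keeps date_col alone on one side of the comparison so that an index on it can be used. The sketch assumes a hypothetical orders table in which date_col is an indexed date column.

```sql
-- Hypothetical table "orders"; date_col is the column name used in the text.
select id
from orders
where date_col >= date_sub(current_date(), interval 10 day);
```

For a date column this condition selects the same rows as the to_days() version, because both keep the rows that are at most 10 days older than today.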
2.2 Prefix indexes and index selectivity
Sometimes we need to index long character columns, which makes the index large and slow. The usual solution is to index only the first few characters of the column, which saves a lot of index space and improves index efficiency. However, it also reduces the selectivity of the index. Index selectivity is the ratio of the number of distinct index values (also known as the cardinality) to the total number of records in the table. The value is greater than 0 and at most 1, and the higher it is, the more rows the index can filter out.
The selectivity of the unique index is 1, which is the best index selectivity and the best performance.
Generally, a prefix of a column is selective enough to give good query performance. For Blob, Text, or very long Varchar columns, a prefix index must be used, that is, only the first few characters are indexed, because MySQL does not allow these columns to be indexed at their full length.
You can add a prefix index as follows:
alter table user add key (address(8)); -- only index the first 8 characters of the address field
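Before settling on a prefix length, it can help to compare the selectivity of several candidate lengths with the selectivity of the full column. The following sketch does this for the address column of the user table; the candidate lengths 4, 8, and 12 are arbitrary choices for illustration.

```sql
select count(distinct address) / count(*)           as sel_full,
       count(distinct left(address, 4)) / count(*)  as sel_4,
       count(distinct left(address, 8)) / count(*)  as sel_8,
       count(distinct left(address, 12)) / count(*) as sel_12
from user;
```

A reasonable choice is the shortest prefix whose selectivity is close to sel_full.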
A prefix index is an effective way to make an index smaller and faster, but its disadvantage is that MySQL cannot use a prefix index for order by and group by operations, nor can it use a prefix index as a covering index.
Sometimes suffix indexes are also useful, for example to find all the email addresses of a given domain. MySQL does not support suffix indexes natively, but we can store the reversed string, create a prefix index on it, and maintain the reversed value with triggers.
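The reversed-string approach can be sketched roughly as follows. The example assumes that the user table has a non-null email column; the email_rev column, the index name, the trigger names, and the domain example.com are all hypothetical.

```sql
-- Add a column holding the reversed address and a prefix index on it.
alter table user
  add column email_rev varchar(255) not null default '',
  add key idx_email_rev (email_rev(20));

-- Fill the new column for rows that already exist.
update user set email_rev = reverse(email);

-- Keep the reversed value in sync on insert and update.
create trigger user_email_rev_bi before insert on user
for each row set new.email_rev = reverse(new.email);

create trigger user_email_rev_bu before update on user
for each row set new.email_rev = reverse(new.email);

-- A suffix search on email becomes a prefix search on email_rev.
select email
from user
where email_rev like concat(reverse('@example.com'), '%');
```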
2.3 Multi-column indexes
A multi-column index is an index that contains multiple columns. Pay attention to the order of multiple columns. A multi-column index is also called a composite index. For example, the previous key (last_name, first_name, birthday) is a composite index.
A common mistake is to create a separate index for each column, or create multiple column indexes in the wrong order.
First, let's look at the first question: create an independent index for each column. From show create table, it is easy to see this situation:
create table t ( c1 int, c2 int, c3 int, key(c1), key(c2), key(c3));
This flawed index strategy is generally the result of vague advice from some experts, such as "add the columns in the where condition to an index".
Creating independent single-column indexes on multiple columns does not improve MySQL query performance in most cases. MySQL 5.0 and later versions introduce an index merge strategy that can, to some extent, use multiple single-column indexes on a table to locate the required rows. However, this is much less efficient than a composite index.
For example, the film_actor table has a single column index on the film_id and actor_id fields. The SQL query statement is as follows:
select film_id,actor_id from film_actor where actor_id=1 or film_id=1;
In MySQL 5.0 and later, the query can scan both single-column indexes at the same time and merge the results. The algorithm has three variants: union for or conditions, intersection for and conditions, and unions of intersections for combinations of the two.
The preceding query uses the union of two index scans. You can see this in the Extra column of the explain output, which contains the word union:
explain select film_id,actor_id from film_actor where actor_id=1 or film_id=1\G
The index merge strategy is sometimes the result of optimization, but more often it indicates that the indexes on the table are poorly designed:
- When the server intersects multiple indexes (usually with multiple and conditions), it usually means that a single composite index containing all the related columns is needed, rather than multiple independent single-column indexes.
- When the server unions multiple indexes (usually with multiple or conditions), the caching, sorting, and merging operations of the algorithm can consume a large amount of CPU and memory. In this case, you can rewrite the query as a union of two queries:
select film_id,actor_id from film_actor where actor_id=1
union all
select film_id,actor_id from film_actor where film_id=1 and actor_id<>1;
If the explain output shows an index union, check the query and the table structure to see whether they are already optimal, or whether the query can be split into a union of several queries.
2.4 Selecting an appropriate index column order
One of the most confusing issues is the order of columns in a composite index. The correct column order depends on the queries that use the index, and you also need to consider how best to satisfy sorting and grouping.
The order of index columns means that the index is sorted first by the leftmost column, then by the second column, then by the third, and so on. Therefore, the index can be scanned in ascending or descending order to satisfy order by, group by, and distinct clauses that exactly match the column order.
When sorting and grouping do not need to be considered, it is usually good to put the most selective column first (leftmost) in the composite index. In this case, the index is used only to optimize where lookups. However, we may also need to adjust the column order based on the most frequently run queries, so that the index is most selective for those queries.
Take the following query as an example:
select * from payment where staff_id=2 and customer_id=500;
Should you create key (staff_id, customer_id) or key (customer_id, staff_id)? You can run some queries to check the distribution of values in the table and determine which column is more selective. For example, the following query gives an estimate:
select sum(staff_id=2), sum(customer_id=500) from payment\G
Suppose the result shows that sum(staff_id = 2) is 7000 and sum(customer_id = 500) is 60. For the preceding query, customer_id is then more selective and should come first in the index, that is, key (customer_id, staff_id).
However, note that the result depends on the specific values chosen. An index order tuned this way may not suit queries with other condition values, and the overall performance of the server may even get worse.
If the "worst query" was extracted from the report of a tool such as pt-query-digest, choosing the index order this way is very effective. If there is no specific query to tune for, it is best to follow the rule of thumb, because it considers overall cardinality and selectivity rather than one query with specific condition values. The following is a rule-of-thumb way to measure selectivity:
select count(distinct staff_id)/count(*) as staff_id_selectivity, count(distinct customer_id)/count(*) as customer_id_selectivity from payment\G
Assume that staff_id_selectivity is 0.001 and customer_id_selectivity is 0.086. The higher the value, the higher the selectivity, so customer_id is more selective and should be the first column of the index:
alter table payment add key(customer_id, staff_id);
Although the rules of thumb on selectivity and cardinality are worth studying, do not forget the influence of order by, group by, and other factors, which may have a significant impact on query performance.
2.5 Clustered index
A clustered index is not a separate index type but a way of storing data. The details depend on the implementation, but InnoDB's clustered index actually stores the B-Tree index and the data rows in the same structure.
When a table has a clustered index, its data rows are actually stored in the index's leaf pages. That is, the leaf pages contain all the data of each row, while the non-leaf node pages contain only the indexed columns.
Because indexes are implemented by the storage engine, not all storage engines support clustered indexes. This section focuses on InnoDB, but the content applies to any storage engine that supports clustered indexes.
InnoDB clusters data by the primary key. If no primary key is defined, InnoDB uses a unique non-null index instead. If there is no such index either, InnoDB implicitly defines a hidden primary key and clusters on it.
Advantages of clustered index:
- You can save related data together.
- Faster data access. A clustered index stores the index and the data in the same B-Tree, so retrieving rows through the clustered index is usually faster than through a non-clustered index.
- Queries that use covering index scans can use the primary key value stored in the leaf node directly.
If you take full advantage of these properties when designing tables and queries, you can greatly improve performance.
Disadvantages of clustered index:
- Clustered indexes mainly improve the performance of I/O-intensive applications. If all the data fits in memory, the access order matters less and clustered indexes offer little advantage.
- The insert speed depends heavily on the insert order. Inserting rows in primary key order is the fastest way to load data into an InnoDB table. If the data was not loaded in primary key order, it is a good idea to reorganize the table with the optimize table command after the load completes.
- Updating clustered index columns is costly, because InnoDB is forced to move each updated row to a new location.
- When a new row is inserted into a table with a clustered index, or a primary key is updated so that a row must move, a "page split" may occur. Page splits can cause the table to occupy more disk space.
In InnoDB, the clustered index "is" the table, so it does not need separate row storage as MyISAM does. Each leaf node of the clustered index contains the primary key value, the transaction ID, the rollback pointer used for transactions and MVCC (multiversion concurrency control), and all the remaining columns.
InnoDB's secondary indexes (non-clustered indexes) are very different from the clustered index. The leaf node of a secondary index does not store a "row pointer" but the primary key value. Therefore, looking up data through a secondary index takes two index lookups: the storage engine first finds the leaf node of the secondary index to obtain the primary key value, and then uses that value to find the data row in the clustered index.
The simplest way to ensure that rows are inserted in order is to define the primary key as auto_increment. When using InnoDB, we should insert data in primary key order as much as possible and use monotonically increasing primary key values for new rows.
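As an illustration of this advice, the following sketch uses a surrogate auto_increment primary key for clustering and keeps a random business identifier in a unique secondary index. The orders_demo table and all of its columns are hypothetical.

```sql
-- Hypothetical table: sequential surrogate key for clustering,
-- random natural key enforced by a unique secondary index.
create table orders_demo (
  id       bigint unsigned not null auto_increment primary key,  -- new rows are appended in key order
  order_no char(36)        not null,                             -- e.g. a UUID; random values make a poor clustering key
  amount   decimal(10,2)   not null default 0,
  unique key uk_order_no (order_no)
) engine = innodb;
```

Lookups by order_no go through the secondary index first and then through the clustered index, which is the two-step lookup described above.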
For high-concurrency workloads, inserting in primary key order in InnoDB can cause significant contention on the primary key values. This problem can be serious; solutions are widely documented and are not covered here.
2.6 Covering indexes
Generally, you can create an appropriate index based on the where condition of the query, but this is only one aspect of index optimization. The whole query should be considered for an excellent index design, not just the where condition.
Indexes are indeed an efficient way to find data, but MySQL can also use an index to retrieve column values directly, so that it does not have to read the data rows at all. If the leaf nodes of the index already contain all the data to be queried, why go back to the table?
If an index contains (or covers) the values of all the fields (columns) that a query needs, we call it a "covering index".
Covering indexes are very useful and can greatly improve performance. Consider the benefits of scanning only the index without going back to the table for the data rows:
- Index entries are usually much smaller than full data rows, so if MySQL reads only the index, it accesses far less data. Covering indexes also help I/O-intensive applications, because an index is smaller than the data and fits in memory more easily.
- Because indexes are stored in order of column values (at least within a single page), I/O-intensive range queries need far less I/O than reading each row from a random location on disk.
- Because of InnoDB's clustered index, covering indexes are particularly useful for InnoDB tables. An InnoDB secondary index (non-clustered index) stores the row's primary key value in its leaf nodes, so if the secondary index covers the query, the second lookup in the primary key index can be avoided.
In all these scenarios, completing the whole query within the index generally costs much less than going back to the table.
B-Tree indexes can be covering indexes, but hash indexes, spatial indexes, and full-text indexes cannot.
When a query is covered by an index (also called an index-covered query), "Using index" appears in the Extra column of the explain output. For example:
explain select id from people;
explain select last_name from people;
explain select id,first_name from people;
explain select last_name,first_name,birthday from people;
explain select last_name,first_name,birthday from people where last_name='Allen';
The people table was created in the preceding section. It has a primary key index on id and a multi-column composite index key (last_name, first_name, birthday); together these two indexes cover the values of four fields. If all the fields that an SQL query needs are contained in an index, the query can be called an index-covered query, and that index is a covering index for the query. For example, key (last_name, first_name, birthday) is a covering index for select last_name, first_name from people.
2.7 Sorting by index scan
MySQL can generate ordered result sets in two ways: with a sort operation for order by, or by scanning an index in order (index sorting). In fact the two do not conflict: an order by can be satisfied by an index scan.
To be exact, there are two sorting methods for the result set in MySQL:
1. Index sorting
Index sorting means using the order of the field values in the index to order the result set. If the type column in the explain output is index, MySQL is scanning the index in order. For example:
explain select id from people;
explain select id,last_name from people order by id desc;
explain select last_name from people;
explain select last_name from people order by last_name;
explain select last_name from people order by last_name desc;
Note: even if the type value in the explain output is not index, the query may still use index sorting. For example:
explain select id from people where id >3;
explain select id,last_name from people where id >3 order by id desc;
2. File Sorting
File Sorting refers to sorting the queried result sets through additional operations and then returning them to the client. This sorting method does not use index sorting, which is less efficient. Although MySQL calls it filesort for File Sorting, it does not necessarily use disk files.
If the Extra column in the explain output contains the string "Using filesort", the query uses file sorting. In this case, consider optimizing the index or the SQL query. For example:
explain select id,last_name,first_name from people where id > 3 order by last_name;
MySQL can use the same index both to find rows and to sort them. If possible, design the index so that it serves both purposes.
MySQL can use an index to sort only when the index contains both the fields in the where condition and the fields in order by, and the column order of the index matches the order of the fields in where + order by (note: order by comes after the where clause).
Now, we can optimize the preceding SQL statement so that it can use index sorting.
First, add a multi-column index.
alter table people add key(id,last_name);
You will find that adding key (id, last_name) alone still does not enable index sorting. The where + order by combination must also satisfy the leftmost prefix requirement of the index, and where id > 3 is a range condition, so order by last_name cannot use key (id, last_name).
Next, change order by last_name in the SQL statement to order by id, last_name.
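Applied to the earlier statement, the change might look like the following sketch. Whether "Using filesort" actually disappears should be confirmed in the Extra column, since the optimizer's choice can depend on the MySQL version and the data.

```sql
explain select id, last_name, first_name
from people
where id > 3
order by id, last_name;
```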
Note: If the SQL query statement is an associated query for multiple tables, you can use index sorting only when all the order by sorting fields come from the first table.
The following lists several cases where index sorting cannot be used:
1. If order by sorts on multiple fields whose directions are inconsistent, that is, some fields are asc (ascending, the default) and others are desc (descending). For example:
explain select * from people where last_name='Allen' order by first_name asc, birthday desc;
2. If order by contains a field that is not in the index column. For example:
explain select * from people where last_name='Allen' order by first_name, gender;
3. If the condition on the first column of the index is a range condition. For example:
explain select * from people where last_name like 'A%' order by first_name;
In this case, you can optimize the SQL statement as follows:
explain select * from people where last_name like 'A%' order by last_name,first_name;
2.8 Redundant and duplicate indexes
MySQL allows you to create multiple indexes on the same column (only the index names differ). MySQL has to maintain each duplicate index separately, and the optimizer has to consider each of them when optimizing queries, so duplicate indexes hurt performance.
Duplicate indexes are indexes of the same type created on the same columns in the same order. They should be avoided and removed as soon as they are found.
Redundant indexes are different from duplicate indexes. If index key (A, B) exists and you then create key (A), the latter is a redundant index, because index (A) is just a prefix of the previous index: index (A, B) can also be used as index (A). However, if you create index (B, A), it is not a redundant index.
Redundant indexes usually appear when a new index is added to a table. For example, someone may add a new index (A, B) instead of extending the existing index (A). Another case is extending a secondary index (A) to (A, ID), where ID is the primary key. For InnoDB, the secondary index already contains the primary key column, so this is also redundant.
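One way to look for duplicate and redundant indexes by hand is to list the column sequence of every index in the current schema. The following sketch uses the standard information_schema.statistics table; it only produces the list, and spotting identical or prefix column lists is left to the reader.

```sql
select table_name,
       index_name,
       group_concat(column_name order by seq_in_index) as index_columns
from information_schema.statistics
where table_schema = database()
group by table_name, index_name
order by table_name, index_columns;
```

Two indexes on the same table with identical index_columns are duplicate candidates, and an index whose column list is a leading prefix of another is a redundancy candidate.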
In most cases, redundant indexes are not needed; extend existing indexes where possible instead of creating new ones. However, redundant indexes are sometimes necessary for performance reasons, because extending an existing index makes it larger, which can slow down other queries that use it.
Be especially careful when you extend an index. Because the leaf nodes of a secondary index contain the primary key value, an index on column (A) is equivalent to an index on (A, ID). If a query such as where A = 5 order by ID exists, index (A) is very useful. However, if you change index (A) to index (A, B), it actually becomes index (A, B, ID), and the order by in that query can no longer use index sorting and falls back to file sorting.
We recommend that you use the pt-upgrade tool in the Percona toolkit to carefully check the expected index changes.
Therefore, the original index can be extended only when you are clear about all the queries related to an index. Otherwise, creating a new index (making the original index a redundant index of the new index) is the safest way.
2.9 Unused indexes
There may be indexes on the MySQL server that are never used. Such indexes are pure overhead, and we recommend deleting them. Note, however, that a unique index may never be used by queries and still be needed for its uniqueness constraint, which prevents duplicate data.
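As one possible starting point for finding such indexes, the sys schema that ships with MySQL 5.7 and later provides the schema_unused_indexes view. The sketch assumes that the sys schema is installed and that performance_schema is enabled; the view only reflects index usage since the server was last started, so the result should be checked over a representative period before anything is dropped.

```sql
select object_schema, object_name, index_name
from sys.schema_unused_indexes
where object_schema = database();
```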