Understanding MySQL Indexes and Optimization: A Summary



Preface: indexes have a crucial impact on query speed, and understanding them is the starting point for optimizing database performance. Consider a table with 10^6 records, where each 4 KB DBMS page holds 100 records. Without an index, a query must scan the entire table. In the worst case, none of the data pages are in memory, so 10^4 pages must be read, and if those pages are scattered randomly across the disk, that means 10^4 random I/Os. At 10 ms per I/O (ignoring transfer time), the scan takes 100 s in total (in practice it is usually much better than this). With a B-Tree index, only log_100(10^6) = 3 page reads are needed, or 30 ms in the worst case. That is the power of indexes. Whenever your application runs SQL queries slowly, consider whether an index would help. Now to the main topic.
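The preface's back-of-the-envelope arithmetic can be reproduced with a short Python sketch. It uses exactly the assumptions stated above (10^6 rows, 100 rows per page, a B-Tree fan-out of 100, 10 ms per random I/O, nothing cached); the function names are illustrative only.

```python
def full_scan_cost(rows: int, rows_per_page: int, io_ms: int) -> tuple[int, int]:
    """Pages read and total time (ms) for a full scan; every page is one random I/O."""
    pages = -(-rows // rows_per_page)  # ceiling division
    return pages, pages * io_ms


def btree_lookup_cost(rows: int, fanout: int, io_ms: int) -> tuple[int, int]:
    """Pages read and total time (ms) for one B-Tree lookup: one page per tree level."""
    # Tree height = smallest h with fanout**h >= rows (integer loop avoids float log errors).
    levels, capacity = 0, 1
    while capacity < rows:
        capacity *= fanout
        levels += 1
    return levels, levels * io_ms


if __name__ == "__main__":
    pages, scan_ms = full_scan_cost(10**6, 100, 10)
    levels, lookup_ms = btree_lookup_cost(10**6, 100, 10)
    print(f"full scan: {pages} pages, {scan_ms // 1000} s")
    print(f"B-Tree:    {levels} pages, {lookup_ms} ms")
```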

Chapter 2 Index and Optimization

1. Choosing data types for indexed columns

MySQL supports many data types, and choosing the right type to store your data has a big impact on performance. The following general guidelines apply:

(1) Smaller is usually better: smaller data types generally take less space on disk, in memory, and in the CPU cache, so they are faster to process.

(2) Simpler is better: integers are cheaper to process than character data, because comparing strings is more complex. In MySQL, use the built-in date and time types rather than strings to store times, and use an integer type to store IP addresses.

(3) Avoid NULL where possible: declare columns NOT NULL unless you actually need to store NULL. Nullable columns are harder for MySQL to optimize, because they complicate indexes, index statistics, and value comparisons. Consider using 0, a special sentinel value, or an empty string instead.
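Guideline (2) recommends storing IP addresses as integers. The Python sketch below shows the conversion with the standard ipaddress module: an IPv4 address becomes an unsigned 32-bit value that fits in an INT UNSIGNED column. On the server side, MySQL's built-in INET_ATON() and INET_NTOA() functions perform the same conversion; the helper names here are hypothetical.

```python
import ipaddress


def ip_to_int(addr: str) -> int:
    """Dotted-quad IPv4 string -> unsigned 32-bit integer (same value as INET_ATON)."""
    return int(ipaddress.IPv4Address(addr))


def int_to_ip(value: int) -> str:
    """Unsigned 32-bit integer -> dotted-quad IPv4 string (same as INET_NTOA)."""
    return str(ipaddress.IPv4Address(value))


if __name__ == "__main__":
    n = ip_to_int("192.168.1.10")
    print(n)             # 3232235786
    print(int_to_ip(n))  # 192.168.1.10
```

An INT UNSIGNED column needs 4 bytes per address, compared with up to 15 characters for the dotted-quad string, and integer comparisons are cheaper than string comparisons.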

1.1 Choosing an identifier

It is important to select an appropriate identifier. You should consider not only the storage type, but also how MySQL performs computation and comparison. Once the data type is selected, make sure that all related tables use the same data type.

(1) Integers: usually the best choice for identifiers, because they are fast to process and can be declared AUTO_INCREMENT.

(2) Strings: avoid strings as identifiers whenever possible. They take more space and are slower to process. Moreover, string identifiers are often random, so new values land at random positions in the index, which causes page splits, random disk access, and, for storage engines that use clustered indexes, clustered index splits.

2. Indexing

Indexing is the most important optimization tool in any DBMS. With a small amount of data, the lack of a suitable index has little impact, but as the data volume grows, performance drops sharply.

When an index covers multiple columns (a composite index), the column order matters a great deal: MySQL can search efficiently only on a leftmost prefix of the index. For example:
Assume the composite index it1c1c2 (c1, c2) exists on table t1. The query select * from t1 where c1 = 1 and c2 = 2 can use this index, and so can select * from t1 where c1 = 1. However, select * from t1 where c2 = 2 cannot, because it does not constrain the leading column of the index: to search on c2 through this index, c1 must be fixed to a value.
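A composite index can be pictured as a list of (c1, c2, row id) entries kept in sorted order. The Python sketch below (with made-up data) shows why a leftmost prefix is required: entries sharing a c1 value, or a (c1, c2) pair, are contiguous and can be located by binary search, whereas entries with a given c2 are scattered and can only be found by reading every entry. It models the idea only, not MySQL's B-Tree code.

```python
from bisect import bisect_left, bisect_right

# Hypothetical entries of the composite index it1c1c2 (c1, c2): (c1, c2, row_id), sorted.
INDEX = sorted([(2, 7, 104), (1, 5, 102), (3, 2, 105), (1, 2, 101), (2, 2, 103)])


def seek_prefix(index, prefix):
    """Binary-search for entries whose leading columns equal `prefix`.

    This works only for a leftmost prefix, because such entries are contiguous
    in sorted order.
    """
    lo = bisect_left(index, prefix)
    # Every entry that starts with `prefix` sorts below prefix + (infinity,).
    hi = bisect_right(index, prefix + (float("inf"),))
    return index[lo:hi]


def scan_for_c2(index, c2):
    """With no value for c1, matching c2 entries are scattered: read them all."""
    return [entry for entry in index if entry[1] == c2]


if __name__ == "__main__":
    print(seek_prefix(INDEX, (1,)))    # where c1 = 1
    print(seek_prefix(INDEX, (1, 2)))  # where c1 = 1 and c2 = 2
    print(scan_for_c2(INDEX, 2))       # where c2 = 2: every entry is examined
```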

2.1. Index types

Indexes are implemented in the storage engine, not in the server layer. As a result, indexes are not necessarily the same across storage engines, and not every storage engine supports every index type.

2.1.1. B-Tree indexes

Assume that the following table is used:

CREATE TABLE People (
  last_name  varchar(50)    not null,
  first_name varchar(50)    not null,
  dob        date           not null,
  gender     enum('m', 'f') not null,
  key(last_name, first_name, dob)
);

The index contains the last_name, first_name, and dob columns of each row in the table. Its structure is roughly as follows:

 

The index stores values sorted by the index columns, in column order. A B-Tree index can serve lookups by the full key, by a key range, or by a key prefix; in every case, the query must use a leftmost prefix of the index.

(1) Match the full value: specify a value for every column in the index. For example, this index can help you find a person named Cuba Allen who was born on January 1.

(2) Match a leftmost prefix: the index can find everyone whose last name is Allen, using only the first column of the index.

(3) Match a column prefix: the index can find everyone whose last name starts with J, again using only the first column of the index.

(4) Match a range of values: the index can find everyone whose last name is between Allen and Barrymore, again using only the first column.

(5) Match one part exactly and match a range on another part: the index can find everyone whose last name is Allen and whose first name starts with K.

(6) Index-only queries: if every column the query needs is in the index, MySQL does not need to read the rows themselves.
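Patterns (2) through (5) all reduce to "binary-search to a starting point, then read entries in order until they stop matching". The Python sketch below models the People index as a sorted list of (last_name, first_name, dob) tuples; the rows and dates are invented for illustration, and real B-Tree traversal is more involved.

```python
from bisect import bisect_left
from datetime import date

# Invented entries of key(last_name, first_name, dob), kept in index order.
PEOPLE_IDX = sorted([
    ("Allen", "Cuba", date(1960, 1, 1)),
    ("Allen", "Kim", date(1930, 7, 12)),
    ("Astaire", "Angelina", date(1980, 3, 4)),
    ("Barrymore", "Julia", date(2000, 5, 16)),
    ("Basinger", "Viven", date(1976, 12, 8)),
    ("Jackson", "Mary", date(1965, 4, 23)),
])


def walk(index, start_key, keep_going):
    """Seek to the first entry >= start_key, then read forward while keep_going(entry)."""
    i = bisect_left(index, start_key)
    out = []
    while i < len(index) and keep_going(index[i]):
        out.append(index[i])
        i += 1
    return out


def last_name_is(index, last):                        # pattern (2)
    return walk(index, (last,), lambda e: e[0] == last)


def last_name_starts_with(index, prefix):             # pattern (3), like LIKE 'J%'
    return walk(index, (prefix,), lambda e: e[0].startswith(prefix))


def last_name_between(index, lo, hi):                 # pattern (4)
    return walk(index, (lo,), lambda e: e[0] <= hi)


def exact_last_prefix_first(index, last, first_prefix):  # pattern (5)
    return walk(index, (last, first_prefix),
                lambda e: e[0] == last and e[1].startswith(first_prefix))
```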

Because B-Tree nodes are stored in sorted order, an index can be used both to look up values and to satisfy ORDER BY. B-Tree indexes do, however, have the following limitations:

(1) A lookup must start at the leftmost column of the index, as mentioned several times already. For example, this index cannot help you find everyone born on a particular day.

(2) You cannot skip a column of the index. For example, the index cannot help you find everyone whose last name is Smith and who was born on a particular day; without a condition on first_name, only the first column of the index can be used.

(3) The storage engine cannot use any index column to the right of the first range condition. For example, for WHERE last_name = 'Smith' AND first_name LIKE 'J%' AND dob = '2017-12-23', only the first two columns of the index are used, because LIKE 'J%' is a range condition.
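Limitation (3) shows up in the same kind of model. In the Python sketch below (invented rows), the index narrows WHERE last_name = 'Smith' AND first_name LIKE 'J%' to a contiguous run of entries, but within that run the dob values are no longer sorted, so the dob condition has to be checked entry by entry instead of being used to seek.

```python
from bisect import bisect_left
from datetime import date

# Invented index entries (last_name, first_name, dob), in index order.
IDX = sorted([
    ("Smith", "Jane", date(2017, 12, 23)),
    ("Smith", "Jim", date(1990, 1, 1)),
    ("Smith", "John", date(2017, 12, 23)),
    ("Smith", "Kate", date(2017, 12, 23)),
])


def find(index, last, first_prefix, dob):
    """WHERE last_name = last AND first_name LIKE 'first_prefix%' AND dob = dob.

    Columns 1 and 2 decide where the scan starts and stops; dob can only filter.
    Returns (matching entries, number of index entries examined).
    """
    i = bisect_left(index, (last, first_prefix))
    matches, examined = [], 0
    while i < len(index) and index[i][0] == last and index[i][1].startswith(first_prefix):
        examined += 1
        if index[i][2] == dob:  # per-entry check: dob is unsorted inside the range
            matches.append(index[i])
        i += 1
    return matches, examined
```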

2.1.2. Hash Index

In MySQL, only the Memory storage engine explicitly supports hash indexes. They are the default index type for Memory tables, although Memory tables can also use B-Tree indexes. The Memory engine supports non-unique hash indexes, which is unusual in the database world: if several values have the same hash code, the index stores their row pointers in a linked list in the same hash table entry.

Assume that the following table is created:

CREATE TABLE testhash (
  fname VARCHAR(50) NOT NULL,
  lname VARCHAR(50) NOT NULL,
  KEY USING HASH(fname)
) ENGINE=MEMORY;

The data contained is as follows:

Assume the index uses a hash function f() that produces the following values:

f('Arjen') = 2323
f('Baron') = 7437
f('Peter') = 8784
f('Vadim') = 2458

The index structure is roughly as follows:

The slots are in order, but the records they point to are not. When you execute

mysql> SELECT lname FROM testhash WHERE fname='Peter';

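MySQL computes the hash value of 'Peter' and uses it to look up the row pointer in the index. Because f('Peter') = 8784, MySQL finds 8784 in the index and follows its pointer to record 3. It then compares the fname stored in record 3 with 'Peter' to make sure it is the right row, since different values can share a hash code.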
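The lookup just described can be sketched in Python as a map from hash code to a chain of row numbers. The hash codes are the f() values from the example; the assumption that rows 1 to 4 hold Arjen, Baron, Peter, and Vadim in that order matches "record 3" for Peter but is otherwise illustrative, and lname is omitted. This is a conceptual model, not the Memory engine's implementation.

```python
from collections import defaultdict

# Hash codes from the example; a real engine computes them with its own hash function.
F = {"Arjen": 2323, "Baron": 7437, "Peter": 8784, "Vadim": 2458}


class HashIndex:
    """Non-unique hash index: hash code -> chain (list) of row numbers."""

    def __init__(self, hash_fn):
        self.hash_fn = hash_fn
        self.slots = defaultdict(list)  # values with the same hash code share one chain

    def insert(self, value, row_no):
        self.slots[self.hash_fn(value)].append(row_no)

    def lookup(self, value, rows):
        """Equality lookup. Each candidate row is read and compared, because the
        index stores only hash codes and different values may collide."""
        chain = self.slots.get(self.hash_fn(value), [])
        return [row_no for row_no in chain if rows[row_no] == value]


if __name__ == "__main__":
    rows = {1: "Arjen", 2: "Baron", 3: "Peter", 4: "Vadim"}  # row number -> fname
    idx = HashIndex(F.__getitem__)
    for row_no, fname in rows.items():
        idx.insert(fname, row_no)
    print(idx.lookup("Peter", rows))  # [3]
    print(sorted(idx.slots))          # [2323, 2458, 7437, 8784]
```

Passing a degenerate hash function such as lambda v: 0 puts every row into one chain; lookups still return the right row, only more slowly. Nothing in the slots preserves fname order, which is why a hash index cannot help with sorting or range conditions.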

Because the index stores only short hash values, it is very compact, and the size of a hash value does not depend on the column's data type: a hash index on a TINYINT column is as large as one on a long string column.

Hash indexes have the following restrictions:

(1) Because the index contains only hash codes and row pointers, MySQL cannot use the index alone to avoid reading the rows. However, accessing rows in memory is very fast, so this usually has little effect on performance.

(2) Hash indexes cannot be used for sorting.

(3) Hash indexes do not support partial key matching, because the hash is computed over the entire indexed value.

(4) Hash indexes support only equality comparisons, such as =, IN(), and <=>. They cannot speed up a range condition such as WHERE price > 100.

2.1.3. Spatial (R-Tree) Indexes

MyISAM supports spatial indexes, which are used mainly for geospatial data types such as GEOMETRY (InnoDB also supports them as of MySQL 5.7).

2.1.4 Full-text index

A full-text index is a special index type, originally specific to MyISAM (InnoDB supports it as of MySQL 5.6), used mainly for full-text search.

3. High-Performance Indexing Strategies

3.1. Clustered Indexes

A clustered index stores rows with adjacent key values physically close together. For this reason it is not a good idea to cluster on string keys, especially random strings, which force the system to move a lot of data around. A table can have only one clustered index. Because indexes are implemented by storage engines, not every engine supports clustered indexes; at the time of writing, only solidDB and InnoDB do.

The clustered index structure is roughly as follows:

 

Note: the leaf pages contain complete rows, while the inner (non-leaf) pages contain only the indexed column (an integer column in this example). Some DBMSs let users choose which index to cluster on, but so far no MySQL storage engine allows this. InnoDB clusters the table on the primary key. If you do not define a primary key, InnoDB uses a unique, non-null index instead. If no such index exists, InnoDB defines a hidden primary key and clusters on it. In InnoDB, the clustered index is where the table's data actually lives, and it is the basis for all secondary indexes.

3.1.1 Comparing the data layouts of InnoDB and MyISAM

To better understand clustered and non-clustered indexes, or primary and secondary indexes (MyISAM does not support clustered indexes), we can compare the data layouts of InnoDB and MyISAM using the following table:

CREATE TABLE layout_test (
  col1 int NOT NULL,
  col2 int NOT NULL,
  PRIMARY KEY(col1),
  KEY(col2)
);

Assume the primary key values range from 1 to 10,000, were inserted in random order, and the table was then defragmented with OPTIMIZE TABLE. col2 is assigned a random value from 1 to …, so it contains many duplicate values.

(1) Data Layout of MyISAM

The layout is very simple. MyISAM stores data on the disk in the order of insertion, as shown below:

Note: the row numbers on the left start at 0. Because the rows are fixed-size, MyISAM can locate any row by seeking the corresponding number of bytes from the beginning of the table.

The index structure of the established primary key is roughly as follows:

Note: MyISAM does not support clustered indexes. Each leaf node in the index contains only a row number alongside the key value, and the leaf nodes are stored in col1 order.
Let's take a look at the index structure of col2:

In fact, in MyISAM the primary key is no different from any other index: it is simply a unique, non-null index named PRIMARY.

(2) InnoDB Data Layout

InnoDB stores data as a clustered index, so its layout is very different. It stores the example table as follows:

Note: each leaf node in the clustered index contains the primary key value, the transaction ID and rollback pointer (used for transactions and MVCC), and the remaining columns (here, col2).

InnoDB secondary indexes also differ greatly from MyISAM's indexes: their leaves contain primary key values rather than row pointers. This reduces the cost of maintaining secondary indexes when rows move or data pages split, because InnoDB does not need to update row pointers in the secondary indexes. The structure is roughly as follows:
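The two lookup paths can be contrasted in a small Python model (invented (col1, col2) rows; sorted lists stand in for B-Trees). In the MyISAM-style layout, every index maps a key to a row number in the data file, so a col2 lookup is one index search plus a direct row access. In the InnoDB-style layout, the rows live in the clustered index ordered by col1 and the col2 index stores primary key values, so each match needs a second search in the clustered index.

```python
from bisect import bisect_left

ROWS = [(99, 8), (12, 56), (3000, 62), (18, 8)]  # invented (col1, col2) rows, insertion order


def seek(index, key):
    """Return all (key, payload) entries of a sorted index whose key equals `key`."""
    i = bisect_left(index, (key,))
    out = []
    while i < len(index) and index[i][0] == key:
        out.append(index[i])
        i += 1
    return out


# MyISAM-style: a heap of rows plus indexes that store row numbers.
myisam_data = list(ROWS)
myisam_col2 = sorted((col2, row_no) for row_no, (_, col2) in enumerate(myisam_data))

# InnoDB-style: the clustered index holds whole rows in primary key (col1) order,
# and the secondary index on col2 stores (col2, primary key) pairs.
innodb_clustered = sorted(ROWS)
innodb_col2 = sorted((col2, col1) for col1, col2 in ROWS)


def myisam_find_by_col2(value):
    # One index search, then jump straight to each row by its row number.
    return [myisam_data[row_no] for _, row_no in seek(myisam_col2, value)]


def innodb_find_by_col2(value):
    # The secondary index yields primary keys; each needs a clustered index search.
    return [row for _, pk in seek(innodb_col2, value) for row in seek(innodb_clustered, pk)]
```

This is the trade-off described above: the InnoDB-style secondary entries stay valid when a row moves to another page, because they record the primary key rather than a physical location, at the cost of the extra clustered index search.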

Comparison of clustered indexes and non-clustered index tables:

 

3.1.2 Inserting rows in primary key order (InnoDB)

If you use InnoDB and have no data that calls for a particular clustering order, a good practice is to use a surrogate key, that is, a primary key whose value is independent of your application data. The simplest approach is an AUTO_INCREMENT column, which ensures rows are inserted in order and improves the performance of joins on the primary key. Avoid random clustered keys whenever possible; a string primary key, for example, is a poor choice because it makes inserts land at random positions.
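The effect of insertion order can be illustrated with a deliberately simplified model of a clustered index's leaf pages, sketched in Python below. The rules are assumptions, not InnoDB's actual algorithm: each page holds page_capacity keys; a key larger than everything in a full last page starts a new page; any other insert into a full page splits it in half. Under these rules, ascending keys (as with AUTO_INCREMENT) fill pages completely with no splits, while the same keys in random order cause splits and leave many pages partly empty.

```python
import random
from bisect import bisect_right, insort


def load(keys, page_capacity=4):
    """Insert keys into a simplified clustered-index leaf level.

    Returns (pages, splits), where pages is a list of sorted key lists.
    """
    pages = [[]]
    splits = 0
    for key in keys:
        # Target page: the last page whose first key is <= key (page 0 otherwise).
        i = bisect_right([p[0] for p in pages[1:]], key)
        page = pages[i]
        if len(page) < page_capacity:
            insort(page, key)
        elif i == len(pages) - 1 and key > page[-1]:
            pages.append([key])            # ascending insert: open a fresh page
        else:
            half = page_capacity // 2      # split: move the upper half to a new page
            right = page[half:]
            del page[half:]
            pages.insert(i + 1, right)
            splits += 1
            insort(page if key < right[0] else right, key)
    return pages, splits


if __name__ == "__main__":
    keys = list(range(1, 1001))
    shuffled = random.Random(42).sample(keys, len(keys))
    for label, order in (("ascending", keys), ("random", shuffled)):
        pages, splits = load(order, page_capacity=4)
        fill = len(keys) / (len(pages) * 4)
        print(f"{label:9}: {len(pages)} pages, {splits} splits, {fill:.0%} full")
```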

3.2 Covering Indexes

An index that contains all the data a query needs is called a covering index. Covering indexes are a very powerful tool that can greatly improve query performance: MySQL only needs to read the index rather than the rows, which has the following advantages:

(1) Index entries are usually much smaller than full rows, so MySQL accesses less data.

(2) Indexes are stored in value order, so reading a range from an index needs less I/O than fetching rows at random.

(3) Most storage engines cache indexes better than data. MyISAM, for example, caches only indexes.

(4) Covering indexes are especially useful for InnoDB tables, because InnoDB organizes data in a clustered index. If a secondary index contains all the data the query needs, InnoDB does not have to look the row up in the clustered index.

Not every index can be a covering index: only B-Tree indexes store the column values. Moreover, storage engines implement indexes differently, and not all of them support covering indexes (Memory and Falcon do not).

For a query covered by an index, EXPLAIN shows "Using index" in the Extra column. For example, the sakila.inventory table has a composite index on (store_id, film_id), and MySQL can use it for a query that accesses only those two columns:

mysql> EXPLAIN SELECT store_id, film_id FROM sakila.inventory\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: inventory
         type: index
possible_keys: NULL
          key: idx_store_id_film_id
      key_len: 3
          ref: NULL
         rows: 5007
        Extra: Using index
1 row in set (0.17 sec)

In most engines, an index covers a query only if every column the query accesses is part of the index. InnoDB is less restrictive: its secondary indexes store primary key values in their leaf nodes. The sakila.actor table, for example, uses InnoDB and has an index on last_name, so that index can also cover queries that retrieve actor_id:

mysql> EXPLAIN SELECT actor_id, last_name
    -> FROM sakila.actor WHERE last_name = 'HOPPER'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: actor
         type: ref
possible_keys: idx_actor_last_name
          key: idx_actor_last_name
      key_len: 137
          ref: const
         rows: 2
        Extra: Using where; Using index
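The covering rule above reduces to a set comparison. The Python helper below is a hypothetical, simplified check (it ignores prefix indexes and index types other than B-Tree): a query is covered when every column it references is available in the index, and for an InnoDB secondary index the primary key columns count as available too.

```python
def is_covering(index_cols, query_cols, pk_cols=(), innodb=False):
    """Return True if an index on index_cols holds every column the query references.

    For an InnoDB secondary index, leaf entries also carry the primary key columns.
    """
    available = set(index_cols)
    if innodb:
        available |= set(pk_cols)
    return set(query_cols) <= available


if __name__ == "__main__":
    # sakila.inventory: index (store_id, film_id), query reads only those columns.
    print(is_covering(["store_id", "film_id"], ["store_id", "film_id"]))
    # sakila.actor: index (last_name), query reads actor_id (the primary key) too.
    print(is_covering(["last_name"], ["actor_id", "last_name"],
                      pk_cols=["actor_id"], innodb=True))
```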

3.3 Sorting with indexes

MySQL has two ways to produce sorted results: a filesort, or a scan in index order. Sorting via an index is very fast, and the same index can serve both the lookup and the sort. MySQL can use an index for sorting when the order of the index columns matches the order of the ORDER BY columns and all columns are sorted in the same direction (all ascending or all descending). If the query joins several tables, the index can be used only when every ORDER BY column belongs to the first table. In all other cases, MySQL uses filesort.

create table actor (
  actor_id int unsigned NOT NULL AUTO_INCREMENT,
  name     varchar(16)  NOT NULL DEFAULT '',
  password varchar(16)  NOT NULL DEFAULT '',
  PRIMARY KEY (actor_id),
  KEY (name)
) ENGINE=InnoDB;
insert into actor(name,password) values('cat01','1234567');
insert into actor(name,password) values('cat02','1234567');
insert into actor(name,password) values('ddddd','1234567');
insert into actor(name,password) values('aaaaa','1234567');

mysql> explain select actor_id from actor order by actor_id \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: actor
         type: index
possible_keys: NULL
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 4
        Extra: Using index
1 row in set (0.00 sec)

mysql> explain select actor_id from actor order by password \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: actor
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 4
        Extra: Using filesort
1 row in set (0.00 sec)

mysql> explain select actor_id from actor order by name \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: actor
         type: index
possible_keys: NULL
          key: name
      key_len: 18
          ref: NULL
         rows: 4
        Extra: Using index
1 row in set (0.00 sec)
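The conditions stated at the start of this section can be written as a small predicate. This Python sketch is hypothetical and encodes only the rules given above; it ignores refinements such as leading index columns that are fixed by equality conditions in the WHERE clause.

```python
def index_can_sort(index_cols, order_by, first_table_cols=None):
    """Can an index on index_cols return rows already in ORDER BY order?

    order_by: list of (column, direction) pairs, direction "ASC" or "DESC".
    first_table_cols: for a join, the columns of the first table in join order.
    """
    cols = [col for col, _ in order_by]
    # All ORDER BY columns must be sorted in the same direction.
    if len({direction.upper() for _, direction in order_by}) > 1:
        return False
    # In a join, every ORDER BY column must come from the first table.
    if first_table_cols is not None and not set(cols) <= set(first_table_cols):
        return False
    # ORDER BY must follow the index's column order, starting from its first column.
    return list(index_cols[:len(cols)]) == cols
```

Applied to the three EXPLAIN results above: ORDER BY actor_id matches PRIMARY and ORDER BY name matches the name index, while password has no index at all, so MySQL falls back to filesort.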

When MySQL cannot use an index to sort, it sorts the data itself, using quicksort in memory (the sort buffer). If the data does not fit in memory, it splits the data into chunks, sorts each chunk and writes it to disk, and then merges the sorted chunks into the final result, which is effectively an external sort. MySQL has two filesort algorithms:
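The external-sort fallback can be sketched in Python: sort buffer-sized chunks in memory, spill each sorted chunk to a temporary file, then merge the files. Here buffer_size (a count of values) loosely plays the role of sort_buffer_size, and Python's built-in sort stands in for MySQL's quicksort; this illustrates the technique, not MySQL's filesort code.

```python
import heapq
import tempfile


def external_sort(values, buffer_size):
    """Sort integers while holding at most buffer_size values per in-memory chunk."""
    runs = []   # temporary files, each holding one sorted chunk ("run")
    chunk = []

    def spill():
        chunk.sort()  # in-memory sort of one buffer's worth of data
        f = tempfile.TemporaryFile(mode="w+")
        f.writelines(f"{v}\n" for v in chunk)
        f.seek(0)
        runs.append(f)
        chunk.clear()

    for v in values:
        chunk.append(v)
        if len(chunk) == buffer_size:
            spill()
    if chunk:
        spill()

    try:
        # k-way merge of the sorted runs, reading each file sequentially.
        streams = [(int(line) for line in f) for f in runs]
        return list(heapq.merge(*streams))
    finally:
        for f in runs:
            f.close()
```

Only one chunk is held in memory during the first phase; the merge phase reads each run front to back.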

(1) Two-pass algorithm

MySQL first reads the column(s) to sort on, together with a row pointer that locates each full row, and sorts them in memory (the buffer size is set by sort_buffer_size). After sorting, it uses the row pointers to read the required columns again.

Note: this algorithm was used before MySQL 4.1. It reads the data twice, and the second read in particular causes many random I/Os. On the other hand, its memory overhead is small.

(2) Single-pass algorithm

This algorithm reads all the required columns at once, sorts them in memory, and outputs the result directly.

Note: this algorithm has been used since MySQL 4.1. It needs fewer I/O operations and is more efficient, but it uses more memory; if the query retrieves columns it does not need, much of the sort memory is wasted. Since MySQL 4.1, the max_length_for_sort_data parameter controls which algorithm MySQL chooses: if the total size of the columns to retrieve is larger than max_length_for_sort_data, MySQL uses the first (two-pass) algorithm; otherwise it uses the second (single-pass) one. To get the best sorting performance we usually want the single-pass algorithm, so it is important to select only the columns the query actually needs.
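The difference between the two algorithms is mostly in what gets sorted. In this Python sketch (a list of dicts stands in for the table and list positions stand in for row pointers; all names are hypothetical), the two-pass version sorts (sort key, row pointer) pairs and then re-reads each row, while the single-pass version sorts entries that already hold every needed column. choose_algorithm mirrors the max_length_for_sort_data rule described above, with the size of the retrieved columns passed in directly.

```python
# The actor rows from the earlier example (actor_id assigned by AUTO_INCREMENT).
ACTOR = [
    {"actor_id": 1, "name": "cat01", "password": "1234567"},
    {"actor_id": 2, "name": "cat02", "password": "1234567"},
    {"actor_id": 3, "name": "ddddd", "password": "1234567"},
    {"actor_id": 4, "name": "aaaaa", "password": "1234567"},
]


def two_pass_sort(table, sort_col, columns):
    """Pass 1: sort (key, row pointer) pairs. Pass 2: re-read each row by pointer."""
    pairs = sorted((row[sort_col], ptr) for ptr, row in enumerate(table))
    # Second read of the data, in sorted order: random access on a real disk.
    return [tuple(table[ptr][c] for c in columns) for _, ptr in pairs]


def single_pass_sort(table, sort_col, columns):
    """Read the sort key plus every needed column once, then sort those entries."""
    entries = [(row[sort_col], tuple(row[c] for c in columns)) for row in table]
    entries.sort(key=lambda e: e[0])  # stable: ties keep table order
    return [cols for _, cols in entries]


def choose_algorithm(retrieved_bytes, max_length_for_sort_data):
    """Two-pass if the retrieved columns exceed the threshold, else single-pass."""
    return "two-pass" if retrieved_bytes > max_length_for_sort_data else "single-pass"
```

Both functions return the same rows in the same order; what differs is how many times the data is read and how large each in-memory sort entry is.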

When sorting the result of a join, if ORDER BY references only columns of the first table, MySQL filesorts that table before processing the join, and EXPLAIN shows "Using filesort". Otherwise, MySQL must store the join result in a temporary table and filesort it after the join completes; in that case EXPLAIN shows "Using temporary; Using filesort".

3.4 Indexes and locking

Indexes are very important for InnoDB because they let queries lock fewer rows. This matters because in MySQL 5.0, InnoDB does not release a row lock until the transaction commits. Locking fewer rows helps for two reasons. First, although InnoDB's row-level locks are efficient and use little memory, they still carry some overhead. Second, locking rows that are not needed increases lock contention and reduces concurrency.

InnoDB locks only the rows it accesses, and indexes can reduce the number of rows InnoDB accesses. However, this works only if InnoDB can filter out unwanted rows at the storage engine level. If the index does not let InnoDB do that, the MySQL server can apply the WHERE clause only after InnoDB returns the rows. By then it is too late to avoid locking them: InnoDB has already locked those rows, and the server cannot unlock them.
Let's look at an example:

create table actor (
  actor_id int unsigned NOT NULL AUTO_INCREMENT,
  name     varchar(16)  NOT NULL DEFAULT '',
  password varchar(16)  NOT NULL DEFAULT '',
  PRIMARY KEY (actor_id),
  KEY (name)
) ENGINE=InnoDB;
insert into actor(name,password) values('cat01','1234567');
insert into actor(name,password) values('cat02','1234567');
insert into actor(name,password) values('ddddd','1234567');
insert into actor(name,password) values('aaaaa','1234567');

SET AUTOCOMMIT=0;
BEGIN;
SELECT actor_id FROM actor WHERE actor_id < 4 AND actor_id <> 1 FOR UPDATE;

This query returns only rows 2 and 3, yet it holds exclusive locks on rows 1 through 3. InnoDB locks row 1 because the query plan uses the index only for a range scan; the second condition in the WHERE clause cannot be used to filter rows inside the storage engine:

mysql> EXPLAIN SELECT actor_id FROM test.actor
    -> WHERE actor_id < 4 AND actor_id <> 1 FOR UPDATE \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: actor
         type: index
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 4
        Extra: Using where; Using index
1 row in set (0.00 sec)

This shows that the storage engine reads rows from the beginning of the index until the condition actor_id < 4 becomes false; the server has no way to tell InnoDB to skip row 1.
To verify that row 1 is locked, open another connection and run the following:

SET AUTOCOMMIT=0;
BEGIN;
SELECT actor_id FROM actor WHERE actor_id = 1 FOR UPDATE;

This query blocks until the first connection commits its transaction and releases the lock (this behavior is necessary for statement-based replication).

As this example shows, InnoDB can lock rows it does not need even when it uses an index. Worse, if a query cannot use an index at all, MySQL scans the entire table and locks every row, whether it needs it or not.
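The locking behavior in this example can be modeled in a few lines of Python. This is a hypothetical simplification (rows are just actor_id values, a "lock" is simply recorded in a list, and InnoDB's gap and next-key locks are ignored), but it captures the order of events described above: the storage engine locks each row as it reads it, and the server's WHERE filtering happens afterwards.

```python
def range_scan_for_update(actor_ids, upper, server_filter):
    """Index range scan for actor_id < upper; the rest of WHERE is applied by the server.

    Returns (rows returned to the client, rows locked by the storage engine).
    """
    returned, locked = [], []
    for actor_id in sorted(actor_ids):   # walk the primary key in order
        if not actor_id < upper:         # end of the index range
            break
        locked.append(actor_id)          # locked as soon as it is read
        if server_filter(actor_id):      # filtering happens after locking
            returned.append(actor_id)
    return returned, locked


def full_scan_for_update(actor_ids, predicate):
    """Without a usable index, every row is read, and therefore locked."""
    locked = list(actor_ids)
    return [a for a in locked if predicate(a)], locked


if __name__ == "__main__":
    # WHERE actor_id < 4 AND actor_id <> 1 over the four actor rows.
    print(range_scan_for_update([1, 2, 3, 4], 4, lambda a: a != 1))
```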
