Understanding MySQL Indexes and Optimization

Source: Internet
Author: User
Tags: mysql, index
Understanding MySQL indexes and optimization starts here: indexes have a crucial impact on query speed, and understanding them is the starting point for database performance tuning. Consider the following: suppose a table has 10^6 records, the DBMS page size is 4 KB, and each page holds 100 records. Without an index, a query scans the whole table; in the worst case, if none of the data pages are in memory, 10^4 pages must be read, and if those pages are randomly distributed on disk, 10^4 I/O operations are required. Assuming each disk I/O takes 10 ms (ignoring data transfer time), the query takes 100 s in total (in practice it is usually much better than this). With a B-Tree index, only about log100(10^6) = 3 page reads are needed, so the worst case is about 30 ms. This is the effect of an index: in many cases, when your application's SQL queries are slow, you should consider whether an index would help. Now to the main topic:

Chapter 2: Indexes and Optimization

1. Selecting the index data type

MySQL supports many data types, and choosing the appropriate type to store your data has a large impact on performance. In general, the following guidelines apply:

(1) Smaller data types are usually better: they need less space on disk, in memory, and in the CPU cache, and can be processed faster.
(2) Simple data types are better: integer data is cheaper to process than character data, because string comparison and collation are more complex. In MySQL, use the built-in date and time types rather than strings to store times, and use an integer type rather than a string to store IP addresses (see the sketch after this list).
(3) Avoid NULL whenever possible: declare columns NOT NULL unless you really need to store NULL. Columns that allow NULL are harder for MySQL to query and optimize, because they make indexes, index statistics, and comparisons more complex. Where possible, replace NULL with 0, a special value, or an empty string.
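As a hedged illustration of points (2) and (3), the sketch below stores an IPv4 address as an unsigned integer with MySQL's built-in INET_ATON()/INET_NTOA() functions and uses a DATETIME column plus NOT NULL defaults; the table and column names are made up for this example:

CREATE TABLE access_log (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  client_ip INT UNSIGNED NOT NULL,        -- IPv4 stored as an integer, not VARCHAR(15)
  visited_at DATETIME NOT NULL,           -- built-in date/time type instead of a string
  note VARCHAR(100) NOT NULL DEFAULT '',  -- NOT NULL with an empty-string default instead of NULL
  PRIMARY KEY (id)
);

INSERT INTO access_log (client_ip, visited_at) VALUES (INET_ATON('192.168.1.10'), NOW());
SELECT INET_NTOA(client_ip) AS client_ip, visited_at FROM access_log;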

1.1 select an identifier
It is important to choose an appropriate identifier. Consider not only the storage type but also how MySQL computes and compares the values. Once you have chosen a type, make sure all related tables use the same type for the identifier.
(1) Integer: usually the best choice for an identifier, because integers are processed quickly and can use AUTO_INCREMENT.

(2) String: avoid using strings as identifiers whenever possible. They consume more space and are slower to process. Because string values are often effectively random, their positions in the index are random as well, which leads to page splits, random disk access, and clustered index splits (for storage engines that use clustered indexes).

2. Indexing
For any DBMS, indexes are the most important factor in optimization. With small amounts of data, the lack of a suitable index has little impact, but as the data volume grows, performance drops sharply.
If multiple columns are indexed together (a composite index), the column order is very important: MySQL can only use the index efficiently on a leftmost prefix of its columns. For example:
Assume the composite index it1c1c2 (c1, c2) exists. The query select * from t1 where c1 = 1 and c2 = 2 can use this index, and so can select * from t1 where c1 = 1. However, select * from t1 where c2 = 2 cannot, because the leading column of the composite index is not constrained: to search on column c2, c1 must also be fixed to a value.
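A minimal sketch of this example, using the table and index names from the text (the column types and data are assumptions); EXPLAIN can be used to confirm which queries use the index:

CREATE TABLE t1 (
  c1 INT NOT NULL,
  c2 INT NOT NULL,
  KEY it1c1c2 (c1, c2)   -- composite index, c1 is the leading column
);

EXPLAIN SELECT * FROM t1 WHERE c1 = 1 AND c2 = 2;  -- can use it1c1c2 (both columns)
EXPLAIN SELECT * FROM t1 WHERE c1 = 1;             -- can use it1c1c2 (leftmost prefix)
EXPLAIN SELECT * FROM t1 WHERE c2 = 2;             -- cannot use it1c1c2: the leading column c1 is missing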

2.1. Index types
Indexes are implemented in the storage engine rather than on the server layer. Therefore, the indexes of each storage engine are not necessarily the same, and not all storage engines support all index types.
2.1.1. B-Tree indexes
Assume that the following table is used:

CREATE TABLE People (
  last_name  VARCHAR(50) NOT NULL,
  first_name VARCHAR(50) NOT NULL,
  dob        DATE NOT NULL,
  gender     ENUM('M', 'F') NOT NULL,
  KEY (last_name, first_name, dob)
);

The index contains the last_name, first_name, and dob columns of each row in the table. Its structure is roughly as follows:

The values stored in the index are arranged in the order of the index columns. A B-Tree index can be used to search for a full key value, a range of key values, or a key prefix; in every case the search must use a leftmost prefix of the index.
(1) Match the full value: specify a specific value for every column in the index. For example, the index above can help you find a person named Cuba Allen with a specific date of birth.
(2) Match a leftmost prefix: the index can find everyone whose last name is Allen, using only the first column of the index.
(3) Match a column prefix: for example, the index can find everyone whose last name starts with J; this also uses only the first column of the index.
(4) Match a range of values: the index can find people whose last name is between Allen and Barrymore, again using only the first column of the index.
(5) Match one part exactly and a range on another part: the index can find everyone whose last name is Allen and whose first name starts with the letter K.
(6) Index-only queries: if all the columns a query needs are in the index, the row data itself does not have to be read.
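To make the list concrete, here are hedged example queries against the People table above; the names and dates are made up, and whether the optimizer actually chooses the index also depends on the data distribution:

-- (1) Match the full value
SELECT * FROM People WHERE last_name = 'Allen' AND first_name = 'Cuba' AND dob = '1960-01-01';
-- (2) Match a leftmost prefix
SELECT * FROM People WHERE last_name = 'Allen';
-- (3) Match a column prefix
SELECT * FROM People WHERE last_name LIKE 'J%';
-- (4) Match a range of values
SELECT * FROM People WHERE last_name BETWEEN 'Allen' AND 'Barrymore';
-- (5) Match one part exactly and a range on another part
SELECT * FROM People WHERE last_name = 'Allen' AND first_name LIKE 'K%';
-- (6) Index-only query: every selected column is in the index
SELECT last_name, first_name, dob FROM People WHERE last_name = 'Allen';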
Because the nodes of a B-Tree are stored in sorted order, an index can be used not only to look up values but also to satisfy ORDER BY in a query. Of course, B-Tree indexes have the following restrictions:
(1) The search must start with the leftmost column of the index, as mentioned several times already. For example, the index cannot be used to find everyone born on a certain day.
(2) You cannot skip a column of the index. For example, the index cannot be used to find everyone whose last name is Smith and who was born on a certain day, because first_name is skipped.
(3) The storage engine cannot use index columns to the right of the first range condition. For example, for WHERE last_name = 'Smith' AND first_name LIKE 'J%' AND dob = '2017-12-23', only the first two index columns can be used, because LIKE 'J%' is a range condition. These cases are sketched below.
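A hedged sketch of the three restrictions on the same table (the values are illustrative only):

-- (1) Does not start with the leftmost column: the index cannot be used.
SELECT * FROM People WHERE dob = '2017-12-23';
-- (2) Skips first_name: only the last_name part of the index can be used.
SELECT * FROM People WHERE last_name = 'Smith' AND dob = '2017-12-23';
-- (3) Columns to the right of a range condition are ignored:
--     the index helps with last_name and first_name, but not with dob.
SELECT * FROM People WHERE last_name = 'Smith' AND first_name LIKE 'J%' AND dob = '2017-12-23';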

2.1.2. Hash indexes
In MySQL, only the Memory storage engine explicitly supports hash indexes; hash is the default index type for Memory tables, although Memory tables can also use B-Tree indexes. The Memory engine even supports non-unique hash indexes, which is rare among databases: if several values produce the same hash code, the index stores their row pointers in a linked list in the same hash table slot.
Assume that the following table is created:
CREATE TABLE testhash (
  fname VARCHAR(50) NOT NULL,
  lname VARCHAR(50) NOT NULL,
  KEY USING HASH (fname)
) ENGINE = MEMORY;
The data contained is as follows:

Assume the index uses a hash function f() with the following values:
f('Arjen') = 2323
f('Baron') = 7437
f('Peter') = 8784
f('Vadim') = 2458
The index structure is then roughly as follows:

The hash slots are ordered, but the records they point to are not. When you execute
mysql> SELECT lname FROM testhash WHERE fname = 'Peter';
MySQL computes the hash value of 'Peter' and uses it to look up the row pointer in the index. Because f('Peter') = 8784, MySQL looks up 8784 in the index and finds the pointer to record 3.
Because the index stores only very short hash values, it is very compact. The hash value does not depend on the column's data type: the index on a TINYINT column is just as large as the index on a long string column.

Hash indexes have the following restrictions:
(1) Because the index contains only hash codes and row pointers, MySQL cannot use the index alone to avoid reading the rows. However, accessing rows in memory is very fast, so this has little impact on performance.
(2) Hash indexes cannot be used for sorting.
(3) Hash indexes do not support partial key matching, because the hash value is computed over the entire key.
(4) Hash indexes support only equality comparisons such as =, IN(), and <=>; a condition like WHERE price > 100 cannot be accelerated. A short sketch follows.
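A hedged sketch on the testhash table above; only the first query can use the hash index on fname:

-- Equality comparison: the hash index can be used.
SELECT lname FROM testhash WHERE fname = 'Peter';
-- Prefix and range predicates cannot use it, because the hash is computed over the whole key.
SELECT lname FROM testhash WHERE fname LIKE 'Pet%';
SELECT lname FROM testhash WHERE fname > 'M';
-- Nor can a hash index satisfy ORDER BY.
SELECT lname FROM testhash ORDER BY fname;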
2.1.3. Spatial (R-Tree) indexes
MyISAM supports spatial indexes, which are mainly used with geospatial data types such as GEOMETRY.
2.1.4. Full-text indexes
Full-text indexes are a special index type of MyISAM, used mainly for full-text search.

3. High-performance indexing strategies
3.1. Clustered indexes
A clustered index keeps rows with adjacent key values physically close to each other (which is why creating a clustered index on a string column, especially one with random values, is not recommended: it forces a lot of row movement). A table can have only one clustered index. Because indexes are implemented by the storage engine rather than the server layer, not all engines support clustered indexes; currently only solidDB and InnoDB do.
The clustered index structure is roughly as follows:

Note: the leaf pages contain complete rows, while the inner node pages contain only the indexed column (here an integer). Some DBMSs let the user choose which index is clustered, but no MySQL storage engine supports that so far. InnoDB clusters on the primary key; if you do not define a primary key, InnoDB uses a unique non-nullable index instead, and if no such index exists, InnoDB defines a hidden primary key and clusters on it. In general, a DBMS stores the actual data in the clustered index, which is the basis for the other, secondary indexes.

3.1.1. Comparison of InnoDB and MyISAM data layout
To better understand clustered and non-clustered indexes, or primary and secondary indexes (MyISAM does not support clustered indexes), let us compare the data layout of InnoDB and MyISAM for the following table:

CREATE TABLE layout_test (
  col1 INT NOT NULL,
  col2 INT NOT NULL,
  PRIMARY KEY (col1),
  KEY (col2)
);

Assume the primary key values lie between 1 and 10,000 and are inserted in random order, after which the table is rebuilt with OPTIMIZE TABLE. col2 is assigned random values from a small range, so it contains many duplicates.
(1) MyISAM data layout
MyISAM's layout is very simple: it stores rows on disk in insertion order, as shown below:

Note: the row numbers on the left start from 0. Because the rows are fixed-size, MyISAM can locate any row by its byte offset from the beginning of the table.
The index structure of the established primary key is roughly as follows:

Note: MyISAM does not support clustered indexes. Each leaf node of the index stores only the row number, and the leaf nodes are ordered by col1.
Let's take a look at the index structure of col2:

In fact, in MyISAM the primary key is no different from any other index: it is simply a unique, non-null index named PRIMARY.

(2) InnoDB data layout
InnoDB stores the table as a clustered index, so its data layout is very different. It stores the same table as follows:

Note: each leaf node of the clustered index contains the primary key value, a transaction ID and a rollback pointer (used for transactions and MVCC), and the remaining columns (such as col2).

InnoDB's secondary indexes also differ greatly from MyISAM's. The leaf nodes of an InnoDB secondary index store the primary key value instead of a row pointer. This reduces the cost of maintaining secondary indexes when rows are moved or data pages split, because InnoDB does not have to update row pointers in the indexes. Its structure is roughly as follows:

Comparison of clustered indexes and non-clustered index tables:


3.1.2. Insert rows in primary key order (InnoDB)

If you use InnoDB and have no special clustering requirements, a good practice is to use a surrogate primary key, one that is independent of your application's data. The simplest way is to use an AUTO_INCREMENT column: it guarantees that rows are inserted in sequential order and improves the performance of joins that use the primary key. Avoid random clustered primary keys as much as possible; for example, a string primary key is a bad choice because it makes inserts effectively random.
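A minimal sketch, assuming a hypothetical orders table: the AUTO_INCREMENT surrogate key makes inserts append to the end of the clustered index, whereas a random string key (for example a UUID) would scatter them:

CREATE TABLE orders (
  order_id   BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,  -- surrogate key: new rows are appended in order
  order_code VARCHAR(32) NOT NULL,                     -- the application-level identifier stays a normal column
  created_at DATETIME NOT NULL,
  PRIMARY KEY (order_id),
  KEY (order_code)
) ENGINE = InnoDB;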

3.2. Covering indexes
An index that contains all the data needed to satisfy a query is called a covering index. Covering indexes are a very powerful tool and can greatly improve query performance: the query only has to read the index, never the row data, which has the following advantages:
(1) Index entries are usually much smaller than rows, so MySQL reads far less data.
(2) Indexes are stored sorted by value, so lookups need much less I/O than fetching rows at random.
(3) Most storage engines cache indexes better than data; MyISAM, for example, caches only indexes.
(4) Covering indexes are especially useful for InnoDB tables, because InnoDB organizes data in a clustered index: if a secondary index contains all the columns a query needs, there is no need to look anything up in the clustered index.
Not every index can be a covering index: the index must store the column values, which only B-Tree indexes do. Also, different storage engines implement covering indexes differently, and some (such as Memory and Falcon) do not support them at all.
For an index-covered query, EXPLAIN shows "Using index" in the Extra column. For example, the sakila.inventory table has a composite index (store_id, film_id); for a query that needs only these two columns, MySQL can use the index as follows:

mysql> EXPLAIN SELECT store_id, film_id FROM sakila.inventory\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: inventory
         type: index
possible_keys: NULL
          key: idx_store_id_film_id
      key_len: 3
          ref: NULL
         rows: 5007
        Extra: Using index
1 row in set (0.17 sec)

In most engines, an index covers a query only when every column the query touches is part of the index. InnoDB is not limited to this, because an InnoDB secondary index stores the primary key value in its leaf nodes. The sakila.actor table uses InnoDB and has an index on last_name, so that index can cover a query that also reads actor_id, for example:

mysql> EXPLAIN SELECT actor_id, last_name
    -> FROM sakila.actor WHERE last_name = 'Hopper'\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: actor
         type: ref
possible_keys: idx_actor_last_name
          key: idx_actor_last_name
      key_len: 137
          ref: const
         rows: 2
        Extra: Using where; Using index

3.3. Sorting with indexes
MySQL has two ways to produce an ordered result set: performing a filesort, or scanning an index in order. Sorting by scanning an index is very fast, and the same index can be used both to find rows and to sort them. An index can be used for sorting when the order of the index columns is exactly the same as the columns in ORDER BY and all columns are sorted in the same direction (all ascending or all descending). If the query joins multiple tables, the index can be used for sorting only when all columns in ORDER BY belong to the first table; in all other cases a filesort is performed.

CREATE TABLE actor (
  actor_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  name     VARCHAR(16) NOT NULL DEFAULT '',
  password VARCHAR(16) NOT NULL DEFAULT '',
  PRIMARY KEY (actor_id),
  KEY (name)
) ENGINE = InnoDB;

INSERT INTO actor (name, password) VALUES ('cat01', '123');
INSERT INTO actor (name, password) VALUES ('cat02', '123');
INSERT INTO actor (name, password) VALUES ('ddddd', '123');
INSERT INTO actor (name, password) VALUES ('aaaaa', '123');

mysql> EXPLAIN SELECT actor_id FROM actor ORDER BY actor_id\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: actor
         type: index
possible_keys: NULL
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 4
        Extra: Using index
1 row in set (0.00 sec)

mysql> EXPLAIN SELECT actor_id FROM actor ORDER BY password\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: actor
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 4
        Extra: Using filesort
1 row in set (0.00 sec)

mysql> EXPLAIN SELECT actor_id FROM actor ORDER BY name\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: actor
         type: index
possible_keys: NULL
          key: name
      key_len: 18
          ref: NULL
         rows: 4
        Extra: Using index
1 row in set (0.00 sec)

When MySQL cannot use an index for sorting, it sorts the data itself in memory (in the sort buffer) with its own quicksort-based algorithm. If the data does not fit in memory, it is split into chunks on disk, each chunk is sorted, and the sorted chunks are merged into an ordered result set (in effect an external sort). For filesort, MySQL has two algorithms.
(1) Two-pass algorithm
This implementation first fetches only the sort columns plus a row pointer that locates the rest of the row, sorts them in the configured memory (set by sort_buffer_size), and then fetches the remaining columns through the row pointers.
Note: this was the only algorithm available before MySQL 4.1. It reads the data twice, and the second pass in particular causes a lot of random I/O; on the other hand, its memory overhead is small.
(2) Single-pass algorithm
This algorithm fetches all the needed columns at once, sorts them in memory, and outputs the result directly.
Note: this algorithm is available since MySQL 4.1. It reduces the number of I/O operations and is more efficient, but its memory overhead is larger: if the query retrieves columns it does not actually need, a lot of sort memory is wasted. In MySQL 4.1 and later, the max_length_for_sort_data parameter controls which algorithm is chosen: when the total size of the columns to be retrieved exceeds max_length_for_sort_data, MySQL uses the two-pass algorithm, otherwise the single-pass one. To get the best sorting performance you usually want the single-pass algorithm, so retrieve only the columns the query really needs.
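For illustration, the variable can be inspected and changed per session (the value below is only an example; a sensible setting depends on how wide the retrieved rows are):

SHOW VARIABLES LIKE 'max_length_for_sort_data';
-- If the columns a query retrieves fit within this limit, MySQL can use the
-- single-pass algorithm; otherwise it falls back to the two-pass algorithm.
SET SESSION max_length_for_sort_data = 4096;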

When sorting for a join, if ORDER BY references only columns from the first table, MySQL performs the filesort on that table and then does the join, and EXPLAIN shows "Using filesort". Otherwise MySQL must put the join result into a temporary table and filesort it after the join completes, and EXPLAIN shows "Using temporary; Using filesort". A sketch follows.
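A hedged sketch with the actor table above and a hypothetical film_actor table (not defined in this article; whether actor is actually chosen as the first table depends on the optimizer):

-- ORDER BY uses only a column of the first (driving) table: MySQL can filesort that table
-- before the join, so EXPLAIN typically shows "Using filesort".
EXPLAIN SELECT a.name
FROM actor a JOIN film_actor fa ON a.actor_id = fa.actor_id
ORDER BY a.password\G

-- ORDER BY uses a column of the second table: the join result goes into a temporary table first,
-- so EXPLAIN typically shows "Using temporary; Using filesort".
EXPLAIN SELECT a.name
FROM actor a JOIN film_actor fa ON a.actor_id = fa.actor_id
ORDER BY fa.film_id\G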

3.4. Indexes and locking
Indexes matter a great deal to InnoDB because they let a query lock fewer rows. This is important because in MySQL 5.0 InnoDB does not release row locks until the transaction commits, for two reasons: first, even though InnoDB's row-level locks are efficient and their memory overhead is small, there is still some overhead; second, locking rows that are not needed increases the locking overhead and reduces concurrency.
InnoDB locks only the rows it actually accesses, and indexes reduce the number of rows InnoDB has to access. However, this helps only if the unneeded rows can be filtered out at the storage engine layer. If the index does not allow InnoDB to do that, the MySQL server has to apply the WHERE clause to the rows InnoDB returns, and by then it is too late to avoid locking them: InnoDB has already locked those rows, and the server cannot unlock them.
Let's look at an example:

CREATE TABLE actor (
  actor_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
  name     VARCHAR(16) NOT NULL DEFAULT '',
  password VARCHAR(16) NOT NULL DEFAULT '',
  PRIMARY KEY (actor_id),
  KEY (name)
) ENGINE = InnoDB;

INSERT INTO actor (name, password) VALUES ('cat01', '123');
INSERT INTO actor (name, password) VALUES ('cat02', '123');
INSERT INTO actor (name, password) VALUES ('ddddd', '123');
INSERT INTO actor (name, password) VALUES ('aaaaa', '123');

SET autocommit = 0;
BEGIN;
SELECT actor_id FROM actor WHERE actor_id < 4
AND actor_id <> 1 FOR UPDATE;

This query returns only rows 2 and 3, but exclusive locks are taken on rows 1 through 3. InnoDB locks row 1 because the query plan uses the primary key index only for the range scan and cannot apply the second condition of the WHERE clause at the storage engine level:

mysql> EXPLAIN SELECT actor_id FROM test.actor
    -> WHERE actor_id < 4 AND actor_id <> 1 FOR UPDATE\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: actor
         type: index
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 4
        Extra: Using where; Using index
1 row in set (0.00 sec)

This shows that the storage engine starts at the beginning of the index and reads rows until actor_id < 4 becomes false; the server cannot tell InnoDB to skip the row with actor_id = 1.
To verify that row 1 is locked, open another connection and run:

SET autocommit = 0;
BEGIN;
SELECT actor_id FROM actor WHERE actor_id = 1 FOR UPDATE;

This query will block until the first connection's transaction commits and releases the lock (a behavior that is required for statement-based replication).
As shown above, InnoDB can lock rows it does not need even when an index is used. Worse, if the query cannot use an index at all, MySQL scans the whole table and locks every row, whether it is needed or not.
