MySQL Indexes and Their Optimization

Source: Internet
Author: User

Objective

Indexes have a critical impact on query speed, and understanding them is the starting point for database performance tuning. Consider the following scenario: a table holds 10^6 records, and the DBMS uses 4 KB pages that each store 100 records, so the table occupies 10^4 pages. Without an index, a query must scan the entire table. In the worst case none of the data pages are in memory and all 10^4 pages must be read; if those pages are scattered randomly across the disk, that means 10^4 I/O operations. Assuming each disk I/O takes 10 ms (ignoring data transfer time), the scan takes 100 s in total (in practice it is much better than this). With a B-tree index, only log_100(10^6) = 3 page reads are needed, so the worst-case time is 30 ms. This is the effect of an index: whenever your application's SQL queries run slowly, you should consider whether an index would help. Now to the main topic.
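The arithmetic above can be checked with a short Python sketch. The function names are made up for illustration, and the model is deliberately crude: it assumes one random I/O per page and one page read per B-tree level, exactly as the paragraph does.

```python
import math

def full_scan_seconds(rows, rows_per_page, io_seconds):
    """Worst case for a table scan: every data page costs one random I/O."""
    pages = math.ceil(rows / rows_per_page)
    return pages * io_seconds

def btree_lookup_seconds(rows, fanout, io_seconds):
    """One page read per B-tree level; the height is ceil(log_fanout(rows))."""
    height = 1
    capacity = fanout
    while capacity < rows:      # grow the tree until it can address every row
        capacity *= fanout
        height += 1
    return height * io_seconds

scan = full_scan_seconds(10**6, 100, 0.010)
lookup = btree_lookup_seconds(10**6, 100, 0.010)
print(f"full scan: {scan:.0f} s, index lookup: {lookup * 1000:.0f} ms")
```

The height is computed with integer multiplication rather than math.log to avoid floating-point rounding at exact powers of the fanout.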

Index and its optimization

1 Choosing data types for indexed columns

MySQL supports many data types, and choosing the right one has a significant impact on performance. In general, follow these guidelines:

(1) Smaller data types are generally better: smaller data types typically require less space on disk, in memory, and in CPU caches, and are processed faster.

(2) Simpler data types are better: integers are cheaper to process than character data, because string comparison is more complex. In MySQL, store times in the built-in date and time types rather than in strings, and store IP addresses in an integer type.

(3) Avoid NULL where possible: declare columns NOT NULL unless you really need to store NULL. Nullable columns are harder for MySQL to optimize, because they complicate indexes, index statistics, and comparisons. Use 0, a special value, or an empty string instead of NULL.
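To make guideline (2) concrete, here is a minimal Python sketch of storing an IPv4 address as a 32-bit integer. The helper names are illustrative. MySQL offers the INET_ATON() and INET_NTOA() functions for the same conversion on the server side.

```python
import ipaddress

def ip_to_int(dotted):
    """Convert a dotted-quad IPv4 string to its 32-bit unsigned integer value."""
    return int(ipaddress.IPv4Address(dotted))

def int_to_ip(value):
    """Convert a 32-bit unsigned integer back to dotted-quad form."""
    return str(ipaddress.IPv4Address(value))

as_text = "192.168.0.1"
as_int = ip_to_int(as_text)
print(as_int, int_to_ip(as_int))
print(len(as_text), "bytes as text versus 4 bytes as an unsigned 32-bit integer")
```

The integer form is fixed-width and compares numerically, which is the property the guideline relies on.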

1.1 Choosing identifiers

Choosing an appropriate identifier type is important. Consider not only the storage type but also how MySQL computes and compares values of that type. Once you pick a data type, make sure all related tables use the same type.

(1) Integers: usually the best choice for identifiers, because they are fast to process and can be AUTO_INCREMENT.

(2) Strings: avoid strings as identifiers where possible, because they consume more space and are slower to process. In addition, strings are generally random, so their positions in the index are random as well, which causes page splits, random disk access, and clustered index splits (for storage engines that use clustered indexes).

2 Getting started with indexes

For any DBMS, indexes are the most important factor in optimization. With a small amount of data, a missing index has little effect, but as the data grows, performance drops sharply.

If an index covers multiple columns (a composite index), the order of the columns matters: MySQL can search efficiently only on a leftmost prefix of the index. For example:

Suppose there is a composite index IT1C1C2 (c1, c2). The query SELECT * FROM T1 WHERE c1 = 1 AND c2 = 2 can use the index, and so can SELECT * FROM T1 WHERE c1 = 1. However, SELECT * FROM T1 WHERE c2 = 2 cannot, because its condition does not include the leading column of the index: to search on c2 through this index, c1 must also be fixed to some value.
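The leftmost-prefix rule follows from how the entries are sorted. The Python sketch below models the composite index as a sorted list of (c1, c2) tuples; the sample rows and function names are invented for illustration, and a real B-tree is of course not a flat list.

```python
import bisect

# Rows of a hypothetical table T1 as (c1, c2) pairs.
rows = [(1, 2), (3, 1), (1, 5), (2, 2), (3, 2), (2, 9)]

# A composite index on (c1, c2) keeps its entries sorted by c1, then by c2.
index = sorted(rows)

def seek_c1(c1):
    """Binary-search the contiguous run of entries with the given c1."""
    lo = bisect.bisect_left(index, (c1,))
    hi = bisect.bisect_left(index, (c1 + 1,))
    return index[lo:hi]

def seek_c1_c2(c1, c2):
    """Binary-search the entries that match both columns."""
    lo = bisect.bisect_left(index, (c1, c2))
    hi = bisect.bisect_right(index, (c1, c2))
    return index[lo:hi]

def scan_c2(c2):
    """Without c1 there is no contiguous run: every entry must be checked."""
    return [entry for entry in index if entry[1] == c2]

print(seek_c1(1))
print(seek_c1_c2(1, 2))
print(scan_c2(2))
```

Entries with c2 = 2 are scattered across the sorted list, so only a full pass finds them, whereas entries sharing a c1 value sit next to each other and can be located by binary search.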

2.1 Types of indexes

Indexes are implemented in the storage engine layer, not in the server layer. Indexes therefore do not work identically in every storage engine, and not every engine supports every index type.

2.1.1 B-tree Index

Suppose we have the following table:

CREATE TABLE People (
    -- the varchar lengths are missing in the source; 50 is an assumed value
    last_name  varchar(50)    NOT NULL,
    first_name varchar(50)    NOT NULL,
    dob        date           NOT NULL,
    gender     enum('m', 'f') NOT NULL,
    KEY (last_name, first_name, dob)
);

Its index contains the last_name, first_name, and dob columns of every row in the table.

[Figure: structure of the B-tree index on (last_name, first_name, dob)]

The index stores its values sorted in the order of the indexed columns. A B-tree index can serve full-key, key-range, and key-prefix lookups, provided that the query uses a leftmost prefix of the index.

(1) Match the full value: specify a value for every column in the index. For example, the index can help you find a person named Cuba Allen who was born on 1960-01-01.

(2) Match a leftmost prefix: you can use the index to find everyone whose last name is Allen. This uses only the first column of the index.

(3) Match a column prefix: you can use the index to find everyone whose last name begins with J. This uses only the first column of the index.

(4) Match a range of values: you can use the index to find everyone whose last name is between Allen and Barrymore. This uses only the first column of the index.

(5) Match one part exactly and a range on another part: you can use the index to find everyone whose last name is Allen and whose first name begins with the letter K.

(6) Index-only queries: if all the queried columns are in the index, the rows themselves do not need to be read.

Because the nodes of a B-tree are kept in sorted order, the index can be used both for lookups (finding particular values) and for ORDER BY on the query results. B-tree indexes do have the following limitations:

(1) The lookup must start at the leftmost column of the index, as noted several times already. For example, you cannot use this index to find everyone born on a given day.

(2) You cannot skip a column in the index. For example, you cannot use the whole index to find people whose last name is Smith and who were born on a given day; only the last_name column of the index helps.

(3) The storage engine cannot use columns to the right of the first range condition. For example, with WHERE last_name = 'Smith' AND first_name LIKE 'J%' AND dob = '1976-12-23', the query uses only the first two columns of the index, because LIKE is a range condition.
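The three limitations can be summarized as a small rule, sketched here in Python. This is a simplified model for illustration only: the function name and the "eq"/"range" labels are assumptions, and the real optimizer weighs many more factors.

```python
def usable_index_columns(index_columns, conditions):
    """Return the leading index columns that a B-tree lookup can use.

    conditions maps a column name to "eq" (equality) or "range"
    (for example LIKE 'J%', BETWEEN, < or >). Columns are consumed left
    to right; the walk stops at the first column without a condition,
    or just after the first range condition.
    """
    used = []
    for column in index_columns:
        kind = conditions.get(column)
        if kind is None:
            break               # limitations (1) and (2): no gaps allowed
        used.append(column)
        if kind == "range":
            break               # limitation (3): nothing right of a range
    return used

idx = ["last_name", "first_name", "dob"]
print(usable_index_columns(idx, {"dob": "eq"}))
print(usable_index_columns(idx, {"last_name": "eq", "dob": "eq"}))
print(usable_index_columns(
    idx, {"last_name": "eq", "first_name": "range", "dob": "eq"}))
```

The three calls correspond to the three examples in the list: a lookup by birth date alone, a lookup that skips first_name, and the Smith query with LIKE 'J%'.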

2.1.2 Hash Index

In MySQL, only the MEMORY storage engine explicitly supports hash indexes. Hash is the default index type for MEMORY tables, although MEMORY tables can also use B-tree indexes. The MEMORY engine supports non-unique hash indexes, which is unusual in the database world: if multiple values have the same hash code, the index stores their row pointers in the same hash table entry as a linked list.

Suppose you create the following table:

CREATE TABLE testhash (
    -- the VARCHAR lengths are missing in the source; 50 is an assumed value
    fname VARCHAR(50) NOT NULL,
    lname VARCHAR(50) NOT NULL,
    KEY USING HASH (fname)
) ENGINE=MEMORY;

Suppose the table contains rows whose fname values are Arjen, Baron, Peter, and Vadim, and that the index uses a hash function f() with these results:

f('Arjen') = 2323
f('Baron') = 7437
f('Peter') = 8784
f('Vadim') = 2458

[Figure: hash index slots and the row pointers they hold]

The slots are ordered, but the records are not. When you run

mysql> SELECT lname FROM testhash WHERE fname = 'Peter';

MySQL computes the hash of 'Peter' and uses it to look up the row pointer in the index. Because f('Peter') = 8784, MySQL finds 8784 in the index and obtains a pointer to record 3.

Because the index stores only short hash values, it is very compact. The hash value does not depend on the data type of the column, so the index on a TINYINT column is as large as the index on a long string column.

Hash indexes have the following limitations:

(1) Because the index contains only hash codes and row pointers, MySQL cannot use it to avoid reading the rows. However, accessing in-memory rows is very fast, so this has little impact on performance.

(2) A hash index cannot be used for sorting.

(3) A hash index does not support partial key matching, because the hash value is computed from the entire indexed value.

(4) A hash index supports only equality comparisons, such as =, IN(), and <=>. It cannot speed up a range condition such as WHERE price > 100.
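A minimal Python model of a non-unique hash index helps show why only equality lookups work. CRC32 stands in for the hash function, since the engine's real function is not given here, and the lname values are placeholders rather than data from the example.

```python
import zlib
from collections import defaultdict

# Sample rows as (fname, lname); the lname values are placeholders.
rows = [
    ("Arjen", "lname_1"),
    ("Baron", "lname_2"),
    ("Peter", "lname_3"),
    ("Vadim", "lname_4"),
    ("Peter", "lname_5"),   # duplicate key: the index is non-unique
]

def f(value):
    """Stand-in hash function (CRC32), used only for this illustration."""
    return zlib.crc32(value.encode("utf-8"))

# Each slot maps a hash code to a list of row positions (chaining).
hash_index = defaultdict(list)
for position, (fname, _lname) in enumerate(rows):
    hash_index[f(fname)].append(position)

def lookup(fname):
    """Equality lookup: hash the key, follow the pointers, recheck the value."""
    candidates = hash_index.get(f(fname), [])
    return [rows[p][1] for p in candidates if rows[p][0] == fname]

print(lookup("Peter"))   # both rows with fname 'Peter'
print(lookup("Pet"))     # a key prefix hashes to a different slot
```

The index holds nothing but hash codes and row positions, so it can answer neither a prefix match nor a range condition, and every lookup still has to read the rows it points to.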

2.1.3 Spatial (R-tree) index

MyISAM supports spatial indexes, which are mainly used with geospatial data types such as GEOMETRY.

2.1.4 Full-text (FULLTEXT) index

A full-text index is a special index type in MyISAM, used mainly for full-text search.

3 High-performance indexing strategy

3.1 Clustered indexes

A clustered index guarantees that rows with adjacent key values are stored physically close together (so string columns, especially random strings, make poor clustered keys because they force the system to move large amounts of data around). A table can have only one clustered index. Because indexes are implemented by the storage engine, not every engine supports clustered indexes; currently only solidDB and InnoDB do.

[Figure: structure of a clustered index]

Note: the leaf pages contain the full rows, while the inner node pages contain only the indexed column (an integer column in this case). Some DBMSs let you choose which index is clustered, but no MySQL storage engine supports this so far. InnoDB clusters the data by primary key. If you do not define a primary key, InnoDB uses a unique index on non-nullable columns instead. If no such index exists, InnoDB defines a hidden primary key and clusters on it. In general, the DBMS stores the actual data in the clustered index, which is the basis for the other, secondary indexes.

3.1.1 Comparison of InnoDB and MyISAM data layouts

To better understand clustered and nonclustered indexes, or primary and secondary indexes (MyISAM does not support clustered indexes), compare how InnoDB and MyISAM lay out the following table:

CREATE TABLE layout_test (
    col1 int NOT NULL,
    col2 int NOT NULL,
    PRIMARY KEY (col1),
    KEY (col2)
);

Assume that the primary key values range from 1 to 10,000, that the rows are inserted in random order, and that the table is then optimized with OPTIMIZE TABLE. col2 is assigned a random value between 1 and 100, so it contains many duplicates.

(1) MyISAM data layout

The layout is simple: MyISAM stores rows on disk in the order in which they were inserted.

[Figure: MyISAM data layout for layout_test]

Note: the left side shows the row number, starting at 0. Because the rows are fixed-size, MyISAM can find any row by seeking the required number of bytes from the start of the table.

The index built on the primary key looks roughly as follows:

[Figure: MyISAM primary key index on col1]

Note: MyISAM does not support clustered indexes. Each leaf node of the index contains only a row number, and the leaf nodes are stored in col1 order.

Now consider the index on col2:

[Figure: MyISAM index on col2]

In fact, in MyISAM there is no structural difference between the primary key and any other index. The primary key is simply a unique, non-nullable index named PRIMARY.

(2) InnoDB data layout

InnoDB stores data in a clustered index, so its data layout is very different. It stores the table roughly as follows:

[Figure: InnoDB clustered index layout for layout_test]

Note: each leaf node of the clustered index contains the primary key value, the transaction ID and rollback pointer (used for transactions and MVCC), and the remaining columns (such as col2).

Unlike in MyISAM, InnoDB's secondary indexes are very different from its clustered index. The leaves of an InnoDB secondary index contain primary key values rather than row pointers. This reduces the cost of maintaining secondary indexes when rows move or data pages split, because InnoDB does not need to update row pointers in the index.

[Figure: InnoDB secondary index on col2]

[Figure: comparison of clustered (InnoDB) and non-clustered (MyISAM) table layouts]
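The two lookup paths can be contrasted with a toy Python model of layout_test. The row values and all names are illustrative, and plain lists and dictionaries stand in for the on-disk structures.

```python
# Rows of layout_test as (col1, col2); the values are illustrative.
data = [(99, 8), (12, 56), (3000, 62), (5, 8)]

# MyISAM-like: rows sit in insertion order; indexes map a key to a row number.
myisam_rows = list(data)
myisam_pk = {col1: n for n, (col1, _) in enumerate(myisam_rows)}
myisam_col2 = sorted((col2, n) for n, (_, col2) in enumerate(myisam_rows))

def myisam_find_by_col2(value):
    """Secondary index entry -> row number -> row."""
    return [myisam_rows[n] for col2, n in myisam_col2 if col2 == value]

# InnoDB-like: the clustered index holds the rows themselves, keyed by col1;
# the secondary index stores (col2, primary key) instead of a row pointer.
innodb_clustered = {col1: (col1, col2) for col1, col2 in sorted(data)}
innodb_col2 = sorted((col2, col1) for col1, col2 in data)

def innodb_find_by_col2(value):
    """Secondary index entry -> primary key -> clustered index -> row."""
    return [innodb_clustered[pk] for col2, pk in innodb_col2 if col2 == value]

print(myisam_find_by_col2(8))
print(innodb_find_by_col2(8))
```

In the InnoDB-like model the secondary index never mentions a physical position, so relocating a row would leave it untouched; the price is a second lookup through the clustered index.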

3.1.2 Inserting rows in primary key order (InnoDB)

If you use InnoDB and do not need any particular clustering, a good practice is to use a surrogate key, that is, a primary key whose value is independent of your application's data. The simplest approach is an AUTO_INCREMENT column, which ensures that rows are inserted in sequential order and improves the performance of joins on the primary key. Avoid random clustered keys: a string primary key, for example, is a poor choice because it makes inserts land at random positions.
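The difference between sequential and random keys can be illustrated with a rough Python experiment. A sorted list stands in for the clustered index, and an insert that does not land at the end is counted as one that would disturb existing pages. The key generator and the count of 1000 keys are arbitrary choices for the sketch.

```python
import bisect
import random

def count_mid_inserts(keys):
    """Insert keys into a sorted list; count inserts that miss the tail."""
    ordered = []
    mid_inserts = 0
    for key in keys:
        position = bisect.bisect_left(ordered, key)
        if position != len(ordered):
            mid_inserts += 1      # lands between existing entries
        ordered.insert(position, key)
    return mid_inserts

n = 1000
sequential_keys = list(range(1, n + 1))          # AUTO_INCREMENT-like values
rng = random.Random(42)                          # fixed seed for repeatability
random_keys = ["%032x" % rng.getrandbits(128) for _ in range(n)]

print("sequential keys:", count_mid_inserts(sequential_keys))
print("random string keys:", count_mid_inserts(random_keys))
```

Sequential keys always append, while nearly every random string key falls somewhere in the middle of the existing data.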

3.2 Covering indexes

If an index contains all the data needed to satisfy a query, it is called a covering index. Covering indexes are a very powerful tool that can greatly improve query performance. Reading only the index instead of the data has several advantages:

(1) Index entries are usually smaller than full rows, so MySQL accesses less data.

(2) Indexes are stored sorted by value, so they need less I/O than fetching rows at random.

(3) Most engines cache indexes better than data. MyISAM, for example, caches only indexes.

(4) Covering indexes are especially useful for InnoDB tables, because InnoDB organizes data in a clustered index: if a secondary index contains the data a query needs, there is no need for a further lookup in the clustered index.

Not every index can act as a covering index: it must store the values of the indexed columns, which only B-tree indexes do. Storage engines also implement covering indexes differently, and not all of them support covering indexes (MEMORY and Falcon do not).

For an index-covered query, EXPLAIN shows "Using index" in the Extra column. For example, the sakila.inventory table has a composite index on (store_id, film_id), and for a query that accesses only these two columns MySQL can use the index as follows:

mysql> EXPLAIN SELECT store_id, film_id FROM sakila.inventory\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: inventory
         type: index
possible_keys: NULL
          key: idx_store_id_film_id
      key_len: 3
          ref: NULL
         rows: 5007
        Extra: Using index
1 row in set (0.17 sec)

In most engines, an index covers a query only if every column the query accesses is part of the index. InnoDB goes further: its secondary indexes store the primary key value in their leaf nodes. So in the sakila.actor table, which uses InnoDB and has an index on last_name, that index can also cover queries that access actor_id, for example a query of this shape (the literal 'HOPPER' is only an illustration):

mysql> EXPLAIN SELECT actor_id, last_name FROM sakila.actor WHERE last_name = 'HOPPER'\G

For such a query, the Extra column of the EXPLAIN output should again include "Using index".
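The InnoDB case can be sketched in Python by counting how often a query has to go back to the clustered index. The table contents are placeholders, and the select() helper is a hypothetical stand-in for the engine, not a real API.

```python
# Rows keyed by the primary key actor_id; the names are placeholders.
clustered = {
    1: {"actor_id": 1, "first_name": "A", "last_name": "HOPPER"},
    2: {"actor_id": 2, "first_name": "B", "last_name": "ALLEN"},
    3: {"actor_id": 3, "first_name": "C", "last_name": "HOPPER"},
}

# InnoDB-style secondary index on last_name: each entry carries the primary key.
idx_last_name = sorted((row["last_name"], pk) for pk, row in clustered.items())

def select(columns, last_name):
    """Return matching rows and the number of clustered-index lookups needed."""
    in_index = {"last_name", "actor_id"}
    covered = set(columns) <= in_index
    result, row_lookups = [], 0
    for name, pk in idx_last_name:
        if name != last_name:
            continue
        if covered:
            entry = {"last_name": name, "actor_id": pk}
        else:
            entry = clustered[pk]     # extra trip to the clustered index
            row_lookups += 1
        result.append(tuple(entry[c] for c in columns))
    return result, row_lookups

print(select(["actor_id", "last_name"], "HOPPER"))   # covered by the index
print(select(["actor_id", "first_name"], "HOPPER"))  # needs the full rows
```

Asking for first_name breaks the coverage, because that column lives only in the clustered index.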

3.3 Sorting with an index

MySQL has two ways to produce ordered results: a filesort, or a scan of an index in order. Sorting with an index is very fast, and the same index can be used for both the lookup and the sort. An index can be used for sorting when its column order matches the order of the columns in ORDER BY and all columns sort in the same direction (all ascending or all descending). If the query joins multiple tables, the index is used only when all columns in ORDER BY belong to the first table. In all other cases MySQL uses a filesort.

CREATE TABLE actor (
    actor_id int unsigned NOT NULL AUTO_INCREMENT,
    -- the varchar lengths are garbled in the source; 16 is an assumed value
    -- (for name it is consistent with the key_len of 18 shown below,
    -- given a single-byte character set)
    name     varchar(16)  NOT NULL DEFAULT '',
    password varchar(16)  NOT NULL DEFAULT '',
    PRIMARY KEY (actor_id),
    KEY (name)
) ENGINE=InnoDB;

INSERT INTO actor (name, password) VALUES ('cat01', '1234567');
INSERT INTO actor (name, password) VALUES ('cat02', '1234567');
INSERT INTO actor (name, password) VALUES ('ddddd', '1234567');
INSERT INTO actor (name, password) VALUES ('aaaaa', '1234567');

mysql> EXPLAIN SELECT actor_id FROM actor ORDER BY actor_id\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: actor
         type: index
possible_keys: NULL
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 4
        Extra: Using index
1 row in set (0.00 sec)

mysql> EXPLAIN SELECT actor_id FROM actor ORDER BY password\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: actor
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 4
        Extra: Using filesort
1 row in set (0.00 sec)

mysql> EXPLAIN SELECT actor_id FROM actor ORDER BY name\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: actor
         type: index
possible_keys: NULL
          key: name
      key_len: 18
          ref: NULL
         rows: 4
        Extra: Using index
1 row in set (0.00 sec)

When MySQL cannot use an index for sorting, it sorts the data itself with a quicksort in memory (the sort buffer). If the data does not fit in memory, MySQL splits it into chunks, sorts each chunk, writes the chunks to disk, and then merges them into a sorted result (in effect an external sort). MySQL has two filesort algorithms.
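The chunk-and-merge idea can be sketched in a few lines of Python. This is a generic external merge sort, not MySQL's implementation: in-memory lists stand in for runs written to disk, buffer_size is a made-up parameter, and Python's built-in sort is used for each chunk instead of a quicksort.

```python
import heapq

def external_sort(values, buffer_size):
    """Sort values while holding at most buffer_size items in the buffer.

    Each sorted chunk stands in for a run written to disk; heapq.merge
    then combines the runs into one ordered stream.
    """
    runs = []
    for start in range(0, len(values), buffer_size):
        chunk = sorted(values[start:start + buffer_size])   # sort in the buffer
        runs.append(chunk)                                   # "write run to disk"
    return list(heapq.merge(*runs))

data = [42, 7, 19, 3, 88, 61, 5, 27, 14]
print(external_sort(data, buffer_size=4))
```

With a buffer of four items the nine values form three runs, which the merge step reads in parallel.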

(1) Two-pass algorithm

This algorithm first reads the columns to be sorted together with a pointer that locates each row, sorts them in the sort buffer (sized by the sort_buffer_size parameter), and then uses the row pointers to fetch the required columns in a second read.

Note: this is the algorithm used before MySQL 4.1. It accesses the data twice, and the second read in particular causes a large amount of random I/O. On the other hand, its memory overhead is small.

(2) Single-pass algorithm

This algorithm reads all the required columns in one pass, sorts them in memory, and outputs the result directly.

Note: this algorithm is available starting with MySQL 4.1. It reduces the number of I/O operations and is more efficient, but it uses more memory. If the query selects columns it does not need, much of the memory available for sorting is wasted. In MySQL 4.1 and later, the max_length_for_sort_data parameter controls which algorithm MySQL chooses: when the total size of the columns to be read exceeds max_length_for_sort_data, MySQL uses the first algorithm; otherwise it uses the second. To get the best sort performance we naturally prefer the second algorithm, so a query should select only the columns it needs.
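The two algorithms can be compared with a small Python model that counts row reads. The function names, the sample table, and the byte sizes passed to choose_algorithm are all invented for illustration; only the selection rule itself comes from the description above.

```python
def two_pass_sort(table, sort_col):
    """Sort (sort key, row pointer) pairs, then re-read each row by pointer."""
    reads = 0
    pairs = []
    for pointer, row in enumerate(table):      # first pass: keys and pointers
        reads += 1
        pairs.append((row[sort_col], pointer))
    pairs.sort()
    result = []
    for _key, pointer in pairs:                # second pass: random row reads
        reads += 1
        result.append(table[pointer])
    return result, reads

def single_pass_sort(table, sort_col):
    """Read every needed column once and sort whole rows in the buffer."""
    reads = len(table)
    result = sorted(table, key=lambda row: row[sort_col])
    return result, reads

def choose_algorithm(row_length, max_length_for_sort_data):
    """Selection rule as described above: wide rows fall back to two passes."""
    return "two-pass" if row_length > max_length_for_sort_data else "single-pass"

table = [
    {"actor_id": 1, "password": "zz"},
    {"actor_id": 2, "password": "aa"},
    {"actor_id": 3, "password": "mm"},
]
print(two_pass_sort(table, "password")[1], single_pass_sort(table, "password")[1])
print(choose_algorithm(row_length=2000, max_length_for_sort_data=1024))
```

Both functions return the same ordering; the two-pass variant simply touches every row twice, which is where the extra random I/O comes from.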

When sorting a join, if ORDER BY refers only to columns of the first table, MySQL filesorts that table and then proceeds with the join, and EXPLAIN shows "Using filesort". Otherwise, MySQL must store the join's result set in a temporary table and filesort it after the join completes, and EXPLAIN shows "Using temporary; Using filesort".

3.4 Indexes and locking

Indexes are very important for InnoDB because they let a query lock fewer rows. This matters because in MySQL 5.0 InnoDB does not release a lock until the transaction commits. There are two reasons to care: first, although InnoDB's row-level locks are very efficient and use little memory, they still carry some overhead. Second, locking rows that are not needed increases lock contention and reduces concurrency.

InnoDB locks only the rows it accesses, and an index reduces the number of rows InnoDB accesses. However, this works only if the storage engine layer can filter out the unwanted rows. If the index does not let InnoDB do so, the MySQL server has to apply the WHERE clause after InnoDB returns the rows. By then it is too late to avoid locking them: InnoDB has already locked those rows, and the server cannot unlock them.

Let's look at an example:

CREATE TABLE actor (
    actor_id int unsigned NOT NULL AUTO_INCREMENT,
    -- the varchar lengths are garbled in the source; 16 is an assumed value
    name     varchar(16)  NOT NULL DEFAULT '',
    password varchar(16)  NOT NULL DEFAULT '',
    PRIMARY KEY (actor_id),
    KEY (name)
) ENGINE=InnoDB;

INSERT INTO actor (name, password) VALUES ('cat01', '1234567');
INSERT INTO actor (name, password) VALUES ('cat02', '1234567');
INSERT INTO actor (name, password) VALUES ('ddddd', '1234567');
INSERT INTO actor (name, password) VALUES ('aaaaa', '1234567');

SET autocommit=0;
BEGIN;
SELECT actor_id FROM actor WHERE actor_id < 4
AND actor_id <> 1 FOR UPDATE;

The query returns only rows 2 and 3, but it actually locks rows 1 through 3. InnoDB locks row 1 because the plan MySQL chose uses the index only for the range condition; the second WHERE condition cannot be applied through the index, so no filtering happens inside the storage engine:

mysql> EXPLAIN SELECT actor_id FROM test.actor
    -> WHERE actor_id < 4 AND actor_id <> 1 FOR UPDATE\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: actor
         type: index
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: NULL
         rows: 4
        Extra: Using where; Using index
1 row in set (0.00 sec)

The plan shows that the storage engine starts at the beginning of the index and fetches every row until actor_id < 4 is false; the server has no way to tell InnoDB to skip row 1.

To prove that row 1 is locked, open a separate connection and run the following:

SET autocommit=0;
BEGIN;
SELECT actor_id FROM actor WHERE actor_id = 1 FOR UPDATE;

The query hangs until the transaction on the first connection commits and releases the lock. This behavior is necessary for statement-based replication.

As shown above, even when it uses an index, InnoDB can lock rows it does not need. Worse, if the query cannot use an index at all, MySQL scans the whole table and locks every row, whether it is needed or not.
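The division of work in this example can be modelled in Python: the storage engine locks every row it reads in the range, and the server filters afterwards. The function and parameter names are hypothetical, and the model ignores the finer points of real InnoDB locking (gap locks, isolation levels, and later server versions that may behave differently).

```python
def select_for_update(index_keys, engine_stop, server_filter):
    """Model of the example: the engine locks what it reads, the server filters.

    index_keys    -- primary key values in index order
    engine_stop   -- the engine scans from the start until this returns True
    server_filter -- WHERE condition the server applies to the returned rows
    """
    locked, returned = [], []
    for key in index_keys:
        if engine_stop(key):
            break
        locked.append(key)              # lock taken inside the storage engine
        if server_filter(key):
            returned.append(key)        # row survives the server-side WHERE
    return locked, returned

locked, returned = select_for_update(
    index_keys=[1, 2, 3, 4],
    engine_stop=lambda k: not (k < 4),   # range condition actor_id < 4
    server_filter=lambda k: k != 1,      # actor_id <> 1, applied too late
)
print("locked:", locked, "returned:", returned)
```

Row 1 ends up locked but not returned, which is why the second connection's SELECT ... FOR UPDATE on actor_id = 1 has to wait.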
