MySQL: High-Performance Indexing Strategies


Isolated columns:

An isolated column is one that does not appear inside an expression or as the argument of a function. MySQL generally cannot use an index on a column unless the column is isolated in the query.

  Example 1:

SELECT actor_id FROM actor WHERE actor_id + 1 = 5; -- Even with an index on actor_id, this form cannot use it; write WHERE actor_id = 4 instead

  Example 2:

SELECT ... WHERE TO_DAYS(CURRENT_DATE) - TO_DAYS(date_col) <= 10; -- Also prevents index use; write WHERE date_col >= DATE_SUB(CURRENT_DATE, INTERVAL 10 DAY) instead
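Why isolation matters can be sketched with a toy Python model (a sorted list stands in for the B-tree; the names are hypothetical). The index is ordered by actor_id, not by actor_id + 1, so in this model a predicate on an expression has to be evaluated for every entry, while the isolated form can binary-search straight to the match.

```python
import bisect

# Hypothetical index on actor_id: a sorted list standing in for a B-tree.
actor_index = list(range(1, 201))

def lookup_isolated(index, value):
    """WHERE actor_id = value: binary search, touching O(log n) entries."""
    i = bisect.bisect_left(index, value)
    return [index[i]] if i < len(index) and index[i] == value else []

def lookup_expression(index, predicate):
    """WHERE f(actor_id) ...: the index is ordered by actor_id, not by f(actor_id),
    so in this model every entry has to be evaluated."""
    hits = [v for v in index if predicate(v)]
    return hits, len(index)  # (matches, entries examined)

print(lookup_expression(actor_index, lambda v: v + 1 == 5))  # ([4], 200)
print(lookup_isolated(actor_index, 5 - 1))                   # [4]
```

Moving the arithmetic to the constant side (5 - 1) is exactly the rewrite suggested in Example 1.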

Multicolumn indexes (composite indexes) and choosing a good column order:

A multicolumn index (multiple-column index), also called a composite index, indexes several columns together in a single index.

When should you use a multicolumn index?
    • When the server intersects several indexes (usually because of several AND conditions), that usually means you need one multicolumn index containing all the relevant columns rather than several independent single-column indexes.
    • When the server has to union several indexes (usually because of several OR conditions), buffering, sorting, and merging the results often costs a lot of CPU and memory. This is especially true when some of those indexes are not very selective, because the merge scan then returns large amounts of data.
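The cost difference in the first bullet can be sketched with a toy Python model (hypothetical table and index structures, not MySQL's index-merge implementation). Intersecting two single-column indexes reads every entry for each value before most are discarded, while one composite index on (a, b) reads only the matching entries.

```python
import bisect

# Hypothetical table of 10,000 rows; each column alone has only 10 distinct values.
rows = {rid: (rid % 10, (rid // 10) % 10) for rid in range(10_000)}

# Two single-column "indexes": value -> set of row ids.
idx_a, idx_b = {}, {}
for rid, (a, b) in rows.items():
    idx_a.setdefault(a, set()).add(rid)
    idx_b.setdefault(b, set()).add(rid)

# One composite "index" on (a, b): entries sorted by (a, b), then row id.
idx_ab = sorted(((a, b), rid) for rid, (a, b) in rows.items())

def index_merge(a, b):
    """WHERE a = ? AND b = ? via two indexes: read both id lists, then intersect."""
    ids_a, ids_b = idx_a.get(a, set()), idx_b.get(b, set())
    return ids_a & ids_b, len(ids_a) + len(ids_b)

def composite(a, b):
    """Same predicate via one (a, b) index: jump to the first match, read only matches."""
    lo = bisect.bisect_left(idx_ab, ((a, b), -1))
    hi = bisect.bisect_left(idx_ab, ((a, b + 1), -1))
    return {rid for _, rid in idx_ab[lo:hi]}, hi - lo

merged, read_merge = index_merge(3, 7)
direct, read_direct = composite(3, 7)
print(len(merged), read_merge, read_direct)  # 100 2000 100
```

Both paths return the same 100 rows, but the intersection reads 2,000 index entries to get there.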
The leftmost-prefix rule for multicolumn indexes:

For example, with an index on (A, B, C), the order A, B, C is usable: within any single value of A the entries are sorted by B, and within any single value of B they are sorted by C. A multicolumn index is used from the left: if a column in the middle is not used, the part of the index before that break still works, but the part after the break does not.

Note also that a multicolumn index on (A, B, C) is not the same as one on (A, C, B); as the figure above shows, their entries are sorted in a different order.
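The ordering claims above can be checked on toy data in Python, where sorting tuples stands in for the B-tree order (the rows are hypothetical). Within one value of A, index (A, B, C) keeps B ordered, but C is only ordered within one (A, B) prefix, and the same rows sorted as (A, C, B) come out in a different order.

```python
from itertools import product

# Hypothetical rows; each tuple is (A, B, C).
rows = list(product(range(2), range(3), range(3)))

abc = sorted(rows)                                    # index on (A, B, C)
acb = sorted(rows, key=lambda r: (r[0], r[2], r[1]))  # index on (A, C, B)

def is_sorted(values):
    return all(x <= y for x, y in zip(values, values[1:]))

b_under_a = [b for a, b, c in abc if a == 1]                # B within one A
c_under_a = [c for a, b, c in abc if a == 1]                # C within one A (B skipped)
c_under_ab = [c for a, b, c in abc if a == 1 and b == 2]    # C within one (A, B)

print(is_sorted(b_under_a), is_sorted(c_under_a), is_sorted(c_under_ab))  # True False True
print(abc == acb)  # False: same rows, different order
```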

A few practical examples, assuming an index on (A, B, C) of MyTable, make this clearer:

(0) SELECT * FROM MyTable WHERE A = 3 AND B = 5 AND C = 4; -- A, B, and C all appear in the WHERE clause, and all three index columns are used
(1) SELECT * FROM MyTable WHERE C = 4 AND B = 6 AND A = 3; -- Included only to show that MySQL is not that naive: the optimizer reorders the conditions, so this is handled just like (0)
(2) SELECT * FROM MyTable WHERE A = 3 AND C = 7; -- A uses the index; B is not used, so C cannot use the index either
(3) SELECT * FROM MyTable WHERE A = 3 AND B > 7 AND C = 3; -- A and B use the index, C does not: a range condition on B also counts as a break point, although B itself still uses the index
(4) SELECT * FROM MyTable WHERE B = 3 AND C = 4; -- A is not used, so B and C cannot use the index
(5) SELECT * FROM MyTable WHERE A > 4 AND B = 7 AND C = 9; -- A uses the index (as a range); B and C do not
(6) SELECT * FROM MyTable WHERE A = 3 ORDER BY B; -- A uses the index, and the sort on B is also satisfied by the index, because within any one value of A the entries are ordered by B
(7) SELECT * FROM MyTable WHERE A = 3 ORDER BY C; -- A uses the index, but the index cannot provide the sort on C because of the break at B; EXPLAIN shows Using filesort
(8) SELECT * FROM MyTable WHERE B = 3 ORDER BY A; -- B cannot use the index, and the sort on A typically gets no help from the index either
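The lookup part of examples (0) to (5) can be encoded as a small Python function. This is a simplified model of the leftmost-prefix rule as stated above, not MySQL's optimizer: it ignores ORDER BY, index condition pushdown, and cost-based choices, and the "eq"/"range" labels are hypothetical.

```python
def usable_columns(index_cols, predicates):
    """Columns of a multicolumn index that can narrow the lookup (simplified model).

    predicates maps a column name to "eq" (col = const) or "range" (col > const, etc.).
    Matching runs left to right, stops at the first column with no predicate, and
    stops right after the first range column (which is itself still used).
    """
    used = []
    for col in index_cols:
        kind = predicates.get(col)
        if kind is None:
            break
        used.append(col)
        if kind == "range":
            break
    return used

idx = ("A", "B", "C")
print(usable_columns(idx, {"C": "eq", "B": "eq", "A": "eq"}))     # ['A', 'B', 'C']  cf. (0), (1)
print(usable_columns(idx, {"A": "eq", "C": "eq"}))                # ['A']            cf. (2)
print(usable_columns(idx, {"A": "eq", "B": "range", "C": "eq"}))  # ['A', 'B']       cf. (3)
print(usable_columns(idx, {"B": "eq", "C": "eq"}))                # []               cf. (4)
print(usable_columns(idx, {"A": "range", "B": "eq", "C": "eq"}))  # ['A']            cf. (5)
```

Because the function walks the index columns rather than the predicates, the order in which conditions are written has no effect, which is the point of example (1).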

A common rule of thumb for choosing the column order of an index is to put the most selective column first (see ① below).

When sorting and grouping do not need to be considered, putting the most selective column first is usually a good choice. In that case the index serves only to optimize lookups for the WHERE clause.
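The rule of thumb can be sketched in Python on a hypothetical sample (in MySQL you would compute the same ratio with COUNT(DISTINCT col)/COUNT(*)). This is only the heuristic from the text; it does not look at which values your frequent queries actually filter on.

```python
# Hypothetical sample: 100 rows with columns of very different cardinality.
rows = [
    {"status": ("active", "closed")[i % 2],
     "country": ("CN", "US", "DE", "FR", "JP")[i % 5],
     "customer_id": i}
    for i in range(100)
]

def selectivity(rows, col):
    """Distinct values / total rows, the same ratio as COUNT(DISTINCT col)/COUNT(*)."""
    return len({r[col] for r in rows}) / len(rows)

def suggest_order(rows, cols):
    """Rule of thumb only: most selective column first."""
    return sorted(cols, key=lambda c: selectivity(rows, c), reverse=True)

print(suggest_order(rows, ["status", "country", "customer_id"]))
# ['customer_id', 'country', 'status']
```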

Prefix indexes and index selectivity: a prefix index can substantially reduce the size of the index file and speed up index access. But prefix indexes also have drawbacks:

1. A prefix index cannot be used for ORDER BY or GROUP BY.

2. A prefix index cannot be used as a covering index.

Syntax for creating a prefix index:

ALTER TABLE table_name ADD KEY (column_name(prefix_length));

Example:

ALTER TABLE City ADD KEY (CityName(7));
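Both drawbacks can be sketched with a toy Python model (hypothetical rows; Python's string ordering stands in for the column collation). The index keeps only the first 7 characters, as in CityName(7), so a lookup must fetch each candidate row to compare the full value, which is why the index cannot cover the query, and reading in index order does not give the order of the full values, which is why it cannot serve ORDER BY.

```python
import bisect

PREFIX = 7  # as in KEY (CityName(7))

# Hypothetical table: row id -> CityName.
cities = {1: "Zurich", 2: "Amsterdam-Noord", 3: "Amsterdam", 4: "Ankara", 5: "Berlin"}

# The prefix index stores (first PREFIX characters, row id), sorted.
prefix_index = sorted((name[:PREFIX], rid) for rid, name in cities.items())

def find(name):
    """Equality lookup through the prefix index; returns (matching ids, rows fetched)."""
    key = name[:PREFIX]
    lo = bisect.bisect_left(prefix_index, (key, float("-inf")))
    hi = bisect.bisect_right(prefix_index, (key, float("inf")))
    candidates = [rid for _, rid in prefix_index[lo:hi]]
    # Only the prefix is in the index, so every candidate row must be read
    # to compare the full value: the index cannot cover this query.
    return [rid for rid in candidates if cities[rid] == name], len(candidates)

# Rows read in index order are not in full-value order, so ORDER BY cannot use it.
index_order = [cities[rid] for _, rid in prefix_index]

print(find("Amsterdam"))                       # ([3], 2)
print(index_order == sorted(cities.values()))  # False
```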

What is index selectivity? ①

Index selectivity is the ratio of the number of distinct index values (the cardinality) to the total number of rows in the table (#T):

selectivity = cardinality / #T

Selectivity therefore lies in the range (0, 1]. The higher an index's selectivity, the more valuable it is, because MySQL can filter out more rows per lookup; a unique index has a selectivity of 1.
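The formula can be checked on hypothetical columns in Python. Exact fractions make one consequence explicit: on average each distinct value matches #T / cardinality = 1 / selectivity rows, so a unique column matches one row per value, while a two-valued column matches half the table.

```python
from fractions import Fraction

def selectivity(values):
    """cardinality / #T: distinct values divided by total rows."""
    return Fraction(len(set(values)), len(values))

def avg_rows_per_value(values):
    """On average each distinct value matches #T / cardinality = 1 / selectivity rows."""
    return 1 / selectivity(values)

unique_ids = list(range(1000))          # e.g. a primary key
flags = [i % 2 for i in range(1000)]    # a two-valued column
constant = [0] * 1000                   # the lower bound: one distinct value

print(selectivity(unique_ids), avg_rows_per_value(unique_ids))  # 1 1
print(selectivity(flags), avg_rows_per_value(flags))            # 1/500 500
print(selectivity(constant))                                    # 1/1000
```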

SELECT COUNT(DISTINCT title)/COUNT(*) AS Selectivity FROM employees.titles;
+-------------+
| Selectivity |
+-------------+
|      0.0379 |
+-------------+

For example, suppose the employees table has only one index, <emp_no>. To search for a person by name, MySQL can only scan the whole table:

EXPLAIN SELECT * FROM employees.employees WHERE first_name='Eric' AND last_name='Anido';

+----+-------------+-----------+------+---------------+------+---------+------+--------+-------------+
| id | select_type | table     | type | possible_keys | key  | key_len | ref  | rows   | Extra       |
+----+-------------+-----------+------+---------------+------+---------+------+--------+-------------+
|  1 | SIMPLE      | employees | ALL  | NULL          | NULL | NULL    | NULL | 300024 | Using where |
+----+-------------+-----------+------+---------------+------+---------+------+--------+-------------+

A full table scan is very inefficient, so we consider indexing the name columns. There are two options, <first_name> or <first_name, last_name>. Compare the selectivity of the two:

SELECT COUNT(DISTINCT first_name)/COUNT(*) AS Selectivity FROM employees;
+-------------+
| Selectivity |
+-------------+
|      0.0042 |
+-------------+

SELECT COUNT(DISTINCT CONCAT(first_name, last_name))/COUNT(*) AS Selectivity FROM employees;
+-------------+
| Selectivity |
+-------------+
|      0.9313 |
+-------------+

The results show that <first_name> is far too unselective, while <first_name, last_name> is highly selective. But first_name and last_name together are 30 characters long. Is there a way to balance length against selectivity? One option is to index only the first few characters of last_name, e.g. <first_name, LEFT(last_name, 3)>, and check its selectivity:

SELECT COUNT(DISTINCT CONCAT(first_name, LEFT(last_name, 3)))/COUNT(*) AS Selectivity FROM employees;
+-------------+
| Selectivity |
+-------------+
|      0.7879 |
+-------------+

The selectivity is good, but still some way from 0.9313, so increase the last_name prefix to 4 characters:

SELECT COUNT(DISTINCT CONCAT(first_name, LEFT(last_name, 4)))/COUNT(*) AS Selectivity FROM employees;
+-------------+
| Selectivity |
+-------------+
|      0.9007 |
+-------------+
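The trial-and-error search over prefix lengths can be automated. A minimal Python sketch on a hypothetical name sample follows; it counts distinct (first_name, prefix) pairs rather than using CONCAT, which avoids accidental collisions when strings are glued together, and the target ratio is an arbitrary parameter you would choose yourself.

```python
# Hypothetical (first_name, last_name) sample, standing in for employees.employees.
names = [
    ("Eric", "Anido"), ("Eric", "Anderson"), ("Eric", "Baker"),
    ("Anna", "Anido"), ("Anna", "Bakshi"), ("Anna", "Baker"),
    ("Tom", "Anders"), ("Tom", "Andersen"),
]

def prefix_selectivity(rows, n):
    """Selectivity of the key (first_name, LEFT(last_name, n))."""
    return len({(first, last[:n]) for first, last in rows}) / len(rows)

def shortest_prefix(rows, target_ratio):
    """Smallest n whose selectivity reaches target_ratio of the full-column selectivity."""
    longest = max(len(last) for _, last in rows)
    full = prefix_selectivity(rows, longest)
    for n in range(longest + 1):
        if prefix_selectivity(rows, n) >= target_ratio * full:
            return n
    return longest

for n in range(5):
    print(n, prefix_selectivity(names, n))
# 0 0.375 / 1 0.625 / 2 0.625 / 3 0.75 / 4 0.875
print(shortest_prefix(names, 0.85))  # 4
```

As in the article, selectivity climbs quickly for the first few characters and then flattens, so a short prefix captures most of the benefit.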

At this point the selectivity (0.9007) is very good, and the index covers only 18 characters, about 40% shorter than the 30 characters of <first_name, last_name>. Create this prefix index:

ALTER TABLE employees ADD INDEX `first_name_last_name4` (first_name, last_name(4));

Then run the name query again and compare the profiling results before and after adding the index:

SHOW PROFILES;
+----------+------------+---------------------------------------------------------------------------------+
| Query_ID | Duration   | Query                                                                           |
+----------+------------+---------------------------------------------------------------------------------+
|      ... | 0.11941700 | SELECT * FROM employees.employees WHERE first_name='Eric' AND last_name='Anido' |
|      ... | 0.00092400 | SELECT * FROM employees.employees WHERE first_name='Eric' AND last_name='Anido' |
+----------+------------+---------------------------------------------------------------------------------+

The improvement is dramatic: the query time drops from 0.119 s to 0.00092 s, more than 120 times faster (about 129 times).

Clustered indexes, covering indexes, redundant and duplicate indexes, and unused indexes

Indexes that are never used should be dropped. Two tools can help find unused indexes:

1. In Percona Server or MariaDB, enable the userstat server variable (off by default), let the server run for a while, and then query INFORMATION_SCHEMA.INDEX_STATISTICS to see how often each index is used.

2. pt-index-usage from Percona Toolkit reads a query log, runs EXPLAIN on each query in the log, and prints a report on the indexes and the queries that use them.
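Once per-index read counts are available (for example exported from INFORMATION_SCHEMA.INDEX_STATISTICS), finding candidates is a set comparison. A minimal Python sketch follows; the index names and counts are hypothetical inputs, and loading them into dictionaries is assumed rather than shown.

```python
# Hypothetical inputs: every index defined on a table, and read counts collected over
# some observation window (for example from INFORMATION_SCHEMA.INDEX_STATISTICS).
defined = [
    ("employees", "PRIMARY"),
    ("employees", "first_name_last_name4"),
    ("employees", "idx_hire_date"),  # hypothetical index name
]
rows_read = {
    ("employees", "PRIMARY"): 5000,
    ("employees", "first_name_last_name4"): 120,
}

def unused_indexes(defined, rows_read):
    """Indexes with no recorded reads; the clustered PRIMARY key is never reported."""
    return [
        (table, name) for table, name in defined
        if name != "PRIMARY" and rows_read.get((table, name), 0) == 0
    ]

print(unused_indexes(defined, rows_read))  # [('employees', 'idx_hire_date')]
```

Treat the result as a list of candidates, not a drop list: an index can go unread during the observation window yet still be needed by a rare job, or it may enforce a UNIQUE constraint.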

Indexes and Locks

InnoDB locks a row only when it accesses it, so an index, by reducing the number of rows InnoDB accesses, also reduces the number of locks taken.
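The relationship can be sketched with a simplified Python model of a SELECT ... FOR UPDATE (hypothetical table). It assumes, as the text says, that every row InnoDB accesses is locked, and ignores gap locks and isolation-level details.

```python
# Hypothetical table: 10,000 rows, 100 distinct actor_id values.
rows = [{"id": i, "actor_id": i % 100} for i in range(10_000)]

# Hypothetical secondary index on actor_id: value -> list of row ids.
index = {}
for r in rows:
    index.setdefault(r["actor_id"], []).append(r["id"])

def scan(rows, actor_id):
    """No usable index: every row is accessed (and, in this model, locked)."""
    matched, locked = [], 0
    for r in rows:
        locked += 1
        if r["actor_id"] == actor_id:
            matched.append(r["id"])
    return matched, locked

def index_lookup(index, actor_id):
    """With the index, only the matching rows are accessed and locked."""
    matched = index.get(actor_id, [])
    return matched, len(matched)

m1, locked_scan = scan(rows, 7)
m2, locked_index = index_lookup(index, 7)
print(m1 == m2, locked_scan, locked_index)  # True 10000 100
```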

InnoDB can place shared (read) locks on secondary indexes, but exclusive (write) locks require access to the primary key index. This rules out using a covering index and can make SELECT ... FOR UPDATE much slower than LOCK IN SHARE MODE or a non-locking query.

InnoDB primary key selection and insert optimization

When using the InnoDB storage engine, unless you have a specific need, always use an auto-increment column with no business meaning as the primary key. Why?

Because InnoDB uses a clustered index, the data records themselves are stored in the leaf nodes of the primary index (a B+tree). The records in each leaf node (one page, whether in memory or on disk) must be kept in primary key order, so whenever a new record is inserted, MySQL places it in the appropriate node and position according to its primary key. If the page has reached its fill factor (15/16 by default in InnoDB), a new page (node) is opened.

If the table uses an auto-increment primary key, each new record is simply appended after the last record of the current index page, and when that page is full a new page is opened automatically, as shown in the following figure.

This produces a compact index structure whose pages are filled approximately in order. Because existing data never has to be moved on insert, this is very efficient and adds little overhead for index maintenance.

If you use a non-sequential primary key (such as a national ID number or a student ID number), the key values of new records are essentially random, so each new record has to be inserted somewhere in the middle of an existing index page.

In that case MySQL has to move existing data to insert the new record in the right position. The target page may even have been written back to disk and evicted from the cache, so it must be read back from disk, which adds a lot of overhead. The frequent data moves and page splits also cause heavy fragmentation and leave the index structure less compact, so later the table has to be rebuilt with OPTIMIZE TABLE to refill the pages.
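The effect of insert order on page fill can be reproduced with a toy model of the B+tree leaf level in Python. This is a simplified sketch under stated assumptions: pages hold 16 records, sequential appends stop at 15 per page to imitate the 15/16 fill factor, and an overflowing page is split in half. It ignores InnoDB's real 16 KB pages, its actual split heuristics, and the buffer pool.

```python
import bisect
import random

CAPACITY = 16    # toy page size in records (real InnoDB pages are 16 KB, not 16 rows)
FILL_LIMIT = 15  # sequential appends stop at 15/16 full, mimicking the fill factor

def build(keys):
    """Insert keys one by one into a toy clustered index; return (pages, splits)."""
    pages = [[]]  # leaf pages in key order; each holds sorted primary keys
    splits = 0
    for k in keys:
        firsts = [p[0] for p in pages if p]
        i = max(bisect.bisect_right(firsts, k) - 1, 0)  # page whose range covers k
        page = pages[i]
        if i == len(pages) - 1 and (not page or k > page[-1]):
            # Appending past the largest key: fill the last page, then open a new one.
            if len(page) >= FILL_LIMIT:
                pages.append([k])
            else:
                page.append(k)
        else:
            # Key lands inside an existing page: shift records to make room.
            bisect.insort(page, k)
            if len(page) > CAPACITY:
                mid = len(page) // 2
                pages[i:i + 1] = [page[:mid], page[mid:]]
                splits += 1
    return pages, splits

def fill(pages):
    return sum(len(p) for p in pages) / (len(pages) * CAPACITY)

seq_pages, seq_splits = build(range(1500))   # auto-increment keys
keys = list(range(1500))
random.Random(42).shuffle(keys)              # random keys, e.g. ID numbers
rnd_pages, rnd_splits = build(keys)

print(len(seq_pages), seq_splits, fill(seq_pages))  # 100 0 0.9375
print(len(rnd_pages), rnd_splits, round(fill(rnd_pages), 2))
```

Sequential keys never split a page and leave every page 15/16 full; random keys cause many splits and a noticeably lower fill factor, which is the fragmentation the paragraph above describes.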

Therefore, whenever possible, use an auto-increment column as the primary key on InnoDB.
