MySQL indexing notes (creating high-performance indexes)

Last Update:2017-08-02 Source: Internet

Author: User

Tags mysql index

Objective:

Index optimization is probably the most effective way to improve query performance. A good index can easily improve query performance by several orders of magnitude.

Solid-state drives have completely different performance characteristics from mechanical hard drives,

but the principles of indexing still hold on solid-state drives.

The difference is only that a bad index hurts less on an SSD than on a mechanical hard drive.

Many companies now rely on DBAs for database optimization, but in my view it should be a must-have skill for programmers.

Experienced and inexperienced programmers use databases very differently. The difference depends on how well the developer understands the data structures behind an index

and how familiar they are with the business they are responsible for; both are needed to build an excellent index and maximize performance.

I. Index basics

I will not cover what an index does in a database or how it works in rough terms; that is too basic.

In MySQL, an index can contain values from one or more columns.

If the index contains more than one column, the order of the columns matters, because MySQL can only use a leftmost prefix of the index efficiently.

Creating one index that contains two columns is very different from creating two indexes that each contain one column.

II. Types of indexes

The most commonly used index types in MySQL are B-tree and hash; full-text indexes are also covered below.

Storage engines do not all implement indexes the same way, and not every storage engine supports every type of index.

Even when several storage engines support the same type of index, the underlying implementations may differ.

For example, MyISAM uses prefix compression to make indexes smaller, while InnoDB stores values in their original format.

MyISAM indexes refer to indexed rows by their physical storage location, while InnoDB indexes refer to them by primary key.

B-tree Index

In fact, many storage engines use a B+tree, in which each leaf node contains a pointer to the next leaf node

to make range traversal over the leaf nodes easier. (For more details on B-trees, refer to a data structures textbook.)

A B-tree index speeds up data access because the storage engine no longer needs to scan the whole table to find the data it needs.

Instead, the search starts at the root node of the index. The slots in the root node hold pointers to child nodes, and the storage engine follows these pointers down the tree.

It finds the right pointer to the next level by comparing the values in the node page with the value being looked up. These pointers in effect define the upper and lower bounds of the values in the child pages.

Eventually the storage engine either finds the value or determines that the record does not exist.

A B-tree generally means that all values are stored in order and that every leaf page is the same distance from the root.

Because a B-tree index stores values in order, it is well suited to range lookups.
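The descent from the root to a leaf can be sketched in a few lines of Python. This is a toy two-level B+tree built by hand, not the page format of InnoDB or MyISAM: the `Node` class, the `find_leaf`, `search` and `range_scan` functions, and the sample keys are all made up for illustration, and keys are assumed to be unique.

```python
from bisect import bisect_right


class Node:
    """One page of a toy B+tree (illustrative, not a real storage format)."""

    def __init__(self, keys, children=None, values=None):
        self.keys = keys          # sorted keys; separators in an internal node
        self.children = children  # child pages, or None for a leaf
        self.values = values      # row pointers, only set in a leaf
        self.next = None          # pointer to the next leaf, used by range scans


def find_leaf(root, key):
    """Follow child pointers from the root down to the leaf that may hold key."""
    node = root
    while node.children is not None:
        # Separator keys bound each child: child i holds keys in
        # [keys[i-1], keys[i]), so bisect_right picks the right pointer.
        node = node.children[bisect_right(node.keys, key)]
    return node


def search(root, key):
    """Return the row pointer for key, or None if the record does not exist."""
    leaf = find_leaf(root, key)
    for k, v in zip(leaf.keys, leaf.values):
        if k == key:
            return v
    return None


def range_scan(root, low, high):
    """Return row pointers for low <= key <= high by walking the leaf chain."""
    leaf, rows = find_leaf(root, low), []
    while leaf is not None:
        for k, v in zip(leaf.keys, leaf.values):
            if k > high:
                return rows
            if k >= low:
                rows.append(v)
        leaf = leaf.next
    return rows


# Build a tiny tree by hand: one root page and three linked leaf pages.
leaf1 = Node([5, 10], values=["row5", "row10"])
leaf2 = Node([20, 25], values=["row20", "row25"])
leaf3 = Node([30, 40], values=["row30", "row40"])
leaf1.next, leaf2.next = leaf2, leaf3
root = Node([20, 30], children=[leaf1, leaf2, leaf3])

print(search(root, 25))          # row25
print(search(root, 15))          # None
print(range_scan(root, 10, 30))  # ['row10', 'row20', 'row25', 'row30']
```

The range scan descends the tree only once, to the leaf holding the lower bound, and then follows the leaf-to-leaf pointers, which is why ordered storage suits range lookups.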

B-tree indexes work for lookups by full key value, by key range, or by key prefix (leftmost prefix):

Full value match: the lookup specifies values for all the columns in the index;

Match the leftmost prefix: for example, finding everyone with name = 'Allen' uses only the first column of the index;

Match a column prefix: matches the beginning of a column value; for example, name LIKE 'A%' also uses only the first column of the index;

Match a range of values: for example, finding people whose names fall between Allen and Bell also uses only the first column of the index;
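The four kinds of lookup above all reduce to "seek to a starting point in sorted order, then read forward". Here is a minimal sketch in Python, assuming a hypothetical index on (last_name, first_name) modelled as a sorted list of tuples; the sample names, the row ids and the `seek_and_scan` helper are invented for illustration and say nothing about how MySQL implements the scan internally.

```python
from bisect import bisect_left

# Hypothetical index on (last_name, first_name); each entry carries a row id.
index = sorted([
    ("Allen", "Cuba", 1),
    ("Allen", "Kim", 2),
    ("Astaire", "Angelina", 3),
    ("Barrymore", "Julia", 4),
    ("Bell", "Ann", 5),
    ("Carter", "Bob", 6),
])


def seek_and_scan(start, still_matches):
    """Seek to the first entry >= start, then read forward while it matches."""
    i = bisect_left(index, start)
    row_ids = []
    while i < len(index) and still_matches(index[i]):
        row_ids.append(index[i][2])
        i += 1
    return row_ids


# Full value match: last_name = 'Allen' AND first_name = 'Kim'
full = seek_and_scan(("Allen", "Kim"), lambda e: e[:2] == ("Allen", "Kim"))

# Leftmost prefix: last_name = 'Allen'
leftmost = seek_and_scan(("Allen",), lambda e: e[0] == "Allen")

# Column prefix: last_name LIKE 'A%'
column_prefix = seek_and_scan(("A",), lambda e: e[0].startswith("A"))

# Range: last_name BETWEEN 'Allen' AND 'Bell'
name_range = seek_and_scan(("Allen",), lambda e: e[0] <= "Bell")

print(full)           # [2]
print(leftmost)       # [1, 2]
print(column_prefix)  # [1, 2, 3]
print(name_range)     # [1, 2, 3, 4, 5]
```

In every case the matching entries form one contiguous run in the sorted index, which is what makes the lookup efficient.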

Limitations of the index:

1. The index cannot be used if the lookup does not start from the leftmost column of the index. (For a string column, the same holds for a pattern that does not start at the beginning of the value.)

2. Columns in the index cannot be skipped. If this is unclear, work through an example with a multi-column index. (note ①)

3. If the query has a range condition on a column, no column to the right of it in the index can be used to optimize the lookup.

　Note ①
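The three limitations can be seen in a toy model that counts how many index entries have to be examined. This is only a sketch under stated assumptions: a hypothetical index on (last_name, first_name, dob) kept as a sorted Python list, with an invented `examine` helper and invented sample data. Real servers have further optimizations that this model ignores.

```python
from bisect import bisect_left

# Hypothetical index on (last_name, first_name, dob), kept in sorted order.
entries = sorted([
    ("Allen", "Cuba", "1960-01-01"),
    ("Allen", "Kim", "1930-07-12"),
    ("Astaire", "Angelina", "1980-03-04"),
    ("Astaire", "Kim", "1975-05-05"),
    ("Barrymore", "Julia", "2000-05-16"),
    ("Bell", "Kim", "1990-10-10"),
])


def examine(start, in_range, matches):
    """Seek to start, read entries while in_range holds, filter with matches.

    Returns (number of matching entries, number of entries examined).
    """
    i = bisect_left(entries, start)
    found = examined = 0
    while i < len(entries) and in_range(entries[i]):
        examined += 1
        if matches(entries[i]):
            found += 1
        i += 1
    return found, examined


# 1. first_name = 'Kim' alone: no leftmost column, so there is nowhere to
#    seek to and every entry has to be examined.
no_leftmost = examine((), lambda e: True, lambda e: e[1] == "Kim")

# 2. last_name = 'Allen' AND dob = '1930-07-12' skips first_name: only the
#    last_name part narrows the scan, dob is checked entry by entry.
skipped = examine(("Allen",), lambda e: e[0] == "Allen",
                  lambda e: e[2] == "1930-07-12")

# For comparison, all three columns given: the seek lands on the entry.
key = ("Allen", "Kim", "1930-07-12")
all_columns = examine(key, lambda e: e == key, lambda e: True)

# 3. last_name LIKE 'A%' AND first_name = 'Kim': the range on last_name
#    bounds the scan, first_name cannot narrow it any further.
range_first = examine(("A",), lambda e: e[0].startswith("A"),
                      lambda e: e[1] == "Kim")

print(no_leftmost)  # (3, 6)
print(skipped)      # (1, 2)
print(all_columns)  # (1, 1)
print(range_first)  # (2, 4)
```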

Hash Index

// In MySQL, only the Memory engine supports explicit hash indexes, so they are not commonly used;

A hash index is built on a hash table (key-value pairs) and is useful only for queries that exactly match all columns of the index;

For each row, the storage engine computes a hash code (the key) over all the indexed columns; the hash code is a small value;

Limitations of hash indexes: a hash index contains only hash codes and row pointers, not the column values, so the values in the index cannot be used to avoid reading the rows.

Hash indexes cannot be used for sorting. They also do not support lookups on only part of the indexed columns, because the hash code is always computed from the entire contents of the indexed columns.

For example, if a hash index is built on columns (A, B), the index cannot be used by a query that filters only on column A.

Hash indexes support only equality comparisons: =, IN(), and <=> (note that <> and <=> are different operators). They do not support any range queries.

Accessing data through a hash index is very fast unless there are many hash collisions (different indexed values with the same hash code).

Some index maintenance operations can also be costly when there are many hash collisions. Because of these limitations, hash indexes suit only specific situations.

But where a hash index does fit, the performance gain is significant.
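A small Python model makes these properties concrete. It is a sketch, not the Memory engine's implementation: the `HashIndex` class, the two hash functions `simple_hash` and `weak_hash`, and the sample rows are all invented, and the table is just a dict from row id to an (a, b) pair.

```python
def simple_hash(key):
    """Made-up hash function: small, deterministic, and easy to follow."""
    a, b = key
    return (sum(map(ord, a)) * 31 + b) % 1024


def weak_hash(key):
    """Deliberately poor hash function that causes many collisions."""
    return key[1] % 2


class HashIndex:
    """Toy hash index over columns (a, b): hash code -> list of row ids."""

    def __init__(self, rows, hash_fn):
        self.rows = rows          # row id -> (a, b); stands in for the table
        self.hash_fn = hash_fn
        self.buckets = {}
        for row_id, key in rows.items():
            self.buckets.setdefault(hash_fn(key), []).append(row_id)

    def lookup(self, a, b):
        """Equality lookup on the full key; returns (row ids, rows read)."""
        candidates = self.buckets.get(self.hash_fn((a, b)), [])
        # The index holds no column values, so every candidate row must be
        # read and compared in order to weed out hash collisions.
        matches = [r for r in candidates if self.rows[r] == (a, b)]
        return matches, len(candidates)


def find_by_a(rows, a):
    """Filter on column a only: no hash code can be computed from part of
    the key, so the index is useless and every row is read."""
    return [r for r, key in rows.items() if key[0] == a], len(rows)


rows = {1: ("x", 1), 2: ("x", 2), 3: ("y", 1), 4: ("y", 2)}

good = HashIndex(rows, simple_hash)
bad = HashIndex(rows, weak_hash)

print(good.lookup("x", 2))   # ([2], 1)
print(bad.lookup("x", 2))    # ([2], 2)  one extra row read due to a collision
print(find_by_a(rows, "x"))  # ([1, 2], 4)
```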

For details, see the earlier note "Knowledge collation on MySQL index".

Full-Text indexing

A full-text index is a special type of index that finds keywords in the text instead of comparing values in the index directly.

Full-text search matches in a completely different way from the other index types.

A full-text index and a value-based B-tree index can be created on the same column without conflict;

The full-text index is used for MATCH ... AGAINST operations, not for ordinary WHERE comparisons.
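To show why keyword lookup differs from comparing whole values, here is a minimal inverted-index sketch in Python. It is only an illustration of the general idea: the `build_inverted_index` and `match` functions, the tokenizer and the sample documents are assumptions of this sketch, and it leaves out things a real full-text engine does, such as stopwords, minimum word length and relevance ranking.

```python
import re
from collections import defaultdict

# Sample documents: id -> text (invented for this sketch).
docs = {
    1: "MySQL index tuning",
    2: "Full-text search in MySQL",
    3: "Cooking with chestnuts",
}


def tokenize(text):
    """Split text into lowercase words; a very naive tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(documents):
    """Map each word to the set of document ids that contain it."""
    inverted = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            inverted[word].add(doc_id)
    return inverted


def match(inverted, query):
    """Return ids of documents containing any query word (no ranking)."""
    hits = set()
    for word in tokenize(query):
        hits |= inverted.get(word, set())
    return sorted(hits)


inverted = build_inverted_index(docs)

print(match(inverted, "MySQL"))            # [1, 2]
print(match(inverted, "chestnuts index"))  # [1, 3]

# A value comparison on the whole column finds nothing for the same keyword.
print([i for i, text in docs.items() if text == "MySQL"])  # []
```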

III. Advantages of indexes

// Not much to note here: indexes make finding data fast, no doubt about it;

The most common index type, the B-tree, stores data in order, so MySQL can use it for ORDER BY and GROUP BY operations.

Because the data is ordered, a B-tree also stores related column values close together.

Finally, because the index stores the actual column values, some queries can be answered from the index alone.

Three major advantages:

1. Indexes greatly reduce the amount of data the server needs to scan;

2. Indexes help the server avoid sorting and temporary tables;

3. Indexes turn random I/O into sequential I/O;

In general, an index is worthwhile only when the benefit of helping the storage engine find records quickly outweighs the extra work the index brings.

For very small tables, a simple full table scan is more efficient in most cases.

For medium to large tables, indexes are very effective. For very large tables, however, the cost of building and using indexes grows accordingly.

IV. Prefix indexes and index selectivity

// Prefix index: for a string column, index only the leading part of the value;

Sometimes you need to index a very long character column, which makes the index large and slow.

You can usually index just the first few characters instead, which saves index space and makes the index more efficient.

However, this can reduce the selectivity of the index.

Index selectivity is the ratio of the number of distinct indexed values (also known as the cardinality) to the total number of rows in the table (#T),

and ranges from 1/#T to 1. The higher the selectivity of an index, the more efficient the query,

because a highly selective index lets MySQL filter out more rows during the lookup.

Find the appropriate prefixes, such as:

SELECT COUNT(*) AS cnt, LEFT(city, 7) AS pref

FROM City_demo GROUP BY pref ORDER BY cnt DESC LIMIT 10;

How to create a prefix index:

ALTER TABLE City_demo ADD KEY (city(7));
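Choosing the prefix length comes down to comparing the selectivity of each prefix with the selectivity of the full column. The sketch below does this in Python on a handful of invented city names; the `selectivity` and `shortest_good_prefix` functions and the `ratio` parameter are assumptions of this sketch. In SQL the same ratio would be computed as COUNT(DISTINCT LEFT(city, n)) / COUNT(*).

```python
def selectivity(values, prefix_len=None):
    """Distinct (prefixes of) values divided by the total number of rows."""
    keys = values if prefix_len is None else [v[:prefix_len] for v in values]
    return len(set(keys)) / len(values)


def shortest_good_prefix(values, ratio=1.0):
    """Smallest prefix length whose selectivity reaches ratio * full selectivity."""
    target = selectivity(values) * ratio
    for n in range(1, max(len(v) for v in values) + 1):
        if selectivity(values, n) >= target:
            return n


# Invented sample data standing in for the city column.
cities = ["London", "Londrina", "Los Angeles", "Los Alamos",
          "Lima", "Lisbon", "Lagos", "London"]

print(selectivity(cities))     # 0.875  (7 distinct values in 8 rows)
print(selectivity(cities, 1))  # 0.125
print(selectivity(cities, 3))  # 0.625
print(selectivity(cities, 5))  # 0.75

print(shortest_good_prefix(cities))             # 6
print(shortest_good_prefix(cities, ratio=0.8))  # 5
```

A shorter prefix gives a smaller index but lower selectivity, so the useful prefix is the shortest one that still comes close to the full-column value.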

Reference:

http://www.cnblogs.com/gomysql/p/3628926.html

V. Multi-column indexes

Creating separate single-column indexes on many columns does not, in most cases, improve MySQL query performance.

Some experts give vague advice such as "index the columns in the WHERE condition", but this advice is quite wrong.

At best it produces a "one-star" index, whose performance may be several orders of magnitude worse than that of a truly optimal index.

Sometimes, when you cannot design a "three-star" index, it is better to ignore the WHERE clause

and focus on optimizing the order of the indexed columns, or on creating a covering index.

The index merge strategy is sometimes the result of optimization, but more often it indicates that the indexes on the table are poorly designed:

1. When the server intersects multiple indexes (usually for multiple AND conditions),

it usually means that you need one multi-column index containing all the relevant columns, rather than several separate single-column indexes;

2. When the server unions multiple indexes (usually for multiple OR conditions), the merge can be expensive,

especially when some of the indexes are not very selective and the scans return large numbers of rows to the merge operation;
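The cost difference behind the first point can be sketched with a toy model in Python. Everything here is an assumption made for illustration: the table contents, the three dictionaries standing in for indexes, and the idea of counting "index entries read" as the cost. It is not how MySQL's index merge is implemented.

```python
from collections import defaultdict

# Hypothetical table rows: (row_id, staff_id, customer_id).
rows = [(i, i % 2 + 1, i % 50) for i in range(1000)]

by_staff = defaultdict(set)     # single-column index on staff_id
by_customer = defaultdict(set)  # single-column index on customer_id
by_both = defaultdict(set)      # multi-column index on (staff_id, customer_id)
for row_id, staff_id, customer_id in rows:
    by_staff[staff_id].add(row_id)
    by_customer[customer_id].add(row_id)
    by_both[(staff_id, customer_id)].add(row_id)


def merge_intersect(staff_id, customer_id):
    """Read both single-column indexes and intersect the row-id sets."""
    a, b = by_staff[staff_id], by_customer[customer_id]
    return a & b, len(a) + len(b)


def multi_column(staff_id, customer_id):
    """Read only the matching entries of the multi-column index."""
    hit = by_both[(staff_id, customer_id)]
    return hit, len(hit)


merged_rows, merged_cost = merge_intersect(2, 7)
direct_rows, direct_cost = multi_column(2, 7)

print(merged_rows == direct_rows)  # True
print(len(direct_rows))            # 20
print(merged_cost)                 # 520 index entries read
print(direct_cost)                 # 20 index entries read
```

Both approaches return the same rows, but the intersection has to read every entry for staff_id = 2 before discarding most of them.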

You can check how a query uses indexes with EXPLAIN; for details, see:

http://www.cnblogs.com/-simon/p/5887428.html

VI. Choosing the order of indexed columns

There is a rule of thumb for choosing the order of indexed columns: place the most selective columns first in the index.

When sorting and grouping do not need to be considered, putting the most selective column first is usually a good choice.

In that case the index serves only to optimize lookups for the WHERE conditions.

For example:

SELECT * FROM payment WHERE staff_id = 2 AND customer_id = 584;

Should the index be (staff_id, customer_id), or should the order be reversed?

1. Check how many rows match each condition in the WHERE clause:

SELECT SUM(staff_id = 2), SUM(customer_id = 584) FROM payment\G

*************************** 1. row ***************************

SUM(staff_id = 2): 7992

SUM(customer_id = 584): 30

According to the rule of thumb above, customer_id should come first in the index, because far fewer rows match the customer_id condition.
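The same comparison can be written as a short Python sketch. The payment rows below are generated, not taken from a real table, so the counts differ from the 7992 and 30 quoted above; the `matching_rows` and `selectivity` helpers are names invented for this sketch.

```python
# Hypothetical payment table: one (staff_id, customer_id) pair per row.
payments = [(1 + (i // 600) % 2, i % 600) for i in range(16000)]

STAFF_ID, CUSTOMER_ID = 0, 1


def matching_rows(rows, column, value):
    """Equivalent of SUM(column = value): rows that satisfy the condition."""
    return sum(1 for row in rows if row[column] == value)


def selectivity(rows, column):
    """Distinct values divided by total rows for one column."""
    return len({row[column] for row in rows}) / len(rows)


counts = {
    "staff_id": matching_rows(payments, STAFF_ID, 2),
    "customer_id": matching_rows(payments, CUSTOMER_ID, 584),
}
# Rule of thumb: the condition that matches the fewest rows goes first.
column_order = sorted(counts, key=counts.get)

print(counts)        # {'staff_id': 7800, 'customer_id': 26}
print(column_order)  # ['customer_id', 'staff_id']

# The column-wide selectivity points the same way in this data set.
print(selectivity(payments, CUSTOMER_ID) > selectivity(payments, STAFF_ID))  # True
```

Counting matches for specific values tunes the index for that one query, while the column-wide selectivity is the more general measure; the two do not always agree.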

This small example shows that rules of thumb and inference are useful in most cases, but be careful not to assume that average-case performance holds everywhere;

special cases can ruin the performance of the entire application.

Rules of thumb about selectivity and cardinality are worth studying and analyzing,

but do not overlook other factors such as sorting, grouping, and range conditions in the WHERE clause, which can have a significant impact on query performance.

VII. Clustered indexes

In InnoDB, the primary key is the clustered index;

A clustered index is not a separate index type, but a way of storing data.

InnoDB's clustered index actually stores the B-tree index and the data rows in the same structure.

If no primary key is defined, InnoDB chooses a unique non-nullable index instead.

If there is no such index, InnoDB implicitly defines a primary key to use as the clustered index.

Some important advantages of clustered indexes:

1. Related data can be kept together. For example, when implementing a mailbox, you can cluster the data by user ID,

so that fetching all the mail for one user requires reading only a few data pages from disk. Without a clustered index,

each message could require its own disk I/O.

2. Data access is faster. (A clustered index holds both the index and the data in the same B-tree.)

3. Queries that use a covering index scan can use the primary key values stored in the leaf nodes directly.

Disadvantages of clustered indexes:

1. Clustering mainly benefits I/O-bound applications; if all the data fits in memory, the access order matters little and clustering brings little benefit;

2. Insert speed depends heavily on insertion order;

3. Updating the clustered index columns is expensive, because it forces InnoDB to move each updated row to a new location;

4. A clustered index can make full table scans slower, especially when rows are sparsely packed or stored non-sequentially;

5. Secondary (nonclustered) indexes may be larger than expected, because their leaf nodes contain the primary key columns of the referenced rows;

6. Secondary index access requires two index lookups instead of one;
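The mailbox example from the list of advantages can be made concrete by counting how many pages hold one user's messages under two different primary keys. This Python sketch rests on invented numbers: `PAGE_SIZE`, the 40 messages and the 5 users are assumptions, and a "page" is simply a run of consecutive rows in storage order.

```python
PAGE_SIZE = 4  # rows per page; a made-up number for this sketch

# 5 users with 8 messages each, arriving interleaved: (message_id, user_id).
messages = [(m, m % 5) for m in range(40)]


def pages_touched(rows_in_storage_order, user_id):
    """Count the distinct pages holding at least one row of user_id."""
    return len({
        position // PAGE_SIZE
        for position, (_, uid) in enumerate(rows_in_storage_order)
        if uid == user_id
    })


# Primary key (message_id): rows are stored in arrival order.
by_message_id = sorted(messages)

# Primary key (user_id, message_id): each user's rows are stored together.
by_user_then_message = sorted(messages, key=lambda r: (r[1], r[0]))

print(pages_touched(by_message_id, 3))         # 8
print(pages_touched(by_user_then_message, 3))  # 2
```

With the data clustered by user, the 8 messages of one user fit in 2 pages; stored in arrival order they are spread over 8 pages, one read per message.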

VIII. Summary

There are three principles to keep in mind when choosing indexes and writing queries that use them:

1. Single-row access is slow, especially on mechanical hard drives (SSD random I/O is much faster, but the point still holds).

If the server reads a block of data from storage just to get one row from it,

a lot of work is wasted; it is best to read blocks that contain as many of the needed rows as possible.

2. Accessing ranges of data in order is fast, for two reasons:

a. Sequential I/O does not need multiple disk seeks, so it is much faster than random I/O (especially on mechanical hard drives);

b. If the server can read the data in the order it needs, it no longer needs an extra sort,

and GROUP BY queries no longer need to sort the rows and then aggregate them.

3. Index-covered queries are fast. If an index contains all the columns the query needs,

the storage engine does not need to go back to the table to find rows, which avoids a large number of single-row accesses;

In short, write queries and choose indexes so as to avoid single-row lookups.
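The third principle can be illustrated by counting how often a query has to go back to the table. This is a sketch under stated assumptions: the table is a Python dict keyed by primary key, the secondary index on (last_name, first_name) is a sorted list whose entries also carry the primary key (as InnoDB secondary indexes do), and the `select` function and sample rows are invented.

```python
from bisect import bisect_left

# Table stored by primary key: id -> (last_name, first_name, dob). Sample data.
table = {
    1: ("Allen", "Cuba", "1960-01-01"),
    2: ("Allen", "Kim", "1930-07-12"),
    3: ("Bell", "Ann", "1990-10-10"),
}

# Hypothetical secondary index on (last_name, first_name); each entry also
# carries the primary key of the row it refers to.
index = sorted((last, first, pk) for pk, (last, first, _) in table.items())
IN_INDEX = {"last_name": 0, "first_name": 1, "id": 2}


def select(columns, last_name):
    """SELECT columns WHERE last_name = ...; returns (rows, table lookups)."""
    covered = all(c in IN_INDEX for c in columns)
    results, table_lookups = [], 0
    i = bisect_left(index, (last_name,))
    while i < len(index) and index[i][0] == last_name:
        entry = index[i]
        if covered:
            # Every requested column is in the index entry itself.
            results.append(tuple(entry[IN_INDEX[c]] for c in columns))
        else:
            # Go back to the table by primary key: one single-row access.
            pk = entry[2]
            last, first, dob = table[pk]
            table_lookups += 1
            full = {"id": pk, "last_name": last, "first_name": first, "dob": dob}
            results.append(tuple(full[c] for c in columns))
        i += 1
    return results, table_lookups


print(select(["first_name", "id"], "Allen"))
# ([('Cuba', 1), ('Kim', 2)], 0)
print(select(["first_name", "dob"], "Allen"))
# ([('Cuba', '1960-01-01'), ('Kim', '1930-07-12')], 2)
```

The first query is answered from the index alone; the second needs one table lookup per matching row because dob is not part of the index.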

// Imported from Evernote, so the formatting may be slightly off.

This article is an English version of an article originally published in Chinese on aliyun.com.