High Performance MySQL reading notes: creating high-performance indexes


An index is a data structure that the storage engine uses to quickly find records. Index optimization is the most effective way to optimize query performance. Indexes can easily improve query performance by a few orders of magnitude. Creating an optimal index often requires rewriting the query.
5.1 Index Basics
In MySQL, the storage engine first finds the corresponding value in the index and then finds the corresponding data row based on the matching index record.
An index can contain values from one or more columns. If it contains more than one column, the column order matters, because MySQL can only use a leftmost prefix of the index efficiently.
5.1.1 Types of indexes
There are many types of indexes that can provide better performance for different scenarios. In MySQL, indexes are implemented at the storage engine layer rather than at the server level.
B-tree index: B-tree usually means that all values are stored sequentially, and each leaf page is the same distance from the root.
Storage engines use B-tree indexes in different ways, with different performance characteristics. MyISAM uses prefix compression to make its indexes smaller, while InnoDB stores index values in their original format. MyISAM indexes reference rows by their physical storage location, while InnoDB indexes reference them by primary key value.
Because a B-tree index stores values in order, it is well suited to range lookups.
B-tree indexes work for lookups by full key value, by key range, or by key prefix. Key-prefix lookups only work when they use a leftmost prefix of the index.
Query types that can use a B-tree index:
Match the full value: the query specifies values for all columns in the index.
Match a leftmost prefix: the query uses only the first column of the index.
Match a column prefix: the query matches the beginning of a column's value.
Match a range of values: the query searches a range on the first column of the index.
Match one part exactly and a range on another: the first column is matched exactly and the second column is matched by range.
Index-only queries: the query can be answered from the index alone, without reading the data rows.
Because the nodes of the index tree are sorted, the index can also serve ORDER BY operations, in addition to lookups by value.
Here are some limitations on the B-tree index:
The index cannot be used if the lookup does not start from the leftmost column of the index.
Columns in the index cannot be skipped.
If the query uses a range condition on one column, the columns to its right in the index cannot be used to optimize the lookup.
Column order in an index therefore matters a great deal. When tuning performance, you may need indexes on the same columns in different orders to serve different kinds of queries.
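The leftmost-prefix rule can be illustrated with a small Python sketch. It is not how any storage engine is implemented: a sorted list of tuples stands in for the ordered leaf entries of a B-tree, and the table, the column names, and both lookup functions are made up for the illustration.

```python
import bisect

# Hypothetical rows: (row_id, last_name, first_name).
rows = [
    (1, "Allen", "Cuba"),
    (2, "Astaire", "Angelina"),
    (3, "Barrymore", "Julia"),
    (4, "Allen", "Kim"),
    (5, "Allen", "Meryl"),
]

# Stand-in for a composite index on (last_name, first_name): entries are
# kept sorted by last_name first, then first_name.
index = sorted((last, first, row_id) for row_id, last, first in rows)


def lookup_last_name(last):
    """Leftmost-prefix lookup: binary search to the first match, then
    read forward until the last name changes."""
    start = bisect.bisect_left(index, (last,))
    matches = []
    for entry in index[start:]:
        if entry[0] != last:
            break  # entries are sorted, so the matching run has ended
        matches.append(entry[2])
    return matches


def lookup_first_name(first):
    """The leftmost column is not given, so the sort order does not help
    and every index entry has to be examined."""
    return [row_id for _, name, row_id in index if name == first]


print(lookup_last_name("Allen"))
print(lookup_first_name("Julia"))
```

The first function touches only the matching entries, and they come back already sorted by first name. The second has to walk the whole list, which is why an index on (last_name, first_name) does not help a query that filters only on first_name.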
    

Hash index: built on a hash table, and useful only for lookups that exactly match every column in the index. For each row, the storage engine computes a hash code over all the indexed columns. The hash code is a small value, and rows with different key values will usually have different hash codes. The index stores the hash codes in a hash table, together with a pointer to each row.
In MySQL, only the Memory engine supports explicit hash indexes. If multiple values have the same hash code, the index stores their row pointers in the same hash table entry as a linked list.
Because the index stores only short hash values, its structure is very compact, which makes lookups fast.
Limitations of hash indexes:
A hash index contains only hash codes and row pointers, not the column values, so MySQL cannot use the values in the index to avoid reading the rows.
A hash index cannot be used for sorting.
A hash index does not support partial key matching, because the hash code is always computed from the entire indexed value. For example, if a hash index is built on columns (A, B), it cannot be used by a query that filters only on column A.
A hash index supports only equality comparisons: =, IN(), and <=> (note that <> and <=> are different operators). It does not support range queries.
Accessing data through a hash index is very fast unless there are many collisions (different indexed values with the same hash code).
If there are many collisions, some index maintenance operations can be expensive.
Because of these limitations, hash indexes are suitable only in specific cases. Where they do fit, the performance gain can be significant.
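A minimal Python sketch of the structure described above: slots that hold row pointers, with a list playing the role of the linked list for colliding entries. The table contents, the slot count, and the use of CRC32 as the hash function are all assumptions made for the illustration; real engines use their own hash functions and layouts.

```python
import zlib
from collections import defaultdict

# Toy table: row_id -> (first_name, last_name). Names are illustrative.
table = {
    0: ("Arjen", "Lentz"),
    1: ("Baron", "Schwartz"),
    2: ("Peter", "Zaitsev"),
    3: ("Vadim", "Tkachenko"),
}

NUM_SLOTS = 4  # deliberately tiny so that collisions are likely


def slot_for(first_name):
    # Assumed hash function for this sketch only.
    return zlib.crc32(first_name.encode("utf-8")) % NUM_SLOTS


# slot -> list of row ids; the index holds pointers, never column values.
hash_index = defaultdict(list)
for row_id, (first_name, _) in table.items():
    hash_index[slot_for(first_name)].append(row_id)


def find_by_first_name(first_name):
    """Equality lookup: hash the key, then fetch each candidate row and
    compare the real value, because several keys can share a slot."""
    candidates = hash_index.get(slot_for(first_name), [])
    return [rid for rid in candidates if table[rid][0] == first_name]


print(find_by_first_name("Peter"))
```

The slots are not ordered by name, so nothing in this structure can answer a range condition or return rows in sorted order, and every match still requires reading the row itself.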

Creating a custom hash index: if the storage engine does not support hash indexes, you can emulate one.
The idea is simple: build a pseudo-hash index on top of a B-tree index. It is not the same as a real hash index, because lookups still go through the B-tree, but the lookup uses the hash of the key rather than the key itself. All you need to do is specify the hash function manually in the query's WHERE clause. CRC32() works for this. Do not use SHA1() or MD5(): they return very long strings, which waste space and are slower to compare.
If the table is large, CRC32() will produce many collisions. By the birthday paradox, the chance of at least one collision among 32-bit hash values reaches about 1% at roughly 9,300 rows and exceeds 60% at 93,000 rows. In that case, consider implementing a simple 64-bit hash function that returns an integer. One easy approach is to use part of the value returned by MD5(). Another is FNV64(), a function that ships with Percona Server and can be installed as a plugin in any MySQL version. It produces 64-bit hash values, is fast, and collides far less often than CRC32().
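The collision risk can be estimated with the birthday bound. The Python sketch below assumes the hash spreads keys uniformly, which is an idealization. Under that assumption a 32-bit hash reaches about a 1% chance of at least one collision near 9,300 values and about 63% at 93,000 values. The helper md5_hash64 is a hypothetical name for the "part of MD5" idea; it keeps the low 64 bits of the digest.

```python
import hashlib
import math


def collision_probability(n_values, hash_bits):
    """Birthday-bound estimate of P(at least one collision) among
    n_values keys hashed uniformly into 2**hash_bits possible values."""
    pairs = n_values * (n_values - 1) / 2
    # 1 - exp(-x), computed with expm1 to stay accurate for tiny x.
    return -math.expm1(-pairs / 2 ** hash_bits)


def md5_hash64(text):
    """Hypothetical 64-bit hash: the last 16 hex digits of MD5 as an integer."""
    digest = hashlib.md5(text.encode("utf-8")).hexdigest()
    return int(digest[-16:], 16)


for n in (9_300, 93_000, 1_000_000):
    p32 = collision_probability(n, 32)
    p64 = collision_probability(n, 64)
    print(f"{n:>9} values: 32-bit {p32:.4f}, 64-bit {p64:.2e}")

print(md5_hash64("http://www.mysql.com"))
```

With a 64-bit hash the estimated probability stays negligible even at a million values, which is the motivation for moving off CRC32() on large tables.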
Handling hash collisions: when querying through a pseudo-hash index, the WHERE clause must also compare against the literal value, for example:
mysql> SELECT id FROM url WHERE url_crc = CRC32("http://www.mysql.com") AND url = "http://www.mysql.com";
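To see why the second condition is needed, the following Python sketch forces collisions by keeping only two bits of the checksum. The URLs, the tiny_hash function, and both lookup helpers are invented for the illustration; with five rows and four possible hash values, at least two rows must share a value.

```python
import zlib

# Hypothetical rows of the url table: (id, url).
urls = [
    (1, "http://www.mysql.com"),
    (2, "http://www.percona.com"),
    (3, "http://www.example.com"),
    (4, "http://www.example.org"),
    (5, "http://www.example.net"),
]


def tiny_hash(url):
    # Keep only 2 bits of CRC32 so that 5 rows are guaranteed to collide.
    return zlib.crc32(url.encode("utf-8")) & 0b11


# Stand-in for the index on the hash column: hash value -> matching rows.
hash_column_index = {}
for row_id, url in urls:
    hash_column_index.setdefault(tiny_hash(url), []).append((row_id, url))


def hash_only(url):
    """Like WHERE url_crc = hash(url): may also return other URLs' rows."""
    return [row_id for row_id, _ in hash_column_index.get(tiny_hash(url), [])]


def hash_and_literal(url):
    """Like WHERE url_crc = hash(url) AND url = url: the literal
    comparison filters out rows that only share the hash value."""
    candidates = hash_column_index.get(tiny_hash(url), [])
    return [row_id for row_id, stored in candidates if stored == url]


for row_id, url in urls:
    print(url, hash_only(url), hash_and_literal(url))
```

The hash condition narrows the search to a few candidates through the index, and the literal comparison then guarantees that only the intended row is returned.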

5.2 Advantages of the index
B-tree indexes, the most common type, store data in sorted order, so MySQL can use them for ORDER BY and GROUP BY operations. Because the data is sorted, a B-tree also keeps related column values close together. Finally, because the index stores copies of the actual column values, some queries can be answered from the index alone.
Three major advantages:
1. Indexes greatly reduce the amount of data the server has to scan.
2. Indexes help the server avoid sorting and temporary tables.
3. Indexes turn random I/O into sequential I/O.
In general, an index pays off only when the benefit of helping the storage engine find rows quickly outweighs the extra work the index creates. For very small tables, a simple full table scan is more efficient in most cases. For medium to large tables, indexes are very effective. For extremely large tables, however, the cost of building and using indexes grows accordingly.

5.3 High-performance indexing strategy
5.3.1 Stand-alone columns
If the indexed column is not isolated in the query, MySQL generally cannot use the index on it. A "stand-alone column" means the indexed column is not part of an expression and is not passed as an argument to a function. Neither of the following queries can use an index on the column in question:
mysql> SELECT actor_id FROM sakila.actor WHERE actor_id + 1 = 5;
mysql> SELECT ... WHERE TO_DAYS(CURRENT_DATE) - TO_DAYS(date_col) <= 10;
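The effect is easy to observe in a query plan. The sketch below uses Python's built-in sqlite3 module as a stand-in, purely because it runs without a server. SQLite is a different database from MySQL and its planner output looks different, but it shows the same principle: wrapping the key column in an expression forces a scan, while the rewritten condition allows a keyed search. The table definition is a simplified, hypothetical version of sakila.actor.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE actor (actor_id INTEGER PRIMARY KEY, first_name TEXT)")
conn.executemany(
    "INSERT INTO actor (actor_id, first_name) VALUES (?, ?)",
    [(i, f"actor{i}") for i in range(1, 201)],
)


def plan(sql):
    """Return SQLite's plan descriptions (the last column of each row)."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


wrapped_sql = "SELECT actor_id FROM actor WHERE actor_id + 1 = 5"
isolated_sql = "SELECT actor_id FROM actor WHERE actor_id = 4"

wrapped_plan = plan(wrapped_sql)
isolated_plan = plan(isolated_sql)
print(wrapped_plan)   # describes a scan of the whole table
print(isolated_plan)  # describes a search by primary key

# Both forms return the same row; only the access path differs.
print(conn.execute(wrapped_sql).fetchall() == conn.execute(isolated_sql).fetchall())
```

In MySQL the equivalent check is to run EXPLAIN on both forms of the query and compare the access type and the number of rows examined.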

5.3.2 Prefix indexes and index selectivity

Sometimes you need to index a very long character column, which makes the index large and slow. One strategy is the simulated hash index described earlier. When that is not enough, you can usually index just the first few characters of the column. This saves a great deal of index space and so makes the index more efficient, but it also lowers the index's selectivity. Index selectivity is the ratio of distinct indexed values (the cardinality) to the total number of rows in the table (#T), and ranges from 1/#T to 1. The higher the selectivity, the more efficient the query. A unique index has a selectivity of 1, which is the best possible.
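The trade-off between prefix length and selectivity can be worked through on a small sample. The Python sketch below applies the definition above (distinct values divided by row count) to a made-up list of city names; the data and the function name are assumptions for the illustration.

```python
def prefix_selectivity(values, prefix_len=None):
    """Distinct values divided by row count, optionally after truncating
    every value to its first prefix_len characters."""
    keys = values if prefix_len is None else [v[:prefix_len] for v in values]
    return len(set(keys)) / len(values)


# Made-up sample column with many shared leading characters.
cities = [
    "London", "Londrina", "Lonavala", "Santiago",
    "Santa Fe", "San Jose", "Sanaa", "Salzburg",
]

print("full column:", prefix_selectivity(cities))
for n in range(1, 6):
    print(f"prefix {n}:", prefix_selectivity(cities, n))
```

On this sample a three-character prefix distinguishes only 3 of 8 rows, while a five-character prefix already matches the selectivity of the full column. The goal on real data is the same: pick the shortest prefix whose selectivity is close to that of the whole column.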

5.3.3 Multi-column index
A common mistake is to create a separate index for each column, or to create multi-column indexes with the columns in the wrong order.
In most cases, separate single-column indexes on many columns do not improve MySQL query performance.
For example, the film_actor table has single-column indexes on film_id and on actor_id. For the following query, neither of these two single-column indexes is a good choice:
mysql> SELECT film_id, actor_id FROM sakila.film_actor
    -> WHERE actor_id = 1 OR film_id = 1;
In MySQL 5.0 and later, the query can scan both single-column indexes and merge the results. The algorithm has three variants: union for OR conditions, intersection for AND conditions, and unions of intersections for combinations of the two. You can see this in the Extra column of EXPLAIN:
mysql> EXPLAIN SELECT film_id, actor_id FROM sakila.film_actor
    -> WHERE actor_id = 1 OR film_id = 1\G
*************************** 1. row ***************************
...
...
Extra: Using union(PRIMARY,idx_fk_film_id); Using where

The index merge strategy is sometimes a genuine optimization, but more often it indicates that the table is poorly indexed.
When the server intersects multiple indexes (usually for multiple AND conditions), it usually means you need a single multi-column index containing all the relevant columns, not several separate single-column indexes.
When the server unions multiple indexes (usually for multiple OR conditions), the algorithm's buffering, sorting, and merging operations often consume a lot of CPU and memory.
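The difference between merging single-column indexes and probing one multi-column index can be sketched in Python. This is a conceptual model only, not MySQL's index merge implementation: dictionaries of row-id sets stand in for the indexes, and the sample rows are invented.

```python
# Hypothetical film_actor rows: (actor_id, film_id); row id = list position.
film_actor = [(1, 1), (1, 23), (2, 1), (3, 17), (1, 25), (10, 1)]

by_actor, by_film, by_actor_film = {}, {}, {}
for row_id, (actor_id, film_id) in enumerate(film_actor):
    by_actor.setdefault(actor_id, set()).add(row_id)                # index on actor_id
    by_film.setdefault(film_id, set()).add(row_id)                  # index on film_id
    by_actor_film.setdefault((actor_id, film_id), set()).add(row_id)  # composite index


def or_query(actor_id, film_id):
    """Union merge: scan both indexes, then combine and de-duplicate."""
    return sorted(by_actor.get(actor_id, set()) | by_film.get(film_id, set()))


def and_query_merge(actor_id, film_id):
    """Intersection merge: both scans run in full, and most of the
    candidates they return are then thrown away."""
    return sorted(by_actor.get(actor_id, set()) & by_film.get(film_id, set()))


def and_query_composite(actor_id, film_id):
    """One probe into an index on (actor_id, film_id) finds the row directly."""
    return sorted(by_actor_film.get((actor_id, film_id), set()))


print(or_query(1, 1))
print(and_query_merge(1, 1))
print(and_query_composite(1, 1))
```

For the AND case, the merge reads three candidates from each single-column index to keep one row, while the composite lookup reads only that row. The gap widens as the individual conditions become less selective.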

(To be continued)
