MySQL's index

Source: Internet
Author: User
Tags one table

The simplest way to understand how an index works in MySQL is to think of the index at the back of a book: you look up a topic in the index, and it tells you the page number where that topic appears.

The storage engine uses an index in a similar way. It first finds the matching value in the index, and then uses the matching index record to find the corresponding data row.

In MySQL, indexes are implemented at the storage engine layer rather than at the server level.

Index benefits:

Because the index stores the actual values of the indexed columns, some queries can be answered from the index alone. In summary, indexes have three main advantages:
1. Indexes greatly reduce the amount of data the server needs to scan.
2. Indexes help the server avoid sorting and temporary tables.
3. Indexes turn random I/O into sequential I/O.

Types of index:

B-tree Index:

In a B-tree, all values are stored in sorted order, and every leaf page is the same distance from the root.

A B-tree index speeds up access to data because the storage engine no longer needs a full table scan to find the rows it needs. Instead, the search starts at the root node of the index. The slots in the root node hold pointers to child nodes, and the storage engine follows these pointers down to the next level. It finds the right pointer by comparing the values in the node pages with the value being looked up; the values in a node page define the upper and lower bounds of the values in its child nodes.

B-tree indexes are useful for the following kinds of queries:
1. Full value match: matches all columns in the index. For example, the index mentioned earlier can be used to find a person named Cuba Allen who was born on 1960-01-01.
2. Leftmost prefix match: the index mentioned earlier can be used to find all people with the surname Allen, using only the first column of the index.
3. Column prefix match: you can match only the beginning of a column's value. For example, the index can be used to find all people whose surname starts with J. This also uses only the first column of the index.
4. Range match: the index can be used to find people whose surnames are between Allen and Barry. This also uses only the first column of the index.
5. Exact match on one column and range match on another: the index can be used to find people whose surname is Allen and whose first name starts with the letter K, that is, an exact match on the first column, last_name, and a range match on the second column, first_name.
6. Index-only queries: B-tree indexes can usually support "index-only queries", which need to access only the index and not the data rows.
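
The list above refers to an index "mentioned earlier" that is not shown in this text. The following is a minimal sketch, assuming a hypothetical People table with a three-column index on (last_name, first_name, dob). The table name, index name, column types, and sample values are assumptions made for illustration only.

```sql
-- Hypothetical table and index; all names here are assumptions for illustration.
CREATE TABLE People (
    last_name  VARCHAR(50)    NOT NULL,
    first_name VARCHAR(50)    NOT NULL,
    dob        DATE           NOT NULL,
    gender     ENUM('m', 'f') NOT NULL,   -- deliberately not part of the index
    KEY idx_name_dob (last_name, first_name, dob)
);

-- 1. Full value match: every indexed column is compared with a constant.
SELECT * FROM People
WHERE last_name = 'Allen' AND first_name = 'Cuba' AND dob = '1960-01-01';

-- 2. Leftmost prefix match: only the first indexed column is used.
SELECT * FROM People WHERE last_name = 'Allen';

-- 3. Column prefix match: the beginning of the first column's value.
SELECT * FROM People WHERE last_name LIKE 'J%';

-- 4. Range match on the first column.
SELECT * FROM People WHERE last_name BETWEEN 'Allen' AND 'Barry';

-- 5. Exact match on last_name, range (prefix) match on first_name.
SELECT * FROM People WHERE last_name = 'Allen' AND first_name LIKE 'K%';

-- 6. Index-only query: every selected column is stored in the index,
--    so the data rows do not need to be read.
SELECT last_name, first_name, dob FROM People WHERE last_name = 'Allen';
```

Prefixing any of these statements with EXPLAIN is one way to check which index the optimizer actually picks on a given server version and data set.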
On the other hand, B-tree indexes have the following limitations (these apply to multi-column indexes):
1. The index cannot be used if the lookup does not start from the leftmost column of the index.
2. Columns in the index cannot be skipped.
3. If the query has a range condition on a column, none of the columns to the right of it in the index can be used to optimize the lookup.
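
The three limitations can be sketched with queries against a hypothetical People table indexed on (last_name, first_name, dob). The table, the index name, and the literal values are assumptions for illustration, and the comments describe the behavior the rules above predict rather than verified EXPLAIN output.

```sql
-- Hypothetical table; names are assumptions for illustration.
CREATE TABLE IF NOT EXISTS People (
    last_name  VARCHAR(50)    NOT NULL,
    first_name VARCHAR(50)    NOT NULL,
    dob        DATE           NOT NULL,
    gender     ENUM('m', 'f') NOT NULL,
    KEY idx_name_dob (last_name, first_name, dob)
);

-- Limitation 1: the lookup does not start at the leftmost column (last_name),
-- so the index cannot be used to find the matching rows.
EXPLAIN SELECT * FROM People WHERE first_name = 'Cuba';

-- The same applies to matching the end of the leftmost column's value.
EXPLAIN SELECT * FROM People WHERE last_name LIKE '%llen';

-- Limitation 2: first_name is skipped, so only the last_name part
-- of the index can be used for the lookup.
EXPLAIN SELECT * FROM People
WHERE last_name = 'Allen' AND dob = '1960-01-01';

-- Limitation 3: the LIKE on first_name is a range condition, so the dob
-- condition to its right cannot be used to narrow the index lookup.
EXPLAIN SELECT * FROM People
WHERE last_name = 'Allen' AND first_name LIKE 'J%' AND dob = '1976-12-23';
```

In the EXPLAIN output, the key_len column indicates how many bytes of the index key were used, which gives a rough idea of how many index columns took part in the lookup.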

Hash Index:
  
A hash index is built on a hash table and is useful only for queries that exactly match all columns of the index.
For each row, the storage engine computes a hash code over all the indexed columns. A hash code is a small value, and rows with different key values generally produce different hash codes.
The hash index stores the hash codes in a hash table, together with a pointer to each row.
In MySQL, only the Memory engine explicitly supports hash indexes, and hash is also the default index type for the Memory engine. The Memory engine supports B-tree indexes as well.
It is worth mentioning that the Memory engine supports non-unique hash indexes, which is unusual in the database world. If multiple values have the same hash code, the index stores their row pointers as a linked list in the same hash table entry.
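
A minimal sketch of an explicit hash index on a Memory table follows. The table name testhash, the index names, and the sample rows are assumptions for illustration.

```sql
-- Hypothetical Memory table with an explicit hash index on fname.
CREATE TABLE testhash (
    fname VARCHAR(50) NOT NULL,
    lname VARCHAR(50) NOT NULL,
    KEY idx_fname USING HASH (fname)
) ENGINE = MEMORY;

INSERT INTO testhash (fname, lname) VALUES
    ('Arjen', 'Lentz'),
    ('Baron', 'Schwartz'),
    ('Peter', 'Zaitsev'),
    ('Vadim', 'Tkachenko');

-- Exact-match lookup: the engine hashes 'Peter', finds the hash table entry,
-- follows the row pointer, and compares the row's fname with 'Peter'.
SELECT lname FROM testhash WHERE fname = 'Peter';

-- The Memory engine also accepts B-tree indexes when asked explicitly.
ALTER TABLE testhash ADD KEY idx_lname USING BTREE (lname);
```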

Limitations of Hash indexes:
A hash index contains only hash codes and row pointers, not the column values, so MySQL cannot use the values in the index to avoid reading the rows. However, accessing in-memory rows is fast, so in most cases this has little impact on performance.

Hash index data is not stored in index-value order, so it cannot be used for sorting.

Hash indexes do not support lookups on a subset of the indexed columns, because the hash code is always computed from the entire contents of the indexed columns. For example, a hash index built on columns (A, B) cannot be used by a query that references only column A.

Hash indexes support only equality comparisons, including =, IN(), and <=> (note that <> and <=> are different operators). They do not support any range queries, such as a WHERE price > comparison.

Accessing data through a hash index is very fast unless there are many hash collisions. When a collision occurs, the storage engine must walk through all the row pointers in the linked list and compare the rows one by one until it has found all the qualifying rows.

Some index maintenance operations can also be costly when there are many hash collisions. For example, if a hash index is built on a column with low selectivity (and therefore many collisions), then when a row is deleted from the table, the storage engine has to walk through every row in the linked list for that hash value to find and remove the reference to the deleted row. The more collisions there are, the higher the cost.

Because of these limitations, hash indexes are suitable only for certain specific cases. Where a hash index does fit, however, the performance gain is significant.
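
The limitations above can be sketched against a hypothetical Memory table. The table hash_demo, its columns, and the literal values are assumptions for illustration, and the comments restate the rules from the text rather than verified optimizer output.

```sql
-- Hypothetical Memory table with two hash indexes.
CREATE TABLE hash_demo (
    a     INT            NOT NULL,
    b     INT            NOT NULL,
    price DECIMAL(10, 2) NOT NULL,
    KEY idx_ab    USING HASH (a, b),
    KEY idx_price USING HASH (price)
) ENGINE = MEMORY;

-- Equality on every indexed column: idx_ab can be used.
EXPLAIN SELECT * FROM hash_demo WHERE a = 1 AND b = 2;
EXPLAIN SELECT * FROM hash_demo WHERE a IN (1, 2) AND b <=> 2;

-- Only part of the indexed columns: the hash of (a, b) cannot be computed
-- from a alone, so idx_ab cannot be used.
EXPLAIN SELECT * FROM hash_demo WHERE a = 1;

-- Range condition: not supported by a hash index.
EXPLAIN SELECT * FROM hash_demo WHERE price > 100;

-- Sorting: hash entries are not stored in price order.
EXPLAIN SELECT * FROM hash_demo ORDER BY price;
```

If range scans or sorting on price matter, declaring that index with USING BTREE instead is the usual alternative on a Memory table.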


Spatial data index: to be covered in a later installment.

Full-text index: to be covered in a later installment.



High-performance indexing strategies:
1. Isolated columns

An indexed column cannot be used for an index lookup if it is part of an expression or an argument to a function in the query.
Example: SELECT actor_id FROM actor WHERE actor_id + 1 = 5;
MySQL cannot use the index on actor_id for this lookup, so we should get in the habit of simplifying WHERE conditions and always placing the indexed column alone on one side of the comparison operator.
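
A short sketch of isolating the column, assuming the standard Sakila sample database is installed. The rental query pair is an additional illustration that is not part of the original text.

```sql
-- The column is wrapped in an expression, so MySQL cannot seek into the index.
EXPLAIN SELECT actor_id FROM sakila.actor WHERE actor_id + 1 = 5;

-- Equivalent condition with the column isolated on one side.
EXPLAIN SELECT actor_id FROM sakila.actor WHERE actor_id = 4;

-- The same problem occurs when the column is a function argument.
EXPLAIN SELECT rental_id FROM sakila.rental
WHERE TO_DAYS(CURRENT_DATE) - TO_DAYS(rental_date) <= 10;

-- Rewritten so that rental_date stands alone and the date arithmetic
-- is applied to the constant side instead.
EXPLAIN SELECT rental_id FROM sakila.rental
WHERE rental_date >= CURRENT_DATE - INTERVAL 10 DAY;
```

Comparing the type and rows columns of each pair in the EXPLAIN output shows whether the rewritten form lets the optimizer use an index lookup.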
2. Prefix indexes and index selectivity

Index selectivity
Index selectivity is the ratio of distinct indexed values to the total number of rows in the table (#T); it ranges from 1/#T to 1.
The higher the selectivity, the more efficient the query, because a highly selective index lets MySQL filter out more rows during the lookup.
A unique index has a selectivity of 1, which is the best possible selectivity and gives the best performance.

Prefix index
A prefix index, which indexes only the first few characters of a column, is an effective way to make an index smaller and faster.
mysql> ALTER TABLE sakila.city_demo ADD KEY (city(7));
Prefix indexes also have drawbacks: MySQL cannot use a prefix index for GROUP BY or ORDER BY, and cannot use one as a covering index.
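
One way to choose the prefix length is to compare the selectivity of several prefixes with the selectivity of the full column. This is a sketch that assumes the sakila.city_demo table from the statement above exists and has a city column; the candidate lengths 3, 5, and 7 are arbitrary.

```sql
-- Selectivity = distinct values / total rows, for the full column
-- and for prefixes of increasing length.
SELECT COUNT(DISTINCT city)          / COUNT(*) AS sel_full,
       COUNT(DISTINCT LEFT(city, 3)) / COUNT(*) AS sel_3,
       COUNT(DISTINCT LEFT(city, 5)) / COUNT(*) AS sel_5,
       COUNT(DISTINCT LEFT(city, 7)) / COUNT(*) AS sel_7
FROM sakila.city_demo;
```

A reasonable choice is the shortest prefix whose selectivity is close to sel_full, since longer prefixes add index size without filtering many more rows.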

3. Multi-column indexes
"Index merge":
The index merge strategy is sometimes the result of a good optimization, but more often it indicates that the indexes on the table are poorly designed.
1. When the server intersects multiple indexes (usually for multiple AND conditions), it usually means you need a single multi-column index containing all the relevant columns rather than several independent single-column indexes.
2. When the server unions multiple indexes (usually for multiple OR conditions), the algorithm's buffering, sorting, and merging operations typically consume a lot of CPU and memory, especially when some of the indexes are not very selective and the scans return large amounts of data to merge.
3. More importantly, the optimizer does not include these operations in the query cost; it only accounts for random page reads.
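
A sketch of how an index merge can be spotted and rewritten, assuming the Sakila sample database, where film_actor has a primary key on (actor_id, film_id) and a separate index on film_id. Whether the optimizer chooses index merge depends on the server version and the data.

```sql
-- An OR across two differently indexed columns is a typical candidate for
-- an index merge union; check the type and Extra columns of the output.
EXPLAIN SELECT film_id, actor_id
FROM sakila.film_actor
WHERE actor_id = 1 OR film_id = 1;

-- An equivalent rewrite that lets each branch use its own index.
-- The extra condition in the second branch prevents rows that match
-- both conditions from being returned twice.
SELECT film_id, actor_id FROM sakila.film_actor WHERE actor_id = 1
UNION ALL
SELECT film_id, actor_id FROM sakila.film_actor
WHERE film_id = 1 AND actor_id <> 1;
```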

         

4. Choose an appropriate index column order
One of the most common sources of confusion is the order of columns in an index. The correct order depends on the queries that will use the index, and you also need to consider how best to satisfy sorting and grouping.
In a multi-column B-tree index, the column order means the index is sorted first by the leftmost column, then by the second column, and so on. The index can therefore be scanned in ascending or descending order to satisfy ORDER BY, GROUP BY, and DISTINCT clauses that match the column order exactly.

There is a rule of thumb for choosing the column order: place the most selective column first in the index.
This advice is helpful in some scenarios, but it is usually less important than avoiding random I/O and sorting, so the question needs to be considered more comprehensively.

When sorting and grouping do not need to be considered, putting the most selective column first is usually a good idea. In that case the index is used only to optimize lookups for the WHERE condition, and an index designed this way filters out the required rows fastest and is also more selective for queries that use only a prefix of the indexed columns in the WHERE clause.
However, performance depends not only on the selectivity of the indexed columns (their overall cardinality) but also on the specific values in the query conditions, that is, on the distribution of values. This is the same consideration as when choosing a prefix length, described earlier: you may need to adjust the column order based on the queries that run most frequently.

Although the rules of thumb about selectivity and cardinality are worth studying, remember not to forget other factors such as sorting, grouping, and range conditions in the WHERE clause. These factors can have a very large impact on query performance.
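
The difference between overall selectivity and the selectivity of specific values can be measured directly. The sketch below assumes the Sakila sample database and a frequently run query that filters payment on staff_id and customer_id; the literal values 2 and 584 and the index name are assumptions for illustration.

```sql
-- The query whose index column order is being decided.
SELECT * FROM sakila.payment WHERE staff_id = 2 AND customer_id = 584;

-- How many rows match each condition for these specific values?
-- The condition that matches fewer rows is the more selective one here.
SELECT SUM(staff_id = 2)      AS rows_for_staff,
       SUM(customer_id = 584) AS rows_for_customer
FROM sakila.payment;

-- Overall selectivity of each column, independent of specific values.
SELECT COUNT(DISTINCT staff_id)    / COUNT(*) AS staff_id_selectivity,
       COUNT(DISTINCT customer_id) / COUNT(*) AS customer_id_selectivity,
       COUNT(*)                               AS total_rows
FROM sakila.payment;

-- If customer_id turns out to be the more selective column, it goes first.
ALTER TABLE sakila.payment ADD KEY idx_customer_staff (customer_id, staff_id);
```

The first measurement reflects one particular pair of values and can be misleading for other values, which is why the second, table-wide measurement is worth running as well.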

5. Clustered indexes (TODO: to be expanded)

A clustered index is a way of storing data: it keeps the B+Tree index and the data rows in the same structure. InnoDB tables are organized by their clustered index (similar to Oracle's index-organized tables).
InnoDB clusters data by primary key. If no primary key is defined, InnoDB chooses a unique non-nullable index instead; if there is no such index, it implicitly defines a hidden primary key to use as the clustered index.
In a table without a clustered index, the table data and the indexes are stored separately, and primary key indexes and secondary indexes are stored in the same way.
In a table with a clustered index, the table data is stored together with the primary key: the leaf nodes of the primary key index store the row data, and the leaf nodes of a secondary index store the row's primary key value.
Clustered index tables maximize the performance of I/O-intensive applications, but they also have the following limitations:
1) Insertion speed depends heavily on insertion order. Inserting in primary key order is fastest; otherwise page splits occur, which can severely affect performance. For this reason, InnoDB tables are typically given an auto-increment ID column as the primary key.
2) Updating the primary key is expensive because it forces the updated row to move. For this reason, InnoDB primary keys are generally treated as not updatable.
3) Access through a secondary index requires two index lookups: the first finds the primary key value, and the second finds the row data by that primary key value.
The leaf nodes of a secondary index store primary key values rather than row pointers (a non-clustered index stores pointers or addresses). This reduces the maintenance needed on secondary indexes when rows move or data pages split, but it makes secondary indexes take more space.
The leaf nodes of a clustered index are the data nodes themselves, whereas the leaf nodes of a non-clustered index are still index nodes that hold a link to the corresponding data block.
Inserting on a clustered primary key is much slower than inserting on a non-clustered primary key.
By contrast, clustered indexes are well suited to sorting and non-clustered indexes are not: because a clustered index already stores the rows in physical order, sorting is fast, whereas rows under a non-clustered index are not stored in order and need extra resources to sort. When you need to fetch a range of data, a clustered index is a better choice than a non-clustered index.
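
The points above can be sketched with a small InnoDB table. The table orders_demo, its columns, and the sample values are assumptions for illustration; the comments describe the access pattern the text predicts, not measured behavior.

```sql
-- Hypothetical InnoDB table: rows are stored in the primary key B+Tree.
CREATE TABLE orders_demo (
    id          INT UNSIGNED NOT NULL AUTO_INCREMENT,
    customer_id INT UNSIGNED NOT NULL,
    created_at  DATETIME     NOT NULL,
    note        VARCHAR(200),
    PRIMARY KEY (id),                 -- clustered index: leaf nodes hold the rows
    KEY idx_customer (customer_id)    -- secondary index: leaf nodes hold id values
) ENGINE = InnoDB;

-- AUTO_INCREMENT keys arrive in primary key order, so new rows are appended
-- at the end of the clustered index instead of splitting existing pages.
INSERT INTO orders_demo (customer_id, created_at, note) VALUES
    (42, NOW(), 'first order'),
    (7,  NOW(), 'second order');

-- Two lookups: idx_customer yields the matching id values,
-- then each id is looked up in the clustered index to fetch the row.
SELECT * FROM orders_demo WHERE customer_id = 42;

-- One lookup: the secondary index already stores id, so the row is not needed.
SELECT id FROM orders_demo WHERE customer_id = 42;

-- A primary key range reads rows that are stored next to each other.
SELECT * FROM orders_demo WHERE id BETWEEN 1 AND 100;
```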


6. Covering indexes
Typically, you create indexes based on the WHERE conditions of queries, but that is only one aspect of index optimization. A well-designed index should take the entire query into account, not just the WHERE clause. Indexes are indeed an efficient way to find data, but MySQL can also use an index to retrieve column values directly, so that it does not need to read the data rows. If the leaf nodes of the index already contain the data being queried, why go back to the table at all? An index that contains all the values of the columns needed by a query is called a "covering index".

Not all types of index can act as covering indexes. A covering index must store the values of the indexed columns, and hash, spatial, and full-text indexes do not store them, so MySQL can only use B-tree indexes as covering indexes. In addition, different storage engines implement covering indexes differently, and not all engines support them.

When a query is covered by an index (also called an index-covered query), "Using index" appears in the Extra column of EXPLAIN. For example, the table sakila.inventory has a multi-column index (store_id, film_id). If a query accesses only these two columns, MySQL can use this index as a covering index:
mysql> EXPLAIN SELECT store_id, film_id FROM sakila.inventory;

MySQL cannot perform the LIKE operation in the index. This is a limitation of the underlying storage engine API: MySQL 5.5 only allows simple comparison operations (such as equal to, greater than, and less than) in the index. MySQL can perform a leftmost-prefix LIKE comparison in the index, because it can be converted to simple comparison operations, but if the pattern starts with a wildcard, the storage engine cannot do the matching, and the server has to fetch the row values and compare them itself.
mysql> EXPLAIN SELECT * FROM products WHERE actor = 'Sean Carrey' AND title LIKE '%apollo%';
This can be optimized as follows:
mysql> EXPLAIN SELECT * FROM products
    JOIN (
        SELECT prod_id FROM products
        WHERE actor = 'Sean Carrey' AND title LIKE '%apollo%'
    ) AS t1 ON (t1.prod_id = products.prod_id);
We call this a deferred join because it delays access to the full rows. The first stage of the query, the subquery, can use a covering index to find the matching prod_id values, and the outer query then uses those values to fetch the full rows.
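
For the deferred join to work as described, the subquery needs an index that contains every column it touches. The text does not show the products table, so the definition below is a hypothetical sketch: the column types, the description column, and the index name are assumptions.

```sql
-- Hypothetical products table; only prod_id, actor, and title come from the text.
CREATE TABLE products (
    prod_id     INT UNSIGNED NOT NULL AUTO_INCREMENT,
    actor       VARCHAR(50)  NOT NULL,
    title       VARCHAR(100) NOT NULL,
    description TEXT,
    PRIMARY KEY (prod_id),
    -- Contains every column the subquery reads, so it can cover the subquery.
    KEY idx_actor_title_prod (actor, title, prod_id)
) ENGINE = InnoDB;

-- The subquery touches only actor, title, and prod_id, all present in the index.
-- The outer join then fetches full rows only for the matching prod_id values.
EXPLAIN SELECT * FROM products
    JOIN (
        SELECT prod_id FROM products
        WHERE actor = 'Sean Carrey' AND title LIKE '%apollo%'
    ) AS t1 ON (t1.prod_id = products.prod_id);
```

Whether this is faster than the plain query depends on how many rows the actor condition matches: the fewer rows survive the title filter, the more full-row reads the deferred join avoids.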
7. Using index scans for sorting
MySQL has two ways to produce ordered results: it can use a sort operation, or it can scan an index in order. If the type column in EXPLAIN is "index", MySQL is using an index scan to produce the ordering.
Scanning the index itself is very fast, because it simply moves from one index record to the next. However, if the index does not cover all the columns the query needs, MySQL has to go back to the table to look up the corresponding row for every index record it scans. This is basically random I/O, so reading data in index order is usually slower than a sequential full table scan, especially for I/O-bound workloads.

MySQL can use the same index both for sorting and for finding rows, so if possible it is best to design indexes that satisfy both tasks at once.

MySQL can use an index to sort the results only if the column order of the index exactly matches the ORDER BY clause and all columns are sorted in the same direction (all ascending or all descending).
If the query joins multiple tables, the index can be used for sorting only if all the columns referenced by the ORDER BY clause belong to the first table. The ORDER BY clause has the same restriction as lookup queries: it must form a leftmost prefix of the index; otherwise MySQL has to perform a sort operation and cannot use the index for ordering.
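
These rules can be tried out against the Sakila sample database, assuming its rental table has the usual index on (rental_date, inventory_id, customer_id). The date literal is arbitrary, and the comments state what the rules above predict; "Using filesort" in the Extra column of EXPLAIN is the sign that a separate sort was needed.

```sql
-- rental_date is fixed by an equality, and the ORDER BY columns continue
-- the index in order, so together they form a leftmost prefix of the index.
EXPLAIN SELECT rental_id, staff_id FROM sakila.rental
WHERE rental_date = '2005-05-25'
ORDER BY inventory_id, customer_id;

-- Mixed sort directions do not match the order of the index entries.
EXPLAIN SELECT rental_id, staff_id FROM sakila.rental
WHERE rental_date = '2005-05-25'
ORDER BY inventory_id DESC, customer_id ASC;

-- A range condition on the first column: the remaining columns are no longer
-- globally ordered within the scanned part of the index.
EXPLAIN SELECT rental_id, staff_id FROM sakila.rental
WHERE rental_date > '2005-05-25'
ORDER BY inventory_id, customer_id;

-- ORDER BY references a column (staff_id) that is not in the index.
EXPLAIN SELECT rental_id, staff_id FROM sakila.rental
WHERE rental_date = '2005-05-25'
ORDER BY inventory_id, staff_id;
```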

8. Compressing the index

9. Redundant indexes and duplicate indexes

10. Delete Unused indexes
      
11. Index and lock
    

Index optimization:
  
Now you need to look at which columns have many distinct values and which columns appear most often in WHERE clauses. An index on a column with more distinct values is more selective, which is generally good because it lets MySQL filter out unwanted rows more efficiently.
The country and sex columns are usually not very selective, but many queries use them. So, taking frequency of use into account, it is still advisable to use (sex, country) as the prefix when creating the various index combinations. But doesn't conventional wisdom say you shouldn't index columns with low selectivity? It does, but there are two reasons to do so here. First, as mentioned above, almost every query uses the sex column. Second, and more importantly, adding the column to the index does little harm, because even when a query does not use the sex column there is a trick to work around it:
If a query does not restrict sex, you can still let MySQL choose the index by adding AND sex IN('M', 'F') to the query conditions. This condition filters out no rows and returns the same results as the query without it, but it has to be added so that MySQL can match the leftmost prefix of the index. The trick works well in this kind of scenario, but if the column has too many distinct values, the IN() list becomes too long and the trick no longer works.
A fundamental principle of index design: consider all the options on the table. When designing indexes, don't just think about which indexes the existing queries need; also consider optimizing the queries. If you find that some queries need a new index that would slow down other queries, think about whether you can change the original queries instead. You should look for the best balance between optimizing queries and optimizing indexes, rather than designing the perfect index in isolation.
Another basic principle is to place columns that are used in range conditions as far toward the end of the index as possible, so that the optimizer can use as many index columns as possible.
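
A sketch of the IN() trick and of placing the range column last. The profiles table, its columns, the index name, and the literal values are all assumptions for illustration.

```sql
-- Hypothetical table: the low-selectivity but frequently used columns form
-- the index prefix, and the range column (age) comes last.
CREATE TABLE profiles (
    id      INT UNSIGNED     NOT NULL AUTO_INCREMENT,
    sex     ENUM('M', 'F')   NOT NULL,
    country VARCHAR(50)      NOT NULL,
    age     TINYINT UNSIGNED NOT NULL,
    PRIMARY KEY (id),
    KEY idx_sex_country_age (sex, country, age)
) ENGINE = InnoDB;

-- A query that restricts sex matches the leftmost prefix directly.
SELECT id FROM profiles
WHERE sex = 'F' AND country = 'Cuba' AND age BETWEEN 18 AND 25;

-- A query that does not care about sex: listing every possible value
-- filters out nothing, but lets the lookup start at the leftmost column.
SELECT id FROM profiles
WHERE sex IN ('M', 'F') AND country = 'Cuba' AND age BETWEEN 18 AND 25;
```

Each additional IN() list multiplies the number of value combinations the optimizer has to consider, which is why the trick is only practical for columns with very few distinct values.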

# Force an index
SELECT * FROM table1 FORCE INDEX (COL1_INDEX, COL2_INDEX) WHERE col1 = 1 AND col2 = 2 AND col3 = 3;

  
