Impact of MySQL indexes and storage methods on Performance

Source: Internet
Author: User

This article draws from high-performance MySQL (version 2).

In a database, the policies that have the greatest impact on performance include database lock policies, cache policies, index policies, storage policies, and execution plan optimization policies.

The index policy determines the efficiency of the database to quickly locate data, and the storage policy determines the efficiency of data persistence.

MyISAM and InnoDB, two major storage engines in MySQL, adopt different indexing and storage policies. This article will analyze their similarities and differences and performance.

MySQL provides two Indexing Methods: B-tree (including B + tree) indexes and hash indexes.

The B-tree index can be used for range search and prefix search. For the B-tree with N nodes, the complexity of retrieving a record is O (logn ).

Hash indexes can only be used for equal searches. However, no matter how large a hash table is, the search complexity is O (1 ).

Obviously, if the value difference is large and the primary value is equal to search, the hash index is more efficient, and it has the complexity of searching for O (1. If the value difference is relatively poor and the range search is dominant, Tree B is a better choice. It supports range search.



Hash indexes are similar in different engines, and there is not much discussion. This article mainly discusses different forms of B-tree indexes.

B belongs to the binary Balance Tree. The Balance Tree is a tree with no more than 1 height gap between the left and right nodes of any node. This is the absolute Balance Tree.

AVL is a better algorithm for the Balance Tree. It can ensure the absolute balance of the tree through the left-hand, right-hand and their combination operations.

All rotation operations in the AVL algorithm are as follows:


If the balance tree is in an imbalance on any left side, it can be transferred to the balance on the right side through the corresponding rotation operation.

The B-tree used by the database records information only on the leaf node, while the non-leaf node records the range information, which is different from the general search tree (the General Search Tree also records information on non-leaf nodes ).

This is a B + tree structure. InnoDB indexes all adopt this form. It adds a pointer to each leaf node on the basis of the B-tree, in this way, you can quickly perform range search.

I still cannot determine whether MyISAM is also a B + tree, but I didn't think of a method that can quickly perform range search. It should also be a B + tree.

For example, the B + tree example:


If I want to find all the information starting with name a, I just need to get the first leaf node and then follow the pointer to the next leaf in sequence, it is found that the leaf node does not start with a in the current year, so that as long as the first leaf node (O (logn) is found and then retrieved along the pointer (O (n )), you can obtain all indexes. If the tables on N nodes scan for M consecutive values, they are O (logn) + O (M). If the B-tree is used, it must be traced back to the upper node, the worst efficiency is O (logn) * m.

For InnoDB, an improved B + tree index is used, which is called a clustered index. The difference is that indexes not only have index value information, there is also the information of the row where the entire index value is located, eliminating the need for an operation to retrieve the entire row through the position on the index value.

Suppose we have a table with (col1, col2) columns, col1 is the primary key, and col2 also creates an index. In the MyISAM engine, files are recorded as follows, because MyISAM writes data to files in the order of insertion. If a row is deleted, it is left blank. If the row is inserted again, it is left blank. For rows with an indefinite length, the storage engine reserves the position on the page, to update a longer value (generally varchar). If it cannot be placed, it will be added to the end of the file and deleted from the original location. So sometimes there is a waste of space, and optimize table is required for optimization.

Therefore, the fixed-length row is more efficient than the variable-length row! Separated storage of fixed-length and indefinite data can improve efficiency in many cases.

Let's take a look at the primary key index of MyISAM. The index key is the primary key value, and the index value is the file location of the row. You can directly read the row through this location. In this figure, MyISAM also uses the B + tree.

MyISAM's non-primary key index is not different from the gradual index, and is also the File Location of the index row.

Let's look at InnoDB's primary key index. The index key is the primary key value, and the index value is the whole row of data.

InnoDB non-primary key index, index key is the column value, and index value is the primary key value.

Compare the index policies of MyISAM and InnoDB:


We can find that the indexes of all MyISAM columns are the same, the index key is the column value, and the index value is the file location of the row.

The primary key index of InnoDB contains all information about the row. The index key is the primary key value and the index value is the value of the whole row. The index key instead of the primary key is the column value, and the index value is the primary key value.

In InnoDB, a clustered index is required. If no primary key is defined, InnoDB also implicitly creates a clustered index as the primary key, because the primary key index of InnoDB also has an important function: Row lock, which I analyzed in another article.

Let's see what happens when we insert values:


InnoDB organizes files according to the primary key index order. If files are inserted in the primary key order, they can be directly added at the end. And only fill in 15/16 of the page, so that some space can be reserved for row modification, so that the organization of data is very compact.

If the primary key is not in order, let's take a look at what will happen, because the data will be moved and the page will be adjusted as the primary key is stored in order.


Therefore, it is necessary for the InnoDB engine to insert records in the order of primary keys. Otherwise, the performance will face great risks.

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.