2014 Alibaba intern exam question: how MySQL indexes are implemented

Source: Internet
Author: User

This is a question from the 2014 Alibaba intern exam held at the Beijing site:

In MySQL, indexes are implemented at the storage engine level, and different storage engines, such as MyISAM and InnoDB, implement them differently.

MyISAM index implementation:

The MyISAM storage engine uses a B+ tree as its index structure. The data field of each leaf node stores the address of the data record. MyISAM's indexing method is therefore called "non-clustered", to distinguish it from InnoDB's clustered index.
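The non-clustered layout described above can be sketched as follows. This is a minimal illustration, not MyISAM's actual file format: a sorted list searched with `bisect` stands in for the B+ tree, and the names `data_file`, `index_entries`, and `myisam_lookup` are hypothetical.

```python
import bisect

# Hypothetical stand-in for the data file: records kept in insertion order.
data_file = [
    (15, "Alice"),
    (3, "Bob"),
    (27, "Carol"),
]

# Stand-in for the index file: sorted keys, each paired with a record address
# (here simply the record's position in data_file).
index_entries = sorted((key, address) for address, (key, _) in enumerate(data_file))
index_keys = [key for key, _ in index_entries]


def myisam_lookup(key):
    """Search the index for key, then follow the stored address to the record."""
    pos = bisect.bisect_left(index_keys, key)
    if pos == len(index_keys) or index_keys[pos] != key:
        return None  # key is not in the index
    _, address = index_entries[pos]
    return data_file[address]


print(myisam_lookup(3))
```

The index holds only keys and addresses; the records themselves stay in a separate structure, which is the point of the "non-clustered" label.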

InnoDB Index implementation:

Although InnoDB also uses a B+ tree as its index structure, its implementation differs from MyISAM's.

The first major difference is that the InnoDB data file itself is an index file.

The second difference from MyISAM is that the data field of an InnoDB secondary index stores the primary key value of the corresponding record rather than its address. The clustered index makes searches by primary key very efficient, but a search through a secondary index has to traverse two indexes: first the secondary index to obtain the primary key, then the primary index to find the record by that primary key.
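The two-step secondary lookup can be sketched like this. Plain dictionaries stand in for the two B+ trees, so only the lookup path is modeled, not the tree structure; the table contents and the names `clustered_index`, `secondary_index_name`, and `find_by_name` are made up for illustration.

```python
# Stand-in for the clustered (primary) index: primary key -> complete row.
clustered_index = {
    1: {"id": 1, "name": "Alice", "city": "Beijing"},
    2: {"id": 2, "name": "Bob", "city": "Hangzhou"},
    3: {"id": 3, "name": "Carol", "city": "Beijing"},
}

# Stand-in for a secondary index on name: it stores the primary key,
# not a record address.
secondary_index_name = {"Alice": 1, "Bob": 2, "Carol": 3}


def find_by_primary_key(pk):
    """One lookup: the clustered index already holds the row."""
    return clustered_index.get(pk)


def find_by_name(name):
    """Two lookups: secondary index first, then the clustered index."""
    pk = secondary_index_name.get(name)
    if pk is None:
        return None
    return clustered_index.get(pk)


print(find_by_name("Bob"))
```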

In principle, a database index could also be implemented with a red-black tree or a B-tree.

But why is the B+ tree used in practice?

The answer starts with how computer storage and operating systems work. The indexes of large tables are usually too big to be held entirely in memory, so they are stored on disk as files, and querying data therefore requires I/O operations. The goal of an efficient query is to reduce the number of I/O operations. One I/O generally reads a whole page of data (usually 4 KB), following the principle of locality. A B-tree takes advantage of this: each new node is allocated as one page, so a single I/O reads an entire node, which contains many keys. In a red-black tree, logically adjacent nodes are not necessarily physically adjacent, so reading the same amount of data takes more I/O operations. The B-tree is therefore more efficient.

So why is the B+ tree chosen over the B-tree in the end?
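A rough calculation shows the size of the gap. The sketch below assumes, for illustration only, 16 bytes per key-and-pointer entry in a 4 KB page and one I/O per tree level; `levels_needed` is a hypothetical helper, and a real red-black tree can be somewhat deeper than the perfectly balanced binary tree used here.

```python
PAGE_SIZE = 4096   # bytes per page, as in the text
ENTRY_SIZE = 16    # assumed bytes per key + child pointer


def levels_needed(num_keys, fanout):
    """Smallest number of levels whose capacity (fanout ** levels) covers num_keys."""
    levels = 1
    capacity = fanout
    while capacity < num_keys:
        capacity *= fanout
        levels += 1
    return levels


num_keys = 1_000_000
binary_levels = levels_needed(num_keys, 2)                       # balanced binary tree
btree_levels = levels_needed(num_keys, PAGE_SIZE // ENTRY_SIZE)  # one node per page

print(binary_levels, btree_levels)  # prints "20 3"
```

Under these assumptions a lookup touches about 20 nodes in the binary tree, each possibly on a different page, versus 3 pages in the B-tree.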

Because the internal nodes of a B+ tree do not carry a data field, each node has a higher fan-out, that is, one node can hold more keys and child pointers. The tree is therefore shallower, and a lookup needs fewer I/O operations.
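The effect on fan-out can be estimated with simple arithmetic. All sizes below are assumptions chosen for illustration (8-byte keys, 8-byte pointers, 100-byte records, 4 KB pages), not measurements of any real engine.

```python
PAGE_SIZE = 4096     # bytes per node (one page)
KEY_SIZE = 8         # assumed key size in bytes
POINTER_SIZE = 8     # assumed child pointer size in bytes
RECORD_SIZE = 100    # assumed size of the data stored with each key

# B-tree: every entry in an internal node carries its data alongside the key.
btree_fanout = PAGE_SIZE // (KEY_SIZE + POINTER_SIZE + RECORD_SIZE)

# B+ tree: internal nodes hold only keys and child pointers.
bplus_fanout = PAGE_SIZE // (KEY_SIZE + POINTER_SIZE)

print(btree_fanout, bplus_fanout)  # prints "35 256"
```

With these numbers a B+ tree internal node points to roughly seven times as many children as a B-tree node of the same size.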

Understanding how different storage engines implement indexes is very helpful for using and optimizing indexes correctly. For example, once you know how InnoDB implements its indexes, it is easy to see why overly long fields are not recommended as the primary key: every secondary index references the primary index, so a long primary key makes every secondary index larger. Likewise, using a non-monotonic field as the primary key in InnoDB is not a good idea: the InnoDB data file is itself a B+ tree, so inserting records with non-monotonic primary keys causes frequent page splits and adjustments of the data file to maintain the B+ tree properties, which is very inefficient. Using an auto-increment field as the primary key is a good choice.
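The cost of non-monotonic keys can be illustrated with a toy model of leaf pages. This is a deliberately simplified sketch, not InnoDB's actual split algorithm: pages are small sorted lists, a full page is split in half when a key lands inside it, and a key larger than everything stored so far simply opens a new page. `PAGE_CAPACITY` and `insert_all` are hypothetical names, and keys are assumed to be unique.

```python
import bisect
import random

PAGE_CAPACITY = 4  # hypothetical records per leaf page, kept tiny on purpose


def insert_all(keys, capacity=PAGE_CAPACITY):
    """Insert unique keys into sorted leaf pages; return (pages, split_count)."""
    pages = [[]]
    splits = 0
    for key in keys:
        # Find the page whose key range should contain the new key.
        first_keys = [page[0] for page in pages[1:]]
        idx = bisect.bisect_right(first_keys, key)
        page = pages[idx]
        if len(page) < capacity:
            bisect.insort(page, key)
            continue
        if idx == len(pages) - 1 and key > page[-1]:
            # Appending past the largest key: open a new page, move nothing.
            pages.append([key])
            continue
        # The page is full and the key falls inside it: split, moving half the records.
        mid = capacity // 2
        left, right = page[:mid], page[mid:]
        pages[idx:idx + 1] = [left, right]
        splits += 1
        bisect.insort(left if key < right[0] else right, key)
    return pages, splits


sequential_keys = list(range(1, 101))
shuffled_keys = sequential_keys[:]
random.Random(0).shuffle(shuffled_keys)

_, sequential_splits = insert_all(sequential_keys)
_, shuffled_splits = insert_all(shuffled_keys)
print("splits with sequential keys:", sequential_splits)
print("splits with shuffled keys:", shuffled_splits)
```

In this model, monotonically increasing keys never trigger a split and leave every page full, while out-of-order keys force existing pages to be split and their records moved.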

 

Clustered index and non-clustered index:

InnoDB's index is a clustered index because the leaf nodes of its B+ tree contain the complete data records. The leaf nodes of MyISAM's B+ tree store only the addresses of the data records, so it is called a non-clustered index.

Index Usage Policy and Optimization

MySQL optimization mainly consists of schema optimization and query optimization. See this article for details:

Data Structure and algorithm principles behind MySQL Indexes

Note: the original post wrote "ju set" in place of "clustered" because the original term was flagged as a sensitive word.


 

 


