1. What is the index of the structure used by MySQL?
1). The MyISAM engine uses B+tree as the index structure, and the data domain of the leaf node holds the address of the record. Therefore, the algorithm of index retrieval in MyISAM is to search the index according to the B+tree search algorithm first, and if the specified key exists, the value of its data field is taken out.
The data record is then read with the value of the database field as the address.
2). InnoDB also uses B+tree as the index structure, but the specific implementation is different from MyISAM.
A. The first major difference is that the InnoDB data file itself is the index file. The MyISAM index file and the data file are detached, and the index file only holds the address of the data record. In InnoDB, the table data file itself is an index structure organized by B+tree,
The tree's leaf node data field holds a complete record of the records. The key of this index is the primary key of the data table, so the InnoDB table data file itself is the primary index.
B. The second difference from the MyISAM index is that the secondary index of the InnoDB data field stores the value of the corresponding record primary key instead of the address. In other words, all secondary indexes of InnoDB refer to the primary key as the data domain. This implementation of clustered indexes makes search by primary key very efficient,
However, a secondary index search needs to be retrieved two times index: retrieves the secondary index first to obtain the primary key, and then retrieves the record with the primary key to the primary index.
2. mysql Index optimization policy
1). Use the index for the columns that appear in Where,on,group by,order by.
2). Use the prefix index for longer strings.
3). Do not create indexes too much, except for additional disk space, which has a significant impact on the speed of DML operations because they have to be re-indexed each time they are incremented.
4). Using composite indexes, you can reduce the file index size, which is better than multiple single-column indexes when used.
5). Create indexes on columns with high dimensions. If there are 8 rows of data in the data table, a, B, the c,d,a,b,c,d dimension of this table is 4.
6). It is not recommended to use a field that is too long as the primary key, because all secondary indexes refer to the primary index, and the long primary index makes the secondary index too large.
7). Using non-monotonic fields as primary keys is not a good idea in InnoDB because the InnoDB data file itself is a b+tree, and a non-monotonic primary key causes the data file to be frequently split in order to maintain b+tree characteristics when inserting new records, which is inefficient, Using the self-increment field as the primary key is a good choice.
8). If the query condition contains a function or expression, then MySQL does not use the index for that column (although some are mathematically useful).
3. Why do indexes use B+tree?
In general, the index itself is large and cannot be stored in memory, so the index is often stored as an index file on the disk. In this case, the index lookup process will generate disk I/O consumption, relative to memory access, I/O access to the consumption of a few orders of magnitude, so the evaluation of a data structure
The most important indicator as an index is the progressive complexity of the number of disk I/O operations during the lookup process. In other words, the structural organization of the index minimizes the number of disk I/O accesses during the lookup process.
Due to the high efficiency of disk sequential reads (no seek time required and minimal rotational time), pre-reading can improve I/O efficiency for programs with locality. The length of the read-ahead is generally the integer multiple of the page. Page is the logical block of Computer Management memory, hardware and operating system
Main memory and disk storage are often divided into contiguous chunks of equal size, each storage block is called a page (in many operating systems, the page size is typically 4k), and main memory and disk Exchange data in pages. When the data to be read by the program is not in main memory, a fault is triggered, and the system will
Disk reads a signal, the disk will find the starting position of the data and sequentially read a page or several pages into memory, and then return the exception, the program continues to run.
As mentioned above, the index structure is generally evaluated using disk I/O times. First, from the B-tree analysis, according to the definition of b-tree, it is necessary to retrieve up to H nodes at a time. The designer of the database system skillfully exploits the principle of disk pre-reading, setting the size of a node equal to one page, so that each node
It can be fully loaded with just one I/O. To achieve this, the following techniques are required to implement B-tree in practice:
Each time you create a new node, request a page space directly, so that a node is physically stored in a page, and the computer storage allocation is page-aligned, the implementation of a node only one time I/O. B-tree requires a maximum of h-1 I/O (root node resident memory) in a single retrieval, with progressive complexity of
(h)=o(logdN) O (h) =o (LOGDN). In general practice, the out-of-size D is a very large number, usually more than 100, so H is very small (usually not more than 3).
In summary, using B-tree as index structure efficiency is very high.
And the red-black tree structure, H is obviously much deeper. Because the logically close node (parent-child) may be far away physically, it is not possible to take advantage of locality, so the I/O asymptotic complexity of the red-black tree is also O (h), and the efficiency is significantly worse than B-tree.
As mentioned above, B+tree is more suitable for external memory index, because it is related to the internal node out d. From the above analysis, it can be seen that the better the performance of the larger index, and the upper limit depends on the size of the key and data within the node:
DMAX=FLOOR(PAGESIZE/(KEYSIZE+d Ata si ze +p Oin ts iz e) ) dmax=floor (pagesize/(keysize+datasize+pointsize))
d m a x = f l o o r ( p a g e s i z e / ( k e y s i z e + D a T a s I Z e + P o I N T s I Z e ) ) "> Floor indicates rounding down. because the nodes in the B+tree are stripped of the data domain, they can have a greater degree of granularity and better performance.
4. Database optimization scheme:http://www.cnblogs.com/Jtianlin/p/8832896.html
5. Explain usage:
1. B-Tree, here is balance (balanced meaning), B-Tree is a multi-path self-balancing search tree It resembles a normal balanced binary tree, the difference is that the B-tree allows each node to have more child nodes.
2. B-Tree has the following characteristics:
1). All key values are distributed throughout the tree;
2). Any one of the keywords appears and appears only in one node;
3). Search may end at non-leaf nodes;
4). Do a search in the keyword complete, the performance approximation of the binary search;
3. The B + Tree is a variant of B-tree and is a multi-path search tree, which differs from that of a tree:
1). All keywords are stored on leaf nodes, internal nodes (non-leaf nodes do not store true data)
2). Added a chain pointer for all leaf nodes
4. Data structures such as red and black trees can also be used for indexing, but file systems and database systems generally use B-/+tree as the index structure. MySQL is a disk-based database system, indexing is often stored as an index file on a disk, index lookup
Disk I/O consumption will be generated in the process, relative to memory access, I/O access is consumed a few orders of magnitude, the structure of the index to minimize the number of disk I/O access during the lookup process. Why using B-/+tree is also related to the principle of disk access.
5. Principle of locality and disk pre-reading: Due to the gap between the disk's access speed and memory, in order to improve efficiency, disk I/O should be minimized. The disk is often not read strictly on-demand, but every time it is read-ahead, the disk reads the required data, sequentially reading back a certain length of data into memory.
The rationale behind this is the well-known local principle in computer science:
When a data is used, the data around it is usually used immediately.
The data required during the program run is typically more centralized
Due to the high efficiency of disk sequential reads (no seek time required and minimal rotational time), pre-reading can improve I/O efficiency for programs with locality. The length of the read-ahead is generally the integer multiple of the page.
6. MySQL (using the InnoDB engine by default), records are managed by page, and the size of each page defaults to 16K (this value can be modified). Linux default page size is 4K
7. Performance analysis of B-/+tree indexes:
The following techniques are also required to implement B-tree:
Each time you create a new node, request a page space directly, so that a node is physically stored in a page, in addition to the computer storage allocation is page-aligned, the implementation of a node only one time I/O.
Assuming that the height of the b-tree is H,b-tree, a maximum of h-1 I/O (root node resident memory) is required for one retrieval, and the progressive complexity is O (h) =o (LOGDN) O (h) =o (LOGDN). In general practice, the out-of-size D is a very large number, usually more than 100, so H is very small (usually not more than 3).
And the red-black tree structure, H is obviously much deeper. Because the logically close node (parent-child) may be far away physically, it is not possible to take advantage of locality, so the I/O asymptotic complexity of the red-black tree is also O (h), and the efficiency is significantly worse than B-tree.
8. Why use B + trees
1). B + Tree is more suitable for external storage, because the inner node has no data domain, a node can store more internal nodes, each nodes can be indexed larger and more accurate, also means that B + tree single disk IO more information than B-tree, I/O efficiency is higher.
2). MySQL is a relational database, interval access is a common situation, B + leaf node added chain pointers, enhanced interval access, can be used in range query, etc., and B-tree each node key and data together, it is not possible to find the interval.
9. mysql Index implementation:
1). The MyISAM engine uses B+tree as the index structure, and the data domain of the leaf node holds the address of the record. The data field holds the address of the record. Therefore, the algorithm of index retrieval in MyISAM is to search the index according to the B+tree search algorithm first.
If the specified key is present, the value of its data field is taken out, and then the record is read with the value of the database field as the address. The index of MyISAM is also called "non-aggregation", and the reason for this is to differentiate it from InnoDB's clustered index.
2). InnoDB Index Implementation: Although InnoDB also uses B+tree as the index structure, the implementation is quite different from MyISAM.
A. The first major difference is that the InnoDB data file itself is the index file. As you know above, the MyISAM index file and the data file are detached, and the index file only holds the address of the data record. In InnoDB, the table data file itself is an index structure organized by B+tree,
The tree's leaf node data field holds a complete record of the records. The key of this index is the primary key of the data table, so the InnoDB table data file itself is the primary index.
B. The second difference from the MyISAM index is that the secondary index of the InnoDB data field stores the value of the corresponding record primary key instead of the address. In other words, all secondary indexes of InnoDB refer to the primary key as the data domain.
10. Understanding the index implementations of different storage engines is very helpful for proper use and optimization of indexes, such as knowing the InnoDB index implementation, it is easy to understand why it is not recommended to use a long field as the primary key, because all secondary indexes refer to the primary index.
A long primary index can cause the secondary index to become too large. For example, it is not a good idea to use a non-monotonic field as the primary key in InnoDB because the InnoDB data file itself is a b+tree, and a non-monotonic primary key causes the data file to maintain the B+tree feature when the new record is inserted
The frequent split adjustment is very inefficient, and using the self-increment field as the primary key is a good choice.
11. The index is theoretically sensitive to order, but because the query optimizer of MySQL automatically adjusts the condition order of the WHERE clause to use the appropriate index
12. If the wildcard% does not appear at the beginning, you can use the index, but depending on the situation, you may only use one of the prefixes. (Title like ' senior% ';)
13. Unfortunately, if a query condition contains a function or expression, MySQL does not use the index for that column (although some are mathematically useful).
14. The index implementation of InnoDB is discussed above, InnoDB uses a clustered index, and the data record itself is stored on the leaf node of the primary index (one b+tree). This requires that each data record in the same leaf node (the size of a memory page or a disk page) be stored in the primary key order.
So whenever a new record is inserted, MySQL inserts it into the appropriate node and location according to its primary key, and if the page reaches the load factor (InnoDB defaults to 15/16), a new page (node) is opened.
If the table uses the self-increment primary key, each time a new record is inserted, the record is added sequentially to the subsequent position of the current index node, and when a page is full, a new page is automatically opened.
This creates a compact index structure that fills in the approximate order. Because there is no need to move existing data each time it is inserted, it is highly efficient and does not add much overhead to maintaining the index.
If you are using a non-self-increasing primary key (such as a social security number or a school number, etc.), each time a new record is inserted into an existing index page, because the value of the primary key is approximately random, it is placed somewhere in the middle:
At this point, MySQL had to move the data in order to insert the new record in the appropriate location, even the target page may have been written back to disk and cleared from the cache, and then read back from the disk, which adds a lot of overhead, while the frequent movement, paging operations caused a lot of fragmentation,
The index structure is not sufficiently compact, and subsequent tables have to be rebuilt by optimize table and the fill page is optimized.
Therefore, as long as possible, please try to use the self-increment field key on InnoDB.
See:
MySQL index behind the data structure and algorithm principle: http://blog.codinglabs.org/articles/theory-of-mysql-index.html
B + Tree Description: https://www.cnblogs.com/wade-luffy/p/6292784.html
See MySQL index structure by b-/b+ tree: 1190000004690721
Main code index, clustered index, non-primary index (secondary index), unique index, foreign key index, composite index, non-primary code index, clustered main code (clustered index), single-column index, multicolumn Index, normal index, etc.: http://www.cnblogs.com/xiangyangzhu/p/index.html
A brief discussion on the algorithm and data structure: Ten-balance search tree B-Tree: http://www.cnblogs.com/yangecnu/p/Introduce-B-Tree-and-B-Plus-Tree.html
The essence and realization of rotation operation in AVL balanced binary tree: 13614403
Tree structure: http://www.cnblogs.com/xrq730/p/5187032.html
MySQL Database index