Understanding B+ Trees and Indexes in MySQL

Source: Internet
Author: User

Overview

  • Nature: in addition to the data itself, a database maintains data structures that reference (point to) the data in some way. Such a data structure is an index.
  • Index selection principle: the index structure should minimize the number of disk I/O accesses during a search.
Conditions satisfied by a B-Tree
  • d is a positive integer greater than 1, called the degree of the B-Tree.
  • h is a positive integer, called the height of the B-Tree.
  • Each non-leaf node consists of n-1 keys and n pointers, where d <= n <= 2d.
  • Each leaf node contains at least one key and two pointers, and at most 2d-1 keys and 2d pointers. The pointers of leaf nodes are all null.
  • All leaf nodes have the same depth, equal to the height h of the tree.
  • Keys and pointers alternate within a node, and both ends of a node are pointers.
  • Keys in a node are arranged from left to right in non-descending order.
  • All nodes together form a tree structure.
  • Each pointer is either null or points to another node.
  • If a B-Tree of degree d indexes N keys, the upper bound on its height is h <= log_d((N + 1) / 2), and the asymptotic complexity of the number of nodes visited to retrieve a key is O(log_d N).
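
The height bound can be evaluated numerically to see how shallow such a tree stays. The following is a minimal sketch; the function name and the sample values (degree 100, one million keys) are assumptions chosen for illustration, not figures from this article.

```python
import math


def btree_height_upper_bound(num_keys: int, degree: int) -> float:
    """Upper bound on the height h of a B-Tree of degree d with N keys:
    h <= log_d((N + 1) / 2)."""
    if degree <= 1:
        raise ValueError("degree d must be greater than 1")
    if num_keys < 1:
        raise ValueError("the tree must index at least one key")
    return math.log((num_keys + 1) / 2, degree)


# Hypothetical example: degree 100, one million keys.
bound = btree_height_upper_bound(1_000_000, 100)
print(f"h <= {bound:.2f}")
```

With these sample values the bound is below 3, so a lookup touches only a handful of nodes even for a large table.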
Update operations
  • Inserting or deleting a data record can violate the B-Tree properties, so operations such as splitting, merging, and transferring keys between nodes are needed to maintain them.
B+ Tree
  • The upper limit on the number of pointers per node is 2d rather than 2d + 1.
  • Internal nodes do not store data, only keys.
  • Leaf nodes do not store pointers.
  • On top of the classic B+ Tree, each leaf node gets a sequential access pointer to the adjacent leaf node, which improves the performance of range access.
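
To illustrate why sequential access pointers help range access, here is a minimal sketch of a chain of leaf nodes. The `Leaf` class and `range_scan` function are hypothetical names. The internal nodes are left out, so the scan simply starts at the first leaf, whereas a real tree would locate the starting leaf by descending from the root.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Leaf:
    keys: List[int]
    next: Optional["Leaf"] = None  # sequential access pointer to the right sibling


def range_scan(first_leaf: Leaf, low: int, high: int) -> List[int]:
    """Collect all keys in [low, high] by following the leaf chain."""
    result = []
    leaf = first_leaf
    while leaf is not None:
        for key in leaf.keys:
            if key > high:
                return result  # keys are sorted, so nothing further can match
            if key >= low:
                result.append(key)
        leaf = leaf.next  # move sideways without going back up the tree
    return result


leaf_c = Leaf([30, 35, 40])
leaf_b = Leaf([18, 20, 25], next=leaf_c)
leaf_a = Leaf([5, 10, 15], next=leaf_b)
print(range_scan(leaf_a, 10, 30))  # [10, 15, 18, 20, 25, 30]
```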

Why are B-Trees and B+ Trees used?
Main memory access principle
  • When the system needs to read main memory, it places the address signal on the address bus, which carries it to main memory.
  • Main memory reads the address signal, decodes it to locate the specified storage cell, and then places that cell's data on the data bus for other components to read.
  • Main memory access time grows only linearly with the number of accesses. Because no mechanical operation is involved, the "distance" between two accessed locations does not affect the time.
Disk access principle
  • The platters rotate; a head does not rotate with them and reads the content that passes beneath it.
  • However, multi-head independent technology already exists.
  • Locality principle: when a piece of data is used, nearby data is usually used soon afterwards.
  • Disk read-ahead: the read-ahead length is generally an integer multiple of the page size.
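
Locality and read-ahead are what make a page-sized tree node attractive: if each node fills one page, reading a node costs roughly one I/O, and a large fan-out keeps the tree shallow. The sketch below compares the number of levels (and hence approximate I/Os per lookup) with a binary tree. The page size, key size, and pointer size are assumed values for illustration only.

```python
def levels_needed(num_keys: int, fanout: int) -> int:
    """Smallest number of levels such that fanout ** levels >= num_keys."""
    levels, capacity = 0, 1
    while capacity < num_keys:
        capacity *= fanout
        levels += 1
    return levels


PAGE_SIZE = 16 * 1024  # assumed page size in bytes
ENTRY_SIZE = 8 + 8     # assumed key size plus pointer size in bytes
fanout = PAGE_SIZE // ENTRY_SIZE  # entries that fit in one page-sized node

num_keys = 10_000_000
btree_levels = levels_needed(num_keys, fanout)
binary_levels = levels_needed(num_keys, 2)
print("fan-out per node:", fanout)
print("levels with page-sized nodes:", btree_levels)
print("levels in a binary tree:", binary_levels)
```

Under these assumptions, ten million keys need 3 levels with page-sized nodes against 24 levels in a binary tree.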
MyISAM index implementation
  • MyISAM uses a B+ Tree as the index structure; the data field of a leaf node stores the address of the data record.
  • Index files and data files are separate.
  • The primary index and a secondary index (secondary key) have the same structure. The only difference is that the primary index requires unique keys, while secondary index keys may repeat.
  • Non-clustered: an index search in MyISAM first runs the B+ Tree search algorithm on the index. If the specified key exists, the value of its data field is retrieved and then used as the address from which the corresponding data record is read.
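
The two-step MyISAM lookup can be sketched as follows. This is a simplified model, not MyISAM's actual file format: a Python dict stands in for the B+ Tree, a list position stands in for the record address, and all names and sample rows are hypothetical.

```python
# Data file: records in insertion order; the list position plays the role of the address.
data_file = [
    {"id": 15, "name": "Alice"},
    {"id": 7, "name": "Bob"},
    {"id": 42, "name": "Carol"},
]

# Index, kept separately from the data: key -> address of the record in the data file.
primary_index = {record["id"]: address for address, record in enumerate(data_file)}


def myisam_lookup(key):
    address = primary_index.get(key)  # step 1: search the index for the key
    if address is None:
        return None                   # key not present
    return data_file[address]         # step 2: read the record at that address


print(myisam_lookup(42))
```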

Composition of the .MYI file
  • State: basic information about the entire index file
  • Base: limit information for each index
  • Keydef
  • Recinfo
Index reading process
  • The query request first looks for the cache block in the key cache and, if it is there, returns it directly.
  • If it is not there, the data is read from the .MYI file as a file block.
  • The block is then stored in the key cache in the same format.
  • Finally, the data in the key cache is returned.
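
The reading process above follows a read-through cache pattern, which can be sketched as below. The dicts standing in for the key cache and the .MYI file, the block numbers, and the function name are all hypothetical, and real key cache management (block sizes, eviction) is omitted.

```python
key_cache = {}                                       # block number -> cached block
myi_file = {0: "index block 0", 1: "index block 1"}  # stand-in for blocks in the .MYI file
file_reads = 0                                       # counts reads that reach the file


def read_index_block(block_no):
    global file_reads
    if block_no in key_cache:      # hit: return straight from the key cache
        return key_cache[block_no]
    file_reads += 1                # miss: read the block from the file
    block = myi_file[block_no]
    key_cache[block_no] = block    # store it in the key cache
    return key_cache[block_no]     # return the cached copy


first = read_index_block(1)
second = read_index_block(1)
print("file reads:", file_reads)   # only the first call touches the file
```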
InnoDB Index implementation
  • InnoDB also uses a B+ Tree.
The first difference from MyISAM
  • The first major difference is that the InnoDB data file is itself an index file: the table data file is an index structure organized as a B+ Tree.
  • InnoDB data files are clustered by primary key.
  • Therefore, InnoDB requires every table to have a primary key (MyISAM does not).
  • If no primary key is explicitly specified, a column that uniquely identifies the records is selected automatically.
  • If no such column exists, an implicit 6-byte long-integer field is generated as the primary key.
The second difference from MyISAM
  • The data field of an InnoDB secondary index stores the primary key value of the corresponding record instead of its address.
  • In other words, all InnoDB secondary indexes reference the primary key in their data field.
  • A secondary index search therefore requires two index lookups: first search the secondary index to obtain the primary key, then use the primary key to search the primary index and obtain the record.
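
The double lookup can be sketched in the same simplified style, with dicts standing in for the two B+ Trees. The table contents and names are hypothetical, and the secondary key is assumed unique here to keep the example short.

```python
# Clustered (primary) index: primary key -> full record. The data file is the index.
clustered_index = {
    7: {"id": 7, "name": "Bob", "city": "Paris"},
    15: {"id": 15, "name": "Alice", "city": "Oslo"},
    42: {"id": 42, "name": "Carol", "city": "Lima"},
}

# Secondary index on name: key -> primary key value (not a record address).
name_index = {record["name"]: pk for pk, record in clustered_index.items()}


def lookup_by_name(name):
    pk = name_index.get(name)   # lookup 1: secondary index yields the primary key
    if pk is None:
        return None
    return clustered_index[pk]  # lookup 2: primary index yields the record


print(lookup_by_name("Alice"))
```

Because every secondary index entry carries a copy of the primary key, the size of the primary key directly affects the size of every secondary index.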
Optimization points
  • Overly long fields are not recommended as the primary key. All secondary indexes reference the primary index, so a long primary key makes every secondary index large.
  • Non-monotonic fields make poor primary keys in InnoDB. The InnoDB data file is itself a B+ Tree, so inserting records with non-monotonic primary keys causes frequent splits and adjustments of the data file to maintain the B+ Tree properties, which is very inefficient. An auto-increment field is a good choice of primary key.
  • Updating a clustered index key is costly: besides the index data possibly moving, all the related record data must be moved as well.
Index usage strategy and optimization
Full-column match
  • When an exact match is performed on all columns in the index (exact match here means an "=" or "IN" match), the index can be used.
  • In theory indexes are sensitive to column order, but the MySQL query optimizer automatically reorders the conditions in the WHERE clause so that a suitable index can be used.
Leftmost prefix match
  • When the query condition precisely matches one or more consecutive columns on the left of the index, the index can be used.
The query conditions exactly match columns in the index, but the condition for a middle column is missing
  • Only the index columns before the gap can be used.
  • Countermeasures
  • An auxiliary index can be added.
  • When the missing middle column has only a few distinct values, fill the gap by listing those values with IN.
  • Which approach is better depends on the situation, so compare them.
The first column of the index is not specified in the query condition.
  • The index cannot be used.
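
The matching rules of the last four cases can be summarized in a small model: walk the index columns from the left and stop at the first one that has no exact-match condition. This is only a sketch of the rule as described here, not of the MySQL optimizer itself, and the index columns (emp_no, title, from_date) are a hypothetical example.

```python
from typing import List, Set


def usable_index_columns(index_columns: List[str], matched_columns: Set[str]) -> List[str]:
    """Return the leading index columns that exact-match conditions can use."""
    used = []
    for column in index_columns:
        if column not in matched_columns:
            break  # the first gap ends index usage
        used.append(column)
    return used


index = ["emp_no", "title", "from_date"]  # hypothetical composite index

# Full-column match: every index column is used.
print(usable_index_columns(index, {"emp_no", "title", "from_date"}))
# Leftmost prefix match; the order in which conditions are written does not matter.
print(usable_index_columns(index, {"title", "emp_no"}))
# Middle column missing: only the columns before the gap are used.
print(usable_index_columns(index, {"emp_no", "from_date"}))
# First column missing: the index cannot be used.
print(usable_index_columns(index, {"title", "from_date"}))
```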
Match the prefix string of a column
  • The index can be used.
  • If the wildcard % does not appear at the beginning of the pattern, the index can be used, but depending on the situation only part of the index prefix may be used.
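
The reason a leading wildcard matters follows from sort order: all keys that share a prefix sit next to each other in a sorted index, while keys that share a suffix are scattered. A minimal sketch, with a sorted Python list standing in for the index and hypothetical sample values:

```python
import bisect

titles = sorted(["Engineer", "Manager", "Senior Engineer",
                 "Senior Staff", "Staff", "Technique Leader"])


def prefix_match(sorted_keys, prefix):
    """Like 'prefix%': binary-search to the first candidate, then read neighbours."""
    start = bisect.bisect_left(sorted_keys, prefix)
    result = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # matches are contiguous, so the scan can stop here
        result.append(key)
    return result


def suffix_match(sorted_keys, suffix):
    """Like '%suffix': sort order does not help, so every key is checked."""
    return [key for key in sorted_keys if key.endswith(suffix)]


print(prefix_match(titles, "Senior"))  # ['Senior Engineer', 'Senior Staff']
print(suffix_match(titles, "Staff"))   # ['Senior Staff', 'Staff']
```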
Range Query
  • The index can be used for a range column (the leftmost prefix must be used), but the index cannot be used for the column following the range column.
  • At the same time, the index can be used for at most one range column. Therefore, if there are two range columns in the query condition, the index cannot be fully used.
  • EXPLAIN output alone may not distinguish a range condition from a multi-value (IN) match, because both can be reported with the range access type.
The query conditions contain functions or expressions
  • Generally, the index is not used.
  • Compute the value manually and put the result into the query instead.
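
The effect can be sketched with a sorted list standing in for an index on a date column. A condition that wraps the column in a function (for example YEAR(col) = 1986) has to be evaluated for every entry, while precomputed bounds allow a binary search. The sample dates and function names are hypothetical.

```python
import bisect
from datetime import date

hire_dates = sorted([
    date(1985, 3, 1), date(1986, 6, 26), date(1986, 12, 1),
    date(1987, 1, 15), date(1990, 7, 4),
])


def by_function(year):
    # The function is applied to each indexed value, so every entry is visited.
    return [d for d in hire_dates if d.year == year]


def by_range(year):
    # Bounds computed up front: col >= year-01-01 AND col < (year + 1)-01-01.
    lo = bisect.bisect_left(hire_dates, date(year, 1, 1))
    hi = bisect.bisect_left(hire_dates, date(year + 1, 1, 1))
    return hire_dates[lo:hi]


print(by_function(1986) == by_range(1986))  # same rows, different access path
```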
Index selectivity and prefix indexes
Cardinality statistics in MyISAM and InnoDB
  • The cardinality of a MyISAM index (the Cardinality column in the output of SHOW INDEX) is exact, while for InnoDB it is an estimate.
  • MyISAM stores its statistics on disk and updates them during ALTER TABLE or ANALYZE TABLE operations.
  • InnoDB computes the estimate when the table is first opened and stores it in the cache.
When creating an index is not recommended
  • The table has few records.
  • The index selectivity is low. Selectivity is the ratio of the number of distinct index values (the cardinality) to the number of table records (#T).
Prefix Index
  • Use a prefix of the column instead of the entire column as the index key. With an appropriate prefix length, the selectivity of the prefix index comes close to that of the full-column index, while the shorter key reduces index file size and maintenance overhead.
Disadvantages
  • Cannot be used for ORDER BY or GROUP BY operations
  • Cannot be used as a covering index (that is, an index that contains all the data a query needs, so the data file itself is not accessed)
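
Choosing a prefix length comes down to comparing the selectivity of each candidate prefix with the selectivity of the full column. A minimal sketch, using a small hypothetical list of values in place of a real column:

```python
def selectivity(values):
    """Distinct values divided by the number of records."""
    return len(set(values)) / len(values)


def prefix_selectivity(values, length):
    """Selectivity when only the first `length` characters are indexed."""
    return selectivity([value[:length] for value in values])


# Hypothetical column contents.
names = ["anderson", "andrews", "andersen", "baker",
         "barker", "barnes", "clark", "clarke"]

print("full column:", selectivity(names))
for length in (1, 3, 6, 7):
    print(f"prefix {length}:", prefix_selectivity(names, length))
```

For this sample the selectivity rises from 0.375 (1 character) through 0.5 (3) and 0.875 (6) to 1.0 (7), which matches the full column. On real data one would pick the shortest prefix whose selectivity is acceptably close to the full-column value.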
InnoDB primary key selection and insertion optimization
  • Unless there is a special need, always use an auto-increment field unrelated to the business data as the primary key.
  • InnoDB uses a clustered index: data records are stored in the leaf nodes of the primary index (a B+ Tree).
  • This requires the data records within a leaf node (whose size is one memory page or disk page) to be stored in primary key order. When a new record is inserted, MySQL therefore places it in the appropriate node and position according to its primary key. When a page reaches the load factor (15/16 by default in InnoDB), a new page (node) is opened.
  • If the primary key is not auto-incrementing, each insertion lands at an approximately random position, which easily causes existing data to be moved. Target pages may have to be read back in, and more fragmentation results. OPTIMIZE TABLE can rebuild and optimize the table, but it is cumbersome.
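
The difference between sequential and random primary keys can be observed in a toy model of sorted pages. This is a sketch under simplifying assumptions, not InnoDB's actual algorithm: the page capacity is a hypothetical 16 records, pages are filled completely rather than to 15/16, and a full page hit by an out-of-order key is split in half.

```python
import bisect
import random

PAGE_CAPACITY = 16  # hypothetical number of records per page


def insert_all(keys):
    """Insert unique keys into sorted pages; return (pages, number_of_splits)."""
    pages = [[]]
    splits = 0
    for key in keys:
        # Locate the page whose key range covers the new key.
        first_keys = [page[0] for page in pages[1:]]
        i = bisect.bisect_right(first_keys, key)
        page = pages[i]
        if len(page) < PAGE_CAPACITY:
            bisect.insort(page, key)
        elif i == len(pages) - 1 and key > page[-1]:
            pages.append([key])  # appending at the end: just open a new page
        else:
            # Full page hit in the middle: split it, which moves half its records.
            mid = len(page) // 2
            left, right = page[:mid], page[mid:]
            bisect.insort(left if key < right[0] else right, key)
            pages[i:i + 1] = [left, right]
            splits += 1
    return pages, splits


sequential_keys = list(range(1000))
random_keys = sequential_keys[:]
random.Random(0).shuffle(random_keys)

seq_pages, seq_splits = insert_all(sequential_keys)
rand_pages, rand_splits = insert_all(random_keys)
print("sequential: pages =", len(seq_pages), "splits =", seq_splits)
print("random:     pages =", len(rand_pages), "splits =", rand_splits)
```

In this model, sequential keys never trigger a split and leave pages densely filled, while shuffled keys cause splits and spread the same records over more, partly empty pages.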
