MySQL index and structure in-depth explanation

Last Update:2016-08-02 Source: Internet

Author: User

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

B-tree

B-tree also called the Balanced Multipath lookup tree (not binary) using the B-TREE structure can significantly reduce the intermediate process that is experienced when locating records, thus speeding up access.
Left Dial hand node key value < The keyword value < right child node key value
The algorithm of retrieving data by key in B-tree is very intuitive: first, binary lookup from the root node, if found to return the corresponding node of data, otherwise the pointer to the corresponding interval of the node pointing to a recursive lookup, until the node is found or a null pointer found, the latter lookup failed.

(Key is the key value of the record, for different data records, the key is not the same; data records the data in addition to the key)

B+tree

B+tree is an improved b+tree.

(Key is the key value of the record, for different data records, the key is not the same; data records the data in addition to the key)

Compared with B-tree, B+tree has the following differences:

The pointer to each node is capped at 2d instead of 2d+1.
The inner node does not store data, only the key, and the leaf node does not store the pointer.

Why does the database use B-tree

The mechanical disk of a computer, in order to amortize the waiting time spent on mechanical movement, the disk accesses multiple data items at a time instead of one, so that a single read unit of information is page, and we can use the number of pages read or write as the primary approximation of the total disk access time , At any given moment, the B-tree algorithm simply keeps a certain number of pages in memory. B-Tree design considering disk prefetching, a node of a B-tree is usually as large as a full disk page (page) , and the size of the disk page limits the number of children (branching factor) that a B-tree node can contain, and of course it depends on the size of one keyword relative to a page.

To minimize I/O operations, disk reads are read-ahead every time and are typically multiples of the page size. Even if only one byte is to be read, the disk reads a page of data (typically 4K) into memory, and the memory and disk Exchange data in pages. Because of the principle of locality, it is common for a data to be used, and the data near it will be used immediately.

B-tree: If a retrieval requires access to 4 nodes, the database System Designer uses the principle of disk pre-reading, the size of the node is designed as a page, then read a node only need one I/O operation, complete the retrieval operation, up to 3 times I/O (root node resident memory). The smaller the data record, the more data each node holds, the smaller the height of the tree, the less I/O operations, and the retrieval efficiency .

B+tree: Non-leaf nodes only exist key, greatly reducing the size of non-leaf nodes, then each node can store more records, the tree is shorter, I/O operations less . So the B+tree has better performance.

What is an index

An index is a data structure.

The cost of the index

Indexes also cost: The index file itself consumes storage space, and the index increases the burden of inserting, deleting, and modifying records, and MySQL consumes resource maintenance indexes at run time, so the index is not as good as possible. Generally, it is not recommended to build an index in two cases
The first case is that the table records are relatively small
Another case where indexing is not recommended is the low selectivity of the index. The so-called Index selectivity (selectivity) refers to the ratio of non-repeating index values (also called cardinality, cardinality) to the number of table records (#T)

Categories of indexes

I. GENERAL index
Second, unique index
Third, primary key index
Iv. Combined Index

The indexes used in MySQL

B+tree is commonly used for indexing in MySQL, but it differs in implementation based on clustered indexes and nonclustered indexes.

Clustered index vs. nonclustered index

The so-called clustered index refers to the primary index file and data file as the same file, clustered index is mainly used in the InnoDB storage engine. The data on the leaf node of the b+tree in the way of the index implementation is the key, the key. Such as:

(T1 table)

(T2 table)

(database corresponding to the file)
Because the InnoDB data file itself is clustered by the primary key, the INNODB requires that the table must have a primary key (MyISAM can not), and if it is not explicitly specified, the MySQL system automatically selects a column that uniquely identifies the data record as the primary key, and if no such column exists, Then MySQL automatically generates an implicit field for the InnoDB table as the primary key, which is 6 bytes long and has a length of type.

MyISAM and INNODB data storage engines in MySQL database

Main differences:
MyISAM is a non-transactional security type, and InnoDB is a transaction-safe type.
The granularity of MyISAM locks is table-level, while InnoDB supports row-level locking.
MyISAM supports full-text type indexing, while InnoDB does not support full-text indexing.
MyISAM is relatively simple, so in terms of efficiency is better than INNODB, small applications can consider the use of MyISAM.
MyISAM tables are saved as files, and using MyISAM storage in cross-platform data transfer saves a lot of hassle.
The InnoDB table is more secure than the MyISAM table and can be switched from non-transactional tables to transaction tables (ALTER TABLE tablename TYPE=INNODB) without ensuring that the data is not lost.
Application Scenarios:
MyISAM Manage non-transactional tables. It provides high-speed storage and retrieval, as well as full-text search capabilities. MyISAM is a better choice if you need to perform a large number of select queries in your application.
InnoDB is used for transactional applications and has many features, including acid transaction support. If you need to perform a large number of insert or update operations in your app, you should use InnoDB, which can improve the performance of multiple user concurrency operations.

Replenish the storage of main memory

Take process
When the system needs to read main memory, the address signal is put to address bus upload to main memory, main memory read the address signal, parse the signal and locate the designated storage unit, and then put this storage unit data on the bus for other parts to read.
The process of writing main memory is similar, the system will write the unit address and data on the address bus and bus, respectively, main memory reads two bus contents, does the corresponding write operation.
As can be seen here, the main memory access time is only linear with the number of accesses, because there is no mechanical operation, two times access to the data "distance" will not have any impact on time, for example, first take A0 and then take A1 and take A0 and then take D3 time consumption is the same

Disk Access principle

When data needs to be read from disk, the system will pass the logical address of the data to the disk, and the control circuit of the disk will translate the logical address into the physical address according to the addressing logic, that is, determine which track and which sector the data is to be read. In order to read the data in this sector, the head needs to be placed above this sector, in order to achieve this, the head needs to move to the corresponding track, this process is called seek, the time spent is called seek time, and then the disk rotation of the target sector is rotated to the head, the process of time is called rotation time.

Reference: http://blog.jobbole.com/24006/

MySQL index and structure in-depth explanation

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

MySQL index and structure in-depth explanation

Contact Us

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support