B-tree
B-tree also called the Balanced Multipath lookup tree (not binary) using the B-TREE structure can significantly reduce the intermediate process that is experienced when locating records, thus speeding up access.
Left Dial hand node key value < The keyword value < right child node key value
The algorithm of retrieving data by key in B-tree is very intuitive: first, binary lookup from the root node, if found to return the corresponding node of data, otherwise the pointer to the corresponding interval of the node pointing to a recursive lookup, until the node is found or a null pointer found, the latter lookup failed.
(Key is the key value of the record, for different data records, the key is not the same; data records the data in addition to the key)
B+tree
B+tree is an improved b+tree.
(Key is the key value of the record, for different data records, the key is not the same; data records the data in addition to the key)
Compared with B-tree, B+tree has the following differences:
- The pointer to each node is capped at 2d instead of 2d+1.
- The inner node does not store data, only the key, and the leaf node does not store the pointer.
Why does the database use B-tree
The mechanical disk of a computer, in order to amortize the waiting time spent on mechanical movement, the disk accesses multiple data items at a time instead of one, so that a single read unit of information is page, and we can use the number of pages read or write as the primary approximation of the total disk access time , At any given moment, the B-tree algorithm simply keeps a certain number of pages in memory. B-Tree design considering disk prefetching, a node of a B-tree is usually as large as a full disk page (page) , and the size of the disk page limits the number of children (branching factor) that a B-tree node can contain, and of course it depends on the size of one keyword relative to a page.
To minimize I/O operations, disk reads are read-ahead every time and are typically multiples of the page size. Even if only one byte is to be read, the disk reads a page of data (typically 4K) into memory, and the memory and disk Exchange data in pages. Because of the principle of locality, it is common for a data to be used, and the data near it will be used immediately.
B-tree: If a retrieval requires access to 4 nodes, the database System Designer uses the principle of disk pre-reading, the size of the node is designed as a page, then read a node only need one I/O operation, complete the retrieval operation, up to 3 times I/O (root node resident memory). The smaller the data record, the more data each node holds, the smaller the height of the tree, the less I/O operations, and the retrieval efficiency .
B+tree: Non-leaf nodes only exist key, greatly reducing the size of non-leaf nodes, then each node can store more records, the tree is shorter, I/O operations less . So the B+tree has better performance.
What is an index
An index is a data structure.
The cost of the index
Indexes also cost: The index file itself consumes storage space, and the index increases the burden of inserting, deleting, and modifying records, and MySQL consumes resource maintenance indexes at run time, so the index is not as good as possible. Generally, it is not recommended to build an index in two cases
The first case is that the table records are relatively small
Another case where indexing is not recommended is the low selectivity of the index. The so-called Index selectivity (selectivity) refers to the ratio of non-repeating index values (also called cardinality, cardinality) to the number of table records (#T)
Categories of indexes
I. GENERAL index
Second, unique index
Third, primary key index
Iv. Combined Index
The indexes used in MySQL
B+tree is commonly used for indexing in MySQL, but it differs in implementation based on clustered indexes and nonclustered indexes.
Clustered index vs. nonclustered index
The so-called clustered index refers to the primary index file and data file as the same file, clustered index is mainly used in the InnoDB storage engine. The data on the leaf node of the b+tree in the way of the index implementation is the key, the key. Such as:
(T1 table)
(T2 table)
(database corresponding to the file)
Because the InnoDB data file itself is clustered by the primary key, the INNODB requires that the table must have a primary key (MyISAM can not), and if it is not explicitly specified, the MySQL system automatically selects a column that uniquely identifies the data record as the primary key, and if no such column exists, Then MySQL automatically generates an implicit field for the InnoDB table as the primary key, which is 6 bytes long and has a length of type.
MyISAM and INNODB data storage engines in MySQL database
Main differences:
MyISAM is a non-transactional security type, and InnoDB is a transaction-safe type.
The granularity of MyISAM locks is table-level, while InnoDB supports row-level locking.
MyISAM supports full-text type indexing, while InnoDB does not support full-text indexing.
MyISAM is relatively simple, so in terms of efficiency is better than INNODB, small applications can consider the use of MyISAM.
MyISAM tables are saved as files, and using MyISAM storage in cross-platform data transfer saves a lot of hassle.
The InnoDB table is more secure than the MyISAM table and can be switched from non-transactional tables to transaction tables (ALTER TABLE tablename TYPE=INNODB) without ensuring that the data is not lost.
Application Scenarios:
MyISAM Manage non-transactional tables. It provides high-speed storage and retrieval, as well as full-text search capabilities. MyISAM is a better choice if you need to perform a large number of select queries in your application.
InnoDB is used for transactional applications and has many features, including acid transaction support. If you need to perform a large number of insert or update operations in your app, you should use InnoDB, which can improve the performance of multiple user concurrency operations.
Replenish the storage of main memory
Take process
When the system needs to read main memory, the address signal is put to address bus upload to main memory, main memory read the address signal, parse the signal and locate the designated storage unit, and then put this storage unit data on the bus for other parts to read.
The process of writing main memory is similar, the system will write the unit address and data on the address bus and bus, respectively, main memory reads two bus contents, does the corresponding write operation.
As can be seen here, the main memory access time is only linear with the number of accesses, because there is no mechanical operation, two times access to the data "distance" will not have any impact on time, for example, first take A0 and then take A1 and take A0 and then take D3 time consumption is the same
Disk Access principle
When data needs to be read from disk, the system will pass the logical address of the data to the disk, and the control circuit of the disk will translate the logical address into the physical address according to the addressing logic, that is, determine which track and which sector the data is to be read. In order to read the data in this sector, the head needs to be placed above this sector, in order to achieve this, the head needs to move to the corresponding track, this process is called seek, the time spent is called seek time, and then the disk rotation of the target sector is rotated to the head, the process of time is called rotation time.
Reference: http://blog.jobbole.com/24006/
MySQL index and structure in-depth explanation