Clustered Index of "Learning notes" MySQL indexing principle
Index is divided into clustered index and nonclustered index.
Take an English textbook for example, to find the 8th lesson, directly turn the book, if first turn to the 5th lesson, then turn back, and then turn to the 10th lesson, then turn. The book itself is an index, or "clustered index".
If you want to find the word "fire", it will go to the appendix in the back of the book, the appendix is sorted alphabetically, find the F letter that piece, and then find "Fire", corresponding to it in the first few lessons. This appendix, "Nonclustered Index".
This shows that the clustered index, the order of the index is the order of data storage, so it is easy to understand that a data table can only have a clustered index.
Clustered index is much more efficient than nonclustered index query, especially when the range is queried. So, as for the clustered index should be the primary key, or other fields, this can be discussed again.
1, the MySQL index
In MySQL, different storage engines have different implementations of the index, roughly MyISAM and InnoDB two storage engines.
The data on the leaf node of the MyISAM B+tree is not the information itself, but the address where the data resides. The primary and secondary indexes are no different, but the key in the primary index must be unique. The indexes here are not clustered.
MyISAM also uses a compression mechanism to store indexes, for example, the first index is "her" and the second index is "here", then the second index is stored as "3,e", which is the disadvantage that indexes in the same node can only be found in order.
InnoDB data file itself is the index file, b+tree leaf node on the data itself, key key, which is clustered index. Non-clustered index, the data on the leaf node is the primary key (so the key of the clustered index cannot be too long). The reason why the primary key is stored, not the address of the record, is quite simple because the address of the record is not guaranteed to change, but the primary key can be guaranteed.
As for why the primary key is usually recommended to use the self-increasing ID.
2. Clustered index
The physical order of the data in the clustered index is consistent with the index order, that is, as long as the index is adjacent, the corresponding data must also be stored on the disk adjacent to it. If the primary key is not an ID, imagine what it will do, constantly adjust the physical address of the data, paging, and of course there are other measures to reduce these operations, but it can not be completely avoided. However, if it is a self-added, it is simple, it only needs a page to write, the index structure is relatively compact, less disk fragmentation, high efficiency.
Clustered index not only can improve the efficiency greatly in the retrieval, but also in the data reading. For example: You need to query f~t all the words.
A primary index that uses the MyISAM, and a clustered index using the InnoDB. The B+tree retrieval time of both indexes is the same, but there is a difference in reading.
Because the main index of MyISAM is not a clustered index, then the physical address of his data is necessarily messy, get these physical addresses, I/O read according to the appropriate algorithm, and then start to constantly seek the way to rotate. The clustered index is only once I/O.
However, when it comes to sorting large amounts of data, full table scans, count-like operations, or MyISAM, because the index takes up less space, these operations need to be done in memory.
Given the range query efficiency of the clustered index, many people think that using primary keys as a clustered index is too much waste, after all, almost no scope query using the primary key. However, if you consider the storage of the clustered index, it is not a good conclusion.
Learning Notes "The B+/-tree of MySQL indexing principle
Index, is to query the data faster, query algorithm has many, corresponding data structure is also many, database commonly used index data structure generally for b+tree.
1, B-tree
The official definition of b-tree is more difficult to understand, and the popular point is to cite an example. If: An English dictionary, Word + detailed explanation made up a record, now need to index words, then word for key, Word + explained in detail as Data,b-tree is a two-tuple {key,data} to define a record. If a node has 3 records, the corresponding 4 pointers are available to point to the next node. The b-tree is orderly and balanced, with all leaf nodes at the same level, that is, there is not a single branch hierarchy and a small branch level.
Because the B-tree is ordered, it is easy to find, starting from the root node binary lookup, find the return node, otherwise along the interval pointer to find the next node. For example, query false for the word.
2, B+tree
Unlike B-tree, b+tree each node has only key, no data, and leaf nodes have no pointers. That is to say, the b+tree of the leaf node and the internal node of the data structure is not the same.
The general database uses the B+tree, and after some optimization, such as on the leaf node to increase the sequential access pointer, improve the efficiency of the interval query. For example: Query the first letter is f~t all the words. Then just look up the first word fabric at the beginning of F and then walk along the leaf node until you find the last word that starts with T.
Briefly introduced B-/+tree, as for many data structure, why the database index chooses Btree, and chooses the B+tree, the following from the computer storage principle aspect to say simply.
3, read memory and read disk
The efficiency of memory reads and disk reads varies widely.
In simple words, memory reads, memory is made up of a series of storage units, each storage unit stores fixed-size data and has a unique address.
When the memory needs to be read, the address signal is placed on the address bus select、read to the memory, the memory parsing signal is positioned to the storage unit, and then the data on the storage unit is placed on the bus and returned.
Write memory, the system will be written to the data and cell address on the bus and address buses, memory read two bus content, do the corresponding write operation.
Memory access efficiency, in relation to the number of times, read a data first or later read a data does not affect access efficiency. Disk access is different, and disk I/O involves mechanical operations.
A disk is made up of a circular platter of the same size and axis, and the disks can rotate (each disk must rotate simultaneously). On one side of the disk is a head bracket, and the head bracket is secured with a set of heads, each of which is responsible for accessing the contents of a disk. The head does not move, the disk rotates, but the magnetic arm can be moved back and forth to read data on different tracks. A track is a series of concentric rings (such as the red circle of icons) that are divided into platters. The track is divided into a small section, called sector, is the disk's smallest storage unit.
When the disk is read, the system passes the data logical address to the disk, and the control circuit of the disk resolves the physical address, which is which sector of the track. So the head needs to move back and forth to the corresponding track, the time consumed is called seek time, and then the disk rotation to the corresponding sector to the head, the time spent is called rotation time. Therefore, the appropriate sequence of operations and data storage can reduce the seek time and rotation time.
In order to minimize I/O operations, disk reads are read every time, usually in the size of an integer multiple of the page. Even if you only need to read one byte, the disk reads one page of data (typically 4K) into memory, and the memory and disk Exchange data in page units. Because of the local principle that a data is usually used, the data near it will be used immediately.
4. Search Performance Analysis
B-tree: If a retrieval requires access to 4 nodes, the database System designer uses the disk prefetching principle to design the size of the node as a page, and reading a node requires only one I/O operation to complete this retrieval operation with a maximum of 3 I/O (root node resident memory). The smaller the data records, the more data each node holds, the smaller the height of the tree, the less I/O operation, and the efficiency of the search.
B+tree: The inside node only save key, greatly reduce the size of the inner node, then each node can store more records, the tree is shorter, I/O operation less. So B+tree has better performance.
5. Other index methods
Hash Index: An index that is positioned through a hash that is less referenced by a single value query. InnoDB's Adaptive Index is the hash index.
Bitmap index: Field values are fixed and few, such as gender, status. It is extremely efficient to and/or queries on multiple such fields at the same time, and you can get results directly by bitwise AND/OR. So, the scope of application is limited.