An index is a structure that sorts the values of one or more columns in a database table and uses an index to quickly access specific information in a database table.
Depending on the capabilities of your database, you can create three indexes in the Database Designer: Unique indexes, primary key indexes, and clustered indexes.
Unique index: A unique index is an index that does not allow any two rows to have the same index value. When duplicate key values exist in existing data, most databases do not allow a newly created unique index to be saved with the table. The database may also prevent the addition of new data that will create duplicate key values in the table. Primary key index: database tables often have one or more column combinations whose values uniquely identify each row in the table. This column is called the primary key of the table. Defining a primary key for a table in a database diagram automatically creates a primary key index, which is a specific type of unique index. The index requires that each value in the primary key be unique. When a primary key index is used in a query, it also allows quick access to the data. Clustered index: In a clustered index, the physical order of rows in a table is the same as the logical (indexed) Order of the key values. A table can contain only one clustered index. In a clustered index, the ordering of indexes means that the order of the indexes corresponds to the order of the corresponding index keywords in the database, regardless of the physical order of the data on disk. (http://www.cnblogs.com/lwzz/archive/2012/08/05/2620824.html) Clustered and nonclustered indexes: (http://www.2cto.com/database/201609/ 544681.html) B + Tree index can be divided into clustered index and secondary index, they are different points, the clustered index of row data and the primary key B + tree stored together, secondary index only stores the secondary and primary keys. A variety of B-tree details (http://www.cnblogs.com/lwzz/archive/2011/07/27/2119021.html) here mainly about MySQL used to the B + tree.
B + Tree
Nature: B + Tree is a variant of B-tree and is also a multi-path search tree:
1. Its definition is basically the same as the B-tree, except:
2. The sub-tree pointer of non-leaf node is the same as the number of keywords;
3. The subtree pointer of the non-leaf node p[i], pointing to the subtree (b-tree is open interval) of the key value belonging to [K[i], k[i+1]) ;
4. Add a chain pointer for all leaf nodes;
5. All keywords appear at the leaf node;
B + 's search is basically the same as that of the B. C-tree, except that the difference is that B + trees only hit the leaf nodes (b-trees can be
Non-leaf node hit), and its performance is equivalent to the keyword in the complete collection of two-point search;
Features of B +:
1. All keywords appear in the list of leaf nodes (dense index), and the key words in the list are in order;
2. Cannot be hit on non-leaf nodes;
3. The non-leaf node is equivalent to the index of the leaf node (sparse index), and the leaf node is equivalent to the data layer of storing (key) data;
4. More suitable for file indexing system;
B + Tree Insert delete Change chart and source code analysis (http://blog.csdn.net/cdnight/article/details/11772621)
Performance analysis of B-/+tree indexes
The number of disk I/O is generally used to evaluate the index structure. First, from the B-tree analysis, according to the definition of b-tree, it is necessary to retrieve up to H nodes at a time. The designer of the database system skillfully exploits the principle of disk pre-reading, setting the size of a node equal to one page, so that each node can be fully loaded with only one I/O. To achieve this, the following techniques are required to implement B-tree in practice:
Each time you create a new node, request a page space directly, so that a node is physically stored in a page, and the computer storage allocation is page-aligned, the implementation of a node only one time I/O.
B-tree requires a maximum of h-1 I/O (root node resident memory) in a single retrieval, and a progressive complexity of O (h) =o (LOGDN). in general practice, the out D is a very large number, usually more than three (B + trees has very high fanout (number of the pointers to the child nodes in a node, typically on The order of or more), so H is very small (usually not more than 3).
In summary, using B-tree as index structure efficiency is very high.
And the red-black tree structure, H is obviously much deeper. Because the logically close node (parent-child) may be far away physically, it is not possible to take advantage of locality, so the I/O asymptotic complexity of the red-black tree is also O (h), and the efficiency is significantly worse than B-tree.
As mentioned above, B+tree is more suitable for external memory index, reason and inner node out degree D (> node degree: The number of sub-trees that a node contains is called the degree of the node;> the degree of the tree: the degree of the largest node in a tree is called the degree of the tree). From the above analysis, it can be seen that the better the performance of the larger index, and the upper limit depends on the size of the key and data within the node:
DMax = Floor (pagesize/(keysize + datasize + pointsize))
Floor indicates rounding down. Because the B+tree node removes the data domain, it can have a greater degree of granularity and better performance.
is a simple b+tree signal.
MySQL Index implementation
In MySQL, the index is the concept of storage engine level, different storage engine implementation of the index is different, this article mainly discusses the MyISAM and InnoDB two storage engine index implementation way.
Into the search engine, the author of the Liang Bin teacher for B-tree, + + tree gave his opinion (for the sake of authenticity, special quote its exact words, did not make any changes): "B + Tree has one of the biggest benefits, easy to sweep the library, the second tree must use the method of sequential traversal in order to sweep the library, and the second- Tree support Range-query is very convenient, and B-tree is not supported. This is the main reason why the database chooses B + trees.
For example, to check between 5-10, B + Tree one to 5 this mark, and then to 10, and then string up on the line, the second tree is very troublesome. The advantage of the B-tree is that successful queries are particularly advantageous because the height of the tree is higher overall than the B + tree. In unsuccessful cases, B-trees are a little bit cheaper than a + + tree.
B-tree, like your example, 17, one will get the result,
There are a lot of frequency-based search is the choice of B-tree, the more frequent the node of the query to go to the root, if you need to do statistics on the query, and to make some changes to key.
In addition B-tree is better than the second-level tree or the root or the above several layers because it is repeatedly query, so these blocks are basically in memory, will not appear read disk IO, generally started, will be actively swapped into memory. "
MyISAM Index Implementation (http://blog.codinglabs.org/articles/theory-of-mysql-index.html)
The MyISAM engine uses B+tree as the index structure, and the data domain of the leaf node holds the address of the record. is a schematic diagram of the MyISAM index:
There are three columns in the table, assuming that we are Col1 as the primary key, is a MyISAM table main index (Primary key) schematic. You can see that the index file of MyISAM only saves the address of the data record (by default, in order of storage). In MyISAM, the primary index and secondary index (secondary key) are structurally indistinguishable, except that the primary index requires that key be unique, and the secondary index key can be duplicated. If we establish a secondary index on Col2, the structure of this index is as follows:
It is also the address of a b+tree,data field that holds data records. Therefore,the algorithm of index retrieval in MyISAM is to search the index according to the B+tree search algorithm first, if the specified key exists, the value of its data field is fetched, then the data record is read with the value of the data field.
The index of MyISAM is also called "non-aggregation", and the reason for this is to differentiate it from InnoDB's clustered index.
InnoDB Index Implementation
Although InnoDB also uses B+tree as the index structure, the implementation is very different from MyISAM.
The first major difference is that the InnoDB data file itself is the index file. as you know above, the MyISAM index file and the data file are detached, and the index file only holds the address of the data record. In InnoDB, the table data file itself is an index structure organized by B+tree, and the tree's leaf node data field holds the complete data record. The key of this index is the primary key of the data table, so the InnoDB table data file itself is the primary index.
Is the InnoDB primary index (also the data file), you can see that the leaf node contains the complete data record. This index is called a clustered index. Because the InnoDB data file itself is clustered by the primary key, the INNODB requires that the table must have a primary key (MyISAM can not), and if it is not explicitly specified, the MySQL system automatically selects a column that uniquely identifies the data record as the primary key, and if no such column exists, Then MySQL automatically generates an implicit field for the InnoDB table as the primary key, which is 6 bytes long and has a length of type.
The second difference from the MyISAM index is that the secondary index of the InnoDB data field stores the value of the corresponding record primary key instead of the address. in other words, all secondary indexes of InnoDB refer to the primary key as the data domain. For example, to define a secondary index on the Col3:
Here is the ASCII code of the English character as the comparison criterion. Clustered index This implementation makes search by primary key efficient, but secondary index search needs to retrieve two times index: first retrieves the secondary index to obtain the primary key, and then retrieves the record with the primary key to the primary index.
Understanding how index implementations of different storage engines can be useful for proper use and optimization of indexes, such as knowing the InnoDB index implementation, makes it easy to understand why it is not recommended to use too-long fields as primary keys, because all secondary indexes refer to the primary index, and the long primary index makes the secondary index too large. For example, using non-monotonic fields as primary keys is not a good idea in InnoDB because the InnoDB data file itself is a b+tree, and a non-monotonic primary key causes the data file to be frequently split in order to maintain b+tree characteristics when inserting new records, which is inefficient, Using the self-increment field as the primary key is a good choice.
To optimize SQL using explain:
1) There is a maximum of one matching row (primary key or unique index) in the Consts single table, and the data can be read at the optimization stage.
2) ref refers to using a normal index. (normal index)
3) range to retrieve the scope of the introduction of the cable.
Counter example: Explain table results, Type=index, index physical file full scan, very slow, this index level
Compare range is also low, with full table scan is dwarfed.
When an index is used, InnoDB locks the tuples it does not need. Worse, if the query cannot use the index, MySQL will scan the whole table and lock each tuple, whether or not it really needs
Hash index
MySQL, only memory storage engine display Support hash index, is the memory table default index type, although memory table can also use B-tree index.
CREATE TABLE Lookup
(ID INT, INDEX USING HASH (ID))
ENGINE = MEMORY;
CREATE TABLE Lookup
(ID INT, INDEX USING BTREE (ID))
ENGINE = MEMORY;
The memory storage engine supports non-unique hash indexes, which are rare in the database domain, and if multiple values have the same hash code, the index saves their row pointers to the same hash table item with the linked list.
Suppose you create one of the following tables:
CREATE TABLE Testhash (
FName VARCHAR () not NULL,
LName VARCHAR () not NULL,
KEY USING HASH (fname)
) Engine=memory;
The following data is included:
Suppose the index uses the hash function f (), as follows:
F (' Arjen ') = 2323 F (' Baron ') = 7437 F (' Peter ') = 8784 F (' Vadim ') = 2458 |
At this point, the structure of the index is probably as follows:
The slots are orderly, but the records are not orderly. When you perform
Mysql> SELECT lname from Testhash WHERE fname= ' Peter ';
MySQL calculates the ' Peter ' hash value and then queries the index's row pointer through it. Because F (' Peter ') = 8784,mysql finds 8784 in the index, it gets a pointer to record 3.
Because indexes only store very short values, the index is very compact. The hash value does not depend on the data type of the column, and the index of a tinyint column may be as large as the index of a long string column.
The hash index has some of the following limitations:
(1) Because the index contains only hash code and record pointers, MySQL cannot avoid reading records by using an index. However, accessing the in-memory records is very rapid and does not have a significant impact on performance.
(2) Cannot use hash index to sort.
(3) The hash index does not support partial matching of keys, because the hash value is computed by the entire index value.
(4) The hash index only supports equivalent comparisons, such as using =,in () and <=>. For where PRICE>100 does not speed up the query.
The difference between MySQL Btree and hash index (https://dev.mysql.com/doc/refman/5.5/en/index-btree-hash.html)
Bitmap Index:
Bitmap indexing is a simple query for multiple fields design a special index, the scope of the application is small, only for fixed field values and the value of a few kinds of situations, such as gender, only male and female, or level, state, and so on, and only when the number of such fields query to reflect the advantages of bitmaps.
The basic idea of a bitmap is to use 0 or one for each condition, if there are 5 records, gender is male, female, male, male, female, then if using a bitmap index will build two bitmaps, corresponding to the male 10110 and the corresponding female 01001, what is the benefit of doing this? That is, if you are querying multiple fields of this type at the same time, you can use bitwise-and-bitwise OR to directly get results.
Reverse key index:
Consider this situation: the value of a field is 1-1000 sequential, the establishment of the B-tree index is still incremented, and then the B-Number index continues to add branches later, will form the following asymmetric tree:
The reverse key index is a special B-tree index that is identical to the B-tree index in the storage construct, but for numeric values, the reverse key index reverses the bytes of each key value first, and then indexes the new data after the reverse. For example, input 2008 is converted to 8002, so that when the value increases once, the distribution of its inverse key in size is still relatively average.
Example of creating a reverse key index:
CREATE index ind_t on T1 (id) reverse;
Note: The key inversion is done by the system itself. is transparent to the user.
Block Nested LoopMySQL's multi-table join uses the nested loop, see the MySQL official website of the document description: https://dev.mysql.com/doc/refman/5.5/en/nested-loop-joins.html
Summary:This article just made a general description of the MySQL index, only to do a few days to the MySQL index of the study of a summary, the text of the reference to the part of the specific link, there are many links to the details of the index is more clearly described, but also for my learning cable caused very good help, thanks!
MySQL Database index