Summary of usage of indexing and optimization in MySQL

Last Update:2016-06-21 Source: Internet

Author: User

1. What is an index in a database? What does an index do?

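An index is introduced to speed up queries. When the amount of data is large, a query that cannot use an index has to load large amounts of data from disk into memory; an index lets the engine find the matching rows while reading far less.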

2. How do indexes work in InnoDB?

InnoDB is the default storage engine for MySQL, and it has two kinds of index: the B+ tree index and the hash index. The hash index is adaptive: the storage engine creates it automatically based on how the table is used, and it cannot be created by hand.

Four data structures are relevant to indexes: the binary search tree, the B-tree, the B+ tree, and the B* tree. They are best read in this order, because each builds on the previous one. In summary:

Binary search tree: each node stores a single key. A search hits on equality, goes to the left child if the query key is smaller, and goes to the right child if it is larger.
B-tree: a multi-way search tree. For order M, each non-leaf node other than the root has between M/2 (rounded up) and M children, and non-leaf nodes store pointers to the children that cover each key range. Every key appears exactly once in the whole tree, and a search can hit in a non-leaf node.
B+ tree: built on the B-tree. The leaf nodes are linked together by pointers, every key appears in a leaf node, and the non-leaf nodes act as an index over the leaf nodes. A search always ends in a leaf node.
B* tree: built on the B+ tree. The non-leaf nodes are also linked by pointers, and the minimum node utilization rises from 1/2 to 2/3.

First, the binary search tree. It has three characteristics: every non-leaf node has at most two children; every node stores one key; and the left pointer of a non-leaf node points to the subtree whose keys are smaller than its key, while the right pointer points to the subtree whose keys are larger.

Searching a binary search tree starts at the root. If the query key equals the node's key, the search hits. Otherwise, if the query key is smaller than the node's key, go to the left child; if it is larger, go to the right child. If the required child pointer is empty, the key is not in the tree.

If the left and right subtrees of every non-leaf node hold roughly the same number of nodes (the tree is balanced), search performance approaches that of binary search. Compared with binary search over contiguous memory, the tree has the advantage that changing its structure (inserting or deleting nodes) does not require moving large blocks of data and often costs only constant overhead.

The same key set can produce different tree shapes. A tree that has degenerated into a chain is still a binary search tree, but searching it takes linear time. So when using a binary search tree, the shape should be kept balanced and the degenerate shape avoided; this is the so-called "balance" problem. Binary search trees used in practice add a balancing algorithm to the basic tree, giving a "balanced binary tree". The balancing algorithm, a strategy for inserting and deleting nodes that keeps the nodes evenly distributed, is the key part of a balanced binary tree.
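The effect of tree shape on search cost can be sketched in a few lines of Python. This is an illustrative sketch only: the names `BSTNode`, `bst_insert`, `bst_search`, and `build` are hypothetical, and the insert does no rebalancing, so inserting keys in sorted order produces the degenerate chain described above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class BSTNode:
    key: int
    left: Optional["BSTNode"] = None
    right: Optional["BSTNode"] = None


def bst_insert(root: Optional[BSTNode], key: int) -> BSTNode:
    """Insert key with no rebalancing and return the root of the tree."""
    if root is None:
        return BSTNode(key)
    node = root
    while True:
        if key == node.key:
            return root  # key already present, nothing to do
        side = "left" if key < node.key else "right"
        child = getattr(node, side)
        if child is None:
            setattr(node, side, BSTNode(key))
            return root
        node = child


def bst_search(root: Optional[BSTNode], key: int) -> Tuple[bool, int]:
    """Return (found, number of nodes visited)."""
    visited = 0
    node = root
    while node is not None:
        visited += 1
        if key == node.key:
            return True, visited
        node = node.left if key < node.key else node.right
    return False, visited


def build(keys):
    root = None
    for k in keys:
        root = bst_insert(root, k)
    return root


chain = build([1, 2, 3, 4, 5, 6, 7])      # sorted inserts degenerate into a chain
balanced = build([4, 2, 6, 1, 3, 5, 7])   # same keys, balanced shape

print(bst_search(chain, 7))      # (True, 7): every node is visited
print(bst_search(balanced, 7))   # (True, 3): one node per level
```

Both trees hold the same seven keys; only the insertion order differs, which is why practical implementations add a balancing step.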

Second, the B-tree. The more data a binary search tree holds, the taller it grows, mainly because each node has only two branches. The B-tree removes that restriction: it is not binary but a multi-way search tree. Its definition includes the following rules:

The keys of a non-leaf node are kept in ascending order, K[i] < K[i+1]. The pointers of a non-leaf node are P[1], P[2], ..., P[M], where P[1] points to the subtree whose keys are less than K[1], P[M] points to the subtree whose keys are greater than K[M-1], and every other P[i] points to the subtree whose keys lie in the open interval (K[i-1], K[i]). All leaf nodes are on the same level.

[Figure: example B-tree with M = 3]

A B-tree search starts at the root node and runs a binary search over the ordered key sequence inside the node. If the key is found, the search ends with a hit. Otherwise it descends into the child whose key range contains the query key, and repeats until the child pointer is empty or the node is a leaf.
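The search just described can be sketched as follows. This is a minimal sketch under stated assumptions: `BTreeNode` and `btree_search` are hypothetical names, the tree is built by hand with M = 3 (at most two keys and three children per node), and the function returns the depth at which the key was found so that a hit in a non-leaf node is visible.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BTreeNode:
    keys: List[int]
    children: List["BTreeNode"] = field(default_factory=list)  # empty for a leaf


def btree_search(root: BTreeNode, key: int) -> Optional[int]:
    """Return the depth (root = 0) at which key is found, or None if absent."""
    node, depth = root, 0
    while True:
        # Binary search inside the node's ordered key list.
        i = bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return depth
        if not node.children:      # reached a leaf without a hit
            return None
        node = node.children[i]    # child i covers the range the key falls in
        depth += 1


root = BTreeNode(
    keys=[10, 20],
    children=[BTreeNode([3, 7]), BTreeNode([13, 17]), BTreeNode([23, 27])],
)

print(btree_search(root, 10))   # 0: hit in the root, a non-leaf node
print(btree_search(root, 17))   # 1: hit in a leaf
print(btree_search(root, 15))   # None: not in the tree
```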

B-tree properties:
1. The key set is distributed throughout the whole tree;
2. Every key appears in exactly one node;
3. A search may end at a non-leaf node;
4. Search performance is equivalent to a binary search over the full key set;
5. The height is controlled automatically.

Because every non-leaf node other than the root must have at least M/2 children, node utilization has a guaranteed lower bound, and so does search performance. With M the maximum number of subtrees of a non-leaf node and N the total number of keys, the height is O(log_{M/2} N) and each node is searched in O(log_2 M), so the total cost is O(log_2 N). B-tree performance is therefore always equivalent to binary search, independent of M, and the balance problem of the binary search tree does not arise.

Because of the M/2 limit, inserting into a node that is already full splits it into two nodes of about M/2 each, and deleting may require merging two sibling nodes that have fallen below M/2.
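The split on insertion can be illustrated on the key list of a single node. The sketch below is a simplification, assuming a node of order M holds at most M - 1 keys and that the middle key moves up to the parent on a split; `insert_with_split` is a hypothetical helper and does not update the parent or any child pointers.

```python
from bisect import insort
from typing import List, Optional, Tuple


def insert_with_split(
    keys: List[int], key: int, order: int
) -> Tuple[List[int], Optional[int], Optional[List[int]]]:
    """Insert key into one node's sorted key list.

    Returns (left_keys, promoted_key, right_keys). When the node still fits,
    promoted_key and right_keys are None and left_keys is the updated node.
    """
    keys = list(keys)          # work on a copy
    insort(keys, key)          # keep the keys ordered
    if len(keys) <= order - 1:
        return keys, None, None
    mid = len(keys) // 2       # the middle key is promoted to the parent
    return keys[:mid], keys[mid], keys[mid + 1:]


# Order 5: at most 4 keys per node.
print(insert_with_split([10, 20], 25, order=5))
# ([10, 20, 25], None, None)
print(insert_with_split([10, 20, 30, 40], 25, order=5))
# ([10, 20], 25, [30, 40])
```

After the split each half holds two keys, which keeps both new nodes at roughly half capacity.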

Third, the B+ tree. Of the four structures, the binary search tree is a two-way search tree, while the B-tree, B+ tree, and B* tree are multi-way search trees. The B-tree defines the basic rules and has two characteristics: a key may appear in either a non-leaf node or a leaf node, and a node has one more subtree pointer than it has keys. The B+ tree changes both points and is defined as follows:

The B+ tree is a variant of the B-tree and is also a multi-way search tree. Its definition is basically the same as the B-tree's, except that:
1. A non-leaf node has the same number of subtree pointers as keys;
2. The subtree pointer P[i] of a non-leaf node points to the subtree whose keys lie in the half-open interval [K[i], K[i+1]) (the B-tree uses an open interval);
3. A link pointer is added to every leaf node;
4. All keys appear in the leaf nodes.

A B+ tree search is basically the same as a B-tree search, except that a B+ tree only hits at a leaf node (a B-tree can hit at a non-leaf node). Its performance is likewise equivalent to a binary search over the full key set.

B+ tree features:
1. All keys appear in the linked list of leaf nodes (a dense index), and the keys in the list are in order;
2. A search never hits at a non-leaf node;
3. The non-leaf nodes act as an index over the leaf nodes (a sparse index), and the leaf nodes act as the data layer that stores the key data;
4. It is better suited to file index systems.
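The two B+ tree properties that matter most for queries, hits only at the leaves and an ordered leaf chain, can be sketched like this. The classes `Leaf` and `Inner` and the functions `find_leaf`, `search`, and `range_scan` are hypothetical names, and the two-level tree is wired by hand, following the rule above that an inner node has as many pointers as keys and that child i covers [K[i], K[i+1]).

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Leaf:
    keys: List[int]
    next: Optional["Leaf"] = None   # link to the next leaf in key order


@dataclass
class Inner:
    keys: List[int]                 # keys[i] is the smallest key under children[i]
    children: List[Union["Inner", Leaf]]


def find_leaf(node, key: int) -> Leaf:
    """Descend to the leaf whose key range would contain key."""
    while isinstance(node, Inner):
        i = max(bisect_right(node.keys, key) - 1, 0)
        node = node.children[i]
    return node


def search(root, key: int) -> bool:
    """A hit is only ever decided at a leaf, even if an inner node holds the key."""
    leaf = find_leaf(root, key)
    i = bisect_left(leaf.keys, key)
    return i < len(leaf.keys) and leaf.keys[i] == key


def range_scan(root, lo: int, hi: int) -> List[int]:
    """Collect keys in [lo, hi] by walking the leaf chain from the first match."""
    result = []
    leaf = find_leaf(root, lo)
    while leaf is not None:
        for k in leaf.keys:
            if k > hi:
                return result
            if k >= lo:
                result.append(k)
        leaf = leaf.next
    return result


leaf3 = Leaf([20, 25])
leaf2 = Leaf([10, 15], next=leaf3)
leaf1 = Leaf([5, 8], next=leaf2)
root = Inner(keys=[5, 10, 20], children=[leaf1, leaf2, leaf3])

print(search(root, 10))          # True, decided in leaf2
print(range_scan(root, 8, 20))   # [8, 10, 15, 20]
```

The range scan descends the tree once and then follows the leaf links, which is the reason the linked leaf list suits range queries.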

Finally, the B* tree. It is a variant of the B+ tree that adds pointers to siblings in the non-root, non-leaf nodes of the B+ tree.

The B* tree requires each non-leaf node to hold at least (2/3)*M keys, so the minimum utilization of a block is 2/3 (instead of 1/2 for the B+ tree).

B+ tree split: when a node is full, a new node is allocated, half of the data in the original node is copied to the new node, and a pointer to the new node is added to the parent. A B+ tree split only affects the original node and the parent, not the sibling nodes, so no sibling pointers are needed.

B* tree split: when a node is full and its next sibling is not full, part of the data is moved to the sibling, the key is inserted into the original node, and the sibling's key in the parent node is updated (because the sibling's key range has changed). If the sibling is also full, a new node is added between the original node and the sibling, each of the two copies 1/3 of its data to the new node, and a pointer to the new node is added to the parent.

As a result, a B* tree allocates new nodes less often than a B+ tree and has higher space utilization.
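The B* tree insertion rule can be sketched on plain key lists. This is a deliberately simplified sketch: `bstar_insert` is a hypothetical function, nodes are sorted Python lists, the redistribution is done by pooling the keys and cutting them evenly (rather than moving exactly the amounts described above), and the parent update is left out.

```python
from typing import List


def bstar_insert(
    node: List[int], sibling: List[int], key: int, capacity: int
) -> List[List[int]]:
    """Insert key into a full node, given its next sibling.

    Returns the resulting nodes from left to right: two nodes if the sibling
    had room (redistribution), three nodes if both were full (two-to-three split).
    """
    if len(node) != capacity:
        raise ValueError("this sketch only handles insertion into a full node")
    merged = sorted(node + [key] + sibling)
    if len(sibling) < capacity:
        # Sibling has room: share the keys between the two existing nodes.
        half = (len(merged) + 1) // 2
        return [merged[:half], merged[half:]]
    # Both nodes are full: create one new node and spread the keys over three.
    first = (len(merged) + 2) // 3
    second = (len(merged) - first + 1) // 2
    return [merged[:first], merged[first:first + second], merged[first + second:]]


full = [1, 2, 3, 4, 5, 6]

print(bstar_insert(full, [10, 11, 12], 7, capacity=6))
# [[1, 2, 3, 4, 5], [6, 7, 10, 11, 12]]
print(bstar_insert(full, [10, 11, 12, 13, 14, 15], 7, capacity=6))
# [[1, 2, 3, 4, 5], [6, 7, 10, 11], [12, 13, 14, 15]]
```

In the second call each of the three resulting nodes holds at least 4 of 6 slots, which matches the 2/3 minimum utilization.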

3. How do I add an index to a table in Navicat?

# Drop the table
DROP TABLE test.idc_work_order_main;

# Create the table idc_work_order_main
# Note: the varchar and bigint lengths were garbled in the source.
# Every varchar(64) below is a placeholder, not the original value, and the bigint width is omitted.
CREATE TABLE `idc_work_order_main` (
  `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'primary key ID',
  `creator` varchar(64) NOT NULL DEFAULT '0' COMMENT 'creator',
  `gmt_create` timestamp NULL DEFAULT NULL COMMENT 'creation time',
  `modifier` varchar(64) DEFAULT '0' COMMENT 'modifier',
  `gmt_modified` timestamp NULL DEFAULT NULL COMMENT 'modified time',
  `title` varchar(64) DEFAULT NULL COMMENT 'ticket title',
  `category` varchar(64) DEFAULT NULL COMMENT 'work order category',
  `subject` varchar(64) DEFAULT NULL COMMENT 'ticket type',
  `demander` varchar(64) DEFAULT NULL COMMENT 'demand side',
  `is_atomic` char(1) DEFAULT 'Y' COMMENT 'atomic ticket',
  `atomic_id` int(11) DEFAULT NULL COMMENT 'ID of the current atomic ticket in the list',
  `site` varchar(64) DEFAULT NULL COMMENT 'machine room of the work order',
  `operationer` varchar(64) DEFAULT NULL COMMENT 'current handler',
  `operation_role` varchar(64) DEFAULT NULL COMMENT 'current processing role',
  `state` varchar(64) DEFAULT NULL COMMENT 'ticket status',
  `sub_state` varchar(64) DEFAULT NULL COMMENT 'work order status',
  `expect_time` timestamp NULL DEFAULT NULL COMMENT 'expected statement time',
  `sla` bigint DEFAULT NULL COMMENT 'SLA',
  `evaluation` varchar(64) DEFAULT NULL COMMENT 'evaluation',
  `create_source` varchar(64) DEFAULT 'Tboss' COMMENT 'create source',
  `source_key` varchar(64) DEFAULT NULL COMMENT 'unique flag of the create source',
  `is_deleted` char(1) DEFAULT 'n' COMMENT 'whether deleted: y, n',
  `remark` varchar(64) DEFAULT NULL COMMENT 'remarks',
  `parent_id` int(11) DEFAULT '0' COMMENT 'parent ticket ID',
  `asset_total` int(11) DEFAULT '0' COMMENT 'total devices',
  `sla_standard` double DEFAULT NULL COMMENT 'standard time',
  `sla_unit` char(1) DEFAULT NULL COMMENT 'SLA unit',
  `effective_date` timestamp NULL DEFAULT NULL COMMENT 'submission time (effective time)',
  `is_timeout` char(1) DEFAULT 'n' COMMENT 'whether timed out: y = timed out, n = not timed out',
  `statement_date` timestamp NULL DEFAULT NULL COMMENT 'statement time',
  `source_creator` varchar(64) DEFAULT NULL COMMENT 'third-party creator information (domain account)',
  `atomic_order_id` int(11) DEFAULT NULL COMMENT 'current atomic ticket number',
  `order_device_type` varchar(64) DEFAULT 'server' COMMENT 'ticket device type (server = server, network, etc.)',
  `finish_asset_total` int(11) DEFAULT '0' COMMENT 'number of completed devices',
  PRIMARY KEY (`id`),
  KEY `idx_statement_date` (`statement_date`),
  KEY `idx_parent_id` (`parent_id`),
  KEY `idx_gmt_modified` (`gmt_modified`),
  KEY `idx_gmt_create` (`gmt_create`)
) ENGINE=InnoDB AUTO_INCREMENT=182431 DEFAULT CHARSET=utf8 COMMENT='Work order master table';

# Show the table definition
SHOW CREATE TABLE idc_work_order_main;

# Add an index
ALTER TABLE idc_work_order_main ADD INDEX atomic_order_id (atomic_order_id);
SHOW INDEX FROM idc_work_order_main;
EXPLAIN SELECT * FROM idc_work_order_main WHERE atomic_order_id = '9956';

# Add a primary key (unique)
# A table can have only one primary key. This table already has PRIMARY KEY (id),
# so the statement below is rejected unless the existing primary key is removed first.
# ALTER TABLE idc_work_order_main ADD PRIMARY KEY (source_creator);

# Add a unique index
ALTER TABLE idc_work_order_main ADD UNIQUE source_creator (source_creator);
SHOW INDEX FROM idc_work_order_main;

# Add a composite index
ALTER TABLE idc_work_order_main ADD INDEX id_source_parent_create_atomic (id, source_creator, parent_id, gmt_create, atomic_order_id);
SHOW INDEX FROM idc_work_order_main;

About indexes:

1. If half of a book is its table of contents, does the table of contents (the index) still serve a purpose? Too many indexes make the index files large, and the system then spends more time locating entries during a query.
2. A gender column has only two values, male and female, so indexing it is pure waste. An index also adds I/O on every UPDATE or INSERT, which is costly for the underlying system.

3. MySQL uses B+ tree indexes by default. The benefit of this index is O(log n) lookups over ordered records. For data that has no meaningful ordering, a hash index is the better choice, because the time complexity of a hash lookup is basically O(1). (Note the distinction between ordered and unordered data here.)

4. Index naming rule: table name_column name. The columns that need an index are the ones used in WHERE conditions. A column that holds only a small amount of data does not need an index. If the WHERE conditions are joined by OR, the index may not be used (for example, when one of the OR-ed columns has no index). A composite index follows the leftmost-prefix principle.
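The points about ordered versus hash indexes and about the leftmost-prefix principle can be illustrated with plain Python structures. This is only an analogy, not how MySQL stores anything: a sorted list of tuples stands in for a composite B+ tree index on (parent_id, gmt_create), a dict stands in for a hash index on parent_id, and the rows and function names are hypothetical.

```python
from bisect import bisect_left

# Hypothetical rows: (id, parent_id, gmt_create)
rows = [
    (1, 10, "2016-01-03"),
    (2, 20, "2016-01-01"),
    (3, 10, "2016-01-02"),
    (4, 30, "2016-01-02"),
    (5, 20, "2016-01-04"),
]

# Composite index on (parent_id, gmt_create): entries are ordered by parent_id
# first and by gmt_create second; the trailing id points back to the row.
composite = sorted((p, g, row_id) for row_id, p, g in rows)


def parent_range(index, lo_id, hi_id):
    """Leftmost column: two binary searches bound the matching slice."""
    lo = bisect_left(index, (lo_id,))
    hi = bisect_left(index, (hi_id + 1,))
    return [entry[2] for entry in index[lo:hi]]


def by_gmt_create(index, gmt_create):
    """Second column alone: the ordering does not help, so every entry is checked."""
    return [entry[2] for entry in index if entry[1] == gmt_create]


# Hash index on parent_id: constant-time equality lookup, but no ordering.
hash_index = {}
for row_id, p, _ in rows:
    hash_index.setdefault(p, []).append(row_id)

print(parent_range(composite, 10, 10))        # [3, 1]
print(parent_range(composite, 10, 20))        # [3, 1, 2, 5]
print(by_gmt_create(composite, "2016-01-02")) # [3, 4]
print(hash_index[20])                         # [2, 5]
```

A range over parent_id is cheap on the ordered structure but would need a check of every key in the dict, while a condition on gmt_create alone cannot use the ordering of the composite index at all.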

4. Use of KEY and INDEX

A key sits at the model level of the database and carries two meanings: it is a constraint (it enforces the structural integrity of the database) and it is an index (it assists queries). Keys include the primary key, the unique key, the foreign key, and others.

A primary key has two functions: it acts as a constraint that enforces a unique, stored primary key, and it also creates an index on that key. A unique key has two functions: it acts as a constraint that enforces the uniqueness of the data, and it also creates an index on that key. A foreign key has two functions: it acts as a constraint that enforces referential integrity, and it also creates an index on that key.

So in MySQL a key means both a constraint and an index, which may differ from how other databases behave. An index is a physical structure of the database, at the implementation level. It only assists queries, and it is stored in its own space (in MySQL, the InnoDB tablespace) in a structure similar to a table of contents. An index is simply an index; it does not constrain the behavior of the indexed columns (that is what a key does). The common MySQL index types are: primary key index, unique index, normal index, full-text index, and composite index.

Reference:

1. http://www.data.5helpyou.com/article392.html

2. On the effect of OR in a WHERE condition on index use: http://blog.csdn.net/hguisu/article/details/7106159
