MySQL Index design

Last Update:2018-05-11 Source: Internet

Author: User

Tags delete key one table mysql index

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

I. A table of 200 million records, assuming that a record has 200 bytes, query a record using indexes and performance calculations that do not use indexes

1. Assume that the IOPS of the hard disk is 100
2. Index Length: 10 bytes (4-byte index + 6-byte pointer length)
3. Each page size is 16KB and the number of 16k*1024/(6+4) can be stored on a single page =1638.4
4. A page can store a data size of 16k*1024/200=81, or 81 rows of data
5. Find the page to find the data
6. The number of pages required is 1638.4*1638.4=2683044
8. The number of pages multiplied by the number of rows in the table size: 2683044*81=217326564, about 200 million records
9. If in a tree with a height of 3, you need only 3 times to find the required records, need to 3/100s, no index to need 400s?

Two. Binary search-two-point search method

1. Searching for an ordered set
12. Take the middle value, compare with the value you want to find, and if it is equal, find the end
13. If the lookup value is larger than the median value, then the recursive binary lookup is performed from the right until it is found
14. If the lookup value is smaller than the middle, then the recursive binary search is done from the left until you find
15. Find time complexity of LOG2 (n)

Two-point lookup time-consuming: The deepest node needs 4, the fastest node needs 1, and the cost of each node is divided by the number of nodes
Three. Binary search tree-two forks find trees
1. The left node is always smaller than the right node
2. The node on the right is always larger than the left node
3. The left and right child nodes are also a binary search tree
Four. Balanced binary tree-balanced binary trees

1. First, it must be a two-pronged tree.
2. The height difference between the left and right nodes is either 0, or 1, the level difference does not exceed 1, and when the layer is above 1 o'clock, rotation is required to maintain balance.
3. Balanced binary Tree Insert number 3

Five. B-tree-a node can have a balanced multi-fork tree with more than 2 nodes

1.P is a pointer, each node by the keys+1 pointer
2. All keywords appear in the whole tree, and values can be stored on non-leaf nodes
3. No linked list between leaf nodes
4. For M-order B-trees, each node has a maximum of M subtrees tree
5. Root node has at least 2 subtrees trees
6. Outside the root node, each of the remaining branch nodes has at least m/2 subtrees tree
7. All leaf nodes are on the same layer
8. Branch node with K subtrees Tree stores k-1 Key,key arranged in ascending order
The number of 9.key needs to meet Ceil (M/2)-1 <= n <= m-1
10. The higher the tree, the more disk IO is required, and the height is proportional to the number of disk accesses
5.1 New node insertion, M-order B-tree operation with height h
1. Inserted node key does not reach m-1, insert directly

2. Insert node key reaches M-1, and parent node does not reach m-1, insert key, insert node split, intermediate key ascend to parent node

3. Into the node key reached M-1, and the parent node also reached m-1, insert key, insert node split, intermediate key promoted to parent node, parent node split, intermediate key promoted to parent parent node

If the root node has reached the m-1, then the whole tree adds a layer of

5.2 Deleting a node
1. Delete the node or a B-tree, delete it directly

2. After deleting the node, not a B-tree, to the sibling node to borrow key, and the brothers have to borrow the situation, delete key, sibling node move up, parent node down to maintain balance

3. If the sibling node is not borrowing, delete, move the parent node down, delete the spare leaf node, and move the new node up the whole

Six. B+tree

The 1.b+ tree has the advantage of storing data for efficient retrieval at the block-block level, such as file systems
The non-leaf nodes and leaf nodes of the 2.b-tree can store data, but the nodes of the B + Tree store only the key values (indexes) without storing the data, and the leaf node is the real data
The 3.b+ tree has a very high fanout-fan out, which points to other nodes, usually more than 100, thus reducing the number of I/O operations when searching for elements
4. Linked table relationships between leaf nodes
6.1 B + Tree operation
1. Similar to B-tree
2. If the node is full when the value is increased, the value in the node is used as the new index
3. When deleted, the key of the index will not be deleted, and there will be no case of parent keyword downlevel.
4. Leaf node and parent node are not full, insert directly

5. The leaf knot is full, the parent node is not occupied
Split child nodes
Promote the key intermediate value as a new index, such as 60, to the parent node
The left side of the leaf node is the value to the left of the middle value of the original leaf node, such as 50,55
The right side of the leaf node is the index with the middle value and the value of the original leaf node, plus the newly inserted value

6. When the leaf node is full and the parent node is full
Insert a new value at the appropriate position of the leaf node
The leaf node splits, the middle values of the leaf nodes move up, the right side is the value of the new value + the right side of the original leaf node split
The parent node is full and the middle value of the leaf node is inserted
The parent node splits, the middle value of the parent node moves up, and the right side is the value of the parent node's middle value + the right side of the original parent node split
Add one layer to the whole tree until you can keep the balance.

Seven. What is the difference between B-and + + trees?
Different B +-
The number of keywords different branch nodes have m keyword, leaf node also has m, the keyword just play the INDEX function branch node has m child node but only m-1 keyword
Storage location The data of all the leaf nodes stored on the leaf node is linked together with an ordered complete set of data stored on each node, root node + branch node + leaf node
Branching nodes construct different branch nodes to store only the keyword information and branch node pointers, which means that the index Information Branch node not only stores the index, but also stores the data
Querying different data storage locations through index of all nodes to reach the leaf node, each time consistent data can be returned when data is found on any node
Eight. What is the difference between a hash index and a B + Tree index in MySQL?
Using the hashing algorithm, the key value is converted to a hash value, the retrieval time does not need a similar B + tree from the root node to the leaf node to retrieve data, only need one hash algorithm can be located to the data storage location
Different hash B +
Different query process according to the key value after the hash must be located to the data storage location every time you look for different data need to go through all the nodes
Query condition can only satisfy =/in/<=> query and use range query
Sort operation different hash sort and original key value sort inconsistent index order is data order
Indexed columns use a different combination of columns to calculate the hash value, so you cannot use a partial column to find a combined index that can be found with the previous column
Full table scan to avoid different hash values and corresponding row pointers are saved in the hash table, when there is the same hash value, unable to obtain data, require full table scan in some cases, a full table scan will also occur.
Efficiency a lot of hash value is equal, the selection rate is too low efficiency may be less efficient than the B-tree efficiency is the same every time

Nine. What is the difference between a clustered index and a secondary index?
Type Clustered Index Secondary index
Stores data for each node store row all column information node storage column field KEY+PK
Data Lookup Find PK is also found in the corresponding column of all column information bookmark Lookup: First find the Column key field, and then find the PK, through the PK find a row of data to find other column values
Data volume storage Each node stores no more data rows per node store more index information
Efficiency is better suited for sorting scenarios because the sorted range query is more efficient and requires a return table, which may require a sort of consumption performance
The number of a table can have only one clustered index may have multiple secondary indexes

10. mysql's index type
10.1 B + Tree Index
1.Cluster Index Clustered Indexes
Stores the entire record information,
In Oracle, it becomes the Index organization table IoT, data and index are placed in a table, the data is indexed, the index is also the data
2.Secondary Inex, Level Two index, secondary index, Non clustered index non-clustered indexes
As long as the field and primary key of the secondary index are stored, the key value + primary key
3.b+ Tree Index
Clustered index and secondary index data are stored, assuming that the primary key is of type int
Type Clustered Secondary
Index primary key length 4 bytes 4 bytes
Index pointer length 6 bytes 6 bytes
The average row record length assumes that the bytes assumes a bytes
One page size 16k bytes =16384bytes 16k bytes=16384bytes
Average per page usage 70% 70%
Number of fan-out pointers (the size of a page uses 70% divided by index primary key length, index length = primary key + pointer) 16384 * 70%/(4+6) = 1000 pointers 16384 * 70%/(4+6) = 100 pointers
The average number of records that can be stored per page is a full row of data 1638470%/300=35 row data only holds primary key + pointer 1638470%/(4+6) = 1000 rows of data
Layer height is 2 o'clock record number 100035 Records1000 indexes
Number of records when layer height is 3 1000 +35 Records 1000 +1000 indexes
Number of records with a layer height of 4 o'clock 1000 +100035 Records1000 +1000 indexes
10.2 Clustered index and secondary index representation in tables
CREATE TABLE UserInfo (
UserID int NOT NULL auto_increment,
Username varchar (30),
Registdate datetime,
Email varchar (50),
Primary KEY (UserID),
Unique key Idx_username (username),
Key Idx_registdate (Registdate)
);
Three indexes are primary key userid, unique key idx_username, normal index Idx_registdate
Three indexes can be interpreted as logically creating 3 tables, respectively,
1. Primary key Index table with all field information
CREATE TABLE UserInfo (
UserID int NOT NULL auto_increment,
Username varchar (30),
Registdate datetime,
Email varchar (50),
Primary KEY (UserID)
);
2. Idx_username Index Table with only Usernmae field +PK
CREATE TABLE Idx_username (
UserID int NOT NULL,
Username varchar (30),
Primary KEY (Username,userid)
);
3. idx_registdate Index Table with only Registdate field +PK
CREATE TABLE Idx_registdate (
UserID int NOT NULL,
Registdate datetime,
Primary KEY (Registdate,userid)
);
4. A unique index can also be understood as a table with only one column, and when there is no primary key, the unique index is used as the primary key
CREATE TABLE Idx_username_constraint (
Username varchar (30),
Primary KEY (username)
);
10.3 Performance of indexes when inserting data
Start transaction;
INSERT into userinfo values (AAA,BBB,CCC);
INSERT into Idx_username_constraint (BBB);
Insert into Idx_username (BBB,AAA);
Insert into Idx_registdate (CCC,AAA);
Commit
This is why the more indexes are causing the DML operation to slow
10.4 How to use the B+tree index
1.Cardinality Cardinality
Number of records that do not contain unique records
High selection of columns as index
B + Tree index is used to access less data
2.Not use B + Tree index situation not applicable B + trees
The premise is that a secondary index
Requires a lot of random reads
Access more than 20% of all rows
The optimizer may choose sequential traversal instead of an index lookup, and may discard the secondary index using a primary key full table scan
For example, to find 10,000 records, each record cost 0.00003s, you need 30s to find out all
The whole table has 500,000 records, according to the primary key sequence read each time spent 0.00003s, only need to find 15s all

3.Compound Index Federated
Index of multiple column fields, index on (column A,column B)
Column A is all sorted, column B is not
Can use the index, as long as a condition is written in front
SELECT * from t where a=?
SELECT * from t where a=? and b=?
If you can't use the index, write B in the front.
SELECT * from t where b=?
Also use a A, b column, the second column as a sort case, directly in the index arrangement, no need to Filesort
SELECT * from t where a=? ORDER BY B

4.Covering Index Overlay
Samsung Index
First Star: The columns in the WHERE predicate are included in the index, and the columns that start with the index are
Second Star: an indexed column in order BY, not duplicates the where column
Third star: Columns in select are the remaining columns
Select Cno,fname
From Cust
where lanme= ' xxx '
and
City= ' CityName '
Order BY fname;
(LNAME,CITY,FNAME,CNO) or (CITY,LNAME,FNAME,CNO)
Do not need to use the bookmark lookup method, that is, the return table
All the required data is on the column that the index contains.
The use of B in (A, b) can also be used to index the situation: simple statistics record number of records not found
Select COUNT (1) from T where b>=? and b<=?
5.Index with included column contains the index of columns (SQL Server)
Select email from userinfo where username= ' Joe '
Do not need to return the table, because the index contains an email column
MySQL No, you can create more than one table, the table contains an email column
To change time in space, you need to update both tables
11. Index Specification
1. The number of single-sheet indexes is not more than 5
2. The number of fields in a single index does not exceed 5
3. Index name All lowercase
4. Non-unique index naming: Idx_ field name _ Field name, Idx_age_name
5. Unique index naming: Uniq_ field name _ Field name, Uniq_age_name
6. Composite Index recommendation contains all field names, long field name abbreviations, Idx_age_name_add
7. The table must have a primary key, it is recommended to use the unsigned self-increment column as the primary key
8. Unique key consists of 3 following fields, you can use the unique key as the primary key, other cases use the self-increment column or the generator key
9. Prohibit redundant indexes, such as (A,b,c), (A, B)
10. Prohibit duplicate index, such as primary key A, Uniq index A, will occupy disk space, increase maintenance burden
11. Prohibit the use of foreign keys
12.join table query, the data type of the join column must be consistent and indexed
13. Do not index keys on columns with low cardinality, such as gender
14. Read the Lie Jian index with high selection rate, in the combined index, the column with high selection rate is placed in the top
15. Use a prefix index for a string with a prefix index of no longer than 8 characters
16. Not too long the VARCHAR field column key index, preference for prefix index, CRC32/MD5 as index
17. Reasonable creation of a federated index (A,B,C) equivalent to (a), (A, B), (A,B,C)
18. Proper use of overlay index to reduce IO, avoid sorting
12. Summary
12.1 Index of a bit
1. Accelerate data retrieval efficiency
2. Create a Uniqueness constraint index to ensure uniqueness of each row of data in the table
3. Speed up the connection efficiency of tables and tables
4. When using grouping and sorting words memory data retrieval, you can significantly reduce the grouping and sorting events in the query
12.2 Disadvantages of the index
1. Use more physical storage space
2. When the data in the table is added, deleted, modified, the index also needs to maintain the update, reduce the data maintenance efficiency
12.3 Those scenarios suggest creating an index
1. The column that is the primary key, the index column with the unique constraint
2. Often used in Select/where columns
3. Columns that are frequently used in table joins
4. Columns that are often used in order By,group by
12.4 Overwrite the index, through the index data structure, find the required data, do not need to return to the table
12.5 Union Index, put the filter number field in front
12.6 Primary key features, a table can only have one primary key, with the display of the declared primary key, preferably with self-growing primary key
12.7 cannot use the index condition
1. Columns that are not frequently searched
2. Columns with very low cardinality values
3. Long Text field type column
12.8 does not support the function index/Expression index, the index column after the use of functions, the index can not be used, will cause a full table scan
Https://www.cnblogs.com/hanybblog/p/6485419.html
Https://www.cnblogs.com/George1994/p/7008732.html
Https://www.cnblogs.com/eudiwffe/p/6207196.html

MySQL Index design

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More