Database index, exactly what it is made of

Last Update:2018-09-06 Source: Internet

Author: User

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

Question 1. Why is the database designed to be indexed?

Library to save 1000W books, to find the "architect of the Road", a check, to find out when to go?

The librarian then devised a set of rules:

On the first floor of the history class, the second floor of literature, third floor put it class ...
It class, sub-software class, hardware class ...
Software classes, sorted by book title Sequencer ...

To find a book quickly.

With the analogy, the database stores 1000W data, to find Name= "Shenjian" records, a search, to find out when to go?

Therefore, an index is used to improve the database lookup speed.

Question 2. The hash (hash) is faster than the tree, why is the index structure designed to be tree-shaped?

There are two common types of data structures that speed up the search speed:

Hash, such as HashMap, the average time complexity of query/insert/modify/delete is O (1);
Trees, such as the balanced binary search tree, the average time complexity of query/insert/modify/delete is O (LG (n));

You can see that the index of the hash type, whether it's a read request or a write request, is faster than the tree index, so why is the index structure designed to be tree-shaped?

VoiceOver: 80% of the students, the interview can not be answered.

The index is designed to be tree-related and is relevant to the requirements of SQL.

For SQL requirements for such a single-line query:

It is true that the hash index is faster because only one record is queried at a time.

VoiceOver: So, if the business requirements are single-line access, such as passport, you can actually use a hash index.

But for the SQL requirements of the sort query:

Group: GROUP BY
Sort: ORDER BY
Compare:<, >
...

A hash index, the time complexity degrades to O (n), and the "ordered" nature of the tree is still able to maintain the efficiency of O (log (n)).

Any design that is out of demand is bullying.

To say the word, InnoDB does not support hash indexes.

Question 3. Why do database indexes use B + trees?

In order to maintain the integrity of the knowledge system, a brief introduction of the next several trees.

1. The first type: two fork search tree

Binary search tree, for example, is the most well-known data structure, do not expand the introduction, why it is not suitable for database index?

When the amount of data is large, the height of the tree will be higher, the amount of data is large, the query will be relatively slow;
Each node stores only one record, which may cause a query to have multiple disk IO at a time;

The tree often appears in university textbooks, so it is most well known.

2. Second type: B-Tree

B-Tree, for example, it is characterized by:

is no longer a two-fork search, but M-fork search;
Leaf nodes, non-leaf nodes, all store data;
The middle sequence traversal, can obtain all nodes;

Outside the voice, really do not want to introduce this feature: non-root node contains the number of keywords J satisfied, (┌m/2┐)-1 <= J <= m-1, node splitting to meet this condition.

The B-Tree is created as a data structure for indexing, because it makes perfect use of the "local principle".

(1) What is the principle of locality?

The logic of the local principle is this:

Memory read-write block, disk read and write slow, and much slower;
Disk pre-reading: Disk read-write is not read on-demand, but read by the page, read one page of data at a time, load more data, if the future to read the data on this page, you can avoid future disk IO, improve efficiency; (voiceover: Typically, a page of data is 4K.) )
Local principle: Software design to follow the "data read set" and "use to a data, the approximate rate will use the data near it", so that disk pre-reading can fully improve disk IO;

(2) Why is B-tree suitable for indexing?

Because of the M fork, the height can be greatly reduced;
Each node can store J records, if the node size is set to the page size, such as 4 K, can fully utilize the pre-read features, greatly reduce disk IO;

Third type: B + Tree

B + trees, such as the M-fork search tree, have made some improvements on the basis of the tree:

Non-leaf nodes no longer store data, and the data is stored only on leaf nodes of the same layer; (voiceover: B + Tree has the same path length as the root to each node, and the second tree is not.) )
Among the leaves, the chain list is added, and all nodes are retrieved, and the sequence traversal is no longer required;

These improvements have made the B + tree more superior than that of the tree:

Range lookup, positioning min and Max, the middle leaf node, is the result set, not the middle sequence backtracking; (voice-over: Range queries are used much in SQL, which is the biggest advantage of B + trees over the tree. )
Leaf node stores the actual record line, the record line relatively tight storage, suitable for large data volume disk storage, non-leaf node storage record PK, for query acceleration, suitable for memory storage;
A non-leaf node, which does not store the actual record, but stores only the key of the record, the B + tree can store more indexes in the same memory;

Finally, to quantify, why is the B + tree of M fork significantly lower than the height of the binary search tree?

Probably calculate:

(1) The principle of locality, the size of a node is set to one page, a page 4K, assuming a key has 8 bytes, a node can store 500 keys, that is j=500

(2) M fork tree, probably m/2<= J <=m, that can be almost 1000 fork tree

So:

One-layer tree: 1 nodes, 1*500 key, size 4K
Two-tier tree: 1000 nodes, 1000*500=50w key, size 1000*4k=4m
Three-layer tree: 1000*1000 nodes, 1000*1000*500=5 billion key, size 1000*1000*4k=4g

Voice: Well, help to see whether it is wrong.

As you can see, storing large amounts of data (500 million) does not require too high tree depth (height 3), and the index is not too much memory (4G).

Summarize

(1) database index for accelerated query

(2) Although the hash index is O (1), the tree index is O (log (n)), but SQL has a lot of "ordered" requirements, so the database uses a tree-type index

(3) InnoDB does not support hash indexes

(4) The idea of data pre-reading is that disk reads and writes are not read on-demand, but read-by-page, read one page at a time, and load more data at a time to reduce disk IO in the future

(5) Local principle: Software design to follow the "data read set" and "use to a data, the approximate rate will use the data near it", so that disk pre-reading can fully improve the disk IO

(6) Index of the database most commonly used B + trees:

Very suitable for disk storage, can take full advantage of the principle of local, disk pre-reading;
Very low tree height, capable of storing large amounts of data;
The index itself occupies very little memory;
Can be very good support of single-point query, range query, ordered query;

Database index, exactly what it is made of

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

Database index, exactly what it is made of

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support

Database index, exactly what it is made of

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

Trending Topic

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support