Database Principle 2 (storage and indexing)

Last Update:2018-12-07 Source: Internet

Author: User

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

I. Basic Concepts

On a disk, the unit of storage information is page, and the page size is a DBMS parameter. The typical value is 4 kb or 8 KB.

Clustered index: the sequence of data records is the same or similar to that of data items in a certain index. Otherwise, it is a non-clustered index.

Why can indexes improve performance: the search range can be quickly reduced !!!

Index data structure:

A) hash-based index:

Files are stored in different buckets. One bucket contains one or more pages. The bucket in which a record belongs can be determined by a special function used to search for codes. This function becomes a hash function. A bucket number is provided. A hash-based index structure allows us to retrieve the bucket's home page once or twice.

B) tree-based indexes:

Data items are arranged according to the search code value, and a hierarchical search data structure is maintained to direct the search to the page to which the data item belongs.

Non-clustered hash indexes and tree indexes provide high-speed insertion and deletion performance. However, queries with a large number of matching results are slow. Hash indexes are faster in equivalent search, but they do not support selection of range search conditions.

Ii. index case:

A) design example of clustering index:

Select E. Do

From employees e

Where E. age> 40

Analyze the use of B + tree indexes in age. Whether such indexes are worth building depends on the selection of conditions (popularity ). In fact, if everyone is over 40 years old, this index advantage will be very small. If only 10% of employees are over 40 years old, is this index valid? The answer depends on whether the index is clustered. If the index is not clustered, each qualified employee will spend one I/O. For clustering, you only need to scan 10% of I/O.

Select E. DNO, count (*)

From employees e

Where E. age> 10

Group by E. DNO

Assume that there is a B + tree index on age, and the retrieved records are sorted on DNO. Then the query result is displayed. If almost all employees are over 10 years old, this is not an execution plan. If this index is not clustered, this plan may be the worst.

Suppose whether the index on DNO meets our goal: to use this index to retrieve all records and group them according to the value of DNO, then, the number of records with an age greater than 10 is calculated for each DNO value (this policy only requires records to be grouped and records do not need to be sorted ). Therefore, the validity of this method depends on whether the index is clustered. If it is not clustered, it is possible to perform an I/O for each record. The cost of this solution is amazing.

Conclusion: if the conditions on age are not very selective (that is, the number of records meeting the conditions is more than 80%), a clustered index should be created on DNO. If conditions on age are very selective, you should consider creating indexes on age (not clustered) to quickly narrow the search scope.

Clustering is also important for building indexes on search codes that do not contain candidate codes.

Select E. DNO

From employees e

Where E. holobby = 'stamps'

If many people collect stamps, It is very inefficient to retrieve records through a non-clustered index. Even if no index is available (simply scanning all records may be cheaper), you should consider setting up a clustered index on Hoby.

Let's take a look at how clustering affects the selection of indexes:

Select E. DNO, count (*)

From employees e

Group by E. DNO

The most direct query scheme is to sort employees by DNO, and then calculate the number of employee records for each DNO value. However, if there is a DNO index, you can only scan the index without scanning the record to answer the query.

B) Example of composite code design

A compound search code supports a wider range of queries because it can match more conditions.

For the following query, 20 <age <30 and 3000 <Sal <5000 are returned.

Select E. Eid

From employees e

Where E. Age between 20 and 30

And E. Sal between 3000 and 5000

If the where condition is good, the compound codes in <age, SAL> or <Sal, age> may be helpful for the query. Obviously, this index is a B + tree index instead of a hash index, because the hash index cannot answer the range query.

Then the order of the search code is sometimes quite different:

Select E. Eid

From employees e

Where E. Age = 25

And E. Sal between 3000 and 5000

B + tree indexes of <age, SAL> have good performance, because records are sorted by age and then by Sal. In <Sal, age>, the performance is not so good. Even if the two have the same age value, the return is far away.

Composite indexes are also useful when processing clustered queries:

Select AVG (E. Sal)

From employees e

Where E. Age = 25

And E. Sal between 3000 and 5000

In the <age, SAL> or <Sal, age> index, you can reply to the query only by scanning the index.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

Database Principle 2 (storage and indexing)

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support

Database Principle 2 (storage and indexing)

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

Trending Topic

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support