Database Principle 2 (storage and indexing)

Source: Internet
Author: User

I. Basic Concepts

On a disk, the unit of storage information is page, and the page size is a DBMS parameter. The typical value is 4 kb or 8 KB.

Clustered index: the sequence of data records is the same or similar to that of data items in a certain index. Otherwise, it is a non-clustered index.

Why can indexes improve performance: the search range can be quickly reduced !!!

Index data structure:

A) hash-based index:

Files are stored in different buckets. One bucket contains one or more pages. The bucket in which a record belongs can be determined by a special function used to search for codes. This function becomes a hash function. A bucket number is provided. A hash-based index structure allows us to retrieve the bucket's home page once or twice.

B) tree-based indexes:

Data items are arranged according to the search code value, and a hierarchical search data structure is maintained to direct the search to the page to which the data item belongs.

Non-clustered hash indexes and tree indexes provide high-speed insertion and deletion performance. However, queries with a large number of matching results are slow. Hash indexes are faster in equivalent search, but they do not support selection of range search conditions.

Ii. index case:

A) design example of clustering index:

Select E. Do

From employees e

Where E. age> 40

Analyze the use of B + tree indexes in age. Whether such indexes are worth building depends on the selection of conditions (popularity ). In fact, if everyone is over 40 years old, this index advantage will be very small. If only 10% of employees are over 40 years old, is this index valid? The answer depends on whether the index is clustered. If the index is not clustered, each qualified employee will spend one I/O. For clustering, you only need to scan 10% of I/O.

Select E. DNO, count (*)

From employees e

Where E. age> 10

Group by E. DNO

Assume that there is a B + tree index on age, and the retrieved records are sorted on DNO. Then the query result is displayed. If almost all employees are over 10 years old, this is not an execution plan. If this index is not clustered, this plan may be the worst.

Suppose whether the index on DNO meets our goal: to use this index to retrieve all records and group them according to the value of DNO, then, the number of records with an age greater than 10 is calculated for each DNO value (this policy only requires records to be grouped and records do not need to be sorted ). Therefore, the validity of this method depends on whether the index is clustered. If it is not clustered, it is possible to perform an I/O for each record. The cost of this solution is amazing.

Conclusion: if the conditions on age are not very selective (that is, the number of records meeting the conditions is more than 80%), a clustered index should be created on DNO. If conditions on age are very selective, you should consider creating indexes on age (not clustered) to quickly narrow the search scope.

Clustering is also important for building indexes on search codes that do not contain candidate codes.

Select E. DNO

From employees e

Where E. holobby = 'stamps'

If many people collect stamps, It is very inefficient to retrieve records through a non-clustered index. Even if no index is available (simply scanning all records may be cheaper), you should consider setting up a clustered index on Hoby.

Let's take a look at how clustering affects the selection of indexes:

Select E. DNO, count (*)

From employees e

Group by E. DNO

The most direct query scheme is to sort employees by DNO, and then calculate the number of employee records for each DNO value. However, if there is a DNO index, you can only scan the index without scanning the record to answer the query.

B) Example of composite code design

A compound search code supports a wider range of queries because it can match more conditions.

For the following query, 20 <age <30 and 3000 <Sal <5000 are returned.

Select E. Eid

From employees e

Where E. Age between 20 and 30

And E. Sal between 3000 and 5000

If the where condition is good, the compound codes in <age, SAL> or <Sal, age> may be helpful for the query. Obviously, this index is a B + tree index instead of a hash index, because the hash index cannot answer the range query.

Then the order of the search code is sometimes quite different:

Select E. Eid

From employees e

Where E. Age = 25

And E. Sal between 3000 and 5000

B + tree indexes of <age, SAL> have good performance, because records are sorted by age and then by Sal. In <Sal, age>, the performance is not so good. Even if the two have the same age value, the return is far away.

Composite indexes are also useful when processing clustered queries:

Select AVG (E. Sal)

From employees e

Where E. Age = 25

And E. Sal between 3000 and 5000

In the <age, SAL> or <Sal, age> index, you can reply to the query only by scanning the index.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.