Database index learning and database index

Source: Internet
Author: User

Database index learning and database index
I. Index Introduction

Database index optimization is a technical activity and cannot rely solely on theory. Because the actual situation is ever-changing, and various databases have complicated mechanisms, such as query optimization policies and implementation of various engines. At the same time, these theories are the basis of index optimization. Only by understanding the theory can we reasonably infer the optimization strategy and understand the mechanisms behind it, then, we will continue to experiment and explore in practice.

In fact, the index is in addition to the database's data storage, it also maintains another data structure that meets a certain query algorithm. These data structures then point to the real storage data in some way, then, you can quickly query the existing data structure. (I found a very good Binary Tree Graph to understand the index. Currently, most of the indexes are B + TREE STRUCTURES AND basically do not use binary trees ).



Ii. database operation time is occupied by I/O operations

The index itself is also very large, and it is impossible to store all the indexes in the memory. It is often stored on the disk in the form of index files. Therefore, disk I/O consumption is required during index search. Compared with memory access, the I/O access consumption is much higher, therefore, the most important indicator for evaluating a data structure as an index is the complexity of the number of disk I/O operations during the search process. In other words, the structure of the index should minimize the number of disk I/O accesses during the search process. The B + tree structure meets the requirements of reducing physical I/O operations in actual applications, so it is no longer suitable for the index structure (as described below ).


Iii. Random IO and sequential IO

Database queries consume I/O most of the time, and memory I/O is much faster than physical I/O, therefore, the number of physical I/O operations in a query determines the final time of the query. At the same time, the physical I/O types are divided into sequential, random I/O, sequential, and random. It refers to the initial sector address given by this IO and the end sector address of the last IO, whether it is completely continuous or not separated. If yes, this IO should be regarded as a continuous IO. If the difference is too large, it is considered a random IO. Continuous IO, because the initial sector is very close to the last ending sector, the head does not need to change lanes or the transition time is very short. If the difference is too large, the head needs a long transition time, if there are a lot of random I/O, the head will not stop changing, and the efficiency will be greatly reduced.

Random Access featuresThe data of each IO request has a long time span on the disk.(For example, distributed in different sectors), so N very small IO requests can obtain the corresponding data only after n io requests.

Sequential access features the opposite of random access,The requested data is continuous on the disk.. When the system initiates N very small IO requests (for example, 1 K), because one IO has a price, the system will take a complete piece of data (such as 4 K and 8 K), so when the first IO is completed, the data for subsequent IO requests may already exist. This reduces the number of IO requests.

Random Access featuresThe data of each IO request has a long time span on the disk.(For example, distributed in different sectors), so N very small IO requests can obtain the corresponding data only after n io requests.

Sequential access features the opposite of random access,The requested data is continuous on the disk.. When the system initiates N very small IO requests (for example, 1 K), because one IO has a price, the system will take a complete piece of data (such as 4 K and 8 K), so when the first IO is completed, the data for subsequent IO requests may already exist. This reduces the number of IO requests.


IV. B + TREE STRUCTURES commonly used in file systems and database systems

When I understood the query in the database at school, I thought it was a general sequential query. Now I can't think about this algorithm with the complexity of O (n) when the data volume is large, so many awesome query algorithms, such as binary search and binary tree search, have never been well considered in practical applications. I found some materials comparing the B-Tree and B + Tree STRUCTURES (which is a good interpretation of the differences between the two structures ).


1. Main differences between B-Tree and B + Tree:
The non-leaf node of the B + tree only contains the navigation information and does not contain the actual value. All the leaf nodes are connected to the connected nodes using the linked list, which facilitates searching and traversing the intervals.


2. Advantages of B + tree:

  • The B + tree stores only keys instead of data on non-leaf nodes. Therefore, more keys can be stored when loaded into the memory;
  • Non-leaf nodes are especially small. When the output is large, there are few non-leaf nodes, and they can be fully loaded into the memory to reduce IO times;
  • Data key values are stored more closely and have better spatial locality. Therefore, accessing the data associated with the leaves also has a better cache hit rate.
  • The leaf nodes of the B + tree are all chained. Therefore, the convenience of the entire tree only needs to traverse the leaf nodes linearly at a time. In addition, the data is sequentially arranged and connected, so it facilitates range search and can be loaded continuously;
  • Tree B requires recursive traversal of each layer. Adjacent elements may not be adjacent in the memory, so the cache hit is not as good as the B + tree;
  • Tree B also has advantages. Its advantage is that, because every node of Tree B contains keys and values, frequently accessed elements may be closer to the root node, so access is faster;



V. application scenarios of composite indexes

Regular indexes include unique indexes, primary key indexes, and clustered indexes. Sometimes you can create a composite index to improve mysql efficiency and follow the "leftmost prefix" principle. Leftmost Prefix: As the name implies, it is the leftmost prefix. For example, for the table title (id, no, name, age), we have created a multi-column index (no, name, age) composed of three fields, which is equivalent to creating (no) single-Column index, (name, fname) combination index, and (no, name, age) combination index.

I will not talk much about other things. This time I will mainly learn composite indexes, the scenario where the "leftmost prefix" of the composite index is listed below (taking the title of the above table as an example, check whether the execution plan is indexed ):

1. Full-column matching of indexes is supported.

  • When the query condition where no = n and name = B and age = 20 or where no = n and name = B or where no = n, you can enter the index;
  • The SQL query optimizer automatically adjusts the conditional order of the where clause to use the appropriate index. Therefore, the following adjustments do not affect the results, for example: where age = 20 and name = B and no = n can also go to the index;
  • Using the "in" syntax in the condition can also go to the index;


2. The leftmost prefix. A condition in the middle is not provided.

For example, if where no = n and age = 20, the name of the middle column of the index is missing, only the first column no of the index is used. The age is also in the index, the name column cannot be connected to the left prefix because it is missing;


3. the first column of the index is not specified in the query condition.

If no such as: where name = 'jeck' and age = 20 in the first column of the index does not match the leftmost prefix, the index cannot be used.


4.% fuzzy match syntax is used in index fields

If % fuzzy match is used in a column, it depends on the position of % to determine whether the index can be reached. % can be used at the end of the string, for example:
Where no = 1 and name like 'luc % '; however, if % is in front, it will not work, for example, where no = 1 and name like' % cy ';


5. Search by range

If the Range Query of <,> is used in SQL, and only one range column can be indexed, otherwise the index cannot be obtained;
For example, where no <20 and name = 'Lucy ';


6. the query results contain function expressions.

Function expressions cannot be indexed;


Vi. Notes for Indexing

1. The results of filtering a column as a condition are still very large.

For the "gender" attribute, there are only two types of databases: Male and female. At this time, there is no need to use indexes, but the query results will be worse. If the index is not used, it will be a full-Table sequential query (a random IO). However, at this time, the index is taken and a large part of the records of a table are returned, there will be a lot of random IO, And the IO of the operation will be added virtually.

2. Order of combined index fields

Therefore, when using a composite index, the order of the index fields has little impact on the index structure, but it will affect the final sorting. In the end, it will affect whether partial fields will be indexed when used as the query condition.

3. Which scenarios are not suitable for indexing?

For tables with fewer records, increasing indexes will not lead to speed optimization, but will waste storage space, because indexes require storage space, in addition, each execution of update, insert, or delete requires re-calculation of the index of the field;

4. Conditions for Indexing

The index should be as few as possible, the index should be as short as possible, the order of index fields should be reasonably organized, the index Field update frequency should be as few as possible, the index should be a common query field and a small number of records can be quickly located.


For more learning, refer:

Http://www.ituring.com.cn/article/986http://www.cnblogs.com/yangecnu/archive/2014/03/29/Introduce-B-Tree-and-B-Plus-Tree.htmlhttp://blog.codinglabs.org/articles/theory-of-mysql-index.htmlhttp://www.blogjava.net/happyenjoylife/archive/2011/12/17/366639.html

Copyright Disclaimer: This article is an original article by the blogger and cannot be reproduced without the permission of the blogger.

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.