Database indexes and the use of SQLite Indexes

Source: Internet
Author: User

I haven't written a blog for a long time, and I don't have time. I always feel that my accumulation is lacking. I opened an independent blog foocoder.com and built it on GitHub with octopress. In the future, you may only update your blog here. (Csdn, cnblog, and 51cto are very tired ...)

To use indexes to optimize database data operations, you must clarify several issues:

1. What is an index?
2. indexing principles
3. Advantages and Disadvantages of Indexes
4. When and how to use Indexes
This article explores the role of indexes in database operations.
1. database index Overview

Recall the steps for dictionary searching when I was a child. The indexing and dictionary directory concepts are consistent. The dictionary Directory allows us to find the number of pages of the required content without having to flip the entire dictionary, and then go to that page. The same is true for indexes. indexes are used to sort records by multiple fields. Creating an index for a field in the table creates another data structure, which stores the value of the field. Each value also includes a pointer to the record related to it. In this way, you do not need to query the entire database, which naturally improves the query efficiency. At the same time, the data structure of the index is sorted, so you can perform binary search for it, which is faster.


2. B-tree and Index
Most databases use B-tree or B + tree as storage structures. B-tree indexes are also the most common indexes. First, we will briefly introduce B-tree to enhance the understanding of indexes.
B-tree is a multi-cross balancing tree designed for disks. The true and most accurate definition of B-tree is: A Tree Containing T (t> = 2) balanced Multi-path search tree with multiple keywords. A m-Level B tree meets the following conditions:
1) each node has up to m children;
2) Apart from the root node and leaf node, each other node has at least m/2 children;
3) There are at least two children (unless the tree contains only one node );
4) All leaf nodes are on the same layer. Leaf nodes do not contain any keyword information and can be considered as external nodes;

5) Non-leaf nodes with K keywords exactly contain k + 1 child;


Each node in the B-tree can contain a large amount of keyword information and branches based on the actual situation (of course, it cannot exceed the size of the disk block, depending on the disk drive, generally, the block size is 1 kb ~ 4 K or so); this reduces the depth of the tree, which means to find an element, as long as few nodes read from the external disk into the memory, quickly access the data to be searched. B-the operation time on the tree is usually composed of the disk access time and CPU computing time. Compared with the disk I/O speed, the CPU computing time is negligible, so the meaning of the B-tree is displayed, and the depth of the tree is reduced, which determines the number of I/O reads and writes.
B-tree index is a typical tree structure. Its components are mainly:
1) leaf node: contains entries that direct to the data rows in the table.
2) branch node: the entries in the table point to other branch nodes or leaf nodes in the index.
3) root node: a B-tree index has only one root node, which is actually the branch node at the top of the tree.

As shown in:




 
Each index contains two parts: the value of the index and the pointer to the data page or another index. Each node is an index page that contains multiple indexes.
When you create an index for an empty table, the database allocates an empty index page, which indicates the root node. Before you insert data, the index page is empty. Whenever you insert data, the database creates index entries on the root node ,. When the root node is fully inserted and data is inserted, the root node is split. For example, the data inserted by the root node. (If there are more than four nodes, the system splits the data into two nodes, moves G to the new root node, and places H and N in the New Right child node. :

The root node is filled with 4 nodes

Insert h to split.
The general splitting steps are as follows:
1) create two son nodes
2) data in the original node is roughly divided into two halves and written into two new child nodes.
3) place a pointer to the page node in the heel Node

When you constantly insert data into the table, the pointer to the leaf node in the root node is also fully inserted. When the leaves still need to be split, the root node has no space to create a pointer to the new leaf node. Then the database creates a branch node. With the split of leaf nodes, pointers in the root node point to these branch nodes. As data continues to be inserted, the index will add more branch nodes so that the tree structure becomes such a multi-level structure.


3. Index types

1) clustered index: the physical order of the row in the table is the same as the logic (INDEX) Order of the key value. Because there is only one physical order of data, one table can have only one clustered index. If a table does not have clustered indexes, this table does not have the concept of order, and all new rows will be inserted to the end of the table. For clustered indexes, leaf nodes store data rows without separate data pages. For example, when I was a child, I never looked up the dictionary. In my opinion, the dictionary itself is a directory. For example, if I look up the dictionary, I just need to flip it to the beginning of the P letter and then find e in order. In this way, I can quickly find the word the teacher said and get the praise of the teacher.


2) Non-clustered index: the physical order of the row in the table is irrelevant to the index order. For non-clustered indexes, leaf nodes store index Field Values and pointers to corresponding data pages. The leaf node is next to the data and corresponds to each row of the data page. Sometimes, I don't know what to read in the dictionary, so I have to look for it through the "beginning" of the dictionary directory. At this time, I will find that the sorting in the directory is different from the sorting of the actual body, which is very annoying to me, because I cannot be faster than others, I need to find the word in the directory first, and then find the word in the body based on the page number.


4. query, insert, and delete indexes and Data

1) query. The query operation is the same as the dictionary query operation. When we look for a specified record, the database first looks for the root node, compares the data to be queried with the data of the root node, and then queries the next record through the pointer of the root node, until this record is found. This is a simple binary search process of the Balance Tree. I will not repeat it here. In the clustered index, the data row is found when the page node is found. In the non-clustered index, we still need to read the data page.


2) insert. The insert operation for clustered indexes is complicated. In the simplest case, the insert operation will locate the data page, and then free up space for new data and perform the insert operation. If the data page has no space, you need to split the data page, which is a very resource-consuming operation. For tables with only non-clustered indexes, you only need to insert them at the end of the table. If a clustered index is also included, the insert operation required for the clustered index is also performed.


3) Delete. The data at the bottom of the deleted row is moved up to fill the gap. If the deleted data is the last row of the data page, the data page will be recycled and the pointer on the first and second pages will be changed, the recycled data page will also be reused under specific circumstances. At the same time, for clustered indexes, if there is only one record left on the index page, the record may be moved to the adjacent index table, and the original index page will be recycled. This is not possible for non-clustered indexes, which may result in a situation where only a small amount of data is available on multiple data pages.


5. Advantages and Disadvantages of Indexes
As a matter of fact, the advantages and disadvantages of indexes are clear at a glance.
Advantages:
1) greatly speeding up data retrieval, which is also the main reason for index creation
2) Accelerate the connection between tables, especially in terms of data reference integrity.

3) when using grouping and sorting clauses for data retrieval, it can also significantly reduce the time for grouping and sorting in queries.


Disadvantages:
1) It takes some time to create an index, but it is not a problem. Generally, you only need to build an index once.
2) indexes need to occupy physical space, especially clustered indexes, which requires large space.

3) when adding, deleting, and modifying table data, indexes must be maintained dynamically, reducing the Data Maintenance speed. This is a big problem.


6. Use of Indexes
According to the above analysis, we have a general idea about when to use indexes (if you don't have one, read it again ...). Generally, we need to create an index on these columns:
1) there is no doubt about columns that often need to be searched;
2) multiple columns are frequently queried at the same time, and each column contains duplicate values. You can create a composite index, composite indexes should be used as much as possible to form index overwrite (all the fields contained in the query are included in an index, so we only need to search the index page to complete the query ). At the same time, the leading column of the composite index must be the most frequently used column. For the problem of leading columns, we will discuss the issue later in the usage of SQLite indexes.
3) These columns are often used in connection columns. These columns are mainly foreign keys, which can speed up the connection. The connection conditions should fully consider the tables with indexes .;

4) Create an index on the columns that frequently need to search for a range. Because the index has been sorted, the specified range is continuous. Similarly, it is best to create an index on the columns that frequently need to be sorted.

6) Create an index on the columns that are often placed in the WHERE clause to speed up condition judgment. Note that any operations (such as calculation expressions and functions) on a column in the WHERE clause must be performed on the entire table without using the column index. Therefore, when querying, try to move the operation to the right of the equal sign.


We should not create indexes for the following columns:
1) columns rarely used in queries
2) columns with few non-duplicate data values, such as only 0, 1. Scanning the entire table is usually more effective.
3) for data defined as text, the image data should not be indexed. These fields are not fixed in length. They may be long or empty.
Of course, no index is created when the update operation is much larger than the query operation. You can also consider dropping indexes before large-scale update operations and re-creating indexes. However, you need to consider the resource consumption caused by index creation. In short, using indexes requires balancing input and output to find the best output point.


7. Using indexes in SQLite


1) SQLite does not support clustered indexes. Android requires a _ id field by default, which ensures that the data you insert will be inserted in the integer order of "_ id, the primary key of the integer type will assume the same role as the clustered index. Therefore, do not create an index on the primary key declared as: integer primary key.


2) many friends who are not familiar with the index have created an index in the table but have found that the index does not take effect. In fact, this is mostly related to what I will talk about next. For columns in the WHERE clause to take effect of the index, there are some restrictions, which are related to the leading column. The so-called leading column is the first or multiple consecutive columns in the composite index statement. For example, create index comp_ind on Table1 (X,
Y, z) Create an index. X, XY, and XYZ are all leading columns, but YZ, y, and z are not. As mentioned below, there may be some minor differences between other databases. Here we use SQLite as the standard. In the WHERE clause, the leading column must use the equals or in operation, and the rightmost column can use the inequality, so that the index can take effect completely. At the same time, you do not need to create an index for all the columns in the WHERE clause, but you must ensure that there is no gap between the columns that create the index. Let's take a few examples:


Use the following statement to create an index:
Create index idx_ex1 on ex1 (A, B, C, D, E,..., y, z );
Here is a query statement:
... Where a = 5 and B in (1, 2, 3) and C is null and D = 'hello'
This is obviously effective for the four columns of ABCD, because only the operations equal to and in are performed, and they are the leading columns.
Let's look at another query statement:
... Where a = 5 and B in (1, 2, 3) and C> 12 and D = 'hello'
Here, only the indexes of column A, column B, and column C are valid. The index of column D is invalid because it is on the right of column C, and column C uses an inequality, according to the limitations of the use of inequality, column C already belongs to the rightmost.
Finally, let's look at another article:
... Where B in (1, 2, 3) and C not null and D = 'hello'

The index will not be used, because the leading column is not used, this query will be a full table query.

In fact, in addition to indexes, there are many factors that affect query performance, such as table connection and sorting. More factors need to be considered to affect the overall performance of database operations. It is a great learning to use more appropriate techniques.


Finally, use SQLite on Android to write a simple example to see the impact of indexes on database operations.
Create the following table and index:
Db.exe csql ("create table if not exists T1 (A, B )");
Db.exe csql ("create index if not exists IA on T1 (A, B )");
Insert 0.1 million data records and perform the following operations on the table:
Select * from T1 where a = '20140901'
Insert: insert into T1 (a, B) values ('20170901', 'name1. 123 ')
Update: Update T1 Set B = 'name1. 999999 'Where a = '000000'

Delete: delete from T1 where a = '20140901'


The data is as follows (the average value is obtained for 5 different operations ):
The Operation has no index and has an index.
Query 170 Ms 5 ms
Insert 65 Ms 75 Ms
Update 240 Ms 52 Ms
Delete 234 Ms 78 Ms


It can be seen that the query speed is significantly increased, the insertion speed is slightly reduced, and the data update and deletion speeds are slightly improved. If you replace the column in The WHERE clause in update and delete with B, the speed is the same as that without an index because the index is invalid. Therefore, indexes can greatly increase the query speed. For Delete and update operations, if the columns in the WHERE clause use an index, even if the index needs to be re-built, it may be faster than not using the index. For insert and insert operations, indexing is obviously a burden. At the same time, the index doubles the size of the database.


The rawqurey method in Android returns a cursor after the SQL statement is executed. In fact, a query operation is not completed. I calculate the query time before and after rawquery, always 1 Ms... this makes me very depressed. After reading the source code, when the cursor calls the movetonext mobile cursor methods, the getcount method will be called first, And the getcount method will call the native method to call the real query operation. This design is obviously more rational.

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.