Tags: simple orange Returns a pointer to specify the improved picture basicIntroduction
In this article, there is a query operation for a 41G size database that contains millions records, and if you take advantage of the index, you can reduce the operation time from 37s to 0.2s.
So what is an index? What is the rationale for using indexes to speed up database query operations?
Database provides a persistent way of data storage, querying the database from the database is a basic operation, the efficiency of the query operation is very important.
For query operations, the efficiency of query operations can be greatly improved if the data being queried is organized in some way.
In a database, there are many columns for a record. If these records are organized in a data structure according to column Col1, then the column Col2 must be out of order.
Therefore, in addition to the original data, the database maintains a data structure that satisfies a particular lookup algorithm, pointing to the original data, called an index .
For example, in the following figure, the database has two columns Col1, Col2. When stored, the rows are organized by column Col1, such as Col1, which is organized in binary tree mode. If you look for a value in col1, you can use a binary tree for binary lookups, and you do not need to traverse the entire database.
So the column Col2 is disorderly. To solve this problem, an index was established for Col2, which organizes the Col2 according to a data structure (here is a two-fork tree). Look for column Col2 this way you only need to do a two-point lookup.
Because the database is stored on disk, the data structures that implement the reference are stored on disk. Disk IO is an issue to be aware of.
B-Tree is a variant of the two-fork tree, the main difference is that each node can be more than 2 degrees, that is, each node can be divided into a lot of forks, greatly reducing the depth of the tree.
These features greatly reduce the depth of the B + tree and enable an orderly organization of the data.
B + Tree
The B + tree is an extension of the tree, characterized in that non-leaf nodes do not store data and store only key. If the size of each node is fixed (such as 4k, as in SQLite), you can further increase the degree of internal nodes and reduce the depth of the tree.
B + tree with sequential access pointers
A little change in the B + tree, each leaf node adds a pointer to the adjacent leaf node, which can improve the performance of the interval access.
, access key from 15 to 30 of data.
Find data by index in two steps
A full table scan was avoided with two B + tree lookups.
1. 对某一行或某几行添加PRIMARY KEY或UNIQUE约束，那么数据库会自动为这些列创建索引2. 指定某一列为INTEGER PRIMARY KEY，那么这一列和rowid被指定为同一列。即可以通过rowid来获取，也可以通过列名来获取。
The following is a statistical information for a table in a database, which is obtained through the Sqlite3_analyzer tool.
You can see a total of 3,651 records in the table, the depth of the B-tree is only 2, there are 33 leaf nodes, 1 non-leaf nodes. Therefore, a maximum of 2 disk IO is required to find a row of data based on the ROWID.
For example, we have such a watch.
The query statements are as follows
SELECT price FROM fruitsforsale WHERE fruit=‘Peach’
Because there is no index, you have to do a full table scan. Iterate through each record (record) with sequential access pointers, compare whether the fruit column and ' peatch ' are consistent, and if so, return the value of the price column for that row.
Index the ' Fruit ' column
As below, run the same statement, you can find the target column corresponding to the ROWID of 4 according to the index, and then find the corresponding row according to rowID, so as to select the price. All-table lookups are avoided by two B + tree lookups. This is the simplest case.
Multiple index Hits
Indexing is not required to be uique, that is, the key in the index table can be the same.
For example, there are two records in the Index table, and
orange when the first record is found, the next index can be easily found by accessing the pointer sequentially, avoiding another B + tree lookup. (Rowid=1 and rowid=23 may be located in two different leaf nodes)
That is, the process of finding the index can be done with a B + tree check and a next operation, and the next operation is fast.
Accelerate search and sequencing with indexes
In most cases, we need to find and sort operations at the same time, and if an appropriate index is established, the search efficiency can be improved.
For example, the fruit and state two columns are indexed in the following table, and when you run the following SQL statement, you do not need to sort operations because the index table is sequential.
SELECT price FROM fruitforsale WHERE fruit=‘Orange‘ ORDER BY state
There is a command called in SQLite to
explain query plan see how SQLite performs the find operation. The following database statement is not a query statement in the introduction, the same principle
37s operation (no index)
0.2s operation (indexed)
Note the detail column. Without indexing, the word "scan" is used, which is the full table scan. When using the index, the word "SEARCH" is used.
For a 41G table, the cost of a full table scan is obviously significant.
The principles of SQLite indexing