Key features of SQLite query optimization

Source: Internet
Author: User
Tags: sqlite, sqlite query

SQLite is lightweight, and its compiled size is very small. One reason is that its query optimization is relatively simple: it relies only on the index mechanism. After analyzing SQLite's query optimization and studying the source code, I have summarized it as follows:

I. Factors that affect query performance

1. The number of rows in the table: the fewer, the better.

2. Whether the results must be sorted.

3. Whether an index is available.

4. The form of the query statement.

II. Query transformations used for optimization

1. For a single column of a single table, if all the terms have the form t.c = expr and are joined by the OR operator, for example: x = expr1 OR expr2 = x OR x = expr3, SQLite cannot use an index on them directly because of the OR. The terms can therefore be converted into a single term with the IN operator: x IN (expr1, expr2, expr3), which can be optimized with an index, and the effect is significant. If there is no index, however, the OR form executes slightly faster than the IN form.

2. If the operator of a term is BETWEEN, SQLite cannot optimize it with an index directly, so an equivalent conversion is also performed. For example, a BETWEEN b AND c is converted to: (a BETWEEN b AND c) AND (a >= b) AND (a <= c). Here the terms (a >= b) and (a <= c) are marked as dynamic and are children of the term (a BETWEEN b AND c). If the BETWEEN term is coded, the child terms are ignored; if an available index already satisfies the child terms, the parent term is ignored.

3. If the operator of a term is LIKE, the following conversion is made: x LIKE 'abc%' is converted to x >= 'abc' AND x < 'abd'. Because LIKE itself cannot be optimized with an index in SQLite, when an index exists the converted and unconverted forms differ greatly in speed, since the index has no effect on the LIKE form. Even without an index, LIKE is still less efficient than the converted form.
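As a rough illustration of the three rewrites above, the following Python sketch uses the standard sqlite3 module to check that each original form and its converted form return the same rows. The table t, column x, index idx_t_x, and helper rows() are made up for this example. Two assumptions apply: the data is all lowercase (LIKE is case-insensitive for ASCII by default, while >= and < are not), and the exact plan text printed by EXPLAIN QUERY PLAN varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x TEXT)")          # hypothetical table
conn.execute("CREATE INDEX idx_t_x ON t (x)")    # hypothetical index
conn.executemany(
    "INSERT INTO t VALUES (?)",
    [("abc",), ("abcd",), ("abd",), ("b",), ("xyz",), ("zzz",)],
)

def rows(sql):
    """Run a query and return its rows as a sorted list."""
    return sorted(conn.execute(sql).fetchall())

# 1. OR terms on one column versus a single IN term
or_rows = rows("SELECT x FROM t WHERE x = 'abc' OR 'xyz' = x OR x = 'b'")
in_rows = rows("SELECT x FROM t WHERE x IN ('abc', 'xyz', 'b')")

# 2. BETWEEN versus an explicit pair of range terms
between_rows = rows("SELECT x FROM t WHERE x BETWEEN 'abc' AND 'b'")
range_rows = rows("SELECT x FROM t WHERE x >= 'abc' AND x <= 'b'")

# 3. Prefix LIKE versus a half-open range (lowercase data only)
like_rows = rows("SELECT x FROM t WHERE x LIKE 'abc%'")
prefix_rows = rows("SELECT x FROM t WHERE x >= 'abc' AND x < 'abd'")

# The last column of each EXPLAIN QUERY PLAN row is the readable plan step
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT x FROM t WHERE x >= 'abc' AND x < 'abd'"
).fetchall()
plan_text = " ".join(str(step[-1]) for step in plan)
print(plan_text)
```

Comparing row sets only shows that the forms are equivalent on this data; timing differences depend on table size and on whether the index exists.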

III. Processing of compound queries
1. Query form: <selectA> <operator> <selectB> ORDER BY <orderbylist>
Execution method: <operator> is one of UNION ALL, UNION, EXCEPT, or INTERSECT. Such a statement is executed by first running and sorting selectA and selectB, and then scanning the two results together. The scan differs for each of the four operations, and the execution process is divided into seven subroutines:

OutA: Put a row of the result of selectA into the final result set

OutB: Put a row of the result of selectB into the final result set (only for UNION and UNION ALL; the other operations never put a row from selectB into the final result set)

AltB: Called when the current row of selectA is less than the current row of selectB

AeqB: Called when the current row of selectA equals the current row of selectB

AgtB: Called when the current row of selectA is greater than the current row of selectB

EofA: Called when the result of selectA has been fully traversed

EofB: Called when the result of selectB has been fully traversed

The following table shows how each of the four operations is executed (NextA and NextB advance selectA and selectB to their next row):

Execution order   UNION ALL      UNION          EXCEPT         INTERSECT
AltB:             OutA, NextA    OutA, NextA    OutA, NextA    NextA
AeqB:             OutA, NextA    NextA          NextA          OutA, NextA
AgtB:             OutB, NextB    OutB, NextB    NextB          NextB
EofA:             OutB, NextB    OutB, NextB    Halt           Halt
EofB:             OutA, NextA    OutA, NextA    OutA, NextA    Halt
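The dispatch rules listed above can be read as a small state machine. The Python sketch below is one way to express them and is not SQLite's actual code: ACTIONS mirrors the rules, and compound() is a hypothetical helper that merges two lists of comparable values. One assumption goes beyond the rules as listed: for every operator except UNION ALL, an output row equal to the previously output row is dropped, which is what makes UNION, EXCEPT, and INTERSECT return distinct rows.

```python
OPS = ("UNION ALL", "UNION", "EXCEPT", "INTERSECT")

# One row per condition, one column per operator (same order as OPS).
# "outA"/"outB" emit the current row and advance; "nextA"/"nextB" only advance.
ACTIONS = {
    "AltB": ("outA", "outA", "outA", "nextA"),
    "AeqB": ("outA", "nextA", "nextA", "outA"),
    "AgtB": ("outB", "outB", "nextB", "nextB"),
    "EofA": ("outB", "outB", "halt", "halt"),
    "EofB": ("outA", "outA", "outA", "halt"),
}

def compound(op, a_rows, b_rows):
    """Merge two row lists according to the dispatch rules for `op`."""
    col = OPS.index(op)
    a, b = sorted(a_rows), sorted(b_rows)   # both inputs are sorted first
    out = []
    i = j = 0

    def emit(row):
        # Assumption: all operators except UNION ALL suppress duplicates
        if op == "UNION ALL" or not out or out[-1] != row:
            out.append(row)

    while i < len(a) or j < len(b):
        if i >= len(a):
            state = "EofA"
        elif j >= len(b):
            state = "EofB"
        elif a[i] < b[j]:
            state = "AltB"
        elif a[i] == b[j]:
            state = "AeqB"
        else:
            state = "AgtB"

        action = ACTIONS[state][col]
        if action == "halt":
            break
        if action == "outA":
            emit(a[i])
        elif action == "outB":
            emit(b[j])
        # Every non-halt action advances exactly one of the two inputs
        if action.endswith("A"):
            i += 1
        else:
            j += 1
    return out

print(compound("UNION", [1, 2, 2], [2, 3]))
```

Because both inputs are sorted before the scan, the merged output is also in sorted order, which is why this approach suits compound queries that carry an ORDER BY.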

2. Where possible, a query that uses DISTINCT is converted into a GROUP BY query, because GROUP BY can sometimes use an index, whereas DISTINCT cannot.
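A quick way to see that DISTINCT and GROUP BY can express the same query is to run both forms and compare the rows. This is only a sketch: the table orders, column customer, and index idx_orders_customer are invented for the example, and it checks equivalence of results, not which form SQLite executes faster.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT)")                    # hypothetical table
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")  # hypothetical index
conn.executemany(
    "INSERT INTO orders VALUES (?)",
    [("bob",), ("alice",), ("bob",), ("carol",), ("alice",)],
)

# The same question asked in two ways
distinct_rows = conn.execute(
    "SELECT DISTINCT customer FROM orders ORDER BY customer"
).fetchall()
group_rows = conn.execute(
    "SELECT customer FROM orders GROUP BY customer ORDER BY customer"
).fetchall()

print(distinct_rows == group_rows)
```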

IV. Subquery flattening

Example: SELECT a FROM (SELECT x+y AS a FROM t1 WHERE z<100) WHERE a>5

By default, this SQL statement is executed by running the inner query, storing its result in a temporary table, and then running the outer query against that table. The data is therefore processed twice, and because the temporary table has no index, the outer query cannot be optimized. If the statement is flattened, it becomes: SELECT x+y AS a FROM t1 WHERE z<100 AND a>5. The result is clearly the same as before, but now the data only needs to be queried once, and if table t1 has an index, a scan of the entire table can be avoided.
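The equivalence of the nested and flattened forms can be checked with a few rows of sample data. The sketch below assumes a table t1 with integer columns x, y, and z and a hypothetical index idx_t1_z; the sample values are arbitrary. The flattened query refers to the alias a in its WHERE clause, which SQLite accepts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (x INTEGER, y INTEGER, z INTEGER)")
conn.execute("CREATE INDEX idx_t1_z ON t1 (z)")   # hypothetical index on z
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?)",
    [(1, 1, 10), (3, 4, 50), (5, 5, 99), (9, 9, 100), (2, 2, 500)],
)

nested = "SELECT a FROM (SELECT x+y AS a FROM t1 WHERE z<100) WHERE a>5"
flattened = "SELECT x+y AS a FROM t1 WHERE z<100 AND a>5"

nested_rows = sorted(conn.execute(nested).fetchall())
flat_rows = sorted(conn.execute(flattened).fetchall())

print(nested_rows, flat_rows)
```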

Conditions under which a query can be optimized by flattening:

1. The subquery and the outer query do not both use aggregate functions.

2. The subquery does not use aggregate functions, or the outer query is not a join.

3. The subquery is not the right operand of a left outer join.

4. The subquery does not use DISTINCT, or the outer query is not a join.

5. The subquery does not use DISTINCT, or the outer query does not use aggregate functions.

6. The subquery does not use aggregate functions, or the outer query does not use DISTINCT.

7. The subquery has a FROM clause.

8. The subquery does not use LIMIT, or the outer query is not a join.

9. The subquery does not use LIMIT, or the outer query does not use aggregate functions.

10. The subquery does not use aggregate functions, or the outer query does not use LIMIT.

11. The subquery and the outer query do not both have ORDER BY clauses.

12. The subquery and the outer query do not both use LIMIT.

13. The subquery does not use OFFSET.

14. The outer query is not part of a compound query, or the subquery does not use both ORDER BY and LIMIT.

15. The outer query does not use aggregate functions, or the subquery does not contain ORDER BY.

16. Flattening a compound subquery: the subquery is not a compound query, or it is a UNION ALL compound query made up entirely of queries without aggregate functions, and its parent query is not itself part of a compound query, does not use aggregate functions or DISTINCT, and has no other tables or subqueries in its FROM clause. The parent query and the subquery may contain WHERE clauses and, subject to conditions 11, 12, and 13 above, may also contain ORDER BY, LIMIT, and OFFSET clauses.

Example:

  SELECT a+1 FROM (
    SELECT x FROM tab
    UNION ALL
    SELECT y FROM tab
    UNION ALL
    SELECT abs(z*2) FROM tab2
  ) WHERE a!=5 ORDER BY 1

is converted to:

  SELECT x+1 FROM tab WHERE x+1!=5
  UNION ALL
  SELECT y+1 FROM tab WHERE y+1!=5
  UNION ALL
  SELECT abs(z*2)+1 FROM tab2 WHERE abs(z*2)+1!=5
  ORDER BY 1

17. If the subquery is a compound query, every term of the parent query's ORDER BY clause must be a simple reference to a column of the subquery.

18. The subquery does not use LIMIT, or the outer query does not have a WHERE clause.

Subquery flattening is implemented by a dedicated function:

  static int flattenSubquery(
    Parse *pParse,       /* Parsing context */
    Select *p,           /* The parent or outer SELECT statement */
    int iFrom,           /* Index in p->pSrc->a[] of the inner subquery */
    int isAgg,           /* True if outer SELECT uses aggregate functions */
    int subqueryIsAgg    /* True if the subquery uses aggregate functions */
  )

It is implemented in the select.c file. Clearly, for a more complex query that satisfies the conditions above, flattening the statement optimizes the query, and the benefit is even greater when an index is available.

V. Join queries

Before the query results are returned, the rows of the related tables must be joined. SQLite implements joins with nested loops. In earlier versions, the leftmost table in the FROM clause formed the outermost loop and the rightmost table the innermost loop. When joining two or more tables, an index is most useful on the inner loop, that is, on the table at the end of the FROM clause: for each row selected in the outer loop, the matching rows in the inner table can be found very quickly through the index, whereas without an index the whole inner table must be traversed, which is very inefficient. In newer versions, SQLite performs this optimization itself.

The optimization method is as follows:

For each table to be queried, gather the index information for that table. The cost is first initialized to SQLITE_BIG_DBL (a constant defined by the system):

1) If there is no index, look for constraints on ROWID for this table:

1. If there is a term rowid = expr, the cost estimate for the table is returned immediately: the cost is zero, the number of rows obtained is 1, and the cost estimation for the table is complete.

2. If there is no rowid = expr but there is rowid IN (...), and the IN operand is a list, the number of rows returned is the number of elements N in the IN list, and the estimated cost is N log N.

3. If the IN operand is not a list but the result of a subquery, the subquery cannot be evaluated in advance, so fixed values are used: the number of rows returned is 100, and the cost is 200.

4. If the ROWID constraint is a range, the qualifying rows are estimated at one third of the total number of rows, the total is estimated at 1,000,000, and the estimated cost is the number of rows.

5. If the query also requires sorting, the sort cost N log N is added.

6. If the cost at this point is lower than the total cost, the total cost is updated; otherwise it is left unchanged.

2) If the OR operator appears in the WHERE clause, all the terms joined by OR are separated and analyzed individually.

1. If a term consists of subterms joined by AND, the subterms joined by AND are analyzed separately.

2. If a joined term has the form x <op> <expr>, that term is analyzed.

3. The total cost of the whole OR operation is then calculated.

4. If the query requires sorting, the total cost above is multiplied by the sort cost N log N.

5. If the cost at this point is lower than the total cost, the total cost is updated; otherwise it is left unchanged.

3) If there are indexes, the index information for each table is gathered. For each index:

1. First find the column number that corresponds to this index, and then look for WHERE clause terms that can use the index (the operator must be = or IN (...)). If none is found, leave the loop for this index. If one is found, check its operator: if it is =, there is no additional cost; if it is IN (sub-SELECT), the additional cost factor inMultiplier is estimated at 25; if it is IN (list), the additional cost factor is N (N is the number of items in the list).

2. Calculate the total cost, the total number of rows, and the cost of the query result:

3. nRow = pProbe->aiRowEst[i] * inMultiplier; /* estimated row count */

4. cost = nRow * estLog(inMultiplier); /* estimated cost */

5. If no term with the operator = or IN (...) is found but there is a range constraint, the number of result rows has to be estimated as nRow/3 and the cost as cost/3.

6. Similarly, if the query requires sorting, N log N is added to the total cost above.

7. If the cost at this point is lower than the total cost, the total cost is updated; otherwise it is left unchanged.

4) Through the process above, the total cost of querying one table is obtained (that is, the sum of the costs above). The same is then done for the second table, and so on, until the cost of every table in the FROM clause has been calculated. Finally, the table with the smallest cost is taken as the innermost level of the nested loops, and the nesting order of the whole nested loop is derived in the same way, which achieves the optimization goal.

5) The nesting order of the loops is therefore not necessarily the same as the order in the FROM clause, because the tables are rearranged according to index optimization during execution.
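The ROWID cost rules in case 1) can be sketched in a few lines of Python. This is an illustration of the heuristics as described in this article, not a transcription of SQLite's source. The names rowid_cost, best_cost, and the kind labels are hypothetical; est_log is an assumed rough logarithm that counts decimal digits; the value 1e99 for SQLITE_BIG_DBL is an assumption; and N in the sort term is assumed to be the estimated row count.

```python
SQLITE_BIG_DBL = 1e99  # assumed value of the "very large" starting cost

def est_log(n):
    """Assumed rough logarithm: the number of decimal digits, at least 1."""
    log_n, x = 1.0, 10.0
    while n > x:
        log_n += 1.0
        x *= 10.0
    return log_n

def rowid_cost(kind, in_list_size=0, needs_sort=False, table_rows=1_000_000):
    """Return (estimated_rows, cost) for one kind of ROWID constraint."""
    if kind == "eq":                    # rowid = expr
        return 1, 0.0                   # estimation is complete at this point
    if kind == "in_list":               # rowid IN (v1, v2, ...)
        rows = in_list_size
        cost = rows * est_log(rows)     # N log N
    elif kind == "in_subquery":         # rowid IN (SELECT ...)
        rows, cost = 100, 200.0         # fixed guesses
    elif kind == "range":               # rowid > expr, rowid < expr, ...
        rows = table_rows / 3           # one third of the table
        cost = rows
    else:
        raise ValueError(f"unknown constraint kind: {kind}")
    if needs_sort:
        cost += rows * est_log(rows)    # add the N log N sort cost
    return rows, cost

def best_cost(costs):
    """Keep the lowest cost seen, starting from SQLITE_BIG_DBL."""
    best = SQLITE_BIG_DBL
    for cost in costs:
        if cost < best:
            best = cost
    return best

print(rowid_cost("range", table_rows=3000, needs_sort=True))
```

The same "keep the cheapest candidate" loop applies to the OR and index cases; only the way each candidate cost is computed differs.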

VI. Indexes

SQLite has several kinds of index:

1) Single-column index

2) Multi-column index

3) Unique index

4) For a primary key declared as INTEGER PRIMARY KEY, the column is kept in sorted order by default, so although it is not listed as an index in the data dictionary, it functions like one. Building a separate index on such a primary key therefore wastes space and brings no benefit.
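The behaviour of an INTEGER PRIMARY KEY column can be observed directly. In the sketch below, the table people and its columns are invented for the example. It relies on the fact that an INTEGER PRIMARY KEY column is an alias for the table's rowid, and the wording of the plan text may differ between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)")  # hypothetical table
conn.executemany("INSERT INTO people (name) VALUES (?)", [("ann",), ("bo",), ("cy",)])

# The declared key and the built-in rowid hold the same value
alias_rows = conn.execute("SELECT id, rowid FROM people ORDER BY id").fetchall()

# A lookup by the key is a direct search, with no separate index created
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT name FROM people WHERE id = 2"
).fetchall()
pk_plan_text = " ".join(str(step[-1]) for step in plan)
print(alias_rows, pk_plan_text)
```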

Considerations when using indexes:

1) There is no need to index a very small table.

2) If a table is mostly subject to insert and update operations, use indexes sparingly.

3) Do not build too many indexes on one table either. With too many, SQLite may not choose the best one when executing a query. One solution is to build a clustered index.

When an index can be applied:

1) The operators =, >, <, IN, and so on can use an index.

2) The operators BETWEEN, LIKE, and OR cannot use an index directly.

For example, with BETWEEN: SELECT * FROM mytable WHERE myfield BETWEEN 10 AND 20;

This should be converted to:

SELECT * FROM mytable WHERE myfield >= 10 AND myfield <= 20;

Now, if there is an index on myfield, it can be used, which greatly improves speed.

Another example, with LIKE: SELECT * FROM mytable WHERE myfield LIKE 'sql%';

This should be converted to:

SELECT * FROM mytable WHERE myfield >= 'sql' AND myfield < 'sqm';

Now, if there is an index on myfield, it can be used, which greatly improves speed.

Another example, with OR: SELECT * FROM mytable WHERE myfield = 'abc' OR myfield = 'xyz';

This should be converted to:

SELECT * FROM mytable WHERE myfield IN ('abc', 'xyz');

Now, if there is an index on myfield, it can be used, which greatly improves speed.

3) Sometimes an index cannot be used at all, and the full table has to be scanned, for example:

SELECT * FROM mytable WHERE myfield % 2 = 1;

SELECT * FROM mytable WHERE substr(myfield, 1, 1) = 'W';

SELECT * FROM mytable WHERE length(myfield) < 5;
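Whether a given query can use an index can be checked with EXPLAIN QUERY PLAN. The sketch below is an assumption-laden demo: the column type, the extra note column, the index name idx_myfield, the helper plan_for(), and the literal values are all invented, and the plan wording differs between SQLite versions. In general, a plan step starting with SEARCH means an index lookup, while SCAN means every row is visited.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (myfield INTEGER, note TEXT)")   # hypothetical schema
conn.execute("CREATE INDEX idx_myfield ON mytable (myfield)")       # hypothetical index
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(n, "row %d" % n) for n in range(1, 51)],
)

def plan_for(sql):
    """Return the readable plan steps of a query as one string."""
    steps = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(str(step[-1]) for step in steps)

# A plain range on the indexed column can use the index
range_plan = plan_for("SELECT * FROM mytable WHERE myfield >= 10 AND myfield <= 20")

# An expression wrapped around the column hides it from the index
expr_plan = plan_for("SELECT * FROM mytable WHERE myfield % 2 = 1")

print(range_plan)
print(expr_plan)
```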
