SQLite is a typical embedded DBMS with many advantages. It is lightweight and small after compilation. One of the reasons is that it is relatively simple in query optimization, it only uses the index mechanism for optimization:
I. Factors affecting query performance:
1. The smaller the number of rows in the table, the better.
2. sorting or not.
3. Check whether an index is required.
4. query statement format
Ii. Several query-optimized Conversions
1. For a single column in a single table, if the clauses such as t.c = expr are tangible, and they are connected by the OR operator, for example:
X = expr1 or expr2 = X or X = expr3 at this timeFor or, indexes cannot be optimized in SQLite, so it can be converted to a clause with the in operator.: X in (expr1, expr2, expr3) so that you can use the index for optimization, the effect is obvious, however, if no index is available, the execution efficiency of or statements is slightly higher than that of in statements.
2.If the operator of a clause is between, indexes cannot be optimized in SQLite., So the corresponding equivalent conversion is also required:
For example, a between B and C can be converted to (a between B and C) and (A> = B) and (A <= C ). In the preceding clause, (A> = B) and (A <= C) are set as dynamic and are (A between B and C) clauses, if the between statement is encoded, the clause is ignored. If an available index makes the clause meet the conditions, the parent clause is ignored.
3. If the operator of a unit is like, convert X like 'abc % 'to x> = 'abc' and x <'abd '.
Because like in SQLite cannot be optimized using indexes, if there is an index, the difference between the conversion and non-conversion is very long, because like does not work, but if there is no index, in terms of efficiency, like is not as efficient as after conversion.
Iii. Processing of several query statements (composite query)
1. the query statement is: <selecta> <operator> <selectb> order by <orderbylist> order
Execution method:
Is one of Union all, union, except T, or intersect.
The execution process of this statement is to first execute and sort select a and select B, and then scan and process the two results. The preceding four operations are different, divide the execution process into seven sub-processes:
Outa: place a row of selecta results into the final result set.
Outb: place a row of selecta results in the final result set (only union and Union all operations are performed, and other operations are not placed in the final result set)
Altb: When selecta's current record is smaller than selectb's current record
Aeqb: when the current record of selecta is equal to the current record of selectb
Agtb: When selecta's current record is greater than selectb's current record
Eofa: After the selecta result is traversed
Eofb: After the selectb result is traversed
The execution process of the four operations is as follows:
Execution sequence |
Union all |
Union |
Except |
Intersect |
Altb: |
Outa, nexta |
Outa, nexta |
Outa, nexta |
Nexta |
Aeqb: |
Outa, nexta |
Nexta |
Nexta |
Outa, nexta |
Agtb: |
Outb, nextb |
Outb, nextb |
Nextb |
Nextb |
Eofa: |
Outb, nextb |
Outb, nextb |
Halt |
Halt |
Eofb: |
Outa, nexta |
Outa, nexta |
Outa, nexta |
Halt |
2. If possible, you can convert a statement that uses the group by query to a distinct statement for query, because the group by statement may sometimes use the index, but does not use the index for distinct.
Iv. flattening subqueries
Example: select a from (select x + y as a from T1 where z <100) where a> 5
The default method for executing this SQL statement is to first execute the internal query, put the result into a temporary table, and then perform external query on the table. This requires two data processing times, in addition, this temporary table does not have an index, so external queries cannot be optimized.
After processing the preceding SQL statement, you can obtain the following SQL statement:
Select x + y as a from T1 where z <100 and a> 5, the result is obviously the same as the preceding one, but you only need to query the data once, in addition, if you have an index on table T1, you can avoid traversing the entire table.
Optimize SQL conditions using the flatten method:
1. subqueries and external queries do not use the set function
2. The subquery does not use a set function or the external query is not a table connection.
3. The subquery is not the right operand of a left Outer Join.
4. The subquery does not use distinct or the external query is not a table connection.
5. The subquery does not use distinct or the external query does not use the set function.
6. The subquery does not use the set function or the external query does not use the distinct keyword.
7. The subquery has a from statement.
8. The subquery does not use limit or the external query is not a table connection.
9. The subquery does not use limit or the external query does not use a set function.
10. The subquery does not use a set function or an external query does not use limit.
11. subqueries and external queries are not both order by clauses
12. The subquery and external query do not use limit
13. The subquery does not use offset.
14. The external query is not part of a composite query or the subquery does not use the order by and limit keywords at the same time.
15. The foreign query does not use a set function subquery and does not contain order.
16. flattened composite subquery: A subquery is not a composite query or a union all composite query, but it is composed of several non-Aggregate functions, his parent query is not a subquery of a composite query, nor does it use a set function or distinct query, and there are no other tables or subqueries in the from statement, parent queries and subqueries may contain where statements, which are limited by the conditions 11, 12, and 13 above.
Example:
SELECT a+1 FROM (
SELECT x FROM tab
UNION ALL
SELECT y FROM tab
UNION ALL
SELECT abs(z*2) FROM tab2
) WHERE a!=5 ORDER BY 1
Convert:
SELECT x+1 FROM tab WHERE x+1!=5
UNION ALL
SELECT y+1 FROM tab WHERE y+1!=5
UNION ALL
SELECT abs(z*2)+1 FROM tab2 WHERE abs(z*2)+1!=5
ORDER BY 1
17. If the subquery is a composite query, all the order by statements in the parent query must be simple references to the subquery columns.
18. The subquery does not use the limit statement or does not have the where statement in the external query. The subquery flat is implemented by a special function. The function is:
static int flattenSubquery(
Parse *pParse, /* Parsing context */
Select *p, /* The parent or outer SELECT statement */
int iFrom, /* Index in p->pSrc->a[] of the inner subquery */
int isAgg, /* True if outer SELECT uses aggregate functions */
int subqueryIsAgg /* True if the subquery uses aggregate functions */
)
It is implemented in the Select. c file. Obviously, for a complicated query, if the query statement is flat when the preceding conditions are met, the query can be optimized. If an index exists, the effect will be better!
V. Connection Query
Before returning the query results, each row of the relevant table must have been connected. In SQLite, this is implemented using nested loops. in earlier versions, the leftmost layer loop is the outermost loop, the rightmost is the inmost loop. When two or more tables are connected, if an index exists, it is placed in the inner loop, that is, it is placed at the end of the from, because for each row selected above, when looking for the corresponding row, if there is an index, it will be very fast. If not, we need to traverse the entire table, which is very inefficient, but in the new version, this optimization has been implemented.
The optimization method is as follows:
For each table to be queried, calculate the index information of the table. First, assign the price to sqlite_big_dbl (a system-defined constant ):
1) if no index is available, check whether the query conditions for rowid in this table are as follows:
1. If rowid = expr exists, an estimate of the cost of the table is returned. The cost is calculated as zero. The number of records obtained from the query is 1, and the cost of the table is estimated,
2. If there is no rowid = expr but there is rowid in (...), and in is a list, the number of records returned is the number of elements in the in list. The estimated cost is nlogn,
3. If the in clause is not a list but a subquery result, the subquery cannot be determined. Therefore, only one value can be estimated. The number of returned records is 100, at a cost of 200.
4. If rowid is a range query, it is estimated that all qualified records are 1/3 of the total records, and the total record is estimated to be 1000000, and the estimated cost is also the number of records.
5. If the query still requires sorting, the cost of sorting is added to nlogn.
6. If the price obtained at this time is less than the total price, the total price will be updated. Otherwise, the price will not be updated.
2) If the WHERE clause contains the OR operator, separate all the clauses connected with or for further analysis.
1. If a clause is composed of an and connector, the clause connected by and is analyzed separately.
2. If the join clause is in the form of x <OP> <expr>, analyze the clause.
3. The next step is to calculate the total cost of the or operation.
4. If this query requires sorting, the above total cost will be multiplied by the sorting cost nlogn.
5. If the price is less than the total price, the total price will be updated. Otherwise, the price will not be updated.
3) If an index exists, the index information of each table is counted. For each index:
1. First find the column number corresponding to this index, and then find the corresponding available (the operator must be = or in (...)) If the WHERE clause of this index is not found, the loop of each index is exited. If it is found, the operator of this clause is determined. If it is =, there is no additional cost, if it is in (sub-select), it is estimated that the additional cost of inmultiplier is 25. If it is in (list), the additional cost is n (n is the number of columns in the list ).
2. Calculate the total cost and total number and cost of query results records.
3. nrow = pprobe-> airowest [I] * inmultiplier;/* Number of calculated rows */
4. Cost = nrow * estlog (inmultiplier);/* Statistical cost */
5. If the operator = or in (…) is not found (...) Clause, but a range query, so we have to estimate the number of query results records as nrow/3, and the estimated cost is cost/3.
6. Similarly, if this query requires sorting, add nlogn to the total cost above
7. If the price obtained at this time is less than the total price, the total price will be updated. Otherwise, no updates will be made.
4) through the optimization process above, you can get the total cost of a table query (that is, the sum of the above costs), and then perform the same operation on the second table, in this case, all the tables in the from clause are calculated to calculate their respective costs, and the minimum value is obtained. This serves as the innermost layer of the nested loop, and the nested sequence of the entire nested loop can be obtained in sequence, at this time, it is the best to achieve the goal of optimization.
5) The nested order of the loop is not necessarily the same as the order in the from clause, because index optimization is used to rearrange the order during execution.
Vi. Index
SQLite has the following indexes:
1) Single Column Index
2) Multi-column Index
3) unique index
4) for the statement:
For the primary key of integer primary key, this column is sorted by default, so although no index is generated for it in the data dictionary, its function is like an index. Therefore, if you create an index on this primary key, it will be a waste of space and no benefit.
Notes for using indexes:
1) there is no need to create an index for a small table.
2) If you often perform insert update operations on a table, you must use indexes in a controlled manner.
3) do not create too many indexes on a table. If too many indexes are created, SQLite may not choose the best way to execute the query during the query. One solution is to create a clustered index.
Time to use indexes:
1)Operators: =,>, <, In, etc.
2)The operators between, like, or cannot be indexed,
For example, between: Select * From mytable where myfield between 10 and 20;
In this case, it should be converted:
SELECT * FROM mytable WHERE myfield >= 10 AND myfield <= 20;
In this case, if you have an index on myfield, you can use it, greatly improving the speed.
Another example is like: Select * From mytable where myfield like 'SQL % ';
In this case, convert it:
SELECT * FROM mytable WHERE myfield >= 'sql' AND myfield < 'sqm';
In this case, if you have an index on myfield, you can use it, greatly improving the speed.
Another example is or: Select * From mytable where myfield = 'abc' or myfield = 'xyz ';
In this case, convert it:
SELECT * FROM mytable WHERE myfield IN ('abc', 'xyz');
In this case, if you have an index on myfield, you can use it, greatly improving the speed.
3) In some cases, indexes cannot be used. In this case, we should traverse the entire table (demonstrated by the Program)
SELECT * FROM mytable WHERE myfield % 2 = 1;
SELECT * FROM mytable WHERE substr(myfield, 0, 1) = 'w';
SELECT * FROM mytable WHERE length(myfield) < 5;