SQLite optimization notes
2010/04/30 :16
Recently I 've been tossing a database of dozens of Gbit/s of SQLite databases with dozens of tables and tens of millions of data records. The result is that a select count (*) has not been running for an hour ...... So I optimized some information, and the following is a flow account:
1. Transfer the database from HDD to SSD. Since the iops of SSD is dozens of times that of HDD, some queries can be improved by more than ten times. However, SSD space is limited. It would be nice to store indexes independently ......
Time taken from HDD to SSD, select count:
~ 5000 S limit 546 s
2. Vacuum. This command is used to delete unnecessary space reserved for inserting updates. It is said that it can also clear disk fragments, which can increase the retrieval speed by about two times, but takes a considerable amount of time. SQLite must first create a file that is equal to the size of the database in the temporary folder, and then create a journal file that is equal to the size of the database in the database folder. That is to say, the drive and temporary folder of the database must have sufficient space. You can set the system TMP or use Pragma temp_store_directory to change the location of the Temporary Folder, or use Pragma journal_size_limit to set the maximum number of journal files. The official website says that the vacuum speed is 2 m/s. I tried it and there are at least 4 m/s on the HDD, And the SSD can reach 10 m/s or more, this time is also related to the database's uniformity, but it is still very slow for large databases.
Time spent by select count (*) before and after vacuum:
546 s limit 205 s
3. Set page_size. This situation seems complicated. For small tables, there is almost no difference in page_size, but there can be a gap of five times between large tables. By default, the page_size value of SQLite under the compilation parameter can be 512, 1024, 2048, 4096, 8192, 16384, and 32768. The default value is 1024, which is consistent with the cluster size of Linux, NTFS is 4 K, some people say that setting it to 4 K can improve the performance, but I tried 4 K is almost the worst (in most cases, 2 k is worse ), however, there was a major increase in 8 K, but the difference between 16 K and 32 K was not very long. Generally, the larger the page_size, the faster the speed, and the heavier the System Burden. However, there are also many other factors that affect the size of database files under different page_size conditions, large page_size is not conducive to caching some data in the memory for repeated queries.
Page_size is changed from 1024 to 32768. The time taken by select count:
205 s limit 45 s
4. The cache_size mechanism is also influential, but I tried different cache_size, which is almost the same. Why is the memory insufficient?
5. For general databases, the primary key is faster than the index, but SQLite seems to be an exception, because the primary key seems to exist with the data, reading will waste a lot of time on useless data, especially when there is a huge text data in the table, and the index is stored separately, but it is faster than the primary key, if the data size of a row in a table is large, the gap may even be one hundred times.
Primary Key vs index, time spent by select count:
45 s limit 1 s
The followingProgramCompare the effects of different factors. Create three tables and store the same amount of data. Table A and table B have the same structure, except that table A inserts a short string and table B inserts a large text; table C and table B insert exactly the same data, except that table B has a primary key, while Table C only has an index. -- guess which one is faster? Http://blog.ieph.net/archives/316
About indexes:
When performing a joint query on multiple tables, using indexes can significantly improve the speed. Just now, we used SQLite for a test.
Create three tables:
Create Table T1
(ID integer primary key,
Num integer not null,
Word1 text not null,
Word2 text not null );
Create Table T2
(ID integer primary key,
Num integer not null,
Word1 text not null,
Word2 text not null );
Create Table T3
(ID integer primary key,
Num integer not null,
Word1 text not null,
Word2 text not null );
Create several indexes:
Table T1: composite indexes on num, word1, and word2
T2 table: There is an index on num, word1, and word2.
T3 table: There is an index on word1
Create index idxt1 on T1 (Num, word1, word2 );
Create index idxt2num on T2 (Num );
Create index idxt2word1 on T2 (word1 );
Create index idxt2word2 on T2 (word2 );
Create index idxt3word1 on T2 (word1 );
Insert 10000 rows of data into each of the three tables. The num is a random number, the word1 and word2 are English words, and the corresponding num in the three tables, the word1 and word2 columns contain some identical values, but they appear in different order in the table.
Speed Test results:
1) Select count (*) from T1, T3 where t1.word2 = t3.word2;
Very slow (no index on t3.word2)
2) Select count (*) from T3, T1 where t1.word2 = t3.word2;
Very slow (no independent index on t1.word2)
3) Select count (*) from T1, T2 where t1.word2 = t2.word2;
Very fast (index on t2.word2)
4) Select count (*) from T2, T1 where t1.word2 = t2.word2;
Very slow (no independent index on t1.word2)
5) Select count (*) from T1, T2 where t1.num = t2.num;
Very fast (index on t2.num)
6) Select count (*) from T2, T1 where t1.num = t2.num;
Very fast (in the compound index of T1, the first column is num)
7) Select count (*) from T1, T3 where t1.num = t3.num;
Very slow (no index on t3.num)
8) Select count (*) from T3, T1 where t1.num = t3.num;
Very fast (in the compound index of T1, the first column is num)
Conclusion: In the two tables following the from clause, if the columns to be queried in the 2nd tables contain indexes, the query speed will be faster, and vice versa. For example, in the third query, the first table after "from" is T2, and T2 has an index on word2, so this query is fast. When you enter the SQL command and press enter, the query results are immediately displayed. However, if you use 4th query commands (that is, interchange the positions of T1 and T2), it takes 1 minute to 6 seconds to query.
It can be seen that index creation is very important to improve the speed of database query.
For more information about SQLite query optimization, refer to Chapter 4 of SQLite written by Chris NEWMAN: query optimization.