I. Introduction:
Although the SQLite database is composed of a single file, some hidden temporary files exist in the SQLite runtime. These temporary files exist for different purposes, developers are transparent, so we do not need to pay attention to their existence during the development process. Even so, if we can have a good understanding of the generation mechanism and application scenarios of these temporary files, then we will apply them in the future.ProgramOptimization and maintenance are very helpful. SQLite mainly generates the following seven types of temporary files:
1). roll back the log.
2). master database log.
3). SQL statement log.
4). Temporary database files.
5). Temporary persistent files for views and subqueries.
6). Temporary index file.
7) the temporary database file used by the VACUUM command.
Ii. Details:
1. rollback log:
SQLite creates the temporary file at the beginning of the transaction to ensure atomic commit and rollback of the transaction. This file is always located in the same directory as the database file. The file name format is database file name +" -journal ". In other words, if the temporary file does not exist, SQLite cannot guarantee the integrity of the transaction and the consistency of the data status when the system running the program fails. This file will be deleted immediately after the transaction is committed or rolled back.
when a program is started, SQLite will find the temporary file in the process of opening the database file. We call this log file "Hot journal ". SQLite will restore the database based on the file before the database is successfully opened, to ensure that the database data is restored to the status before the previous transaction starts.
In SQLite, you can modify journal_mode Pragma , SQLite adopts different strategies for maintaining the file. By default, the value is Delete , that is, the log file is deleted after the transaction ends. The persist option does not delete the log file, but clears the header of the rollback log file to avoid disk overhead caused by file deletion. Then there is the off option value, which indicates that SQLite does not generate a rollback log file when starting a transaction. In this way, if a system failure occurs, SQLite cannot guarantee database data consistency.
2. master Database Log:
In SQLite, if transaction operations act on multiple databases, attach them to the database in the current connection using the attach command, then SQLite will generate the log file of the primary database to ensure that the changes caused by transactions remain atomic between multiple databases. Like the log rollback file, the log file of the primary database is also located in the directory where the primary database file is currently connected. The file name format is primary database file name + random suffix. This file contains the name of the attached database that the current transaction will change. After a transaction is submitted, this file is also deleted by SQLite.
the log file of the primary database can be created only when a transaction operates multiple databases simultaneously (the primary database and the attached database. Using this file, SQLite can implement the atomicity of transactions across multiple databases. Otherwise, it can only ensure the State consistency in each single database. In other words, if the system crashes or the host crashes during the execution of the transaction, if the file does not exist during data recovery, this will cause some SQLite databases to be submitted, and the other part to be rolled back, so the consistency of the transaction will be broken.
3. SQL statement log:
In a large transaction, SQLite creates an SQL statement log file at the beginning of the transaction to ensure that some data can be rolled back normally when an error occurs. For example, the update statement modifies the first 50 data records. However, when you modify 51st data records, it is found that this operation will undermine the uniqueness constraint of a field, in the end, SQLite will have to use this log file to roll back the first 50 data records that have been modified.
The SQL statement log file can be created only when the insert or update statement modifies multiple rows of records. At the same time, these operations are very likely to break some constraints and cause exceptions. However, if the insert or update statement is not included in begin... in commit, no other SQL statements are running on the current connection. In this case, SQLite will not create the SQL statement log file, instead, you can simply perform undo operations on some data by rolling back logs.
Unlike the above two temporary files, SQL statement log files do not have to be stored in the same directory as database files, and their file names are also randomly generated. The disk space occupied by the file depends on the number of records to be modified by the update or insert statement. After a transaction ends, the file will be automatically deleted.
4. Temporary database files:
When you use the "Create temp table" syntax to create a temporary data table, the data table is only visible in the current connection. After the current connection is closed, the temporary table disappears. However, during the life cycle, temporary tables, together with their related indexes and views, will be stored in a temporary database file. This temporary file is created when "create temp table" is executed for the first time. After the current connection is closed, the file will be automatically deleted. Finally, it must be noted that the temporary database cannot be executed with the detach command, nor can it be executed by other processes with the attach command.
5. Temporary persistent files for views and subqueries:
In many queries that contain subqueries, The SQLite executor splits the query statement into multiple independent SQL statements and persists the results of the subquery to a temporary file, then, we associate the data in the temporary file with the external query, so we can call it a materialized subquery. In general, the SQLite optimizer tries its best to avoid the materialized behavior of subqueries, but sometimes this operation cannot be avoided. The disk space occupied by the temporary file depends on the number of data retrieved by the subquery. After the query, the file will be automatically deleted. See the following example:
Select * From ex1 where ex1.a in (select B from ex2 );
In the preceding query statement, the results of select B from ex2 IN THE subquery will be persisted to the temporary file. during the running of the external query, the temporary file will be checked for each record, to determine whether the current record appears in a temporary file. If yes, the current record is output. Obviously, the above actions will produce a large number of Io operations, which significantly reduces the query execution efficiency. In order to avoid the generation of temporary files, we can change the preceding query statement:
Select * From ex1 where exists (select 1 from ex2 where ex2. B = ex1.a );
For the following query statement, if SQLite does not perform any smart rewrite operations, the subquery in the query will also be persisted to the temporary file, for example:
Select * From ex1 join (select B from ex2) as t on T. B = ex1.a;
After SQLite automatically changes it to the following statement, no temporary files will be generated, for example:
Select ex1. *, ex2. B from ex1 join ex2 on ex2. B = ex1.a;
6. Temporary index file:
When a query statement contains the following SQL clauses, SQLite creates a temporary index file to store intermediate results, for example:
1). Order by or group by clause.
2) the distinct keyword in the clustered query.
3) Multiple SELECT query statements connected by union, except T, and intersect.
It should be noted that if an index already exists in the specified field, SQLite will not create this temporary index file, but will access the data and extract useful information by directly traversing the index. If no index exists, you need to store the sorting results in the temporary index file for later use. The disk space occupied by the temporary file depends on the number of sorted data. After the query, the file is automatically deleted.
7. The temporary database file used by the VACUUM command:
The vacuum command will first create a temporary file at work, and then write the whole database to the temporary file. Then copy the contents of the temporary file to the original database file and delete the temporary file.
The disk space occupied by the temporary file does not exceed the size of the original file.
Iii. Related compilation parameters and commands:
For SQLite, when necessary, SQLite will write the rollback logs, master database logs, and SQL statement log files to disk files, but for other types of temporary files, SQLite can store them in memory to replace disk files, so that a large number of Io operations can be reduced during execution. The optimization mainly depends on the following three factors:
1. sqlite_temp_store:
This parameter is Source code In the macro definition (# define), the value range is 0 to 3 (the default value is 1), see the following description:
1) when the value is equal to 0, temporary files are always stored on the disk, instead of setting the temp_store Pragma command.
2) When the value is equal to 1, the temporary file is stored on the disk by default, but this value can be overwritten by the temp_store Pragma command.
3) when the value is equal to 2, temporary files are stored in the memory by default, but this value can be overwritten by the temp_store Pragma command.
4) when it is equal to 3, temporary files are always stored in the memory, and the settings of the temp_store Pragma command are not considered.
2. Run the command temp_store Pragma:
The value range of this command is 0 to 2 (the default value is 0). When the program is running, this command can be dynamically set. See the following description:
1) when the value is 0, the storage behavior of temporary files is completely determined by the sqlite_temp_store compilation period parameter.
2) When the value is equal to 1, if the sqlite_temp_store parameter during the compilation period specifies to use memory to store temporary files, this command overwrites this line and uses disk storage.
2) When the value is equal to 2, if the sqlite_temp_store parameter during the compilation period is specified to use a disk to store temporary files, the command overwrites this line and uses memory storage.
3. Temporary File Size:
For the above two parameters, the parameter values indicate that the default value is stored in the memory. Only when the size of the temporary file exceeds a certain threshold will Algorithm To write part of the data to the disk, so as to avoid the temporary file occupying too much memory and affecting the execution efficiency of other programs.
Finally, I will repeat it again. The sqlite_temp_store compilation period parameters and the command during the run of temp_store Pragma will only affect the storage policies for temporary files other than the rollback log and the master database log. In other words, the rollback log and the primary database log will always write data to the disk, instead of focusing on the values of the above two parameters.
4. Other optimization strategies:
Because the page cache buffer optimization mechanism is used in SQLite, even if temporary files are stored on disks, it is also possible that SQLite will refresh the file to the disk file only when the file size increases to a certain size. Until then, these files will still reside in the memory. This means that in most scenarios, if the data volume of temporary tables and temporary indexes is relatively small, they will not be written to the disk, and of course there will be no IO events. They are refreshed to the disk files only when they are not supported by the memory. The sqlite_default_temp_cache_size parameter can be used to specify the cache pages occupied by temporary tables and indexes before being refreshed to the disk file. The default value of this parameter is page 500.