First, Introduction:
Although the SQLite database is made up of a single file, there are, in fact, some hidden temporary files that exist at the time of the SQLite runtime, which exist for different purposes and are transparent to developers, so we don't need to be concerned about their presence in the development process. However, if you have a good understanding of the mechanisms and scenarios for these temporary files, then it is very helpful for us to optimize and maintain our applications in the future. The following seven types of temporary documents are mainly produced in SQLite, such as:
1). Roll back the log.
2. master database log.
3. SQL statement log.
4. Temporary database files.
5. Temporary persisted files for views and subqueries.
6). Temporary index file.
7. The temporary database file used by the vacuum command.
Second, the specific description:
1. Roll Back log:
SQLite creates the temporary file at the beginning of the thing in order to ensure the atomicity of the things committed and rolled back. This file is always located in the same directory as the database file, with the file name in the format: database filename +-journal. In other words, without the presence of the temporary file, the SQLite will not be able to guarantee the integrity of the object and the consistency of the data state when there is any failure in the system running the program. The file will be deleted immediately after the object has been submitted or rolled back.
While things are running, if the current host is down due to a power failure, and now that the rollback log file has been saved on disk, the next time the program starts, SQLite will find the temporary file in the process of opening the database file, which we call "hot Journal". SQLite completes the recovery of the database based on the file before successfully opening the database to ensure that the data in the database reverts to the state before the last thing started.
In SQLite we can modify the journal_mode pragma to make sqlite adopt a different strategy for maintaining the file. By default, this value is delete, which deletes the log file after the end of the thing. The PERSIST option value will not delete the log file, but instead will roll back the head of the log file to zero, thereby avoiding the disk overhead associated with file deletion. And then there is the off option value, which indicates that SQLite does not produce a rollback log file at the start of the event, so that once a system failure occurs, SQLite can no longer guarantee the consistency of the database data.
2. Primary database log:
In SQLite, if the operation of things works on multiple databases, that is, by attaching to the database in the current connection through the Attach command, SQLite will generate the primary database log file to ensure that changes in things are kept atomic across multiple databases. As with the rollback log file, the primary database log file is located in the same directory as the primary database file in the current connection, in the form of the primary database filename + random suffix. In this file, it will contain the name of the attached database where all the current things will change. After the thing was submitted, the document was SQLite deleted.
The primary database log file can be created only when an object operates multiple databases simultaneously (the primary and attached databases). Through this file, SQLite can achieve the atomic nature of things across multiple databases, otherwise, only a simple guarantee of state consistency within each single database. In other words, if there is a system crash or a host outage during the execution of the thing, in the case of data recovery, if the file does not exist, it will result in a partial SQLite database in the commit state, while the other is in a rollback state, so the consistency of the thing will be broken.
3. SQL statement log:
In a larger thing, SQLite creates an SQL statement log file at the beginning of a thing in order to ensure that some of the data can be rolled back normally when an error occurs. For example, the UPDATE statement modifies the first 50 data, but when you modify the 51st data it finds that the operation will break a uniqueness constraint on a field, and eventually SQLite will have to roll back the first 50 data that has been modified through the log file.
SQL statement log files are only possible to be created when you modify multiple-line records in an INSERT or UPDATE statement, and at the same time they are most likely to break some constraints and throw exceptions. However, if the INSERT or UPDATE statement is not included in the Begin...commit and no other SQL statement is running on the current connection, in which case SQLite will not create the SQL statement log file, but rather simply rollback the log to complete some of the data's undo operations.
Unlike the above two temporary files, SQL statement log files are not necessarily stored in the same directory as the database file, and their file names are randomly generated. The disk space occupied by the file depends on the number of records that will be modified by the UPDATE or INSERT statement. When things are over, the file is automatically deleted.
4. Temporary Database files:
When a temporary datasheet is created using the Create temp table syntax, the datasheet is visible only within the current connection, and the temporary table disappears when the current connection is closed. However, during the lifetime, temporary tables will be stored in a temporary database file along with their associated indexes and views. The temporary file is created the first time the Create TEMP TABLE is executed, and the file is automatically deleted when the current connection is closed. The last thing to note is that the staging database cannot be executed with the detach command, and it cannot be executed by another process attach command.
5. Temporary persisted files for views and subqueries:
In many queries that contain subqueries, the SQLite executor splits the query statement into separate SQL statements, persisting the results of the subquery into a temporary file, then associating the external query based on the data in the temporary file, so we can call it a materialized subquery. In general, the SQLite optimizer tries to avoid the materialized behavior of subqueries, but at some point the operation is unavoidable. The amount of disk space occupied by the temporary file depends on the number of data retrieved by the subquery, which is automatically deleted when the query is finished. See the following example:
Copy Code code as follows:
SELECT * from Ex1 WHERE ex1.a in (select B from EX2);
In the query above, the results of the subquery Select B from EX2 will be persisted to the temporary file, and the external query will check the temporary file for each record at run time to determine whether the current record appears in the temporary file and, if so, to output the current record. Obviously, the above behavior will result in a large number of IO operations, thereby significantly reducing the efficiency of query execution, in order to avoid the generation of temporary files, we can change the query statement above:
Copy Code code as follows:
SELECT * from Ex1 where EXISTS (select 1 from ex2 where ex2.b=ex1.a);
For the following query statement, if SQLite does not do any intelligent rewrite action, the subquery in the query will also be persisted to the temporary file, such as:
Copy Code code as follows:
SELECT * FROM Ex1 JOIN (select B to ex2) as T on t.b=ex1.a;
After SQLite automatically modifies it to the following wording, temporary files will no longer be generated, such as:
Copy Code code as follows:
SELECT ex1.*, ex2.b from Ex1 JOIN ex2 on ex2.b=ex1.a;
6. Temporary index file:
When a query statement contains the following SQL clause, SQLite creates a temporary index file for storing intermediate results, such as:
1). Order BY or GROUP BY clause.
2. The DISTINCT keyword in the clustered query.
3. Multiple select query statements connected by union, except, and intersect.
It is important to note that if an index already exists on the specified field, then SQLite will not create the temporary index file, but instead accesses the data and extracts useful information by traversing the index directly. If there are no indexes, the results of the sort need to be stored in the temporary index file for later use. The amount of disk space occupied by the temporary file depends on the number of sorted data, which is automatically deleted when the query is finished.
7. Temporary database files used by the vacuum command:
The vacuum command creates a temporary file at work and then writes the entire rebuilt database to the temporary file. The contents of the temporary file are then copied back to the original database file, and the temporary file is finally deleted.
The temporary file does not occupy more disk space than the size of the original file.
Third, the relevant compile-time parameters and directives:
For SQLite, rollback logs, primary database logs, and SQL statement log files are written to disk files when required SQLite, but for other types of temporary files, SQLite can be stored in memory to replace disk files. This can reduce a lot of IO operations during execution. The key to completing this optimization depends on the following three factors:
1. Compile-time parameter sqlite_temp_store:
This parameter is the macro definition (#define) in the source code, and its value ranges from 0 to 3 (the default is 1), as described below:
1. Equal to 0 o'clock, temporary files are always stored on disk, regardless of the settings of the temp_store pragma directive.
2. Equal to 1 o'clock, temporary files are stored on disk by default, but the value can be overwritten by temp_store pragma directives.
3. Equal to 2 o'clock, temporary files are stored in memory by default, but the value can be overwritten by temp_store pragma directives.
4. Equal to 3 o'clock, temporary files are always stored in memory, regardless of the settings of the temp_store pragma directive.
2. Run-time instruction Temp_store Pragma:
The directive takes a range of 0 to 2 (the default is 0), which can be set dynamically when the program is run, as shown in the following:
1. Equal to 0 o'clock, the storage behavior of temporary files is determined entirely by the Sqlite_temp_store compile-time parameters.
2. Equals 1 o'clock, if the compile-time parameter sqlite_temp_store specifies that memory is used to store temporary files, the directive overwrites this behavior using disk storage.
2. Equals 2 o'clock, if the compile-time parameter sqlite_temp_store specifies that temporary files are stored using disk, the directive overrides this behavior and uses memory storage.
3. Size of the temporary file:
For the above two parameters, all have parameter values to indicate that the default is stored in memory, and only if the size of the temporary file exceeds a certain threshold, some data will be written to disk, lest the temporary files occupy too much memory and affect the execution efficiency of other programs.
In the end, sqlite_temp_store the compile-time parameters and the temp_store pragma runtime directives only affect the storage policy for temporary files other than the rollback log and the primary database log. In other words, rolling back logs and the primary database log will always write data to disk without paying attention to the values of the two parameters above.
Iv. Other optimization Strategies:
The buffer optimization mechanism for page cache is used in SQLite, so even if temporary files are specified on disk, they will only be able to be flushed to disk files after the size of the file has grown to a certain size, which would still reside in memory until it is sqlite. This means that for most scenarios, if temporary tables and temporary indexes have relatively small amounts of data, they will not be written to disk, and of course no IO events will occur. They are flushed to the disk file only if they grow to a point where the memory is not able to fit. The sqlite_default_temp_cache_size compile-time parameter can be used to specify that temporary tables and indexes need to be flushed to a disk file when the CACHE page is occupied, with a default value of 500 pages.