SQLite profiling (6): temporary files and memory databases

Source: Internet
Author: User

1. Seven temporary files
The content is organized from http://sqlite.org/tempfiles.html.
A different feature of SQLite is that a database consists of a single disk file. This simplifies the use of SQLite, because you only need to copy a single file to a mobile or backup database. This also makes SQLite suitable for application file format. However, when a database is stored in a single file, SQLite uses many temporary files during database processing.
Currently, SQLite uses seven different types of temporary files:
* Rollback Journals)
* Master Journals)
* SQL statement log (statement journals)
* Temporary database (temp databases)
* View and subquery persistence (materializations of views and subqueries)
* Temporary index (transient indices)
* Temporary database used by vacuum (transient databases used by vacuum)
(1) rollback log
A rollback log is a temporary file used for atomic commit and rollback. The rollback log is always in the same directory as the database file, and the file name is "-Journal" after the database file name ". Rollback logs are usually created at the beginning of a transaction and deleted when a transaction is committed or rolled back. Without rollback logs, SQLite will not be able to roll back an unfinished transaction, and the database will be damaged if the system crashes or power-off occurs at some point in the middle of the transaction execution. Rollback logs are usually created and destroyed at the transaction start point and end point, but there are some exception rules.
If a crash or power failure occurs at a certain time in the transaction, the rollback log is left on the hard disk. The next time another application tries to open a database file, it will notify the existence of a rollback log (we call it "Hot log "), and use the information in the log to restore the database to the State before the start of the unfinished transaction. This is the basic principle for SQLite to implement atomic commit.
If the application uses the command "Pragma locking_mode = exclusive;" to place SQLite in the exclusion lock mode, SQLite creates a new rollback log at the beginning of the transaction with the exclusion lock mode session, rollback logs are not deleted at the end of the transaction. Rollback logs may be reduced, or their headers may be cleared (depending on your SQLite version), but the files will not be deleted, logs will not be deleted until they are rolled back when the access mode is rejected.
You can also use the Pragma command in log mode to create and delete rollback logs. The default log mode is delete, that is, the rollback log is deleted at the end of each transaction. In persist log mode, the log file is discarded, but the header of the log file is cleared to prevent other processes from rolling back the log. Therefore, this is the same effect as deleting the log file, although log files are not actually deleted from the disk. That is to say, the log mode persist shows the same behavior as the exclusive lock mode. The off log mode disallows SQLite to create a rollback log at the beginning. It disables the atomic commit and rollback functions of SQLite and makes the rollback command unavailable. If a transaction in off log mode crashes or loses power at a certain time in the middle, the database file cannot be recovered and may be damaged.
(2) Main Log File
The master log file is used to modify multiple databases in a transaction during the atomic commit of multiple database operations. These databases are associated with a database connection through the attach command. The master log file is always located in the same directory as the master database file (the master database file is used to create a database connection by calling sqlite3_open (), sqlite3_open_v2 () or sqlite3_open_v2 ), with a random suffix. The main log file contains the names of all associated secondary databases. When a multi-database transaction is committed, the master log file will be deleted.
The primary log file is created only in the following circumstances: one data connection and two or more databases associated with attach are used for session, and one transaction modifies multiple database files. If there is no primary log file, multi-database transactions commit each individual database atomically, but not for the whole multi-database. That is to say, if the commit is interrupted at a certain time point due to a crash or power failure, the change to one database may be completed, and the change to the other database may be rolled back. The master log file ensures that all changes to all databases are either rolled back or committed together.
(3) SQL statement Log File
The SQL statement log file is used to roll back some results of a single SQL statement in a large transaction. For example, assume that an update statement attempts to modify the 100 rows in the database, but terminates the operation after the 50 rows are modified due to unexpected conditions. The SQL statement log is used to cancel the changes of these 50 rows so that the database can be restored to the status before the statement is executed.
The SQL statement log will only be created when one update or insert statement modifies multiple rows of the database and terminates unexpectedly or throws an exception in the trigger, thus Canceling Partial results. If update or insert is not included in begin... in commit, and there are no other active SQL statements on the same database connection, you do not need to create statement logs because the original rollback logs can be used. If a reliable conflict resolution algorithm is used, the statement log is ignored, for example:
Update or fail...
Update or ignore...
Update or replace...
Insert or fail...
Insert or ignore...
Insert or replace...
Replace ....
The SQL statement Log File uses a random file name and does not have to be automatically deleted at the end of the transaction in the same directory as the primary database. The size of the space in the SQL statement log is only the proportion of the change part completed by the update or insert statement.
(4) temporary database
The table created using the "Create temp table" command is only visible on the database connection that executes this command. These temp tables and any associated indexes, triggers, and views are stored in a separate temporary database file. This temporary database is created when the "Create temp table" command is run for the first time. This separate temporary database file also has an associated rollback log. The temporary database used to store temp tables is automatically deleted when sqlite3_close () is used to close the database connection.
The temporary data database file is similar to the auxiliary database file added through the attach command, but it has some special attributes. Temporary database files are always automatically deleted when the database connection is closed. Temporary databases are always set using the Pragma commands synchronous = off and journal_mode = persist. In addition, the temporary database cannot use detach, and other processes cannot use attach to associate the temporary database. Temporary database files and rollback logs are created only when the application uses the "Create temp table" command.
(5) Persistence of views and subqueries
The query command that contains the subquery must execute the subquery at a specific time and store the results in a temporary table. Then, use the content in the temporary table to execute external queries. We call it "persistent" subquery. SQLite's query optimizer tries to avoid persistence, but sometimes this is inevitable. Each temporary table created during the persistence process is stored in its own temporary files and automatically deleted at the end of the query. The size of these temporary tables depends on the amount of data in the subquery object.
Subqueries on the right of the in operator must be persistent. For example:
Select * From ex1 where ex1.a in (select B from ex2 );
In the preceding query command, the execution results of the subquery "select B from ex2" are stored in a temporary table (actually a temporary index ), it uses binary search to determine whether a value ex2. B ex2. B exists. Once this temporary table is created, an external query is run to check whether ex1.a contains each expected result row in the temporary table. If it is true, this result row is output.
To avoid creating temporary tables, you can rewrite the query to the following format:
Select * From ex1 where exists (select 1 from ex2 where ex2. B = ex1.a );
If the column ex2. B has an index, SQLite of 3.5.4 and later versions will automatically rewrite the index.
If the right part of the in operator is a value list, as shown below:
Select * From ex1 where a In (1, 2, 3 );
The Value List on the Right of in is considered as a subquery and must be persistent. That is to say, this query behavior is equivalent to the following:
Select * From ex1 where a in (select 1 Union all
Select 2 Union all
Select 3 );
When the right side of in is a value list, a temporary index is used to hold these values.
When a subquery appears in the from clause of the SELECT command, it is also persistent. For example:
Select * From ex1 join (select B from ex2) as t on T. B = ex1.a;
According to the query, SQLite may need to persist the "(select B from ex2)" subquery to a temporary table, and then execute the connection between ex1 and the temporary table. The query optimizer tries the flattening query to avoid the persistence of subqueries. In this example, the query can be flattened, and SQLite will automatically convert the query
Select ex1. *, ex2. B from ex1 join ex2 on ex2. B = ex1.a;
More complex queries may or may not be flat to avoid temporary tables. Whether the processing is flat depends on whether the subquery or external query contains Aggregate functions, order by or group by clauses, and limit clauses.
(6) Temporary Index
SQLite uses temporary indexes to implement many SQL language features, including:
* Order by or group by clause
* Distinct keywords in aggregate queries
* Compound select statements with join clauses such as union, except T, and intersect
Each temporary index is stored in its own temporary file and is automatically deleted at the end of SQL statement execution.
SQLite tries to use an existing index to implement the order by clause. If an index already exists on the specified field, SQLite will traverse the index (instead of creating a temporary index) to extract the required information and output the result row in the specified order. If SQLite does not find an appropriate index, execute the query and store each row in a temporary index. The keyword of the index is the field specified by order. Then SQLite returns and traverses the temporary index from start to end to output each row in the specified order.
For the group by clause, SQLite sorts the output rows according to the specified field. Compare each output row with the previous row to see if it belongs to the new group. The sorting of the Group by field is the same as that of the order by field. If an existing index exists, use it. If no existing index exists, create a temporary index.
The distinct keyword on the aggregate query creates a temporary index in a temporary file and stores each row of results in the index. Ignore a new result row if it already exists in a temporary index.
The Union operator of the composite query creates a temporary index in a temporary file and stores the results of the left and right subqueries in the index to ignore duplicate rows. After the two subqueries are executed, the temporary indexes are traversed from start to end to generate the final output.
The distinct t operator of the composite query creates a temporary index in a temporary file, stores the subquery results on the left to the temporary index, and then removes the results of the subquery on the right from the index, finally, traverse the temporary index from start to end to get the final output.
The distinct t operator of the composite query creates two independent Temporary indexes, which are located in two independent temporary files. Subqueries on the left and right are executed and placed in their temporary indexes. Then traverse the two indexes and output the results that exist in both indexes.
Note that the Union all operator of the composite query does not use temporary indexes. Of course, the subqueries on the left and right of Union all may use temporary indexes separately, depending on how they are composite.
(7) temporary database used by the VACUUM command
The vacuum command will first create a temporary file, then recreate the entire database and write it to the temporary file. Then copy the contents of the temporary file to the original database file and delete the temporary file. The temporary file created by the VACUUM command is not larger than the original database file.
2. sqlite_temp_store parameters and Pragma commands during compilation
Rollback logs, Master logs, and SQL statement log files are always written to the disk, however, other types of temporary files may be stored in the memory rather than written to the disk (which can reduce the number of I/O operations). Whether the files are written to the disk or stored in the memory depends on the sqlite_temp_store parameter during compilation, temp_store Pragma runtime command, and the size of the temporary file.
When sqlite_temp_store is compiled, the parameter is the macro definition (# define) in the source code. The value range is 0 to 3 (the default value is 1), as follows:
* When the value is equal to 0, temporary files are always stored on the disk, regardless of the settings of the temp_store Pragma command.
* When the value is equal to 1, temporary files are stored on disks by default, but this value can be overwritten by the temp_store Pragma command.
* When the value is equal to 2, temporary files are stored in the memory by default, but this value can be overwritten by the temp_store Pragma command.
* When the value is 3, temporary files are always stored in the memory, and the settings of the temp_store Pragma command are not considered.
The value range of the temp_store Pragma command is 0 to 2 (the default value is 0). When the program is running, the command can be dynamically set as follows:
* When the value is 0, the storage behavior of temporary files is completely determined by the sqlite_temp_store compilation period parameter.
* If the sqlite_temp_store parameter is equal to 1 and uses the memory to store temporary files, this command overwrites this line and uses disk storage. Otherwise, use sqlite_temp_store directly.
* If the sqlite_temp_store parameter is equal to 2 and uses the disk to store temporary files, this command overwrites this line and uses memory storage. Otherwise, use sqlite_temp_store directly.
To reiterate, the temp_store Pragma command during sqlite_temp_store compilation only affects temporary files except the rollback log and the master log. These two types of logs are always written to the disk.
For the preceding two parameters, the parameter values indicate that the values are stored in the memory by default. Only when the size of the temporary file exceeds a certain threshold will the algorithm be used, write part of the data to the disk to avoid the temporary file occupying too much memory and affecting the execution efficiency of other programs.
3. Optimization of other temporary files
SQLite uses the page cache buffer optimization Mechanism for the database pages currently read and written. Therefore, even if temporary files are stored on disks, only when the size of the file increases to a certain size (resulting in page cache filling) can SQLite refresh the file to the disk file, before which they will still reside in the memory. This means that in most scenarios, if the data volume of temporary tables and temporary indexes is relatively small (the page cache is sufficient to store them), they will not be written to the disk, of course, there will be no disk Io. They are refreshed to the disk files only when they are not supported by the memory.
Each temporary table and index has its own page cache. the maximum number of database pages that can be stored is determined by sqlite_default_temp_cache_size compile-time parameters, this parameter specifies the page cache occupied by temporary tables and indexes before being refreshed to the disk file. The default value of this parameter is 500 pages. This parameter value cannot be modified at runtime.
4. Memory Database
The content is organized from http://sqlite.org/inmemorydb.html.
In SQLite, databases are usually stored in disk files. However, in some cases, we can keep the database in the memory. The most common method is to specify the database file name parameter as ": Memory:" When calling sqlite3_open (), sqlite3_open16 () or sqlite3_open_v2 (), for example:
Rc = sqlite3_open (": Memory:", & dB );
After the above functions are called, no disk files are generated. Instead, a new database is successfully created in pure memory. Because the database is not persistent, the database disappears immediately after the current database connection is closed. Note that each: Memory: database is a different database, that is, two independent internal databases are created when two databases are connected by using the file name ": Memory.
The file name ": Memory:" can be used wherever the database file name is allowed. For example, it can be used in the attach command to append a memory database to the current connection like other common databases, for example:
Attach database ': Memory:' As aux1;
Note: When creating a memory database, you can only use the file name ": Memory:" And cannot contain other texts, such as "./: Memory:". A database based on disk files will be created. When using a file name in Uri format, you can also use ": Memory:", for example:
Rc = sqlite3_open ("file: Memory:", & dB );
Or attach database 'file: Memory: 'As aux1;
If the memory database is opened with a URI file name, it can use the shared cache. If the memory database is specified by the unmodified ": Memory" name, the database always has a private cache invisible to other connections. If the URI file name is used, the same memory database can be connected to two or more databases. For example:
Rc = sqlite3_open ("file: Memory :? Cache = shared ", & dB );
Or attach database 'file: Memory :? Cache = shared 'as aux1;
This allows multiple database connections to share the same memory database. Of course, the connections that share a memory database must be in the same process. When the last database connection is closed, the memory database is automatically deleted.
To use multiple different but shared memory databases in a process, you can add the mode = memory query parameter to the URI file name to create a named memory database:
Rc = sqlite3_open ("file: memdb1? Mode = memory & cache = shared ", & dB );
Or ttach database 'file: memdb1? Mode = memory & cache = shared 'As aux1;
The memory database named in this way will only share its cache with another connection with the same name.
5. Temporary database corresponding to the empty file name
When you call the sqlite3_open () function or execute the attach command, if the database file parameter is a Null String, a new temporary file will be created as the storage file of the temporary database, for example:
Rc = sqlite3_open ("", & dB );
Or attach database ''as aux2;
Different temporary files will be created each time. Similar to the memory database, the temporary databases created by the two connections are independent of each other. After the connection is closed, the temporary database will automatically disappear, its Storage files will also be deleted automatically.
Although the disk file is created to store data information in the temporary database, the temporary database will usually reside in the memory like the memory database. The only difference is that, when the data volume in the temporary database is too large, SQLite writes some data in the temporary database to the disk file to ensure that more memory is available for other operations, memory databases always store data in the memory.

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.