SQLite Atom Submission Principle

Last Update:2016-09-17 Source: Internet

Author: User

Tags sqlite

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

English address
Article reference

Brief introduction
An important feature of a database system that supports transactions, such as SQLite, is atomic commit ( atomic commit ). That is, the write to the database in one transaction is either all executed or not executed. Appears to occur instantaneously when writing to different parts of the database.
In fact, changes to the contents of the disk take a while, and the write operation cannot occur instantaneously. For this reason, SQLite has a set of logic to ensure the atomicity of transactional operations, even if the system crash or power-down does not destroy atomicity.
This article describes the techniques and strategies for ensuring atomic manipulation, which apply only to rollback mode . If the database is running under Wal mode, the policy differs from this article.
Assumptions about the hardware
2. The minimum unit of hard disk write is sector (sector).
  You cannot modify data that is less than one sector. If necessary, you should read the entire sector, modify the part, and then write the entire sector.
  The size of the sector before 3.3.14, write dead in code is 512 bytes. With the development of hardware, the size of the sector has grown to 4k bytes. Thus, in the version that began after 3.3.15, a function and file system were provided to deal with the size of the sector. However, because the size of the file sector is not returned in UNIX and Windows systems, this function still returns 512 bytes. However, this function can work in an embedded system.
4. Write operations on sectors are not atomic, but are linear.
  The linear meaning here is that when you start a write operation, it starts at one end of the sector and one bit at a time until the other end of the sector. The direction of the write operation can be from the beginning of the sector to the end, or from the end of the sector to the beginning. If the system is out of power during the write operation, the sector is partially changed and some remains unchanged.
  The key to the SQLite hypothesis is that if a portion of the sector changes, the start or end of the sector is bound to change.
  In later versions of 3.5.0, a new VFS (virtual file system) interface was added. VFS is the only interface between SQLite and file system interaction. SQLite provides a default VFS implementation for UNIX and Windows, and allows users to implement a custom VFS implementation at run time.
  In the VFS interface, there is a function called xDeviceCharacteristics . This function interacts with the file system and provides some features of the file system, such as whether the sector write operation is atomic. If this sector writes atoms, then SQLite takes advantage of these features. However, UNIX and Windows default xDeviceCharacteristics functions do not provide this information.
6. The operating system buffers the file write operations.
  Therefore, when a write request is returned, the data is not actually written to the database file. In addition, it is assumed that the operating system reorder the write operation.
  Therefore, SQLite invokes or operates at the key point flush fsync . SQLite assumes that this operation does not return until the data is written to the file.
  However, there are some versions of Windows and UNIX versions fluse or fsync operations that are not like this. This way, a power-down occurs during commit, which causes the database file to be corrupted.
8. The change in file size occurs before the content changes.
  This means that the size of the file changes first, so that the file contains some junk data and then writes the data to the file.
  After the file size is written, power is dropped before the data is written. SQLite does some other work to ensure that this situation does not cause database file corruption.
  If the VFS xDeviceCharacteristics method determines that the data has been written to the file before changing the file size, SQLite will take advantage of this feature. However, the default implementation does not confirm this feature.
10. The deletion of the file is atomic
  The file is dropped during the removal process, then the file is either completely deleted or completely deleted after the reboot.
  If the file is only partially deleted after a reboot, the database will be corrupted.
12. Powersafe Overwrite when a program writes data to a file, the data outside the written range is not changed, even if a crash or power-down occurs.
  Hypothetical case: If the write operation only occurred in the first few bytes of the sector. Because the minimum unit of write operation is sector. After writing the first few bytes after the power down, restart, the data in this sector to verify, found that the wrong, will be used full 0 or all 1 to cover. This modifies the data outside of the write operation.
  Modern disks can detect power-down and then use the remaining power to write down the sector's data.
Single File commit process
2. Initial state
  The middle part is the disk buffer of the system.
  ?
4. Get read lock
  You need to obtain a read lock before the write operation to get the basic data of the database in order to parse the SQL statement.
  Note shared locks are only for the system's disk cache, not for disk files. The file lock is actually some flag of the system kernel. After the system crash or power-down, the lock will fail. Usually the process of creating a lock exits also causes the lock to fail.
  ?
6. Read data from the database
  Read the system disk cache first and then read the user space. If the cache is hit, the user space is read directly from the disk cache.
  
  Note that the entire database is not read into memory.
  ?
8. Get reserved lock
  Before making changes to the database, get reserved lock first. Other process operations that have shared read locks are allowed, but only one database file has to be given to reserved lock. Ensures that only one process at a time can perform database write operations.
  ?
10. To create a rollback log file
  The log file is the page in the database file that you want to change.
  Also included is a header that records the size of the original database file. The number of the page is written to the page of each database.
  Note that the file is not written to disk at this time.
  ?
12. Changing the page of the database in user space
  Each database link has its own copy of the database file. As a result, other database connections can still be read normally.
  ?
14. To swipe a log file into disk
  In most systems, two flush or fsync operations are required. The first time the file data is brushed into a file, the second is used to change the page number of the log file in the header, and the header is brushed into the file.
  ?
16. Get exclusive lock
  Get the exclusive lock in two steps. First, a pending lock is obtained to ensure that no new writes and reads are done. Then wait for the other read process to end, release the read lock, and finally get the exclusive lock.
  ?
18. Writing data from user space to a database file
  At this point, you can determine that no other database connection is being read from the database. This step is usually only written to the disk cache and is not written to the database file.
  ?
20. Write changes to the disk file
  Call the Fsync or flush operation. This section and the write log file to disk occupy the most time in a transaction.
  ?
22. Delete log files
  
  SQLite gives the appearance of have made no changes to the database file or have made the complete set of changes to T He database file depending on whether or not the rollback journal file exists.
  
  Deleting a log file is not atomic, but from the user's perspective, the operation is atomic. Ask the operating system if the file exists, answer yes or No.
  In some systems, it is a time-consuming operation to delete a file. SQLite can be configured to change the size of the file to 0 or overwrite the header of the log file with zero. In either case, the log file is not recoverable, so SQLite thinks the commit is complete.
  ?
24. Release lock
  In this picture, the database contents of the user space have been emptied. In the latest version, optimizations have been made. In the first page of the database, a counter is maintained, and each write operation adds one to the counter. If the counter is not changed, the database connection can reuse the database content in the user space.
  ?
Rolling back
Because a commit operation takes time. In this process, if a crash or power-down occurs, a rollback is required to ensure that the completion of the database transaction is "instantaneous". Use database log files to roll back to this database before transactions occur.
1. The initial condition assumes that a power outage occurred at step 10th. After the reboot, the database file was actually written only 1 and a half page, but we have the complete journal file.
2. Hot rollback journal when a new database connection is established, an attempt is made to acquire a shared read lock and notice that there is a rollback log file. Next, the database connection will verify that the database file is "Hot journal". A "Hot journal" is generated when a transaction occurs when a commit is dropped or crash. The judging criteria are as follows:
  - Roll back log file exists
  - Rollback log file is not an empty file
  - Reserved lock does not exist in the database file (will be lost after power-down)
  - The header format of the log file has not been compromised
  - The log file does not contain the primary log file name (user multiple file submissions) or contains the main log file, the main log file exists with the log file, we can restore the database.
3. The get exclusive lock is used to prevent other processes from rolling back the database with this log file at the same time.
4. Rollback of an incomplete change reads the log file from the disk file into memory and then writes to the database. The log file header stores the size information of the original database. This information is used to truncate the database if the original operation causes the database file to become larger. After this step, the size and content of the database are the same as before the transaction occurred.
5. Deleting hot journal may also be sized to 0, or the header of the file may be overwritten with 0. Anyway, it's not hot journal.
6. Continue with other operations
  The database file is now back to normal and can be used normally.
Multiple file Submissions
Important details of the commit process
Optimization
Performance analysis indicates that SQLite spends most of its time on disk IO. Therefore, if you can reduce disk IO, you can improve the performance of SQLite. Here are some methods that SQLite uses to improve performance in the context of guaranteed transactional atomicity.
2. Cache between transactions
  In older versions of SQLite, the content of SQLite is removed from the user space after the end of the transaction. The reason for this is because other operations can alter the contents of the database. The next time you read the same content, you still need to read the data from the disk cache or disk to the user space.
  After the new version (3.3.14), the database cache within the user control is preserved. At the same time, the counters are maintained in the header of the database (24 to 27 bytes), and each change adds one. The next time the process reads the database, it is only necessary to determine whether the counter has changed, and if not, then use the cache.
4. Exclusive access Mode
  Added after the 3.3.14 version, that is, the database can only be accessed by one process (for iOS). In this mode, there are the following advantages.
  1. You do not have to change the counters in the database header after the transaction ends. Reduce one file write for log files and database master files.
  2. You do not have to detect counters in the header at the beginning and end of a transaction, and you do not have to empty the cache.
  3. After the transaction ends, you can overwrite the log file header method instead of deleting the log file. Some file operations are reduced, such as changing the directory entries of the database files, releasing the disk sectors corresponding to the log files, and so on.
6. Do not record free pages in log files (added after 3.5.0)
  When you delete information from a database, the page that originally recorded the deleted content is counted into the blank list (freelist). When there are new actions to follow, the data is taken from the blank list, not the database file.
  Some free pages contain important information, such as the location of other free pages. But most of the free pages do not contain useful information, called leaves (leaf) free pages (I understand that free pages are stored in trees, and free pages are leaf nodes).
  Leaf free pages are unimportant, so SQLite avoids writing free pages to log files, which can greatly reduce the number of IO
8. Single-page updates and atomic sector writes
  Modern disks generally guarantee that the write to a single sector is atomic. When power is lost, the disk can use the power in the capacitor or the rotational angular momentum of the disk to complete the write operation of the current sector.
  If the write of the sector is atomic, the size of the database page is consistent with the size of the sector, and the database writes only one page, then the database skips all log and refresh operations and writes the changed content directly to the database file.
  The change counter on the first page of the database is modified individually because there is no effect on the database, even if the power is dropped before the counter is updated.
10. Secure Append (safe Append) file system (3.5.0 added later)
  SQLite assumes that when appending data to a file, the size of the file changes first, and then the content changes. This power-down will cause the log file to contain junk data.
  If the file system supports file size update before the content of the file must have been updated, then after the power-down or system crash, the log file will not have garbage data.
  To support this file system, SQLite stores 1 in the header of the log file where the page number is stored. SQLite uses the size of the file to calculate the number of page in the file.
  When commit, we save a flush or fsync operation. In addition, when the cache overflows, you do not have to write the new page number to the database log file
12. Persistent log files
  That is, the log files are not deleted at the end of the database transaction, which can save the file deletion and the creation of the file once.
  Enable methodPRAGMA journal_mode=PERSIST;
  This can cause a database log file to exist all the time. You can also set mode to truncate,PRAGMA journal_mode=TRUNCATE;
  Persist is the header of the log file is set to 0, the subsequent operation of the database file is overwritten. Truncate is to set the size of the log file to 0, do not need to call the Fsync operation, the operation of the database file is append. In embedded systems with synchronized file systems, append operations are slower than overrite, so truncate results in slower behavior than persist.
Test commit behavior Atomicity
possibility of causing database corruption
1. Incorrect lock implementation
  In the network operating system, it is very difficult to implement the lock mechanism. Therefore, try not to use SQLite in the network operating system.
  Errors can also occur when different locking mechanisms are used to obtain the same file, and the two locking mechanisms are not mutually exclusive.
2. Incomplete disk Refresh
  The Fsync () system call on UNIX or the FlushFileBuffers () call on Windows is not working properly.
3. Some files are deleted
  SQLite assumes that the deletion of the file is atomic. If SQLite deleted files are partially recovered after a power-down reboot, a failure occurs.
4. Be written to garbage data
  Other programs can write junk data to sqlite files.
  The operating system bug.
5. Delete or rename hot journal
Summary and the way forward

SQLite Atom Submission Principle

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More