SQLite Atom Submission Principle

Source: Internet
Author: User
Tags sqlite

English address
Article reference

  1. Brief introduction
    An important feature of a database system that supports transactions, such as SQLite, is atomic commit ( atomic commit ). That is, the write to the database in one transaction is either all executed or not executed. Appears to occur instantaneously when writing to different parts of the database.
    In fact, changes to the contents of the disk take a while, and the write operation cannot occur instantaneously. For this reason, SQLite has a set of logic to ensure the atomicity of transactional operations, even if the system crash or power-down does not destroy atomicity.
    This article describes the techniques and strategies for ensuring atomic manipulation, which apply only to rollback mode . If the database is running under Wal mode, the policy differs from this article.
  2. Assumptions about the hardware
    1. The minimum unit of hard disk write is sector (sector).
      You cannot modify data that is less than one sector. If necessary, you should read the entire sector, modify the part, and then write the entire sector.
      The size of the sector before 3.3.14, write dead in code is 512 bytes. With the development of hardware, the size of the sector has grown to 4k bytes. Thus, in the version that began after 3.3.15, a function and file system were provided to deal with the size of the sector. However, because the size of the file sector is not returned in UNIX and Windows systems, this function still returns 512 bytes. However, this function can work in an embedded system.
    2. Write operations on sectors are not atomic, but are linear.
      The linear meaning here is that when you start a write operation, it starts at one end of the sector and one bit at a time until the other end of the sector. The direction of the write operation can be from the beginning of the sector to the end, or from the end of the sector to the beginning. If the system is out of power during the write operation, the sector is partially changed and some remains unchanged.
      The key to the SQLite hypothesis is that if a portion of the sector changes, the start or end of the sector is bound to change.
      In later versions of 3.5.0, a new VFS (virtual file system) interface was added. VFS is the only interface between SQLite and file system interaction. SQLite provides a default VFS implementation for UNIX and Windows, and allows users to implement a custom VFS implementation at run time.
      In the VFS interface, there is a function called xDeviceCharacteristics . This function interacts with the file system and provides some features of the file system, such as whether the sector write operation is atomic. If this sector writes atoms, then SQLite takes advantage of these features. However, UNIX and Windows default xDeviceCharacteristics functions do not provide this information.
    3. The operating system buffers the file write operations.
      Therefore, when a write request is returned, the data is not actually written to the database file. In addition, it is assumed that the operating system reorder the write operation.
      Therefore, SQLite invokes or operates at the key point flush fsync . SQLite assumes that this operation does not return until the data is written to the file.
      However, there are some versions of Windows and UNIX versions fluse or fsync operations that are not like this. This way, a power-down occurs during commit, which causes the database file to be corrupted.
    4. The change in file size occurs before the content changes.
      This means that the size of the file changes first, so that the file contains some junk data and then writes the data to the file.
      After the file size is written, power is dropped before the data is written. SQLite does some other work to ensure that this situation does not cause database file corruption.
      If the VFS xDeviceCharacteristics method determines that the data has been written to the file before changing the file size, SQLite will take advantage of this feature. However, the default implementation does not confirm this feature.
    5. The deletion of the file is atomic
      The file is dropped during the removal process, then the file is either completely deleted or completely deleted after the reboot.
      If the file is only partially deleted after a reboot, the database will be corrupted.
    6. Powersafe Overwrite when a program writes data to a file, the data outside the written range is not changed, even if a crash or power-down occurs.
      Hypothetical case: If the write operation only occurred in the first few bytes of the sector. Because the minimum unit of write operation is sector. After writing the first few bytes after the power down, restart, the data in this sector to verify, found that the wrong, will be used full 0 or all 1 to cover. This modifies the data outside of the write operation.
      Modern disks can detect power-down and then use the remaining power to write down the sector's data.
  3. Single File commit process

    1. Initial state
      The middle part is the disk buffer of the system.
      ?

    2. Get read lock
      You need to obtain a read lock before the write operation to get the basic data of the database in order to parse the SQL statement.
      Note shared locks are only for the system's disk cache, not for disk files. The file lock is actually some flag of the system kernel. After the system crash or power-down, the lock will fail. Usually the process of creating a lock exits also causes the lock to fail.
      ?

    3. Read data from the database
      Read the system disk cache first and then read the user space. If the cache is hit, the user space is read directly from the disk cache.

      Note that the entire database is not read into memory.
      ?

    4. Get reserved lock
      Before making changes to the database, get reserved lock first. Other process operations that have shared read locks are allowed, but only one database file has to be given to reserved lock. Ensures that only one process at a time can perform database write operations.
      ?

    5. To create a rollback log file
      The log file is the page in the database file that you want to change.
      Also included is a header that records the size of the original database file. The number of the page is written to the page of each database.
      Note that the file is not written to disk at this time.
      ?

    6. Changing the page of the database in user space
      Each database link has its own copy of the database file. As a result, other database connections can still be read normally.
      ?

    7. To swipe a log file into disk
      In most systems, two flush or fsync operations are required. The first time the file data is brushed into a file, the second is used to change the page number of the log file in the header, and the header is brushed into the file.
      ?

    8. Get exclusive lock
      Get the exclusive lock in two steps. First, a pending lock is obtained to ensure that no new writes and reads are done. Then wait for the other read process to end, release the read lock, and finally get the exclusive lock.
      ?

    9. Writing data from user space to a database file
      At this point, you can determine that no other database connection is being read from the database. This step is usually only written to the disk cache and is not written to the database file.
      ?

    10. Write changes to the disk file
      Call the Fsync or flush operation. This section and the write log file to disk occupy the most time in a transaction.
      ?

    11. Delete log files

      SQLite gives the appearance of have made no changes to the database file or have made the complete set of changes to T He database file depending on whether or not the rollback journal file exists.

      Deleting a log file is not atomic, but from the user's perspective, the operation is atomic. Ask the operating system if the file exists, answer yes or No.
      In some systems, it is a time-consuming operation to delete a file. SQLite can be configured to change the size of the file to 0 or overwrite the header of the log file with zero. In either case, the log file is not recoverable, so SQLite thinks the commit is complete.
      ?

    12. Release lock
      In this picture, the database contents of the user space have been emptied. In the latest version, optimizations have been made. In the first page of the database, a counter is maintained, and each write operation adds one to the counter. If the counter is not changed, the database connection can reuse the database content in the user space.
      ?

  4. Rolling back
    Because a commit operation takes time. In this process, if a crash or power-down occurs, a rollback is required to ensure that the completion of the database transaction is "instantaneous". Use database log files to roll back to this database before transactions occur.

    1. The initial condition assumes that a power outage occurred at step 10th. After the reboot, the database file was actually written only 1 and a half page, but we have the complete journal file.
    2. Hot rollback journal when a new database connection is established, an attempt is made to acquire a shared read lock and notice that there is a rollback log file. Next, the database connection will verify that the database file is "Hot journal". A "Hot journal" is generated when a transaction occurs when a commit is dropped or crash. The judging criteria are as follows:
      • Roll back log file exists
      • Rollback log file is not an empty file
      • Reserved lock does not exist in the database file (will be lost after power-down)
      • The header format of the log file has not been compromised
      • The log file does not contain the primary log file name (user multiple file submissions) or contains the main log file, the main log file exists with the log file, we can restore the database.
    3. The get exclusive lock is used to prevent other processes from rolling back the database with this log file at the same time.
    4. Rollback of an incomplete change reads the log file from the disk file into memory and then writes to the database. The log file header stores the size information of the original database. This information is used to truncate the database if the original operation causes the database file to become larger. After this step, the size and content of the database are the same as before the transaction occurred.
    5. Deleting hot journal may also be sized to 0, or the header of the file may be overwritten with 0. Anyway, it's not hot journal.
    6. Continue with other operations
      The database file is now back to normal and can be used normally.
  5. Multiple file Submissions

  6. Important details of the commit process

      1. always records the entire sector
        if the page size is 1k, the size of the sector is 4k. In order to change the data of a page, the entire sector data must be counted into the log file, and the entire sector must be written to the database file when the data is written.
      2. Garbage data when writing log files
        when appending data to a database log file, SQLite assumes that the size of the database log file is larger before the data is written. If a power-down occurs between these two steps, then the log file is left with garbage data. If you use this log file for recovery, overwrite the correct content in the original database.
        SQLite uses two strategies to address this situation.
        1. The number of page entries in the log file's header,
          writes the page data of the log file to the header, and the initial value is 0. Therefore, when you roll back with an incomplete log file, you will find that the head is 0 and nothing will be done.
          before commit, the contents of the log file are brushed onto the disk and guaranteed to be free of junk data. At this point, the number of page in the log file is then brushed again into the disk. The log file header and page in the file are not in the same sector, so the page in the log file is not destroyed even if the power is dropped.
          The above-mentioned situation only occurs when "synchronous pragma" is full. If it is normal, then the number of page and page content will be synchronized to the disk file. Even if the contents of the page are first brushed into the code, and then the number of page is brushed, as the system changes the order of operations, it is possible that the number of page writes is correctly written to the disk, and the contents of the page are not written correctly to the disk.
        2. each page uses checksums
          SQLite prepares a 32bit checksum on each page. If a page checksum is not met, the entire rollback process is not performed.
          If synchronous pragma is full, then theoretically no checksum is required. However, there are no side effects, so no matter what the synchronous pragma is, there is a checksum in the log file.
      3. pre-commit cache overflow
        If the modified content has exceeded the cache of user space before committing, you must first write the completed operation to the database file before you do anything else. The
        Cache overflow promotes reserved locks to exclusive locks, thereby reducing concurrency. There are also additional flush or fsync operations, which are time-consuming to operate. Try to avoid cache overflow.
  7. Optimization
    Performance analysis indicates that SQLite spends most of its time on disk IO. Therefore, if you can reduce disk IO, you can improve the performance of SQLite. Here are some methods that SQLite uses to improve performance in the context of guaranteed transactional atomicity.

    1. Cache between transactions
      In older versions of SQLite, the content of SQLite is removed from the user space after the end of the transaction. The reason for this is because other operations can alter the contents of the database. The next time you read the same content, you still need to read the data from the disk cache or disk to the user space.
      After the new version (3.3.14), the database cache within the user control is preserved. At the same time, the counters are maintained in the header of the database (24 to 27 bytes), and each change adds one. The next time the process reads the database, it is only necessary to determine whether the counter has changed, and if not, then use the cache.
    2. Exclusive access Mode
      Added after the 3.3.14 version, that is, the database can only be accessed by one process (for iOS). In this mode, there are the following advantages.
      1. You do not have to change the counters in the database header after the transaction ends. Reduce one file write for log files and database master files.
      2. You do not have to detect counters in the header at the beginning and end of a transaction, and you do not have to empty the cache.
      3. After the transaction ends, you can overwrite the log file header method instead of deleting the log file. Some file operations are reduced, such as changing the directory entries of the database files, releasing the disk sectors corresponding to the log files, and so on.
    3. Do not record free pages in log files (added after 3.5.0)
      When you delete information from a database, the page that originally recorded the deleted content is counted into the blank list (freelist). When there are new actions to follow, the data is taken from the blank list, not the database file.
      Some free pages contain important information, such as the location of other free pages. But most of the free pages do not contain useful information, called leaves (leaf) free pages (I understand that free pages are stored in trees, and free pages are leaf nodes).
      Leaf free pages are unimportant, so SQLite avoids writing free pages to log files, which can greatly reduce the number of IO
    4. Single-page updates and atomic sector writes
      Modern disks generally guarantee that the write to a single sector is atomic. When power is lost, the disk can use the power in the capacitor or the rotational angular momentum of the disk to complete the write operation of the current sector.
      If the write of the sector is atomic, the size of the database page is consistent with the size of the sector, and the database writes only one page, then the database skips all log and refresh operations and writes the changed content directly to the database file.
      The change counter on the first page of the database is modified individually because there is no effect on the database, even if the power is dropped before the counter is updated.
    5. Secure Append (safe Append) file system (3.5.0 added later)
      SQLite assumes that when appending data to a file, the size of the file changes first, and then the content changes. This power-down will cause the log file to contain junk data.
      If the file system supports file size update before the content of the file must have been updated, then after the power-down or system crash, the log file will not have garbage data.
      To support this file system, SQLite stores 1 in the header of the log file where the page number is stored. SQLite uses the size of the file to calculate the number of page in the file.
      When commit, we save a flush or fsync operation. In addition, when the cache overflows, you do not have to write the new page number to the database log file
    6. Persistent log files
      That is, the log files are not deleted at the end of the database transaction, which can save the file deletion and the creation of the file once.
      Enable methodPRAGMA journal_mode=PERSIST;
      This can cause a database log file to exist all the time. You can also set mode to truncate,PRAGMA journal_mode=TRUNCATE;
      Persist is the header of the log file is set to 0, the subsequent operation of the database file is overwritten. Truncate is to set the size of the log file to 0, do not need to call the Fsync operation, the operation of the database file is append. In embedded systems with synchronized file systems, append operations are slower than overrite, so truncate results in slower behavior than persist.
  8. Test commit behavior Atomicity

  9. possibility of causing database corruption

    1. Incorrect lock implementation
      In the network operating system, it is very difficult to implement the lock mechanism. Therefore, try not to use SQLite in the network operating system.
      Errors can also occur when different locking mechanisms are used to obtain the same file, and the two locking mechanisms are not mutually exclusive.
    2. Incomplete disk Refresh
      The Fsync () system call on UNIX or the FlushFileBuffers () call on Windows is not working properly.
    3. Some files are deleted
      SQLite assumes that the deletion of the file is atomic. If SQLite deleted files are partially recovered after a power-down reboot, a failure occurs.
    4. Be written to garbage data
      Other programs can write junk data to sqlite files.
      The operating system bug.
    5. Delete or rename hot journal
  10. Summary and the way forward

SQLite Atom Submission Principle

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.