In SQLite, the lock and concurrency control mechanisms are handled by the PAGER.C module for the implementation of acid (Atomic, consistent, isolated, and durable) features. In a transaction that contains data modifications, the module ensures that all data modifications are committed, or all are rolled back. At the same time, the module also provides some memory cache functions for disk files.
In fact, the pager module does not care about the details of database storage, such as B-tree, encoding, indexing, etc. It simply treats it as a single file consisting of a uniform size (typically 1024 bytes) of data blocks, each of which is called a page. The starting number of the page is 1, that is, the first 1024 bytes of the database are called "Page 1", followed by the page number and so on. Pager communicates with the operating system through an OS interface module, such as OS_UNIX.C, OS_WIN.C.
1. Lock
From the point of view of a single process, a database file can have five different lock states:
(1) Unlocked: The file does not hold any locks, that is, the current database does not exist any read or write operations. Other processes can perform arbitrary read and write operations on the database. This state is the default state.
(2) SHARED: In this state, the database can be read but cannot be written. At the same time, any number of processes can hold shared locks on the same database, so the read operation is concurrent. In other words, an operation that has a database file write is no longer allowed to exist as long as one or more shared locks are active.
(3) RESERVED: If a process is going to write in the current database at some point in the future, but just read the data from the database, then we can simply understand that the database file now has a reserved lock. When a retention lock is active, only one or more shared locks exist in the database, that is, only one reserved lock and multiple shared locks can exist at the same time in the same database. In Oracle, such locks are called pre-write locks, but the granularity of the lock in Oracle can be refined to a table or even to a row, so the lock does not affect concurrency as much in Oracle as it does in SQLite.
(4) PENDING: The lock means that a process is going to write on the database, but there are many shared locks (read operations) in the database, then the write operation must be in the waiting state, that is, waiting for all shared locks to disappear, at the same time, the new read operation will no longer be allowed, To prevent the writing of the hunger phenomenon of locking occurs. During this wait, the lock status of the database file is pending, and after all shared locks have disappeared, the pending lock state's database file will enter the exclusive state after acquiring the exclusive lock.
(5) EXCLUSIVE: Before performing a write operation, the process must first obtain an exclusive lock on the database. However, once an exclusive lock is possessed, no other lock type can coexist with it. Therefore, in order to maximize concurrency efficiency, SQLite will minimize the total amount of time that an exclusive lock occupies.
2. Roll back the log
When a process modifies a database file (and is not in Wal mode), it first logs unchanged content to the rollback log file. The rollback log also records the initial size of the database for later rollback operations. If a transaction in SQLite is attempting to modify data in multiple databases (using the Attach command), then each database will generate a rollback log file of its own, which is used to record its own data changes separately. At the same time, a primary database log file is generated to coordinate multiple database operations, and the primary database log file does not contain the page data to be rolled back, it simply contains the file name of each database rollback log file. The file name information for the primary database log file is also included in each rollback log file. However, for rollback log files that do not require a primary database log file, the information for the primary database log file is also retained, but the value of this information is empty at this time.
We can treat the rollback log as a "hot" log file because it exists to restore the consistent state of the database. When a process is updating the database, the application or the OS crashes abruptly so that the update operation does not complete successfully, resulting in a hot log. So we can say that hot logs are generated only under exceptional conditions, and if everything goes well, the file will never exist.
In the absence of a primary database log, if a log has a non-0 header, and the associated database file does not have a reserved lock, it is hot. In the case of a primary database log, if a log's primary database log exists and there is no reserved lock on the associated database file, it is also hot. It is important to understand when a log is hot and to write the preceding rules in the following form:
* A log is hot, if
* It exists, and
* Its space size is greater than 512 bytes, and
* The log head is non-zero, well structured, and
* Its primary database log exists, or the primary database file name is an empty string, and
* There is no reserved lock in the relevant database file.
Before reading a database, SQLite always checks to see if it has a hot log. If so, the rollback is performed before the database is read to ensure that the database state is consistent. When a process wants to read a database, the following steps must be completed:
(1) Open the database file and get a shared lock. If a shared lock cannot be acquired, it fails immediately and returns SQLITE_BUSY.
(2) Check whether the database files have hot logs, if not, then the work is completed, immediately return. If so, the log must be rolled back according to the following algorithm steps.
(3) Obtain a wait lock on the database file, and then acquire an exclusive lock (be careful not to get a retention lock, as this will make other processes think the log is no longer hot). If the acquisition fails, it means that another process is trying to do a rollback operation. You can only release all locks, close the database, and return to Sqlite_busy.
(4) Read the log file and roll back the previous modifications.
(5) Wait for rollback writes to the persistent storage device to restore the integrity of the database.
(6) Delete the log file (or if the pragma journal_mode=truncate instruction is set, the log is shortened to 0 bytes, if the pragma journal_mode=persist instruction is set, the log head is cleared 0).
(7) Delete the primary database log if it is safe to do so. This step is optional, just to prevent the stale primary database log files from filling up the disk.
(8) Release the exclusive lock and wait for the lock, but still keep the shared lock.
After these algorithm steps are successfully completed, the database can be read safely. Once all reads are complete, the shared lock is released.
An outdated primary database log is no longer useful, and it is removed only to free up disk space. A primary database log is out-of-date if there is no separate log file pointing to it. To determine whether a primary database log is out of date, SQLite first reads the primary database log file to get all the log filenames. Then check these log files to see if there is a primary database log file name field that points to the primary database log, and if so, the primary database file is not expired, otherwise the primary database file expires.
3. Data Write
If a process wants to perform a write operation on the database, it must obtain a shared lock (if there is a hot log, roll back the outstanding changes) as described earlier, and then acquire the retention lock after the shared lock is acquired. Because a reserved lock indicates that the process will perform a write operation at some point in the future, only one process can hold a reserved lock at the same time, but other processes can continue to hold the shared lock to complete the data read operation. If the process to perform a write operation cannot acquire a retention lock, it means that another process has acquired a retention lock. In this case, the write operation fails and immediately returns a sqlite_busy error. After a retention lock is successfully acquired, the write process creates a rollback log. The header of the log is initialized to the original size of the database file. The log header also has a field for the primary database log file name, initially an empty string.
Before any changes are made to any data, the write process writes the original content in the page to be modified to the rollback log file, but the page that is about to change is not written directly to the disk file at first, but remains in memory. The database is still unmodified, and other processes can continue to read the data in that database.
Either because the in-memory cache is full, or if the application has committed a transaction, the write process eventually updates the data to the database file. Until then, however, the write process must ensure that no other process is reading the database, and that the data in the rollback log is actually physically written to the disk file (so that it can be used for rollback when the system crashes or loses power). The steps are as follows:
(1) Ensure that all rollback log data is physically written to the disk file so that the database can be restored to a consistent state in the event of a system crash.
(2) To obtain a wait lock on the database file, and then acquire an exclusive lock, if the other process still holds a shared lock, the write thread will have to be suspended and wait until those shared locks disappear before the exclusive lock can be obtained.
(3) Write the modified page that is held in memory to the original disk file.
If the reason for writing to the database file is because the cache is full, the write process will not commit immediately, but continue to modify the other pages. However, the rollback log must be flushed to disk again before subsequent modifications are written to the database file. Also note that the exclusive lock acquired by the write process must be held until all changes are committed. This means that from the first time the data is flushed to the disk file, other processes cannot access the database until the transaction is committed.
When the write process prepares to commit changes, the following steps are performed:
(4) Obtain an exclusive lock, and through step 1-3 above ensure that all in-memory change data is written to the disk file.
(5) Physically write all modifications of the database file to disk.
(6) Delete the log file (or if pragma journal_mode is truncate or persist, truncate the log file or clear the header 0). If a system failure occurs before the deletion, the process will still recover based on the hot log the next time the database is opened. Therefore, only after the log file has been successfully deleted can we assume that the transaction completed successfully.
(7) Remove all exclusive and pending locks from the database file.
Once the pending lock is released, other processes can begin to read the database again. In the current implementation, the retention lock is also freed, but this is not required. Future versions of SQLite may provide a SQL command "CHECKPOINT", which submits all changes made by the current transaction, but holds a retention lock so that more changes can be made without the opportunity to write data to any other process.
If a transaction contains multiple database modifications, its commit logic will be more complex, as shown in the following steps:
(4) Ensure that each database file already holds an exclusive lock and a valid log file.
(5) Create the primary database log file whose file name is random. The file name of the rollback log file for each database is also written to the primary database log file and flushed to disk.
(6) The file name of the primary database log file is then written to each database at the specified location of the rollback log file and flushed to disk.
(7) Persist all database changes to the database disk file.
(8) Delete the primary log file, if a system failure occurs before the deletion, the process will still be recovering based on the hot log the next time the database is opened. Therefore, only after the primary log file has been successfully deleted can we assume that the transaction completed successfully.
(9) Delete the respective log files for each database.
(10) Remove exclusive and pending locks from all databases.
Finally, in SQLite2, if more than one process is reading from the database, that is, the database always has a read operation, that is, at every moment the database holds at least one shared lock, which will cause no process to perform write operations, Because a write lock cannot be obtained when the database holds a read lock, we call this situation "write hunger". In SQLite3, the use of pending locks effectively avoids the "write-hungry" situation. When a process holds a pending lock, the already existing read operation can continue until it ends normally, but the new read operation will no longer be accepted by SQLite, so after all the existing read operations are completed, The process that holds the pending lock can be activated and an attempt is made to further acquire an exclusive lock to complete the data modification operation.
4. How the database file is damaged
Pager modules are very robust, but sometimes they can be destroyed. If a rogue process opens a database file or log and writes useless data, the database will be corrupted. There is no need for more discussion in this case.
On UNIX, SQLite uses POSIX-recommended locks to implement the lock function. The Lockfile (), LockFileEx (), and Unlockfile () system calls are used on Windows. SQLite assumes that these system calls work correctly, or the database may be corrupted. It is important to note that POSIX-recommended locks are simpler, but not even implemented on many NFS (including the current Max OS x version), and there are many reports that the network file system under Windows has a lock problem, so you'd better avoid using SQLite on a network file system.
Under UNIX, SQLite uses the Fsync () system to flush data to disk, while Windows uses FlushFileBuffers (). Again, SQLite assumes that these operating system service functions are working correctly. But there are reports that Fsync () and flushfilebuffers () do not always work correctly, especially on inexpensive IDE hard drives. Some IDE hard disk manufacturer's controller chip report data has been written to the disk surface, but in fact the data is still in the hard drive circuit of the volatile cache. There are also reports that Windows sometimes ignore flushfilebuffers () for some uncertain reasons. If these reports are true, it means that database corruption is possible due to power outages. SQLite does not prevent hardware and OS vulnerabilities.
If the Linux ext3 file system is mounted without the "barrier=1" option in/etc/fstab and the disk-driven write cache is active, file system corruption can occur when the power-down or OS crashes, especially for inexpensive consumer-grade drives. Enterprise-class storage devices with non-volatile write caches are much less likely to have file system losses. It is said that many Linux distributions do not use the barrier=1 option and do not disable write caching, so many Linux distributions are vulnerable to this problem. Note that this is an operating system and hardware problem, SQLite is powerless, other database engine also has this problem.
If a crash or power outage occurs, a hot log is generated, but the hot log is deleted. The next process opens the database without knowing that the database needs to be rolled back and the database is in an inconsistent state. There are a number of reasons why the rollback log is deleted:
(1) The system administrator may do cleanup after the OS crashes or the system has power down, seeing that the log file considers it to be garbage and deleted.
(2) Someone (or a process) may rename the database file, but it does not have to name the associated log.
(3) If the database file has an alias (hard link or soft link) and the database file is opened by a link alias, the resulting log file will be named with the link name, and the log will not be found if another link name is used the next time the database is opened. To avoid this problem, you should not create a link to the SQLite database file.
(4) File system corruption caused by power failure may cause the log to be renamed or deleted.
When SQLite creates a log file on Unix, it opens the directory where the log file resides, and calls Fsync () to attempt to write directory information to disk. However, if another process is adding or removing unrelated files from the directory, a burst power outage can cause the log files to be removed from the directory and moved to "Lost+found". This is a rare scenario, but it can happen. The best way to avoid this situation is to use the log file system.
For transactional commits that involve multiple databases and one primary database log, if the databases are on different disk volumes, a power outage occurs when the transaction commits, the disk may be mounted with a different name when the machine is back up, or some disks are not mounted at all. In such cases, each log file and the primary database log file may not find each other, and the worst-case result is that the commit becomes no longer atomic. Some databases may be rolled back, while others are not rolled back. To avoid this problem, we should store all the databases on a disk volume and use the same name to mount the hard drive after the power outage.
5, SQL-level transaction control
sqlite 3 Some fine-grained changes are made to the implementation of locks and concurrency control, especially for transactions, a feature of the SQL language level. By default, SQLite 3 puts all SQL operations into Antocommit mode, so that all modifications to the database are automatically committed after the execution of the SQL command finishes. In SQLite, the SQL command "BEGIN TRANSACTION" (where the TRANSACTION keyword is optional) is used to explicitly declare a transaction, disabling autocommit mode, where subsequent SQL statements are not automatically committed after execution. Instead, you need to wait until the SQL command "commit" or "ROLLBACK" is executed to consider committing or rolling back. Note that the BEGIN command does not acquire any type of lock, and after begin, a shared lock is obtained when the first SELECT statement is executed, and a retention lock is obtained when the first DML statement (INSERT, update, or delete) is executed. In the case of exclusive locks, only when the data is written to disk from memory can it hold an exclusive lock until the transaction commits or rolls back. The
sql command commit does not actually commit changes to the disk, it just re-opens autocommit mode. Then, at the end of the command, the formal autocommit logic actually commits the changes to the disk. SQL command rollback also opens autocommit mode, but it sets a flag to tell the autocommit logic to perform a rollback instead of committing. If the autocommit logic commit change fails because another process holds the shared lock, the autocommit mode is automatically closed. This allows the user to re-commit after the shared lock is released.
If multiple SQL commands are executed in the same database connection at the same time, autocommit will be deferred until the last command is completed. For example, if a SELECT statement is being executed, all retrieved row records need to be returned during the execution of the command, if the thread processing the result set at this time is temporarily suspended due to the need for the business logic and is waiting, while the other thread may be performing an insert on the database at this time, Update or delete command, all data modifications made by these commands must wait until the select search is complete before being committed.
SQLite profiling locks and concurrency control