SQLite Anatomy of asynchronous IO mode, shared cache mode, and lock notification

Source: Internet
Author: User
Tags mutex sqlite sqlite database wrapper


1. Asynchronous I/O mode
Typically, when SQLite writes a database file, it waits until the write operation is complete and then controls the return to the calling program. Writing a file system is very time-consuming compared to CPU operation, which is a performance bottleneck. The asynchronous I/O backend is an extension of SQLite that allows SQLite to use a separate background thread to perform all write requests. While this does not reduce the resource consumption of the entire system (CPU, disk bandwidth, etc.), it allows SQLite to return to the caller as soon as the database is being written, which, from a user's point of view, increases the responsiveness of the front end. For asynchronous I/O, write requests are handled in a separate background thread, which means that the thread that initiates the database write does not have to wait for disk I/O to occur. The write operation seems to have happened very quickly, but in fact the speed is the same as usual, just in the background.
Asynchronous I/O seems to provide better responsiveness, but it comes at a cost. You will lose the persistence (durable) attribute in acid. In SQLite's default I/O backend, once the write operation is complete, you know that the changed data is safely on the disk. This is not the case with asynchronous I/O. If the application crashes or power down before the asynchronous write thread finishes after a data write operation, the database change may not be written to disk at all, and the next time the database is used, the changes are not visible.
asynchronous I/O loses durability, but still retains the other three properties of acid: atomicity (Atomic), consistency (consistent), and isolation (Isolated). Many applications do not work well with persistence.
We use the asynchronous I/O pattern by creating a SQLite VFS object and registering it with Sqlite3_vfs_register (). When using this VFS to open a database file and write (using the Xwrite () method of the VFS), the data is not immediately written to the disk, but is placed in a write queue maintained by the background thread. When the database file is opened and read with the asynchronous VFS (using the Xread () method of the VFS), the data is read from the disk, and the write queue is Xwrite () completed from the perspective of the VFS read process. The virtual file system (VFS) for asynchronous I/O is registered through Sqlite3async_initialize () and is closed by Sqlite3async_shutdown ().
To accumulate experience, asynchronous I/O implementations are intentionally kept simple. More features will be added in a future release. For example, in the current implementation, if the write operation is occurring on a steady stream that exceeds the I/O capability of the background write thread, the pending write queue will grow indefinitely and may deplete the host system's memory. A more complex module can track the number of pending write operations and stop receiving new write requests after a certain number.
Multiple connections using asynchronous IO can access a single database concurrently in a single process. From the user's point of view, if all the connections are in a single process, the concurrency of normal sqlite and SQLite using asynchronous IO is no different. If the file lock is active (which is activated by default), the connection from multiple processes is read and written to the database file, and concurrency is weakened under the following conditions:
(1) When a connection using asynchronous IO initiates a database transaction, the database is immediately locked. However, locks are released only after all operations in the write queue have been flushed to disk. This means that sometimes the database may remain locked even after a COMMIT or ROLLBACK has finished executing.
(2) If an application uses asynchronous IO to perform multiple transactions consecutively, other database users may not be able to use the database because the database has been locked. This is because when a begin is executed, the database lock is immediately built up. However, when the corresponding commit or rollback occurs, the lock may not be released, and it will not be released until the background write queue is flushed to disk. If the background write queue has not been refreshed, the database is locked and other processes cannot access the database.
File locks can be disabled at run time through the Sqlite3async_control () function. For NFS This can improve performance because you can avoid creating file locks on the server's asynchronous operations. However, if multiple connections try to access the same database and the file lock is disabled, application crashes and database corruption can occur.
The asynchronous IO Extension module consists of a single source file, SQLITE3ASYNC.C, and a header file Sqlite3async.h, located under the ext/async/subdirectory of the source tree. The application can use the C API defined in it to activate and control the functionality of this module. In order to use asynchronous IO extensions, compile sqlite3async.c as part of an application that uses SQLite, and then initialize and configure the module using the APIs defined in Sqlite3async.h. These APIs are described in detail in Sqlite3async.h's comments, and using these APIs usually has the following steps:
(1) Call Sqlite3async_initialize () to register the asynchronous IO VFS (virtual file system) for SQLite.
(2) Create a background thread to perform the write operation and call Sqlite3async_run ().
(3) using the normal SQLite API to read and write the database via the asynchronous IO VFS.
The current asynchronous IO extension is compatible with Win32 systems and systems that support pthread interfaces, including Mac OS X, Linux, and other Unix variants. In order to migrate asynchronous IO extensions to other platforms, the user must implement the mutex and conditional variable primitives on the new platform. There is currently no externally available interface to allow such control, but it is fairly easy to modify the code in SQLITE3ASYNC.C to include concurrency control primitives for the new platform, and more details to search the comment string "PORTING FUNCTIONS" in SQLITE3ASYNC.C. Then implement the new versions of the following functions:
static void Async_mutex_enter (int emutex);
static void Async_mutex_leave (int emutex);
static void async_cond_wait (int econd, int emutex);
static void async_cond_signal (int econd);
static void Async_sched_yield (void);
The functions above are described in detail in SQLITE3ASYNC.C's comments.



2. Shared cache mode
Starting with version 3.3.0, SQLite includes a special "shared cache" mode (disabled by default) and is primarily used in embedded servers. If the shared cache mode is active and one thread establishes multiple connections on the same database, the connections share a single data and schema cache. This can significantly reduce the memory and IO consumption of the system. In version 3.5.0, Shared cache mode is modified so that the same cached share can span the entire process and not just a single thread. Before this modification, passing data connections between threads is restricted. Starting with version 3.5.0, this limit is eliminated.



From the perspective of another process or thread, two or more database connections that use a shared cache look like a connection. The lock protocol is used to arbitrate between multiple shared caches or database users.






Figure 1 Shared cache mode



Figure 1 depicts an example of a run-time configuration with three database connections. Connection 1 is a normal SQLite database connection, connection 2 and 3 share a cache. The normal lock protocol is used to serialize database access between connection 1 and the shared cache. The serialization of connection 2 and connection 3 for shared cache access has a dedicated internal protocol. See below for a description.
There are three levels of shared cache lock models, transaction-level locks, table-level locking, and pattern-level locking.



(1) transaction-level locking
SQLite connections may open two types of transactions, read transactions, and write transactions. This is not explicitly done, and a transaction implicitly contains a read transaction until it writes a database file for the first time, and then becomes a write transaction. At any time, there can be at most one connection on the shared cache opening a write transaction, and this write transaction can coexist with any number of read transactions. This is different from the non-shared cache mode, where a write operation is not allowed when there are read operations in the non-shared cache mode.



     (2) Table-level locking
     When two or more connections use a shared cache, a lock is used to serialize concurrent access for each table. The table supports two types of locks, read locks, and write locks. Locks are granted to a connection, and every table on each database connection can have read, write, or no locks at any time. A table can have any number of read locks, but only one write lock. When reading a database table, you must first obtain a read lock. A write lock must be obtained when writing a table. If the required lock cannot be obtained, the query fails and returns sqlite_locked to the caller. A table-level lock is released after it is fetched to the end of the current transaction (read or write).
     If you use the read_uncommitted pragma directive to separate the transaction isolation mode from serial (serialized, the default mode, which is to query the data with shared locks, Block other transactions to modify the real data) to allow dirty reads (read-uncommitted, that is, select reads other transaction modifications without committing the data), the behavior described above is slightly different. There are two other types of transaction isolation modes, which cannot be read repeatedly read-comitted is the same query executed two times in the same transaction, if the time period between the first and second query, the other transaction just modifies the data of its query and commits, two reads inconsistent data. Can be repeated read read-repeatable refers to the same transaction two times the same query statement, the resulting data is always consistent.


 
/* Set the value of the read-uncommitted flag: 
  ** 
  **   True  -> Set the connection to read-uncommitted mode. 
  **   False -> Set the connection to serialized (the default) mode. 
  */  
  PRAGMA read_uncommitted = <boolean>;  
  
  /* Retrieve the current value of the read-uncommitted flag */  
  PRAGMA read_uncommitted; 


A database connection that allows dirty read mode does not acquire a read lock when reading a database table, and if another database connection modifies the table data being read, it may result in inconsistent query results because read transactions that allow dirty read mode are not interrupted. Allowing dirty read mode does not affect write transactions, it must acquire write locks, so database write operations can be blocked. Allowing dirty read mode also does not affect locks at the sqlite_master level.


(3) mode (Sqlite_master) level locking
The Sqlite_master table supports shared cache read and write locks that are identical to other database tables. The following special rules are also used:
* Before accessing any database tables or obtaining any other read and write locks, the connection must first obtain a read lock on the Sqlite_master table.
* Before executing a statement that modifies the database schema (for example, CREATE table or drop table), the connection must first obtain a write lock on the Sqlite_master table.
* If any other connection holds a write lock on the Sqlite_master table of the associated database (including the default primary database), then the connection cannot compile an SQL statement.
Between SQLite 3.3.0 and 3.4.2, the database connection can only be used by the thread that created it Sqlite3_open (), and a connection can only share the cache with other connections in the same thread. Starting with SQLite 3.5.0, this limitation was eliminated. On the older version of SQLite, shared cache mode cannot be used on virtual tables, starting with SQLite 3.6.17, which eliminates this limitation.
The shared cache mode is activated at each process level. The C interface, int sqlite3_enable_shared_cache (int), is used to globally activate or disable shared cache mode. Each call to Sqlite3_enable_shared_cache () affects subsequent database connections that are created using Sqlite3_open (), Sqlite3_open16 (), or SQLITE3_OPEN_V2 (). A database connection that already exists is not affected. Each call to the Sqlite3_enable_shared_cache () overwrites the previous calls on the process.
A single database connection created with SQLITE3_OPEN_V2 () may choose to participate in or not participate in shared cache mode by using the Sqlite_open_sharedcache or Sqlite_open_privatecache flag on the third parameter. These flags override the global Sqlite3_enable_shared_cache () setting on the database connection. If both flags are used at the same time, the behavior is undefined.
When using the URI file name, the "cache" query parameter can be used to specify whether the connection uses shared cache mode. "Cache=shared" Activates the shared cache, and "Cache=private" disables the shared cache. For example:
ATTACH ' file:aux.db?cache=shared ' as aux;
Starting with SQLite 3.7.13, if the database is created with a URI file name, the shared cache mode can be used on the memory database. For backwards compatibility, using the unmodified ": Memory:" Name to open the in-memory database is disabled by default when sharing the cache. Before SQLite 3.7.13, the shared cache on the memory database is always disabled, regardless of the memory database name used, the shared cache settings of the current system, and the query parameters or flags.
Activating a shared cache on a memory database allows two or more database connections on the same process to access the same memory. When the last connection is closed, the memory database is automatically deleted and the memory is reset.


3. Unlock Notifications
When multiple connections access the same database in shared cache mode, read and write locks (that is, shared and exclusive locks) on a single table are used to ensure that the concurrently executing transactions are isolated. If the connection cannot get to the desired lock, the Sqlite3_step () call returns sqlite_locked. If you cannot get a read lock on the Sqlite_master table for each associated database (although this is not a common case), the Sqlite3_prepare () or SQLITE3_PREPARE_V2 () call will also return sqlite_locked.
By using SQLite's sqlite3_unlock_notify () interface, we can let the Sqlite3_step () or SQLITE3_PREPARE_V2 () call block until the required lock is obtained, instead of returning sqlite_locked immediately. The following example shows the use of an unlock notification.


 
/* This example uses the pthreads API */
#include <pthread.h>
  
/*
** When registering an unlock notification, pass a pointer to this structure instance as an instance in the user context
*/
Typedef struct UnlockNotification UnlockNotification;
Struct UnlockNotification {
  Int fired; /* True after the unlock event occurs */
  Pthread_cond_t cond; /* condition variable to wait for */
  Pthread_mutex_t mutex; /* protects the mutex of this structure */
};
  
/*
** Unlock notification callback function
*/
Static void unlock_notify_cb(void **apArg, int nArg){
  Int i;
  For(i=0; i<nArg; i++){
    UnlockNotification *p = (UnlockNotification *)apArg[i];
    Pthread_mutex_lock(&p->mutex); /* Locks the critical section */
    P->fired = 1; /* triggers the unlock event, this variable can only be accessed mutually exclusive */
    Pthread_cond_signal(&p->cond);
    Pthread_mutex_unlock(&p->mutex);
  }
}
  
/*
** This function assumes that the SQLite API call (sqlite3_prepare_v2() or sqlite3_step()) returns SQLITE_LOCKED.
** The parameter is the associated database connection.
** This function calls sqlite3_unlock_notify() to register an unlock notification callback function and then blocks until
** The callback function is executed and returns SQLITE_OK. The caller should retry the failed operation.
** Or, if sqlite3_unlock_notify() indicates that blocking will cause a system deadlock, then this function is immediately
** Returns SQLITE_LOCKED. The caller should not retry the failed operation, but roll back the current transaction
*/
Static int wait_for_unlock_notify(sqlite3 *db){
  Int rc;
  UnlockNotification un;
  
  /* Initialize the UnlockNotification structure */
  Un.fired = 0;
  Pthread_mutex_init(&un.mutex, 0);
  Pthread_cond_init(&un.cond, 0);
  
  /* Register an unlock notification callback function */
  Rc = sqlite3_unlock_notify(db, unlock_notify_cb, (void *)&un);
  Assert( rc==SQLITE_LOCKED || rc==SQLITE_OK );
  
  The /* sqlite3_unlock_notify() call always returns SQLITE_LOCKED or SQLITE_OK.
  ** If SQLITE_LOCKED is returned, the system is deadlocked. This function needs to return SQLITE_LOCKED to the caller.
  ** The current transaction can be rolled back. Otherwise blocking until the unlock notification callback function is executed, then returning SQLITE_OK
  */
  If( rc==SQLITE_OK ){
    Pthread_mutex_lock(&un.mutex);
    If( !un.fired ){ /* If the unlock event does not occur, it blocks */
      Pthread_cond_wait(&un.cond, &un.mutex);
    }
    Pthread_mutex_unlock(&un.mutex);
  }
  
  /* Destroy mutex and condition variables */
  Pthread_cond_destroy(&un.cond);
  Pthread_mutex_destroy(&un.mutex);
  
  Return rc;
}
  
/*
** This function is a wrapper around the SQLite function sqlite3_step(), which works the same as sqlite3_step().
** But if the shared cache lock is not obtained, this function blocks to wait for the lock to be available.
** If this function returns SQLITE_LOCKED, the caller should roll back the current transaction and try again. Otherwise the system may be deadlocked
*/
Int sqlite3_blocking_step(sqlite3_stmt *pStmt){
  Int rc;
  While( SQLITE_LOCKED==(rc = sqlite3_step(pStmt)) ){
    Rc = wait_for_unlock_notify(sqlite3_db_handle(pStmt));
    If( rc!=SQLITE_OK ) break;
    Sqlite3_reset(pStmt);
  }
  Return rc;
}
  
/*
** This function is a wrapper around the SQLite function sqlite3_prepare_v2(), which works the same as sqlite3_prepare_v2().
** But if the shared cache lock is not obtained, this function blocks to wait for the lock to be available.
** If this function returns SQLITE_LOCKED, the caller should roll back the current transaction and try again. Otherwise the system may be deadlocked
*/
Int sqlite3_blocking_prepare_v2(
  Sqlite3 *db, /* database handle */
  Const char *zSql, /* UTF-8 encoded SQL statement */
  Int nSql, /* zSql bytes */
  Sqlite3_stmt **ppStmt, /* OUT: Pointer to the prepared statement */
  Const char **pz /* OUT: parsed string tail position */
){
  Int rc;
  While( SQLITE_LOCKED==(rc = sqlite3_prepare_v2(db, zSql, nSql, ppStmt, pz)) ){
    Rc = wait_for_unlock_notify(db);
    If( rc!=SQLITE_OK ) break;
  }
  Return rc;
}

If the Sqlite3_blocking_step () or SQLITE3_BLOCKING_PREPARE_V2 () function in the example returns sqlite_locked, then blocking will cause the system to deadlock.
You can use the sqlite3_unlock_notify () interface only if you define the preprocessing macro sqlite_enable_unlock_notify at compile time. The interface is designed to be used in such a system: Each database connection is assigned a separate thread. You cannot use this interface if you are running multiple database connections in one thread. The sqlite3_unlock_notify () interface works on only one thread at a time, so the lock control logic above only works on a single database connection for one thread.
In the above example, after Sqlite3_step () or SQLITE3_PREPARE_V2 () returns sqlite_locked, Sqlite3_unlock_notify () is invoked to register an unlock notification callback function. After a database connection holds a table-level lock, the unlock notification function is executed to prevent sqlite3_step () or SQLITE3_PREPARE_V2 () from subsequently completing the transaction and releasing all locks. For example, if Sqlite3_step () tries to read table x while some other connection y is holding a write lock for table X, Sqlite3_step () returns sqlite_locked. If you subsequently call Sqlite3_unlock_notify (), the unlock notification function is called after the transaction that connects Y ends. The unlock notification function is waiting for the connection (here Y), which is referred to as a "blocking connection".
If Sqlite3_step () tries to write a database, but returns sqlite_locked, there may be multiple processes holding the read lock of the current database table. In this case, SQLite randomly chooses one of the connections and performs an unlock notification function when the connected transaction completes. The unlock notification function executes from sqlite3_step () (or Sqlite3_close ()), which is associated with a blocking process. The Unlock notification function can call any of the sqlite3_xxx () functions, Cheng the signal to other waiting lines, or schedule some behavior to occur later.
The algorithm used by the Sqlite3_blocking_step () function is described as follows:
(1) Call Sqlite3_step () on the specified SQL statement object and return the value directly to the caller if it returns a value other than sqlite_locked. If you return sqlite_locked, continue.
(2) Call Sqlite3_unlock_notify () to register an unlock notification callback function. If Sqlite3_unlock_notify () returns sqlite_locked, indicating system deadlock, return this value to the caller for rollback. otherwise continue.
(3) block until the unlock notification function is executed by another thread.
(4) Call Sqlite3_reset () on the SQL statement object. Because the sqlite_locked error may only occur when the first call to Sqlite3_step () occurs (it is not possible to have Sqlite3_step () return Sqlite_row first and then return sqlite_locked the next time). The SQL statement object is then reset so that the query results are not affected. If you do not call Sqlite3_reset (), the next call to Sqlite3_step () returns Sqlite_misuse.
(5) Steering step (1).
The algorithm used by SQLITE3_BLOCKING_PREPARE_V2 () is similar, except that the 4th step (resetting the SQL statement object) is ignored.
For "Write hunger" phenomenon, SQLite can help the application to avoid the occurrence of hunger. When any attempt to acquire a write lock on a table fails (because there is a connection that holds a read lock), all attempts to start a new transaction on the shared cache will fail until one of the following conditions becomes true:
* The current write transaction is complete, or
* The number of read transactions opened on the shared cache is reduced to 0.
Failure to start a new read transaction will return sqlite_locked to the caller. If the caller then calls Sqlite3_unlock_notify () to register an unlock notification function, the blocking connection currently has a write transaction on the shared cache. This avoids writing hunger, because no new read lock can be opened. When all existing read locks are complete, the write operation will eventually have the opportunity to obtain the required write lock.
When Wait_for_unlock_notify () calls Sqlite3_unlock_notify (), it is possible that a blocking thread has completed its transaction so that the unlock notification function is called immediately before the Sqlite3_unlock_notify () return. The Unlock notification function may also be called by another thread, just after the sqlite3_unlock_notify () call, and before the thread begins to wait for an asynchronous signal. How this competition condition is handled depends on the thread and synchronization primitives used by the application. This example uses Pthread, which is the interface provided by modern UNIX-style systems (including Linux).
Pthread provides the pthread_cond_wait () function, which allows the caller to release a mutex at the same time and start waiting for an asynchronous signal. Using this function, a "fired" flag and a mutex, the race state can be eliminated, as follows:
When the unlock notification function is called, this can occur when the thread that calls Sqlite3_unlock_notify () starts to wait for an asynchronous signal before it does the following work:
(1) Gets the mutex amount.
(2) Set the "fired" flag to true.
(3) Cheng signal to the waiting line.
(4) Release the mutex amount.
When the wait_for_unlock_notify () thread begins to wait for the unlock notification function to arrive, it:
(1) Gets the mutex amount.
(2) Check whether the "fired" flag is set. If set, the UNLOCK notification function has been called, releasing the mutex directly and then continuing.
(3) If not set, atomically releases the mutex and begins to wait for the asynchronous signal. When the signal arrives, continue.
In this way, when Wait_for_unlock_notify () starts to block, the unlock notification function has no problem, whether it has been called or is being called.
The code in this example can be improved at least in the following two ways:
* Ability to manage thread priorities.
* Special situations that can handle sqlite_locked, which can occur when a table or index is deleted.
Although Sqlite3_unlock_notify () only allows callers to specify a single user context pointer, an unlock notification callback is passed to this array of context pointers. This is because when a blocking thread finishes its transaction, if more than one unlock notification is registered to invoke the same C function, the context pointer is arranged in an array. If each thread assigns a priority, the higher-priority thread will get a signal notification than the lower-priority thread, rather than notifying the thread in any order.
If a drop TABLE or drop INDEX command is executed, and there is one or more executing SELECT statements on the current database connection, sqlite_locked is returned. If Sqlite3_unlock_notify () is called, the specified callback function is immediately called. Re-attempting drop TABLE or drop INDEX will return another sqlite_locked error. In the above sqlite3_blocking_step () implementation, this results in a dead loop.
Callers can use the extended error code to differentiate this special "DROP table| INDEX "situation and other circumstances. When it calls Sqlite3_unlock_notify () normally, the extension error code is sqlite_locked_sharedcache. In the DROP table| INDEX "In the case, is the ordinary sqlite_locked. Another workaround is to limit the number of times a single query is retried (such as 100). While this can lead to less efficiency, the situation we are discussing here is not always happening.


SQLite Anatomy of asynchronous IO mode, shared cache mode, and lock notification


Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.