Sqlite3 multi-threaded and lock, optimized insertion speed and performance optimization

Last Update:2016-04-12 Source: Internet

Author: User

Tags sqlite sqlite database

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

First, do you support multithreading? "Is SQLite threadsafe?" on the SQLite official website. This question and answer. Simply put, starting with version 3.3.1, it's thread-safe. The version of SQLite in iOS is not below this version, of course, you can also compile the latest version yourself.
However, this thread safety is still limited, in this "is SQLite Thread-safe?" has a detailed explanation.
Another important document is "SQLite and multiple Threads". It indicates that SQLite supports 3 threading modes:

Single thread: Disables all mutex locks, and errors occur when used concurrently. This is enabled when SQLite compiles with the sqlite_threadsafe=0 parameter, or when Sqlite3_config (Sqlite_config_singlethread) is called before initializing SQLite.
Multithreading: As long as a database connection is not used by multiple threads at the same time is secure. In the source code is enabled Bcoremutex, disable Bfullmutex. is essentially disabling the lock on the database connection and prepared statement (prepared statement), so you cannot concurrently use the same database connection or prepared statement in multiple threads. It is enabled by default when SQLite compiles with the sqlite_threadsafe=2 parameter. If Sqlite_threadsafe is not 0, you can call Sqlite3_config (Sqlite_config_multithread) before initializing SQLITE, or when creating a database connection, set the Sqlite_open_ Nomutex flag.
Serial: Enables all locks, including Bcoremutex and Bfullmutex. Because both the database connection and the prepared statement are locked, multi-threaded use of these objects is not concurrent and becomes serial. It is enabled by default when SQLite compiles with the sqlite_threadsafe=1 parameter. If Sqlite_threadsafe is not 0, you can call Sqlite3_config (sqlite_config_serialized) before initializing SQLITE, or when creating a database connection, set the Sqlite_open_ Fullmutex flag.

The initialization referred to here refers to the invocation of the Sqlite3_initialize () function, which is called automatically when calling Sqlite3_open (), and only the first call is valid.
Another explanation is prepared statement, which is managed by a database connection (pager), which can also be used as a connection to the database. Therefore, in multithreaded mode, concurrent calls to the same database connection call SQLITE3_PREPARE_V2 () to create prepared statement, or any prepared statement concurrent calls to the same database connection sqlite3_ Functions such as bind_* () and Sqlite3_step () will be faulted (on iOS, the thread will appear exc_bad_access and abort). This error has nothing to do with read-write. The security usage rules given in the documentation are: No transactions are waiting to be executed, and all prepared statement are finalized.
Incidentally, calling Sqlite3_threadsafe () can get the Sqlite_threadsafe parameter for the compile period.　　The standard release is 1, which is the serial mode, while iOS is 2, which is multithreaded mode, and the Python sqlite3 module also uses the serial mode by default and can be configured with Sqlite3.threadsafety. However, by default, a thread can only use a database connection that is open by the current thread unless the Check_same_thread=false parameter is set at the time of the connection. If you are using a different database connection, each connection cannot read uncommitted data from other connections unless you use read-uncommitted mode.

Now there are 3 modes of understanding, it is clear that SQLite is not a multi-threaded inability, then the next thing to understand the next. second, the business
The database can only be changed in a transaction. All commands that change the database (all SQL commands except Select) automatically open a new transaction and are automatically committed when the last query completes.
The BEGIN command can start the transaction manually and turn off autocommit. When the next commit command executes, autocommit is opened again, and changes made in the transaction are written to the database. When commit fails, autocommit remains closed to allow the user to try to commit again. If you execute the rollback command, automatic commits are turned on, but changes in the transaction are not saved. The transaction is also automatically rolled back when the database is closed or an error is encountered.
Often people complain that sqlite is too slow to insert, in fact it can be inserted tens of thousands of times per second, but only dozens of transactions per second.　　So when you insert large amounts of data, you can speed up by disabling autocommit. There is also an important point to emphasize: transactions are related to database connections, and each database connection (using pager) maintains its own transactions, and there can be only one transaction (but can use savepoint to implement inline transactions). In other words, a transaction is not thread-independent, a line thread can use multiple database connections to complete multiple transactions simultaneously, while multiple threads can use a single database connection to complete a transaction concurrently (not concurrently). To implement a transaction, you have to use a lock.
A SQLite database file has 5 types of lock status:

Unlocked: Indicates that the database is not read/write at this time.
SHARED: Indicates that the database can be read. A shared lock can be owned by multiple threads at the same time. Once a thread holds a shared lock, no thread can write.
RESERVED: Indicates preparation for writing to the database. Reserved locks can only be owned by one thread at most, and then it enters the pending state.
PENDING: Indicates that the database is about to be written, waiting for other read threads to release the shared lock. Once a thread holds a pending lock, other threads cannot acquire a shared lock. This allows the shared lock to enter the exclusive state as long as all the read threads are finished.
EXCLUSIVE: Indicates that it can write to the database. When this state is entered, no other thread can access the database file. Therefore, for concurrency, the shorter the holding time, the better.

A thread can acquire a higher level of lock only if it has a low level of lock. SQLite relies on these 5 types of locks to skillfully implement the mutual exclusion of read and write threads. It can also be seen that the write operation must enter the exclusive state, at this time the number of concurrent is reduced to 1, which is the reason that SQLite is considered to be bad for concurrent insert performance.
In addition, the read-uncommitted and Wal modes affect the mechanism of this lock. In these 2 modes, the read thread is not blocked by the write thread, even if the write thread holds the pending or exclusive lock.

Mention of the lock will have to talk about the deadlock problem, and SQLite may also appear deadlock.
Here's an example:

Connection 1:begin (unlocked)
Connect 1:select ... (SHARED)
Connect 1:insert ... (RESERVED)
Connection 2:begin (Unlocked)
Connect 2:select ... (SHARED)
Connect 1:commit (PENDING, try to get exclusive lock, but there is a shared lock not released, return sqlite_busy)
Connect 2:insert ... (Attempt to get reserved lock, but pending lock is not released, return sqlite_busy)

Now 2 connections are waiting for the other person to release the lock, so it's dead locked. Of course, the reality is not that bad, either side chooses not to continue to wait, rolls back the transaction to be OK.

But to better solve this problem, you have to get to know the business more deeply.
In fact, the BEGIN statement can have 3 starting states:

DEFERRED: Default value, no locks are acquired when starting a transaction. Acquires a shared lock for the first read operation and acquires a reserved lock for the first write operation.
IMMEDIATE: Gets the reserved lock when the transaction is started.
EXCLUSIVE: Gets the EXCLUSIVE lock when the transaction is started.

Now consider that 2 transactions begin with the immediate method:

Connection 1:begin IMMEDIATE (RESERVED)
Connect 1:select ... (RESERVED)
Connect 1:insert ... (RESERVED)
Connect 2:begin IMMEDIATE (attempt to get reserved lock, but an existing reserved lock is not released, so the transaction begins to fail, returns SQLITE_BUSY, waits for the user to retry)
Connection 1:commit (EXCLUSIVE, release after write is complete)
Connection 2:begin IMMEDIATE (RESERVED)
Connect 2:select ... (RESERVED)
Connect 2:insert ... (RESERVED)
Connection 2:commit (EXCLUSIVE, release after write is complete)

So the deadlock is avoided.

The exclusive approach is even more stringent, and does not deadlock if other connections open the transaction in deferred mode:

Connection 1:begin EXCLUSIVE (EXCLUSIVE)
Connect 1:select ... (EXCLUSIVE)
Connect 1:insert ... (EXCLUSIVE)
Connection 2:begin (Unlocked)
Connect 2:select ... (an attempt was made to acquire a shared lock, but an existing exclusive lock was not released, returning Sqlite_busy, waiting for the user to retry)
Connection 1:commit (EXCLUSIVE, release after write is complete)
Connect 2:select ... (SHARED)
Connect 2:insert ... (RESERVED)
Connection 2:commit (EXCLUSIVE, release after write is complete)

However, in the case of high concurrency, it is more difficult to get the exclusive lock directly, and in order to avoid the exclusive state blocking other requests for a long time, the best way is to let all the write transactions begin in immediate way.
Incidentally, you can use the Sqlite3_busy_timeout () or Sqlite3_busy_handler () function to implement retries.

This shows that in order to ensure thread safety, there are 4 ways of doing this:

SQLite uses single-threaded mode to access the database with a dedicated thread.
SQLite uses single-threaded mode to access the database with a thread queue, which allows only one thread to execute at a time, and the threads in the queue share a database connection.
SQLite uses multithreaded mode, and each thread creates its own database connection.
SQLite uses serial mode, and all threads share a global database connection.

third, sqlite3 insertion speed is slow 1. Add a transaction to multiple inserts as shown aboveSQLite uses transactional operations for each insert without explicitly using transactions, and the SQLite database is in the form of a file, which is equivalent to opening a file every time it is accessed, and if the data is heavily manipulated, time is spent on I/O operations, so it is slow. The workaround is to explicitly commit with a transaction: Because after we start the transaction, the statements of the large number of operations are kept in memory, and when the commit is all written to the database, the database file is only opened once. 2. If you add a transaction or not, you can try to modify the synchronization modeWhen inserting data at first Sqlite3, it takes about 100ms to insert each piece of data. If it is a bulk import, you can introduce transactions to increase speed. But assuming that your business is inserting several pieces of data every few seconds, it is obvious that 100ms is not allowed.
The workaround is to add the following line of code after calling the Sqlite3_open function: sqlite3_exec (db, "PRAGMA synchronous = OFF;", 0,0,0); The above solution seems to be a palliative, why add the above line of code, the speed will increase so much? Disk Synchronization1. How to set: PRAGMA synchronous = full; (2) PRAGMA synchronous = NORMAL; (1) PRAGMA synchronous = OFF; (0)
2. Parameter meaning: When synchronous is set to full (2), the SQLite database engine pauses in an emergency to determine that the data has been written to disk. This causes the system to crash or power supply problems to ensure that the database is not damaged when it is re-started. The full synchronous is safe but very slow.
When synchronous is set to normal (1), the SQLite database engine pauses for most emergencies, but not as frequently as in full mode. In normal mode, there is a small chance (but not nonexistent) that a power failure causes a database corruption. But in fact, in this case it is likely that your hard drive is no longer available, or another unrecoverable hardware error has occurred.
When set to synchronous OFF (0), SQLite continues without pausing after passing data to the system. If the application running SQLite crashes, the data is not compromised, but the database may become corrupted if the system crashes or the data is written to an unexpected power outage. On the other hand, some operations may be 50 times times faster or more synchronous off. In SQLite 2, the default value is normal. And in 3, modify to full. Www.2cto.com
3. Recommendation: If there is a mechanism for regular backup, and a small amount of data loss is acceptable, use off. Notice the red bold lettering above. Summary: If your data on security integrity and other requirements are not too high, you can use the method set to 0, after all, just "database may be damaged", as to the probability of damage to how big, I also do not know ... I haven't seen any damage yet, and I don't know when it will happen. Four, performance optimization

Many people use it directly, and do not notice that SQLite also has configuration parameters that can be adjusted for performance. Sometimes, the results can have a big impact.
Mainly through the pragma instruction to achieve.
For example: Space release, disk synchronization, cache size, and so on.
Do not open the auto_vacuum, vacuum efficiency is very low!

1 auto_vacuum
PRAGMA auto_vacuum;
PRAGMA auto_vacuum = 0 | 1;
Queries or sets the auto-vacuum tag for the database.
Normally, when you commit a transaction that deletes data from the database, the database file does not change size. Unused file pages are flagged and reused in later additions. In this case, use the vacuum command to release the resulting space.
When Auto-vacuum is turned on, the database file shrinks automatically when committing a transaction that deletes data from the database (the vacuum command does not work in the Auto-vacuum Open database). The database stores some information internally to support this feature, which makes the database file a little larger than when the option is not turned on.
You can change the auto-vacuum tag only if no tables are built in the database. Attempts to modify an existing table will not result in an error.

2 Cache_size
Proposed change to 8000
PRAGMA cache_size;
PRAGMA cache_size = number-of-pages;
Query or modify the number of database file pages that SQLite stores in memory at a time. Each page uses about 1.5K of memory, and the default cache size is 2000. If you need to use an update or delete command that changes a large number of rows, and you don't mind sqlite using more memory, you can increase the cache to improve performance.
When using cache_size pragma to change the cache size, the change is only valid for the current conversation, and the cache size reverts to the default size when the database is turned off and reopened. To permanently change the cache size, use default_cache_size pragma.

3 Case_sensitive_like
Open it. Otherwise the text string in the search will go wrong.
PRAGMA Case_sensitive_like;
PRAGMA case_sensitive_like = 0 | 1;
The default behavior of the LIKE operator is to ignore the case of the latin1 character. So the value of ' a ' like ' a ' is true by default. You can change this default behavior by opening case_sensitive_like pragma. When Case_sensitive_like is enabled, ' A ' like ' a ' is false and ' a ' like ' a ' is still true.

4 count_changes
Open it. Easy Commissioning
PRAGMA count_changes;
PRAGMA count_changes = 0 | 1;
Query or change the count-changes tag. Normally, insert, UPDATE, and DELETE statements do not return data. When Count-changes is turned on, the above statement returns a row of data with an integer value-the number of rows that the statement inserts, modifies, or deletes. The number of rows returned does not include the number of rows changed, such as inserts, modifications, or deletions that are generated by the trigger.

5 Page_size
PRAGMA page_size;
PRAGMA page_size = bytes;
Queries or sets the Page-size value. You can set page-size only if the database is not created. The page size must be an integer multiple of 2 and greater than or equal to 512 less than or equal to 8192. The upper limit can be changed by modifying the value of the macro definition sqlite_max_page_size at compile time. The upper limit is 32768.

6 Synchronous
If there is a mechanism for regular backup, and a small amount of data loss is acceptable, use off
PRAGMA synchronous;
PRAGMA synchronous = Full; (2)
PRAGMA synchronous = NORMAL; (1)
PRAGMA synchronous = OFF; (0)
Query or change the settings for the "synchronous" tag. The first form (query) returns an integer value. When synchronous is set to full (2), the SQLite database engine pauses during an emergency to determine that the data has been written to disk. This causes the system to crash or power supply problems to ensure that the database is not damaged when it is re-started. The full synchronous is safe but very slow. When synchronous is set to normal, the SQLite database engine pauses for most emergencies, but not as frequently as in full mode. In normal mode, there is a small chance (but not nonexistent) that a power failure causes a database corruption. But in fact, in this case it is likely that your hard drive is no longer available, or another unrecoverable hardware error has occurred. When set to synchronous OFF (0), SQLite continues without pausing after passing data to the system. If the application running SQLite crashes, the data is not compromised, but the database may become corrupted if the system crashes or the data is written to an unexpected power outage. On the other hand, some operations may be 50 times times faster or more synchronous off.
in SQLite 2, the default value is normal. And in 3, modify to full.

7 Temp_store
Use 2, Memory mode.
PRAGMA Temp_store;
PRAGMA Temp_store = DEFAULT; (0)
PRAGMA Temp_store = FILE; (1)
PRAGMA Temp_store = MEMORY; (2)
Query or change the settings for the "Temp_store" parameter. When Temp_store is set to default (0), a compile-time c preprocessing macro Temp_store is used to define the location where temporary tables and temporary indexes are stored. When set to memory (2) The temp table and index are stored in RAM. When set to file (1), it is stored in files. The Temp_store_directorypragma can be used to specify the directory where the file resides. When changing the Temp_store settings, all existing temporary tables, indexes, triggers, and views will be deleted immediately.
After testing, in the Class BBS application, through the above adjustment, the efficiency can increase more than twice times.

Sqlite3 multi-threaded and lock, optimized insertion speed and performance optimization

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More