This is also a bit of a programming memo. SQLite is a good embedded database, practically synonymous with "exquisite and lightweight". It appears in many scenarios; Firefox and Android both embed it. The code base of this small database is not large, yet its feature set is remarkably complete. I will not describe basic usage here; for that, please refer to the database documentation and its examples. Beyond C, it has bindings for many other languages; essentially all of the top 10 programming languages are supported.
In this memo I want to briefly summarize a few technical details. There are other topics, such as data types and the database file format, foreign keys, insert-conflict handling, view performance, and in-memory databases, which I will try to cover later.
Concurrency and Locking
Programmers generally care a lot about efficiency, so when querying a database they care about concurrency and locking: is it a row-level lock or a table-level lock? SQLite uses a database-level read/write lock; in other words, the database cannot be queried during a write and cannot be written during a query. A note on the Windows implementation: internally SQLite builds its locking primitives on critical sections and simulated signals, and critical sections are very efficient on Windows.
We know that SQLite is a single-file database, and in real applications multiple processes may access that one database file at the same time. Unfortunately, in this case SQLite locks at the level of the whole database file. The lock is physically implemented on a reserved byte range of the file, but it logically covers the entire database, not individual tables. So in highly concurrent scenarios it still cannot meet our needs, and at that point we should consider alternative solutions; even a data file format we write ourselves can be feasible.
Of course, the scenario above has a very important premise: figure out whether your scenario actually needs SQLite's database features at all. If you only ever query fixed data and never delete or modify it, you may not need to introduce SQLite in the first place.
When multiple processes read and write the database file at the same time, SQLite operations, such as executing a SQL statement, may not succeed on every attempt: they can return SQLITE_BUSY, indicating that the database is currently busy. When this value is returned, we should write a simple retry loop, much like handling io_pending, until SQLITE_OK is returned.
Recent SQLite versions provide the sqlite3_busy_handler function: the registered callback is invoked whenever a lock cannot be obtained, so we no longer need to write the loop ourselves, and SQLITE_BUSY is only returned once the handler gives up. See http://sqlite.org/c3ref/busy_handler.html for details.
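A minimal sketch of installing such a handler; the retry limit and sleep interval are arbitrary choices of mine, not SQLite defaults:

#include <sqlite3.h>

/* Invoked each time a lock cannot be obtained; count is the number of
   prior invocations for the same lock. Return nonzero to retry,
   zero to give up and let the API call return SQLITE_BUSY. */
static int busy_cb(void *arg, int count)
{
    (void)arg;
    if (count >= 100) return 0;  /* give up after ~100 attempts */
    sqlite3_sleep(10);           /* back off 10 ms, then retry */
    return 1;
}

void install_busy_handler(sqlite3 *db)
{
    sqlite3_busy_handler(db, busy_cb, 0);
    /* or simply: sqlite3_busy_timeout(db, 5000); for a built-in 5 s retry */
}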
With regard to multithreading: since 3.5, SQLite allows multiple threads to share one connection's cache, which effectively controls the amount of memory used. See http://www.sqlite.org/sharedcache.html
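Enabling it is a single call, made once at startup; a sketch:

/* must run before any database connection is opened; connections
   opened afterwards in this process can share a common page cache */
sqlite3_enable_shared_cache(1);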
Data Consistency and Insertion Performance
An important property of a database is data consistency. SQLite has done a lot of work here: it guarantees that even on power loss a modification either fully succeeds or fully fails, never half-applies. In everyday programming we can basically take this for granted. For the specifics, see http://sqlite.org/atomiccommit.html
However, in some scenarios, database consistency and insertion performance are in conflict. To guarantee consistency, every time SQLite modifies the database it first writes a journal file and only then updates the main database file, which weighs heavily on disk I/O. We know the CPU is orders of magnitude faster than disk I/O, so in practice you will find that insertion is often the bottleneck of SQLite database operations.
In fact, this performance bottleneck has a well-known fix. As an operating-system property, the way to improve write performance on disk files (hard disks and SSDs alike) is to write in batches, and you do not even need to touch the OS directly. In the simplest SQL terms: begin a transaction before your modifications and commit it once they are done; SQLite then flushes the changes as one batch of block writes. In practice, committing each modification individually, with no explicit transaction, is an order of magnitude slower than batching the modifications and committing once.
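A sketch of batched insertion through the C API; the emp table and its empname column are made-up names for illustration, and error handling is omitted:

#include <sqlite3.h>

void bulk_insert(sqlite3 *db)
{
    sqlite3_exec(db, "BEGIN TRANSACTION;", 0, 0, 0);
    for (int i = 0; i < 100000; i++) {
        char sql[128];
        sqlite3_snprintf(sizeof(sql), sql,
            "INSERT INTO emp(empname) VALUES('empname%06d');", i + 1);
        sqlite3_exec(db, sql, 0, 0, 0);
    }
    sqlite3_exec(db, "COMMIT;", 0, 0, 0);  /* one flush for the whole batch */
}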
Therefore, in actual applications, any modification, including schema changes, should be handled inside a transaction; that is the correct practice for both performance and consistency. To put it another way: if you insist on digging your own grave here, SQLite will happily hand you the shovel, since you can turn journaling off entirely through a PRAGMA; of course, nobody should. If you really do have requirements on journal write performance, you can set the journal deletion mode to TRUNCATE instead of the default DELETE: TRUNCATE merely truncates the journal file to zero length rather than deleting it, so the directory metadata of the journal file is never touched, which improves performance.
In multi-process scenarios, we recommend WAL journaling for better read/write concurrency across threads and processes. But note that WAL is sticky: a database file in WAL format can only be opened in WAL mode, and SQLite versions earlier than 3.7.0 cannot read it at all, so it is best suited to new projects.
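Both journal settings are one-line PRAGMAs; a sketch:

/* truncate the journal instead of deleting it (rollback-journal mode) */
sqlite3_exec(db, "PRAGMA journal_mode=TRUNCATE;", 0, 0, 0);

/* or switch the database to WAL; this persists in the file itself */
sqlite3_exec(db, "PRAGMA journal_mode=WAL;", 0, 0, 0);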
Here is a tip to share: after you begin a transaction, your operations accumulate in memory and are not synchronized to the file right away. You may worry that a huge batch of inserts will occupy a lot of memory. Actually no, because SQLite periodically synchronizes your pending operations to the database file. This is the standard database checkpoint mechanism: by default a checkpoint runs once roughly 1000 pages have accumulated, and after the checkpoint completes, the accumulated log is recycled automatically.
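In WAL mode that threshold is tunable, and a checkpoint can also be forced by hand; a sketch:

/* change the auto-checkpoint threshold from the default 1000 pages */
sqlite3_exec(db, "PRAGMA wal_autocheckpoint=500;", 0, 0, 0);

/* or force a checkpoint on all attached databases right now */
sqlite3_wal_checkpoint(db, 0);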
When it comes to transaction commit, there is one discipline you must observe with SQLite: if a commit fails, you must roll back. If you do not, the leftover journal or checkpoint state is affected, and the next attempt to open the database file may fail. In terms of program semantics, you should also own what you do and not leave other logic or code to clean up after you.
A try-catch cannot solve this problem; it can only hide it.
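The discipline itself is two lines; a sketch:

if (sqlite3_exec(db, "COMMIT;", 0, 0, 0) != SQLITE_OK) {
    sqlite3_exec(db, "ROLLBACK;", 0, 0, 0);  /* always pair a failed commit */
}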
SQLite Query Performance
Speaking of databases, we have to mention B-trees. Most index structures in SQLite are B-trees (newer versions also include an R-tree for spatial databases, which we will not discuss here). I will not go into B-tree implementation and algorithms; you can look up the related material yourself. My point is simply that you do not need to worry about this layer: unless you have special needs, you can basically ignore it. I will just repost a benchmark result here.
http://blog.csdn.net/mynicedream/article/details/2252398
A comparison with Berkeley DB:
| | Berkeley DB | SQLite |
| --- | --- | --- |
| Insert 10,000 records | 0.08 s | 0.42 s |
| Insert 100,000 records | 2.31 s | 3.81 s |
| Insert 7,200,000 records | 1024.34 s | 249 s |
| Insert 57,600,000 records | 12860.78 s | 2155.14 s |
| Insert 172,800,000 records | 48039.64 s | 6352.06 s |
| One query among 10,000 records | < 0.01 s | < 0.01 s |
| One query among 100,000 records | < 0.01 s | < 0.01 s |
| One query among 7,200,000 records | < 0.01 s | < 0.01 s |
| One query among 57,600,000 records | 0.03 s | 0.16 s |
| One query among 172,800,000 records | 0.03 s | 0.09 s |
| Database size at 10,000 records | 0.628 MB | 0.527 MB |
| Database size at 100,000 records | 5.29 MB | 5.32 MB |
| Database size at 7,200,000 records | 516 MB | 405 MB |
| Database size at 57,600,000 records | 3087.13 MB | 3925.8 MB |
| Database size at 172,800,000 records | 11890.7 MB | 10621.2 MB |
The benchmark schema is very simple: a single table with an index on a single field. So if you find your query performance slow, do not rush to blame SQLite's implementation; look first at the SQL statements you wrote and at your table structure. That is usually where the problem is.
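One quick way to check is EXPLAIN QUERY PLAN, which shows whether an index is being used; the table and column names below are illustrative, and the output columns vary by SQLite version:

#include <stdio.h>
#include <sqlite3.h>

/* print each row of the query plan */
static int print_row(void *arg, int ncol, char **val, char **col)
{
    (void)arg;
    for (int i = 0; i < ncol; i++)
        printf("%s=%s  ", col[i], val[i] ? val[i] : "NULL");
    printf("\n");
    return 0;
}

void show_plan(sqlite3 *db)
{
    /* a full-table scan here means the empname index is not used */
    sqlite3_exec(db, "EXPLAIN QUERY PLAN SELECT * FROM emp WHERE empname='x';",
                 print_row, 0, 0);
}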
SQL query optimization is a big topic that I will not go into here; please refer to
http://www.sqlite.org/optoverview.html
SQLite Architecture
As mentioned above, SQLite is small but complete; we can fairly say that. When I first came into contact with SQLite, I was amazed at its architectural design: such a small embedded database provides the full set of modules and functionality of a large database, as its architecture diagram shows.
One piece worth singling out is its virtual machine: SQLite compiles SQL into its own virtual bytecode. Adopting this structure is a great benefit for debugging, tracing, and optimizing SQL statements, and it is the premise of the bind tip described below.
For more architecture details, see http://sqlite.org/arch.html
Tips for Using Bind to Insert or Modify Data
We know that SQLite is a virtual SQL execution engine, so every SQL statement goes through a compilation step, and the compiled bytecode is what the virtual machine ultimately executes. If you run inserts like this:
char buf[128];
sprintf(buf, "insert into emp (empname) values ('empname%06d');", i + 1);
db.execDML(buf);  /* CppSQLite-style wrapper call; recompiles the SQL every time */
then every insert recompiles the SQL statement from scratch, which costs real performance when inserting large amounts of data. It is therefore best to insert with the bind APIs: the statement is compiled only once, and execution efficiency improves accordingly.
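A sketch of the same insert through the C API with a prepared statement; the emp table is again a made-up example, and error handling is omitted:

#include <sqlite3.h>

void bulk_insert_bound(sqlite3 *db)
{
    sqlite3_stmt *stmt;
    const char *sql = "INSERT INTO emp(empname) VALUES(?);";

    sqlite3_prepare_v2(db, sql, -1, &stmt, 0);  /* compile once... */
    sqlite3_exec(db, "BEGIN;", 0, 0, 0);
    for (int i = 0; i < 100000; i++) {
        char name[32];
        sqlite3_snprintf(sizeof(name), name, "empname%06d", i + 1);
        sqlite3_bind_text(stmt, 1, name, -1, SQLITE_TRANSIENT);
        sqlite3_step(stmt);    /* ...execute many times... */
        sqlite3_reset(stmt);   /* ...reusing the compiled bytecode */
    }
    sqlite3_exec(db, "COMMIT;", 0, 0, 0);
    sqlite3_finalize(stmt);
}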
One more point: SQL compilation is thread-safe. Whenever you modify or query through a connection, SQLite's internal implementation enters a critical section first, so there is nothing to worry about there; but for scenarios with high concurrency demands, precompiling statements is very cost-effective.
Encrypting Database Files
In practical applications we often need to encrypt the database file. There is not much to it; we can adopt one of two solutions. The first is to modify the SQLite code to provide the sqlite3_key and sqlite3_rekey functions for database encryption. I personally do not recommend this solution, because it requires modifying the database code.
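For reference, in builds that ship such an extension (SEE, SQLCipher, and the like), the calls typically look like the sketch below; vanilla SQLite does not export them:

/* supply the key right after opening, before any other operation */
sqlite3_key(db, "secret", 6);

/* later, re-encrypt the whole database under a new key */
sqlite3_rekey(db, "newsecret", 9);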
What I personally recommend is to provide a VFS (Virtual File System) and plug it into SQLite. When SQLite actually reads or writes a file, it calls the VFS registered with the system to perform the real I/O, so we only need to put the encryption and decryption into the VFS read and write operations. This is the classic layering pattern; for more information see the documentation at http://www.sqlite.org/vfs.html. It is a standard interface and fully meets our encryption needs.
Because it is a layered design, you are free to arrange the order of the layers and combine different layer stacks. Quite flexible.
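A toy sketch of such a shim, with XOR standing in for a real cipher: it wraps the default VFS and patches only xRead/xWrite. The name cryptvfs is made up, short reads are glossed over, and a single shared method table is assumed; a production shim must handle both:

#include <sqlite3.h>
#include <string.h>

typedef struct CryptFile {
    sqlite3_file base;     /* must be first: holds our method table */
    sqlite3_file *real;    /* handle owned by the underlying VFS */
} CryptFile;

static sqlite3_vfs crypt_vfs;          /* our wrapper VFS */
static sqlite3_vfs *orig;              /* the default VFS we delegate to */
static sqlite3_io_methods crypt_io;    /* patched copy of the real methods */

static void xor_buf(unsigned char *p, int n) {
    while (n--) p[n] ^= 0x5A;          /* placeholder for a real cipher */
}

static int cryptRead(sqlite3_file *f, void *buf, int n, sqlite3_int64 off) {
    CryptFile *c = (CryptFile *)f;
    int rc = c->real->pMethods->xRead(c->real, buf, n, off);
    if (rc == SQLITE_OK) xor_buf(buf, n);   /* decrypt after reading;
                                               short reads not handled here */
    return rc;
}

static int cryptWrite(sqlite3_file *f, const void *buf, int n, sqlite3_int64 off) {
    CryptFile *c = (CryptFile *)f;
    unsigned char *tmp = sqlite3_malloc(n);
    if (!tmp) return SQLITE_NOMEM;
    memcpy(tmp, buf, n);
    xor_buf(tmp, n);                        /* encrypt a copy, not the caller's buffer */
    int rc = c->real->pMethods->xWrite(c->real, tmp, n, off);
    sqlite3_free(tmp);
    return rc;
}

static int cryptOpen(sqlite3_vfs *vfs, const char *name, sqlite3_file *f,
                     int flags, int *outFlags) {
    CryptFile *c = (CryptFile *)f;
    (void)vfs;
    c->base.pMethods = 0;                   /* stays NULL if the open fails */
    c->real = (sqlite3_file *)&c[1];        /* space reserved via szOsFile below */
    int rc = orig->xOpen(orig, name, c->real, flags, outFlags);
    if (rc != SQLITE_OK) return rc;
    crypt_io = *c->real->pMethods;          /* copy the real table, patch two slots */
    crypt_io.xRead = cryptRead;
    crypt_io.xWrite = cryptWrite;
    c->base.pMethods = &crypt_io;
    return SQLITE_OK;
}

int register_crypt_vfs(void) {
    orig = sqlite3_vfs_find(0);             /* the current default VFS */
    crypt_vfs = *orig;                      /* inherit everything else */
    crypt_vfs.zName = "cryptvfs";
    crypt_vfs.xOpen = cryptOpen;
    crypt_vfs.szOsFile = (int)sizeof(CryptFile) + orig->szOsFile;
    return sqlite3_vfs_register(&crypt_vfs, 0);  /* 0 = don't make it default */
}

You would then open the database through it by passing "cryptvfs" as the fourth argument of sqlite3_open_v2().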
Asynchronous File Writing
In some scenarios, such as logging to a SQLite database, one or more threads produce log records while a single thread inserts them into the database. In the past we generally wrote a queue by hand and had the database thread continuously pull records out of it; that queue was almost always our own implementation.
The latest SQLite now provides an asynchronous-write facility, and it is actually quite simple: it is a VFS. To use it, you must call sqlite3async_initialize before opening the database file, and you must also provide a thread that calls sqlite3async_run to actually perform the queued I/O. See http://www.sqlite.org/asyncvfs.html for details.
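A minimal sketch of wiring it up with pthreads; the extension lives in SQLite's ext/async directory and is compiled into your program separately:

#include <pthread.h>
#include <sqlite3.h>
#include "sqlite3async.h"   /* from SQLite's ext/async directory */

static void *writer(void *arg) {
    (void)arg;
    sqlite3async_run();     /* drains the write queue; returns on shutdown */
    return 0;
}

int start_async_vfs(void) {
    pthread_t t;
    /* wrap the default VFS (0) and make the async VFS the default (1) */
    int rc = sqlite3async_initialize(0, 1);
    if (rc != SQLITE_OK) return rc;
    return pthread_create(&t, 0, writer, 0);
}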
Someone may ask: how is this different from the queue I implemented myself? Both are queues, and both get written out eventually. Actually, the differences are significant.
1. When writing your program, you no longer have to worry about the multithreading; just read and write the database normally.
2. This VFS sits at the very bottom of the architecture, so which file pages get written, and when, is still decided by the database engine. With a self-implemented queue, each dequeued item must be fully synchronized to disk before the next one proceeds, and none of the engine's batching optimizations apply.
3. A SELECT statement actually reads from both the disk and the asynchronous write queue, so pending data is visible without any hard disk operation. For databases with frequent inserts, this relieves disk I/O pressure.
And the downsides?
1. Some consistency is lost: if a power failure occurs, whatever is still in the write queue is gone.
2. When multiple processes access the database, a long write queue can delay queries from other reading processes, and modifications are not visible to them in real time.
Of course, there is no free lunch; the concrete requirements determine the concrete design. Keep these points in mind when using it.
Important PRAGMAs
There is not much to say here: PRAGMA is a SQL statement that serves as the overall control panel of SQLite. For details, see the documentation:
http://www.sqlite.org/pragma.html#syntax
With it you can:
- control memory usage
- control the string encoding
- set the connection cache size
- set the file synchronization mode
- and so on; a few examples follow
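For instance (the values are illustrative, not recommendations):

/* page-cache size, in pages */
sqlite3_exec(db, "PRAGMA cache_size=2000;", 0, 0, 0);
/* text encoding; only effective before the database file is created */
sqlite3_exec(db, "PRAGMA encoding='UTF-8';", 0, 0, 0);
/* file synchronization mode: OFF / NORMAL / FULL */
sqlite3_exec(db, "PRAGMA synchronous=NORMAL;", 0, 0, 0);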
Notes on Database Compression
When the database file has grown large, it almost certainly contains fragmentation. We use the VACUUM statement to compact it. Note that VACUUM may change rowid values in tables that lack an explicit INTEGER PRIMARY KEY, so when writing your program you must not make any assumptions about rowids, including assuming they only ever increase.