Preface: This section is the subsequent part of the previous section. These two sections are the design and functions of each module of the SQLite kernel from a global perspective. Only by grasping SQLite globally can we better understand the implementation of SQLite. SQLite adopts a hierarchical and modular design, which makes it highly scalable and portable. In addition, the structure of SQLite is not very different from that of general DBMS, so it is of great significance to understand general DBMS. Now let's start to discuss the remaining two parts of SQLite: Back-end and compiler ).
2. B-tree and Pager
B-Tree enables VDBE to query, insert, and delete data under O (logN), and traverse the result set in two directions under O (1. B-Tree does not directly read or write data to the disk. It only maintains the relationship between pages. It calls Pager when the B-TREE needs a page or modifies the page. When modifying a page, pager ensures that the original page is first written to the log file. When it completes the write operation, pager determines how to do it based on the transaction status. B-tree does not directly read and write files, but uses the page cache buffer module to read and write files. (Note: This is similar to reading and writing files in the operating system. In Linux, the upper-layer module of the operating system does not directly call the device driver to read and write the device, but uses a high-speed buffer module to call the device driver to read and write files and save the results to the high-speed buffer zone ).
2.1. Database File Format (Database File Format)
All pages in the database are marked in the order starting from 1. A database consists of many B-trees. Each table and index has a B-tree (Note: The index uses B-tree, while the table uses B + tree, this is mainly determined by the different requirements of tables and indexes and the different structures of B-tree and B + tree: All leaf nodes of B + tree contain all keyword information, there are also two types of sequential searches-For details, see data structure, Yan Weimin. B-tree is more suitable for indexing ). The root pages of all tables and indexes are stored in the sqlite_master table.
The first page (page 1) in the database is a bit special. The first 100 bytes of page 1 contain a special file header that describes the database file. It includes the database version, mode version, page size, encoding, and other parameters set during database creation. The content of this special file header is defined in btree. c. page 1 is also the root page of The sqlite_master table.
2.1 Page Reuse and Vacuum)
SQLite uses a free list to recycle pages. When all records on a page are deleted, they are inserted to this list. When the VACUUM command is run, the free list is cleared, so the database is reduced. In essence, it creates a new database in a new file, and all the pages used are copied, but the free list does not. The result is a new and smaller database. When autovacuum of the database is enabled, SQLite does not use free list, and the database is automatically compressed at each commit.
2.2 B-Tree records
A page in B-tree consists of B-tree records, also known as payloads. Each B-tree record or payload has two fields: key field and data field ). Key field is the value of ROWID or the value of the Key word in the database. From the B-tree perspective, data field can be any unstructured data. The database records are stored in these data fields. The task of B-tree is sorting and traversing, and it needs to be a keyword most. The size of Payloads is variable, which is related to internal keywords and data domains. When a payload is too large, a page cannot be saved to multiple pages.
B + Tree is sorted by keywords. All keywords must be unique. The table uses B + tree, and the internal page does not contain data, as shown below:
In B + tree, both root pages and internal pages are used for navigation. The data fields of these pages are pointers to lower-level pages and only contain keywords. All database records are stored on the leaf pages. At the leaf node level, records and pages are in the order of keywords. Therefore, B-tree can be traversed horizontally. the time complexity is O (1 ).
2.3 Records and domains (Records and Fields)
Records of data domains on the leaf node page are managed by VDBE. database records are stored in binary format, but there is a certain data format. The record format includes a logical header and a data segment. The header segment includes the header size and an array of data types, the data type used in data segment is as follows:
2.4 Hierarchical Data Organization)
From top to bottom, the data becomes more and more disordered. from bottom to top, the data becomes more and more structured.
2.5 B-Tree API
The B-Tree module has its own APIs, which can be used independently of C APIs. Another feature is that it supports transactions. Transactions, locks, and logs processed by pager are all B-tree services. The functions can be divided into the following categories:
2.5.1. Access and transaction Functions
Sqlite3BtreeOpen: Opens a new database file. Returns a B-tree object.
Sqlite3BtreeClose: Closes a database.
Sqlite3BtreeBeginTrans: Starts a new transaction.
Sqlite3BtreeCommit: Commits the current transaction.
Sqlite3BtreeRollback: Rolls back the current transaction.
Sqlite3BtreeBeginStmt: Starts a statement transaction.
Sqlite3BtreeCommitStmt: Commits a statement transaction.
Sqlite3BtreeRollbackStmt: Rolls back a statement transaction.
2.5.2. Table Functions
Sqlite3BtreeCreateTable: Creates a new, empty B-tree in a database file.
Sqlite3BtreeDropTable: Destroys a B-tree in a database file.
Sqlite3BtreeClearTable: Removes all data from a B-tree, but keeps the B-tree intact.
2.5.3 Cursor Functions)
Sqlite3BtreeCursor: Creates a new cursor pointing to a particle B-tree.
Sqlite3BtreeCloseCursor: Closes the B-tree cursor.
Sqlite3BtreeFirst: Moves the cursor to the first element in a B-tree.
Sqlite3BtreeLast: Moves the cursor to the last element in a B-tree.
Sqlite3BtreeNext: Moves the cursor to the next element after the one it is currently
Pointing.
Sqlite3BtreePrevious: Moves the cursor to the previous element before the one it is
Currently pointing.
Sqlite3BtreeMoveto: Moves the cursor to an element that matches the key value passed in as a parameter.
2.5.4. Record Functions)
Sqlite3BtreeDelete: Deletes the record that the cursor is pointing.
Sqlite3BtreeInsert: Inserts a new element in the appropriate place of the B-tree.
Sqlite3BtreeKeySize: Returns the number of bytes in the key of the record that
Cursor is pointing.
Sqlite3BtreeKey: Returns the key of the record the cursor is currently pointing.
Sqlite3BtreeDataSize: Returns the number of bytes in the data record that the cursor is
Currently pointing.
Sqlite3BtreeData: Returns the data in the record the cursor is currently pointing.
2.5.5. Configuration Functions)
Sqlite3BtreeSetCacheSize: Controls the page cache size as well as the synchronous
Writes (as defined in the synchronous pragma ).
Sqlite3BtreeSetSafetyLevel: Changes the way data is synced to disk in order to increase
Or decrease how well the database resists damage due to OS crashes and power failures.
Level 1 is the same as asynchronous (no syncs () occur and there is a high probability
Damage). This is the equivalent to pragma synchronous = OFF. Level 2 is the default. There
Is a very low but non-zero probability of damage. This is the equivalent to pragma
Synchronous = NORMAL. Level 3 when CES the probability of damage to near zero but with
Write performance ction. This is the equivalent to pragma synchronous = FULL.
Sqlite3BtreeSetPageSize: Sets the database page size.
Sqlite3BtreeGetPageSize: Returns the database page size.
Sqlite3BtreeSetAutoVacuum: Sets the autovacuum property of the database.
Sqlite3BtreeGetAutoVacuum: Returns whether the database uses autovacuum.
Sqlite3BtreeSetBusyHandler: Sets the busy handler
2.6 instance analysis
Finally, we will end the discussion in this section with the specific implementation of sqlite3_open (see source code of Version 3.6.10 ):
It can be seen that all IO operations of SQLite are eventually converted into system calls of the operating system (one name: DBMS is built on a painful OS ). At the same time, we can also see that the implementation of SQLite is very hierarchical and modular, making SQLite more scalable and highly portable.
3. Compiler (Compiler)
3.1 Tokenizer (Tokenizer)
The interface passes the SQL statement to be executed to Tokenizer. Tokenizer splits words one by one according to the syntax definition of the SQL statement and passes them to the analyzer (Parser) for syntax analysis. The word divider is manually written and mainly implemented in Tokenizer. c.
3.2. analyzer (Parser)
SQLite syntax analyzer uses Lemon, an open-source LALR (1) syntax analyzer generator, and the generated file is parser. c.
A simple syntax tree:
SELECT rowid, name, season FROM episodes WHERE rowid = 1 LIMIT 1
3.3 Code Generator)
The code generator is the largest and most complex part of SQLite. It is closely related to Parser and generates the SQL statement execution function of the VDBE program based on the syntax analysis tree. It consists of multiple files: select. c, update. c, insert. c, delete. c, trigger. c, where. c, and so on. These files generate VDBE program commands. For example, SELECT statements are generated by select. c. The following is an implementation of code for opening a table in a read operation:
/* Generate code that will open a table for reading.
*/
Void sqlite3OpenTableForReading (
Vdbe * v,/* Generate code into this VDBE */
Int iCur,/* The cursor number of the table */
Table * pTab/* The table to be opened */
){
Sqlite3VdbeAddOp (v, OP_Integer, pTab-> iDb, 0 );
Sqlite3VdbeAddOp (v, OP_OpenRead, iCur, pTab-> tnum );
VdbeComment (v, "# % s", pTab-> zName ));
Sqlite3VdbeAddOp (v, OP_SetNumColumns, iCur, pTab-> nCol );
}
The Sqlite3vdbeAddOp function has three parameters: (1) VDBE instance (which will add commands), (2) operation code (one command), and (3) two operands.
3.4 Query Optimization
The code generator is not only responsible for generating code, but also for query optimization. The main implementation is in where. c. The generated WHERE statement blocks are usually shared by other modules, such as select. c, update. c, and delete. c. These modules call sqlite3WhereBegin () to generate the WHERE statement block command, add their own VDBE code to return the code, and finally call sqlite3WhereEnd () to generate the command, as shown below: