This article is based on http://sqlite.org/arch.html.
This document describes the architecture of the SQLite library, which is useful for anyone who wants to understand or modify SQLite's internals. Before exploring, download the source package sqlite-src-3071400.zip, which contains all source files under its src directory. To compile SQLite itself, it is enough to download the single archive sqlite-amalgamation-3071400.zip.
Figure 1 is an architecture diagram that shows the main components of SQLite and how they relate to one another. Each component is introduced briefly below. Note that SQLite 3.0 is described here; version 2.8 and earlier are similar but differ in some details.
Figure 1 SQLite Architecture
Internally, SQLite consists of the following parts: the core, the SQL compiler, the backend, and accessories. By using a virtual machine, the virtual database engine (VDBE), SQLite makes its core easier to debug, modify, and extend. Every SQL statement is compiled into a readable, assembly-like program that is executed by the SQLite virtual machine. SQLite supports databases up to 2 TB in size, and each database is stored entirely in a single disk file. These files can be moved between machines with different byte orders. Data is stored on disk in B+ tree structures. SQLite relies on file system permissions to control access to a database.
1. Public interface
Most of the public interface of the SQLite library is implemented by functions in the source files main.c, legacy.c, and vdbeapi.c. Some routines are scattered across other files so that they can access data structures with file scope there: sqlite3_get_table() is implemented in table.c, sqlite3_mprintf() is in printf.c, and sqlite3_complete() is in tokenize.c. The Tcl interface is implemented in tclsqlite.c. For information about the C interface of SQLite, see http://sqlite.org/capi3ref.html.
To avoid name collisions with other software, all external symbols in the SQLite library begin with the prefix sqlite3. Symbols intended for external use (in other words, those that form the SQLite API) begin with sqlite3_.
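As a small illustration of the public interface, the sketch below uses Python's standard sqlite3 module, which is a thin wrapper over the C API. It assumes that sqlite3.complete_statement() maps onto the sqlite3_complete() routine mentioned above; the statements and variable names are only examples.

```python
import sqlite3

# complete_statement() reports whether the text ends in a complete
# statement (terminated by a semicolon) without executing anything.
complete = sqlite3.complete_statement("SELECT 1;")
incomplete = sqlite3.complete_statement("SELECT 1")

# Ordinary calls go through the same public interface underneath.
conn = sqlite3.connect(":memory:")
value = conn.execute("SELECT 1 + 1").fetchone()[0]
conn.close()

print(complete, incomplete, value)
```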
2. Lexical analyzer (tokenizer)
When a string containing SQL statements is to be executed, the interface passes that string to the tokenizer. The tokenizer's job is to break the original string into tokens and pass those tokens one by one to the parser. The tokenizer is hand-coded in the file tokenize.c.
Note that in this design the tokenizer calls the parser. People familiar with YACC and Bison may be used to the opposite arrangement, in which the parser calls the tokenizer. The author of SQLite has tried both approaches and found that things generally work out better when the tokenizer calls the parser; in his view, YACC has it backwards.
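The "tokenizer calls the parser" arrangement can be sketched with a toy example. This is not SQLite's tokenizer: the token classes, the regular expression, and the names CollectingParser and run_tokenizer are all hypothetical, chosen only to show which side owns the loop.

```python
import re

# Toy token pattern: an integer, a word, or any single other character.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(.))")

class CollectingParser:
    """Hypothetical push parser: it receives one token per call."""

    def __init__(self):
        self.tokens = []

    def feed(self, kind, text):
        # A real parser would update its parse state here.
        self.tokens.append((kind, text))

    def finish(self):
        return self.tokens

def run_tokenizer(sql, parser):
    """The tokenizer owns the loop and pushes each token into the parser."""
    sql = sql.rstrip()
    pos = 0
    while pos < len(sql):
        match = TOKEN_RE.match(sql, pos)
        number, word, punct = match.groups()
        if number is not None:
            parser.feed("INTEGER", number)
        elif word is not None:
            parser.feed("WORD", word)
        else:
            parser.feed("PUNCT", punct)
        pos = match.end()
    return parser.finish()

tokens = run_tokenizer("SELECT a+1", CollectingParser())
print(tokens)
```

In the YACC-style alternative, the parser would own the loop and ask the tokenizer for the next token each time it needed one.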
3. Parser
The parser assigns meaning to tokens based on their context. The SQLite parser is generated with the Lemon LALR(1) parser generator. Lemon does the same job as YACC/Bison, but uses a different input syntax that is less error-prone. Lemon also generates a parser that is reentrant and thread-safe. In addition, Lemon defines the concept of a non-terminal destructor, so that memory is not leaked when a syntax error occurs. The source file that drives Lemon is parse.y.
Because Lemon is not normally found on development machines, its complete source code (just one C file) is included in the "tool" subdirectory of the SQLite distribution. Documentation on Lemon is in the "doc" subdirectory.
4. Code generator
After the parser has assembled the tokens into a complete SQL statement, it calls the code generator to produce virtual machine code that does the work the statement requests. The code generator consists of many files: attach.c, auth.c, build.c, delete.c, expr.c, insert.c, pragma.c, select.c, trigger.c, update.c, vacuum.c, and where.c. Most of the important and interesting work happens in these files. expr.c handles code generation for expressions in SQL. where.c handles code generation for WHERE clauses in SELECT, UPDATE, and DELETE statements. The files attach.c, delete.c, insert.c, select.c, trigger.c, update.c, and vacuum.c handle code generation for the SQL statements of the same names (each of these files calls routines in expr.c and where.c as necessary). Code for all other SQL statements is generated by build.c. The file auth.c implements sqlite3_set_authorizer().
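The authorizer hook can be tried from Python, whose sqlite3 module exposes it as Connection.set_authorizer(). The following is a minimal sketch; the table t and the callback deny_deletes are made up for the example. The callback is consulted while a statement is being compiled, so a denied DELETE fails before any row is touched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t(x)")
conn.execute("INSERT INTO t VALUES (1)")
conn.commit()

def deny_deletes(action, arg1, arg2, db_name, source):
    # Refuse DELETE, allow every other action.
    if action == sqlite3.SQLITE_DELETE:
        return sqlite3.SQLITE_DENY
    return sqlite3.SQLITE_OK

conn.set_authorizer(deny_deletes)

try:
    conn.execute("DELETE FROM t")
    delete_allowed = True
except sqlite3.DatabaseError:
    # Compilation of the statement was rejected by the authorizer.
    delete_allowed = False

remaining = conn.execute("SELECT count(*) FROM t").fetchone()[0]
conn.close()

print(delete_allowed, remaining)
```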
5. Virtual machine
The program generated by the code generator is executed by the virtual machine. For more information about the virtual machine, see http://sqlite.org/opcode.html. In summary, the virtual machine implements an abstract computing engine designed specifically for manipulating database files. It has a stack for intermediate storage. Each instruction contains an opcode and up to three additional operands.
The virtual machine itself is contained entirely in the single source file vdbe.c. It also has its own header files: vdbe.h defines the interface between the virtual machine and the rest of the SQLite library, and vdbeInt.h defines data structures private to the virtual machine. The file vdbeaux.c contains utilities used by the virtual machine, as well as interface modules used by the rest of the library to construct VM programs. The file vdbeapi.c contains the external interfaces to the virtual machine, such as the sqlite3_bind_... family of functions. Individual values (strings, integers, floating-point numbers, and BLOBs) are stored in an internal object called Mem, which is implemented in vdbemem.c.
SQLite implements SQL functions as callbacks to C routines. Every built-in SQL function is implemented this way. Most built-in SQL functions (such as coalesce(), count(), substr(), and so on) can be found in func.c. The date and time conversion functions can be found in date.c.
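One way to look at a compiled program is to prefix a statement with EXPLAIN, which returns the virtual machine instructions as rows instead of running the query. The sketch below uses Python's sqlite3 module with an example table t. The exact opcodes vary between SQLite versions, and recent releases print more operand columns (p1 to p5) than the three operands described above, so the listing is illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t(x)")

# EXPLAIN returns one row per virtual machine instruction.
cursor = conn.execute("EXPLAIN SELECT x FROM t WHERE x > 5")
columns = [d[0] for d in cursor.description]
program = cursor.fetchall()
conn.close()

# Print address, opcode, and the first three operands of each instruction.
for row in program:
    print(row[0], row[1], row[2], row[3], row[4])

opcodes = [row[1] for row in program]
```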
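Application-defined SQL functions use the same callback mechanism. As a sketch, Python's Connection.create_function() registers a callable under an SQL name; reverse_text is a made-up function used only for illustration, shown next to two of the built-ins named above.

```python
import sqlite3

def reverse_text(value):
    # Invoked by SQLite once for each call that appears in a statement.
    return None if value is None else str(value)[::-1]

conn = sqlite3.connect(":memory:")
conn.create_function("reverse_text", 1, reverse_text)

row = conn.execute(
    "SELECT reverse_text('sqlite'), coalesce(NULL, 'fallback'), substr('sqlite', 1, 3)"
).fetchone()
conn.close()

print(row)
```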
6. B-tree
An SQLite database is stored on disk as B-trees, implemented in the source file btree.c. Each table and each index in the database uses a separate B-tree, and all B-trees are stored in the same disk file. Details of the file format are recorded in a large comment at the beginning of btree.c. The interface to the B-tree subsystem is defined in the header file btree.h.
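The one-B-tree-per-table-and-index layout is visible in the schema table sqlite_master, whose rootpage column records the page on which each B-tree starts. A minimal sketch, with example table and index names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people(id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE pets(id INTEGER PRIMARY KEY, owner INTEGER)")
conn.execute("CREATE INDEX people_name ON people(name)")

# Each table and index has its own root page inside the same database.
roots = conn.execute(
    "SELECT type, name, rootpage FROM sqlite_master ORDER BY rootpage"
).fetchall()
conn.close()

for kind, name, rootpage in roots:
    print(kind, name, rootpage)
```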
7. Page cache
The B-tree module requests information from the disk in fixed-size blocks. The default block size is 1024 bytes, but it can be set anywhere between 512 and 65536 bytes. The page cache is responsible for reading, writing, and caching these blocks. It also provides the rollback and atomic commit abstraction and takes care of locking the database file. The B-tree driver requests particular pages from the page cache, and notifies the page cache when it wants to modify pages or to commit or roll back changes. The page cache handles all the messy details of making sure requests are served quickly, safely, and efficiently.
The page cache is implemented in the single C source file pager.c. The interface to the page cache subsystem is defined in the header file pager.h.
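The effects of the page cache can be observed from the outside with two PRAGMA statements and a rollback. The sketch below assumes the default rollback-journal mode and a fresh temporary file; the file name demo.db and the 8192-byte page size are arbitrary choices for the example (the size must be a power of two within the supported range).

```python
import os
import sqlite3
import tempfile

PAGE_SIZE = 8192  # example value; must be a power of two

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.db")
    conn = sqlite3.connect(path)

    # The page size can only be chosen while the database is still empty.
    conn.execute(f"PRAGMA page_size = {PAGE_SIZE}")
    conn.execute("CREATE TABLE t(x)")
    conn.commit()

    page_size = conn.execute("PRAGMA page_size").fetchone()[0]
    page_count = conn.execute("PRAGMA page_count").fetchone()[0]
    file_size = os.path.getsize(path)

    # An uncommitted change is undone by the rollback mechanism.
    conn.execute("INSERT INTO t VALUES (1)")
    conn.rollback()
    rows_after_rollback = conn.execute("SELECT count(*) FROM t").fetchone()[0]
    conn.close()

print(page_size, page_count, file_size, rows_after_rollback)
```

The file size being an exact multiple of the page size reflects the fixed-size blocks described above.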
8. OS Interface
To provide portability between POSIX and Win32 operating systems, SQLite uses an abstraction layer for its operating system interface. The interface to the OS abstraction layer is defined in os.h. Each supported operating system has its own implementation: os_unix.c for Unix, os_win.c for Windows, and so on. Each of these implementations typically has its own header file, such as os_unix.h and os_win.h.
9. Utilities
Memory allocation and string comparison routines are located in util.c. The symbol tables used by the parser are maintained by hash tables, implemented in hash.c. The source file utf.c contains Unicode conversion routines. SQLite has its own implementation of printf() (with some extensions) in printf.c, and its own random number generator in random.c.
10. Test code
If regression test scripts are counted, more than half of the total SQLite code base is devoted to testing. The main code files contain many assert() statements. In addition, the source files test1.c through test5.c, together with md5.c, implement extensions used for testing purposes only. The os_test.c backend interface is used to simulate power failures in order to verify the crash recovery mechanism of the page cache.
Note:
I originally planned to analyze the architecture of SQLite 3 from the source code myself, but later found that good analyses already exist online. For reference:
SQLite: http://www.cnblogs.com/hustcat/category/175618.html. It analyzes the components of SQLite 3 in depth. The series can also be downloaded from http://sqlite.com.cn.
SQLite learning Manual: http://www.cnblogs.com/stephen en-liu74/category/348367.html. It is also very detailed and focuses more on how to use SQLite.