How can I implement a relational database by myself?

Source: Internet
Author: User
For example, a colleague recently decided that he wants to build a relational database on his own. It does not have to be complete, but it should recognize SQL statements for insert, delete, query, and update. Can anyone recommend good material? Everything I find online is about using MySQL rather than about building a database as a whole; or am I searching with the wrong terms? He also wants to implement it in Python only. Is that feasible? What knowledge is required to implement a database? How are the underlying functions logically divided?

Reply content:

All answers are incorrect.

The role of a relational database is to provide indexing, transactions, rollback, and power-loss protection (

See Database System Concepts. My recent graduation project was to build a simple relational database in Rust, and most of the modules are finished. The code quality is only average. If you are interested, check out doyoubi/Blastoise on GitHub: a tiny relational database.
It implements an SQL parser, semantic checking, simple execution plan generation, a memory pool, and persistence. I think that basically meets the requirements of the question.
However, I really do not recommend doing this. I agree with wheel brother that the important parts of a relational database are guaranteeing consistency and optimizing performance. Building a bare prototype is easy, but it takes a lot of time for little gain. If you have the time, you are better off reading more material.

Still, here is some material on building a simple relational database:
How does a relational database work
Database System Implementation
https://web.stanford.edu/class/cs346/2015/

The third is a Stanford course. Search for redbase on GitHub to find complete code uploaded by students.

Read the SQLite code.

Last week The Morning Paper covered "(Database) Techniques Everyone Should Know", which is Chapter 3 of the Red Book.
(Database) Techniques Everyone Should Know
Readings in Database Systems, 5th Edition
Updated on May 25, 2016
It has been nearly three years since I wrote my first database from scratch.
"I have created a NoSQL database, AsyncDB, based on coroutines and asynchronous I/O, and officially released it." (Lin Cheng's article, Zhihu column)
-------------------------------------------------------
The question takes me back a few years.
This is how I used to write it. At the time I was still studying accounting and knew nothing about programming.
The following is the Python pseudocode:

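import pickle

my_db = {}

# Save the whole "database" (a plain dict) to a file.
with open('db', 'wb') as db_file:
    pickle.dump(my_db, db_file)

# Load it back from the file.
with open('db', 'rb') as db_file:
    my_db = pickle.load(db_file)
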
Has no one mentioned how to implement SQL?
Take a look at the LEMON parser generator.
Its source is lemon.c in the SQLite source tree. It is an LALR(1) parser generator of about 4000 lines.

A relational database is one whose external data model is relational. Internally, implementations fall into two main categories: disk-based, such as MySQL and PostgreSQL, and memory-based, such as MemSQL, SAP HANA, and OceanBase. The question presumably means the former, so here is an outline of how much a disk-based relational database involves.

In the 1970s and 1980s memory was small, so the data could not all be kept in memory. Most of it lived on disk, and reads had to go to disk as well. Because reading and writing the disk is slow, a buffer pool is kept in memory: data that is read is cached in the buffer pool, and a write goes to the buffer pool and then returns. The buffer pool's job is to manage the movement of data between disk and memory. The unit of management in the buffer pool is the page. Page sizes usually range from a few KB to a few tens of KB (for example, 8 KB by default in PostgreSQL and 16 KB in InnoDB) and are generally configurable. If the buffer pool has no free page when a page has to be brought in, some page must be evicted, and if the victim is dirty it must be flushed to disk first. This calls for a replacement algorithm such as LRU. A page holds multiple records, and the page format must be designed to support variable-length fields.

If the machine crashes, the data in the buffer pool is lost. This is what the REDO log is for: a change is first written to the redo log, then applied in the buffer pool, and only then acknowledged to the client. Dirty pages in the buffer pool are flushed to the data files later (no force), and after a restart the data can be recovered from the redo log. Allowing dirty pages to be flushed to disk before their transaction has committed can speed up writes; the cost is that recovery must also replay the UNDO log to roll back the changes of uncommitted transactions.

Logs are classified as logical, physical, or physiological (physical-logical). In short, a logical log records the operation, for example changing a value from 1 to 2. A physical log records the exact location of the change, such as a given field of a given record on a given page, together with the old value and the new value. The problem with logical logs is that under concurrency it is hard to recover to a consistent state. Physical logs are too verbose for operations such as CREATE TABLE, so databases generally adopt a hybrid approach.

To track the order of operations in the system, each log record is assigned an id called the LSN (log sequence number). Various LSNs, such as pageLSN and flushedLSN, are recorded in the system. To speed up crash recovery, checkpoints are written regularly; a checkpoint is itself identified by an LSN.
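
The write-ahead rule and the LSN bookkeeping described above can be sketched roughly as follows. This is only a toy under stated assumptions: the "disk" is a dict, the log is a list that is assumed to be durable, and the names LogRecord, TinyWAL, disk_pages and buffer_pool are made up for illustration. It shows redo only; undo of uncommitted transactions is left out.

```python
from dataclasses import dataclass


@dataclass
class LogRecord:
    lsn: int          # log sequence number, strictly increasing
    page_id: int      # which page was changed
    key: str          # which field on that page
    old: object       # before-image (an undo pass would use this)
    new: object       # after-image (used by redo)


class TinyWAL:
    """Toy write-ahead log. Assumption: self.log survives a crash."""

    def __init__(self):
        self.log = []            # "durable" log records
        self.disk_pages = {}     # page_id -> {"lsn": int, "data": dict}
        self.buffer_pool = {}    # same layout, lost on crash
        self.next_lsn = 1

    def _page(self, page_id):
        # Bring the page into the buffer pool if it is not cached yet.
        if page_id not in self.buffer_pool:
            on_disk = self.disk_pages.get(page_id, {"lsn": 0, "data": {}})
            self.buffer_pool[page_id] = {"lsn": on_disk["lsn"],
                                         "data": dict(on_disk["data"])}
        return self.buffer_pool[page_id]

    def write(self, page_id, key, value):
        page = self._page(page_id)
        rec = LogRecord(self.next_lsn, page_id, key, page["data"].get(key), value)
        self.next_lsn += 1
        self.log.append(rec)          # 1. append to the log first
        page["data"][key] = value     # 2. then change the buffered page
        page["lsn"] = rec.lsn         # pageLSN: last record applied to this page
        return rec.lsn

    def flush(self, page_id):
        # No-force: a page is flushed whenever convenient, not at commit time.
        page = self.buffer_pool[page_id]
        self.disk_pages[page_id] = {"lsn": page["lsn"], "data": dict(page["data"])}

    def crash(self):
        self.buffer_pool = {}         # memory is lost; log and disk survive

    def recover(self):
        # Redo pass: reapply every record newer than the page's pageLSN.
        for rec in self.log:
            page = self._page(rec.page_id)
            if rec.lsn > page["lsn"]:
                page["data"][rec.key] = rec.new
                page["lsn"] = rec.lsn
```

For example, if page 1 is flushed after its first write and then written again before a crash, recover() skips the first record (its LSN is not newer than the pageLSN on disk) and reapplies only the second.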
The above relates to the C and D in ACID. The following covers A and I, that is, atomicity and isolation.

These two properties are guaranteed by concurrency control. There are many isolation levels. Originally there were four, from low to high: read uncommitted, read committed, repeatable read, and serializable. Serializable means that the result of executing multiple transactions concurrently is the same as executing them one after another in some serial order. Every level below serializable has some anomaly. For example, repeatable read suffers from phantom reads, and avoiding them requires gap locks. Read committed suffers from both phantom reads and non-repeatable reads. Later, more isolation levels appeared, such as snapshot isolation, which has the write skew problem. Early concurrency control protocols were mostly based on two-phase locking (2PL), which is why there were only the four isolation levels above. Later another family of concurrency control protocols emerged, collectively called Timestamp Ordering, and with it came further isolation levels such as snapshot isolation. For more on isolation levels, see this paper: http://research.microsoft.com/pubs/69541/tr-95-51.pdf . 2PL also has to deal with deadlocks.

The general idea of Timestamp Ordering is to assume that conflicts between transactions are rare, so no locks are taken and conflicts are checked only at commit time. It is a form of optimistic locking.
Timestamp Ordering comes in several variants; the most common are MVCC and OCC (optimistic concurrency control). MVCC creates a new version for each update made by a transaction and uses a timestamp as the version number. A read can ask for a specific version or for the latest one. Almost all mainstream databases support MVCC, because under MVCC reads and writes do not block each other and read performance is high. MySQL's rollback segment is used to keep the old versions. MVCC needs a background thread to reclaim versions that are no longer needed; Postgres vacuum does this. The difference between OCC and MVCC is that under OCC a transaction's changes are kept in a private space (for example, on the client) and conflicts are detected at commit. The usual practice is to record, when the transaction begins, the last-modified timestamp of the data it will modify, and to check at commit whether that timestamp has increased. If it has, the data was changed by someone else and there is a conflict. After a conflict, the transaction can be rolled back or retried.
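
To make the two ideas concrete, here is a small sketch that combines them: reads pick a version by timestamp (the MVCC part) and writes are buffered privately and validated at commit by comparing last-modified timestamps (the OCC part, as described above). It is an illustration under assumptions, not how any real engine does it; MVCCStore, OCCTransaction and ConflictError are hypothetical names, there is a single global counter as the timestamp source, and old versions are never garbage collected.

```python
import itertools


class ConflictError(Exception):
    pass


class MVCCStore:
    """Toy multi-version store: key -> list of (commit_ts, value), oldest first."""

    def __init__(self):
        self.versions = {}
        self.clock = itertools.count(1)   # hypothetical global timestamp source

    def read(self, key, ts):
        # Newest version whose commit timestamp is <= ts, or None.
        for commit_ts, value in reversed(self.versions.get(key, [])):
            if commit_ts <= ts:
                return value
        return None

    def last_modified(self, key):
        chain = self.versions.get(key)
        return chain[-1][0] if chain else 0

    def begin(self):
        return OCCTransaction(self, next(self.clock))


class OCCTransaction:
    """Buffers writes in a private write set and validates them at commit."""

    def __init__(self, store, start_ts):
        self.store = store
        self.start_ts = start_ts
        self.writes = {}

    def read(self, key):
        if key in self.writes:                         # read your own writes
            return self.writes[key]
        return self.store.read(key, self.start_ts)     # snapshot read

    def write(self, key, value):
        self.writes[key] = value                       # nothing is shared yet

    def commit(self):
        # Validation: a key we wrote must not have changed since we started.
        for key in self.writes:
            if self.store.last_modified(key) > self.start_ts:
                raise ConflictError(f"{key!r} was modified after this transaction began")
        commit_ts = next(self.store.clock)
        for key, value in self.writes.items():
            self.store.versions.setdefault(key, []).append((commit_ts, value))
        return commit_ts
```

If two transactions start from the same snapshot and both write the same key, the first commit succeeds and the second raises ConflictError; the caller can then roll back or retry.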

Once all of the above is done, the core of the database is in place. For performance, indexes are required. There are usually two kinds: the B+ tree, which supports ordered scans, and the hash index. A hash index has O(1) lookup and is suited to point reads. Ordered (range) scans need a B+ tree, whose lookup is O(log N). Some queries can be answered by scanning the index alone, some by scanning the table directly, and some by going through a secondary index to find the rows in the table. Deciding which way to go is the job of the optimizer.
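
The difference between the two index types can be seen in a few lines. In this sketch a sorted list searched with bisect stands in for a B+ tree; that is a substitution for brevity, since a real B+ tree also makes inserts logarithmic and is laid out in pages. HashIndex and SortedIndex are hypothetical names.

```python
import bisect


class HashIndex:
    """Point lookups in O(1) on average; keys are unordered, so no range scans."""

    def __init__(self):
        self.buckets = {}   # key -> list of row ids

    def insert(self, key, row_id):
        self.buckets.setdefault(key, []).append(row_id)

    def lookup(self, key):
        return list(self.buckets.get(key, []))


class SortedIndex:
    """Stand-in for a B+ tree: (key, row_id) pairs kept in sorted order."""

    def __init__(self):
        self.entries = []

    def insert(self, key, row_id):
        # O(n) here because a list has to shift; a B+ tree would be O(log N).
        bisect.insort(self.entries, (key, row_id))

    def lookup(self, key):
        return self.range(key, key)

    def range(self, low, high):
        # Binary search for the first entry with key >= low, then scan forward.
        start = bisect.bisect_left(self.entries, (low,))
        result = []
        for i in range(start, len(self.entries)):
            key, row_id = self.entries[i]
            if key > high:
                break
            result.append(row_id)
        return result
```

With rows indexed by age, HashIndex.lookup(30) and SortedIndex.lookup(30) return the same row ids, but only SortedIndex can answer range(26, 34) without looking at every key.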

In addition, a relational database naturally has to support SQL, and there are many steps between an SQL statement and a physical execution plan that can actually run. First, lexical and syntax analysis turn the SQL into an abstract syntax tree, and the planner then generates a logical execution plan from this tree. Generating the logical plan usually involves logical-level optimizations such as equivalent predicate rewriting and subquery elimination. The purpose of optimization is, of course, performance. For example, equivalent predicate rewriting replaces predicates such as LIKE and BETWEEN ... AND, which may not be able to use an index, with greater-than-or-equal and less-than-or-equal comparisons. The next step is to turn the logical plan into a physical execution plan. Each node in the physical plan tree is an operator, and executing an operator performs a real operation, such as a table scan operator or a filter operator. One logical plan usually corresponds to several possible physical plans, and choosing among them is physical plan optimization. This is where the classic cost model comes in, taking memory, CPU, I/O, and network into account. The typical example is a three-table join: join from left to right or from right to left, with a hash join or a sort-merge join? For more on query optimizers, see the book The Art of the Database Query Optimizer: Principle Analysis and SQL Performance Optimization.
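
The pipeline just described (tokens, syntax tree, logical rewrite, execution) can be walked through with a deliberately tiny example. It is only a sketch under heavy assumptions: it handles one statement shape, SELECT cols FROM table [WHERE col op n | WHERE col BETWEEN n AND m], with integer literals only, and its only "physical plan" is a full scan followed by a filter and a projection. The function names tokenize, parse, rewrite and execute are made up for illustration.

```python
import operator
import re

TOKEN = re.compile(r"\s*(>=|<=|=|<|>|,|\w+)")
OPS = {"=": operator.eq, "<": operator.lt, ">": operator.gt,
       "<=": operator.le, ">=": operator.ge}


def tokenize(sql):
    # Lexical analysis: split the statement into tokens.
    sql = sql.strip()
    tokens, pos = [], 0
    while pos < len(sql):
        match = TOKEN.match(sql, pos)
        if not match:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(tokens):
    # Syntax analysis: build a small dict-based syntax tree (no error recovery).
    pos = 0

    def expect(word):
        nonlocal pos
        if pos >= len(tokens) or tokens[pos].upper() != word:
            raise SyntaxError(f"expected {word}")
        pos += 1

    expect("SELECT")
    columns = [tokens[pos]]
    pos += 1
    while pos < len(tokens) and tokens[pos] == ",":
        columns.append(tokens[pos + 1])
        pos += 2
    expect("FROM")
    table = tokens[pos]
    pos += 1
    where = None
    if pos < len(tokens):
        expect("WHERE")
        col, op = tokens[pos], tokens[pos + 1].upper()
        if op == "BETWEEN":
            where = ("between", col, int(tokens[pos + 2]), int(tokens[pos + 4]))
        else:
            where = ("cmp", op, col, int(tokens[pos + 2]))
    return {"columns": columns, "table": table, "where": where}


def rewrite(ast):
    # Logical step: col BETWEEN low AND high  ->  col >= low AND col <= high.
    where = ast["where"]
    if where is None:
        predicates = []
    elif where[0] == "between":
        _, col, low, high = where
        predicates = [(">=", col, low), ("<=", col, high)]
    else:
        _, op, col, value = where
        predicates = [(op, col, value)]
    return {"scan": ast["table"], "filter": predicates, "project": ast["columns"]}


def execute(plan, database):
    # Physical step: full table scan, then filter, then projection.
    out = []
    for row in database[plan["scan"]]:
        if all(OPS[op](row[col], value) for op, col, value in plan["filter"]):
            out.append(tuple(row[c] for c in plan["project"]))
    return out
```

A real optimizer would generate several candidate physical plans from the rewritten predicates (index range scan versus full scan, for instance) and pick one by estimated cost; here there is only one candidate.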

As you can see, implementing a disk-based relational database system is very complicated. If you want to read code, go straight to Postgres.

That is all for now; I will add more later...

For Java, see Derby.

Implementing a mature database is no less difficult than implementing a mature operating system.

You can look at MiniBase, which is used for teaching at UW-Madison. ^_^

First, write a concurrency control subsystem. It provides a variety of latches: with different compatibility matrices, with or without priority queues, with or without exponential backoff, globally traceable or not, and so on.

Then write a storage management subsystem. This is where you decide the on-disk layout of your database. For example: can a table be split across several files? Is there a concept of partitions, of segments, of tablespaces? Which of these are fixed-length and which are variable-length? Which is the unit of space allocation and which is the unit of space scheduling? Once that is decided, design the formats of pages, segments, and tablespaces, and the formats of their descriptors. Then design the page header, the records within a page, and whatever goes at the end of the page. After the design is complete, decide on the in-memory objects of this subsystem. At the very least you need a Storage Manager for initializing, allocating, and scheduling storage units, and a set of methods for converting raw binary data into meaningful values, such as reading a ushort or writing a uint64.
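
As one possible starting point for the page format, here is a sketch of a slotted page that stores variable-length records, using the standard struct module for the "read a ushort, write a uint64" part. The layout (a 4-byte header, a slot directory growing forward, record bytes growing backward) and the names SlottedPage, encode_row and decode_row are assumptions for illustration, not a layout taken from any particular database. Deletion and compaction are left out.

```python
import struct

PAGE_SIZE = 4096
HEADER = struct.Struct("<HH")   # slot count, offset where free space ends
SLOT = struct.Struct("<HH")     # record offset, record length


class SlottedPage:
    """Slot directory grows from the front, record bytes grow from the back."""

    def __init__(self, raw=None):
        if raw is None:
            self.buf = bytearray(PAGE_SIZE)
            HEADER.pack_into(self.buf, 0, 0, PAGE_SIZE)   # empty page
        else:
            self.buf = bytearray(raw)                     # page read from disk

    def _header(self):
        return HEADER.unpack_from(self.buf, 0)

    def insert(self, record: bytes) -> int:
        count, free_end = self._header()
        slot_pos = HEADER.size + count * SLOT.size
        new_start = free_end - len(record)
        if new_start < slot_pos + SLOT.size:
            raise ValueError("page full")
        self.buf[new_start:free_end] = record
        SLOT.pack_into(self.buf, slot_pos, new_start, len(record))
        HEADER.pack_into(self.buf, 0, count + 1, new_start)
        return count          # the slot number identifies the record in this page

    def read(self, slot: int) -> bytes:
        count, _ = self._header()
        if not 0 <= slot < count:
            raise IndexError("no such slot")
        offset, length = SLOT.unpack_from(self.buf, HEADER.size + slot * SLOT.size)
        return bytes(self.buf[offset:offset + length])


def encode_row(row_id: int, name: str) -> bytes:
    data = name.encode("utf-8")
    # uint64 id, uint16 length prefix, then the variable-length bytes.
    return struct.pack("<QH", row_id, len(data)) + data


def decode_row(raw: bytes):
    row_id, length = struct.unpack_from("<QH", raw, 0)
    return row_id, raw[10:10 + length].decode("utf-8")
```

Because all state lives in the 4096-byte buffer, writing bytes(page.buf) to a file and constructing SlottedPage(raw) from it later gives back the same records.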

Then you need to write a buffer management subsystem (assuming you are not building an in-memory database). First, understand what a block, a page, and a frame are. These are your classes. Write a buffer pool and then a buffer manager. The buffer pool defines the layout of data in memory. The buffer manager is the interface of this subsystem: it responds to page requests and implements whichever page replacement policy you prefer.
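
A buffer manager with LRU replacement can be sketched in a few dozen lines. The assumptions here are loud ones: the "disk" is a plain dict mapping page ids to bytes, there is no pinning, and a dirty page is written back on eviction without first making sure the log has been flushed, which a real system must do. BufferPool and its method names are hypothetical.

```python
from collections import OrderedDict


class BufferPool:
    """Fixed number of frames; the least recently used page is evicted first."""

    def __init__(self, disk, capacity):
        self.disk = disk                # assumed page store: page_id -> bytes
        self.capacity = capacity
        self.frames = OrderedDict()     # page_id -> bytearray, oldest first
        self.dirty = set()

    def get(self, page_id):
        if page_id in self.frames:
            self.frames.move_to_end(page_id)     # now the most recently used
            return self.frames[page_id]
        if len(self.frames) >= self.capacity:
            self._evict()
        self.frames[page_id] = bytearray(self.disk.get(page_id, b""))
        return self.frames[page_id]

    def write(self, page_id, data):
        page = self.get(page_id)
        page[:] = data
        self.dirty.add(page_id)         # changed in memory only, for now

    def _evict(self):
        victim, page = self.frames.popitem(last=False)   # least recently used
        if victim in self.dirty:
            self.disk[victim] = bytes(page)              # write back dirty page
            self.dirty.discard(victim)

    def flush_all(self):
        for page_id in list(self.dirty):
            self.disk[page_id] = bytes(self.frames[page_id])
        self.dirty.clear()
```

Swapping in a different replacement policy (clock, LRU-K, and so on) only means changing how _evict picks its victim.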

Then write a log system. First decide whether to use shadow paging or ARIES-style logging. If the latter, you give up forced writes (no-force) and allow frame stealing (steal). You then need to design the redo log format and make your log record types extensible, because sooner or later you will need a new type of log record. If you want your system to be more robust, consider whether you need grouped log records (a group of log records is either redone as a whole or not redone at all). If you want your system to be more efficient, consider whether you need MVCC; if so, you must add an undo log and design its format. Next, decide the granularity of log records: fully physical, fully logical, or physiological? In short, the more logical the log, the more complicated the system design becomes (for example, how to handle partial writes). Finally, the storage manager needs somewhere to materialize the log, and the buffer manager is used to schedule log pages.

Next, write a lock system. First decide whether your system uses table-level, page-level, or row-level locks. The first two are the most natural, because you can lock the table or page number directly. Row-level locks need an extra data structure to represent them: one lock instance per row, or one lock instance shared by all rows on a page? Then there is the lock table, through which locks are requested and released. Finally, how do you deal with deadlocks: throw an exception on timeout, or analyze the dependency graph?
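
The dependency-graph option can be sketched as a lock table that records who is waiting for whom and refuses a request that would close a cycle. The sketch is simplified on purpose: locks are exclusive only, a blocked request just returns False instead of putting a thread to sleep, and the names LockTable and DeadlockError are made up. Resources can be any hashable id, for example a (table, row) tuple for row-level locks.

```python
class DeadlockError(Exception):
    pass


class LockTable:
    """Exclusive locks only; blocked requests are recorded, not slept on."""

    def __init__(self):
        self.owner = {}       # resource -> transaction holding the lock
        self.waits_for = {}   # transaction -> transaction it is waiting on

    def acquire(self, txn, resource):
        holder = self.owner.get(resource)
        if holder is None or holder == txn:
            self.owner[resource] = txn
            self.waits_for.pop(txn, None)      # no longer waiting on anyone
            return True
        self.waits_for[txn] = holder           # add an edge txn -> holder
        if self._in_cycle(txn):
            del self.waits_for[txn]
            raise DeadlockError(f"{txn} would deadlock waiting for {holder}")
        return False                           # caller should retry later

    def release_all(self, txn):
        for resource in [r for r, t in self.owner.items() if t == txn]:
            del self.owner[resource]
        self.waits_for.pop(txn, None)
        # Anyone who was waiting on txn is free to retry.
        for waiter in [w for w, t in self.waits_for.items() if t == txn]:
            del self.waits_for[waiter]

    def _in_cycle(self, start):
        # Follow waits-for edges from start; a path back to start is a deadlock.
        seen = set()
        current = self.waits_for.get(start)
        while current is not None and current not in seen:
            if current == start:
                return True
            seen.add(current)
            current = self.waits_for.get(current)
        return False
```

Here the transaction whose request closes the cycle is the one that gets the exception; a real system might instead pick a victim by age or by the amount of work done. The timeout approach avoids the graph entirely at the cost of occasionally aborting transactions that were merely slow.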

Next, write a transaction subsystem. It provides methods which ensure that, for every kind of operation, two-phase locking is used correctly, logs are written correctly, and rollback works correctly. The architecture of this subsystem, however, is determined by how diverse those "kinds of operations" are. Compared with heap files, inserting, deleting, updating, and querying records in files organized as B+ trees greatly complicates log writing. And compared with fixed-length record files, the same operations on variable-length records are another story again.

There are also the metadata management subsystem and the record (index) subsystem. Together, all of these make up a storage engine. The other things the asker wants are an SQL lexer, an SQL parser, an SQL planner, and an SQL optimizer, which together make up an SQL compiler. Finally, a server/client module controls permissions and provides the API. That should be about all of it.

As for the details of those components, I will write them up when I find the time.

(In addition, even the first subsystem is a problem if you write it in Python. What acquire_lock() gives you is a spin lock without a queue. There seems to be a wait parameter that specifies whether the thread should wait, so maybe there is a queue after all; I have forgotten the details. A database, however, needs the number of spins to be configurable and needs a priority queue for putting threads to sleep. Also, in any language with GC the buffer is not under your control. Kicking a page out does not mean it has been freed, and if the code is written carelessly, the GC keeps treating the page as in use and you end up with a memory leak.)
