Architecture of a Database System. Joseph M. Hellerstein, Michael Stonebraker and James Hamilton
A paper co-authored by Turing Award winner Michael Stonebraker. It describes the architecture of database systems, covering both the relevant theory and the concrete practices of mainstream databases.
For readers who already know the basics of databases, this paper is a good next step toward a deeper understanding of database architecture. It covers the process model, parallel architectures, storage module design, transaction implementation, the query processor and optimizer, and the modules shared by these components. It includes not only the basic theory found in textbooks but also the concrete practices of open source and commercial databases, which is useful for understanding a database kernel. In addition, each topic comes with a large number of citations for readers who want to study further.
The paper is introductory, with little specialized terminology and no in-depth discussion of implementation, but because it covers so much ground it is quite long for a paper.
Readings in Database Systems Fifth Edition (2015). Peter Bailis, Joseph M. Hellerstein, and Michael Stonebraker
The famous "Red Book", again with Michael Stonebraker among its editors. It focuses on the main trends in current DBMS development.
Compared with Architecture of a Database System above, this book concentrates on the direction in which DBMSs are developing, including currently popular topics such as big data, NoSQL and data analytics. The authors' subjective evaluations of and suggestions about these technologies help in understanding current trends. The book also recommends representative papers on each topic for readers who want to dig deeper.
In places the authors' evaluations are rather subjective, but they are still useful as a reference. Because of space constraints, some content only skims the surface and may be hard for non-practitioners to follow.
ARIES: A Transaction Recovery Method Supporting Fine-Granularity Locking and Partial Rollbacks Using Write-Ahead Logging. C. Mohan, Don Haderle, Bruce Lindsay, Hamid Pirahesh, Peter Schwarz
A classic log-based recovery algorithm. The recovery algorithms of many databases show the influence of ARIES.
ARIES is relatively simple and efficient to implement, and it supports partial transaction rollback, row-level locking and fuzzy checkpoints. It writes a log record (undo/redo) for every update to a data page, uses the LSN to relate each data page to its log records, chains together all the log records of the same transaction, and handles nested rollbacks correctly. In addition, ARIES does not force all dirty pages to be flushed to disk when a checkpoint is taken. In short, the algorithm as a whole is very flexible, and in practice it can be adapted to specific needs with simple modifications.
The core of the ARIES algorithm is not complicated, but the paper spends a lot of space introducing and comparing earlier recovery algorithms, which can make it hard to grasp the essence of ARIES on a first reading. It is best to concentrate first on the core algorithm and only then turn to the rest.
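The way the LSN ties data pages to log records can be illustrated with a small sketch. This is a simplified illustration in Python, not the algorithm as given in the paper: the names LogRecord, Page, redo and undo are hypothetical, each page holds a single integer, the analysis pass and checkpoints are omitted, and compensation log records are only hinted at rather than written to the log.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LogRecord:
    lsn: int                 # log sequence number of this record
    txn: str                 # transaction that made the update
    page: str                # page that was updated
    before: int              # undo information
    after: int               # redo information
    prev_lsn: Optional[int]  # previous record of the same transaction


@dataclass
class Page:
    value: int = 0
    page_lsn: int = 0        # LSN of the last update applied to this page


def redo(log, pages):
    """Reapply every logged update that the page has not seen yet."""
    for rec in log:
        page = pages[rec.page]
        if page.page_lsn < rec.lsn:   # page is older than the log record
            page.value = rec.after
            page.page_lsn = rec.lsn


def undo(log, pages, losers):
    """Roll back unfinished transactions by walking their prev_lsn chains."""
    by_lsn = {rec.lsn: rec for rec in log}
    last = {}
    for rec in log:
        last[rec.txn] = rec.lsn       # last record written by each transaction
    next_lsn = log[-1].lsn + 1
    for txn in losers:
        lsn = last.get(txn)
        while lsn is not None:
            rec = by_lsn[lsn]
            page = pages[rec.page]
            page.value = rec.before
            # A full implementation would append a compensation record here;
            # this sketch only advances the page LSN as if it had.
            page.page_lsn = next_lsn
            next_lsn += 1
            lsn = rec.prev_lsn
```

Because redo compares the page LSN with the record LSN, running it twice leaves the pages unchanged the second time, which is what makes recovery safe to restart after another crash.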
Cache Replacement Algorithms
The LRU-K Page Replacement Algorithm for Database Disk Buffering. E. J. O'Neil, P. E. O'Neil, and G. Weikum
2Q: A Low Overhead High Performance Buffer Management Replacement Algorithm. T. Johnson and D. Shasha
LIRS: An Efficient Low Inter-reference Recency Set Replacement Policy to Improve Buffer Cache Performance. Song Jiang and Xiaodong Zhang
ARC: A Self-Tuning, Low Overhead Replacement Cache. Nimrod Megiddo, Dharmendra S. Modha
Four papers describing four different cache replacement algorithms. They are valuable guidance not only for database buffer caches but also for other applications that need a cache, such as file system caches.
LRU-K, 2Q and LIRS all use the second-to-last access time to infer how frequently a block is accessed, and replace the blocks with a low access frequency. ARC is an adaptive algorithm that adjusts the cache to the workload: whereas the first three algorithms need tuned parameters to perform at their best, ARC needs no parameters to reach its best performance. In addition, all four algorithms are scan-resistant, that is, they prevent a scan from polluting the cache.
The first three algorithms have to be tested to find their best parameters. The ARC model is rather idealized, and it does not discuss what to do when a page is fixed (pinned) in memory and cannot be flushed, so in practice each algorithm has to be adjusted to the actual situation.
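The idea of using the second-to-last access time can be sketched as follows. This is a minimal LRU-2 style cache written for illustration, not the algorithm exactly as published: the class name LRU2Cache and its methods are hypothetical, and the history of evicted pages and the correlated reference period from the LRU-K paper are left out.

```python
import math


class LRU2Cache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.clock = 0
        # page -> (last access time, second-to-last access time)
        self.history = {}

    def access(self, page):
        """Record an access and return the evicted page, or None."""
        self.clock += 1
        evicted = None
        if page in self.history:
            last, _ = self.history[page]
            self.history[page] = (self.clock, last)
        else:
            if len(self.history) >= self.capacity:
                # Evict the page whose second-to-last access is oldest.
                # Pages seen only once have -inf there, so they go first;
                # ties are broken by the oldest last access.
                evicted = min(
                    self.history,
                    key=lambda p: (self.history[p][1], self.history[p][0]),
                )
                del self.history[evicted]
            self.history[page] = (self.clock, -math.inf)
        return evicted
```

With a capacity of 3, two pages that have each been accessed twice survive a long scan of pages that are touched only once, because each scanned page evicts the previous scanned page. A plain LRU cache would have dropped the hot pages instead.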
Data Storage
The Log-Structured Merge-Tree (LSM-Tree). Patrick O'Neil, Edward Cheng, Dieter Gawlick, Elizabeth O'Neil
The LSM tree is a disk-based data structure that provides lower I/O overhead for inserts than a B-tree. Incidentally, the underlying storage structure of HBase is based on the LSM tree.
Flushing data pages causes a large amount of random I/O, whose main cost is moving the disk arm (that is, seeking). The LSM tree builds a B-tree in memory as a write cache, defers and batches index modifications, and merges them into the on-disk data in a merge-sort-like fashion when flushing, which greatly reduces the seek overhead of writes.
Lookups can be inefficient in some cases, so the LSM tree is better suited to writes than to lookups.
Because an in-memory B-tree is needed to cache inserts, the LSM tree typically has a large memory overhead.
On the SSDs now on the market, it does not necessarily bring a large performance improvement.
The paper builds a model of the LSM tree and gives a detailed theoretical analysis, which makes for rather difficult reading.
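The write path described above (buffer in memory, flush in sorted batches, merge later) can be sketched in a few lines. This is an illustration under simplifying assumptions, not the structure from the paper: the class TinyLSM is hypothetical, a Python dict stands in for the in-memory tree, sorted lists stand in for on-disk components, and the paper's rolling merge is replaced by a one-shot compaction of all runs.

```python
import bisect


class TinyLSM:
    def __init__(self, memtable_limit=4):
        self.memtable_limit = memtable_limit
        self.memtable = {}   # in-memory component (stand-in for the tree)
        self.runs = []       # "on-disk" sorted runs, newest first

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Write the memtable out as one sorted run (one sequential write)."""
        if self.memtable:
            self.runs.insert(0, sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        """Check memory first, then every run from newest to oldest."""
        if key in self.memtable:
            return self.memtable[key]
        for run in self.runs:
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

    def compact(self):
        """Merge all runs into one, keeping the newest value per key."""
        merged = {}
        for run in reversed(self.runs):   # oldest first, newer overwrite
            merged.update(run)
        self.runs = [sorted(merged.items())] if merged else []
```

The sketch also shows why lookups can be slower than writes: a put touches only memory, while a get may have to search every run until compaction reduces their number.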
The Design and Implementation of a Log-Structured File System. Mendel Rosenblum, John K. Ousterhout
Log-structured storage is a storage management method that converts random I/O into sequential I/O.
The paper proposes log-structured storage management: all modifications are treated as a log, buffered in memory first, and then written to disk sequentially in batches. Turning the writes into sequential I/O saves disk seek time and improves disk utilization. To manage large extents of free space on disk, the log is divided into segments, and a cost-based (cost-benefit) cleaning policy compacts and reclaims them. The paper implements a prototype file system, Sprite LFS, which can use up to 70% of the disk bandwidth for writing, compared with 5-10% for Unix file systems.
Databases can borrow the log-structured approach to turn the random writes of flushing B-tree data pages into sequential writes.
The paper mainly targets file system implementation, where all modifications must be converted to log records and written to disk, which may increase the number of bytes written. Likewise, on SSDs the performance improvement may not be significant.
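The combination of an append-only log, segments and cleaning can be sketched as follows. This is a toy model for illustration only: the class TinyLog and its methods are hypothetical, segments are Python lists rather than disk regions, the segment to clean is chosen by the caller rather than by the paper's cleaning policy, and cleaned segments are simply left empty instead of being reused.

```python
class TinyLog:
    def __init__(self, segment_size=4):
        self.segment_size = segment_size
        self.segments = [[]]   # each segment: (key, data) records in write order
        self.index = {}        # key -> (segment number, offset) of the live copy

    def write(self, key, data):
        """Append a record at the tail of the log; never update in place."""
        if len(self.segments[-1]) == self.segment_size:
            self.segments.append([])
        seg = len(self.segments) - 1
        self.segments[seg].append((key, data))
        self.index[key] = (seg, len(self.segments[seg]) - 1)

    def read(self, key):
        seg, off = self.index[key]
        return self.segments[seg][off][1]

    def utilization(self, seg):
        """Fraction of the segment still occupied by live records."""
        live = sum(
            1
            for off, (key, _) in enumerate(self.segments[seg])
            if self.index.get(key) == (seg, off)
        )
        return live / self.segment_size

    def clean(self, seg):
        """Copy live records to the log tail, then free the segment."""
        records = self.segments[seg]
        self.segments[seg] = []
        for off, (key, data) in enumerate(records):
            if self.index.get(key) == (seg, off):
                self.write(key, data)
```

Overwriting a key leaves a dead record behind in an older segment. Cleaning a segment with low utilization copies only the few live records, which is why the choice of segment matters for the cost of cleaning.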
C-Store: A Column-oriented DBMS. Mike Stonebraker, Daniel J. Abadi, Adam Batkin, Xuedong Chen, Mitch Cherniack, Miguel Ferreira, Edmond Lau, Amerson Lin, Sam Madden, Elizabeth O'Neil, Pat O'Neil, Alex Rasin, Nga Tran, Stan Zdonik
This paper introduces the column store, that is, the implementation details of column-oriented storage. It is another paper by Mike Stonebraker. Although the paper implements only a small part of the features it describes, its ideas are undeniably novel and offer a lot of inspiration.
The paper presents implementation details of column storage using C-Store, a storage engine that implements it, as the example. C-Store uses a specialized column-oriented design to read and write efficiently, serving both data-warehouse-style queries (ad hoc queries that aggregate large amounts of data) and OLTP-style transactions. The paper also describes how a distributed C-Store achieves high availability through K-safety and supports transactions while avoiding two-phase commit (2PC).
Specific features include: two storage modules, a Writeable Store (hot data, in memory) optimized for write performance and a Read-optimized Store optimized for read performance; projections, which redundantly store different groups of a table's columns in different sort orders chosen according to the queries, so that each query can use the best projection; compression of column data using several encodings; a query optimizer and executor designed for column storage; distributed high availability obtained by storing overlapping columns in different projections; and snapshot isolation to avoid 2PC and query locks.
Although the ideas presented in the paper are novel, they are difficult to implement. The performance comparison exercises only part of C-Store: the Writeable Store was not yet ready to be tested, and the system could run only on a single machine. It remains to be seen whether many of the features are as effective as the paper claims.
In addition, read optimization requires choosing by hand which columns make up each projection according to the type of query, so optimization depends on manual intervention.
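Projections and column compression can be illustrated with a short sketch. It is an assumption-laden illustration, not C-Store's actual storage format: the function names are hypothetical, rows are plain Python dicts, and run-length encoding is used as just one representative encoding for a sorted column with few distinct values.

```python
from itertools import groupby


def make_projection(rows, columns, sort_key):
    """Store the chosen columns separately, all in the order of sort_key."""
    ordered = sorted(rows, key=lambda r: r[sort_key])
    return {c: [r[c] for r in ordered] for c in columns}


def rle_encode(column):
    """Compress a column into (value, run length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(column)]


def rle_decode(runs):
    return [value for value, count in runs for _ in range(count)]


def count_by_value(runs):
    """Aggregate directly on the compressed column, without decoding it."""
    counts = {}
    for value, count in runs:
        counts[value] = counts.get(value, 0) + count
    return counts
```

Sorting a projection on a column puts equal values next to each other, so that column compresses into a few runs, and a query such as a count per value can be answered from the runs alone.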
Efficient Locking for Concurrent Operations on B-Trees. Philip L. Lehman, S. Bing Yao
This paper introduces an algorithm for concurrent operations on a B-tree.
Using the B-link tree, a variant of the B+ tree, it achieves simple and efficient concurrent search and insertion.
A B-link tree adds to every node of a B+ tree a link pointer to its right sibling. For insertions, the link pointer lets each process (thread) hold at most three node locks at any moment, including during splits. Lookups do not need to lock any node.
Because it limits the locks each process (thread) acquires, it greatly increases the degree of concurrency.
The paper does not provide an efficient concurrent algorithm for deletion.
Deletion is handled by removing only the data in leaf nodes: keys in non-leaf nodes are not removed, and nodes are not merged. The rationale is that deletion is a relatively uncommon operation in real applications. When so much data has been deleted that leaf utilization becomes quite low, the whole tree can be locked and reorganized.
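The role of the link pointer can be sketched at the leaf level. This is a single-threaded illustration under stated assumptions, not the paper's full algorithm: the names Leaf, find_leaf and split are hypothetical, locking and non-leaf nodes are omitted, and the node update in split is several statements here whereas a real implementation would make it a single atomic page write.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Leaf:
    keys: list
    high_key: Optional[int] = None   # upper bound of this node; None = +infinity
    right: Optional["Leaf"] = None   # link pointer to the right sibling


def find_leaf(node, key):
    """Follow link pointers until the key falls inside the node's range."""
    while node.high_key is not None and key > node.high_key:
        node = node.right
    return node


def split(leaf):
    """Move the upper half of the keys into a new right sibling."""
    mid = len(leaf.keys) // 2
    # Create the sibling first; it inherits the old bound and old link.
    sibling = Leaf(leaf.keys[mid:], leaf.high_key, leaf.right)
    # Then shrink the original node and point it at the sibling.
    leaf.keys = leaf.keys[:mid]
    leaf.high_key = leaf.keys[-1]
    leaf.right = sibling
    return sibling
```

A reader that reached the original leaf through a stale parent pointer still finds a key that moved during the split, because the high key tells it to follow the link to the right. This is what allows lookups to proceed without locks.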
A personal collection of notes on some database papers