In this article, I'll explain why I'm using existing models for my project instead of starting completely from scratch, and I'll list the criteria I used to choose among K-V databases. Finally, I'll outline the well-known K-V databases that meet those criteria and pick the winners. The article covers:
- Don't reinvent the wheel
- Candidate models and selection criteria
- Overview of the selected databases
1. Don't reinvent the wheel
K-V databases have existed for at least 30 years [1]. One of the most notable early examples is dbm, the original database manager written by Ken Thompson for Version 7 UNIX in 1979 [2]. Engineers have confronted these databases in all kinds of scenarios, debated their designs and data structures, and accepted some choices while rejecting others. They have accumulated a great deal of experience with the problems that come up in real production use. Ignoring the work of these predecessors and starting from scratch would be foolish: I would only repeat the mistakes they have already made.
As Gall's law, stated by John Gall, puts it [3]:

> A complex system that works is invariably found to have evolved from a simple system that worked. The inverse proposition also appears to be true: a complex system designed from scratch never works and cannot be patched up to make it work. You have to start over, beginning with a working simple system.
This quote sets two guiding principles for my K-V database project:
1. Use a model. I need to find K-V database projects that have been around for a while, ideally successor projects to earlier successful K-V databases. Such projects tend to have robust designs that were refined over many development iterations. These K-V databases will serve as models for my own project.
2. Start small. The first version of the project should be small and simple, so that it is easy to test and validate. Improvements and additional features can be added in subsequent releases.
2. Candidate models and selection criteria
After doing some homework on K-V databases and NoSQL databases in general, I decided to choose from among the following candidates:
- dbm
- Berkeley DB
- Kyoto Cabinet
- Memcached and MemcacheDB
- LevelDB
- MongoDB
- Redis
- OpenLDAP
- SQLite
The criteria for my selection are as follows:
- I want to implement a K-V database using object-oriented programming, so I need inspiration from databases written in OOP languages.
- For the underlying data structure, I want an on-disk hash table, so the candidates must provide a way to read from and write to the disk.
- The database needs to support network access.
- No query engine or structured access to the data is needed.
- Full ACID compliance is not required.
- Since I'll be working on this project alone, I prefer databases built by small teams, ideally one or two people.
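The criteria above can be made concrete with a small sketch. This is a hypothetical interface of my own, not the API of any candidate database: an object-oriented K-V store exposing just get/set/delete, with an in-memory dict standing in for the on-disk hash table.

```python
# Hypothetical sketch of the minimal object-oriented K-V interface implied
# by the criteria above. The dict is a stand-in for the on-disk hash table;
# a real version would read and write bucket pages on disk.
class KVStore:
    def __init__(self):
        self._data = {}  # stand-in for the on-disk bucket array

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)  # None if the key is absent

    def delete(self, key):
        self._data.pop(key, None)


store = KVStore()
store.set(b"name", b"value")
print(store.get(b"name"))  # b'value'
```

Keeping the interface this narrow is deliberate: with no query engine and no structured access, everything above this layer stays simple to test and validate.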
3. Overview of the selected K-V databases
The three finalists are Berkeley DB, Kyoto Cabinet, and LevelDB. Berkeley DB and Kyoto Cabinet are both successor projects to dbm. Moreover, neither is its author's first attempt but the nth iteration, which usually means they are more reliable than first-time projects. LevelDB's underlying data structure is based on an LSM tree rather than a hash table, but its code is the cleanest I have ever seen. All three projects were developed by one or two people. Their details follow.
Berkeley DB
The project began in 1986, which means it had been around for 26 years at the time I wrote this article. Berkeley DB is a successor to dbm, and its underlying data structure is a hash table. The first version was implemented by Margo Seltzer [22] and Ozan Yigit [23] while they were at UC Berkeley. The project was later taken over and further developed by Oracle.
Berkeley DB was originally implemented in C and remains pure C to this day. Its development has been incremental, with new features added in each major release; it now supports concurrency, transactions, recovery, and replication [4]. Berkeley DB is widely deployed [5], which proves that its architecture is highly reliable. For more about its design, see the "Berkeley DB Programmer's Reference Guide" [6] and "The Architecture of Open Source Applications, Volume 1" [5].
Kyoto Cabinet
Kyoto Cabinet was released in 2009 by Mikio Hirabayashi [24] and is still actively developed today. It is a follow-up to the author's Tokyo Cabinet (2007) and QDBM (2003); QDBM itself was a high-performance reimplementation of dbm [7]. Kyoto Cabinet is particularly interesting because of its pure dbm lineage and because its author has more than 12 years of experience developing K-V databases. After implementing three K-V databases over so many years, the author must have a solid grasp of the required data structures, not to mention strong intuition about performance bottlenecks.
Kyoto Cabinet is implemented in C++ and provides a hash table, a B+ tree, and a few more esoteric data structures [16]. Its performance is excellent. Nevertheless, it appears to have performance problems tied to initialization parameters. As many users have reported, performance is fine as long as the number of entries stays below a threshold proportional to the size of the bucket array, which is fixed by the parameters used when the database file is created. Once that threshold is exceeded, performance drops sharply [18][19]. The same problem also appears in Tokyo Cabinet [20][21]. This means that if a project's requirements change after adopting either of them, the consequences can be serious. And everyone knows how often requirements change in software...
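The effect described above follows from how any hash table with a bucket count fixed at creation time degrades: once the number of entries exceeds the number of buckets, collision chains grow linearly and every lookup pays for it. A toy fixed-bucket chained table (my own illustration, not Kyoto Cabinet's actual implementation) makes the arithmetic visible:

```python
# Toy fixed-bucket chained hash table (not Kyoto Cabinet's implementation):
# the bucket count is fixed at creation time, so once entries exceed
# buckets, the average chain length -- and lookup cost -- grows linearly.
class FixedBucketTable:
    def __init__(self, num_buckets):
        self.buckets = [[] for _ in range(num_buckets)]

    def set(self, key, value):
        chain = self.buckets[hash(key) % len(self.buckets)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                chain[i] = (key, value)  # overwrite existing key
                return
        chain.append((key, value))

    def average_chain_length(self):
        n = sum(len(chain) for chain in self.buckets)
        return n / len(self.buckets)


table = FixedBucketTable(num_buckets=1000)
for i in range(100_000):
    table.set(i, i)
print(table.average_chain_length())  # 100.0: each lookup now scans ~100 entries
```

This is why the threshold is proportional to the bucket-array size: below it, chains stay short and lookups are effectively constant time; above it, every operation degrades toward a linear scan of its chain.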
LevelDB
LevelDB was implemented by Google engineers Jeffrey Dean [8] and Sanjay Ghemawat [9], who also worked on Google's legendary infrastructure projects, MapReduce and BigTable. Given their experience with the large-scale problems Google faces, they clearly know what they are doing. What makes LevelDB interesting is that, unlike most other K-V databases, it uses neither a hash table nor a B-tree as its underlying data structure, but a log-structured merge tree (LSM tree) [12], which is reportedly optimized for SSDs [13]. You can find plenty of information about LevelDB on the High Scalability blog [17].
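The core LSM idea can be sketched briefly: writes land in an in-memory buffer (the memtable); when it fills, it is frozen into an immutable sorted run; reads check the memtable first, then the runs from newest to oldest. The following is a simplified sketch of that idea under my own naming, not LevelDB's actual code, which adds a write-ahead log, multi-level compaction, and much more:

```python
import bisect

# Simplified sketch of the log-structured merge idea (not LevelDB's code):
# writes go to an in-memory memtable; when it fills, it is frozen into an
# immutable sorted run; reads check the memtable, then runs newest-first.
class TinyLSM:
    def __init__(self, memtable_limit=4):
        self.memtable = {}
        self.runs = []  # list of sorted [(key, value), ...] runs
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            # Flush: freeze the memtable into an immutable sorted run.
            self.runs.append(sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.runs):  # newest run wins
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None
```

The design makes every write an append and every flush a sequential write of sorted data, which is exactly what plays well with SSDs; the cost is that reads may have to consult several runs, which real LSM implementations mitigate with compaction and Bloom filters.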
LevelDB, released in 2011, is written in C++ and designed as a building block for higher-level storage systems [10]. The IndexedDB HTML5 API in future versions of Chrome will use LevelDB [10][11]. According to the benchmarks provided by its authors [14], its performance below a certain data size is outstanding. However, another benchmark run by Andy Twigg at Acunu on a commodity SSD showed a dramatic drop in performance as the number of entries exceeded 1M and approached the 1B level [15]. So for back-end projects with truly large data sizes, LevelDB may not be the best choice.
But that is not really a problem for me, because what I value most in LevelDB is not its performance but its architecture. Looking at how the various parts of its source code are organized, you will understand what pure beauty is: everything is clear, concise, and logical. Reading LevelDB's source and using it as a model is an excellent opportunity to learn to write good code.
What about the remaining K-V databases?
The fact that I didn't choose them doesn't mean I will discard them entirely; I may occasionally borrow elements of their architecture. But they won't influence my project as much as the three databases selected above.
References
[1] http://blog.knuthaugen.no/2010/03/a-brief-history-of-nosql.html
[2] http://en.wikipedia.org/wiki/Dbm
[3] http://en.wikipedia.org/wiki/Systemantics
[4] http://en.wikipedia.org/wiki/Berkeley_DB#Origin
[5] http://www.aosabook.org/en/bdb.html
[6] http://docs.oracle.com/cd/E17076_02/html/programmer_reference/intro.html
[7] http://fallabs.com/qdbm/
[8] http://research.google.com/people/jeff/
[9] http://research.google.com/pubs/SanjayGhemawat.html
[10] http://google-opensource.blogspot.com/2011/07/leveldb-fast-persistent-key-value-store.html
[11] http://www.w3.org/TR/IndexedDB/
[12] http://www.igvita.com/2012/02/06/sstable-and-log-structured-storage-leveldb/
[13] http://www.acunu.com/2/post/2011/04/log-file-systems-and-ssds-made-for-each-other.html
[14] http://leveldb.googlecode.com/svn/trunk/doc/benchmark.html
[15] http://www.acunu.com/2/post/2011/08/benchmarking-leveldb.html
[16] http://blog.creapptives.com/post/8330476086/leveldb-vs-kyoto-cabinet-my-findings
[17] http://highscalability.com/blog/2011/8/10/leveldb-fast-and-lightweight-keyvalue-database-from-the-auth.html
[18] http://stackoverflow.com/questions/13054852/kyoto-cabinet-berkeley-db-hash-table-size-limitations
[19] https://groups.google.com/forum/#!topic/tokyocabinet-users/Bzp4fLbmcDw/discussion
[20] http://stackoverflow.com/questions/1051847/why-does-tokyo-tyrant-slow-down-exponentially-even-after-adjusting-bnum
[21] https://groups.google.com/forum/#!topic/tokyocabinet-users/1E06DFQM8mI/discussion
[22] http://www.eecs.harvard.edu/margo/
[23] http://www.cse.yorku.ca/~oz/
[24] http://fallabs.com/mikio/profile.html
Key-Value Database Implementation, Part 2: Modeling on Existing K-V Databases