Implementing a Key-Value Store (Part 2): Using Existing Key-Value Stores as Models


This article is the second part of the series "Implementing a Key-Value Store".

Original article by Emmanuel Goossaert (codecapsule.com).


In this article, I will start by explaining why I think it is important to use existing key-value stores as models, rather than diving headfirst into the project from scratch. I will then describe a set of criteria for selecting the key-value stores to use as models. Finally, I will give an overview of some well-known key-value store projects, and use these criteria to select a few of them as models. This article covers:

1. Don't re-invent the wheel
2. Candidate models and selection criteria
3. Overview of the selected key-value stores
4. References


1. Don't re-invent the wheel

Key-value stores have been around for at least 30 years [1]. One of the most memorable projects is dbm, the original database manager written by Ken Thompson for the Seventh Edition of Unix, released in 1979 [2]. Since then, engineers have run into many issues with these database systems, and have adopted or abandoned various design and data-structure ideas, learning from experiments on real-world problems. It would be foolish to ignore their work and start from scratch, only to repeat mistakes they have already made. Gall's law, from John Gall's book Systemantics [3], puts it this way:

A complex system that works is invariably found to have evolved from a simple system that worked. The inverse proposition also appears to be true: A complex system designed from scratch never works and cannot be made to work. You have to start over, beginning with a working simple system.

This quote brings two fundamental ideas to the development of my key-value store project:

1. Use models. I need to identify key-value stores that have been around for a while, and ideally that are themselves successors of earlier successful key-value stores. This is proof that their design is reliable and has been refined over several iterations. These selected key-value stores should then serve as models for the project I am currently working on.

2. Start small. The first version of this project must be small and simple, so that its design can be easily tested and validated. Improvements and additional features, if needed, should come in later versions.


2. Candidate models and selection criteria

After doing a bit of research on key-value stores and NoSQL databases, I decided to consider the following ones further as candidates:

  • DBM
  • Berkeley DB
  • Kyoto Cabinet
  • Memcached and MemcacheDB
  • LevelDB
  • MongoDB
  • Redis
  • OpenLDAP
  • SQLite

The selection criteria are as follows:

  • I want to create the key-value store using object-oriented programming, so for its design, I will draw inspiration from projects written in an object-oriented language.
  • For the underlying data structure, I want an on-disk hash table, so I need to select projects that read and write their data on disk.
  • I also want the data store to be accessible over the network.
  • I do not need a query engine or a way to access structured data.
  • Full ACID compliance is not required.
  • Since I am working on this project by myself, I would like to use as models projects that were implemented by small teams, ideally by one or two people.
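
To make the second criterion concrete, here is a minimal sketch of what an on-disk hash table can look like. It is an illustration under loud assumptions, not the design of any of the candidate stores: the file layout, the fixed bucket count `NUM_BUCKETS`, the class name `DiskHashTable`, and the use of `zlib.crc32` as the hash function are all hypothetical choices. Each bucket slot in the file header holds the offset of the newest record in that bucket's chain, and every record points to the previous head of its chain.

```python
import os
import struct
import tempfile
import zlib

# Hypothetical file layout, chosen only for this illustration:
#   header : NUM_BUCKETS slots, each a little-endian uint64 holding the file
#            offset of the newest record in that bucket's chain (0 = empty)
#   record : next_offset (uint64) | key_len (uint32) | value_len (uint32)
#            | key bytes | value bytes
NUM_BUCKETS = 64            # fixed when the file is created
SLOT = struct.Struct("<Q")
RECORD_HEADER = struct.Struct("<QII")


class DiskHashTable:
    def __init__(self, path):
        is_new = not os.path.exists(path)
        self.f = open(path, "w+b" if is_new else "r+b")
        if is_new:
            # Empty bucket array: every slot holds offset 0, meaning "no chain".
            self.f.write(bytes(SLOT.size * NUM_BUCKETS))
            self.f.flush()

    def _bucket(self, key):
        # crc32 is stable across processes, unlike Python's salted hash().
        return zlib.crc32(key) % NUM_BUCKETS

    def _read_slot(self, bucket):
        self.f.seek(bucket * SLOT.size)
        return SLOT.unpack(self.f.read(SLOT.size))[0]

    def _write_slot(self, bucket, offset):
        self.f.seek(bucket * SLOT.size)
        self.f.write(SLOT.pack(offset))

    def put(self, key, value):
        bucket = self._bucket(key)
        head = self._read_slot(bucket)
        # Append the record at the end of the file and make it the new chain
        # head; an older record for the same key is shadowed, not overwritten.
        self.f.seek(0, os.SEEK_END)
        offset = self.f.tell()
        self.f.write(RECORD_HEADER.pack(head, len(key), len(value)) + key + value)
        self._write_slot(bucket, offset)
        self.f.flush()

    def get(self, key):
        offset = self._read_slot(self._bucket(key))
        while offset != 0:  # walk the chain from newest to oldest record
            self.f.seek(offset)
            next_offset, key_len, value_len = RECORD_HEADER.unpack(
                self.f.read(RECORD_HEADER.size))
            if self.f.read(key_len) == key:
                return self.f.read(value_len)
            offset = next_offset
        return None

    def close(self):
        self.f.close()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "store.db")
    db = DiskHashTable(path)
    db.put(b"apple", b"red")
    db.put(b"banana", b"yellow")
    db.put(b"apple", b"green")     # shadows the first "apple" record
    print(db.get(b"apple"))        # b'green'
    print(db.get(b"cherry"))       # None
    db.close()

    db = DiskHashTable(path)       # reopen: the data is still on disk
    print(db.get(b"banana"))       # b'yellow'
    db.close()
```

Appending records instead of updating them in place keeps writes simple and never modifies an existing record, at the cost of dead records that a real store would eventually have to reclaim. Note also that `NUM_BUCKETS` is frozen once the file exists, which is exactly the kind of creation-time parameter that causes the scaling issue reported for Kyoto Cabinet below.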

3. Overview of the selected key-value stores

The three models I have selected are Berkeley DB, Kyoto Cabinet, and LevelDB. Berkeley DB and Kyoto Cabinet share a common history as successors of dbm. In addition, neither Berkeley DB nor Kyoto Cabinet is a "first shot": both come from a line of earlier key-value stores, which suggests that they are more reliable than key-value stores implemented from scratch for the first time. LevelDB is more modern and is based on the LSM tree data structure, which makes it of little use as a model for an on-disk hash table. However, its code is the cleanest I have ever seen. All three projects were developed by one or two people. Their details are given below.

Berkeley DB

Berkeley DB development started in 1986, which means it had been around for 26 years when I started writing this article. Berkeley DB was developed as a successor to dbm, and implements a hash table. The first version was written by Margo Seltzer [22] and Ozan Yigit [23] while at UC Berkeley. The project was later acquired by Oracle, which continued to develop it.

Berkeley DB was originally implemented in C, and is still implemented only in C. It was developed through an incremental process, with new features added in each major version. Berkeley DB grew from a simple key-value store into one that manages concurrent access, transactions and recovery, and replication [4]. Berkeley DB is widely used, with hundreds of millions of deployed copies [5], which is evidence of its solid architecture and reliability. More information about its design can be found in the introduction of the "Berkeley DB Programmer's Reference Guide" [6], and in the Berkeley DB chapter of "The Architecture of Open Source Applications, Volume 1" [5].

Kyoto Cabinet

Kyoto Cabinet was introduced in 2009 by Mikio Hirabayashi [24], and is still under active development. Kyoto Cabinet is the successor of other key-value stores by the same author: Tokyo Cabinet (released in 2007) and QDBM (released in 2003, started in 2000). QDBM was intended as a high-performance successor of dbm [7]. Kyoto Cabinet is particularly interesting because of its pure dbm lineage, and because its author has been working on key-value stores for 12 years. After implementing three key-value stores over so many years, there is no reason to doubt that the author has a solid understanding of the structural requirements, as well as a strong sense of what causes performance bottlenecks.

Kyoto Cabinet is implemented in C++, and implements a hash table, a B+ tree, and several other data structures. It also offers excellent performance [16]. Nevertheless, it seems to have performance issues related to its initial parameters. Indeed, many people have reported that performance is good as long as the number of items stays below a certain threshold, which is proportional to the size of the bucket array, itself set by a parameter when the database file is created. Once this threshold is exceeded, performance appears to drop sharply [18] [19]. The same problem exists in Tokyo Cabinet [20] [21]. This means that if the requirements of a project change while the database is in use, you may run into serious trouble. And we all know how often requirements change in software.
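
Why a bucket array fixed at creation time becomes a liability can be shown with a small simulation. This is a generic model, not a description of Kyoto Cabinet's internals: it assumes separate chaining with plain linked chains and uniformly distributed hashes, and the function `avg_probes` and the bucket count of 1024 are hypothetical. It counts how many key comparisons a successful lookup needs on average as the number of items grows past the number of buckets.

```python
import random


def avg_probes(num_items, num_buckets, seed=0):
    """Average number of key comparisons for a successful lookup in a
    separate-chaining hash table whose bucket count never changes."""
    rng = random.Random(seed)
    chain_lengths = [0] * num_buckets
    # Model each key's hash as a uniformly random bucket (an assumption).
    for _ in range(num_items):
        chain_lengths[rng.randrange(num_buckets)] += 1
    # In a chain of length L, finding the i-th key (1-based) costs i
    # comparisons, so finding every key of that chain costs L * (L + 1) / 2.
    total = sum(length * (length + 1) // 2 for length in chain_lengths)
    return total / num_items


BUCKET_COUNT = 1024  # hypothetical value, fixed when the "file" is created
for n in (512, 1024, 4096, 16384, 65536):
    print(f"items={n:6d}  load={n / BUCKET_COUNT:5.1f}  "
          f"avg comparisons={avg_probes(n, BUCKET_COUNT):6.2f}")
```

In this model the average is close to 1 + (n - 1) / (2m) for n items and m buckets, so it grows linearly with the load factor n / m once the table can no longer be resized; in an on-disk store, each extra step along a chain may also cost an extra disk read. Structures that can grow their bucket array, for example by rehashing or with linear hashing, keep the load factor bounded instead.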

LevelDB

LevelDB was developed by Google engineers Jeffrey Dean [8] and Sanjay Ghemawat [9], who worked on Google's legendary infrastructure projects MapReduce and BigTable. Given the experience with large-scale problems that Dean and Ghemawat gained at Google, there is good reason to believe they know what they are doing. Unlike most key-value stores, LevelDB does not use a hash table or a B-tree as its underlying data structure, but a log-structured merge tree (LSM tree) [12]. LSM structures are said to be well suited to SSDs [13]. You can find plenty of information about LevelDB on the High Scalability blog [17].
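
The log-structured merge idea behind LevelDB [12] can be sketched in a few dozen lines. This is a toy, in-memory illustration of the general technique, not LevelDB's implementation: the class `TinyLSM`, the `memtable_limit` flush threshold and the single-step `compact()` are hypothetical, and the write-ahead log, on-disk SSTable files and multiple levels that LevelDB actually uses are left out. Writes go to a small mutable memtable; when it fills up, it is frozen into an immutable sorted run; reads look at the newest data first; compaction merges runs so that lookups have fewer of them to search.

```python
import bisect
import heapq

TOMBSTONE = object()  # marks a deleted key until compaction discards it


class TinyLSM:
    """Toy, in-memory log-structured merge tree (illustration only)."""

    def __init__(self, memtable_limit=4):
        self.memtable_limit = memtable_limit  # hypothetical flush threshold
        self.memtable = {}                    # recent writes, still mutable
        self.runs = []                        # immutable sorted runs, oldest first

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def delete(self, key):
        # A delete is just a write of a tombstone; older values stay in older runs.
        self.put(key, TOMBSTONE)

    def _flush(self):
        # Freeze the memtable into a sorted, immutable run (an SSTable file
        # on disk in a real LSM tree).
        self.runs.append(sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        # Newest data wins: check the memtable, then runs from newest to oldest.
        if key in self.memtable:
            value = self.memtable[key]
        else:
            value = TOMBSTONE  # "not found" is reported like "deleted"
            for run in reversed(self.runs):
                i = bisect.bisect_left(run, (key,))
                if i < len(run) and run[i][0] == key:
                    value = run[i][1]
                    break
        return None if value is TOMBSTONE else value

    def compact(self):
        # k-way merge of all sorted runs. Tagging entries with -age makes the
        # newest version of a key come first, so the first one seen wins.
        streams = [[(k, -age, v) for k, v in run]
                   for age, run in enumerate(self.runs)]
        merged, last_key = [], object()
        for k, _, v in heapq.merge(*streams):
            if k == last_key:
                continue  # an older, shadowed version of the same key
            last_key = k
            if v is not TOMBSTONE:  # safe only because every run is merged
                merged.append((k, v))
        self.runs = [merged] if merged else []


db = TinyLSM(memtable_limit=2)
db.put("a", 1)
db.put("b", 2)      # memtable full: flushed as run 0
db.put("a", 10)
db.put("c", 3)      # flushed as run 1
db.delete("b")      # tombstone stays in the memtable
print(db.get("a"), db.get("b"), db.get("c"))  # 10 None 3
db.compact()
print(len(db.runs), db.get("a"))              # 1 10
```

Dropping tombstones during compaction is safe here only because all runs are merged in one step; a store that compacts only some runs at a time has to keep a tombstone for as long as an older version of the key may survive in a run outside the merge.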

LevelDB is implemented in C++ and was released in 2011. It was designed as a building block for higher-level storage systems [10]. The IndexedDB HTML5 API in future versions of Chrome will use LevelDB [10] [11]. Its performance depends on the workload, as shown by the benchmarks provided by its authors [14]. However, another benchmark, run by Andy Twigg at Acunu on commodity SSDs, showed that performance degrades significantly when the number of items goes beyond 1e6 (1 million) toward 1e9 (1 billion) [15]. It therefore seems that LevelDB may not be the best choice for heavy workloads or large databases, as required by real-world backend projects.

But this does not really matter, because for me, the best part of LevelDB is not its performance but its architecture. Looking at its source code and the way things are organized, it is pure beauty. Everything is clear, simple, and logical. Having access to LevelDB's source code and using it as a model is an amazing opportunity to create great code.

What about the key-value stores that were not selected?

The fact that I did not select the other key-value stores does not mean that I will discard them entirely. I will keep them in mind, and may occasionally borrow elements of their architecture. However, the current project will not be influenced by them as much as by the key-value stores I have selected.

4. References

[1] http://blog.knuthaugen.no/2010/03/a-brief-history-of-nosql.html
[2] http://en.wikipedia.org/wiki/Dbm
[3] http://en.wikipedia.org/wiki/Systemantics
[4] http://en.wikipedia.org/wiki/Berkeley_DB#Origin
[5] http://www.aosabook.org/en/bdb.html
[6] http://docs.oracle.com/cd/E17076_02/html/programmer_reference/intro.html
[7] http://fallabs.com/qdbm/
[8] http://research.google.com/people/jeff/
[9] http://research.google.com/pubs/SanjayGhemawat.html
[10] http://google-opensource.blogspot.com/2011/07/leveldb-fast-persistent-key-value-store.html
[11] http://www.w3.org/TR/IndexedDB/
[12] http://www.igvita.com/2012/02/06/sstable-and-log-structured-storage-leveldb/
[13] http://www.acunu.com/2/post/2011/04/log-file-systems-and-ssds-made-for-each-other.html
[14] http://leveldb.googlecode.com/svn/trunk/doc/benchmark.html
[15] http://www.acunu.com/2/post/2011/08/benchmarking-leveldb.html
[16] http://blog.creapptives.com/post/8330476086/leveldb-vs-kyoto-cabinet-my-findings
[17] http://highscalability.com/blog/2011/8/10/leveldb-fast-and-lightweight-keyvalue-database-from-the-auth.html
[18] http://stackoverflow.com/questions/13054852/kyoto-cabinet-berkeley-db-hash-table-size-limitations
[19] https://groups.google.com/forum/#!topic/tokyocabinet-users/bzp4flbmcdw/discussion
[20] http://stackoverflow.com/questions/1051847/why-does-tokyo-tyrant-slow-down-exponentially-even-after-adjusting-bnum
[21] https://groups.google.com/forum/#!topic/tokyocabinet-users/1e06dfqm8mi/discussion
[22] http://www.eecs.harvard.edu/margo/
[23] http://www.cse.yorku.ca/~oz/
[24] http://fallabs.com/mikio/profile.html
