Key-Value Store (1): What Is a Key-Value Store, and Why Implement One?

Author: Emmanuel Goossaert

Original article: codecapsule.com


In this article, I will begin with a short description of what a key-value store is. Then I will explain some of the reasons behind this project, and finally I will present the main goals for the key-value store I intend to implement. Here is what this article covers:

  1. Key-value store overview
  2. Key-value stores vs. relational databases
  3. Why implement a key-value store?
  4. Plan
  5. Reference

 

1. Key-value store overview

This section provides a brief introduction to key-value stores. I have also collected several references at the bottom of this article.

Key-value stores are the simplest form of database organization. Almost every programming language ships with an in-memory key-value store: the map container of the C++ STL, the HashMap of Java, and the dictionary type of Python are all key-value stores. A key-value store usually offers the following interface:

Get(key): Get the data that was previously stored under the identifier "key", or return an error if there is no data under "key".

Set(key, value): Store "value" under the identifier "key", so that it can be retrieved later by referencing the same "key". If some data already exists under "key", the old data is replaced.

Delete(key): Delete the data stored under "key".
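
As an illustration only, the three operations above could be sketched in C++ on top of std::unordered_map. The class name KeyValueStore, the method signatures, and the choice of a boolean return value to signal the "no data under key" error are assumptions made for this sketch, not the interface of the store developed in this series.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical in-memory store; all names are illustrative assumptions.
class KeyValueStore {
 public:
  // Copies the value stored under `key` into `*value` and returns true,
  // or returns false (the "error" case) if nothing is stored under `key`.
  bool Get(const std::string& key, std::string* value) const {
    auto it = data_.find(key);
    if (it == data_.end()) return false;
    *value = it->second;
    return true;
  }

  // Stores `value` under `key`, replacing any previous value.
  void Set(const std::string& key, const std::string& value) {
    data_[key] = value;
  }

  // Removes `key`; returns true if an entry was actually deleted.
  bool Delete(const std::string& key) { return data_.erase(key) > 0; }

 private:
  std::unordered_map<std::string, std::string> data_;
};

// Usage example for the sketch above.
void KeyValueStoreExample() {
  KeyValueStore store;
  std::string value;
  store.Set("user:42", "Alice");
  assert(store.Get("user:42", &value) && value == "Alice");
  store.Set("user:42", "Bob");  // overwrites the old value
  assert(store.Get("user:42", &value) && value == "Bob");
  assert(store.Delete("user:42"));
  assert(!store.Get("user:42", &value));  // missing key: Get reports failure
}
```

Everything here lives in memory, so the data disappears when the process exits.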

Most underlying implementations use either a hash table or some kind of self-balancing tree (such as a B-tree or a red-black tree). Sometimes the data is too big to fit in memory, or it must be persisted so that it survives a system crash, whatever the cause. In those cases, you have to use the file system.
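
One simple way to make data survive a crash is to append every write to a log and replay that log on startup. The sketch below assumes a made-up text format (one tab-separated record per line) and assumes that keys and values contain no tab or newline characters; the function names are hypothetical, and this is not the on-disk layout of any particular store. It writes to generic streams so that the same code works with a std::ofstream opened in append mode or with an in-memory std::stringstream.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical record format, one operation per line:
//   S<TAB>key<TAB>value   (set)
//   D<TAB>key             (delete)
void LogSet(std::ostream& log, const std::string& key,
            const std::string& value) {
  log << "S\t" << key << '\t' << value << '\n';
}

void LogDelete(std::ostream& log, const std::string& key) {
  log << "D\t" << key << '\n';
}

// Rebuilds the in-memory table by replaying the log from the beginning.
// Later records win, so the result reflects the last write for each key.
std::map<std::string, std::string> Replay(std::istream& log) {
  std::map<std::string, std::string> table;
  std::string line;
  while (std::getline(log, line)) {
    std::istringstream fields(line);
    std::string op, key, value;
    std::getline(fields, op, '\t');
    std::getline(fields, key, '\t');
    std::getline(fields, value);  // stays empty for delete records
    if (op == "S") {
      table[key] = value;
    } else if (op == "D") {
      table.erase(key);
    }
  }
  return table;
}

// Usage example: an in-memory stream stands in for the log file.
void ReplayExample() {
  std::stringstream log;
  LogSet(log, "a", "1");
  LogSet(log, "a", "2");
  LogDelete(log, "a");
  LogSet(log, "b", "3");
  std::map<std::string, std::string> table = Replay(log);
  assert(table.size() == 1 && table.at("b") == "3");
}
```

A real implementation would also need to flush writes to disk and compact the log so that it does not grow without bound; both are left out of this sketch.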

Key-value stores are part of the NoSQL movement. NoSQL groups together all database systems that do not rely on the concepts of relational databases. The NoSQL entry on Wikipedia summarizes the characteristics of these databases:

  • They do not use the SQL query language.
  • They may not fully support ACID (atomicity, consistency, isolation, and durability).
  • They may provide a distributed, fault-tolerant architecture.

 

2. Key-value stores vs. relational databases

Unlike relational databases, key-value stores know nothing about the data in the values and impose no schema on it, as MySQL or PostgreSQL do. This also means that it is impossible to request only part of the data with any kind of filter, the way a WHERE clause does in SQL. If you do not know which keys to look up, you must iterate over all the keys, fetch their corresponding values, apply whatever filter you need, and keep only what you want. This is computationally expensive, which means that a key-value store delivers its full performance only when the key is known; otherwise it is simply not the right tool (note: some key-value stores are able to store structured data and index fields).

Therefore, even though key-value stores often access data several orders of magnitude faster than relational database systems, the requirement to know the keys limits their possible applications.
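
The full scan described above can be sketched as follows. The helper name KeysWhere and the Table alias are assumptions for this example; the point is only that emulating a filter means visiting every entry, whereas a lookup by key in a hash table takes constant time on average.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

using Table = std::unordered_map<std::string, std::string>;

// Emulates a WHERE clause by visiting every entry: the cost grows linearly
// with the number of keys, no matter how few entries match.
std::vector<std::string> KeysWhere(
    const Table& table,
    const std::function<bool(const std::string&)>& predicate) {
  std::vector<std::string> matches;
  for (const auto& entry : table) {
    if (predicate(entry.second)) matches.push_back(entry.first);
  }
  return matches;
}

// Usage example: find all users whose stored value is "paris".
void KeysWhereExample() {
  Table users;
  users["u1"] = "paris";
  users["u2"] = "tokyo";
  users["u3"] = "paris";
  std::vector<std::string> keys = KeysWhere(
      users, [](const std::string& city) { return city == "paris"; });
  assert(keys.size() == 2);  // order is unspecified for an unordered_map
}
```

A relational database can answer the same query from an index on the column, without touching the rows that do not match.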

 

3. Why implement a key-value store?

I started this project mainly as a way to recharge, to learn, and to refresh my knowledge of some core back-end fundamentals. Reading books and Wikipedia articles is boring and involves no practice, so I thought it would be better to dive in and actually write code. I was looking for a project that would let me review the following:

  • The C++ programming language
  • Object-oriented design
  • Algorithms and data structures
  • Memory management
  • Concurrency control with multiple processes or threads
  • Client/server networking
  • I/O issues with disk access, and use of the file system

A key-value store that uses the file system for persistent storage and offers a network interface would cover everything listed above. It is a project that touches every area of back-end engineering. But let's face reality: there is already a large number of key-value stores out there, some of them implemented by very smart people and used in production at large companies. These include Redis, MongoDB, memcached, BerkeleyDB, Kyoto Cabinet, and LevelDB.

In addition, key-value stores have recently become something of a trend. It seems that everyone has their own key-value store and wants to show how outstanding and fast it is. This problem is described in the article about key-value stores on Leonard Lin's blog. Most of these projects were immature at the time and not ready for production use, but people still wanted to show them off. In blog articles and conference slides, you can often find performance comparisons of obscure key-value stores. These charts are basically meaningless: only isolated tests on your own hardware, with your own data and your own application, can tell you which key-value store is best suited to your problem. Performance depends on the following:

  • The hardware
  • The file system used
  • The actual application and the specific keys it accesses (locality of reference)
  • The data set, especially the lengths of keys and values, and the likelihood of key collisions when a hash table is used

Therefore, it is hard to write a key-value store that has any real impact: it is likely to be overlooked because better key-value stores already exist, or simply to drown among the half-baked amateur projects that nobody cares about.

To stand out, this project cannot simply aim for speed the way everyone else does; it must aim to fill a gap left by the existing solutions. Here are several ways I have found to make a key-value store project stand out:

  • Adapt it to a specific data type (such as images or geographic data).
  • Adapt it to certain operations (for example, particularly good read performance or particularly good write performance).
  • Adapt it to a specific problem (for example, automatic parameter tuning: many key-value stores have many options, and finding the best settings is sometimes tricky).
  • Provide more data access options. Taking LevelDB as an example, data can be traversed either forward or backward through an iterator that orders the entries by key. Not all key-value stores can do this.
  • Make the implementation more approachable: currently, few key-value stores have fully documented code. If you need to get a project running quickly and have to customize a key-value store for it, a solution whose code looks approachable will be one of the options, even if it is not a well-known project. Being able to actually understand the code and trust the solution makes up for that shortcoming.
  • Target a specific application. Here is an example of a practical problem: many web crawler frameworks have a crude interface for managing the URLs they need to crawl, which often leaves the client to implement that logic with a key-value store. All web crawler frameworks would benefit from a unified key-value store optimized for URLs.
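
To illustrate the ordered access mentioned in the list above, here is a minimal sketch that uses std::map from the C++ standard library, which keeps its entries sorted by key. It does not use the LevelDB API; the alias SortedTable and the function names are assumptions for this example. A store built on a plain hash table could not offer these traversals without first collecting and sorting all of its keys.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

using SortedTable = std::map<std::string, std::string>;

// Collects the keys in ascending order (forward traversal).
std::vector<std::string> KeysForward(const SortedTable& table) {
  std::vector<std::string> keys;
  for (auto it = table.begin(); it != table.end(); ++it) {
    keys.push_back(it->first);
  }
  return keys;
}

// Collects the keys in descending order (backward traversal).
std::vector<std::string> KeysBackward(const SortedTable& table) {
  std::vector<std::string> keys;
  for (auto it = table.rbegin(); it != table.rend(); ++it) {
    keys.push_back(it->first);
  }
  return keys;
}

// Usage example: insertion order does not matter, key order does.
void OrderedAccessExample() {
  SortedTable table;
  table["banana"] = "2";
  table["apple"] = "1";
  table["cherry"] = "3";
  assert(KeysForward(table).front() == "apple");
  assert(KeysBackward(table).front() == "cherry");
}
```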

 

4. Plan

The goal of the project is to develop a lightweight key-value store with easy-to-understand C++ code. In fact, I plan to follow the Google C++ Style Guide for this project. I will use a hash table as the underlying data structure, store the data on the hard disk, and implement a network interface. I will not rush the project; I would rather keep the design and implementation simple and clear. I will also do my best to minimize the space that the database files occupy on disk.

I don't want to reinvent the wheel, so I will start by looking at other key-value stores written in C or C++ and select a few of the better ones. I will then study their architecture and code for inspiration. Back-end engineering is one of my core skills, and I already have most of the knowledge this project requires, but I know that I still need to learn many new things, which makes it all the more interesting for me. I am also happy to document everything. I have long enjoyed in-depth technical blogs such as those of Alexander Sandler and Gustavo Duarte, and I would like to contribute something useful and as good as I can make it.

My research and my work on the key-value store will be documented in this article series. Do not use the dates of the articles to estimate how long the implementation took: there may be a considerable delay between an article and the actual research or work it describes.

In Part 2, I will survey the leading key-value stores and explain why I chose some of them as references and not others. For the other articles, refer to the table of contents of this series.

You can find some articles and books for further study in the "Reference" section below.
