Introduction to Key-value Storage systems

Source: Internet
Author: User

Redis is a key-value storage system. Like memcached, it keeps data in memory for efficiency, but it supports more value types, including string, list (linked list), set, and zset (sorted set). These types support push/pop, add/remove, set intersection, union, and difference, and other richer operations, all of which are atomic. On top of these, Redis supports sorting in a variety of ways. Unlike memcached, Redis periodically writes updated data to disk, or appends each modifying operation to an append-only log file, and it implements master-slave synchronization on this basis.
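The append-only logging idea described above can be illustrated with a small sketch. This is not how Redis itself is implemented; the class name AppendOnlyStore, the JSON line format, and the three commands are assumptions made purely for illustration. Every write is appended to a log file, and a fresh instance rebuilds its in-memory state by replaying that log.

```python
import json
import os
import tempfile


class AppendOnlyStore:
    """Toy in-memory key-value store that logs every write to a file.

    Hypothetical illustration only; the log format is an assumption.
    """

    def __init__(self, path):
        self.path = path
        self.data = {}
        self._replay()

    def _replay(self):
        # Rebuild in-memory state by re-applying logged operations in order.
        if not os.path.exists(self.path):
            return
        with open(self.path, "r", encoding="utf-8") as log:
            for line in log:
                self._apply(json.loads(line))

    def _apply(self, op):
        if op["cmd"] == "set":
            self.data[op["key"]] = op["value"]
        elif op["cmd"] == "rpush":
            self.data.setdefault(op["key"], []).append(op["value"])
        elif op["cmd"] == "delete":
            self.data.pop(op["key"], None)

    def _log(self, op):
        # Append the operation to the log first, then apply it in memory.
        with open(self.path, "a", encoding="utf-8") as log:
            log.write(json.dumps(op) + "\n")
        self._apply(op)

    def set(self, key, value):
        self._log({"cmd": "set", "key": key, "value": value})

    def rpush(self, key, value):
        self._log({"cmd": "rpush", "key": key, "value": value})

    def delete(self, key):
        self._log({"cmd": "delete", "key": key})

    def get(self, key):
        return self.data.get(key)


def demo():
    path = os.path.join(tempfile.mkdtemp(), "appendonly.log")
    store = AppendOnlyStore(path)
    store.set("greeting", "hello")
    store.rpush("queue", "a")
    store.rpush("queue", "b")
    store.delete("greeting")
    # A fresh instance recovers its state purely from the log file.
    recovered = AppendOnlyStore(path)
    return recovered.get("greeting"), recovered.get("queue")
```

Calling demo() writes four operations and then reads them back through a second instance, which is the recovery path a restart would take. A real system would also need to compact the log and control when writes are flushed to disk.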

  Introduction to Key-value Storage systems

Key-value stores are now a popular topic, especially for building large-scale Internet applications such as search engines, IM, P2P, game servers, and SNS, and for providing cloud computing services. How to ensure high performance, high reliability, high scalability, high availability, and low cost when handling massive data is a critical consideration for every system architecture, and how to address database server performance bottlenecks is the biggest challenge.

According to the CAP theorem of distributed systems (of the three properties consistency, availability, and tolerance to network partitions, any system architecture can satisfy only two at the same time, never all three), the ACID guarantees of a traditional relational database satisfy only consistency and availability, so it is very difficult for such a database to do well on partition tolerance. Traditional relational databases also have significant limitations in performance, scalability, and availability when handling massive data in a distributed architecture.

Key-value stores pay more attention to access performance on massive data, distribution, and scalability, and do not need some features of traditional relational databases, such as schemas, transactions, and full SQL query support. As a result, their performance in a distributed environment is much better than that of traditional relational databases.

Key-value databases come in many varieties. Some are written in C++, some in Java, and some in Erlang, and each has its own strengths. Below we look at some of the more distinctive and widely used products.

  1, Voldemort

Voldemort is a distributed key/value storage system that has the following features:

Data is automatically replicated between multiple servers;

Data is automatically partitioned so that each server includes only a subset of the overall data;

Server fault handling is transparent;

Supports pluggable serialization, allowing rich key and value types, including lists and tuples, and integration with common serialization frameworks such as Protocol Buffers, Thrift, Avro, and Java serialization;

Data items are versioned, so that data integrity can be preserved even when failures occur;

Each node is independent and needs no coordination from other nodes, so there is no central node;

Excellent single-node performance: 10-20k operations per second can be expected, depending on machine configuration, network, disk system, and data replication factor;

Support for geo-distributed deployments.
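The versioning feature in the list above can be made concrete with a short sketch. One common way to version data items in a store without a central node is a vector clock; the text above does not say which scheme is used, so treat the technique and all function names here (increment, compare, merge) as illustrative assumptions. Each version carries a per-node counter, and two versions conflict when neither has seen all of the other's updates.

```python
def increment(clock, node):
    """Return a copy of the clock with this node's counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def descends(a, b):
    """True if clock a has seen every update that clock b has seen."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a, b):
    """Classify two versions as 'equal', 'after', 'before', or 'concurrent'."""
    a_ge, b_ge = descends(a, b), descends(b, a)
    if a_ge and b_ge:
        return "equal"
    if a_ge:
        return "after"
    if b_ge:
        return "before"
    return "concurrent"


def merge(a, b):
    """Combine two clocks by taking the larger counter for every node."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}


# Two replicas start from the same version and then diverge.
v1 = increment({}, "A")    # written once through node A
v2 = increment(v1, "A")    # updated again through node A
v3 = increment(v1, "B")    # updated independently through node B
conflict = compare(v2, v3)  # neither version descends from the other
resolved = merge(v2, v3)    # clock attached to the reconciled value
```

When compare reports "concurrent", the store cannot pick a winner on its own; the application (or a policy such as last-write-wins) has to reconcile the two values, and the merged clock is attached to the result.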

  2, Dynamo

Dynamo is Amazon's key-value storage platform, with good availability, scalability, and performance: 99.9% of read and write requests are answered within 300 ms.

Here is a brief description of some of Dynamo's characteristics:

Cost-effectiveness: save money! Unlike Dynamo, some commercial database products require expensive servers to perform well, and perhaps 5% more traffic will force you to buy a new US$20,000 server. Because Dynamo stores data on many cheap machines, you may only need to spend about $500 on a low-end machine and add it to the cluster.

Dynamo is a key-value store, so it does not support foreign keys, join queries, and the like. Values are stored as binary, so query conditions can only be applied to the key.

Simple-to-configure distributed storage: Dynamo is decentralized, and every machine in the cluster is a peer, unlike a design with a central node such as MongoDB's, so it has no single point of failure.
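A decentralized cluster of peers still has to agree on which machines hold a given key. A widely used technique for this is consistent hashing; the sketch below is a minimal illustration under that assumption, and the class name HashRing, the vnodes parameter, and the use of SHA-256 are choices made here, not details taken from the text. Every node is placed at several positions on a hash ring, and a key is stored on the first distinct nodes found walking clockwise from the key's position.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes, vnodes=64):
        self._ring = []  # list of (position, node), sorted by position
        for node in nodes:
            for i in range(vnodes):
                self._ring.append((self._hash(f"{node}#{i}"), node))
        self._ring.sort()
        self._positions = [pos for pos, _ in self._ring]

    @staticmethod
    def _hash(text):
        # Use the first 8 bytes of SHA-256 as a 64-bit ring position.
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def preference_list(self, key, replicas=3):
        """Walk clockwise from the key's position, collecting distinct nodes."""
        start = bisect.bisect_right(self._positions, self._hash(key))
        owners = []
        for step in range(len(self._ring)):
            node = self._ring[(start + step) % len(self._ring)][1]
            if node not in owners:
                owners.append(node)
                if len(owners) == replicas:
                    break
        return owners


ring = HashRing(["n1", "n2", "n3", "n4"])
owners = ring.preference_list("user:42")  # three distinct nodes for this key
```

The point of the ring is that adding a cheap machine moves only the keys that the new machine takes over; every other key keeps its owner, so the cluster can grow without a full reshuffle.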

  3, MemcacheDB

MemcacheDB is an open source project by Sina developers. It adds Berkeley DB persistent storage and an asynchronous master-slave replication mechanism to the memcached distributed cache server, giving memcached transaction recovery, persistence, and distributed replication capabilities. It is well suited to applications that need very fast reads and writes and persistence but do not require strict transactional constraints. MemcacheDB is used, for example, by Sina Blog.

  4, Cassandra

Apache Cassandra is an open source distributed key-value storage system. It was originally developed by Facebook to store particularly large volumes of data. Facebook is currently using this system.

Key Features:

Distributed

Column-based structuring

High scalability

The main characteristic of Cassandra is that it is not a single database but a distributed network service made up of many database nodes. A write to Cassandra is replicated to other nodes, and a read is routed to one of the nodes. Scaling a Cassandra cluster is simple: just add nodes to the cluster.

Cassandra is a hybrid non-relational database, similar to Google's Bigtable. Its feature set is richer than that of Dynomite (a distributed key-value storage system), but not as rich as that of the document store MongoDB (an open source product that sits between relational and non-relational databases; it is the most feature-rich of the non-relational databases and the most like a relational database, and its data structures are very loose, using a JSON-like BSON format, so it can store fairly complex data types). Cassandra was originally developed by Facebook and later became an open source project. It is an ideal database for social networking and cloud computing. It builds on Amazon's proprietary, fully distributed Dynamo and combines it with the column family data model of Google Bigtable, giving decentralized peer-to-peer storage. In many respects it can be called Dynamo 2.0.

Compared with other databases, Cassandra has several salient features:

Flexible schema: With Cassandra, as with a document store, you don't have to define the fields of a record in advance. You can add or remove fields at any time while the system is running. This is a major efficiency boost, especially in large deployments.

True scalability: Cassandra scales purely horizontally. To add more capacity to a cluster, you simply add another machine. You don't have to restart any processes, change application queries, or manually migrate any data.

Multi-datacenter awareness: You can arrange your node layout so that if one data center is lost, for example to a fire, an alternate data center holds at least one full copy of every record.

Range queries: If plain key lookups are not enough, you can query over a range of keys.

List data structure: In hybrid mode, super columns can be added, extending the model to five dimensions. This is very convenient for per-user indexes.

Distributed writes: You can read or write any data at any time from anywhere, and there is no single point of failure.
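Two of the features listed above, the flexible schema and range queries over keys, can be pictured with a small in-memory model. This is only a sketch of the idea, not Cassandra's API or storage format: the class name ColumnFamily and its methods are hypothetical, and in a real cluster whether key ranges can be scanned in order depends on how keys are distributed across nodes.

```python
import bisect


class ColumnFamily:
    """Toy column-family table: sorted row keys, free-form columns per row."""

    def __init__(self):
        self._keys = []  # row keys kept sorted so ranges can be scanned
        self._rows = {}  # row key -> {column name: value}

    def insert(self, row_key, column, value):
        # No schema: any row may gain any column at any time.
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def get(self, row_key):
        return dict(self._rows.get(row_key, {}))

    def range(self, start, end):
        """Return (key, columns) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return [(k, dict(self._rows[k])) for k in self._keys[lo:hi]]


users = ColumnFamily()
users.insert("user:001", "name", "Ann")
users.insert("user:002", "name", "Bob")
users.insert("user:002", "email", "bob@example.com")  # extra column, no migration
users.insert("user:010", "name", "Cy")
first_users = users.range("user:001", "user:003")
```

Here the second row carries an email column that the others lack, and the range call returns only the rows whose keys fall in the requested interval.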

  5, memcached

Memcached is a distributed cache system, originally developed by Danga Interactive for LiveJournal and now used by many software projects (such as MediaWiki). It is open source software released under a BSD license.

Memcached lacks authentication and security controls, which means that the memcached server should be placed behind a firewall.

The memcached API hashes each key with a 32-bit cyclic redundancy check (CRC-32) and uses the result to spread data across different machines. When the table is full, newly added data replaces existing entries according to an LRU (least recently used) policy. Because memcached is usually used only as a cache, applications that use it need extra code to update the cached data whenever they write back to a slower system, such as a back-end database.
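Both mechanisms in the paragraph above are easy to sketch with the Python standard library. The code below is an illustration only, not a memcached client: pick_server and LRUCache are hypothetical names, and the modulo mapping is the simplest possible way to turn a CRC-32 value into a server choice.

```python
import zlib
from collections import OrderedDict


def pick_server(key, servers):
    """Map a key to one server using CRC-32 modulo the server count."""
    return servers[zlib.crc32(key.encode("utf-8")) % len(servers)]


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def set(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used


servers = ["cache-a:11211", "cache-b:11211", "cache-c:11211"]  # example addresses
target = pick_server("session:1234", servers)

cache = LRUCache(capacity=2)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")     # touching "a" makes "b" the least recently used entry
cache.set("c", 3)  # the cache is full, so "b" is evicted
```

Note that with a plain modulo mapping, changing the number of servers remaps most keys, which is acceptable for a cache but is one reason some clients use other hashing schemes.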

Memcached client libraries are available for a variety of languages and platforms, including Perl, PHP, Java, C, Python, Ruby, C#, and MySQL.

  6, Hypertable

Hypertable is an open source, high-performance, scalable database modeled on Google's Bigtable. Over the past few years, Google has built three key pieces of scalable computing infrastructure designed to run on clusters of PCs. The first is the Google File System (GFS), a highly available file system that provides a global namespace. It achieves high availability by replicating file data across machines (and across racks), and so is protected from many failures that traditional file storage systems cannot survive, such as power, memory, and network port failures. The second is a computing framework called MapReduce, which works closely with GFS to help process the massive amounts of data collected. The third is Bigtable, an alternative to traditional databases. Bigtable lets you organize massive amounts of data by primary key and query it efficiently. Hypertable is an open source implementation of Bigtable, with some improvements of its own.
