HBase, Cassandra, Riak, and Hypertable

Source: Internet
Author: User
Tags: cassandra, riak, hypertable
Cassandra vs. HBase

Consistency

Cassandra: Quorum (NRW) policy. Merkle trees are exchanged via the gossip protocol to keep data consistent across cluster nodes.

HBase: Each region is served by a single node with no serving-layer replication, which yields strong consistency.
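The quorum (NRW) rule can be sketched as follows; `is_strongly_consistent` is an illustrative name, not a Cassandra API. With N replicas, reads of R copies and writes to W copies always overlap whenever R + W > N:

```python
def is_strongly_consistent(n: int, r: int, w: int) -> bool:
    """Quorum overlap condition: any read quorum of size r must
    intersect any write quorum of size w among n replicas."""
    return r + w > n

# Typical settings for n = 3 replicas:
assert is_strongly_consistent(3, 2, 2)      # QUORUM reads + QUORUM writes
assert not is_strongly_consistent(3, 1, 1)  # ONE/ONE trades consistency for latency
```

Lowering R or W below the overlap threshold is exactly the availability-over-consistency trade-off discussed in the CAP comments below.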

Availability

Cassandra:

1. Data is replicated to adjacent nodes on the consistent-hash ring, so each key lives on multiple nodes and there is no single point of failure (SPOF).

2. If a node goes down, new writes that hash to it are automatically routed to the next node as a hinted handoff; once the original node recovers, the hinted data is pushed back to it.

3. The gossip protocol tracks the health of every node in the cluster and drives synchronization requests to keep replicas consistent.

4. SSTables are plain files, with ordinary single-host reliability.

HBase:

1. There is a single point of failure at the serving layer: after a region server goes down, the regions it hosted are unavailable for a short period until failover takes effect.

2. The master tracks the health and region assignment of each region server.

3. With multiple master nodes, if the active master goes down, ZooKeeper's Paxos-style election chooses the next master. Region reads and writes are unaffected while the master is down; the master only plays an automated operations role.

4. HDFS is the distributed storage engine, offering high reliability and zero data loss.

5. The HDFS NameNode is a SPOF.
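Hinted handoff, as described in point 2 on the Cassandra side, can be sketched as follows; the class and method names are illustrative, not Cassandra's actual implementation:

```python
class HintedHandoffRouter:
    """Sketch: writes for a down node go to the next live node on the
    ring, recorded as hints; hints are replayed when the node recovers."""

    def __init__(self, nodes):
        self.nodes = list(nodes)                 # ring order
        self.up = {n: True for n in nodes}
        self.data = {n: {} for n in nodes}
        self.hints = {n: [] for n in nodes}      # hints held *for* a down node

    def _next_up(self, start):
        i = self.nodes.index(start)
        for step in range(1, len(self.nodes)):
            cand = self.nodes[(i + step) % len(self.nodes)]
            if self.up[cand]:
                return cand
        raise RuntimeError("no live nodes")

    def write(self, owner, key, value):
        if self.up[owner]:
            self.data[owner][key] = value
        else:
            stand_in = self._next_up(owner)      # route past the down node
            self.data[stand_in][key] = value
            self.hints[owner].append((key, value))

    def recover(self, node):
        """Push hinted data back to the recovered source node."""
        self.up[node] = True
        for key, value in self.hints[node]:
            self.data[node][key] = value
        self.hints[node].clear()
```

While a node is down its writes land on the next live ring neighbor; after recovery the hints are drained back, restoring normal placement.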

Scalability

Cassandra:

1. Consistent hashing quickly locates the node holding a given piece of data.

2. When nodes are added or removed, data distribution must be adjusted between neighboring nodes on the hash ring.

HBase:

1. The client locates the target region server through ZooKeeper, then locates the region.

2. When a region server is added, it announces itself to the master, which distributes regions evenly across servers.
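Consistent hashing, which Cassandra uses for data placement, can be sketched as follows (the class name and the choice of MD5 are illustrative):

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    """Map a string onto the hash ring (MD5 chosen for illustration)."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    """Each node owns the arc of the ring ending at its position;
    a key maps to the first node at or after the key's hash."""

    def __init__(self, nodes):
        self.ring = sorted((_hash(n), n) for n in nodes)

    def node_for(self, key: str) -> str:
        h = bisect.bisect_left(self.ring, (_hash(key), ""))
        return self.ring[h % len(self.ring)][1]   # wrap around the ring
```

The point of the structure is that adding or removing one node only moves the keys on the arc adjacent to it, which is why ring resizing stays local to a few neighbors.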

Load Balancing

Cassandra: The client requests the whole cluster's address list from ZooKeeper, then selects the appropriate node via consistent hashing. The client caches the cluster addresses.

HBase: The client requests ZooKeeper to locate the region server via the read/write routing table, which the master updates. The client also caches part of the routing information.

Data difference comparison algorithms

Cassandra: Merkle tree, Bloom filter

HBase: Bloom filter
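Both systems use Bloom filters to rule out files or replicas that cannot contain a key. A minimal sketch (bit-array size and hash construction are illustrative):

```python
import hashlib

class BloomFilter:
    """k hash functions set k bits per key. Lookups can yield false
    positives but never false negatives."""

    def __init__(self, size: int = 1024, k: int = 3):
        self.size = size
        self.k = k
        self.bits = bytearray(size)

    def _positions(self, key: str):
        # Derive k independent positions by salting the key.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
            yield int(h, 16) % self.size

    def add(self, key: str):
        for p in self._positions(key):
            self.bits[p] = 1

    def might_contain(self, key: str) -> bool:
        return all(self.bits[p] for p in self._positions(key))
```

A "no" answer is definitive, so a store can skip reading an SSTable/HFile entirely; a "yes" only means the file must actually be checked.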

Locks and transactions

Cassandra: Client-supplied timestamps (Dynamo uses vector clocks).

HBase: Optimistic concurrency control.

Read/write performance

Cassandra: Fast data read/write positioning.

HBase: Locating data for a read or write may require up to six network RPCs, so positioning performance is lower.

CAP comment

Cassandra:

1. Weak consistency; data may be lost.

2. High availability.

3. Easy to scale.

HBase:

1. Strong consistency; zero data loss.

2. Lower availability.

3. Easy to scale.


Overall comparison
Both Riak and HBase are released under the Apache 2.0 license.
Riak's implementation is based on Amazon's Dynamo, while HBase is based on Google's Bigtable.
Riak is written mainly in Erlang with some C, while HBase is written in Java.
Functional Comparison

Feature comparison: Riak vs. HBase

Data model
Riak: Riak uses buckets as namespaces for storing key-value pairs.
See: Buckets, keys, and values

HBase: HBase stores data according to a pre-defined column-family structure: each row contains a key and a number of column values, and every column value carries its own version information. Data in HBase is stored sorted by column (unlike row-oriented relational databases).
See: HBase Data Model; Supported Data Types
Storage engine
Riak: Riak takes a modular approach and mounts the storage layer into the system as a pluggable engine, so you can choose different storage engines as needed. You can even implement your own storage engine against Riak's backend API.
See: Storage engines supported by Riak

HBase: HBase sits on top of HDFS, where its data files live. As in Bigtable, storage is split between an in-memory memstore and on-disk storefiles. The data file format, HFile, is modeled on Bigtable's SSTable. HBase operates on its data files directly through the JVM's file-system I/O.
See: HDFS; Hadoop uses HDFS
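The memstore/storefile split described above can be sketched in miniature; `MiniLSMStore` and its flush threshold are illustrative, not HBase's actual classes:

```python
class MiniLSMStore:
    """Sketch: writes go to an in-memory buffer (the memstore); when it
    fills up, it is flushed to an immutable sorted 'storefile'; reads
    check the memstore first, then the storefiles from newest to oldest."""

    def __init__(self, flush_threshold: int = 2):
        self.memstore = {}
        self.storefiles = []              # immutable sorted files, newest last
        self.flush_threshold = flush_threshold

    def put(self, key, value):
        self.memstore[key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        if self.memstore:
            self.storefiles.append(dict(sorted(self.memstore.items())))
            self.memstore = {}

    def get(self, key):
        if key in self.memstore:
            return self.memstore[key]
        for sf in reversed(self.storefiles):   # newest flushed data wins
            if key in sf:
                return sf[key]
        return None
```

Because flushed files are immutable and sorted, real systems periodically compact them, which is where HFile/SSTable merging comes from.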
Data access interfaces
Riak: Besides direct use from Erlang, Riak provides two data access interfaces, REST and Protocol Buffers.
See: HTTP; Protocol Buffers
Riak clients are built on these APIs and currently have good support for mainstream languages.
See: Client libraries; Community-developed libraries and projects

HBase: HBase code mainly runs in the JVM, but HBase also provides external access methods, including REST and Thrift.
See: Java Interface; REST; Thrift
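As a small illustration of Riak's REST interface, the helper below builds the object URL used by Riak's HTTP API (`/buckets/<bucket>/keys/<key>`); the host and the default port 8098 are the usual defaults but may differ in your deployment:

```python
from urllib.parse import quote

def riak_object_url(host: str, bucket: str, key: str, port: int = 8098) -> str:
    """Build the HTTP URL for a Riak object; bucket and key are
    percent-encoded so arbitrary names stay URL-safe."""
    return (f"http://{host}:{port}/buckets/{quote(bucket, safe='')}"
            f"/keys/{quote(key, safe='')}")
```

Against that URL, GET fetches the object, PUT stores it, and DELETE removes it, which is how the language clients are implemented under the hood.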
Data operations
Riak supports the following operations:
Direct operations on the primary key (get, put, delete, update)
MapReduce
Secondary indexes
The Riak Search plug-in
See: Comparison of the above methods

HBase supports two kinds of data operations: scanning ordered key ranges to retrieve values, and MapReduce queries through Hadoop.
See: Scanning; MapReduce; Secondary indexes
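An HBase-style ordered scan over sorted row keys can be sketched as follows (a pure-Python illustration, not the HBase client API):

```python
import bisect

def scan_range(sorted_rows, start_key, stop_key):
    """Return every (key, value) with start_key <= key < stop_key.
    sorted_rows must be sorted by key, as HBase keeps its rows."""
    keys = [k for k, _ in sorted_rows]
    lo = bisect.bisect_left(keys, start_key)
    hi = bisect.bisect_left(keys, stop_key)
    return sorted_rows[lo:hi]
```

Because rows are physically sorted by key, a range scan is a contiguous read rather than a scatter of point lookups, which is what makes this access pattern cheap.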
Data consistency
Riak: Riak tracks data versions with vector clocks in order to detect inconsistencies. Alternatively, a timestamp-based "last write wins" policy can be used instead of vector clocks.
See: Vector clocks; Why vector clocks are easy; Why vector clocks are hard

HBase: HBase provides strongly consistent reads and writes. Data is partitioned across regions; a column family can keep an unlimited number of data versions, and each version can have its own TTL.
See: Consistency architecture; Time to live
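Vector-clock version tracking, as Riak uses it, can be sketched with per-node counters (the function names are illustrative):

```python
def vclock_merge(a: dict, b: dict) -> dict:
    """Merge two vector clocks by taking the per-node maximum counter."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}

def descends(a: dict, b: dict) -> bool:
    """True if clock a has seen every event recorded in clock b."""
    return all(a.get(n, 0) >= c for n, c in b.items())

def concurrent(a: dict, b: dict) -> bool:
    """Neither clock descends from the other: a genuine write conflict
    that the client's read-time resolution logic must handle."""
    return not descends(a, b) and not descends(b, a)
```

A timestamp-based "last write wins" policy collapses the `concurrent` case by silently picking one side, which is simpler but can drop a concurrent update.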
Concurrency
Riak: All nodes in a Riak cluster can serve reads and writes simultaneously. Riak itself is only responsible for writing data (versioned via vector clocks); the conflict-resolution logic for reads is defined by the client.
HBase: HBase guarantees atomic writes through row-level locks, but does not support transactions spanning multiple rows. Scan operations do not guarantee consistency.
See: Consistency guarantees
Replication
Riak: The theoretical basis of Riak's replication is the Dynamo paper and Dr. Eric Brewer's CAP theorem. Riak partitions data with consistent hashing and keeps copies of each piece of data on multiple nodes. On top of consistent hashing, Riak replicates through virtual nodes, keeping the data distribution balanced; introducing virtual nodes loosely couples data to the physical nodes.
See: Replication; Clustering
Riak's APIs offer a free choice between consistency and availability, so you can pick a policy per application scenario. The replication factor is configured per bucket when data is first stored, and the number of replicas to contact can be set on each subsequent read or write.
See: Reading, writing, and updating data

HBase: HBase replication is a typical eventual-consistency implementation: data is pushed from the master cluster to slave clusters. HBase has recently also added a master-master mode.
See: Replication
Scalability
Riak: Riak supports dynamically adding and removing nodes. All nodes are equal; there is no master/slave distinction. When a node joins, the cluster discovers it through gossip, assigns it a data range, and migrates data to it; removing a node is the reverse process. Riak provides a set of tools for adding and removing nodes.
See: Adding and removing nodes; Command-line tools

HBase: HBase shards in units of regions. Regions split and merge automatically and are distributed across multiple nodes.
See: Node management; HBase architecture
Multi-data-center synchronization
Riak: Only Riak Enterprise supports multi-data-center deployments; ordinary users are limited to a single data center.
See: Riak Enterprise

HBase: HBase shards by region, which naturally supports multi-data-center deployment.
See: Node management
Graphical monitoring and management tools
Riak: Starting with Riak 1.1.x, Riak released Riak Control, an open-source graphical management tool for Riak.
See: Riak Control; Introducing Riak Control

HBase: HBase has some graphical tools developed by the open-source community, plus a command-line control terminal.
See: Admin console tools; Eclipse dev plugin; HBase Manager; GUI admin

 
