Riak vs. HBase: Feature Comparison
Data Model

Riak: uses buckets as namespaces for storing key-value data; the model consists of buckets, keys, and values.

HBase: stores data according to a pre-defined column-family structure. Each row is addressed by a key and holds a number of column values, and every cell carries its own version information. Data is physically grouped by column family rather than by row, unlike row-oriented relational databases.
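The two data models can be contrasted with a minimal in-memory sketch. This is an illustration of the shapes of the data, not of either database's client API:

```python
# Riak-style: a flat namespace of buckets mapping keys to opaque values.
riak_store = {
    "users": {                          # bucket
        "alice": b'{"name": "Alice"}',  # key -> opaque value
    },
}

# HBase-style: row key -> column family -> qualifier -> {timestamp: value}.
# Versions live inside each cell; the newest timestamp wins on read.
hbase_table = {
    "alice": {                          # row key
        "info": {                       # column family (pre-defined in schema)
            "name": {2: b"Alice", 1: b"Alicia"},  # qualifier -> versions
        },
    },
}

def hbase_get(table, row, family, qualifier):
    """Return the most recent version of a cell."""
    versions = table[row][family][qualifier]
    return versions[max(versions)]
```

Note how versioning is built into every HBase cell, while a Riak value is a single opaque blob whose versioning is handled outside the data model (by vector clocks, covered below).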
Storage Engine

Riak: mounts the storage layer onto the system as a pluggable engine, so you can select different backends as needed (such as Bitcask or LevelDB), or even implement your own storage engine through Riak's backend API.

HBase: sits on top of HDFS, where its data files live. As in Bigtable, storage is split between an in-memory MemStore and on-disk StoreFiles; the data file format, HFile, is modeled on Bigtable's SSTable. Data files are accessed through the JVM's file-system I/O.
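Bitcask, one of Riak's pluggable backends, is a log-structured hash table: writes append to a data file while an in-memory directory maps each key to the offset of its latest value. A minimal in-memory sketch of that idea (not the real on-disk format):

```python
import io

class TinyBitcask:
    """Append-only log plus in-memory key directory, Bitcask-style."""

    def __init__(self):
        self.log = io.BytesIO()   # stands in for the on-disk data file
        self.keydir = {}          # key -> (offset, length) of latest value

    def put(self, key: bytes, value: bytes) -> None:
        offset = self.log.seek(0, io.SEEK_END)  # writes always append
        self.log.write(value)
        self.keydir[key] = (offset, len(value))

    def get(self, key: bytes) -> bytes:
        offset, length = self.keydir[key]       # one seek, one read
        self.log.seek(offset)
        return self.log.read(length)

db = TinyBitcask()
db.put(b"k", b"v1")
db.put(b"k", b"v2")   # the stale value stays in the log until compaction
```

The append-only layout is why writes are fast and why a separate compaction (merge) process is needed to reclaim space from overwritten values.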
Data Access Interface

Riak: besides direct use from Erlang, provides two data access interfaces, REST (HTTP) and Protocol Buffers. Riak clients are built on these APIs, with good support for mainstream languages and many community-developed libraries and projects.

HBase: runs inside the JVM and is accessed primarily through its Java API; it also provides external access through REST and Thrift gateways.
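Riak's REST interface addresses each value by bucket and key. A small helper that builds such URLs — the `/buckets/<bucket>/keys/<key>` path and default port 8098 follow Riak's HTTP API, but treat them as assumptions to check against your Riak version:

```python
from urllib.parse import quote

def riak_object_url(bucket: str, key: str,
                    host: str = "localhost", port: int = 8098) -> str:
    """URL of a Riak object under the HTTP (REST) interface."""
    return "http://{}:{}/buckets/{}/keys/{}".format(
        host, port, quote(bucket, safe=""), quote(key, safe=""))

# A real fetch would then be, e.g.:
#   import urllib.request
#   body = urllib.request.urlopen(riak_object_url("users", "alice")).read()
```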
Data Operations

Riak supports:
- direct operations on the primary key (get, put, delete, update);
- MapReduce queries;
- secondary indexes;
- full-text queries via the Riak Search plug-in.

HBase supports scanning ranges of ordered keys and point lookups of values by key, or running MapReduce queries through Hadoop; secondary indexes can be layered on top with additional tooling.
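Both stores expose MapReduce-style querying. The control flow can be sketched in plain Python — map over each stored key/value pair, group the emitted pairs by key, then reduce each group:

```python
from collections import defaultdict

def map_reduce(records, mapper, reducer):
    """Minimal MapReduce: map each (key, value), group by emitted key, reduce."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_val in mapper(key, value):
            groups[out_key].append(out_val)
    return {k: reducer(k, vs) for k, vs in groups.items()}

# Example: count words across stored documents.
docs = [("d1", "hbase stores rows"), ("d2", "riak stores keys")]
counts = map_reduce(
    docs,
    mapper=lambda key, text: [(w, 1) for w in text.split()],
    reducer=lambda word, ones: sum(ones),
)
```

In the real systems the map phase runs where the data lives (on Riak vnodes, or as Hadoop tasks against HBase regions), which is what makes the pattern scale.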
Data Consistency

Riak: maintains data versions by means of vector clocks and surfaces inconsistencies for the client to resolve. Alternatively, a timestamp-based "last write wins" policy can be used instead of vector clocks.

HBase: provides strongly consistent reads and writes; data is partitioned into regions. Column families can keep an unlimited number of versions of each cell, and each version can have its own TTL (time to live).
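The vector-clock comparison Riak relies on can be sketched directly: each version carries a counter per writing node, one clock descends from another iff every counter is at least as large, and when neither descends from the other the writes were concurrent and conflict:

```python
def descends(a: dict, b: dict) -> bool:
    """True if clock `a` has seen every event clock `b` has."""
    return all(a.get(node, 0) >= n for node, n in b.items())

def compare(a: dict, b: dict) -> str:
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "a-newer"
    if descends(b, a):
        return "b-newer"
    return "conflict"   # concurrent writes: Riak keeps both as siblings

v1 = {"nodeA": 2, "nodeB": 1}
v2 = {"nodeA": 2, "nodeB": 2}   # descends from v1: safe to overwrite
v3 = {"nodeA": 3, "nodeB": 1}   # concurrent with v2: a real conflict
```

"Last write wins" replaces this comparison with a single timestamp per value, which is simpler but silently discards one side of a concurrent update.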
Concurrency |
All nodes in the Riak cluster can perform read and write operations at the same time. Riak is only responsible for Data Writing operations (saving with Version Control Based on Vector clock ), when reading data, define the processing logic of data conflicts. |
Hbase ensures the atomicity of write operations through row-level locks, but does not support transactions of multi-row write operations. Data scan operations do not guarantee consistency.
Consistency guarantees |
Replication

Riak: its replication design draws on the Dynamo paper and Dr. Eric Brewer's CAP theorem. Riak partitions data with consistent hashing and stores replicas of the same data on multiple nodes; virtual nodes keep the distribution balanced and loosely couple data to physical nodes. Riak's APIs offer a free choice between consistency and availability, so you can pick different policies per application scenario: the replication factor is configured per bucket when data is first stored, and the number of replicas to consult can be set on each read, write, or update.
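That per-request trade-off boils down to quorum arithmetic: with N replicas, a read quorum R and a write quorum W, the condition R + W > N guarantees every read overlaps at least one replica holding the latest acknowledged write. A one-line sketch:

```python
def quorum_overlaps(n: int, r: int, w: int) -> bool:
    """With n replicas, does every read of r copies intersect every
    write acknowledged by w copies?  Overlap requires r + w > n."""
    return r + w > n
```

Lower R and W favor availability and latency; raising them toward N buys stronger read-your-writes behavior at the cost of tolerating fewer node failures per request.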
HBase: inter-cluster replication is eventually consistent, implemented by the master cluster pushing changes to slave clusters; master-master replication has also been added recently.
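Riak's placement scheme can be sketched as a ring of virtual nodes: each physical node claims several points on a hash ring, and a key is stored on the first N distinct nodes clockwise from its hash. Ring size and vnode count below are arbitrary illustrations, not Riak's actual parameters:

```python
import bisect
import hashlib

def h(s: str) -> int:
    return int(hashlib.sha1(s.encode()).hexdigest(), 16)

def build_ring(nodes, vnodes=8):
    """Assign several hash points (virtual nodes) to each physical node."""
    return sorted((h(f"{node}#{i}"), node)
                  for node in nodes for i in range(vnodes))

def preference_list(ring, key, n=3):
    """First n distinct physical nodes clockwise from the key's position."""
    points = [p for p, _ in ring]
    idx = bisect.bisect(points, h(key)) % len(ring)
    owners = []
    while len(owners) < n:
        node = ring[idx][1]
        if node not in owners:
            owners.append(node)
        idx = (idx + 1) % len(ring)
    return owners

ring = build_ring(["node1", "node2", "node3", "node4"])
replicas = preference_list(ring, "users/alice")   # 3 distinct replica nodes
```

Spreading each node across many ring points is what keeps load balanced and decouples data placement from the physical node list.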
Scalability

Riak: supports dynamically adding and removing nodes. All nodes are equal; there is no master/slave distinction. When a node joins, the cluster discovers it through gossip, assigns it a share of the data ranges, and migrates data accordingly; removing a node works in reverse. Riak provides a set of command-line tools for adding and removing nodes.

HBase: shards data in units of regions; regions split, merge, and are automatically reassigned across multiple nodes.
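A property worth spelling out: when a node joins a consistent-hash ring, only the key ranges adjacent to its new points migrate, not the whole keyspace. A self-contained sketch that measures this (hash ring and key counts are illustrative):

```python
import bisect
import hashlib

def h(s: str) -> int:
    return int(hashlib.sha1(s.encode()).hexdigest(), 16)

def build_ring(nodes, vnodes=16):
    return sorted((h(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))

def owner(ring, key):
    """Physical node owning `key`: first ring point clockwise from its hash."""
    points = [p for p, _ in ring]
    return ring[bisect.bisect(points, h(key)) % len(ring)][1]

keys = [f"key-{i}" for i in range(1000)]
before = build_ring(["n1", "n2", "n3"])
after = build_ring(["n1", "n2", "n3", "n4"])   # one node joins

moved = sum(owner(before, k) != owner(after, k) for k in keys)
# Only ranges now claimed by n4 move -- roughly 1/4 of keys, never all.
```

Every key that changed owner must have landed on the new node, which is exactly why node addition stays cheap as the cluster grows.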
Data Synchronization Between Multiple Data Centers

Riak: only Riak Enterprise supports deployment across multiple data centers; ordinary users are limited to a single data center.

HBase: its region-based sharding, together with cluster replication, lends itself naturally to multi-datacenter deployments.
Graphical Monitoring and Management Tools

Riak: starting from version 1.1.x, Riak ships Riak Control, an open-source graphical management tool.

HBase: offers a command-line control shell plus several graphical tools developed by the open-source community, such as an Eclipse dev plug-in, HBase Manager, and GUI Admin.