Cassandra vs. HBase

Source: Internet
Author: User

Reprinted: http://hi.baidu.com/qnuth/blog/item/8720811ff79bca11314e15da.html

Because the data models of HBase and Cassandra are very similar, we will not compare them here. Instead, we will mainly compare how the two systems handle data consistency and multi-copy replication.

HBase

HBase guarantees write consistency. When data must be replicated N times, the client receives a success response only after all N copies have actually been written to N servers. If any copy fails, the whole write fails, and no client connected to any server can see the data. HBase provides row locks but no multi-row locks or transactions. HBase is built on HDFS, so multi-copy replication and reliability are provided by HDFS. HBase also integrates naturally with MapReduce.
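The all-or-nothing behaviour described above can be pictured with a toy model. This is only a sketch of the semantics, not how HBase or HDFS implement replication; the function name `write_all_or_nothing` and the in-memory dictionaries standing in for servers are assumptions made for illustration.

```python
from typing import Dict, List


def write_all_or_nothing(replicas: List[Dict[str, str]], healthy: List[bool],
                         key: str, value: str) -> bool:
    """Toy model: the write becomes visible on every replica or on none.

    `replicas` are hypothetical in-memory stores, one per server, and
    `healthy[i]` says whether server i can accept the write.
    """
    if len(replicas) != len(healthy):
        raise ValueError("replicas and healthy must have the same length")
    # If any single replica cannot take the write, report failure
    # without touching any replica.
    if not all(healthy):
        return False
    for store in replicas:
        store[key] = value
    return True
```

With three replicas and one unhealthy server, the call returns False and no replica contains the key, which matches the statement that no client can see a failed write.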

Cassandra

Cassandra lets you choose among several modes when writing data. When data must be replicated N times, a write can return immediately, return after one server has stored a copy, return after all N servers have stored a copy, or return after a quorum of servers has stored a copy. (Quorum is discussed again later.) Replication itself does not fail: eventually the data is written to all replica nodes. The cost is that, during the window before a write has fully propagated, clients connected to different servers may read different data. All servers in a cluster are peers, so there is no single point of failure, and nodes communicate with each other through the gossip protocol. Writes are ordered by timestamp, and row locks are not provided. Recent versions of Cassandra integrate with MapReduce.
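The write modes listed above differ only in how many replica acknowledgements the client waits for. The sketch below is not Cassandra's client API; the names `required_acks` and `resolve_by_timestamp` and the level labels are assumptions for illustration, and the quorum size uses the usual majority formula floor(N/2) + 1.

```python
from typing import Iterable, Tuple


def required_acks(level: str, replication_factor: int) -> int:
    """Acknowledgements to wait for before answering the client (illustrative)."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    levels = {
        "ZERO": 0,                              # return immediately
        "ONE": 1,                               # one server has the copy
        "QUORUM": replication_factor // 2 + 1,  # a majority of replicas
        "ALL": replication_factor,              # every replica
    }
    if level not in levels:
        raise ValueError(f"unknown level: {level}")
    return levels[level]


def resolve_by_timestamp(versions: Iterable[Tuple[int, str]]) -> str:
    """Pick the value with the highest timestamp (writes ordered by timestamp)."""
    return max(versions, key=lambda version: version[0])[1]
```

For a replication factor of 3, a quorum write waits for 2 acknowledgements, while the remaining replica still receives the data in the background.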

Compared with Cassandra, configuring HBase is a difficult and complex task. As for why Facebook adopted HBase: Facebook had followed HBase development for a long time and has a dedicated, experienced team to install and maintain it. One can imagine that there was a fierce internal debate at Facebook over HBase versus Cassandra, and that the HBase camp gained the upper hand. For a large company, keeping a fairly large DBA-like team to maintain HBase is not a significant cost, but for a small company it is not affordable.

In addition, HBase has a major weakness in high availability: it depends on HDFS. HDFS is an open-source implementation modeled on the Google File System, and the NameNode is its single point of failure. So far, HDFS has not added automatic recovery for the NameNode. I believe Facebook has some internal means of recovering the NameNode, but it has not been open-sourced.

By contrast, Cassandra's decentralized P2P design has no SPOF (single point of failure). From a design point of view, Cassandra is more reliable than HBase.

With regard to data consistency, Cassandra can achieve the same consistency as HBase at the cost of response time. In addition, choosing an appropriate quorum setting gives a good compromise between response time and data consistency.
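A common way to reason about this compromise in Dynamo-style systems is the overlap rule: with N replicas, W acknowledgements per write and R replicas consulted per read, a read is guaranteed to touch at least one replica holding the latest acknowledged write when R + W > N. The sketch below only encodes that rule; the function names are illustrative and not part of Cassandra.

```python
from typing import List, Tuple


def reads_see_latest_write(n: int, w: int, r: int) -> bool:
    """True when every read set overlaps every write set (R + W > N)."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("w and r must be between 1 and n")
    return r + w > n


def consistent_settings(n: int) -> List[Tuple[int, int]]:
    """All (w, r) pairs for n replicas that satisfy the overlap rule."""
    return [(w, r)
            for w in range(1, n + 1)
            for r in range(1, n + 1)
            if reads_see_latest_write(n, w, r)]
```

With N = 3, writing and reading at quorum (W = 2, R = 2) satisfies the rule while waiting for fewer servers than W = 3, which is the kind of middle ground the paragraph above refers to. Writing and reading at one replica each (W = 1, R = 1) is faster but does not satisfy it.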

Advantages and disadvantages of Cassandra

Cassandra's advantages are mainly the following:

  • Simple configuration that does not require several modules to work together.
  • Strong flexibility: data consistency and performance can be tuned differently for different applications.
  • Higher reliability, with no SPOF.

Does this mean Cassandra has no weaknesses? Of course not. Cassandra has one fatal weakness.

It is the storage of large files. Although Cassandra was not originally designed to store large files, Amazon S3 is reportedly built on Dynamo (the design Cassandra draws on), which keeps tempting people to consider Cassandra for large-file storage. HBase, unlike Cassandra, is built on HDFS, which is designed to store extremely large files and to provide maximum throughput and highly reliable access. Because Cassandra has no HDFS-like file system underneath it, it is currently powerless when it comes to storing very large files (hundreds of TB or even PB). Moreover, even if the client splits large files manually, the process is far from transparent and consumes client CPU.
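To make the cost of manual splitting concrete, here is a minimal sketch of what a client would have to do itself. It is plain Python with no Cassandra calls; the helper names, the `(file_id, chunk_index)` key layout, and the chunk size are all assumptions chosen for illustration.

```python
from typing import Dict, Tuple

ChunkKey = Tuple[str, int]  # hypothetical key layout: (file_id, chunk_index)


def split_into_chunks(file_id: str, data: bytes,
                      chunk_size: int) -> Dict[ChunkKey, bytes]:
    """Cut one large value into fixed-size pieces the client stores separately."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return {(file_id, offset // chunk_size): data[offset:offset + chunk_size]
            for offset in range(0, len(data), chunk_size)}


def reassemble(file_id: str, chunks: Dict[ChunkKey, bytes]) -> bytes:
    """Fetch the pieces in index order and join them back together."""
    indices = sorted(index for (fid, index) in chunks if fid == file_id)
    return b"".join(chunks[(file_id, index)] for index in indices)
```

Every read and write of a large file then passes through this splitting and joining code on the client, which is where the extra client CPU goes.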

Therefore, if we want to build a search engine similar to Google's, HDFS at least is essential. Although the HDFS NameNode is still a single point of failure, suitable hacks can make it more reliable. HBase on HDFS is a better fit for the inverted-index database behind a search engine. In fact, combining Lucene with HBase is much smoother and more efficient than Lucandra, the project that integrates Lucene with Cassandra. (Lucandra requires Cassandra to use OrderPreservingPartitioner, which can distribute keys unevenly instead of balancing load, creating hotspot machines.)
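The hotspot effect of order-preserving placement can be seen in a small simulation. This is not Cassandra's partitioner code: the split points, the key format, and both function names are assumptions, and MD5 simply stands in for a hash-based placement.

```python
import hashlib
from bisect import bisect_right
from collections import Counter
from typing import List


def ordered_node(key: str, split_points: List[str]) -> int:
    """Order-preserving placement: the node is chosen by where the key sorts."""
    return bisect_right(split_points, key)


def hashed_node(key: str, node_count: int) -> int:
    """Hash-based placement: the node is chosen from a digest of the key."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % node_count


# Hypothetical workload: 1000 keys that share a common prefix.
keys = [f"user:{i:04d}" for i in range(1000)]
split_points = ["g", "n", "t"]  # four nodes, ranges split alphabetically

ordered_load = Counter(ordered_node(k, split_points) for k in keys)
hashed_load = Counter(hashed_node(k, 4) for k in keys)
```

Because every key starts with "user:", all of them sort after "t" and land on the last node under ordered placement, while hashing spreads the same keys across all four nodes. The price of hashing is that keys are no longer stored in sorted order, which is the property the ordered placement exists to preserve.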

 

So my conclusion is that, in this age of diverse requirements, there is no single winner that takes all. I am also increasingly reluctant to believe that the engineering field has permanent, static solutions. If you only need to store rapidly growing message data, images, and short videos, cannot afford to lose data, want to keep maintenance to a minimum, and want to expand storage quickly by adding machines, then Cassandra undoubtedly has the upper hand.

However, if you want to build a very large-scale search engine that generates a very large inverted index file (a logical file, of course; the real file is split and stored across different nodes), then HDFS + HBase is your first choice.
