Cassandra vs HBase
By Vaibhav Puranik
Translated by jametong
We are an advertising network company, and we need to store ad impression and click data. We are evaluating several big data (or NoSQL, or whatever you like to call them) systems for our new project. For the past 8 months we have been using HBase on a test product, and we are satisfied with its performance. However, Cassandra has become very popular recently, so we decided to test it. I think that, in some ways, the Cassandra team is doing a great job of promotion. In Santa Monica you will find even non-technical people (such as venture capitalists, CEOs, and product managers) recommending Cassandra to each other.
Cassandra makes a good first impression. Its homepage looks more professional and friendly than HBase's. It is also very simple to install and run, and there is plenty of documentation on the website. To be honest, it took me 5 minutes to install it and get it working.
The real challenge is to understand Cassandra's data model and implement it for our use case. We know how to do this in HBase, because we have a fair amount of experience with it.
Although Cassandra inherits the same data model from Bigtable, there are some fundamental differences between Cassandra and HBase.
The following table shows the differences between the two systems:
Cassandra | HBase |
There is no table-like concept. All the documentation tells you that having multiple keyspaces is uncommon, which means you must share the same keyspace within a cluster. In addition, adding a keyspace requires restarting the cluster for it to take effect. | Has the concept of tables, and each table has its own key space. This is very important for us. Adding and deleting tables is as easy as in an RDBMS. |
Uses string keys. A UUID is usually used as the key. If you want your data sorted by time, you can use a TimeUUID. | Uses binary keys. It is common to combine three different items to build a key, which means you can search on more than one key in a given table. |
Even if you use TimeUUID keys, hot spots do not occur, because Cassandra load-balances client requests. | If the first part of the key is a time or a sequence number, hot spots occur: all new keys are inserted into the same region until that region is full. |
Supports column sorting. | Does not support column sorting. |
The super column concept allows you to design very flexible and complex table structures. | Does not support super columns. However, you can design a structure similar to super columns, because column names and values are both binary. |
There is no convenient way to increment a column value. In fact, the eventually consistent nature of Cassandra makes it very difficult to update or write a record and read it back immediately afterward. You must ensure that R + W > N to achieve strong consistency. | Because it is consistent by design, it provides a very convenient method to increment counters. This is very well suited to data aggregation. |
Map Reduce support is at an early stage. You need a Hadoop cluster to run it, and data must be transferred from the Cassandra cluster to the Hadoop cluster. It is not suitable for running Map Reduce jobs on large data sets. | Map Reduce support is native. HBase is built on Hadoop, so data does not need to be transferred. |
Comparatively simple to maintain if Hadoop is not required. | Comparatively complicated to maintain, because it has multiple moving parts such as ZooKeeper, Hadoop, and HBase itself. |
So far, there is no native Java API and no Javadoc. Although it is written in Java, you must use the Thrift interface to communicate with the cluster. | Has a friendly native Java API. It feels more like a Java system than Cassandra does. Because our applications are Java-based, this is very important to us. |
There is no master node, so there is no single point of failure. | Although there is a master node in concept, HBase itself does not depend on it heavily: the HBase cluster can keep serving data even if the master goes down. Hadoop's NameNode, however, is a single point of failure. |
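To make the hot spot row in the comparison concrete, here is a minimal Python sketch of composite binary keys. It is not HBase code: the three key parts (a timestamp, a campaign ID, and an event type) and all function names are hypothetical, chosen only for illustration. It assumes, as the comparison describes, that keys are ordered byte by byte and that new keys which all land at the end of that order go to the same region.

```python
import struct

# Hypothetical three-part key: timestamp, campaign ID, event type.
# Big-endian packing keeps byte order identical to numeric order.

def time_first_key(ts: int, campaign_id: int, event_type: int) -> bytes:
    """Timestamp leads, so keys sort by time across all campaigns."""
    return struct.pack(">QIB", ts, campaign_id, event_type)

def campaign_first_key(ts: int, campaign_id: int, event_type: int) -> bytes:
    """Campaign ID leads, so keys sort by campaign, then by time."""
    return struct.pack(">IQB", campaign_id, ts, event_type)

def new_keys_all_sort_last(old_keys, new_keys) -> bool:
    """True if every new key sorts after every existing key."""
    return min(new_keys) > max(old_keys)

# Illustrative data only: 100 past timestamps for 4 campaigns,
# then one newer event per campaign.
campaigns = [1, 2, 3, 4]
old_events = [(ts, c, 0) for ts in range(1000, 1100) for c in campaigns]
new_events = [(2000, c, 0) for c in campaigns]

time_first_hot = new_keys_all_sort_last(
    [time_first_key(*e) for e in old_events],
    [time_first_key(*e) for e in new_events],
)
campaign_first_hot = new_keys_all_sort_last(
    [campaign_first_key(*e) for e in old_events],
    [campaign_first_key(*e) for e in new_events],
)
print(time_first_hot, campaign_first_hot)
```

With the timestamp first, every new key sorts after all existing keys, which is the pattern that concentrates writes in one region. With the campaign ID first, new keys are spread across the key space.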
After comparing the data model and related features in this way, HBase is the clear winner for us. In my opinion, if you really need consistency, HBase is the obvious choice. Furthermore, native Map Reduce support, the concept of tables, and a simple schema that can be modified without restarting the cluster are advantages you should not ignore. HBase is a more mature platform. When people mention that Twitter and Facebook use Cassandra, they forget that these companies also use HBase. In fact, Facebook recently hired an HBase committer, which clearly shows Facebook's interest in HBase.
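On the consistency point, the R + W > N rule from the comparison can be checked with a few lines of Python. This is only a sketch of the arithmetic, not Cassandra client code: N is the number of replicas, W the number of replicas that must acknowledge a write, and R the number consulted on a read. The function name is hypothetical.

```python
def quorum_overlaps(n: int, r: int, w: int) -> bool:
    """Return True if every read set of R replicas must share at least
    one replica with every write set of W replicas (R + W > N)."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return r + w > n

# Three replicas: quorum reads and writes overlap, single-replica ones may not.
quorum_result = quorum_overlaps(3, 2, 2)
single_result = quorum_overlaps(3, 1, 1)
print(quorum_result, single_result)
```

With three replicas, reading and writing two replicas each satisfies the rule, while reading and writing a single replica does not, so a read may miss the latest write.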
In short, we are fully behind HBase!