Cassandra vs HBase

By Vaibhav Puranik
Translated by jametong

 

We are an advertising network company, and we need to store impression and click data. We are evaluating several large-scale data stores (or NoSQL systems, or whatever you prefer to call them) for our new project. For the past 8 months we have been using HBase on a test product, and we are satisfied with its performance. However, Cassandra has become very popular recently, so we decided to test it as well. I think that, in some ways, the Cassandra team is doing a great job of promotion. In Santa Monica you will find that even non-technical people (venture capitalists, CEOs, and product managers) recommend Cassandra to each other.

Cassandra makes a good first impression. Its homepage looks more professional and friendlier than HBase's. It is also very simple to install and run, and the website has plenty of documentation. To be honest, it took me only 5 minutes to install it and get it working.

The real challenge is to understand Cassandra's data model and apply it to our use case. We know how to do this in HBase, because we have a good deal of experience with it. Although Cassandra and HBase both inherit their data model from Bigtable, there are some fundamental differences between them. The following table shows the differences between the two systems:

Cassandra: Lacks a table-like concept. All the documentation tells you that having multiple keyspaces is uncommon, which means you must share the same keyspace across a cluster. In addition, after adding a keyspace you must restart the cluster for it to take effect.
HBase: Has the concept of tables, and each table has its own key space. This is very important for us. Adding and deleting tables is easy, just as in an RDBMS.

Cassandra: Uses string keys. A UUID is usually used as the key. If you want your data sorted by time, you can use a TimeUUID.
HBase: Uses binary keys. A key is usually built by combining three different items. This means you can search on more than one key in a given table.

Cassandra: Even if you use TimeUUID, no hot spot arises, because Cassandra load-balances client requests.
HBase: If the first part of the key is a timestamp or a sequence number, a hot spot occurs. All new keys are inserted into the same region until that region is full.

Cassandra: Column sorting is supported.
HBase: Column sorting is not supported.

Cassandra: The super column concept lets you design very flexible and complex table structures.
HBase: Super columns are not supported. However, you can design a structure similar to super columns, because column names and values are both binary.

Cassandra: There is no convenient way to increment the value of a column. In fact, the nature of eventual consistency makes it very difficult to update or write a record and read it back immediately. You must make sure that R + W > N to achieve strong consistency.
HBase: Because it is consistent by design, it provides a very convenient way to increment counters. It is very well suited to data aggregation.

Cassandra: Support for a Map Reduce interface is only just beginning. You also need a Hadoop cluster to run it, and you need to move data from the Cassandra cluster to the Hadoop cluster. It is not suitable for running Map Reduce jobs over large data sets.
HBase: Map Reduce support is native. HBase is built on top of the Hadoop cluster, so data does not need to be moved.

Cassandra: If Hadoop is not required, maintenance is relatively simple.
HBase: Maintenance is relatively complicated, because the system has several moving parts such as ZooKeeper, Hadoop, and HBase itself.

Cassandra: So far there is no native Java API and no Javadoc. Although Cassandra is written in Java, you must use the Thrift interface to communicate with the cluster.
HBase: Has a friendly native Java API. It feels more like a Java system than Cassandra does. Because our applications are Java-based, this is very important to us.

Cassandra: There is no master node, so there is no single point of failure.
HBase: Although there is conceptually a master node service, HBase itself does not depend on it heavily. Even if the master node goes down, the HBase cluster can still serve data normally. Hadoop's NameNode, however, is a single point of failure.
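
The hot spot entry in the comparison above can be illustrated with a toy model. The sketch below is not HBase code. It assumes a hypothetical cluster of four regions split evenly on the first byte of the row key, and compares keys that start with a timestamp against the same keys prefixed with a hash-derived salt byte. Salting is a common workaround chosen here for illustration; it is not something the original comparison describes, and all function names are made up for this example.

```python
import hashlib
import struct
from collections import Counter

NUM_REGIONS = 4  # assumption: four regions, split evenly on the first key byte


def region_for(key: bytes) -> int:
    """Toy range partitioning: the first byte of the key selects the region."""
    return key[0] * NUM_REGIONS // 256


def sequential_key(ts: int) -> bytes:
    """Row key that starts with a big-endian timestamp (hypothetical layout)."""
    return struct.pack(">Q", ts)


def salted_key(ts: int) -> bytes:
    """Same key, prefixed with one salt byte taken from a hash of the timestamp."""
    raw = struct.pack(">Q", ts)
    salt = hashlib.md5(raw).digest()[:1]
    return salt + raw


def distribution(keys) -> Counter:
    """Count how many keys land in each region."""
    return Counter(region_for(k) for k in keys)


timestamps = range(1_700_000_000, 1_700_001_000)
seq = distribution(sequential_key(t) for t in timestamps)
salted = distribution(salted_key(t) for t in timestamps)
print("sequential:", dict(seq))
print("salted:    ", dict(salted))
```

In this model every sequential key lands in a single region, while the salted keys spread across all four. The trade-off is that salted keys are no longer stored in time order, so a time-range scan has to query every salt prefix.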

After comparing the data models and related features in this way, HBase is the clear winner for us. In my opinion, if you really need consistency, HBase is the obvious choice. Furthermore, native Map Reduce support, the concept of tables, and the ability to change a table's structure without restarting the cluster are advantages you should not ignore. HBase is also the more mature platform. When people talk about Twitter and Facebook using Cassandra, they forget that these companies also use HBase. In fact, Facebook recently hired an HBase committer, which clearly shows Facebook's interest in HBase.
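
The consistency argument rests on the R + W > N condition mentioned in the comparison. As a minimal sketch, assuming N is the number of replicas, R the number of replicas that must answer a read, and W the number that must acknowledge a write, the check can be written as follows. The function names are hypothetical and are not part of any Cassandra or HBase API.

```python
def is_strongly_consistent(n: int, r: int, w: int) -> bool:
    """True when every read set must overlap every write set (R + W > N)."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return r + w > n


def quorum(n: int) -> int:
    """Smallest majority of n replicas."""
    return n // 2 + 1


N = 3
for r, w in [(1, 1), (1, 3), (quorum(N), quorum(N))]:
    ok = is_strongly_consistent(N, r, w)
    print(f"N={N} R={r} W={w} -> overlap guaranteed: {ok}")
```

With three replicas, reading and writing one replica each gives no overlap guarantee, whereas majority reads and writes (2 + 2 > 3) always share at least one replica that holds the latest write.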

In short, we fully support HBase!
