Basic concepts of "DB" HBase

Last Update:2016-04-01 Source: Internet

Author: User

Tags cassandra


1 What is HBase?
Before saying what HBase is, let's look at two concepts: row-oriented storage and column-oriented storage. Row-oriented storage is what the RDBMSs we are all familiar with use. It mainly suits transactional workloads; in other words, row-based storage systems are a good fit for OLTP. According to the CAP theorem, however, a traditional RDBMS achieves strong consistency by synchronizing through strict ACID transactions, and this greatly reduces the availability and scalability of the system. Most current NoSQL products are eventually consistent systems that sacrifice part of the consistency in exchange for high availability. (HBase is usually placed in this NoSQL group as well, although what it gives up is multi-row transactions; reads and writes of a single row remain strongly consistent.) But I digress. So what exactly is column-oriented storage? HBase, Cassandra, and Bigtable are all column-oriented distributed storage systems.
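To make the row-oriented versus column-oriented distinction concrete, here is a minimal sketch. It is a toy model only: plain Python lists stand in for the physical layout, the records are made up, and it does not show how HBase actually lays out its files (HBase groups columns by column family).

```python
# Toy illustration: a flat list stands in for the order of values on disk.
# The records and column names are made up for this example.
records = [
    {"name": "zhangsan", "age": 30, "sex": "male"},
    {"name": "lisi", "age": 25, "sex": "female"},
]
columns = ["name", "age", "sex"]


def row_layout(rows, cols):
    """Row-oriented: all fields of one record are stored next to each other."""
    return [row[col] for row in rows for col in cols]


def column_layout(rows, cols):
    """Column-oriented: all values of one column are stored next to each other."""
    return [row[col] for col in cols for row in rows]


print(row_layout(records, columns))
# ['zhangsan', 30, 'male', 'lisi', 25, 'female']
print(column_layout(records, columns))
# ['zhangsan', 'lisi', 30, 25, 'male', 'female']
```

Reading one whole record is a contiguous read in the first layout, while reading one attribute across all records is a contiguous read in the second.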

If it is still not clear at this point what HBase is, that doesn't matter. Here is a summary:

HBase is a column-oriented distributed storage system. Its strength lies in high-performance concurrent read and write operations. At the same time, HBase partitions data transparently, so the storage itself scales horizontally.

2 The HBase data model
The data models of HBase and Cassandra are similar. Both are based on the ideas of Google's Bigtable, so the data models of all three are very close. The only difference is that Cassandra has the concept of a super column family, which I have not found in HBase so far. Enough preamble. Let's take a look at what the HBase data model really is.

There are two basic concepts in HBase: the row key and the column family. Let's look at the column family first. Column families are defined in advance as part of the table schema, and each column family can hold any number of columns, distinguished by a "qualifier". An example will make this clear.

Suppose the system has a user table. In a traditional RDBMS, the columns of the user table are fixed: the schema defines attributes such as name, age, and sex, and user attributes cannot be added dynamically. With a column-oriented system such as HBase, we can define the user table and then define an info column family in it. User data can then be stored as info:name = zhangsan, info:age = 30, info:sex = male, and so on. If you want to add another attribute later, it is very convenient: you only need to write info:newproperty.
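The user table above can be sketched as a small in-memory structure. This is a hypothetical stand-in, not the HBase client API: the class `ToyTable` and its `put` and `get` methods are names invented for illustration. It only shows the idea that column families are fixed up front while qualifiers inside a family are free-form.

```python
from collections import defaultdict


class ToyTable:
    """Hypothetical in-memory stand-in for an HBase-style table (not the real API)."""

    def __init__(self, column_families):
        # Column families are fixed when the table is defined.
        self.column_families = set(column_families)
        # row key -> {"family:qualifier": value}
        self.rows = defaultdict(dict)

    def put(self, row_key, column, value):
        family, _, _qualifier = column.partition(":")
        if family not in self.column_families:
            raise KeyError(f"unknown column family: {family}")
        # Any qualifier is accepted inside a known family.
        self.rows[row_key][column] = value

    def get(self, row_key):
        return dict(self.rows.get(row_key, {}))


users = ToyTable(column_families=["info"])
users.put("user1", "info:name", "zhangsan")
users.put("user1", "info:age", "30")
users.put("user1", "info:sex", "male")
# A new attribute needs no schema change, only a new qualifier.
users.put("user1", "info:newproperty", "anything")
print(users.get("user1"))
```

Writing to a family that was never defined (for example `other:x`) raises `KeyError` in this sketch, which mirrors the rule that families, unlike columns, must be declared ahead of time.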

If the previous example is not clear enough, here is another one. Anyone familiar with SNS (social networking sites) knows the friend feed. A feed is usually designed along the lines of "someone did something with some title", but we generally also reserve some extra fields, because a feed sometimes needs a URL, an image, and other properties. In other words, the properties of a feed are not fixed. Handling this in a traditional relational database is cumbersome. Moreover, a relational database wastes space on null cells, while a column-oriented store does not have this problem: in HBase, a column that has no value takes up no space.
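The point about null cells can be illustrated with a small count. This is a sketch under assumptions: the three feed rows, the column names, and the URL are made up, and a Python list of tuples stands in for the cells a sparse store would keep.

```python
# Made-up feed data: each feed has a title, and only some have a url or an image.
FEED_COLUMNS = ["title", "url", "image"]

feeds = [
    {"title": "posted a photo", "image": "a.png"},
    {"title": "shared a link", "url": "http://example.com"},
    {"title": "updated status"},
]


def fixed_schema_cells(rows, columns):
    """Fixed schema: every row reserves a slot per column; missing values are None (NULL)."""
    return [[row.get(col) for col in columns] for row in rows]


def sparse_cells(rows):
    """Sparse model: only cells that have a value are kept, as (row index, column, value)."""
    return [(i, col, val) for i, row in enumerate(rows) for col, val in row.items()]


fixed = fixed_schema_cells(feeds, FEED_COLUMNS)
null_slots = sum(cell is None for row in fixed for cell in row)
stored = sparse_cells(feeds)

print(len(fixed) * len(FEED_COLUMNS))  # 9 slots reserved by the fixed schema
print(null_slots)                      # 4 of them hold NULL
print(len(stored))                     # 5 cells kept by the sparse model
```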

Let's look at two images that illustrate this difference:

[Figure: feed table in a traditional RDBMS]

The figure above shows a feed table designed for a traditional RDBMS. The number of columns in the feed table is fixed, columns cannot be added, and the null columns waste space.

The figure below shows the data model of HBase, Cassandra, and Bigtable. Here the columns of the feed table can be added dynamically, and empty columns are not stored, which saves a lot of space. More importantly, as the system evolves, all kinds of feeds will appear. We cannot predict how many kinds there will be, so we cannot fix the number of columns in the feed table. The column-oriented data model of HBase, Cassandra, and Bigtable fits this scenario well, and this is a typical case for using HBase. Another important advantage is that the feed table is split automatically: when the data in the table exceeds a certain threshold, HBase splits the data for us, so queries scale. In addition, because HBase offers only weak transactional guarantees, write operations to HBase are very fast.

[Figure: feed table in the HBase, Cassandra, and Bigtable data model]
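The automatic splitting described above can be sketched as follows. This is a simplified model, not HBase's implementation: `ToyRegion`, `put`, and `max_keys` are names invented here, and the threshold counts keys, whereas a real system would look at the amount of stored data. HBase calls these partitions regions.

```python
import bisect


class ToyRegion:
    """Hypothetical partition holding a sorted, contiguous range of row keys."""

    def __init__(self, keys=None):
        self.keys = list(keys) if keys else []


def put(regions, key, max_keys):
    """Insert key into the region that covers it; split that region if it grows too big."""
    # Regions are ordered by key range. Every region after the first starts at its first key.
    starts = [region.keys[0] for region in regions[1:]]
    idx = bisect.bisect_right(starts, key)
    region = regions[idx]
    bisect.insort(region.keys, key)
    if len(region.keys) > max_keys:
        mid = len(region.keys) // 2
        # Replace the oversized region with two halves, keeping overall key order.
        regions[idx:idx + 1] = [ToyRegion(region.keys[:mid]), ToyRegion(region.keys[mid:])]


regions = [ToyRegion()]
for i in range(1, 11):
    put(regions, f"row{i:02d}", max_keys=4)

for region in regions:
    print(region.keys)
```

The caller never chooses a partition; it only supplies a key, and the table grows from one region to several as data arrives.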

That covers the column family. As for the row key mentioned earlier, you can think of it as the primary key of a row in an RDBMS. However, because HBase does not support conditional queries or ORDER BY, the row key has to be designed around your system's query requirements. Take the feed example again: we usually query a person's latest feeds, so the row key of a feed can consist of three parts, <userId><timestamp><feedId>. To query a person's latest feeds, we specify the start row key as <userId><0><0> and the end row key as <userId><Long.MAX_VALUE><Long.MAX_VALUE>. Because records in HBase are sorted by row key, this query is fast.
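Here is a minimal sketch of that row key design. It assumes each of the three parts is a non-negative 64-bit integer packed big-endian, so that byte order matches numeric order; the helper names `make_row_key` and `scan` are invented for this example, and a sorted Python list stands in for the sorted table. Unlike a typical HBase scan, the stop key here is inclusive, to keep the sketch short.

```python
import bisect
import struct

LONG_MAX = 2**63 - 1  # same value as Java's Long.MAX_VALUE


def make_row_key(user_id, timestamp, feed_id):
    """Pack <userId><timestamp><feedId> as three big-endian unsigned 64-bit integers."""
    return struct.pack(">QQQ", user_id, timestamp, feed_id)


def scan(sorted_keys, start_key, stop_key):
    """Return the keys in [start_key, stop_key], relying on the keys being sorted."""
    lo = bisect.bisect_left(sorted_keys, start_key)
    hi = bisect.bisect_right(sorted_keys, stop_key)
    return sorted_keys[lo:hi]


# Made-up feeds as (userId, timestamp, feedId).
keys = sorted(
    make_row_key(user_id, ts, feed_id)
    for user_id, ts, feed_id in [(7, 1000, 1), (7, 2000, 2), (8, 1500, 3), (6, 900, 4)]
)

user7 = scan(keys, make_row_key(7, 0, 0), make_row_key(7, LONG_MAX, LONG_MAX))
decoded = [struct.unpack(">QQQ", key) for key in user7]
print(decoded)  # [(7, 1000, 1), (7, 2000, 2)]
```

All feeds of one user are adjacent in key order, so the scan touches only that user's range. The results come back oldest first; a common variation is to store Long.MAX_VALUE minus the timestamp so that the newest feed sorts first.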

3 Advantages and disadvantages of HBase
1 Columns can be added dynamically, and empty columns store no data, which saves storage space.

2 HBase splits data automatically, which gives the data store horizontal scalability without manual work.

3 HBase supports highly concurrent read and write operations.

Disadvantages of HBase:

1 Conditional queries are not supported. Data can only be queried by row key.

2 At the time of writing, failover of the master server is not supported; when the master goes down, the entire storage system hangs. (Later HBase releases can run standby masters.)

Some further reading on database scalability:
http://www.jurriaanpersyn.com/archives/2009/02/12/database-sharding-at-netlog-with-mysql-and-php/

http://adam.blog.heroku.com/past/2009/7/6/sql_databases_dont_scale/


This article is an English version of an article originally written in Chinese on aliyun.com.
