[DB] Basic concepts of HBase

Source: Internet
Author: User
Tags: cassandra

1. What is HBase?
Before explaining what HBase is, let's look at two concepts: row-oriented storage and column-oriented storage. Row-oriented storage should be familiar to everyone, because the RDBMSs we use every day work this way. Row-oriented databases are mainly suited to transactional workloads, that is, to OLTP systems. In terms of the CAP theorem, however, a traditional RDBMS achieves strong consistency through strict ACID transactions, and this greatly reduces the system's availability and scalability. Many of today's NoSQL products relax consistency to gain availability and scalability; some, such as Cassandra, settle for eventual consistency. HBase guarantees strong consistency for operations on a single row, but gives up multi-row ACID transactions in exchange for horizontal scalability.

So what is column-oriented storage? In a column-oriented (more precisely, column-family-oriented) system, data is grouped and stored by column family rather than as complete rows, and each row stores only the columns it actually has. HBase, Cassandra and Bigtable are all distributed storage systems of this kind. If it is still not clear what HBase is, don't worry; here is a summary:

HBase is a column-oriented distributed storage system. Its strengths are high-performance concurrent reads and writes, and it splits data transparently, so the storage layer scales horizontally on its own.
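To make the row-versus-column distinction concrete, here is a minimal Python sketch using hypothetical sample data (the table contents, the `stats` family and both function names are made up for illustration). It assumes a simplified picture of HBase's physical layout, in which each column family is kept as its own run of cells sorted by row key and qualifier; real HBase also stores a timestamp and version with every cell, which the sketch ignores.

```python
# Hypothetical sample data: row key -> {"family:qualifier": value}.
users = {
    "user002": {"info:name": "Lisi", "stats:logins": "12"},
    "user001": {"info:name": "Zhangsan", "info:age": "30"},
}


def row_oriented(table):
    """Row-oriented layout: all values of one row are stored next to each other."""
    return [(row, sorted(cols.items())) for row, cols in sorted(table.items())]


def column_family_oriented(table):
    """Column-family layout: one sorted run of (row, qualifier, value) cells per family.

    Only cells that were actually written appear, so missing columns cost nothing.
    """
    stores = {}
    for row, cols in table.items():
        for column, value in cols.items():
            family, _, qualifier = column.partition(":")
            stores.setdefault(family, []).append((row, qualifier, value))
    return {family: sorted(cells) for family, cells in sorted(stores.items())}


print(row_oriented(users))
print(column_family_oriented(users))
```

A query that only needs the `stats` family reads only the `stats` run and never touches `info` data, which is the practical payoff of grouping storage by column family.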


2. The HBase data model
The data models of HBase and Cassandra are very similar because both are derived from Google's Bigtable, so all three share essentially the same model. The one difference is that Cassandra has the concept of a super column family, which I have not found in HBase so far. Enough preamble; let's see what the HBase data model looks like.

HBase has two main concepts: the row key and the column family. Let's start with the column family. Column families are defined in advance, as part of the table schema, and each column family can hold any number of columns, each identified by its qualifier. An example makes this clear.

Suppose the system has a user table. In a traditional RDBMS the columns of the user table are fixed: the schema defines attributes such as name, age and sex, and users cannot gain new attributes dynamically. In a column-family store such as HBase, we can define the user table with an info column family and store each user's data as info:name=Zhangsan, info:age=30, info:sex=male, and so on. To add another property later, we simply write a new column such as info:newproperty.
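The user-table example can be sketched as a toy model of HBase's logical data model: a map from row key to "family:qualifier" to value, where the set of families is fixed by the schema but qualifiers are free. The `Table` class, its `put`/`get` methods and the row key `user001` are hypothetical illustrations, not the HBase client API (the real Java client writes data through `Put` objects).

```python
class Table:
    """Toy model of HBase's logical layout: row key -> "family:qualifier" -> value.

    Column families are fixed when the table is created; qualifiers are not.
    """

    def __init__(self, name, families):
        self.name = name
        self.families = set(families)
        self.rows = {}  # row key -> {"family:qualifier": value}

    def put(self, row, column, value):
        family, _, qualifier = column.partition(":")
        if family not in self.families:
            # Writing to an undeclared family fails: families belong to the schema.
            raise KeyError(f"column family {family!r} not defined on table {self.name!r}")
        # Any new qualifier inside a declared family is accepted without a schema change.
        self.rows.setdefault(row, {})[f"{family}:{qualifier}"] = value

    def get(self, row):
        """Return only the cells that were actually written for this row."""
        return dict(self.rows.get(row, {}))


users = Table("user", families=["info"])
users.put("user001", "info:name", "Zhangsan")
users.put("user001", "info:age", "30")
users.put("user001", "info:sex", "male")
users.put("user001", "info:newproperty", "some value")  # added later, no ALTER TABLE needed

try:
    users.put("user001", "contact:email", "zs@example.com")
except KeyError as err:
    print("rejected:", err)

print(sorted(users.get("user001")))
# ['info:age', 'info:name', 'info:newproperty', 'info:sex']
```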

If that example is not clear enough, here is another. Anyone who uses an SNS knows the friend feed. A feed is generally modeled as "someone did something at a certain time", with a title, but we usually also reserve room for extra attributes: a feed may sometimes need a URL, an image, and so on. The set of feed attributes is therefore not fixed. A traditional relational database handles this awkwardly, and the unused columns leave NULL cells that waste space. Column-oriented storage does not have this problem: in HBase, a column that has no value takes up no space. The two figures below illustrate the difference:
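
[Figure 1: a feed table designed for a traditional RDBMS]
[Figure 2: the feed data in the HBase/Cassandra/Bigtable data model]
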
The first figure is a feed table designed for a traditional RDBMS: its number of columns is fixed and cannot grow, and NULL columns waste space. The second figure shows the data model of HBase, Cassandra and Bigtable: the feed table's columns can be added dynamically and empty columns are not stored at all, which saves a great deal of space. The key point is that new kinds of feeds keep appearing as the system runs, and we cannot predict in advance how many kinds there will be, so there is no way to fix the number of columns the feed table needs. The column-oriented data model of HBase, Cassandra and Bigtable fits this scenario perfectly. Using HBase this way brings another important benefit: the feed table is partitioned automatically. When its data grows past a certain threshold, HBase splits it into smaller pieces (regions) for us, so query capacity scales with the data. And because HBase offers only lightweight transactions (atomic per row rather than across rows), writes to HBase are also very fast.
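The space argument can be checked with a small count. This Python sketch uses three made-up feed records (the titles, URL and image name are hypothetical) and compares a fixed-schema grid, where every unused column is a NULL slot, with a sparse layout that stores only written cells. It counts cells only and is not a measurement of real storage cost; in HBase each stored cell also carries its row key, family and qualifier.

```python
# Hypothetical feed records: every feed has a title; only some carry a url or an image.
feeds = [
    {"title": "Alice posted a photo", "image": "a.jpg"},
    {"title": "Bob shared a link", "url": "http://example.com"},
    {"title": "Carol updated her status"},
]


def rdbms_grid(rows):
    """Fixed schema: every row gets a slot for every column, None (NULL) when unused."""
    columns = sorted({col for row in rows for col in row})
    return columns, [[row.get(col) for col in columns] for row in rows]


def sparse_cells(rows):
    """Sparse layout: only (row, column, value) cells that were actually written."""
    return [(i, col, val) for i, row in enumerate(rows) for col, val in sorted(row.items())]


columns, grid = rdbms_grid(feeds)
null_slots = sum(value is None for row in grid for value in row)
print(columns, "NULL slots:", null_slots)  # ['image', 'title', 'url'] NULL slots: 4
print("stored cells:", len(sparse_cells(feeds)))  # stored cells: 5
```

A new kind of feed with, say, a `video` attribute would widen the grid by a whole column (an ALTER TABLE in an RDBMS), while the sparse layout simply gains one more cell.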




Having covered the column family, what is the row key? You can think of the row key as the primary key of a row in an RDBMS. However, HBase has no secondary indexes, so it cannot efficiently answer conditional queries on arbitrary columns, and it does not support ORDER BY; the row key must therefore be designed around your system's query requirements. Take the feed example again: we usually query someone's most recent feeds, so the feed row key can be composed of three parts, <userid><timestamp><feedid>. To fetch a person's feeds, we set the start row key to <userid><0><0> and the end row key to <userid><Long.MAX_VALUE><Long.MAX_VALUE> and scan that range. Because HBase keeps records sorted by row key, this range query is very fast.
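The row-key scheme can be sketched in Python by simulating a sorted key space and a range scan. Two assumptions are made here because the text does not specify them: each of the three parts is encoded as a fixed-width 8-byte big-endian long (HBase compares row keys byte by byte, so variable-width text such as "42" + "100" would not sort numerically), and the scan follows HBase's convention of an inclusive start row and an exclusive stop row. `row_key`, `scan` and `feeds_of` are hypothetical helpers, not HBase APIs.

```python
import bisect
import struct

LONG_MAX = 2**63 - 1  # Java's Long.MAX_VALUE


def row_key(user_id: int, timestamp: int, feed_id: int) -> bytes:
    """Encode <userid><timestamp><feedid> as three fixed-width 8-byte big-endian longs.

    For non-negative values this makes byte order equal numeric order.
    """
    return struct.pack(">qqq", user_id, timestamp, feed_id)


def scan(sorted_keys, start: bytes, stop: bytes):
    """Simulated range scan over sorted keys: start inclusive, stop exclusive."""
    lo = bisect.bisect_left(sorted_keys, start)
    hi = bisect.bisect_left(sorted_keys, stop)
    return sorted_keys[lo:hi]


def feeds_of(sorted_keys, user_id: int):
    start = row_key(user_id, 0, 0)
    stop = row_key(user_id, LONG_MAX, LONG_MAX)
    return [struct.unpack(">qqq", key) for key in scan(sorted_keys, start, stop)]


# Hypothetical rows for users 7, 42 and 43, kept sorted like HBase keeps row keys.
keys = sorted([
    row_key(42, 300, 3), row_key(7, 500, 9), row_key(42, 100, 1),
    row_key(43, 50, 4), row_key(42, 200, 2),
])
print(feeds_of(keys, 42))  # [(42, 100, 1), (42, 200, 2), (42, 300, 3)]
```

Rows come back oldest first, so the newest feeds sit at the end of the range. A common variant stores `Long.MAX_VALUE - timestamp` instead of the timestamp so that the newest feeds sort first and a scan can stop after the first few rows.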


3. Advantages and disadvantages of HBase

Advantages of HBase:

1 Columns can be added dynamically, and empty columns store no data, which saves storage space.

2 HBase splits data automatically, so the data store scales horizontally on its own.

3 HBase supports highly concurrent read and write operations.
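Automatic splitting (advantage 2) can be sketched as a table whose sorted key space is cut into contiguous regions. This is a simplified Python model under stated assumptions: a region splits when it holds more than `max_rows_per_region` rows, and it splits at its middle key. Real HBase triggers splits by region size (governed by settings such as `hbase.hregion.max.filesize` and the configured split policy), and the `Region` and `RegionedTable` classes here are hypothetical.

```python
import bisect


class Region:
    """Holds the sorted row keys in [start, end); an end of "" means "no upper bound"."""

    def __init__(self, start, end):
        self.start, self.end = start, end
        self.keys = []


class RegionedTable:
    """Toy table whose key space is split into regions that divide when they grow too big."""

    def __init__(self, max_rows_per_region):
        self.max_rows = max_rows_per_region
        self.regions = [Region("", "")]  # initially one region covers every key

    def region_for(self, key):
        # Regions are ordered by start key; pick the last one starting at or before key.
        starts = [r.start for r in self.regions]
        return self.regions[bisect.bisect_right(starts, key) - 1]

    def put(self, key):
        region = self.region_for(key)
        if key not in region.keys:
            bisect.insort(region.keys, key)
        if len(region.keys) > self.max_rows:
            self._split(region)

    def _split(self, region):
        # Split at the middle key: the left half keeps keys below it, the right starts at it.
        mid = len(region.keys) // 2
        split_key = region.keys[mid]
        left, right = Region(region.start, split_key), Region(split_key, region.end)
        left.keys, right.keys = region.keys[:mid], region.keys[mid:]
        i = self.regions.index(region)
        self.regions[i:i + 1] = [left, right]


table = RegionedTable(max_rows_per_region=4)
for n in range(10):
    table.put(f"row{n:02d}")
for r in table.regions:
    print(repr(r.start), repr(r.end), r.keys)
```

Because each region covers a contiguous key range, a row-key range scan such as the feed query only touches the regions that overlap the range, and different regions can be served by different servers.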

Disadvantages of HBase:

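1 It does not support indexed conditional queries; efficient lookups are possible only by row key or row key range.

2 At present it does not support failover of the master server: when the master goes down, the entire storage system goes down with it (later HBase versions added standby masters coordinated through ZooKeeper).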



Further reading on database scalability:
http://www.jurriaanpersyn.com/archives/2009/02/12/database-sharding-at-netlog-with-mysql-and-php/

http://adam.blog.heroku.com/past/2009/7/6/sql_databases_dont_scale/
