Comparison between HBase and traditional relational databases

Source: Internet
Author: User
Tags cassandra

Before I talk about HBase, a few words first. Anyone who builds Internet applications knows that you cannot predict how many people will visit your system or how many users you will face. Today you may have few users; tomorrow there may be many more, and the system cannot cope and falls over. More than a few of us have lived through this sorrow, or to use the fashionable word, this "tragedy".

In fact, this unpredictability is the defining trait of Internet applications. From a system architecture perspective, Internet applications focus more on performance and scalability, while traditional enterprise applications focus more on data integrity and data security. So let's talk about the scalability of Internet applications. I have written a few blog posts on the topic before, and interested readers can refer to those. I will not cover the scalability of the web server and app server here, because that part is relatively easy; I will mainly look at how a slowly growing Internet application scales its database layer.

At first there are few users and little load, so one server does the job: the web server, app server, and DB server are all stuffed onto one machine. As users grow and the load rises, you separate the web server, app server, and DB server, which holds for a while. As users keep growing, the database starts to struggle: it gets slower and slower and sometimes goes down. Now you have to find the database some company, and master-slave replication appears: one master server receives the writes and several slave servers handle the reads. Reads and writes are finally separated and the master stops complaining. What has really happened is that reads have been scaled out horizontally, with additional slaves overcoming the CPU bottleneck of queries.

That usually carries you a good way, but as users increase further, the write load on the master becomes too heavy. What now? You have to split. As the saying goes, "only with partitioning comes scalability." So you divide the data into separate databases, which is what we call "vertical partitioning": unrelated data is stored in different databases and deployed separately, so each takes only part of the read and write load and the master can breathe a little. But as data keeps growing, individual tables become very large and queries become inefficient. Then you need "horizontal partitioning", for example splitting the user table every 100,000 rows so that no table holds more than 100,000 rows.
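To make the two ideas concrete, here is a minimal Python sketch of read/write splitting and horizontal partitioning. The server names, the `user_<n>` table naming, and the helper names are assumptions made up for illustration; a real deployment would route through a database driver or middleware.

```python
import random

ROWS_PER_TABLE = 100_000  # each user table holds at most 100,000 rows


def shard_table_for(user_id):
    """Map a numeric user id to the (hypothetical) table that stores it."""
    return f"user_{user_id // ROWS_PER_TABLE}"


class ReplicatedDatabase:
    """Sends writes to the master and spreads reads across the slaves."""

    def __init__(self, master, slaves):
        self.master = master
        self.slaves = list(slaves)

    def server_for(self, is_write):
        # Writes must go to the master; reads can go to any slave.
        if is_write or not self.slaves:
            return self.master
        return random.choice(self.slaves)


db = ReplicatedDatabase("master", ["slave-1", "slave-2"])
print(db.server_for(is_write=True))   # master
print(shard_table_for(250_000))       # user_2
```

Adding a slave only helps reads; the single master still takes every write, which is why the text moves on to partitioning.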

In short, a popular web site usually goes through the painful journey from a single DB, to master-slave replication, to vertical partitioning, to horizontal partitioning. The principle of database partitioning looks simple, but I suspect everyone who has actually sharded a database has suffered deeply. For more on database scaling, readers can look up the reference articles on the topic.

From all of the above we can see that scaling out the database storage layer is a very painful thing. Fortunately technology moves on and others in the industry have been working hard: in 2009 a lot of NoSQL databases appeared, or more accurately, non-relational databases. Most of them offer transparent horizontal scaling for unstructured data, which takes a lot of pressure off your design. Here I use HBase, a distributed column-oriented storage system, as the example.

One. What is HBase?
Before we say what HBase is, let's look at two concepts: row-oriented storage and column-oriented storage. Row-oriented storage is what everyone knows, since the familiar RDBMS is of this type. Row-oriented databases mainly suit transactional, OLTP-style storage systems. But by the CAP theorem, a traditional RDBMS achieves strong consistency by synchronizing through strict ACID transactions, which heavily discounts the availability and scalability of the system. Many current NoSQL products instead give up part of that guarantee: some, such as Cassandra, are eventually consistent in exchange for high availability, while HBase keeps operations on a single row strongly consistent but drops multi-row ACID transactions. And what is column-oriented storage? HBase, Cassandra, and Bigtable are all column-oriented distributed storage systems. If you still do not understand what HBase is, it doesn't matter; here is my summary:

HBase is a column-oriented distributed storage system. Its advantage is that it achieves high-performance concurrent reads and writes, and it splits the data transparently so that the storage itself scales horizontally.


Two. The HBase data model
The data models of HBase and Cassandra are very similar, since both take their ideas from Google's Bigtable; the data models of all three are therefore much alike. The only difference is that Cassandra has the concept of a Super Column Family, which I have not found in HBase so far. Enough talk; let's see what the HBase data model looks like.

HBase has two main concepts: the row key and the column family. Let's look at the column family first. Column families are defined in advance, before the system starts, and each column family can hold any number of columns, distinguished by qualifier. An example makes this clear.

Suppose the system has a user table. In a traditional RDBMS the columns of the user table are fixed: the schema defines attributes such as name, age, and sex, and user attributes cannot be added dynamically. In a column-oriented system such as HBase, we define the user table with an info column family, and the user data becomes info:name = zhangsan, info:age = 30, info:sex = male, and so on. To add another attribute later, just write info:newproperty.
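The user table example can be modelled in a few lines of Python. This is only an in-memory sketch of the idea (fixed column families, free-form qualifiers, nothing stored for absent columns); the `SparseTable` class and the row keys `user1` and `user2` are invented for illustration and are not an HBase API.

```python
from collections import defaultdict


class SparseTable:
    """Toy model: row key -> {"family:qualifier": value}."""

    def __init__(self, families):
        self.families = set(families)   # fixed when the table is created
        self.rows = defaultdict(dict)   # only cells that exist are stored

    def put(self, row_key, column, value):
        family, _, _qualifier = column.partition(":")
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows[row_key][column] = value

    def get(self, row_key):
        return dict(self.rows.get(row_key, {}))


users = SparseTable(families=["info"])
users.put("user1", "info:name", "zhangsan")
users.put("user1", "info:age", "30")
users.put("user1", "info:sex", "male")

# A new attribute needs no schema change, and user2 stores nothing for age or sex.
users.put("user2", "info:name", "lisi")
users.put("user2", "info:newproperty", "anything")
```

Only the column family is checked against the schema; the qualifier after the colon can be anything, which is what lets columns grow dynamically.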

Perhaps that example is not clear enough, so here is another. Anyone familiar with SNS knows the friend feed. A feed is generally designed as "someone did something, with some title, at some time", but we usually also reserve some fields, because a feed sometimes needs a url, an image, or other attributes. The attributes of a feed are therefore indeterminate. A traditional relational database handles this awkwardly and wastes space on null cells. A column store does not have this problem: in HBase, a column with no value takes up no space. Two figures illustrate the relationship:





The first figure is the feed table as designed in a traditional RDBMS: the number of columns is fixed and cannot grow, and null columns waste space. The second figure is the data model of HBase, Cassandra, and Bigtable: the columns of the feed table can grow dynamically, and empty columns are not stored, which saves a great deal of space. The key point is that new kinds of feeds keep appearing as the system runs. We cannot predict in advance how many kinds there will be, so we cannot fix the number of columns in the feed table, and the column-oriented data model of HBase, Cassandra, and Bigtable is a very good fit for this scenario. Using HBase here has another important benefit: the feed table is partitioned automatically. When the data in the feed table exceeds a certain threshold, HBase splits it for us, so queries scale, and because HBase has only weak transactional guarantees, writes to HBase are also very fast.
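The automatic splitting mentioned above can be pictured with a small Python sketch. It assumes a table made of key ranges ("regions") that split in half once they hold more than a fixed number of rows. Counting rows rather than bytes, and the class and method names, are simplifications for illustration and not how HBase is implemented.

```python
import bisect


class AutoSplitTable:
    """Toy table whose sorted key ranges split when they grow too large."""

    def __init__(self, max_rows_per_region):
        self.max_rows = max_rows_per_region
        self.start_keys = [""]   # region i covers [start_keys[i], start_keys[i+1])
        self.regions = [[]]      # sorted row keys held by each region

    def put(self, key):
        # Find the region whose range contains the key.
        i = bisect.bisect_right(self.start_keys, key) - 1
        region = self.regions[i]
        pos = bisect.bisect_left(region, key)
        if pos < len(region) and region[pos] == key:
            return               # key already present
        region.insert(pos, key)
        if len(region) > self.max_rows:
            # Split at the middle key; the upper half becomes a new region.
            mid = len(region) // 2
            self.regions[i:i + 1] = [region[:mid], region[mid:]]
            self.start_keys.insert(i + 1, region[mid])


feeds = AutoSplitTable(max_rows_per_region=4)
for n in range(10):
    feeds.put(f"row{n:02d}")
print(len(feeds.regions))   # 4
```

The caller only ever calls `put`; the split happens behind the scenes, which is the "transparent" part of the scaling.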

So much for the column family; now for the row key. You can think of the row key as the primary key of a row in an RDBMS. But HBase does not support conditional queries or ORDER BY, so the row key has to be designed around your system's query requirements. Take the feed example again: we usually query a user's latest feeds, so the feed row key can consist of three parts, <userid><timestamp><feedid>. When we want a person's latest feeds, we set the start row key to <userid><0><0> and the end row key to <userid><Long.MAX_VALUE><Long.MAX_VALUE>. Because records in HBase are sorted by row key, this query is very fast.
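The row key design can be tried out with a sorted in-memory store in Python. The sketch assumes the three parts are encoded as fixed-width, zero-padded decimal strings so that string order matches numeric order, and it treats the end row key as inclusive; both choices, and the names `row_key`, `SortedStore`, and `feeds_of`, are assumptions for illustration.

```python
import bisect

LONG_MAX = 2**63 - 1   # value of Java's Long.MAX_VALUE
WIDTH = 19             # number of decimal digits in LONG_MAX


def row_key(user_id, timestamp, feed_id):
    """Build <userid><timestamp><feedid> as a fixed-width sortable string."""
    return f"{user_id:0{WIDTH}d}{timestamp:0{WIDTH}d}{feed_id:0{WIDTH}d}"


class SortedStore:
    """Keeps rows ordered by row key, so a range scan is a slice."""

    def __init__(self):
        self.keys = []
        self.values = {}

    def put(self, key, value):
        if key not in self.values:
            bisect.insort(self.keys, key)
        self.values[key] = value

    def scan(self, start_key, end_key):
        lo = bisect.bisect_left(self.keys, start_key)
        hi = bisect.bisect_right(self.keys, end_key)
        return [(k, self.values[k]) for k in self.keys[lo:hi]]


def feeds_of(store, user_id):
    """All feeds of one user, ordered by timestamp."""
    return store.scan(row_key(user_id, 0, 0),
                      row_key(user_id, LONG_MAX, LONG_MAX))


store = SortedStore()
store.put(row_key(1, 200, 7), "second feed of user 1")
store.put(row_key(2, 150, 8), "feed of user 2")
store.put(row_key(1, 100, 5), "first feed of user 1")
```

Here `feeds_of(store, 1)` returns only the two rows of user 1, oldest first, without touching user 2's rows, because all keys sharing a `<userid>` prefix sit next to each other.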


Three. Advantages and disadvantages of HBase
Advantages of HBase:

1 Columns can be added dynamically, and an empty column stores no data, which saves storage space.

2 HBase splits data automatically, so the data store scales horizontally on its own.

3 HBase supports highly concurrent read and write operations.

Disadvantages of HBase:

1 Conditional queries are not supported; only queries by row key are.

2 Failover of the master server is not supported for now; when the master goes down, the entire storage system goes down with it.

Four. Supplement

1. Data types: HBase has only a simple string type. All typing is left to the user; HBase just stores strings. A relational database has a rich set of types and storage methods.
2. Data operations: HBase offers only simple operations such as insert, query, delete, and truncate. Tables are separate from one another with no complex relationships between them, whereas traditional databases usually have a wide range of functions and join operations.
3. Storage model: HBase stores data by column. Each column family is kept in its own set of files, so the files of different column families are separate. A traditional relational database stores data by table structure, row by row.
4. Data maintenance: an HBase update should not really be called an update, since it actually inserts new data, whereas a traditional database modifies the data in place.
5. Scalability: distributed databases like HBase were built for this purpose, so hardware can easily be added or removed and fault tolerance is relatively high. Traditional databases typically need an extra middle tier to achieve something similar.
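Point 4 in the list above, that an update is really an insert, can be sketched in Python as a cell that keeps every written version with its timestamp and returns the newest on read. The `VersionedCell` class is a made-up illustration, not HBase code, and it ignores details such as version limits and cleanup.

```python
class VersionedCell:
    """A cell that appends a new version on every write."""

    def __init__(self):
        self.versions = []   # list of (timestamp, value), never modified in place

    def put(self, value, timestamp):
        # An "update" just adds another version next to the old ones.
        self.versions.append((timestamp, value))

    def get(self):
        """Return the value with the newest timestamp, or None if empty."""
        if not self.versions:
            return None
        return max(self.versions, key=lambda version: version[0])[1]


password = VersionedCell()
password.put("aboutyun", timestamp=1400665203430)
password.put("new-secret", timestamp=1400665209999)
print(password.get())          # new-secret
print(len(password.versions))  # 2
```

Because nothing is overwritten, a write never has to find and lock the old value first, which is one reason such writes can be cheap.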

Below, the differences are compared through concrete operations.


1. Can a NoSQL database delete columns?
2. How does a NoSQL database delete a record?
3. What is the difference between a column family and a column in a NoSQL database?
4. What is the difference between NoSQL operations and traditional database operations?




Most of us who work in technology know what a traditional database looks like: the object we operate on is the row, as shown in the figure.
That is, inserts, deletes, updates, and queries all act on rows.

1. Inserting and deleting in a traditional database
Figure 1
Let's take MySQL for example:



Inserting data:
    mysql> INSERT INTO blog_user (user_name, user_password, user_emial) VALUES ('aboutyun', 'aboutyun', '[email protected]');



Deleting data:
    mysql> DELETE FROM blog_user WHERE user_name = 'aboutyun';


2. Inserting and deleting in a NoSQL database


Figure 2
Take HBase as an example:
Create a table:
    create 'blog_user', 'userInfo'



Inserting data
This is a key point, and one that many people find hard to understand.
    hbase(main):012:0> put 'blog_user', 'www.aboutyun.com', 'userInfo:user_name', 'aboutyun'
    0 row(s) in 1.7530 seconds
Look at the command above.
The part marked 1 is something traditional databases do not have and is unique to NoSQL: the rowkey. It is built into the system and is the unique identifier of a record in NoSQL, although this identifier differs somewhat from the ones in a traditional database. As shown in 1, "Record 1" is the rowkey.

The part marked 2 is the column user_name that we inserted. This is the hardest part to grasp: a column can be inserted. Its value is the part marked 3, 'aboutyun'.

We have inserted the column; let's look at the result:



Here is what the output above means:
1 is the rowkey, the value 'www.aboutyun.com' that we inserted.
2 is the column user_name under the column family.
3 is something we did not add in the design; it is built into the system. It records the time of the operation and is stored in HBase as a timestamp.
4 is the value of user_name that we inserted.

Next we insert the password:
    hbase(main):015:0> put 'blog_user', 'www.aboutyun.com', 'userInfo:user_password', 'aboutyun'



Query the results again:
    hbase(main):016:0> scan 'blog_user'
    ROW                 COLUMN+CELL
     www.aboutyun.com   column=userInfo:user_name, timestamp=1400663775901, value=aboutyun
     www.aboutyun.com   column=userInfo:user_password, timestamp=1400665203430, value=aboutyun
    1 row(s) in 0.0390 seconds


Here we see two lines of output. A traditional database would treat them as two rows of data, but for NoSQL this is one record.


Delete column data

Deleting data falls into two cases: deleting columns and deleting records.
1. Delete a column
This is the state before the delete; nothing has been deleted yet.
    delete 'blog_user', 'www.aboutyun.com', 'userInfo:user_password'


From above, we see that the column was deleted.
2. Delete a record:
    deleteall 'blog_user', 'www.aboutyun.com'
This is the result displayed before the deletion:


Results after deletion





Summary
In a traditional database, adding a column to a project is a very large change. In NoSQL, inserting and deleting columns is much like adding and deleting records in a traditional database.
