Preliminary Knowledge of Hadoop: HBase Loose Data Storage Design

Source: Internet
Author: User
I have recently been focusing on Hadoop, so I have also been looking at Hadoop-related projects. HBase is an open source project based on Hadoop and an implementation of Google's BigTable.

What is BigTable? Google's paper gives the full explanation. Literally it is a big table, but it is still somewhat different from the traditional database table we imagine. Loose data can be described as data that sits between a map entry (key & value) and a DB row. When I use memcache, I sometimes need to store more than a single value per key: I need something like a database table with multiple attributes, but without the many relationships between tables that a traditional database carries. This kind of data is called loose data. At its most superficial, BigTable is a very large table whose attributes can be added dynamically as needed, but which has no need for queries that join one table to another.
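As a rough illustration of what "loose data" means here, the following Python sketch contrasts a plain key/value entry with a loose record whose attributes can be added per row. All names in it (cache, loose_table, the user keys and attributes) are hypothetical and are not part of BigTable or HBase.

```python
# Hypothetical illustration only; none of these names come from BigTable or HBase.

# A map entry: one key, one opaque value (the memcache style).
cache = {"user:1001": "Alice"}

# Loose data: one key, several named attributes, with no fixed schema.
loose_table = {
    "user:1001": {"name": "Alice", "city": "Hangzhou"},
    "user:1002": {"name": "Bob"},
}

# A new attribute can be added to a single row on demand,
# without altering any table definition or touching other rows.
loose_table["user:1002"]["last_login"] = "2008-08-01"


def attributes(row_key):
    """Return the sorted attribute names stored for one row."""
    return sorted(loose_table[row_key])


print(attributes("user:1001"))
print(attributes("user:1002"))
```

The two rows end up with different attribute sets, which is the property a fixed relational schema would not allow without a schema change.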

One of the defining features of Internet applications is speed: however powerful the features, a slow application will be abandoned. High-traffic sites therefore put caches in front of and behind their services to improve performance and response time. For map-entry data there are many centralized and distributed caches to choose from, and for traditional relational data, databases from MySQL to Oracle provide very good support. Only loose data is poorly served, because neither of these two solutions can maximize its processing capacity. That is why BigTable exists.

HBase, an Apache open source project, is still in its early stage. The Hadoop it relies on cannot yet be called mature either, so there is a lot of room for development, which also gives open source enthusiasts like us more room to contribute. This article mainly covers HBase's framework design and some of its characteristics. Whether or not you use HBase to solve problems at work, a well-designed process always sparks ideas for developers and architects.

HBase Design Introduction

Data Model

Every table in HBase is called a BigTable. A BigTable stores a series of row records, defined by three basic types: row key, time stamp, and column. The row key is the unique identifier of a row in the BigTable. The time stamp is the timestamp attached to each data operation and can be thought of as something like an SVN version. A column is defined as <FAMILY>:<LABEL>; together these two parts uniquely identify the storage column for a piece of data. Defining or modifying a family requires an HBase DDL operation, whereas a label needs no definition and can be used directly, which provides a means of customizing columns dynamically. A family also serves to optimize reads and writes in physical storage: data belonging to the same family is stored physically close together, so this property can be exploited during business design.
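The relationship between row key, time stamp, and <FAMILY>:<LABEL> columns can be sketched with a toy in-memory model. This is a minimal Python sketch, not HBase's API: the class SketchTable and its put and get methods are hypothetical, and declaring families in the constructor merely stands in for the DDL step described above.

```python
import time


class SketchTable:
    """Toy model: row key -> "family:label" column -> {timestamp: value}."""

    def __init__(self, families):
        # Families are fixed up front, standing in for the DDL operation.
        self.families = set(families)
        self.rows = {}

    def put(self, row_key, column, value, timestamp=None):
        # The part before the first ":" is the family; the rest is the label.
        family = column.partition(":")[0]
        if family not in self.families:
            raise KeyError(f"family {family!r} must be defined first")
        if timestamp is None:
            timestamp = time.time_ns()
        # Labels need no prior definition: the column is created on first use.
        cells = self.rows.setdefault(row_key, {}).setdefault(column, {})
        cells[timestamp] = value

    def get(self, row_key, column):
        """Return the value with the newest timestamp, or None if absent."""
        cells = self.rows.get(row_key, {}).get(column)
        if not cells:
            return None
        return cells[max(cells)]


table = SketchTable(families=["anchor", "contents"])
table.put("com.cnn.www", "anchor:cnnsi.com", "CNN", timestamp=9)
table.put("com.cnn.www", "contents:", "<html> v1", timestamp=3)
table.put("com.cnn.www", "contents:", "<html> v2", timestamp=5)
print(table.get("com.cnn.www", "contents:"))
```

Writing to a column such as "mime:" would raise KeyError in this sketch because the family was never declared, while any new label under "anchor" is accepted immediately.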

Look at the logical data model:

Row Key        Time Stamp  Column "contents:"  Column "anchor:"               Column "mime:"
"com.cnn.www"  T9                              "anchor:cnnsi.com"   "CNN"
               T8                              "anchor:my.look.ca"  "CNN.com"
               T6          "<html> ..."                                       "HTML"
               T5          "<html> ..."
               T3          "<html> ..."

There is one row in the table, uniquely identified by the row key com.cnn.www, and each logical modification is associated with a timestamp. The row has four column definitions in total: <contents:>, <anchor:cnnsi.com>, <anchor:my.look.ca>, and <mime:>. To explain BigTable in traditional database terms: a BigTable can be regarded as a DB schema and each row as a table whose name is the row key. That table can be divided into multiple versions according to its columns, and each version's operation carries a timestamp that ties it to the row it acted on.
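One way to read the versioned row in the logical model is as a per-column history that can be viewed "as of" a given time stamp. The Python sketch below copies the cells from the example table; the function name snapshot and the use of the integers 3 to 9 for T3 to T9 are assumptions made for illustration, not HBase behavior or API.

```python
# Cells of the com.cnn.www row: column -> {timestamp: value}.
# T3..T9 are written as the integers 3..9 (an assumption of this sketch).
row = {
    "contents:": {6: "<html> ...", 5: "<html> ...", 3: "<html> ..."},
    "anchor:cnnsi.com": {9: "CNN"},
    "anchor:my.look.ca": {8: "CNN.com"},
    "mime:": {6: "HTML"},
}


def snapshot(cells_by_column, as_of):
    """Return {column: value}, taking per column the newest cell
    whose timestamp is less than or equal to as_of."""
    view = {}
    for column, versions in cells_by_column.items():
        visible = [ts for ts in versions if ts <= as_of]
        if visible:
            view[column] = versions[max(visible)]
    return view


print(sorted(snapshot(row, as_of=5)))  # columns visible at T5
print(sorted(snapshot(row, as_of=9)))  # columns visible at T9
```

At T5 only the contents: column has any cell, whereas at T9 all four columns are visible, which matches the way the table fills in from the bottom row upward.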

Take a look at HBase's physical data model:

Row Key
