I have recently been focusing on Hadoop, so I have also been looking at Hadoop-related projects. HBase is an open-source project built on Hadoop and is an implementation of Google's Bigtable.
What is Bigtable? Google's paper gives the full explanation. Literally, it is a big table, but it is somewhat different from the traditional database tables we usually imagine. The data it holds can be called loose data: something between a map entry (key and value) and a database row. When I use memcache, I sometimes need to store more than a single value per key. I need multiple attributes, much like a row in a database table, but without all the relationships that a traditional database schema is designed to support. This kind of data is called loose data. At the most superficial level, Bigtable is a very large table whose attributes (columns) can be added dynamically as needed, and there is no need to run queries that join one table to another.
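To make the idea of loose data concrete, here is a minimal Python sketch. The language and the `LooseTable` class are my own choices for illustration, not anything from Bigtable or HBase. Each row is a key plus an open-ended set of attributes: introducing a new attribute needs no schema change, and a row simply has no entry for attributes it never set.

```python
from collections import defaultdict
from typing import Dict, Optional, Set


class LooseTable:
    """Hypothetical toy: rows keyed by row key, each holding only its own attributes."""

    def __init__(self) -> None:
        # row key -> {attribute -> value}; attributes a row never set are simply absent
        self.rows: Dict[str, Dict[str, str]] = defaultdict(dict)

    def put(self, row_key: str, attribute: str, value: str) -> None:
        # No schema change is needed to introduce a brand-new attribute.
        self.rows[row_key][attribute] = value

    def get(self, row_key: str, attribute: str) -> Optional[str]:
        # .get() avoids creating empty rows for missing keys.
        return self.rows.get(row_key, {}).get(attribute)

    def columns(self) -> Set[str]:
        # The table's "schema" is just the union of attributes used so far.
        return {attr for row in self.rows.values() for attr in row}


table = LooseTable()
table.put("user:1", "name", "alice")
table.put("user:1", "email", "alice@example.com")
table.put("user:2", "name", "bob")
table.put("user:2", "city", "Paris")  # a new attribute, added on the fly

print(sorted(table.columns()))       # ['city', 'email', 'name']
print(table.get("user:2", "email"))  # None: bob never stored an email
```

The rows do not share a fixed set of columns, which is the gap between a plain key/value map entry and a relational row described above.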
One of the defining characteristics of Internet applications is speed: however powerful the features, a slow application will be abandoned. For this reason, high-traffic sites place caches at both the front end and the back end to improve performance and response time. For map-entry data there are many choices of centralized and distributed caches, and traditional relational data is well supported by everything from MySQL to Oracle. Only loose data falls between the two: neither solution can get the most out of it. That is why Bigtable came about.
As an Apache open-source project, HBase is still in its early stages, because Hadoop, which it depends on, cannot yet be called mature. There is therefore a lot of room for development, which also leaves plenty of room for open-source enthusiasts like us to contribute. Here I mainly discuss the design of the HBase framework and some of its characteristics. Whether or not you end up using HBase to solve problems at work, a well-designed system always gives developers and architects some sparks of inspiration.
HBase Design Introduction
Data Model
Every table in HBase is a Bigtable. A Bigtable stores a series of rows, described by three basic elements: row key, timestamp, and column. The row key uniquely identifies a row in the Bigtable. The timestamp is attached to each data operation and can be thought of as something like an SVN revision. A column is written as <family>:<label>, and these two parts together uniquely identify the storage column for a piece of data. Defining or modifying a family requires a DDL operation on HBase, whereas labels need no prior definition and can be used directly, which provides a way to define columns dynamically. Families have another effect: physical storage optimizes reads and writes by family, because data in the same family is stored physically close together. You can take advantage of this when designing your application.
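The rules above (families must be declared, labels are free, every write gets a timestamp) can be sketched as a small in-memory Python model. `ToyBigTable`, `put()` and `get()` are hypothetical names, not the HBase client API. Declaring families in the constructor stands in for the DDL step, an integer counter stands in for real timestamps, and keying storage by family first only mirrors the idea that a family's data is kept together; it says nothing about HBase's actual file layout.

```python
import itertools


class ToyBigTable:
    """Hypothetical toy: cell coordinates are (row key, "<family>:<label>", timestamp)."""

    def __init__(self, families):
        # Declaring the families up front stands in for the HBase DDL step.
        self.families = set(families)
        # family -> row key -> label -> {timestamp: value}.
        # Keying by family first mimics keeping a family's data together.
        self.store = {family: {} for family in self.families}
        # Monotonic counter standing in for real timestamps (assumption).
        self._clock = itertools.count(1)

    def put(self, row_key, column, value):
        family, _, label = column.partition(":")
        if family not in self.families:
            # Families need DDL; writing to an undeclared one is rejected.
            raise ValueError(f"unknown column family: {family!r}")
        # Labels need no declaration: any label is accepted on first write.
        ts = next(self._clock)
        row = self.store[family].setdefault(row_key, {})
        row.setdefault(label, {})[ts] = value
        return ts

    def get(self, row_key, column):
        family, _, label = column.partition(":")
        versions = self.store.get(family, {}).get(row_key, {}).get(label, {})
        # Each write keeps a new version; the newest timestamp wins on read.
        return versions[max(versions)] if versions else None


webtable = ToyBigTable(["contents", "anchor"])
webtable.put("row1", "anchor:a.example", "A")  # new label, no DDL needed
webtable.put("row1", "contents:", "<html>v1")
webtable.put("row1", "contents:", "<html>v2")  # second version of the same cell
print(webtable.get("row1", "contents:"))       # <html>v2

try:
    webtable.put("row1", "mime:", "text/html")  # "mime" was never declared
except ValueError as err:
    print(err)
```

Older versions are kept rather than overwritten, which is what makes the SVN-revision analogy fit: a read without a timestamp simply picks the newest one.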
Look at the logical data model:
Row Key         Time Stamp   Column "contents:"   Column "anchor:"                   Column "mime:"
"com.cnn.www"   t9                                "anchor:cnnsi.com" = "CNN"
                t8                                "anchor:my.look.ca" = "CNN.com"
                t6           "<html> ..."                                            "HTML"
                t5           "<html> ..."
                t3           "<html> ..."
The table contains a single row, uniquely identified by the row key com.cnn.www, and every logical modification is associated with a timestamp. There are four column definitions in total: <contents:>, <anchor:cnnsi.com>, <anchor:my.look.ca>, and <mime:>. To explain Bigtable in terms of traditional database concepts, you can think of a Bigtable as a DB schema in which each row is a table and the row key is the table name. The data in that "table" is kept per column in multiple versions, and each version is associated with the timestamp of the operation that wrote it to the row.
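As a sketch of how the logical view above can be read, the example row can be written as a nested map, row key -> column -> {timestamp: value}, with a helper that reads a cell "as of" a given timestamp. Encoding t3 to t9 as the integers 3 to 9 and the helper names `read()` and `row_as_of()` are my own assumptions for illustration, not HBase's client API; the values are copied from the table.

```python
# The example row from the logical table, as row key -> column -> {timestamp: value}.
# Timestamps t3..t9 are encoded as the integers 3..9 purely for illustration.
logical = {
    "com.cnn.www": {
        "anchor:cnnsi.com": {9: "CNN"},
        "anchor:my.look.ca": {8: "CNN.com"},
        "contents:": {6: "<html> ...", 5: "<html> ...", 3: "<html> ..."},
        "mime:": {6: "HTML"},
    }
}


def read(row_key, column, as_of=None):
    """Hypothetical helper: newest value of a cell with timestamp <= as_of.

    With as_of=None the latest version is returned.
    """
    versions = logical.get(row_key, {}).get(column, {})
    eligible = [ts for ts in versions if as_of is None or ts <= as_of]
    return versions[max(eligible)] if eligible else None


def row_as_of(row_key, as_of):
    """Hypothetical helper: every column of a row as it looked at time as_of."""
    snapshot = {}
    for column in logical.get(row_key, {}):
        value = read(row_key, column, as_of)
        if value is not None:
            snapshot[column] = value
    return snapshot


print(sorted(logical["com.cnn.www"]))
# ['anchor:cnnsi.com', 'anchor:my.look.ca', 'contents:', 'mime:']
print(row_as_of("com.cnn.www", 5))  # {'contents:': '<html> ...'}
print(read("com.cnn.www", "anchor:cnnsi.com", as_of=8))  # None: written at t9
```

Empty cells in the logical table are simply missing keys in the map, so each timestamp touches only the columns actually written at that moment.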
Take a look at HBase's physical data model:
Row Key