HBase (1) - Data Model

Source: Internet
Author: User

 

What is Bigtable? Google's paper describes it in full. Literally, it is a "big table", but it differs from the tables of a traditional database. It targets loose data: data that sits somewhere between a map entry (key and value) and a database row. When I use memcache, I sometimes need to store more than a single value per key. I may need to store several attributes, as in a database table, but without the many associations between tables that a traditional relational schema implies. Such data is called loose data. Bigtable is a very large table whose attributes can be added dynamically as needed, and which has no need for join queries between tables.

One of the most important characteristics of Internet applications is speed: however powerful the functionality, a slow application is not acceptable. High-traffic websites therefore use caches to improve performance and response time. For map-entry data there are many centralized and distributed caches to choose from, and for traditional relational data MySQL and Oracle provide good support. Only for loose data does neither solution reach its full processing capability, and that is where Bigtable is useful.

 

HBase is a scalable, distributed, column-oriented database with a dynamic schema for structured data. It can efficiently and reliably manage massive data sets (GB or more) distributed across thousands of commodity servers. HBase is modeled on Google's Bigtable database and is a Hadoop project of the Apache Software Foundation.

 

The data stored in HBase is sparse (unstructured or semi-structured data). HBase is good at storing such data because it uses a column-oriented storage mechanism, whereas the familiar RDBMSs use a row-oriented storage mechanism (frustratingly, I have read any number of articles about relational databases and none of them ever mentioned row-oriented storage). A column-oriented store does not use any space for null values. For example, if a table Usertable has 10 columns but only one of them contains data for a given row, the nine null columns occupy no storage space (how much space would an ordinary database such as MySQL use for them?).
Another reason HBase suits unstructured, sparse data is the way it handles column families.
Consider the difference between dynamic languages such as Ruby and Python and compiled languages such as C++ and Java.
For me, the most obvious difference is that you do not need to declare a type for a variable.
HBase brings this feature to the DBAs of the future. You only need to tell HBase which column family your data is stored in.
You do not need to specify a concrete type such as char, varchar, int, tinyint, or text.
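
The idea that null values take no space, and that values carry no declared type, can be sketched with a toy comparison in Python. This is only an illustration under my own assumptions: the column names and the sample value are made up, and neither function reflects the actual on-disk format of MySQL or HBase.

```python
# Toy comparison: one slot per declared column versus a sparse map of cells.
# COLUMNS and the sample record are hypothetical.
COLUMNS = [f"col{i}" for i in range(10)]


def row_oriented(record):
    """Keep one slot per declared column, using None for missing values."""
    return [record.get(name) for name in COLUMNS]


def sparse(record):
    """Keep only the cells that actually hold a value."""
    return {name: value for name, value in record.items() if value is not None}


# Values are plain bytes: no char/varchar/int declaration is involved.
record = {"col3": b"hello"}
fixed = row_oriented(record)
cells = sparse(record)

print(len(fixed))   # 10 slots, nine of them None
print(len(cells))   # 1 stored cell
```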

 

HBase has other characteristics as well. For example, it does not support join queries, but you can work around this by storing the related data as parent-child tuples.

 

 

Note: At the time of writing, the latest version of HBase was v0.19.3. The information in this article applies to that version.

 

 

Data Model

 

 

 

HBase data is modeled as a multidimensional map, in which each value (a table cell) is indexed by four keys:

 

value = Map(TableName, RowKey, ColumnKey, Timestamp)

 

Where:

  • TableName is a string.
  • RowKey and ColumnKey are binary values (Java type byte[]).
  • Timestamp is a 64-bit integer (Java type long).
  • value is an uninterpreted byte array (Java type byte[]).
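
The four-key map and its types can be mimicked with a plain Python dictionary. This is a minimal sketch, not the HBase API: the functions put and get, the table name "demo", and the sample cell are hypothetical, and Python bytes and int stand in for Java byte[] and long.

```python
# Hypothetical in-memory stand-in for
#   value = Map(TableName, RowKey, ColumnKey, Timestamp)
store = {}


def put(table, row, column, timestamp, value):
    """Store one cell after checking the types listed above."""
    if not isinstance(table, str):
        raise TypeError("TableName must be a string")
    if not isinstance(row, bytes) or not isinstance(column, bytes):
        raise TypeError("RowKey and ColumnKey must be bytes")
    if not isinstance(timestamp, int) or not -(2**63) <= timestamp < 2**63:
        raise ValueError("Timestamp must fit in a signed 64-bit integer")
    if not isinstance(value, bytes):
        raise TypeError("value must be bytes")
    store[(table, row, column, timestamp)] = value


def get(table, row, column, timestamp):
    """Return the cell value, or None if no such cell was stored."""
    return store.get((table, row, column, timestamp))


put("demo", b"row-1", b"info:date", 1, b"2009")
print(get("demo", b"row-1", b"info:date", 1))
```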

Binary data is encoded as base64 for transmission over the network.
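
As a small illustration of that encoding step, the standard base64 module in Python round-trips arbitrary bytes through ASCII text. The row key below is a made-up example, and the sketch says nothing about which HBase interface performs the encoding.

```python
import base64

# Hypothetical binary row key containing non-printable bytes.
row_key = b"\x00\x01row"

# Encode to ASCII text that is safe to send over a text protocol.
encoded = base64.b64encode(row_key).decode("ascii")

# The receiver decodes it back to the original bytes.
decoded = base64.b64decode(encoded)

print(encoded)
print(decoded == row_key)
```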

The row key is the table's primary key, usually a string. Rows are sorted by row key in lexicographic (byte) order.
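
Byte-wise ordering has a practical consequence for numeric row keys: without padding, "10" sorts before "2". The sketch below uses Python's built-in ordering of bytes, which is also lexicographic. The helper pad and the width of 6 are my own assumptions, chosen to resemble keys such as 000001.

```python
# Python compares bytes objects lexicographically, byte by byte.
row_keys = [b"10", b"2", b"1"]
print(sorted(row_keys))   # [b'1', b'10', b'2']


def pad(number, width=6):
    """Hypothetical helper: zero-pad a number so byte order matches numeric order."""
    return str(number).zfill(width).encode("ascii")


padded = sorted(pad(n) for n in (10, 2, 1))
print(padded)             # [b'000001', b'000002', b'000010']
```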

The information stored in a table is organized into column families, which you can think of as categories. Each column family can have any number of members, which are identified by a label (or qualifier). A column key consists of the family name, a colon (:), and the label. For example, for the column family info and the member date, the column key is info:date.
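
The family:label convention is easy to express in code. The two helpers below are hypothetical names of my own, not HBase functions. They assume the family name contains no colon, so the first colon always separates family from label.

```python
def make_column_key(family: bytes, label: bytes) -> bytes:
    """Join a family name and a label into a column key."""
    if b":" in family:
        raise ValueError("family name must not contain ':'")
    return family + b":" + label


def split_column_key(column_key: bytes):
    """Split a column key at the first colon into (family, label)."""
    family, _, label = column_key.partition(b":")
    return family, label


print(make_column_key(b"info", b"date"))       # b'info:date'
print(split_column_key(b"anchor:cnnsi.com"))   # (b'anchor', b'cnnsi.com')
```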

An HBase table schema defines a set of column families. When you insert a row into the table, however, the application can create new members at run time. Within a column family, different rows of a table can have different numbers of members. In other words, HBase supports a dynamic schema model.
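
A toy model makes the distinction between fixed families and dynamic members concrete. The class ToyTable, the member contact:email, and its value are assumptions for illustration only. Plain strings are used instead of byte arrays to keep the sketch readable.

```python
class ToyTable:
    """Families are fixed at creation; members appear when rows are written."""

    def __init__(self, name, families):
        self.name = name
        self.families = set(families)
        self.rows = {}   # row key -> {column key -> value}

    def put(self, row, column_key, value):
        family = column_key.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows.setdefault(row, {})[column_key] = value


persons = ToyTable("Persons", ["name", "contact"])
persons.put("000001", "name:first", "Jeffrey")
persons.put("000001", "name:last", "Dean")
# A member that no other row has, created at write time (hypothetical data).
persons.put("000001", "contact:email", "user@example.com")
persons.put("000002", "name:first", "Gabriel")

print(len(persons.rows["000001"]))   # 3 members in this row
print(len(persons.rows["000002"]))   # 1 member in this row
```

Writing to a family that was not declared, for example address:city, raises KeyError in this sketch, which mirrors the rule that families belong to the schema while members do not.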

 

Table 1 shows a simple example of an HBase table named Persons. The table has two column families: name and contact.

 

Row key   Timestamp   Column family: name     Column family: contact
000001    t3                                  contact:http = research.google.com/people/jeff/
          t2          name:first = Jeffrey
          t1          name:last = Dean
000002    t5          name:first = Gabriel
          t4          name:last = Mateescu

 

An empty cell has no value associated with its key. In Table 1, the cell with the key (000002, contact:http, t4) is empty. Empty cells are not stored in HBase, and reading an empty cell is like looking up a nonexistent key in a map. This is how HBase tables accommodate sparse rows.
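
The contents of Table 1 can be written down as a Python dictionary to show what reading an empty cell looks like. The timestamps t1 to t5 are kept as symbolic strings because the table gives no concrete values. The dictionary is only a model of the table, not of HBase storage.

```python
# Table 1 as a map keyed by (row key, column key, timestamp).
persons = {
    ("000001", "contact:http", "t3"): "research.google.com/people/jeff/",
    ("000001", "name:first", "t2"): "Jeffrey",
    ("000001", "name:last", "t1"): "Dean",
    ("000002", "name:first", "t5"): "Gabriel",
    ("000002", "name:last", "t4"): "Mateescu",
}

# The empty cell is simply absent, so the lookup finds nothing.
empty = persons.get(("000002", "contact:http", "t4"))
print(empty)          # None
print(len(persons))   # 5 cells are stored, not 2 rows x 3 columns
```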

For any row, only one member of one column family can be accessed at a time (unlike a relational database, where a query can access several columns of a row). You can think of a column family member within a row as a child row.

A table is divided into multiple regions (the equivalent of Bigtable's tablets). A region contains the rows in a certain range of row keys. Splitting a table into regions is the key mechanism for handling large tables efficiently.
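
Because rows are sorted and each region covers a contiguous range, finding the region for a row key amounts to a search over the region start keys. The following is a minimal sketch with made-up split points. The function region_for is hypothetical and does not describe how HBase actually locates regions.

```python
import bisect

# Hypothetical start keys; region i holds keys in [start_keys[i], start_keys[i + 1]).
# The empty key marks the start of the first region.
start_keys = [b"", b"000500", b"001000"]


def region_for(row_key: bytes) -> int:
    """Return the index of the region whose key range contains row_key."""
    return bisect.bisect_right(start_keys, row_key) - 1


print(region_for(b"000001"))   # 0
print(region_for(b"000500"))   # 1
print(region_for(b"002000"))   # 2
```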

 

Each table in HBase is a big table in the Bigtable sense. It stores a series of row records, each with three basic elements: row key, timestamp, and column. The row key uniquely identifies a row. The timestamp is associated with each data operation and can be seen as a version number, similar to a revision in SVN. A column is defined as <family>:<label>, and the two parts together uniquely identify a column. Defining or modifying a family requires a DDL-like operation on HBase, whereas labels need no definition and can be used directly, which gives you a way to customize columns dynamically. Families also serve to optimize physical reads and writes: data belonging to the same family is stored close together, and you can take advantage of this when designing your application.
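
Treating timestamps as versions can be sketched as follows. The timestamps 3, 5, and 6 and the values are invented for the example, and the helpers latest and as_of are hypothetical names, not HBase calls.

```python
# Versions of a single cell: timestamp -> value (hypothetical data).
versions = {3: b"<html>v3", 5: b"<html>v5", 6: b"<html>v6"}


def latest(cell_versions):
    """Return the value with the highest timestamp."""
    return cell_versions[max(cell_versions)]


def as_of(cell_versions, timestamp):
    """Return the newest value written at or before the given timestamp."""
    eligible = [t for t in cell_versions if t <= timestamp]
    return cell_versions[max(eligible)] if eligible else None


print(latest(versions))     # b'<html>v6'
print(as_of(versions, 4))   # b'<html>v3'
print(as_of(versions, 2))   # None
```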

 

Let's take a look at the Logical Data Model:


Row key         Time stamp   Column "contents:"   Column "anchor:"                  Column "mime:"
"com.cnn.www"   t9                                "anchor:cnnsi.com" = "CNN"
                t8                                "anchor:my.look.ca" = "CNN.com"
                t6           "<html>..."                                            "text/html"
                t5           "<html>..."
                t3           "<html>..."


The table above contains one row with the unique identifier com.cnn.www. Each logical modification has an associated timestamp, and there are four column definitions: <contents:>, <anchor:cnnsi.com>, <anchor:my.look.ca>, and <mime:>. To put it in traditional terms, a big table can be thought of as a database schema in which each row is a table whose name is the row key. That table is split into multiple versions across its columns, and each version carries the timestamp of the operation that produced it.


Let's take a look at the HBase physical data model:

Row key         Time stamp   Column "contents:"
"com.cnn.www"   t6           "<html>..."
                t5           "<html>..."
                t3           "<html>..."

Row key         Time stamp   Column "anchor:"
"com.cnn.www"   t9           "anchor:cnnsi.com" = "CNN"
                t8           "anchor:my.look.ca" = "CNN.com"

Row key         Time stamp   Column "mime:"
"com.cnn.www"   t6           "text/html"


The physical data model, in effect, splits each row of the logical model by column family and stores each family separately.
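
The step from the logical row to the per-family tables can be sketched by grouping cells on the family part of the column key. The function split_by_family is a hypothetical helper, and the integers 3 to 9 stand in for the symbolic timestamps t3 to t9. Entries are sorted newest first only to mirror the order of the tables above.

```python
from collections import defaultdict

# Logical cells of the row "com.cnn.www": (column key, timestamp) -> value.
logical_row = {
    ("contents:", 6): "<html>...",
    ("contents:", 5): "<html>...",
    ("contents:", 3): "<html>...",
    ("anchor:cnnsi.com", 9): "CNN",
    ("anchor:my.look.ca", 8): "CNN.com",
    ("mime:", 6): "text/html",
}


def split_by_family(row_key, cells):
    """Group the cells of one row into one list per column family."""
    stores = defaultdict(list)
    for (column_key, timestamp), value in cells.items():
        family = column_key.split(":", 1)[0]
        stores[family].append((row_key, timestamp, column_key, value))
    for entries in stores.values():
        entries.sort(key=lambda entry: entry[1], reverse=True)   # newest first
    return dict(stores)


physical = split_by_family("com.cnn.www", logical_row)
for family, entries in physical.items():
    print(family, [timestamp for _, timestamp, _, _ in entries])
```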

When you operate on a row in this data model, the row is locked, which guarantees that operations on a single row are atomic.
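
Row-level locking can be illustrated with one lock per row key. This is a sketch of the general technique using Python threads. The class ToyRowStore and the column stats:hits are my own inventions, and the code makes no claim about how HBase implements its row locks.

```python
import threading


class ToyRowStore:
    """Each row has its own lock, so updates to one row never interleave."""

    def __init__(self):
        self._rows = {}                  # row key -> {column key -> value}
        self._locks = {}                 # row key -> lock
        self._guard = threading.Lock()   # protects the two dicts above

    def _lock_for(self, row_key):
        with self._guard:
            self._rows.setdefault(row_key, {})
            return self._locks.setdefault(row_key, threading.Lock())

    def increment(self, row_key, column_key, amount=1):
        """Read-modify-write one cell while holding the row's lock."""
        with self._lock_for(row_key):
            row = self._rows[row_key]
            row[column_key] = row.get(column_key, 0) + amount
            return row[column_key]

    def get(self, row_key, column_key):
        with self._lock_for(row_key):
            return self._rows[row_key].get(column_key)


store = ToyRowStore()


def worker():
    for _ in range(1000):
        store.increment("com.cnn.www", "stats:hits")   # hypothetical column


threads = [threading.Thread(target=worker) for _ in range(8)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(store.get("com.cnn.www", "stats:hits"))   # 8000
```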

 
