HBase (1): Data Model

Last Update:2018-12-03 Source: Internet

Author: User


What is Bigtable? Google's paper describes it in full. Literally it is a "big table", but it differs from the tables of a traditional database. It targets loose data, which sits somewhere between a map entry (key and value) and a database row. When I use memcache, I sometimes need to store more than a single value per key: I want to store several attributes, as in a database table, but without the many associations between tables that a traditional relational schema implies. Such data is called loose data. Bigtable is a very large table whose attributes (columns) can be added dynamically as needed, and it has no need for join queries between tables.

For Internet applications, speed is one of the most important qualities: powerful functionality is of little use if the application is slow. High-traffic websites therefore use caches to improve performance and response time. For map-entry data there are many centralized and distributed cache options, and for traditional relational data MySQL and Oracle provide good support. Loose data, however, is not served well by either kind of solution, and this is where Bigtable is useful.

HBase is a scalable, distributed, column-oriented database with a dynamic schema for structured data. It can manage massive data sets (gigabytes or more) efficiently and reliably across thousands of commodity servers. HBase is modeled on Google's Bigtable database and is part of the Apache Software Foundation's Hadoop project.

The data stored in HBase is sparse (unstructured or semi-structured). HBase is good at storing such data because it uses a column-oriented storage mechanism, whereas the familiar RDBMS uses a row-oriented one (frustratingly, none of the many articles I have read about relational databases ever mentioned row-oriented storage). Column-oriented storage takes no space for null values. For example, if a table Usertable has 10 columns but a stored row has data in only one of them, the nine null columns occupy no storage space (how much space would a common database such as MySQL use for them?).

Another reason why HBase suits unstructured, sparse data is the way it handles column families. Consider the difference between dynamic languages such as Ruby and Python and compiled languages such as C++ and Java. For me, the most obvious difference is that you do not need to declare a type for a variable. HBase brings this feature to the DBAs of the future: you only tell HBase which column family your data belongs to, and you do not need to specify a concrete type such as char, varchar, int, tinyint, or text.

HBase has other characteristics as well. For example, it does not support join queries, but you can store related data as parent-child tuples to achieve a similar effect.

Note: At the time of writing, the latest version of HBase is v0.19.3. The information in this article applies to that version.

Data Model

HBase data is modeled as a multidimensional map in which a value (a table cell) is indexed by four keys:

value = Map(TableName, RowKey, ColumnKey, Timestamp)

Where:

TableName is a string.
RowKey and ColumnKey are binary values (Java type byte[]).
Timestamp is a 64-bit integer (Java type long).
value is an uninterpreted byte array (Java type byte[]).
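
The four-key mapping can be pictured as nested dictionaries. The following is a minimal Python sketch for illustration only; it is not the HBase client API, and the helper names put_cell and get_cell are hypothetical.

```python
# Illustrative in-memory model of value = Map(TableName, RowKey, ColumnKey, Timestamp).
# Not the HBase client API: put_cell and get_cell are hypothetical helpers.

# table name (str) -> row key (bytes) -> column key (bytes) -> timestamp (int) -> value (bytes)
store = {}


def put_cell(table, row_key, column_key, timestamp, value):
    """Store an uninterpreted byte value under the four keys."""
    columns = store.setdefault(table, {}).setdefault(row_key, {})
    columns.setdefault(column_key, {})[timestamp] = value


def get_cell(table, row_key, column_key, timestamp):
    """Return the value stored under the four keys, or None if there is none."""
    return store.get(table, {}).get(row_key, {}).get(column_key, {}).get(timestamp)


put_cell("Persons", b"000001", b"name:first", 2, b"Jeffrey")
print(get_cell("Persons", b"000001", b"name:first", 2))
```

Row keys, column keys, and values are Python bytes here to mirror the Java byte[] types listed above; the timestamp is a plain integer.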

Binary data is encoded as base64 for transmission over the network.

The row key is the table's primary key and is usually a string. Rows are sorted by row key in lexicographic (byte) order.
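
Because row keys are compared as raw bytes, numeric keys sort like text unless they are padded to a fixed width. The short Python sketch below illustrates the effect with made-up keys; it only demonstrates byte-wise ordering and does not call HBase.

```python
# Byte-wise ordering of row keys (illustrative keys, not taken from a real table).
unpadded = [b"1", b"10", b"2"]
padded = [b"000001", b"000010", b"000002"]

# Python compares bytes objects lexicographically, byte by byte.
sorted_unpadded = sorted(unpadded)
sorted_padded = sorted(padded)

print(sorted_unpadded)  # b"10" sorts before b"2"
print(sorted_padded)    # zero padding restores numeric order
```

This is one reason keys such as 000001 are often written with leading zeros.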

The information stored in a table is structured into column families, which you can think of as categories. Each column family can have any number of members, which are identified by a label (or qualifier). A column key is the concatenation of the family name, a colon (:), and the label. For example, for the column family info and the member date, the column key is info:date.

An HBase table schema defines multiple column families. When you insert a row into the table, however, the application can create new members at run time. Within a column family, different rows in a table can have different numbers of members. In other words, HBase supports a dynamic schema model.
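
The split between fixed column families and free-form members can be sketched as follows. This is a toy Python model, not HBase code: the class SketchTable is hypothetical, and the member contact:email is invented for the example.

```python
class SketchTable:
    """Toy table: column families are fixed at creation, members are not."""

    def __init__(self, families):
        self.families = set(families)
        self.rows = {}  # row key -> {column key -> value}

    def put(self, row_key, column_key, value):
        # A column key has the form family:label.
        family, _, _label = column_key.partition(":")
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows.setdefault(row_key, {})[column_key] = value


persons = SketchTable(["name", "contact"])
persons.put("000001", "name:first", "Jeffrey")
# A new member of an existing family needs no schema change (hypothetical member).
persons.put("000002", "contact:email", "someone@example.com")
print(sorted(persons.rows["000002"]))
```

Writing to a family that was never declared, such as address:city, raises an error in this sketch, mirroring the rule that families belong to the schema while members do not.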

Table 1 shows a simple example of an HBase table, Persons. The table has two column families: name and contact.

Row key	Timestamp	Column family name	Column family contact
000001	t3		contact:http research.google.com/people/jeff/
000001	t2	name:first Jeffrey	
000001	t1	name:last Dean	
000002	t5	name:first Gabriel	
000002	t4	name:last Mateescu	

An empty cell is one whose cell key has no value associated with it. In Table 1, the cell for (000002, contact:http, t4) is empty. Empty cells are not stored in HBase, and reading one is like looking up a nonexistent key in a map. This is how HBase tables accommodate sparse rows.
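
The following Python sketch shows this sparse storage using the cells of Table 1. It is an illustration only: the integers 1 to 5 stand in for the timestamps t1 to t5, and read_cell is a hypothetical helper.

```python
# Only non-empty cells are stored, keyed by (row key, column key, timestamp).
# Integers 1..5 are placeholders for the timestamps t1..t5 in Table 1.
cells = {
    ("000001", "contact:http", 3): "research.google.com/people/jeff/",
    ("000001", "name:first", 2): "Jeffrey",
    ("000001", "name:last", 1): "Dean",
    ("000002", "name:first", 5): "Gabriel",
    ("000002", "name:last", 4): "Mateescu",
}


def read_cell(row_key, column_key, timestamp):
    """Return the stored value, or None for an empty (never stored) cell."""
    return cells.get((row_key, column_key, timestamp))


print(read_cell("000002", "name:last", 4))
print(read_cell("000002", "contact:http", 4))  # empty cell: nothing was stored
```

The dictionary holds five entries for five non-empty cells; the empty cells cost nothing.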

For any row, only one member of one column family can be accessed at a time (unlike a relational database, where a single query can access multiple columns of a row). You can think of each column family member in a row as a child row.

A table is divided into multiple regions (a region is equivalent to a Bigtable tablet). A region contains the rows in a certain range of row keys. Splitting a table into regions is the key mechanism for processing large tables efficiently.
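
Finding the region for a row then amounts to a search over sorted range boundaries. The Python sketch below assumes each region covers a half-open range starting at its start key; the split points and region names are hypothetical, and locate_region is not an HBase function.

```python
import bisect

# Hypothetical split points: each region covers [start key, next start key).
region_start_keys = [b"", b"000100", b"000200"]
region_names = ["region-0", "region-1", "region-2"]


def locate_region(row_key):
    """Return the name of the region whose key range contains row_key."""
    index = bisect.bisect_right(region_start_keys, row_key) - 1
    return region_names[index]


print(locate_region(b"000001"))
print(locate_region(b"000100"))
print(locate_region(b"000250"))
```

Because rows are kept sorted by row key, a binary search over the start keys is enough to route a request to a single region.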

Each table in HBase is a Bigtable-style table. Such a table stores a series of row records, each described by three basic elements: row key, timestamp, and column. The row key uniquely identifies a row. The timestamp is associated with each data operation and can be thought of as a version, much like a revision in SVN. A column is written as <family>:<label>, and these two parts together uniquely identify a column of stored data. Defining or modifying a family requires a DDL-like operation on HBase, whereas labels need no definition and can be used directly, which provides a way to add columns dynamically. A family also helps optimize reads and writes at the physical storage level: data in the same family is stored close together, and you can take advantage of this when designing your application.
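
Treating timestamps as versions can be sketched in a few lines of Python. The values and integer timestamps below are made up for the example, and latest and as_of are hypothetical helpers rather than HBase calls.

```python
# All versions of one cell, keyed by timestamp (placeholder values).
versions = {3: "page v1", 5: "page v2", 6: "page v3"}


def latest(cell_versions):
    """Return the value with the highest timestamp."""
    return cell_versions[max(cell_versions)]


def as_of(cell_versions, timestamp):
    """Return the newest value written at or before timestamp, or None."""
    candidates = [t for t in cell_versions if t <= timestamp]
    return cell_versions[max(candidates)] if candidates else None


print(latest(versions))
print(as_of(versions, 4))
print(as_of(versions, 2))
```

As with a revision in SVN, older versions remain readable by asking for the state as of an earlier timestamp.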

Let's take a look at the Logical Data Model:

Row key	Timestamp	Column "contents:"	Column "anchor:"	Column "mime:"
"com.cnn.www"	t9		anchor:cnnsi.com = "CNN"	
"com.cnn.www"	t8		anchor:my.look.ca = "CNN.com"	
"com.cnn.www"	t6	"<html>..."		"text/html"
"com.cnn.www"	t5	"<html>..."		
"com.cnn.www"	t3	"<html>..."		

The table above contains a single row with the unique row key com.cnn.www. Each logical modification has an associated timestamp, and the row uses four columns: <contents:>, <anchor:cnnsi.com>, <anchor:my.look.ca>, and <mime:>. To relate Bigtable to traditional concepts, you can think of a Bigtable as a database schema in which each row is a table and the row key is the table name. That table is split into versions across its columns, and each version carries the timestamp of the operation that wrote it.

Let's take a look at the hbase physical data model:

Row key	Timestamp	Column "contents:"
"com.cnn.www"	t6	"<html>..."
"com.cnn.www"	t5	"<html>..."
"com.cnn.www"	t3	"<html>..."

Row key	Timestamp	Column "anchor:"
"com.cnn.www"	t9	anchor:cnnsi.com = "CNN"
"com.cnn.www"	t8	anchor:my.look.ca = "CNN.com"

Row key	Timestamp	Column "mime:"
"com.cnn.www"	t6	"text/html"

The physical data model simply splits each row of the logical model by column family, and each column family is stored separately.
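
The split from the logical to the physical model can be sketched as grouping cells by the family part of their column key. In the Python sketch below, the row key is written in its usual lowercase form, integers stand in for the timestamps t3 to t9, and split_by_family is a hypothetical helper, not part of HBase.

```python
from collections import defaultdict

# Cells of the logical row: (row key, timestamp, column key, value).
logical_row = [
    ("com.cnn.www", 9, "anchor:cnnsi.com", "CNN"),
    ("com.cnn.www", 8, "anchor:my.look.ca", "CNN.com"),
    ("com.cnn.www", 6, "contents:", "<html>..."),
    ("com.cnn.www", 6, "mime:", "text/html"),
    ("com.cnn.www", 5, "contents:", "<html>..."),
    ("com.cnn.www", 3, "contents:", "<html>..."),
]


def split_by_family(cells):
    """Group cells into one list per column family."""
    stores = defaultdict(list)
    for row_key, timestamp, column_key, value in cells:
        family = column_key.split(":", 1)[0]
        stores[family].append((row_key, timestamp, column_key, value))
    return dict(stores)


stores = split_by_family(logical_row)
for family in sorted(stores):
    print(family, len(stores[family]))
```

Each resulting group corresponds to one of the three physical tables, and no group contains a placeholder for an empty cell.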

When you operate on Bigtable's data model, the row being modified is locked, which ensures that operations on a row are atomic.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
