What is Bigtable? Google's paper describes it in full. Literally, it is a big table, but one quite different from the tables of a traditional database. It is designed for loose data: data that falls somewhere between a map entry (key and value) and a database row. When I use memcache, I sometimes need to store more than a single value under a key. I may need to store several attributes, much as a database table would, but without the many associations between tables that a traditional schema involves. Such data is called loose data. Bigtable is a very large table whose attributes (columns) can be added dynamically as needed, and it has no need for join queries between tables.
One of the defining requirements of Internet applications is speed: powerful functionality counts for little if the application is slow. High-traffic websites therefore use caches to improve performance and response time. For map-entry data, there are many centralized and distributed caches to choose from. For traditional relational data, MySQL and Oracle provide good support. For loose data, however, neither kind of solution can deliver its full processing capability. This is where Bigtable is useful.
HBase is a scalable, distributed, column-oriented database with a dynamic schema for structured data. It can effectively and reliably manage massive data (GB or more) distributed across thousands of commodity servers.
HBase is modeled on Google's Bigtable database. It is part of the Apache Software Foundation's Hadoop project.
The data stored in HBase is sparse (unstructured or semi-structured data). HBase is good at storing such data because it uses a column-oriented storage mechanism, whereas the familiar RDBMS uses a row-oriented one (frustratingly, I have read any number of articles about relational databases and none of them mentioned row-oriented storage). The column-oriented storage mechanism does not occupy any space for null values. For example, if a table Usertable has 10 columns but only one of them contains data in a given row, the nine columns with null values occupy no storage space (how much space would an ordinary database such as MySQL use for them?).
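The space argument can be illustrated with a small in-memory sketch. It does not reproduce how HBase or MySQL actually lays out data on disk; the column names and the two layouts are hypothetical and serve only to show that a sparse layout records nothing for absent values.

```python
# Illustrative only: the column names col0..col9 are made up for this sketch.
COLUMNS = [f"col{i}" for i in range(10)]

# Row-oriented layout: every row reserves a slot for every column,
# so nine of the ten slots hold an explicit null (None).
row_oriented = {"row1": [None] * len(COLUMNS)}
row_oriented["row1"][COLUMNS.index("col3")] = "some value"

# Sparse column-oriented layout: only cells that hold data are recorded.
column_oriented = {("row1", "col3"): "some value"}

stored_slots_row = len(row_oriented["row1"])
stored_cells_col = len(column_oriented)
print(stored_slots_row, stored_cells_col)  # prints: 10 1
```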
Another reason HBase suits unstructured, sparse data is the way it handles column families.
As an analogy, consider the difference between dynamic languages such as Ruby and Python and compiled languages such as C++ and Java. For me, the most obvious difference is that you do not need to declare a type for a variable.
HBase brings this feature to the DBAs of the future. You only tell HBase which column family your data belongs to. You do not need to specify a type such as char, varchar, int, tinyint, or text.
HBase has many other characteristics. For example, it does not support join queries, but you can work around this by storing related data as parent-child tuples.
Note: At the time of writing, the latest version of HBase is v0.19.3. The information in this article applies to that version.
Data Model
HBase data is modeled as a multidimensional map, in which a value (a table cell) is indexed by four keys:
value = Map(TableName, RowKey, ColumnKey, Timestamp)
Where:
TableName is a string.
RowKey and ColumnKey are binary values (Java type byte[]).
Timestamp is a 64-bit integer (Java type long).
value is an uninterpreted byte array (Java type byte[]).
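The four-key mapping can be sketched as an ordinary dictionary keyed by a tuple. This is a minimal in-memory model and not the HBase client API; the helper names `put` and `get`, the table name, and the sample values are all hypothetical.

```python
from typing import Dict, Optional, Tuple

# Key: (table name, row key, column key, timestamp) -> uninterpreted bytes.
CellKey = Tuple[str, bytes, bytes, int]
cells: Dict[CellKey, bytes] = {}

def put(table: str, row: bytes, column: bytes, ts: int, value: bytes) -> None:
    """Store one value under its four-part key."""
    cells[(table, row, column, ts)] = value

def get(table: str, row: bytes, column: bytes, ts: int) -> Optional[bytes]:
    """Return the value for an exact four-part key, or None if absent."""
    return cells.get((table, row, column, ts))

# Hypothetical sample data.
put("MyTable", b"row1", b"info:date", 1, b"2009-06-01")

print(get("MyTable", b"row1", b"info:date", 1))  # prints: b'2009-06-01'
print(get("MyTable", b"row1", b"info:date", 2))  # prints: None
```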
Binary data is encoded as base64 for transmission over the network.
The row key is the table's primary key and is usually a string. Rows are sorted lexicographically by row key.
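Ordering by the characters of the key is not the same as ordering by numeric value, which is presumably why the row keys in Table 1 (000001, 000002) are zero-padded. The sketch below uses Python's built-in sort on byte strings as a stand-in for the table's ordering; that substitution is an assumption, and the keys are made up.

```python
# Numeric keys written without padding sort in a surprising order.
unpadded = [b"1", b"2", b"10"]
# Zero-padded keys of equal length sort in numeric order.
padded = [b"000001", b"000002", b"000010"]

sorted_unpadded = sorted(unpadded)
sorted_padded = sorted(padded)

print(sorted_unpadded)  # prints: [b'1', b'10', b'2']
print(sorted_padded)    # prints: [b'000001', b'000002', b'000010']
```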
The information stored in a table is organized into column families, which you can think of as categories. Each column family can have any number of members, which are identified by a label (or qualifier). A column key consists of the family name, a colon (:), and the label. For example, for the column family info and the member date, the column key is info:date.
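The composition of a column key can be sketched with two small helpers. The function names are hypothetical, and splitting on the first colon only is an assumption of this sketch, not a documented HBase rule.

```python
from typing import Tuple

def make_column_key(family: str, label: str) -> str:
    """Join a family name and a label with a colon."""
    return f"{family}:{label}"

def split_column_key(column_key: str) -> Tuple[str, str]:
    """Split a column key into (family, label) at the first colon."""
    family, _, label = column_key.partition(":")
    return family, label

print(make_column_key("info", "date"))  # prints: info:date
print(split_column_key("info:date"))    # prints: ('info', 'date')
```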
An HBase table schema defines multiple column families. However, when you insert a row into the table, the application can create new members at run time. Within a column family, different rows of a table can have different numbers of members. In other words, HBase supports a dynamic schema model.
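A minimal sketch of this behavior, assuming a toy in-memory `Table` class (hypothetical, not the HBase API): the set of column families is fixed when the table is created, while members appear on first write. The row keys, labels, and values are made up.

```python
class Table:
    """Toy table: fixed column families, members created on demand."""

    def __init__(self, families):
        self.families = set(families)
        self.rows = {}  # row key -> {column key -> value}

    def put(self, row, column_key, value):
        family, _, _label = column_key.partition(":")
        if family not in self.families:
            # Families belong to the schema and cannot be invented by a write.
            raise KeyError(f"unknown column family: {family}")
        # Any label is accepted; the member is created by this write.
        self.rows.setdefault(row, {})[column_key] = value

table = Table(["info"])
table.put("row1", "info:date", "2009-06-01")
table.put("row1", "info:author", "someone")
table.put("row2", "info:date", "2009-06-02")

# Different rows hold different numbers of members of the same family.
member_counts = {row: len(columns) for row, columns in table.rows.items()}
print(member_counts)  # prints: {'row1': 2, 'row2': 1}
```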
Table 1 shows Persons, a simple example of an HBase table. The table has two column families: name and contact.
| Row key | Timestamp | Column family "name" | Column family "contact" |
|---------|-----------|----------------------|-------------------------|
| 000001 | t3 | | contact:http research.google.com/people/jeff/ |
| | t2 | name:first Jeffrey | |
| | t1 | name:last Dean | |
| 000002 | t5 | name:first Gabriel | |
| | t4 | name:last Mateescu | |
An empty cell has no value associated with its cell key. In Table 1, the cell associated with (000002, contact:http, t4) is empty. Empty cells are not stored in HBase. Reading an empty cell is like looking up a value in a map with a key that does not exist. This is how HBase tables accommodate sparse rows.
For any row, only one member of one column family can be accessed at a time (unlike a relational database, where a query can access multiple columns of one row). You can think of a column family member in a row as a child row.
A table is divided into multiple regions (equivalent to tablets in Bigtable). A region contains the rows in a certain range of row keys. Splitting a table into regions is the key mechanism for processing large tables efficiently.
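Locating the region that holds a given row can be sketched as a binary search over region start keys. The boundaries and region names below are made up, and treating each region as a half-open range that begins at its start key is an assumption of this sketch.

```python
import bisect

# Sorted start keys; the first region starts at the empty key.
# Each region covers [its start key, the next region's start key).
region_start_keys = [b"", b"000100", b"000200"]
region_names = ["region-0", "region-1", "region-2"]

def find_region(row_key: bytes) -> str:
    """Return the name of the region whose key range contains row_key."""
    # bisect_right gives the insertion point after any equal start key,
    # so subtracting 1 selects the last region with start key <= row_key.
    index = bisect.bisect_right(region_start_keys, row_key) - 1
    return region_names[index]

print(find_region(b"000001"))  # prints: region-0
print(find_region(b"000100"))  # prints: region-1
print(find_region(b"000250"))  # prints: region-2
```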
Each table in HBase is a bigtable. A bigtable stores a series of row records, each made up of three basic elements: row key, timestamp, and column. The row key is the unique identifier of a row in the bigtable. The timestamp is the time associated with each data operation and can be thought of as a version, similar to a revision in SVN. A column is defined as <family>:<label>, and these two parts together uniquely identify a column of stored data. Defining or modifying a family requires a DDL-like operation on HBase, whereas labels need no definition and can be used directly, which provides a way to customize columns dynamically. Families also serve to optimize physical reads and writes: data in the same family is stored physically close together, so this property can be taken into account during business design.
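The idea of a timestamp acting as a version can be sketched for a single cell as follows. The helper names are hypothetical, the integers stand in for symbolic timestamps, and selecting the newest version that is not later than the requested timestamp is an assumption about how such a read would behave.

```python
from typing import Dict, Optional

# Versions of one cell: timestamp -> value (placeholder values).
versions: Dict[int, str] = {
    3: "version at t3",
    5: "version at t5",
    6: "version at t6",
}

def latest(cell: Dict[int, str]) -> Optional[str]:
    """Return the value with the highest timestamp, or None if empty."""
    return cell[max(cell)] if cell else None

def as_of(cell: Dict[int, str], ts: int) -> Optional[str]:
    """Return the newest value whose timestamp is <= ts, or None."""
    eligible = [t for t in cell if t <= ts]
    return cell[max(eligible)] if eligible else None

print(latest(versions))    # prints: version at t6
print(as_of(versions, 4))  # prints: version at t3
print(as_of(versions, 2))  # prints: None
```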
Let's take a look at the logical data model:
| Row key | Time stamp | Column "contents:" | Column "anchor:" | Column "mime:" |
|---------|------------|--------------------|------------------|----------------|
| "com.cnn.www" | t9 | | "anchor:cnnsi.com" "CNN" | |
| | t8 | | "anchor:my.look.ca" "CNN.com" | |
| | t6 | "<html>..." | | "text/html" |
| | t5 | "<html>..." | | |
| | t3 | "<html>..." | | |
The table above contains one row, whose unique identifier is com.cnn.www. Each logical modification has an associated timestamp, and the row has four columns: <contents:>, <anchor:cnnsi.com>, <anchor:my.look.ca>, and <mime:>. In traditional terms, a bigtable can be thought of as a DB schema in which each row is a table and the row key is the table name. That table can hold multiple versions of each column, and each version carries the timestamp of the operation that wrote it.
Let's take a look at the HBase physical data model:
| Row key | Time stamp | Column "contents:" |
|---------|------------|--------------------|
| "com.cnn.www" | t6 | "<html>..." |
| | t5 | "<html>..." |
| | t3 | "<html>..." |

| Row key | Time stamp | Column "anchor:" |
|---------|------------|------------------|
| "com.cnn.www" | t9 | "anchor:cnnsi.com" "CNN" |
| | t8 | "anchor:my.look.ca" "CNN.com" |

| Row key | Time stamp | Column "mime:" |
|---------|------------|----------------|
| "com.cnn.www" | t6 | "text/html" |
In effect, the physical data model splits each row of the logical model into separate pieces that are stored by column family.
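The split from the logical view to the physical view can be sketched by grouping the cells of the com.cnn.www row by the family part of their column keys. This is an in-memory illustration only; the function name is hypothetical and the integer timestamps stand in for t3 through t9.

```python
from collections import defaultdict

# Logical row "com.cnn.www" as (timestamp, column key, value) cells.
logical_row = [
    (9, "anchor:cnnsi.com", "CNN"),
    (8, "anchor:my.look.ca", "CNN.com"),
    (6, "contents:", "<html>..."),
    (6, "mime:", "text/html"),
    (5, "contents:", "<html>..."),
    (3, "contents:", "<html>..."),
]

def split_by_family(cells):
    """Group cells by column family, preserving their input order."""
    stores = defaultdict(list)
    for ts, column_key, value in cells:
        family = column_key.partition(":")[0]
        stores[family].append((ts, column_key, value))
    return dict(stores)

physical = split_by_family(logical_row)
for family, cells in physical.items():
    print(family, [ts for ts, _, _ in cells])
# prints:
# anchor [9, 8]
# contents [6, 5, 3]
# mime [6]
```

Each group corresponds to one of the three physical tables shown above, and empty cells never appear in any group.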
When you operate on a row in the Bigtable data model, the row is locked, which guarantees that operations on that row are atomic.
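The effect of a per-row lock can be sketched with Python threads. This illustrates the general idea of serializing operations on one row and is not a description of how HBase or Bigtable implements row locks; the table contents and helper names are hypothetical.

```python
import threading

table = {"row1": {}}   # row key -> {column key -> value}
row_locks = {}         # row key -> lock for that row
locks_guard = threading.Lock()  # protects row_locks itself

def lock_for(row):
    """Return the lock for a row, creating it on first use."""
    with locks_guard:
        return row_locks.setdefault(row, threading.Lock())

def increment(row, column_key):
    """Read-modify-write on one cell, performed while holding the row lock."""
    with lock_for(row):
        current = table[row].get(column_key, 0)
        table[row][column_key] = current + 1

threads = [
    threading.Thread(target=increment, args=("row1", "info:count"))
    for _ in range(50)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

final_count = table["row1"]["info:count"]
print(final_count)  # prints: 50
```

Because every update to the row runs under the same lock, no increment is lost even though 50 threads update the same cell.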