A Deep Dive into HBase: tables, column families, column qualifiers, versions, and cells

Source: Internet
Author: User

HBase is a column-oriented distributed database that differs significantly from traditional relational databases in both its physical and logical models. We first cover several basic HBase concepts and how they differ from their relational counterparts:
Table: HBase organizes data into tables (HTable). A table is stored physically by column family. Each column family has its own folder and StoreFiles, unlike a relational database, which saves a table as a file, so the column family name also appears as part of the file system path.
Row: rows in HBase are logical rows; in the physical model, their data is stored separately by column family. Each row is identified by a rowkey. A rowkey has no data type and is always treated as a byte[]. It is equivalent to the primary key of a relational table. The design of the rowkey is also closely tied to how data is read, and it can be considered the most important step in designing an HBase table. In a relational table, by contrast, rows really are rows: data is organized by row both logically and physically.
Column family: the columns of a row in an HBase table are grouped into column families, and data is stored on disk by column family. For this reason, when defining an HBase table you must define its column families in addition to the table name. Traditional databases have no column families.
Column qualifier: data within a column family is addressed by a column qualifier. Column qualifiers do not have to be declared when the table is defined; they can be created dynamically when data is saved. Relational databases have no equivalent concept.
Version: data in HBase is versioned. Each time a value is written or modified, a new version is saved. The version is a timestamp, stored as a long (long integer), and it is added automatically by default when the value is saved. When defining a table, you can set the number of versions to keep; the default is 1. Different versions of the same data are sorted in reverse chronological order (newest first), while other keys, such as the rowkey and column qualifier, are sorted in lexicographic order, which is also an optimization for reads. When the number of stored versions exceeds the configured maximum, the oldest versions are deleted when a major compaction runs. Relational databases have no equivalent concept.
Cell: in an HBase table, rowkey + (column family:column qualifier) + version identifies a cell. The data saved by the user is stored in this cell. Its value is of type byte[], so the client must convert it to the desired type.
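Because a rowkey is only a byte array, the client chooses the encoding, and keys compare byte by byte. The following is a minimal Python sketch of that idea, not HBase client code; the numeric key and the `row…` keys are hypothetical examples.

```python
import struct

# A text rowkey encoded as UTF-8 bytes (rowkey taken from the example table)
text_key = "jacky20130429".encode("utf-8")

# A hypothetical numeric rowkey, packed as an 8-byte big-endian integer
numeric_key = struct.pack(">q", 20130429)

# Byte strings compare lexicographically, so "row10" sorts before "row2"
keys = sorted([b"row2", b"row10", b"row1"])

# Zero-padding keeps numeric suffixes in their natural order
padded = sorted([b"row02", b"row10", b"row01"])
```

This is one reason rowkey design matters so much for reads: the byte ordering of the key decides which rows end up next to each other.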
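The version rules above can be sketched with a toy model in Python. This is only an illustration under stated assumptions: `VersionedCell`, `put`, `get`, and `major_compact` are hypothetical names, not HBase APIs, and the integer timestamps 3, 6, and 9 stand in for T3, T6, and T9.

```python
class VersionedCell:
    """Toy model of the versions kept for one piece of data, newest first."""

    def __init__(self, max_versions=1):
        self.max_versions = max_versions  # the text gives 1 as the default
        self.versions = []                # list of (timestamp, value)

    def put(self, value, timestamp):
        # Every write adds a new version; nothing is overwritten in place.
        self.versions.append((timestamp, value))
        # Keep versions in reverse chronological order.
        self.versions.sort(key=lambda tv: tv[0], reverse=True)

    def get(self, n=1):
        # Assumption: reads never return more than max_versions entries.
        return self.versions[:min(n, self.max_versions)]

    def major_compact(self):
        # Surplus old versions are physically removed only here.
        del self.versions[self.max_versions:]


cell = VersionedCell(max_versions=2)
cell.put(b"a", 3)
cell.put(b"b", 6)
cell.put(b"c", 9)
stored_before = len(cell.versions)  # all three versions are still stored
cell.major_compact()
stored_after = len(cell.versions)   # only the two newest remain
```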
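To make the cell coordinates and the client-side conversion concrete, here is a small Python sketch. It is not HBase client code: the dictionary `store`, the `age` qualifier, and the stored values are hypothetical, and the big-endian integer encoding is an assumption about how the writer serialized the value.

```python
import struct

# A cell is addressed by (rowkey, column family, column qualifier, version).
name_cell = (b"jacky20130429", b"info", b"name", 9)
age_cell = (b"jacky20130429", b"info", b"age", 9)  # hypothetical qualifier

# The store only ever holds raw bytes (values here are made up).
store = {
    name_cell: "Jacky".encode("utf-8"),
    age_cell: struct.pack(">i", 30),  # 4-byte big-endian integer
}


def to_str(raw):
    """Decode bytes that the writer encoded as UTF-8 text."""
    return raw.decode("utf-8")


def to_int(raw):
    """Decode bytes that the writer packed as a 4-byte big-endian int."""
    return struct.unpack(">i", raw)[0]


name = to_str(store[name_cell])
age = to_int(store[age_cell])
```

The store cannot tell a string from an integer, so reader and writer have to agree on the encoding out of band.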
 
Figure: table, row, rowkey, column family, column qualifier, version (timestamp), and cell
We can see that:
Rowkey -> jacky20130429, jacky20130430
Column family -> info, events
Version -> T3, T6, T9
Column qualifier -> email, Sex, Address, type, name
Info and events, the two column families of the table, are each saved to disk separately.
Because [jacky20130429, info, email, T9] -> [email protected], HBase can be regarded as a key-value database. HBase can also be viewed as a map of sorted maps:
SortedMap<
    RowKey, List<
        SortedMap<
            Column, List<
                Value, Timestamp
            >
        >
    >
>
 
The first-level SortedMap represents the HBase table (HTable) and contains a set of column families. Each column family contains a second-level SortedMap, which holds a set of columns and their associated data.
Operations on a single row in an HBase table are atomic, but different versions of a row may be spread across different StoreFiles. Expired data is only physically deleted during a major compaction.
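The nested-map view can be approximated in a few lines of Python. This is a rough sketch rather than HBase's implementation: plain dictionaries are sorted at scan time instead of using a true sorted map, the helpers `put`, `get`, and `scan` are hypothetical, the email values are made up, and the integers 3, 6, and 9 stand in for T3, T6, and T9.

```python
# table[rowkey][family][qualifier] -> list of (timestamp, value), newest first
table = {}


def put(rowkey, family, qualifier, timestamp, value):
    versions = (table.setdefault(rowkey, {})
                     .setdefault(family, {})
                     .setdefault(qualifier, []))
    versions.append((timestamp, value))
    versions.sort(key=lambda tv: tv[0], reverse=True)


def get(rowkey, family, qualifier, timestamp=None):
    """Return the newest value, or the value stored at an exact timestamp."""
    versions = table[rowkey][family][qualifier]
    if timestamp is None:
        return versions[0][1]
    for ts, value in versions:
        if ts == timestamp:
            return value
    raise KeyError(timestamp)


def scan():
    """Yield every cell in key order: rowkey, family, qualifier, newest first."""
    for rowkey in sorted(table):
        for family in sorted(table[rowkey]):
            for qualifier in sorted(table[rowkey][family]):
                for ts, value in table[rowkey][family][qualifier]:
                    yield (rowkey, family, qualifier, ts, value)


put(b"jacky20130430", b"info", b"email", 3, b"other@example.com")
put(b"jacky20130429", b"info", b"email", 6, b"old@example.com")
put(b"jacky20130429", b"info", b"email", 9, b"new@example.com")

latest = get(b"jacky20130429", b"info", b"email")
at_t6 = get(b"jacky20130429", b"info", b"email", 6)
first_row = next(scan())[0]
```

Looking up the full coordinate `[rowkey, family, qualifier, timestamp]` returns exactly one value, which is what makes the key-value reading of HBase natural.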
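The following Python sketch illustrates how versions spread over several files can be read together and later trimmed. It is a simplified model under stated assumptions, not how HBase is implemented: each StoreFile is modeled as a list already sorted by key and then by newest timestamp, and `read_merged` and `major_compact` are hypothetical helper names.

```python
import heapq
from itertools import groupby


def sort_key(entry):
    key, timestamp, _value = entry
    return (key, -timestamp)  # key ascending, then newest version first


def read_merged(store_files):
    """Merge several sorted StoreFile lists into one sorted stream."""
    return heapq.merge(*store_files, key=sort_key)


def major_compact(store_files, max_versions=1):
    """Rewrite all files into one, dropping versions beyond max_versions."""
    compacted = []
    for _key, group in groupby(read_merged(store_files), key=lambda e: e[0]):
        compacted.extend(list(group)[:max_versions])
    return [compacted]


key = (b"jacky20130429", b"info", b"email")
file_a = [(key, 6, b"v6"), (key, 3, b"v3")]  # older file holding two versions
file_b = [(key, 9, b"v9")]                   # newer file holding the latest

merged = list(read_merged([file_a, file_b]))
after = major_compact([file_a, file_b], max_versions=1)
```

Until the compaction runs, the older versions still occupy space in `file_a` even though a reader asking for the latest value would only see `v9`.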
