HBase System Architecture and Data Structure

Source: Internet
Author: User
Tags: time in milliseconds, hadoop ecosystem

HBase tables have the following features:

1. Large: a table can have hundreds of millions of rows and millions of columns.

2. Column-oriented: storage and access control are organized by column (family), and each column (family) can be retrieved independently.

3. Sparse: columns with null values occupy no storage space, so tables can be designed to be very sparse.
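As a rough illustration of sparseness, the sketch below models a table as nested dictionaries. The row keys, column names, and values are made up for the example, and this is only a toy model, not how HBase lays out data on disk.

```python
# Toy model: each row stores only the columns that actually have a value.
table = {
    b"row1": {b"info:name": b"alice", b"info:email": b"a@example.com"},
    b"row2": {b"info:name": b"bob"},  # no email: nothing is stored for it
}

# Cells that actually occupy space in this model.
stored_cells = sum(len(columns) for columns in table.values())

# Cells a dense rows-by-columns grid would need for the same data.
all_columns = {name for columns in table.values() for name in columns}
dense_cells = len(table) * len(all_columns)

print(stored_cells, dense_cells)
```

The gap between the two counts grows with the number of columns that most rows leave empty.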

The following figure shows where HBase sits in the Hadoop ecosystem.

II. Logical View

HBase stores data in tables. A table consists of rows and columns, and the columns are grouped into column families.

Row Key

As in other NoSQL databases, the row key is the primary key used to retrieve records. There are only three ways to access rows in an HBase table:

1. Access by a single row key

2. Scan a range of row keys

3. Full table scan
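The three access paths can be sketched over a small in-memory stand-in for a table. This is a toy model, not the HBase client API: the function names `get`, `range_scan`, and `full_scan` and the sample rows are assumptions made for illustration.

```python
import bisect

# Toy stand-in for an HBase table: row keys are bytes, kept in sorted order.
rows = {
    b"user#0001": {b"info:name": b"alice"},
    b"user#0002": {b"info:name": b"bob"},
    b"user#0010": {b"info:name": b"carol"},
    b"video#0001": {b"meta:title": b"intro"},
}
sorted_keys = sorted(rows)  # bytes compare lexicographically, byte by byte


def get(row_key):
    """1. Point lookup by a single row key."""
    return rows.get(row_key)


def range_scan(start_key, stop_key):
    """2. Range scan over [start_key, stop_key), relying on the sort order."""
    lo = bisect.bisect_left(sorted_keys, start_key)
    hi = bisect.bisect_left(sorted_keys, stop_key)
    return [(key, rows[key]) for key in sorted_keys[lo:hi]]


def full_scan():
    """3. Full table scan: visit every row in key order."""
    return [(key, rows[key]) for key in sorted_keys]
```

For example, `range_scan(b"user#", b"user$")` returns only the three `user#` rows, because `$` is the byte immediately after `#`.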

The row key can be any string (the maximum length is 64 KB; in practice keys are usually far shorter, on the order of tens of bytes). Internally, HBase stores the row key as a byte array.

Data is stored in lexicographic (byte) order of the row key. When designing keys, take advantage of this ordering so that rows that are frequently read together are stored close to each other (locality).

Note:

Sorting integers lexicographically gives 1, 10, 100, 11, 12, 13, ..., 19, 2, 20, 21, ..., 9, 91, 92, 93, 94, 95, 96, 97, 98, 99. To preserve the natural order of integers, the row key must be left-padded with zeros.

A read or write on a single row is atomic, no matter how many columns are read or written at once. This design decision makes it easy to reason about program behavior when the same row is updated concurrently.
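The effect of zero-padding can be checked with a few lines of Python. The sample ids and the 5-digit width are arbitrary choices for the example; a real key design would pick a width that covers the largest expected value.

```python
ids = [1, 2, 9, 10, 11, 21, 99, 100]

# Plain decimal strings are compared byte by byte, so b"10" sorts before b"2".
plain = sorted(str(i).encode() for i in ids)

# Left-padding to a fixed width makes byte order agree with numeric order.
padded = sorted(f"{i:05d}".encode() for i in ids)

print(plain)
print(padded)
```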

Column family

Each column in an HBase table belongs to a column family. Column families, not individual columns, are part of the table's schema and must be defined before the table is used. Every column name is prefixed with its column family. For example, courses:history and courses:math both belong to the courses column family.

Access control and disk and memory usage accounting are all performed at the column family level. In practice, permissions on column families help manage different kinds of applications: some applications are allowed to add new base data, some may read the base data and create derived column families, and some may only browse the data (and, for privacy reasons, perhaps not even all of it).

Timestamp

In HBase, the storage unit identified by a row and a column is called a cell. Each cell can hold multiple versions of the same data, indexed by timestamp. The timestamp is a 64-bit integer. It can be assigned by HBase automatically when the data is written, in which case it is the current system time in milliseconds, or it can be assigned explicitly by the client. An application that assigns its own timestamps must generate unique ones to avoid version conflicts. Within each cell, versions are sorted in reverse chronological order, so the latest data comes first.

To avoid the management burden (storage and indexing) of keeping too many versions, HBase provides two ways to reclaim old versions: keep only the last n versions of the data, or keep only the versions written within a recent period (for example, the last seven days). Both can be configured per column family.
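A minimal sketch of a versioned cell follows. The `Cell` class and its methods are hypothetical names for this example, not part of any HBase API; the model only illustrates timestamp indexing, newest-first ordering, and why reusing a timestamp causes a conflict.

```python
import time


class Cell:
    """Toy model of one cell: several versions of a value, newest first."""

    def __init__(self):
        self._versions = {}  # timestamp in ms (int) -> value (bytes)

    def put(self, value, timestamp=None):
        # With no explicit timestamp, fall back to the current time in ms.
        if timestamp is None:
            timestamp = int(time.time() * 1000)
        # Writing with an existing timestamp replaces that version, which is
        # why client-assigned timestamps need to be unique.
        self._versions[timestamp] = value
        return timestamp

    def versions(self):
        """All (timestamp, value) pairs in reverse chronological order."""
        return sorted(self._versions.items(), key=lambda tv: tv[0], reverse=True)

    def get(self, max_timestamp=None):
        """Newest value, or the newest value at or before max_timestamp."""
        for ts, value in self.versions():
            if max_timestamp is None or ts <= max_timestamp:
                return value
        return None
```

Calling `get()` with no argument returns the latest version, while `get(max_timestamp=t)` reads the cell as it looked at time `t`.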
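The two reclamation policies can be sketched as a single pruning function. The name `prune` and its parameters `max_versions`, `ttl_ms`, and `now_ms` are assumptions for this illustration and do not correspond to HBase configuration names.

```python
def prune(versions, max_versions=None, ttl_ms=None, now_ms=0):
    """Return the versions that survive pruning, newest first.

    versions: iterable of (timestamp_ms, value) pairs.
    max_versions: keep at most this many of the newest versions.
    ttl_ms: keep only versions written within the last ttl_ms milliseconds.
    now_ms: the current time in milliseconds, passed in for determinism.
    """
    kept = sorted(versions, key=lambda tv: tv[0], reverse=True)
    if ttl_ms is not None:
        kept = [(ts, value) for ts, value in kept if now_ms - ts <= ttl_ms]
    if max_versions is not None:
        kept = kept[:max_versions]
    return kept
```

For example, with versions written at 1000, 2000, 3000, and 4000 ms, `prune(history, max_versions=2)` and `prune(history, ttl_ms=1500, now_ms=4000)` both keep only the two newest.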

Cell

A cell is uniquely identified by {row key, column (= family + qualifier), version}. The data in a cell has no type; it is stored as raw bytes.
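Because cell data is untyped, the application is responsible for encoding and decoding values. The sketch below assumes a toy `store` dictionary keyed by (row key, column, version) and a hypothetical `put` helper; the row key, the numeric score, and the grade are made-up sample data.

```python
store = {}  # (row key, b"family:qualifier", timestamp) -> raw bytes


def put(row, family, qualifier, timestamp, value):
    # The store holds only bytes; the caller must encode values first.
    if not isinstance(value, bytes):
        raise TypeError("cell values are untyped bytes; encode before writing")
    store[(row, family + b":" + qualifier, timestamp)] = value


# An integer and a string both go in as bytes.
put(b"student#42", b"courses", b"math", 1, (95).to_bytes(4, "big"))
put(b"student#42", b"courses", b"history", 1, "B+".encode("utf-8"))

# The reader has to know how each column was encoded to decode it.
score = int.from_bytes(store[(b"student#42", b"courses:math", 1)], "big")
grade = store[(b"student#42", b"courses:history", 1)].decode("utf-8")
```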
