HBase Basic Concepts and Common HBase Shell Commands

Last Update:2015-10-19 Source: Internet

Author: User

Tags hadoop mapreduce


1. Introduction

HBase is a distributed, column-oriented, open-source database based on the Google paper "Bigtable: A Distributed Storage System for Structured Data". It is an open-source implementation of Google Bigtable that uses Hadoop HDFS as its file storage system, Hadoop MapReduce to process the massive amounts of data stored in HBase, and ZooKeeper as its coordination service.

2. The table structure of HBase

HBase stores data in tables. A table is made up of rows and columns, and the columns are grouped into column families.

Column families and their columns:

Column-family1: Column1, Column2

Column-family2: Column1, Column2, Column3

Column-family3: Column1

Rows, with each entry written as timestamp:value:

Key1: t1:abc, t2:gdxdf, t4:dfads, t3:hello, t2:world

Key2: t3:abc, t1:gdxdf, t4:dfads, t3:hello, t2:dfdsfa, t3:dfdf

Key3: t2:dfadfasd, t1:dfdasddsf, t2:dfxxdfasd, t1:taobao.com

As shown, Key1, Key2, and Key3 are the unique row keys of three records, and Column-family1, Column-family2, and Column-family3 are three column families, each containing several columns. For example, Column-family1 consists of two columns named Column1 and Column2. The entries t1:abc and t2:gdxdf belong to a single cell, uniquely identified by row Key1 and column Column-family1:Column1. This cell holds two values, abc and gdxdf, with different timestamps t1 and t2; HBase returns the most recent value to the requester.
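The layout described above can be pictured as nested maps. The following is a minimal Python sketch, not an HBase API: the dictionary shape, the integer timestamps 1 and 2 standing in for t1 and t2, and the helper name get_latest are all assumptions made for illustration.

```python
# Hypothetical in-memory model of the layout described above; not an HBase API.
# row key -> "family:column" -> {timestamp: value}
table = {
    "Key1": {
        # Integers 1 and 2 stand in for the timestamps t1 and t2.
        "Column-family1:Column1": {1: "abc", 2: "gdxdf"},
    },
}


def get_latest(table, row_key, column):
    """Return the value with the largest timestamp, or None if the cell is absent."""
    versions = table.get(row_key, {}).get(column)
    if not versions:
        return None
    newest = max(versions)  # largest timestamp = most recent version
    return versions[newest]


latest = get_latest(table, "Key1", "Column-family1:Column1")
print(latest)
```

In this model the cell holds both versions, and a read without an explicit version picks the one with the larger timestamp.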

The specific meanings of these terms are as follows:

(1) Row Key

As in other NoSQL databases, the row key is the primary key used to retrieve records. There are only three ways to access rows in an HBase table:

(1.1) Access via a single row key

(1.2) Access via a range of row keys

(1.3) Full table scan

A row key can be any string (the maximum length is 64 KB; in practice it is usually 10 to 100 bytes). Internally, HBase stores the row key as a byte array.

Data is stored sorted by the lexicographic (byte) order of the row key. When designing keys, take advantage of this sorted storage by giving rows that are often read together adjacent keys, so that they are stored together (locality).
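The three access paths and the role of sorted storage can be sketched in plain Python. This is only an illustrative model under stated assumptions: the sample row keys, the helper names get, range_scan, and full_scan, and the inclusive-start, exclusive-stop range convention are choices made for this sketch. Python compares str values by code point, which matches byte order for the all-ASCII keys used here.

```python
from bisect import bisect_left

# Hypothetical model: rows are kept sorted by row key.
rows = {"row-003": "c", "row-001": "a", "row-010": "j", "row-002": "b"}
sorted_keys = sorted(rows)  # lexicographic order (all-ASCII keys here)


def get(row_key):
    """(1.1) Access by a single row key."""
    return rows.get(row_key)


def range_scan(start, stop):
    """(1.2) Access by row key range: start inclusive, stop exclusive."""
    lo = bisect_left(sorted_keys, start)
    hi = bisect_left(sorted_keys, stop)
    return [(k, rows[k]) for k in sorted_keys[lo:hi]]


def full_scan():
    """(1.3) Full table scan, in row key order."""
    return [(k, rows[k]) for k in sorted_keys]


print(get("row-002"))
print(range_scan("row-002", "row-010"))
```

Because the keys are sorted, a range read only touches a contiguous slice, which is why rows that are read together benefit from adjacent keys.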

Note:

Sorting integers lexicographically gives 1, 10, 100, 11, 12, 13, 14, 15, 16, 17, 18, 19, 2, 20, 21, ..., 9, 91, 92, 93, 94, 95, 96, 97, 98, 99. To preserve the natural order of integers, row keys must be left-padded with zeros.
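The effect of zero padding is easy to check with a few lines of Python. The sample values and the fixed width of 3 are assumptions chosen for this sketch; a real key design would pick a width that covers the largest expected value.

```python
ids = [1, 2, 9, 10, 11, 100]

# Unpadded decimal strings sort lexicographically, not numerically.
unpadded = sorted(str(i) for i in ids)

# Left-padding with zeros to a fixed width (3 here, an assumed maximum)
# makes lexicographic order agree with numeric order.
padded = sorted(str(i).zfill(3) for i in ids)

print(unpadded)
print(padded)
```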

Every read or write of a single row is atomic, no matter how many columns are read or written. This design decision makes it easy for users to reason about program behavior when the same row is updated concurrently.

(2) Column family

Every column in an HBase table belongs to a column family. Column families are part of the table's schema (columns are not) and must be defined before the table is used. Column names are prefixed with their column family; for example, courses:history and courses:math both belong to the courses family.

Access control and disk and memory usage accounting are performed at the column family level. In practice, permissions on column families help manage different kinds of applications: some applications may add new base data, some may read base data and create derived column families, and some may only browse existing data (and, for privacy reasons, possibly not all of it).
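The family prefix convention can be illustrated with a small Python helper. The function name split_column is hypothetical, and "qualifier" is used here as the name for the part after the colon; the sketch only shows how a full column name decomposes.

```python
def split_column(name):
    """Split 'family:qualifier' at the first colon; the qualifier may be empty."""
    family, _, qualifier = name.partition(":")
    return family, qualifier


history = split_column("courses:history")
math = split_column("courses:math")
print(history, math)
```

Both names share the family courses, so in the model described above they would be stored and managed together.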

(3) Cell

The storage unit identified by a row and a column in HBase is called a cell. A cell is uniquely determined by {row key, column (= family + qualifier), version}. Data in a cell is untyped and is stored as raw bytes.

(4) Timestamp

Each cell holds multiple versions of the same piece of data, indexed by timestamp. A timestamp is a 64-bit integer. It can be assigned by HBase automatically when the data is written, in which case it is the current system time in milliseconds, or assigned explicitly by the client. An application that needs to avoid version conflicts must generate its own unique timestamps. Within each cell, versions are sorted in reverse chronological order, so the most recent data comes first.
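The versioning behavior can be modeled in a few lines of Python. This is a simplified sketch, not HBase code: the class name Cell, its methods, and the rule that a repeated timestamp overwrites the earlier value in this model are assumptions made to illustrate why unique timestamps matter.

```python
import time


class Cell:
    """Hypothetical model of one cell: timestamp (ms) -> value."""

    def __init__(self):
        self.versions = {}

    def put(self, value, timestamp=None):
        # With no explicit timestamp, use the current time in milliseconds.
        if timestamp is None:
            timestamp = int(time.time() * 1000)
        # In this model, writing the same timestamp twice replaces that version.
        self.versions[timestamp] = value

    def read(self, max_versions=1):
        """Return (timestamp, value) pairs, newest first."""
        newest_first = sorted(self.versions.items(), reverse=True)
        return newest_first[:max_versions]


cell = Cell()
cell.put("abc", timestamp=1)
cell.put("gdxdf", timestamp=2)
print(cell.read())
print(cell.read(max_versions=2))
```

A default read returns only the newest version; asking for more versions walks back in time.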

To reduce the management burden (storage and indexing) caused by too many versions, HBase provides two ways to reclaim old versions: keep only the last n versions of the data, or keep only the versions from a recent period (for example, the last seven days). Both can be set per column family.
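The two retention rules can be sketched as a single pruning function. This is an illustrative Python model only: the function name prune, the parameters max_versions, ttl_ms, and now_ms, and the sample data are hypothetical, and the sketch does not reflect when or how HBase actually performs cleanup.

```python
def prune(versions, max_versions=None, ttl_ms=None, now_ms=0):
    """Return the versions that survive cleanup in this simplified model.

    versions: dict mapping timestamp (ms) -> value.
    max_versions: keep only the newest n versions, if given.
    ttl_ms: keep only versions written within ttl_ms of now_ms, if given.
    """
    kept = sorted(versions.items(), reverse=True)  # newest first
    if ttl_ms is not None:
        kept = [(ts, v) for ts, v in kept if now_ms - ts <= ttl_ms]
    if max_versions is not None:
        kept = kept[:max_versions]
    return dict(kept)


DAY_MS = 24 * 60 * 60 * 1000
now = 10 * DAY_MS
versions = {
    now - 8 * DAY_MS: "old",
    now - 3 * DAY_MS: "mid",
    now - 1 * DAY_MS: "new",
    now: "latest",
}

last_two = prune(versions, max_versions=2, now_ms=now)
last_week = prune(versions, ttl_ms=7 * DAY_MS, now_ms=now)
print(list(last_two.values()))
print(list(last_week.values()))
```

Keeping the last two versions drops everything but the two newest entries, while a seven-day window drops only the entry written eight days ago.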

3. Basic usage of HBase shell

HBase provides a shell for interacting with users. Run the help command to see help information for the shell commands.

The following example uses an online student score table to demonstrate the HBase shell.

Columns: name (row key), grade, course (math, art)

Rows: zkb, baoniu

Here grade is a column family with a single unnamed column, and course is a column family consisting of two columns, math and art. More columns, such as computer or physics, can be added to the course column family as needed. Note the value 90 in the figure: a column under a column family may have an empty name.

(1) Create a table scores with two column families, grade and course

hbase(main):001:0> create 'scores', 'grade', 'course'

0 row(s) in 0.4780 seconds

(2) List the tables in the current HBase instance

hbase(main):002:0> list

TABLE

scores

1 row(s) in 0.0270 seconds

(3) View the structure of the table

hbase(main):004:0> describe 'scores'

DESCRIPTION ENABLED

{NAME => 'scores', FAMILIES => [{NAME => 'course', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}, {NAME => 'grade', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]} true

1 row(s) in 0.0390 seconds

(4) Add a row with row key zkb, storing the value 5 in column family grade under an empty column name

hbase(main):006:0> put 'scores', 'zkb', 'grade:', '5'

0 row(s) in 0.0420 seconds

(5) Add the column math with value 97 to the course column family of the zkb row

hbase(main):007:0> put 'scores', 'zkb', 'course:math', '97'

0 row(s) in 0.0270 seconds

(6) Add the column art with value 87 to the course column family of the zkb row

hbase(main):008:0> put 'scores', 'zkb', 'course:art', '87'

0 row(s) in 0.0260 seconds

(7) Add a row with row key baoniu, storing the value 4 in column family grade under an empty column name

hbase(main):009:0> put 'scores', 'baoniu', 'grade:', '4'

0 row(s) in 0.0260 seconds

(8) Add the column math with value 89 to the course column family of the baoniu row

hbase(main):010:0> put 'scores', 'baoniu', 'course:math', '89'

0 row(s) in 0.0270 seconds

(9) Add the column art with value 80 to the course column family of the baoniu row

hbase(main):011:0> put 'scores', 'baoniu', 'course:art', '80'

0 row(s) in 0.0270 seconds

(10) View the data for row zkb in the scores table

hbase(main):012:0> get 'scores', 'zkb'

COLUMN CELL

course:art timestamp=1316100110921, value=87

course:math timestamp=1316100025944, value=97

grade: timestamp=1316099975625, value=5

3 row(s) in 0.0480 seconds

(11) View all data in the scores table

Note: The scan command can take STARTROW and STOPROW to scan a range of rows, for example: scan 'user_test', {COLUMNS => 'info:username', LIMIT => 10, STARTROW => 'test', STOPROW => 'test2'}

hbase(main):013:0> scan 'scores'

ROW COLUMN+CELL

baoniu column=course:art, timestamp=1316100293784, value=80

baoniu column=course:math, timestamp=1316100234410, value=89

baoniu column=grade:, timestamp=1316100178609, value=4

zkb column=course:art, timestamp=1316100110921, value=87

zkb column=course:math, timestamp=1316100025944, value=97

zkb column=grade:, timestamp=1316099975625, value=5

2 row(s) in 0.0470 seconds

(12) View all data in the course column family of the scores table

hbase(main):017:0> scan 'scores', {COLUMNS => 'course'}

ROW COLUMN+CELL

baoniu column=course:art, timestamp=1316100293784, value=80

baoniu column=course:math, timestamp=1316100234410, value=89

zkb column=course:art, timestamp=1316100110921, value=87

zkb column=course:math, timestamp=1316100025944, value=97

2 row(s) in 0.0350 seconds
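To make the ordering and filtering in the two scans above easier to follow, here is a small Python model of the same data. It is a sketch under stated assumptions rather than HBase code: the dictionary scores mirrors the values written by the put commands, timestamps are omitted, and the function name scan and its family parameter are hypothetical.

```python
# Hypothetical in-memory copy of the scores table built by the put commands above.
scores = {
    "zkb": {"grade:": "5", "course:math": "97", "course:art": "87"},
    "baoniu": {"grade:": "4", "course:math": "89", "course:art": "80"},
}


def scan(table, family=None):
    """Yield (row, column, value) in row key order, then column order."""
    for row in sorted(table):
        for column in sorted(table[row]):
            # Keep the cell if no family filter is set or the prefix matches.
            if family is None or column.split(":", 1)[0] == family:
                yield row, column, table[row][column]


all_cells = list(scan(scores))
course_cells = list(scan(scores, family="course"))
for row, column, value in course_cells:
    print(row, column, value)
```

The model returns baoniu before zkb because rows come back in row key order, and the family filter drops the grade cells while still covering both rows.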

(13) Delete the scores table

hbase(main):024:0> disable 'scores'

0 row(s) in 0.0330 seconds

hbase(main):025:0> drop 'scores'

0 row(s) in 1.0840 seconds

This article is from the "Yang Hailong blog" blog, make sure to keep this source http://7218743.blog.51cto.com/7208743/1704072


