[Reprinted] Basic concepts of HBase and common commands and usage of the HBase shell

Last Update:2018-12-05 Source: Internet

Author: User



1. Introduction

HBase is a distributed, column-oriented, open-source database. It originated from Google's paper "Bigtable: A Distributed Storage System for Structured Data" and is an open-source implementation of Google Bigtable. It uses Hadoop HDFS as its file storage system, Hadoop MapReduce to process massive data stored in HBase, and ZooKeeper as its coordination service.

2. HBase table structure

HBase stores data in tables. A table consists of rows and columns, and the columns are grouped into column families.

Row key	Column-family1		Column-family2			Column-family3
Row key	Column1	Column2	Column1	Column2	Column3	Column1
Key1	T1: ABC T2: gdxdf		T4: dfads T3: Hello T2: World
Key2	T3: ABC T1: gdxdf		T4: dfads T3: Hello		T2: dfdsfa T3: dfdf
Key3		T2: dfadfasd T1: dfdasddsf				T2: dfxxdfasd T1: taobao.com

As shown in the table above, key1, key2, and key3 are the unique row keys of three records, and column-family1, column-family2, and column-family3 are three column families, each containing several columns. For example, column-family1 contains two columns named column1 and column2. "T1: ABC, T2: gdxdf" is the cell uniquely identified by row key1 and column-family1:column1. The cell holds two versions of the data, ABC and gdxdf, with different timestamps T1 and T2, and by default HBase returns the value with the latest timestamp to the requester.

The specific meanings of these terms are as follows:

(1) Row key

As in other NoSQL databases, the row key is the primary key used to retrieve records. There are only three ways to access rows in an HBase table:

(1.1) access through a single row key

(1.2) access through a range of row keys

(1.3) full table scan

The row key can be any string (the maximum length is 64 KB; in practice it is generally 10-100 bytes). In HBase, the row key is stored as a byte array.

Rows are stored in lexicographic (byte) order of the row key. When designing keys, take full advantage of this sorted storage and place rows that are frequently read together next to each other (locality).
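The three access paths and the sorted storage described above can be sketched with a toy in-memory model. This is only an illustration in Python, not HBase client code: the class SortedRows, its methods, and the sample keys are all hypothetical names invented for this sketch.

```python
import bisect

class SortedRows:
    """Toy stand-in for a table whose rows are kept in row-key byte order."""

    def __init__(self):
        self._keys = []   # row keys (bytes), always kept sorted
        self._rows = {}   # row key -> row data

    def put(self, key: bytes, row: dict) -> None:
        if key not in self._rows:
            bisect.insort(self._keys, key)  # keep keys in byte order
        self._rows[key] = row

    def get(self, key: bytes):
        """(1.1) Access through a single row key."""
        return self._rows.get(key)

    def scan(self, start: bytes = b"", stop: bytes = None):
        """(1.2) Range [start, stop) of row keys; with no bounds, (1.3) a full scan."""
        lo = bisect.bisect_left(self._keys, start)
        hi = len(self._keys) if stop is None else bisect.bisect_left(self._keys, stop)
        return [(k, self._rows[k]) for k in self._keys[lo:hi]]

table = SortedRows()
for key in (b"user2", b"user10", b"admin1", b"user1"):
    table.put(key, {"info:name": key.decode()})

single = table.get(b"user1")                                # one row by key
user_range = [k for k, _ in table.scan(b"user", b"user~")]  # rows sharing a prefix
everything = [k for k, _ in table.scan()]                   # full table scan
print(user_range)   # [b'user1', b'user10', b'user2']
```

Because keys that share a prefix sit next to each other, the range scan touches one contiguous slice, which is the reason related rows are usually given related keys.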

Note:

Sorting integers lexicographically gives 1, 10, 11, 12, ..., 19, 2, 20, 21, ..., 9, 90, 91, ..., 99. To preserve the natural order of the integers, the row key must be left-padded with zeros.
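A short Python sketch of the padding effect follows. The sample ids and the fixed width of 4 digits are assumptions chosen for illustration; a real key design would pick a width that covers the largest expected value.

```python
ids = [1, 2, 9, 10, 11, 21, 99]

# Keys as plain decimal strings: byte order does not match numeric order.
plain = sorted(str(i).encode() for i in ids)

# Keys left-padded with zeros to a fixed width (4 is an assumed width).
padded = sorted(str(i).zfill(4).encode() for i in ids)

print(plain)   # [b'1', b'10', b'11', b'2', b'21', b'9', b'99']
print(padded)  # [b'0001', b'0002', b'0009', b'0010', b'0011', b'0021', b'0099']
```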

A single read or write operation on a row is atomic, no matter how many columns are read or written at a time. This design decision makes it easy for users to understand the program's behavior when concurrent updates are performed on the same row.

(2) Column family

Each column in an HBase table belongs to a column family. Column families are part of the table's schema (columns are not) and must be defined before the table is used. All column names are prefixed with their column family. For example, courses:history and courses:math both belong to the courses column family.
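As a small illustration of the family-prefix naming rule, the following Python sketch splits full column names into family and label and groups them by family. The helper names split_column and group_by_family are hypothetical and are not part of any HBase API.

```python
from collections import defaultdict

def split_column(name: str):
    """Split 'family:label' into (family, label); the label may be empty."""
    family, sep, label = name.partition(":")
    if not sep:
        raise ValueError(f"column name {name!r} has no family prefix")
    return family, label

def group_by_family(names):
    """Group column labels under the column family they belong to."""
    grouped = defaultdict(list)
    for name in names:
        family, label = split_column(name)
        grouped[family].append(label)
    return dict(grouped)

columns = ["courses:history", "courses:math", "grade:"]
by_family = group_by_family(columns)
print(by_family)  # {'courses': ['history', 'math'], 'grade': ['']}
```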

Access control and disk and memory usage accounting are all performed at the column family level. In practice, permissions on column families help manage different types of applications: some applications may add new base data, some may read base data and create derived column families, and some may only browse data (perhaps not even all of it, for privacy reasons).

(3) Cell

In HBase, the storage unit identified by a row and a column is called a cell. A cell is uniquely identified by {row key, column (= <family> + <label>), version}. Data in a cell is untyped and is stored as raw bytes.

(4) Timestamp

Each cell stores multiple versions of the same data. Versions are indexed by timestamp, which is a 64-bit integer. The timestamp can be assigned by HBase automatically when the data is written, in which case it is the current system time in milliseconds. It can also be assigned explicitly by the client. To avoid version conflicts, an application that assigns its own timestamps must generate unique ones. Within each cell, versions are sorted in reverse chronological order, so the latest data comes first.
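The versioning rules above can be modelled in a few lines of Python. This is a simplified sketch and not how HBase stores data internally; the Cell class and its methods are hypothetical names used only here.

```python
import time

class Cell:
    """Toy model of one cell: several values indexed by millisecond timestamp."""

    def __init__(self):
        self._versions = {}  # timestamp in ms -> value (bytes)

    def put(self, value: bytes, ts: int = None) -> int:
        # Without an explicit timestamp, use the current time in milliseconds.
        if ts is None:
            ts = int(time.time() * 1000)
        # In this model, reusing a timestamp replaces the earlier value,
        # which is why client-assigned timestamps need to be unique.
        self._versions[ts] = value
        return ts

    def versions(self):
        """All versions, newest first (reverse chronological order)."""
        return sorted(self._versions.items(), reverse=True)

    def get(self):
        """The latest version, or None if the cell is empty."""
        ordered = self.versions()
        return ordered[0][1] if ordered else None

cell = Cell()
cell.put(b"ABC", ts=1)
cell.put(b"gdxdf", ts=2)
print(cell.get())       # b'gdxdf'
print(cell.versions())  # [(2, b'gdxdf'), (1, b'ABC')]
```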

To avoid the management burden (storage and indexing) caused by too many versions, HBase provides two ways to reclaim old versions. The first keeps only the last n versions of the data; the second keeps only the versions written within a recent period (for example, the last seven days). Both can be set per column family.
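The two recycling policies can be sketched as plain Python functions over a list of (timestamp, value) pairs. The function names and sample data are assumptions for illustration; in HBase itself these policies are configured per column family (the VERSIONS and TTL attributes visible in the describe output later in this article appear to correspond to them).

```python
DAY_MS = 24 * 60 * 60 * 1000

def keep_last_n(versions, n):
    """Policy 1: keep only the n newest versions."""
    return sorted(versions, reverse=True)[:n]

def keep_recent(versions, now_ms, max_age_ms):
    """Policy 2: keep only versions no older than max_age_ms."""
    newest_first = sorted(versions, reverse=True)
    return [(ts, v) for ts, v in newest_first if now_ms - ts <= max_age_ms]

now = 10 * DAY_MS  # an arbitrary "current time" for the example
versions = [
    (now - 9 * DAY_MS, b"v1"),
    (now - 8 * DAY_MS, b"v2"),
    (now - 3 * DAY_MS, b"v3"),
    (now - 1 * DAY_MS, b"v4"),
]

last_three = [v for _, v in keep_last_n(versions, 3)]
last_week = [v for _, v in keep_recent(versions, now, 7 * DAY_MS)]
print(last_three)  # [b'v4', b'v3', b'v2']
print(last_week)   # [b'v4', b'v3']
```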

3. Basic usage of the HBase shell

HBase provides a shell for user interaction. Run help to view the help information for the commands, or help "get" for a single command.

The usage of HBase is demonstrated below with an example table of student scores.

Name	Grade	Course
Name	Grade	Math	Art
Zkb	5	97	87
Baoniu	4	89	80

Here, grade acts as a single column of the table, and course is a column family of the table. The course column family consists of two columns, math and art. We can add more columns to the course column family as needed, such as computer and physics. Note that a column in a column family can also have an empty name, as grade does here.

(1) Create the table scores with two column families: grade and course.

hbase(main):001:0> create 'scores','grade','course'

0 row(s) in 0.4780 seconds

(2) View the tables currently in HBase

hbase(main):002:0> list

TABLE

scores

1 row(s) in 0.0270 seconds

(3) View the table structure

hbase(main):004:0> describe 'scores'

DESCRIPTION                                                 ENABLED

{NAME => 'scores', FAMILIES => [                            true

 {NAME => 'course', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'},

 {NAME => 'grade', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}

1 row(s) in 0.0390 seconds

(4) Add a row named zkb. In the grade column family, the column name is empty ("") and the value is 5.

hbase(main):006:0> put 'scores','zkb','grade:','5'

0 row(s) in 0.0420 seconds

(5) Add the column <math, 97> to the course column family of row zkb

hbase(main):007:0> put 'scores','zkb','course:math','97'

0 row(s) in 0.0270 seconds

(6) Add the column <art, 87> to the course column family of row zkb

hbase(main):008:0> put 'scores','zkb','course:art','87'

0 row(s) in 0.0260 seconds

(7) Add a row named baoniu. In the grade column family, the column name is empty ("") and the value is 4.

hbase(main):009:0> put 'scores','baoniu','grade:','4'

0 row(s) in 0.0260 seconds

(8) Add the column <math, 89> to the course column family of row baoniu

hbase(main):010:0> put 'scores','baoniu','course:math','89'

0 row(s) in 0.0270 seconds

(9) Add the column <art, 80> to the course column family of row baoniu

hbase(main):011:0> put 'scores','baoniu','course:art','80'

0 row(s) in 0.0270 seconds

(10) View the data of row zkb in the scores table

hbase(main):012:0> get 'scores','zkb'

COLUMN                     CELL

course:art timestamp=1316100110921, value=87

course:math timestamp=1316100025944, value=97

grade: timestamp=1316099975625, value=5

3 row(s) in 0.0480 seconds

(11) View all data in the scores table

Note: The scan command can take STARTROW and STOPROW to scan a range of rows, for example: scan 'user_test', {COLUMNS => 'info:username', LIMIT => 10, STARTROW => 'test', STOPROW => 'test2'}

hbase(main):013:0> scan 'scores'

ROW                        COLUMN+CELL

baoniu column=course:art, timestamp=1316100293784, value=80

baoniu column=course:math, timestamp=1316100234410, value=89

baoniu column=grade:, timestamp=1316100178609, value=4

zkb column=course:art, timestamp=1316100110921, value=87

zkb column=course:math, timestamp=1316100025944, value=97

zkb column=grade:, timestamp=1316099975625, value=5

2 row(s) in 0.0470 seconds

(12) View all data in the course column family of the scores table

hbase(main):017:0> scan 'scores', {COLUMNS => 'course'}

ROW                        COLUMN+CELL

baoniu column=course:art, timestamp=1316100293784, value=80

baoniu column=course:math, timestamp=1316100234410, value=89

zkb column=course:art, timestamp=1316100110921, value=87

zkb column=course:math, timestamp=1316100025944, value=97

2 row(s) in 0.0350 seconds

(13) Delete the scores table. A table must be disabled before it can be dropped.

hbase(main):024:0> disable 'scores'

0 row(s) in 0.0330 seconds

hbase(main):025:0> drop 'scores'

0 row(s) in 1.0840 seconds

To sum up, commonly used HBase shell commands include create, describe, disable, drop, list, scan, put, get, delete, deleteall, count, and status. Run help to see their detailed usage.

This article is reprinted from: http://archive.cnblogs.com/a/2178064/
