[Reprinted] Basic concepts of HBase and common commands and usage of the HBase shell


1. Introduction

HBase is a distributed, column-oriented, open-source database. It originated from Google's paper "Bigtable: A Distributed Storage System for Structured Data" and is an open-source implementation of Google Bigtable. It uses Hadoop HDFS as its file storage system, Hadoop MapReduce to process massive amounts of data in HBase, and ZooKeeper as its coordination service.

2. HBase table structure

HBase stores data in tables. A table consists of rows and columns, and the columns are grouped into several column families.

Row key | Column-family1                                   | Column-family2                                                   | Column-family3
        | column1            | column2                     | column1                         | column2 | column3              | column1
key1    | T1: ABC, T2: gdxdf |                             | T4: dfads, T3: Hello, T2: World |         |                      |
key2    | T3: ABC, T1: gdxdf |                             | T4: dfads, T3: Hello            |         | T2: dfdsfa, T3: dfdf |
key3    |                    | T2: dfadfasd, T1: dfdasddsf |                                 |         |                      | T2: dfxxdfasd, T1: taobao.com

As shown in the table above, key1, key2, and key3 are the unique row keys of three records. column-family1, column-family2, and column-family3 are three column families, each containing several columns. For example, column-family1 contains two columns named column1 and column2. "T1: ABC, T2: gdxdf" is the cell uniquely identified by row key1 and column-family1:column1. The cell holds two versions of the data, ABC and gdxdf, with different timestamps T1 and T2; HBase returns the value with the latest timestamp to the requester.
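The lookup rule described above can be sketched in a few lines of Python. This is only an in-memory illustration of the logical model, not how HBase stores data; the `table` dictionary and the `get_latest` helper are hypothetical names, and the timestamps T1 and T2 are represented as the integers 1 and 2.

```python
# Logical view of an HBase table: row key -> "family:column" -> {timestamp: value}.
# Illustrative only; timestamps T1 and T2 from the table are modeled as 1 and 2.
table = {
    "key1": {
        "column-family1:column1": {1: "ABC", 2: "gdxdf"},
    },
}


def get_latest(table, row_key, column):
    """Return the value with the highest timestamp, or None if the cell is empty."""
    versions = table.get(row_key, {}).get(column, {})
    if not versions:
        return None
    return versions[max(versions)]


print(get_latest(table, "key1", "column-family1:column1"))  # gdxdf
```

Since T2 is later than T1, the lookup returns gdxdf rather than ABC.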

The specific meanings of these terms are as follows:

(1) Row key

As in other NoSQL databases, the row key is the primary key used to retrieve records. There are only three ways to access rows in an HBase table:

(1.1) Access through a single row key

(1.2) Access through a range of row keys

(1.3) Full table scan
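The three access paths can be mimicked over a sorted list of keys. The sketch below is a hypothetical in-memory stand-in, not an HBase client: `rows`, `get`, `range_scan`, and `full_scan` are made-up names, and the range scan assumes an inclusive start and an exclusive stop, which is how the shell's scan treats STARTROW and STOPROW.

```python
from bisect import bisect_left

# Hypothetical in-memory stand-in for a table: rows kept sorted by row key.
rows = {"key1": "row 1", "key2": "row 2", "key3": "row 3"}
sorted_keys = sorted(rows)


def get(row_key):
    """(1.1) Access a single row by its row key."""
    return rows.get(row_key)


def range_scan(start_row, stop_row):
    """(1.2) Rows with start_row <= key < stop_row, in key order."""
    lo = bisect_left(sorted_keys, start_row)
    hi = bisect_left(sorted_keys, stop_row)
    return [(k, rows[k]) for k in sorted_keys[lo:hi]]


def full_scan():
    """(1.3) Every row, in key order."""
    return [(k, rows[k]) for k in sorted_keys]


print(get("key2"))                 # row 2
print(range_scan("key1", "key3"))  # [('key1', 'row 1'), ('key2', 'row 2')]
```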

The row key can be any string (the maximum length is 64 KB; in practice it is typically 10-100 bytes). In HBase, the row key is stored as a byte array.

Data is stored in lexicographic order (byte order) of the row key. When designing keys, take advantage of this sorted storage by giving rows that are frequently read together keys that sort next to each other (locality).

Note:

Sorting integers lexicographically gives 1, 10, 11, 12, ..., 19, 2, 20, 21, ..., 9, 90, 91, ..., 99. To preserve the natural order of the integers, the row key must be left-padded with zeros.
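The effect is easy to reproduce. In this small sketch the sample IDs and the padding width of 3 are arbitrary choices for illustration:

```python
ids = [1, 2, 9, 10, 11, 21, 99]

# Plain decimal strings sort lexicographically, not numerically.
plain_keys = sorted(str(i) for i in ids)

# Left-padding with zeros to a fixed width (3 here, an arbitrary choice)
# makes lexicographic order match numeric order.
padded_keys = sorted(f"{i:03d}" for i in ids)

print(plain_keys)   # ['1', '10', '11', '2', '21', '9', '99']
print(padded_keys)  # ['001', '002', '009', '010', '011', '021', '099']
```

The width has to be large enough for the largest key you expect, because changing it later changes the sort position of every existing key.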

A single read or write on a row is atomic, no matter how many columns are read or written at a time. This design decision makes it easy to reason about a program's behavior when the same row is updated concurrently.

(2) Column family

Each column in an HBase table belongs to a column family. Column families, unlike columns, are part of the table's schema and must be defined before the table is used. Every column name is prefixed with its column family. For example, courses:history and courses:math both belong to the courses column family.
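As a rough illustration of the naming rule, the following sketch splits a column name into its family and the rest of the name and rejects families that are not in the schema. `SCHEMA_FAMILIES` and `split_column` are hypothetical names used only for this example.

```python
# Families declared in the table schema (hypothetical example schema).
SCHEMA_FAMILIES = {"courses"}


def split_column(name):
    """Split 'family:qualifier' into its two parts; the qualifier may be empty."""
    family, _, qualifier = name.partition(":")
    if family not in SCHEMA_FAMILIES:
        raise ValueError(f"unknown column family: {family!r}")
    return family, qualifier


print(split_column("courses:history"))  # ('courses', 'history')
print(split_column("courses:math"))     # ('courses', 'math')
```

New columns such as courses:physics need no schema change, while a column in an undeclared family is rejected.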

Access control and disk and memory usage accounting are all performed at the column family level. In practice, permissions on column families help manage different types of applications: some applications may add new base data, some may read the base data and create derived column families, and some may only browse data (possibly not even all of it, for privacy reasons).

(3) Cell

In HBase, the storage unit identified by a row and a column is called a cell. A cell is uniquely identified by {row key, column (= <family> + <label>), version}. The data in a cell is untyped and is stored as raw bytes.

(4) Timestamp

Each cell stores multiple versions of the same data, indexed by timestamp. The timestamp is a 64-bit integer. It can be assigned by HBase automatically when the data is written, in which case it is the current system time in milliseconds, or it can be assigned explicitly by the client. To avoid version conflicts, an application that assigns its own timestamps must generate unique ones. Within each cell, versions are sorted in reverse chronological order, so the latest data comes first.

To avoid the management burden (both storage and indexing) caused by too many versions, HBase provides two ways to reclaim old versions: keep only the last n versions of the data, or keep only the versions from a recent period (for example, the last seven days). Either can be set per column family.
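The two reclamation policies can be sketched as plain functions over one cell's versions. This is a simplified illustration, not HBase's implementation; `keep_last_n`, `keep_recent`, and the sample cell are hypothetical.

```python
def keep_last_n(versions, n):
    """Keep only the n newest versions. `versions` maps timestamp (ms) -> value."""
    newest_first = sorted(versions, reverse=True)
    return {ts: versions[ts] for ts in newest_first[:n]}


def keep_recent(versions, now_ms, max_age_ms):
    """Keep only versions written within the last max_age_ms milliseconds."""
    return {
        ts: value
        for ts, value in sorted(versions.items(), reverse=True)
        if now_ms - ts <= max_age_ms
    }


DAY_MS = 24 * 60 * 60 * 1000
now = 10 * DAY_MS
cell = {now: "v3", now - 2 * DAY_MS: "v2", now - 8 * DAY_MS: "v1"}

print(list(keep_last_n(cell, 2).values()))                # ['v3', 'v2']
print(list(keep_recent(cell, now, 7 * DAY_MS).values()))  # ['v3', 'v2']
```

Both functions return the surviving versions newest first, matching the reverse chronological order described above.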

3. Basic usage of the HBase shell

HBase provides a shell for user interaction. Run help to view help for the commands, or help followed by a quoted command name (for example, help 'get') for a single command.

The usage of HBase is demonstrated below with an example student score table.

name   | grade | course
       |       | math | art
zkb    | 5     | 97   | 87
baoniu | 4     | 89   | 80

Here, grade acts as a single column of the table (in HBase terms, a column family whose only column has an empty name), and course is a column family of the table. The course column family consists of two columns, math and art. We can create more columns in course as needed, such as computer and physics. Note the value 90 in the figure: a column in a column family can also have no name.

(1) Create a table. The scores table has two column families: grade and course.

hbase(main):001:0> create 'scores','grade','course'
0 row(s) in 0.4780 seconds

(2) View the tables in the current HBase instance

hbase(main):002:0> list
TABLE
scores
1 row(s) in 0.0270 seconds

(3) View the table structure

hbase(main):004:0> describe 'scores'
DESCRIPTION                                                                                           ENABLED
 {NAME => 'scores', FAMILIES => [{NAME => 'course', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0',  true
 COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY =>
 'false', BLOCKCACHE => 'true'}, {NAME => 'grade', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0',
 COMPRESSION => 'NONE', VERSIONS => '3', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY =>
 'false', BLOCKCACHE => 'true'}]}
1 row(s) in 0.0390 seconds

(4) Add a row of data: row key zkb, column family grade, empty column name, value 5

hbase(main):006:0> put 'scores','zkb','grade:','5'
0 row(s) in 0.0420 seconds

(5) Add a column to the course column family of row zkb: <math, 97>

hbase(main):007:0> put 'scores','zkb','course:math','97'
0 row(s) in 0.0270 seconds

(6) Add a column to the course column family of row zkb: <art, 87>

hbase(main):008:0> put 'scores','zkb','course:art','87'
0 row(s) in 0.0260 seconds

(7) Add a row of data: row key baoniu, column family grade, empty column name, value 4

hbase(main):009:0> put 'scores','baoniu','grade:','4'
0 row(s) in 0.0260 seconds

(8) Add a column to the course column family of row baoniu: <math, 89>

hbase(main):010:0> put 'scores','baoniu','course:math','89'
0 row(s) in 0.0270 seconds

(9) Add a column to the course column family of row baoniu: <art, 80>

hbase(main):011:0> put 'scores','baoniu','course:art','80'
0 row(s) in 0.0270 seconds

(10) View the data of row zkb in the scores table

hbase(main):012:0> get 'scores','zkb'
COLUMN                     CELL                                                                                                               
 course:art                              timestamp=1316100110921, value=87
 course:math                             timestamp=1316100025944, value=97
 grade:                                  timestamp=1316099975625, value=5
3 row(s) in 0.0480 seconds

(11) View all data in the scores table

Note: The scan command can take STARTROW and STOPROW to scan a range of rows, for example: scan 'user_test', {COLUMNS => 'info:username', LIMIT => 10, STARTROW => 'test', STOPROW => 'test2'}

hbase(main):013:0> scan 'scores'
ROW                        COLUMN+CELL                                                                                                        
 baoniu                                  column=course:art, timestamp=1316100293784, value=80
 baoniu                                  column=course:math, timestamp=1316100234410, value=89
 baoniu                                  column=grade:, timestamp=1316100178609, value=4
 zkb                                     column=course:art, timestamp=1316100110921, value=87
 zkb                                     column=course:math, timestamp=1316100025944, value=97
 zkb                                     column=grade:, timestamp=1316099975625, value=5
2 row(s) in 0.0470 seconds

(12) View all data in the course column family of the scores table

hbase(main):017:0> scan 'scores', {COLUMNS => 'course'}
ROW                        COLUMN+CELL                                                                                                        
 baoniu                                  column=course:art, timestamp=1316100293784, value=80
 baoniu                                  column=course:math, timestamp=1316100234410, value=89
 zkb                                     column=course:art, timestamp=1316100110921, value=87
 zkb                                     column=course:math, timestamp=1316100025944, value=97
2 row(s) in 0.0350 seconds

(13) Delete the scores table (a table must be disabled before it can be dropped)

hbase(main):024:0> disable 'scores'
0 row(s) in 0.0330 seconds

hbase(main):025:0> drop 'scores'
0 row(s) in 1.0840 seconds

To sum up, commonly used HBase shell commands include create, describe, disable, drop, list, scan, put, get, delete, deleteall, count, and status. You can see their detailed usage through help.

This article is reprinted from: http://archive.cnblogs.com/a/2178064/
