HBase Introduction

Last Update:2014-12-25 Source: Internet

Author: User



First, Introduction
History
Started by Chad Walters and Jim
2006.11 Google releases paper on Bigtable
2007.2 Initial HBase prototype created as Hadoop contrib
2007.10 First usable HBase
2008.1 Hadoop becomes an Apache top-level project and HBase becomes a subproject
2008.10 HBase 0.18, 0.19 released
HBase is an open source implementation modeled on Bigtable. It is built on top of HDFS and provides a highly reliable, high-performance, column-oriented, scalable database system with real-time reads and writes. It sits between NoSQL and RDBMS: data can be retrieved only by primary key (row key) or by a range of row keys, and only single-row transactions are supported (complex operations such as multi-table joins can be implemented through Hive). It is mainly used to store unstructured and semi-structured sparse data.
Like Hadoop, HBase relies mainly on horizontal scaling: computing and storage capacity grow by adding inexpensive commodity servers.
Tables in HBase typically have these characteristics:
1 Large: a table can have hundreds of billions of rows and millions of columns.
2 Column-oriented: storage and access control are organized by column (family), and each column (family) is retrieved independently.
3 Sparse: null columns occupy no storage space, so a table can be designed to be very sparse.
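The sparseness property can be illustrated with a small in-memory model. This is only a sketch in Python, not HBase itself: the `rows` dictionary and the column names are hypothetical, and the point is simply that a column missing from a row takes up no entry at all.

```python
# Hypothetical sparse rows: each row stores only the columns it actually has.
rows = {
    b"row1": {"info:name": b"alice", "info:email": b"alice@example.com"},
    b"row2": {"info:name": b"bob"},        # no email, so nothing is stored
    b"row3": {"stats:visits": b"42"},
}

# Every column that appears anywhere in the table.
all_columns = {col for cells in rows.values() for col in cells}

# Cells actually stored versus cells a dense rows-by-columns grid would hold.
stored_cells = sum(len(cells) for cells in rows.values())
dense_cells = len(rows) * len(all_columns)

print(stored_cells, dense_cells)  # 4 9
```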
The following figure shows the position of HBase in the Hadoop ecosystem.

Second, Logical View
HBase stores data in tables. A table consists of rows and columns, and the columns are grouped into column families.
Row key
As in other NoSQL databases, the row key is the primary key used to retrieve records. There are only three ways to access rows in an HBase table:
1 Access by a single row key
2 Access by a range of row keys
3 Full table scan
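The three access patterns can be illustrated with a small in-memory model. This is only a sketch in Python, not the HBase client API: the `rows` dictionary, the `sorted_keys` list, and the function names `get`, `scan_range`, and `scan_all` are hypothetical, chosen for illustration. The range scan here assumes an inclusive start key and an exclusive stop key.

```python
from bisect import bisect_left

# Hypothetical in-memory stand-in for a table: row key (bytes) -> row data.
rows = {
    b"user001": {"info:name": b"alice"},
    b"user002": {"info:name": b"bob"},
    b"user010": {"info:name": b"carol"},
    b"user100": {"info:name": b"dave"},
}
# Row keys kept in byte order.
sorted_keys = sorted(rows)


def get(key):
    """1. Access by a single row key (None if the row does not exist)."""
    return rows.get(key)


def scan_range(start, stop):
    """2. Access by a range of row keys: start inclusive, stop exclusive."""
    lo = bisect_left(sorted_keys, start)
    hi = bisect_left(sorted_keys, stop)
    return [(k, rows[k]) for k in sorted_keys[lo:hi]]


def scan_all():
    """3. Full table scan, visiting every row in row key order."""
    return [(k, rows[k]) for k in sorted_keys]
```

Because the keys are kept sorted, the range scan only needs two binary searches to find its slice, while the full scan has to touch every row.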
A row key can be any string (the maximum length is 64 KB; in practice it is generally 10 to 100 bytes). Inside HBase, the row key is stored as a byte array. Data is stored sorted by the dictionary order (byte order) of the row key. When designing keys, take advantage of this sorted storage and place rows that are often read together next to each other (locality).
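A minimal sketch of what byte-order sorting means for numeric row keys, written in Python with plain byte strings rather than HBase itself. The key width of 3 and the helper name `padded_key` are assumptions made for the example.

```python
def padded_key(n, width=3):
    """Encode a non-negative integer as a fixed-width, zero-padded row key."""
    return str(n).zfill(width).encode("ascii")


numbers = [1, 2, 9, 10, 11, 100]

# Unpadded keys are compared byte by byte, so b"10" lands before b"2".
unpadded = sorted(str(n).encode("ascii") for n in numbers)

# Zero-padded keys of equal width sort in the same order as the integers.
padded = sorted(padded_key(n) for n in numbers)

print(unpadded)  # [b'1', b'10', b'100', b'11', b'2', b'9']
print(padded)    # [b'001', b'002', b'009', b'010', b'011', b'100']
```

The width has to be large enough for the largest expected value, since a key that overflows the width breaks the ordering again.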
Note:
Sorting integers in dictionary order gives
1,10,100,11,12,13,14,15,16,17,18,19,2,20,21,...,9,91,92,93,94,95,96,97,98,99. To preserve the natural ordering of integers, row keys must be left-padded with zeros.
A read or write of a single row is an atomic operation, no matter how many columns are read or written. This design decision makes it easy for users to reason about program behavior when concurrent updates are applied to the same row.
Column family
Each column in an HBase table belongs to a column family. The column family is part of the table's schema (the column is not) and must be defined before the table is used. Column names are prefixed by the column family. For example, courses:history and courses:math both belong to the courses family.
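The difference between column families (fixed in the schema) and columns (created freely) can be sketched as follows. This is a hypothetical Python model, not the HBase API: the `Table` class and its `put` method are invented for the example.

```python
class Table:
    """Hypothetical table model: families are fixed up front, columns are not."""

    def __init__(self, families):
        self.families = set(families)  # part of the schema
        self.rows = {}                 # row key -> {"family:label": value}

    def put(self, row_key, column, value):
        # The family is the part of the column name before the first colon.
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family: {family!r}")
        self.rows.setdefault(row_key, {})[column] = value


table = Table(families=["courses"])
table.put(b"student1", "courses:history", b"A")
table.put(b"student1", "courses:math", b"B")  # new column, no schema change
```

Writing to `grades:final` in this model raises `KeyError`, because the `grades` family was never declared.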
Access control and disk and memory usage accounting are performed at the column family level. In practice, controlling permissions per column family helps manage different types of applications: some applications are allowed to add new base data, some can read base data and create derived column families, and some are only allowed to view data (and, for privacy reasons, may not even be able to view all of it).
Timestamp
In HBase, the storage unit determined by a row and a column is called a cell. Each cell holds multiple versions of the same data, and the versions are indexed by timestamp. The timestamp is a 64-bit integer. Timestamps can be assigned by HBase automatically when data is written, in which case the timestamp is the current system time with millisecond precision. The timestamp can also be assigned explicitly by the client. If an application wants to avoid data version conflicts, it must generate unique timestamps itself. Within each cell, the versions are sorted in reverse chronological order, so the most recent data comes first.
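The versioning behavior can be sketched with a small Python model. This is an illustration under stated assumptions, not HBase code: the `Cell` class and its `put` and `get` methods are hypothetical.

```python
import time


class Cell:
    """Hypothetical model of one cell holding several timestamped versions."""

    def __init__(self):
        self.versions = {}  # timestamp in milliseconds (int) -> value (bytes)

    def put(self, value, timestamp=None):
        # Without an explicit timestamp, fall back to the current time in ms.
        if timestamp is None:
            timestamp = int(time.time() * 1000)
        self.versions[timestamp] = value

    def get(self, max_versions=1):
        # Newest first: sort timestamps in descending order.
        newest = sorted(self.versions, reverse=True)[:max_versions]
        return [(ts, self.versions[ts]) for ts in newest]


cell = Cell()
cell.put(b"v1", timestamp=1000)
cell.put(b"v3", timestamp=3000)
cell.put(b"v2", timestamp=2000)
print(cell.get(max_versions=2))  # [(3000, b'v3'), (2000, b'v2')]
```

In this model, two writes with the same timestamp overwrite each other, which is the kind of conflict that unique timestamps avoid.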
To avoid the management burden (including storage and indexing) caused by too many versions of the data, HBase provides two ways to reclaim old versions. The first keeps the last n versions of the data; the second keeps only the versions from a recent period (for example, the last seven days). Users can configure this for each column family.
Cell
A cell is uniquely determined by {row key, column (=<family> + <label>), version}. The data in a cell has no type and is stored entirely as bytes.
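The cell coordinates and the two version-retention policies can be sketched together in Python. This is a hypothetical model, not HBase's implementation: the `store` dictionary, the `prune` function, and its parameters `max_versions` and `ttl_ms` are names invented for the example.

```python
# Hypothetical store: (row key, "family:label", timestamp in ms) -> raw bytes.
store = {
    (b"row1", "courses:math", 1000): b"60",
    (b"row1", "courses:math", 2000): b"75",
    (b"row1", "courses:math", 3000): b"90",
    (b"row1", "courses:history", 1500): b"\x00\x01",
}


def prune(store, now_ms, max_versions=None, ttl_ms=None):
    """Return a copy of store keeping only versions allowed by the policies."""
    # Group the timestamps of every (row, column) pair.
    by_cell = {}
    for (row, column, ts) in store:
        by_cell.setdefault((row, column), []).append(ts)

    kept = {}
    for (row, column), stamps in by_cell.items():
        stamps.sort(reverse=True)              # newest first
        if max_versions is not None:
            stamps = stamps[:max_versions]     # policy 1: keep the last n
        if ttl_ms is not None:
            # Policy 2: keep only versions written within the last ttl_ms.
            stamps = [ts for ts in stamps if now_ms - ts <= ttl_ms]
        for ts in stamps:
            kept[(row, column, ts)] = store[(row, column, ts)]
    return kept
```

For example, `prune(store, now_ms=3000, max_versions=2)` drops only the oldest `courses:math` version, while `prune(store, now_ms=3000, ttl_ms=1000)` keeps only the versions written at 2000 and 3000.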

This article is an English version of an article originally published in Chinese on aliyun.com.
