Hadoop learning notes: HBase theory

Last Update:2014-09-04 Source: Internet

Author: User

Tags columnar database

HBase

HBase is a distributed, column-oriented, open source database. The technology comes from the Google paper "Bigtable: A Distributed Storage System for Structured Data" by Chang et al.

Just as Bigtable builds on the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Hadoop.

HBase started as a subproject of the Apache Hadoop project and has been a top-level Apache project since 2010.

HBase differs from general relational databases in two ways. It is suited to storing unstructured data, and it is column-oriented rather than row-oriented.

Bigtable ideas:

Take a table that stores student records as an example:

Following the Bigtable idea, it becomes one big table with three columns: student ID (key), attribute (name, age, and so on), and value.

Any table can be represented as a big table with three columns:

Row key (the ID of the object),

Attribute,

Value

Bigtable query: fast key-value lookup
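The three-column idea above can be sketched in a few lines of Python. This is only an illustration, not HBase code: the student records, the field names, and the helpers `to_triples` and `lookup` are all made up for the example.

```python
# Hypothetical student records in a conventional wide (relational) layout.
students = [
    {"id": "s001", "name": "Alice", "age": "20"},
    {"id": "s002", "name": "Bob", "age": "21"},
]


def to_triples(rows, key_field="id"):
    """Flatten wide rows into (row key, attribute, value) triples."""
    triples = []
    for row in rows:
        for attribute, value in row.items():
            if attribute != key_field:
                triples.append((row[key_field], attribute, value))
    return triples


def lookup(triples, row_key, attribute):
    """Key-value style query: build an index on (row key, attribute)."""
    index = {(key, attr): value for key, attr, value in triples}
    return index.get((row_key, attribute))


triples = to_triples(students)
for triple in triples:
    print(triple)
print(lookup(triples, "s002", "age"))
```

Each student becomes one triple per attribute, so adding a new attribute to one student needs no schema change for the others.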

HBase logical model:

Data is stored in the form of tables;

A table consists of rows and columns. Each column belongs to a column family. The storage unit determined by a row and a column is called a cell;

Each cell stores multiple versions of the same data, which are identified by timestamps;

HBase can hold multiple tables;

Column families must be defined in advance, but the columns inside a column family do not. A column name is the family name and the qualifier, separated by a delimiter;

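A row key identifies one logical row, but in the three-column representation the same row key is repeated once for every cell of that row;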
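One way to picture the logical model above is as nested maps: row key, then column, then timestamp, then value. The sketch below is a toy in-memory model under that assumption. The `Table` class and its `put`/`get` methods are hypothetical names for this example and are not the HBase client API.

```python
from collections import defaultdict


class Table:
    """Toy model: row key -> "family:qualifier" -> {timestamp: value}."""

    def __init__(self, name, families):
        self.name = name
        self.families = set(families)  # column families are fixed up front
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, column, value, timestamp):
        family, _, _qualifier = column.partition(":")
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        # Qualifiers need no prior definition; a new one is simply stored.
        self.rows[row_key][column][timestamp] = value

    def get(self, row_key, column):
        """Return the newest version of a cell, or None if it is absent."""
        versions = self.rows.get(row_key, {}).get(column, {})
        if not versions:
            return None
        return versions[max(versions)]


table = Table("student", families=["info"])
table.put("s001", "info:name", b"Alice", timestamp=1)
table.put("s001", "info:name", b"Alicia", timestamp=2)   # second version
table.put("s001", "info:email", b"a@example.com", timestamp=3)  # new column
print(table.get("s001", "info:name"))
```

Writing to a family that was never declared (for example `grades:math` here) raises an error, while a new qualifier inside `info` is accepted without any schema change.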

HBase works around the fact that files in HDFS cannot be modified in place:

Deletes are recorded as markers instead of removing data;

Writes go to a buffer in memory first, and modifications (appends) are reorganized there;

After data has accumulated in memory for a certain period of time, it is written to disk as a file in one block;

Data is reorganized and merged at regular intervals: small files are merged into larger ones, and this is when marked deletions are actually resolved.

Time-oriented queries: suitable for social networking websites and similar application scenarios
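The write path described above (buffer in memory, flush to immutable files, merge later, deletes as markers) can be sketched as a toy store. This is a simplified illustration rather than HBase's implementation: `ToyStore`, `TOMBSTONE`, and the `flush_threshold` parameter are assumptions made for the example, and plain Python dicts stand in for files on disk.

```python
TOMBSTONE = object()  # marker meaning "this key was deleted"


class ToyStore:
    def __init__(self, flush_threshold=2):
        self.memstore = {}   # in-memory buffer of recent writes
        self.files = []      # immutable "files", oldest first
        self.flush_threshold = flush_threshold

    def put(self, key, value):
        self.memstore[key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def delete(self, key):
        # Nothing already written is modified; a delete is one more write.
        self.put(key, TOMBSTONE)

    def flush(self):
        """Write the buffer out as a new sorted, immutable file."""
        if self.memstore:
            ordered = sorted(self.memstore.items(), key=lambda kv: kv[0])
            self.files.append(dict(ordered))
            self.memstore = {}

    def get(self, key):
        # Newest data wins: check the buffer, then files newest to oldest.
        for layer in [self.memstore] + self.files[::-1]:
            if key in layer:
                value = layer[key]
                return None if value is TOMBSTONE else value
        return None

    def compact(self):
        """Merge all files into one and drop deleted keys for good."""
        merged = {}
        for data in self.files:  # oldest first, so newer values overwrite
            merged.update(data)
        live = {k: v for k, v in merged.items() if v is not TOMBSTONE}
        self.files = [live] if live else []


store = ToyStore(flush_threshold=2)
store.put("a", "1")
store.put("b", "2")   # buffer is full, so it is flushed to file 1
store.delete("a")     # marker only; file 1 still contains "a"
store.put("c", "3")   # flushed to file 2 together with the marker
print(len(store.files), store.get("a"), store.get("b"))
store.compact()
print(store.files)
```

Until `compact` runs, the deleted key still exists in the first file and is only hidden by the marker in the second one. The merge is what finally removes it.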

Row key:

The row key is the unique identifier of a data row in the table and serves as the primary key for record retrieval;

There are only three ways to access the rows in a table:

Access through a single row key

Range access over a given range of row keys

Full table scan

The row key can be any string of at most 64 KB, and rows are stored in lexicographic order of their row keys;

For rows that are frequently read together, the row keys must be designed carefully so that those rows are stored next to each other;
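The three access paths and the effect of lexicographic ordering can be illustrated with a sorted list of keys. This is a minimal sketch, not the HBase client: `SortedRows` and the `user#date` key layout are assumptions chosen for the example.

```python
import bisect


class SortedRows:
    """Rows kept in lexicographic (byte) order of their row keys."""

    def __init__(self):
        self.keys = []   # sorted row keys
        self.data = {}   # row key -> row contents

    def put(self, key, row):
        if key not in self.data:
            bisect.insort(self.keys, key)
        self.data[key] = row

    def get(self, key):
        """1. Access through a single row key."""
        return self.data.get(key)

    def scan(self, start=b"", stop=None):
        """2. Range access over [start, stop); 3. full scan with defaults."""
        lo = bisect.bisect_left(self.keys, start)
        hi = len(self.keys) if stop is None else bisect.bisect_left(self.keys, stop)
        return [(k, self.data[k]) for k in self.keys[lo:hi]]


rows = SortedRows()
# Hypothetical key design: user ID first, then date, so that one user's
# rows sort next to each other and can be read with a single range scan.
rows.put(b"u2#2014-09-01", "post C")
rows.put(b"u1#2014-09-02", "post B")
rows.put(b"u1#2014-09-01", "post A")

print(rows.get(b"u2#2014-09-01"))
print(rows.scan(b"u1#", b"u1$"))   # b"$" is the byte right after b"#"
print(len(rows.scan()))

# Ordering is by bytes, not by number, so numeric parts should be padded.
print(b"10" < b"9", b"09" < b"10")
```

Because the comparison is bytewise, an unpadded key such as `10` sorts before `9`. Fixed-width or zero-padded components keep related rows in the expected order.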

Column families and columns:

A column is written as <column family>:<qualifier>

The column family is defined in advance, while columns inside a column family can be added freely at any time.

HBase stores data on disk by column family. This column-oriented design is well suited to data analysis;

It is recommended that cells in the same column family share the same read/write pattern and similar data characteristics (for example, all long strings) to improve performance;

Column-oriented storage: within the same row key, columns that belong to the same family are stored together.
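The grouping by column family can be sketched as follows. The example is illustrative only: `split_column`, `group_by_family`, and the sample cells are hypothetical, and a Python list per family stands in for that family's on-disk storage.

```python
from collections import defaultdict


def split_column(column):
    """Split "family:qualifier" at the first delimiter."""
    family, sep, qualifier = column.partition(":")
    if not sep:
        raise ValueError(f"expected <family>:<qualifier>, got {column!r}")
    return family, qualifier


def group_by_family(cells):
    """Place (row key, column, value) cells into one sorted store per family."""
    stores = defaultdict(list)
    for row_key, column, value in cells:
        family, qualifier = split_column(column)
        stores[family].append((row_key, qualifier, value))
    for entries in stores.values():
        # Within a family, cells are ordered by row key, then qualifier.
        entries.sort(key=lambda entry: (entry[0], entry[1]))
    return dict(stores)


cells = [
    ("s002", "info:name", "Bob"),
    ("s001", "grades:math", "90"),
    ("s001", "info:name", "Alice"),
    ("s001", "info:age", "20"),
]
stores = group_by_family(cells)
for family, entries in stores.items():
    print(family, entries)
```

A query that only needs `info` columns reads the `info` store and never touches `grades`, which is why columns with similar access patterns are best placed in the same family.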

Timestamp:

The timestamp records the time of each data operation. It can be generated automatically by the system or specified explicitly by the user;

HBase supports two ways of reclaiming old data versions:

1. Keep only the latest N versions of each cell, where N is configurable.

2. Keep only the versions written within a specified period of time.

Common client queries by time are "the latest data as of a certain point in time" and "all versions of the data";

A cell is uniquely determined by the row key, the column family and qualifier, and the timestamp;

Cell values are stored as uninterpreted byte arrays and carry no type;
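The two version-reclaiming policies and the two common time queries can be sketched on a single cell, modeled as a map from timestamp to value. The function names (`prune`, `get_as_of`, `get_all`) and the parameters `max_versions` and `ttl` are assumptions for this illustration, and timestamps are plain integers.

```python
def prune(versions, max_versions=None, ttl=None, now=0):
    """Apply the reclaiming policies to one cell's {timestamp: value} map."""
    newest_first = sorted(versions, reverse=True)
    if ttl is not None:
        # Policy 2: keep only versions written within the last `ttl` units.
        newest_first = [ts for ts in newest_first if now - ts <= ttl]
    if max_versions is not None:
        # Policy 1: keep only the latest `max_versions` versions.
        newest_first = newest_first[:max_versions]
    return {ts: versions[ts] for ts in newest_first}


def get_as_of(versions, timestamp):
    """Latest value written at or before `timestamp`, or None."""
    candidates = [ts for ts in versions if ts <= timestamp]
    return versions[max(candidates)] if candidates else None


def get_all(versions):
    """All versions of the cell, newest first."""
    return [(ts, versions[ts]) for ts in sorted(versions, reverse=True)]


cell = {100: b"a", 200: b"b", 300: b"c"}  # values are untyped bytes
print(prune(cell, max_versions=2))
print(prune(cell, ttl=150, now=320))
print(get_as_of(cell, 250))
print(get_all(cell))
```

Values are kept as `bytes` throughout, so the client is responsible for interpreting them as strings, numbers, or anything else.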

HBase physical model:

[Figure: hbase.png, http://s3.51cto.com/wyfs02/M01/48/66/wKioL1QHVYKzK1M7AAIcMsaxpOo869.jpg]

This article is from the "Linux _ ant" blog; please keep this source attribution: http://onlyoulinux.blog.51cto.com/7941460/1548558

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
