Hadoop learning notes ----- HBase theory

Source: Internet
Author: User
Tags: columnar database

HBase

 

HBase is a distributed, column-oriented, open source database. The technology derives from the Google paper "Bigtable: A Distributed Storage System for Structured Data" by Chang et al.

Just as Bigtable builds on the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Hadoop.

HBase started as a subproject of the Apache Hadoop project and has since become a top-level Apache project.

HBase differs from a typical relational database in two ways. It is suited to storing unstructured data, and it is column-based rather than row-based.

 

 

Bigtable ideas:

Take a table that stores student records as an example:

Following the Bigtable idea, it becomes one big table with three columns: student ID (key), attribute (name, age, and so on), and value.

Any table can be represented as a big table with three columns:

Row key (the ID of the object),

Attribute,

Value

Bigtable query: fast key-value lookup
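The three-column idea can be sketched in a few lines of Python. This is only an illustration: the student records, the field names, and the helper functions to_triples and build_index are all made up for this example and are not part of Bigtable or HBase.

```python
# Hypothetical student records in ordinary "wide" relational form.
students = [
    {"id": "s001", "name": "Alice", "age": 20},
    {"id": "s002", "name": "Bob", "age": 22},
]

def to_triples(rows, key_field="id"):
    """Flatten wide rows into (row key, attribute, value) triples."""
    triples = []
    for row in rows:
        for attribute, value in row.items():
            if attribute != key_field:
                triples.append((row[key_field], attribute, value))
    return triples

def build_index(triples):
    """Index the triples by (row key, attribute) for fast key-value lookup."""
    return {(key, attribute): value for key, attribute, value in triples}

triples = to_triples(students)
index = build_index(triples)
print(index[("s002", "age")])
```

Each student now occupies one triple per attribute, and a lookup needs only the row key and the attribute name.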

 

HBase logical model:

Data is stored in the form of tables;

A table consists of rows and columns. Each column belongs to a column family. The storage unit identified by a row and a column is called a cell (element);

Each cell can store multiple versions of the same data, distinguished by timestamps;

HBase can hold multiple tables;

Column families must be defined in advance, but the columns inside a column family need not be. A column name is the family name and a qualifier joined by a delimiter;

The same row key is repeated for every cell that belongs to the row, although it identifies only one logical row;
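One way to picture the logical model is as a nested map. The sketch below is a toy in-memory model written for these notes, not the HBase client API; the class name Table, its methods, and the sample data are assumptions made for illustration.

```python
from collections import defaultdict

class Table:
    """Toy logical model: row key -> "family:qualifier" -> {timestamp: value}."""

    def __init__(self, families):
        self.families = set(families)  # column families are declared up front
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, column, value, timestamp):
        family, _, _qualifier = column.partition(":")
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        # Qualifiers are not declared anywhere; any name is accepted.
        self.rows[row_key][column][timestamp] = value

    def versions(self, row_key, column):
        """Return [(timestamp, value), ...] for one cell, newest first."""
        cell = self.rows.get(row_key, {}).get(column, {})
        return sorted(cell.items(), reverse=True)

table = Table(families=["info"])
table.put("s001", "info:name", "Alice", timestamp=1)
table.put("s001", "info:name", "Alicia", timestamp=5)
table.put("s001", "info:nickname", "Al", timestamp=5)
print(table.versions("s001", "info:name"))
```

Writing to a new qualifier such as info:nickname works without any schema change, while writing to an undeclared family is rejected.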

 

HBase works around the fact that files in HDFS cannot be modified in place:

Deletion by marking: a delete is recorded as a marker instead of removing the data immediately

Writes go to memory first. New data and modifications (appends) are collected and organized in memory,

After data has accumulated in memory for a while, it is written to disk in one pass as a file

At intervals the files are reorganized and merged. Small files are combined into larger ones, and data marked as deleted is dropped at that point, which resolves the deletion problem.
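The write path described above can be sketched as a small simulation. This is a simplified model under stated assumptions, not how HBase is implemented: the class Store, the TOMBSTONE marker, and the flush threshold counted in number of keys are all hypothetical.

```python
TOMBSTONE = object()  # marker meaning "this key was deleted"

class Store:
    def __init__(self, flush_threshold=2):
        self.memstore = {}   # recent writes, held in memory
        self.files = []      # immutable "files", oldest first
        self.flush_threshold = flush_threshold

    def put(self, key, value):
        self.memstore[key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def delete(self, key):
        self.put(key, TOMBSTONE)  # a delete is just another write

    def flush(self):
        if self.memstore:
            # Freeze the in-memory data as a sorted file that is never modified.
            self.files.append(dict(sorted(self.memstore.items())))
            self.memstore = {}

    def get(self, key):
        # Newest data wins: memory first, then files from newest to oldest.
        for layer in [self.memstore, *reversed(self.files)]:
            if key in layer:
                value = layer[key]
                return None if value is TOMBSTONE else value
        return None

    def compact(self):
        merged = {}
        for f in self.files:  # later files overwrite earlier ones
            merged.update(f)
        live = {k: v for k, v in merged.items() if v is not TOMBSTONE}
        self.files = [dict(sorted(live.items()))] if live else []

store = Store(flush_threshold=2)
store.put("a", 1)
store.put("b", 2)   # second key reaches the threshold: first file is written
store.delete("a")   # recorded as a marker; the first file is left untouched
store.put("c", 3)   # second file is written, containing the marker for "a"
```

Until compact() is called, the old value of "a" still sits in the first file and is merely hidden by the marker in the second. Compaction merges both files into one and leaves out the deleted key.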

 

Time-oriented queries: well suited to social networking websites and similar application scenarios

 

 

Row key:

 

The row key is the unique identifier of a data row in the table and serves as the primary key for record retrieval;

There are only three ways to access the rows in a table:

Access by a single row key

Range access over a given range of row keys

Full table scan

A row key can be any string of at most 64 KB, and rows are stored in lexicographic order of their row keys;

Rows that are frequently read together should have row keys designed so that they sort next to each other and are therefore stored together;
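The three access paths and the effect of key ordering can be illustrated with a sorted key list. This is a minimal sketch, not the HBase API: the class SortedRows and the "user#date" key layout are assumptions chosen for the example.

```python
import bisect

class SortedRows:
    """Rows kept in lexicographic order of their row keys."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.keys = sorted(self.rows)

    def get(self, key):
        """Access by a single row key."""
        return self.rows.get(key)

    def scan(self, start=None, stop=None):
        """Yield (key, row) for start <= key < stop; with no bounds this is a full scan."""
        lo = 0 if start is None else bisect.bisect_left(self.keys, start)
        hi = len(self.keys) if stop is None else bisect.bisect_left(self.keys, stop)
        for key in self.keys[lo:hi]:
            yield key, self.rows[key]

# Hypothetical key design: user ID, a '#' separator, then the date.
table = SortedRows({
    "user2#20240101": "post C",
    "user1#20240102": "post B",
    "user1#20240101": "post A",
    "user10#20240101": "post D",
})

# All rows of user1 are adjacent, so one range scan fetches them.
# '$' is the character right after '#', which makes it a convenient exclusive upper bound.
print(list(table.scan(start="user1#", stop="user1$")))
```

Because the ordering is lexicographic and not numeric, "user10#..." sorts between "user1#..." and "user2#...". The separator keeps it out of the user1 range, which is the kind of detail row key design has to account for.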

 

Column families and columns:

 

A column is written as <column family>:<qualifier>

The column family is defined in advance, while columns inside a column family can be added freely at any time.

HBase stores data on disk by column family. This columnar design is well suited to data analysis;

It is recommended that the cells in a column family share the same read/write pattern and similar data characteristics (such as long strings) to improve performance;

Column-oriented storage: for a given row key, columns that belong to the same family are stored together.
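Grouping by column family can be sketched as follows. The function group_by_family and the sample cells are hypothetical, and real HBase store files hold more than this (timestamps, for instance); the point is only that each family ends up in its own sorted store.

```python
from collections import defaultdict

def group_by_family(cells):
    """Split (row key, "family:qualifier", value) cells into one sorted list per family."""
    stores = defaultdict(list)
    for row_key, column, value in cells:
        family, _, qualifier = column.partition(":")
        stores[family].append((row_key, qualifier, value))
    # Within a family, order by row key and then by qualifier.
    return {
        family: sorted(entries, key=lambda entry: entry[:2])
        for family, entries in stores.items()
    }

cells = [
    ("s002", "info:name", "Bob"),
    ("s001", "grades:math", 90),
    ("s001", "info:name", "Alice"),
    ("s001", "info:age", 20),
]

stores = group_by_family(cells)
print(stores["info"])
```

A query that needs only the info family reads stores["info"] and never touches the grades data, which is why analysis over a few columns benefits from this layout.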

 

Timestamp:

 

The timestamp of each data operation can be generated automatically by the system or assigned explicitly by the user;

HBase supports two ways of recycling old data versions:

1. Keep only the latest N versions of each cell, where N is configurable.

2. Keep only the versions written within a specified period of time

Typical time-based client queries are "give me the latest data as of a certain point in time" or "give me all versions of the data"

A cell is uniquely identified by row key, column family, qualifier, and timestamp;

Cell values are stored as uninterpreted bytes and have no type;
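The two recycling policies and the "as of" query can be written as small functions over one cell's versions. This is a sketch with made-up names (keep_latest, keep_recent, as_of) and plain integers as timestamps; in HBase itself these policies are configured per column family and are not called like this.

```python
def keep_latest(versions, max_versions):
    """Policy 1: keep only the newest max_versions entries of {timestamp: value}."""
    newest = sorted(versions, reverse=True)[:max_versions]
    return {ts: versions[ts] for ts in newest}

def keep_recent(versions, now, max_age):
    """Policy 2: keep only entries written within max_age of now."""
    return {ts: value for ts, value in versions.items() if now - ts <= max_age}

def as_of(versions, timestamp):
    """Return the newest value written at or before timestamp, or None."""
    older = [ts for ts in versions if ts <= timestamp]
    return versions[max(older)] if older else None

# One cell with three versions; the values are raw bytes with no type attached.
versions = {100: b"v1", 200: b"v2", 300: b"v3"}

print(keep_latest(versions, 2))
print(keep_recent(versions, now=350, max_age=160))
print(as_of(versions, 250))
```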

 


HBase physical model:

[Figure: hbase.png]

This article is from the "Linux _ ant" blog. Please keep this source attribution: http://onlyoulinux.blog.51cto.com/7941460/1548558
