What is HBase?

Source: Internet
Author: User
Tags: hadoop ecosystem

HBase is a distributed, column-oriented database built on top of the Hadoop file system. It is an open source project and scales horizontally.

HBase has a data model similar to Google's Bigtable, designed to provide fast random access to massive amounts of structured data. It leverages the fault tolerance provided by the Hadoop Distributed File System (HDFS).

It is part of the Hadoop ecosystem and provides random, real-time read/write access to data stored in the Hadoop file system.

Data can be stored in HDFS either directly or through HBase. Data consumers use HBase to read and randomly access data in HDFS. HBase sits on top of the Hadoop file system and provides read and write access.

HBase and HDFS

HDFS | HBase
-----|------
HDFS is a distributed file system suited to storing large files. | HBase is a database built on top of HDFS.
HDFS does not support fast lookup of individual records. | HBase provides fast lookups in large tables.
HDFS is geared to high-latency batch processing. | HBase provides low-latency access to single rows out of billions of records (random access).
HDFS data can only be accessed sequentially. | HBase stores its data in indexed HDFS files, kept sorted by row key, so it can locate a record quickly and provide random access.
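
The difference between sequential access and indexed random access can be illustrated with a small Python sketch. It is a toy model only, not HBase or HDFS code: the names `records`, `sequential_lookup`, and `indexed_lookup` are hypothetical, and an in-memory sorted list stands in for indexed files on disk.

```python
from bisect import bisect_left

# Toy data set: (row_key, value) pairs kept sorted by row key.
records = [(f"row{i:05d}", i * i) for i in range(10_000)]
keys = [k for k, _ in records]


def sequential_lookup(key):
    """Scan from the start, like reading a plain file front to back.

    Returns (value, records_examined).
    """
    steps = 0
    for k, v in records:
        steps += 1
        if k == key:
            return v, steps
    return None, steps


def indexed_lookup(key):
    """Binary-search the sorted keys, as an index over sorted data allows."""
    i = bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return records[i][1]
    return None


value, steps = sequential_lookup("row09999")
print("sequential:", value, "after examining", steps, "records")
print("indexed:   ", indexed_lookup("row09999"))
```

The sequential version has to examine every record before the one it wants, while the indexed version needs only a handful of comparisons. That gap is what makes single-record lookups practical on very large tables.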

Storage mechanism in HBase

HBase is a column-oriented database in which the rows of a table are sorted by row key. The table schema defines only column families; the data inside them is stored as key-value pairs. A table can have multiple column families, and each column family can have any number of columns. Column values within a column family are stored contiguously on disk. Each cell value in the table has a timestamp. In short, in HBase:

  • A table is a collection of rows.
  • A row is a collection of column families.
  • A column family is a collection of columns.
  • A column is a collection of key-value pairs.
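
The nesting described above (table, row, column family, column, timestamped value) can be sketched with plain Python dictionaries. This is a minimal illustration under stated assumptions, not the HBase client API: `table`, `put`, and `get` are hypothetical names, and the row keys and values are made up.

```python
import time

# table -> row key -> column family -> column -> {timestamp: value}
table = {}


def put(row, family, column, value, ts=None):
    """Store a value in a cell, keeping older versions under their timestamps."""
    if ts is None:
        ts = int(time.time() * 1000)  # milliseconds, assumed for illustration
    cell = table.setdefault(row, {}).setdefault(family, {}).setdefault(column, {})
    cell[ts] = value


def get(row, family, column):
    """Return the newest version of a cell, or None if the cell is absent."""
    versions = table.get(row, {}).get(family, {}).get(column)
    if not versions:
        return None
    return versions[max(versions)]


put("user1", "info", "name", "Ann", ts=1)
put("user1", "info", "name", "Anne", ts=2)   # newer version of the same cell
put("user1", "stats", "logins", 7, ts=1)
put("user2", "info", "email", "bob@example.com", ts=1)  # different column set

for row in sorted(table):  # read rows back in row-key order
    print(row, table[row])
```

Here `get("user1", "info", "name")` returns the value written with the highest timestamp, and `user2` carries a column that `user1` does not have, which the model allows because only the nesting is fixed.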

An example HBase schema is shown in the table below.

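Rowid | Column Family      | Column Family      | Column Family      | Column Family
      | col1 | col2 | col3 | col1 | col2 | col3 | col1 | col2 | col3 | col1 | col2 | col3
1     |      |      |      |      |      |      |      |      |      |      |      |
2     |      |      |      |      |      |      |      |      |      |      |      |
3     |      |      |      |      |      |      |      |      |      |      |      |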

Column-oriented and row-oriented

Column-oriented databases store data tables as sections of columns of data rather than as rows of data. In short, they have column families.

Row-oriented database | Column-oriented database
----------------------|-------------------------
It is suited to online transaction processing (OLTP). | It is suited to online analytical processing (OLAP).
Such databases are designed for a small number of rows and columns. | Column-oriented databases are designed for very large tables.
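
The two layouts can be compared with a short Python sketch. It is an illustration only: the sample records and the names `rows`, `columns`, `total_row_oriented`, `total_column_oriented`, and `fetch_record` are hypothetical, and in-memory lists stand in for on-disk storage.

```python
# The same three records in two layouts.
rows = [
    {"id": 1, "name": "Ann", "amount": 30},
    {"id": 2, "name": "Bob", "amount": 45},
    {"id": 3, "name": "Cho", "amount": 25},
]

# Column-oriented: one list per column; position i belongs to record i.
columns = {name: [r[name] for r in rows] for name in rows[0]}


def total_row_oriented():
    """Visits every whole record just to read one field."""
    return sum(r["amount"] for r in rows)


def total_column_oriented():
    """Reads only the single column the aggregate needs."""
    return sum(columns["amount"])


def fetch_record(i):
    """Rebuilding one full record from columns touches every column."""
    return {name: values[i] for name, values in columns.items()}


print(total_row_oriented(), total_column_oriented())
print(fetch_record(1))
```

An aggregate over one field reads a single list in the column layout, which suits analytical workloads. Fetching or updating one complete record is the cheaper operation in the row layout, which suits transactional workloads.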

[Illustration: column families in a column-oriented database]

HBase and RDBMS

HBase | RDBMS
------|------
HBase has no fixed column schema; it defines only column families. | An RDBMS is governed by its schema, which describes the whole structure of its tables.
It is built for wide tables and scales horizontally. | It is thin and built for small tables; it is hard to scale.
HBase has no multi-row transactions; operations are atomic only within a single row. | An RDBMS is transactional.
It holds denormalized data. | It holds normalized data.
It is good for semi-structured as well as structured data. | It is good for structured data.
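
The first row of the comparison, a schema that fixes only the column families, can be sketched as follows. This is a toy model under stated assumptions, not the HBase API: the class `ToyTable`, its `put` method, and the sample families and values are all hypothetical.

```python
class ToyTable:
    """Toy table whose schema lists column families only (not the HBase API)."""

    def __init__(self, families):
        self.families = set(families)  # declared once, when the table is created
        self.rows = {}

    def put(self, row, family, column, value):
        # The family must have been declared; the column name is free-form.
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows.setdefault(row, {}).setdefault(family, {})[column] = value


t = ToyTable(["profile"])
t.put("u1", "profile", "name", "Ann")
t.put("u2", "profile", "nickname", "bobby")  # new column, no schema change

try:
    t.put("u1", "orders", "last", "A-17")    # family was never declared
except KeyError as err:
    print("rejected:", err)
```

A relational table would need a schema change before it could accept the `nickname` column. In this model only an undeclared column family is rejected.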

Characteristics of HBase

  • HBase is linearly scalable.
  • It has automatic failure support.
  • It provides consistent reads and writes.
  • It integrates with Hadoop, both as a source and as a destination.
  • It has an easy-to-use Java API for clients.
  • It provides data replication across clusters.

Where to use HBase

  • Apache HBase is used for random, real-time read/write access to big data.
  • It hosts very large tables on top of clusters of commodity hardware.
  • Apache HBase is a non-relational database modeled after Google's Bigtable. Just as Bigtable runs on top of the Google File System, Apache HBase runs on top of Hadoop and HDFS.

Applications of HBase

  • It is used when there is a need for write-heavy applications.
  • HBase is used when we need to provide fast random access to data.
  • Many companies, such as Facebook, Twitter, Yahoo, and Adobe, use HBase.
