Non-relational distributed database: HBase

HBase is a distributed, column-oriented, open-source database modeled on Google's paper "Bigtable: A Distributed Storage System for Structured Data" by Fay Chang et al. Just as Bigtable builds on the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Hadoop, implementing the compression algorithms, in-memory operation, and Bloom filters described in the Bigtable paper. HBase is the Apache Hadoop database, providing random, real-time read and write access to big data; its goal is the storage and processing of very large data sets. HBase tables can be used as the inputs and outputs of MapReduce jobs and can be accessed through the Java API as well as through REST, Avro, or Thrift gateways. Unlike a typical relational database, HBase is suited to storing unstructured data, and it is based on a column-oriented rather than a row-based model.

Because HBase stores data by column rather than by row, it is easy to read and write large volumes of data. HBase's storage model sits somewhere between a map entry (key/value pair) and a database row. It is a little like the popular Memcache, but it is more than a simple key mapped to a value: you typically need to store a data structure with multiple attributes, yet without the heavy inter-table relationships of a traditional database. This is what is meant by loose, or semi-structured, data. In a nutshell, a table created in HBase can be thought of as one large table whose columns can be added dynamically on demand. HBase has no join queries between tables. You only have to say which column family your data is stored in; there is no need to specify a concrete type such as char, varchar, int, tinyint, or text. Be aware, however, that HBase does not provide transactions. Like Hadoop, HBase relies primarily on scale-out, adding compute and storage capacity through more of the less expensive commodity servers. Tables in HBase generally have these characteristics:

- Large: a table can have billions of rows and millions of columns.
- Column-oriented: storage and access control are organized per column (family), and column families are retrieved independently.
- Sparse: empty (null) columns take up no storage space, so tables can be designed to be extremely sparse.

The advantages of HBase:

- Columns can be added dynamically, and empty columns store no data, which saves storage space.
- HBase splits data automatically, so storage scales out horizontally.
- HBase supports highly concurrent reads and writes.

The disadvantages of HBase:

- It cannot serve conditional queries; only lookups by Row Key are supported.
- For now it cannot fail over the Master server: when the Master goes down, the whole storage system hangs. (Note that the Zookeeper-based master election described later in this article removes this single point of failure in current versions.)

HBase sits in the structured storage layer of the Hadoop ecosystem, and the components around it each support HBase in some way:

- HDFS: highly reliable underlying storage.
- MapReduce: high-performance computing.
- Zookeeper: stable service and failover.
- Pig & Hive: high-level language support for data statistics.
- Sqoop: RDBMS data import, for migrating legacy databases to HBase.

HBase data model

HBase stores data in tables. A table has rows and columns, and the columns are grouped into a number of column families.

- Row Key: the table's primary key; the records in a table are sorted by Row Key.
- Timestamp: the timestamp of each data operation, which serves as the data's version number.
- Column Family: a table has one or more column families in the horizontal direction. A column family can be composed of any number of columns, and it supports dynamic expansion, with no need to predefine the number or types of columns. Everything is stored as binary, and the user is responsible for type conversion.
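To make the model concrete, here is a minimal sketch using the HBase Java client that writes one cell at {row key, column family:qualifier, timestamp}. The table name student, the family courses, and the pre-existing `table` handle are assumptions made for illustration (obtaining a handle is shown at the end of this article).

```java
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

// Assumes `table` is a Table handle for a table with column family "courses".
Put put = new Put(Bytes.toBytes("student001"));      // row key
put.addColumn(Bytes.toBytes("courses"),              // column family
        Bytes.toBytes("history"),                    // column qualifier
        System.currentTimeMillis(),                  // timestamp = version number
        Bytes.toBytes("A"));                         // value, stored as raw bytes
table.put(put);
```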

Row Key

As in other NoSQL databases, the row key is the primary key used to retrieve records. There are only three ways to access a row in an HBase table:

- Access through a single row key.
- Scan through a range of row keys.
- Full table scan.
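A hedged sketch of the three access paths with the Java client (the row key values are made up, and a `table` handle is assumed as before):

```java
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

// 1. Single row key: point lookup.
Result one = table.get(new Get(Bytes.toBytes("student001")));

// 2. Row key range: scans [startRow, stopRow) in byte order.
Scan range = new Scan();
range.withStartRow(Bytes.toBytes("student100"));  // HBase 2.x API; 1.x uses setStartRow
range.withStopRow(Bytes.toBytes("student200"));
try (ResultScanner rs = table.getScanner(range)) {
    for (Result r : rs) { /* process r */ }
}

// 3. Full table scan: an unbounded Scan.
try (ResultScanner all = table.getScanner(new Scan())) {
    for (Result r : all) { /* process r */ }
}
```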

The row key can be any string (maximum length 64KB; in practice 10-100 bytes is typical) and is stored inside HBase as a byte array. Data is stored sorted by the byte order of the row key. When designing keys, take full advantage of this sorted storage and place rows that are often read together near one another (location relevance). Note that the dictionary order of integers rendered as strings is 1, 10, 100, 11, 12, 13, 14, 15, 16, 17, 18, 19, 2, 20, 21, ..., 9, 91, 92, 93, 94, 95, 96, 97, 98, 99. To preserve the natural ordering of integers, the row key must be left-padded with zeros. Reading or writing a single row is an atomic operation (no matter how many columns are touched at a time). This design decision makes it easy for users to reason about program behavior when performing concurrent updates to the same row.
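For instance, a one-line sketch of left-padding numeric IDs so that byte order matches numeric order (the width of 10 digits is an arbitrary choice):

```java
import org.apache.hadoop.hbase.util.Bytes;

// "42" would sort after "100" lexicographically; "0000000042" sorts correctly.
long id = 42;
byte[] rowKey = Bytes.toBytes(String.format("%010d", id));   // "0000000042"
```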

Column family

Every column in an HBase table belongs to some column family. Column families are part of the table's schema (individual columns are not) and must be defined before the table is used. The column family name is a prefix of the column name; for example, courses:history and courses:math both belong to the courses column family.

Access control and disk and memory usage statistics all operate at the column family level. In practice, control at the column family level helps us manage different kinds of applications: we allow some applications to add new raw data, some to read raw data and create derived column families, and others only to browse data (and perhaps, for privacy reasons, not even all of it).
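Since column families must exist before a table is used, they are declared when the table is created. A sketch using the HBase 2.x admin builder API (older clients use HTableDescriptor/HColumnDescriptor instead; the table and family names are the hypothetical ones from above):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.TableDescriptor;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
import org.apache.hadoop.hbase.util.Bytes;

Configuration conf = HBaseConfiguration.create();
try (Connection conn = ConnectionFactory.createConnection(conf);
     Admin admin = conn.getAdmin()) {
    TableDescriptor desc = TableDescriptorBuilder
            .newBuilder(TableName.valueOf("student"))
            .setColumnFamily(ColumnFamilyDescriptorBuilder
                    .newBuilder(Bytes.toBytes("courses"))
                    .setMaxVersions(3)                // keep the last 3 versions
                    .setTimeToLive(7 * 24 * 3600)     // or only the last 7 days
                    .build())
            .build();
    admin.createTable(desc);
}
```

The two per-family settings shown here correspond to the two version-recovery policies described in the Timestamp section below.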

Timestamp

A storage unit identified by row and column in HBase is called a cell. Each cell holds multiple versions of the same data, and the versions are indexed by timestamp. The timestamp is a 64-bit integer. It can be assigned by HBase (automatically at write time, as the current system time in milliseconds), or it can be assigned explicitly by the client. Applications that want to avoid data version conflicts must generate their own unique timestamps. Within each cell, the versions are sorted in reverse chronological order, so the newest data comes first. To avoid the management burden (both storage and indexing) caused by too many versions of data, HBase provides two version-recovery methods: keep the last n versions of the data, or keep only recent versions (for example, the last seven days). Users can configure this per column family. A cell is uniquely identified by {row key, column (= <family> + <label>), version}. Cell data has no type; everything is stored as raw bytes.
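A sketch of reading several versions of one cell back, newest first (same hypothetical table as above; readVersions is the HBase 2.x call, while 1.x clients use setMaxVersions):

```java
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

// Fetch up to 3 versions of courses:history; HBase returns them in
// reverse chronological order, so the newest version comes first.
Get get = new Get(Bytes.toBytes("student001"));
get.addColumn(Bytes.toBytes("courses"), Bytes.toBytes("history"));
get.readVersions(3);
Result result = table.get(get);
for (Cell cell : result.getColumnCells(
        Bytes.toBytes("courses"), Bytes.toBytes("history"))) {
    System.out.println(cell.getTimestamp() + " -> "
            + Bytes.toString(CellUtil.cloneValue(cell)));
}
```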

HBase physical storage

All data files in HBase are stored on a Hadoop HDFS file system in two main formats:

- HFile: the storage format for HBase KeyValue data, a Hadoop binary file format. A StoreFile is a lightweight wrapper around an HFile; that is, StoreFiles are backed by HFiles.
- HLog File: the storage format of HBase's WAL (Write Ahead Log); physically it is a Hadoop Sequence File.

HFile

An HFile is divided into six sections:

- Data Block section: holds the table's data; this part can be compressed.
- Meta Block section (optional): holds user-defined key/value pairs; can be compressed.
- File Info section: the HFile's meta-information; not compressed. Users can also add their own meta-information here.
- Data Block Index: the index of the Data Blocks. The key of each index entry is the key of the first record in the block it indexes.
- Meta Block Index section (optional): the index of the Meta Blocks.
- Trailer: fixed length; holds the offset of every other section. When an HFile is read, the Trailer is read first (its Magic Number serves as a safety check); the Trailer records the starting position of each section, and the Data Block Index is then loaded into memory. Thus, retrieving a key does not require scanning the whole HFile: the block containing the key is located in memory, the whole block is read into memory with a single disk I/O, and the key is found inside it. Entries in the Data Block Index are evicted with an LRU mechanism.

HFile files are of variable length; only two blocks have fixed lengths: the Trailer and the File Info.

The Trailer holds pointers to the starting points of the other sections. The File Info records some of the file's meta-information, such as AVG_KEY_LEN, AVG_VALUE_LEN, LAST_KEY, COMPARATOR, and MAX_SEQ_ID_KEY.

Data Blocks and Meta Blocks in an HFile are usually stored compressed. Compression greatly reduces network I/O and disk I/O; the trade-off, of course, is the CPU time spent on compression and decompression. HFile currently supports two compression codecs: Gzip and LZO.

The Data Block Index and Meta Block Index record the starting point of each Data Block and Meta Block. The Data Block is the basic unit of HBase I/O. For efficiency, the HRegionServer has an LRU-based Block Cache mechanism. The size of each Data Block can be specified by a parameter when a table is created: large blocks favor sequential scans, while small blocks favor random lookups. Apart from the Magic value at its beginning, each Data Block is a concatenation of KeyValue pairs; the Magic content is a random number whose purpose is to detect data corruption.

Each KeyValue pair in an HFile is a simple byte array, but this byte array contains many fields in a fixed structure:

- KeyLength and ValueLength: two fixed-length numbers giving the lengths of the Key and the Value.
- Key part: a fixed-length Row Length giving the length of the RowKey, followed by the RowKey itself; a fixed-length Column Family Length giving the length of the family, followed by the Column Family; then the Qualifier; and finally two fixed-length fields holding the Timestamp and the Key Type (Put / Delete).
- Value part: no such complex structure, just raw binary data.
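To make the field order concrete, here is a minimal sketch that walks this layout with a ByteBuffer. It is illustrative only (real code would use HBase's KeyValue/Cell classes), and it assumes the common field widths: 4-byte lengths, a 2-byte row length, a 1-byte family length, an 8-byte timestamp, and a 1-byte type.

```java
import java.nio.ByteBuffer;

public class KeyValueLayout {
    // keyLen(4) valueLen(4) | rowLen(2) row famLen(1) family qualifier ts(8) type(1) | value
    public static void dump(byte[] kv) {
        ByteBuffer buf = ByteBuffer.wrap(kv);
        int keyLen = buf.getInt();
        int valueLen = buf.getInt();
        short rowLen = buf.getShort();
        byte[] row = new byte[rowLen];
        buf.get(row);
        byte famLen = buf.get();
        byte[] family = new byte[famLen];
        buf.get(family);
        // The qualifier fills the rest of the key, minus timestamp (8) and type (1).
        int qualLen = keyLen - 2 - rowLen - 1 - famLen - 8 - 1;
        byte[] qualifier = new byte[qualLen];
        buf.get(qualifier);
        long timestamp = buf.getLong();
        byte type = buf.get();                 // e.g. a Put or Delete marker
        byte[] value = new byte[valueLen];
        buf.get(value);
        System.out.printf("%s/%s:%s @%d type=%d value=%d bytes%n",
                new String(row), new String(family), new String(qualifier),
                timestamp, type, value.length);
    }
}
```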

HLog File

The HLog file is an ordinary Hadoop Sequence File. The key of the Sequence File is an HLogKey object, which records ownership information for the written data: in addition to the names of the table and the region, it holds a sequence number and a timestamp. The timestamp is the write time, and the sequence number starts at 0 (or at the most recent sequence number stored in the file system). The value of the HLog Sequence File is HBase's KeyValue object, corresponding to the KeyValue in an HFile.

HBase system architecture

Client

Contains interfaces for accessing HBase; the client maintains some caches to speed up access, such as the locations of regions.

- Uses HBase's RPC mechanism to communicate with the HMaster and the HRegionServers.
- Communicates with the HMaster for management operations.
- Communicates with the HRegionServers for data read and write operations.

Zookeeper

- Guarantees that at any time there is only one Master in the cluster, avoiding a single point of failure at the HMaster.
- Stores the addressing entry for all Regions: the Zookeeper Quorum stores the address of the -ROOT- table and the address of the HMaster.
- Monitors the status of the Region Servers in real time and notifies the Master when a Region Server comes online or goes offline; each HRegionServer registers itself with Zookeeper in ephemeral mode, which lets the HMaster track the health of every HRegionServer.
- Stores the HBase schema, including which tables exist and each table's column families.

HMaster

There is no single point of failure at the HMaster: multiple HMasters can be started in HBase, and Zookeeper's Master Election mechanism ensures that exactly one of them is running at any time. The HMaster is mainly responsible for managing Tables and Regions:

- Manages users' add, delete, modify, and query operations on tables.
- Manages load balancing of the HRegionServers and adjusts the Region distribution.
- After a Region is split, is responsible for placing the new Regions.
- Detects failed HRegionServers and, after an HRegionServer goes down, is responsible for migrating the Regions that were on it.
- Is responsible for garbage collection of files on HDFS.

HRegionServer

HRegionServer is the core module of HBase: it responds to users' I/O requests, reading and writing data to the HDFS file system.

An HRegionServer internally manages a series of HRegion objects. Each HRegion corresponds to one Region of a Table, and an HRegion is composed of multiple HStores. Each HStore corresponds to the storage of one Column Family in the Table. From this you can see that each Column Family is really a centralized storage unit, so placing columns with the same I/O characteristics in a single Column Family is the most efficient design. The HRegionServer is also responsible for splitting Regions that become too large during operation.

As you can see, clients access data in HBase without involving the HMaster (addressing goes through Zookeeper and the HRegionServers; data reads and writes go through the HRegionServers). The HMaster maintains only metadata about tables and Regions, so its load is low.

HStore

The HStore is the core of HBase storage. It consists of a MemStore and StoreFiles. The MemStore is a sorted memory buffer. The user write path is:

Client write -> the data is saved to the MemStore until the MemStore is full -> it is flushed into a StoreFile; once the number of StoreFiles reaches a threshold -> a Compact merge operation starts, which merges multiple StoreFiles into one and, at the same time, merges versions and removes deleted data -> through compaction, progressively larger StoreFiles are formed -> when a single StoreFile exceeds a size threshold, a Split is triggered: the current Region is split into two Regions, the parent Region is taken offline, and the two new child Regions are assigned by the HMaster to the appropriate HRegionServers, so the load of the original Region is spread across two Regions. This process shows that HBase only ever appends data; updates and deletes are all carried out during the Compact stage, so user writes only need to reach memory before returning, guaranteeing I/O performance.
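The flush and split thresholds in this path are configurable. A brief sketch of the two most relevant settings (the property names are standard HBase configuration keys; the values shown are only examples, and the defaults vary by version):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

Configuration conf = HBaseConfiguration.create();
// Flush a region's MemStore to a new StoreFile once it reaches 128 MB.
conf.setLong("hbase.hregion.memstore.flush.size", 128L * 1024 * 1024);
// Split a region once its store files grow beyond this size.
conf.setLong("hbase.hregion.max.filesize", 10L * 1024 * 1024 * 1024);
```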

HLog

In a distributed environment, system errors and downtime cannot be avoided. If an HRegionServer exited, the in-memory data in its MemStore would be lost; the HLog is introduced to prevent exactly this. Every HRegionServer has one HLog object, which is a class implementing a Write Ahead Log: each time a user writes to the MemStore, a record is also written to the HLog file. The HLog file rolls periodically, and old files (whose data has already been persisted to StoreFiles) are deleted. When an HRegionServer terminates unexpectedly, the HMaster learns of it through Zookeeper. The HMaster first processes the leftover HLog files, splitting their log records by Region into the corresponding Region directories, and then reassigns the invalid Regions. While loading a reassigned Region, the receiving HRegionServer finds that there is a historical HLog to process, replays the data in the HLog into the MemStore, and then flushes to StoreFiles, completing data recovery.

Accessing HBase

- Native Java API: the most conventional and efficient access method, well suited to Hadoop MapReduce jobs that process HBase table data in parallel batches.
- HBase Shell: HBase's command-line tool and the simplest interface, suited to HBase administration.
- Thrift Gateway: uses Thrift serialization and supports C++, PHP, Python, and other languages, suited to letting heterogeneous systems access HBase table data online.
- REST Gateway: supports access to HBase through a RESTful HTTP API, lifting language restrictions.
- Pig: HBase data can be manipulated with the Pig Latin programming language; like Hive, it is ultimately compiled into MapReduce jobs that process HBase table data, and it is suited to data statistics.
- Hive: the current release of Hive does not yet include HBase support, but the upcoming Hive 0.7.0 will, allowing HBase to be accessed with a SQL-like language.
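To tie the earlier fragments together, here is a self-contained sketch of the native Java API path (the ZooKeeper quorum, table, and family names are placeholders, and the table is assumed to already exist):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseHello {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();    // reads hbase-site.xml
        conf.set("hbase.zookeeper.quorum", "zk1,zk2,zk3");   // placeholder quorum
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("student"))) {
            // Write one cell, then read it back.
            Put put = new Put(Bytes.toBytes("student001"));
            put.addColumn(Bytes.toBytes("courses"), Bytes.toBytes("history"),
                    Bytes.toBytes("A"));
            table.put(put);

            Result r = table.get(new Get(Bytes.toBytes("student001")));
            System.out.println(Bytes.toString(r.getValue(
                    Bytes.toBytes("courses"), Bytes.toBytes("history"))));
        }
    }
}
```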