HBase underlying storage principles: essentially no different from Cassandra! Both are KV column stores; one is just peer-to-peer while the other is centralized.


The biggest difficulty in understanding HBase (an open-source, practical implementation of Google's Bigtable) is grasping HBase's data-structure concept. First, HBase differs from a typical relational database in that it is a database suited to storing unstructured data. The other difference is that HBase is column-oriented rather than row-oriented.

Google's Bigtable paper explains clearly what Bigtable is:
Bigtable is a sparse, distributed, persistent, multidimensional sorted map, indexed by row key, column key, and timestamp; each value is an uninterpreted array of bytes. ("A Bigtable is a sparse, distributed, persistent multidimensional sorted map. The map is indexed by a row key, column key, and a timestamp; each value in the map is an uninterpreted array of bytes.")

The HBase page on the Hadoop wiki mentions:
HBase uses a data model very similar to Bigtable's. Users store data rows in labelled tables. A data row has a sortable key and an arbitrary number of columns. The table is stored sparsely, so rows in the same table can have wildly varying columns if the user likes. ("HBase uses a data model very similar to that of Bigtable. Users store data rows in labelled tables. A data row has a sortable key and an arbitrary number of columns. The table is stored sparsely, so rows in the same table can have crazily-varying columns, if the user likes.")
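To make the "(row key, column key, timestamp) → byte array" model concrete, here is a minimal, purely illustrative Java sketch of that sorted-map view. The class and method names are invented for illustration; this is not HBase's actual implementation.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Illustrative only: the logical view of a Bigtable/HBase table as a
// multidimensional sorted map indexed by row key, column key, and timestamp.
public class LogicalTable {
    // row key -> (column key -> (timestamp -> value))
    // TreeMaps keep every dimension sorted, mirroring "sorted map" in the paper.
    // The map is also sparse: a row stores only the columns it actually has.
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, byte[]>>> rows =
            new TreeMap<>();

    public void put(String rowKey, String columnKey, long timestamp, byte[] value) {
        rows.computeIfAbsent(rowKey, r -> new TreeMap<>())
            .computeIfAbsent(columnKey, c -> new TreeMap<>())
            .put(timestamp, value);
    }

    // The latest version wins unless a timestamp is asked for explicitly.
    public byte[] getLatest(String rowKey, String columnKey) {
        NavigableMap<String, NavigableMap<Long, byte[]>> columns = rows.get(rowKey);
        if (columns == null) return null;
        NavigableMap<Long, byte[]> versions = columns.get(columnKey);
        if (versions == null || versions.isEmpty()) return null;
        return versions.lastEntry().getValue();   // highest timestamp
    }
}
```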

First, the architectural approach

HBase is a project built on Hadoop, so it typically uses the HDFS file system directly. We will not go deep here into how HDFS builds its distributed file system; just know that although HBase has the concept of multiple RegionServers, this does not mean the data is persisted on the RegionServers. In fact, a RegionServer is a scheduler that manages regions, while the data itself is persisted on HDFS. With this in mind, in the discussion that follows we simply abstract the file system as HDFS and do not delve into it further.

HBase is a distributed database that uses ZooKeeper to manage the cluster. Architecturally it is divided into a Master (elected as leader through ZooKeeper) and multiple RegionServers. The basic architecture is shown below:


In HBase, a RegionServer corresponds to a node in the cluster, and one RegionServer is responsible for managing multiple regions. A region holds part of a table's data, so a table in HBase may need many regions to store its data, but the data in each region is not arbitrary: when HBase assigns regions, it defines a rowkey range for each one, and data that falls within a given range is handed to that specific region, spreading the load across multiple nodes to exploit the advantages of a distributed system. In addition, HBase automatically adjusts the placement of regions: if a RegionServer becomes hot (a large number of requests fall on the regions it manages), HBase moves regions to relatively idle nodes, ensuring in turn that the cluster is fully utilized.
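The mapping from rowkey ranges to regions can be pictured as a floor lookup over sorted region start keys. The following is a simplified, hypothetical sketch of that idea; the class and method names are invented (real HBase clients locate regions through the hbase:meta table), and each region is assumed to own the half-open range from its start key up to the next region's start key.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Simplified illustration of rowkey-range -> region routing.
public class SimpleRegionRouter {
    // sorted region start key -> name of the RegionServer hosting that region
    private final NavigableMap<String, String> regionsByStartKey = new TreeMap<>();

    public void addRegion(String startKey, String regionServer) {
        regionsByStartKey.put(startKey, regionServer);
    }

    // The region responsible for a rowkey is the one with the greatest
    // start key that is <= the rowkey (a floor lookup).
    public String locate(String rowKey) {
        Map.Entry<String, String> entry = regionsByStartKey.floorEntry(rowKey);
        return entry == null ? null : entry.getValue();
    }
}
```

For example, with one region starting at "" on rs1 and another starting at "m" on rs2, locating the rowkey "kiwi" returns rs1, while "zebra" returns rs2.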

Second, the storage model

With the architecture in place, the next thing to focus on is how the data is actually stored. This is the work every region has to carry out. We know a region holds the data of a specific rowkey range of an HBase table, and HBase is a column-oriented database, so inside a region there are multiple files storing those columns. Columns in HBase are organized by column family, each column family has a corresponding storage structure, and HBase abstracts a column family's storage structure as a Store: one Store represents one column family.
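As a concrete example, the sketch below uses the HBase 2.x Java client to create a table with two column families; each family declared here is materialized as one Store per region. The table and family names ("user", "info", "stats") are made up for illustration.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;

public class CreateTableExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();   // reads hbase-site.xml from the classpath
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Admin admin = connection.getAdmin()) {
            // Each column family declared here becomes one Store per region.
            admin.createTable(TableDescriptorBuilder.newBuilder(TableName.valueOf("user"))
                    .setColumnFamily(ColumnFamilyDescriptorBuilder.of("info"))
                    .setColumnFamily(ColumnFamilyDescriptorBuilder.of("stats"))
                    .build());
        }
    }
}
```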



Here you can see why a query should ask for as few unnecessary columns as possible, and why columns that are frequently queried together should be organized into the same column family: the more column families a query has to touch, the more store files must be scanned, and the longer the query takes.
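For example, with the HBase Java client, a read that names only the family and qualifiers it needs avoids consulting the Stores of the other column families. A minimal sketch follows; the table, family, and column names are hypothetical.

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class NarrowReadExample {
    public static void main(String[] args) throws Exception {
        try (Connection connection = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = connection.getTable(TableName.valueOf("user"))) {
            Get get = new Get(Bytes.toBytes("row-0001"));
            // Only the "info" family is requested, so only that family's Store
            // (and its StoreFiles) has to be consulted for this row.
            get.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"));
            get.addColumn(Bytes.toBytes("info"), Bytes.toBytes("email"));
            Result result = table.get(get);
            System.out.println(Bytes.toString(
                    result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"))));
        }
    }
}
```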

Let's drill down into how data is stored inside a Store. HBase's implementation uses an LSM tree structure.

The LSM tree can be seen as an improvement on the B+ tree, so let's first take a quick look at the B+ tree.


This is a simple B+ tree; its meaning should be self-evident, but this data structure is not well suited to HBase's application scenario. Such a structure is very efficient in memory, but HBase's data is stored in files; if it were stored in this form, every insert would mean first finding the target file through the top-level index and then maintaining the order of the data inside that file, which is clearly inefficient. So HBase uses the LSM tree structure instead. The key point is that every insert first goes into the MemStore (an in-memory buffer); when the MemStore reaches its limit, HBase writes the in-memory data out as an ordered StoreFile (sorted by rowkey, version, and column name; the column family is irrelevant here because a Store holds only one column family). This produces many small StoreFiles in the Store, and when the number of small files reaches a threshold, HBase uses a background thread to merge the small files into a larger one. In this way HBase turns inefficient in-file inserts and moves into simple sequential file writes and merge operations.
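The write path just described (buffer writes in a sorted MemStore, flush to an ordered file when a threshold is reached, then merge the small files) can be sketched roughly as follows. This is a toy illustration of the LSM idea with invented names and thresholds, not HBase's actual MemStore/StoreFile code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy LSM-style store: writes go to a sorted in-memory buffer; when the buffer
// is full it is flushed as an immutable, already-sorted "file"; small files are
// periodically merged (compacted) into one larger sorted file.
public class ToyLsmStore {
    private static final int MEMSTORE_LIMIT = 4;     // flush threshold (toy value)
    private static final int COMPACT_THRESHOLD = 3;  // merge when this many files exist

    private TreeMap<String, String> memstore = new TreeMap<>();               // sorted by key
    private final List<TreeMap<String, String>> storeFiles = new ArrayList<>(); // flushed, immutable

    public void put(String key, String value) {
        memstore.put(key, value);            // cheap in-memory insert, stays sorted
        if (memstore.size() >= MEMSTORE_LIMIT) {
            flush();
        }
    }

    private void flush() {
        storeFiles.add(memstore);            // written out as an already-sorted file
        memstore = new TreeMap<>();
        if (storeFiles.size() >= COMPACT_THRESHOLD) {
            compact();
        }
    }

    private void compact() {
        // Merge the small sorted files into one large sorted file.
        TreeMap<String, String> merged = new TreeMap<>();
        for (TreeMap<String, String> file : storeFiles) {
            merged.putAll(file);             // files are iterated oldest to newest, so newer values win
        }
        storeFiles.clear();
        storeFiles.add(merged);
    }

    public String get(String key) {
        // Check the memstore first, then the flushed files from newest to oldest.
        String v = memstore.get(key);
        if (v != null) return v;
        for (int i = storeFiles.size() - 1; i >= 0; i--) {
            v = storeFiles.get(i).get(key);
            if (v != null) return v;
        }
        return null;
    }
}
```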

From the above, the data inside each StoreFile at the bottom of HBase is ordered, but the StoreFiles themselves are not necessarily ordered relative to one another; the Store only needs to manage the StoreFiles' indexes. This also shows why specifying the version and rowkey can improve query efficiency: a query for a specific version and rowkey can use the StoreFile indexes to skip files that definitely cannot contain the target data.
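One way to picture this index-based skipping: if each file records metadata such as its smallest and largest rowkey and its time range, a query can ignore any file whose ranges cannot contain the requested rowkey and version. The sketch below is a simplified, hypothetical illustration, not HBase's actual index or bloom-filter code.

```java
import java.util.List;

// Hypothetical illustration: skip StoreFiles whose recorded key/time ranges
// cannot contain the requested rowkey and timestamp.
public class StoreFilePruning {
    record FileMeta(String minRowKey, String maxRowKey, long minTimestamp, long maxTimestamp) {}

    static boolean mightContain(FileMeta f, String rowKey, long timestamp) {
        boolean keyInRange = rowKey.compareTo(f.minRowKey()) >= 0
                && rowKey.compareTo(f.maxRowKey()) <= 0;
        boolean timeInRange = timestamp >= f.minTimestamp() && timestamp <= f.maxTimestamp();
        return keyInRange && timeInRange;
    }

    static long countFilesToScan(List<FileMeta> files, String rowKey, long timestamp) {
        // Only files that might contain the target are scanned; the rest are skipped outright.
        return files.stream().filter(f -> mightContain(f, rowKey, timestamp)).count();
    }
}
```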

HBase vs. Cassandra

| Aspect | HBase | Cassandra |
| --- | --- | --- |
| Language | Java | Java |
| Origins | BigTable | BigTable and Dynamo |
| License | Apache | Apache |
| Protocol | HTTP/REST (also Thrift) | Custom binary protocol (Thrift) |
| Data distribution | A table is divided into multiple regions spread across different RegionServers | Improved consistent hashing (virtual nodes) |
| Storage target | Large files | Small files |
| Consistency | Strong consistency | Eventual consistency, quorum NRW strategy |
| Architecture | Master/slave | P2P |
| High availability | The HDFS NameNode is a single point of failure | Peer-to-peer, decentralized design with no single point of failure |
| Elastic scaling | A new RegionServer registers itself with the Master, which redistributes regions evenly | Scaling requires adjusting the data distribution among the nodes on the hash ring |
| Read/write performance | Locating data for a read or write may take up to 6 network RPCs, so performance is relatively low | Locating data for reads and writes is very fast |
| Data conflict handling | Optimistic concurrency control | Vector clocks |
| Temporary failure handling | When a RegionServer goes down, its HLog is replayed | Hinted handoff: new data hashed to a downed node is routed to the next node, which keeps a hint and pushes the data back once the original node recovers |
| Permanent failure recovery | When the RegionServer recovers, the Master reassigns its regions | Merkle hash trees, synchronized via the gossip protocol, keep data consistent across cluster nodes |
| Membership and failure detection | ZooKeeper | Gossip-based |
| CAP | 1. Strong consistency, no data loss. 2. Lower availability. 3. Easy to scale out. | 1. Weak consistency, data may be lost. 2. High availability. 3. Easy to scale out. |
Reprinted from: https://yq.aliyun.com/articles/25706
