HBase underlying storage principles: essentially no different from Cassandra! Both are KV column stores; one is just peer-to-peer while the other is centralized.


The biggest difficulty in understanding HBase (an open-source, practical implementation of Google's Bigtable) is grasping HBase's data-structure concept. First, HBase differs from a typical relational database in that it is a database suited to storing unstructured data. The other difference is that HBase is column-oriented rather than row-oriented.

Google's Bigtable paper explains clearly what Bigtable is:
Bigtable is a sparse, distributed, persistent, multidimensional sorted map, indexed by row key, column key, and timestamp; each value is an uninterpreted array of bytes. ("A Bigtable is a sparse, distributed, persistent multidimensional sorted map. The map is indexed by a row key, column key, and a timestamp; each value in the map is an uninterpreted array of bytes.")

The HBase page on the Hadoop wiki mentions:
HBase uses a data model very similar to Bigtable's. Users store data rows in labelled tables. A data row has a sortable key and an arbitrary number of columns. The table is stored sparsely, so rows in the same table can have wildly varying columns if the user likes. ("HBase uses a data model very similar to that of Bigtable. Users store data rows in labelled tables. A data row has a sortable key and an arbitrary number of columns. The table is stored sparsely, so rows in the same table can have crazily-varying columns, if the user likes.")
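To make the "(row key, column key, timestamp) → byte array" model concrete, here is a minimal, purely illustrative Java sketch of that sorted-map view. The class and method names are invented for illustration; this is not HBase's actual implementation.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Illustrative only: the logical view of a Bigtable/HBase table as a
// multidimensional sorted map indexed by row key, column key, and timestamp.
public class LogicalTable {
    // row key -> (column key -> (timestamp -> value))
    // TreeMaps keep every dimension sorted, mirroring "sorted map" in the paper.
    // The map is also sparse: a row stores only the columns it actually has.
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, byte[]>>> rows =
            new TreeMap<>();

    public void put(String rowKey, String columnKey, long timestamp, byte[] value) {
        rows.computeIfAbsent(rowKey, r -> new TreeMap<>())
            .computeIfAbsent(columnKey, c -> new TreeMap<>())
            .put(timestamp, value);
    }

    // The latest version wins unless a timestamp is asked for explicitly.
    public byte[] getLatest(String rowKey, String columnKey) {
        NavigableMap<String, NavigableMap<Long, byte[]>> columns = rows.get(rowKey);
        if (columns == null) return null;
        NavigableMap<Long, byte[]> versions = columns.get(columnKey);
        if (versions == null || versions.isEmpty()) return null;
        return versions.lastEntry().getValue();   // highest timestamp
    }
}
```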

First, the architectural approach

HBase is a project built on Hadoop, so it typically uses the HDFS file system directly. We will not go deep here into how HDFS builds its distributed file system; just know that although HBase has the concept of multiple RegionServers, this does not mean the data is persisted on the RegionServers. In fact, a RegionServer is a scheduler that manages regions, while the data itself is persisted on HDFS. With this in mind, in the discussion that follows we simply abstract the file system as HDFS and do not delve into it further.

HBase is a distributed database that uses ZooKeeper to manage the cluster. Architecturally it is divided into a Master (elected as leader through ZooKeeper) and multiple RegionServers. The basic architecture is shown below:


In HBase, a RegionServer corresponds to a node in the cluster, and one RegionServer is responsible for managing multiple regions. A region holds part of a table's data, so a table in HBase may need many regions to store its data, but the data in each region is not arbitrary: when HBase assigns regions, it defines a rowkey range for each one, and data that falls within a given range is handed to that specific region, spreading the load across multiple nodes to exploit the advantages of a distributed system. In addition, HBase automatically adjusts the placement of regions: if a RegionServer becomes hot (a large number of requests fall on the regions it manages), HBase moves regions to relatively idle nodes, ensuring in turn that the cluster is fully utilized.
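The mapping from rowkey ranges to regions can be pictured as a floor lookup over sorted region start keys. The following is a simplified, hypothetical sketch of that idea; the class and method names are invented (real HBase clients locate regions through the hbase:meta table), and each region is assumed to own the half-open range from its start key up to the next region's start key.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Simplified illustration of rowkey-range -> region routing.
public class SimpleRegionRouter {
    // sorted region start key -> name of the RegionServer hosting that region
    private final NavigableMap<String, String> regionsByStartKey = new TreeMap<>();

    public void addRegion(String startKey, String regionServer) {
        regionsByStartKey.put(startKey, regionServer);
    }

    // The region responsible for a rowkey is the one with the greatest
    // start key that is <= the rowkey (a floor lookup).
    public String locate(String rowKey) {
        Map.Entry<String, String> entry = regionsByStartKey.floorEntry(rowKey);
        return entry == null ? null : entry.getValue();
    }
}
```

For example, with one region starting at "" on rs1 and another starting at "m" on rs2, locating the rowkey "kiwi" returns rs1, while "zebra" returns rs2.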

Second, the storage model

With the architecture in place, the next thing to focus on is how the data is actually stored. This is the work every region has to carry out. We know a region holds the data of a specific rowkey range of an HBase table, and HBase is a column-oriented database, so inside a region there are multiple files storing those columns. Columns in HBase are organized by column family, each column family has a corresponding storage structure, and HBase abstracts a column family's storage structure as a Store: one Store represents one column family.
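As a concrete example, the sketch below uses the HBase 2.x Java client to create a table with two column families; each family declared here is materialized as one Store per region. The table and family names ("user", "info", "stats") are made up for illustration.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;

public class CreateTableExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();   // reads hbase-site.xml from the classpath
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Admin admin = connection.getAdmin()) {
            // Each column family declared here becomes one Store per region.
            admin.createTable(TableDescriptorBuilder.newBuilder(TableName.valueOf("user"))
                    .setColumnFamily(ColumnFamilyDescriptorBuilder.of("info"))
                    .setColumnFamily(ColumnFamilyDescriptorBuilder.of("stats"))
                    .build());
        }
    }
}
```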



Here you can see why a query should ask for as few unnecessary columns as possible, and why columns that are frequently queried together should be organized into the same column family: the more column families a query has to touch, the more store files must be scanned, and the longer the query takes.
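For example, with the HBase Java client, a read that names only the family and qualifiers it needs avoids consulting the Stores of the other column families. A minimal sketch follows; the table, family, and column names are hypothetical.

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class NarrowReadExample {
    public static void main(String[] args) throws Exception {
        try (Connection connection = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = connection.getTable(TableName.valueOf("user"))) {
            Get get = new Get(Bytes.toBytes("row-0001"));
            // Only the "info" family is requested, so only that family's Store
            // (and its StoreFiles) has to be consulted for this row.
            get.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"));
            get.addColumn(Bytes.toBytes("info"), Bytes.toBytes("email"));
            Result result = table.get(get);
            System.out.println(Bytes.toString(
                    result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"))));
        }
    }
}
```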

Let's drill down into how data is stored inside a Store. HBase's implementation uses an LSM tree structure.

The LSM tree can be seen as an improvement on the B+ tree, so let's first take a quick look at the B+ tree.


This is a simple B+ tree; its meaning should be self-evident, but this data structure is not well suited to HBase's application scenario. Such a structure is very efficient in memory, but HBase's data is stored in files; if it were stored in this form, every insert would mean first finding the target file through the top-level index and then maintaining the order of the data inside that file, which is clearly inefficient. So HBase uses the LSM tree structure instead. The key point is that every insert first goes into the MemStore (an in-memory buffer); when the MemStore reaches its limit, HBase writes the in-memory data out as an ordered StoreFile (sorted by rowkey, version, and column name; the column family is irrelevant here because a Store holds only one column family). This produces many small StoreFiles in the Store, and when the number of small files reaches a threshold, HBase uses a background thread to merge the small files into a larger one. In this way HBase turns inefficient in-file inserts and moves into simple sequential file writes and merge operations.
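The write path just described (buffer writes in a sorted MemStore, flush to an ordered file when a threshold is reached, then merge the small files) can be sketched roughly as follows. This is a toy illustration of the LSM idea with invented names and thresholds, not HBase's actual MemStore/StoreFile code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy LSM-style store: writes go to a sorted in-memory buffer; when the buffer
// is full it is flushed as an immutable, already-sorted "file"; small files are
// periodically merged (compacted) into one larger sorted file.
public class ToyLsmStore {
    private static final int MEMSTORE_LIMIT = 4;     // flush threshold (toy value)
    private static final int COMPACT_THRESHOLD = 3;  // merge when this many files exist

    private TreeMap<String, String> memstore = new TreeMap<>();               // sorted by key
    private final List<TreeMap<String, String>> storeFiles = new ArrayList<>(); // flushed, immutable

    public void put(String key, String value) {
        memstore.put(key, value);            // cheap in-memory insert, stays sorted
        if (memstore.size() >= MEMSTORE_LIMIT) {
            flush();
        }
    }

    private void flush() {
        storeFiles.add(memstore);            // written out as an already-sorted file
        memstore = new TreeMap<>();
        if (storeFiles.size() >= COMPACT_THRESHOLD) {
            compact();
        }
    }

    private void compact() {
        // Merge the small sorted files into one large sorted file.
        TreeMap<String, String> merged = new TreeMap<>();
        for (TreeMap<String, String> file : storeFiles) {
            merged.putAll(file);             // files are iterated oldest to newest, so newer values win
        }
        storeFiles.clear();
        storeFiles.add(merged);
    }

    public String get(String key) {
        // Check the memstore first, then the flushed files from newest to oldest.
        String v = memstore.get(key);
        if (v != null) return v;
        for (int i = storeFiles.size() - 1; i >= 0; i--) {
            v = storeFiles.get(i).get(key);
            if (v != null) return v;
        }
        return null;
    }
}
```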

From the above, the data inside each StoreFile at the bottom of HBase is ordered, but the StoreFiles themselves are not necessarily ordered relative to one another; the Store only needs to manage the StoreFiles' indexes. This also shows why specifying the version and rowkey can improve query efficiency: a query for a specific version and rowkey can use the StoreFile indexes to skip files that definitely cannot contain the target data.
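One way to picture this index-based skipping: if each file records metadata such as its smallest and largest rowkey and its time range, a query can ignore any file whose ranges cannot contain the requested rowkey and version. The sketch below is a simplified, hypothetical illustration, not HBase's actual index or bloom-filter code.

```java
import java.util.List;

// Hypothetical illustration: skip StoreFiles whose recorded key/time ranges
// cannot contain the requested rowkey and timestamp.
public class StoreFilePruning {
    record FileMeta(String minRowKey, String maxRowKey, long minTimestamp, long maxTimestamp) {}

    static boolean mightContain(FileMeta f, String rowKey, long timestamp) {
        boolean keyInRange = rowKey.compareTo(f.minRowKey()) >= 0
                && rowKey.compareTo(f.maxRowKey()) <= 0;
        boolean timeInRange = timestamp >= f.minTimestamp() && timestamp <= f.maxTimestamp();
        return keyInRange && timeInRange;
    }

    static long countFilesToScan(List<FileMeta> files, String rowKey, long timestamp) {
        // Only files that might contain the target are scanned; the rest are skipped outright.
        return files.stream().filter(f -> mightContain(f, rowKey, timestamp)).count();
    }
}
```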

HBase vs. Cassandra

| Aspect | HBase | Cassandra |
| --- | --- | --- |
| Language | Java | Java |
| Origins | BigTable | BigTable and Dynamo |
| License | Apache | Apache |
| Protocol | HTTP/REST (also Thrift) | Custom binary protocol (Thrift) |
| Data distribution | A table is divided into multiple regions spread across different RegionServers | Improved consistent hashing (virtual nodes) |
| Storage target | Large files | Small files |
| Consistency | Strong consistency | Eventual consistency, quorum NRW strategy |
| Architecture | Master/slave | P2P |
| High availability | The HDFS NameNode is a single point of failure | Peer-to-peer, decentralized design with no single point of failure |
| Elastic scaling | A new RegionServer registers itself with the Master, which redistributes regions evenly | Scaling requires adjusting the data distribution among the nodes on the hash ring |
| Read/write performance | Locating data for a read or write may take up to 6 network RPCs, so performance is relatively low | Locating data for reads and writes is very fast |
| Data conflict handling | Optimistic concurrency control | Vector clocks |
| Temporary failure handling | When a RegionServer goes down, its HLog is replayed | Hinted handoff: new data hashed to a downed node is routed to the next node, which keeps a hint and pushes the data back once the original node recovers |
| Permanent failure recovery | When the RegionServer recovers, the Master reassigns its regions | Merkle hash trees, synchronized via the gossip protocol, keep data consistent across cluster nodes |
| Membership and failure detection | ZooKeeper | Gossip-based |
| CAP | 1. Strong consistency, no data loss. 2. Lower availability. 3. Easy to scale out. | 1. Weak consistency, data may be lost. 2. High availability. 3. Easy to scale out. |
Reprinted from: https://yq.aliyun.com/articles/25706
