Why HBase enables fast queries

Source: Internet
Author: User

What does "fast" mean here? It can mean fast queries over tables with millions of records or more, or it can mean real-time queries over current data.

A: Fast queries (reading data from disk)
HBase queries by rowkey, so a query is fast as long as the rowkey can be located quickly. The main factors are:
1. A table is split into multiple regions, which you can think of as roughly the partitions of a relational database table.
2. Rows are stored sorted by rowkey.
3. Storage is column-oriented (organized by column family).

First, the region (partition) that holds the row can be located quickly. Suppose the table has 1 billion records and occupies 1 TB, split into 500 regions, so each region holds about 2 GB. The record is then found within at most 2 GB of data.
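The region lookup can be pictured as a binary search over sorted region start keys. The following is a minimal sketch of that idea, not HBase's actual lookup code; the region names and start keys are made up for illustration.

```python
import bisect

# Hypothetical region boundaries: each region starts at its start key and
# covers rowkeys up to (but not including) the next region's start key.
REGION_START_KEYS = ["", "row-2000", "row-4000", "row-6000", "row-8000"]
REGION_NAMES = ["region-0", "region-1", "region-2", "region-3", "region-4"]


def locate_region(rowkey: str) -> str:
    """Return the name of the region whose key range contains rowkey."""
    # bisect_right returns the position of the first start key that is
    # greater than rowkey; the region just before that position owns the key.
    index = bisect.bisect_right(REGION_START_KEYS, rowkey) - 1
    return REGION_NAMES[index]
```

Because the start keys are sorted, the lookup cost grows with the logarithm of the number of regions rather than with the size of the table.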

Second, storage is columnar, or more precisely organized by column family. Suppose the region is divided into 3 column families of about 666 MB each. If the query touches only 1 column family, only that family needs to be read. A column family is stored as 1 or more HStoreFiles; assuming each HStoreFile is 128 MB, the column family has 5 HStoreFiles on disk, and the rest of its data is in memory.

Third, the data is sorted. The record you want may be near the front or near the end; assuming it is in the middle, only 2.5 HStoreFiles need to be traversed, about 300 MB in total.

Finally, each HStoreFile (a wrapper around an HFile) stores data as key-value pairs, so it is enough to traverse the keys in each data block and test them against the query condition. Keys are generally of bounded length; assuming the key-to-value size ratio is 1:19 (and ignoring the other blocks in the HFile), only about 15 MB has to be read to find the matching record. At a disk throughput of 100 MB/s that takes only about 0.15 seconds. The block cache (LRU eviction) improves efficiency further.
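The back-of-the-envelope estimate above can be reproduced step by step. This sketch only restates the article's assumptions (decimal units, 1 TB taken as 1,000,000 MB); the function and parameter names are illustrative. Without rounding, the figures come out to 320 MB scanned, 16 MB of keys and 0.16 seconds, which the text rounds to 300 MB, 15 MB and 0.15 seconds.

```python
def estimate_read(
    table_mb: float = 1_000_000,   # 1 TB table, in MB (decimal units)
    regions: int = 500,
    column_families: int = 3,
    store_file_mb: float = 128,
    files_scanned: float = 2.5,    # record assumed to sit in the middle of 5 files
    key_parts: int = 1,            # key:value size ratio of 1:19
    value_parts: int = 19,
    disk_mb_per_s: float = 100,
) -> dict:
    """Walk through the estimate, narrowing the data to read at each step."""
    region_mb = table_mb / regions                # one region
    family_mb = region_mb / column_families       # one column family in it
    scanned_mb = files_scanned * store_file_mb    # store files traversed
    key_mb = scanned_mb * key_parts / (key_parts + value_parts)  # keys only
    return {
        "region_mb": region_mb,
        "family_mb": family_mb,
        "scanned_mb": scanned_mb,
        "key_mb": key_mb,
        "seconds": key_mb / disk_mb_per_s,
    }
```

Each step shrinks the data that must be examined: from 1 TB to one 2 GB region, to one column family, to the store files on the path to the key, and finally to the key bytes alone.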

B: Real-time queries
A real-time query can be thought of as a query served from memory, with a typical response time within 1 second. HBase writes data to memory first and flushes it to disk only once the amount of data reaches a threshold (such as 128 MB). In memory, data is never updated or merged in place, only appended, so a user's write can return as soon as it reaches memory, which keeps HBase I/O performance high.
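The write path described above can be modeled with a small toy class. This is a simplified sketch under stated assumptions, not HBase's implementation: the class name `ToyStore` and its methods are hypothetical, the in-memory buffer is a plain list, and durability machinery such as the write-ahead log is left out.

```python
class ToyStore:
    """Toy model: append-only memory buffer flushed to immutable sorted files."""

    def __init__(self, flush_threshold_bytes: int = 128 * 1024 * 1024):
        self.flush_threshold_bytes = flush_threshold_bytes
        self.memstore = []       # append-only list of (rowkey, value)
        self.memstore_bytes = 0
        self.store_files = []    # each flushed file is a list sorted by rowkey

    def put(self, rowkey: str, value: bytes) -> None:
        # A write only appends; an existing entry is never modified in place.
        self.memstore.append((rowkey, value))
        self.memstore_bytes += len(rowkey) + len(value)
        if self.memstore_bytes >= self.flush_threshold_bytes:
            self.flush()

    def flush(self) -> None:
        # sorted() is stable, so entries with the same rowkey keep write order.
        self.store_files.append(sorted(self.memstore, key=lambda kv: kv[0]))
        self.memstore = []
        self.memstore_bytes = 0

    def get(self, rowkey: str):
        # Check the newest data first: memory, then files from newest to oldest.
        for key, value in reversed(self.memstore):
            if key == rowkey:
                return value
        for store_file in reversed(self.store_files):
            for key, value in reversed(store_file):
                if key == rowkey:
                    return value
        return None
```

In this model a `put` costs one list append unless it triggers a flush, and a `get` for recently written data is answered from memory without touching any flushed file.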

A real-time query returns data that reflects the current moment. Because the most recent data can be regarded as always being in memory, the response is real-time.
