What do you mean by fast? Do you mean fast queries over millions of records, or real-time queries on current data?
A: Fast queries (reading data from disk). HBase queries by rowkey, so as long as the rowkey can be located quickly, the query is fast. The main factors are:
1. An HBase table is split into multiple regions, which you can think of roughly as the partitions of a relational database table.
2. Keys are stored in sorted order.
3. Storage is column-based.
First, you can quickly find the region (partition) that holds the row. Suppose the table has 1 billion records occupying 1 TB and is split into 500 regions, so each region holds about 2 GB. At most 2 GB of records then have to be searched.
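As a rough illustration of this step, here is a minimal sketch of mapping a rowkey to a region. It assumes each region is identified by its start key and that the start keys are kept sorted. The names `region_start_keys` and `find_region` are hypothetical and are not part of the HBase API.

```python
import bisect

# Hypothetical sorted start keys of 5 regions; the first region starts at the empty key.
region_start_keys = ["", "row-2", "row-4", "row-6", "row-8"]


def find_region(start_keys, rowkey):
    """Return the index of the region whose key range contains rowkey."""
    # bisect_right returns the position of the first start key greater than rowkey;
    # the region immediately before that position owns the key.
    return bisect.bisect_right(start_keys, rowkey) - 1


# Size estimate from the text: 1 TB (taken as 1000 GB) spread over 500 regions.
table_size_gb = 1000
region_count = 500
region_size_gb = table_size_gb / region_count

print(find_region(region_start_keys, "row-5"))
print(region_size_gb)
```

Because the start keys are sorted, the lookup is a binary search and does not need to look at any record outside the chosen region.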
Second, storage is columnar, or more precisely organized by column family. Suppose the region is divided into 3 column families, each about 666 MB. If you query something in one column family, that column family contains 1 or more HStoreFiles. Assuming an HStoreFile is 128 MB, the column family consists of 5 HStoreFiles on disk. The rest is in memory.
Third, the data is sorted. The record you want may be at the very front or at the very end. Assuming it is in the middle, we only need to traverse 2.5 HStoreFiles, roughly 300 MB in total.
Finally, each HStoreFile (a wrapper around an HFile) stores its data as key-value pairs, so it is enough to traverse the keys in each data block and check whether they match the query. Keys are generally of limited length. Assuming the ratio of key size to value size is 1:19 (and ignoring the other blocks in the HFile), only about 15 MB needs to be read to find the matching record. At a disk throughput of 100 MB/s, that takes only 0.15 seconds. With the block cache (LRU eviction), efficiency is even higher.
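The estimate above can be reproduced as a small calculation. This is only a sketch of the arithmetic under the same assumptions as the text (2 GB taken as 2000 MB, 3 column families, 128 MB store files, a 1:19 key-to-value ratio, 100 MB/s disk throughput). The function name and its parameters are hypothetical. Note that 2.5 files of 128 MB is 320 MB, which the text rounds to 300 MB, so the sketch yields about 16 MB and 0.16 seconds rather than 15 MB and 0.15 seconds.

```python
def estimate_scan(region_mb, column_families, store_file_mb,
                  key_parts, value_parts, disk_mb_per_s):
    """Return (store files on disk, MB scanned, MB of keys read, seconds)."""
    # Size of one column family within the region.
    family_mb = region_mb / column_families
    # Whole store files that fit on disk; the remainder is assumed to be in memory.
    files_on_disk = int(family_mb // store_file_mb)
    # On average the wanted record sits in the middle, so half the files are traversed.
    scanned_mb = (files_on_disk / 2) * store_file_mb
    # Only the key portion of each key-value pair has to be examined.
    key_mb = scanned_mb * key_parts / (key_parts + value_parts)
    seconds = key_mb / disk_mb_per_s
    return files_on_disk, scanned_mb, key_mb, seconds


files, scanned, keys, secs = estimate_scan(
    region_mb=2000, column_families=3, store_file_mb=128,
    key_parts=1, value_parts=19, disk_mb_per_s=100,
)
print(files, scanned, keys, secs)
```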
B: Real-time queries
Real-time queries can be thought of as queries served from memory, with response times generally within 1 second. HBase writes data to memory first and flushes it to disk only once the data reaches a certain size (such as 128 MB). In memory, data is never updated or merged, only appended, so a user's write can return as soon as it reaches memory, which ensures HBase's high I/O performance.
A real-time query returns the data as of the current time. The most recent data can be considered to always be in memory, which ensures a real-time response.
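The write path described here can be sketched as an append-only in-memory buffer that is written out once it passes a size threshold. This is a deliberately simplified model, not HBase's actual implementation: the class `MemStoreSketch`, its methods, and the use of plain Python lists in place of files on disk are all assumptions made for illustration.

```python
class MemStoreSketch:
    """Toy model: writes are appended in memory and flushed at a size threshold."""

    def __init__(self, flush_threshold_bytes):
        self.flush_threshold_bytes = flush_threshold_bytes
        self.entries = []        # append-only in-memory buffer
        self.size_bytes = 0
        self.flushed_files = []  # each list stands in for one store file on disk

    def put(self, rowkey, value):
        # A write only appends; existing entries are never updated or merged.
        self.entries.append((rowkey, value))
        self.size_bytes += len(rowkey) + len(value)
        if self.size_bytes >= self.flush_threshold_bytes:
            self.flush()

    def flush(self):
        # Write the buffer out sorted by rowkey (stable sort keeps write order per key).
        self.flushed_files.append(sorted(self.entries, key=lambda e: e[0]))
        self.entries = []
        self.size_bytes = 0

    def get(self, rowkey):
        # Check memory first, newest write first, then flushed files from newest to oldest.
        for key, value in reversed(self.entries):
            if key == rowkey:
                return value
        for store_file in reversed(self.flushed_files):
            for key, value in reversed(store_file):
                if key == rowkey:
                    return value
        return None


store = MemStoreSketch(flush_threshold_bytes=10)
store.put("a", "1")
store.put("a", "2")              # a second version is appended, not merged
latest_before_flush = store.get("a")
store.put("bbbb", "cccc")        # pushes the buffer past the threshold and triggers a flush
latest_after_flush = store.get("a")
```

In this model a `put` returns as soon as the entry is appended, which is the property the text credits for fast writes, and recent data is found in memory before any flushed file is consulted.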
Why HBase enables fast queries