Reposted from http://blog.csdn.net/lifuxiangcaohui/article/details/40621067
HBase stores data in a three-dimensional order: first by rowkey (the row's primary key), then by column key (column family + qualifier), then by timestamp.
1.rowkey
The rowkey is the primary key of a row. HBase can locate data only by a single rowkey, by a rowkey range, or by a full scan, so rowkey design is very important: it directly affects the query efficiency of your application layer. Rowkeys are stored as bytes and sorted in dictionary (lexicographic) order. For letters, that is simply alphabetical order. For example, given two rowkeys, rowkey1: aaa222 and rowkey2: bbb111, rowkey1 sorts before rowkey2 because a comes before b. If rowkey2 also began with a, the second character would be compared, then the third if those were equal as well, and so on.
With that understood, consider a query over a rowkey range. We generally know the start rowkey. If we give the scan only a start rowkey of d, it returns every row whose key is greater than or equal to d, whereas we only want the rows that begin with d. That is what the end rowkey is for. The end rowkey also begins with d, and the rest is set according to how your rowkey is composed; generally it is the start key incremented by one, because the end key itself is excluded from the result. For example, with a rowkey designed as user id-date, to query one user's data for one day, set startkey to 3231-20121212 and endkey to 3231-20121213. The scan then returns the data of user 3231 for the day 20121212.
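The range behaviour described above can be imitated without an HBase cluster. The following is only a minimal sketch in plain Python: the `scan` helper and the sample rowkeys are hypothetical, and it assumes the stop row is exclusive, as in a default HBase scan. Python compares `bytes` values byte by byte, which mirrors dictionary ordering.

```python
from bisect import bisect_left

def scan(sorted_keys, start_row, stop_row=None):
    """Return the keys in [start_row, stop_row); hypothetical stand-in for a range scan."""
    lo = bisect_left(sorted_keys, start_row)
    # No stop row means "scan to the end of the table".
    hi = len(sorted_keys) if stop_row is None else bisect_left(sorted_keys, stop_row)
    return sorted_keys[lo:hi]

# Sample rowkeys (made up for illustration), kept in byte-wise dictionary order.
rowkeys = sorted([
    b"bbb111",
    b"aaa222",
    b"d001",
    b"e001",
    b"3231-20121211",
    b"3231-20121212",
    b"3231-20121213",
    b"3232-20121212",
])

# Only a start row: everything >= b"d" comes back, including keys starting with e.
open_ended = scan(rowkeys, b"d")

# Start row d, stop row e: only the keys that begin with d.
prefix_d = scan(rowkeys, b"d", b"e")

# One user, one day: the stop key is the start key with the date incremented by one.
one_day = scan(rowkeys, b"3231-20121212", b"3231-20121213")

print(open_ended)
print(prefix_d)
print(one_day)
```

With a real cluster the same start and stop keys would be passed to the client's scan object; the sketch only demonstrates why the stop key is needed.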
2.column key
The column key is the second dimension. After the data is sorted by rowkey in dictionary order, cells with the same rowkey are sorted by column key, again in dictionary order.
We should learn to take advantage of this when designing tables. Take an inbox as an example: we sometimes need to sort mail by subject, so we can make the subject part of the column key, that is, design the column key as column family + subject.
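As a rough illustration of the inbox idea, the sketch below models each cell as a `(rowkey, family, qualifier)` tuple of bytes and lets Python's tuple ordering stand in for the store's ordering. The table contents, the family names `mail` and `attach`, and the subjects are all invented for the example.

```python
# Each cell is (rowkey, column family, qualifier); all values are hypothetical.
cells = [
    (b"user1", b"mail", b"Re: lunch"),
    (b"user1", b"mail", b"Invoice"),
    (b"user0", b"mail", b"Welcome"),
    (b"user1", b"attach", b"Zeta"),
]

# Tuples compare element by element: rowkey first, then family, then qualifier.
ordered = sorted(cells)

# Subjects in user1's inbox come back already sorted, with no extra sort step.
user1_subjects = [q for (row, fam, q) in ordered if row == b"user1" and fam == b"mail"]

for cell in ordered:
    print(cell)
print(user1_subjects)
```

Because the subject is the qualifier, reading one row returns its mail in subject order for free, which is the design benefit the paragraph above points at.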
3.timestamp
The timestamp is the third dimension. It is sorted in descending order, so the most recent data comes first. There is not much more to say about it, and other blogs on the internet cover it in more detail.
This article is basically a brief introduction to rowkey dictionary ordering (the three-dimensional order).
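Putting the three dimensions together, a sort key of the form (rowkey, family, qualifier, negated timestamp) reproduces the order described here. This is a simplified model, not HBase's actual comparator; the `cell_sort_key` helper and the sample cells are assumptions made for illustration.

```python
def cell_sort_key(cell):
    """Ascending by rowkey, family, qualifier; descending by timestamp."""
    row, family, qualifier, timestamp, _value = cell
    return (row, family, qualifier, -timestamp)

# (rowkey, family, qualifier, timestamp, value); all values are hypothetical.
cells = [
    (b"user1", b"mail", b"Invoice", 100, b"v1"),
    (b"user1", b"mail", b"Invoice", 300, b"v3"),
    (b"user1", b"mail", b"Invoice", 200, b"v2"),
    (b"user0", b"mail", b"Welcome", 50, b"w1"),
]

ordered = sorted(cells, key=cell_sort_key)

# The first version met for a given row and column is the most recent one.
latest = next(c for c in ordered if c[:3] == (b"user1", b"mail", b"Invoice"))

for cell in ordered:
    print(cell)
print(latest[4])
```

Since the newest version sorts first, a reader that wants only the latest value can stop at the first matching cell.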
HBase Rowkey Design, Part One