Original link http://my.oschina.net/sdzzboy/blog/164130
HBase stores data in tables. As in a relational database, an HBase table consists of rows and columns.
Unlike a relational database, HBase also has the concept of a "column family".
A table consists of several column families, each of which contains several columns.
In addition, each cell in the table carries a timestamp,
so an HBase table can be thought of as a three-dimensional database:
besides rows and columns, there is a time dimension along which different versions of each cell are kept.
As in a relational database, each row in HBase has a primary key, the row key.
HBase retrieves data by row key.
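The three dimensions described above (row key, column, timestamp) can be pictured as nested maps. The following is only a toy sketch in Python, not the HBase client API: the names `table`, `put` and `get`, and the sample row and column names, are made up for illustration.

```python
# Toy model: row key -> "family:qualifier" -> timestamp -> value.
table = {}

def put(row_key, column, timestamp, value):
    """Store one version of a cell."""
    table.setdefault(row_key, {}).setdefault(column, {})[timestamp] = value

def get(row_key, column, timestamp=None):
    """Return the version written at `timestamp`, or the newest one if omitted."""
    versions = table.get(row_key, {}).get(column, {})
    if not versions:
        return None
    if timestamp is None:
        timestamp = max(versions)  # newest version has the largest timestamp
    return versions.get(timestamp)

# Two versions of the same cell in column family "info" (hypothetical names).
put("user1", "info:name", 1, "Alice")
put("user1", "info:name", 2, "Alicia")

print(get("user1", "info:name"))     # newest version
print(get("user1", "info:name", 1))  # older version is still there
```

Writing a new value does not overwrite the old one here; it simply adds another entry along the time dimension.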
HBase has three main ways to retrieve data:
1. Fetch a single row by its row key.
2. Return multiple rows for a row key range [start row key, end row key].
3. Full table scan, which returns the entire table.
In HBase, all rows are sorted by row key.
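Because rows are kept sorted by row key, all three access paths above reduce to lookups on a sorted list. The sketch below is a plain Python illustration with made-up data and hypothetical function names, not the HBase client API. It follows the inclusive range as written above; note that the real client's Scan treats the stop row as exclusive by default.

```python
import bisect

# Hypothetical rows, kept sorted by row key as HBase does.
rows = sorted([("row03", "c"), ("row01", "a"), ("row04", "d"), ("row02", "b")])
keys = [k for k, _ in rows]

def get_row(row_key):
    """1. Fetch a single row by row key (binary search on the sorted keys)."""
    i = bisect.bisect_left(keys, row_key)
    if i < len(keys) and keys[i] == row_key:
        return rows[i]
    return None

def scan_range(start_key, end_key):
    """2. Return rows with start_key <= key <= end_key (both ends inclusive)."""
    lo = bisect.bisect_left(keys, start_key)
    hi = bisect.bisect_right(keys, end_key)
    return rows[lo:hi]

def scan_all():
    """3. Full table scan: return every row in key order."""
    return list(rows)

print(get_row("row02"))
print(scan_range("row02", "row03"))
print(scan_all())
```

A range scan only has to locate the start key and then read forward, which is why row key design matters so much for read performance.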
Physically,
each table is divided by rows into one or more HRegions.
An HRegion contains a portion of the table, that is, a number of rows.
HRegions are split by size. Each table starts with only one HRegion;
as data continues to be inserted into the table, the HRegion keeps growing, and when it reaches a threshold it is split evenly into two new HRegions.
As the rows in the table grow, there will be more and more HRegions.
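The splitting process described above can be simulated in a few lines. This is a simplified sketch under stated assumptions: the threshold here counts rows (`SPLIT_THRESHOLD` is a made-up value), whereas the text only says regions are split by size, and the `Region`, `split` and `insert` names are hypothetical.

```python
SPLIT_THRESHOLD = 4  # hypothetical: max rows per region in this toy model

class Region:
    """Toy region holding the rows of one key range."""
    def __init__(self, rows=None):
        self.rows = dict(rows or {})

def split(region):
    """Split a region at its middle key into two new regions."""
    keys = sorted(region.rows)
    mid = len(keys) // 2
    left = Region({k: region.rows[k] for k in keys[:mid]})
    right = Region({k: region.rows[k] for k in keys[mid:]})
    return left, right

def insert(regions, row_key, value):
    """Insert a row into the region covering row_key, splitting it if too big.

    `regions` is a list ordered by key range.
    """
    # Pick the last region whose smallest key is <= row_key (else the first).
    target = regions[0]
    for r in regions:
        if r.rows and min(r.rows) <= row_key:
            target = r
    target.rows[row_key] = value
    if len(target.rows) > SPLIT_THRESHOLD:
        i = regions.index(target)
        regions[i:i + 1] = split(target)  # replace one region with two

regions = [Region()]  # every table starts with a single region
for n in range(10):
    insert(regions, "row%02d" % n, "value%d" % n)

print(len(regions))
print([sorted(r.rows) for r in regions])
```

Starting from one region, the table ends up with several regions that together still cover every row exactly once.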
The HRegion is the smallest unit of distributed storage and load balancing in HBase.
Smallest unit means that different HRegions can be placed on different HRegion servers, but a single HRegion is never split across multiple servers.
Although the HRegion is the smallest unit of distributed storage, it is not the smallest unit of storage.
In fact,
an HRegion is made up of one or more Stores,
and each Store holds one column family.
Each Store is made up of one MemStore and zero or more StoreFiles.
StoreFiles are saved on HDFS.
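The Store layout above (one MemStore plus zero or more StoreFiles) can be sketched as follows. This is a toy model, not HBase code: the flush step, in which a full MemStore is written out as a new StoreFile, is standard HBase behaviour that the text above does not spell out, `FLUSH_THRESHOLD` is a made-up value, and the "files" are plain in-memory tuples rather than files on HDFS.

```python
FLUSH_THRESHOLD = 3  # hypothetical: cells held in memory before a flush

class Store:
    """Toy store for one column family: one memstore, zero or more store files."""
    def __init__(self):
        self.memstore = {}    # recent writes, held in memory
        self.storefiles = []  # oldest first; each one is an immutable sorted tuple

    def put(self, row_key, value):
        self.memstore[row_key] = value
        if len(self.memstore) >= FLUSH_THRESHOLD:
            self.flush()

    def flush(self):
        """Write the memstore out as a new store file and start a fresh memstore."""
        if self.memstore:
            self.storefiles.append(tuple(sorted(self.memstore.items())))
            self.memstore = {}

    def get(self, row_key):
        """Check the memstore first, then store files from newest to oldest."""
        if row_key in self.memstore:
            return self.memstore[row_key]
        for storefile in reversed(self.storefiles):
            for key, value in storefile:
                if key == row_key:
                    return value
        return None

store = Store()
store.put("r1", "v1")
store.put("r2", "v2")
store.put("r3", "v3")   # third write reaches the threshold and triggers a flush
store.put("r1", "v1b")  # newer value stays in the memstore for now

print(len(store.storefiles))
print(store.get("r1"))
print(store.get("r2"))
```

A brand-new Store has an empty list of store files, which matches the "zero or more" wording above.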
HLog (the WAL, or write-ahead log): this file is a log file.
The HLog records every change made to the data.
Each HRegion server maintains a single HLog rather than one per HRegion, so the logs of different HRegions (from different tables) are mixed together and continuously appended to one file.
Compared with writing multiple files at the same time, this reduces the number of disk seeks.
The trouble is that if a region server goes offline, the log on that region server has to be split in order to recover the regions it hosted,
and the pieces are then distributed to other region servers for recovery.
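The trade-off above, one shared log per server that must be split per region on recovery, can be illustrated with a small sketch. This is a simplified model with hypothetical names (`RegionServer`, `split_log`, `replay`) and made-up region names; the real log format and recovery procedure are considerably more involved.

```python
from collections import defaultdict

class RegionServer:
    """Toy region server with one append-only log shared by all of its regions."""
    def __init__(self):
        self.hlog = []

    def write(self, region, row_key, value):
        # Edits from different regions are interleaved in the same log.
        self.hlog.append((region, row_key, value))

def split_log(hlog):
    """Group a failed server's log entries by region, preserving write order."""
    per_region = defaultdict(list)
    for region, row_key, value in hlog:
        per_region[region].append((row_key, value))
    return dict(per_region)

def replay(edits):
    """Rebuild a region's state on another server by replaying its edits in order."""
    state = {}
    for row_key, value in edits:
        state[row_key] = value
    return state

server = RegionServer()
server.write("regionA", "r1", "a1")
server.write("regionB", "r9", "b1")
server.write("regionA", "r1", "a2")

# The server goes offline: split its log, then hand each piece to another server.
by_region = split_log(server.hlog)
print(by_region)
print(replay(by_region["regionA"]))
```

Appending is cheap because there is only one file to write, but recovery pays for it with the extra pass that separates the interleaved edits.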
"Go" hbase Fundamentals