Memstore introduction:
[Figure: HBase read/write path, http://images.cnblogs.com/cnblogs_com/shitouer/247860/o_hbase_read_write_path2_small.png]
The figure gives a rough picture of the HBase read/write process:
Write request: client -> WAL (write-ahead log) -> memstore -> HFile -> end
Read request: client -> memstore -> blockcache -> HFile -> end
Where memstore sits in HBase:
HBase consists of a master and HRegionServers. Actual reads and writes rarely involve the master; they mostly go to an HRegionServer. Each HRegionServer is made up of one HLog and multiple regions. A region contains multiple stores, and each store is made up of one memstore and multiple storefiles. The memstore is an area of HBase held in memory; the storefile underneath it is an HFile, which is a file in HDFS.
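The containment hierarchy described above can be sketched as a few plain data classes. This is only an illustrative model, not HBase's own classes: the names RegionServer, Region and Store, the field layout, and the placeholder path are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Store:
    """One store: a single in-memory memstore plus any number of storefiles."""
    memstore: Dict[bytes, bytes] = field(default_factory=dict)  # lives in memory
    storefiles: List[str] = field(default_factory=list)         # HFile paths in HDFS


@dataclass
class Region:
    """A region holds several stores (keyed here by a hypothetical family name)."""
    stores: Dict[str, Store] = field(default_factory=dict)


@dataclass
class RegionServer:
    """A region server: one HLog (WAL) and multiple regions."""
    hlog: List[Tuple[bytes, bytes]] = field(default_factory=list)
    regions: Dict[str, Region] = field(default_factory=dict)


# Build a tiny instance of the hierarchy.
rs = RegionServer()
rs.regions["region-a"] = Region(stores={"cf": Store()})
store = rs.regions["region-a"].stores["cf"]
store.memstore[b"row1"] = b"v1"                      # data still only in memory
store.storefiles.append("<region-dir>/cf/<hfile>")   # placeholder, not a real path
```

The point of the sketch is only the shape: one log per server, many regions per server, many stores per region, and exactly one memstore per store.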
When memstore comes into play:
Write: when the client issues a write, the data is written to the WAL first and then to the memstore. Once certain preset conditions are met, the contents of the memstore are written to a storefile, which completes the write path.
(This raises two questions.
1. Why is data written to the WAL first?
The WAL is a file in HDFS, while the memstore is an area of memory, and memory should be treated as unsafe. Data reaches disk only when the memstore is written out to a storefile. So if the system goes down and memstore data that has not yet been written to disk is lost, HBase restores it from the WAL file stored in HDFS.
2. What is the flush policy?
This is described in detail below.
)
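To make the reason for writing the WAL first concrete, here is a minimal toy model. It is a sketch under stated assumptions, not HBase code: ToyRegionServer, put and recover are hypothetical names, a Python list stands in for the WAL file in HDFS, and a dict stands in for the memstore.

```python
class ToyRegionServer:
    """Toy model: the WAL is durable, the memstore is lost on a crash."""

    def __init__(self, wal=None):
        # The list stands in for the WAL file kept in HDFS.
        self.wal = list(wal) if wal else []
        # The dict stands in for the in-memory memstore.
        self.memstore = {}

    def put(self, key, value):
        self.wal.append((key, value))  # 1. append the edit to the log first
        self.memstore[key] = value     # 2. only then update memory

    def recover(self):
        # Replay every logged edit to rebuild the lost memstore contents.
        for key, value in self.wal:
            self.memstore[key] = value


server = ToyRegionServer()
server.put("row1", "a")
server.put("row2", "b")

# Simulate a crash before any flush: memory is gone, the log survives.
surviving_wal = server.wal
restarted = ToyRegionServer(wal=surviving_wal)
empty_before_replay = (restarted.memstore == {})
restarted.recover()
```

If put updated the memstore before appending to the log, a crash between the two steps would lose an edit that nothing could replay; writing the log first closes that window.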
Read: when the client issues a read, HBase first looks in the memstore of the corresponding region. If the data is not there, it looks in the blockcache (HBase's read-optimization cache, explained below). If it is still not found, it looks in the storefile (HFile), which completes the read operation.
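The lookup order just described can be sketched as a small function. This is a simplified illustration that follows the order given in the text; the function name get and the use of plain dicts are assumptions, and filling the blockcache after a storefile hit is added here only to show why the cache helps later reads.

```python
def get(key, memstore, blockcache, storefile):
    """Return (value, where_found), checking the tiers in the order described."""
    if key in memstore:
        return memstore[key], "memstore"
    if key in blockcache:
        return blockcache[key], "blockcache"
    if key in storefile:
        # Assumption for this sketch: remember what was read from the
        # storefile so that the next read is served from the cache.
        blockcache[key] = storefile[key]
        return storefile[key], "storefile"
    return None, None


memstore = {"row1": "new"}
blockcache = {}
storefile = {"row1": "old", "row2": "b"}

first = get("row1", memstore, blockcache, storefile)   # served from memory
second = get("row2", memstore, blockcache, storefile)  # falls through to the storefile
third = get("row2", memstore, blockcache, storefile)   # now served from the cache
missing = get("row3", memstore, blockcache, storefile)
```

Because the memstore is checked first, the most recently written value for row1 wins over the older copy in the storefile.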
2. Introduction to flush
Flush is an important operation in HBase, and a well-configured flush policy is necessary to keep an HBase cluster stable.
Flush writes HBase data to disk; once a flush has run, that data is persistent. Each flush generates a storefile in the region and deletes the corresponding edits from the WAL.
Flush works at the region level: when the memstore of one store in a region reaches the preset conditions, all stores in that region are flushed.
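The region-level behaviour can be illustrated with a toy model. Everything here is hypothetical: ToyRegion, FLUSH_THRESHOLD and the entry-count trigger are stand-ins for whatever preset condition the cluster is configured with, and a list of dicts stands in for the storefiles.

```python
FLUSH_THRESHOLD = 3  # hypothetical trigger: entries in any single memstore


class ToyRegion:
    """Toy region with one memstore and a list of storefiles per store."""

    def __init__(self, families):
        self.memstores = {cf: {} for cf in families}
        self.storefiles = {cf: [] for cf in families}

    def put(self, cf, key, value):
        self.memstores[cf][key] = value
        # One store reaching the threshold triggers a flush of the whole region.
        if len(self.memstores[cf]) >= FLUSH_THRESHOLD:
            self.flush()

    def flush(self):
        # Every non-empty memstore in the region becomes a new storefile.
        for cf, mem in self.memstores.items():
            if mem:
                self.storefiles[cf].append(dict(mem))
                mem.clear()


region = ToyRegion(["cf1", "cf2"])
region.put("cf2", "k0", "v")  # cf2 holds a single entry
region.put("cf1", "k1", "v")
region.put("cf1", "k2", "v")
region.put("cf1", "k3", "v")  # cf1 reaches the threshold, the region flushes
```

Note the side effect on cf2: it is flushed with only one entry, which is how a region-level flush can produce many small storefiles when column families fill at different rates.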
The following are the logs generated when a table is flushed:
16:58:28,801 INFO [Priority.RpcServer.handler=1,port=60020] regionserver.HRegionServer: Flushing T1, 1413622522846.58fd75078b4a47b8c6a20705f23209b7.
16:58:28,816 DEBUG [Priority.RpcServer.handler=1,port=60020] regionserver.HRegion: Started memstore flush for T1, region, current region memstore size 168
16:58:29,457 INFO [Priority.RpcServer.handler=1,port=60020] regionserver.DefaultStoreFlusher: Flushed, sequenceid=3, memsize=168, hasBloomFilter=true, into tmp file hdfs://beh/hbase/data/default/T1/Hangzhou/.tmp/6ad49d65c8b94b678bab3c892bdb0d03
16:58:29,733 DEBUG [Priority.RpcServer.handler=1,port=60020] regionserver.HRegionFileSystem: Committing store file hdfs://beh/hbase/data/default/T1/58fd75078b4a47b8c6a20705f23209b7/.tmp/6ad49d65c8b94b678bab3c892bdb0d03 as hdfs://beh/hbase/data/default/T1/58fd75078b4a47b8c6a20705f23209b7/CF/release
16:58:29,838 INFO [Priority.RpcServer.handler=1,port=60020] regionserver.HStore: Added hdfs://beh/hbase/data/default/T1/58fd75078b4a47b8c6a20705f23209b7/CF/Hangzhou, entries=1, sequenceid=3, filesize=1021
16:58:29,879 INFO [Priority.RpcServer.handler=1,port=60020] regionserver.HRegion: Finished memstore flush ~168/168, currentsize=0/0 for region T1, 1413622522846.58fd75078b4a47b8c6a20705f23209b7. in 1063ms, sequenceid=3, compaction requested=false
We can see that the memstore is first flushed to a file under .tmp, and that file is then moved into the corresponding column family directory under the region directory.
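The write-to-.tmp-then-move pattern seen in the log can be sketched on a local filesystem. This is only an analogy under stated assumptions: commit_flush, the directory layout and the file name are hypothetical, and os.replace on a local disk stands in for the move that HBase performs in HDFS.

```python
import os
import tempfile


def commit_flush(region_dir, family, filename, data):
    """Write the flushed data under .tmp, then move it into the family directory."""
    tmp_dir = os.path.join(region_dir, ".tmp")
    family_dir = os.path.join(region_dir, family)
    os.makedirs(tmp_dir, exist_ok=True)
    os.makedirs(family_dir, exist_ok=True)

    # Step 1: write the complete file where readers never look.
    tmp_path = os.path.join(tmp_dir, filename)
    with open(tmp_path, "wb") as f:
        f.write(data)

    # Step 2: move the finished file into place in a single step.
    final_path = os.path.join(family_dir, filename)
    os.replace(tmp_path, final_path)
    return final_path


region_dir = tempfile.mkdtemp()  # throwaway directory for the demo
final_path = commit_flush(region_dir, "cf", "storefile-0001", b"row1=v1")
```

Writing to a temporary location first means readers of the column family directory only ever see complete files; a flush that fails halfway leaves its debris under .tmp instead.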
This article is from the "when" blog, please be sure to keep this source http://hellowode.blog.51cto.com/8646864/1565505
Memstore + flush for hbase