Memstore + flush for HBase

Source: Internet
Author: User
  1. Memstore introduction:

    [Figure: HBase read/write path. Image: http://images.cnblogs.com/cnblogs_com/shitouer/247860/o_hbase_read_write_path2_small.png]

    This is a rough outline of the HBase read and write paths:


Write request path: client -> WAL (write-ahead log) -> memstore -> HFile -> end

Read request path: client -> memstore -> blockcache -> HFile -> end


Where the memstore sits in HBase:

HBase consists of an HMaster and a set of HRegionServers. Ordinary reads and writes rarely involve the master; they are served mainly by the HRegionServers. Each HRegionServer holds one HLog and multiple regions. A region contains multiple stores, and each store consists of one memstore and multiple storefiles. The memstore is the store's buffer in memory; a storefile is backed by an HFile, which is a file in HDFS.
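The containment described above can be pictured with a few small data classes. This is only an illustrative sketch in Python, not HBase code: the class names mirror the terms used in the text, while the field names, the region name "t1-region" and the column family "cf" are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Store:
    """One store of a region: one memstore plus zero or more storefiles."""
    family: str
    memstore: Dict[str, str] = field(default_factory=dict)  # edits held in memory
    storefiles: List[str] = field(default_factory=list)     # paths of HFiles in HDFS


@dataclass
class Region:
    """A region contains multiple stores (keyed here by a hypothetical family name)."""
    name: str
    stores: Dict[str, Store] = field(default_factory=dict)


@dataclass
class RegionServer:
    """A region server holds one HLog and multiple regions."""
    hlog: List[Tuple[str, str]] = field(default_factory=list)
    regions: Dict[str, Region] = field(default_factory=dict)


# Build a tiny hierarchy: server -> region -> store -> memstore/storefiles.
server = RegionServer()
region = Region(name="t1-region")          # hypothetical region name
region.stores["cf"] = Store(family="cf")   # hypothetical column family
server.regions[region.name] = region
```

Note that the single HLog hangs off the region server, while each store has its own memstore.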

How the memstore is used:

Write: when the client issues a write, the edit is first written to the WAL and then to the memstore. Once certain preset conditions are met, the contents of the memstore are written to a storefile, which completes the write path.

(This raises two questions.

1. Why is data written to the WAL first?

The WAL is a file in HDFS, whereas the memstore is an area of memory, and memory should be treated as volatile. Data is only persisted to disk once the contents of the memstore are written to a storefile. If the system goes down before the memstore has been written to disk, the data in the memstore is lost, and HBase restores it from the WAL file stored in HDFS.

2. What is the flush policy?

This is described in detail below.

)
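The role of the WAL can be illustrated with a toy model. The sketch below is not HBase code: ToyRegionServer and its methods are hypothetical names, a Python list stands in for the WAL file in HDFS, and dictionaries stand in for the memstore and storefiles. It only shows the ordering (log first, then memory) and how replaying the log rebuilds edits that were never flushed.

```python
class ToyRegionServer:
    """Toy model: the WAL and storefiles survive a crash, the memstore does not."""

    def __init__(self, wal=None):
        self.wal = wal if wal is not None else []  # stand-in for the WAL in HDFS
        self.memstore = {}                         # volatile, lost on a crash
        self.storefiles = []                       # stand-in for HFiles in HDFS

    def put(self, row, value):
        self.wal.append((row, value))  # 1. write the edit to the log first
        self.memstore[row] = value     # 2. then apply it in memory

    def flush(self):
        """Persist the memstore as a new storefile, then drop the logged edits."""
        if self.memstore:
            self.storefiles.append(dict(self.memstore))
            self.memstore.clear()
            self.wal.clear()

    @classmethod
    def recover(cls, wal, storefiles):
        """Rebuild a server after a crash by replaying the surviving WAL."""
        server = cls(wal=list(wal))
        server.storefiles = list(storefiles)
        for row, value in server.wal:
            server.memstore[row] = value
        return server


server = ToyRegionServer()
server.put("row1", "a")
server.flush()            # row1 is now in a storefile
server.put("row2", "b")   # row2 exists only in the WAL and the memstore

# Simulated crash: only the WAL and the storefiles are carried over.
recovered = ToyRegionServer.recover(server.wal, server.storefiles)
```

After recovery, row2 is back in the memstore even though it was never flushed, which is the reason for writing to the WAL before the memstore.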

Read: when the client issues a read, HBase first looks in the memstore of the corresponding region. If the data is not found there, it looks in the blockcache (HBase's read-side cache). If it is still not found, it reads from the storefile (HFile), which completes the read.
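The lookup order in the read path can be sketched as a small function. This is a simplified illustration, not HBase code: the function name read and its arguments are hypothetical, plain dictionaries stand in for the memstore, the blockcache and the storefiles, and the cache is keyed by row here purely to keep the example short.

```python
def read(row, memstore, blockcache, storefiles):
    """Look up a row in the order memstore -> blockcache -> storefiles.

    Returns a (value, source) pair so the caller can see where the hit came from.
    """
    if row in memstore:
        return memstore[row], "memstore"
    if row in blockcache:
        return blockcache[row], "blockcache"
    for storefile in reversed(storefiles):  # check the newest storefile first
        if row in storefile:
            blockcache[row] = storefile[row]  # remember it for later reads
            return storefile[row], "storefile"
    return None, "miss"


memstore = {"r1": "new"}
blockcache = {}
storefiles = [{"r2": "old"}]

first = read("r2", memstore, blockcache, storefiles)   # served from a storefile
second = read("r2", memstore, blockcache, storefiles)  # now served from the cache
```

The second read of the same row is answered from the cache without touching the storefile, which is the benefit the blockcache is meant to provide.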


2. Introduction to flush

Flush is an important operation in HBase. A well-configured flush policy is needed to keep an HBase cluster stable.

A flush writes the data held in the memstore to disk, after which the data is persistent. Each flush generates a new storefile in the region, after which the flushed edits in the WAL are no longer needed and can be deleted.

Flush operates at the region level: when the memstore of any one store in a region reaches the preset conditions, all stores in that region are flushed.
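Region-level flushing, as described above, can be sketched with a toy model. The code below is not HBase code: ToyStore, ToyRegion and the FLUSH_SIZE threshold of 100 bytes are assumptions chosen for the example, and size is approximated by the length of keys and values.

```python
FLUSH_SIZE = 100  # hypothetical per-memstore threshold, in bytes


class ToyStore:
    def __init__(self):
        self.memstore = {}
        self.storefiles = []

    def memstore_size(self):
        # Rough size estimate: total length of row keys and values.
        return sum(len(k) + len(v) for k, v in self.memstore.items())


class ToyRegion:
    def __init__(self, families):
        self.stores = {family: ToyStore() for family in families}

    def put(self, family, row, value):
        store = self.stores[family]
        store.memstore[row] = value
        # One store crossing the threshold triggers a flush of the whole region.
        if store.memstore_size() >= FLUSH_SIZE:
            self.flush()

    def flush(self):
        for store in self.stores.values():
            if store.memstore:
                store.storefiles.append(dict(store.memstore))
                store.memstore = {}


region = ToyRegion(["cf1", "cf2"])
region.put("cf2", "r1", "x")        # small edit, stays in memory
region.put("cf1", "r2", "y" * 100)  # pushes cf1 over the threshold
```

Although only cf1 crossed the threshold, cf2 is flushed as well and ends up with its own small storefile. This is why a region with many stores can produce many small files per flush.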

The following log lines were produced when a table was flushed:

16:58:28,801 INFO [PriorityRpcServer.handler=1,port=60020] regionserver.HRegionServer: Flushing T1,1413622522846.58fd75078b4a47b8c6a20705f23209b7.

16:58:28,816 DEBUG [PriorityRpcServer.handler=1,port=60020] regionserver.HRegion: Started memstore flush for T1, region, current region memstore size 168

16:58:29,457 INFO [PriorityRpcServer.handler=1,port=60020] regionserver.DefaultStoreFlusher: Flushed, sequenceid=3, memsize=168, hasBloomFilter=true, into tmp file hdfs://beh/hbase/data/default/T1/Hangzhou/.tmp/6ad49d65c8b94b678bab3c892bdb0d03

16:58:29,733 DEBUG [PriorityRpcServer.handler=1,port=60020] regionserver.HRegionFileSystem: Committing store file hdfs://beh/hbase/data/default/T1/58fd75078b4a47b8c6a20705f23209b7/.tmp/6ad49d65c8b94b678bab3c892bdb0d03 as hdfs://beh/hbase/data/default/T1/58fd75078b4a47b8c6a20705f23209b7/CF/release

16:58:29,838 INFO [PriorityRpcServer.handler=1,port=60020] regionserver.HStore: Added hdfs://beh/hbase/data/default/T1/58fd75078b4a47b8c6a20705f23209b7/CF/Hangzhou, entries=1, sequenceid=3, filesize=1021

16:58:29,879 INFO [PriorityRpcServer.handler=1,port=60020] regionserver.HRegion: Finished memstore flush ~168/168, currentsize=0/0 for region T1,1413622522846.58fd75078b4a47b8c6a20705f23209b7. in 1063ms, sequenceid=3, compaction requested=false


We can see that the memstore is first flushed to a file under .tmp, which is then moved into the corresponding column family directory under the region directory.
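The write-to-.tmp-then-move pattern seen in the logs can be sketched on a local filesystem. This is an analogy rather than HBase code: flush_to_storefile is a hypothetical helper, a local temporary directory stands in for the region directory in HDFS, and os.replace stands in for the rename that commits the file.

```python
import os
import tempfile


def flush_to_storefile(region_dir, family, data):
    """Write data to <region_dir>/.tmp first, then move it into <region_dir>/<family>."""
    tmp_dir = os.path.join(region_dir, ".tmp")
    family_dir = os.path.join(region_dir, family)
    os.makedirs(tmp_dir, exist_ok=True)
    os.makedirs(family_dir, exist_ok=True)

    # Step 1: write the complete file under .tmp, where readers do not look.
    fd, tmp_path = tempfile.mkstemp(dir=tmp_dir)
    with os.fdopen(fd, "wb") as f:
        f.write(data)

    # Step 2: commit by moving the finished file into the column family directory.
    final_path = os.path.join(family_dir, os.path.basename(tmp_path))
    os.replace(tmp_path, final_path)
    return final_path


region_dir = tempfile.mkdtemp()  # stand-in for a region directory
path = flush_to_storefile(region_dir, "cf", b"row1=value1")
```

Writing to a temporary location first means a reader of the column family directory never sees a half-written file: the file either is not there yet or is complete.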

This article is from the "when" blog; please be sure to keep this source: http://hellowode.blog.51cto.com/8646864/1565505
