HBase Manual Flush Mechanism: A Walkthrough

Last Update:2016-01-23 Source: Internet

Author: User


This walkthrough is based on HBase 0.94.1. It covers both the open-source version and a release build used at work.

Question: What happens after you type flush 'table_or_region_name' in the HBase shell? How exactly is it implemented? For an existing table, how can you estimate how long the flush will take before running it?

1. HBase Shell Entry Point

The HBase shell is implemented in Ruby. Typing hbase shell in a terminal (PuTTY, for example) invokes the ${HBASE_HOME}/bin/hbase bash script, which, on seeing the shell argument, launches the Ruby code. The relevant section is as follows:

    if [ "$COMMAND" = "shell" ] ; then
      if [ "$JRUBY_HOME" != "" ] ; then
        CLASSPATH="$JRUBY_HOME/lib/jruby.jar:$CLASSPATH"
        HBASE_OPTS="$HBASE_OPTS -Djruby.home=$JRUBY_HOME -Djruby.lib=$JRUBY_HOME/lib"
      fi
      CLASS="org.jruby.Main -X+O ${JRUBY_OPTS} ${HBASE_HOME}/bin/hirb.rb"

hirb.rb loads the relevant packages (under the ${HBASE_HOME}/lib/ruby directory) and starts an interactive CLI environment.

Now to the point.

Every command that the HBase shell can execute has a corresponding ${COMMAND}.rb file in the ${HBASE_HOME}/lib/ruby/shell/commands directory.

Open flush.rb; the core code is as follows:

    def command(table_or_region_name)
      format_simple_command do
        admin.flush(table_or_region_name)
      end
    end

This calls into admin.rb, which contains the following:

    @admin = org.apache.hadoop.hbase.client.HBaseAdmin.new(configuration)

    def flush(table_or_region_name)
      @admin.flush(table_or_region_name)
    end

Here we reach the entry point into the Java code: the HBaseAdmin.flush(table_or_region_name) method.
The classes covered in the following sections were shown in a class diagram (figure not reproduced here).

2. HBaseAdmin Wrapper

The HBaseAdmin class contains three flush methods:

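    public void flush(final String tableNameOrRegionName) throws IOException, InterruptedException {}
    public void flush(final byte[] tableNameOrRegionName) throws IOException, InterruptedException {}
    private void flush(final ServerName sn, final HRegionInfo hri) throws IOException {}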

The first is the entry point: it converts the String argument to byte[] and passes it to the second.
The second is the main working method: it handles three cases, depending on whether the argument is a region name, a partitioned table, or a non-partitioned table.
The third flushes a single region.

The first needs no further discussion.

The second has clear logic:

If the argument is a region name, the third flush method is called.
If it is a non-partitioned table, all regions of the table are fetched and the third flush method is called for each.
If it is a partitioned table, the handling is different: the shared method execPartitionTableAction is called, with an anonymous implementation of PartitionTableActionCallableFactory supplied for the flush.

Note

For non-partitioned tables, the regions are simply processed serially inside a for loop.
For partitioned tables, execPartitionTableAction uses Future-based concurrency structures, and the partitions are processed in parallel.
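
The difference between the two paths bears on the estimation question at the top: as a rough rule, a serial loop costs about the sum of the per-region flush times, while parallel processing is bounded by the slowest partition. The two patterns can be sketched in Java as follows. This is only an illustration: the RegionFlusher interface, the FlushPatterns class, and the thread count are hypothetical, and the actual body of execPartitionTableAction is not shown in this article.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in for "flush one region" (the third flush method).
interface RegionFlusher {
    void flush(String regionName);
}

final class FlushPatterns {
    // Non-partitioned table: regions are flushed one after another in a for loop.
    static int flushSerially(List<String> regions, RegionFlusher flusher) {
        int flushed = 0;
        for (String region : regions) {
            flusher.flush(region);
            flushed++;
        }
        return flushed;
    }

    // Partitioned table: one task per partition, results collected through Futures.
    static int flushInParallel(List<List<String>> partitions, RegionFlusher flusher, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (List<String> partition : partitions) {
                Callable<Integer> task = () -> flushSerially(partition, flusher);
                futures.add(pool.submit(task));
            }
            int flushed = 0;
            for (Future<Integer> future : futures) {
                flushed += future.get(); // waits for that partition to finish
            }
            return flushed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while flushing", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a partition failed to flush", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        RegionFlusher printer = name -> System.out.println("flushing " + name);
        int serial = flushSerially(Arrays.asList("r1", "r2", "r3"), printer);
        int parallel = flushInParallel(
                Arrays.asList(Arrays.asList("p1-r1", "p1-r2"), Arrays.asList("p2-r1")), printer, 2);
        System.out.println(serial + " regions flushed serially, " + parallel + " in parallel");
    }
}
```

In the parallel variant, regions inside one partition are still flushed in order; only the partitions overlap in time.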

The third method, which flushes a single region, is the final destination of every case in the second flush method.

Its implementation is as follows:

    HRegionInterface rs = this.connection.getHRegionConnection(sn.getHostname(), sn.getPort());
    rs.flushRegion(hri);

HRegionInterface is an interface, and flushRegion is an abstract method. In 0.94.1, only HRegionServer implements HRegionInterface, so the concrete implementation has to be found in HRegionServer.

3. HRegionServer Wrapper

The HRegionServer class contains three flush implementations:

    public void flushRegion(byte[] regionName) throws IllegalArgumentException, IOException {}
    public void flushRegion(byte[] regionName, long ifOlderThanTS) throws IllegalArgumentException, IOException {}
    @QosPriority(priority = 100)
    public void flushRegion(HRegionInfo regionInfo) throws NotServingRegionException, IOException {}

The first takes only a regionName, confirms that the region is online, and then calls region.flushcache().
The second takes a regionName and a timestamp, ifOlderThanTS. It confirms that the region is online and flushes the data only if the region's last flush is older than that timestamp.
The third is marked with @QosPriority(priority=100), a custom annotation that assigns a priority to RPC calls to this method. The method body calls checkOpen() to verify that the RegionServer is online, then calls region.flushcache().
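
To show how an annotation of this kind can carry a priority that a dispatcher reads at run time, here is a small self-contained Java sketch. The Priority annotation, the DemoServer class, and the priorityOf helper are all hypothetical and are not HBase code; how the HBase RPC layer actually consumes @QosPriority is outside the scope of this article.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

// Hypothetical annotation modeled on the idea of @QosPriority; not the HBase class.
@Retention(RetentionPolicy.RUNTIME) // must be retained at run time to be visible to reflection
@Target(ElementType.METHOD)
@interface Priority {
    int value() default 0;
}

// Hypothetical server with one prioritized and one ordinary method.
class DemoServer {
    @Priority(100)
    public void flushRegion(String regionName) { }

    public void get(String row) { }
}

final class PriorityLookup {
    // Returns the declared priority of a public method, or 0 when it is not annotated.
    static int priorityOf(Class<?> type, String methodName, Class<?>... paramTypes) {
        try {
            Method method = type.getMethod(methodName, paramTypes);
            Priority priority = method.getAnnotation(Priority.class);
            return priority == null ? 0 : priority.value();
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("no such method: " + methodName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(priorityOf(DemoServer.class, "flushRegion", String.class)); // prints 100
        System.out.println(priorityOf(DemoServer.class, "get", String.class));         // prints 0
    }
}
```

A dispatcher could use such a lookup to route high-priority calls to a separate queue or handler pool; that routing is left out here.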

Next, look at the implementation of flushcache() in the HRegion class.

4. HRegion Implementation

flushcache is only an entry method that does some preparation before the flush, including setting up task status monitoring, checking for a coprocessor, handling puts that skipped the WAL, and write locking. After that, it calls the internal method internalFlushcache to start the flush.

Inside internalFlushcache, some MVCC work is done, and finally the flushCache method of StoreFlusher is called.

internalFlushcache does a lot of checking and locking to guarantee data consistency. My current understanding is not deep enough to cover it, so I will mark it for later and move on to the next layer.

Now look at the StoreFlusher implementation.

5. StoreFlusher Implementation

StoreFlusher is an interface. In 0.94.1 it has only one implementation class, Store.StoreFlusherImpl.

As the StoreFlusher interface shows, a flush is executed in three phases:

prepare: a short operation that creates a snapshot; writes are paused while it runs.
flushCache: performs the flush itself; it does not block any operation (read or write) on the store while it runs.
commit: adds the flushed file to the store and clears the MemStore snapshot; a short operation that requires pausing scans.
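
The three phases can be illustrated with a toy in-memory store. The sketch below is a simplified model under stated assumptions: ToyStore, its Flusher inner class, and the use of sorted maps as stand-ins for the MemStore and store files are all hypothetical, and only one flush is assumed to run at a time. It is not the HBase implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy store illustrating prepare / flushCache / commit. Not the HBase Store class.
final class ToyStore {
    private TreeMap<String, String> active = new TreeMap<>();   // receives new writes
    private TreeMap<String, String> snapshot = new TreeMap<>(); // frozen while a flush runs
    private final List<TreeMap<String, String>> files = new ArrayList<>(); // stand-in for store files

    synchronized void put(String key, String value) {
        active.put(key, value);
    }

    // Reads check the active map, then the snapshot, then files from newest to oldest.
    synchronized String get(String key) {
        if (active.containsKey(key)) return active.get(key);
        if (snapshot.containsKey(key)) return snapshot.get(key);
        for (int i = files.size() - 1; i >= 0; i--) {
            if (files.get(i).containsKey(key)) return files.get(i).get(key);
        }
        return null;
    }

    synchronized int fileCount() {
        return files.size();
    }

    Flusher newFlusher() {
        return new Flusher();
    }

    // Carries the state of one flush, assuming no other flush runs concurrently.
    final class Flusher {
        private TreeMap<String, String> frozen;  // snapshot captured in prepare
        private TreeMap<String, String> newFile; // produced by flushCache

        // Phase 1: short; writers wait while the active map is swapped out.
        void prepare() {
            synchronized (ToyStore.this) {
                snapshot = active;
                active = new TreeMap<>();
                frozen = snapshot;
            }
        }

        // Phase 2: the long part; no store lock is held, so puts and gets continue.
        void flushCache() {
            newFile = new TreeMap<>(frozen); // stands in for writing a store file
        }

        // Phase 3: short; readers wait while the file is published and the snapshot cleared.
        void commit() {
            synchronized (ToyStore.this) {
                files.add(newFile);
                snapshot = new TreeMap<>();
            }
        }
    }

    public static void main(String[] args) {
        ToyStore store = new ToyStore();
        store.put("row1", "a");
        ToyStore.Flusher flusher = store.newFlusher();
        flusher.prepare();
        store.put("row2", "b"); // lands in the new active map, not in this flush
        flusher.flushCache();
        flusher.commit();
        System.out.println(store.fileCount() + " " + store.get("row1") + " " + store.get("row2")); // prints: 1 a b
    }
}
```

Only prepare and commit take the store lock, which is why they need to be short; the expensive copy in flushCache works on a snapshot that no longer changes.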

6. StoreFlusherImpl Implementation

StoreFlusherImpl is a private inner class of the Store class. Of the three StoreFlusher methods mentioned above, prepare is implemented by StoreFlusherImpl itself, while flushCache and commit delegate to methods of the outer Store class.

6.1 Prepare

    public void prepare() {
      memstore.snapshot();
      this.snapshot = memstore.getSnapshot();
      this.snapshotTimeRangeTracker = memstore.getSnapshotTimeRangeTracker();
    }

It calls the MemStore methods to take the snapshot.

6.2 FlushCache

StoreFlusherImpl calls the flushCache method of the Store class, which wraps the internalFlushCache method that does the actual work.
The logic is quite clear:

Start a StoreScanner to find the rows that need to be flushed, based on timestamps and the ScanType parameter.
Start a StoreFile writer, write the data that was read into a StoreFile, and return the StoreFile path for use in the subsequent commit phase.

6.3 Commit

The commit method of the StoreFlusherImpl class first calls the commitFile method of the outer Store class, which does two main things:

Move the StoreFile generated by flushCache into the store directory.
Update the Store's related statistics.

It then calls updateStorefiles on the outer Store class to update the store's list of StoreFiles. Once the list is updated, needsCompaction() is called to check whether the file changes caused by this flush should trigger a compaction. If so, the compaction machinery takes over; that will be covered separately.
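
The commit steps can be sketched with plain file operations. Everything in the block below is a hypothetical model: the CommitSketch class, its helper methods, the byte counter used as an example statistic, and the rule "more files than a caller-chosen threshold" for needsCompaction are assumptions for illustration, not the actual logic of the Store class.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the commit phase; not the HBase Store class.
final class CommitSketch {
    private final Path storeDir;
    private final int minFilesToCompact; // assumed threshold, chosen by the caller
    private final List<Path> storeFiles = new ArrayList<>();
    private long totalBytes; // example of a statistic updated at commit time

    CommitSketch(Path storeDir, int minFilesToCompact) {
        this.storeDir = storeDir;
        this.minFilesToCompact = minFilesToCompact;
    }

    // Step 1: move the flushed temporary file into the store directory and update statistics.
    Path commitFile(Path tmpFile) {
        try {
            Path dest = storeDir.resolve(tmpFile.getFileName());
            Files.move(tmpFile, dest);
            totalBytes += Files.size(dest);
            return dest;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Step 2: publish the file in the store's file list, then check for compaction.
    boolean updateStorefiles(Path committed) {
        storeFiles.add(committed);
        return needsCompaction();
    }

    // Assumed rule: compaction is due once the file count exceeds the threshold.
    boolean needsCompaction() {
        return storeFiles.size() > minFilesToCompact;
    }

    long totalBytes() {
        return totalBytes;
    }

    // Demo helpers that wrap checked I/O exceptions.
    static Path tempDir(String prefix) {
        try {
            return Files.createTempDirectory(prefix);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Path flushedFile(Path tmpDir, String name) {
        try {
            return Files.write(tmpDir.resolve(name), name.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path tmp = tempDir("tmp");
        CommitSketch store = new CommitSketch(tempDir("store"), 2);
        for (int i = 1; i <= 3; i++) {
            Path committed = store.commitFile(flushedFile(tmp, "flush-" + i));
            System.out.println("flush " + i + " triggers compaction: " + store.updateStorefiles(committed));
        }
    }
}
```

With the threshold of 2 used in main, only the third flush reports true, which mirrors the idea that repeated manual flushes of small MemStores can themselves lead to compactions.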

At this point, a first pass through the implementation behind the manual flush operation is complete. So far this is only a walk along the call path; it will be enriched and extended over time.


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
