HBase Manual Flush Mechanism: A Walkthrough

Source: Internet
Author: User

These notes are based on HBase 0.94.1, checked against both the open source version and a release version used at work.

Question: What happens after you type flush 'table_or_region_name' in the HBase shell? How is it implemented? For an existing table, how can the flush execution time be estimated before running the operation?

1. HBase Shell Entry Point

The HBase shell is implemented in Ruby (run on JRuby). Typing hbase shell in a terminal session (PuTTY in this case) invokes the ${HBASE_HOME}/bin/hbase bash script, which, when given the shell argument, launches the Ruby code. The relevant section is as follows:

    if [ "$COMMAND" = "shell" ] ; then
      if [ "$JRUBY_HOME" != "" ] ; then
        CLASSPATH="$JRUBY_HOME/lib/jruby.jar:$CLASSPATH"
        HBASE_OPTS="$HBASE_OPTS -Djruby.home=$JRUBY_HOME -Djruby.lib=$JRUBY_HOME/lib"
      fi
      CLASS="org.jruby.Main -X+O ${JRUBY_OPTS} ${HBASE_HOME}/bin/hirb.rb"

hirb.rb loads the relevant packages (under the ${HBASE_HOME}/lib/ruby directory) and starts an interactive CLI environment.

Now to the point.

In the HBase shell, every command has a corresponding ${COMMAND}.rb file in the ${HBASE_HOME}/lib/ruby/shell/commands directory.

Open flush.rb; the core code is as follows:

    def command(table_or_region_name)
      format_simple_command do
        admin.flush(table_or_region_name)
      end
    end

This calls into admin.rb, which contains the following:

    @admin = org.apache.hadoop.hbase.client.HBaseAdmin.new(configuration)

    def flush(table_or_region_name)
      @admin.flush(table_or_region_name)
    end

Here we reach the entry point into the Java code: the HBaseAdmin.flush(table_or_region_name) method.
[Figure: class diagram of the classes covered in the following sections]

2. HBaseAdmin Wrapper

The HBaseAdmin class contains three flush methods:

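    public void flush(final String tableNameOrRegionName) throws IOException, InterruptedException {}
    public void flush(final byte[] tableNameOrRegionName) throws IOException, InterruptedException {}
    private void flush(final ServerName sn, final HRegionInfo hri) throws IOException {}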


    • The first is the entry point: it converts the String argument to byte[] and passes it to the second.
    • The second does the main work: it handles the input differently depending on whether it is a region name, a partitioned table, or a non-partitioned table.
    • The third flushes a single region.

The first needs no further discussion.

The logic of the second is clear:

    • If the argument is a region name, the third flush method is called.
    • If it is a non-partitioned table, all regions of the table are fetched and the third flush method is called for each of them.
    • If it is a partitioned table, it is handled differently: the shared method execPartitionTableAction is called, with an anonymous PartitionTableActionCallableFactory implementation supplied for this case.

Note:

    • For tables that are not partitioned, the regions are simply processed serially inside a for loop.
    • For partitioned tables, execPartitionTableAction uses concurrency utilities (Future), so the partitions are processed in parallel.
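
The difference between the two paths can be sketched as follows. This is only an illustration of "for loop versus Future", not the HBaseAdmin code: RegionFlushDemo, flushOne, flushSerial and flushParallel are hypothetical names, and flushOne merely stands in for the per-region flush call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical demo class; none of these names come from HBase.
class RegionFlushDemo {

    // Stand-in for flushing one region (an RPC in the real system).
    static String flushOne(String region) {
        return "flushed:" + region;
    }

    // Non-partitioned table: one region after another in a plain for loop.
    static List<String> flushSerial(List<String> regions) {
        List<String> done = new ArrayList<>();
        for (String region : regions) {
            done.add(flushOne(region));
        }
        return done;
    }

    // Partitioned table: submit one task per partition, then wait on the Futures.
    static List<String> flushParallel(List<String> partitions, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (String partition : partitions) {
                Callable<String> task = () -> flushOne(partition);
                pending.add(pool.submit(task));
            }
            List<String> done = new ArrayList<>();
            for (Future<String> f : pending) {
                done.add(f.get()); // blocks until this partition has finished
            }
            return done;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("flush failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        names.add("r1");
        names.add("r2");
        names.add("r3");
        System.out.println(flushSerial(names));
        System.out.println(flushParallel(names, 2));
    }
}
```

For a rough time estimate, the serial path costs about the sum of the per-region flush times, while the parallel path is bounded more by the slowest partitions and the number of worker threads.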

The third method, which flushes a single region, is where every case in the second flush method ends up.

The third flush method is implemented as follows:

    HRegionInterface rs = this.connection.getHRegionConnection(sn.getHostname(), sn.getPort());
    rs.flushRegion(hri);


HRegionInterface is an interface, and flushRegion is an abstract method on it. In 0.94.1, only HRegionServer implements HRegionInterface, so the concrete implementation has to be found in HRegionServer.

3. HRegionServer Wrapper

The HRegionServer class contains three flush implementations:

    public void flushRegion(byte[] regionName) throws IllegalArgumentException, IOException {}
    public void flushRegion(byte[] regionName, long ifOlderThanTS) throws IllegalArgumentException, IOException {}
    @QosPriority(priority = 100)
    public void flushRegion(HRegionInfo regionInfo) throws NotServingRegionException, IOException {}

    • The first takes only regionName, checks that the region is online, and then calls region.flushcache().
    • The second takes regionName and a timestamp, ifOlderThanTS; it checks that the region is online and flushes the data only if the region's last flush is older than that timestamp.
    • The third is marked with @QosPriority(priority=100), a custom annotation that assigns a priority to RPC calls to this method; the method body calls checkOpen() to verify that the region server is online and then calls region.flushcache().
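
To make the annotation idea concrete, here is a minimal sketch of how a method-level priority annotation can be declared and read back through reflection. Everything in it is an assumption for illustration: the local @Priority annotation, DemoServer and priorityOf are hypothetical and are not HBase's QosPriority or its RPC layer.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

// Hypothetical stand-ins that only mimic the idea of a priority annotation.
class QosDemo {

    @Retention(RetentionPolicy.RUNTIME) // RUNTIME retention makes it visible to reflection
    @Target(ElementType.METHOD)
    @interface Priority {
        int priority() default 0;
    }

    static class DemoServer {
        @Priority(priority = 100)
        public void flushRegion(String regionInfo) { }

        public void get(String row) { }
    }

    // Returns the declared priority of the named method, or 0 when it is not annotated.
    static int priorityOf(Class<?> type, String methodName) {
        for (Method m : type.getDeclaredMethods()) {
            if (m.getName().equals(methodName)) {
                Priority p = m.getAnnotation(Priority.class);
                return p == null ? 0 : p.priority();
            }
        }
        throw new IllegalArgumentException("no such method: " + methodName);
    }

    public static void main(String[] args) {
        System.out.println(priorityOf(DemoServer.class, "flushRegion"));
        System.out.println(priorityOf(DemoServer.class, "get"));
    }
}
```

A dispatcher could use such a lookup to route annotated calls to a higher-priority queue; whether HBase does it exactly this way is not covered by these notes.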

Next, look at the implementation of flushcache() in the HRegion class.

4. HRegion Implementation

flushcache is only an entry method. It does some preparation before the flush, including setting up task status monitoring, checking for coprocessors, handling puts that were written without the WAL, and taking locks. After that, it calls the internal method internalFlushcache to start the flush.

The internalFlushcache method does some MVCC work and finally calls the flushCache method of StoreFlusher.

To ensure data consistency, internalFlushcache performs many checks and takes several locks. I do not yet understand these well enough to explain them, so I am marking this part for later and moving on to the next layer.

Now look at the implementation of StoreFlusher.

5. StoreFlusher Implementation

StoreFlusher is an interface; in 0.94.1 it has only one implementation class, Store.StoreFlusherImpl.

As the StoreFlusher interface shows, a flush is executed in three phases:

    1. prepare: a short operation that creates a snapshot; writes are paused while it runs.
    2. flushCache: the flush itself; it does not block any operation (read or write) on the store while it runs.
    3. commit: adds the flushed file to the store directory and clears the memstore snapshot; a short operation that pauses scans while it runs.

6. StoreFlusherImpl Implementation

StoreFlusherImpl is a private inner class of the Store class. It implements the three StoreFlusher methods mentioned above: prepare is implemented by the inner class itself, while flushCache and commit are carried out by calling methods of the enclosing Store class.
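
The shape of this design (a three-phase interface, implemented by a private inner class that delegates two of the phases to its enclosing class) can be sketched as below. This is a simplified, hypothetical model: ToyStore, Flusher, writeFile and publish are invented names and do not reproduce the real Store code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of a store with a three-phase flusher.
class ToyStore {

    interface Flusher {
        void prepare();     // short: take the snapshot
        void flushCache();  // long: write the snapshot out
        boolean commit();   // short: publish the file, drop the snapshot
    }

    private final List<String> memstore = new ArrayList<>();
    private final List<String> storeFiles = new ArrayList<>();

    void put(String cell) { memstore.add(cell); }

    List<String> storeFiles() { return storeFiles; }

    Flusher newFlusher() { return new FlusherImpl(); }

    // Outer-class helpers that the inner flusher delegates to.
    private String writeFile(List<String> cells) { return "file" + cells; }

    private boolean publish(String file) { return storeFiles.add(file); }

    // Private inner class: prepare() is its own logic, the other two delegate.
    private class FlusherImpl implements Flusher {
        private List<String> snapshot;
        private String pendingFile;

        public void prepare() {
            snapshot = new ArrayList<>(memstore);
            memstore.clear();
        }

        public void flushCache() { pendingFile = writeFile(snapshot); }

        public boolean commit() {
            boolean added = publish(pendingFile);
            snapshot = null;
            return added;
        }
    }

    public static void main(String[] args) {
        ToyStore store = new ToyStore();
        store.put("a");
        store.put("b");
        Flusher flusher = store.newFlusher();
        flusher.prepare();
        flusher.flushCache();
        flusher.commit();
        System.out.println(store.storeFiles());
    }
}
```

Keeping prepare and commit short and putting the slow work in flushCache is what lets the long phase run without blocking reads and writes.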

6.1 Prepare

    public void prepare() {
      memstore.snapshot();
      this.snapshot = memstore.getSnapshot();
      this.snapshotTimeRangeTracker = memstore.getSnapshotTimeRangeTracker();
    }


It calls the MemStore methods to take the snapshot.
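
One way to picture why taking a snapshot can be a short operation is a reference swap: the active map becomes the snapshot and new writes go to a fresh map. The sketch below assumes that model; ToyMemStore and all of its methods are hypothetical and much simpler than the real MemStore.

```java
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical toy memstore; not the HBase MemStore class.
class ToyMemStore {
    private ConcurrentSkipListMap<String, String> active = new ConcurrentSkipListMap<>();
    private ConcurrentSkipListMap<String, String> snapshot = new ConcurrentSkipListMap<>();

    synchronized void put(String key, String value) {
        active.put(key, value);
    }

    // Swap: the active map becomes the snapshot and writes go to a fresh map.
    synchronized void snapshot() {
        if (!snapshot.isEmpty()) {
            return; // previous snapshot has not been flushed yet, keep it
        }
        if (!active.isEmpty()) {
            snapshot = active;
            active = new ConcurrentSkipListMap<>();
        }
    }

    synchronized ConcurrentSkipListMap<String, String> getSnapshot() {
        return snapshot;
    }

    // Called once the snapshot has been written out and committed.
    synchronized void clearSnapshot() {
        snapshot = new ConcurrentSkipListMap<>();
    }

    // Reads consult both maps until the snapshot has been cleared.
    synchronized String get(String key) {
        String value = active.get(key);
        return value != null ? value : snapshot.get(key);
    }

    synchronized int activeSize() {
        return active.size();
    }
}
```

Because only references are exchanged, the pause for writers is brief no matter how much data the memstore holds.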

6.2 FlushCache

StoreFlusherImpl calls the flushCache method of the Store class, which wraps the internalFlushCache method that does the actual work.
The logic is quite clear:

    • Start a StoreScanner to find the rows that need to be flushed, based on timestamps and the ScanType parameter.
    • Start a StoreFile writer, write the data that was read to a StoreFile, and return the StoreFile path for use in the subsequent commit phase.
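
The two steps above (scan the snapshot, then write what was selected and hand back the path) can be sketched as follows. This is a toy under stated assumptions: ToyFlushWriter, selectCells and flushSnapshot are hypothetical, the "minimum timestamp" filter is an invented stand-in for the real scan criteria, and a plain text temp file stands in for a StoreFile.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical toy; it does not use StoreScanner or StoreFile.Writer.
class ToyFlushWriter {

    // "Scanner" step: walk the snapshot in key order and keep cells that are
    // at least as new as minTimestamp (an assumed, simplified filter).
    static List<String> selectCells(NavigableMap<String, Long> snapshot, long minTimestamp) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Long> cell : snapshot.entrySet()) {
            if (cell.getValue() >= minTimestamp) {
                lines.add(cell.getKey() + "\t" + cell.getValue());
            }
        }
        return lines;
    }

    // "Writer" step: write the selected cells to a temporary file and return
    // its path so that a later commit step can move it into place.
    static Path flushSnapshot(NavigableMap<String, Long> snapshot, long minTimestamp) {
        try {
            Path tmp = Files.createTempFile("toy-storefile", ".tmp");
            return Files.write(tmp, selectCells(snapshot, minTimestamp));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        NavigableMap<String, Long> snapshot = new TreeMap<>();
        snapshot.put("row1", 5L);
        snapshot.put("row2", 20L);
        System.out.println(flushSnapshot(snapshot, 10L));
    }
}
```

Writing to a temporary location first means nothing becomes visible to readers until the commit phase moves the file.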

6.3 Commit

The commit method of StoreFlusherImpl first calls the commitFile method of the enclosing Store class, which mainly does two things:

    • Move the StoreFile generated by flushCache into the store directory.
    • Update the Store's statistics.

It then calls updateStorefiles on the enclosing Store class to update the store's list of StoreFiles. Once the files are updated, needsCompaction() is called to check whether the file changes caused by this flush should trigger a compaction. If so, the compaction machinery takes over; that will be covered separately.

At this point, the first pass through the implementation behind a manual flush is complete. So far this is only a trace of the call path, which will continue to be enriched and supplemented.
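
The commit sequence (move the file, register it, then ask whether a compaction is needed) might look like the sketch below. It is illustrative only: ToyCommit, its compactionThreshold rule and the demo helper are hypothetical, and the real needsCompaction() logic is not described in these notes.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy commit step; not the HBase Store class.
class ToyCommit {
    private final Path storeDir;
    private final int compactionThreshold; // assumed rule: compact at this many files
    private final List<Path> storeFiles = new ArrayList<>();

    ToyCommit(Path storeDir, int compactionThreshold) {
        this.storeDir = storeDir;
        this.compactionThreshold = compactionThreshold;
    }

    // Moves the flushed file into the store directory, registers it, and
    // reports whether the store now has enough files to warrant a compaction.
    boolean commit(Path flushedTmpFile) {
        try {
            Path target = storeDir.resolve(flushedTmpFile.getFileName());
            Files.move(flushedTmpFile, target);
            storeFiles.add(target);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return needsCompaction();
    }

    boolean needsCompaction() {
        return storeFiles.size() >= compactionThreshold;
    }

    // Runs several commits against a temporary directory and collects the
    // compaction decision returned after each one.
    static List<Boolean> demo(int threshold, int flushes) {
        try {
            ToyCommit store = new ToyCommit(Files.createTempDirectory("toy-store"), threshold);
            List<Boolean> results = new ArrayList<>();
            for (int i = 0; i < flushes; i++) {
                results.add(store.commit(Files.createTempFile("flush", ".tmp")));
            }
            return results;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo(3, 4));
    }
}
```

With a threshold of 3, the first two commits report that no compaction is needed and the third one reports that it is.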
