This walkthrough is based on HBase 0.94.1, comparing the open-source release with the release version we use at work.
Question: What happens after you enter flush 'table_or_region_name' in the HBase shell? How is it implemented? For an existing table, how can I estimate how long the flush will take before running it?
1. HBase Shell Entry Point
The HBase shell is implemented in Ruby (running on JRuby). Typing hbase shell in a terminal (e.g. PuTTY) invokes the ${HBASE_HOME}/bin/hbase bash script, which sees the shell argument and launches the Ruby code. The relevant section is:
```bash
if [ "$COMMAND" = "shell" ] ; then
  if [ "$JRUBY_HOME" != "" ] ; then
    CLASSPATH="$JRUBY_HOME/lib/jruby.jar:$CLASSPATH"
    HBASE_OPTS="$HBASE_OPTS -Djruby.home=$JRUBY_HOME -Djruby.lib=$JRUBY_HOME/lib"
  fi
  CLASS="org.jruby.Main -X+O ${JRUBY_OPTS} ${HBASE_HOME}/bin/hirb.rb"
  # ... branches for the other commands follow ...
fi
```
hirb.rb loads the relevant Ruby packages (under ${HBASE_HOME}/lib/ruby) and starts an interactive CLI session.
Now to the point.
Every HBase shell command has a corresponding ${COMMAND}.rb file in the ${HBASE_HOME}/lib/ruby/shell/commands directory.
In flush.rb, the core code is:
```ruby
def command(table_or_region_name)
  format_simple_command do
    admin.flush(table_or_region_name)
  end
end
```
admin.flush in turn calls the flush method in admin.rb:
```ruby
# @admin is set in the Admin class's initialize method
@admin = org.apache.hadoop.hbase.client.HBaseAdmin.new(configuration)

def flush(table_or_region_name)
  @admin.flush(table_or_region_name)
end
```
This is where the shell hands off to Java: it simply calls HBaseAdmin.flush(table_or_region_name).
Class diagram of the classes covered in the following sections:
2. HBaseAdmin Wrapper
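The HBaseAdmin class contains three flush methods:
```java
public void flush(final String tableNameOrRegionName)
    throws IOException, InterruptedException { ... }

public void flush(final byte[] tableNameOrRegionName)
    throws IOException, InterruptedException { ... }

private void flush(final ServerName sn, final HRegionInfo hri)
    throws IOException { ... }
```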
- The first is an entry point: it converts the String argument to byte[] and delegates to the second.
- The second does the main work, handling three cases depending on whether the argument is a region name, a partitioned table, or a non-partitioned table.
- The third flushes a single region.
The first method is trivial, so I'll skip it.
The logic of the second method is straightforward:
- If the argument is a region name, it calls the third method to flush that region.
- If it is a non-partitioned table, it fetches all regions of the table and calls the third method for each one.
- A partitioned table is handled differently: it goes through the shared helper execPartitionTableAction, with the per-partition work supplied by an anonymous implementation of PartitionTableActionCallableFactory.
Note:
- For tables without pre-partitioning, the regions are flushed serially in a simple for loop.
- For partitioned tables, execPartitionTableAction uses concurrent data structures (Futures), so the partitions are flushed in parallel.
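To make the three cases concrete, here is a minimal Java sketch of the dispatch under stated assumptions. It is not HBaseAdmin's code: FlushDispatchSketch, the in-memory table-to-region map, the partitionedTables set, and the flushRegion stub (which only records the call instead of sending an RPC) are all hypothetical. It shows the String-to-byte[] entry point, the region-name shortcut, the serial loop for ordinary tables, and a Future-based parallel path standing in for execPartitionTableAction.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical model of the dispatch in the second flush(byte[]) method.
class FlushDispatchSketch {
    private final Map<String, List<String>> tableRegions; // table -> its region names
    private final Set<String> partitionedTables;          // tables flushed in parallel
    // Every region "flush" in call order; synchronized because the parallel path writes concurrently.
    final List<String> flushed = Collections.synchronizedList(new ArrayList<>());

    FlushDispatchSketch(Map<String, List<String>> tableRegions, Set<String> partitionedTables) {
        this.tableRegions = tableRegions;
        this.partitionedTables = partitionedTables;
    }

    // First method: entry point, converts String to byte[] and delegates.
    void flush(String tableOrRegionName) {
        flush(tableOrRegionName.getBytes(StandardCharsets.UTF_8));
    }

    // Second method: region name, partitioned table, or ordinary table.
    void flush(byte[] tableOrRegionName) {
        String name = new String(tableOrRegionName, StandardCharsets.UTF_8);
        if (isRegionName(name)) {
            flushRegion(name);
            return;
        }
        List<String> regions = tableRegions.get(name);
        if (regions == null) {
            throw new IllegalArgumentException("Unknown table or region: " + name);
        }
        if (partitionedTables.contains(name)) {
            flushInParallel(regions);
        } else {
            for (String region : regions) {
                flushRegion(region); // serial: one region after another
            }
        }
    }

    private boolean isRegionName(String name) {
        for (List<String> regions : tableRegions.values()) {
            if (regions.contains(name)) {
                return true;
            }
        }
        return false;
    }

    // Parallel path: one task per region, then wait on every Future.
    private void flushInParallel(List<String> regions) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, regions.size()));
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (String region : regions) {
                futures.add(pool.submit(() -> flushRegion(region)));
            }
            for (Future<?> f : futures) {
                f.get(); // surfaces a task's failure as ExecutionException
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while flushing", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("region flush failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    // Third method: flush one region. Here it only records the call; HBase sends an RPC.
    private void flushRegion(String regionName) {
        flushed.add(regionName);
    }
}
```

Calling flush on an ordinary table records its regions in list order; for a table in partitionedTables the same regions are flushed, but the recorded order depends on thread scheduling.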
The third method, which flushes a single region, is where every case of the second method ends up. Its implementation is:
```java
HRegionInterface rs =
    this.connection.getHRegionConnection(sn.getHostname(), sn.getPort());
rs.flushRegion(hri);
```
HRegionInterface is an interface, and flushRegion is one of its abstract methods. In 0.94.1, HRegionServer is the only class that implements HRegionInterface, so the concrete implementation lives in HRegionServer.
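Because callers only hold the interface type, the client code never names HRegionServer; it just asks the connection for a handle to the server at sn's host and port. A minimal sketch of that pattern, assuming a local object in place of the RPC proxy the real connection returns (RegionFlushInterface, FakeRegionServer and FakeConnection are hypothetical names):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for HRegionInterface: the client codes against this type only.
interface RegionFlushInterface {
    void flushRegion(String regionName);
}

// Stand-in for HRegionServer, the single implementation in 0.94.1.
class FakeRegionServer implements RegionFlushInterface {
    final List<String> flushedRegions = new ArrayList<>();

    @Override
    public void flushRegion(String regionName) {
        flushedRegions.add(regionName); // the real server would look up the region and flush it
    }
}

// Stand-in for the client connection: one handle per host:port.
// In HBase the handle is an RPC proxy; here it is the server object itself.
class FakeConnection {
    private final Map<String, RegionFlushInterface> servers = new HashMap<>();

    void register(String host, int port, RegionFlushInterface server) {
        servers.put(host + ":" + port, server);
    }

    RegionFlushInterface getRegionConnection(String host, int port) {
        RegionFlushInterface rs = servers.get(host + ":" + port);
        if (rs == null) {
            throw new IllegalStateException("no region server at " + host + ":" + port);
        }
        return rs;
    }
}
```

conn.getRegionConnection(host, port).flushRegion(name) mirrors the two-line client call quoted above.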
3. HRegionServer Wrapper
The HRegionServer class contains three flush implementations:
```java
public void flushRegion(byte[] regionName)
    throws IllegalArgumentException, IOException { ... }

public void flushRegion(byte[] regionName, long ifOlderThanTS)
    throws IllegalArgumentException, IOException { ... }

@QosPriority(priority = 100)
public void flushRegion(HRegionInfo regionInfo)
    throws NotServingRegionException, IOException { ... }
```
- The first takes a regionName, checks that the region is online, and then calls region.flushcache().
- The second takes a regionName and a timestamp ifOlderThanTS; it checks that the region is online and flushes it only if its last flush is older than ifOlderThanTS.
- The third is annotated with @QosPriority(priority=100), a custom annotation that assigns a priority to RPC calls to this method. The method body calls checkOpen() to verify that the RegionServer is online and then calls region.flushcache().
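A minimal sketch of the three overloads, assuming plain Java types: RegionSketch, RegionInfoSketch and RegionServerSketch are hypothetical, exceptions are simplified to unchecked ones, and the @QosPriority annotation is left out because it only matters to HBase's RPC layer. The second variant follows my reading of the 0.94 source, where the region is flushed only if its last flush time is earlier than ifOlderThanTS.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, much-reduced region: only what the flush variants need.
class RegionSketch {
    private long lastFlushTime;
    int flushCount;

    RegionSketch(long lastFlushTime) {
        this.lastFlushTime = lastFlushTime;
    }

    long getLastFlushTime() {
        return lastFlushTime;
    }

    void flushcache() {
        flushCount++;
        lastFlushTime = System.currentTimeMillis();
    }
}

// Hypothetical stand-in for HRegionInfo.
class RegionInfoSketch {
    final String regionName;

    RegionInfoSketch(String regionName) {
        this.regionName = regionName;
    }
}

class RegionServerSketch {
    private final Map<String, RegionSketch> onlineRegions = new HashMap<>();
    private boolean stopped;

    void addOnlineRegion(String name, RegionSketch region) {
        onlineRegions.put(name, region);
    }

    void stop() {
        stopped = true;
    }

    // Variant 1: look the region up by name and flush it.
    void flushRegion(String regionName) {
        getOnlineRegion(regionName).flushcache();
    }

    // Variant 2: flush only if the region's last flush is older than ifOlderThanTS.
    void flushRegion(String regionName, long ifOlderThanTS) {
        RegionSketch region = getOnlineRegion(regionName);
        if (region.getLastFlushTime() < ifOlderThanTS) {
            region.flushcache();
        }
    }

    // Variant 3: the RPC-facing method; checkOpen() rejects calls once the server stops.
    void flushRegion(RegionInfoSketch info) {
        checkOpen();
        getOnlineRegion(info.regionName).flushcache();
    }

    private void checkOpen() {
        if (stopped) {
            throw new IllegalStateException("region server is stopped");
        }
    }

    private RegionSketch getOnlineRegion(String regionName) {
        RegionSketch region = onlineRegions.get(regionName);
        if (region == null) {
            throw new IllegalArgumentException("No region : " + regionName + " available");
        }
        return region;
    }
}
```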
Next, let's look at the implementation of flushcache() in the HRegion class.
4. HRegion Implementation
flushcache is just an entry method that does some preparation before the flush: setting up task status monitoring, invoking coprocessor hooks, handling puts that were written without the WAL, taking locks, and so on. It then calls the internal method internalFlushcache to start the flush.
internalFlushcache does some MVCC work and finally calls the flushCache method of StoreFlusher.
To guarantee data consistency, internalFlushcache performs many checks and takes several locks. I don't fully understand all of them yet, so I'm flagging this for later and moving on to the next layer.
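Leaving the consistency checks and MVCC details aside, the ordering inside internalFlushcache can be sketched as three loops over the stores: snapshot each store while updates are blocked, release the block, write the snapshots out, then commit. This is a minimal sketch under assumptions: PhasedFlusher and RegionFlushSketch are hypothetical, and a ReentrantReadWriteLock stands in for the region's update lock.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical per-store flusher with the prepare / flushCache / commit phases.
interface PhasedFlusher {
    void prepare();    // snapshot the store's memstore
    void flushCache(); // write the snapshot out
    boolean commit();  // publish the file; true if a compaction is wanted
}

class RegionFlushSketch {
    // Puts would take the read side; the flush takes the write side only while snapshotting.
    private final ReentrantReadWriteLock updatesLock = new ReentrantReadWriteLock();
    private final List<PhasedFlusher> storeFlushers = new ArrayList<>();

    void addStoreFlusher(PhasedFlusher flusher) {
        storeFlushers.add(flusher);
    }

    boolean isUpdatesLockHeldForWrite() {
        return updatesLock.isWriteLockedByCurrentThread();
    }

    /** Runs the three phases over every store; returns true if any store wants a compaction. */
    boolean internalFlushcache() {
        updatesLock.writeLock().lock(); // block new updates while the snapshots are taken
        try {
            for (PhasedFlusher f : storeFlushers) {
                f.prepare();
            }
        } finally {
            updatesLock.writeLock().unlock();
        }
        for (PhasedFlusher f : storeFlushers) {
            f.flushCache(); // the slow part runs without the update lock
        }
        boolean compactionRequested = false;
        for (PhasedFlusher f : storeFlushers) {
            if (f.commit()) {
                compactionRequested = true;
            }
        }
        return compactionRequested;
    }
}
```

The point of releasing the lock right after the snapshots is that only the short snapshot step blocks writers; the slow file writing runs while new updates keep arriving in the fresh memstore.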
Next, let's look at how StoreFlusher is implemented.
5. StoreFlusher Implementation
StoreFlusher is an interface; in 0.94.1 its only implementation is Store.StoreFlusherImpl.
The StoreFlusher interface splits a flush into three steps:
- prepare: a short operation that creates a snapshot of the MemStore; writes are paused while it runs.
- flushCache: writes the snapshot out; it does not block any reads or writes on the Store while it runs.
- commit: adds the flushed file to the Store directory and clears the MemStore snapshot; a short operation, but it pauses scans.
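The three-step contract can be illustrated with a toy, single-threaded store. ToyStore and StoreFlusherSketch are hypothetical names, the "file" is just an immutable sorted map, and the compaction check is a simple file-count threshold; in HBase these steps are coordinated with locks and the file is a real StoreFile.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical mirror of the StoreFlusher contract.
interface StoreFlusherSketch {
    void prepare();    // short: swap the memstore into a snapshot (writes paused in HBase)
    void flushCache(); // long: turn the snapshot into a new file
    boolean commit();  // short: publish the file, drop the snapshot; true if compaction needed
}

// A single-threaded toy store.
class ToyStore {
    NavigableMap<String, String> memstore = new TreeMap<>();
    NavigableMap<String, String> snapshot = new TreeMap<>();
    final List<NavigableMap<String, String>> storeFiles = new ArrayList<>();
    private final int compactionThreshold;

    ToyStore(int compactionThreshold) {
        this.compactionThreshold = compactionThreshold;
    }

    void put(String key, String value) {
        memstore.put(key, value);
    }

    StoreFlusherSketch newFlusher() {
        return new StoreFlusherSketch() {
            private NavigableMap<String, String> newFile;

            @Override
            public void prepare() {
                snapshot = memstore;        // freeze the current contents
                memstore = new TreeMap<>(); // new writes go to a fresh map
            }

            @Override
            public void flushCache() {
                newFile = Collections.unmodifiableNavigableMap(new TreeMap<>(snapshot));
            }

            @Override
            public boolean commit() {
                storeFiles.add(newFile);    // the file becomes visible to readers
                snapshot = new TreeMap<>(); // the snapshot is no longer needed
                return storeFiles.size() > compactionThreshold;
            }
        };
    }
}
```

A write that arrives between prepare and commit lands in the fresh memstore and is picked up by the next flush.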
6. StoreFlusherImpl Implementation
StoreFlusherImpl is a private inner class of Store and implements the three StoreFlusher methods above: prepare is implemented in the class itself, while flushCache and commit delegate to methods of the enclosing Store class.
6.1 Prepare
```java
public void prepare() {
  memstore.snapshot();
  this.snapshot = memstore.getSnapshot();
  this.snapshotTimeRangeTracker = memstore.getSnapshotTimeRangeTracker();
}
```
It calls into the MemStore to take the snapshot.
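A simplified sketch of what the snapshot step does, assuming String keys and values and a synchronized class in place of the MemStore's internal lock. MemStoreSketch and its TimeRange are hypothetical reductions of MemStore and TimeRangeTracker. As far as I can tell from the 0.94 source, snapshot() does nothing if an earlier snapshot has not been cleared yet, and the sketch keeps that behavior.

```java
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical simplification of MemStore.snapshot(): the active map and its
// timestamp range are swapped out together.
class MemStoreSketch {
    /** Tracks the min/max timestamp seen, like a much-reduced TimeRangeTracker. */
    static final class TimeRange {
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;

        void include(long ts) {
            min = Math.min(min, ts);
            max = Math.max(max, ts);
        }
    }

    private ConcurrentSkipListMap<String, String> kvset = new ConcurrentSkipListMap<>();
    private ConcurrentSkipListMap<String, String> snapshot = new ConcurrentSkipListMap<>();
    private TimeRange timeRange = new TimeRange();
    private TimeRange snapshotTimeRange = new TimeRange();

    synchronized void add(String key, String value, long ts) {
        kvset.put(key, value);
        timeRange.include(ts);
    }

    /** Moves the active set into the snapshot, unless an unflushed snapshot already exists. */
    synchronized void snapshot() {
        if (!snapshot.isEmpty()) {
            return; // previous flush not finished: keep the old snapshot
        }
        if (kvset.isEmpty()) {
            return; // nothing to flush
        }
        snapshot = kvset;
        snapshotTimeRange = timeRange;
        kvset = new ConcurrentSkipListMap<>();
        timeRange = new TimeRange();
    }

    synchronized ConcurrentSkipListMap<String, String> getSnapshot() {
        return snapshot;
    }

    synchronized TimeRange getSnapshotTimeRange() {
        return snapshotTimeRange;
    }

    synchronized int activeSize() {
        return kvset.size();
    }

    /** Called once the snapshot has been written to a file (the commit step). */
    synchronized void clearSnapshot() {
        snapshot = new ConcurrentSkipListMap<>();
        snapshotTimeRange = new TimeRange();
    }
}
```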
6.2 FlushCache
StoreFlusherImpl's flushCache calls the flushCache method of the Store class, which wraps internalFlushCache, where the work happens.
The logic is straightforward:
- It opens a StoreScanner over the snapshot to select the cells to flush, based on timestamps and the ScanType parameter.
- It opens a StoreFile writer, writes the scanned data to a new StoreFile, and returns the StoreFile path for the subsequent commit phase.
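The flushCache step boils down to: iterate the snapshot in sorted order, skip what the scanner filters out, write the rest to a new file, and return its path. A minimal sketch under assumptions: SnapshotFlushSketch and its Cell class are hypothetical, the file is plain tab-separated text rather than an HFile, and a minTimestamp cutoff stands in for the StoreScanner's filtering.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;

// Hypothetical model of Store.internalFlushCache.
class SnapshotFlushSketch {
    /** A cell reduced to row key, timestamp and value. */
    static final class Cell {
        final String row;
        final long ts;
        final String value;

        Cell(String row, long ts, String value) {
            this.row = row;
            this.ts = ts;
            this.value = value;
        }
    }

    /**
     * Writes cells with ts >= minTimestamp to a fresh file under tmpDir and returns its path.
     * The cutoff is a hypothetical stand-in for the scanner's filtering.
     */
    static Path flushSnapshot(NavigableMap<String, Cell> snapshot, long minTimestamp, Path tmpDir) {
        List<String> lines = new ArrayList<>();
        for (Cell c : snapshot.values()) { // NavigableMap iterates in sorted key order
            if (c.ts < minTimestamp) {
                continue; // filtered out: not written to the new file
            }
            lines.add(c.row + "\t" + c.ts + "\t" + c.value);
        }
        try {
            Files.createDirectories(tmpDir);
            Path file = Files.createTempFile(tmpDir, "flush-", ".sf");
            Files.write(file, lines, StandardCharsets.UTF_8);
            return file; // handed to the commit step
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```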
6.3 Commit
The commit method of StoreFlusherImpl first calls commitFile on the enclosing Store class, which does two things:
- Moves the StoreFile produced by flushCache into the Store directory
- Updates the Store's related statistics
It then calls the Store's updateStorefiles method to update the Store's list of StoreFiles. After the list is updated, needsCompaction() is called to check whether the file changes caused by this flush should trigger a compaction. If so, the compaction machinery takes over; I'll cover that in a separate post.
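A minimal sketch of the commit step, assuming local files and java.nio in place of HDFS: StoreCommitSketch is hypothetical, totalSize is just one example of the statistics mentioned above, and minFilesToCompact stands in for the configured compaction threshold. updateStorefiles returns the needsCompaction() result so the caller can decide whether to request a compaction.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical model of commitFile + updateStorefiles + needsCompaction.
class StoreCommitSketch {
    private final Path storeDir;
    private final int minFilesToCompact; // stand-in for the configured compaction threshold
    private final List<Path> storeFiles = new ArrayList<>();
    private long totalSize; // one example of a statistic updated on commit

    StoreCommitSketch(Path storeDir, int minFilesToCompact) {
        this.storeDir = storeDir;
        this.minFilesToCompact = minFilesToCompact;
    }

    /** Moves a flushed file from the temp area into the store directory. */
    Path commitFile(Path tmpFile) {
        try {
            Files.createDirectories(storeDir);
            Path dst = storeDir.resolve(tmpFile.getFileName());
            Files.move(tmpFile, dst); // a rename when both paths are on the same file system
            totalSize += Files.size(dst);
            return dst;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Makes the committed file visible and reports whether a compaction should follow. */
    boolean updateStorefiles(Path committed) {
        storeFiles.add(committed);
        return needsCompaction();
    }

    boolean needsCompaction() {
        return storeFiles.size() > minFilesToCompact;
    }

    List<Path> getStorefiles() {
        return Collections.unmodifiableList(storeFiles);
    }

    long getTotalSize() {
        return totalSize;
    }
}
```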
At this point, the first pass over what happens behind a manual flush is complete. So far this only traces the call path; I'll keep filling in the details.
HBase Manual Flush Mechanism Walkthrough