[Reprinted] The compaction process in HBase

When a client sends a put() to the HRegion, the HRegion checks whether the current MemStore size is greater than the hbase.hregion.memstore.flush.size parameter. If it is, the flushCache() operation is performed to flush the MemStore of the HRegion to store files.

In flushCache(), the system first checks whether the current region meets the following condition:

number of store files > value of the hbase.hstore.blockingStoreFiles parameter

If it does, the region is placed in the queue of the CompactSplitThread thread, waiting for a compact/split.

The flushCache() operation then continues: it creates a snapshot of the MemStore and flushes the snapshot to a store file on HDFS.

After flushCache() completes, the system checks whether the number of files in the store is greater than the hbase.hstore.compaction.min parameter. If it is, the region is added to the CompactSplitThread queue.

In that thread, CompactSplitThread performs compact and split operations on the region.

A compact merges the store files that were previously flushed to HDFS into a single file and removes the old store files.

A split divides an oversized region into two regions.
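To make the flow above easier to follow, here is a minimal, hypothetical sketch of the decision logic in Java. It is not the HBase implementation: the class, field, and method names (RegionFlushFlow, storeFileCount, requestCompactionOrSplit, and so on) are made up for illustration, and the numeric defaults are assumptions; only the three hbase.* parameter names are real.

// Hypothetical sketch of the flush/compaction trigger flow described above.
// Only the three hbase.* parameter names are real; all identifiers and the
// default values are assumptions made for illustration.
public class RegionFlushFlow {

    long flushSize = 64L * 1024 * 1024; // hbase.hregion.memstore.flush.size (assumed default)
    int blockingStoreFiles = 7;         // hbase.hstore.blockingStoreFiles (assumed default)
    int compactionMin = 3;              // hbase.hstore.compaction.min (assumed default)

    long memstoreSize;   // bytes currently held in the MemStore
    int storeFileCount;  // store files already flushed to HDFS

    // Called for each put(); flushes once the MemStore grows past flushSize.
    void onPut(long newBytes) {
        memstoreSize += newBytes;
        if (memstoreSize > flushSize) {
            flushCache();
        }
    }

    void flushCache() {
        // 1. Too many store files already: queue the region for a compact/split.
        if (storeFileCount > blockingStoreFiles) {
            requestCompactionOrSplit();
        }
        // 2. Snapshot the MemStore and write it out as a new store file.
        storeFileCount++;
        memstoreSize = 0;
        // 3. After the flush, check whether a compaction is now worthwhile.
        if (storeFileCount > compactionMin) {
            requestCompactionOrSplit();
        }
    }

    void requestCompactionOrSplit() {
        System.out.println("region queued on the CompactSplitThread");
    }
}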

The following describes the merge process of a compaction.

Earlier, store files were flushed from the MemStore to HDFS, and each file is internally sorted. Now these multiple sorted store files must be merged into one sorted file. How does HBase implement this?
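Before walking through the HBase classes, here is a small self-contained sketch of the underlying idea: several individually sorted inputs are merged into one sorted stream by a priority queue that always yields the input whose current element is smallest. The SortedInput wrapper is a hypothetical stand-in that loosely mimics the peek()/next() style of access provided by the scanner classes described next.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Illustrative k-way merge of sorted inputs, mirroring the approach HBase uses.
public class KWayMergeSketch {

    // Hypothetical stand-in for a store file scanner over a sorted file.
    static class SortedInput {
        private final Iterator<Integer> it;
        private Integer current;

        SortedInput(List<Integer> sortedValues) {
            this.it = sortedValues.iterator();
            this.current = it.hasNext() ? it.next() : null;
        }

        Integer peek() { return current; }   // current element, null when exhausted

        Integer next() {                     // advance to and return the next element
            current = it.hasNext() ? it.next() : null;
            return current;
        }
    }

    public static void main(String[] args) {
        List<SortedInput> inputs = Arrays.asList(
            new SortedInput(Arrays.asList(1, 4, 9)),
            new SortedInput(Arrays.asList(2, 3, 10)),
            new SortedInput(Arrays.asList(5, 6, 7)));

        // The queue orders inputs by their current element, smallest first.
        PriorityQueue<SortedInput> heap =
            new PriorityQueue<>((a, b) -> a.peek().compareTo(b.peek()));
        for (SortedInput in : inputs) {
            if (in.peek() != null) {
                heap.add(in);
            }
        }

        List<Integer> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            SortedInput smallest = heap.poll(); // input holding the smallest current element
            merged.add(smallest.peek());
            if (smallest.next() != null) {      // advance it; re-insert if it still has data
                heap.add(smallest);
            }
        }
        System.out.println(merged);             // [1, 2, 3, 4, 5, 6, 7, 9, 10]
    }
}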

Each store file is wrapped in a StoreFileScanner class, which has the following methods:

public KeyValue peek();            // return the current KV
public KeyValue next();            // return the next KV, reading it from HDFS
public boolean seek(KeyValue key); // seek to the given key, setting the current KV

A StoreFileScanner object is created for each of the store files to be merged. The code is as follows (the filesToCompact list holds the original store files):

List<StoreFileScanner> scanners =
    new ArrayList<StoreFileScanner>(filesToCompact.size());
for (StoreFile file : filesToCompact) {
  StoreFile.Reader r = file.createReader();
  scanners.add(r.getStoreFileScanner(cacheBlocks, usePread));
}

Then a StoreScanner object is used to wrap the StoreFileScanner objects created above:

scanner = new StoreScanner(this, scan, scanners, !majorCompaction);

In this way, calling the next() method of this StoreScanner object returns the KV pairs from these store files in sorted order.

These ordered KV pairs are then appended to the newly created store file.

This completes the whole merge. The code is as follows:

ArrayList<KeyValue> kvs = new ArrayList<KeyValue>();
while (scanner.next(kvs)) {
  if (writer == null && !kvs.isEmpty()) {
    writer = createWriterInTmp(maxKeyCount,
        this.compactionCompression);
  }
  if (writer != null) {
    // output to writer:
    for (KeyValue kv : kvs) {
      writer.append(kv);
    }
  }
  kvs.clear(); // reset the batch before reading the next row
}

Next, let's look at how the next() method of the StoreScanner class obtains ordered KV pairs from the StoreFileScanner objects of the multiple store files.

First, let's look at the constructor of the StoreScanner class. A KeyValueHeap object is created here, wrapping the StoreFileScanner objects of the store files defined above.

StoreScanner(Store store, Scan scan, List<? extends KeyValueScanner> scanners,
    boolean retainDeletesInOutput)
    throws IOException {
  this.store = store;
  this.cacheBlocks = false;
  this.isGet = false;
  matcher = new ScanQueryMatcher(scan, store.getFamily().getName(),
      null, store.ttl, store.comparator.getRawComparator(),
      store.versionsToReturn(scan.getMaxVersions()), retainDeletesInOutput);

  // Seek all scanners to the initial key
  for (KeyValueScanner scanner : scanners) {
    scanner.seek(matcher.getStartKey());
  } // position the scanner of every store file at the start key

  // Combine all seeked scanners with a heap
  heap = new KeyValueHeap(scanners, store.comparator);
  // a heap of scanners ordered by their current KV
}

Looking at the StoreScanner.next() method, we can see that when the StoreScanner fetches the next ordered KV, it only needs to call the next() method of the KeyValueHeap, which returns the current smallest KV object.

KeyValue peeked = this.heap.peek();
if (peeked == null) {
  close();
  return false;
}

The most important class is KeyValueHeap.

This class has three fields:

private PriorityQueue<KeyValueScanner> heap = null; // the queue of scanner objects
private KeyValueScanner current = null;             // the scanner whose current KV is the smallest
private KVScannerComparator comparator;             // comparator used to order the scanners; KVScannerComparator by default

A Java PriorityQueue is used here. This queue orders its elements according to the comparator supplied at construction time.
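As a tiny standalone illustration (not HBase code), a PriorityQueue built with a Comparator always hands back the element that compares smallest under that comparator:

import java.util.Comparator;
import java.util.PriorityQueue;

// Minimal demonstration that PriorityQueue orders elements by the supplied Comparator.
public class PriorityQueueDemo {
    public static void main(String[] args) {
        // Order strings by length, shortest first.
        PriorityQueue<String> queue =
            new PriorityQueue<>(Comparator.comparingInt(String::length));
        queue.add("store");
        queue.add("kv");
        queue.add("region");
        System.out.println(queue.poll()); // kv
        System.out.println(queue.poll()); // store
        System.out.println(queue.poll()); // region
    }
}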

The comparison takes the current KV of each store file's scanner (obtained with peek()) and compares them:

public int compare(KeyValueScanner left, KeyValueScanner right) {
  int comparison = compare(left.peek(), right.peek());
  if (comparison != 0) {
    return comparison;
  } else {
    // The KVs are equal: the scanner with the higher sequence id
    // (the newer store file) sorts first.
    long leftSequenceID = left.getSequenceID();
    long rightSequenceID = right.getSequenceID();
    if (leftSequenceID > rightSequenceID) {
      return -1;
    } else if (leftSequenceID < rightSequenceID) {
      return 1;
    } else {
      return 0;
    }
  }
}

When the scanner objects are added to the heap during construction, they are ordered by KVScannerComparator, so the head of the queue is always the scanner with the smallest current KV; that scanner is assigned to the current field.

public KeyValueHeap(List<? extends KeyValueScanner> scanners,
    KVComparator comparator) {
  this.comparator = new KVScannerComparator(comparator);
  if (!scanners.isEmpty()) {
    this.heap = new PriorityQueue<KeyValueScanner>(scanners.size(),
        this.comparator);
    for (KeyValueScanner scanner : scanners) {
      if (scanner.peek() != null) {
        this.heap.add(scanner);
      } else {
        scanner.close();
      }
    }
    this.current = heap.poll();
  }
}

Each call to next() fetches the next smallest KV object from the scanners in the heap. The code is as follows:

public boolean next(List<KeyValue> result, int limit) throws IOException {
  if (this.current == null) {
    return false;
  }
  InternalScanner currentAsInternal = (InternalScanner) this.current;
  // Return the KVs of the current row from the scanner whose current KV is smallest,
  // advancing that scanner's cursor.
  boolean mayContainsMoreRows = currentAsInternal.next(result, limit);
  KeyValue pee = this.current.peek();
  // If that scanner still has a next KV, put it back into the heap; otherwise close it.
  if (pee == null || !mayContainsMoreRows) {
    this.current.close();
  } else {
    this.heap.add(this.current);
  }
  // Take the scanner with the new smallest current KV off the heap.
  this.current = this.heap.poll();
  return (this.current != null);
}

 

Summary:

Store files are wrapped by StoreFileScanner objects, which provide peek() and next() methods to obtain the current element and the element that follows it in the file.

A KeyValueHeap orders all the StoreFileScanner objects by each scanner's current KV. The scanner holding the smallest current KV is advanced to its next element and re-inserted, which keeps the merged output sorted.

MapReduce has a similar merge process: merging the map outputs that are transferred to the reduce side. It is consistent with the merging approach in HBase.

MapReduce wraps each file received from a map task in a Segment, which provides a method to read the next KV from the current file. The MergeQueue class extends PriorityQueue and likewise provides a next() method that reads KV pairs from multiple files in order; the detailed code is in the Merger class.
