HBase source code analysis: the org.apache.hadoop.hbase.regionserver package

Source: Internet
Author: User
1. Under what conditions is a region split (split policy)?

The available policies are 1) ConstantSizeRegionSplitPolicy, 2) KeyPrefixRegionSplitPolicy, 3) RegionSplitPolicy (the abstract base class), and 4) IncreasingToUpperBoundRegionSplitPolicy. You can specify the policy implementation in the configuration file with hbase.regionserver.region.split.policy. In version 0.94, the default value is IncreasingToUpperBoundRegionSplitPolicy. This policy does not rely on hbase.hregion.max.filesize alone: that value (10 GB by default) is very large, so a region would rarely reach it and be split. The algorithm described below is used instead. For more information, see:

IncreasingToUpperBoundRegionSplitPolicy is the default region split policy in 0.94.0

The split decision works as follows. The policy first counts tableRegionsCount, the number of online regions of the table on this region server, and then checks whether any store of the region has grown too large. The threshold comes from the getSizeToCheck method: if the total size of a store is greater than this value, the region needs to be split. getSizeToCheck first checks whether tableRegionsCount is equal to 0. If so, it returns hbase.hregion.max.filesize. If not, it returns

Math.min(getDesiredMaxFileSize(), this.flushSize * (tableRegionsCount * tableRegionsCount))
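The threshold calculation above can be sketched in a few lines of Java. This is a minimal illustration, not the HBase class itself: the class name SplitSizeSketch, the method shouldSplit, and the 128 MB flush size used in main are assumptions made for the example; only the formula and the 10 GB default come from the text.

```java
/** Simplified sketch of the size check described above; not the HBase class. */
final class SplitSizeSketch {

    /**
     * @param maxFileSize       value of hbase.hregion.max.filesize, in bytes
     * @param flushSize         memstore flush size, in bytes
     * @param tableRegionsCount regions of this table online on this region server
     */
    static long sizeToCheck(long maxFileSize, long flushSize, int tableRegionsCount) {
        if (tableRegionsCount == 0) {
            return maxFileSize;
        }
        // Widen to long before multiplying so the square cannot overflow an int.
        long squared = (long) tableRegionsCount * tableRegionsCount;
        return Math.min(maxFileSize, flushSize * squared);
    }

    /** A region is a split candidate when any one of its stores exceeds the threshold. */
    static boolean shouldSplit(long[] storeSizes, long sizeToCheck) {
        for (long size : storeSizes) {
            if (size > sizeToCheck) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        long maxFileSize = 10L * 1024 * 1024 * 1024; // 10 GB, the default quoted above
        long flushSize = 128L * 1024 * 1024;         // assumed example value
        for (int regions = 0; regions <= 10; regions++) {
            long mb = sizeToCheck(maxFileSize, flushSize, regions) / (1024 * 1024);
            System.out.println(regions + " regions -> threshold " + mb + " MB");
        }
    }
}
```

With these example values the threshold grows with the square of the region count until it is capped by hbase.hregion.max.filesize, which is why a table with few regions splits early and a table with many regions falls back to the fixed upper bound.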

2. memstore flush, compact, and split

FlushRequester, MemStoreFlusher: MemStoreFlusher implements the FlushRequester interface. It runs as a thread service that is initialized when the region server starts, and its main job is to flush the MemStore cache. Some parameters need to be noted here:
hbase.regionserver.global.memstore.upperLimit and hbase.regionserver.global.memstore.lowerLimit
* If the global memstore size exceeds the high water mark (0.4), updates are blocked and regions are force-flushed one by one until the global memstore size is smaller than the high water mark (0.4).
* If the global memstore size is between the high water mark (0.4) and the low water mark (0.35), a flush request is submitted to the flush thread. Updates are not blocked.

hbase.hregion.memstore.flush.size: the limit on memstoreSize, the sum of the sizes of all MemStores in a region. If memstoreSize is greater than this value, a flush is requested.
hbase.hregion.memstore.block.multiplier (default 2): the region server checks whether the total MemStore size of each region exceeds the flush size multiplied by this value (blockingMemStoreSize, twice the flush size by default). If it does, the MemStore is locked so that new write requests are not let in, and a flush is triggered, to avoid OOM.
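The global water marks and the per-region thresholds above amount to a few comparisons. The sketch below only shows that decision logic under stated assumptions: the class MemstorePressureSketch, its method names, and the enum are hypothetical, and the real region server works with its own bookkeeping rather than plain arguments.

```java
/** Illustrative decision logic only; all names here are hypothetical, not HBase classes. */
final class MemstorePressureSketch {

    enum GlobalAction { NONE, REQUEST_FLUSH, BLOCK_AND_FLUSH }

    /** upperLimit and lowerLimit are fractions of the heap, e.g. 0.4 and 0.35. */
    static GlobalAction checkGlobal(long globalMemstoreBytes, long maxHeapBytes,
                                    double upperLimit, double lowerLimit) {
        if (globalMemstoreBytes >= maxHeapBytes * upperLimit) {
            return GlobalAction.BLOCK_AND_FLUSH; // above the high water mark
        }
        if (globalMemstoreBytes >= maxHeapBytes * lowerLimit) {
            return GlobalAction.REQUEST_FLUSH;   // between the two water marks
        }
        return GlobalAction.NONE;
    }

    /** Per-region check against hbase.hregion.memstore.flush.size. */
    static boolean needsFlush(long regionMemstoreBytes, long flushSize) {
        return regionMemstoreBytes > flushSize;
    }

    /** Per-region check against flushSize * hbase.hregion.memstore.block.multiplier. */
    static boolean blocksUpdates(long regionMemstoreBytes, long flushSize, int blockMultiplier) {
        return regionMemstoreBytes > flushSize * blockMultiplier;
    }

    public static void main(String[] args) {
        long heap = 1000; // abstract units keep the example readable
        System.out.println(checkGlobal(450, heap, 0.4, 0.35));
        System.out.println(checkGlobal(370, heap, 0.4, 0.35));
        System.out.println(checkGlobal(100, heap, 0.4, 0.35));
        System.out.println(needsFlush(129, 128) + " " + blocksUpdates(200, 128, 2));
    }
}
```

Keeping the two levels separate mirrors the text: the global check protects the whole region server heap, while the per-region check decides when one region flushes or stops accepting writes.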

hbase.hstore.compaction.ratio (1.2f): a file is selected for compaction when its size <= sum(sizes of the files smaller than it) * hbase.hstore.compaction.ratio.
hbase.hstore.compaction.min (hbase.hstore.compactionThreshold in older versions): the minimum number of StoreFiles that must be selected before a compaction runs. Default value: 3.
hbase.hstore.compaction.max: the maximum number of HStoreFiles merged in each minor compaction. Default value: 10.
hbase.hstore.blockingStoreFiles: when an HStore contains more than this number of HStoreFiles (each MemStore flush generates one HStoreFile), a compaction is executed and updates are blocked until the compaction completes or until hbase.hstore.blockingWaitTime is exceeded. Default value: 7.
hbase.hstore.blockingWaitTime: the number of StoreFiles limited by hbase.hstore.blockingStoreFiles may cause updates to be blocked, and this setting limits how long they stay blocked. When this time is exceeded, HRegion stops blocking updates even though the compaction is still incomplete. Default value: 90000 (90 s).
hbase.hregion.majorcompaction: the time interval between major compactions of all HStoreFiles in a region. The default value is 1 day. If it is set to 0, this function is disabled.
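To make the ratio rule concrete, here is a simplified sketch of ratio-based file selection. It is an assumption-laden illustration rather than the selection code in Store: it assumes the files are ordered oldest first (so the files after a given file are the newer, usually smaller ones), it ignores minimum-size handling and major compactions, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified ratio-based selection; illustrative only, not the HBase implementation. */
final class CompactionSelectionSketch {

    /**
     * @param sizesOldestFirst store file sizes, oldest file first (assumed ordering)
     * @param ratio            hbase.hstore.compaction.ratio
     * @param minFiles         hbase.hstore.compaction.min
     * @param maxFiles         hbase.hstore.compaction.max
     * @return sizes of the selected files, or an empty list if too few qualify
     */
    static List<Long> select(long[] sizesOldestFirst, double ratio, int minFiles, int maxFiles) {
        int n = sizesOldestFirst.length;
        int start = 0;
        // Skip leading files that are too large compared with the files after them.
        while (start < n && n - start >= minFiles
                && sizesOldestFirst[start] > ratio * sumOfNewer(sizesOldestFirst, start, maxFiles)) {
            start++;
        }
        int end = Math.min(n, start + maxFiles);
        List<Long> selected = new ArrayList<>();
        if (end - start < minFiles) {
            return selected; // not enough files to justify a compaction
        }
        for (int i = start; i < end; i++) {
            selected.add(sizesOldestFirst[i]);
        }
        return selected;
    }

    /** Sum of the files after index that would fit in the same compaction. */
    private static long sumOfNewer(long[] sizes, int index, int maxFiles) {
        long sum = 0;
        int end = Math.min(sizes.length, index + maxFiles);
        for (int i = index + 1; i < end; i++) {
            sum += sizes[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        long[] sizes = {1000, 100, 50, 40, 30};
        System.out.println(select(sizes, 1.2, 3, 10));
    }
}
```

In the example call, the 1000-unit file is skipped because it is larger than 1.2 times the sum of the remaining files, and the four smaller files are selected together.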
During put and delete operations on a region, HRegion checks whether the size exceeds the memstore limit. If it does, it calls HRegion.requestFlush ---> MemStoreFlusher.requestFlush, and the request is queued to be processed by the flush thread.

So when is flushcache actually called? It can be initiated by an HBaseAdmin management command (HBaseAdmin --> RegionServer --> HRegion.flushcache, which uses the StoreFlusher interface to do the work). It can also be initiated by the MemStoreFlusher thread through its flushRegion operation. Before HRegion.flushcache is called, flushRegion checks whether there are too many StoreFiles (more than hbase.hstore.blockingStoreFiles); if so, it requests the split and compact operations first. The flushcache operation returns whether a compaction is required. After the flush completes, the flusher determines whether a split (IncreasingToUpperBoundRegionSplitPolicy) or a compaction is required. If necessary, the CompactSplitThread of the region server runs it; its call to Store.compact(CompactionRequest cr) blocks write operations on the store.

The flush itself proceeds as follows:
boolean HRegion.flushcache() returns a flag that indicates whether a compaction is needed.
--- internalFlushcache(final HLog wal, final long myseqid, MonitoredTask status): flushes the MemStore to disk (see the StoreFlusher description below), writes a flush-completed record with the sequence number of the memstore flush to the log, and clears the flushed data from the MemStore.
StoreFlusher has three phases:
prepare --- creates the snapshot; write operations are blocked while it is created (HRegion.updatesLock.writeLock().lock();)
flushCache --- flushes the cache and creates the StoreFile
commit --- commits the flush: adds the StoreFile to the Store and clears the MemStore snapshot (and determines whether a compaction is needed). The call to Store.updateStorefiles takes the write lock of the store's lock, so other operations that take its read lock are blocked.
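The three phases can be modelled with a toy in-memory store. This is a sketch under assumptions, not HBase code: FlushPhasesSketch and its methods are hypothetical, a "store file" is just a sorted map, and the snapshot is dropped at the end of flush rather than being kept readable until commit as a real store would need.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Toy model of the prepare / flushCache / commit phases; not HBase code. */
final class FlushPhasesSketch {
    private final ReentrantReadWriteLock updatesLock = new ReentrantReadWriteLock();
    private ConcurrentSkipListMap<String, String> memstore = new ConcurrentSkipListMap<>();
    private final List<TreeMap<String, String>> storeFiles = new ArrayList<>();

    /** Writers share the read lock, so they only wait while a snapshot is being taken. */
    void put(String row, String value) {
        updatesLock.readLock().lock();
        try {
            memstore.put(row, value);
        } finally {
            updatesLock.readLock().unlock();
        }
    }

    /** Returns true if a store file was created. */
    synchronized boolean flush() {
        ConcurrentSkipListMap<String, String> snapshot;
        // prepare: swap in an empty memstore while holding the write lock
        updatesLock.writeLock().lock();
        try {
            if (memstore.isEmpty()) {
                return false;
            }
            snapshot = memstore;
            memstore = new ConcurrentSkipListMap<>();
        } finally {
            updatesLock.writeLock().unlock();
        }
        // flushCache: persist the snapshot; writers are no longer blocked here
        TreeMap<String, String> storeFile = new TreeMap<>(snapshot);
        // commit: publish the new store file; the snapshot goes out of scope
        storeFiles.add(storeFile);
        return true;
    }

    synchronized int storeFileCount() {
        return storeFiles.size();
    }

    int memstoreSize() {
        updatesLock.readLock().lock();
        try {
            return memstore.size();
        } finally {
            updatesLock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        FlushPhasesSketch store = new FlushPhasesSketch();
        store.put("row1", "a");
        store.put("row2", "b");
        System.out.println("flushed: " + store.flush() + ", files: " + store.storeFileCount());
    }
}
```

The point of the design is that the write lock is held only for the short swap in the prepare phase, so writes are blocked briefly while the slow step of writing the file runs without the lock.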

 

MemStore and snapshot: taking a snapshot blocks write operations. snapshot() acquires lock.writeLock().lock(), while add/delete/getNextRow acquire lock.readLock().lock(), so they cannot run at the same time as a snapshot.

 

CompactSplitThread: a multi-threaded service that performs compact and split operations. A compaction blocks write operations (CompactSplitThread calls Store.compact(CompactionRequest cr), which blocks write operations on the store). CompactionRequest is the unit of work that one of these threads runs to perform a specific compaction.

 

3. Server-side scan inside a region

The client first calls HRegionInterface to open a server-side scanner (public long openScanner(final byte[] regionName, final Scan scan)), which returns the scanner ID. It then calls Result next(final long scannerId) of HRegionInterface with that scanner ID to obtain results.

HRegionServer.next(scannerId) runs:
region.getCoprocessorHost().preScannerNext(s, results, nbRows);
RegionScannerImpl.next(List results)
region.getCoprocessorHost().postScannerNext(s, results, nbRows);

InternalScanner: internal scanners differ from the client's scanner, which operates on RowResults, in that the server side operates on HStoreKeys and byte[].
----- RegionScanner
----- RegionScannerImpl (KeyValueHeap storeHeap, Scan scan)
--- An instance is created, and the constructor adds the StoreScanner of every column family to the heap: getScanner(scan, entry.getValue()); this.storeHeap = new KeyValueHeap(scanners, comparator);
----- next(List<KeyValue> outResults, int limit) is then called:
---- startRegionOperation(); takes a read lock for the region operation (while the operation runs, only read locks can be taken on the region, not the write lock, which ensures data consistency)
--- MVCC
----- nextInternal(limit)
-- iterates with peekRow() --- KeyValueScanner.peek() ---- peek() in KeyValueHeap --> StoreScanner.peek() --> peek() of the scanners in the Store <MemStoreScanner, StoreFileScanners> // the StoreScanner constructor calls getScannersNoCompaction to obtain the MemStore scanner and the StoreFile scanners under the store
------- closeRegionOperation(); releases the read lock taken for the operation on the region.

 

 

KeyValueScanner (interface), KeyValueHeap: KeyValueHeap merges KeyValueScanners (streams of KeyValues). At the region level it merges the stores that the region owns; at the Store level it merges the MemStore scanner with the StoreFileScanners under the store, so a scan covers the MemStore (including the snapshot) and the StoreFiles. If a Bloom filter is set, the Bloom filter is used.
MemStoreScanner: a scanner based on the MemStore that implements the concrete KeyValue peek operation.
StoreFileScanner: a scanner based on StoreFiles that implements the concrete KeyValue peek operation; StoreFile.Reader (HFileReaderV2) reads KeyValues from the StoreFiles.
Scan, MemStore, and BlockCache used together: the BlockCache works at the HFile level. When an HFile is read, the block cache is checked first. If the block is cached, it is read directly from the cache. Otherwise, it is read from the HFile. For details, see HFileReaderV2.readBlock.
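The merging role of KeyValueHeap can be illustrated with a small k-way merge over sorted sources. This is a sketch of the general technique, assuming plain string keys in place of KeyValues; HeapMergeSketch and PeekingScanner are hypothetical names, and the real heap additionally handles seeks, versions, and delete markers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

/** K-way merge of sorted sources through a heap of peekable scanners; illustrative only. */
final class HeapMergeSketch {

    /** Wraps a sorted list and exposes peek() and next(), loosely like a scanner. */
    static final class PeekingScanner {
        private final Iterator<String> iterator;
        private String current;

        PeekingScanner(List<String> sortedKeys) {
            this.iterator = sortedKeys.iterator();
            this.current = iterator.hasNext() ? iterator.next() : null;
        }

        /** Returns the next key without consuming it, or null when exhausted. */
        String peek() {
            return current;
        }

        /** Returns the next key and advances. */
        String next() {
            String result = current;
            current = iterator.hasNext() ? iterator.next() : null;
            return result;
        }
    }

    /** Merges already-sorted sources into one sorted list. */
    static List<String> merge(List<List<String>> sortedSources) {
        Comparator<PeekingScanner> byHead = Comparator.comparing(PeekingScanner::peek);
        PriorityQueue<PeekingScanner> heap = new PriorityQueue<>(byHead);
        for (List<String> source : sortedSources) {
            PeekingScanner scanner = new PeekingScanner(source);
            if (scanner.peek() != null) {
                heap.add(scanner); // exhausted scanners never enter the heap
            }
        }
        List<String> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            PeekingScanner top = heap.poll(); // scanner with the smallest head key
            merged.add(top.next());
            if (top.peek() != null) {
                heap.add(top); // re-insert so it is ordered by its new head
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<String> memstore = Arrays.asList("row2", "row5");
        List<String> storeFile1 = Arrays.asList("row1", "row4");
        List<String> storeFile2 = Arrays.asList("row3");
        System.out.println(merge(Arrays.asList(memstore, storeFile1, storeFile2)));
    }
}
```

The same structure applies at both levels described above: a store-level heap merges the MemStore and StoreFile scanners, and a region-level heap merges the resulting store scanners.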

 

4. Leases

A row operation follows the normal process: obtain the lock before the row operation and release the lock after the operation is complete. At the same time, the Leases thread polls leases from its queue at the interval given by leaseCheckFrequency. If a lease has exceeded leasePeriod, it is released and its LeaseListener is called to run the lease-expiration callback, such as releasing the lock.
LeaseListener implementations: RowLockListener, ScannerListener.
The following is the code used to obtain and release the row lock in HRegion.
--- internalObtainRowLock (CountDownLatch(1), await)
    while (true) {
      CountDownLatch existingLatch = lockedRows.putIfAbsent(rowKey, rowLatch);
      if (existingLatch == null) {
        break;
      } else {
        // row already locked
        if (!waitForLock) {
          return null;
        }
        try {
          if (!existingLatch.await(this.rowLockWaitDuration, TimeUnit.MILLISECONDS)) {
            return null;
          }
        } catch (InterruptedException ie) {
          // Empty
        }
      }
    }
---- releaseRowLock (also called by RowLockListener when the lease expires): calls countDown on the latch, which wakes any thread waiting in the existingLatch.await(this.rowLockWaitDuration, TimeUnit.MILLISECONDS) call shown above.

LruHashMap, LruBlockCache

5. WAL

The WAL log is initialized during region server startup. For example, server1 creates its log under /hbase/.logs/server1.
HLog: the concrete implementation of the WAL. It stores all records of HStore changes and is rolled regularly. Each server has one HLog, which consists of multiple files. After data is flushed to a store file, log entries whose LSN (log sequence number) is smaller than the one in the store file can be discarded. While the data buffer is being flushed, the log cannot be rolled, but append operations are allowed.
When a region is opened, the sequence number of the current HLog must be larger than the largest sequence number of the stores in the region; otherwise, the sequence number of the HLog is set to the sequence number of the region. The operations to focus on are:
A. append(HRegionInfo, HLogKey, WALEdit): buffers the edit for the sync thread (pendingWrites in LogSyncer), which writes data to HDFS in batches.
B. sync: can be scheduled by the separate LogSyncer thread. It can also be performed from append if the sync flag is set. LogSyncer calls the HDFS output writer SequenceFileLogWriter to perform the append (hlogFlush) and then calls SequenceFileLogWriter.sync. The sync operation is SequenceFile.Writer.syncFs or FSDataOutputStream.flush.
The doSync argument passed to append decides whether the sync happens. If the region belongs to a meta table, the flush is performed. If DEFERRED_LOG_FLUSH is not set for the table, the flush is performed. DEFERRED_LOG_FLUSH is specified when the table is defined, and its default value is false. If it is set to true, the call returns without confirming that the WAL has been synced (written to disk).
Call sites: append is called from internalDelete and internalPut of HRegion (doSync is true by default). In the client API you can choose whether a Put or Delete writes to the WAL, for example put.setWriteToWAL(false); delete.setWriteToWAL(false);

LogRoller: periodically rolls the HLog.
MultiVersionConsistencyControl

6. HRegionServer

HRegionServer
Constructor: creates the RPC server.
Thread run:
A. Do pre-registration initializations: connect to ZooKeeper and block while tracking whether the master's znode on ZooKeeper is valid (the node is set by the master to indicate that the master is up), and track whether the cluster znode on ZooKeeper is valid to determine whether the cluster is up.
Start tracking the unassigned node on ZooKeeper, which records the status of the system tables ROOT and META. Start the MemStoreFlusher, CompactSplitThread, CompactionChecker, Leases, and HRegionThriftServer thread services.
B. Report to the HMaster, register an ephemeral server node under rs on ZooKeeper, and initialize rootDir and the HLog.
C. Register the MBean, then run in a loop: update metrics and report the current ServerLoad to the master until the cluster shuts down.

7. HRegion

The locks in HRegion include:
A. The region-level close lock, // Used to guard closes
final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
------ Other operations (such as compact(CompactionRequest), flushcache(), and scan and write operations on the region) call startRegionOperation() and closeRegionOperation() around their work, which blocks close and split operations on the region.
B. The HRegion-level updates lock, // Stop updates lock
private final ReentrantReadWriteLock updatesLock = new ReentrantReadWriteLock();
----- put, append, delete, and increment take the read lock, and internalFlushcache takes the write lock, so that the flushcache operation and write operations block each other.
C. Row-level locks, held in a private final ConcurrentHashMap (lockedRows in the row lock code of section 4).

Store, StoreFile, StoreFileScanner, and StoreFlusher are not covered yet. They will be supplemented and improved later.

 
