HBase Write Ahead Log (WAL)

Source: Internet
Author: User


In HBase, every data write is first recorded in the WAL (HLog) and only then applied to the MemStore.
The WAL is an append-only, write-friendly format, while the MemStore is a sorted, query-friendly in-memory structure. Appending sequentially to a log is cheap, so logging each update first adds durability without a large cost in write throughput, which improves system reliability. The WAL is also the basis for crash recovery: updates that reached the MemStore but were never flushed to disk can be replayed from the log after a crash.
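The following is a minimal toy model of this write path, not HBase code. It assumes a plain Java list stands in for the WAL and a TreeMap stands in for the MemStore; the class and method names (WalFirstSketch, put, crash, replay) are hypothetical. It only illustrates why logging before applying makes recovery possible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical toy model, not HBase code: each write is appended to a log
// before it is applied to an in-memory sorted map standing in for the MemStore.
class WalFirstSketch {
    // One logged update: row key and value (simplified).
    record Entry(String row, String value) {}

    // Append-only log; in HBase this would be a WAL file on the file system.
    final List<Entry> log = new ArrayList<>();
    // Sorted, query-friendly structure standing in for the MemStore.
    TreeMap<String, String> memstore = new TreeMap<>();

    void put(String row, String value) {
        log.add(new Entry(row, value)); // 1. record the update in the log first
        memstore.put(row, value);       // 2. then apply it to the MemStore
    }

    // Simulate a crash: the in-memory MemStore is lost, the log survives.
    void crash() {
        memstore = new TreeMap<>();
    }

    // Recovery: replay every logged entry, in append order, into a fresh MemStore.
    void replay() {
        for (Entry e : log) {
            memstore.put(e.row(), e.value());
        }
    }

    public static void main(String[] args) {
        WalFirstSketch s = new WalFirstSketch();
        s.put("row1", "a");
        s.put("row2", "b");
        s.put("row1", "c");
        s.crash();
        s.replay();
        System.out.println(s.memstore); // {row1=c, row2=b}
    }
}
```

Because the log is replayed in append order, the later update to row1 overwrites the earlier one, exactly as it did before the crash.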

API

The WAL interface exposes the write-ahead log to the rest of the RegionServer.
Its most commonly used method is append():

long append(HRegionInfo info, WALKey key, WALEdit edits, boolean inMemstore) throws IOException;

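append() adds one WALEdit, a batch of cell updates, to the log on behalf of the given region, keyed by the WALKey, and returns a transaction id (txid) for the append.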
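As a hedged illustration of why append() returns a txid, here is a toy single-threaded model in which appends are only buffered and a separate sync(txid) call makes them durable. HBase's WAL interface does have a sync(long txid) method, but everything in this class (AppendSyncSketch, Edit, the pending and durable lists) is a hypothetical simplification, not HBase's implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of an append()/sync(txid) contract. It is
// single-threaded and keeps "disk" in memory; it is not HBase's WAL.
class AppendSyncSketch {
    // Stand-in for a WALEdit: one region's batch of cell updates.
    record Edit(String region, List<String> cells) {}

    private final List<Edit> pending = new ArrayList<>(); // appended, not yet durable
    private final List<Edit> durable = new ArrayList<>(); // pretend this is on disk
    private long nextTxid = 1;   // txids increase monotonically
    private long syncedTxid = 0; // highest txid known to be durable

    // Append is cheap: buffer the edit and hand back its txid.
    long append(Edit edit) {
        pending.add(edit);
        return nextTxid++;
    }

    // Make every append up to and including txid durable. One sync flushes
    // all pending edits, so a single sync can cover many appends.
    void sync(long txid) {
        if (txid <= syncedTxid) {
            return; // an earlier sync already covered this txid
        }
        durable.addAll(pending);
        pending.clear();
        syncedTxid = nextTxid - 1;
    }

    long syncedTxid() { return syncedTxid; }

    int durableCount() { return durable.size(); }

    public static void main(String[] args) {
        AppendSyncSketch wal = new AppendSyncSketch();
        long t1 = wal.append(new Edit("region-a", List.of("r1/cf:q=1")));
        long t2 = wal.append(new Edit("region-b", List.of("r9/cf:q=2")));
        wal.sync(t1); // also makes t2 durable, because it was already appended
        System.out.println(t1 + " " + t2 + " synced=" + wal.syncedTxid()); // 1 2 synced=2
    }
}
```

Separating a cheap append from a more expensive sync lets many appends share one disk flush.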

API callers

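Each HRegion holds a reference to a WAL instance (with the default WAL provider, all regions on a RegionServer share one WAL). A multi-row mutation reaches the WAL through the following call chain: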

HBase client
  => (Protobuf protocol) HRegionServer.execRegionServerService()
  => MultiRowMutationProtos.callMethod()
  => MultiRowMutationProtos.mutateRows()
  => MultiRowMutationEndpoint.mutateRows()
  => HRegion.processRowsWithLocks()
  => HRegion.doWALAppend(), which writes to the WAL.

HRegion.processRowsWithLocks() is the main driver of an HRegion update: it acquires the row locks, appends the edits to the WAL, and writes the data to the MemStore.
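The sketch below shows only the step order named above (lock rows, append to the WAL, write to the MemStore, release locks). It is a hypothetical model, not HRegion's code: RowUpdateSketch, processRows and its data structures are invented for illustration, and the way HBase coordinates WAL sync, MVCC and lock release is more involved and not modeled here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of the step order in an update driver such as
// HRegion.processRowsWithLocks(). Names and data structures are illustrative only.
class RowUpdateSketch {
    final Map<String, ReentrantLock> rowLocks = new ConcurrentHashMap<>();
    // Each WAL element is one batch (row -> value) appended as a single unit.
    final List<Map<String, String>> wal = new CopyOnWriteArrayList<>();
    final ConcurrentSkipListMap<String, String> memstore = new ConcurrentSkipListMap<>();

    void processRows(Map<String, String> mutations) {
        List<ReentrantLock> held = new ArrayList<>();
        try {
            // 1. Acquire row locks in sorted row order so that two callers
            //    locking overlapping rows cannot deadlock each other.
            for (String row : new TreeSet<>(mutations.keySet())) {
                ReentrantLock lock = rowLocks.computeIfAbsent(row, r -> new ReentrantLock());
                lock.lock();
                held.add(lock);
            }
            // 2. Append all of the mutations to the WAL as one batch.
            wal.add(Map.copyOf(mutations));
            // 3. Only after the WAL append, apply them to the MemStore.
            memstore.putAll(mutations);
        } finally {
            // 4. Release the locks in reverse acquisition order.
            for (int i = held.size() - 1; i >= 0; i--) {
                held.get(i).unlock();
            }
        }
    }

    public static void main(String[] args) {
        RowUpdateSketch region = new RowUpdateSketch();
        region.processRows(Map.of("row2", "b", "row1", "a"));
        System.out.println(region.memstore + " walBatches=" + region.wal.size()); // {row1=a, row2=b} walBatches=1
    }
}
```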

Atomicity

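To make a write that touches several columns of one row atomic, HBase places the updates to all of those columns (that is, all of the row's KeyValues) in a single WALEdit object. The WALEdit is appended to the log as one entry, so on replay the row's updates are recovered together or not at all.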

Accordingly, the most important member variable of WALEdit is its collection of KeyValues, that is, its cells.
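To show why grouping a row's cells into one log entry gives all-or-nothing recovery, here is a toy log format. It is an assumption for illustration only (HBase's real WAL format and reader differ): each edit is written as a header line holding its cell count followed by one line per cell, and replay discards a trailing edit that a crash cut short.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy log format, not HBase's on-disk format: each edit is a
// header line holding its cell count, followed by one line per cell.
class TornTailReplaySketch {
    static List<String> encode(List<List<String>> edits) {
        List<String> lines = new ArrayList<>();
        for (List<String> cells : edits) {
            lines.add(Integer.toString(cells.size())); // header: number of cells
            lines.addAll(cells);
        }
        return lines;
    }

    // Replays only complete edits. If a crash cut the last edit short, that
    // edit is dropped as a whole, so a row's cells are never half-applied.
    static List<List<String>> replay(List<String> lines) {
        List<List<String>> edits = new ArrayList<>();
        int i = 0;
        while (i < lines.size()) {
            int n = Integer.parseInt(lines.get(i));
            if (i + 1 + n > lines.size()) {
                break; // torn tail: incomplete edit
            }
            edits.add(List.copyOf(lines.subList(i + 1, i + 1 + n)));
            i += 1 + n;
        }
        return edits;
    }

    public static void main(String[] args) {
        List<String> log = encode(List.of(
                List.of("row1/cf:a=1", "row1/cf:b=2"),   // one row, two columns
                List.of("row2/cf:a=3", "row2/cf:b=4")));
        // Simulate a crash that lost the final line of the log.
        List<String> torn = log.subList(0, log.size() - 1);
        System.out.println(replay(torn)); // [[row1/cf:a=1, row1/cf:b=2]]
    }
}
```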

AbstractFSWAL provides generic support for file-system-based WAL implementations. Its method AbstractFSWAL.findRegionsToForceFlush() returns the regions that must be flushed before the oldest file of the current WAL instance can be archived.

Here, a flush means writing a region's data from the MemStore to the Store, that is, to HFiles on the file system.

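Once a region has been flushed, its data is stored in HFiles, so that region's WAL entries (the record of its data operations) are no longer needed. A WAL file can therefore be archived, and eventually deleted to free disk space, once every region with entries in that file has been flushed past them.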

AbstractFSWAL.findRegionsToForceFlush() finds the regions that still prevent the oldest WAL file from being archived, so that they can be flushed. It works in two steps.

1. Find the first (oldest) file in AbstractFSWAL.byWalRegionSequenceIds.

ConcurrentNavigableMap<Path, Map<byte[], Long>> byWalRegionSequenceIds records every file of the current WAL and, for each file, the regions that have entries in it: the region name (as a byte[]) and the sequence id of that region's last append in the file.

That is: Path (WAL file name) => { byte[] region name => Long sequence id }
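A minimal sketch of such a per-file index, under stated assumptions: WAL files are keyed by a hypothetical increasing file number instead of a Path, so the oldest file is simply the smallest key, and the class and method names (WalFileIndexSketch, recordAppend, oldest) are invented. It also shows a practical detail: Java byte[] uses identity equals/hashCode, so byte[] region names need a comparator-based map that compares contents.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Map;
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical sketch of a per-file index like byWalRegionSequenceIds.
// WAL files are keyed by an increasing file number instead of a Path.
class WalFileIndexSketch {
    // byte[] has identity-based equals/hashCode, so byte[] keys need a
    // comparator-based map that compares array contents.
    static final Comparator<byte[]> BYTES = (a, b) -> Arrays.compare(a, b);

    // file number -> (region name -> highest sequence id appended in that file)
    final ConcurrentNavigableMap<Long, Map<byte[], Long>> byWal = new ConcurrentSkipListMap<>();

    void recordAppend(long fileNum, byte[] region, long seqId) {
        byWal.computeIfAbsent(fileNum, f -> new ConcurrentSkipListMap<byte[], Long>(BYTES))
             .merge(region, seqId, Long::max); // keep the region's last (largest) id
    }

    // The oldest live WAL file is the entry with the smallest key.
    Map.Entry<Long, Map<byte[], Long>> oldest() {
        return byWal.firstEntry();
    }

    static byte[] name(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        WalFileIndexSketch idx = new WalFileIndexSketch();
        idx.recordAppend(1, name("regionA"), 1);
        idx.recordAppend(1, name("regionB"), 2);
        idx.recordAppend(1, name("regionA"), 3);
        idx.recordAppend(2, name("regionA"), 4);
        Map.Entry<Long, Map<byte[], Long>> first = idx.oldest();
        // A fresh byte[] with the same contents still finds the entry.
        System.out.println(first.getKey() + " " + first.getValue().get(name("regionA"))); // 1 3
    }
}
```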

2. Among the regions in that oldest file, find the ones that have not been flushed.

ConcurrentMap<byte[], ConcurrentMap<byte[], Long>> AbstractFSWAL.sequenceIdAccounting.lowestUnflushedSequenceIds maps each byte[] region name and byte[] column family name to the first (smallest) sequence id that has not been flushed, called the lowestUnflushedSequenceId. Each append is assigned a monotonically increasing sequence id, and every append whose sequence id is greater than or equal to the lowestUnflushedSequenceId has not been flushed yet.
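The bookkeeping behind such a map can be sketched as follows. This is a simplified model under assumptions, not SequenceIdAccounting itself: an append records a family's sequence id only if none is recorded yet (later ids are larger, so the first one stays the lowest), a flush is assumed to persist all families of a region at once, and the sentinel value -1 and all names are chosen for this sketch.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.Map;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical, simplified model of lowest-unflushed-sequence-id accounting.
class LowestUnflushedSketch {
    static final Comparator<byte[]> BYTES = (a, b) -> Arrays.compare(a, b);
    // Sentinel meaning "nothing unflushed"; chosen for this sketch.
    static final long NO_SEQNUM = -1;

    // region -> (family -> lowest unflushed sequence id)
    final ConcurrentMap<byte[], ConcurrentMap<byte[], Long>> lowest =
            new ConcurrentSkipListMap<>(BYTES);

    // On append: remember only the FIRST unflushed id per family; later
    // appends to the same family have larger ids and do not change it.
    void onAppend(byte[] region, byte[] family, long seqId) {
        lowest.computeIfAbsent(region, r -> new ConcurrentSkipListMap<byte[], Long>(BYTES))
              .putIfAbsent(family, seqId);
    }

    // Simplification: a flush persists all families of the region at once.
    void onFlush(byte[] region) {
        lowest.remove(region);
    }

    // A region's lowest unflushed id is the minimum across its families.
    long lowestUnflushed(byte[] region) {
        Map<byte[], Long> families = lowest.get(region);
        if (families == null || families.isEmpty()) {
            return NO_SEQNUM;
        }
        return Collections.min(families.values());
    }

    static byte[] b(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        LowestUnflushedSketch acct = new LowestUnflushedSketch();
        acct.onAppend(b("region1"), b("cf1"), 5);
        acct.onAppend(b("region1"), b("cf1"), 7); // cf1 stays at 5
        acct.onAppend(b("region1"), b("cf2"), 6);
        System.out.println(acct.lowestUnflushed(b("region1"))); // 5
        acct.onFlush(b("region1"));
        System.out.println(acct.lowestUnflushed(b("region1"))); // -1
    }
}
```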

Therefore, for each region with entries in the oldest WAL file found in step 1, compare the region's maximum sequence id in that file with its lowestUnflushedSequenceId (the smallest value across the region's column families). If the maximum sequence id is greater than or equal to the lowestUnflushedSequenceId, some of the region's entries in that file have not been flushed, and the region is included in the result of findRegionsToForceFlush().
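Putting the two steps together, a hedged sketch of the check might look like this. It is not HBase's implementation: it assumes String region names for readability (HBase uses byte[]), a TreeMap keyed by a hypothetical WAL file number, and a per-region lowest unflushed id that has already been reduced to the minimum over families. An empty result means the oldest file can be archived.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch of the two-step check, using String region names for
// readability (HBase uses byte[]). Not HBase's implementation.
class ForceFlushSketch {
    /**
     * @param byWal           WAL file number -> (region -> max sequence id in that file)
     * @param lowestUnflushed region -> lowest unflushed sequence id (min over families);
     *                        a region absent from this map has nothing unflushed
     * @return regions that must be flushed before the oldest WAL file can be archived
     */
    static List<String> findRegionsToForceFlush(NavigableMap<Long, Map<String, Long>> byWal,
                                                Map<String, Long> lowestUnflushed) {
        List<String> regions = new ArrayList<>();
        Map.Entry<Long, Map<String, Long>> oldest = byWal.firstEntry(); // step 1
        if (oldest == null) {
            return regions; // no WAL files at all
        }
        for (Map.Entry<String, Long> e : oldest.getValue().entrySet()) { // step 2
            Long lowest = lowestUnflushed.get(e.getKey());
            // Sequence ids >= lowest are unflushed, so if the file's max id for
            // this region is >= lowest, the file still holds unflushed edits.
            if (lowest != null && e.getValue() >= lowest) {
                regions.add(e.getKey());
            }
        }
        return regions;
    }

    public static void main(String[] args) {
        NavigableMap<Long, Map<String, Long>> byWal = new TreeMap<>();
        byWal.put(1L, new TreeMap<>(Map.of("regionA", 10L, "regionB", 12L)));
        byWal.put(2L, new TreeMap<>(Map.of("regionA", 20L)));
        // regionA has flushed every id below 15; regionB's edit 12 is still unflushed.
        Map<String, Long> lowest = Map.of("regionA", 15L, "regionB", 12L);
        System.out.println(findRegionsToForceFlush(byWal, lowest)); // [regionB]
    }
}
```

Note the boundary case in the example: regionB's maximum id in the file equals its lowest unflushed id, so that edit is not yet in an HFile and the region must be flushed.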
