HBase Write Ahead Log (WAL)
HBase write operations are first recorded in the WAL (HLog) and only then written to MemStore.
The former is a write-friendly format and the latter a query-friendly format. Appending to the WAL therefore has higher throughput and a higher write success rate, which improves system reliability. Its most basic function is to let unfinished data update operations be continued after a crash.
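The log-first ordering and the replay after a crash can be illustrated with a toy model. This is only a sketch and not HBase code: ToyRegion, put(), get(), and recover() are hypothetical names, a plain list stands in for the WAL, and a sorted map stands in for MemStore.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of write-ahead logging; not HBase code. */
class ToyRegion {
    /** Append-only list of {row, value} pairs; stands in for the WAL and survives a "crash". */
    private final List<String[]> log;
    /** Sorted in-memory map; stands in for MemStore and is lost on a "crash". */
    private final Map<String, String> memStore = new TreeMap<>();

    ToyRegion(List<String[]> log) {
        this.log = log;
    }

    /** Records the update in the log first, then applies it to the in-memory store. */
    void put(String row, String value) {
        log.add(new String[] {row, value});
        memStore.put(row, value);
    }

    String get(String row) {
        return memStore.get(row);
    }

    /** Rebuilds a region after a crash by replaying every logged update in order. */
    static ToyRegion recover(List<String[]> survivingLog) {
        ToyRegion region = new ToyRegion(survivingLog);
        for (String[] entry : survivingLog) {
            region.memStore.put(entry[0], entry[1]);
        }
        return region;
    }
}
```

In this model, dropping the ToyRegion object simulates the crash: the map is gone, but a new region built with recover() from the same list answers get() with the last value logged for each row.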
API
The WAL interface exposes the WAL API to callers.
The most commonly used method is append():
long append(HRegionInfo info, WALKey key, WALEdit edits, boolean inMemstore) throws IOException;
It appends a WALEdit, which holds a set of edits, to the WAL.
API caller
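Each HRegion holds a reference to a WAL instance (by default, the regions hosted on one region server share a single WAL). A write reaches it through the following call chain: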
HBase client == Protobuf protocol ==> HRegionServer.execRegionServerService() => MultiRowMutationProtos.callMethod() => MultiRowMutationProtos.mutateRows() => MultiRowMutationEndpoint.mutateRows() => HRegion.processRowsWithLocks() => HRegion.doWALAppend(), which writes to the WAL.
HRegion.processRowsWithLocks() is the main method of an HRegion update operation. It drives the process of acquiring the row locks, writing to the WAL, and writing the data to MemStore.
Atomicity
To make a write to multiple columns of one row atomic, HBase puts the update operations on all columns (that is, all KeyValues) of the row into the same WALEdit object.
So the most important member variable of WALEdit is a collection of KeyValues (that is, Cells).
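The idea of grouping all cells of one row update into a single log entry can be sketched as follows. ToyCell, ToyEdit, and ToyEditLog are hypothetical stand-ins written for this illustration; they are not the HBase Cell, WALEdit, or WAL classes, and the 1-based sequence id is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for a cell: one column value in one row. */
final class ToyCell {
    final String row;
    final String column;
    final String value;

    ToyCell(String row, String column, String value) {
        this.row = row;
        this.column = column;
        this.value = value;
    }
}

/** Hypothetical stand-in for WALEdit: all cells of one row update travel together. */
final class ToyEdit {
    private final List<ToyCell> cells = new ArrayList<>();

    /** Adds a cell and returns this edit so that calls can be chained. */
    ToyEdit add(ToyCell cell) {
        cells.add(cell);
        return this;
    }

    List<ToyCell> getCells() {
        return Collections.unmodifiableList(cells);
    }
}

/** Toy log that stores whole edits, so a multi-column update is one log entry. */
final class ToyEditLog {
    private final List<ToyEdit> entries = new ArrayList<>();

    /** Appends the edit as a single entry and returns its 1-based sequence id. */
    long append(ToyEdit edit) {
        entries.add(edit);
        return entries.size();
    }

    int entryCount() {
        return entries.size();
    }
}
```

Because the log only ever stores whole ToyEdit objects, a reader of the log sees either every column of the row update or none of them, which is the property the grouping is meant to provide.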
AbstractFSWAL provides generic support for file-system-based WAL implementations. AbstractFSWAL.findRegionsToForceFlush() returns the regions that must be flushed so that the oldest file of the current WAL instance is completely flushed.
Here, a flush means writing a region's data from MemStore to the Store.
Once a region has been flushed, its data is stored in HFiles. The region's WAL entries (the record of its data operations) are then no longer needed and can be deleted to free up disk space.
AbstractFSWAL.findRegionsToForceFlush() is used to find the regions whose flush would allow the related WAL files to be deleted.
1. Find the first file in AbstractFSWAL.byWalRegionSequenceIds.
ConcurrentNavigableMap<Path, Map<byte[], Long>> byWalRegionSequenceIds maintains all files of the current WAL and, for each file, the regions involved in it (the byte[] name of the region and the sequence id of the last append operation for that region).
That is, Path (WAL file name) => (byte[] region name, Long sequence id).
2. Among the regions in that first file, find all those that have not been flushed.
ConcurrentMap<byte[], ConcurrentMap<byte[], Long>> AbstractFSWAL.sequenceIdAccounting.lowestUnflushedSequenceIds maintains the mapping from byte[] region name + byte[] family name to the first (smallest) sequence id that has not been flushed, called lowestUnflushedSequenceId. Each append operation corresponds to an auto-incrementing sequence id, so every append operation whose sequence id is greater than or equal to lowestUnflushedSequenceId has not been flushed.
Therefore, for each region involved in the first WAL file found in step 1, take the maximum sequence id of that region in the file. If this maximum sequence id is greater than or equal to the lowestUnflushedSequenceId of the region, the region still has WAL entries in that file that have not been flushed, and the region is included in the findRegionsToForceFlush() result.
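The two steps can be sketched in a simplified model. This is not the HBase implementation: ForceFlushSketch is a hypothetical class, String replaces Path and byte[] so that the maps can be used without a custom comparator, the per-family map is collapsed into one value per region, and any other conditions the real method checks are left out. A region counts as unflushed when the file's highest sequence id for it is at or above its lowest unflushed sequence id, following the definition of lowestUnflushedSequenceId.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

/** Simplified model of the two-step lookup; String replaces Path and byte[]. */
final class ForceFlushSketch {
    /** WAL file name => (region name => highest sequence id appended to that file). */
    final ConcurrentNavigableMap<String, Map<String, Long>> byWalRegionSequenceIds =
        new ConcurrentSkipListMap<>();

    /** Region name => lowest sequence id not yet flushed (families collapsed). */
    final Map<String, Long> lowestUnflushedSequenceIds = new HashMap<>();

    List<String> findRegionsToForceFlush() {
        List<String> toFlush = new ArrayList<>();
        // Step 1: take the first file; here the oldest file has the smallest name.
        Map.Entry<String, Map<String, Long>> oldest = byWalRegionSequenceIds.firstEntry();
        if (oldest == null) {
            return toFlush;
        }
        // Step 2: keep the regions that still have unflushed edits in that file.
        for (Map.Entry<String, Long> e : oldest.getValue().entrySet()) {
            Long lowest = lowestUnflushedSequenceIds.get(e.getKey());
            if (lowest != null && lowest <= e.getValue()) {
                toFlush.add(e.getKey());
            }
        }
        return toFlush;
    }
}
```

For example, if the oldest file holds regionA up to sequence id 10 and regionB up to 12, while the lowest unflushed ids are 11 for regionA and 12 for regionB, only regionB is returned: everything regionA wrote to that file has already been flushed.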