Region Recovery Logic


Table of Contents

Region Recovery Logic
Configuration Parameters

Region Recovery Logic

After a RegionServer goes down, the regions it hosted are reassigned by the master. Because some of the MemStore data in those regions may not have been flushed before the outage, that data must be restored. The restore process works by reading the HLog (WAL) files.

As of version 1.0, HBase provides two region recovery strategies, based on log splitting and log replay respectively. Log replay is a newer strategy introduced in the 0.98 release; compared with log splitting it has the following advantages (see HBASE-7006):

(1) The step of creating and then reading recovered.edits files is skipped;

(2) Write operations can still be performed on the region during recovery.

Therefore, this article focuses on the log replay strategy.

The HMaster learns of RegionServer outage events by watching the ZooKeeper /hbase/rs node and triggers the corresponding callback; the processing logic is encapsulated in the ServerShutdownHandler class, as follows:

  1. First, look up the meta table to find which regions were deployed on the failed RegionServer, and mark those regions as being in the recovering state

    For each region to be recovered, a corresponding /hbase/recovering-regions/[region]/[failed-regionserver] node is created in ZooKeeper to store the sequence ID of the region's last flush. This is done on the master side by the MasterFileSystem prepareLogReplay method. Because a RegionServer reports to the master every 3 seconds by default (controlled by the hbase.regionserver.msginterval parameter), the sequence ID can be obtained from the content of those reports.
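
    As a rough, illustrative sketch of this bookkeeping (not the actual MasterFileSystem code; the class and method names below are hypothetical), the znodes could be created like this with a plain ZooKeeper client:

      import org.apache.zookeeper.CreateMode;
      import org.apache.zookeeper.ZooDefs.Ids;
      import org.apache.zookeeper.ZooKeeper;

      // Hypothetical sketch: one znode per (recovering region, failed server)
      // pair, whose content is the last flushed sequence ID for that region.
      public class RecoveringRegionZNodes {
          public static void markRegionRecovering(ZooKeeper zk,
                                                  String encodedRegionName,
                                                  String failedServerName,
                                                  long lastFlushedSeqId) throws Exception {
              String regionPath = "/hbase/recovering-regions/" + encodedRegionName;
              String serverPath = regionPath + "/" + failedServerName;
              // Region-level node first (it may survive from an earlier failure).
              if (zk.exists(regionPath, false) == null) {
                  zk.create(regionPath, new byte[0],
                            Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
              }
              // The failed-server child stores the last flushed sequence ID.
              zk.create(serverPath,
                        String.valueOf(lastFlushedSeqId).getBytes("UTF-8"),
                        Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
          }
      }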

  2. Reassign the regions that were deployed on the failed RegionServer

    The allocation process is still performed on the master side, through the AssignmentManager's assign method, which takes the list of regions to be reassigned.

  3. Submit a LogReplayHandler to group the HLog files on the failed RegionServer by region and perform a log replay action for each group

    For each HLog to be split, the master generates a corresponding SplitLogTask and creates a /hbase/splitWAL/[hlog] node in ZooKeeper to store it: the node name is the HLog's storage path, and the node content is the serialized SplitLogTask object.

    Although the SplitLogTask is generated on the master side, it is executed on the RegionServer side; the two sides are coordinated through ZooKeeper. Whenever a /hbase/splitWAL/[hlog] node is created, ZooKeeper notifies all RegionServer nodes to compete for the task. The preemption logic is encapsulated in a SplitLogWorker thread, with the following specifics:

    First, the data of the target znode is read to obtain its version and the SplitLogTask object, and the SplitLogTask is checked for the unassigned state. If it is not unassigned, the task has already been claimed by another RegionServer. Otherwise, the state of the SplitLogTask is changed to owned, and the node data is rewritten with ZooKeeper's setData(path, data, version) method. If setData finds during execution that the node's current version no longer matches the version read earlier, the task has been claimed by another RegionServer in the meantime, and processing is abandoned. The RegionServer that wins the task then starts a WALSplitterHandler thread to split the target HLog.
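
    A minimal sketch of this version-based preemption, assuming a plain ZooKeeper client; the string task-state encoding here is a simplification of the real SplitLogTask serialization:

      import org.apache.zookeeper.KeeperException;
      import org.apache.zookeeper.ZooKeeper;
      import org.apache.zookeeper.data.Stat;

      public class SplitTaskClaim {
          /** Try to claim a split-task znode; returns true if this server won it. */
          public static boolean tryClaim(ZooKeeper zk, String taskPath,
                                         String myServerName) throws Exception {
              Stat stat = new Stat();
              byte[] data = zk.getData(taskPath, false, stat);
              String state = new String(data, "UTF-8");
              // Only an unassigned task may be claimed.
              if (!state.startsWith("UNASSIGNED")) {
                  return false; // already grabbed by another RegionServer
              }
              byte[] owned = ("OWNED " + myServerName).getBytes("UTF-8");
              try {
                  // setData with the expected version is the optimistic lock:
                  // it fails if anyone modified the node after our read.
                  zk.setData(taskPath, owned, stat.getVersion());
                  return true;
              } catch (KeeperException.BadVersionException e) {
                  return false; // lost the race to another RegionServer
              }
          }
      }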

    The WALSplitter thread is implemented on the producer-consumer model. It encapsulates a buffers production queue that stores the HLog.Entry records to be recovered, and provides a splitLogFile production method that adds to the buffers collection those log records in the target HLog that meet the following requirement:

    HLogKey.logSeqNum > seqid of the last flush performed on the region

    Which region an HLogKey belongs to can be determined from its encodedRegionName attribute, and the seqid of the last flush is recorded in the ZooKeeper /hbase/recovering-regions/[region]/[failed-regionserver] node (created in step 1).
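
    The filter amounts to a simple predicate; here is a hedged sketch (the seqid map and the method name are illustrative, not actual WALSplitter fields):

      import java.util.Map;

      public class ReplayFilter {
          /**
           * Decide whether a WAL entry still needs to be replayed.
           * lastFlushedSeqIds maps encoded region name -> seqid of the region's
           * last flush, as read back from /hbase/recovering-regions in ZooKeeper.
           */
          public static boolean needsReplay(String encodedRegionName,
                                            long logSeqNum,
                                            Map<String, Long> lastFlushedSeqIds) {
              Long flushed = lastFlushedSeqIds.get(encodedRegionName);
              if (flushed == null) {
                  return true; // no flush record: conservatively replay the edit
              }
              // Only edits written after the last flush were lost with the MemStore.
              return logSeqNum > flushed;
          }
      }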

    Once the buffers collection contains data, the WALSplitterHandler thread starts 3 threads by default to consume it (controlled by the hbase.regionserver.hlog.splitlog.writer.threads parameter); each sub-thread plays the consumer role and is encapsulated by WriterThread.

    The buffers collection is organized with the following data structure:

    Map<regionName, RegionEntryBuffer>
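
    An illustrative sketch of this structure, together with the largest-buffer selection that the consumer threads perform (described next); all names here are hypothetical stand-ins for the real WALSplitter internals:

      import java.util.LinkedList;
      import java.util.List;
      import java.util.Map;
      import java.util.concurrent.ConcurrentHashMap;

      // Hypothetical sketch: one entry buffer per region, keyed by region name.
      public class EntryBuffersSketch {
          static class RegionEntryBuffer {
              final List<Object> entries = new LinkedList<>(); // HLog.Entry records
              long heapSize; // approximate bytes buffered for this region
          }

          final Map<String, RegionEntryBuffer> buffers = new ConcurrentHashMap<>();

          /** Pick the region buffer holding the most data for the next consumer. */
          RegionEntryBuffer takeLargestBuffer() {
              RegionEntryBuffer largest = null;
              for (RegionEntryBuffer b : buffers.values()) {
                  if (largest == null || b.heapSize > largest.heapSize) {
                      largest = b;
                  }
              }
              return largest;
          }
      }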

    During consumption, the RegionEntryBuffer holding the largest amount of data is picked from the collection and passed to LogReplayOutputSink for processing (by calling its append method); the processing logic is roughly as follows:

      • Append the log records from the RegionEntryBuffer to the serverToBufferQueueMap collection

        The storage structure of serverToBufferQueueMap is roughly: Map<serverName#tableName, Queue<Row>>

        The key locates the target table on the target RegionServer, and the value is the log data on which the log replay operation will be performed against that table.

      • Select the entry with the most Rows from the serverToBufferQueueMap collection and make the following judgments:

        (1) whether the number of Rows is greater than the hbase.regionserver.wal.logreplay.batch.size parameter value;

        (2) whether the total data volume of all Rows is greater than hbase.regionserver.hlog.splitlog.buffersize * 0.35.

        If either condition is met, the batch is processed; otherwise the data stays cached in the serverToBufferQueueMap collection until the total amount of data reaches a sufficient scale (a sketch of this check follows the list).

      • Perform the log replay operation on the data that passed the check in the previous step

        The replay method of the remote RSRpcServices service is invoked via an RPC request, shipping over the log data to be synchronized so that the data can be recovered.
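
    A hedged sketch of the batching check described above (the thresholds come from the configuration parameters listed below; the helper itself is illustrative):

      import java.util.Queue;

      public class ReplayBatcher {
          // Defaults taken from the configuration parameters described below.
          static final int BATCH_SIZE = 64;                   // hbase.regionserver.wal.logreplay.batch.size
          static final long BUFFER_SIZE = 128L * 1024 * 1024; // hbase.regionserver.hlog.splitlog.buffersize

          /** Replay the queued edits only once the batch is large enough. */
          static boolean shouldReplayNow(Queue<?> rows, long totalBytes) {
              // (1) enough rows queued, or (2) enough bytes buffered (fixed 35% ratio)
              return rows.size() > BATCH_SIZE || totalBytes > BUFFER_SIZE * 0.35;
          }
      }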

Configuration Parameters
    1. hbase.master.distributed.log.replay

      Whether the log replay strategy is enabled. Prerequisite for enabling it: the hfile.format.version property value must be no less than 3.

    2. hbase.hlog.split.skip.errors

      The default value is false, meaning that if a problem occurs while reading an HLog, the exception is logged and further processing is abandoned.

      If the property is set to true, the error message is logged first, the problematic HLog file is then moved to the /hbase/.corrupt directory, and processing continues with the next file.

    3. hbase.splitlog.report.interval.loglines

      The default value is 1024, meaning that progress is reported once for every 1024 HLog records processed.

    4. hbase.regionserver.hlog.splitlog.buffersize

      The default value is 128M, meaning that the total size of the logs handled by each log replay operation should be greater than 128M * 0.35 (a fixed ratio), unless the hbase.regionserver.wal.logreplay.batch.size condition is satisfied instead.

    5. hbase.regionserver.wal.logreplay.batch.size

      The default value is 64, meaning that each log replay operation should contain at least 64 log records, unless the hbase.regionserver.hlog.splitlog.buffersize condition is satisfied instead.

    6. hbase.regionserver.hlog.splitlog.writer.threads

      Controls the number of WALSplitter WriterThread consumer threads; the default is 3 (see above).
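
For reference, a minimal sketch of setting these parameters through the standard HBase client Configuration API; in practice they would normally be set in hbase-site.xml, and the values shown here are just the defaults and examples discussed above:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class LogReplayConfig {
        public static Configuration configure() {
            Configuration conf = HBaseConfiguration.create();
            // Enable the log replay strategy (requires hfile.format.version >= 3).
            conf.setBoolean("hbase.master.distributed.log.replay", true);
            conf.setInt("hfile.format.version", 3);
            // Move corrupt HLogs to /hbase/.corrupt instead of aborting the split.
            conf.setBoolean("hbase.hlog.split.skip.errors", true);
            // Batching thresholds and consumer thread count discussed above.
            conf.setInt("hbase.regionserver.wal.logreplay.batch.size", 64);
            conf.setInt("hbase.regionserver.hlog.splitlog.buffersize", 128 * 1024 * 1024);
            conf.setInt("hbase.regionserver.hlog.splitlog.writer.threads", 3);
            return conf;
        }
    }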
