LevelDB Source Analysis: Recover and Repair

Source: Internet
Author: User

As a key-value storage engine, LevelDB persists its data to disk. Any storage engine can run into a process crash, or even corrupted data files, in the middle of a write, after which the program may not start normally. Salvaging as much data as possible in that situation is an important job, and LevelDB provides mechanisms for it.

Let's start with Recover, which is called every time the database is opened. Its job is to handle the case where a running database goes down unexpectedly, losing LevelDB's current in-memory state along with any data in the memtable (and possibly the immutable memtable) that had not yet been persisted to an sstable. LevelDB uses a write-ahead log (WAL), and its current state is stored in the manifest file. Reading that persisted state and combining it with the information in the WAL restores the latest state. Recovering the memtable and immutable memtable data mainly means reading the entries that never reached an sstable back from the log, into a memtable or an sstable. The overall recovery process is as follows:

Status DBImpl::Recover(VersionEdit* edit) {
    // Declarations and most error checks are omitted in this excerpt.
    env_->CreateDir(dbname_);
    Status s = env_->LockFile(LockFileName(dbname_), &db_lock_);
    if (!env_->FileExists(CurrentFileName(dbname_))) {
        if (options_.create_if_missing) {
            s = NewDB();  // generate a new manifest and CURRENT file
            if (!s.ok()) return s;
        }
    }
    s = versions_->Recover();  // restore the current version information
    s = env_->GetChildren(dbname_, &filenames);  // list the files in the database directory
    versions_->AddLiveFiles(&expected);  // file numbers the recovered versions refer to
    for (size_t i = 0; i < filenames.size(); i++) {
        if (ParseFileName(filenames[i], &number, &type)) {
            expected.erase(number);  // this file exists, so it is not missing
            if (type == kLogFile && ((number >= min_log) || (number == prev_log)))
                logs.push_back(number);  // collect the log files that must be replayed
        }
    }
    if (!expected.empty()) {  // a file referenced by the manifest is missing
        return Status::Corruption(buf);
    }
    std::sort(logs.begin(), logs.end());  // replay the logs in ascending file-number order
    for (size_t i = 0; i < logs.size(); i++) {
        s = RecoverLogFile(logs[i], edit, &max_sequence);  // redo the logged operations
        versions_->MarkFileNumberUsed(logs[i]);
    }
    if (s.ok()) {
        if (versions_->LastSequence() < max_sequence) {
            versions_->SetLastSequence(max_sequence);
        }
    }
    return s;
}
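
The log-selection step in Recover can be looked at in isolation. The following is a minimal, self-contained sketch of that idea, not LevelDB's own code: ParseLogNumber is a hypothetical stand-in for ParseFileName that only recognizes names of the form "<digits>.log", and SelectLogsToReplay is a name invented here for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for ParseFileName: accepts only "<digits>.log".
// Overflow of very long digit strings is not handled in this sketch.
bool ParseLogNumber(const std::string& name, uint64_t* number) {
  const std::string suffix = ".log";
  if (name.size() <= suffix.size()) return false;
  const size_t digits = name.size() - suffix.size();
  if (name.compare(digits, suffix.size(), suffix) != 0) return false;
  uint64_t value = 0;
  for (size_t i = 0; i < digits; i++) {
    if (name[i] < '0' || name[i] > '9') return false;
    value = value * 10 + static_cast<uint64_t>(name[i] - '0');
  }
  *number = value;
  return true;
}

// Keeps the logs that still need replaying: those numbered at least min_log,
// plus the previous log if it is still around. Returns them in ascending order.
std::vector<uint64_t> SelectLogsToReplay(
    const std::vector<std::string>& filenames, uint64_t min_log,
    uint64_t prev_log) {
  std::vector<uint64_t> logs;
  for (const std::string& name : filenames) {
    uint64_t number = 0;
    if (ParseLogNumber(name, &number) &&
        (number >= min_log || number == prev_log)) {
      logs.push_back(number);
    }
  }
  std::sort(logs.begin(), logs.end());
  assert(std::is_sorted(logs.begin(), logs.end()));
  return logs;
}
```

Sorting matters because later logs may overwrite keys written by earlier ones, so replaying out of order could resurrect stale values.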

Let's first take a closer look at how version information is recovered from the manifest file:

Status VersionSet::Recover() {
    // Read the "CURRENT" file, which contains a pointer to the current manifest file.
    Status s = ReadFileToString(env_, CurrentFileName(dbname_), &current);
    // Build the file name of the current manifest
    // (stripping the trailing newline of CURRENT is omitted here).
    std::string dscname = dbname_ + "/" + current;
    SequentialFile* file;
    s = env_->NewSequentialFile(dscname, &file);
    if (!s.ok()) {
        return s;
    }
    // Various initializations (builder, reader, have_* flags) are omitted.
    {
        // Read each VersionEdit in turn and apply it, recording the
        // corresponding log number, next file number, and so on.
        while (reader.ReadRecord(&record, &scratch) && s.ok()) {
            s = edit.DecodeFrom(record);
            if (s.ok()) {
                if (edit.has_comparator_ &&
                    edit.comparator_ != icmp_.user_comparator()->Name()) {
                    // Comparator name mismatch; the handling is omitted in this excerpt.
                }
            }
            if (s.ok()) {
                builder.Apply(&edit);
            }
            if (edit.has_log_number_) {
                log_number = edit.log_number_;
                have_log_number = true;
            }
            if (edit.has_prev_log_number_) {
                prev_log_number = edit.prev_log_number_;
                have_prev_log_number = true;
            }
            if (edit.has_next_file_number_) {
                next_file = edit.next_file_number_;
                have_next_file = true;
            }
            if (edit.has_last_sequence_) {
                last_sequence = edit.last_sequence_;
                have_last_sequence = true;
            }
        }
    }
    delete file;
    file = NULL;
    if (s.ok()) {
        // Checks on the recovered file numbers and sequence number are omitted.
        MarkFileNumberUsed(prev_log_number);
        MarkFileNumberUsed(log_number);
    }
    if (s.ok()) {
        Version* v = new Version(this);
        builder.SaveTo(v);  // materialize a full version from the accumulated edits
        Finalize(v);        // compute compaction-related data
        AppendVersion(v);   // add the version to the version list
        // Then maintain the current version information based on what was recovered.
        manifest_file_number_ = next_file;
        next_file_number_ = next_file + 1;
        last_sequence_ = last_sequence;
        log_number_ = log_number;
        prev_log_number_ = prev_log_number;
    }
    return s;
}
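
The core of this function is a fold: each edit in the manifest is applied in order, and for the scalar fields the last edit that sets a value wins. The sketch below illustrates that pattern under simplifying assumptions. Edit, RecoveredState, and ReplayEdits are hypothetical names, files are tracked as a flat set rather than per level, and a manifest that never sets one of the counters is simply rejected.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Simplified stand-in for VersionEdit: optional counters plus file changes.
struct Edit {
  bool has_log_number = false;
  uint64_t log_number = 0;
  bool has_next_file = false;
  uint64_t next_file = 0;
  bool has_last_sequence = false;
  uint64_t last_sequence = 0;
  std::vector<uint64_t> deleted_files;
  std::vector<uint64_t> new_files;
};

// State rebuilt from the edits (a flat file set instead of per-level lists).
struct RecoveredState {
  uint64_t log_number = 0;
  uint64_t next_file = 0;
  uint64_t last_sequence = 0;
  std::set<uint64_t> live_files;
};

// Applies the edits in order. Returns false, leaving *out untouched, if any
// of the three counters was never set by an edit.
bool ReplayEdits(const std::vector<Edit>& edits, RecoveredState* out) {
  assert(out != nullptr);
  RecoveredState state;
  bool have_log = false, have_next = false, have_seq = false;
  for (const Edit& e : edits) {
    // Deletions first, then additions, so one edit can replace a file.
    for (uint64_t f : e.deleted_files) state.live_files.erase(f);
    for (uint64_t f : e.new_files) state.live_files.insert(f);
    if (e.has_log_number) {
      state.log_number = e.log_number;
      have_log = true;
    }
    if (e.has_next_file) {
      state.next_file = e.next_file;
      have_next = true;
    }
    if (e.has_last_sequence) {
      state.last_sequence = e.last_sequence;
      have_seq = true;
    }
  }
  if (!(have_log && have_next && have_seq)) return false;
  *out = state;
  return true;
}
```

Because the state is only published at the end, a failed replay never leaves the caller with a half-applied version.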

Once the basic information of the current version has been restored, replaying the log recovers the data that was in memory. The redo pass reads the operations recorded in the log and applies them to LevelDB again.

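Status DBImpl::RecoverLogFile(uint64_t log_number, VersionEdit* edit, SequenceNumber* max_sequence) {
    // Setup of reader, record, scratch, batch, and mem is omitted in this excerpt.
    while (reader.ReadRecord(&record, &scratch) && status.ok()) {
        if (record.size() < 12) {
            continue;  // too small to hold a WriteBatch header; skip the record
        }
        WriteBatchInternal::SetContents(&batch, record);
        status = WriteBatchInternal::InsertInto(&batch, mem);  // apply the batch to the memtable
        // last_seq is the sequence number of the last entry in this batch (computation omitted).
        if (last_seq > *max_sequence) {
            *max_sequence = last_seq;
        }
        if (mem->ApproximateMemoryUsage() > options_.write_buffer_size) {
            status = WriteLevel0Table(mem, edit, NULL);  // memtable is full: write it out as an sstable
        }
    }
    if (status.ok() && mem != NULL) {
        status = WriteLevel0Table(mem, edit, NULL);  // write the remaining entries out as an sstable
    }
    return status;
}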
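
The replay loop follows a simple pattern: apply each record to an in-memory table, track the highest sequence number seen, flush when the table grows past the write buffer size, and flush whatever is left at the end. Here is a toy sketch of that pattern. LogRecord, MemTable, ReplayResult, and ReplayLog are hypothetical names, a std::map stands in for the memtable, and a vector of maps stands in for the level-0 sstables.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

struct LogRecord {
  uint64_t sequence;
  std::string key;
  std::string value;
};

typedef std::map<std::string, std::string> MemTable;  // toy memtable

struct ReplayResult {
  std::vector<MemTable> flushed_tables;  // stands in for level-0 sstables
  uint64_t max_sequence = 0;
};

// Replays records in order, flushing whenever the bytes written since the
// last flush exceed write_buffer_size, and once more at the end.
ReplayResult ReplayLog(const std::vector<LogRecord>& records,
                       size_t write_buffer_size) {
  assert(write_buffer_size > 0);
  ReplayResult result;
  MemTable mem;
  size_t usage = 0;  // rough usage: bytes of keys and values inserted
  for (const LogRecord& r : records) {
    mem[r.key] = r.value;  // a later record for the same key overwrites
    usage += r.key.size() + r.value.size();
    if (r.sequence > result.max_sequence) result.max_sequence = r.sequence;
    if (usage > write_buffer_size) {
      result.flushed_tables.push_back(mem);
      mem.clear();
      usage = 0;
    }
  }
  if (!mem.empty()) result.flushed_tables.push_back(mem);
  return result;
}
```

The final flush is what guarantees that nothing recovered from the log lives only in memory once the replay returns.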

Recover is a relatively lightweight form of recovery in LevelDB. It mainly restores data and state that had not been persisted to disk, and it runs on every startup. But what if the database is damaged more severely, for example when an sstable file is corrupted or lost, or the manifest file is missing? That is when Repair is needed. It has to be invoked manually when LevelDB cannot start normally.

Repair consists mainly of the following steps:

FindFiles();                // Scan the data directory and classify file names into manifest, log, and sstable files
ConvertLogFilesToTables();  // Parse each log file and turn it into an sstable, mainly via ConvertLogToTable
ExtractMetaData();          // Scan every sstable found and rebuild its file metadata for later use; if part of a table cannot be parsed, drop that part and generate a new sstable
WriteDescriptor();          // Reset the manifest file number to 1, write a fresh manifest, and record it in the CURRENT file

ConvertLogFilesToTables mainly relies on ConvertLogToTable, a function similar to RecoverLogFile. Their main difference is that RecoverLogFile works with LevelDB's global memtable and immutable memtable when replaying the log, so some of the recovered data may be persisted to sstables while the rest stays in the in-memory memtable. ConvertLogToTable instead creates a new local memtable, replays into it, and then persists the data to an sstable, so everything it recovers from the log ends up in an sstable.

Next we focus on ScanTable, which ExtractMetaData calls on each sstable file found by the scan:

void ScanTable(uint64_t number) {
    // Declarations (t, counter, empty, parsed) are omitted in this excerpt.
    std::string fname = TableFileName(dbname_, number);  // build the .ldb file name
    Status status = env_->GetFileSize(fname, &t.meta.file_size);  // get the file size
    if (!status.ok()) {
        // Omitted: retry with the .sst file name, in the same way as above.
    }
    if (!status.ok()) {
        // The file size is unavailable, so the file cannot be recovered: move it into "lost".
        ArchiveFile(TableFileName(dbname_, number));
        ArchiveFile(SSTTableFileName(dbname_, number));
        return;
    }
    // Scan through the table to extract its metadata.
    Iterator* iter = NewTableIterator(t.meta);
    for (iter->SeekToFirst(); iter->Valid(); iter->Next()) {
        Slice key = iter->key();
        if (!ParseInternalKey(key, &parsed)) {
            continue;  // skip keys that cannot be parsed
        }
        counter++;
        if (empty) {
            // The first parsed key is the smallest key in the file.
            empty = false;
            t.meta.smallest.DecodeFrom(key);
        }
        // Overwrite the largest key on every step, so the last key scanned is the one kept.
        t.meta.largest.DecodeFrom(key);
        if (parsed.sequence > t.max_sequence) {
            t.max_sequence = parsed.sequence;
        }
    }
    if (!iter->status().ok()) {
        status = iter->status();  // remember any error hit during the scan
    }
    delete iter;
    // If the scan finished without error, store the metadata directly. Otherwise
    // walk the sstable again to generate a new sstable and retire the old one.
    if (status.ok()) {
        tables_.push_back(t);
    } else {
        RepairTable(fname, t);  // RepairTable archives the input file.
    }
}
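
The metadata extraction in the loop can be sketched on its own. The example below assumes a simplified internal key layout, the user key followed by an 8-byte little-endian trailer holding (sequence << 8) | type, where only type tags 0 (deletion) and 1 (value) are accepted. MakeInternalKey, ParseSequence, ScanKeys, and TableSummary are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct TableSummary {
  bool empty = true;
  std::string smallest;  // first parseable key in table order
  std::string largest;   // last parseable key in table order
  uint64_t max_sequence = 0;
  int parsed_entries = 0;
  int skipped_entries = 0;
};

// Builds user_key + 8-byte little-endian trailer of (sequence << 8) | type.
std::string MakeInternalKey(const std::string& user_key, uint64_t sequence,
                            unsigned char type) {
  assert(sequence < (1ull << 56));  // the sequence must fit in 56 bits
  const uint64_t packed = (sequence << 8) | type;
  std::string out = user_key;
  for (int i = 0; i < 8; i++) {
    out.push_back(static_cast<char>((packed >> (8 * i)) & 0xff));
  }
  return out;
}

// Extracts the sequence number; fails on short keys or unknown type tags.
bool ParseSequence(const std::string& internal_key, uint64_t* sequence) {
  if (internal_key.size() < 8) return false;
  const size_t base = internal_key.size() - 8;
  uint64_t packed = 0;
  for (int i = 0; i < 8; i++) {
    const unsigned char byte = static_cast<unsigned char>(internal_key[base + i]);
    packed |= static_cast<uint64_t>(byte) << (8 * i);
  }
  if ((packed & 0xff) > 1) return false;  // only tags 0 and 1 are accepted
  *sequence = packed >> 8;
  return true;
}

// Walks keys in table order, skipping unparseable ones, as ScanTable does.
TableSummary ScanKeys(const std::vector<std::string>& keys_in_table_order) {
  TableSummary t;
  for (const std::string& key : keys_in_table_order) {
    uint64_t sequence = 0;
    if (!ParseSequence(key, &sequence)) {
      t.skipped_entries++;
      continue;
    }
    t.parsed_entries++;
    if (t.empty) {
      t.empty = false;
      t.smallest = key;
    }
    t.largest = key;  // overwritten each time, so the last key wins
    if (sequence > t.max_sequence) t.max_sequence = sequence;
  }
  return t;
}
```

No comparisons between keys are needed, because the table iterator already yields them in sorted order.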

If the data is corrupted, the sstable has to be rebuilt, which is the following process:

void RepairTable(const std::string& src, TableInfo t) {
    // Walk the old sstable, copy every readable entry into a new sstable,
    // and retire the old file.
    // Create the new file.
    std::string copy = TableFileName(dbname_, next_file_number_++);
    WritableFile* file;
    Status s = env_->NewWritableFile(copy, &file);
    if (!s.ok()) {
        return;
    }
    TableBuilder* builder = new TableBuilder(options_, file);
    // Copy data.
    Iterator* iter = NewTableIterator(t.meta);
    int counter = 0;
    for (iter->SeekToFirst(); iter->Valid(); iter->Next()) {
        builder->Add(iter->key(), iter->value());
        counter++;
    }
    delete iter;
    ArchiveFile(src);  // move the old file into "lost"
    if (counter == 0) {
        builder->Abandon();  // nothing to save
    } else {
        s = builder->Finish();
        if (s.ok()) {
            t.meta.file_size = builder->FileSize();
        }
    }
    delete builder;
    builder = NULL;
    if (s.ok()) {
        s = file->Close();
    }
    delete file;
    file = NULL;
    if (counter > 0 && s.ok()) {
        // Give the rebuilt table the original file name.
        std::string orig = TableFileName(dbname_, t.meta.number);
        s = env_->RenameFile(copy, orig);
        if (s.ok()) {
            tables_.push_back(t);
        }
    }
    if (!s.ok()) {
        env_->DeleteFile(copy);
    }
}
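
The shape of this routine (write a copy under a new name, archive the source, rename the copy into place only on success, delete the copy otherwise) can be tried out with ordinary files. The sketch below is an illustration under loud assumptions: the "table" is a text file with one key<TAB>value entry per line, a line without a tab counts as unreadable, archiving is a rename to src + ".lost" rather than a move into a "lost" directory, and RebuildTableFile is a name invented here.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <string>

// Copies the readable entries of src into tmp, archives src, then renames
// tmp to src. Returns the number of entries kept, 0 if nothing was worth
// keeping, or -1 on an I/O failure.
int RebuildTableFile(const std::string& src, const std::string& tmp) {
  assert(src != tmp);
  int kept = 0;
  bool write_ok = false;
  {
    std::ifstream in(src.c_str());
    std::ofstream out(tmp.c_str(), std::ios::trunc);
    if (!in || !out) {
      out.close();
      std::remove(tmp.c_str());
      return -1;
    }
    std::string line;
    while (std::getline(in, line)) {
      if (line.find('\t') == std::string::npos) continue;  // unreadable entry
      out << line << '\n';
      kept++;
    }
    out.flush();
    write_ok = out.good();
  }  // both streams are closed here, before any rename

  // Archive the damaged source whether or not the rebuild succeeded.
  const std::string archived = src + ".lost";
  std::rename(src.c_str(), archived.c_str());

  if (kept > 0 && write_ok && std::rename(tmp.c_str(), src.c_str()) == 0) {
    return kept;  // the rebuilt file now carries the original name
  }
  std::remove(tmp.c_str());  // nothing to keep, or a step failed
  return (kept == 0 && write_ok) ? 0 : -1;
}
```

Writing to a separate name first means a crash in the middle of the copy never leaves a half-written file under the original name.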

Working through Recover and Repair shows how important the maintenance and management of versions is. I had not originally planned to analyze that part, but a careful walkthrough now looks necessary, so the next article is expected to cover it.
