LevelDB Source Analysis: Write

Source: Internet
Author: User
Tags: compact

Write

LevelDB provides two interfaces for insert operations, Write and Put. Put is implemented by calling Write, so only the Write function is analyzed here:

Status DBImpl::Write(const WriteOptions &options, WriteBatch *my_batch)
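Before walking through Write, it may help to see why analyzing it also covers Put. The sketch below uses toy stand-ins: SimpleBatch and SimpleDB are hypothetical names, not LevelDB types, and the in-memory map is an assumption made only for illustration. The shape follows what Put does in LevelDB: build a one-entry batch and hand it to Write.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for WriteBatch: an ordered list of key/value updates.
struct SimpleBatch {
    std::vector<std::pair<std::string, std::string>> entries;
    void Put(const std::string &key, const std::string &value) {
        entries.emplace_back(key, value);
    }
};

// Hypothetical stand-in for the database: a sorted in-memory map.
class SimpleDB {
public:
    // Write applies every update in the batch, in order.
    bool Write(const SimpleBatch *batch) {
        assert(batch != nullptr);
        for (const auto &kv : batch->entries) {
            table_[kv.first] = kv.second;
        }
        return true;
    }

    // Put is only a convenience wrapper: one-entry batch, then Write.
    bool Put(const std::string &key, const std::string &value) {
        SimpleBatch batch;
        batch.Put(key, value);
        return Write(&batch);
    }

    // Returns the stored value, or an empty string if the key is absent.
    std::string Get(const std::string &key) const {
        auto it = table_.find(key);
        return it == table_.end() ? std::string() : it->second;
    }

private:
    std::map<std::string, std::string> table_;
};
```

Because every insert ends up in Write, the queueing and batching described in the rest of this article apply to Put as well.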

First, a Writer object is initialized to encapsulate the insert operation. LevelDB manages Writer objects in a deque (writers_), and each new Writer is pushed onto the back. A Writer waits as long as it has not been processed and is not at the front of the deque. If another thread has already processed it by the time it wakes up (w.done is true), it simply returns the recorded status:

    Writer w(&mutex_);
    w.batch = my_batch;
    w.sync = options.sync;
    w.done = false;

    MutexLock l(&mutex_);
    writers_.push_back(&w);
    while (!w.done && &w != writers_.front())
    {
        w.cv.Wait();
    }
    if (w.done)
    {
        return w.status;
    }
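The waiting loop above can be tried in isolation. The following is a minimal sketch using the C++ standard library instead of LevelDB's port layer; TurnWriter, TurnQueue and RunTurnQueueDemo are hypothetical names. It leaves out batching (so there is no done flag) and keeps only the turn-taking: each caller queues up, waits until it is at the front, works with the lock released, and then wakes the next waiter.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical queue entry: only the per-writer condition variable is needed.
struct TurnWriter {
    std::condition_variable cv;
};

class TurnQueue {
public:
    // Queues the caller, waits for its turn, "works" without the lock held,
    // then leaves the queue and wakes the next waiter.
    void Process(int id) {
        TurnWriter w;
        std::unique_lock<std::mutex> lock(mu_);
        writers_.push_back(&w);
        enqueued_.push_back(id);
        // Same shape as the loop in Write: wait while someone else is in front.
        while (&w != writers_.front()) {
            w.cv.wait(lock);
        }
        lock.unlock();
        std::this_thread::yield();  // stand-in for the real work
        lock.lock();
        assert(writers_.front() == &w);  // nobody can overtake the front writer
        processed_.push_back(id);
        writers_.pop_front();
        if (!writers_.empty()) {
            writers_.front()->cv.notify_one();
        }
    }

    // True if callers were processed in exactly the order they queued up.
    bool InFifoOrder() {
        std::lock_guard<std::mutex> guard(mu_);
        return enqueued_ == processed_;
    }

    std::size_t ProcessedCount() {
        std::lock_guard<std::mutex> guard(mu_);
        return processed_.size();
    }

private:
    std::mutex mu_;
    std::deque<TurnWriter *> writers_;
    std::vector<int> enqueued_;
    std::vector<int> processed_;
};

// Runs n threads through the queue; true if all ran and FIFO order held.
bool RunTurnQueueDemo(int n) {
    TurnQueue q;
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i) {
        threads.emplace_back([&q, i] { q.Process(i); });
    }
    for (std::thread &t : threads) {
        t.join();
    }
    return q.InFifoOrder() && q.ProcessedCount() == static_cast<std::size_t>(n);
}
```

The order in which threads reach the queue varies from run to run, but whatever order they queue in is the order they are processed in, which is the property Write relies on.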

The MakeRoomForWrite function is then called to ensure that the memtable has room for the insert:

    // May temporarily unlock and wait.
    Status status = MakeRoomForWrite(my_batch == nullptr);
    uint64_t last_sequence = versions_->LastSequence();
    Writer *last_writer = &w;

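Next, the BuildBatchGroup function is called to merge the Writer objects waiting in the writers_ queue into a single WriteBatch, so LevelDB handles the currently queued insertions in one go (BuildBatchGroup caps the size of the merged batch, so a very long queue may be split across several groups). The merged batch is stamped with the next sequence number, and last_sequence advances by the number of entries in it: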

    if (status.ok() && my_batch != nullptr)
    { // nullptr batch is for compactions
        WriteBatch *updates = BuildBatchGroup(&last_writer);
        WriteBatchInternal::SetSequence(updates, last_sequence + 1);
        last_sequence += WriteBatchInternal::Count(updates);
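The sequence-number arithmetic in the snippet above is easy to get off by one, so here is a small worked sketch. StampedBatch, StampBatch and EntrySequence are hypothetical names; the only behavior modeled is that a batch starts at last_sequence + 1 and consumes one sequence number per entry.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a merged WriteBatch: only its starting
// sequence number and entry count matter here.
struct StampedBatch {
    uint64_t sequence = 0;  // sequence number of the first entry
    uint32_t count = 0;     // number of entries in the batch
};

// Mirrors the two lines in Write: stamp the batch with last_sequence + 1,
// then advance last_sequence by the number of entries.
// Returns the new last_sequence.
uint64_t StampBatch(StampedBatch *batch, uint64_t last_sequence) {
    assert(batch != nullptr);
    batch->sequence = last_sequence + 1;
    return last_sequence + batch->count;
}

// Sequence number of the i-th entry (0-based) of a stamped batch.
uint64_t EntrySequence(const StampedBatch &batch, uint32_t i) {
    assert(i < batch.count);
    return batch.sequence + i;
}
```

For example, with last_sequence at 100, a three-entry batch is stamped 101, its entries get 101, 102 and 103, and the next batch starts at 104.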

The merged batch is then appended to the log and, if that succeeds, inserted into the memtable:

        // Add to log and apply to memtable.  We can release the lock
        // during this phase since &w is currently responsible for logging
        // and protects against concurrent loggers and concurrent writes
        // into mem_.
        {
            mutex_.Unlock();
            status = log_->AddRecord(WriteBatchInternal::Contents(updates));
            bool sync_error = false;
            if (status.ok() && options.sync)
            {
                status = logfile_->Sync();
                if (!status.ok())
                {
                    sync_error = true;
                }
            }
            if (status.ok())
            {
                status = WriteBatchInternal::InsertInto(updates, mem_);
            }
            mutex_.Lock();
            if (sync_error)
            {
                // The state of the log file is indeterminate: the log record we
                // just added may or may not show up when the DB is re-opened.
                // So we force the DB into a mode where all future writes fail.
                RecordBackgroundError(status);
            }
        }
        if (updates == tmp_batch_)
            tmp_batch_->Clear();

        versions_->SetLastSequence(last_sequence);
    }
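The ordering in this step (log first, memtable second, and a sticky error if the sync fails) can be sketched with toy types. MiniLog, MiniState and LogThenApply are hypothetical names, and the fail_append and fail_sync flags are test hooks invented for this illustration; none of this is LevelDB's API.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the write-ahead log.
struct MiniLog {
    std::vector<std::string> records;
    bool fail_append = false;  // test hook: simulate an append error
    bool fail_sync = false;    // test hook: simulate a sync error
    bool AddRecord(const std::string &record) {
        if (fail_append) return false;
        records.push_back(record);
        return true;
    }
    bool Sync() { return !fail_sync; }
};

// Hypothetical stand-in for the log plus memtable pair.
struct MiniState {
    MiniLog log;
    std::map<std::string, std::string> mem;
    bool background_error = false;  // once set, later writes are refused
};

// Log first, optionally sync, and touch the memtable only if both succeeded.
bool LogThenApply(MiniState *s, const std::string &key,
                  const std::string &value, bool sync) {
    assert(s != nullptr);
    if (s->background_error) return false;
    bool ok = s->log.AddRecord(key + "=" + value);
    bool sync_error = false;
    if (ok && sync) {
        ok = s->log.Sync();
        if (!ok) sync_error = true;
    }
    if (ok) {
        s->mem[key] = value;
    }
    if (sync_error) {
        // The record may or may not be durable, so stop accepting writes.
        s->background_error = true;
    }
    return ok;
}
```

The point of the ordering is that the memtable never holds an update that the log failed to accept, which keeps recovery from the log consistent with what readers were able to see.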

The Writer objects that have just been processed are then removed from the queue and signaled, so that their threads can wake up and return:

    while (true)
    {
        Writer *ready = writers_.front();
        writers_.pop_front();
        if (ready != &w)
        {
            ready->status = status;
            ready->done = true;
            ready->cv.Signal();
        }
        if (ready == last_writer)
            break;
    }

If Writer objects remain in the queue, the one now at the head is signaled so that it can take its turn:

    // Notify new head of write queue
    if (!writers_.empty())
    {
        writers_.front()->cv.Signal();
    }
    return status;
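Putting the pieces of Write together, the whole leader/follower scheme can be sketched in a few dozen lines. This is a simplified model under stated assumptions, not LevelDB's implementation: QueuedWrite, GroupCommitter and RunGroupCommitDemo are hypothetical names, a write is a single int, the "log and memtable" are one vector, and the leader groups everything queued with no size cap.

```cpp
#include <algorithm>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical miniature of a queued write; not LevelDB's Writer.
struct QueuedWrite {
    int value = 0;
    bool done = false;
    std::condition_variable cv;
};

class GroupCommitter {
public:
    void Write(int value) {
        QueuedWrite w;
        w.value = value;
        std::unique_lock<std::mutex> lock(mu_);
        writers_.push_back(&w);
        while (!w.done && &w != writers_.front()) {
            w.cv.wait(lock);
        }
        if (w.done) {
            return;  // an earlier leader already applied this write
        }
        // This thread leads: group every write queued at this moment.
        std::vector<int> group;
        QueuedWrite *last_writer = &w;
        for (QueuedWrite *queued : writers_) {
            group.push_back(queued->value);
            last_writer = queued;
        }
        // Apply outside the lock. Only the front writer gets here, so
        // there is a single leader at a time and applied_ is not raced.
        lock.unlock();
        applied_.insert(applied_.end(), group.begin(), group.end());
        lock.lock();
        assert(writers_.front() == &w);
        ++groups_;
        // Pop the group and release its followers.
        while (true) {
            QueuedWrite *ready = writers_.front();
            writers_.pop_front();
            if (ready != &w) {
                ready->done = true;
                ready->cv.notify_one();
            }
            if (ready == last_writer) {
                break;
            }
        }
        // Hand leadership to the new head of the queue, if any.
        if (!writers_.empty()) {
            writers_.front()->cv.notify_one();
        }
    }

    std::vector<int> applied() {
        std::lock_guard<std::mutex> guard(mu_);
        return applied_;
    }
    std::size_t groups() {
        std::lock_guard<std::mutex> guard(mu_);
        return groups_;
    }

private:
    std::mutex mu_;
    std::deque<QueuedWrite *> writers_;
    std::vector<int> applied_;
    std::size_t groups_ = 0;
};

// Runs n >= 1 concurrent writers; true if each value was applied exactly once.
bool RunGroupCommitDemo(int n) {
    GroupCommitter db;
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i) {
        threads.emplace_back([&db, i] { db.Write(i); });
    }
    for (std::thread &t : threads) {
        t.join();
    }
    std::vector<int> applied = db.applied();
    std::sort(applied.begin(), applied.end());
    if (applied.size() != static_cast<std::size_t>(n)) return false;
    for (int i = 0; i < n; ++i) {
        if (applied[i] != i) return false;
    }
    return db.groups() >= 1 && db.groups() <= static_cast<std::size_t>(n);
}
```

How many groups are formed depends on thread timing: with no contention every write is its own group, and under contention one leader absorbs the writes that queued up behind it. Either way each write is applied exactly once.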

The MakeRoomForWrite function called by Write is:

// REQUIRES: mutex_ is held
// REQUIRES: this thread is currently at the front of the writer queue
Status DBImpl::MakeRoomForWrite(bool force)

The function loops, checking each condition in turn and taking the corresponding action, until the memtable has enough room for the insert or an error forces it to give up.

If the number of level-0 files has reached the slowdown threshold (config::kL0_SlowdownWritesTrigger) and this write has not been delayed yet, sleep for 1 ms:

        else if (
            allow_delay &&
            versions_->NumLevelFiles(0) >= config::kL0_SlowdownWritesTrigger)
        {
            // We are getting close to hitting a hard limit on the number of
            // L0 files.  Rather than delaying a single write by several
            // seconds when we hit the hard limit, start delaying each
            // individual write by 1ms to reduce latency variance.  Also,
            // this delay hands over some CPU to the compaction thread in
            // case it is sharing the same core as the writer.
            mutex_.Unlock();
            env_->SleepForMicroseconds(1000);
            allow_delay = false; // Do not delay a single write more than once
            mutex_.Lock();
        }

If a memtable switch is not being forced and the current memtable still has room, break out of the loop:

        else if (!force &&
                 (mem_->ApproximateMemoryUsage() <= options_.write_buffer_size))
        {
            // There is room in current memtable
            break;
        }

If the current memtable is full and the immutable memtable has not been written out yet, wait for the background compaction thread to finish (the immutable memtable has to be compacted first):

        else if (imm_ != nullptr)
        {
            // We have filled up the current memtable, but the previous
            // one is still being compacted, so we wait.
            Log(options_.info_log, "Current memtable full; waiting...\n");
            background_work_finished_signal_.Wait();
        }

If the current memtable is full and the number of level-0 files has reached the stop threshold (config::kL0_StopWritesTrigger, a hard limit that is separate from the slowdown threshold checked earlier), wait for the background compaction thread to finish compacting level 0:

        else if (versions_->NumLevelFiles(0) >= config::kL0_StopWritesTrigger)
        {
            // There are too many level-0 files.
            Log(options_.info_log, "Too many L0 files; waiting...\n");
            background_work_finished_signal_.Wait();
        }

If none of the above applies, the current memtable becomes the immutable memtable, and a new memtable and a new log file are created. Because the resulting immutable memtable needs to be compacted, MaybeScheduleCompaction is called:

        else
        {
            // Attempt to switch to a new memtable and trigger compaction of old
            assert(versions_->PrevLogNumber() == 0);
            uint64_t new_log_number = versions_->NewFileNumber();
            WritableFile *lfile = nullptr;
            s = env_->NewWritableFile(LogFileName(dbname_, new_log_number), &lfile);
            if (!s.ok())
            {
                // Avoid chewing through file number space in a tight loop.
                versions_->ReuseFileNumber(new_log_number);
                break;
            }
            delete log_;
            delete logfile_;
            logfile_ = lfile;
            logfile_number_ = new_log_number;
            log_ = new log::Writer(lfile);
            imm_ = mem_;
            has_imm_.Release_Store(imm_);
            mem_ = new MemTable(internal_comparator_);
            mem_->Ref();
            force = false; // Do not force another compaction if have room
            MaybeScheduleCompaction();
        }
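Since the branches of MakeRoomForWrite are checked in a fixed order, the decision can be summarized as a pure function. The sketch below is an illustration only: RoomState, RoomAction and NextAction are hypothetical names, and the three constants are assumed values chosen for the example (LevelDB takes the real ones from config:: and from Options). Error handling, such as the break on a failed log-file creation, is left out.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical snapshot of the state that MakeRoomForWrite inspects.
struct RoomState {
    bool allow_delay;       // false once this write has already slept
    bool force;             // true when a memtable switch is forced
    int level0_files;       // current number of level-0 files
    std::size_t mem_usage;  // approximate memtable size in bytes
    bool imm_pending;       // an immutable memtable is still being flushed
};

enum class RoomAction {
    kSleep1ms,       // slowdown threshold reached, delay this write once
    kProceed,        // memtable has room, leave the loop
    kWaitForImm,     // wait for the immutable memtable to be written out
    kWaitForLevel0,  // wait for level-0 compaction
    kSwitchMemtable  // make mem_ immutable and start a new one
};

// Assumed values for illustration only.
constexpr int kSlowdownTrigger = 8;
constexpr int kStopTrigger = 12;
constexpr std::size_t kWriteBufferSize = 4 * 1024 * 1024;

// Returns the action for one iteration of the loop, checking the
// conditions in the same order as the else-if chain in the article.
RoomAction NextAction(const RoomState &s) {
    assert(s.level0_files >= 0);
    if (s.allow_delay && s.level0_files >= kSlowdownTrigger) {
        return RoomAction::kSleep1ms;
    }
    if (!s.force && s.mem_usage <= kWriteBufferSize) {
        return RoomAction::kProceed;
    }
    if (s.imm_pending) {
        return RoomAction::kWaitForImm;
    }
    if (s.level0_files >= kStopTrigger) {
        return RoomAction::kWaitForLevel0;
    }
    return RoomAction::kSwitchMemtable;
}
```

Two consequences of the ordering are visible here: a write is slowed down at most once even when the memtable has room, and the stop threshold only matters once the memtable is full (or a switch is forced) and no immutable memtable is pending.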
