MongoDB WiredTiger Storage Engine Implementation Principles: Copy-on-Write for Modify Operations, B-tree Cache

Source: Internet
Author: User

Reposted from: http://www.mongoing.com/archives/2540

Since MongoDB 3.2, WiredTiger has been the default storage engine. I recently read through the WiredTiger source code. Reading it without first understanding the internal design is very difficult because the code base is large, so I strongly recommend reading the official introductory articles first. This post clarifies WiredTiger's general principles and gives a brief summary. I cannot guarantee that everything here is correct; if you have questions, please point them out, and discussion is welcome.

With MongoDB's default configuration, WiredTiger writes go to the cache and are persisted to the WAL (write-ahead log). A checkpoint runs every 60 seconds or when the log files reach 2 GB; it persists the current data and produces a new snapshot. When a WiredTiger connection is initialized, the data is first restored to the latest snapshot, and then later changes are recovered from the WAL, which ensures storage reliability.
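The write path and recovery order described above can be sketched with a toy key-value store. This is a minimal illustration, not WiredTiger's implementation: the class `ToyStore`, the file names `snapshot.json` and `wal.log`, and the JSON record format are all assumptions made for the example, and the checkpoint is triggered by hand rather than by a 60-second timer or a log size limit.

```python
import json
import os
import tempfile


class ToyStore:
    """Toy store (hypothetical): writes go to an in-memory cache and a write-ahead log."""

    def __init__(self, directory):
        self.snapshot_path = os.path.join(directory, "snapshot.json")
        self.wal_path = os.path.join(directory, "wal.log")
        self.cache = {}
        self._recover()
        self.wal = open(self.wal_path, "a", encoding="utf-8")

    def _recover(self):
        # Step 1: restore the latest snapshot, if one exists.
        if os.path.exists(self.snapshot_path):
            with open(self.snapshot_path, encoding="utf-8") as f:
                self.cache = json.load(f)
        # Step 2: replay the log records on top of the snapshot.
        if os.path.exists(self.wal_path):
            with open(self.wal_path, encoding="utf-8") as f:
                for line in f:
                    key, value = json.loads(line)
                    self.cache[key] = value

    def put(self, key, value):
        # Make the log record durable first, then apply it to the cache.
        self.wal.write(json.dumps([key, value]) + "\n")
        self.wal.flush()
        os.fsync(self.wal.fileno())
        self.cache[key] = value

    def checkpoint(self):
        # Persist the current data as a new snapshot, then start a fresh log.
        tmp = self.snapshot_path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump(self.cache, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.snapshot_path)
        self.wal.close()
        self.wal = open(self.wal_path, "w", encoding="utf-8")

    def close(self):
        self.wal.close()


def demo_recovery():
    with tempfile.TemporaryDirectory() as d:
        store = ToyStore(d)
        store.put("a", 1)
        store.checkpoint()      # "a" is now in the snapshot
        store.put("b", 2)       # "b" exists only in the log
        store.close()           # restart without another checkpoint
        recovered = ToyStore(d)
        result = dict(recovered.cache)
        recovered.close()
        return result
```

In this sketch, `demo_recovery()` reopens the store after a write that was never checkpointed, and the reopened store still sees both keys because the log is replayed on top of the snapshot.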

The WiredTiger cache is organized as B-trees. Each B-tree node is a page: the root page is the root node of the B-tree, internal pages are the intermediate index nodes, and leaf pages are the leaf nodes that actually store data. B-tree data is read from disk and written to disk on demand, one page at a time.
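Loading pages on demand can be illustrated with a small cache that reads a page from the file only on a miss. This is a sketch under assumptions: the class `PageCache` is hypothetical, pages are addressed by (offset, size), and least-recently-used eviction is a stand-in chosen for brevity, not a description of WiredTiger's actual eviction policy.

```python
import io
from collections import OrderedDict


class PageCache:
    """Hypothetical page cache: pages are read from the file only when missing."""

    def __init__(self, file, capacity):
        self.file = file
        self.capacity = capacity
        self.pages = OrderedDict()   # (offset, size) -> page bytes
        self.disk_reads = 0

    def get(self, offset, size):
        key = (offset, size)
        if key in self.pages:
            # Cache hit: mark the page as most recently used.
            self.pages.move_to_end(key)
            return self.pages[key]
        # Cache miss: read exactly one page from the file.
        self.file.seek(offset)
        data = self.file.read(size)
        self.disk_reads += 1
        self.pages[key] = data
        if len(self.pages) > self.capacity:
            # Evict the least recently used page (an assumption of this sketch).
            self.pages.popitem(last=False)
        return data


# An in-memory "file" holding one root page and two leaf pages.
disk = io.BytesIO(b"ROOT" + b"LEAF-A" + b"LEAF-B")
cache = PageCache(disk, capacity=2)
root = cache.get(0, 4)       # miss, read from the file
leaf_a = cache.get(4, 6)     # miss, read from the file
root_again = cache.get(0, 4) # hit, no read
leaf_b = cache.get(10, 6)    # miss, evicts LEAF-A
```

Only three reads reach the file here, because the second request for the root page is served from the cache.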

WiredTiger uses copy-on-write to manage modify operations (insert, update, delete). A modification is first buffered in the cache. When it is persisted, it is not applied in place to the original leaf page; it is written to a newly allocated page instead, and each checkpoint produces a new root page.
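The copy-on-write idea can be shown with a small immutable tree in which an update copies only the pages on the path from the root to the modified leaf. This is a simplified sketch, not WiredTiger's page format: the names `Leaf`, `Internal`, `cow_put`, and `get` are hypothetical, and page splits are omitted.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Leaf:
    items: Tuple[Tuple[str, str], ...]    # sorted (key, value) pairs


@dataclass(frozen=True)
class Internal:
    keys: Tuple[str, ...]                 # separator keys
    children: Tuple["Node", ...]          # len(children) == len(keys) + 1


Node = Union[Leaf, Internal]


def cow_put(node, key, value):
    """Return a new root with key set to value; the old tree is left untouched."""
    if isinstance(node, Leaf):
        items = dict(node.items)
        items[key] = value
        return Leaf(tuple(sorted(items.items())))   # a newly allocated leaf
    i = bisect_right(node.keys, key)
    new_child = cow_put(node.children[i], key, value)
    # Copy this internal page, swapping in the new child; siblings are shared.
    children = node.children[:i] + (new_child,) + node.children[i + 1:]
    return Internal(node.keys, children)


def get(node, key):
    """Look up key by walking from the given root down to a leaf."""
    while isinstance(node, Internal):
        node = node.children[bisect_right(node.keys, key)]
    return dict(node.items).get(key)


old_root = Internal(("m",), (Leaf((("a", "1"),)), Leaf((("x", "9"),))))
new_root = cow_put(old_root, "b", "2")
```

After the update, `old_root` still describes the tree as it was, `new_root` sees the new key, and the untouched right-hand leaf is shared by both roots. Keeping the old root valid is what lets an earlier checkpoint remain readable.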

At checkpoint time, WiredTiger needs to persist the modified pages of each B-tree. Each B-tree corresponds to one physical file on disk, and each page of the B-tree is stored in that file as an extent (identified by file offset + size). A checkpoint contains the following metadata:

    • Root page address, which consists of the file offset, the size, and a checksum of the content
    • Alloc extent list address, which stores the list of extents newly allocated since the last checkpoint
    • Discard extent list address, which stores the list of extents discarded since the last checkpoint
    • Available extent list address, which stores the list of extents available for allocation; only the most recent checkpoint contains this list
    • File size: when reverting to this checkpoint's state, the file is truncated to this size
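The metadata fields listed above can be modeled as a small record. This is a sketch under assumptions: the names `RootAddress`, `CheckpointMeta`, and `read_root` are hypothetical, the extent lists are stored inline rather than by address to keep the example short, and `zlib.crc32` is a stand-in for whatever checksum the real block manager uses.

```python
import zlib
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Extent = Tuple[int, int]   # (file offset, size)


@dataclass
class RootAddress:
    offset: int
    size: int
    checksum: int          # checksum of the root page content


@dataclass
class CheckpointMeta:
    root: RootAddress
    alloc: List[Extent] = field(default_factory=list)     # allocated since last checkpoint
    discard: List[Extent] = field(default_factory=list)   # discarded since last checkpoint
    avail: Optional[List[Extent]] = None                  # only on the most recent checkpoint
    file_size: int = 0                                    # truncate to this size on revert


def read_root(file_bytes, meta):
    """Return the root page bytes if the checksum matches; raise otherwise."""
    start = meta.root.offset
    page = file_bytes[start:start + meta.root.size]
    if zlib.crc32(page) != meta.root.checksum:
        raise ValueError("root page checksum mismatch")
    return page


# A 12-byte "file": 4 free bytes followed by an 8-byte root page.
data = b"....ROOTPAGE"
meta = CheckpointMeta(
    root=RootAddress(offset=4, size=8, checksum=zlib.crc32(b"ROOTPAGE")),
    alloc=[(4, 8)],
    avail=[(0, 4)],
    file_size=len(data),
)
root_page = read_root(data, meta)
```

Storing a checksum next to the offset and size lets a reader detect a damaged root page before trusting the rest of the tree.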

A typical WiredTiger database storage layout in MongoDB looks like this:

$ tree
.
├── journal
│   ├── WiredTigerLog.0000000003
│   └── WiredTigerPreplog.0000000001
├── WiredTiger
├── WiredTiger.basecfg
├── WiredTiger.lock
├── WiredTiger.turtle
├── admin
│   ├── table1.wt
│   └── table2.wt
├── local
│   ├── table1.wt
│   └── table2.wt
└── WiredTiger.wt

    • WiredTiger.basecfg stores basic configuration information
    • WiredTiger.lock prevents multiple processes from connecting to the same WiredTiger database
    • table*.wt files store the data of individual tables (the tables in a database)
    • WiredTiger.wt is a special table that stores the metadata of all other tables
    • WiredTiger.turtle stores the metadata of WiredTiger.wt
    • journal stores the write-ahead log

The approximate flow of a checkpoint is as follows:

      1. Checkpoint every table; the checkpoint metadata of each table is updated in WiredTiger.wt
      2. Checkpoint WiredTiger.wt itself; its checkpoint metadata is written to a temporary file, WiredTiger.turtle.set
      3. Rename WiredTiger.turtle.set to WiredTiger.turtle

If this process fails partway through, then at the next connection initialization WiredTiger first restores the data to the latest snapshot and then recovers later changes from the WAL, which ensures storage reliability.
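The final steps of the flow follow a common pattern: write the new metadata to a temporary file, then rename it over the old one so readers see either the old version or the new one. A minimal sketch, assuming a hypothetical helper `publish_metadata` and made-up file content; only the file names WiredTiger.turtle and the `.set` suffix come from the text above.

```python
import os
import tempfile


def publish_metadata(directory, name, content):
    """Write content to '<name>.set', flush it to disk, then rename it to '<name>'."""
    final_path = os.path.join(directory, name)
    temp_path = final_path + ".set"
    with open(temp_path, "w", encoding="utf-8") as f:
        f.write(content)
        f.flush()
        os.fsync(f.fileno())     # make sure the bytes are on disk before renaming
    # os.replace overwrites the destination; on POSIX the rename is atomic.
    os.replace(temp_path, final_path)
    return final_path


def demo_publish():
    with tempfile.TemporaryDirectory() as d:
        # The content strings are placeholders, not the real turtle file format.
        publish_metadata(d, "WiredTiger.turtle", "checkpoint=1\n")
        publish_metadata(d, "WiredTiger.turtle", "checkpoint=2\n")
        with open(os.path.join(d, "WiredTiger.turtle"), encoding="utf-8") as f:
            content = f.read()
        return content, sorted(os.listdir(d))
```

If the process stops before the rename, the previous WiredTiger.turtle is still intact, which is why a failed checkpoint can fall back to the latest completed snapshot.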

Resources
      1. WiredTiger official documentation
      2. MongoDB Internal
      3. WiredTiger Block Manager Overview
