Implementation of the Bitcask storage model: merge and hint files

Source: Internet
Author: User

In "Implementation of the Bitcask storage model: basic framework", we covered how data is stored in the Bitcask storage model, how the in-memory index is organized, and how caching accelerates reads. The Bitcask model also proposes a merge operation to reduce data redundancy and hint files to speed up building the in-memory index. This article looks at the concrete implementation of merge and at how hint files accelerate index creation.

Merge

Bitcask is a journaled storage model: inserts and updates are always appended to disk, so disk usage grows as the number of operations grows.

As an example, suppose a key and its val are first stored in 25.w. When the data changes, the new record is appended to 26.w and the in-memory index is updated accordingly, so the record for that key in 25.w becomes obsolete.
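The scenario above can be sketched in Python. This is only an illustration under assumptions: the record layout (keylen, vallen, key, val), the helper names `append_record` and `read_value`, and the dict standing in for memidx_t are all hypothetical and not taken from the original implementation.

```python
import os
import struct
import tempfile

# Assumed record layout: keylen (4 bytes), vallen (4 bytes), then key and val.
HEADER = struct.Struct("<II")

def append_record(data_dir, fileno, key, val):
    """Append one record to <fileno>.w and return the offset it was written at."""
    path = os.path.join(data_dir, f"{fileno}.w")
    with open(path, "ab") as f:
        offset = f.seek(0, os.SEEK_END)
        f.write(HEADER.pack(len(key), len(val)) + key + val)
    return offset

def read_value(data_dir, fileno, offset):
    """Read the val of the record stored at (fileno, offset)."""
    path = os.path.join(data_dir, f"{fileno}.w")
    with open(path, "rb") as f:
        f.seek(offset)
        keylen, vallen = HEADER.unpack(f.read(HEADER.size))
        f.seek(keylen, os.SEEK_CUR)  # skip over the key
        return f.read(vallen)

data_dir = tempfile.mkdtemp()
index = {}  # key -> (ifileno, ioffset); a stand-in for memidx_t

# The first write lands in 25.w.
index[b"user:1"] = (25, append_record(data_dir, 25, b"user:1", b"alice"))
# The update is appended to 26.w; only the index entry is changed.
index[b"user:1"] = (26, append_record(data_dir, 26, b"user:1", b"alicia"))

current = read_value(data_dir, *index[b"user:1"])
# The old record is still on disk in 25.w, but nothing points to it any more.
stale_bytes = os.path.getsize(os.path.join(data_dir, "25.w"))
```

After the update, reads go through the index to 26.w, while the bytes in 25.w remain on disk as obsolete data until something cleans them up.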

How can this redundant data be cleaned up? The in-memory index always points to the currently valid record for each key on disk, and we can use this property to identify and remove redundant data. In the data storage directory, we define a new kind of file:

xxx.m: Data storage file that holds the data after a merge; it contains less obsolete data than the xxx.w files

Calling each entry stored on disk a record, the process of cleaning up redundant data is as follows:

1. Loop through the records in a .w file, and look up the corresponding memidx_t by the key in each record.

(See the "Bitcask storage model implementation: basic framework" article: computing the hash of the key yields the corresponding memidx_t.)

2. If record.ifileno == stmemidx.ifileno && record.ioffset == stmemidx.ioffset, the record is valid data.

3. Write each record that meets this condition to a .m file and update ifileno and ioffset in its in-memory index entry memidx_t; skip records that do not meet the condition.

4. After traversing a .w file, delete it.

The process above filters out the redundant data in the .w files and migrates the valid data to the .m file. Because the data's position on disk changes, the in-memory index must be updated accordingly. As more updates arrive, redundant data will also accumulate in the .m files, so after processing the .w files, the existing .m files are traversed and cleaned up in the same way.
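The four steps can be sketched as follows. This is a minimal illustration, not the original implementation: the record layout, the helper names (`append_record`, `iter_records`, `merge`), and the dict used in place of memidx_t are assumptions, and the sketch further assumes that file numbers are unique across .w and .m files so that (ifileno, ioffset) identifies a location unambiguously.

```python
import os
import struct
import tempfile

HEADER = struct.Struct("<II")  # assumed layout: keylen, vallen, then key and val

def append_record(path, key, val):
    """Append one record to the file at path and return its offset."""
    with open(path, "ab") as f:
        offset = f.seek(0, os.SEEK_END)
        f.write(HEADER.pack(len(key), len(val)) + key + val)
    return offset

def iter_records(path):
    """Yield (offset, key, val) for each record in a data file."""
    with open(path, "rb") as f:
        while True:
            offset = f.tell()
            header = f.read(HEADER.size)
            if len(header) < HEADER.size:
                return
            keylen, vallen = HEADER.unpack(header)
            yield offset, f.read(keylen), f.read(vallen)

def merge(data_dir, index, w_filenos, m_fileno):
    """Copy valid records from the given .w files into <m_fileno>.m."""
    m_path = os.path.join(data_dir, f"{m_fileno}.m")
    for fileno in w_filenos:
        w_path = os.path.join(data_dir, f"{fileno}.w")
        for offset, key, val in iter_records(w_path):  # step 1
            # Step 2: valid only if the index points at this exact location.
            if index.get(key) != (fileno, offset):
                continue
            # Step 3: copy to the .m file and repoint the index entry.
            index[key] = (m_fileno, append_record(m_path, key, val))
        os.remove(w_path)  # step 4

data_dir = tempfile.mkdtemp()
index = {}  # key -> (ifileno, ioffset)

def put(fileno, key, val):
    path = os.path.join(data_dir, f"{fileno}.w")
    index[key] = (fileno, append_record(path, key, val))

put(25, b"a", b"1")
put(25, b"b", b"2")
put(26, b"a", b"3")  # makes the record for "a" in 25.w obsolete

merge(data_dir, index, [25, 26], 27)
merged = [(k, v) for _, k, v in iter_records(os.path.join(data_dir, "27.m"))]
```

In this run the old record for `a` in 25.w is skipped, so 27.m ends up holding one record per key and both .w files are removed. The same loop could be pointed at existing .m files to clean them up in turn.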

Hint file

In the Bitcask implementation described above, the in-memory index plays a major role: it not only accelerates reads, but also serves as the criterion during merge for deciding which data is valid and which can be deleted. However, memory is volatile, and the index is lost when the program or machine restarts. Rebuilding the index would require traversing all the .m and .w files on disk, which is inefficient. The hint file is a way to accelerate the creation of the in-memory index:

xxx.h: Hint file that accelerates building the in-memory index; it saves only key/vallen/ifileno/ioffset and, unlike the .w and .m files, does not store the val data

During merge, whenever a valid record is found and written to the .m file, the record's key/keylen/vallen/ifileno/ioffset fields are also written to the .h file. When the program restarts, it traverses the .h file to rebuild the in-memory index.
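A hint file along these lines might be written and read back as in the sketch below. The entry layout (four 4-byte integers followed by the key), the helper names `write_hint` and `load_index`, and the sample values are assumptions made for illustration; the sketch also assumes that ifileno/ioffset in a hint entry refer to the record's new location in the .m file.

```python
import os
import struct
import tempfile

# Assumed hint entry layout: keylen, vallen, ifileno, ioffset (4 bytes each),
# followed by the key bytes. The val itself is never stored here.
HINT_HEADER = struct.Struct("<IIII")

def write_hint(hint_file, key, vallen, ifileno, ioffset):
    """Append one hint entry for a record that was just written to a .m file."""
    hint_file.write(HINT_HEADER.pack(len(key), vallen, ifileno, ioffset) + key)

def load_index(hint_path):
    """Rebuild the in-memory index by scanning a hint file."""
    index = {}
    with open(hint_path, "rb") as f:
        while True:
            header = f.read(HINT_HEADER.size)
            if len(header) < HINT_HEADER.size:
                break
            keylen, vallen, ifileno, ioffset = HINT_HEADER.unpack(header)
            index[f.read(keylen)] = (ifileno, ioffset, vallen)
    return index

data_dir = tempfile.mkdtemp()
hint_path = os.path.join(data_dir, "27.h")

# During merge, each valid record copied into 27.m also gets a hint entry.
with open(hint_path, "wb") as hint_file:
    write_hint(hint_file, b"b", 1, 27, 0)
    write_hint(hint_file, b"a", 1, 27, 10)

# After a restart, the index is rebuilt without reading any val data.
rebuilt = load_index(hint_path)
hint_size = os.path.getsize(hint_path)
```

Because each entry carries only the key and a few integers, the amount of data scanned at startup depends on the number and size of keys rather than on the size of the stored values.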
