In the previous article, "Bitcask storage model implementation: basic framework", we looked at how data is stored in the Bitcask storage model, how the in-memory index is organized, and how caching speeds up reads. The Bitcask storage model also proposes a merge step to reduce data redundancy and hint files to speed up building the in-memory index. This article looks at a concrete implementation of merge and at how hint files accelerate index creation.
Merge
Bitcask is a log-structured storage model: both inserts and updates are appended to disk, so disk usage grows as the number of operations grows.
For example, suppose a key and its val are initially stored in 25.w. When the data changes, the new record is appended to 26.w and the in-memory index is updated to point to it, so the data for that key in 25.w becomes obsolete.
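The example above can be modelled in a few lines. This is only a minimal sketch: the files are plain Python lists rather than files on disk, a list position stands in for a byte offset, and the names `files`, `index`, `append` and `is_live` are hypothetical helpers, not part of the original implementation.

```python
# Hypothetical in-memory model of the append-only data files:
# file number -> list of (key, val) records.
files = {25: [], 26: []}

# key -> (ifileno, ioffset); stands in for the memidx_t entries.
index = {}


def append(fileno, key, val):
    """Append a record to a file and point the index at it."""
    ioffset = len(files[fileno])  # list position stands in for a byte offset
    files[fileno].append((key, val))
    index[key] = (fileno, ioffset)


def is_live(fileno, ioffset):
    """A record is live only if the index points exactly at its position."""
    key, _ = files[fileno][ioffset]
    return index.get(key) == (fileno, ioffset)


append(25, "k", "old")  # the first write lands in 25.w
append(26, "k", "new")  # the update is appended to 26.w

live_in_25 = is_live(25, 0)
live_in_26 = is_live(26, 0)
```

After the second write, the record in 25.w is still on disk but nothing points to it any more, which is exactly the redundancy that merge removes.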
How can this redundant data be cleaned up? Because the in-memory index always points to the currently valid data for each key on disk, we can use it to tell valid records from redundant ones. In the data storage directory, we define one more kind of file:
xxx.m: a data storage file that holds the data kept by a merge; it contains less obsolete data than the xxx.w files
Calling one entry on disk a record, the process of cleaning up redundant data is as follows:
1. Loop through the records in a .w file and, for each record, look up the corresponding memidx_t by the record's key.
(See the "Bitcask storage model implementation: basic framework" article: hashing the key yields the corresponding memidx_t.)
2. If record.ifileno == stmemidx.ifileno && record.ioffset == stmemidx.ioffset, the record is valid data.
3. Write the records that satisfy this condition to a .m file and update ifileno and ioffset in the in-memory index memidx_t; skip the records that do not satisfy it.
4. After the whole .w file has been traversed, delete it.
The process above filters the obsolete data out of the .w files and migrates the valid data to .m files. Because the position of the data on disk changes, the in-memory index must be updated accordingly. As more updates arrive, redundant data also accumulates in the .m files, so after the .w files have been processed, the existing .m files are traversed and processed in the same way.
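The four merge steps can be sketched as follows. This is an illustration under stated assumptions, not the original implementation: data files are modelled as Python lists of (key, val) pairs, a list position stands in for a byte offset, `MemIdx` is a stand-in for memidx_t, and `merge_file` is a hypothetical name. File numbers are assumed to be unique across .w and .m files.

```python
from dataclasses import dataclass


@dataclass
class MemIdx:
    """Stand-in for memidx_t: where the live record for a key is stored."""
    ifileno: int
    ioffset: int


def merge_file(fileno, records, index, out_fileno, out_records):
    """Copy the live records of one data file into the merge output.

    records     -- list of (key, val) for the file being merged
    index       -- dict mapping key -> MemIdx (the in-memory index)
    out_records -- list of (key, val) that stands in for the .m file
    """
    # Step 1: walk every record and look up its index entry by key.
    for ioffset, (key, val) in enumerate(records):
        idx = index.get(key)
        # Step 2: the record is valid only if the index points exactly at it.
        if idx is None or idx.ifileno != fileno or idx.ioffset != ioffset:
            continue  # obsolete record, skip it
        # Step 3: write it to the .m file and repoint the index.
        out_records.append((key, val))
        idx.ifileno = out_fileno
        idx.ioffset = len(out_records) - 1
    # Step 4: stands in for deleting the traversed file.
    records.clear()


# Key "a" was written to file 25 and later updated in file 26.
w25 = [("a", "1"), ("b", "2")]
w26 = [("a", "3")]
index = {"a": MemIdx(26, 0), "b": MemIdx(25, 1)}

m1 = []  # merge output, treated as file number 1
merge_file(25, w25, index, 1, m1)
merge_file(26, w26, index, 1, m1)
```

The stale ("a", "1") record is dropped because the index points at file 26 for that key, while both live records end up in the merge output with the index repointed at their new positions. Running the same function over an existing .m file handles the redundancy that accumulates there.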
Hint file
In the Bitcask implementation described above, the in-memory index plays a major role: it not only speeds up reads, but also serves as the criterion during merge for deciding which data is valid and which can be deleted. However, memory is volatile, and the index is lost when the program or the machine restarts. Rebuilding it by traversing all the .m and .w files on disk is inefficient. The hint file is a way to speed up building the in-memory index:
xxx.h: a hint file that speeds up building the in-memory index; it stores only key/vallen/ifileno/ioffset and, unlike the .w and .m files, does not store the val data
During merge, whenever a valid record is identified and written to the .m file, the record's key/keylen/vallen/ifileno/ioffset fields are also written to the .h file. When the program restarts, it traverses the .h files to rebuild the in-memory index.
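Writing hint entries and rebuilding the index from them can be sketched as below. The on-disk layout is an assumption made for illustration: four little-endian 32-bit fields (keylen, vallen, ifileno, ioffset) followed by the key bytes. The original article does not specify field order or widths, and `write_hint` and `rebuild_index` are hypothetical names. An `io.BytesIO` buffer stands in for the .h file.

```python
import io
import struct

# Assumed hint entry header: keylen, vallen, ifileno, ioffset as
# little-endian unsigned 32-bit integers, followed by the key bytes.
HEADER = struct.Struct("<IIII")


def write_hint(hint_file, key, vallen, ifileno, ioffset):
    """Append one hint entry; the val itself is never written."""
    hint_file.write(HEADER.pack(len(key), vallen, ifileno, ioffset))
    hint_file.write(key)


def rebuild_index(hint_file):
    """Scan a hint file and return key -> (ifileno, ioffset, vallen)."""
    index = {}
    while True:
        header = hint_file.read(HEADER.size)
        if len(header) < HEADER.size:
            break  # end of file, or a truncated trailing entry
        keylen, vallen, ifileno, ioffset = HEADER.unpack(header)
        key = hint_file.read(keylen)
        if len(key) < keylen:
            break  # truncated key, ignore the partial entry
        index[key] = (ifileno, ioffset, vallen)
    return index


hint = io.BytesIO()  # stands in for an xxx.h file
write_hint(hint, b"a", 1, 1, 0)
write_hint(hint, b"bb", 5, 1, 16)

hint.seek(0)
index = rebuild_index(hint)
```

Because each entry carries only the key and a few integers, a restart reads far fewer bytes than it would by scanning the val data in every .m and .w file.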
Implementation of the Bitcask storage model-merge and hint files