Bitcask is a log-structured storage engine for key-value pairs that originated in the Riak distributed database.
It was developed as an alternative to the storage engines Riak could already use, such as Berkeley DB, Tokyo Cabinet, and Innostore. Compared with these, it offers the following advantages:
- Low read and write latency.
- Relatively high throughput for random writes.
- The ability to handle larger databases.
- Easy backup and recovery.
- Relatively simple and easy to understand.
- Predictable behavior under heavy access load.
Bitcask supports only append operations (append-only): every write appends to a file and never modifies existing data. Each file has a size limit; when a file reaches that limit, a new file is created and the old one becomes read-only. At any given moment, exactly one file is writable and receives appended data; it is called the active data file. The other files, which have already reached the size limit, are called older data files.
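The rotation rule described above can be sketched as follows. This is a minimal illustration, not the engine's actual code: the class name `AppendOnlyStore`, the file naming scheme, and the choice to rotate just before a write would exceed the limit are all assumptions made for the example.

```python
import os
import tempfile


class AppendOnlyStore:
    """Directory of data files in which only the newest file accepts writes."""

    def __init__(self, directory, max_file_size=64):
        self.directory = directory
        self.max_file_size = max_file_size  # hypothetical size limit, in bytes
        self.file_id = 0
        self.active = open(self._path(self.file_id), "ab")

    def _path(self, file_id):
        # Hypothetical naming scheme: zero-padded file id plus ".data".
        return os.path.join(self.directory, f"{file_id:06d}.data")

    def append(self, payload: bytes):
        """Append payload and return (file_id, offset) of where it landed."""
        size = self.active.tell()
        # Rotate if this write would push a non-empty active file past the limit.
        if size > 0 and size + len(payload) > self.max_file_size:
            self.active.close()  # the old file is never written again
            self.file_id += 1
            self.active = open(self._path(self.file_id), "ab")
        offset = self.active.tell()
        self.active.write(payload)
        self.active.flush()
        return self.file_id, offset

    def close(self):
        self.active.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        store = AppendOnlyStore(d, max_file_size=10)
        print(store.append(b"aaaaaa"))
        print(store.append(b"bbb"))
        print(store.append(b"cccc"))  # does not fit, so a new active file is opened
        store.close()
```

Because existing bytes are never rewritten, the `(file_id, offset)` pair returned by `append` stays valid for as long as the file exists.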
The active data file supports only appends, so all writes are sequential and require no random disk seeks. Each key-value pair is written to the file as a single record.
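As a rough sketch of such a record, the Bitcask design describes each entry as a CRC, a timestamp, the key size, the value size, the key, and the value. The code below packs and unpacks a record in that order; the 32-bit field widths, the byte order, and the function names are assumptions chosen for illustration.

```python
import struct
import time
import zlib

# Assumed widths: tstamp, key_sz, value_sz as big-endian unsigned 32-bit integers.
HEADER = struct.Struct(">III")
CRC = struct.Struct(">I")


def encode_record(key: bytes, value: bytes, tstamp=None) -> bytes:
    """Return crc | tstamp | key_sz | value_sz | key | value."""
    tstamp = int(time.time()) if tstamp is None else tstamp
    body = HEADER.pack(tstamp, len(key), len(value)) + key + value
    return CRC.pack(zlib.crc32(body)) + body


def decode_record(buf: bytes, offset: int = 0):
    """Return (key, value, tstamp, next_offset); raise ValueError on a bad CRC."""
    (crc,) = CRC.unpack_from(buf, offset)
    tstamp, key_sz, value_sz = HEADER.unpack_from(buf, offset + CRC.size)
    start = offset + CRC.size + HEADER.size
    end = start + key_sz + value_sz
    # The checksum covers everything after the CRC field itself.
    if zlib.crc32(buf[offset + CRC.size:end]) != crc:
        raise ValueError("corrupt record")
    key = buf[start:start + key_sz]
    value = buf[start + key_sz:end]
    return key, value, tstamp, end
```

Storing the sizes ahead of the key and value lets a reader walk the file record by record, and the checksum lets it detect a partially written record at the tail of the file.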
Deleting a key-value pair also appends a record to the active data file; the data is physically removed only during the next merge.
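A delete written as an append can be sketched like this. The log is modelled as an in-memory `io.BytesIO` buffer, and the marker value `TOMBSTONE` as well as the simplified record layout are hypothetical choices for the example.

```python
import io
import struct

TOMBSTONE = b"__tombstone__"  # hypothetical marker value meaning "deleted"


def append(log, key: bytes, value: bytes):
    """Append one record (key_sz | value_sz | key | value) to the end of the log."""
    log.seek(0, io.SEEK_END)
    log.write(struct.pack(">II", len(key), len(value)) + key + value)


def delete(log, key: bytes):
    # A delete is just another append; earlier records are left untouched.
    append(log, key, TOMBSTONE)


def replay(log):
    """Scan the log from the start and return the live key -> value mapping."""
    live = {}
    data = log.getvalue()
    pos = 0
    while pos < len(data):
        key_sz, value_sz = struct.unpack_from(">II", data, pos)
        pos += 8
        key = data[pos:pos + key_sz]
        value = data[pos + key_sz:pos + key_sz + value_sz]
        pos += key_sz + value_sz
        if value == TOMBSTONE:
            live.pop(key, None)  # the key is logically gone from here on
        else:
            live[key] = value
    return live
```

After `delete`, the log is longer than before, yet `replay` no longer reports the key; the old bytes remain in the log until a merge rewrites it.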
The merge operation periodically scans all older data files and generates new data files, essentially collapsing multiple operations on the same key into one.
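The effect of a merge can be illustrated with a small in-memory model. Here each older data file is represented as a list of `(key, value)` records, oldest file first, and a value of `None` stands for a delete marker; this representation and the function name `merge` are assumptions for the sketch, not the on-disk procedure.

```python
def merge(old_files):
    """Collapse several older data files into one list of live records.

    old_files: list of files ordered oldest first; each file is a list of
    (key, value) records in write order, with value None marking a delete.
    """
    latest = {}
    for records in old_files:
        for key, value in records:
            latest[key] = value  # a later record always replaces an earlier one
    # Deleted keys are dropped entirely; every other key keeps its newest value.
    return [(key, value) for key, value in latest.items() if value is not None]


if __name__ == "__main__":
    old = [
        [(b"a", b"1"), (b"b", b"2")],
        [(b"a", b"3"), (b"b", None), (b"c", b"4")],
    ]
    print(merge(old))
```

Dropping delete markers is safe in this model only because all older files are merged together, so no earlier version of a deleted key can survive outside the merged set.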
Bitcask uses a hash table as its index structure. In addition to the data files stored on disk, a hash table is kept in memory, and looking up a key in it quickly locates the corresponding data on disk.
Each entry in the hash table holds three pieces of information for locating a value: the file ID (file_id), the position of the value within that file (value_pos), and the size of the value (value_sz). To read a value, we open the file identified by file_id and read value_sz bytes starting at value_pos.
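The lookup path can be sketched with a plain Python dict standing in for the hash table. The field names file_id, value_pos, and value_sz come from the text; the helper names, the file naming scheme, and the simplified on-disk layout (key bytes followed directly by value bytes) are assumptions for the example.

```python
import os
import tempfile
from typing import NamedTuple


class Entry(NamedTuple):
    file_id: int    # which data file holds the value
    value_pos: int  # byte offset of the value inside that file
    value_sz: int   # length of the value in bytes


def data_path(directory, file_id):
    return os.path.join(directory, f"{file_id:06d}.data")


def put(directory, index, file_id, key: bytes, value: bytes):
    """Append key and value to a data file and record the value's location."""
    with open(data_path(directory, file_id), "ab") as f:
        f.write(key)  # simplified layout: key bytes, then value bytes
        value_pos = f.tell()
        f.write(value)
    index[key] = Entry(file_id, value_pos, len(value))


def get(directory, index, key: bytes):
    """One hash lookup, one seek, one read."""
    entry = index.get(key)
    if entry is None:
        return None
    with open(data_path(directory, entry.file_id), "rb") as f:
        f.seek(entry.value_pos)
        return f.read(entry.value_sz)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        index = {}
        put(d, index, 0, b"k1", b"hello")
        put(d, index, 1, b"k1", b"newer")  # the index now points at file 1
        print(get(d, index, b"k1"))
```

Overwriting a key only replaces its index entry; the older value stays in its file until a merge removes it.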
As described above, the hash table index lives in memory, so after a system restart the disk must be scanned to rebuild it. If the amount of data is very large, this is very time-consuming. Bitcask therefore also generates hint files, whose structure is very similar to that of the data files on disk, except that each record stores the location of the value instead of the value itself.
This way, rebuilding the hash table does not require scanning all the data files; it only requires reading the records in the hint files. This greatly speeds up restarting a database that has existing data files.
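Rebuilding the index from hint data might look like the sketch below. It assumes one hint blob per data file, with each hint record holding the key size, value size, value position, and key; the field widths and the function names are illustrative assumptions.

```python
import struct

# Assumed layout per hint record: key_sz (u32) | value_sz (u32) | value_pos (u64) | key
HINT_HEADER = struct.Struct(">IIQ")


def write_hint(entries) -> bytes:
    """entries: iterable of (key, value_sz, value_pos) for one data file."""
    out = bytearray()
    for key, value_sz, value_pos in entries:
        out += HINT_HEADER.pack(len(key), value_sz, value_pos) + key
    return bytes(out)


def load_hint(hint: bytes, file_id: int, index: dict) -> dict:
    """Fill index with key -> (file_id, value_pos, value_sz) without reading any values."""
    pos = 0
    while pos < len(hint):
        key_sz, value_sz, value_pos = HINT_HEADER.unpack_from(hint, pos)
        pos += HINT_HEADER.size
        key = hint[pos:pos + key_sz]
        pos += key_sz
        index[key] = (file_id, value_pos, value_sz)
    return index


if __name__ == "__main__":
    index = {}
    # Load hints oldest file first so that newer files win for repeated keys.
    load_hint(write_hint([(b"a", 5, 2), (b"b", 6, 9)]), 0, index)
    load_hint(write_hint([(b"a", 3, 1)]), 1, index)
    print(index)
```

A hint record has a fixed-size header plus the key, regardless of how large the value is, which is why reading hints is much cheaper than scanning the data files when values are large.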
For an implementation, refer to the BeansDB source code:
Beansdb.googlecode.com/files/beansdb-0.5.2.tar.gz