LevelDB source code analysis-15

Document directory
  • 9.4 Version Control
  • 9.5 DB Interface
  • 9.6 DBImpl Class
9 LevelDB Framework (2)

9.4 Version Control

After a compaction finishes, leveldb creates a new version based on the current version, and the previous current version becomes a historical version. If an iterator has been created, the version that the iterator is attached to is not deleted by leveldb while the iterator is alive.
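The pinning behaviour described above can be sketched with reference counting. This is only an illustration: ToyVersion, ToyIterator and ToyVersionSet are hypothetical names, and std::shared_ptr stands in for leveldb's own manual reference counting on Version.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for a version: just a number.
struct ToyVersion {
  int number;
};

// An iterator holds a reference to the version it was created from.
struct ToyIterator {
  std::shared_ptr<const ToyVersion> pinned;
};

struct ToyVersionSet {
  std::shared_ptr<const ToyVersion> current;

  // Installing a new version drops the set's own reference to the old one.
  void InstallNew(int number) {
    current = std::make_shared<ToyVersion>(ToyVersion{number});
  }

  // The iterator shares ownership of whatever is current right now.
  ToyIterator NewIterator() const { return ToyIterator{current}; }
};

// Usage sketch: version 1 outlives the switch to version 2 because the
// iterator still references it.
inline void PinningExample() {
  ToyVersionSet versions;
  versions.InstallNew(1);
  ToyIterator it = versions.NewIterator();
  versions.InstallNew(2);
  assert(it.pinned->number == 1);
  assert(versions.current->number == 2);
}
```

Once the iterator releases its reference, nothing keeps the historical version alive and it is destroyed.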
In leveldb, a Version records the set of sstable files that make up the database at one point in time. Among all versions, only one is current.
VersionSet is the collection of all versions; it is what manages them.
VersionEdit records the changes between versions. It is equivalent to a delta that says which files are added and which are deleted. That is to say: Version0 + VersionEdit --> Version1.
Every time the file set changes, leveldb records the change in a VersionEdit, applies it to the current version to produce the new one, and persists the result, that is, saves the DB metadata to the manifest file.
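The relation Version0 + VersionEdit --> Version1 can be sketched as follows. The types here (FileId, ToyVersion, ToyEdit, Apply) are hypothetical simplifications written for this example; a real Version also tracks key ranges and file sizes per file.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

// A file is identified here only by (level, file number).
using FileId = std::pair<int, uint64_t>;

struct ToyVersion {
  std::set<FileId> files;  // all sstable files that make up this version
};

struct ToyEdit {
  std::vector<FileId> new_files;  // files added by this change
  std::set<FileId> deleted_files; // files removed by this change
};

// Builds the next version without modifying the base version, so readers
// that still hold the base keep seeing a consistent file set.
ToyVersion Apply(const ToyVersion& base, const ToyEdit& edit) {
  ToyVersion next = base;
  for (const FileId& f : edit.deleted_files) {
    const std::size_t erased = next.files.erase(f);
    assert(erased == 1);  // this sketch assumes edits only delete known files
    (void)erased;
  }
  for (const FileId& f : edit.new_files) next.files.insert(f);
  return next;
}
```

For example, an edit that deletes two level-0 files and adds one level-1 file models a compaction from level 0 into level 1.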
In addition, the manifest file is itself organized as a sequence of VersionEdit records. It uses the log file format and is read and written with log::Writer and log::Reader. Each VersionEdit is one log record.

9.4.1 VersionSet

As with DBImpl, we start by getting to know Version and VersionSet through their members.
Let's take a look at the Version members:

    std::vector<FileMetaData*> files_[config::kNumLevels];  // sstable file list, per level

    // Next file to compact based on seek stats.
    FileMetaData* file_to_compact_;
    int file_to_compact_level_;

    // Level that should be compacted next and its compaction score.
    // Score < 1 means compaction is not urgent. These fields are
    // initialized in Finalize().
    double compaction_score_;
    int compaction_level_;

It can be seen that a Version is a collection of sstable files together with the compaction state it manages. Versions are linked through their Version* prev_ and next_ pointers into a circular doubly linked list. The list head lives in VersionSet (initially it points to itself).
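A circular doubly linked list with a dummy head, as described above, can be sketched like this. Node and ToyVersionList are hypothetical names for this example; only the linking logic is shown, not leveldb's actual classes.

```cpp
#include <cassert>

// Stripped-down node: only the list links and a label.
struct Node {
  int id = 0;
  Node* prev = this;  // a lone node points at itself
  Node* next = this;
};

struct ToyVersionList {
  Node dummy;              // list head; never holds real data
  Node* current = &dummy;  // newest node; equals dummy.prev once non-empty

  // Links v in just before the dummy head (at the tail) and makes it current.
  void Append(Node* v) {
    v->prev = dummy.prev;
    v->next = &dummy;
    v->prev->next = v;
    v->next->prev = v;
    current = v;
  }

  // Unlinks an old node. This sketch never removes the head or the
  // current node.
  void Remove(Node* v) {
    assert(v != &dummy && v != current);
    v->prev->next = v->next;
    v->next->prev = v->prev;
    v->prev = v->next = v;
  }

  // Counts real nodes by walking from the head until it comes back around.
  int Count() const {
    int n = 0;
    for (const Node* p = dummy.next; p != &dummy; p = p->next) ++n;
    return n;
  }
};
```

Because the head is a real node, appending and removing never need a special case for the empty list.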
The following are the VersionSet members. In addition to managing all sstable files through Version, VersionSet also tracks the manifest file information and controls the log file numbers.

    // === Group 1: passed in directly from DBImpl through the constructor
    Env* const env_;
    const std::string dbname_;
    const Options* const options_;
    TableCache* const table_cache_;     // table cache
    const InternalKeyComparator icmp_;

    // === Group 2: database metadata
    uint64_t next_file_number_;         // next file number to allocate
    uint64_t manifest_file_number_;     // manifest file number
    uint64_t last_sequence_;
    uint64_t log_number_;               // log number
    uint64_t prev_log_number_;          // 0 or backing store for memtable being compacted

    // === Group 3: related to the manifest file
    WritableFile* descriptor_file_;
    log::Writer* descriptor_log_;

    // === Group 4: version management
    Version dummy_versions_;            // head of the doubly linked list of versions
    Version* current_;                  // == dummy_versions_.prev_

    // Per-level start key of the next compaction:
    // an empty string or a valid InternalKey.
    std::string compact_pointer_[config::kNumLevels];

That covers the roles and management scope of Version and VersionSet in version control. Their functions are described in detail later.

9.4.2 VersionEdit

Decoding and encoding of the manifest in leveldb are done through VersionEdit. The manifest file stores the management metadata of leveldb. The name VersionEdit is apt: every compaction effectively generates a new DB version, and the manifest stores the DB metadata of that version. VersionEdit does not operate on files itself. It only prepares the data to be written to the manifest file and parses the database metadata from the data read back.
VersionEdit has two functions:
1. When there is an incremental change between versions, VersionEdit records this change.
2. When writing to the manifest, the DB metadata of the current version is saved into a VersionEdit, which is then packed into a log record and written to the file.
With the role of VersionEdit in mind, let's take a look at the function interfaces exported by this class:

    void Clear();   // clear all information
    void SetXXX();  // a series of set functions, one per metadata field

    // Add sstable file information.
    // Requires: the DB metadata has not yet been written to the manifest file on disk.
    // @level: level of the .sst file; @file: file number, used as the file name;
    // @file_size: file size; @smallest, @largest: smallest and largest keys in the file.
    void AddFile(int level, uint64_t file, uint64_t file_size,
                 const InternalKey& smallest, const InternalKey& largest);
    void DeleteFile(int level, uint64_t file);  // delete a file
    void EncodeTo(std::string* dst) const;      // encode the information into a string
    Status DecodeFrom(const Slice& src);        // decode the DB metadata from a Slice

    // ===== Member variables; they give a glimpse of what the DB metadata contains.
    typedef std::set<std::pair<int, uint64_t> > DeletedFileSet;
    std::string comparator_;          // key comparator name
    uint64_t log_number_;             // log number
    uint64_t prev_log_number_;        // previous log number
    uint64_t next_file_number_;       // next file number
    SequenceNumber last_sequence_;    // last sequence number
    bool has_comparator_;             // whether comparator_ is set
    bool has_log_number_;             // whether log_number_ is set
    bool has_prev_log_number_;        // whether prev_log_number_ is set
    bool has_next_file_number_;       // whether next_file_number_ is set
    bool has_last_sequence_;          // whether last_sequence_ is set
    std::vector<std::pair<int, InternalKey> > compact_pointers_;  // compaction pointers
    DeletedFileSet deleted_files_;                                // deleted file set
    std::vector<std::pair<int, FileMetaData> > new_files_;        // new file set

The set functions are very simple: each one stores the corresponding piece of information from its parameters.
The AddFile function builds a FileMetaData object from its parameters and appends the sstable file information to the new_files_ array.
The DeleteFile function adds the file specified by the parameters to deleted_files_.
The SetCompactPointer function adds the compaction pointer specified by {level, key} to compact_pointers_.
The EncodeTo and DecodeFrom functions perform serialization and deserialization. From this code we can work out the storage format of manifest files. The serialization logic is straightforward.

9.4.3 Manifest File Format

As mentioned above, the manifest file records the management metadata of leveldb. What exactly does this metadata contain? The following is a list.
First come the comparator name, the log number, the previous log number, the next file number, and the last sequence number. These are important values used by the log and sstable files. Each of these fields is optional.
Before writing each field, leveldb writes a varint tag that marks the field type. The reader reads this tag and parses the data that follows according to the type. Tag values run from 1 to 9:
kComparator = 1, kLogNumber = 2, kNextFileNumber = 3, kLastSequence = 4,
kCompactPointer = 5, kDeletedFile = 6, kNewFile = 7, kPrevLogNumber = 9
// 8 was used for large value refs
Eight of the nine values are in use; 8 is retired.
Second come the compaction pointers. There may be several, each in the format {kCompactPointer, level, internal key}.
Third come the deleted files. There may be several, each in the format {kDeletedFile, level, file number}.
Fourth come the new files. There may be several, each in the format
{kNewFile, level, file number, file size, min key, max key}.
For a version change this is the set of newly added files; for a manifest snapshot it is the set of all sstable files included in that version.
The layout is shown in Figure 9.3-1.

Figure 9.3-1

Numbers are stored in varint format. Strings are stored as a varint length followed by the actual string content.
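The record layout above can be sketched in a few lines. The helpers below are local functions written for this example, not leveldb's own code, and the internal key is treated as an already-encoded string.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Tag values taken from the list above.
const uint64_t kCompactPointer = 5;
const uint64_t kDeletedFile = 6;

// Appends v in varint format: 7 payload bits per byte, least significant
// group first, high bit set on every byte except the last.
void PutVarint64(std::string* dst, uint64_t v) {
  while (v >= 128) {
    dst->push_back(static_cast<char>((v & 127) | 128));
    v >>= 7;
  }
  dst->push_back(static_cast<char>(v));
}

// Appends a string as a varint length followed by the raw bytes.
void PutLengthPrefixed(std::string* dst, const std::string& s) {
  PutVarint64(dst, s.size());
  dst->append(s);
}

// {kDeletedFile, level, file number}
void EncodeDeletedFile(std::string* dst, int level, uint64_t file_number) {
  assert(level >= 0);
  PutVarint64(dst, kDeletedFile);
  PutVarint64(dst, static_cast<uint64_t>(level));
  PutVarint64(dst, file_number);
}

// {kCompactPointer, level, internal key}
void EncodeCompactPointer(std::string* dst, int level,
                          const std::string& encoded_key) {
  assert(level >= 0);
  PutVarint64(dst, kCompactPointer);
  PutVarint64(dst, static_cast<uint64_t>(level));
  PutLengthPrefixed(dst, encoded_key);
}
```

Small values such as tags and levels take a single byte, while a file number like 300 needs two.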

9.5 DB Interface

9.5.1 Interface Functions

In addition to the DB class, leveldb also exports a C-language interface; its declaration and implementation are in c.h and c.cc. It is simply a wrapper layer around leveldb::DB.
DB is a persistent ordered map from keys to values, and it is thread-safe. DB is only an abstract base class. Let's take a look at its interface:
The first is a static function that opens a database and returns OK on success. The opened DB pointer is stored in *dbptr. When finished with it, the caller must delete *dbptr.
    static Status Open(const Options& options, const std::string& name, DB** dbptr);
Next come the pure virtual functions, followed by two global functions. Why are the latter not static member functions like Open?
Note: for the update interfaces, consider setting options.sync = true. In addition, although Put and Delete are pure virtual functions, leveldb provides default implementations for them.
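A pure virtual function that still has a default implementation is a plain C++ feature, and it can be sketched as below. ToyDB and ToyDBImpl are hypothetical miniatures for this example, with strings and bool in place of Slice and Status.

```cpp
#include <cassert>
#include <map>
#include <string>

class ToyDB {
 public:
  virtual ~ToyDB() = default;
  // Pure virtual: every concrete subclass must override both.
  virtual bool Write(const std::string& key, const std::string& value) = 0;
  virtual bool Put(const std::string& key, const std::string& value) = 0;
};

// A pure virtual function may still have a body. Virtual dispatch never
// selects it, but a subclass can call it explicitly.
bool ToyDB::Put(const std::string& key, const std::string& value) {
  return Write(key, value);  // default behaviour: express Put via Write
}

class ToyDBImpl : public ToyDB {
 public:
  bool Write(const std::string& key, const std::string& value) override {
    data_[key] = value;
    return true;
  }
  bool Put(const std::string& key, const std::string& value) override {
    return ToyDB::Put(key, value);  // reuse the base-class default
  }
  const std::map<std::string, std::string>& data() const { return data_; }

 private:
  std::map<std::string, std::string> data_;
};

// Usage sketch: the call goes through the base pointer, reaches
// ToyDBImpl::Put, and from there the shared default in ToyDB::Put.
inline void PutExample() {
  ToyDBImpl db;
  ToyDB* base = &db;
  const bool ok = base->Put("k", "v");
  assert(ok);
  (void)ok;
}
```

The design keeps the base class abstract while letting every implementation share one definition of the simple operations.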

    // Set the DB entry {key, value}.
    virtual Status Put(const WriteOptions& options, const Slice& key, const Slice& value) = 0;

    // Delete "key" from the database. It is not an error if the key does not exist.
    virtual Status Delete(const WriteOptions& options, const Slice& key) = 0;

    // Apply a batch of updates.
    virtual Status Write(const WriteOptions& options, WriteBatch* updates) = 0;

    // Lookup. If the database contains "key", the value is stored in *value;
    // otherwise a status for which Status::IsNotFound() is true is returned.
    virtual Status Get(const ReadOptions& options, const Slice& key, std::string* value) = 0;

    // Return a heap-allocated iterator over the DB contents. The returned
    // iterator is initially invalid; the caller must call one of the Seek
    // methods before using it.
    virtual Iterator* NewIterator(const ReadOptions& options) = 0;

    // Return a handle to the current DB state. Iterators created with this
    // handle see a stable snapshot of the current DB state. When it is no
    // longer needed, call ReleaseSnapshot(result).
    virtual const Snapshot* GetSnapshot() = 0;

    // Release a previously acquired database snapshot.
    virtual void ReleaseSnapshot(const Snapshot* snapshot) = 0;

    // Lets a DB implementation expose its property states. If "property" is
    // valid, set "*value" to its current state and return true; otherwise
    // return false. Valid property names include:
    //   "leveldb.num-files-at-level<N>" - the number of files at level <N>,
    //       where <N> is the ASCII representation of the level (e.g. "0").
    //   "leveldb.stats" - a multi-line string describing statistics about
    //       the internal operation of the DB.
    //   "leveldb.sstables" - a multi-line string describing all sstables
    //       that make up the DB contents.
    virtual bool GetProperty(const Slice& property, std::string* value) = 0;

    // "sizes[i]" receives the file system space used by the keys in
    // "[range[i].start .. range[i].limit)".
    // Note: the value is the approximate file system space used; if the
    // user data compresses by a factor of ten, the returned size is 1/10 of
    // the user data. The result may not include recently written data.
    virtual void GetApproximateSizes(const Range* range, int n, uint64_t* sizes) = 0;

    // Compact the underlying storage for the key range [*begin, *end].
    // Deleted and overwritten versions are discarded, and the data is
    // rearranged to reduce the cost of access.
    // Note: users who do not understand the underlying implementation
    // should not call this method.
    // begin == NULL is treated as a key before all keys in the database.
    // end == NULL is treated as a key after all keys in the database.
    // The following call therefore compacts the entire DB:
    //   db->CompactRange(NULL, NULL);
    virtual void CompactRange(const Slice* begin, const Slice* end) = 0;

    // The last two are global functions: destroy and repair a DB.

    // Be careful: this deletes all contents of the specified DB.
    Status DestroyDB(const std::string& name, const Options& options);

    // If a database cannot be opened, this method may be called to try to
    // recover as much data as possible. Some data may be lost, so be careful
    // when calling it.
    Status RepairDB(const std::string& dbname, const Options& options);

9.5.2 Class Diagram

Several functional classes are involved here, as shown in Figure 9.5-1. In addition, there are the major components discussed earlier: the log reader and writer, the in-memory MemTable, InternalFilterPolicy, the internal key comparator, and the sstable reader and builder classes, as shown in Figure 9.5-2.

Figure 9.5-1

Figure 9.5-2

Many classes are involved here. Snapshot represents an in-memory snapshot; Version and VersionSet are the version classes described above.

9.6 DBImpl Class

Before going further, it is necessary to get to know the concrete implementation class DBImpl, mainly its member variables, which show which components it uses.
It is a huge class within the code base. For now we only take a first look; there is still a long way to go. Let's take a look at the class members.

    // === Group 1: these do not change after initialization in the constructor.
    // InternalKeyComparator and InternalFilterPolicy were analyzed in the
    // memtable and filter policy chapters respectively.
    Env* const env_;  // environment; wraps system-dependent file operations, threads, and so on
    const InternalKeyComparator internal_comparator_;     // key comparator
    const InternalFilterPolicy internal_filter_policy_;   // filter policy
    const Options options_;  // options_.comparator == &internal_comparator_
    bool owns_info_log_;
    bool owns_cache_;
    const std::string dbname_;

    // === Group 2: only two members.
    TableCache* table_cache_;  // table cache, thread-safe
    FileLock* db_lock_;        // locks the DB files and persistent state until the leveldb process ends

    // === Group 3: state and members protected by mutex_.
    port::Mutex mutex_;                  // mutex
    port::AtomicPointer shutting_down_;
    port::CondVar bg_cv_;                // signalled when background work finishes
    MemTable* mem_;
    MemTable* imm_;                      // memtable being compacted
    port::AtomicPointer has_imm_;        // lets the background thread detect a non-NULL imm_
    // These three are log-related.
    WritableFile* logfile_;              // log file
    uint64_t logfile_number_;            // log file number
    log::Writer* log_;                   // log writer

    // === Group 4: miscellaneous.
    std::deque<Writer*> writers_;        // queue of writers
    WriteBatch* tmp_batch_;
    SnapshotList snapshots_;             // snapshot list
    // Set of table files to protect from deletion because they are
    // part of ongoing compactions.
    std::set<uint64_t> pending_outputs_;
    bool bg_compaction_scheduled_;       // has a background compaction been scheduled or is one running?
    Status bg_error_;                    // have we encountered a background error in paranoid mode?
    ManualCompaction* manual_compaction_;            // manual compaction information
    CompactionStats stats_[config::kNumLevels];      // compaction statistics
    VersionSet* versions_;               // manages the DB's files across versions; another giant
