"The primary role of the log file in LevelDB is to ensure that no data is lost when the system recovers from a failure. Because a record is written to the log file before it is written to the in-memory memtable, even if the system crashes before the memtable data has been dumped to an sstable file on disk, LevelDB can rebuild the contents of the memtable from the log file, so no data is lost. LevelDB and Bigtable are consistent on this point." (http://www.cnblogs.com/haippy/archive/2011/12/04/2276064.html)
Preparatory work:
The log file involves only sequential reads and sequential writes, while other files in LevelDB (such as sstables) also involve random reads. The author wraps each type of operation in its own class:
class SequentialFile {
 public:
  virtual Status Read(size_t n, Slice* result, char* scratch) = 0;
  virtual Status Skip(uint64_t n) = 0;
};

class RandomAccessFile {
 public:
  virtual Status Read(uint64_t offset, size_t n, Slice* result,
                      char* scratch) const = 0;
};

class WritableFile {
 public:
  virtual Status Append(const Slice& data) = 0;
  virtual Status Close() = 0;
  virtual Status Flush() = 0;
  virtual Status Sync() = 0;
};
Operating-system-specific operations are implemented differently on each system, so the author also wraps them in a unified, cross-platform interface: the "environment" class Env.
class Env {
 public:
  Env() {}
  virtual ~Env();

  // Return a default environment suitable for the current operating
  // system.
  static Env* Default();

  // The concrete class for each file type is implemented per operating
  // system and returned through these factory methods.
  virtual Status NewSequentialFile(const std::string& fname,
                                   SequentialFile** result) = 0;
  virtual Status NewRandomAccessFile(const std::string& fname,
                                     RandomAccessFile** result) = 0;
  virtual Status NewWritableFile(const std::string& fname,
                                 WritableFile** result) = 0;
  // ......
};
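To make the factory idea concrete, here is a minimal sketch of the same pattern with an in-memory "platform". Everything in it is a hypothetical stand-in and not LevelDB's API: SketchWritableFile, SketchEnv and MemEnv are invented names, bool replaces Status, and std::string replaces Slice.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-ins for illustration only; these are NOT LevelDB classes.
class SketchWritableFile {
 public:
  virtual ~SketchWritableFile() = default;
  virtual bool Append(const std::string& data) = 0;
};

class SketchEnv {
 public:
  virtual ~SketchEnv() = default;
  // Factory method: each environment hands back its own file implementation.
  virtual bool NewWritableFile(const std::string& fname,
                               std::unique_ptr<SketchWritableFile>* result) = 0;
};

// An in-memory environment: a "file" is just a string stored in a map.
class MemEnv : public SketchEnv {
 public:
  bool NewWritableFile(const std::string& fname,
                       std::unique_ptr<SketchWritableFile>* result) override {
    assert(result != nullptr);
    files_[fname].clear();  // creating a writable file truncates it
    result->reset(new MemFile(&files_[fname]));
    return true;
  }

  const std::string& Contents(const std::string& fname) {
    return files_[fname];
  }

 private:
  class MemFile : public SketchWritableFile {
   public:
    explicit MemFile(std::string* buf) : buf_(buf) {}
    bool Append(const std::string& data) override {
      buf_->append(data);
      return true;
    }

   private:
    std::string* buf_;  // owned by the enclosing MemEnv
  };

  std::map<std::string, std::string> files_;
};
```

Code that writes a log only sees the abstract file type, so swapping the environment (a real file system, or memory for tests) does not touch the writer.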
Write
Under the leveldb namespace there is a sub-namespace named log, which contains two implementation classes, Writer and Reader. Following the naming conventions seen in the previous sections, Writer is effectively a builder; its only public method, AddRecord, appends a record to the log.
Status Writer::AddRecord(const Slice& slice) {
  const char* ptr = slice.data();
  size_t left = slice.size();

  // Fragment the record if necessary and emit it.  Note that if slice
  // is empty, we still want to iterate once to emit a single
  // zero-length record
  Status s;
  bool begin = true;
  do {
    const int leftover = kBlockSize - block_offset_;  // 1. Space remaining in the current block
    assert(leftover >= 0);
    if (leftover < kHeaderSize) {  // 2. Not enough room for a header: pad the remainder
      // Switch to a new block
      if (leftover > 0) {
        // Fill the trailer (literal below relies on kHeaderSize being 7)
        assert(kHeaderSize == 7);
        dest_->Append(Slice("\x00\x00\x00\x00\x00\x00", leftover));
      }
      block_offset_ = 0;
    }

    // Invariant: we never leave < kHeaderSize bytes in a block.
    assert(kBlockSize - block_offset_ - kHeaderSize >= 0);

    const size_t avail = kBlockSize - block_offset_ - kHeaderSize;
    const size_t fragment_length = (left < avail) ? left : avail;  // 3. Payload bytes that fit in the current block

    RecordType type;  // 4. Record type
    const bool end = (left == fragment_length);
    if (begin && end) {
      type = kFullType;
    } else if (begin) {
      type = kFirstType;
    } else if (end) {
      type = kLastType;
    } else {
      type = kMiddleType;
    }

    s = EmitPhysicalRecord(type, ptr, fragment_length);  // 5. Write to the file
    ptr += fragment_length;
    left -= fragment_length;
    begin = false;
  } while (s.ok() && left > 0);
  return s;
}
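The splitting logic is easier to follow when it is separated from the I/O. The following is a rough sketch that only computes which fragments a record would be cut into. PlanFragments, Fragment and the short enum names are hypothetical and exist only for this illustration. The sketch also assumes that the block offset advances by header size plus payload size for every fragment, which in the real code happens inside EmitPhysicalRecord (not shown in the listing).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Values taken from the text: 32 KB blocks and a 7-byte record header.
const int kBlockSize = 32768;
const int kHeaderSize = 7;

enum FragmentType { kFull, kFirst, kMiddle, kLast };  // hypothetical names

struct Fragment {
  FragmentType type;
  size_t length;  // payload bytes only, header not included
};

// Hypothetical helper: returns the fragments that a record of `left` bytes
// would be split into when writing starts at *block_offset, and advances
// *block_offset accordingly. No bytes are actually written.
std::vector<Fragment> PlanFragments(int* block_offset, size_t left) {
  std::vector<Fragment> plan;
  bool begin = true;
  do {
    const int leftover = kBlockSize - *block_offset;
    assert(leftover >= 0);
    if (leftover < kHeaderSize) {
      // The real writer zero-pads the tail here; we only move to a new block.
      *block_offset = 0;
    }
    const size_t avail = kBlockSize - *block_offset - kHeaderSize;
    const size_t fragment_length = (left < avail) ? left : avail;
    const bool end = (left == fragment_length);

    FragmentType type;
    if (begin && end) {
      type = kFull;
    } else if (begin) {
      type = kFirst;
    } else if (end) {
      type = kLast;
    } else {
      type = kMiddle;
    }

    plan.push_back({type, fragment_length});
    // Assumption: mirrors the offset update done when a fragment is emitted.
    *block_offset += kHeaderSize + static_cast<int>(fragment_length);
    left -= fragment_length;
    begin = false;
  } while (left > 0);
  return plan;
}
```

For example, starting from an empty block, a 40000-byte record is planned as a first fragment of 32761 bytes (32768 minus the 7-byte header) followed by a last fragment of 7239 bytes in the next block. A zero-length record still produces one full fragment, as the comment in the original function notes.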
Notes:
- The log file is logically divided into blocks of 32 KB each.
- Each record consists of a record header followed by the record content; the header is kHeaderSize (7 bytes) long.
- When the space left in the current block is too small to hold a record header (fewer than 7 bytes), the remainder is filled with zero bytes taken from the "\x00\x00\x00\x00\x00\x00" placeholder, and writing continues in a new block.
- When a record does not fit entirely in one block, the type field marks which part of the record the fragment in the current block is (first, middle, or last), so that the full record can be stitched back together by type when reading.
- EmitPhysicalRecord writes the record data into the block.
- Each record is laid out as follows:
Header | Record Content |
CRC (4 bytes) | Record Size (2 bytes) | Type (1 byte) | Record Content |
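As a small illustration of the layout, the sketch below packs the 7-byte header: 4 bytes of checksum, 2 bytes of record size, and 1 byte of type. EncodeHeader is a hypothetical helper written for this note. The checksum is assumed to be computed elsewhere and is simply passed in, and the little-endian byte order reflects my reading of the LevelDB source rather than anything stated above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

const int kHeaderSize = 7;  // 4 (checksum) + 2 (record size) + 1 (type)

// Hypothetical helper: packs a record header into 7 bytes.
// `crc` is assumed to be computed by the caller; the checksum algorithm
// itself is outside the scope of this sketch.
std::string EncodeHeader(uint32_t crc, size_t length, unsigned char type) {
  assert(length <= 0xffff);  // the size field is only two bytes wide
  std::string buf(kHeaderSize, '\0');
  // Checksum, least significant byte first (assumed little-endian).
  buf[0] = static_cast<char>(crc & 0xff);
  buf[1] = static_cast<char>((crc >> 8) & 0xff);
  buf[2] = static_cast<char>((crc >> 16) & 0xff);
  buf[3] = static_cast<char>((crc >> 24) & 0xff);
  // Record size, least significant byte first.
  buf[4] = static_cast<char>(length & 0xff);
  buf[5] = static_cast<char>((length >> 8) & 0xff);
  // Fragment type (full, first, middle, last).
  buf[6] = static_cast<char>(type);
  return buf;
}
```

Because the size field is two bytes wide, it can describe any fragment that fits in a 32 KB block, which is why a record larger than one block has to be split.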
Read
The log reading logic is nothing special, so it is omitted here.
Originally I intended to cover CURRENT, MANIFEST, and the log together in one note, but understanding MANIFEST requires a clear picture of LevelDB's version mechanism, which is a rich topic in its own right.
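For readers who still want a picture of the stitching step mentioned in the notes, here is a minimal sketch. It is not the Reader class: Reassemble and the enum names are hypothetical, fragments are assumed to be already parsed into (type, payload) pairs, and checksum verification and corruption handling are left out entirely.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

enum FragmentType { kFull, kFirst, kMiddle, kLast };  // hypothetical names

typedef std::pair<FragmentType, std::string> ParsedFragment;

// Hypothetical helper: joins fragments back into complete records.
// Returns false if the sequence of types is inconsistent, for example a
// middle fragment that was not preceded by a first fragment.
bool Reassemble(const std::vector<ParsedFragment>& frags,
                std::vector<std::string>* records) {
  assert(records != nullptr);
  std::string scratch;     // accumulates a record that spans several blocks
  bool in_record = false;  // true between a first and a last fragment
  for (const ParsedFragment& f : frags) {
    switch (f.first) {
      case kFull:
        if (in_record) return false;
        records->push_back(f.second);
        break;
      case kFirst:
        if (in_record) return false;
        scratch = f.second;
        in_record = true;
        break;
      case kMiddle:
        if (!in_record) return false;
        scratch += f.second;
        break;
      case kLast:
        if (!in_record) return false;
        scratch += f.second;
        records->push_back(scratch);
        in_record = false;
        break;
    }
  }
  return !in_record;  // a trailing unfinished record counts as inconsistent
}
```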
LevelDB source code, part 4: the log file