A persistent key-value storage engine for billion-record data sets: how LevelDB is implemented

LevelDB Daily Notes, Part 1: LevelDB

You may not be familiar with LevelDB, but if you are an IT engineer who has never heard of the following two legendary engineers, your manager will probably be dismayed: Jeff Dean and Sanjay Ghemawat. Both are heavyweight engineers at Google and two of its handful of Google Fellows.

Jeff Dean: http://research.google.com/people/jeff/index.html. He is one of the principal designers and implementers of Google's large-scale distributed platforms BigTable and MapReduce.

Sanjay Ghemawat: http://research.google.com/people/sanjay/index.html. He is one of the principal designers and implementers of Google's large-scale distributed platforms GFS, BigTable, and MapReduce.

LevelDB is an open-source project started by these two engineers. In short, LevelDB is a C++ library that provides persistent key-value storage for data sets on the order of a billion records. As noted above, both men designed and implemented BigTable. If you know BigTable, you know that this influential distributed storage system has two core components: the master server and the tablet server. The master server stores management data and handles distributed scheduling, while the actual distributed data storage and the read and write operations are carried out by the tablet servers. LevelDB can be understood as a simplified version of a tablet server.

LevelDB has the following characteristics:

First, LevelDB is a persistent key-value store. Unlike an in-memory key-value system such as Redis, LevelDB does not consume memory the way Redis does; it keeps most of its data on disk.

Second, LevelDB stores records ordered by key: records with adjacent keys are stored next to each other in the storage files. The application can supply its own key comparison function, and LevelDB will then store records in the order that function defines.
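The effect of a user-defined comparison function on storage order can be sketched in a few lines. This is a minimal illustration in Python, not LevelDB's C++ comparator interface; the function names `bytewise`, `case_insensitive`, and `stored_order` and the sample keys are assumptions made for the example.

```python
from functools import cmp_to_key

def bytewise(a: str, b: str) -> int:
    """Plain lexicographic comparison: negative, zero, or positive."""
    return (a > b) - (a < b)

def case_insensitive(a: str, b: str) -> int:
    """A hypothetical user-defined ordering that ignores letter case."""
    return bytewise(a.lower(), b.lower())

def stored_order(keys, compare):
    """Return the keys in the order an ordered store would lay them out."""
    return sorted(keys, key=cmp_to_key(compare))

keys = ["banana", "Cherry", "apple"]
default_order = stored_order(keys, bytewise)         # uppercase sorts first
custom_order = stored_order(keys, case_insensitive)  # case is ignored

print(default_order)  # ['Cherry', 'apple', 'banana']
print(custom_order)   # ['apple', 'banana', 'Cherry']
```

The same set of keys ends up in a different physical order depending on the comparison function, which is why the function has to stay the same for the lifetime of a database.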

Third, like most key-value systems, LevelDB has a simple interface. The basic operations are writing a record, reading a record, and deleting a record. It also supports atomic batches that group several operations together.
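To make the idea of an atomic batch concrete, here is a toy in-memory store in Python. It is a sketch under stated assumptions: the class `ToyStore`, its method names, and the tuple format of a batch are invented for illustration and are not LevelDB's API. The batch is staged on a copy and swapped in only if every operation is valid, so either all operations take effect or none do.

```python
class ToyStore:
    """Hypothetical in-memory key-value store with an all-or-nothing batch."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def delete(self, key):
        self._data.pop(key, None)

    def write(self, batch):
        """Apply a list of (op, key, value) tuples atomically."""
        staged = dict(self._data)  # work on a copy
        for op, key, value in batch:
            if op == "put":
                staged[key] = value
            elif op == "delete":
                staged.pop(key, None)
            else:
                raise ValueError(f"unknown operation: {op}")
        self._data = staged  # publish only after every operation succeeded

store = ToyStore()
store.put("key1", "v")

# Move the value from key1 to key2 in one atomic step.
store.write([("delete", "key1", None), ("put", "key2", "v")])
moved = (store.get("key1"), store.get("key2"))

# A batch containing an invalid operation leaves the store untouched.
try:
    store.write([("put", "key3", "x"), ("bogus", "key4", None)])
except ValueError:
    pass
after_failed = store.get("key3")

print(moved)         # (None, 'v')
print(after_failed)  # None
```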

In addition, LevelDB supports snapshots, so that reads are not affected by concurrent writes and always see a consistent view of the data for the duration of the read.
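The article does not describe how snapshots work internally. One common way to get this behavior is to tag every write with an increasing sequence number and let a snapshot remember the number that was current when it was taken. The Python sketch below illustrates that idea only; `VersionedStore` and its methods are hypothetical names, and deletions are left out for brevity.

```python
class VersionedStore:
    """Hypothetical store that keeps every version of a key, tagged by sequence number."""

    def __init__(self):
        self._seq = 0
        self._versions = {}  # key -> list of (sequence number, value), oldest first

    def put(self, key, value):
        self._seq += 1
        self._versions.setdefault(key, []).append((self._seq, value))

    def snapshot(self):
        """A snapshot is just the sequence number of the latest write so far."""
        return self._seq

    def get(self, key, snapshot=None):
        """Return the newest value visible at the snapshot (or the latest value)."""
        limit = self._seq if snapshot is None else snapshot
        for seq, value in reversed(self._versions.get(key, [])):
            if seq <= limit:
                return value
        return None

db = VersionedStore()
db.put("k", "old")
snap = db.snapshot()
db.put("k", "new")  # a later write must not disturb the snapshot

print(db.get("k", snap))  # old
print(db.get("k"))        # new
```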

LevelDB also supports data compression, which directly helps reduce storage space and improve I/O efficiency.
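As a rough illustration of why compression helps with key-ordered data, the Python sketch below compresses a block of records whose keys share long prefixes. The article does not name the codec LevelDB uses; `zlib` stands in here only because it ships with Python, and the record format is made up for the example.

```python
import zlib

# Hypothetical block of records: sorted keys share long common prefixes,
# which is the kind of redundancy a compressor exploits.
records = "".join(f"user:{i:06d}=active;" for i in range(1000)).encode("utf-8")

compressed = zlib.compress(records)
restored = zlib.decompress(compressed)

# Fewer bytes on disk also means fewer bytes to read per block.
print(len(records), len(compressed))
```

The exact ratio depends on the data and the codec, so no specific figure should be read into this example.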

LevelDB's performance is outstanding. The official site reports random writes at 400,000 records per second and random reads at 60,000 records per second. In general, LevelDB writes are much faster than reads, and sequential reads and writes are much faster than random ones. The underlying reasons should become clear as you read the later notes in this series.

LevelDB Daily Notes, Part 2: Overall Architecture

LevelDB is essentially a storage system plus a set of operations exposed on top of it. To understand the system and how it processes requests, we can look at LevelDB from two angles: static and dynamic. From the static angle, imagine the system has been running for a while (continuously inserting, deleting, and reading data) and we take a photograph of it. The photograph shows how the data is distributed across memory and disk and what state it is in. The dynamic angle is about how the system writes a record, reads a record, and deletes a record. It also covers the internal operations beyond these interface calls, such as compaction, and how the system recovers after a crash.

The overall architecture described in this section is mainly the static view. The sections that follow describe in detail the files and in-memory data structures that make up this static structure, and the later notes in the series mainly cover the dynamic view, that is, how the whole system operates.

As a storage system, LevelDB keeps data on two kinds of storage media: memory and disk files. If, as described above, LevelDB has been running for a while and we take an X-ray photograph of it, we see the following scene:

Figure 1.1: LevelDB structure

As the figure shows, the static structure of LevelDB consists of six main parts: the memtable and the immutable memtable in memory, and four main kinds of file on disk: the Current file, the Manifest file, the log files, and the sstable files. LevelDB has some auxiliary files as well, but these six files and data structures are its main building blocks.

LevelDB's log file and memtable work as described in the BigTable paper. When the application writes a key:value record, LevelDB first appends it to the log file and, once that succeeds, inserts the record into the memtable. At that point the write is essentially complete. Because a write involves only one sequential disk write and one in-memory insert, writes are extremely fast.

The main role of the log file is crash recovery without data loss. Without a log file, a newly written record would initially exist only in memory; if the system crashed before the in-memory data had been dumped to disk, that data would be lost (Redis has this problem). To avoid this, LevelDB records each operation in the log file before applying it to memory. Even if the system crashes, the memtable can be rebuilt from the log file, so no data is lost.
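The log-then-memtable write path and the recovery step can be sketched as follows. This is a minimal Python illustration, not LevelDB's implementation: the class `ToyDB`, the JSON-lines log format, and the file name `000001.log` are assumptions chosen for the example.

```python
import json
import os
import tempfile

class ToyDB:
    """Hypothetical store: every write goes to an append-only log, then to memory."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.memtable = {}
        self._recover()  # rebuild the memtable from any existing log
        self._log = open(log_path, "a", encoding="utf-8")

    def _recover(self):
        """Replay the log in order; later records overwrite earlier ones."""
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, encoding="utf-8") as f:
            for line in f:
                key, value = json.loads(line)
                self.memtable[key] = value

    def put(self, key, value):
        # Step 1: one sequential append to the log, forced to disk.
        self._log.write(json.dumps([key, value]) + "\n")
        self._log.flush()
        os.fsync(self._log.fileno())
        # Step 2: one in-memory insert, only after the log write succeeded.
        self.memtable[key] = value

    def close(self):
        self._log.close()

path = os.path.join(tempfile.mkdtemp(), "000001.log")

db = ToyDB(path)
db.put("k1", "v1")
db.put("k2", "v2")
db.close()
del db  # stand-in for a crash: the in-memory memtable is gone

recovered = ToyDB(path)  # a fresh process replays the log
print(recovered.memtable)  # {'k1': 'v1', 'k2': 'v2'}
recovered.close()
```

Forcing every write to disk with `fsync` is the most conservative choice; whether to pay that cost on each write is a durability trade-off that this sketch does not explore.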

When the data inserted into the memtable grows to a memory limit, it has to be exported to a file on external storage. LevelDB then creates a new log file and a new memtable, and the original memtable becomes the immutable memtable. As the name suggests, its contents can no longer be changed: it can be read, but not written to or deleted from. New data goes to the new log file and the new memtable, while LevelDB's background scheduler exports the immutable memtable to disk as a new sstable file. Sstables are produced by repeatedly exporting in-memory data and running compaction, and the sstable files are organized into levels: the first level is level 0, the second is level 1, and so on with increasing level numbers. This is why the system is called LevelDB.
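The rotation from memtable to immutable memtable to a level 0 sstable can be sketched like this. It is a simplified Python model under stated assumptions: `ToyLSM` is a hypothetical name, the limit counts entries rather than bytes, sstables are kept as in-memory sorted lists instead of files, and the flush runs inline rather than in a background thread.

```python
class ToyLSM:
    """Hypothetical model of the memtable -> immutable memtable -> sstable flow."""

    def __init__(self, limit):
        self.limit = limit     # entry count that triggers a rotation (assumed unit)
        self.memtable = {}     # accepts new writes
        self.immutable = None  # frozen memtable waiting to be exported
        self.level0 = []       # each sstable is a list of (key, value) sorted by key

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.limit:
            self._rotate()

    def _rotate(self):
        # Freeze the full memtable and start a fresh one for new writes.
        self.immutable = self.memtable
        self.memtable = {}
        self._flush()  # done inline here; the article describes a background scheduler

    def _flush(self):
        # Export the frozen data in key order as a new level 0 sstable.
        self.level0.append(sorted(self.immutable.items()))
        self.immutable = None

lsm = ToyLSM(limit=2)
for key in ["c", "a", "b"]:
    lsm.put(key, key.upper())

print(lsm.level0)    # [[('a', 'A'), ('c', 'C')]]
print(lsm.memtable)  # {'b': 'B'}
```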

Within an sstable file, records are ordered by key: records with smaller keys come before records with larger keys. This holds for sstables at every level. One point deserves attention, though: the sstable files at level 0 (suffix .sst) are special compared with files at other levels, because two .sst files at this level may overlap in key range. For example, take two level 0 .sst files, A and B. If the key range of file A is {bar, car} and the key range of file B is {blue, samecity}, then both files may well contain a record with the same key, for example key="boat", which lies in both ranges. For sstable files at other levels, the .sst files within the same level do not overlap in keys, that is, levels L
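The consequence of overlapping key ranges at level 0 can be checked with a short sketch. The Python below uses the two key ranges from the example and plain string comparison; the function names and the lookup keys "boat" and "bat" are chosen here for illustration. A key that lies in both ranges forces a lookup to consider both files.

```python
def ranges_overlap(a, b):
    """Two (smallest, largest) key ranges overlap if neither ends before the other starts."""
    return a[0] <= b[1] and b[0] <= a[1]

def files_possibly_containing(key, files):
    """Return the names of files whose key range covers the key."""
    return [name for name, (smallest, largest) in files.items()
            if smallest <= key <= largest]

# Key ranges of the two level 0 files from the example.
level0 = {"A": ("bar", "car"), "B": ("blue", "samecity")}

print(ranges_overlap(level0["A"], level0["B"]))   # True
print(files_possibly_containing("boat", level0))  # ['A', 'B']
print(files_possibly_containing("bat", level0))   # ['A']
```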
