161104, NoSQL database: an introduction to the key/value store LevelDB and its Java implementation

Source: Internet
Author: User

Summary: LevelDB is a very efficient key/value database developed by Google. It can hold billions of records and still perform very well at that scale, largely thanks to its design, in particular the LSM (log-structured merge) approach. LevelDB is used from a single process. On a machine with a 4-core Q6600 CPU it can write more than 400,000 records per second and perform more than 100,000 random reads per second.

Principles (the design is easier to follow with a schematic diagram at hand; it closely resembles some components of Hadoop's implementation)


1. Files

The LevelDB implementation is similar to a single tablet in Google's BigTable, except that the underlying file organization is slightly different.

Each database consists of a set of local files of several types:

Log file

The log file stores a sequence of the most recent updates. Each update is appended to the end of the current log file. When the log file reaches a predetermined size, its contents are converted into a sorted table (.sst) file, and a new log file is created to hold subsequent updates.

A copy of the current log file's data is kept in an in-memory structure called the memtable. Every read consults the memtable, so reads reflect these updates.
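The log/memtable pair can be illustrated with a minimal in-memory Java sketch. It is not LevelDB's code: the class `ToyWriteAheadStore` is hypothetical, the log is modeled as a `List` of strings instead of an on-disk file, and the memtable is a `TreeMap`, which keeps keys sorted (LevelDB itself uses a skip list).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.TreeMap;

// Hypothetical sketch: an append-only log plus a sorted in-memory memtable.
public class ToyWriteAheadStore {
    // Stand-in for the on-disk log file: records are only ever appended.
    private final List<String> log = new ArrayList<>();
    // The memtable holds the same data, sorted by key, for fast lookups.
    private final TreeMap<String, String> memtable = new TreeMap<>();

    public void put(String key, String value) {
        log.add("PUT " + key + "=" + value); // 1. append to the end of the log
        memtable.put(key, value);            // 2. apply the update to the memtable
    }

    public Optional<String> get(String key) {
        // Reads consult the memtable, so they see updates not yet in any SST file.
        return Optional.ofNullable(memtable.get(key));
    }

    public int logLength() {
        return log.size();
    }

    public static void main(String[] args) {
        ToyWriteAheadStore store = new ToyWriteAheadStore();
        store.put("b", "2");
        store.put("a", "1");
        store.put("b", "3");
        System.out.println(store.get("b").orElse("missing")); // 3
        System.out.println(store.logLength());                // 3
    }
}
```

Note that the log keeps every update (three records), while the memtable keeps only the latest value per key.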

Sorted tables (referred to as SST)

A sorted table (.sst) file stores a sequence of entries sorted by key. Each entry is either a key/value pair or a deletion marker for a key. (Deletion markers hide older data stored in earlier SST files: once a key is marked as deleted, its data in older SST files is no longer read.)
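The masking effect of deletion markers can be sketched as a lookup that walks the tables from newest to oldest and stops at the first entry for the key. The names `TombstoneLookup` and `Entry` are hypothetical, and each table is modeled as an in-memory `TreeMap` rather than an .sst file.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// Hypothetical sketch of how deletion markers mask older data.
public class TombstoneLookup {
    // An entry is either a value or a deletion marker (tombstone).
    static final class Entry {
        final String value; // null when this entry is a deletion marker

        private Entry(String value) {
            this.value = value;
        }

        static Entry put(String value) {
            return new Entry(value);
        }

        static Entry delete() {
            return new Entry(null);
        }

        boolean isTombstone() {
            return value == null;
        }
    }

    // Tables are ordered newest first; each one is sorted by key.
    static Optional<String> get(List<TreeMap<String, Entry>> newestFirst, String key) {
        for (Map<String, Entry> table : newestFirst) {
            Entry e = table.get(key);
            if (e != null) {
                // The newest entry wins; a tombstone hides every older value.
                return e.isTombstone() ? Optional.empty() : Optional.of(e.value);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        TreeMap<String, Entry> older = new TreeMap<>();
        older.put("a", Entry.put("old-a"));
        older.put("b", Entry.put("old-b"));
        TreeMap<String, Entry> newer = new TreeMap<>();
        newer.put("a", Entry.delete());
        List<TreeMap<String, Entry>> tables = List.of(newer, older);
        System.out.println(get(tables, "a")); // Optional.empty
        System.out.println(get(tables, "b")); // Optional[old-b]
    }
}
```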

Sorted tables are organized into a sequence of levels. An SST generated from a log file is placed in a special young level, level-0. When the number of young files exceeds a threshold (4), the young files are merged with all overlapping level-1 files to produce a new sequence of level-1 files (one new file per 2MB of data).

Note: "overlap" means that the key ranges of two files intersect. Keys within an SST file are strictly sorted.
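A minimal sketch of the overlap test, assuming each file is described only by its smallest and largest key (the class `KeyRangeOverlap` is hypothetical). Keys are Java `String`s compared with `compareTo`, which matches LevelDB's default byte-wise ordering for ASCII keys such as these.

```java
// Hypothetical sketch: do the key ranges of two SST files overlap?
public class KeyRangeOverlap {
    // A file's key range is [smallest, largest], both ends inclusive.
    static final class FileRange {
        final String smallest;
        final String largest;

        FileRange(String smallest, String largest) {
            this.smallest = smallest;
            this.largest = largest;
        }
    }

    // Two closed intervals overlap unless one ends before the other starts.
    static boolean overlaps(FileRange a, FileRange b) {
        return a.smallest.compareTo(b.largest) <= 0
            && b.smallest.compareTo(a.largest) <= 0;
    }

    public static void main(String[] args) {
        FileRange f1 = new FileRange("apple", "mango");
        FileRange f2 = new FileRange("kiwi", "zebra");
        FileRange f3 = new FileRange("nectarine", "pear");
        System.out.println(overlaps(f1, f2)); // true
        System.out.println(overlaps(f1, f3)); // false
    }
}
```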

Files in the young level may contain overlapping keys, but the SST files in every other level have distinct, non-overlapping key ranges. For level-L with L >= 1, when the combined size of its SST files exceeds 10^L MB (10MB for level-1, 100MB for level-2, and so on), one file in level-L is merged with all overlapping files in level-(L+1) to produce a new set of level-(L+1) files. These merges gradually migrate new updates from the young level to the largest level using only bulk reads and writes.
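The size rule for level-L (L >= 1) can be written down directly. In this hypothetical `LevelLimits` sketch, each level's total size is divided by its 10^L MB limit. The level with the largest ratio at or above 1.0 is chosen for compaction. Picking the largest ratio and treating MB as 1024 * 1024 bytes are both assumptions of the sketch.

```java
// Hypothetical sketch of the 10^L MB size rule for levels L >= 1.
public class LevelLimits {
    static final double MB = 1024.0 * 1024.0;

    // Limit for level L: 10MB, 100MB, 1000MB, ...
    static double maxBytesForLevel(int level) {
        if (level < 1) {
            throw new IllegalArgumentException("level-0 is limited by file count, not size");
        }
        return Math.pow(10, level) * MB;
    }

    // Ratio of actual size to the limit; 1.0 or more means the level is over its limit.
    static double ratio(int level, long totalBytes) {
        return totalBytes / maxBytesForLevel(level);
    }

    // bytesPerLevel[i] is the total size of level i (index 0 is ignored here).
    // Returns the level furthest over its limit, or -1 if no level needs compaction.
    static int pickLevelToCompact(long[] bytesPerLevel) {
        int best = -1;
        double bestRatio = 1.0;
        for (int level = 1; level < bytesPerLevel.length; level++) {
            double r = ratio(level, bytesPerLevel[level]);
            if (r >= bestRatio) {
                best = level;
                bestRatio = r;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(maxBytesForLevel(1) / MB); // 10.0
        System.out.println(maxBytesForLevel(3) / MB); // 1000.0
        long mb = 1024L * 1024;
        // level-1 holds 5MB (ratio 0.5), level-2 holds 200MB (ratio 2.0)
        System.out.println(pickLevelToCompact(new long[] {0, 5 * mb, 200 * mb})); // 2
    }
}
```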

Manifest

The MANIFEST file lists the SST files that make up each level, along with their key ranges and other important metadata. A new MANIFEST file (with a new number embedded in its file name) is created whenever the database is reopened. The MANIFEST is formatted as a log: changes to the serving state (such as SST file creation and deletion) are appended to it.
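Because the MANIFEST is a log of changes, the current set of files per level can be rebuilt by replaying its records in order. The sketch below is hypothetical. `ManifestReplay` and `Edit` are invented names, and real MANIFEST records are binary and carry more metadata than a level and a file number.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: rebuild "which SST files make up each level"
// by replaying manifest-style records in the order they were appended.
public class ManifestReplay {
    static final class Edit {
        final boolean added;   // true = file created, false = file deleted
        final int level;
        final long fileNumber;

        Edit(boolean added, int level, long fileNumber) {
            this.added = added;
            this.level = level;
            this.fileNumber = fileNumber;
        }
    }

    static Map<Integer, TreeSet<Long>> replay(List<Edit> log) {
        Map<Integer, TreeSet<Long>> levels = new TreeMap<>();
        for (Edit e : log) {
            TreeSet<Long> files = levels.computeIfAbsent(e.level, k -> new TreeSet<>());
            if (e.added) {
                files.add(e.fileNumber);
            } else {
                files.remove(e.fileNumber);
            }
        }
        return levels;
    }

    public static void main(String[] args) {
        List<Edit> log = List.of(
            new Edit(true, 0, 5),   // memtable flushed to level-0
            new Edit(true, 0, 6),
            new Edit(true, 1, 7),   // compaction output written to level-1 ...
            new Edit(false, 0, 5),  // ... and its level-0 inputs removed
            new Edit(false, 0, 6));
        System.out.println(replay(log)); // {0=[], 1=[7]}
    }
}
```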

Current

The CURRENT file is a simple text file that holds the name of the latest MANIFEST file.
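Reading CURRENT amounts to reading one line of text. Here is a minimal sketch with `java.nio.file`. It assumes the file holds a name such as `MANIFEST-000005` followed by a newline. The class name `CurrentFileReader` is hypothetical, and the MANIFEST itself is not parsed here.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: resolve the MANIFEST file named by dbDir/CURRENT.
public class CurrentFileReader {
    static Path currentManifest(Path dbDir) throws IOException {
        byte[] raw = Files.readAllBytes(dbDir.resolve("CURRENT"));
        String name = new String(raw, StandardCharsets.UTF_8).trim(); // drop the trailing newline
        if (!name.startsWith("MANIFEST-")) {
            throw new IOException("unexpected CURRENT content: " + name);
        }
        return dbDir.resolve(name);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("toy-db");
        Files.write(dir.resolve("CURRENT"), "MANIFEST-000005\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(currentManifest(dir).getFileName()); // MANIFEST-000005
    }
}
```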

Others: omitted.

2. Level 0

When the log file grows above a certain size (1MB by default):

    • Create a new memtable and a new log file, and direct future updates to them.

    • In the background: write the contents of the old memtable to a new SST file, then discard that memtable. Delete the old log file and add the new SST file to the young level (level-0).
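The rollover can be sketched in memory as follows. This is a hypothetical model (`MemtableRollover` is not a LevelDB class). Entry size is approximated by character count, the log file is left out, and the flush runs synchronously instead of on a background thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of memtable rollover and the flush to level-0.
public class MemtableRollover {
    private final long limitBytes;
    private TreeMap<String, String> memtable = new TreeMap<>();
    private long approxBytes = 0;
    // Each level-0 "SST" is modeled as an immutable, key-ordered list of entries.
    final List<List<Map.Entry<String, String>>> level0 = new ArrayList<>();

    MemtableRollover(long limitBytes) {
        this.limitBytes = limitBytes;
    }

    void put(String key, String value) {
        memtable.put(key, value);
        approxBytes += key.length() + value.length(); // crude size estimate
        if (approxBytes >= limitBytes) {
            TreeMap<String, String> old = memtable;
            memtable = new TreeMap<>(); // later writes go to a fresh memtable
            approxBytes = 0;
            flushToLevel0(old);         // LevelDB does this in the background
        }
    }

    private void flushToLevel0(TreeMap<String, String> frozen) {
        // Iterating a TreeMap yields entries in key order, i.e. a sorted table.
        level0.add(new ArrayList<>(frozen.entrySet()));
    }

    public static void main(String[] args) {
        MemtableRollover db = new MemtableRollover(10); // tiny limit for demonstration
        db.put("a", "12345");
        db.put("b", "6789"); // reaches the limit: {a, b} becomes a level-0 table
        db.put("c", "x");
        System.out.println(db.level0.size()); // 1
        System.out.println(db.level0.get(0)); // [a=12345, b=6789]
        System.out.println(db.memtable);      // {c=x}
    }
}
```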

3. Compactions

When the size of level-L exceeds its limit, a background thread compacts it. The compaction picks a file from level-L and all files in level-(L+1) whose keys overlap that file. If a level-L file overlaps only part of a level-(L+1) file, the entire level-(L+1) file is still used as input to the compaction, and it is discarded once the compaction finishes. Level-0 is special because its files may overlap each other, so a level-0 to level-1 compaction needs special handling: it may pick more than one level-0 file as input when those files overlap.
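Input selection can be sketched with files described only by their key ranges. The classes and methods below (`CompactionInputs`, `SstFile`, `expandLevel0`, `overlappingInNextLevel`) are hypothetical. The level-0 step keeps widening the key range until no further level-0 file overlaps it, which is one reasonable reading of the rule above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of picking compaction inputs; not LevelDB's real code.
public class CompactionInputs {
    static final class SstFile {
        final String name;
        final String smallest;
        final String largest;

        SstFile(String name, String smallest, String largest) {
            this.name = name;
            this.smallest = smallest;
            this.largest = largest;
        }

        @Override
        public String toString() {
            return name;
        }
    }

    // True if file f's key range intersects [lo, hi].
    static boolean overlaps(String lo, String hi, SstFile f) {
        return f.smallest.compareTo(hi) <= 0 && lo.compareTo(f.largest) <= 0;
    }

    // Level-0 files may overlap each other, so keep widening the range
    // until no other level-0 file overlaps it.
    static List<SstFile> expandLevel0(SstFile seed, List<SstFile> level0) {
        List<SstFile> picked = new ArrayList<>();
        picked.add(seed);
        String lo = seed.smallest;
        String hi = seed.largest;
        boolean changed = true;
        while (changed) {
            changed = false;
            for (SstFile f : level0) {
                if (!picked.contains(f) && overlaps(lo, hi, f)) {
                    picked.add(f);
                    if (f.smallest.compareTo(lo) < 0) lo = f.smallest;
                    if (f.largest.compareTo(hi) > 0) hi = f.largest;
                    changed = true;
                }
            }
        }
        return picked;
    }

    // Any next-level file that overlaps the inputs, even partially, is taken whole.
    static List<SstFile> overlappingInNextLevel(List<SstFile> inputs, List<SstFile> nextLevel) {
        String lo = inputs.get(0).smallest;
        String hi = inputs.get(0).largest;
        for (SstFile f : inputs) {
            if (f.smallest.compareTo(lo) < 0) lo = f.smallest;
            if (f.largest.compareTo(hi) > 0) hi = f.largest;
        }
        List<SstFile> result = new ArrayList<>();
        for (SstFile f : nextLevel) {
            if (overlaps(lo, hi, f)) result.add(f);
        }
        return result;
    }

    public static void main(String[] args) {
        SstFile f1 = new SstFile("f1", "c", "f");
        SstFile f2 = new SstFile("f2", "e", "k");
        SstFile f3 = new SstFile("f3", "x", "z");
        List<SstFile> inputs = expandLevel0(f1, List.of(f1, f2, f3));
        System.out.println(inputs); // [f1, f2]
        List<SstFile> level1 = List.of(
            new SstFile("g1", "a", "d"),
            new SstFile("g2", "g", "m"),
            new SstFile("g3", "n", "w"));
        System.out.println(overlappingInNextLevel(inputs, level1)); // [g1, g2]
    }
}
```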

A compaction merges the contents of the picked files (a multi-way merge) to produce a sequence of new level-(L+1) files. It switches to a new output file when the current one reaches 2MB. It also switches when the key range of the current output file has grown to overlap more than 10 level-(L+2) files. The second rule ensures that a later compaction of a level-(L+1) file will not have to pick up too many level-(L+2) files.
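The two rules for switching output files can be sketched as a loop over the merged, sorted keys. This hypothetical `OutputSplitter` counts overlapping level-(L+2) files as the text describes. The constants 2MB and 10 come from the text, and level-(L+2) files are modeled as `{smallest, largest}` string pairs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the two rules for starting a new output file.
public class OutputSplitter {
    static final long TARGET_FILE_BYTES = 2L * 1024 * 1024; // 2MB per output file
    static final int MAX_GRANDPARENT_FILES = 10;            // level-(L+2) overlap limit

    // Number of level-(L+2) files whose {smallest, largest} range intersects [lo, hi].
    static int overlapCount(String lo, String hi, List<String[]> grandparents) {
        int n = 0;
        for (String[] g : grandparents) {
            if (g[0].compareTo(hi) <= 0 && lo.compareTo(g[1]) <= 0) {
                n++;
            }
        }
        return n;
    }

    // keys: merged output keys in sorted order; sizes[i]: bytes of the i-th entry.
    static List<List<String>> split(List<String> keys, long[] sizes, List<String[]> grandparents) {
        List<List<String>> outputs = new ArrayList<>();
        List<String> current = new ArrayList<>();
        long currentBytes = 0;
        for (int i = 0; i < keys.size(); i++) {
            String key = keys.get(i);
            if (!current.isEmpty()) {
                boolean full = currentBytes >= TARGET_FILE_BYTES;
                boolean tooWide = overlapCount(current.get(0), key, grandparents) > MAX_GRANDPARENT_FILES;
                if (full || tooWide) {
                    outputs.add(current); // close the current output file
                    current = new ArrayList<>();
                    currentBytes = 0;
                }
            }
            current.add(key);
            currentBytes += sizes[i];
        }
        if (!current.isEmpty()) {
            outputs.add(current);
        }
        return outputs;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024;
        System.out.println(split(List.of("a", "b", "c"), new long[] {mb, mb, mb}, List.of())); // [[a, b], [c]]
    }
}
```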

When the new level-(L+1) files are added to the serving state, the old input files (from both level-L and level-(L+1)) are deleted.

A compaction discards values that have been overwritten by newer entries. It also discards a deletion marker when no higher-numbered level contains a file whose key range covers the deleted key.
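Both dropping rules can be illustrated with a non-streaming merge (a real compaction streams a multi-way merge instead of building a map in memory). In this hypothetical `CompactionMerge` sketch, a sentinel string stands in for the deletion marker. The predicate `deeperLevelMayHaveKey` stands in for the check against higher-numbered levels.

```java
import java.util.List;
import java.util.TreeMap;
import java.util.function.Predicate;

// Hypothetical sketch: merge inputs, keep only the newest value per key,
// and drop deletion markers that no longer hide anything.
public class CompactionMerge {
    // Hypothetical sentinel value standing in for a deletion marker.
    static final String TOMBSTONE = "<deleted>";

    static TreeMap<String, String> merge(List<TreeMap<String, String>> newestFirst,
                                         Predicate<String> deeperLevelMayHaveKey) {
        TreeMap<String, String> out = new TreeMap<>();
        // Apply inputs oldest to newest so newer entries overwrite older ones.
        for (int i = newestFirst.size() - 1; i >= 0; i--) {
            out.putAll(newestFirst.get(i));
        }
        // A deletion marker is only needed while an older value may exist deeper down.
        out.entrySet().removeIf(e -> TOMBSTONE.equals(e.getValue())
                && !deeperLevelMayHaveKey.test(e.getKey()));
        return out;
    }

    public static void main(String[] args) {
        TreeMap<String, String> newer = new TreeMap<>();
        newer.put("a", TOMBSTONE);
        newer.put("b", "b2");
        TreeMap<String, String> older = new TreeMap<>();
        older.put("a", "a1");
        older.put("b", "b1");
        older.put("c", "c1");
        // Assume no deeper level holds keys below "m".
        System.out.println(merge(List.of(newer, older), k -> k.compareTo("m") >= 0)); // {b=b2, c=c1}
    }
}
```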

Timing

A level-0 compaction reads up to four 1MB level-0 files and, in the worst case, all level-1 files (10MB); that is, it reads 14MB and writes 14MB.

Compactions at other levels differ: a level-L compaction reads a single 2MB file. In the worst case, that file overlaps 12 files in level-(L+1): 10 because level-(L+1) is ten times larger than level-L, plus 2 at the boundaries, since file boundaries rarely line up. One compaction therefore reads 26MB and writes 26MB. Assuming a disk I/O rate of 100MB/s, it takes about 0.5 seconds.

If background I/O is throttled to a lower rate, say 10MB/s, a compaction can take up to 5 seconds.
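The worst-case figures above can be checked with a few lines of arithmetic. The sketch assumes, as the text does, that a compaction writes about as much as it reads and that reads and writes share the stated disk bandwidth. `CompactionCost` is a hypothetical name.

```java
// Back-of-the-envelope compaction cost, using the sizes from the text.
public class CompactionCost {
    // Seconds needed to read and then write the given amounts at a given I/O rate.
    static double seconds(double readMB, double writeMB, double mbPerSecond) {
        return (readMB + writeMB) / mbPerSecond;
    }

    public static void main(String[] args) {
        double level0In = 4 * 1 + 10; // four 1MB level-0 files + all of level-1 (10MB)
        double levelLIn = 2 + 12 * 2; // one 2MB file + 12 overlapping 2MB files
        System.out.println(level0In);                         // 14.0
        System.out.println(levelLIn);                         // 26.0
        System.out.println(seconds(levelLIn, levelLIn, 100)); // 0.52
        System.out.println(seconds(levelLIn, levelLIn, 10));  // 5.2
    }
}
```

At 100MB/s, the level-L case moves 52MB in about 0.52 s, which matches the "about 0.5 seconds" above. At 10MB/s, the same work takes about 5.2 s.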

Number of files

Each SST file is 2MB. This value could be increased to reduce the total number of files, but compactions would then become more expensive (larger files must be read and written, which is disk-intensive). Alternatively, the files could be spread across multiple directories.

4. Data recovery

1) Read the name of the latest manifest file from current.

2) Read the manifest file.

3) Clean up obsolete files.

4) We could open all SST files at this point, but it is usually better to open them lazily.

5) Convert the surviving log file into a new level-0 SST file.

6) Direct new writes to a new log file.

7) Recycle garbage files.

After every compaction and every recovery, DeleteObsoleteFiles() is called. It lists all file names in the database directory and deletes every log file other than the current one. It also deletes every SST file that is no longer referenced by any level or by an in-progress compaction.
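The file-selection part of DeleteObsoleteFiles() can be sketched as a pure function over a directory listing. This hypothetical `ObsoleteFiles` class follows only the two rules stated above. `liveTables` is assumed to contain both the tables referenced by levels and the outputs of in-progress compactions. The `NNNNNN.log` / `NNNNNN.sst` naming is also an assumption (newer LevelDB releases also use `.ldb` for tables), and other files such as CURRENT and MANIFEST-* are ignored.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of choosing which files DeleteObsoleteFiles() would remove.
public class ObsoleteFiles {
    // Numeric prefix of names such as "000012.log" or "000010.sst".
    static long fileNumber(String name) {
        return Long.parseLong(name.substring(0, name.indexOf('.')));
    }

    static List<String> obsolete(List<String> names, long currentLogNumber, Set<Long> liveTables) {
        List<String> toDelete = new ArrayList<>();
        for (String name : names) {
            if (name.endsWith(".log") && fileNumber(name) != currentLogNumber) {
                toDelete.add(name); // every log file except the current one
            } else if (name.endsWith(".sst") && !liveTables.contains(fileNumber(name))) {
                toDelete.add(name); // tables no longer referenced anywhere
            }
            // CURRENT, MANIFEST-* and other files are not touched by this sketch.
        }
        return toDelete;
    }

    public static void main(String[] args) {
        List<String> names = List.of("CURRENT", "MANIFEST-000005", "000007.log",
                "000009.log", "000004.sst", "000006.sst");
        System.out.println(obsolete(names, 9, Set.of(6L))); // [000007.log, 000004.sst]
    }
}
```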

Use

LevelDB is a local (embedded) key/value store designed along the lines of BigTable. It stores keys in sorted order in its underlying files and, to speed up reads, keeps recent data in an in-memory memtable.

Based on the performance benchmarks on the LevelDB website, we can roughly summarize its characteristics:

1) Sequential reads (iteration) are extremely efficient. They are nearly as fast as sequential reads from the file system and many times faster than in a B-tree database.

2) Random reads perform well, but they are still orders of magnitude slower than sequential reads and remain well behind B-tree based databases. (In my own tests, random reads were not as fast as the official site claims, possibly because of the cache configuration; they were several times slower than with a B-tree.)

3) Sequential writes are very fast (without forced sync) and are limited mainly by disk speed. Random writes are somewhat slower but still compare very favorably with other databases. Both sequential and random writes are much faster than in a B-tree database.

4) LevelDB is a key/value store that stores raw bytes. Like other NoSQL databases, it does not support transactions and can only look data up by key, but it does support batch reads and writes.

5) Keys and values should not be too large, ideally at the KB level. Storing large keys or values significantly hurts LevelDB's read and write performance.

Because LevelDB does not yet have a distributed (cluster) architecture, we only use it to store limited amounts of data on local disk.

Use cases:

1) LevelDB combines caching with persistent disk storage but does not support RPC calls, so it must be deployed on the same host as the application, like an embedded key/value storage system.

2) If the data set is small (3~5GB) and the read/write ratio (R:W) is high, LevelDB can be used as a local cache, for example Guava Cache + LevelDB. This combination works like a lightweight, local Redis.
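The "Guava Cache + LevelDB" combination is essentially a two-tier cache. To keep the sketch self-contained, it uses an LRU `LinkedHashMap` instead of Guava. The hypothetical `BackingStore` interface stands in for the LevelDB handle (with iq80, its methods would wrap `db.get` and `db.put`). `TwoTierCache` is likewise a hypothetical name.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical two-tier cache: a small in-memory LRU tier in front of a persistent store.
public class TwoTierCache {
    // Stand-in for the persistent tier (e.g. a LevelDB handle).
    interface BackingStore {
        byte[] get(String key);
        void put(String key, byte[] value);
    }

    private final BackingStore store;
    private final Map<String, byte[]> hot;
    int storeReads = 0; // lookups that fell through to the backing store

    TwoTierCache(BackingStore store, int capacity) {
        this.store = store;
        // accessOrder = true makes LinkedHashMap evict the least recently used entry.
        this.hot = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                return size() > capacity;
            }
        };
    }

    byte[] get(String key) {
        byte[] v = hot.get(key);
        if (v == null) {
            storeReads++;
            v = store.get(key);
            if (v != null) {
                hot.put(key, v); // promote to the hot tier
            }
        }
        return v;
    }

    void put(String key, byte[] value) {
        store.put(key, value); // write through to the persistent tier
        hot.put(key, value);
    }

    public static void main(String[] args) {
        Map<String, byte[]> disk = new HashMap<>();
        BackingStore backing = new BackingStore() {
            @Override public byte[] get(String key) { return disk.get(key); }
            @Override public void put(String key, byte[] value) { disk.put(key, value); }
        };
        TwoTierCache cache = new TwoTierCache(backing, 1); // hot tier holds one entry
        cache.put("a", new byte[] {1});
        cache.put("b", new byte[] {2}); // evicts "a" from the hot tier only
        System.out.println(cache.get("a")[0]); // 1, served by the backing store
        System.out.println(cache.storeReads);  // 1
    }
}
```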

3) If there is more data and access is mostly sequential reads or sequential writes, LevelDB can serve as a miniature HDFS, for example as a buffer for peak message traffic or log data. Suppose we want to ship user operation logs to a remote Hadoop cluster. Sending each log record directly would require one RPC call per record, which would seriously hurt system throughput. Instead, we store the frequently written log data in a local LevelDB and let a background thread send it out at a steady rate. LevelDB then acts as a flow-control buffer.
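For the log-buffer use case, one common approach is to key each record by a fixed-width, big-endian sequence number (an assumption here, not something the article specifies). With LevelDB's default byte-wise key ordering, iteration then returns records in arrival order. A background thread can read a batch from the start, send it, and delete the sent keys. `SequenceKeys` is a hypothetical helper.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical helper: encode sequence numbers so that byte-wise key order = arrival order.
public class SequenceKeys {
    // Big-endian, fixed 8 bytes: for non-negative values, unsigned byte order equals numeric order.
    static byte[] encode(long sequence) {
        if (sequence < 0) {
            throw new IllegalArgumentException("sequence must be non-negative");
        }
        return ByteBuffer.allocate(Long.BYTES).putLong(sequence).array(); // ByteBuffer defaults to big-endian
    }

    static long decode(byte[] key) {
        return ByteBuffer.wrap(key).getLong();
    }

    public static void main(String[] args) {
        // Plain decimal strings sort incorrectly: "10" comes before "9".
        System.out.println("10".compareTo("9") < 0);                           // true
        // Fixed-width big-endian keys sort in numeric (arrival) order.
        System.out.println(Arrays.compareUnsigned(encode(9), encode(10)) < 0); // true
        System.out.println(decode(encode(42)));                                // 42
    }
}
```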

ActiveMQ can use LevelDB as its underlying message store, with strong performance and fault tolerance.

API Analysis (Java edition, Maven-based)

Native LevelDB is written in C++ and cannot be used directly from Java. iq80 ported LevelDB to Java step by step. The port has been proven in large projects (such as ActiveMQ), and its performance penalty is small (about 10%). Java developers can use it directly, without installing any additional native libraries.

1. pom.xml

    <dependency>
        <groupId>org.iq80.leveldb</groupId>
        <artifactId>leveldb</artifactId>
        <version>0.7</version>
    </dependency>
    <dependency>
        <groupId>org.iq80.leveldb</groupId>
        <artifactId>leveldb-api</artifactId>
        <version>0.7</version>
    </dependency>

2. Sample Code

    import java.io.File;
    import java.io.IOException;
    import java.nio.charset.Charset;
    import java.nio.charset.StandardCharsets;
    import java.util.Map;

    import org.iq80.leveldb.DB;
    import org.iq80.leveldb.DBFactory;
    import org.iq80.leveldb.DBIterator;
    import org.iq80.leveldb.Options;
    import org.iq80.leveldb.ReadOptions;
    import org.iq80.leveldb.Snapshot;
    import org.iq80.leveldb.WriteBatch;
    import org.iq80.leveldb.WriteOptions;
    import org.iq80.leveldb.impl.Iq80DBFactory;

    public class LevelDbExample {
        public static void main(String[] args) throws IOException {
            boolean cleanup = true;
            Charset charset = StandardCharsets.UTF_8;
            String path = "/data/leveldb";

            // init
            DBFactory factory = Iq80DBFactory.factory;
            File dir = new File(path);
            // If the data does not need to survive a restart, delete the old data under path on every start.
            if (cleanup) {
                factory.destroy(dir, null); // removes all files in the directory
            }
            Options options = new Options().createIfMissing(true);
            // (re)open the database
            DB db = factory.open(dir, options);

            // write
            db.put("key-01".getBytes(charset), "value-01".getBytes(charset));
            // write, and sync to disk before the call returns
            WriteOptions writeOptions = new WriteOptions().sync(true);
            db.put("key-02".getBytes(charset), "value-02".getBytes(charset), writeOptions);

            // batch write: the whole batch is applied atomically
            WriteBatch writeBatch = db.createWriteBatch();
            writeBatch.put("key-03".getBytes(charset), "value-03".getBytes(charset));
            writeBatch.put("key-04".getBytes(charset), "value-04".getBytes(charset));
            writeBatch.delete("key-01".getBytes(charset));
            db.write(writeBatch);
            writeBatch.close();

            // read
            byte[] bv = db.get("key-02".getBytes(charset));
            if (bv != null && bv.length > 0) {
                System.out.println(new String(bv, charset));
            }

            // iterator: traversal, i.e. sequential read.
            // Reading through a snapshot means changes made during the scan are not visible.
            Snapshot snapshot = db.getSnapshot();
            ReadOptions readOptions = new ReadOptions();
            readOptions.fillCache(false);   // don't fill the block cache with blocks read by this scan
            readOptions.snapshot(snapshot); // without a snapshot, reads see the current state
            DBIterator iterator = db.iterator(readOptions);
            iterator.seekToFirst();
            while (iterator.hasNext()) {
                Map.Entry<byte[], byte[]> item = iterator.next();
                String key = new String(item.getKey(), charset);
                String value = new String(item.getValue(), charset);
                System.out.println(key + ":" + value);
            }
            iterator.close(); // iterators must be closed
            snapshot.close();

            // delete
            db.delete("key-01".getBytes(charset));

            // manual compaction starting at "key-" (null end = no upper bound)
            try {
                db.compactRange("key-".getBytes(charset), null);
            } catch (UnsupportedOperationException e) {
                // some releases of the iq80 port may not implement manual compaction
            }
            db.close();
        }
    }
