LevelDB Principles and Usage

Source: Internet
Author: User

LevelDB is a storage engine built on local files; it is not a distributed storage engine. Its design follows BigTable (a log-structured merge tree, or LSM tree), it provides no secondary indexing mechanism, and it stores entries as key-value pairs. It suits data caching, log storage, buffering, and similar jobs, mainly because it avoids the latency of RPC requests. In terms of access patterns, sequential reads are very fast, while random reads have noticeably higher latency (although performance is not especially poor). It is best suited to writes in sequential key order, but writes with random keys cause no problems either. The data volume is typically 3 to 5 times the size of physical memory, and storing much larger data sets is not recommended. Within that range, LevelDB outperforms "distributed storage", because local disk access latency is lower than RPC network latency.

1) If your logs or video clips must first be stored locally and then shipped in bulk to a remote data center, LevelDB is a very good fit as the data buffer. (The buffered data is cut into small chunks and stored in LevelDB as key-value pairs.)

2) If you want to build a local cache component whose data may exceed memory capacity, LevelDB can back it: LevelDB keeps part of the hot data in memory and the rest on disk, and supports concurrent, random key-value reads. The data set still cannot be too large, or disk-read latency becomes significant; at that point a distributed cache should be used instead. (Of course, distributed caches also address data synchronization and consistency in a distributed environment, not only large data volumes.)

I. Principles

1. Files

LevelDB's implementation is similar to a single tablet in Google's BigTable, except that the underlying file organization is slightly different.

Each database consists of a set of local files of several types:

Log file

A log file stores a sequence of the most recent updates. Each update is appended to the end of the current log file. When the log file reaches a preset size, its contents are converted into a sorted table (.sst) file, and a new log file is created to hold subsequent updates. The log exists primarily for data recovery.

A copy of the data in the current log file is kept in an in-memory structure called the memtable. Each update is first appended to the log file and then applied to the memtable. Every read checks the memtable first; only if the key is not found there does the read go to disk (if a cache size is configured, the cache is checked before the disk). This is how recent updates become visible to reads.

Data in the memtable is kept sorted by key (the memtable is implemented as a skip list). The default memtable size is 4 MB, set by the write_buffer_size option (writeBufferSize in the Java API), which is specified when LevelDB opens the database.
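To make the write path concrete, here is a minimal Java sketch of a memtable. LevelDB appends each update to the log before inserting it into the memtable, and the memtable keeps keys sorted in a skip list; the JDK's `ConcurrentSkipListMap` plays that role here. The class name `MemTableSketch`, the string-based records, and the byte counting are illustrative assumptions, not LevelDB's or iq80's actual code, and deletions are not modelled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentSkipListMap;

// Toy model of the write path: append to the log first, then insert into a
// skip-list-backed memtable that keeps keys sorted. Strings stand in for byte
// arrays, and the "log" is an in-memory list rather than a file on disk.
class MemTableSketch {
    private final List<String> log = new ArrayList<>();
    private final ConcurrentSkipListMap<String, String> memtable = new ConcurrentSkipListMap<>();
    private final long writeBufferSize; // flush threshold, analogous to write_buffer_size
    private long approximateBytes = 0;

    MemTableSketch(long writeBufferSize) {
        this.writeBufferSize = writeBufferSize;
    }

    void put(String key, String value) {
        log.add("PUT " + key + "=" + value); // 1) record used for crash recovery
        memtable.put(key, value);            // 2) sorted in-memory copy that serves reads
        approximateBytes += key.length() + value.length();
    }

    /** Returns null when the key is not in the memtable, i.e. the read must go to disk. */
    String get(String key) {
        return memtable.get(key);
    }

    boolean shouldFlush() {
        return approximateBytes >= writeBufferSize;
    }

    List<String> logRecords() {
        return log;
    }

    String firstKey() {
        return memtable.firstKey();
    }

    public static void main(String[] args) {
        MemTableSketch m = new MemTableSketch(4 * 1024 * 1024); // 4 MB threshold
        m.put("b", "2");
        m.put("a", "1");
        System.out.println(m.firstKey());    // a: keys are ordered, not insertion-ordered
        System.out.println(m.shouldFlush()); // false: far below the threshold
    }
}
```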

Sorted Tables (SST for short)

When the memtable reaches its size threshold, it is flushed to disk as a sorted table (.sst) file. An SST file stores a sequence of entries sorted by key, just like the memtable; each entry is either a key-value pair or a deletion marker for a key. (A deletion marker masks older data stored in earlier SST files: once a key is marked as deleted, the older values for that key in previous SST files are no longer returned by reads.)
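The masking effect of deletion markers can be sketched as a lookup that walks tables from newest to oldest and stops at the first entry for the key. This is a simplified model: tables are in-memory `TreeMap`s, and the `TOMBSTONE` sentinel is hypothetical (real LevelDB tags each internal key with a value/deletion type instead).

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

class TombstoneLookup {
    // Hypothetical sentinel; real LevelDB records a type tag (value vs. deletion) in each internal key.
    static final String TOMBSTONE = "\u0000<deleted>";

    /** Tables must be ordered newest first: memtable, then the most recent SST files. */
    static Optional<String> get(List<TreeMap<String, String>> newestFirst, String key) {
        for (TreeMap<String, String> table : newestFirst) {
            String value = table.get(key);
            if (value == null) {
                continue; // no entry for this key here: consult an older table
            }
            // The newest entry decides: a deletion marker hides every older value.
            return TOMBSTONE.equals(value) ? Optional.empty() : Optional.of(value);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        TreeMap<String, String> older = new TreeMap<>(Map.of("k", "old", "j", "kept"));
        TreeMap<String, String> newer = new TreeMap<>(Map.of("k", TOMBSTONE));
        List<TreeMap<String, String>> tables = List.of(newer, older);
        System.out.println(get(tables, "k").isPresent()); // false: masked by the marker
        System.out.println(get(tables, "j").orElse("?")); // kept
    }
}
```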

Sorted tables are organized into a hierarchy of levels. SST files generated from log files are placed in a special young level, level-0. When the number of level-0 files exceeds a threshold (4), they are merged with the overlapping level-1 files to produce a new sequence of level-1 files (each new file is about 2 MB).

Note: "overlapping" means that the key ranges of two files intersect. Keys are kept strictly sorted within an SST file. An SST file can also carry Bloom filter data (when a filter policy is configured; LevelDB does not enable one by default), which lets a read quickly determine that a key is not in that SST file and effectively improves read efficiency.
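A Bloom filter answers "definitely not here" or "maybe here", so a read can skip SST files that cannot contain the key. The sketch below is a generic Bloom filter assumed for illustration: it uses FNV-1a hashing and rotate-and-add double hashing (similar in spirit to LevelDB's `bloom.cc`), but it is not LevelDB's on-disk filter format, and the sizes passed to `BloomFilterSketch` are arbitrary.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

// Minimal Bloom filter: false means the key is definitely absent, true means it
// may be present (false positives are possible, false negatives are not).
class BloomFilterSketch {
    private final BitSet bits;
    private final int numBits;
    private final int numProbes;

    BloomFilterSketch(int numBits, int numProbes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numProbes = numProbes;
    }

    // 32-bit FNV-1a over the UTF-8 bytes of the key.
    private static int fnv1a(String key) {
        int h = 0x811C9DC5;
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xFF);
            h *= 0x01000193;
        }
        return h;
    }

    void add(String key) {
        int h = fnv1a(key);
        int delta = Integer.rotateRight(h, 17); // second hash derived by rotation
        for (int i = 0; i < numProbes; i++) {
            bits.set(Math.floorMod(h, numBits));
            h += delta;
        }
    }

    boolean mightContain(String key) {
        int h = fnv1a(key);
        int delta = Integer.rotateRight(h, 17);
        for (int i = 0; i < numProbes; i++) {
            if (!bits.get(Math.floorMod(h, numBits))) {
                return false; // one unset bit proves the key was never added
            }
            h += delta;
        }
        return true;
    }

    public static void main(String[] args) {
        BloomFilterSketch filter = new BloomFilterSketch(1024, 6);
        filter.add("apple");
        System.out.println(filter.mightContain("apple")); // true
    }
}
```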

Files in the young level may contain overlapping keys, but SST files in every other level cover distinct, non-overlapping key ranges. For level-L with L >= 1, when the total size of the SST files in level-L exceeds (10^L) MB (i.e., 10 MB for level-1, 100 MB for level-2, and so on), one file in level-L and all the level-(L+1) files whose keys overlap it are merged into a new set of level-(L+1) files. These merges gradually migrate recent updates from the young level to the highest level using only bulk file reads and writes. Apart from level-0, no two SST files in the same level have overlapping keys, although files in different levels may.
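The per-level limits above can be expressed as a small scoring function: level-0 is judged by file count and every other level by total bytes against 10^L MB. This sketch assumes the thresholds stated in this article; the names `LevelLimits`, `maxBytesForLevel`, and `score` are illustrative, although LevelDB's `version_set.cc` computes a comparable score to decide which level to compact next.

```java
// Compaction triggers: level-0 by file count (4 files), level-L (L >= 1) by
// total size exceeding 10^L MB. A score >= 1.0 means the level is due for compaction.
class LevelLimits {
    static final int L0_COMPACTION_TRIGGER = 4;
    static final long MB = 1024L * 1024L;

    static long maxBytesForLevel(int level) {
        if (level < 1) {
            throw new IllegalArgumentException("level-0 is limited by file count, not bytes");
        }
        long limit = 10 * MB;              // level-1: 10 MB
        for (int l = 1; l < level; l++) {
            limit *= 10;                   // each deeper level may hold 10x more
        }
        return limit;
    }

    static double score(int level, int fileCount, long totalBytes) {
        if (level == 0) {
            return fileCount / (double) L0_COMPACTION_TRIGGER;
        }
        return totalBytes / (double) maxBytesForLevel(level);
    }

    public static void main(String[] args) {
        for (int level = 1; level <= 4; level++) {
            System.out.println("level-" + level + ": " + maxBytesForLevel(level) / MB + " MB");
        }
    }
}
```

Running `main` prints the limits 10, 100, 1000, and 10000 MB for levels 1 to 4.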

The lower the level, the fresher the data. A key lookup therefore starts at level-0 and proceeds toward higher levels.
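A point lookup treats level-0 differently from deeper levels: every level-0 file whose range covers the key must be checked, newest first, while each deeper level holds at most one candidate file, which can be found by binary search on the files' largest keys. The `FileMeta` type and method names below are hypothetical; file numbers stand in for file age.

```java
import java.util.ArrayList;
import java.util.List;

class LookupOrder {
    static final class FileMeta {
        final long number; // higher file number = newer file
        final String smallest;
        final String largest;

        FileMeta(long number, String smallest, String largest) {
            this.number = number;
            this.smallest = smallest;
            this.largest = largest;
        }

        boolean covers(String key) {
            return key.compareTo(smallest) >= 0 && key.compareTo(largest) <= 0;
        }
    }

    /** Level-0 ranges may overlap: every covering file is a candidate, newest first. */
    static List<FileMeta> level0Candidates(List<FileMeta> files, String key) {
        List<FileMeta> result = new ArrayList<>();
        for (FileMeta f : files) {
            if (f.covers(key)) {
                result.add(f);
            }
        }
        result.sort((a, b) -> Long.compare(b.number, a.number));
        return result;
    }

    /** Levels >= 1 are sorted and disjoint, so at most one file can hold the key. */
    static FileMeta findInSortedLevel(List<FileMeta> files, String key) {
        int lo = 0, hi = files.size();
        while (lo < hi) { // find the first file whose largest key >= key
            int mid = (lo + hi) >>> 1;
            if (files.get(mid).largest.compareTo(key) < 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        if (lo < files.size() && files.get(lo).covers(key)) {
            return files.get(lo);
        }
        return null; // the key falls in a gap: it is not in this level
    }

    public static void main(String[] args) {
        List<FileMeta> level1 = List.of(new FileMeta(4, "a", "c"), new FileMeta(5, "d", "f"));
        System.out.println(findInSortedLevel(level1, "e").number); // 5
    }
}
```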

Manifest

The MANIFEST file lists the SST files that make up each level, together with their key ranges, plus other important metadata. Whenever the database is reopened, a new MANIFEST file is created (its file name contains a new sequence number). The MANIFEST is formatted like a log: every change to the serving state (such as creating or deleting an SST file) is appended to it.

CURRENT

The CURRENT file is a simple text file that holds the name of the latest MANIFEST file.
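Because CURRENT is a single pointer, it must never be left half-written. LevelDB updates it by writing the new manifest name to a temporary file and renaming that file to CURRENT, and on recovery it rejects a CURRENT file that does not end with a newline. The Java sketch below follows the same pattern with `java.nio.file`; the temporary name `CURRENT.tmp` and the class name are assumptions, and atomic replacement depends on the file system supporting `ATOMIC_MOVE`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class CurrentFile {
    static void setCurrent(Path dbDir, String manifestName) {
        Path tmp = dbDir.resolve("CURRENT.tmp"); // hypothetical temp-file name
        try {
            Files.write(tmp, (manifestName + "\n").getBytes(StandardCharsets.UTF_8));
            // Rename over the old CURRENT: readers see either the old or the new name, never a partial one.
            Files.move(tmp, dbDir.resolve("CURRENT"), StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static String readCurrent(Path dbDir) {
        try {
            String s = new String(Files.readAllBytes(dbDir.resolve("CURRENT")), StandardCharsets.UTF_8);
            if (!s.endsWith("\n")) {
                throw new IOException("CURRENT does not end with a newline");
            }
            return s.substring(0, s.length() - 1);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("leveldb-sketch");
        setCurrent(dir, "MANIFEST-000002");
        setCurrent(dir, "MANIFEST-000005");
        System.out.println(readCurrent(dir)); // MANIFEST-000005
    }
}
```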

Others: omitted.

2. Level 0

When the log file grows beyond a certain size (1 MB by default according to LevelDB's implementation notes; in current releases the trigger is the memtable size set by write_buffer_size, 4 MB by default), a new memtable and log file are created to receive subsequent updates. In the background, the old memtable is written out as a new SST file and then discarded, the old log file is deleted, and the new SST file is added to the young level (level-0).
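The rotation steps above can be modelled in a few lines. In this sketch the memtable is a `TreeMap`, the log is a list, and level-0 is a list of sorted maps; these types, the class name `FlushSketch`, and the byte-count threshold are simplifying assumptions. LevelDB performs the write-out on a background thread and deletes the old log only after the new table is safely on disk, which this single-threaded toy does not show.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

class FlushSketch {
    TreeMap<String, String> mem = new TreeMap<>();
    List<String> log = new ArrayList<>();
    final List<TreeMap<String, String>> level0 = new ArrayList<>(); // oldest first
    private final long limitBytes;
    private long memBytes = 0;

    FlushSketch(long limitBytes) {
        this.limitBytes = limitBytes;
    }

    void put(String key, String value) {
        log.add(key + "=" + value);
        mem.put(key, value);
        memBytes += key.length() + value.length();
        if (memBytes >= limitBytes) {
            rotate();
        }
    }

    private void rotate() {
        TreeMap<String, String> immutable = mem; // frozen: no further writes go here
        mem = new TreeMap<>();                   // new memtable for later updates
        log = new ArrayList<>();                 // new log; the old one is dropped
        memBytes = 0;
        level0.add(immutable);                   // sorted contents become a level-0 table
    }

    public static void main(String[] args) {
        FlushSketch db = new FlushSketch(8);
        db.put("b", "111");
        db.put("a", "222"); // 8 bytes reached: memtable rotated into level-0
        System.out.println(db.level0.size() + " level-0 table(s), first key " + db.level0.get(0).firstKey());
    }
}
```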

3. Compactions

When the size of level-L reaches its limit, a background thread compacts it. A compaction picks one file from level-L and all files in level-(L+1) whose key ranges overlap it. If a level-L file overlaps only part of a level-(L+1) file, the entire level-(L+1) file is still used as compaction input and is discarded once the compaction completes. Level-0 is special because its files may overlap one another, so level-0 to level-1 compactions need special handling: several overlapping level-0 files may be picked as inputs at once.
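Input selection can be sketched as range-overlap tests on file key ranges. For level-0 the input set keeps growing while newly added files widen the key range, because level-0 files may overlap each other; LevelDB's `GetOverlappingInputs` restarts its search in the same situation. The `FileRange` type and `pick` method are hypothetical, and real LevelDB also remembers where the previous compaction of each level stopped, which is omitted here.

```java
import java.util.ArrayList;
import java.util.List;

class CompactionPicker {
    static final class FileRange {
        final String name, smallest, largest;

        FileRange(String name, String smallest, String largest) {
            this.name = name;
            this.smallest = smallest;
            this.largest = largest;
        }

        boolean overlaps(String lo, String hi) {
            return largest.compareTo(lo) >= 0 && smallest.compareTo(hi) <= 0;
        }
    }

    static List<FileRange> overlapping(List<FileRange> files, String lo, String hi) {
        List<FileRange> result = new ArrayList<>();
        for (FileRange f : files) {
            if (f.overlaps(lo, hi)) {
                result.add(f);
            }
        }
        return result;
    }

    /** Returns two lists: the inputs from level-L and the inputs from level-(L+1). */
    static List<List<FileRange>> pick(int level, FileRange seed,
                                      List<FileRange> levelFiles, List<FileRange> nextLevelFiles) {
        List<FileRange> inputs = new ArrayList<>();
        inputs.add(seed);
        String lo = seed.smallest, hi = seed.largest;
        if (level == 0) {
            boolean grew = true;
            while (grew) { // adding a file can widen [lo, hi], so rescan until stable
                grew = false;
                for (FileRange f : levelFiles) {
                    if (!inputs.contains(f) && f.overlaps(lo, hi)) {
                        inputs.add(f);
                        if (f.smallest.compareTo(lo) < 0) lo = f.smallest;
                        if (f.largest.compareTo(hi) > 0) hi = f.largest;
                        grew = true;
                    }
                }
            }
        }
        List<List<FileRange>> result = new ArrayList<>();
        result.add(inputs);
        result.add(overlapping(nextLevelFiles, lo, hi)); // whole files, even on partial overlap
        return result;
    }

    public static void main(String[] args) {
        FileRange seed = new FileRange("000010", "a", "c");
        List<FileRange> level0 = List.of(seed, new FileRange("000011", "b", "f"));
        List<FileRange> level1 = List.of(new FileRange("000004", "a", "b"), new FileRange("000006", "h", "k"));
        List<List<FileRange>> chosen = pick(0, seed, level0, level1);
        System.out.println(chosen.get(0).size() + " level-0 inputs, " + chosen.get(1).size() + " level-1 inputs");
    }
}
```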

A compaction merges the contents of the selected files into a new sequence of level-(L+1) files. It switches to a new output file when the current one reaches 2 MB, and also when the key range of the current output file has grown to overlap more than 10 files in level-(L+2). The second rule ensures that a later compaction of a level-(L+1) file will not have to pick up too many files from level-(L+2).
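The two switching rules can be sketched as a loop over the merged, key-sorted entries. The thresholds (2 MB, more than 10 level-(L+2) files) come from the text; note that current LevelDB measures the second rule in overlapped bytes (ten times the target file size) rather than in file count. `OutputSplitter`, `Entry`, and `Range` are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;

class OutputSplitter {
    static final long TARGET_FILE_BYTES = 2L * 1024 * 1024; // 2 MB
    static final int MAX_GRANDPARENT_FILES = 10;

    static final class Entry {
        final String key;
        final long bytes;

        Entry(String key, long bytes) {
            this.key = key;
            this.bytes = bytes;
        }
    }

    static final class Range { // a level-(L+2) file's key range
        final String smallest, largest;

        Range(String smallest, String largest) {
            this.smallest = smallest;
            this.largest = largest;
        }
    }

    /** Splits key-sorted entries into output files; grandparents are sorted and disjoint. */
    static List<List<Entry>> split(List<Entry> sorted, List<Range> grandparents,
                                   long targetBytes, int maxGrandparentFiles) {
        List<List<Entry>> outputs = new ArrayList<>();
        List<Entry> current = new ArrayList<>();
        long currentBytes = 0;
        int gp = 0;         // index of the next grandparent file not yet passed
        int overlapped = 0; // grandparent files the current output has spanned
        for (Entry e : sorted) {
            // Advance past grandparent files that end before this key.
            while (gp < grandparents.size() && grandparents.get(gp).largest.compareTo(e.key) < 0) {
                if (!current.isEmpty()) {
                    overlapped++;
                }
                gp++;
            }
            boolean full = currentBytes >= targetBytes;
            boolean tooWide = overlapped > maxGrandparentFiles;
            if (!current.isEmpty() && (full || tooWide)) {
                outputs.add(current); // close this output file and start the next one
                current = new ArrayList<>();
                currentBytes = 0;
                overlapped = 0;
            }
            current.add(e);
            currentBytes += e.bytes;
        }
        if (!current.isEmpty()) {
            outputs.add(current);
        }
        return outputs;
    }

    public static void main(String[] args) {
        List<Entry> entries = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            entries.add(new Entry("k" + i, 1024L * 1024)); // five 1 MB entries
        }
        System.out.println(split(entries, List.of(), TARGET_FILE_BYTES, MAX_GRANDPARENT_FILES).size()); // 3
    }
}
```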

Once the new level-(L+1) files are added to the serving state, the old files (from both level-L and level-(L+1)) are deleted.

During compaction, overwritten values are dropped. A deletion marker is also dropped if the corresponding key does not exist in any higher-numbered level.
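Dropping overwritten values and unnecessary deletion markers can be sketched as a newest-first merge. The sketch rests on several assumptions: inputs are in-memory maps ordered newest first, `keysInDeeperLevels` stands in for LevelDB's check of whether any higher-numbered level could still contain the key, the `TOMBSTONE` sentinel is hypothetical, and snapshots are ignored (real LevelDB keeps older versions that a live snapshot can still see).

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class CompactionMerge {
    static final String TOMBSTONE = "\u0000<deleted>"; // hypothetical deletion sentinel

    static TreeMap<String, String> merge(List<TreeMap<String, String>> newestFirst,
                                         Set<String> keysInDeeperLevels) {
        TreeMap<String, String> out = new TreeMap<>();
        Set<String> seen = new HashSet<>();
        for (TreeMap<String, String> input : newestFirst) {
            for (Map.Entry<String, String> e : input.entrySet()) {
                if (!seen.add(e.getKey())) {
                    continue; // an older, overwritten version: drop it
                }
                if (TOMBSTONE.equals(e.getValue()) && !keysInDeeperLevels.contains(e.getKey())) {
                    continue; // the marker has nothing left to hide: drop it too
                }
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        TreeMap<String, String> newer = new TreeMap<>(Map.of("a", TOMBSTONE, "b", "2"));
        TreeMap<String, String> older = new TreeMap<>(Map.of("a", "1", "b", "1", "c", "3"));
        System.out.println(merge(List.of(newer, older), Set.of())); // {b=2, c=3}
    }
}
```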

Timing

A level-0 compaction reads up to four 1 MB level-0 files and, in the worst case, all of level-1 (10 MB). That means it reads 14 MB and writes 14 MB.

For any other level-L, a compaction picks one 2 MB file. In the worst case it overlaps about 12 files in level-(L+1): 10 because level-(L+1) is ten times the size of level-L, plus 2 at the boundaries, since file ranges in level-L are usually not aligned with those in level-(L+1). One compaction therefore reads 26 MB and writes 26 MB. Assuming a disk I/O rate of 100 MB/s, one compaction takes about 0.5 seconds.

If background disk throughput is limited, say to 10 MB/s, a compaction may take up to 5 seconds.
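The worst-case figures follow from simple arithmetic, assuming 1 MB level-0 files, 2 MB files elsewhere, and counting bytes read and written separately:

$$
\begin{aligned}
\text{level-0:}\quad & 4 \times 1\,\text{MB} + 10\,\text{MB} = 14\,\text{MB read},\ 14\,\text{MB written}\\
\text{level-}L\ (L \ge 1):\quad & 2\,\text{MB} + (10 + 2) \times 2\,\text{MB} = 26\,\text{MB read},\ 26\,\text{MB written}\\
t_{100\,\text{MB/s}} &= \frac{26\,\text{MB} + 26\,\text{MB}}{100\,\text{MB/s}} \approx 0.5\,\text{s}\\
t_{10\,\text{MB/s}} &= \frac{52\,\text{MB}}{10\,\text{MB/s}} \approx 5\,\text{s}
\end{aligned}
$$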

Number of files

Each SST file (except in level-0) is about 2 MB. We could reduce the total number of files by increasing this value (which requires recompiling the source), but each compaction would then take longer (larger files mean more disk-intensive work). Alternatively, the files could be spread across multiple directories.

4. Data Recovery

1) Read the name of the latest MANIFEST file from CURRENT.

2) Read the MANIFEST file.

3) Clean up stale files.

4) All SST files could be opened at this point, but opening them lazily is usually better.

5) Convert the log file into a new level-0 SST file.

6) Direct new writes to a new log file.

7) Garbage collection of files.

DeleteObsoleteFiles() is called after every compaction and at the end of recovery. It lists all file names in the database directory, deletes every log file other than the current one, and deletes every SST file that is not referenced by any level and is not the output of an in-progress compaction.
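The recovery and cleanup steps can be sketched as replaying manifest edits to rebuild the live file set, then listing obsolete files in the directory. The `Edit` record format and the simplified name parsing are assumptions: LevelDB's real MANIFEST stores richer version edits, and its cleanup also keeps the previous log file and files being written by active compactions.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class RecoverySketch {
    static final class Edit {
        final boolean add; // true = file added, false = file deleted
        final int level;
        final long fileNumber;

        Edit(boolean add, int level, long fileNumber) {
            this.add = add;
            this.level = level;
            this.fileNumber = fileNumber;
        }
    }

    /** Replays edits in order; the result maps each live table number to its level. */
    static Map<Long, Integer> replay(List<Edit> manifest) {
        Map<Long, Integer> live = new TreeMap<>();
        for (Edit e : manifest) {
            if (e.add) {
                live.put(e.fileNumber, e.level);
            } else {
                live.remove(e.fileNumber);
            }
        }
        return live;
    }

    /** Old logs and unreferenced tables are obsolete; names follow the NNNNNN.log / NNNNNN.ldb pattern. */
    static List<String> obsolete(Collection<String> dirListing, Set<Long> liveTables, long currentLogNumber) {
        List<String> doomed = new ArrayList<>();
        for (String name : dirListing) {
            int dot = name.indexOf('.');
            if (dot <= 0) {
                continue; // CURRENT, LOCK, MANIFEST-nnnnnn, ...: handled separately
            }
            long number;
            try {
                number = Long.parseLong(name.substring(0, dot));
            } catch (NumberFormatException e) {
                continue; // e.g. LOG.old
            }
            String ext = name.substring(dot + 1);
            boolean staleLog = ext.equals("log") && number < currentLogNumber;
            boolean unreferencedTable = (ext.equals("ldb") || ext.equals("sst")) && !liveTables.contains(number);
            if (staleLog || unreferencedTable) {
                doomed.add(name);
            }
        }
        return doomed;
    }

    public static void main(String[] args) {
        Map<Long, Integer> live = replay(List.of(new Edit(true, 0, 5), new Edit(true, 0, 7), new Edit(false, 0, 5)));
        // prints [000004.log, 000005.ldb]
        System.out.println(obsolete(List.of("000004.log", "000008.log", "000005.ldb", "000007.ldb", "CURRENT"), live.keySet(), 8));
    }
}
```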

II. Usage

LevelDB is a local key-value storage database whose design resembles BigTable: keys are stored in sorted order in the underlying files, and to speed up reads, an in-memory memtable caches recent data.

Based on the performance benchmarks on LevelDB's official website, we can roughly summarize its characteristics:

1) Sequential reads (traversal) in LevelDB are extremely efficient, nearly as fast as reading a file sequentially from the file system, and much faster than B-tree databases.

2) Random-read performance is also high, but still differs from sequential reads by orders of magnitude, and there is still a big gap between LevelDB's random reads and B-tree-based databases. (In my own tests, random reads were not as fast as the official site claims, possibly because of the cache configuration; random reads were about twice as slow as a B-tree.)

3) Sequential writes perform very well (without forced sync), limited only by disk speed. Random writes are slightly slower but still hold a large advantage over other databases. Whether sequential or random, write performance is many times that of B-tree databases.

4) LevelDB is a key-value store that stores raw bytes. It is a NoSQL database without transaction support; data can be queried only by key, and batched reads and writes are supported.

5) Keys and values should not be too large, ideally at the KB level; storing large keys or values significantly hurts LevelDB's read and write performance.

6) LevelDB itself provides no indexing mechanism, so random-read performance is somewhat worse. Keys and values can be arbitrary byte arrays.

Because LevelDB itself has no distributed cluster capability, we use it only for limited amounts of data (bounded by the local disk).

Case analysis:

1) LevelDB combines a cache with persistent disk storage and does not support RPC calls, so it must be deployed on the same host as the application, much like an embedded key-value storage system.

2) If the data set is small (3 to 5 GB) and the read/write ratio (R:W) is high, LevelDB can serve as a local cache, for example Guava Cache + LevelDB; this combination works like a lightweight Redis used as a local cache. Typically the data stored in LevelDB is 3 to 5 times the memory size (on a typical modern machine); storing much larger data sets in LevelDB is not recommended, or performance will drop sharply.

3) If there is more data and access is mostly sequential reads or sequential writes, LevelDB can act as a miniature HDFS, for example to absorb peak message traffic or to buffer log storage. Instead of sending user-action logs directly to a remote Hadoop cluster (a direct RPC call for every write would significantly reduce system throughput), the frequently written log data is stored in local LevelDB and sent out at an even rate by background threads. This provides flow control.
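For case 2), the "Guava Cache + LevelDB" combination is a two-tier cache: a bounded in-memory map in front of a persistent store. The sketch below uses an access-ordered `LinkedHashMap` as the LRU tier instead of Guava, and a plain `Map` stands in for the LevelDB instance, so it shows only the lookup, promotion, and write-through logic; `TwoTierCache` is a hypothetical name.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

class TwoTierCache {
    private final Map<String, String> disk;             // stand-in for the LevelDB instance
    private final LinkedHashMap<String, String> memory; // bounded LRU tier

    TwoTierCache(Map<String, String> disk, int memoryCapacity) {
        this.disk = disk;
        // accessOrder = true keeps entries ordered from least to most recently used.
        this.memory = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > memoryCapacity; // evict the LRU entry once over capacity
            }
        };
    }

    String get(String key) {
        String value = memory.get(key);
        if (value != null) {
            return value;              // hot path: memory hit
        }
        value = disk.get(key);         // cold path: read from the persistent store
        if (value != null) {
            memory.put(key, value);    // promote so the next read is a memory hit
        }
        return value;
    }

    void put(String key, String value) {
        disk.put(key, value);          // write-through: the persistent copy is written first
        memory.put(key, value);
    }

    int memorySize() {
        return memory.size();
    }

    public static void main(String[] args) {
        TwoTierCache cache = new TwoTierCache(new HashMap<>(), 2);
        cache.put("a", "1");
        cache.put("b", "2");
        cache.put("c", "3");                // evicts "a" from memory, not from disk
        System.out.println(cache.get("a")); // 1 (served from the disk tier)
    }
}
```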

ActiveMQ uses LevelDB as its underlying message store, with strong performance and fault tolerance. In many cases, LevelDB can serve as the storage for local logs and I/O buffer files.
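For the log-buffering use in case 3), a local buffer might look like the sketch below: producers append records under zero-padded, increasing sequence keys, so key order equals arrival order and writes stay sequential, and a background task drains a bounded batch per tick, deleting records only after they are shipped. The `ConcurrentSkipListMap` stands in for LevelDB, the `shipper` callback stands in for the remote RPC call, and `LogBuffer` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.function.Consumer;

class LogBuffer {
    private final ConcurrentSkipListMap<String, String> store = new ConcurrentSkipListMap<>();
    private long nextSeq = 0;

    synchronized void append(String record) {
        // Zero padding makes lexicographic key order match numeric (arrival) order.
        store.put(String.format("%020d", nextSeq++), record);
    }

    /** One drain tick: ships at most batchSize records in key order; returns how many were shipped. */
    int drain(int batchSize, Consumer<List<String>> shipper) {
        List<String> keys = new ArrayList<>();
        List<String> batch = new ArrayList<>();
        for (Map.Entry<String, String> e : store.entrySet()) {
            if (batch.size() == batchSize) {
                break;
            }
            keys.add(e.getKey());
            batch.add(e.getValue());
        }
        if (batch.isEmpty()) {
            return 0;
        }
        shipper.accept(batch);        // remote send; an exception here keeps the records buffered
        keys.forEach(store::remove);  // delete only after a successful send
        return batch.size();
    }

    int pending() {
        return store.size();
    }

    public static void main(String[] args) {
        LogBuffer buffer = new LogBuffer();
        for (int i = 0; i < 5; i++) {
            buffer.append("event-" + i);
        }
        // In a service, this call would run on a scheduled background thread.
        int shipped = buffer.drain(2, batch -> System.out.println("ship " + batch));
        System.out.println(shipped + " shipped, " + buffer.pending() + " pending");
    }
}
```

Calling `drain` from a `ScheduledExecutorService` at a fixed rate caps outbound traffic at `batchSize` records per tick, which is the flow-control effect described in case 3).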

III. API Analysis (Java edition)

Native LevelDB is written in C++ and cannot be used directly from Java. iq80 ported LevelDB to Java step by step; the port has been validated in many large projects (such as ActiveMQ), and the iq80 Java version has only a small performance loss (around 10%). Java developers can use it directly without installing any additional native library.

1. pom.xml

```xml
<dependency>
    <groupId>org.iq80.leveldb</groupId>
    <artifactId>leveldb</artifactId>
    <version>0.7</version>
</dependency>
<dependency>
    <groupId>org.iq80.leveldb</groupId>
    <artifactId>leveldb-api</artifactId>
    <version>0.7</version>
</dependency>
```

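2. Code sample

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.Charset;

import org.iq80.leveldb.DB;
import org.iq80.leveldb.DBFactory;
import org.iq80.leveldb.Options;
import org.iq80.leveldb.impl.Iq80DBFactory;

public class LevelDbSample {
    public static void main(String[] args) throws IOException {
        boolean cleanup = true;
        Charset charset = Charset.forName("UTF-8");
        String path = "/data/leveldb";
        // init
        DBFactory factory = Iq80DBFactory.factory;
        File dir = new File(path);
        // If the data does not need to be reloaded, remove the old on-disk data under this path on every restart.
        if (cleanup) {
            factory.destroy(dir, null); // deletes all files in the directory
        }
        Options options = new Options().createIfMissing(true);
        // re-open a new db; DB is Closeable, so try-with-resources closes it
        try (DB db = factory.open(dir, options)) {
            db.put("key".getBytes(charset), "value".getBytes(charset));
            System.out.println(new String(db.get("key".getBytes(charset)), charset));
        }
    }
}
```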
