LevelDB performance analysis and testing

Source: Internet
Author: User
LevelDB is a very efficient key-value (KV) database implemented by Google. Version 1.2, the current release, supports billion-record data volumes, and even at that scale it still delivers very high performance, mainly thanks to its good design, especially the LSM-tree algorithm.

So how does LevelDB solve the random I/O that databases fear most?

Random writes are first recorded in the log file; until the log file is full, only the memtable is updated, so random writes are converted into sequential writes. When the log is full, its data is sorted into an SST (sorted string table) and merged with earlier SSTs, and this compaction is also sequential reading and writing. We all know that the sequential read/write throughput of a traditional disk RAID is very large, with hundreds of MB/s posing no problem. When writing the log file, buffered I/O is used; that is, as long as the operating system has enough memory, reads and writes are buffered by the OS page cache, which works very well. Even in sync-write mode, data is accumulated and written in 4 KB units, so efficiency remains high.

What about random reads? Those cannot be converted away. However, SSDs excel at random reads, so this hardware solves the problem naturally.

Therefore, LevelDB and an SSD RAID are a natural match.

For compiling the standard version of LevelDB, see here. Because the standard version uses C++0x features, it is not supported on the RHEL platform, so for portability Basho maintains a standard-C++ port; see here, in the directory c_src/leveldb.

This port is the version used in our tests. We tested LevelDB with data volumes of 10 million, 0.1 billion, and 1 billion records respectively, and found that performance changed little as the dataset grew.

Since the default SST file size in LevelDB is 2 MB, tens of thousands of files are needed when the dataset reaches tens of GB. I made the following modification:

version_set.cc:23 static const int kTargetFileSize = 32 * 1048576;

This raises the default file size to 32 MB, reducing pressure on the directory.

My test environment is:

$ uname -r
2.6.18-164.el5  # RHEL 5u4
# 10 * 300 GB SAS behind a RAID card, FusionIO 320 GB, flashcache, 96 GB memory, 24 * Intel(R) Xeon(R) CPUs

top shows:

21782 root      18   0 1273m 1.1g 2012 R 85.3  1.2   1152:34 db_bench

iostat shows:

$ iostat -dx 5
...
sdb1    0.40   0.00    3.40  0.00    30.40   0.00   8.94  0.02  4.65  4.65   1.58
fioa    0.00   0.00 2074.80  3.80 16598.40  30.40   8.00  0.00  0.13  0.00   0.00
dm-0    0.00   0.00 1600.00  0.00 16630.40   0.00  10.39  0.25  0.15  0.15  24.76
...

Note that Snappy compression was not enabled during this test. With compression enabled, performance would be much higher, because the I/O volume drops to less than half.

write_buffer_size=$((256*1024*1024)): the log size is set to 256 MB, which reduces the overhead of switching logs and lowers the frequency of compaction.
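For reference, the same setting is available in LevelDB's C++ API: the db_bench flag corresponds to the write_buffer_size field of leveldb::Options. This is a minimal configuration sketch (it needs the leveldb headers and library to build, and "/tmp/testdb" is just a placeholder path):

```cpp
#include <cassert>
#include "leveldb/db.h"  // requires linking against the leveldb library

int main() {
    leveldb::Options options;
    options.create_if_missing = true;
    options.write_buffer_size = 256 * 1024 * 1024;  // 256 MB, as in these tests

    leveldb::DB* db = nullptr;
    leveldb::Status status = leveldb::DB::Open(options, "/tmp/testdb", &db);
    assert(status.ok());
    delete db;
    return 0;
}
```

A larger buffer trades memory (and recovery time after a crash) for fewer log switches and less frequent memtable dumps.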

Also note that db_bench is a single-threaded program plus one compaction thread, so at most it can drive about 200% of one CPU, and I/O utilization is not very high. In other words, a multi-threaded program would improve performance severalfold.

Let's take a look at the actual performance figures:

# 10 million records
$ sudo ./db_bench --num=10000000 --write_buffer_size=$((256*1024*1024))
LevelDB:    version 1.2
Date:       Fri May 27 17:14:33 2011
CPU:        24 * Intel(R) Xeon(R) CPU X5670 @ 2.93GHz
CPUCache:   12288 KB
Keys:       16 bytes each
Values:     100 bytes each (50 bytes after compression)
Entries:    10000000
RawSize:    1106.3 MB (estimated)
FileSize:   629.4 MB (estimated)
write_buffer_size=268435456
WARNING: Snappy compression is not enabled
fillseq      :      2.134 micros/op;   51.8 MB/s
fillsync     :     70.722 micros/op;    1.6 MB/s (100000 ops)
fillrandom   :      5.229 micros/op;   21.2 MB/s
overwrite    :      5.396 micros/op;   20.5 MB/s
readrandom   :     65.729 micros/op;
readrandom   :     43.086 micros/op;
readseq      :      0.882 micros/op;  125.4 MB/s
readreverse  :      1.200 micros/op;   92.2 MB/s
compact      : 24599514.008 micros/op;
readrandom   :     12.663 micros/op;
readseq      :      0.372 micros/op;  297.4 MB/s
readreverse  :      0.559 micros/op;  198.0 MB/s
fill100K     :    349.894 micros/op;  272.6 MB/s (10000 ops)
crc32c       :      4.759 micros/op;  820.8 MB/s (4K per op)
snappycomp   :      3.099 micros/op; (snappy failure)
snappyuncomp :      2.146 micros/op; (snappy failure)

# 0.1 billion records
$ sudo ./db_bench --num=100000000 --write_buffer_size=$((256*1024*1024))
LevelDB:    version 1.2
Date:       Fri May 27 17:39:19 2011
CPU:        24 * Intel(R) Xeon(R) CPU X5670 @ 2.93GHz
CPUCache:   12288 KB
Keys:       16 bytes each
Values:     100 bytes each (50 bytes after compression)
Entries:    100000000
RawSize:    11062.6 MB (estimated)
FileSize:   6294.3 MB (estimated)
write_buffer_size=268435456
WARNING: Snappy compression is not enabled
fillseq      :      2.140 micros/op;   51.7 MB/s
fillsync     :     70.592 micros/op;    1.6 MB/s (1000000 ops)
fillrandom   :      6.033 micros/op;   18.3 MB/s
overwrite    :      7.653 micros/op;   14.5 MB/s
readrandom   :     44.833 micros/op;
readrandom   :     43.963 micros/op;
readseq      :      0.561 micros/op;  197.1 MB/s
readreverse  :      0.809 micros/op;  136.8 MB/s
compact      : 123458261.013 micros/op;
readrandom   :     14.079 micros/op;
readseq      :      0.378 micros/op;  292.5 MB/s
readreverse  :      0.567 micros/op;  195.2 MB/s
fill100K     :   1516.707 micros/op;   62.9 MB/s (100000 ops)
crc32c       :      4.726 micros/op;  826.6 MB/s (4K per op)
snappycomp   :      1.907 micros/op; (snappy failure)
snappyuncomp :      0.954 micros/op; (snappy failure)

# 1 billion records
$ sudo ./db_bench --num=1000000000 --write_buffer_size=$((256*1024*1024))
Password:
LevelDB:    version 1.2
Date:       Sun May 29 17:04:14 2011
CPU:        24 * Intel(R) Xeon(R) CPU X5670 @ 2.93GHz
CPUCache:   12288 KB
Keys:       16 bytes each
Values:     100 bytes each (50 bytes after compression)
Entries:    1000000000
RawSize:    110626.2 MB (estimated)
FileSize:   62942.5 MB (estimated)
write_buffer_size=268435456
WARNING: Snappy compression is not enabled
fillseq      :      2.126 micros/op;   52.0 MB/s
fillsync     :     63.644 micros/op;    1.7 MB/s (10000000 ops)
fillrandom   :     10.267 micros/op;   10.8 MB/s
overwrite    :     14.339 micros/op;    7.7 MB/s
... (the remaining benchmarks were relatively slow; results to be supplemented)

Conclusion: LevelDB is a good KV database. Its design focuses on solving the problem of poor random-I/O performance, and it also performs well under multi-threaded updates.
