Performance Report and Technical Analysis of the TerarkDB Database

Source: Internet
Author: User
Tags: benchmark, mixed, open, mmap, elastic search

Many people have seen the popular American drama "Silicon Valley," which imagines a future technology that can search compressed data directly, without decompressing it first. In reality, we are developing exactly this technology. Based on this core technology, we have released the storage engine product TerarkDB, which has very high technical barriers. Our goal is to surpass Facebook's RocksDB, Google's LevelDB, and MongoDB's WiredTiger, and to build the world's best-performing storage engine.

TERARKDB Introduction

TerarkDB is a storage engine with very high performance and compression ratios. It is similar to Facebook's RocksDB, but offers more functionality. Its features include:

    • High compression ratio, usually 2~5× that of Snappy
    • Retrieves data directly in real time, with no decompression needed
    • Low and stable query latency
    • A single table can contain multiple indexes; composite indexes and range search are supported
    • Native support for regular-expression retrieval
    • Can be embedded in-process or run in client-server mode
    • Data persistence
    • Schema support with rich data types
    • Both column storage and row storage, via column groups

TerarkDB has a wide range of applications on the Internet as well as in traditional industries. Because TerarkDB is optimized for read operations, it is best suited to read-heavy, write-light scenarios.

TerarkDB is quite flexible in how it can be used, and can serve as a standalone library to accommodate customer-specific scenarios. Download packages and Docker images are available for users' convenience. Linux, Windows, and Mac OS are currently supported.

As a storage engine, TerarkDB has its own native interface while also providing a LevelDB-compatible interface, so it can be adapted to any system or application that uses LevelDB, such as SSDB, which implements most of the Redis interface on top of LevelDB. In addition, the widely used RocksDB interface is a superset of the LevelDB interface, so most systems and applications that use RocksDB can also be adapted to TerarkDB easily.
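To make the compatibility claim concrete, here is a minimal sketch of the Get/Put/Delete call shape that LevelDB-style code relies on. The `LevelDBLikeDB` class is a hypothetical in-memory stand-in for illustration only, not TerarkDB's actual binding; the real engine persists and compresses the data behind the same contract.

```python
from typing import Optional

# Hypothetical in-memory stand-in for a LevelDB-compatible store.
# Only the call shape (byte keys/values, Get/Put/Delete) matters here.

class LevelDBLikeDB:
    """Minimal stand-in mimicking the LevelDB Get/Put/Delete contract."""

    def __init__(self) -> None:
        self._data: dict = {}

    def put(self, key: bytes, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: bytes) -> Optional[bytes]:
        # LevelDB's Get returns not-found rather than raising; None models that.
        return self._data.get(key)

    def delete(self, key: bytes) -> None:
        self._data.pop(key, None)


db = LevelDBLikeDB()
db.put(b"user:1", b"Joseph M. Kotow")
assert db.get(b"user:1") == b"Joseph M. Kotow"
db.delete(b"user:1")
assert db.get(b"user:1") is None
```

Code written against this narrow contract is what makes swapping the engine underneath (LevelDB, RocksDB, or TerarkDB's compatibility layer) practical.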

Terark officially provides a TerarkDB adaptation for MongoDB; adaptations for MySQL and other distributed database systems are under intensive development, and a stable release of the MongoTerark product is scheduled for the near future.

TERARKDB Performance Test Report

This section is reproduced from the Terark official website, where the original content can be viewed.

Directory
    • 1. Environment
      • 1.1. Server information
      • 1.2. Compare objects
      • 1.3. Test Data set
      • 1.4. Benchmark Source Code
      • 1.5. Compression Ratio
    • 2. Tests
      • 2.1. Random Read Test
      • 2.2. Random Write Test
      • 2.3. Read-Write Mixed Test
      • 2.4. Read Latency Test
1. Environment

1.1. Server Information

Indicator       Description
CPU             Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz (2 × 8 physical cores)
Memory          64 GB DDR4 RAM
SSD             Intel SSD 520 Series (480GB, 2.5in SATA 6Gb/s, 25nm, MLC)
Linux Kernel    3.10.0-327.10.1.el7.x86_64
1.2. Compared Databases

Product         Version    Company
RocksDB         v4.4       Facebook
WiredTiger      v2.8.0     MongoDB
HyperLevelDB    v1.2.2
LevelDB         v1.18      Google
1.3. Test Dataset

Amazon movie review data (about 8 million reviews), with an average length of approximately 1 KB per record

Raw data format:

product/productId: B00006HAXW
review/userId: A1RSDE90N6RSZF
review/profileName: Joseph M. Kotow
review/helpfulness: 9/9
review/score: 5.0
review/time: 1042502400
review/summary: Pittsburgh - Home of the OLDIES
review/text: I have all of the doo wop DVD's and this one is as good or better than the 1st ones. Remember once these performers are gone, we'll never get to see them again. Rhino did an excellent job and if you like or love doo wop and Rock n Roll you'll LOVE this DVD !!
Metadata (column names)
    • Because TerarkDB has a schema, metadata (column names) does not need to be stored in each record
    • To be fair, a separator is inserted between columns (fields) for the other databases, and column names are not stored
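A minimal sketch of how a record in this "section/name: value" format can be split into columns. The `parse_review` helper is illustrative only, not the benchmark's actual loader.

```python
# Split one review record into a column-name -> value dict.
# Each field in the sample format is a "section/name: value" line.

def parse_review(record: str) -> dict:
    fields = {}
    for line in record.strip().splitlines():
        # partition() splits on the first ": ", so colons inside the
        # value (e.g. in review/text) are preserved.
        name, _, value = line.partition(": ")
        fields[name] = value
    return fields

sample = """product/productId: B00006HAXW
review/userId: A1RSDE90N6RSZF
review/score: 5.0"""

parsed = parse_review(sample)
assert parsed["product/productId"] == "B00006HAXW"
assert parsed["review/score"] == "5.0"
```

With a schema-aware engine the field names on the left need not be stored per record; schemaless engines pay that metadata cost on every row unless, as in this benchmark, values are joined with a separator instead.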
Data set Size

The total size of the movies dataset is about 9 GB, comprising approximately 8 million records.

1.4. Benchmark Source Code

The benchmark source code is available in the GitHub repository.

1.5. Compression Ratio
    • TerarkDB compresses data with its self-developed compression algorithm
    • The other databases use block compression, with a block size of 4 KB and the compression algorithm set to Snappy
    • We use the random-write test case to compare the sizes of the written, compressed data

2. Tests

All read operations are random queries of a single record, and all write operations are random inserts or updates of a single record.

2.1. Random Read
    • All data is written to the file system in advance
    • Compression is enabled for all database writes; RocksDB/LevelDB/WiredTiger are configured to use Snappy
    • TerarkDB uses our own proprietary compression algorithm and needs no block compression; the other databases use the default 4 KB block size
2.1.1. Data is less than memory

In this case memory is large enough to hold all the data. TerarkDB does not need a dedicated cache, but the other databases do (mainly to cache decompressed data from block compression), so we set their dedicated caches to 3 GB.

In this test we also do not limit operating system memory usage (64 GB total); since the data is much smaller than memory, the operating system can cache all of it.

We can see that TerarkDB outperforms the other databases in this case:

    • TerarkDB uses a self-developed data compression algorithm that can extract a single record directly, without the block compression/decompression of traditional databases
    • TerarkDB uses self-developed succinct compressed data structures as indexes, which use less memory and search faster
2.1.2. Data slightly larger than memory

When the data cannot be fully loaded into memory, it must be stored on a physical disk (here we use SSDs as the storage medium).

    • The physical memory available to the operating system is limited to 8 GB
    • A dedicated 1 GB cache is configured for the other databases to hold hot data
    • All databases are warmed up (TerarkDB opens mmap with populate; the other databases use read-ahead)

In this case, the advantages of TERARKDB are more obvious:

    • All databases except TerarkDB must use block compression. Under random reads, even with a cache, the cache is finite and cannot hold all the data, which leads to frequent disk I/O and reduced read performance
    • TerarkDB's compression ratio is high, so the compressed data fits entirely in memory; since TerarkDB can also access the compressed data directly, its advantage becomes even more obvious
    • Because the other databases use dedicated caches, when the data read far exceeds the cache capacity, data churns in and out of the cache, adding extra resource overhead
2.1.3. Data is much larger than memory
    • Operating system memory is limited to 3 GB
    • A dedicated 256 MB cache is configured for the other databases
    • All databases are warmed up (TerarkDB opens mmap with populate; the other databases use read-ahead)

Because TerarkDB's results are much higher than the other databases', this chart uses logarithmic coordinates so that the order-of-magnitude difference is easier to see (note the vertical axis).

2.2. Random Write
    • Compression is enabled for all databases during writes, with the default 4 KB block size for block compression (TerarkDB needs no block compression)
    • All write buffers are set to 256 MB
    • Writes are performed concurrently with 1/3/6 threads
2.2.1. Data is less than memory

The environment for the random write test is similar to that of the random read test:

    • The storage medium is an in-memory file system (that is, the data is preloaded into the memory file system to speed up testing)
    • Operating system memory is not limited
    • A dedicated 3 GB cache is configured for the databases other than TerarkDB

2.2.2. Data slightly larger than memory

Similar to the environment of the random read test:

    • Total operating system memory is limited to 8 GB
    • The dedicated cache for the databases other than TerarkDB is set to 1 GB
    • The storage medium is an SSD
    • The write buffer is set to 256 MB

Test results on SSDs reflect the impact of disk I/O on performance more realistically:

    • TerarkDB writes indexes and data separately, which converts data writes into sequential writes to some degree
2.2.3. Data is much larger than memory
    • Operating system memory is limited to 3 GB
    • A dedicated 256 MB cache is configured for the other databases

2.3. Read-Write Mixed
    • TerarkDB mainly targets read-heavy scenarios with few writes
    • A total of 8 threads are used; each thread issues a random mix of reads and writes, with reads accounting for 95%/99% of operations
    • Compression is enabled for all writes, with a 4 KB block compression size
    • The other databases are first warmed up with random reads to populate their dedicated caches
2.3.1. The amount of data is less than memory
    • The storage medium is an in-memory file system (that is, the data is preloaded into the memory file system to speed up testing)
    • Operating system memory is not limited
    • The dedicated cache for the databases other than TerarkDB is set to 3 GB

2.3.2. Data slightly larger than memory
    • The storage medium is changed to an SSD
    • Operating system memory is limited to 8 GB
    • The dedicated cache for the other databases is set to 1 GB
    • 99%-read and 95%-read workloads are tested separately

2.3.3. Data is much larger than memory
    • Operating system memory is limited to 3 GB
    • A dedicated 256 MB cache is configured for the other databases
    • All databases are warmed up (TerarkDB opens mmap with populate; the other databases use read-ahead)

Again, due to the order-of-magnitude difference, we view the data on logarithmic coordinates:

2.4 Read Latency Test

The dataset in this test is still the 9 GB movie review data; only read-query latency is measured, and there are no write operations during the test.

Because TerarkDB's compression ratio is very high, all its data fits in 3 GB of system memory (the compressed data is actually only 2.1 GB, but the test program itself occupies about 750 MB), so in all three comparisons below TerarkDB is tested under 3 GB of memory. RocksDB and WiredTiger are tested under 8 GB, 4 GB, and 3 GB of memory respectively. All tests use 8 threads.

2.4.1. Data slightly larger than memory
    • 8 GB physical memory (3 GB for TerarkDB)
    • The other databases have a 512 MB dedicated cache

Database     Average   Median   99th Percentile
RocksDB      40.86     24       300
WiredTiger   58.82     41       450
TerarkDB     6.66      6        25

(Latencies in microseconds.)

    • The horizontal axis shows latency in microseconds, on an approximately logarithmic scale
      • Looking closely at the numbers on the horizontal axis reveals TerarkDB's much lower latency
    • The vertical axis shows the cumulative percentage of all queries falling within each latency interval
    • A point (X, Y%) means that Y% of all queries have a latency below X microseconds
    • The faster a curve reaches 100%, the better the query latency performance (lower latency)
    • In this case memory is sufficient for all databases, so the curves are smooth
    • TerarkDB's latency mean, median, standard deviation, and 99th percentile all show clear advantages; its latency is stable.
2.4.2. Data is much larger than memory
    • 3 GB physical memory
    • The other databases have a 256 MB dedicated cache

Database     Average    Median   99th Percentile
RocksDB      1338.88    1210     5000
WiredTiger   273.06     353      600
TerarkDB     6.67       6        25

(Latencies in microseconds.)
    • The other databases show a two-segment curve; the break between the segments corresponds to the boundary between reads that hit the cache and reads that miss it
    • TerarkDB's latency is much lower; its mean, median, standard deviation, and 99th percentile all show clear advantages, and its latency is stable
    • In this case, although total memory is only 3 GB, our compression ratio is high enough that the compressed data fits entirely in memory, so there are no cache misses
2.4.3. We also tested RocksDB and WiredTiger under 4 GB of memory:

Database     Average   Median   99th Percentile
RocksDB      964.21    970.36   2500
WiredTiger   204.85    56.25    600
TerarkDB     6.67      6        25

(Latencies in microseconds.)

    • We can see that with 4 GB of memory, RocksDB and WiredTiger serve a higher proportion of operations from the cache (the middle horizontal segment of the curve)
Technical analysis

TerarkDB uses very advanced and complex technology and has applied for four patents. Its core technology is fundamentally different from the techniques used in other database products, such as B+ trees, LSM trees, and block compression. The benefit is that both compression ratio and performance are greatly improved; it is not a simple time-space trade-off. This article briefly introduces a few technical points; for more detail, please see the documentation at terark.com.

Not "space-for-time" or "time-to-space" existing technology

Mainstream databases today also use compression, but it is mainly a trade-off between time and space: the compression method is general-purpose block/page compression (the block size is usually 4 KB~32 KB; TokuDB, known for its compression ratio, uses 2 MB~4 MB blocks).

When compression is enabled, access speed drops. This is because:

    • When writing, many records are packed together and compressed into blocks. Increasing the block size gives the compression algorithm a larger context and improves the compression ratio; conversely, reducing the block size lowers the compression ratio.

    • When reading, even retrieving a very short piece of data requires decompressing the entire block first. Thus, the larger the block, the more records it contains, the more unnecessary decompression is performed to read a single record, and the worse the performance; conversely, the smaller the block, the better the read performance.
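The two effects above can be demonstrated with a small sketch. It uses Python's `zlib` as a stand-in for Snappy (an assumption for the sake of a self-contained example): records packed into one block compress better than records compressed individually, but reading any single record forces decompressing the whole block.

```python
import zlib

# 100 repetitive ~130-byte records, packed into one block.
records = [("row%05d:some repetitive payload " % i).encode() * 4
           for i in range(100)]

offsets, block = [], b""
for r in records:
    offsets.append((len(block), len(r)))   # (offset, length) per record
    block += r

compressed_block = zlib.compress(block)

def read_record(row_id: int) -> bytes:
    # Read amplification: the entire block is decompressed for one record.
    whole = zlib.decompress(compressed_block)
    off, length = offsets[row_id]
    return whole[off:off + length]

assert read_record(7) == records[7]

# Larger compression context beats per-record compression on total size:
per_record_total = sum(len(zlib.compress(r)) for r in records)
assert len(compressed_block) < per_record_total
```

This is exactly the tension the article describes: block size tunes the trade-off between compression ratio and per-record read cost, and no block size eliminates it.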

Once compression is enabled, to mitigate these problems, traditional databases generally need a relatively large dedicated cache for decompressed data. This can greatly improve access performance for hot data, but it creates a double-caching space problem: compressed data sits in the operating system cache while decompressed data sits in the dedicated cache. There is also a more serious problem: the dedicated cache is, after all, a cache, and when it misses, the entire block must still be decompressed; this is one source of slow queries. Another source of slow queries is when the operating system cache misses.

The B-tree indexes of traditional databases also occupy considerable space, because the prefix compression typically used for B-trees yields a very low compression ratio.

All of this means that, for existing traditional databases, access speed and space usage are a tension that cannot be fully resolved, only compromised on.

Terark's technology is fundamentally different from the existing database

For data compression (which can be seen as compressing the value in key-value), TerarkDB mainly uses its self-developed, database-specific global compression technology. The compression ratio is higher, there is no concept of block compression, and there is no double-caching problem. This compression technology can read a single record directly by RowID/RecordID. If reading a single record is regarded as a decompression, then sequential decompression by RowID typically runs at about 500 MB/s per thread (up to about 7 GB/s), and random decompression by RowID at about 300 MB/s per thread (up to about 3 GB/s).
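TerarkDB's global compression algorithm is proprietary, so the sketch below illustrates only the access pattern it enables: fetching one record by RowID without decompressing a surrounding block. Here each record is compressed individually and located through an offset table (an assumption made for simplicity; the real engine achieves far better ratios by compressing globally while remaining record-addressable).

```python
import zlib

class RecordStore:
    """Record-addressable compressed storage: get(row_id) decompresses
    only the requested record, never a whole block."""

    def __init__(self, records):
        self._blob = b""
        self._index = []          # (offset, length) of each compressed record
        for r in records:
            c = zlib.compress(r)
            self._index.append((len(self._blob), len(c)))
            self._blob += c

    def get(self, row_id: int) -> bytes:
        off, length = self._index[row_id]
        return zlib.decompress(self._blob[off:off + length])

store = RecordStore([b"alpha", b"beta", b"gamma"])
assert store.get(2) == b"gamma"
```

The key property to notice is that `get` touches a bounded slice of the compressed blob, so per-record read cost does not grow with block size; that is the behavior the article attributes to TerarkDB's global compression.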

For index compression, Terark mainly uses succinct technology. The compression ratio is higher than existing techniques, and searches can be executed efficiently without decompression; the index also supports regular-expression search (without iterating over candidate keys to match the regex). This succinct-based index additionally supports reverse lookup: forward lookup obtains a RowID from a key, and reverse lookup obtains the key from a RowID, so keys do not need to be stored a second time (traditional B-tree indexes lack this ability). This provides the technical foundation for TerarkDB's support of multiple indexes on the same table.
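The forward/reverse lookup property can be sketched in miniature: when keys are stored exactly once in sorted order, key → RowID is a search and RowID → key is direct indexing, so no second copy of the key is needed. TerarkDB realizes this with a succinct trie; a plain sorted list is used here purely to illustrate the bidirectional mapping.

```python
import bisect

# Keys stored once, in sorted order; the position of a key IS its RowID.
keys = sorted([b"apple", b"banana", b"cherry"])

def key_to_rowid(key: bytes) -> int:
    """Forward lookup: key -> RowID via binary search."""
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return i
    raise KeyError(key)

def rowid_to_key(row_id: int) -> bytes:
    """Reverse lookup: RowID -> key by direct indexing, no extra copy."""
    return keys[row_id]

assert key_to_rowid(b"banana") == 1
assert rowid_to_key(1) == b"banana"
```

With several such indexes over the same table, each index only needs its own key → RowID structure; the row data itself is stored once and reached through the RowID.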

Succinct technology has existed for a long time, but performance issues have kept it from being widely used. Terark's succinct implementation is specifically optimized at the CPU instruction level, significantly improving its performance.

It is the use of these new technologies that greatly improves TerarkDB's compression ratio and access speed while keeping its functionality very rich.

TERARKDB Database Schema

A TerarkDB database contains multiple segments, which, according to their state, can be writing segments, writable frozen segments, or read-only segments. Data is first written to a writing segment, where it can be updated and retrieved directly. When a writing segment grows to a certain size, it becomes a writable frozen segment and a background thread begins compressing it. When background compression completes, a read-only segment is produced and the writable frozen segment is deleted. Physical deletion of data, segment merging, and similar work are also performed in background threads. Eventually most of the data resides in read-only segments, yielding very high compression ratios and access performance.
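The segment lifecycle described above (writing → writable frozen → read-only) can be sketched as a toy state machine. State names follow the article; the size threshold and the no-op "compression" step are placeholders for the real engine's background job, not TerarkDB's actual implementation.

```python
WRITING, FROZEN, READONLY = "writing", "writable frozen", "readonly"

class Segment:
    def __init__(self, size_limit: int):
        self.state = WRITING
        self.rows = []
        self.size_limit = size_limit

    def write(self, row: bytes) -> None:
        assert self.state == WRITING, "only writing segments accept inserts"
        self.rows.append(row)
        if len(self.rows) >= self.size_limit:
            # Segment is full: hand it off to the background compressor.
            self.state = FROZEN

    def background_compress(self) -> None:
        assert self.state == FROZEN
        # The real engine compresses the rows here, then atomically swaps
        # in the compressed read-only segment and deletes the frozen one.
        self.state = READONLY

seg = Segment(size_limit=2)
seg.write(b"r1")
assert seg.state == WRITING
seg.write(b"r2")
assert seg.state == FROZEN
seg.background_compress()
assert seg.state == READONLY
```

Pushing compression into this background transition is what lets the write path stay cheap while most data still ends up in the heavily compressed read-only form.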

Automata Technology and Succinct Technology

Alongside Terark, UC Berkeley's famous AMPLab (the lab where Spark was born) is also working on engineering succinct technology. Terark has its own advantages in algorithms, data structures, and engineering techniques.

Automata technology is used extensively in TerarkDB. An automaton is a state-transition graph used to represent data: by following the graph's edges and visiting nodes according to certain rules, the required data can be extracted. Storing such a graph with traditional techniques consumes a lot of memory, so Terark uses succinct technology to compress the state-transition graph. The essence of succinct technology is to represent data structures with bitmaps, greatly reducing memory usage while maintaining fast access performance. And because the representation is automaton-based, regular-expression retrieval can be supported natively.
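The bitmap operations underlying succinct structures are rank and select: rank1(i) counts the 1-bits before position i, and select1(k) finds the position of the k-th 1-bit. The naive versions below show only the semantics; real succinct implementations answer both in O(1) using small auxiliary tables, which is what makes a bitmap-encoded state-transition graph navigable without decompression.

```python
bits = [1, 0, 1, 1, 0, 0, 1]

def rank1(i: int) -> int:
    """Number of 1-bits in positions [0, i)."""
    return sum(bits[:i])

def select1(k: int) -> int:
    """Position of the k-th 1-bit (0-based)."""
    count = -1
    for pos, b in enumerate(bits):
        count += b
        if b and count == k:
            return pos
    raise ValueError("fewer than k+1 set bits")

assert rank1(4) == 3        # three 1-bits before position 4
assert select1(2) == 3      # the third 1-bit is at position 3
```

In a succinct tree or automaton encoding, rank/select answers such as "which child is this?" and "where does node n's edge list start?" replace the pointers a traditional representation would store, which is where the memory savings come from.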

Conclusion

You are welcome to download and use Terark's products. In the future, Terark plans to port the core engine to more distributed systems and more scenarios, such as Elasticsearch, Spark, mobile phones, and embedded devices. At this stage, Terark's plan is to find more R&D and business partners to bring the product to market as soon as possible. We are currently hiring; interested readers can contact us directly, or visit the official website for more information.
