InfluxDB engine principles


Introduction

InfluxDB is a time series database written in Go. Time series databases mainly store metric data organized by time, such as the PV (page views) and UV (unique visitors) of a web page: the metrics are collected periodically and stamped with the collection time, forming a time-based series. Time series databases are often paired with front-end pages to display metric curves over a period of time.

Why a time series database is required

What advantages does a time series database have over a traditional relational database or a NoSQL store? The question is analyzed below through the characteristics of the relevant storage models.

LSM Tree

The LSM tree underlies Google's Bigtable architecture; data is stored as key-value pairs.

Written data is first inserted into an in-memory tree. When the amount of data in the in-memory tree exceeds a certain threshold, a merge operation is triggered: the leaf nodes of the in-memory tree are iterated from left to right and merged with the leaf nodes of the on-disk tree. Whenever the amount of merged data reaches the size of a disk storage page, it is persisted to disk and the parent node's pointers to those leaf nodes are updated.

[Figure: LSM tree merge (https://segmentfault.com/img/remote/1460000006766264)]

This mechanism guarantees write efficiency, because after merging the data is written to disk pages sequentially. Disk write-back is deferred, however, so to guarantee read consistency a query first looks in memory and only goes to disk if the key is not found there.

When deleting data, memory (C0) is searched first; if the key is not there, a new index entry is created in memory that marks the key as deleted (a tombstone). Subsequent rolling merges and query operations then report the key as nonexistent. The data itself is removed from the data files during a later compaction.
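To make the write, read, and delete paths above concrete, here is a minimal Go sketch of an LSM-style memtable with tombstones. It illustrates the mechanism only; the type names, the flush threshold, and the use of a plain map (a real LSM tree keeps a sorted structure such as a skip list) are all assumptions for the example, not actual LevelDB or InfluxDB code.

```go
package main

import "fmt"

// value wraps stored data with a tombstone flag, so a delete can be
// recorded in memory without touching the data files.
type value struct {
	data      string
	tombstone bool
}

// Memtable is the in-memory tree (C0). A map stands in for the sorted
// structure a real implementation would use.
type Memtable struct {
	entries        map[string]value
	flushThreshold int
}

// Put inserts or overwrites a key in memory. When the memtable exceeds
// its threshold, it would be merged into the on-disk tree.
func (m *Memtable) Put(key, data string) {
	m.entries[key] = value{data: data}
	if len(m.entries) >= m.flushThreshold {
		// merge leaf nodes into the on-disk tree, page by page (not shown)
	}
}

// Delete records a tombstone; the data is physically removed only
// during a later compaction.
func (m *Memtable) Delete(key string) {
	m.entries[key] = value{tombstone: true}
}

// Get checks memory first; a tombstone means "does not exist".
// Only on a miss would the on-disk tree be consulted (omitted).
func (m *Memtable) Get(key string) (string, bool) {
	if v, ok := m.entries[key]; ok {
		if v.tombstone {
			return "", false
		}
		return v.data, true
	}
	return "", false // fall through to disk in a real implementation
}

func main() {
	m := &Memtable{entries: make(map[string]value), flushThreshold: 1024}
	m.Put("cpu,host=a", "0.42")
	m.Delete("cpu,host=a")
	_, ok := m.Get("cpu,host=a")
	fmt.Println("exists after delete:", ok) // false
}
```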

Compaction

When a log file exceeds a certain size threshold (1 MB by default):

Create a new memtable and log file; future operations use the new memtable and log file.

The following actions are performed in the background:

    1. Write the old memtable to an SSTable (it is first frozen as an immutable memtable, then traversed and written out).

    2. Discard the old memtable.

    3. Delete the old memtable and the old log file.

    4. Add the new SSTable to level 0.
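A hedged Go sketch of this rotation-and-flush sequence. All names and the in-memory stand-ins for files are invented for illustration; real LevelDB performs this in C++ with far more bookkeeping:

```go
package main

import "fmt"

type memtable map[string]string

type sstable struct{ entries memtable }

// DB holds the active memtable plus the level-0 SSTables. File I/O is
// replaced by in-memory stand-ins to keep the sketch runnable.
type DB struct {
	memtable memtable
	logSize  int // bytes written to the active log file
	level0   []sstable
}

const logThreshold = 1 << 20 // 1 MB, the default threshold

// maybeRotate implements the four steps from the list above: freeze the
// old memtable, switch to a new memtable and log, flush the old one to
// an SSTable in the background, and add that SSTable to level 0.
func (db *DB) maybeRotate(done chan struct{}) {
	if db.logSize < logThreshold {
		close(done)
		return
	}
	old := db.memtable       // becomes the immutable memtable
	db.memtable = memtable{} // new memtable for future operations
	db.logSize = 0           // stands in for starting a new log file

	go func() {
		sst := sstable{entries: old}       // 1. traverse old memtable into an SSTable
		db.level0 = append(db.level0, sst) // 4. add the SSTable to level 0
		// 2-3. the old memtable and its log file are now unreferenced
		// and can be discarded / deleted.
		close(done)
	}()
}

func main() {
	db := &DB{memtable: memtable{"cpu,host=a": "0.42"}, logSize: logThreshold}
	done := make(chan struct{})
	db.maybeRotate(done)
	<-done
	fmt.Println("level-0 SSTables:", len(db.level0))
}
```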

[Figure: memtable flush and compaction (https://segmentfault.com/img/remote/1460000005977489)]

For time series data, the LSM tree delivers high read and write efficiency. However, hot backups and bulk data cleanup are inefficient.

B+ Tree

Many relational databases, such as Berkeley DB, SQLite, and MySQL, use B+ trees for their indexes. A B+ tree keeps data ordered by the index, sacrificing some write performance to guarantee read efficiency. However, once the data volume grows large (on the order of gigabytes and beyond), efficiency drops: the larger the dataset, the more branches the tree has, and the greater the traversal overhead.

TSM

InfluxDB introduced the TSM engine in version 0.9.5; it is adapted from the LSM tree.

Write-Ahead Log

The current log file is closed once it reaches 2 MB, and a new log file is started.

When data is written, the log entry is flushed to disk (fsync) and the data's index is added to the in-memory table before success is returned. This design guarantees data consistency. Because every write pays the cost of hitting the disk, batching writes is recommended for throughput (InfluxDB provides a batch write API). The log follows the TLV (type-length-value) format and uses a compact data structure to reduce the overhead of write operations.
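As an illustration of batching, here is a small Go sketch that submits several points in one request over InfluxDB 1.x's HTTP line-protocol write endpoint. The server address and database name are assumptions for the example:

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// Batching several points into one request amortizes the per-write
// fsync and HTTP overhead. Assumes an InfluxDB 1.x server on
// localhost:8086 and an existing database named "mydb".
func main() {
	// Three points in line protocol, submitted in a single request.
	batch := strings.Join([]string{
		"cpu,host=a usage=0.64 1465839830100400200",
		"cpu,host=b usage=0.23 1465839830100400200",
		"cpu,host=a usage=0.71 1465839840100400200",
	}, "\n")

	resp, err := http.Post(
		"http://localhost:8086/write?db=mydb",
		"text/plain; charset=utf-8",
		strings.NewReader(batch),
	)
	if err != nil {
		fmt.Println("write failed:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println("status:", resp.Status) // 204 No Content on success
}
```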

Data Files

File Structure

[Figure: TSM data file structure (https://segmentfault.com/img/remote/1460000006766265)]

The data blocks in a file are arranged chronologically.

Compared with LevelDB's file structure, the addition of min and max times makes extracting data for a given time range very simple.

Data Block Structure

[Figure: data block layout (https://segmentfault.com/img/remote/1460000006766266)]

ID: generated by hashing the stored key (measurement name + tagset) and the field name with the fnv64-a hash.
Compressed block: stores the metric values; the compression algorithm is detailed later under Data Compression.
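A small Go sketch of that ID derivation using the standard library's fnv64-a implementation. The text confirms the hash function; the exact concatenation and separators are assumptions for illustration:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// blockID hashes "measurement + tagset + field" with fnv64-a, as the
// text describes. The separators and concatenation order here are
// invented for the example.
func blockID(measurement, tagset, field string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(measurement + "," + tagset + "#" + field))
	return h.Sum64()
}

func main() {
	id := blockID("cpu", "host=a,region=us-west", "usage")
	fmt.Printf("block id: %x\n", id)
}
```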

Index Block Structure

[Figure: index block layout (https://segmentfault.com/img/remote/1460000006766267)]

Reading Data

First, based on the time range of the query, a binary search over the data files finds the files that cover that range. The in-memory mapping table then yields the ID from the hash of the queried metric, and the index gives the starting address of its block. From the timestamps of that data block and the next one, the number of blocks to read can be computed; finally the data is extracted from those blocks to produce the result.
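A hedged Go sketch of the first step, locating the data files whose min/max times overlap the query range via binary search. The field and function names are invented; only the min/max-time-per-file idea comes from the text:

```go
package main

import (
	"fmt"
	"sort"
)

// dataFile carries the min/max timestamps from its footer, which is
// what lets a time-range query skip files entirely.
type dataFile struct {
	MinTime, MaxTime int64
}

// filesInRange assumes files are sorted by time and non-overlapping.
// sort.Search finds the first candidate in O(log n); a linear scan
// then collects the run of files overlapping [start, end].
func filesInRange(files []dataFile, start, end int64) []dataFile {
	// First file whose MaxTime reaches the query start.
	i := sort.Search(len(files), func(i int) bool {
		return files[i].MaxTime >= start
	})
	var out []dataFile
	for ; i < len(files) && files[i].MinTime <= end; i++ {
		out = append(out, files[i])
	}
	return out
}

func main() {
	files := []dataFile{{0, 99}, {100, 199}, {200, 299}}
	fmt.Println(filesInRange(files, 150, 250)) // [{100 199} {200 299}]
}
```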

Update Data

If multiple updates fall within the same time range, they are cached in the write-ahead log and applied together.

Delete Data

Deletion is handled in two stages. In the first stage, the write-ahead log persists the delete entry and tells the index to maintain a tombstone in memory, so queries for the data return "not found". In the second stage, when the write-ahead log is written into the index files, the delete is applied first, the insertions that came after it are then processed (both for the deleted series and for other series), and the in-memory tombstone is cleared.
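A minimal Go sketch of that two-stage sequence, with all names invented for illustration (the real engine persists the WAL to disk and works on index files rather than an in-memory map):

```go
package main

import "fmt"

// engine is an in-memory stand-in: wal mimics the persisted write-ahead
// log, tombstones is the in-memory delete marker set, and index stands
// in for the on-disk index files.
type engine struct {
	wal        []string
	tombstones map[string]bool
	index      map[string]string
}

// deleteSeries is stage one: persist the delete in the WAL and keep a
// tombstone in memory so queries immediately report "not found".
func (e *engine) deleteSeries(key string) {
	e.wal = append(e.wal, "delete "+key)
	e.tombstones[key] = true
}

func (e *engine) query(key string) (string, bool) {
	if e.tombstones[key] {
		return "", false // tombstone hides the series before compaction
	}
	v, ok := e.index[key]
	return v, ok
}

// compact is stage two: apply the delete to the index first, then the
// writes that arrived after it, and finally clear the tombstone.
func (e *engine) compact(key string, laterWrites map[string]string) {
	delete(e.index, key)
	for k, v := range laterWrites {
		e.index[k] = v
	}
	delete(e.tombstones, key)
	e.wal = e.wal[:0]
}

func main() {
	e := &engine{tombstones: map[string]bool{}, index: map[string]string{"cpu,host=a": "0.5"}}
	e.deleteSeries("cpu,host=a")
	if _, ok := e.query("cpu,host=a"); !ok {
		fmt.Println("stage 1: series already invisible")
	}
	e.compact("cpu,host=a", map[string]string{"mem,host=a": "0.3"})
	fmt.Println("index entries after compaction:", len(e.index)) // 1
}
```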

Data Compression

The purpose of data compression is to reduce storage space and the cost of writing to disk.

[Figure: compressed block layout (https://segmentfault.com/img/remote/1460000006766268)]

Each compressed data block contains a series of points (compressed timestamp, compressed value). Because the timestamps form a monotonically increasing sequence, only the time offsets between points need to be stored when compressing.
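A minimal sketch of that timestamp delta encoding in Go. InfluxDB's actual encoder goes further (run-length encoding and bit packing of the deltas); this shows only the offset idea from the text:

```go
package main

import "fmt"

// encodeDeltas stores the first timestamp and then only the offsets
// between consecutive timestamps. With periodic collection the deltas
// are small and nearly constant, which is what makes further
// compression so effective.
func encodeDeltas(ts []int64) (first int64, deltas []int64) {
	if len(ts) == 0 {
		return 0, nil
	}
	first = ts[0]
	deltas = make([]int64, 0, len(ts)-1)
	for i := 1; i < len(ts); i++ {
		deltas = append(deltas, ts[i]-ts[i-1])
	}
	return first, deltas
}

// decodeDeltas reverses the transform.
func decodeDeltas(first int64, deltas []int64) []int64 {
	ts := []int64{first}
	for _, d := range deltas {
		ts = append(ts, ts[len(ts)-1]+d)
	}
	return ts
}

func main() {
	ts := []int64{1000, 1010, 1020, 1030} // periodic collection, 10s apart
	first, deltas := encodeDeltas(ts)
	fmt.Println(first, deltas)               // 1000 [10 10 10]
	fmt.Println(decodeDeltas(first, deltas)) // [1000 1010 1020 1030]
}
```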

Summary

InfluxDB's data storage structure enables sequential access to data along two dimensions: series and timestamp. Compressing the data further reduces I/O overhead. Under this layout, fetching a series' data within a given time range is fast. And because data is merged and organized by time, retention operations can treat whole data files as the unit of work, which is relatively efficient.

