InfluxDB Principles Explained

Source: Internet
Author: User
Tags: crc32, key, string, cpu usage, influxdb

This article is part of the InfluxDB series, which includes the following 15 parts:
    1. InfluxDB learning: installation and introduction
    2. InfluxDB learning: basic concepts
    3. InfluxDB learning: basic operations
    4. InfluxDB learning: HTTP API write operations
    5. InfluxDB learning: data retention policies (retention policies)
    6. InfluxDB learning: continuous queries (continuous queries)
    7. InfluxDB learning: HTTP API query operations
    8. InfluxDB learning: key concepts
    9. InfluxDB learning: common functions (I) aggregate functions
    10. InfluxDB learning: common functions (II) selector functions
    11. InfluxDB learning: common functions (III) transformation functions
    12. InfluxDB learning: a study of continuous queries
    13. InfluxDB principles explained
    14. InfluxDB: fixing web page access failures after installation
    15. InfluxDB: data backup and recovery, supporting local and remote backups

See the series for more details: "Influxdb Series Tutorial"

Article Directory
    • I. InfluxDB characteristics
    • II. InfluxDB concepts
    • III. Storage engine: the TSM Tree
    • IV. Directory and file structure
    • V. Data query and index structure

InfluxDB is a time series database written in Go. This article introduces its architecture and basic principles.

For more detailed InfluxDB tutorials, see the InfluxDB series learning tutorial catalogue.


I. InfluxDB characteristics

    • The retention time of metrics can be configured.
    • Data can be deleted using conditional filters and regular expressions.
    • An SQL-like query syntax is supported.
    • The number of data replicas in a cluster can be configured.
    • Data can be periodically downsampled and written into additional measurements, which makes it easy to store data at different granularities.
II. InfluxDB concepts

    1) Data format: the line protocol

In InfluxDB, a stored data point can roughly be thought of as a virtual key and its corresponding value (field value), in the following format:

cpu_usage,host=server01,region=us-west value=0.64 1434055562000000000

The virtual key consists of the following parts: database, retention policy, measurement, tag sets, field name, timestamp. The database and retention policy are not reflected in the data line above; they are typically specified in the corresponding fields of the HTTP request when inserting data.

  • database: the database name. Multiple databases can be created in InfluxDB; data files of different databases are isolated from each other and stored in different directories on disk.
  • retention policy: the storage policy, which sets how long data is retained. Each database automatically gets a default storage policy named autogen whose retention time is infinite. Users can later define their own policies, for example keeping only the last 2 hours of data. If no storage policy is specified when inserting or querying data, the default policy is used; the default policy can also be changed. InfluxDB periodically purges expired data.
  • measurement: the name of the metric being measured, for example cpu_usage for CPU usage.
  • tag set: tags in InfluxDB are sorted in lexicographic order. If either a tag key or a tag value differs, the result is a different key; for example host=server01,region=us-west and host=server02,region=us-west are two different tag sets.
  • field name: in the data above, value is the field name. InfluxDB supports inserting multiple field names in a single line; this is really a syntactic convenience, and in the underlying storage it is stored as multiple entries, as shown in the example after this list.
  • timestamp: every data point must carry a timestamp. Timestamps are treated specially in the TSM storage engine in order to optimize later query operations.
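
For example, a single line containing two fields (an illustrative point, not one from the original article), such as

cpu_usage,host=server01,region=us-west user=0.64,system=0.22 1434055562000000000

is conceptually stored as two separate entries: one whose virtual key ends in the field name user and one whose key ends in system, each with its own field value and the same timestamp.
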
    2) Point

    A point is the data structure of a single insert statement in InfluxDB. A point is uniquely identified by series + timestamp, which means a single point can carry multiple field names and field values.

    3) Series

    A series is, roughly speaking, a collection of data in InfluxDB: data with the same database, retention policy, measurement and tag set belongs to the same series, and data of the same series is physically stored together in chronological order.

    The series key is the measurement plus the serialized string of all tags; this key is used frequently later on.

    The structure in the code is as follows:

    type Series struct {
        mu          sync.RWMutex
        Key         string            // series key
        Tags        map[string]string // tags
        id          uint64            // id
        measurement *Measurement      // measurement this series belongs to
    }
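
    To make the series key format concrete, here is a minimal, hypothetical sketch (not InfluxDB's actual code; assumes the standard sort and strings packages) that builds such a key from a measurement name and its tags:

    // buildSeriesKey joins the measurement name with the lexicographically
    // sorted tag pairs, mirroring the series key format described above.
    func buildSeriesKey(measurement string, tags map[string]string) string {
        keys := make([]string, 0, len(tags))
        for k := range tags {
            keys = append(keys, k)
        }
        sort.Strings(keys) // tags are ordered lexicographically

        parts := []string{measurement}
        for _, k := range keys {
            parts = append(parts, k+"="+tags[k])
        }
        return strings.Join(parts, ",")
    }

    // buildSeriesKey("cpu_usage", map[string]string{"region": "us-west", "host": "server01"})
    // returns "cpu_usage,host=server01,region=us-west".
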
    4) Shard

    Shard is an important concept in InfluxDB and is associated with the retention policy. Each storage policy contains many shards; each shard stores data for a specified, non-overlapping period of time. For example, data between 7:00 and 8:00 falls into shard0 and data between 8:00 and 9:00 falls into shard1. Each shard corresponds to an underlying TSM storage engine with its own cache, WAL and TSM files.

    When a database is created, a default storage policy is created automatically, and each shard under this policy holds 7 days' worth of data. The shard duration is computed by the following function:

    func shardGroupDuration(d time.Duration) time.Duration {
        if d >= 180*24*time.Hour || d == 0 { // 6 months or 0
            return 7 * 24 * time.Hour
        } else if d >= 2*24*time.Hour { // 2 days
            return 1 * 24 * time.Hour
        }
        return 1 * time.Hour
    }

    If a new retention policy is created that keeps data for 1 day, then a single shard stores 1 hour of data, and data beyond that hour is stored in the next shard.
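
    For example, assuming the shardGroupDuration function above and the standard fmt and time packages, retention durations map to shard durations as follows:

    func main() {
        fmt.Println(shardGroupDuration(0))                    // 168h0m0s (infinite retention, 7-day shards)
        fmt.Println(shardGroupDuration(365 * 24 * time.Hour)) // 168h0m0s (long retention, 7-day shards)
        fmt.Println(shardGroupDuration(3 * 24 * time.Hour))   // 24h0m0s  (3-day retention, 1-day shards)
        fmt.Println(shardGroupDuration(24 * time.Hour))       // 1h0m0s   (1-day retention, 1-hour shards)
    }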

III. Storage engine: the TSM Tree

    From LevelDB (LSM tree) to BoltDB (mmap B+ tree), InfluxDB has now moved to its own implementation of the TSM tree algorithm, which is similar to the LSM tree and specifically optimized for InfluxDB's usage pattern.

    The TSM tree is a slight modification of the LSM tree that InfluxDB made based on its actual needs.

    The TSM storage engine consists mainly of several parts: the cache, the WAL, TSM files, and the compactor.

    1) Shard

    Strictly speaking, shard is not one of these components; it is a concept layered on top of the TSM storage engine. In InfluxDB, different shards are created depending on the range of the data timestamps, and each shard has its own cache, WAL, TSM files and compactor. This makes it possible to quickly locate the relevant resources by time when querying, speeding up the query process, and it also makes subsequent bulk deletion of data very simple and efficient.

    In an LSM tree, deleting data is done by inserting a deletion marker for the specified key. The data is not deleted immediately; it is actually removed only when files are later compacted and merged, so deleting large amounts of data is a very inefficient operation in an LSM tree.

    In InfluxDB, the retention time of data is set by the retention policy. When the data in a shard is detected to have expired, only that shard's resources need to be released and its files deleted, which makes deleting expired data very efficient.

    2) Cache

    The cache is the equivalent of the memtable in an LSM tree. It is a simple map structure in memory whose key is seriesKey + delimiter + fieldName; in the current code the delimiter is #!~#. Each entry is an array of actual values in chronological order, with the following structure:

    type Cache struct {
        commit  sync.Mutex
        mu      sync.RWMutex
        store   map[string]*entry
        size    uint64 // current memory usage
        maxSize uint64 // cache max size

        // snapshot is the cache object currently being written to TSM files.
        // It is kept in memory while flushing so it can be queried along with
        // the cache. It is read only and should never be modified.
        // In other words, a memtable snapshot used to write out TSM files.
        snapshot     *Cache
        snapshotSize uint64
        snapshotting bool

        // snapshotAttempts is the number of pending or failed snapshot attempts
        // since the last successful one.
        snapshotAttempts int

        stats        *CacheStatistics
        lastSnapshot time.Time
    }

    When inserting data, the data is actually written to both the cache and the WAL at the same time; the cache can be thought of as the in-memory view of the data held in the WAL files. When InfluxDB starts, it iterates through all WAL files and rebuilds the cache, so no data is lost even if the system crashes.

    The data in the cache does not grow without bound. A maxSize parameter controls how much memory the cached data may consume before it is written to a TSM file; if not configured, the default limit is 25 MB. Every time the cached data reaches this threshold, the current cache is snapshotted and then cleared, and a new WAL file is created for writing; the remaining WAL files are eventually deleted, and the data in the snapshot is written out to a new TSM file.
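
    The flush decision can be sketched roughly as follows (a simplified illustration, not the actual InfluxDB code; the cache key format and flush function are stand-ins):

    // simpleCache is an illustrative stand-in for the real cache: a map from
    // "seriesKey#!~#fieldName" to the values written under that key.
    type simpleCache struct {
        store   map[string][]float64
        size    uint64
        maxSize uint64 // defaults to 25 MB if not configured
    }

    // maybeSnapshot mimics the behavior described above: once the cache reaches
    // maxSize, its contents are frozen into a snapshot, the live cache is reset
    // (and a new WAL segment is started), and the snapshot is flushed to a new
    // TSM file in the background. flush stands in for that write-out step.
    func (c *simpleCache) maybeSnapshot(flush func(map[string][]float64)) {
        if c.size < c.maxSize {
            return
        }
        snapshot := c.store
        c.store = make(map[string][]float64)
        c.size = 0
        go flush(snapshot)
    }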

    The current cache design has one problem: while a snapshot is being written to a new TSM file, the cache may again reach the threshold due to a large write volume before the previous snapshot has been fully flushed to disk. InfluxDB's approach is to let subsequent write operations fail; the client must handle the error and wait for recovery before continuing to write data.

    3) WAL

    The content of a WAL file is the same as the content of the in-memory cache. Its purpose is persistence: after a system crash, data that has not yet been written to a TSM file can be recovered from the WAL files.

    Because data is appended sequentially to the WAL file, write efficiency is very high. However, if the incoming data is not in chronological order but arrives haphazardly, it is routed to different shards according to its time, and since each shard has its own WAL file, the writes are no longer fully sequential and performance suffers to some degree. According to the official community, a follow-up optimization is planned that would use a single WAL file instead of creating one WAL file per shard.

    When a single WAL file reaches a certain size, it is rolled over and a new WAL segment file is created for writing data.

    4) TSM File

    A single TSM file has a maximum size of 2 GB and is used for storing data.

    TSM files use a purpose-built format to optimize query performance and compression; the file structure is described in later sections.

    5) compactor

    The compactor component runs continuously in the background, checking every second whether data needs to be compacted and merged.

    It performs two main operations. The first is to take a snapshot once the data in the cache reaches the size threshold and dump it into a new TSM file.

    The second is to merge existing TSM files, combining multiple small TSM files into one so that each file approaches the maximum single-file size and the number of files is reduced; some data deletion is also carried out during this step.

IV. Directory and file structure

    InfluxDB's data storage involves three main directories.

    By default, they are the meta, wal and data directories.

    meta is used to store some of the database's metadata; there is a meta.db file under the meta directory.

    The wal directory holds the write-ahead log files, ending in .wal. The data directory holds the actual data files, ending in .tsm. The structure under these two directories is similar, basically as follows:

    # wal directory structure
    -- wal
       -- mydb
          -- autogen
             -- 1
                -- _00001.wal
             -- 2
                -- _00035.wal
          -- 2hours
             -- 1
                -- _00001.wal

    # data directory structure
    -- data
       -- mydb
          -- autogen
             -- 1
                -- 000000001-000000003.tsm
             -- 2
                -- 000000001-000000001.tsm
          -- 2hours
             -- 1
                -- 000000002-000000002.tsm

    Here mydb is the database name, autogen and 2hours are storage policy names, and the next level of directories is named after the shard ID. For example, the autogen storage policy has two shards with IDs 1 and 2, each storing the data of a certain time range. The level below that contains the concrete files, ending in .wal and .tsm respectively.

    1) WAL File

    An entry in a WAL file holds all of the value data under one key (measurement + tags + fieldName), sorted by time. Each entry contains the following fields; a schematic struct follows the list.

  • Type (1 byte): the type of the values in this entry.
  • Key Len (2 bytes): the length of the key in the following field.
  • Key (N bytes): here the key is measurement + tags + fieldName.
  • Count (4 bytes): the number of time/value pairs under the same key that follow.
  • Time (8 bytes): the timestamp of a single value.
  • Value (N bytes): the actual content of the value. float64, int64 and boolean values occupy a fixed number of bytes, which keeps storage simple; the Type field tells how many bytes the value occupies here. The string type is special: the first 4 bytes of the value portion store the length of the string, and the rest is the actual string content.
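
    Taken together, one WAL entry can be pictured as the following schematic Go structure (field widths as listed above; this illustrates the layout and is not the engine's actual types):

    // walEntry mirrors the on-disk layout described above.
    type walEntry struct {
        Type   byte   // 1 byte: type of the values in this entry
        KeyLen uint16 // 2 bytes: length of Key
        Key    string // N bytes: measurement + tags + fieldName
        Count  uint32 // 4 bytes: number of time/value pairs that follow
        Values []walValue
    }

    type walValue struct {
        Time  int64  // 8 bytes: timestamp of a single value
        Value []byte // N bytes: fixed width for float64/int64/bool; strings are
                     // prefixed with a 4-byte length followed by the string bytes
    }
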
    2) TSM file

    The overall format of a single TSM file is as follows:

    It is divided into four main parts: Header, Blocks, Index and Footer.

    The content of the Index section is cached in memory. The data structure of each part is explained in detail below.

    Header

  • MagicNumber (4 bytes): used to distinguish which storage engine is in use; for the TSM1 engine the MagicNumber is 0x16D116D1.
  • Version (1 byte): for the current TSM1 engine this value is fixed to 1.
    Blocks

    The Blocks section contains a number of contiguous blocks. A block is the smallest unit of reading in InfluxDB; each read operation reads one block. Each block consists of two parts, a CRC32 checksum and the data, where the CRC32 value is used to verify whether the data content is corrupted. The length of the data is recorded in the Index section described below.

    The encoding of the data differs by type in InfluxDB: float values use Gorilla float compression; timestamps form an ascending sequence, so only the time offsets need to be recorded when compressing; string values are compressed with the snappy algorithm.

    When decompressed, the data consists of an 8-byte timestamp followed by the value. Depending on its type, the value occupies a different amount of space; strings are variable-length and are stored with their length at the beginning, the same format as in the WAL file.
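
    As a small illustration of the checksum step (using Go's standard hash/crc32 and errors packages; the IEEE polynomial is an assumption made here for the example):

    // verifyBlock recomputes the checksum of a block's data and compares it with
    // the stored CRC32 value read from the file.
    func verifyBlock(storedCRC uint32, data []byte) error {
        if crc32.ChecksumIEEE(data) != storedCRC {
            return errors.New("tsm block: checksum mismatch")
        }
        return nil
    }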

    Index

    The Index section stores the index of the preceding Blocks. Index entries are sorted first by key in lexicographic order and then by time. During a query, InfluxDB can quickly locate the position of the block to be read in the TSM file based on the Index information.

    The index entries can be represented by structs similar to the following code:

    type BlockIndex struct {
        minTime int64
        maxTime int64
        offset  int64
        size    uint32
    }

    type KeyIndex struct {
        keyLen uint16
        key    string
        typ    byte
        count  uint32
        blocks []*BlockIndex
    }

    type Index []*KeyIndex

    Key Len (2 bytes): the length of the key in the following field.

    Key (N bytes): here the key is the series key + delimiter + fieldName.

    Type (1 byte): the type of the field value corresponding to fieldName, i.e. the type of the data inside the block.

    Count (2 bytes): the number of block index entries that follow.

    The next four fields form the index information of one block; they repeat count times, each block index entry being a fixed 28 bytes, sorted by time.

    Min Time (8 bytes): the minimum timestamp of the values in the block.

    Max Time (8 bytes): the maximum timestamp of the values in the block.

    Offset (8 bytes): the offset of the block within the whole TSM file.

    Size (4 bytes): the size of the block. Using Offset + Size, the contents of a block can be read quickly.

    Indirect index

    The indirect index exists only in memory. It is created to quickly locate a key's detailed index information and supports binary search for fast retrieval.

    offsets is an array whose values are the positions of each key within the Index section. Since the key length is stored in a fixed 2 bytes at each of those positions, the corresponding key content can be read from there.

    When querying by a given key, a binary search locates its position in the Index section. Then, based on the time range being queried, and because the BlockIndex entries inside a KeyIndex are fixed-length, another binary search locates the BlockIndex entry containing the data to be queried; the offset and block size then allow the block contents to be read quickly from the TSM file.
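
    The two-step lookup described above can be sketched with binary searches over simplified types (assumes the standard sort package; these are not InfluxDB's actual structures):

    type blockEntry struct {
        minTime, maxTime int64
        offset           int64
        size             uint32
    }

    // findKey returns the position of key within the lexicographically sorted
    // list of index keys (or the position where it would be inserted).
    func findKey(keys []string, key string) int {
        return sort.SearchStrings(keys, key)
    }

    // findBlock returns the first block whose time range could contain t,
    // assuming the block entries are sorted by time.
    func findBlock(blocks []blockEntry, t int64) int {
        return sort.Search(len(blocks), func(i int) bool {
            return blocks[i].maxTime >= t
        })
    }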

    Footer

    The last 8 bytes of a TSM file hold the offset of the start of the Index section within the file, which makes it easy to load the index information into memory.
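
    Reading the footer can be sketched as follows (assumes the os, io and encoding/binary standard packages, and that the offset is stored as an 8-byte big-endian integer, which is an assumption made here):

    // readIndexOffset returns the file offset at which the index section starts,
    // taken from the last 8 bytes of the TSM file.
    func readIndexOffset(f *os.File) (int64, error) {
        buf := make([]byte, 8)
        if _, err := f.Seek(-8, io.SeekEnd); err != nil {
            return 0, err
        }
        if _, err := io.ReadFull(f, buf); err != nil {
            return 0, err
        }
        return int64(binary.BigEndian.Uint64(buf)), nil
    }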

V. Data query and index structure

    Because the LSM tree converts a large number of random writes into sequential writes, it greatly improves write performance at the cost of some read performance. The TSM storage engine was developed from the LSM tree, so the situation is similar. Data structures like the LSM tree are therefore usually paired with an index file (such as the MANIFEST file in LevelDB) or a Bloom filter to optimize read operations.

    There are two main kinds of indexes in InfluxDB used to optimize queries.

    1) Metadata index

    A database's metadata index is stored in the DatabaseIndex struct. It is initialized at startup by loading the index data from the TSM files under all shards, extracting information about all measurements and series, and caching it in memory.

    type DatabaseIndex struct {
        measurements map[string]*Measurement // all Measurement objects in this database
        series       map[string]*Series      // all Series objects; series key = measurement + tags
        name         string                  // database name
    }

    The most important contents of this structure are all the measurements and series under the database; their data structures are as follows:

    type Measurement struct {
        Name       string `json:"name,omitempty"`
        fieldNames map[string]struct{} // all field names under this measurement

        // in-memory index information:
        // series ID -> Series object; keeping only IDs in seriesByTagKeyValue
        // is mainly a way to save memory
        seriesByID map[uint64]*Series // lookup table for series by their id

        // double index by tag key and tag value, holding sorted arrays of series IDs.
        // During a query this map is used to quickly filter out all series IDs
        // matching the given tags; the data is then read from the files by
        // series key and time range.
        seriesByTagKeyValue map[string]map[string]SeriesIDs // map from tag key to value to sorted set of series ids

        // IDs of all series in this measurement, sorted by ID
        seriesIDs SeriesIDs // sorted list of series IDs in this measurement
    }

    type Series struct {
        Key         string            // series key
        Tags        map[string]string // tags
        id          uint64            // id
        measurement *Measurement      // measurement this series belongs to

        // which shards contain this series
        shardIDs map[uint64]bool // shards that have this series defined
    }
    Metadata queries

    InfluxDB supports a number of special query statements (with regular expression matching) for querying measurement- and tag-related metadata, for example:

    SHOW MEASUREMENTS
    SHOW TAG KEYS FROM "measurement_name"
    SHOW TAG VALUES FROM "measurement_name" WITH KEY = "tag_key"

    For example, to find out which hosts have reported data for the cpu_usage measurement, a possible query is:

    SHOW TAG VALUES FROM "cpu_usage" WITH KEY = "host"

    First, based on the measurement name, the Measurement object corresponding to cpu_usage is obtained from DatabaseIndex.measurements.

    Then, from Measurement.seriesByTagKeyValue, the map keyed by tag value for the tag key host is obtained.

    Traversing this map object, all of its keys are exactly the values we need, as shown in the sketch below.
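
    In simplified form (illustrative maps instead of InfluxDB's actual types; assumes the standard sort package), this lookup path is:

    // tagValueIndex maps tag key -> tag value -> IDs of matching series,
    // mimicking Measurement.seriesByTagKeyValue in simplified form.
    type tagValueIndex map[string]map[string][]uint64

    // showTagValues returns the sorted tag values stored under tagKey for one
    // measurement, which is essentially what SHOW TAG VALUES does.
    func showTagValues(measurements map[string]tagValueIndex, measurement, tagKey string) []string {
        idx, ok := measurements[measurement]
        if !ok {
            return nil
        }
        values := make([]string, 0, len(idx[tagKey]))
        for v := range idx[tagKey] {
            values = append(values, v)
        }
        sort.Strings(values)
        return values
    }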

    Locating the data for an ordinary query

    For ordinary data query statements, the metadata index above makes it possible to quickly determine all the series keys, the field name and the time range involved in the query.

    For example, suppose the query retrieves the last hour of the cpu_usage metric on the machine server01:

    SELECT value FROM "cpu_usage" WHERE host = 'server01' AND time > now() - 1h

    Based on measurement=cpu_usage, the Measurement object corresponding to cpu_usage is obtained from DatabaseIndex.measurements.

    Then the IDs of all matching series are obtained via DatabaseIndex.measurements["cpu_usage"].seriesByTagKeyValue["host"]["server01"], and their actual Series objects are fetched from the Measurement.seriesByID map using those IDs.

    Note that although only host=server01 is specified here, it does not mean that cpu_usage has just one matching series. There may be other tags, such as user=1 and user=2, in which case two series IDs are obtained and the query needs to read the data of all of those series.

    The shardIDs map in the Series struct records which shards contain data for that series, and the Measurement.fieldNames map can be used to filter out queries for field names that do not exist.

    At this point, in O(1) time complexity, we have obtained all the required series keys, the shards those series live in, and the time range of the query. A data iterator can then be created to fetch, from the different shards, each series key's data within the specified time range. The subsequent steps of the query rely on the in-memory cache of the Index section of the TSM files.
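
    A condensed sketch of this planning step, again with simplified stand-in types rather than the engine's real structures:

    // planQuery resolves, for a single tag filter (tagKey=tagValue), the IDs of
    // the matching series and the shards that contain those series. Together
    // with the query's time range, this determines which TSM files to read.
    func planQuery(
        seriesByTagKeyValue map[string]map[string][]uint64, // tag key -> tag value -> series IDs
        seriesShards map[uint64][]uint64, // series ID -> IDs of shards holding that series
        tagKey, tagValue string,
    ) (seriesIDs []uint64, shardIDs map[uint64]bool) {
        shardIDs = make(map[uint64]bool)
        for _, id := range seriesByTagKeyValue[tagKey][tagValue] {
            seriesIDs = append(seriesIDs, id)
            for _, sh := range seriesShards[id] {
                shardIDs[sh] = true
            }
        }
        return seriesIDs, shardIDs
    }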

    2) TSM File Index

    The Index section of the TSM file described above is turned into an indirect index in memory for fast retrieval. Its data structure is as follows:

    type indirectIndex struct {
        b       []byte  // byte stream of the underlying detailed index
        offsets []int32 // offset array recording the offset of each key within b

        minKey, maxKey   string
        minTime, maxTime int64 // min and max time in this file; used to quickly decide
                               // whether the queried data can exist in this file at all

        tombstones map[string][]TimeRange // records which keys have had data deleted within given time ranges
    }

    b corresponds directly to the Index section of the TSM file. A binary search over offsets yields the index information of all blocks for a given key, and the offset and size information then allow all the data of a specific block to be fetched.

    type indexEntries struct {
        Type    byte
        entries []IndexEntry
    }

    type IndexEntry struct {
        // all points in the block fall within this min/max time range
        MinTime, MaxTime int64

        // offset of the block within the TSM file
        Offset int64

        // size of the block
        Size uint32
    }

    As explained in the previous section, the metadata index yields all the required series keys, their shard IDs and the time range. With the TSM file index, the position of the data inside the TSM file can then be located quickly from the series key and time range.

    Reading data from TSM file

    All data read operations in InfluxDB are performed through iterators.

    Iterator is an abstract concept and supports nesting: an iterator can fetch and process data from underlying iterators and pass the results up to the iterator above it.

    This part of the code is fairly complex and is not expanded on here. In essence, the lowest-level iterators are cursors that fetch the data.

    type cursor interface {
        next() (t int64, v interface{})
    }

    type floatCursor interface {
        cursor
        nextFloat() (t int64, v float64)
    }

    // The underlying implementation is a KeyCursor, which reads one block of data at a time.
    type floatAscendingCursor struct {
        // values held in the in-memory cache
        cache struct {
            values Values
            pos    int
        }

        tsm struct {
            tdec      TimeDecoder  // timestamp decoder
            vdec      FloatDecoder // value decoder
            buf       []FloatValue
            values    []FloatValue // FloatValues read from the TSM file
            pos       int
            keyCursor *KeyCursor
        }
    }

    A cursor provides a next() method for obtaining one value at a time. Each data type has its own cursor implementation.

    The underlying implementation is a KeyCursor. The KeyCursor caches the data of one block at a time and returns values sequentially through the next() function; when the contents of a block are exhausted, the ReadBlock() function reads the contents of the next block.
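
    Schematically (an illustration of the idea, not the actual implementation), such a cursor behaves like this:

    // blockReader stands in for the KeyCursor: it returns the decoded values of
    // one block at a time, or an empty slice when the key has no more blocks.
    type blockReader interface {
        readBlock() []floatPoint
    }

    type floatPoint struct {
        t int64
        v float64
    }

    // floatCursorSketch returns values one at a time, refilling its buffer from
    // the next block when the current one is exhausted.
    type floatCursorSketch struct {
        buf []floatPoint
        pos int
        src blockReader
    }

    func (c *floatCursorSketch) next() (int64, float64, bool) {
        if c.pos >= len(c.buf) {
            c.buf = c.src.readBlock() // fetch and decode the next block
            c.pos = 0
            if len(c.buf) == 0 {
                return 0, 0, false // no more data for this key
            }
        }
        p := c.buf[c.pos]
        c.pos++
        return p.t, p.v, true
    }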



  • References:
    http://blog.fatedier.com/2016/08/05/detailed-in-influxdb-tsm-storage-engine-one/
    http://blog.fatedier.com/2016/08/15/detailed-in-influxdb-tsm-storage-engine-two/
    http://blog.fatedier.com/2016/07/05/research-of-time-series-database-influxdb/

