InfluxDB Basic Concepts and Operations

Source: Internet
Author: User

InfluxDB Basic Concepts

1. Data format

In InfluxDB, each piece of stored data can roughly be viewed as a virtual key and its corresponding value (the field value). The format is as follows:

cpu_usage,host=server01,region=us-west value=0.64 1434055562000000000

The virtual key consists of the following parts: database, retention policy, measurement, tag sets, field name, timestamp.

    • Database: the database name. You can create multiple databases in InfluxDB; the data files of different databases are isolated from each other and stored in different directories on disk.
    • Retention policy: sets how long data is kept. When a database is created, a default retention policy named autogen is created automatically, and it keeps data forever. Users can then define their own policies, for example one that keeps only the last 2 hours of data. If no retention policy is specified when writing or querying data, the default policy is used, and which policy is the default can be changed. InfluxDB periodically purges expired data.
    • Measurement: the name of the metric being measured, for example cpu_usage for CPU usage.
    • Tag set: InfluxDB sorts tags in lexicographic order. If either a tag key or a tag value differs, the result is a different key; for example, host=server01,region=us-west and host=server02,region=us-west are two different tag sets.
    • Tag: tags are a very important part of InfluxDB. The measurement name and the tags together form the database index, and each tag is a key-value pair.
    • Field name: for example, value in the data above is the field name. InfluxDB supports writing multiple fields in a single data line; this is essentially a syntactic convenience, because the underlying storage keeps them as separate pieces of data.
    • Timestamp: every piece of data needs a timestamp, which the TSM storage engine handles specially to optimize later queries.
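As a rough illustration of how one line of data maps onto these parts, the Python sketch below splits a line like the example above into measurement, tags, fields, and timestamp, and builds one virtual key per field. It assumes the simple case (no escaped commas or spaces, no string fields, an explicit timestamp); the function names and the key layout are made up for illustration and are not InfluxDB's internal encoding.

```python
from typing import Dict, List, Tuple

def parse_line(line: str) -> Tuple[str, Dict[str, str], Dict[str, str], int]:
    """Split a simple line-protocol line into (measurement, tags, fields, timestamp).

    Assumes no escaped characters, no spaces inside values, and an explicit timestamp.
    """
    series_part, fields_part, ts_part = line.split(" ")
    measurement, *tag_pairs = series_part.split(",")
    tags = dict(pair.split("=", 1) for pair in tag_pairs)
    fields = dict(pair.split("=", 1) for pair in fields_part.split(","))
    return measurement, tags, fields, int(ts_part)

def virtual_keys(line: str, database: str, rp: str) -> List[str]:
    """Build one illustrative key per field: db/rp/measurement,tags#field@time."""
    measurement, tags, fields, ts = parse_line(line)
    # Sort tags by key so that the same tag set always produces the same key.
    tag_set = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    series = f"{measurement},{tag_set}" if tag_set else measurement
    # Several fields in one line become several separately stored keys.
    return [f"{database}/{rp}/{series}#{field}@{ts}" for field in sorted(fields)]

line = "cpu_usage,host=server01,region=us-west value=0.64 1434055562000000000"
print(virtual_keys(line, "mydb", "autogen"))
```

Sorting the tags is what makes host=server01,region=us-west and region=us-west,host=server01 resolve to the same key.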

2. Comparison with traditional database terminology

InfluxDB term     Concept in a traditional database
database          database
measurement       table in the database
point             row of data in a table

3. Point

A point consists of a timestamp (time), data (fields), and labels (tags).

A point corresponds to a row of data in a traditional database, as shown in the following table:

Point attribute   Concept in a traditional database
time              time of each record; the primary index of the database (generated automatically)
fields            the recorded values (attributes without an index)
tags              the indexed attributes
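The split between indexed tags and unindexed fields can be sketched with a toy in-memory structure in Python. This is a deliberate simplification, not how InfluxDB's index is actually implemented: a filter on a tag is answered with a dictionary lookup, while a filter on a field has to scan every point. The class and method names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

@dataclass
class Point:
    time: int                                               # timestamp (ns)
    tags: Dict[str, str] = field(default_factory=dict)      # indexed attributes
    fields: Dict[str, float] = field(default_factory=dict)  # unindexed values

class ToyMeasurement:
    """Hypothetical in-memory 'table' with an inverted index on tags only."""

    def __init__(self) -> None:
        self.points: List[Point] = []
        self.tag_index: Dict[Tuple[str, str], Set[int]] = defaultdict(set)

    def insert(self, point: Point) -> None:
        position = len(self.points)
        self.points.append(point)
        for key, value in point.tags.items():
            self.tag_index[(key, value)].add(position)

    def where_tag(self, key: str, value: str) -> List[Point]:
        # Index lookup: only the positions stored under (key, value) are visited.
        return [self.points[i] for i in sorted(self.tag_index.get((key, value), ()))]

    def where_field_gt(self, name: str, threshold: float) -> List[Point]:
        # Fields are not indexed, so every point has to be examined.
        return [p for p in self.points if p.fields.get(name, float("-inf")) > threshold]
```

This is why values that are frequently used in WHERE or GROUP BY clauses are usually stored as tags, and raw measurements as fields.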

4. Series

A series is a collection of data in InfluxDB: data in the same database that share the same retention policy, measurement, and tag set belong to one series, and the data of one series are stored physically together in chronological order.
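A minimal Python sketch of this grouping rule, assuming points are plain tuples (this representation is hypothetical): points are bucketed by database, retention policy, measurement, and sorted tag set, and each bucket is then ordered by time.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical point shape: (database, retention policy, measurement, tags, timestamp, value)
PointTuple = Tuple[str, str, str, Dict[str, str], int, float]
SeriesKey = Tuple[str, str, str, Tuple[Tuple[str, str], ...]]

def series_key(db: str, rp: str, measurement: str, tags: Dict[str, str]) -> SeriesKey:
    # Sorting makes {"host": "a", "region": "w"} and {"region": "w", "host": "a"} equal.
    return (db, rp, measurement, tuple(sorted(tags.items())))

def group_into_series(points: Iterable[PointTuple]) -> Dict[SeriesKey, List[Tuple[int, float]]]:
    series: Dict[SeriesKey, List[Tuple[int, float]]] = defaultdict(list)
    for db, rp, measurement, tags, ts, value in points:
        series[series_key(db, rp, measurement, tags)].append((ts, value))
    for values in series.values():
        values.sort()  # chronological order within each series
    return dict(series)
```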

5. Shard

Shards are an important concept in InfluxDB and are tied to retention policies. Each retention policy has many shards, and each shard stores the data for a specific, non-overlapping time range. For example, data from 7:00 to 8:00 goes into shard0 and data from 8:00 to 9:00 goes into shard1. Each shard corresponds to its own underlying TSM storage engine, with a separate cache, WAL, and TSM files.
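Assigning a point to a shard amounts to truncating its timestamp to the start of a fixed-length time range. The Python sketch below assumes one-hour shards to match the 7:00-8:00 / 8:00-9:00 example; in InfluxDB the shard duration is configured per retention policy and is not fixed at one hour.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
SHARD_DURATION = timedelta(hours=1)  # assumed, to match the example above

def shard_range(ts: datetime, duration: timedelta = SHARD_DURATION) -> Tuple[datetime, datetime]:
    """Return the [start, end) time range of the shard that holds a point at ts."""
    # Truncate the time since the epoch down to a multiple of the shard duration,
    # so every timestamp falls into exactly one non-overlapping range.
    start = EPOCH + ((ts - EPOCH) // duration) * duration
    return start, start + duration

print(shard_range(datetime(2024, 1, 1, 7, 30, tzinfo=timezone.utc)))
```

Because the ranges are half-open, a point stamped exactly 8:00 lands in the 8:00-9:00 shard, not the 7:00-8:00 one.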

6. Components

The TSM storage engine consists mainly of the following parts: cache, WAL, TSM files, and compactor.

1) Cache: the cache plays the role of the memtable in an LSM tree. When data is inserted, it is written to both the cache and the WAL, so the cache can be thought of as an in-memory copy of the data in the WAL files. When InfluxDB starts, it reads through all WAL files and rebuilds the cache, so data is not lost even if the system fails.

The cache does not grow without limit. A maxSize parameter controls how much memory the cached data may use before it is written to a TSM file; if it is not configured, the default limit is 25MB. Each time the cached data reaches this threshold, a snapshot of the current cache is taken, the cache is emptied, and a new WAL file is created for subsequent writes. The remaining WAL files are eventually deleted, and the data in the snapshot is sorted and written to a new TSM file.

2) WAL: the WAL files hold the same content as the in-memory cache; their purpose is to persist the data. If the system crashes, data that had not yet been written to a TSM file is recovered from the WAL files.

3) TSM file: stores the data; a single TSM file is at most 2GB.

4) Compactor: the compactor runs continuously in the background and checks every second whether data needs to be compacted and merged.

It performs two main operations. The first is to take a snapshot once the data in the cache reaches the size threshold and dump it into a new TSM file.

The second is to merge existing TSM files: several small TSM files are combined into one, so that each file approaches the maximum single-file size and the number of files is reduced. Some data deletions are also carried out at this point.
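The write path and the two compactor operations above can be modelled with a toy Python class. It is a sketch under simplifying assumptions, not InfluxDB's implementation: the WAL and TSM files are plain in-memory lists, the per-entry size is a made-up estimate, the snapshot happens synchronously inside write(), and compaction merges everything into one file without enforcing the 2GB limit. When the same key and timestamp appear in several files, the newer file wins.

```python
import heapq
from typing import AbstractSet, Dict, List, Tuple

Entry = Tuple[str, int, float]  # (series key, timestamp, value)

class ToyEngine:
    """Toy model of one shard's TSM engine: cache, WAL, TSM files, compactor."""

    def __init__(self, max_size: int = 25 * 1024 * 1024) -> None:
        self.max_size = max_size                     # cache threshold (25MB default per the text)
        self.cache: Dict[Tuple[str, int], float] = {}
        self.cache_bytes = 0
        self.wal_segments: List[List[Entry]] = [[]]  # the last segment receives new writes
        self.tsm_files: List[List[Entry]] = []       # each file is sorted by (key, time)

    def write(self, key: str, ts: int, value: float) -> None:
        self.wal_segments[-1].append((key, ts, value))  # durable copy
        self._add_to_cache(key, ts, value)              # in-memory copy
        if self.cache_bytes >= self.max_size:
            self._snapshot()

    def _add_to_cache(self, key: str, ts: int, value: float) -> None:
        self.cache[(key, ts)] = value
        self.cache_bytes += len(key) + 16  # assumed per-entry cost

    def _snapshot(self) -> None:
        snapshot, closed = self.cache, len(self.wal_segments)
        self.cache, self.cache_bytes = {}, 0
        self.wal_segments.append([])  # new WAL segment for subsequent writes
        # The snapshot is sorted and written out as a new TSM file.
        self.tsm_files.append(sorted((k, t, v) for (k, t), v in snapshot.items()))
        del self.wal_segments[:closed]  # older segments are now redundant

    def replay(self) -> None:
        """Rebuild the cache from the WAL segments, as done on startup."""
        self.cache, self.cache_bytes = {}, 0
        for segment in self.wal_segments:
            for key, ts, value in segment:
                self._add_to_cache(key, ts, value)

    def compact(self, deleted_keys: AbstractSet[str] = frozenset()) -> None:
        """Merge all TSM files into one; newer files win, deleted keys are dropped."""
        # Tag entries with their file index so equal (key, ts) pairs sort oldest first.
        tagged = [[(k, t, i, v) for k, t, v in f] for i, f in enumerate(self.tsm_files)]
        merged: List[Entry] = []
        for key, ts, _, value in heapq.merge(*tagged):
            if key in deleted_keys:
                continue
            if merged and merged[-1][:2] == (key, ts):
                merged[-1] = (key, ts, value)  # newer file overrides the older value
            else:
                merged.append((key, ts, value))
        self.tsm_files = [merged] if merged else []
```

heapq.merge is used because every input file is already sorted, so the merge is a single streaming pass rather than a full re-sort.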

7. Directory and file structure

InfluxDB stores its data in three main directories, by default meta, wal, and data.

The meta directory stores the database metadata; it contains the file meta.db.

The wal directory holds the write-ahead log files, which end in .wal.

The data directory holds the actual data files, which end in .tsm.

In the diagram above, _internal is the database name, monitor is the retention policy name, and the numerically named directories at the next level are shard IDs.

This retention policy has two shards, with IDs 1 and 2; each shard stores the data for a certain time period. The next directory level contains the actual files, ending in .wal and .tsm respectively.
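The layout can be expressed as a small Python helper that builds the directories belonging to one shard. The base directory /var/lib/influxdb in the example is an assumption (a common location for Linux packages); the real locations come from the corresponding settings in the InfluxDB configuration file.

```python
from pathlib import Path
from typing import Dict

def shard_paths(base: Path, database: str, rp: str, shard_id: int) -> Dict[str, Path]:
    """Return where the metadata, WAL and TSM files of one shard would live.

    Layout as described above: <dir>/<database>/<retention policy>/<shard id>/.
    """
    return {
        "meta": base / "meta" / "meta.db",
        "wal": base / "wal" / database / rp / str(shard_id),    # *.wal files
        "data": base / "data" / database / rp / str(shard_id),  # *.tsm files
    }

for name, path in shard_paths(Path("/var/lib/influxdb"), "_internal", "monitor", 1).items():
    print(name, path)
```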

InfluxDB Basic Operations

InfluxDB offers several ways to operate on it:

1) Client command line mode

2) HTTP API interface

3) Client libraries for various programming languages

4) Web-based management pages
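For the HTTP API, a write in InfluxDB 1.x is a POST to the /write endpoint, with the target database in the db query parameter and line-protocol data in the body. The Python sketch below only builds the request with the standard library and does not send it; the host localhost:8086 (the default HTTP port) and the database name are assumptions.

```python
from typing import Iterable, Optional
from urllib.parse import urlencode
from urllib.request import Request

def build_write_request(lines: Iterable[str], db: str,
                        host: str = "http://localhost:8086",
                        rp: Optional[str] = None,
                        precision: Optional[str] = None) -> Request:
    """Build, but do not send, an InfluxDB 1.x HTTP write request."""
    params = {"db": db}
    if rp:
        params["rp"] = rp                # target retention policy (default one if omitted)
    if precision:
        params["precision"] = precision  # timestamp unit, e.g. "s"; nanoseconds if omitted
    body = "\n".join(lines).encode("utf-8")  # one line-protocol point per line
    return Request(f"{host}/write?{urlencode(params)}", data=body, method="POST")

req = build_write_request(["cpu_usage,host=server01,region=us-west value=0.64"], "mydb")
print(req.get_method(), req.full_url)
```

Passing the request to urllib.request.urlopen would send it; a successful write returns HTTP 204 No Content.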

Client command-line operations

Start the command-line client (-precision rfc3339 displays timestamps in RFC3339 format):

influx -precision rfc3339

1. InfluxDB database operations

    • Show databases
show databases

    • Create a database
create database shhnwangjian

    • Delete a database
drop database shhnwangjian

    • Use a specific database
use shhnwangjian
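The same database statements can also be sent through the /query endpoint of the InfluxDB 1.x HTTP API, with the statement in the q parameter; statements that modify the database, such as CREATE DATABASE and DROP DATABASE, need to be sent with POST. The CLI's use command has no HTTP counterpart; the database is given with the db parameter instead. A sketch that builds, but does not send, such requests; the host and the quote_ident helper are assumptions made for illustration.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request

def quote_ident(name: str) -> str:
    """Double-quote an InfluxQL identifier, escaping embedded double quotes (sketch)."""
    return '"' + name.replace('"', '\\"') + '"'

def build_query_request(statement: str, host: str = "http://localhost:8086",
                        db: Optional[str] = None) -> Request:
    """Build, but do not send, a POST request to the InfluxDB 1.x /query endpoint."""
    params = {"q": statement}
    if db:
        params["db"] = db  # plays the role of the CLI's `use <database>`
    return Request(f"{host}/query", data=urlencode(params).encode("ascii"), method="POST")

create = build_query_request("CREATE DATABASE " + quote_ident("shhnwangjian"))
show = build_query_request("SHOW DATABASES")
print(create.data, show.data)
```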

2. InfluxDB measurement (table) operations

InfluxDB has no concept of tables; measurements take their place. A measurement serves the same purpose as a table in a traditional database, so measurements can also be called tables in InfluxDB.

    • Show all tables
SHOW MEASUREMENTS

    • Create a table

InfluxDB has no explicit statement for creating a table; a table is created implicitly by inserting data into it.

insert disk_free,hostname=server01 value=442221834240i

Here disk_free is the table name, hostname is the index (tag), and value=xx is the recorded value (field). There can be multiple fields, and the system automatically appends a timestamp.

Alternatively, supply your own timestamp when writing the data:

insert disk_free,hostname=server01 value=442221834240i 1435362189575692182

    • Delete a table
drop measurement disk_free
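The timestamped insert above uses nanoseconds since the Unix epoch, and the trailing i marks value as an integer field. A minimal Python sketch of producing such a line from a datetime; the helper names are hypothetical and only the standard library is used.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def to_epoch_ns(dt: datetime) -> int:
    """Convert a timezone-aware datetime to nanoseconds since the Unix epoch.

    Python datetimes only carry microseconds, so the last three digits are always 0.
    """
    return (dt - EPOCH) // timedelta(microseconds=1) * 1000

def disk_free_line(hostname: str, free_bytes: int, ts_ns: int) -> str:
    # The trailing "i" marks an integer field; the timestamp follows after a space.
    return f"disk_free,hostname={hostname} value={free_bytes}i {ts_ns}"

ts = to_epoch_ns(datetime(2015, 6, 26, 23, 43, 9, 575692, tzinfo=timezone.utc))
print("insert " + disk_free_line("server01", 442221834240, ts))
```

Integer arithmetic on timedelta is used instead of dt.timestamp() so that no floating-point rounding creeps into the nanosecond value.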

3. Retention policies

InfluxDB is not designed around deleting individual data records; instead it provides retention policies, which specify how long data is kept and delete data once it is older than that.

    • View the retention policies of a database
show retention policies on "db_name"

    • Create a retention policy
create retention policy "rp_name" on "db_name" duration 3w replication 1 default

rp_name: the policy name;

db_name: the name of the database;

3w: keep data for 3 weeks; data older than 3 weeks is deleted. InfluxDB supports several duration units, such as h (hours), d (days), and w (weeks);

replication 1: the number of replicas; 1 is usually enough;

default: make this the default policy.

    • Modify a retention policy
alter retention policy "rp_name" on "db_name" duration 30d default
    • Delete a retention policy
drop retention policy "rp_name" on "db_name"
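To make the duration literals concrete, here is a minimal Python sketch that parses values such as 3w or 30d and checks whether a point is older than a policy allows. It covers only the s, m, h, d, and w units and treats INF as "keep forever"; the function names are hypothetical. In practice InfluxDB removes expired data a whole shard group at a time, so data can outlive the duration slightly.

```python
import re
from datetime import datetime, timedelta
from typing import Optional

# Only the units mentioned in this article plus s and m; InfluxQL supports more.
UNITS = {
    "s": timedelta(seconds=1),
    "m": timedelta(minutes=1),
    "h": timedelta(hours=1),
    "d": timedelta(days=1),
    "w": timedelta(weeks=1),
}

def parse_duration(text: str) -> Optional[timedelta]:
    """Parse '3w', '30d' or '1h30m'; return None for INF (keep data forever)."""
    if text.upper() == "INF":
        return None
    if not re.fullmatch(r"(?:\d+[smhdw])+", text):
        raise ValueError(f"unsupported duration: {text!r}")
    total = timedelta()
    for amount, unit in re.findall(r"(\d+)([smhdw])", text):
        total += int(amount) * UNITS[unit]
    return total

def is_expired(point_time: datetime, now: datetime, duration: Optional[timedelta]) -> bool:
    """True if the point is older than the retention duration."""
    return duration is not None and now - point_time > duration
```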

4. Continuous queries

An InfluxDB continuous query is an InfluxQL statement that the database runs automatically at regular intervals. The statement must contain a SELECT clause and a GROUP BY time() clause.

InfluxDB writes the query results into a specified measurement.

Purpose: continuous queries are the best way to downsample data. Combined with retention policies, they greatly reduce the amount of data InfluxDB has to store. Because the results of a continuous query are stored in a specified measurement, it is also convenient to keep statistics at different resolutions.

    • Create a continuous query
CREATE CONTINUOUS QUERY <cq_name> ON <database_name>
[RESAMPLE [EVERY <interval>] [FOR <interval>]]
BEGIN SELECT <function>(<stuff>)[,<function>(<stuff>)] INTO <different_measurement>
FROM <current_measurement> [WHERE <stuff>] GROUP BY time(<interval>)[,<stuff>]
END

Example:

CREATE CONTINUOUS QUERY wj_30m ON shhnwangjian BEGIN SELECT mean(connected_clients), MEDIAN(connected_clients), MAX(connected_clients), MIN(connected_clients) INTO redis_clients_30m FROM redis_clients GROUP BY ip,port,time(30m) END

This creates a continuous query named wj_30m in the shhnwangjian database. Every 30 minutes it computes the mean, median, maximum, and minimum of the connected_clients field, grouped by ip and port, and writes the results into the redis_clients_30m measurement. The default retention policy is used.
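What wj_30m computes for each 30-minute window can be reproduced in plain Python for a handful of rows, which is handy for checking results. This sketch assumes rows are (timestamp in nanoseconds, ip, port, connected_clients) tuples and that windows are aligned to the Unix epoch, as GROUP BY time(30m) does by default; it ignores the scheduling side of continuous queries.

```python
from collections import defaultdict
from statistics import mean, median
from typing import Dict, Iterable, List, Tuple

WINDOW_NS = 30 * 60 * 10**9  # 30m, matching GROUP BY time(30m)

Row = Tuple[int, str, str, float]  # (timestamp ns, ip, port, connected_clients)

def downsample(rows: Iterable[Row]) -> List[Dict]:
    """Per (ip, port, 30-minute window): mean, median, max and min of connected_clients."""
    groups: Dict[Tuple[str, str, int], List[float]] = defaultdict(list)
    for ts, ip, port, clients in rows:
        window_start = ts - ts % WINDOW_NS  # align the window to the epoch
        groups[(ip, port, window_start)].append(clients)
    result = []
    for (ip, port, start), values in sorted(groups.items()):
        result.append({"time": start, "ip": ip, "port": port,
                       "mean": mean(values), "median": median(values),
                       "max": max(values), "min": min(values)})
    return result
```

Each output dict corresponds to one row the continuous query would write into redis_clients_30m.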

Example writing to a different database:

CREATE CONTINUOUS QUERY wj_30m ON shhnwangjian_30 BEGIN SELECT mean(connected_clients), MEDIAN(connected_clients), MAX(connected_clients), MIN(connected_clients) INTO shhnwangjian_30.autogen.redis_clients_30m FROM shhnwangjian.autogen.redis_clients GROUP BY ip,port,time(30m) END

    • Show all existing continuous queries
SHOW CONTINUOUS QUERIES

    • Delete a continuous query
DROP CONTINUOUS QUERY <cq_name> ON <database_name>

