InfluxDB Basic Concepts
Data format
In InfluxDB, each stored data item can be thought of roughly as a virtual key plus its corresponding value (the field value). A data item is written in the following format:
cpu_usage,host=server01,region=hn-zhengzhou value=0.64 1434055562000000000
The virtual key consists of the following parts: database, retention policy, measurement, tag set, field name, and timestamp.
- Database: InfluxDB lets you create multiple databases. The data files of different databases are isolated from each other and stored in separate directories on disk.
- Retention policy: sets how long data is retained. Each database is initially created with a default retention policy named autogen, which keeps data permanently. Users can later define their own policy, for example to retain only the last 2 hours of data. If no retention policy is specified when inserting or querying data, the default retention policy is used, and the default can be changed. InfluxDB periodically clears expired data.
- Measurement: the counterpart of a table in a relational database, that is, the metric name. For example, cpu_usage represents CPU usage.
- Tag set: tags are sorted in dictionary order in InfluxDB. If two tag sets differ in any tag key or tag value, they belong to two different keys. For example, host=server01,region=hn-zhengzhou and host=server02,region=hn-zhengzhou are two different tag sets.
- Tag: tags are a very important part of InfluxDB. The table name together with the tags serves as the index of the database, in key-value form.
- Field name: in the data above, value is the field name. InfluxDB supports inserting multiple fields in a single data item. This is a syntactic convenience; in the underlying storage each field is stored as a separate entry.
- Timestamp: every data item has a timestamp. The TSM storage engine treats timestamps specially in order to optimize subsequent query operations.
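To make the format above concrete, here is a minimal Python sketch that splits the sample line into measurement, tags, fields, and timestamp. The function name parse_line is hypothetical, and the sketch assumes no escaped spaces or commas and keeps field values as plain strings; it is not how InfluxDB itself parses input.

```python
def parse_line(line):
    """Split one line of the format shown above into its parts (no escape handling)."""
    # Layout: "<measurement>,<tag>=<v>,... <field>=<v>,... [timestamp]"
    parts = line.split(" ")
    key, field_part = parts[0], parts[1]
    timestamp = int(parts[2]) if len(parts) > 2 else None

    measurement, *tag_items = key.split(",")
    # Sort tags by key, mirroring the dictionary ordering described above.
    tags = dict(sorted(item.split("=", 1) for item in tag_items))
    fields = dict(item.split("=", 1) for item in field_part.split(","))
    return {
        "measurement": measurement,
        "tags": tags,
        "fields": fields,
        "timestamp": timestamp,
    }


point = parse_line(
    "cpu_usage,host=server01,region=hn-zhengzhou value=0.64 1434055562000000000"
)
print(point)
```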
Point
A point is equivalent to a row in a relational database. A point consists of a timestamp (time), data (fields), and tags.
Series
A series is a collection of data in InfluxDB. Within the same database, data with the same retention policy, measurement, and tag set belongs to the same series. The data of one series is physically stored together in chronological order.
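The grouping rule can be illustrated with a small Python sketch. It assumes a single database and retention policy, so only the measurement and the sorted tag set form the key. The helper series_key and the sample values are hypothetical and only model the idea, not InfluxDB's internal layout.

```python
from collections import defaultdict


def series_key(measurement, tags):
    # Sort tags so that the same tag set always yields the same key,
    # regardless of the order in which the tags were written.
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    return f"{measurement},{tag_str}" if tag_str else measurement


# (measurement, tags, timestamp in ns, field value); illustrative data only.
points = [
    ("cpu_usage", {"region": "hn-zhengzhou", "host": "server01"}, 1434055562000000000, 0.64),
    ("cpu_usage", {"host": "server02", "region": "hn-zhengzhou"}, 1434055562000000000, 0.31),
    ("cpu_usage", {"host": "server01", "region": "hn-zhengzhou"}, 1434055561000000000, 0.58),
]

series = defaultdict(list)
for measurement, tags, ts, value in points:
    series[series_key(measurement, tags)].append((ts, value))

for rows in series.values():
    rows.sort()  # keep each series in chronological order

for key, rows in series.items():
    print(key, rows)
```

The first and third points share a key even though their tags were given in a different order, while the second point (host=server02) forms its own series.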
Shard
The shard is a relatively important concept in InfluxDB, and it is tied to the retention policy. Each retention policy has many shards. Each shard stores the data for a specific time range, and the ranges do not overlap: for example, data from 7:00 to 8:00 falls into shard0, and data from 8:00 to 9:00 falls into shard1. Each shard corresponds to an underlying TSM storage engine with its own cache, WAL, and TSM files.
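The time-range mapping can be sketched in Python as follows. This is a simplified model that assumes one-hour shards aligned to the hour, as in the 7:00 to 8:00 example; the name shard_window and the sample dates are hypothetical, and real shard durations depend on the retention policy.

```python
from datetime import datetime, timezone

SHARD_DURATION_S = 3600  # assumed one-hour shards, as in the example above


def shard_window(ts_seconds, duration=SHARD_DURATION_S):
    """Return the [start, end) window in Unix seconds that contains ts_seconds."""
    start = ts_seconds - ts_seconds % duration
    return start, start + duration


t1 = int(datetime(2015, 6, 11, 7, 30, tzinfo=timezone.utc).timestamp())
t2 = int(datetime(2015, 6, 11, 8, 5, tzinfo=timezone.utc).timestamp())

print(shard_window(t1))  # window covering 07:00 to 08:00 UTC
print(shard_window(t2))  # window covering 08:00 to 09:00 UTC
```

Because every timestamp maps to exactly one window, the windows never overlap, which matches the "does not repeat" property described above.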
Directory and file structure
The InfluxDB data store has three main directories, which by default are meta, wal, and data. The meta directory stores database metadata and contains a meta.db file. The wal directory holds write-ahead log files, which end in .wal. The data directory holds the actual data files, which end in .tsm.
InfluxDB Basic Operations
InfluxDB can be operated in several ways:
- Command-line client
- HTTP API
- Client libraries for various languages
- Web-based administration page
Command-line client operations
Enter the command line
influx
Connected to http://localhost:8086 version 1.2.4
InfluxDB shell version: 1.2.4
List databases
show databases;
Create a database
create database cpu_info;
Use a specific database
use cpu_info;
Delete a database
drop database cpu_info;
InfluxDB has no concept of tables; it has measurements instead. Measurements serve the same purpose as tables in traditional databases, so we can also refer to measurements as tables in InfluxDB.
Show All Tables
show measurements
Create a table
InfluxDB has no explicit statement for creating a table. A new table can only be created by inserting data.
insert disk_free,hostname=server01 value=442221834240i
Here disk_free is the table name, hostname is the index (tag), and value=xx is the record value (field). A record can have multiple field values. If no timestamp is given, the system appends one automatically. Alternatively, you can supply your own timestamp when adding data:
insert disk_free,hostname=server01 value=442221834240i 1435362189575692182
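When writing from a program, the text after the insert keyword has to be assembled from values. The following Python sketch shows one way to do that. The helpers format_value and to_line are hypothetical, and the sketch assumes tag keys, tag values, and field keys contain no spaces or commas that would need escaping.

```python
def format_value(v):
    """Render a Python value as a field value."""
    if isinstance(v, bool):  # check bool first: bool is a subclass of int
        return "true" if v else "false"
    if isinstance(v, int):
        return f"{v}i"  # integer fields carry an "i" suffix, as in 442221834240i
    if isinstance(v, float):
        return repr(v)
    return '"' + str(v).replace('"', '\\"') + '"'  # strings are double-quoted


def to_line(measurement, tags, fields, timestamp=None):
    """Build '<measurement>,<tags> <fields> [timestamp]' from Python values."""
    tag_str = "".join(f",{k}={v}" for k, v in sorted(tags.items()))
    field_str = ",".join(f"{k}={format_value(v)}" for k, v in fields.items())
    line = f"{measurement}{tag_str} {field_str}"
    # Without a timestamp the server assigns one on write.
    return line if timestamp is None else f"{line} {timestamp}"


print(to_line("disk_free", {"hostname": "server01"}, {"value": 442221834240}))
print(to_line("disk_free", {"hostname": "server01"}, {"value": 442221834240},
              1435362189575692182))
```

The two printed lines correspond to the two insert statements above, with and without an explicit timestamp.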
Delete a table
drop measurement disk_free
Data retention policies (Retention Policies)
InfluxDB is not designed around deleting individual data records. Instead, it provides retention policies, which specify how long data is kept; data older than the specified duration is deleted automatically.
View the retention policies of the current database
show retention policies on cpu_info;
name    duration shardGroupDuration replicaN default
----    -------- ------------------ -------- -------
autogen 0s       168h0m0s           1        true
Create a new retention policy
create retention policy "rp_name" on "db_name" duration 3w replication 1 default
- rp_name: the policy name;
- db_name: the name of the database the policy applies to;
- 3w: keep data for 3 weeks; data older than 3 weeks is deleted. InfluxDB supports various time units, such as h (hours), d (days), and w (weeks);
- replication 1: the number of replicas; 1 is generally sufficient;
- default: set this policy as the default policy.
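To see what a duration such as 3w amounts to, the following Python sketch converts a single-unit duration literal into a timedelta. The function parse_duration is hypothetical and handles only one number followed by one of the units s, m, h, d, or w; compound values such as 168h0m0s are deliberately not covered.

```python
import re
from datetime import timedelta

# Unit letter -> timedelta keyword argument.
UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days", "w": "weeks"}


def parse_duration(text):
    """Convert a literal such as '3w' or '30d' into a timedelta."""
    match = re.fullmatch(r"(\d+)([smhdw])", text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    amount, unit = match.groups()
    return timedelta(**{UNITS[unit]: int(amount)})


print(parse_duration("3w"))    # 21 days, 0:00:00
print(parse_duration("30d"))   # 30 days, 0:00:00
print(parse_duration("168h"))  # 7 days, 0:00:00
```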
Modify a retention policy
alter retention policy "rp_name" on "db_name" duration 30d default
Delete a retention policy
drop retention policy "rp_name" on "db_name"
Continuous queries (Continuous Queries)
An InfluxDB continuous query is a set of statements that the database runs automatically at regular intervals. The statement must contain the SELECT keyword and a GROUP BY time() clause. InfluxDB places the query results in the specified data table.
Continuous queries are a very effective way to reduce the sampling rate of data. Combining continuous queries with retention policies greatly reduces the system resources InfluxDB occupies. In addition, because a continuous query stores its results in a specified data table, it is convenient to keep statistics at different precisions.
The syntax for creating a continuous query is as follows:
CREATE CONTINUOUS QUERY <cq_name> ON <database_name>
[RESAMPLE [EVERY <interval>] [FOR <interval>]]
BEGIN
  SELECT <function>(<stuff>)[,<function>(<stuff>)] INTO <different_measurement>
  FROM <current_measurement> [WHERE <stuff>] GROUP BY time(<interval>)[,<stuff>]
END
Examples:
CREATE CONTINUOUS QUERY wj_30m ON shhnwangjian BEGIN SELECT mean(connected_clients), MEDIAN(connected_clients), MAX(connected_clients), MIN(connected_clients) INTO redis_clients_30m FROM redis_clients GROUP BY ip,port,time(30m) END
This creates a continuous query named wj_30m in the shhnwangjian database. Every 30 minutes it computes the mean, median, maximum, and minimum of the connected_clients field, grouped by ip and port, and writes the results to the redis_clients_30m table. The default retention policy is used.
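What the example query computes for each 30-minute window can be modeled in plain Python. The sketch below assumes a single ip/port series and uses made-up connected_clients samples; the function downsample is hypothetical and only mirrors the aggregation, not how InfluxDB schedules or executes continuous queries.

```python
from collections import defaultdict
from statistics import mean, median

WINDOW_S = 30 * 60  # 30-minute buckets, as in GROUP BY time(30m)


def downsample(rows, window=WINDOW_S):
    """rows: iterable of (unix_seconds, value). Returns {bucket_start: aggregates}."""
    buckets = defaultdict(list)
    for ts, value in rows:
        buckets[ts - ts % window].append(value)  # align to the window start
    return {
        start: {
            "mean": mean(vals),
            "median": median(vals),
            "max": max(vals),
            "min": min(vals),
        }
        for start, vals in sorted(buckets.items())
    }


# Hypothetical connected_clients samples for one ip/port series.
samples = [(0, 10), (600, 20), (1200, 30), (1800, 40), (2400, 60)]
result = downsample(samples)
for start, aggregates in result.items():
    print(start, aggregates)
```

Five raw samples collapse into two rows, one per 30-minute window, which is the reduction in sampling rate that the continuous query provides.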
Show all existing continuous queries
SHOW CONTINUOUS QUERIES
Delete a continuous query
DROP CONTINUOUS QUERY <cq_name> ON <database_name>