InfluxDB Basic Concepts
1. Data format
In InfluxDB, each piece of stored data can be thought of, roughly, as a virtual key and its corresponding value (the field value). The format is as follows:
    cpu_usage,host=server01,region=us-west value=0.64 1434055562000000000
The virtual key consists of the following parts: database, retention policy, measurement, tag sets, field name, timestamp.
- Database: you can create multiple databases in InfluxDB. Data files for different databases are isolated from one another and stored in separate directories on disk.
- Retention policy: a storage policy that sets how long data is kept. Each database is created with a default retention policy named autogen, which keeps data permanently. Users can then define their own, for example one that keeps only the last 2 hours of data. If no retention policy is specified when inserting or querying data, the default one is used, and which policy is the default can be changed. InfluxDB periodically purges expired data.
- Measurement: the name of the metric being measured. For example, cpu_usage indicates CPU usage.
- Tag sets: tags are sorted in lexicographic order in InfluxDB. If either a tag key or a tag value differs, the result is a different key. For example, host=server01,region=us-west and host=server02,region=us-west are two different tag sets.
- Tag: tags are a very important part of InfluxDB. The table name (measurement) together with the tags serves as the database index, and each tag takes the form of a key-value pair.
- Field name: in the data above, value is the field name. InfluxDB supports inserting multiple field names in a single piece of data. This is only a syntactic convenience: in the underlying storage, each field is stored as a separate piece of data.
- Timestamp: each piece of data must carry a timestamp. Timestamps receive special treatment in the TSM storage engine in order to optimize subsequent queries.
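As a rough illustration of how the parts listed above fit together, the following sketch assembles the example line from a measurement, a tag set, a field, and a timestamp. The helper names `build_line` and `series_key` are made up for this example and are not part of InfluxDB. The sketch also ignores escaping rules and the database and retention policy, which are chosen when the data is written rather than encoded in the line.

```python
def build_line(measurement, tags, fields, timestamp_ns):
    """Format one point as a line of the form shown in the example above."""
    # Tags are emitted in lexicographic order of their keys.
    tag_part = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{k}={v}" for k, v in fields.items())
    head = f"{measurement},{tag_part}" if tag_part else measurement
    return f"{head} {field_part} {timestamp_ns}"


def series_key(measurement, tags):
    """Measurement plus sorted tag set; any differing tag key or value changes it."""
    return (measurement, tuple(sorted(tags.items())))


line = build_line(
    "cpu_usage",
    {"region": "us-west", "host": "server01"},  # deliberately unsorted
    {"value": 0.64},
    1434055562000000000,
)
print(line)

# The two tag sets from the text produce two different keys.
key_a = series_key("cpu_usage", {"host": "server01", "region": "us-west"})
key_b = series_key("cpu_usage", {"host": "server02", "region": "us-west"})
print(key_a != key_b)
```

Sorting the tags before building the key means the same tag set always maps to the same key, regardless of the order in which the tags were supplied.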
2. Comparison with traditional database terms
Term in InfluxDB | Concept in a traditional database
-----------------+----------------------------------
Database         | Database
Measurement      | Table in the database
Points           | A row of data in a table
3. Point
A point consists of a timestamp (time), data (fields), and labels (tags).
A point corresponds to a row of data in a traditional database, as shown in the following table:
Point property | Concept in a traditional database
---------------+----------------------------------
Time           | The time of each data record; it is the primary index in the database (generated automatically)
Fields         | The various recorded values (attributes without indexes)
Tags           | The various indexed attributes
4. Series
A series is a collection of data in InfluxDB. Within the same database, data whose retention policy, measurement, and tag sets are all identical belongs to one series, and the data of a series is physically stored together in chronological order.
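The grouping rule above can be sketched in a few lines. This is only an in-memory illustration, not how InfluxDB stores data: the function name `group_into_series` and the sample points are invented for the example, and the database and retention policy are left out of the key because all sample points are assumed to share them.

```python
from collections import defaultdict

# (measurement, tags, timestamp, value); hypothetical sample data
points = [
    ("cpu_usage", {"host": "server01", "region": "us-west"}, 3, 0.70),
    ("cpu_usage", {"host": "server02", "region": "us-west"}, 1, 0.20),
    ("cpu_usage", {"region": "us-west", "host": "server01"}, 1, 0.64),
]


def group_into_series(points):
    """Group points by measurement and tag set, then order each group by time."""
    series = defaultdict(list)
    for measurement, tags, timestamp, value in points:
        key = (measurement, tuple(sorted(tags.items())))
        series[key].append((timestamp, value))
    for rows in series.values():
        rows.sort()  # chronological order within each series
    return dict(series)


series = group_into_series(points)
for key, rows in series.items():
    print(key, rows)
```

The first and third sample points land in the same series because their tag sets are identical once sorted. The second point differs in its host tag and forms a series of its own.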
5. Shard
The shard is a relatively important concept in InfluxDB, and it is tied to the retention policy. Each retention policy has many shards. Each shard stores the data for a specified period of time, and the periods do not overlap: for example, data from 7:00 to 8:00 falls into shard0 and data from 8:00 to 9:00 falls into shard1. Each shard corresponds to an underlying TSM storage engine with its own cache, WAL, and TSM files.
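To make the time-window idea concrete, here is a small sketch that computes which window a timestamp would fall into. The one-hour duration is an assumption chosen to match the 7:00 to 8:00 example; the real duration depends on the retention policy. The name `shard_window` is illustrative only.

```python
from datetime import datetime, timedelta, timezone

SHARD_DURATION = timedelta(hours=1)  # assumed, to match the example in the text
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def shard_window(ts, duration=SHARD_DURATION):
    """Return the [start, end) window of the shard that would hold ts."""
    index = (ts - EPOCH) // duration  # whole windows elapsed since the epoch
    start = EPOCH + index * duration
    return start, start + duration


ts = datetime(2016, 8, 5, 7, 30, tzinfo=timezone.utc)
start, end = shard_window(ts)
print(start.isoformat(), end.isoformat())
```

Because each window is half-open, a point stamped exactly 8:00 belongs to the 8:00 to 9:00 window, so no timestamp can fall into two shards.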
6. Components
The TSM storage engine consists mainly of four parts: cache, WAL, TSM file, and compactor.
1) Cache: the cache is equivalent to the memtable in an LSM tree. When data is inserted, it is actually written to both the cache and the WAL, so the cache can be thought of as an in-memory copy of the data in the WAL files. When InfluxDB starts, it reads through all the WAL files and rebuilds the cache, so no data is lost even if the system fails.
The data in the cache does not grow without limit. A maxSize parameter controls how much memory the cached data may occupy before it is written to a TSM file. If it is not configured, the default upper limit is 25 MB. Each time the cache reaches the threshold, a snapshot of the current cache is taken, the cache is emptied, and a new WAL file is created for subsequent writes. The remaining WAL files are eventually deleted, and the data in the snapshot is sorted and written to a new TSM file.
2) WAL: the content of the WAL files is the same as the cache in memory. Its purpose is to persist data, so that after a system crash, data that has not yet been written to a TSM file can be recovered from the WAL files.
3) TSM file: the files that store the data. The maximum size of a single TSM file is 2 GB.
4) Compactor: the compactor component runs continuously in the background, checking every second whether there is data that needs to be compacted and merged.
It performs two main operations. One is to take a snapshot once the data in the cache reaches the threshold and then dump it into a new TSM file.
The other is to merge the existing TSM files, combining multiple small TSM files into one so that each file approaches the maximum single-file size. This reduces the number of files, and some data deletion is also carried out at this stage.
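The interplay between cache, WAL, and TSM files described above can be modeled with a toy class. This is a deliberately simplified sketch, not InfluxDB's implementation: the class `ToyEngine` is invented for illustration, plain Python lists stand in for files, and the cache limit is an entry count rather than a byte size.

```python
class ToyEngine:
    """Toy model of the write path: WAL plus cache, snapshot to a sorted file."""

    def __init__(self, max_cache_entries=3):
        # The real threshold is a size in bytes; a count keeps the toy simple.
        self.max_cache_entries = max_cache_entries
        self.wal = []        # stands in for the current WAL file
        self.cache = {}      # (series_key, timestamp) -> value
        self.tsm_files = []  # each "file" is a sorted list of entries

    def write(self, series_key, timestamp, value):
        self.wal.append((series_key, timestamp, value))  # persist for recovery
        self.cache[(series_key, timestamp)] = value      # serve from memory
        if len(self.cache) >= self.max_cache_entries:
            self.snapshot()

    def snapshot(self):
        # Sort the cached entries and write them out as a new "TSM file".
        self.tsm_files.append(sorted(self.cache.items()))
        self.cache = {}
        self.wal = []  # the old WAL is no longer needed once data is in a file

    def recover(self):
        # After a crash, rebuild the cache by replaying the WAL.
        self.cache = {(key, ts): value for key, ts, value in self.wal}


engine = ToyEngine()
engine.write("cpu_usage,host=server01", 2, 0.70)
engine.write("cpu_usage,host=server01", 1, 0.64)
engine.cache = {}   # simulate losing memory in a crash
engine.recover()    # the two writes come back from the WAL
print(len(engine.cache))
```

Writing a third point to this engine would reach the threshold, trigger a snapshot, and leave one sorted "file" with an empty cache and WAL.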
7. Directory and file structure
InfluxDB's data store has three main directories. By default, they are named meta, wal, and data.
The meta directory stores metadata for the database; it contains a file named meta.db.
The wal directory holds the write-ahead log files, which end in .wal.
The data directory holds the actual data files, which end in .tsm.
In the example directory tree, _internal is the database name, monitor is the retention policy name, and the numerically named directories one level below are the shard IDs.
This retention policy has two shards, with IDs 1 and 2, and each shard stores data for a certain time period. The next level down contains the actual files, ending in .wal and .tsm respectively.
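The layout described above means a file's path alone identifies its database, retention policy, and shard. The sketch below pulls those parts out of a path. The root `/var/lib/influxdb` and the file name are assumptions made for the example, and `describe_shard_file` is a hypothetical helper, not an InfluxDB tool.

```python
from pathlib import PurePosixPath


def describe_shard_file(path):
    """Split <root>/<data|wal>/<database>/<retention policy>/<shard id>/<file>."""
    parts = PurePosixPath(path).parts
    if len(parts) < 5:
        raise ValueError(f"path too short to contain a shard file: {path}")
    kind, database, policy, shard_id, filename = parts[-5:]
    return {
        "kind": kind,
        "database": database,
        "retention_policy": policy,
        "shard_id": int(shard_id),
        "file": filename,
    }


# Hypothetical path; the root directory and file name are assumed.
info = describe_shard_file(
    "/var/lib/influxdb/data/_internal/monitor/1/000000001-000000001.tsm"
)
print(info)
```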
InfluxDB Basic Operations
InfluxDB can be operated in several ways:
1) Client command line
2) HTTP API
3) API libraries for various languages
4) Web-based management page
Client command-line operation
Enter the command line:
    influx -precision rfc3339
1. InfluxDB database operations
- Create a database
    create database shhnwangjian
- Delete a database
    drop database shhnwangjian
- Use a specified database
    use shhnwangjian
2. InfluxDB table (measurement) operations
InfluxDB has no concept of tables; it has measurements instead. Measurements play the same role as tables in a traditional database, so measurements can also be referred to as tables in InfluxDB.
InfluxDB has no explicit statement for creating a table; a new table can only be created by inserting data.
    insert disk_free,hostname=server01 value=442221834240i
Here disk_free is the table name, hostname is an index (tag), and value=xx is the record value (field). There can be multiple record values, and the system appends a timestamp automatically.
Alternatively, supply your own timestamp when adding data:
    insert disk_free,hostname=server01 value=442221834240i 1435362189575692182
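The same point can also be written through the HTTP API mentioned earlier. The sketch below only builds the request, using the InfluxDB 1.x `/write` endpoint with its `db` and `precision` query parameters. The address `http://localhost:8086` assumes a local server on the default port, and `build_write_request` is an illustrative helper name.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_write_request(base_url, database, line, precision="ns"):
    """Build (but do not send) a POST request that writes one line of data."""
    query = urlencode({"db": database, "precision": precision})
    return Request(
        f"{base_url}/write?{query}",
        data=line.encode("utf-8"),  # the body is the same line used with insert
        method="POST",
    )


request = build_write_request(
    "http://localhost:8086",  # assumed local server address
    "shhnwangjian",
    "disk_free,hostname=server01 value=442221834240i 1435362189575692182",
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it to a running server. Note that the body omits the `insert` keyword, which belongs to the command-line client only.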
- Delete a table
    drop measurement disk_free
3. Data retention policies (Retention Policies)
InfluxDB is not geared toward deleting individual data records. Instead, it provides data retention policies, which specify how long data is kept and delete data once it is older than the specified time.
- View the current database's retention policies
    show retention policies on "db_name"
- Create a new retention policy
    create retention policy "rp_name" on "db_name" duration 3w replication 1 default
rp_name: the policy name;
db_name: the name of the specific database;
3w: keep data for 3 weeks, so data older than 3 weeks is deleted. InfluxDB supports various time units, such as h (hours), d (days), and w (weeks);
replication 1: the number of replicas; 1 is generally enough;
default: set this policy as the default policy
- Modify a retention policy
    alter retention policy "rp_name" on "db_name" duration 30d default
- Delete a retention policy
    drop retention policy "rp_name" on "db_name"
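To show what a duration such as 3w or 30d means in practice, the sketch below converts a duration string into a time span and checks whether a point has outlived it. It handles only the h, d, and w units listed above, and it models the idea only: the function names are invented, and InfluxDB's actual purge works on whole shards rather than single points.

```python
import re
from datetime import datetime, timedelta, timezone

UNITS = {"h": "hours", "d": "days", "w": "weeks"}


def parse_duration(text):
    """Convert a duration such as '3w' or '30d' into a timedelta."""
    match = re.fullmatch(r"(\d+)([hdw])", text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    amount, unit = match.groups()
    return timedelta(**{UNITS[unit]: int(amount)})


def is_expired(point_time, now, duration):
    """True if the point is older than the retention duration."""
    return now - point_time > parse_duration(duration)


now = datetime(2016, 8, 5, tzinfo=timezone.utc)
old_point = now - timedelta(days=22)
print(parse_duration("3w"), is_expired(old_point, now, "3w"))
```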
4. Continuous queries (Continuous Queries)
An InfluxDB continuous query is a set of statements that the database runs automatically on a schedule. The statements must contain the SELECT keyword and the GROUP BY time() clause.
InfluxDB places the query results in the specified data table.
Purpose: continuous queries are the best way to downsample data. Combining continuous queries with retention policies greatly reduces the amount of data InfluxDB has to hold. In addition, because the results of a continuous query are stored in a specified data table, it becomes convenient to keep statistics at different levels of precision.
    CREATE CONTINUOUS QUERY <cq_name> ON <database_name>
    [RESAMPLE [EVERY <interval>] [FOR <interval>]]
    BEGIN
      SELECT <function>(<stuff>)[,<function>(<stuff>)] INTO <different_measurement> FROM <current_measurement> [WHERE <stuff>] GROUP BY time(<interval>)[,<stuff>]
    END
Example:
    CREATE CONTINUOUS QUERY wj_30m ON shhnwangjian BEGIN SELECT mean(connected_clients), median(connected_clients), max(connected_clients), min(connected_clients) INTO redis_clients_30m FROM redis_clients GROUP BY ip,port,time(30m) END
This creates a continuous query named wj_30m in the shhnwangjian database. Every 30 minutes, it computes the mean, median, maximum, and minimum of the connected_clients field from redis_clients and writes them to the redis_clients_30m table. The default data retention policy is used.
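What such a query computes can be imitated in plain Python. The sketch below groups sample rows by ip, port, and 30-minute window, then derives the same four aggregates. It only mirrors the arithmetic: the function name `downsample` and the sample rows are invented, and nothing here talks to InfluxDB.

```python
from collections import defaultdict
from statistics import mean, median

WINDOW = 30 * 60  # 30 minutes, in seconds


def downsample(points):
    """points: iterable of (ip, port, unix_seconds, connected_clients)."""
    buckets = defaultdict(list)
    for ip, port, ts, value in points:
        window_start = ts - ts % WINDOW  # round down to the window boundary
        buckets[(ip, port, window_start)].append(value)
    return {
        key: {
            "mean": mean(values),
            "median": median(values),
            "max": max(values),
            "min": min(values),
        }
        for key, values in buckets.items()
    }


# Hypothetical sample rows from redis_clients
rows = [
    ("10.0.0.1", 6379, 0, 10),
    ("10.0.0.1", 6379, 600, 20),
    ("10.0.0.1", 6379, 1800, 30),
]
result = downsample(rows)
for key, stats in sorted(result.items()):
    print(key, stats)
```

The first two rows share a window and are reduced to a single summary row, which is where the reduction in stored data comes from.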
Example across different databases:
    CREATE CONTINUOUS QUERY wj_30m ON shhnwangjian_30 BEGIN SELECT mean(connected_clients), median(connected_clients), max(connected_clients), min(connected_clients) INTO shhnwangjian_30.autogen.redis_clients_30m FROM shhnwangjian.autogen.redis_clients GROUP BY ip,port,time(30m) END
- Show all existing continuous queries
    SHOW CONTINUOUS QUERIES
- Delete a continuous query
    DROP CONTINUOUS QUERY <cq_name> ON <database_name>
Reference articles:
http://blog.fatedier.com/2016/08/05/detailed-in-influxdb-tsm-storage-engine-one/
http://www.linuxdaxue.com/noun-interpretation-of-influxdb.html