Summary of Basic Concepts of InfluxDB
As a time series database, InfluxDB differs from a traditional relational database in several ways. The following introduces the relevant terminology and concepts as simply and concisely as possible.
I. Basic Concepts
MySQL | InfluxDB | Description
Database | Database | Database
Table | Measurement | Similar to the concept of a table in MySQL
Record | Tag + field + timestamp | A row of data in a traditional table; mapped to InfluxDB, it can be divided into three parts
1. Database
A database corresponds to a database in MySQL; there is little ambiguity here.
2. Measurement
A measurement corresponds to a table in MySQL. In practice, the most obvious difference is that there is no separate statement for creating a measurement: you simply insert data, and if the measurement does not exist it is created and the data is inserted.
3. Point
A point corresponds to a record in MySQL. In InfluxDB it represents one data point in a measurement: the field data at a given moment that satisfies a certain condition (in simple terms, timestamp + tag + field).
- Timestamp: the time of the point, in nanoseconds. Every record must have this attribute; if it is not specified explicitly, one is assigned by default.
- Tag: a label, KV structure. In the database, tags and the measurement together build the index.
  - Tags participate in index creation, so they are suitable as query filters
  - A tag should not have too many distinct values; it is best if it clearly discriminates between categories (similar to MySQL's indexing principles)
  - Values are of type string
  - Tags are optional; a measurement without any tags is fine
- Field: stores the data, KV structure
  - Data types: long, string, boolean, float
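The structure of a point described above can be modelled as a small Python class. This is only an illustrative sketch: the `Point` class and its attribute names are hypothetical and do not belong to any InfluxDB client library. It assumes that a missing timestamp defaults to the current time in nanoseconds and that tag values are always stored as strings.

```python
from dataclasses import dataclass, field
import time

# Field value types listed in the article: long, float, string, boolean.
FIELD_TYPES = (int, float, str, bool)


@dataclass
class Point:
    """Hypothetical model of one InfluxDB point (not a real client class)."""
    measurement: str
    fields: dict
    tags: dict = field(default_factory=dict)                 # tags are optional
    timestamp_ns: int = field(default_factory=time.time_ns)  # default: now, in ns

    def __post_init__(self):
        # Tag values are always strings, so coerce whatever was passed in.
        self.tags = {key: str(value) for key, value in self.tags.items()}
        # Field values must be one of the supported types.
        for key, value in self.fields.items():
            if not isinstance(value, FIELD_TYPES):
                raise TypeError(f"unsupported type for field {key!r}")


point = Point("myapp", {"qps": 1340, "mem": "4145m"}, tags={"host": 123})
print(point.tags)  # {'host': '123'}
```

Note that the numeric tag value 123 ends up as the string '123', while field values keep their original types.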
4. Series
Series: a unique combination of tag keys and tag values
II. Example Analysis
The above are the basic concepts. On their own they do not leave much of an impression, so the following example illustrates them.
Create a measurement that stores the performance status of an application. It includes the following metrics, and data is written to InfluxDB every second:
- Service machine: host=127.0.0.1
- Service interface: service=app.service.index
- QPS: qps=1340
- RT: rt=1313
- CPU: cpu=45.23
- Memory: mem=4154m
- Load: load=1.21
1. Creating the measurement
There are seven metrics. The first step is to decide which are tags and which are fields. As mentioned earlier, tags are indexed, so they are recommended for attributes that distinguish categories and whose possible values can be anticipated; everything else goes into fields. The metrics above are therefore split as follows:
Tag | host, service
Field | qps, rt, cpu, mem, load
An actual insert looks like this:
> insert myapp,host=127.0.0.1,service=app.service.index qps=1340,rt=1313,cpu=45.23,mem="4145m",load=1.21
> select * from myapp
name: myapp
time                cpu   host      load mem   qps  rt   service
----                ---   ----      ---- ---   ---  --   -------
1532597158613778583 45.23 127.0.0.1 1.21 4145m 1340 1313 app.service.index
A. Summary notes
- In the insert statement, tags are separated from each other by commas, and so are fields; the tag set and the field set are separated by a space.
- Tag values are always strings and do not need double quotes
- String field values must be placed in double quotes, otherwise an error is reported
- To specify the timestamp explicitly, add a space after the fields, followed by the timestamp
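The formatting rules above can be sketched as a small Python helper that assembles the text following `insert`. The function name `to_line_protocol` is hypothetical and not part of any InfluxDB client. As simplifying assumptions, the sketch does not escape special characters (spaces, commas, quotes) and writes numbers exactly as given, mirroring the article's example.

```python
def to_line_protocol(measurement, tags, fields, timestamp_ns=None):
    """Build one line for the influx CLI `insert` command (sketch, no escaping)."""

    def format_field(value):
        if isinstance(value, bool):   # check bool first: bool is a subclass of int
            return "true" if value else "false"
        if isinstance(value, str):    # string fields must be double-quoted
            return '"' + value + '"'
        return str(value)             # numbers are written as-is

    # Tags follow the measurement, each introduced by a comma; values are unquoted.
    head = measurement + "".join(f",{key}={value}" for key, value in tags.items())
    # Fields are separated from each other by commas.
    body = ",".join(f"{key}={format_field(value)}" for key, value in fields.items())
    line = f"{head} {body}"           # a single space separates tags from fields
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"    # optional explicit timestamp after a space
    return line


line = to_line_protocol(
    "myapp",
    {"host": "127.0.0.1", "service": "app.service.index"},
    {"qps": 1340, "rt": 1313, "cpu": 45.23, "mem": "4145m", "load": 1.21},
)
print(line)
```

The printed line matches the text after `insert` in the example above; passing an empty tag dictionary yields a line with only the measurement and fields.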
B. Is it possible to have no field?
Testing shows that it is not; the output is as follows:
> insert myabb,host=123,service=index
ERR: {"error":"unable to parse 'myabb,host=123,service=index ': invalid field format"}
C. Is it possible to have no tag?
As stated earlier, and as confirmed by testing, it is:
> insert myabb qps=123,rt=1231
> select * from myabb
name: myabb
time                qps rt
----                --- --
1532597385053030634 123 1231
2. Data analysis
After inserting a few more points, the current data is:
> select * from myapp
name: myapp
time                cpu   host      load mem   qps  rt   service
----                ---   ----      ---- ---   ---  --   -------
1532597158613778583 45.23 127.0.0.1 1.21 4145m 1340 1313 app.service.index
1532597501578551929 45.23 127.0.0.1 1.21 4145m 1341 1312 app.service.index
1532597510225918132 45.23 127.0.0.1 1.21 4145m 1341 1312 app.service.about
1532597552421996033 45.23 127.0.0.2 1.21 4145m 1341 1312 app.service.about
A. Series
How many series do the four points above correspond to?
As stated earlier, the combination of tag keys and tag values determines a series (strictly speaking, it is measurement + retention policy + tags), so the table above contains three series in total:
127.0.0.1 | app.service.index
127.0.0.1 | app.service.about
127.0.0.2 | app.service.about
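The count can be checked with a short Python sketch that reduces the four rows to their measurement and tags and collects the distinct combinations. The helper `series_key` is a hypothetical name; the retention policy is left out of the key on the assumption that all four points were written under the same policy.

```python
# The four rows from the query output, reduced to measurement and tags.
points = [
    ("myapp", {"host": "127.0.0.1", "service": "app.service.index"}),
    ("myapp", {"host": "127.0.0.1", "service": "app.service.index"}),
    ("myapp", {"host": "127.0.0.1", "service": "app.service.about"}),
    ("myapp", {"host": "127.0.0.2", "service": "app.service.about"}),
]


def series_key(measurement, tags):
    """Identify a series by its measurement plus its sorted tag key/value pairs."""
    return (measurement, tuple(sorted(tags.items())))


# A set keeps each distinct key once, so its size is the number of series.
series = {series_key(measurement, tags) for measurement, tags in points}
print(len(series))  # 3
```

The first two rows differ only in their timestamp and field values, so they fall into the same series.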
So what exactly is a series?
What would we do if we wanted to display the data above graphically?
- First we fix the machine and the service name, and then look at how that service performs on that machine along the timeline
- In other words, host/service act as the search condition and time is the horizontal axis; each value (cpu, load, mem, qps, rt) is mapped to a point in a two-dimensional coordinate system, and all the points are joined into a line, producing a continuous graph
So a series is the search condition above, and the concept of a point is then easy to understand.
III. Retention Policies
The previous sections covered the basic concepts of the data in a measurement. This section covers the retention policy, the strategy for keeping data: it determines how long data is stored (meaning data can be deleted), how many replicas are kept, how a cluster handles the data, and so on.
1. Basic instructions
InfluxDB is a time series database aimed at large data volumes, so the amount of data can be very large. Storing everything would make disk costs considerable, and some data may not need to be stored permanently, which is why retention policies exist.
InfluxDB is not designed around frequent manual deletion of individual records, so the usual way to control the amount of data is to define a data retention policy.
The purpose of defining a retention policy is therefore to let InfluxDB know which data can be discarded, so that it can process data more efficiently.
2. Basic operations
A. Querying policies
> show retention policies on hh_test
name    duration shardGroupDuration replicaN default
----    -------- ------------------ -------- -------
autogen 0s       168h0m0s           1        true
- name: the name of the policy
- duration: retention time; 0 means the data is kept permanently
- shardGroupDuration: the time span covered by one shard group. A shard group is a basic storage structure in InfluxDB; queries over data spanning more than this duration are likely to be less efficient.
- replicaN: short for replication, the number of replicas
- default: whether this is the default policy
B. New policy
> create retention policy "2_hour" on hh_test duration 2h replication 1 default
> show retention policies on hh_test
name    duration shardGroupDuration replicaN default
----    -------- ------------------ -------- -------
autogen 0s       168h0m0s           1        false
2_hour  2h0m0s   1h0m0s             1        true
C. Modifying policies
> alter retention policy "2_hour" on hh_test duration 4h default
> show retention policies on hh_test
name    duration shardGroupDuration replicaN default
----    -------- ------------------ -------- -------
autogen 0s       168h0m0s           1        false
2_hour  4h0m0s   1h0m0s             1        true
D. Deleting a policy
> drop retention policy "2_hour" on hh_test
> show retention policies on hh_test
name    duration shardGroupDuration replicaN default
----    -------- ------------------ -------- -------
autogen 0s       168h0m0s           1        false
After the default policy is deleted, no policy is marked as the default. Does that cause a problem?
3. Understanding retention policies
Once a policy is set, expired data is deleted automatically. So how is the data actually stored?
Take the default permanent policy as an example: its shardGroupDuration parameter is 7 days, which means that 7 days of data go into one shard, after which a new shard is added.
A shard contains the actual encoded and compressed data and is represented by a TSM file on disk. Each shard belongs to one and only one shard group. Multiple shards may exist in a single shard group. Each shard contains a specific set of series. All points of a given series in a given shard group are stored in the same shard (TSM file) on disk.
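The relationship between time windows and expiry can be illustrated with a simplified Python model. This is a sketch under stated assumptions, not InfluxDB's actual implementation: the function names are hypothetical, windows are assumed to be aligned to the Unix epoch (the real alignment may differ), and a window is assumed to become removable only once all of it is older than the retention duration. The values used are those of the "2_hour" policy shown earlier (duration 2h, shardGroupDuration 1h).

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def shard_group_window(ts, shard_group_duration):
    """Return the [start, end) window containing ts (epoch-aligned assumption)."""
    index = (ts - EPOCH) // shard_group_duration  # whole windows since the epoch
    start = EPOCH + index * shard_group_duration
    return start, start + shard_group_duration


def is_expired(window_end, now, retention):
    """Assumption: a window is removable once all of it is older than retention."""
    if retention == timedelta(0):                 # duration 0s means keep forever
        return False
    return window_end <= now - retention


# Timestamp of the first point in the example (1532597158613778583 ns), to the second.
ts = datetime(2018, 7, 26, 9, 25, 58, tzinfo=timezone.utc)
start, end = shard_group_window(ts, timedelta(hours=1))  # window 09:00 to 10:00 UTC

retention = timedelta(hours=2)
print(is_expired(end, datetime(2018, 7, 26, 11, 30, tzinfo=timezone.utc), retention))  # False
print(is_expired(end, datetime(2018, 7, 26, 12, 0, tzinfo=timezone.utc), retention))   # True
```

In this model data is discarded a whole window at a time rather than point by point, which is one way to see why the shard group duration is shorter than the retention duration in the "2_hour" policy.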
IV. Other
1. A grey and grey blog: https://liuyueyi.github.io/hexblog
A grey and grey's personal blog, recording everything from study and work. Everyone is welcome to visit.
2. Disclaimer
Believing everything in books is worse than having no books at all. The content above is purely one person's opinion. Because of my limited ability, omissions and errors are inevitable; if you find bugs or have better suggestions, criticism and corrections are welcome and much appreciated.
- Weibo address: small Gray Gray Blog
- QQ: A grey/grey/3302797840
Summary of basic concepts of 180726-INFLUXDB