We live in an era of exploding big data. Faced with a torrent of increasingly diverse information, anyone who acquires, stores, transmits, understands, analyzes, applies, or maintains big data clearly needs a convenient channel for exchanging information, so that the whole process can be understood and managed quickly, effectively, and accurately. This article shows how to make data easy to present, using a time series database (InfluxDB) together with Grafana.
First, InfluxDB
InfluxDB is an open source distributed time series, event, and metrics database, written in Go, with no external dependencies. A time series database stores data whose records contain a timestamp field, such as a user's Internet traffic over a period of time or call detail records. But what data does not contain a timestamp? A timestamp field can be attached to almost any data. One of the most important aspects of time series data is how it is queried, including filtering and calculation.
It has three main features:
Time series: flexible use of time-related functions (e.g., maximum, minimum, sum);
Metrics: real-time calculation over large amounts of data;
Events: support for arbitrary event data; in other words, operations can be performed on any event data.
Some advantages of InfluxDB, in my view:
No special dependencies; it works almost out of the box (Elasticsearch, by contrast, requires Java);
Built-in data expiration;
Built-in permission management, fine-grained down to the "table" level;
Native HTTP support, with a built-in HTTP API;
Powerful SQL-like syntax, supporting functions such as min, max, sum, count, mean, and median, which makes statistics easy;
A built-in management interface, with plug-in configuration.
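As a rough sketch of how the SQL-like syntax and the HTTP API fit together, the following Python snippet builds a URL for the InfluxDB 1.x /query endpoint. The database, measurement, and field names are assumptions chosen for illustration, and nothing is sent over the network.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def build_query_url(base_url, db, influxql):
    """Return a GET URL for the InfluxDB 1.x /query endpoint."""
    params = urlencode({"db": db, "q": influxql})
    return f"{base_url}/query?{params}"

# Hypothetical measurement and field names, used only for illustration.
QUERY = (
    'SELECT MEAN("users_num"), MAX("users_num") '
    'FROM "internet_users" '
    "WHERE time > now() - 1h GROUP BY time(10m)"
)

url = build_query_url("http://localhost:8086", "internet", QUERY)
print(url)
```

The aggregate functions are applied per 10-minute window because of the GROUP BY time(10m) clause; dropping that clause would yield one value for the whole hour.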
InfluxDB basic concepts
1. Comparison with terms in a traditional database
Term in InfluxDB | Concept in a traditional database
database | Database
measurement | Table in the database
points | Rows of data in the table
2. Concepts unique to InfluxDB
1) Point
A point consists of a timestamp (time), data values (fields), and labels (tags).
A point corresponds to a row of data in a traditional database, as shown in the following table:
Point property | Concept in a traditional database
time | The time of each data record; it is the primary index in the database (generated automatically)
fields | The recorded values (attributes without indexes), such as temperature and humidity
tags | Indexed attributes, such as area and altitude
2) Series
All data in the database is ultimately shown on charts. A series is the set of data in a table that can be drawn as one line on a chart; the series are determined by grouping the data by its combination of tags.
As shown below:
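The grouping of points into series can also be sketched in a few lines of Python. This is only an illustration of the concept, not InfluxDB's storage code; the measurement name "weather" and all sample values are made up, while the tag and field names follow the point table above.

```python
from collections import defaultdict

# Hypothetical points: (measurement, tags, fields, timestamp).
points = [
    ("weather", {"area": "north", "altitude": "100"}, {"temperature": 21.5}, 1),
    ("weather", {"area": "north", "altitude": "100"}, {"temperature": 22.0}, 2),
    ("weather", {"area": "south", "altitude": "50"}, {"temperature": 25.1}, 1),
]

def series_key(measurement, tags):
    """Identify a series by the measurement plus its sorted tag set."""
    tag_part = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    return f"{measurement},{tag_part}" if tag_part else measurement

def group_into_series(points):
    """Group points so that each series can be drawn as one line."""
    series = defaultdict(list)
    for measurement, tags, fields, ts in points:
        series[series_key(measurement, tags)].append((ts, fields))
    return dict(series)

for key, rows in group_into_series(points).items():
    print(key, len(rows))
```

The three sample points fall into two series, because two of them share the same tag combination.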
3. InfluxDB-related APIs
InfluxDB supports writing data through its HTTP API. The example here uses the curl tool to simulate the HTTP request; in real-world use, you can issue the same request from code in any programming language.
Example: adding data to the internet_users table via the HTTP API
curl -v -XPOST "http://localhost:8086/write?db=internet&u=user&p=password" --data-binary "internet_users,users=community\ internet\ users,mobile=mobile\ internet\ users users_num=56,mobile_num=21 1493571600000000000"
Description:
db=internet selects the internet database;
--data-binary is followed by the data to insert, where:
internet_users: the table name (measurement)
Tag keys: users and mobile, with the values "community internet users" and "mobile internet users" (spaces inside tag values are escaped with a backslash)
Field keys: users_num and mobile_num, with the values 56 and 21, respectively
Timestamp: 1493571600000000000
This inserts one record into the internet_users table of the internet database.
Note that the db parameter must name a database that already exists, and the data body must follow the InfluxDB format: first the table name, followed by the tags, then the fields, and finally the timestamp. The table name and the tags are joined by commas, while the tag set, the field set, and the timestamp are separated from one another by a single space.
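The same write can be issued from code. Below is a minimal Python sketch, assuming numeric field values only and the InfluxDB 1.x /write endpoint; the helper names build_line and write_line are hypothetical, and write_line is defined but not called, since it needs a running server.

```python
import urllib.request
from urllib.parse import urlencode

def _escape(value):
    """Escape commas, equals signs, and spaces in tag keys and values."""
    return str(value).replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")

def build_line(measurement, tags, fields, timestamp_ns):
    """Build one line: measurement,tags fields timestamp."""
    tag_part = "".join(
        f",{_escape(k)}={_escape(v)}" for k, v in sorted(tags.items())
    )
    # Assumption: all field values are numeric; string fields need quoting.
    field_part = ",".join(f"{_escape(k)}={v}" for k, v in fields.items())
    return f"{measurement}{tag_part} {field_part} {timestamp_ns}"

def write_line(base_url, db, user, password, line):
    """POST one line to /write and return the HTTP status code."""
    query = urlencode({"db": db, "u": user, "p": password})
    request = urllib.request.Request(
        f"{base_url}/write?{query}", data=line.encode("utf-8"), method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return response.status

line = build_line(
    "internet_users",
    {"users": "community internet users", "mobile": "mobile internet users"},
    {"users_num": 56, "mobile_num": 21},
    1493571600000000000,
)
print(line)
```

Sorting the tags by key is a design choice here, not a requirement of the format; it simply makes the generated lines deterministic.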
InfluxDB data visualization tool
InfluxDB is used to store time-based data such as monitoring data. Because InfluxDB itself provides an HTTP API, it is easy to build a monitoring data store with it. When it comes to displaying the data in InfluxDB, one data display tool has to be mentioned: Grafana.
Second, Grafana
Grafana is a pure HTML/JS application. It is very powerful and is not subject to cross-domain restrictions when accessing InfluxDB. Once the InfluxDB data source is configured, the remaining work is configuring the charts.
Configure a data source:
Set the query conditions:
Display the data:
Grafana alerting function
No word sums up the nature of operations work better than "visualization", and Grafana clearly understands the pain point of most operations staff: how to let visual data speak for itself. Version 4.0 therefore added a new alerting feature. According to the official website, Grafana supports many notification channels; common ones include email, Slack instant messaging, and webhooks.
The current cluster's Grafana monitoring dashboards mainly cover the CPU and memory of the cluster hosts, combined with Grafana's threshold alerting:
Host memory and CPU usage monitoring:
In the rule configuration you can set up the monitoring rules, including the evaluation logic, the time span, and the alert conditions. Currently only one condition type, query, is supported. For it you specify a query letter, a time span, and an aggregate function. The letter refers to a query defined in the Metrics tab. The query result is reduced by the aggregate function to a single value, which is then compared against the threshold.
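The reduce-then-compare logic of such a condition can be illustrated with a short Python sketch. It is not Grafana's implementation; the function name evaluate_rule, the reducer table, and the CPU samples are all assumptions made for the example.

```python
from statistics import mean

# Reducers that collapse a query's series into a single value.
REDUCERS = {
    "avg": mean,
    "min": min,
    "max": max,
    "sum": sum,
    "last": lambda values: values[-1],
}

def evaluate_rule(values, reducer, threshold):
    """Return True when the reduced value is above the threshold."""
    reduced = REDUCERS[reducer](values)
    return reduced > threshold

# Hypothetical CPU usage samples (percent) returned by query "A"
# over the configured time span.
cpu_usage = [72.0, 81.5, 93.0, 95.5, 88.0]

print(evaluate_rule(cpu_usage, "avg", 85.0))
print(evaluate_rule(cpu_usage, "max", 90.0))
```

The choice of reducer matters: with the same samples, a rule on the minimum stays quiet while a rule on the average or the maximum fires.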
Once the rules are configured, you can view the status of all alerts together in the alert list:
Third, practice cases
1. Data acquisition planning
At present, the collected data comes mainly from JMX monitoring of Hadoop, which provides cluster, queue, and other metrics, plus some Oracle log information. The data is written to InfluxDB through the relevant interfaces, and at the database level it is organized and managed separately by source and by log type, to simplify later maintenance.
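One possible way to turn Hadoop JMX output into InfluxDB fields is sketched below in Python. It assumes the JSON shape returned by Hadoop's /jmx servlet (a "beans" list of attribute maps); the function name is hypothetical, and the sample response is shortened with made-up values, so it shows the idea rather than the collector actually used here.

```python
def jmx_bean_to_fields(jmx, bean_name, attributes):
    """Pick numeric attributes of one bean from a parsed /jmx response."""
    for bean in jmx.get("beans", []):
        if bean.get("name") != bean_name:
            continue
        fields = {}
        for key in attributes:
            value = bean.get(key)
            # Keep numbers only; strings and booleans are skipped.
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                fields[key] = value
        return fields
    return {}

# Hypothetical, shortened response; the values are made up.
sample = {
    "beans": [
        {
            "name": "Hadoop:service=NameNode,name=FSNamesystem",
            "CapacityTotal": 1000,
            "CapacityUsed": 250,
            "tag.HAState": "active",
        }
    ]
}

fields = jmx_bean_to_fields(
    sample,
    "Hadoop:service=NameNode,name=FSNamesystem",
    ["CapacityTotal", "CapacityUsed", "tag.HAState"],
)
print(fields)
```

Non-numeric attributes such as tag.HAState are left out of the fields; they would be better candidates for tags.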
2. InfluxDB database permission configuration
InfluxDB comes with built-in permission control, with the following levels:
ADMIN: owner
READ: read-only (scoped to database and table)
WRITE: write-only (scoped to database and table)
ALL (read and write): reading and writing
Given the sources of the data flow, only three roles are currently needed, assigned as follows:
ADMIN: maintenance personnel
READ: data presentation and back-end queries (the InfluxDB user configured in Grafana is read-only)
WRITE: external programs (which insert data into InfluxDB)
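The three roles could be created with InfluxQL statements along the following lines. This Python sketch only generates the statements, assuming InfluxDB 1.x syntax (CREATE USER and GRANT ... ON a database); the user names, passwords, and database name are placeholders.

```python
def role_statements(database, users):
    """Return InfluxQL statements that create one user per role."""
    statements = []
    for name, (password, role) in users.items():
        if role == "ADMIN":
            statements.append(
                f"CREATE USER \"{name}\" WITH PASSWORD '{password}' "
                "WITH ALL PRIVILEGES"
            )
        else:
            statements.append(
                f"CREATE USER \"{name}\" WITH PASSWORD '{password}'"
            )
            statements.append(f'GRANT {role} ON "{database}" TO "{name}"')
    return statements

# Placeholder users: maintenance staff, Grafana, and an external writer.
users = {
    "ops": ("change-me-1", "ADMIN"),
    "grafana": ("change-me-2", "READ"),
    "collector": ("change-me-3", "WRITE"),
}

for statement in role_statements("internet", users):
    print(statement)
```

Each statement can then be run by an administrator, for example in the influx shell.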
To make the database permissions take effect, enable authentication as follows:
vi /etc/influxdb/influxdb.conf
Change the value of the auth-enabled option under the [http] section to true:
[http]
enabled = true
bind-address = ":8086"
auth-enabled = true
log-enabled = true
write-tracing = false
pprof-enabled = false
https-enabled = false
https-certificate = "/etc/ssl/influxdb.pem"
3. InfluxDB and Grafana high-availability configuration
To keep the InfluxDB and Grafana services usable when a host goes down, the application is deployed on two virtual machines, with the following services installed on each:
Host | Service
localhost-01 | InfluxDB + Grafana
localhost-02 | InfluxDB + Grafana
At the system level, the following settings are made:
The two hosts are set up in primary/standby mode and share the same domain name, http://xxx.xxx.com
Domain name | Host | Primary/standby mode
http://xxx.xxx.com | localhost-01 | Primary
http://xxx.xxx.com | localhost-02 | Standby
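The primary/standby behaviour can be pictured with a small Python sketch: requests go to the primary while it is healthy and fall back to the standby otherwise. This is an assumption about how such a setup typically behaves, not the configuration of the actual load balancer; the health results below are simulated, and a real check might, for instance, probe InfluxDB's /ping endpoint.

```python
def pick_backend(hosts, is_healthy):
    """Return the first healthy host, preferring the primary (listed first)."""
    for host in hosts:
        if is_healthy(host):
            return host
    return None

# Primary first, standby second; health results are simulated.
HOSTS = ["localhost-01", "localhost-02"]

print(pick_backend(HOSTS, lambda host: True))
print(pick_backend(HOSTS, lambda host: host != "localhost-01"))
```

The first call returns the primary; the second simulates a primary outage and returns the standby.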
The load balancer maps the VIP domain name + port (for both the primary and the disaster-recovery side) to the domain name + port of the localhost-01 and localhost-02 nodes. The well-known InfluxDB and Grafana ports are as follows:
Service | Port
InfluxDB | 8083
InfluxDB | 8086
InfluxDB | 8088
Grafana | 3000
So the mapping relationship can be designed like this:
The other ports are configured in the same way. With load balancing in place, one point about the Grafana configuration is worth mentioning: to make the visual display highly available, the data source in Grafana must be configured with the domain name + port:
The data security setting is complete.
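For readers who prefer to script the data source instead of using the web interface, here is a hedged Python sketch that builds a JSON body for Grafana's POST /api/datasources endpoint. The field names follow the older (4.x-era) API as far as I know and may differ in newer versions; the data source name, database, and credentials are placeholders, and the domain is the same masked one used above.

```python
import json

def influxdb_datasource(name, url, database, user, password):
    """Build a JSON body for Grafana's POST /api/datasources endpoint."""
    return {
        "name": name,
        "type": "influxdb",
        "access": "proxy",  # assumption: Grafana's backend forwards the queries
        "url": url,         # domain name + port, not a single host address
        "database": database,
        "user": user,
        "password": password,
    }

# Placeholder values; the domain is masked as in the article.
payload = influxdb_datasource(
    "influxdb-ha", "http://xxx.xxx.com:8086", "internet", "grafana", "change-me"
)
print(json.dumps(payload, indent=2))
```

Because the URL points at the shared domain name rather than at localhost-01 or localhost-02, the data source keeps working after a failover.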
4. Grafana interface configuration
With the environment configured as above, set up the Grafana dashboards and monitoring according to your needs. For the specific steps, refer to the tutorials on the official website; they are not repeated here.
HDFS directory quota monitoring:
HDFS space usage monitoring:
Reposted from: Building a big data monitoring tool based on InfluxDB + Grafana