--------------- Original content; please credit the source when reproducing. <[email protected]> ------------
I. Overview
RRDtool (Round Robin Database tool) is a database built on round-robin storage (note: this is not the same as the round-robin scheduling algorithm in operating systems). It stores data in a fixed-size space and keeps a pointer that moves as data is read and written, always pointing to the position of the most recent update.
Many references picture this storage space as a circle with no start and no end, which is how a fixed-size space can be read and written continuously. The space is the file with the ".rrd" suffix produced by the "rrdtool create" command.
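The circle described above can be sketched in a few lines of Python. This is only a toy model of the idea (a fixed number of slots plus a pointer to the last update), not RRDtool's actual file format; the class and method names are made up for illustration.

```python
class RoundRobinStore:
    """Toy fixed-size store: once full, new values overwrite the oldest slot."""

    def __init__(self, rows):
        self.slots = [None] * rows   # None stands in for "unknown"
        self.last = -1               # index of the most recent update

    def update(self, value):
        # Advance the pointer, wrapping around at the end of the space.
        self.last = (self.last + 1) % len(self.slots)
        self.slots[self.last] = value

    def history(self):
        """Return the stored values ordered from oldest to newest."""
        start = self.last + 1
        return self.slots[start:] + self.slots[:start]


store = RoundRobinStore(rows=4)
for load in [0.5, 0.7, 0.9, 1.1, 1.3, 1.5]:
    store.update(load)

# Six values were written into four slots, so the two oldest are gone.
print(store.history())
```

The size of the store never changes, however many updates arrive; only the pointer moves.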
RRDtool is unusual in that it serves both as a back end that stores data and as a front end, providing a rich set of tools for drawing attractive statistical charts.
Installation:
On Linux it can usually be installed directly with yum or apt-get.
$ sudo yum install rrdtool (CentOS)
$ sudo apt-get install rrdtool (Ubuntu)
You can also download an installation package for your operating system (including Windows) from the official site, or download the source code and compile it yourself.
http://oss.oetiker.ch/rrdtool/download.en.html
II. Data Objects
RRDtool is designed for time-series data such as network bandwidth, temperature, and CPU load, along with related data or metrics. Many excellent monitoring systems, such as Ganglia, MRTG, Xymon, Zenoss, and Open-falcon, use RRDtool as their DBMS or as their graphing tool, which shows its particular strengths in handling monitoring metrics.
III. RRDtool Storage and Archiving Principles
The way RRDtool stores data differs greatly from a typical relational database:
- The size of each RRD file is fixed once it is created, while the files of a traditional relational database grow as data is written.
- RRDtool consolidates data as it is received and stores the computed results.
- RRDtool requires data to be updated periodically, and "abnormal" values are stored in the database as unknown.
To make updates happen on schedule, RRDtool defines a time interval; each time the interval elapses, the database is updated. In the figure above, for example, the interval is 1 min. RRDtool calls this interval the step. It is specified with the --step option of the create command when the RRD file is created (the default is 300 seconds if the option is omitted), and it cannot be changed afterwards. For example:
..... (Note: This is just a demo, not a valid full create command)
The first line of the above command is easy to understand (note: the actual command is not split across lines; "\" is used here only to make it easier to read). The parameters on the second line are explained below.
(It is recommended to check the full command help in the man page or the official documentation: http://oss.oetiker.ch/rrdtool/doc/rrdcreate.en.html)
DS (Data Source): defines the data source, in other words the metric we want to monitor. Here cpu_load (CPU load) is the name of the DS. GAUGE is the data source type (DST: Data Source Type).
Common types of data sources:
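1. COUNTER: the data must be increasing; what is stored is the change relative to the previous value
2. GAUGE: stores the original value
3. DERIVE: like COUNTER, but the value may increase or decrease
4. ABSOLUTE: for counters that are reset each time they are read, so every value is measured from the same reference point
5. COMPUTE: for a COMPUTE data source the format is DS:ds-name:COMPUTE:rpn-expression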
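To make the difference between the first four types concrete, here is a simplified Python sketch of how two successive readings could be turned into the stored value. The function name `to_rate` is hypothetical, and the COUNTER branch is deliberately naive: real RRDtool interprets a decreasing counter as a 32-bit or 64-bit wrap, while this sketch just returns unknown.

```python
def to_rate(dst, prev, curr, seconds):
    """Simplified model: convert successive readings into the stored value.

    dst     -- data source type: GAUGE, COUNTER, DERIVE or ABSOLUTE
    prev    -- previous raw reading
    curr    -- current raw reading
    seconds -- time elapsed between the two readings
    """
    if dst == "GAUGE":
        return curr                      # stored as-is
    if dst == "COUNTER":
        if curr < prev:
            return None                  # assumption: treat as unknown (no wrap handling)
        return (curr - prev) / seconds   # per-second increase
    if dst == "DERIVE":
        return (curr - prev) / seconds   # per-second change, may be negative
    if dst == "ABSOLUTE":
        return curr / seconds            # counter was reset on the last read
    raise ValueError(f"unsupported data source type: {dst}")


print(to_rate("GAUGE", 10, 42, 60))        # a temperature or load reading
print(to_rate("COUNTER", 1000, 1600, 60))  # e.g. bytes transferred
print(to_rate("DERIVE", 1600, 1000, 60))
print(to_rate("ABSOLUTE", 0, 120, 60))
```

CPU load is already a point-in-time reading, which is why the example in this article uses GAUGE.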
For ease of understanding, you can think of it as a column defined when creating a MySQL table (which is, of course, a very loose analogy):
Skip the heartbeat parameter for now and look at the last two parameters. Min and max, as the names imply, specify the valid range for the stored values; I set it to 0 to 100 for CPU load. For a bound that cannot be determined, use U instead. If a received value falls within this range, it is considered good and is inserted into the database. Otherwise it is considered bad and is inserted as unknown.
Finally, the heartbeat, which is the 120 in the example command above. It is really RRDtool's data collection policy. As mentioned earlier, RRDtool updates the database once every step and writes a value into it; this value is called a PDP (Primary Data Point). This raises a problem: if no data is collected within the given interval, the database has no value to record, so what should it do? RRDtool stores all such "bad values" uniformly as unknown. A moment's thought shows why this makes statistics more accurate: unknown values can simply be excluded, whereas a database that records bad values as 0 forces you to either keep them or discard every 0 indiscriminately, including genuine ones.
Is the unknown treatment foolproof? Obviously not: in practice it is hard to guarantee that data arrives at exactly the right moment. RRDtool therefore introduces the heartbeat, a time span: as long as an update delivers data within this span, the value is inserted; otherwise unknown is inserted. In the example command above, data that arrives within two minutes is written to the database as a PDP; once the heartbeat is exceeded, unknown is written instead.
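The two checks described so far (the min/max range and the heartbeat) can be sketched as one small Python function. This is a simplified model of the rule as stated in this article, not RRDtool's real update code, which also interpolates values across step boundaries; the function name and its defaults (heartbeat 120, range 0 to 100, taken from the cpu_load example) are chosen for illustration.

```python
UNKNOWN = None  # stand-in for RRDtool's unknown value


def accept_sample(value, seconds_since_last_update,
                  heartbeat=120, minimum=0.0, maximum=100.0):
    """Return the value to record as a PDP, or UNKNOWN if it must be rejected."""
    # Heartbeat check: the sample arrived too long after the previous one.
    if seconds_since_last_update > heartbeat:
        return UNKNOWN
    # Range check: the sample lies outside the declared min/max.
    if not (minimum <= value <= maximum):
        return UNKNOWN
    return value


print(accept_sample(35.0, 60))    # on time, in range
print(accept_sample(35.0, 110))   # late, but still within the heartbeat
print(accept_sample(35.0, 150))   # heartbeat exceeded
print(accept_sample(250.0, 60))   # outside 0..100
```

Setting the heartbeat to twice the step, as in the example, tolerates one late or missed collection before unknown is recorded.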
Now enter this incomplete command in the shell to see what error message it returns.
The error: you must define at least one RRA (Round Robin Archive). So what is an RRA? When I first learned about RRD's special storage, I wondered how data integrity is preserved when reads and writes never stop. Once the storage has wrapped around, new data inevitably overwrites old data, so how do you query data that was collected earlier and has since been overwritten? The RRA is the archiving policy that provides this. It is the core of RRD and can be thought of as a view in a relational database: how we look at the data, and how the collected data is graphed, is defined by the RRA (the "view"). On the other hand, this is arguably a shortcoming of rrdtool. The error message tells us that at least one RRA must be defined at creation time, which means that when we create the RRD database we must decide in advance how the data will be queried and processed in the future.
The full RRDtool create command:
If you list the current folder, you will see an RRD file, Rrd_intro.rrd:
Here are the commands for the RRA section:
The official documentation defines an RRA as follows: An archive consists of a number of data values or statistics for each of the defined data-sources (DS).
That is, once an RRA is created it is shared by all DSs and does not need a DS specified for it. The archive holds a series of data values and statistics; how are they obtained?
They are obtained by consolidating the data according to certain rules with a CF (Consolidation Function).
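The consolidation functions are:
1. AVERAGE: the average value
2. MAX: the maximum value
3. MIN: the minimum value
4. LAST: the current (most recent) value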
The CF takes three parameters. Let us start with steps and rows, which correspond to the last two parameters of each RRA in the command above.
As mentioned earlier, a PDP is the value (or unknown) collected under the heartbeat policy and inserted into the database.
Steps, as the name suggests, defines how many steps are consolidated. The three RRAs in the command above define 1, 5, and 15 steps, which correspond to 1, 5, and 15 PDPs, or 1, 5, and 15 minutes. As a CF parameter, this value tells the consolidation function how many PDPs to use, so the three RRAs average 1, 5, and 15 PDPs respectively. The results are what the official documentation calls data values or statistics; they are stored in the RRA and are what we finally read and graph. RRDtool calls each such value a CDP (Consolidated Data Point).
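As a rough illustration of how steps and the CF work together, the Python sketch below groups a list of PDPs into windows of `steps` values and applies one of the four consolidation functions to each window. It assumes every PDP is known and ignores xff; the function name `consolidate` and the sample values are made up.

```python
def consolidate(pdps, steps, cf="AVERAGE"):
    """Group PDPs into windows of `steps` values and produce one CDP per window."""
    functions = {
        "AVERAGE": lambda w: sum(w) / len(w),
        "MAX": max,
        "MIN": min,
        "LAST": lambda w: w[-1],
    }
    apply_cf = functions[cf]
    cdps = []
    # Only complete windows produce a CDP; a trailing partial window is skipped.
    for start in range(0, len(pdps) - steps + 1, steps):
        cdps.append(apply_cf(pdps[start:start + steps]))
    return cdps


# Ten one-minute PDPs (hypothetical CPU load readings).
pdps = [1.0, 2.0, 3.0, 4.0, 5.0, 2.0, 4.0, 6.0, 8.0, 10.0]

print(consolidate(pdps, steps=1))             # 1 PDP per CDP: unchanged
print(consolidate(pdps, steps=5))             # two 5-minute averages
print(consolidate(pdps, steps=5, cf="MAX"))   # two 5-minute peaks
```

With steps set to 1 the archive is simply a copy of the PDPs, which is why the first RRA in the example has 1-minute resolution.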
Rows should be easier to understand given the analogy used earlier: I treat a DS as a column in a relational table, PDPs as the data in that column, and CDPs as the archived data that the RRA "view" shows to whoever queries it. Each row in the final view is therefore one CDP. The command above keeps 60, 288, and 672 CDPs respectively. Converted to time (60 s * steps * rows; divide by 3600 for hours or by 3600*24 for days), these cover 1 hour, 1 day, and 1 week.
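The article's original command listing is not reproduced here, so the Python sketch below only reassembles the arguments from the values quoted in the text (DS name cpu_load, type GAUGE, heartbeat 120, range 0 to 100, step 60, three AVERAGE archives with xff 0.5) and checks the time arithmetic. The lowercase file name `rrd_intro.rrd` and the helper names are assumptions, and nothing is executed; the script just prints the command line.

```python
STEP = 60                                    # seconds per PDP
DS = "DS:cpu_load:GAUGE:120:0:100"           # DS:name:type:heartbeat:min:max
ARCHIVES = [(1, 60), (5, 288), (15, 672)]    # (steps, rows) per RRA


def rra_arg(steps, rows, cf="AVERAGE", xff=0.5):
    """Format one RRA definition as RRA:CF:xff:steps:rows."""
    return f"RRA:{cf}:{xff}:{steps}:{rows}"


def coverage_seconds(steps, rows, step=STEP):
    """Total time span held by an archive: step * steps * rows."""
    return step * steps * rows


# Assumed file name; the command is only assembled, never run.
argv = ["rrdtool", "create", "rrd_intro.rrd", "--step", str(STEP), DS]
argv += [rra_arg(steps, rows) for steps, rows in ARCHIVES]
print(" ".join(argv))

for steps, rows in ARCHIVES:
    hours = coverage_seconds(steps, rows) / 3600
    print(f"steps={steps:<2} rows={rows:<3} covers {hours:g} h")
```

The three archives come out at 1, 24, and 168 hours, matching the 1 hour, 1 day, and 1 week stated above.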
Why did I design the RRAs this way? Here we introduce the concept of resolution ahead of time (it is directly related to drawing and will be used later when we introduce graph):
For the command above:
1 hour: 1 min resolution
1 day: 5 min resolution
1 week: 15 min resolution
That is, I record the average CPU load for each minute, and the view shows me statistics covering a total of 1 hour. The other two work by analogy.
I also chose these values to match the common Linux command uptime. The resulting resolutions may have little practical value, but they should at least show how to set up an archiving policy. In general, work backwards: first decide the total time span the statistics must cover, then derive rows from step and steps.
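The backwards calculation can be written down as a one-line formula, rows = total time / (step * steps), rounded up. The Python helper below is a small sketch of that arithmetic; the function name is made up, and the 30-day example at the end is a hypothetical archive that does not appear in the original command.

```python
def rows_needed(total_seconds, steps, step=60):
    """Number of CDP rows required to cover total_seconds at the given resolution."""
    seconds_per_cdp = step * steps
    # Ceiling division, so a partial final interval still gets a row.
    return -(-total_seconds // seconds_per_cdp)


HOUR = 3600
DAY = 24 * HOUR

print(rows_needed(1 * HOUR, steps=1))    # 1 hour at 1 min resolution
print(rows_needed(1 * DAY, steps=5))     # 1 day at 5 min resolution
print(rows_needed(7 * DAY, steps=15))    # 1 week at 15 min resolution
print(rows_needed(30 * DAY, steps=60))   # hypothetical: 30 days at 1 h resolution
```

The first three calls reproduce the 60, 288, and 672 rows used in this article.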
Finally, xff. This value is a ratio: if the proportion of unknown PDPs among those consolidated by the CF exceeds it, the statistic contains too many bad values, and RRDtool archives the consolidated CDP to the RRA as unknown. A value of 0.5 is generally fine.
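A minimal Python sketch of the xff rule as described above, using AVERAGE as the consolidation function and None for unknown. It is a simplified model for illustration (the function name is made up), not RRDtool's internal implementation.

```python
def consolidate_average(window, xff=0.5):
    """Average the known PDPs in a window, or return None (unknown)
    when the fraction of unknown PDPs exceeds xff."""
    unknown = sum(1 for value in window if value is None)
    if unknown / len(window) > xff:
        return None
    known = [value for value in window if value is not None]
    if not known:
        return None  # nothing to average
    return sum(known) / len(known)


# Five PDPs per CDP, as in the 5-minute archive.
print(consolidate_average([1.0, 2.0, None, None, 3.0]))   # 2 of 5 unknown: kept
print(consolidate_average([1.0, None, None, None, 3.0]))  # 3 of 5 unknown: rejected
```

With xff at 0.5, a 5-step window tolerates up to two unknown PDPs; a third pushes the unknown fraction to 0.6 and the whole CDP becomes unknown.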
Here is a figure showing the relationship between PDP, CDP, and RRA:
RRDtool Introductory explanation