Importing data from an RRD database into MySQL

Source: Internet
Author: Xicheng
Tags: rrd, rrdtool, translate function, python, list


1. RRD databases and RRDTOOL

RRD stands for Round Robin Database. The design concept is to store data in round-robin fashion: after a period (which can be customized), new data overwrites the oldest data. RRD databases are therefore suited to dynamic data that does not need long-term storage. Because old data is periodically overwritten, the size of the database is essentially fixed and does not grow over time. RRDTOOL is free software developed by Tobias Oetiker that uses RRD as its storage format. It provides many tools for operating on RRD databases, including creating, updating, querying, and generating charts, and it offers APIs in many languages for convenience. Ganglia is a distributed monitoring system that uses RRD databases for data storage and visualization. The Hadoop source package contains a ganglia-related configuration file; with a few parameter changes and some setup, ganglia can monitor a Hadoop cluster. The data for each monitored attribute is stored in its own RRD database.

2. Importing the data into MySQL

You may want to keep the data collected by rrdtool for long-term analysis. The data in an RRD database is constantly updated; long-term data can be retained, but only at reduced precision. For example, suppose an RRD database has a step of 15 seconds: every 15 seconds a new value (such as memory usage) is stored and an old one is overwritten. If the database stores 5761 such rows (one day plus 15 seconds), then over time it always holds the data of the most recent day. rrdtool also consolidates these values into larger steps: the 15-second samples can be averaged into 360-second values, and the same 5761 rows at a 360-second step then cover about 24 days. Likewise, a year of data can be stored at a step of one day, but by then the precision is only one day and the old 15-second samples have long been overwritten. If you want to keep all of the raw data, you need a script that imports it at regular intervals. This is easy to do on Linux: perl, python, lua, and ruby are all good choices, as is shell, and crond can run the script at a fixed time. Below is an example in python. (Note: I do not use python much, so the code may have problems; corrections are welcome.)

First, initialize the databases and the corresponding tables:

```python
import os
import string
import MySQLdb

root = "/var/lib/ganglia/rrds/hap-clu"
dirs = os.listdir(root)

# MySQL database/table names may not contain '.' or '-'; map both to '_'
map1 = string.maketrans('.', '_')
map2 = string.maketrans('-', '_')

conn = MySQLdb.connect(host='localhost', user='root', passwd='000000')
cursor = conn.cursor()

for onedir in dirs:
    dbname = onedir.translate(map1).translate(map2)
    cursor.execute("create database if not exists " + dbname)
    conn.commit()
    conn.select_db(dbname)
    print "DB: " + dbname
    files = os.listdir(root + "/" + onedir)
    for onefile in files:
        tablename = onefile[:-4].translate(map1)  # strip the ".rrd" suffix
        if dbname == "__SummaryInfo__":
            cursor.execute("create table if not exists " + tablename +
                           " (time_id int not null primary key, value varchar(30), num varchar(30))")
        else:
            cursor.execute("create table if not exists " + tablename +
                           " (time_id int not null primary key, value varchar(30))")
        conn.commit()
    print "create database " + dbname
cursor.close()
```
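As a quick illustration of the renaming the script relies on, here is a minimal sketch (the directory name is a hypothetical example):

```python
import string

# '.' and '-' are not allowed in MySQL identifiers; map both to '_'
map1 = string.maketrans('.', '_')
map2 = string.maketrans('-', '_')

onedir = "hap-node01.example.com"  # hypothetical node directory name
print onedir.translate(map1).translate(map2)  # prints: hap_node01_example_com
```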
A few notes:

1. Storage directory: ganglia uses this directory by default, but it can be changed, and it will differ between deployments. The last component, hap-clu, is the name of the cluster. Inside it, each node has its own directory, usually named after the node's IP address, plus a summary directory. The script creates one database per directory (node) and one table per attribute.

2. MySQL naming rules do not allow "." or "-" in database and table names, so the directory and file names must be converted accordingly. The translate function is used for this, as in the sketch above.

3. I originally thought this script would only need to run once, but in practice the number of databases and tables grows. If a new node is added, a database must be created for it in time, and on existing nodes some attributes may only be detected later. For example, after the cluster had run for a while, swap statistics became available and the corresponding RRD databases were created; I do not know whether that is a configuration problem or normal behavior. To run smoothly, this script therefore also needs to run every day, at the same time as the insertion script and before it.

The script for inserting the data:

```python
import os
import string
import MySQLdb
import rrdtool
# from xml.etree.ElementTree import ElementTree  # only needed for the XML approach

# working directory
root = "/var/lib/ganglia/rrds/hap-clu"
dirs = os.listdir(root)

# MySQL table name restrictions: map '.' and '-' to '_'
map1 = string.maketrans('.', '_')
map2 = string.maketrans('-', '_')

conn = MySQLdb.connect(host='localhost', user='root', passwd='000000')
cursor = conn.cursor()

for onedir in dirs:
    dbname = onedir.translate(map1).translate(map2)
    conn.select_db(dbname)
    print "DB: " + dbname
    files = os.listdir(root + "/" + onedir)
    os.chdir(root + "/" + onedir)
    for onefile in files:
        # it seems that all the archives are AVERAGE
        tablename = onefile[:-4].translate(map1)
        data = rrdtool.fetch(onefile, "AVERAGE")
        firsttime = data[0][0]
        count = 0
        while count < 5761:
            time = firsttime + 15 * count
            value = data[2][count][0]
            if value is None:
                count += 1
                continue
            if dbname == "__SummaryInfo__":
                num = data[2][count][1]
                fvalue = [time, str(value), str(num)]
                try:
                    cursor.execute("insert into " + tablename + " values (%s, %s, %s)", fvalue)
                except MySQLdb.IntegrityError:
                    pass
            else:
                fvalue = [time, str(value)]
                try:
                    cursor.execute("insert into " + tablename + " values (%s, %s)", fvalue)
                    # print "OK " + str(count)
                except MySQLdb.IntegrityError:
                    pass
            count += 1
        conn.commit()
        print "updating table " + tablename
cursor.close()
```
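For reference, the indexing above (data[0][0] for the start time, data[2][count] for the rows) follows the structure returned by rrdtool.fetch; here is a minimal sketch of unpacking it (the file name is a hypothetical example):

```python
import rrdtool

# rrdtool.fetch returns ((start, end, step), ds_names, rows), where rows is
# a list with one tuple per time step and one entry per data source.
(start, end, step), ds_names, rows = rrdtool.fetch("mem_free.rrd", "AVERAGE")

for i, row in enumerate(rows):
    timestamp = start + i * step  # UNIX seconds, as in the script above
    value = row[0]                # first data source; None where nothing is stored
    if value is not None:
        print timestamp, value
```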
Description:

1. python has an rrdtool module, so the corresponding commands can be called directly as functions in the module, and the results come back as python lists and tuples, which are easy to traverse. The alternative is to dump the RRD database to XML by calling the external command (rrdtool has this built in). The advantage is that the XML reflects the stored data very closely; the disadvantage is that it is cumbersome and inefficient, because the XML must then be parsed.

2. count is the number of rows of data stored in the RRD. Here it is hard-coded to the default value for convenience; to be rigorous, you should first call rrdtool info to obtain the structure of the database and read this value from it before fetching (see the sketch after this list). rrdtool.fetch retrieves all of the stored values.

3. About commit: I was not familiar with the API at first and omitted this call, and as a result no data was imported at all. Then I committed after every single insert, which was extremely slow; importing the data took almost a whole day. Moving the commit to the end of the per-table loop made it fast.

4. The script inserts more often than the RRD updates, to ensure continuity (no data loss), so many duplicate rows are attempted. The primary key (the timestamp, in UNIX seconds) together with the IntegrityError handler skips rows that were already inserted. I did consider the obvious question: how slow do inserts become once a table holds many rows? (A single table gains about 5,700 rows per day, roughly 170,000 rows per month, and more than 2 million rows per year.) So far, with about 50,000 rows in a table, the insertion speed has not slowed noticeably. I will keep it running and observe; if it becomes too slow, the method will have to change.
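Following up on note 2, here is a minimal sketch of reading the row count instead of hard-coding 5761, assuming the python binding's info() function (the file name is hypothetical, and the exact key layout of the returned dict varies between binding versions):

```python
import rrdtool

info = rrdtool.info("mem_free.rrd")  # hypothetical file name

step = info["step"]          # base step in seconds, e.g. 15
rows = info["rra[0].rows"]   # row count of the first RRA
print "step = %d s, rows = %d" % (step, rows)

# rows could then replace the hard-coded 5761 in the insertion loop.
```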
