Monitoring a Hadoop cluster calls for a time series database, so today I spent half a day investigating InfluxDB, which has been getting a lot of attention lately. It turned out to be really good, so here are my learning notes.
InfluxDB is written in Go and built specifically for persisting time series data; thanks to Go it runs on all major platforms. Similar time series databases include OpenTSDB, Prometheus, and so on.
OpenTSDB is very well known and performs well, but it is built on HBase, so to use it you first have to stand up a whole HBase cluster. That feels a bit like having to slaughter the pig, scald the skin, and pull the bristles yourself just to eat a bite of pork. Prometheus still has too little documentation and discussion around it, while the InfluxDB project is active, with more users and richer docs, so I am looking at it first. InfluxDB can be thought of as a Go reimplementation in the spirit of LevelDB: LevelDB uses the LSM engine, which is highly efficient, and InfluxDB's TSM engine is a modification of the LSM design built specifically for time series.
Introduction to InfluxDB Architecture principles
Introduction to LevelDB Architecture principles
In the afternoon I chatted a bit with crazyjvm from Qiniu. Since Qiniu is a heavy Go shop, they have also deployed InfluxDB extensively for large enterprise users, reportedly running the largest InfluxDB cluster in the world, and they contributed a large number of patches upstream. Then, just as those early open source contributions had made InfluxDB nearly stable, the project suddenly closed the source of the clustering code, which was quite infuriating: clustering is now a paid feature, while the single-node version remains free to use.
I read through the docs yesterday and tried it out today, and it feels very good and worth recommending. I am writing down what I learned, both as a reference for fellow enthusiasts and so I do not forget it all after a while.
It is hard to say exactly which category of database InfluxDB falls into. At first it feels like NoSQL in the Mongo vein, but interestingly it provides a SQL-like query interface, which is very friendly for developers. The command-line query output even looks a bit like MySQL's, which is amusing.
I will not cover installation, deployment, or the CLI, because there is nothing to write: a straight yum or apt install, start the service, then run the influx command on the command line. There are plenty of installation tutorials online.
InfluxDB has several key concepts that need to be understood.
Database: equivalent to a database name in an RDBMS, and the statement that creates one is very similar too. Once inside the CLI you can create a database to play with; note that no trailing semicolon is needed.
CREATE DATABASE "hadoop"
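If you want to double-check, the CLI can list the databases and switch into the new one:

SHOW DATABASES
USE hadoop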
Next you need to create a user. To save time I just create one with maximum privileges, since I only plan to play with it for a day and then go straight to writing the REST interface; careful permission management can come later.
CREATE USER "xianglei" WITH PASSWORD 'password' WITH ALL PRIVILEGES
Insert a single data point with an INSERT statement:
INSERT hdfs,hdfs=adh,path=/ free=2341234,used=51234123,nonhdfs=1234
Influx does not require a schema to be defined up front, because it stores data schemalessly, in what it calls a measurement. If the measurement does not exist when data is inserted, the table is created automatically.
Measurement: equivalent to a table name in an RDBMS.
In the INSERT statement above, the hdfs right after INSERT is the measurement. If no measurement named hdfs exists, one is created automatically; otherwise the data is inserted into it directly.
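You can verify the auto-creation from the CLI; after the insert above, listing the measurements should show hdfs:

SHOW MEASUREMENTS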
Then there is the concept of tags. Tags are similar to indexed query columns in an RDBMS. Here the tags are hdfs=adh and path=/, which means I have created two tags.
Everything from free onward is collectively called the fields. Tags and fields are separated by a space, while within the tags and within the fields the entries are comma-separated. Tag and field names can be anything you like; the main thing is to get the design right at the start.
So, annotating the INSERT statement above, it breaks down like this:
INSERT [hdfs (measurement)],[hdfs=adh,path=/ (tags)] [free=2341234,used=51234123,nonhdfs=1234 (fields)]
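To see how InfluxDB actually split the point up, you can also ask it which keys landed as tags and which as fields:

SHOW TAG KEYS FROM hdfs
SHOW FIELD KEYS FROM hdfs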
You can then query the data
SELECT free FROM hdfs WHERE hdfs='adh' AND path='/'
name: hdfs
time                 free
----                 ----
1485251656036494252  425234
1485251673348104714  425234
SELECT * FROM hdfs LIMIT 2
name: hdfs
time                 free    hdfs  nonhdfs  path  used
----                 ----    ----  -------  ----  ----
1485251656036494252  425234  adh   13414    /     234123
1485251673348104714  425234  adh   13414    /     234123
The WHERE condition here uses the tags from above, hdfs=adh and path=/. You can add tags as you like, but before inserting the first point it is best to design your query conditions first. Also, every point you insert automatically gets a time column, recorded as a nanosecond timestamp.
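Since the time column is always there, it can go straight into the WHERE clause as well; for example, to look at just the last hour (a made-up range, purely to illustrate):

SELECT free FROM hdfs WHERE hdfs='adh' AND time > now() - 1h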
That covers the basic concepts and basic usage of Influx. Next up is using it from application code, with Tornado serving as the example for a RESTful query interface.
InfluxDB itself exposes a RESTful HTTP API, and Python has a ready-made wrapper around it that can be invoked directly; a simple pip install influxdb is all it takes.
Influxdb-python Documentation
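Before wiring it into Tornado, a quick smoke test of the library is worth doing. This is only a sketch, reusing the database, user, and toy numbers from the CLI session above:

# smoke test for influxdb-python, reusing the examples above
from influxdb import InfluxDBClient

client = InfluxDBClient('localhost', 8086, 'xianglei', 'password', 'hadoop')
points = [{
    'measurement': 'hdfs',
    'tags': {'hdfs': 'adh', 'path': '/'},
    'fields': {'free': 425234, 'used': 234123, 'nonhdfs': 13414}
}]
print client.write_points(points)                        # True on success
print client.query('SELECT free FROM hdfs LIMIT 2').raw  # dict form of the result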
Talk is cheap, show me your code.
The influx module under models, which handles the connection to InfluxDB:
# models/influx.py
from influxdb import InfluxDBClient

# the project's own config loader; import path assumed, sketch shown below
from config import ParseConfig


class InfluxClient:
    def __init__(self):
        self._conf = ParseConfig()
        self._config = self._conf.load()
        self._server = self._config['influxdb']['server']
        self._port = self._config['influxdb']['port']
        self._user = self._config['influxdb']['username']
        self._pass = self._config['influxdb']['password']
        self._db = self._config['influxdb']['db']
        self._retention_days = self._config['influxdb']['retention']['days']
        self._retention_replica = self._config['influxdb']['retention']['replica']
        self._retention_name = self._config['influxdb']['retention']['name']
        self._client = InfluxDBClient(self._server, self._port, self._user,
                                      self._pass, self._db)

    def _create_database(self):
        try:
            self._client.create_database(self._db)
        except Exception, e:
            print e.message

    def _create_retention_policy(self):
        try:
            self._client.create_retention_policy(self._retention_name,
                                                 self._retention_days,
                                                 self._retention_replica,
                                                 default=True)
        except Exception, e:
            print e.message

    def _switch_user(self):
        try:
            self._client.switch_user(self._user, self._pass)
        except Exception, e:
            print e.message

    def write_points(self, data):
        # make sure the database and retention policy exist before writing
        self._create_database()
        self._create_retention_policy()
        if self._client.write_points(data):
            return True
        else:
            return False

    def query(self, qry):
        try:
            result = self._client.query(qry)
            return result
        except Exception, e:
            return e.message
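A quick sanity check of the model; the raw attribute on the query result is the same dict the REST controller below serializes:

influx = InfluxClient()
result = influx.query('SELECT free FROM hdfs LIMIT 2')
print result.raw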
The InfluxDB connection settings are read from the project's configuration file, with a loader you can write yourself.
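A minimal sketch of such a loader, assuming the JSON shown further down is saved as config.json (the class name matches what the model imports; the file name and layout are my assumptions):

# config.py: minimal stand-in for the project's config loader
import json


class ParseConfig(object):
    def __init__(self, path='config.json'):
        self._path = path

    def load(self):
        # parse the json config file into a plain dict
        with open(self._path) as f:
            return json.load(f)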
The controller module, InfluxRestController:
# controllers/influx.py
import json
import urllib

import tornado.web

# InfluxClient comes from the models module above; the switch helper is
# shown below (import paths assumed)


class InfluxRestController(tornado.web.RequestHandler):
    """
    Queries go through HTTP GET, e.g.
    GET /ws/api/influx?op=query&qry=SELECT+used+FROM+hdfs+WHERE+hdfs='adh'
    """
    def get(self, *args, **kwargs):
        op = self.get_argument('op')
        # switch/case is a home-grown python helper; recipes abound online
        for case in switch(op):
            if case('query'):
                # the query statement comes from the url parameter
                qry = self.get_argument('qry')
                # instantiate the class from models
                influx = InfluxClient()
                result = influx.query(qry)
                # the result is an object; its raw attribute holds the dict
                self.write(json.dumps(result.raw, ensure_ascii=False))
                break
            if case():
                self.write('no argument found')

    # writes go through HTTP PUT
    def put(self):
        op = self.get_argument('op')
        for case in switch(op):
            if case('write'):
                # data should be urldecoded first, then parsed as json
                data = json.loads(urllib.unquote(self.get_argument('data')))
                influx = InfluxClient()
                # report whether the write succeeded or failed
                if influx.write_points(data):
                    self.write('{"result": true}')
                else:
                    self.write('{"result": false}')
                break
            if case():
                self.write('no argument found')
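The switch helper used above is not in the standard library; the version I rely on follows the widely circulated ActiveState recipe, roughly:

class switch(object):
    # minimal switch/case emulation, per the classic ActiveState recipe
    def __init__(self, value):
        self.value = value
        self.fall = False

    def __iter__(self):
        # yield the match method once, then stop
        yield self.match

    def match(self, *args):
        # enter a suite when falling through, on a hit, or for the empty default case
        if self.fall or not args:
            return True
        elif self.value in args:
            self.fall = True
            return True
        return False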
Configuring the Tornado routing:
applications = tornado.web.Application(
    [
        (r'/', IndexController),
        (r'/ws/api/influx', InfluxRestController),
    ],
    **settings
)
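To actually serve requests, the application still has to be bound to a port and the IOLoop started; a minimal sketch, with the port matching the http_port value from the config below:

import tornado.ioloop

if __name__ == '__main__':
    applications.listen(19998)  # http_port from the json config
    tornado.ioloop.IOLoop.instance().start()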
The project's JSON configuration file:
{"Http_port": 19998, "Influxdb": {"Server": "47.88.6.247", "Port": "8086", "username": "root", "password": "Root", "db": "Hadoop", "retention": {"Days": "365d", "Replica": 3, "name": "Hound_policy"}, "Replica": 3}, "Copyright": "CopyLeft Xianglei"}
Insert Test
# test_write.py
import json
import urllib

import tornado.httpclient


def test_write():
    base_url = 'http://localhost:19998/ws/api/influx'
    # data = '[{"measurement": "hdfs", "tags": {"hdfs": "adh", "path": "/user"},
    #           "fields": {"used": 234123412343423423, "free": 425234523462546546,
    #                      "nonhdfs": 1341453452345}}]'
    # build the point to insert
    body = dict()
    body['measurement'] = 'hdfs'
    body['tags'] = dict()
    body['tags']['hdfs'] = 'adh'
    body['tags']['path'] = '/'
    body['fields'] = dict()
    body['fields']['used'] = 234123
    body['fields']['free'] = 425234
    body['fields']['nonhdfs'] = 13414
    tmp = list()
    tmp.append(body)
    op = 'write'
    # dump the dict to json and urlencode the whole payload
    data = urllib.urlencode({'op': op, 'data': json.dumps(tmp)})
    headers = {'Content-Type': 'application/x-www-form-urlencoded; charset=utf-8'}
    try:
        http = tornado.httpclient.HTTPClient()
        response = http.fetch(
            tornado.httpclient.HTTPRequest(
                url=base_url,
                method='PUT',
                headers=headers,
                body=data
            )
        )
        print response.body
    except tornado.httpclient.HTTPError, e:
        print e

test_write()
After inserting the data, fetch it back over HTTP to check the result:
Curl-i "Http://localhost:19998/ws/api/influx?op=query&qry=select%20*%20from%20hdfs" HTTP/1.1 Okdate:tue, 24 Jan 15:47:42 gmtcontent-length:1055etag: "7a2b1af6edd4f6d11f8b000de64050a729e8621e" content-type:text/html; charset=utf-8server:tornadoserver/4.4.2{"Values": [["2017-01-24t09:54:16.036494252z", 425234, "Adh", 13414, "/", 234123]], "name": "HDFs", "Columns": ["Time", "Free", "HDFs", "Nonhdfs", "path", "used"]}
Tomorrow I will write up the monitoring front end, built with React.
This article comes from the "Practice Tests Truth" blog; please contact the author before reproducing it.