OpenTSDB 2.3+ and tcollector 1.3+ installation, configuration and troubleshooting


Honestly, I did not want to use OpenTSDB at all. I had been using InfluxDB + Grafana, which is very convenient. OpenTSDB sits on top of HBase, so capacity and speed are guaranteed, but a distributed system is rather heavy for a monitoring platform and makes problem localization more cumbersome. The leader said use it, though, so use it we did.

And here I have to vent. The OpenTSDB and tcollector documentation is so out of date that you cannot rely on the official docs at all; I could not even find where the configuration file lives. In the end I had to read the source, especially for tcollector, the data collector officially shipped for TSDB. It is not only the documentation that lags behind the core: the peripheral helper code is also stale, and the plugin-style design of the various collectors is so "wonderful" that some of them error out constantly and refuse to run.

Installing TSDB itself is fairly easy: pick an HBase RegionServer node and install the RPM directly. Most of my time afterwards went into fighting tcollector; the problems were mainly on the tcollector side, not TSDB's.
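For completeness, that install step is just dropping the RPM onto an HBase RegionServer node; the package file name below is an assumption, use whatever release you actually downloaded:

    # on an HBase RegionServer node; the file name is an assumption
    rpm -ivh opentsdb-2.3.0.noarch.rpm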

After installing TSDB, however, you need to run a script called create_table.sh to create the tables TSDB needs in HBase. This one cost me some time. According to the official documentation the compression method does not really matter when creating the tables, so I just ran the script; half an hour later the tables still had not been created. The script selects LZO compression by default, and with that the tables simply never get built. You have to prefix the command with `env COMPRESSION=NONE` on the command line, and then it works. My cluster does support LZO, mind you. Still, compared with tcollector, this was nothing.
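For reference, the invocation that finally worked looks roughly like this; the script path and the HBASE_HOME value are assumptions, adjust them to your own install:

    # create the tsdb tables without compression; paths below are assumptions
    env COMPRESSION=NONE HBASE_HOME=/usr/lib/hbase /usr/share/opentsdb/tools/create_table.sh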

TSDB configuration is also simple, so I will not go into it.


tcollector is OpenTSDB's official data collector and, true to the style of a personal open-source project, its documentation has not been updated in a long time. Reading the official docs, I simply could not find where to configure the OpenTSDB address to report to; what the docs describe does not exist in the code, so the only option is to dig through the source.

Installing tcollector is not complicated. It ships with a Makefile that can package an RPM; a plain `make rpm` produces the package, which you drop into a repo and install with yum. The real problem was that after installing and starting it, no data showed up.
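Roughly the build-and-install flow (the repo path is an assumption):

    # in the tcollector source tree: build the RPM
    make rpm
    # drop it into a yum repo (path is an assumption) and install
    cp tcollector-*.rpm /path/to/repo/ && createrepo /path/to/repo/
    yum install -y tcollector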

So, troubleshooting. First, establish whether OpenTSDB can receive the data at all. Stop TSD, start a TCP server with `nc -l 4242` listening on the port TSD was using, then start tcollector. nc receives a single `version`, and then nothing else. OK, go read the tcollector source.

    # we use the version command as it is very low effort for the TSD
    # to respond
    LOG.debug('verifying our TSD connection is alive')
    try:
        self.tsd.sendall('version\n')
    except socket.error, msg:
        self.tsd = None
        self.blacklist_connection()
        return False

So `version` is effectively a ping, and if the server does not respond, it gets put on a blacklist. Which I find puzzling: when the program only reports to a single server, what is the point of blacklisting it?

So, back to `nc -l`, and again the `version` request arrives.

As soon as `version` arrives, type any reply directly into the nc console (I just typed a 2), and the data immediately starts coming up. So tcollector is actually sending data just fine; the problem must be on the TSDB side.
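For reference, the whole exchange in the nc console looked roughly like this; the reply and the data lines are illustrative, but tcollector does report with the telnet-style put protocol:

    $ nc -l 4242              # on some distributions: nc -l -p 4242
    version
    2                         # typed by hand; any reply will do
    put tcollector.reader.lines_collected 1500000000 42 host=myhost
    put proc.loadavg.1min 1500000000 0.12 host=myhost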

Open the TSDB log: not a single error.

Open /etc/opentsdb/logback.xml and raise the log level from INFO to DEBUG; OpenTSDB uses slf4j (logback) for logging.

    <root level="DEBUG">
      <!--<appender-ref ref="STDOUT"/>-->
      <appender-ref ref="CYCLIC"/>
      <appender-ref ref="FILE"/>
    </root>

Restart TSDB and check the log again. There it is:


    16:58:27.470 DEBUG [PutDataPointRpc.execute] - put: unknown metric: No such name for 'metrics': 'tcollector.reader.lines_collected'


It is saying that the tsdb tables in HBase have no entry registered for this metric name. A look at the official documentation turns up a configuration option called


    tsd.core.auto_create_metrics = true


It defaults to false. Set it to true, restart TSDB, and the data goes into HBase with no problem.
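The setting goes in the TSD config file; on an RPM install that is typically /etc/opentsdb/opentsdb.conf (the path is the usual default rather than something confirmed in this write-up):

    # let TSD auto-assign UIDs to metrics it has never seen before
    tsd.core.auto_create_metrics = true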


But once data was going into HBase there was another problem: no CPU metrics at all. Looking at the source, the CPU information is collected by sysload.py under the collectors directory, yet the RPM built by the Makefile does not include that file. So back to tcollector's Makefile and RPM spec file, fixing a CentOS 6 bug along the way.

Nothing looked wrong in the Makefile itself; it only has a few targets: all, rpm, clean, distclean, swix, i.e. make all, make rpm, make clean and so on. So the problem should be in the spec file.

Sure enough, the first problem in the spec file is that the RPM hard-codes the Python path to Python 2.7, and CentOS 6 has no 2.7. Easy enough to change:

    %global py2_sitelib /usr/lib/python2.6/site-packages

The second problem is that the %files section for the collectors enumerates specific files:

    %files collectors
    %{tcollectordir}/collectors/0/dfstat.py
    %{tcollectordir}/collectors/0/ifstat.py
    %{tcollectordir}/collectors/0/iostat.py
    %{tcollectordir}/collectors/0/netstat.py
    %{tcollectordir}/collectors/0/procnettcp.py
    %{tcollectordir}/collectors/0/procstats.py
    %{tcollectordir}/collectors/0/smart_stats.py

So only these files end up in the RPM. Clearly the main code has been updated while this peripheral packaging code has not.

But simply changing the list to a wildcard (see the sketch below) is not elegant either: some of the newer plugins have dependency problems and keep erroring out at startup, so you still have to decide which plugins to install for each deployment. In terms of productization this part of the code falls well short; at the very least the plugins ought to check their own dependencies.
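For the record, the wildcard version of that %files entry, i.e. the "change it to *" mentioned above, looks like this:

    %files collectors
    %{tcollectordir}/collectors/0/*.py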

With the spec updated and the RPM rebuilt, the data I needed finally made it into HBase.


And the biggest gripe of all, saved for last: tcollector's configuration. According to the official documentation there should be a startstop script in which you configure the OpenTSDB server to report to. Search the source all you want, that startstop script does not exist. Only after reading through the source did I find that the core configuration file sits inside the plugin folder, at tcollector/collectors/etc/config.py. That placement is simply a disaster. The file itself is not complicated, just exasperating:

    defaults = {
        'verbose': False,
        'no_tcollector_stats': False,
        'evictinterval': 6000,
        'dedupinterval': 300,
        'deduponlyzero': False,
        'allowed_inactivity_time': 600,
        'dryrun': False,
        'maxtags': 8,
        'max_bytes': 64 * 1024 * 1024,
        'http_password': False,
        'reconnectinterval': 0,
        'http_username': False,
        'port': 4242,
        'pidfile': '/var/run/tcollector.pid',
        'http': False,
        'http_api_path': 'api/put',
        'tags': [],
        'remove_inactive_collectors': False,
        'host': 'to.your.opentsdb.server',
        'backup_count': 1,
        'logfile': '/var/log/tcollector.log',
        'cdir': default_cdir,
        'ssl': False,
        'stdin': False,
        'daemonize': False,
        'hosts': False
    }
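So in practice, pointing tcollector at your TSD means editing that dict, mainly the host (and port) entries; the hostname below is just an example:

    # excerpt of defaults in tcollector/collectors/etc/config.py
    # (the hostname is an example, put your own TSD address here)
        'host': 'tsdb01.example.com',
        'port': 4242,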


Well, this is the most pitfall-ridden piece of company open-source code I have come across; the engineer who designed this code structure deserves to be dragged out and shot for half an hour. It wasted two hours of my precious time.

Then I discovered that one of our servers already had a copy of the tcollector code, with file timestamps from 2015, before I had even joined; so apparently someone had looked into this before without ever getting it running. Honestly the thing is not that hard, but the documentation really does lead you astray.

It feels like product-minded design has never been a strength of us internet code monkeys: develop the feature fast, get it online, and never think about productizing it or tidying up the code structure.

In the end, if the leader is happy looking at charts in gnuplot, I will not say a word; but I still added OpenTSDB as a data source in Grafana, which looks a lot nicer.

