OpenTSDB: installation, configuration, and data storage

Source: Internet
Author: User
1. What is OpenTSDB?
2. What language is OpenTSDB written in, and how is it built?
3. How do you install OpenTSDB?



1. Introduction to OpenTSDB
OpenTSDB is a distributed, scalable time series database built on HBase. It stores all time series data at full resolution (without downsampling), supports collecting every metric at second-level granularity, can retain data permanently, can be used for capacity planning, and is easy to integrate with existing alerting systems. OpenTSDB collects metrics from large clusters (network devices, operating systems, and applications), then stores, indexes, and serves them, and makes the data easier to understand, for example through its web interface and graphs.

For operations engineers, OpenTSDB provides real-time state information about infrastructure and services, and reveals software and hardware errors, performance changes, and performance bottlenecks in the cluster.

For managers, OpenTSDB can measure the system's SLAs, help explain the interactions between complex systems, and show resource consumption. The overall picture of cluster operation can be used to support budgeting and the coordination of cluster resources.

For developers, OpenTSDB shows the cluster's main performance bottlenecks and most frequent errors, so they can focus on the key issues.

OpenTSDB is released under the LGPLv2.1+ open source license. The current version is 2.x.
Website: http://opentsdb.net/
Source code: https://github.com/OpenTSDB/opentsdb/
2. Installing OpenTSDB
2.1 Dependencies
OpenTSDB depends on the JDK and gnuplot. gnuplot must be installed in advance, with a minimum version of 4.2 and a maximum of 4.4. It can be installed with one of the following commands:
yum install gnuplot autoconf
apt-get install gnuplot
OpenTSDB is written in Java, but the project is not built with Java tooling. It is built the way C and C++ programmers build projects.
Runtime dependencies:
  JDK 1.6
  asynchbase 1.3.0 (BSD)
  Guava 12.0 (ASLv2)
  Logback 1.0 (LGPLv2.1 / EPL)
  Netty 3.4 (ASLv2)
  SLF4J 1.6 (MIT) with Log4j and JCL adapters
  suasync 1.2 (BSD)
  ZooKeeper 3.3 (ASLv2)
Optional compile-time dependencies:
  GWT 2.4 (ASLv2)
Optional unit test dependencies:
  Javassist 3.15 (MPL / LGPL)
  JUnit 4.10 (CPL)
  Mockito 1.9 (MIT)
  PowerMock 1.4 (ASLv2)
2.2 Download and compile the source code
First install the necessary dependencies:
yum install gnuplot automake autoconf
Then download the source code. You can use the latest version or manually check out a specific one:
git clone git://github.com/opentsdb/opentsdb.git
cd opentsdb
./build.sh
2.3 Installation
1. Install HBase first, either as a single node or as a multi-node cluster. HBase version 0.94 is required.
2. Set the environment variables used when creating the OpenTSDB tables: COMPRESSION and HBASE_HOME. The former controls whether compression is enabled; the latter sets the HBase home directory. If you use compression, you also need to install LZO.
3. Run the table creation script src/create_table.sh.
4. Start the TSD:

tsdtmp=${TMPDIR-'/tmp'}/tsd    # For best performance, make sure
mkdir -p "$tsdtmp"             # your temporary directory uses tmpfs
./build/tsdb tsd --port=4242 --staticroot=build/staticroot --cachedir="$tsdtmp" --auto-metric
If you are using an HBase cluster, you also need to set --zkquorum. The directory given by --cachedir accumulates temporary files, so you can set up a cron job to delete them. With --auto-metric, a metric is created automatically when new data for it is collected. You can also write these parameters to a configuration file and specify its path with --config.
5. After a successful start, you can access the TSD at 127.0.0.1:4242.
To install gnuplot, autoconf, OpenTSDB, and tcollector from source, refer to: OpenTSDB & Tcollector Installation and Deployment.
3. User guide
3.1 Configuration
OpenTSDB parameters can be specified either on the command line or in a configuration file. The configuration file is a Java properties file. Keys are lower-case, dot-separated strings and may not contain spaces. All OpenTSDB properties begin with tsd, for example:
# List of Zookeeper hosts that manage the HBase cluster
tsd.storage.hbase.zk_quorum = 192.168.1.100
Configuration parameter precedence: command line arguments > configuration file > defaults.
You can specify the path of the configuration file on the command line with --config. If you do not, OpenTSDB looks for the configuration file in the following paths:
./opentsdb.conf
/etc/opentsdb.conf
/etc/opentsdb/opentsdb.conf
/opt/opentsdb/opentsdb.conf
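The lookup and precedence rules above can be sketched in a few lines of Python. This is only an illustration of the rules as stated in the text, not OpenTSDB's own loader: the function names are hypothetical, the parser handles only simple "key = value" lines rather than the full Java properties syntax, and the port values in the demo are made up.

```python
import os
from typing import Callable, Dict, Optional

# Search order as listed in the text.
SEARCH_PATHS = (
    "./opentsdb.conf",
    "/etc/opentsdb.conf",
    "/etc/opentsdb/opentsdb.conf",
    "/opt/opentsdb/opentsdb.conf",
)


def find_config(cli_path: Optional[str] = None,
                exists: Callable[[str], bool] = os.path.isfile) -> Optional[str]:
    """Return the --config path if given, else the first search path that exists."""
    if cli_path is not None:
        return cli_path
    for path in SEARCH_PATHS:
        if exists(path):
            return path
    return None


def parse_properties(text: str) -> Dict[str, str]:
    """Parse simple 'key = value' lines; blank lines and '#' comments are skipped."""
    props: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def effective_settings(defaults: Dict[str, str],
                       file_props: Dict[str, str],
                       cli_args: Dict[str, str]) -> Dict[str, str]:
    """Apply the precedence: command line > configuration file > defaults."""
    merged = dict(defaults)
    merged.update(file_props)
    merged.update(cli_args)
    return merged


if __name__ == "__main__":
    conf = "# List of Zookeeper hosts\ntsd.storage.hbase.zk_quorum = 192.168.1.100\n"
    settings = effective_settings(
        {"tsd.storage.hbase.zk_quorum": "localhost"},  # illustrative default
        parse_properties(conf),
        {"tsd.network.port": "4242"},                  # illustrative CLI value
    )
    print(settings)
```

The `exists` parameter is there only so the search order can be exercised without touching the real file system.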
If no valid configuration file is found and some required parameters are not set, the TSD process will not start. For the properties that can be set in a configuration file, refer to: Properties.
3.2 Basic concepts
Before looking at OpenTSDB in depth, you need to understand some basic concepts.

Cardinality. In mathematics, cardinality is the number of elements in a set. In a database, it is the number of unique entries in an index. In OpenTSDB, it is either the number of unique time series for a given metric, or the number of unique tag values associated with a tag name. A metric with high cardinality is likely to take longer to return in a query than a metric with low cardinality.

Compaction. In OpenTSDB, compaction merges multiple columns into a single column to reduce disk usage. This is not the same as compaction in HBase. It happens at irregular intervals while a TSD is writing data or answering queries.

Data Point. Each metric can be recorded as a numeric value at a point in time. A data point consists of:
  a metric
  a numeric value
  the timestamp at which the value was recorded
  one or more tags

Metric. The name of a measurable quantity. A metric does not include a value or a time; it is only a label. The values and times belong to the data points recorded under that name. Metric names are dot-separated and may not contain spaces, for example:
  hours.worked
  webserver.downloads
  accumulation.snow

Tags. A metric should describe what is being measured, but in OpenTSDB it should not be too specific. In general it is better to use tags to distinguish items that share the same metric. A tag consists of a tagk (tag name) and a tagv (tag value): the former represents a grouping, the latter a specific item within it.

Time Series. A collection of data points for a single metric and one set of tags.

Timestamp. An absolute time that describes when a value for a given metric was recorded.

Value. The actual numeric measurement of a metric.

UID. In OpenTSDB, each metric, tagk, or tagv is assigned a unique identifier called a UID. UIDs can be combined to form a sequence of UIDs, called a TSUID. In OpenTSDB's storage there is a counter starting at 0 for each of metric, tagk, and tagv; each time a new metric, tagk, or tagv is created, the corresponding counter is incremented by 1. UIDs for tag names and tag values are assigned automatically when a data point is written to a TSD. Metric UIDs are assigned automatically only if the auto metric setting is enabled; otherwise they must be assigned manually. By default a UID is encoded on 3 bytes, so each UID type can have at most 16,777,215 UIDs. You can change the source code to use 4 bytes. A UID can be displayed in several ways. The most common, used when accessing the HTTP API, is to encode the 3-byte UID as a hexadecimal string. For example, the UID 1 is 000000000000000000000001 in binary; as an unsigned byte array it is [0, 0, 1], and as a hexadecimal string it is 000001, where each byte is padded on the left with 0 to two hexadecimal digits. Likewise, the UID 255 is [0, 0, 255] as a byte array and 0000FF as a string. For why UIDs are used instead of hashes, refer to: Why UIDs.

TSUID. When a data point is written to OpenTSDB, its row key has the format <metric_uid><timestamp><tagk1_uid><tagv1_uid>[...<tagkN_uid><tagvN_uid>]. Leaving out the timestamp and concatenating the remaining UIDs gives the TSUID.

Metadata. Metadata records additional information about data points to make searching and tracking easier. It is divided into UIDMeta and TSMeta. Each UID has a metadata record stored in the tsdb-uid table. Each record contains immutable fields such as uid, type, name, and created (which records when the UID was created), plus additional fields such as description, notes, displayName, and custom key/value pairs. For details, see /api/uid/uidmeta. Similarly, each TSUID can have a TSMeta record, stored in the tsdb-meta table. Its fields include tsuid, metric, tags, lastReceived, and created; the optional fields include description and notes. For details, see /api/uid/tsmeta. Metadata is enabled with the following parameters:
  tsd.core.meta.enable_realtime_uid
  tsd.core.meta.enable_tsuid_tracking
  tsd.core.meta.enable_tsuid_incrementing
  tsd.core.meta.enable_realtime_ts
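The UID and TSUID encodings described above are easy to check with a short Python sketch. The function names are hypothetical and this is not OpenTSDB code; it only reproduces the byte-array and hexadecimal forms given in the text, assuming the default width of 3 bytes.

```python
from typing import Iterable, Tuple

UID_WIDTH = 3  # default UID width in bytes, as stated in the text


def uid_to_bytes(uid: int, width: int = UID_WIDTH) -> bytes:
    """Encode a UID as a fixed-width, big-endian, unsigned byte string.

    int.to_bytes raises OverflowError if the UID does not fit in `width` bytes.
    """
    return uid.to_bytes(width, "big")


def uid_to_hex(uid: int, width: int = UID_WIDTH) -> str:
    """Hexadecimal form of a UID, two zero-padded digits per byte."""
    return uid_to_bytes(uid, width).hex().upper()


def tsuid(metric_uid: int, tags: Iterable[Tuple[int, int]]) -> str:
    """Concatenate the metric UID and each (tagk, tagv) UID pair, without a timestamp."""
    parts = [uid_to_hex(metric_uid)]
    for tagk_uid, tagv_uid in tags:
        parts.append(uid_to_hex(tagk_uid))
        parts.append(uid_to_hex(tagv_uid))
    return "".join(parts)


if __name__ == "__main__":
    print(list(uid_to_bytes(1)), uid_to_hex(1))      # [0, 0, 1] 000001
    print(list(uid_to_bytes(255)), uid_to_hex(255))  # [0, 0, 255] 0000FF
    print(2 ** (8 * UID_WIDTH) - 1)                  # 16777215, the largest 3-byte UID
    print(tsuid(1, [(1, 1)]))                        # 000001000001000001
```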
Another form of metadata is annotations. For more information, refer to: Annotations and Tree.
3.3 Data storage
OpenTSDB uses HBase as its back-end storage. Before installing OpenTSDB, you need to start an HBase node or cluster and then run the table creation script src/create_table.sh to create the HBase tables. The table creation statements are as follows:
create '$UID_TABLE',
  {NAME => 'id', COMPRESSION => '$COMPRESSION', BLOOMFILTER => '$BLOOMFILTER'},
  {NAME => 'name', COMPRESSION => '$COMPRESSION', BLOOMFILTER => '$BLOOMFILTER'}

create '$TSDB_TABLE',
  {NAME => 't', VERSIONS => 1, COMPRESSION => '$COMPRESSION', BLOOMFILTER => '$BLOOMFILTER'}

create '$TREE_TABLE',
  {NAME => 't', VERSIONS => 1, COMPRESSION => '$COMPRESSION', BLOOMFILTER => '$BLOOMFILTER'}

create '$META_TABLE',
  {NAME => 'name', COMPRESSION => '$COMPRESSION', BLOOMFILTER => '$BLOOMFILTER'}
From the statements above you can see that four tables are created. For each you can set whether to enable compression, whether to enable HBase Bloom filters, how many versions to keep, and so on. If you need higher read and write performance, you can also pre-split the tables into regions.

3.3.1 Data Table Schema

In OpenTSDB, all data points are stored in a single table called tsdb, in order to take full advantage of HBase's ordering and region distribution. All values are stored in the column family t.

The row key is <metric_uid><timestamp><tagk1><tagv1>[...<tagkN><tagvN>]. UIDs are encoded on 3 bytes by default, and the timestamp is encoded on 4 bytes.

Once an OpenTSDB TSD is started, it listens on the specified socket port (4242 by default) and receives monitoring data consisting of a metric, a timestamp, a value, and tags, where each tag has a tag name and a tag value. For example:

myservice.latency.avg 1292148123 reqtype=foo host=web42
Suppose the UID of the metric myservice.latency.avg is [0, 0, -69], the UID of the tag name reqtype is [0, 0, 1], the UID of the tag value foo is [0, 1, 11], the UID of the tag name host is [0, 0, 2], and the UID of the tag value web42 is [0, -7, 42]. Together they form the row key:

[0, 0, -69, 77, 4, -99, 32, 0, 0, 1, 0, 1, 11, 0, 0, 2, 0, -7, 42]
 `-------'  `------------'  `-----'  `------'  `-----'  `-------'
 metric ID  base timestamp  name ID  value ID  name ID  value ID
                            `---------------'  `---------------'
                                first tag         second tag
In this representation each number corresponds to 1 byte:
  [0, 0, -69] metric ID
  [77, 4, -99, 32] base timestamp = 1292148000. Timestamps in the row key are rounded down to an hour boundary. In other words, all data points with the same metric and tags within the same hour are stored in one row.
  [0, 0, 1] "reqtype" index
  [0, 1, 11] "foo" index
  [0, 0, 2] "host" index
  [0, -7, 42] "web42" index
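The row key in this example can be rebuilt with a short Python sketch. The helper names are hypothetical; the sketch assumes big-endian encoding of the 4-byte timestamp (which matches the bytes shown above) and, as a further assumption, orders tags by their tag name UID. The byte lists above use Java's signed bytes, so the sketch converts to that form for comparison.

```python
import struct
from typing import List, Sequence, Tuple


def base_time(timestamp: int) -> int:
    """Round a Unix timestamp in seconds down to the hour boundary."""
    return timestamp - (timestamp % 3600)


def row_key(metric_uid: bytes, timestamp: int,
            tags: Sequence[Tuple[bytes, bytes]]) -> bytes:
    """Build <metric_uid><base timestamp><tagk1><tagv1>...<tagkN><tagvN>."""
    key = metric_uid + struct.pack(">I", base_time(timestamp))
    for tagk_uid, tagv_uid in sorted(tags):  # assumption: ordered by tagk UID
        key += tagk_uid + tagv_uid
    return key


def as_signed(data: bytes) -> List[int]:
    """Show bytes the way Java prints them, as signed values in -128..127."""
    return [b - 256 if b > 127 else b for b in data]


if __name__ == "__main__":
    key = row_key(
        bytes([0, 0, 187]),                          # [0, 0, -69] as unsigned bytes
        1292148123,
        [
            (bytes([0, 0, 1]), bytes([0, 1, 11])),   # reqtype=foo
            (bytes([0, 0, 2]), bytes([0, 249, 42])), # host=web42, -7 is 249 unsigned
        ],
    )
    print(as_signed(key))
```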
Note: data points with the same metric and tags are stored contiguously, and data for the same metric is stored contiguously as well, which is very helpful for scans and aggregation.

The column qualifier occupies 2 or 4 bytes.

A 2-byte qualifier encodes an offset in seconds, in the format:
  12 bits: delta in seconds relative to the base hour of the row. 2^12 = 4096 > 3600, so any offset within the hour fits.
  4 bits: format flags
    1 bit: integer or floating point (0 represents an integer, 1 represents a floating-point value)
    3 bits: the length of the data, which must be 1, 2, 4, or 8 bytes. 000 represents 1 byte, 010 represents 2 bytes, 011 represents 4 bytes, 100 represents 8 bytes.

A 4-byte qualifier encodes an offset in milliseconds, in the format:
  4 bits: always set to 1 (F in hexadecimal)
  22 bits: millisecond offset
  2 bits: reserved
  4 bits: format flags
    1 bit: integer or floating point (0 represents an integer, 1 represents a floating-point value)
    3 bits: the length of the data, which must be 1, 2, 4, or 8 bytes. 000 represents 1 byte, 010 represents 2 bytes, 011 represents 4 bytes, 100 represents 8 bytes.

For example, a data point with the timestamp 1292148123 is converted to a base time on the hour (dropping the seconds past the hour) of 1292148000 and an offset of 123, which is 1111011 in binary. Because the value is an integer with a length of 8 bytes, the last 3 bits are 100 and the format flags are 0100. The corresponding column qualifier is therefore 0000011110110100, or 07B4 in hexadecimal. The value is stored on 8 bytes and can hold either a long or a double.
To sum up, the tsdb table structure is as follows:
[Figure: tsdb table structure]
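The arithmetic of the 2-byte qualifier in this example can be checked with a short Python sketch. The function names are hypothetical, and the 4-bit format flags are passed in exactly as the text gives them (0100) rather than derived from the value type, so the sketch only demonstrates the bit packing: 12 bits of offset followed by 4 bits of flags.

```python
from typing import Tuple


def split_timestamp(timestamp: int) -> Tuple[int, int]:
    """Return (base hour, offset in seconds) for a Unix timestamp in seconds."""
    offset = timestamp % 3600
    return timestamp - offset, offset


def second_qualifier(timestamp: int, flags: int) -> bytes:
    """Pack the 12-bit second offset and the 4-bit format flags into 2 bytes."""
    if not 0 <= flags <= 0xF:
        raise ValueError("format flags must fit in 4 bits")
    _, offset = split_timestamp(timestamp)
    return ((offset << 4) | flags).to_bytes(2, "big")


def decode_second_qualifier(qualifier: bytes) -> Tuple[int, int]:
    """Return (offset in seconds, format flags) from a 2-byte qualifier."""
    value = int.from_bytes(qualifier, "big")
    return value >> 4, value & 0xF


if __name__ == "__main__":
    q = second_qualifier(1292148123, 0b0100)
    print(split_timestamp(1292148123))                # (1292148000, 123)
    print(format(int.from_bytes(q, "big"), "016b"))   # 0000011110110100
    print(q.hex().upper())                            # 07B4
```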
3.3.2 UID Table Schema
A separate, smaller table called tsdb-uid stores the UID mappings, both forward and reverse. It has two column families: the name column family maps a UID to a string, and the id column family maps a string to a UID. Each row in a column family has at least one of the following three columns:
  metrics: maps a metric name to a UID
  tagk: maps a tag name to a UID
  tagv: maps a tag value to a UID
If metadata is enabled, the name column family can also contain additional metadata columns.
id column family
  Row key: the string that is assigned a UID. For a metric this might be sys.cpu.user; for a tag it is the tag name or tag value.
  Column qualifiers: one of the three column types above.
  Column value: the UID, an unsigned integer encoded on 3 bytes by default.
For example, the following rows were queried from the tsdb-uid table. The first column is the row key, the second is column family:column qualifier, and the third is the value, which is the UID:
proc.stat.cpu    id:metrics    \x00\x00\x01
host             id:tagk       \x00\x00\x01
cdh1             id:tagv       \x00\x00\x01
name column family
  Row key: the UID.
  Column qualifiers: one of the three column types above, or one of metric_meta, tagk_meta, tagv_meta.
  Column value: for the three mapping columns, the string that corresponds to the UID; for the meta columns, a UTF-8 encoded JSON string. Do not modify the value outside of OpenTSDB, because the order of the fields affects CAS calls.
For example, the following rows were queried from the tsdb-uid table. The first column is the row key, the second is column family:column qualifier, and the third is the value that corresponds to the UID:
\x00\x00\x01 name:metrics proc.stat.cpu
\x00\x00\x01 name:tagk host
\x00\x00\x01 name:tagv cdh1
\x00\x00\x01 name:tagk_meta {"uid":"000001","type":"TAGK","name":"host","description":"","notes":"","created":1395213193,"custom":null,"displayName":""}
\x00\x00\x01 name:tagv_meta {"uid":"000001","type":"TAGV","name":"cdh1","description":"","notes":"","created":1395213193,"custom":null,"displayName":""}
\x00\x00\x01 name:metric_meta {"uid":"000001","type":"METRIC","name":"proc.stat.cpu","description":"","notes":"","created":1395213193,"custom":null,"displayName":""}
To sum up, the tsdb-uid table structure is as follows:
[Figure: tsdb-uid table structure]
The data point that corresponds to the figure is:
proc.stat.cpu 1292148123 host=cdh1
The figure shows the structure of the tsdb-uid table and how the data is stored. Before a data point is saved to OpenTSDB, a UID is generated for each of metrics, tagk, tagv, metric_meta, tagk_meta, and tagv_meta (000001 in the figure). The records are then inserted into the HBase table with the UID as the row key, as multiple rows that store the mappings between the UID and each of metrics, tagk, tagv, metric_meta, tagk_meta, and tagv_meta.
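The forward and reverse mappings of the tsdb-uid table, together with the per-type counters described earlier, can be modeled in memory with a small Python sketch. The class and method names are hypothetical and this is not how OpenTSDB talks to HBase; it only shows why the first metric, the first tag name, and the first tag value all receive the UID 000001.

```python
from typing import Dict, Tuple


class UidTable:
    """Toy model of tsdb-uid: an 'id' family (name -> UID) and a 'name' family (UID -> name)."""

    KINDS = ("metrics", "tagk", "tagv")

    def __init__(self, width: int = 3) -> None:
        self.width = width
        # One counter per UID type, starting at 0 as described in the text.
        self.counters: Dict[str, int] = {kind: 0 for kind in self.KINDS}
        self.id_family: Dict[Tuple[str, str], bytes] = {}    # (name, kind) -> UID
        self.name_family: Dict[Tuple[bytes, str], str] = {}  # (UID, kind) -> name

    def get_or_create(self, kind: str, name: str) -> bytes:
        """Return the UID for `name`, assigning the next counter value if it is new."""
        if kind not in self.counters:
            raise ValueError(f"unknown UID type: {kind}")
        existing = self.id_family.get((name, kind))
        if existing is not None:
            return existing
        self.counters[kind] += 1
        uid = self.counters[kind].to_bytes(self.width, "big")
        self.id_family[(name, kind)] = uid    # forward mapping
        self.name_family[(uid, kind)] = name  # reverse mapping
        return uid


if __name__ == "__main__":
    table = UidTable()
    # The data point from the text: proc.stat.cpu 1292148123 host=cdh1
    for kind, name in (("metrics", "proc.stat.cpu"), ("tagk", "host"), ("tagv", "cdh1")):
        print(name, f"id:{kind}", table.get_or_create(kind, name))
```

Because each UID type has its own counter, the same 3 bytes can identify a metric, a tag name, and a tag value at once; the column qualifier tells them apart.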


3.3.3 Meta Table Schema

This table is an index of the different time series in OpenTSDB and can be used to store some additional information. The table is called tsdb-meta. It has a single column family, name, with two columns, ts_meta and ts_ctr. The table contains data such as the following:
\x00\x00\x01\x00\x00\x01\x00\x00\x01 name:ts_ctr \x00\x00\x00\x00\x00\x00\x00p
\x00\x00\x01\x00\x00\x01\x00\x00\x01 name:ts_meta {"tsuid":"000001000001000001","displayName":"","description":"","notes":"","created":1395213196,"custom":null,"units":"","dataType":"","retention":0,"max":"NaN","min":"NaN"}

\x00\x00\x02\x00\x00\x01\x00\x00\x01 name:ts_ctr \x00\x00\x00\x00\x00\x00\x00p
\x00\x00\x02\x00\x00\x01\x00\x00\x01 name:ts_meta {"tsuid":"000002000001000001","displayName":"","description":"","notes":"","created":1395213196,"custom":null,"units":"","dataType":"","retention":0,"max":"NaN","min":"NaN"}
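As a rough illustration, the rows above can be decoded in Python. Each row key is a sequence of 3-byte UIDs (metric, then tag name and tag value pairs), and the ts_ctr value shown is 8 bytes long. The helper names are hypothetical, and reading the counter as a big-endian signed integer is an assumption based on how HBase stores longs.

```python
from typing import List, Tuple


def split_tsuid(row_key: bytes, width: int = 3) -> Tuple[bytes, List[Tuple[bytes, bytes]]]:
    """Split a row key into the metric UID and a list of (tagk UID, tagv UID) pairs."""
    count, remainder = divmod(len(row_key), width)
    # A valid key holds one metric UID plus at least one complete tag pair.
    if remainder or count < 3 or count % 2 == 0:
        raise ValueError("row key is not <metric><tagk><tagv>[...]")
    chunks = [row_key[i:i + width] for i in range(0, len(row_key), width)]
    return chunks[0], list(zip(chunks[1::2], chunks[2::2]))


def decode_counter(value: bytes) -> int:
    """Decode an 8-byte ts_ctr value (assumed big-endian, signed)."""
    if len(value) != 8:
        raise ValueError("ts_ctr is expected to be 8 bytes")
    return int.from_bytes(value, "big", signed=True)


if __name__ == "__main__":
    metric, tags = split_tsuid(b"\x00\x00\x02\x00\x00\x01\x00\x00\x01")
    print(metric.hex(), [(k.hex(), v.hex()) for k, v in tags])
    # The trailing 'p' in the listing is the byte 0x70.
    print(decode_counter(b"\x00\x00\x00\x00\x00\x00\x00p"))
```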
Row key: the same as in the tsdb table but without the timestamp: <metric_uid><tagk1><tagv1>[...<tagkN><tagvN>]
ts_meta column: like UIDMeta, a UTF-8 encoded JSON string.
ts_ctr column: a counter that records the number of data points stored for the time series. The column is named ts_ctr and its value is an 8-byte signed integer.
3.3.4 Tree Table Schema
An index table that presents time series in a tree-like structure, similar to a file system, for use by other systems such as Graphite.
3.4 How to write data
3.5 How to query data
3.6 CLI tools
The tsdb command supports the following subcommands:
[root@cdh1 build]# ./tsdb
usage: tsdb <command> [args]
Valid commands: fsck, import, mkmetric, query, tsd, scan, uid
Create metrics with the following command:
./tsdb mkmetric mysql.bytes_received mysql.bytes_sent
The result of running the command above is:
metrics mysql.bytes_received: [0, 0, -93]
metrics mysql.bytes_sent: [0, 0, -92]
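The UIDs in this output contain negative numbers, which is consistent with bytes being printed as signed values (-128 to 127), as Java does. Assuming that reading, a short Python sketch converts them to the hexadecimal form used elsewhere in this article. The function names are hypothetical.

```python
from typing import Sequence


def signed_bytes_to_hex(values: Sequence[int]) -> str:
    """Convert signed byte values such as [0, 0, -93] to a hex UID string."""
    return bytes(v & 0xFF for v in values).hex().upper()


def signed_bytes_to_int(values: Sequence[int]) -> int:
    """Convert signed byte values to the unsigned integer they encode (big-endian)."""
    return int.from_bytes(bytes(v & 0xFF for v in values), "big")


if __name__ == "__main__":
    print(signed_bytes_to_hex([0, 0, -93]), signed_bytes_to_int([0, 0, -93]))  # 0000A3 163
    print(signed_bytes_to_hex([0, 0, -92]), signed_bytes_to_int([0, 0, -92]))  # 0000A4 164
```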
4. HTTP API
5. Who is using OpenTSDB
StumbleUpon: StumbleUpon is the easiest way to find cool new websites, videos, photos, and images from across the web.
Box: Box simplifies online file storage, replaces FTP, and connects teams in online workspaces.
Tumblr: a lightweight blogging platform that lets users follow other members and see the posts of those they follow on their own page, as well as reblog posts from others on Tumblr.
6. KairosDB
KairosDB is a fast and reliable distributed time series database that mainly uses Cassandra for storage, although HBase can also be used. KairosDB is a rewrite based on OpenTSDB; it can store data not only in HBase but also in Cassandra. KairosDB home page: https://code.google.com/p/kairosdb/
