OpenTSDB is an open-source monitoring system built on HBase. It can monitor clusters of tens of thousands of machines and store billions of data points. TSDB stands for Time Series Database, and OpenTSDB includes extensive optimizations for storing and querying time-series data.
Architecture overview
Conceptually, OpenTSDB consists of three parts: tcollector for data collection, the TSD (Time Series Daemon) service, and HBase for data storage.
Data collection process
A tcollector background process runs on each monitored server. It manages the data collection scripts, runs them periodically, restarts them if they fail, and ensures that all monitoring data is sent to OpenTSDB. Tcollector writes to a TSD rather than directly to HBase. This decoupling keeps the client lightweight. A TSD receives data through a simple telnet-style text protocol, so a small number of TSD instances can handle writes from a large number of tcollector processes.
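The following is a minimal sketch of the write path. It formats data points as lines of the TSD's telnet-style put command (put <metric> <timestamp> <value> <tagk=tagv> ...) and sends them over a plain TCP socket. The host name and the port 4242 (OpenTSDB's usual default) are assumptions, and the helper names are hypothetical. A real tcollector also handles batching, retries, and reconnects, which are omitted here.

```python
import socket


def format_put(metric: str, timestamp: int, value, tags: dict) -> str:
    """Build one line of the TSD's telnet-style 'put' command."""
    if not tags:
        # OpenTSDB expects every data point to carry at least one tag.
        raise ValueError("at least one tag is required")
    # Sort tags so the same series always produces the same line.
    tag_str = " ".join(f"{k}={v}" for k, v in sorted(tags.items()))
    return f"put {metric} {timestamp} {value} {tag_str}\n"


def send_points(lines, host: str = "localhost", port: int = 4242) -> None:
    """Send pre-formatted put lines to a TSD over a plain TCP connection.

    host and port are assumptions; point them at your own TSD.
    """
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall("".join(lines).encode("utf-8"))
```

For example, format_put("mysql.bytes_sent", 1288946927, 42, {"host": "ubuntu"}) yields the line "put mysql.bytes_sent 1288946927 42 host=ubuntu". Because the protocol is line-oriented text, a collector needs no HBase client library at all.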
Data query process
The TSD handles all interaction with HBase and exposes an HTTP query interface to front ends. Because TSD processes are stateless, queries can be load balanced across any number of them, so the query layer scales linearly and is highly available.
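Any TSD can answer any query, so a client or proxy can simply rotate through the available instances. The sketch below assumes the OpenTSDB 2.x HTTP endpoint /api/query with its start and m parameters, and uses hypothetical host names tsd1 and tsd2. It builds query URLs round-robin across TSD instances. It does not perform the HTTP request itself.

```python
from itertools import cycle
from urllib.parse import urlencode


class RoundRobinQueryClient:
    """Spread queries across interchangeable (stateless) TSD instances."""

    def __init__(self, tsd_base_urls):
        urls = list(tsd_base_urls)
        if not urls:
            raise ValueError("need at least one TSD base URL")
        self._targets = cycle(urls)  # endless rotation over the instances

    def query_url(self, start: str, metric_query: str) -> str:
        """Return a /api/query URL on the next TSD in the rotation."""
        base = next(self._targets)
        params = urlencode({"start": start, "m": metric_query})
        return f"{base}/api/query?{params}"
```

A TSD keeps no per-client state, so blind rotation is safe. In practice an HTTP load balancer in front of the TSDs usually plays this role.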
HBase schema design
OpenTSDB stores time-series metrics of many kinds collected from many servers. Each data point consists of a timestamp and a measured value. The data must ultimately be served to users, and OpenTSDB must also support online visualization. All of these requirements shape the HBase schema design.
OpenTSDB uses two tables. The tsdb table stores the time-series data. The tsdb-uid table stores the mappings between UIDs and names, including metric names, tag names, and tag values.
tsdb-uid table
The tsdb-uid table has two column families: id and name.
Uid-to-name rows are keyed by UID. They map the UIDs stored in the tsdb table back to their names, much like foreign keys from the tsdb table. Name-to-uid rows are keyed by name. They map names to UIDs and also support autocompletion of names such as tag names, which is implemented efficiently by scanning row keys.
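The following is an in-memory stand-in for the tsdb-uid table that makes both directions of the mapping concrete. It rests on several assumptions. UIDs are 3 bytes wide (OpenTSDB's default), and each kind (metrics, tagk, tagv) has its own UID counter. The class name UidTable and its methods are hypothetical. The suggest method mimics autocompletion by scanning sorted names starting from a prefix, the way a row-key scan over name-to-uid rows would.

```python
import bisect

UID_WIDTH = 3  # assumed UID width in bytes (OpenTSDB's default)


class UidTable:
    """Hypothetical in-memory model of tsdb-uid: name->UID and UID->name."""

    KINDS = ("metrics", "tagk", "tagv")

    def __init__(self):
        self._name_to_uid = {k: {} for k in self.KINDS}  # like name-to-uid rows
        self._uid_to_name = {k: {} for k in self.KINDS}  # like uid-to-name rows
        self._next_uid = {k: 1 for k in self.KINDS}      # one counter per kind

    def get_or_create(self, kind: str, name: str) -> bytes:
        """Return the UID for name, assigning the next free one if needed."""
        uid = self._name_to_uid[kind].get(name)
        if uid is None:
            n = self._next_uid[kind]
            if n >= 1 << (8 * UID_WIDTH):
                raise OverflowError(f"no more {kind} UIDs")
            uid = n.to_bytes(UID_WIDTH, "big")
            self._next_uid[kind] = n + 1
            self._name_to_uid[kind][name] = uid
            self._uid_to_name[kind][uid] = name
        return uid

    def name_of(self, kind: str, uid: bytes) -> str:
        """Reverse lookup, as used when turning stored UIDs back into names."""
        return self._uid_to_name[kind][uid]

    def suggest(self, kind: str, prefix: str, limit: int = 10) -> list:
        """Autocomplete by scanning sorted names from prefix, like a row-key scan."""
        names = sorted(self._name_to_uid[kind])
        i = bisect.bisect_left(names, prefix)
        out = []
        while i < len(names) and names[i].startswith(prefix) and len(out) < limit:
            out.append(names[i])
            i += 1
        return out
```

Sorted keys are what make the prefix scan cheap. HBase keeps rows ordered by key, so all names sharing a prefix sit next to each other.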
Note that the column qualifier indicates what kind of name a row maps: metrics for metric names, tagk for tag names, and tagv for tag values. For example, consider mysql.bytes_sent data collected from the server ubuntu. Here mysql.bytes_sent is the metric, host is the tag name, and ubuntu is the tag value. OpenTSDB identifies a single time series by the combination of metric, tag names, and tag values.
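The identity rule can be illustrated with a small hypothetical helper (not part of OpenTSDB) that renders a series as a canonical string. Sorting tags by name means the same metric and tag set always produce the same key, regardless of the order in which the tags were supplied. OpenTSDB itself stores this identity as UIDs in the row key, not as a string.

```python
def series_key(metric: str, tags: dict) -> str:
    """Canonical identity of a time series: metric plus tags sorted by name.

    Two data points belong to the same series only if the metric and the
    full set of tag name/value pairs match; tag order does not matter.
    """
    tag_part = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    return f"{metric}{{{tag_part}}}"
```

For example, series_key("mysql.bytes_sent", {"host": "ubuntu"}) gives "mysql.bytes_sent{host=ubuntu}". Changing any tag value, such as a different host, yields a different series.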
tsdb table
The tsdb table is designed to support lookups by time range and filtering by tag. It has a single column family, t.
Start with the row key. It begins with the metric UID, followed by a four-byte base timestamp truncated to the hour. This keeps the data points of the same metric stored together in chronological order and also greatly reduces the number of rows. The key ends with the UIDs of all tag name and tag value pairs, which support tag filtering in queries.
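The row key layout can be sketched as follows. The sketch assumes 3-byte UIDs and no salting; the optional salting of later versions is not shown. The helper names base_time and row_key are hypothetical. Tag pairs are sorted by tag name UID so that the same series always maps to the same key.

```python
import struct

UID_WIDTH = 3  # assumed width of metric, tagk, and tagv UIDs


def base_time(ts: int) -> int:
    """Truncate a Unix timestamp in seconds to the start of its hour."""
    return ts - (ts % 3600)


def row_key(metric_uid: bytes, ts: int, tag_uids: list) -> bytes:
    """metric UID | 4-byte base hour | (tagk UID, tagv UID) pairs sorted by tagk UID."""
    if len(metric_uid) != UID_WIDTH:
        raise ValueError("unexpected metric UID width")
    key = bytearray(metric_uid)
    key += struct.pack(">I", base_time(ts))  # unsigned 32-bit, big-endian
    for tagk, tagv in sorted(tag_uids):
        key += tagk + tagv
    return bytes(key)
```

All points of one series within the same hour share a single row key, for example 13 bytes for one tag pair. A time-range query becomes a contiguous row-key scan starting at the metric UID plus the start hour.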
Next, consider the column qualifier. It starts with the remaining seconds of the timestamp, that is, the offset from the row's base hour. A four-bit flag mask follows, recording details such as whether the cell value is an integer or a floating-point number.
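The following sketch shows the qualifier encoding for second-precision timestamps. The offset from the row's base hour occupies the upper 12 bits of a 2-byte qualifier (3599 fits in 12 bits), and the flags occupy the lower 4 bits. One flag bit marks a floating-point value, and the remaining three store the value length in bytes minus one. This layout follows our reading of OpenTSDB 1.x. The function and constant names are illustrative, and wider millisecond-precision qualifiers are not covered.

```python
FLAG_BITS = 4      # low 4 bits of the qualifier hold flags
FLAG_FLOAT = 0x8   # set when the cell value is floating-point
LENGTH_MASK = 0x7  # low 3 bits: (value length in bytes) - 1


def encode_qualifier(ts: int, base: int, is_float: bool, value_len: int) -> bytes:
    """Pack the offset from the row's base hour plus the flags into 2 bytes."""
    delta = ts - base
    if not 0 <= delta < 3600:
        raise ValueError("timestamp outside this row's hour")
    if not 1 <= value_len <= 8:
        raise ValueError("value length must be 1..8 bytes")
    flags = (FLAG_FLOAT if is_float else 0) | (value_len - 1)
    return ((delta << FLAG_BITS) | flags).to_bytes(2, "big")


def decode_qualifier(q: bytes):
    """Return (offset_seconds, is_float, value_len) from a 2-byte qualifier."""
    raw = int.from_bytes(q, "big")
    flags = raw & 0xF
    return raw >> FLAG_BITS, bool(flags & FLAG_FLOAT), (flags & LENGTH_MASK) + 1
```

Packing the flags into the qualifier lets a reader decode each cell's type and width without any extra per-cell metadata.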