OpenStack/Gnocchi introduction: pre-computing and storing time-series aggregates, the idea of "compute first, read later"


First, see http://www.cnblogs.com/bonelee/p/6236962.html for an introduction to round-robin databases, which makes the archiving operation described here easier to understand.

Reposted from: http://blog.sina.com.cn/s/blog_6de3aa8a0102wk0y.html

The early OpenStack telemetry project Ceilometer was split into four projects (ceilometer, gnocchi, aodh, panko), each with its own role. Ceilometer collects and pre-processes measurement data; Gnocchi provides resource indexing and stores time-series measurements; Aodh provides alarming and metering notification services; Panko provides event storage.

The main reason for the split: in early Ceilometer, the measurement data (i.e. time-series data, whose core fields are <time, value>) for every type of resource was stored in a single sample table in a SQL database. As the number of resources to monitor in a cloud environment grows and time passes, the growth of the metering data becomes unpredictable. On the query side, an operation first has to filter the required entries out of the huge sample table and then perform the associated aggregation calculations. The resulting performance overhead is intolerable, and the bottleneck becomes more pronounced over time until the system collapses. There are various ways to mitigate this, for example one table per monitoring indicator (metric), but then a single resource may need many tables (an instance has at least metrics such as cpu, cpu.util, memory, memory.usage, disk.*). Even if that were acceptable, aggregation at query time would still be a problem.


With a similar idea in mind, Red Hat's Julien Danjou (blog: https://julien.danjou.info/blog/) launched the Gnocchi project to address these issues. The general approach: write each metric's measurement data directly to back-end storage, and aggregate it according to a pre-set archive policy before it is stored; a query then simply reads the corresponding file to obtain the already-aggregated points, so the read-time complexity drops to O(1). A resource index is also provided, so that each resource's metadata and its associated metrics can be found quickly.
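The core idea, pre-aggregating measures by time bucket so that a read is a lookup rather than a table scan, can be sketched in a few lines of Python (a toy illustration, not Gnocchi's actual Carbonara code; the function name and data layout are invented):

```python
from collections import defaultdict

def aggregate(measures, granularity):
    """Bucket raw (timestamp, value) pairs by granularity and
    pre-compute the mean, so a later read is a single dict lookup."""
    buckets = defaultdict(list)
    for ts, value in measures:
        buckets[ts - (ts % granularity)].append(value)
    return {ts: sum(vals) / len(vals) for ts, vals in buckets.items()}

raw = [(0, 1.0), (30, 3.0), (60, 5.0), (90, 7.0)]
series = aggregate(raw, granularity=60)
# series[0] is the mean of the first minute, series[60] of the second
```

Reading an aggregated point is now an O(1) dictionary access, instead of filtering and averaging raw samples at query time.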


OpenStack/Gnocchi introduction and architecture [with document translation]

Document translation (corrections welcome): http://gnocchi.xyz

Part I: Gnocchi introduction http://gnocchi.xyz/index.html

Gnocchi – Metric as a Service (metering as a service)
Gnocchi is a multi-tenant time-series, metrics and resource database. It provides an HTTP REST interface to create and manipulate data. Gnocchi is designed to store very large volumes of metering data while making metrics and resource information available to operators and users.
Gnocchi is part of the OpenStack project, so it supports OpenStack, but it can also work completely independently.
You can read the complete online documentation at http://gnocchi.xyz.

Why Gnocchi?
Gnocchi was created to meet the need for a time-series database usable in a cloud computing environment: one able to store large amounts of metric data and scale easily.
The Gnocchi project began in 2014 as a branch of the OpenStack Ceilometer project, to address the performance problems Ceilometer hit when using a standard database as the storage backend for metering data. For more information, see Julien's blog post on Gnocchi.

Use cases
Gnocchi is designed to store time series and their associated resource metadata, so it is useful for cases such as:
(1) storage for a billing system; (2) alarm triggering or monitoring systems; (3) statistical analysis of usage data.

Key features
HTTP REST interface; horizontal scalability; metric aggregation; batched measure handling; archive policies; metric value search;
structured resources; resource history; queryable resource indexer; multi-tenancy; Grafana support; statsd protocol support


Part II: Gnocchi architecture http://gnocchi.xyz/architecture.html

Project architecture
Gnocchi consists of several services: an HTTP REST API (see REST API usage), an optional statsd-compatible daemon (see statsd daemon usage), and asynchronous processing daemons. Data is received through the HTTP REST API and the statsd daemon. The asynchronous processing daemon (gnocchi-metricd) performs statistical computation on the received data in the background, as well as metric cleanup operations.
Both the HTTP REST API and the asynchronous processing daemons are stateless and scalable; additional workers can be added depending on the load.
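The two REST calls involved in feeding data to this pipeline can be sketched as payloads (endpoint paths follow the Gnocchi REST docs; the host, port, token, and metric id placeholder here are assumptions):

```python
import json

# Base endpoint and auth header for a hypothetical deployment.
base = "http://localhost:8041/v1"
headers = {"X-Auth-Token": "<token>", "Content-Type": "application/json"}

# 1. POST {base}/metric creates a metric bound to an archive policy.
metric_body = json.dumps({"archive_policy_name": "low"})

# 2. POST {base}/metric/<metric-id>/measures pushes raw measures;
#    gnocchi-metricd aggregates them asynchronously in the background.
measures_body = json.dumps([
    {"timestamp": "2024-01-01T00:00:00", "value": 41.0},
    {"timestamp": "2024-01-01T00:05:00", "value": 43.0},
])
# e.g. requests.post(f"{base}/metric", data=metric_body, headers=headers)
```

The API call only ingests raw points; all aggregation work happens later in metricd, which is what keeps the API itself stateless and cheap.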

Back ends
Gnocchi uses two different back ends to store data: one for the time series (the storage driver) and one for the index data (the indexer driver).
The storage is responsible for storing the measures of the metrics. It receives timestamps and values and pre-computes aggregations according to the defined archive policies.
The indexer is responsible for storing the index of all resources, along with their types and properties. Gnocchi understands resource types from OpenStack projects, but it also provides a generic type so you can create basic resources and handle the resource properties yourself. The indexer is also responsible for linking resources to metrics.
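A resource-creation payload shows how the indexer links a resource to its metrics in one call (field names follow the Gnocchi REST docs; the UUIDs are made up for illustration):

```python
import json

# A "generic" resource with one linked metric. POSTing this body to
# {base}/v1/resource/generic creates the resource in the indexer and
# a "cpu.util" metric in the storage back end at the same time.
resource = {
    "id": "75c44741-cc60-4033-804e-2d3098c7d2e9",
    "user_id": "bd3a1e52-1c62-44cb-bf04-660bd88cd74d",
    "project_id": "bd3a1e52-1c62-44cb-bf04-660bd88cd74d",
    "metrics": {"cpu.util": {"archive_policy_name": "medium"}},
}
body = json.dumps(resource)
```

Once created, queries can go through the resource (find the instance, then its cpu.util metric) instead of addressing bare metric UUIDs.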

How to choose back ends
Gnocchi currently offers several different storage drivers: file, swift, s3, and ceph (preferred).
The storage drivers are built on an intermediate library called Carbonara, which handles the time-series manipulation, since these storage technologies cannot handle time series themselves.
All four Carbonara-based drivers work well, and the back-end technology determines scalability: Ceph and Swift are inherently more scalable than the file driver.
Depending on the size of your architecture, using the file driver and storing data on disk may be sufficient. If you need to scale to more servers with the file driver, you can export and share the data over NFS among all Gnocchi processes. In any case, the s3, ceph, and swift drivers are far more scalable. Ceph also offers better consistency and is therefore the recommended driver.
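The driver choice comes down to two options in gnocchi.conf (section and option names follow the Gnocchi configuration docs; the pool name, credentials, and database URL below are placeholders, not a recommended setup):

```ini
[storage]
driver = ceph            ; or: file, swift, s3
ceph_pool = gnocchi
ceph_username = gnocchi

[indexer]
; the indexer is a SQL database, separate from the time-series storage
url = postgresql://gnocchi:secret@localhost/gnocchi
```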

How to plan Gnocchi storage
Gnocchi uses a custom file format based on the Carbonara library. In Gnocchi, a time series is a collection of points, where a point is a given measure (sample) in the lifespan of the time series. The storage format is compressed using various techniques; you can estimate a time series' worst-case size with the formula: number of points × 8 bytes = size in bytes.
The number of points you want to keep is usually determined by: number of points = timespan ÷ granularity.
For example, if you want to keep a year of data at one-minute resolution: number of points = (365 days × 24 hours × 60 minutes) ÷ 1 minute = 525,600. Then: size in bytes = 525,600 points × 8 bytes = 4,204,800 bytes ≈ 4,106 KiB.
That is just one aggregated time series. If the archive policy uses the 6 default aggregation methods (mean, min, max, sum, std, count) at the same "one year, one minute" resolution, the space used grows to as much as 6 × 4.1 MiB = 24.6 MiB.
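The two formulas above are easy to turn into a small sizing helper (the function name is invented; 8 bytes per point is the document's worst-case figure):

```python
def series_size_kib(days, granularity_minutes, bytes_per_point=8):
    """Worst-case size of one aggregated time series, in
    (number of points, KiB), from points = timespan / granularity
    and size = points * bytes_per_point."""
    points = (days * 24 * 60) // granularity_minutes
    return points, points * bytes_per_point / 1024

points, kib = series_size_kib(days=365, granularity_minutes=1)
# One year at 1-minute resolution: 525,600 points, ~4,106 KiB per
# aggregation method, so 6 methods need roughly 24.6 MiB.
```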

How to set the archive policy and granularity
In Gnocchi, an archive policy is expressed in number of points. If the archive policy defines a policy of 10 points with a granularity of 1 second, the time-series archive keeps up to 10 points, each representing a 1-second aggregation. This means the time series retains at most 10 seconds of data (sometimes a bit more) between its newest and oldest points. It does not mean 10 consecutive seconds: if the data arrives at irregular intervals, there may be gaps.
There is no data expiration relative to the current timestamp, and you cannot delete old data points (at least for now).
Archive policy and granularity therefore depend entirely on your use case. You can define several archive policies depending on how the data will be used. A typical low-granularity use case might be:
3,600 points with 1-second granularity = 1 hour
1,440 points with 1-minute granularity = 24 hours
720 points with 1-hour granularity = 30 days
365 points with 1-day granularity = 1 year
This represents 6,125 points per aggregation method, i.e. about 48 KiB at 8 bytes per point. If you use the 8 standard aggregation methods, your metric will consume up to 8 × 48 KiB = 384 KiB of disk space.
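Using the worst-case 8-bytes-per-point figure from the storage-planning section, that policy's footprint can be checked directly:

```python
# The example policy above, as (points, granularity-in-seconds) pairs.
policy = [(3600, 1), (1440, 60), (720, 3600), (365, 86400)]

total_points = sum(points for points, _ in policy)
size_kib = total_points * 8 / 1024   # 8 bytes per point, worst case
# ~48 KiB per aggregation method; 8 methods come to a few hundred KiB
# per metric.
```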

Default archive policies
By default, 3 archive policies are created, using the default aggregation methods (listed in default_aggregation_methods, i.e. mean, min, max, sum, std, count):
low (maximum estimated size per metric: 5 KiB)
5-minute granularity over 1 hour, 1-hour granularity over 1 day, 1-day granularity over 1 month
medium (maximum estimated size per metric: 139 KiB)
1-minute granularity over 1 day, 1-hour granularity over 1 week, 1-day granularity over 1 year
high (maximum estimated size per metric: 1,578 KiB)
1-second granularity over 1 hour, 1-minute granularity over 1 week, 1-hour granularity over 1 year
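If none of the built-in policies fit, a custom one can be created through the REST API. The payload below mirrors the "typical use case" listed earlier (field names follow the Gnocchi REST docs; the policy name is invented, and granularities are written as seconds):

```python
import json

# Archive-policy creation payload for POST {base}/v1/archive_policy.
policy = {
    "name": "one-year-mixed",
    "aggregation_methods": ["mean", "min", "max"],
    "definition": [
        {"granularity": 1,     "points": 3600},  # 1 hour of seconds
        {"granularity": 60,    "points": 1440},  # 24 hours of minutes
        {"granularity": 3600,  "points": 720},   # 30 days of hours
        {"granularity": 86400, "points": 365},   # 1 year of days
    ],
}
body = json.dumps(policy)
# Metrics then reference the policy by name when they are created.
```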

