Write Prometheus exporter using Golang

Last Update:2018-07-26 Source: Internet

Author: User

An exporter is an important part of a monitoring system built on Prometheus: it does the work of collecting data. The official exporter list already covers most common system metrics, such as node_exporter for machine performance and snmp_exporter for network equipment. These existing exporters need very little configuration to provide complete metric collection.

Sometimes we need metrics tied to business logic that the common exporters do not provide, for example overall monitoring of DNS resolution. Knowing how to write an exporter is therefore important for business monitoring and is a step toward a complete monitoring system. The rest of this article shows how to write one in Go. Official client libraries also exist for Python, Java, and other languages, and the collection approach is very similar.

Build the Environment

First make sure that Go (version 1.7 or later) is installed on the machine and that GOPATH is set. Then we can start writing code. Here is a simple exporter.

Download the corresponding Prometheus package:

go get github.com/prometheus/client_golang/prometheus/promhttp

The program's main function:

package main

import (
    "log"
    "net/http"

    "github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
    // Serve the metrics of the default registry on /metrics.
    http.Handle("/metrics", promhttp.Handler())
    log.Fatal(http.ListenAndServe(":8080", nil))
}

This code registers a single path with the net/http package and passes promhttp.Handler() from the client_golang library as the handler. That is enough to serve metrics: two lines of code make an exporter. Internally it uses the default collectors, which gather information about the current Go runtime through NewGoCollector, such as stack usage and goroutine counts. The full list of metrics is visible at http://localhost:8080/metrics.

The code above only exposes the default collectors and hides most implementation details behind one call, which does not help with further development. We need a few basic concepts before implementing custom monitoring.

Metric types

Prometheus has four main metric types:
- Counter (cumulative metric)
- Gauge (point-in-time measurement)
- Summary (sampled quantiles)
- Histogram (bucketed distribution)

A counter is a cumulative metric whose value only increases over time, such as the total number of tasks a program has completed or the total number of runtime errors. Traffic data collected from switches over SNMP is commonly of this type too: it is a continuously growing total of packets or bytes transferred.

A gauge is a single numeric value that can go up or down, such as CPU usage, memory usage, or current disk capacity.

Histograms and summaries are used less often. Both are based on sampling observations. Client libraries differ in how well they support these two types, and some implement them only partially. They suit business questions such as how many requests in a time window completed in under 300 ms, or what response time 95% of user queries stay under. A histogram or summary produces several series at once: _count is the total number of samples and _sum is the sum of the sampled values. For histograms, each _bucket series counts the samples at or below that bucket's upper bound.
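The relationship between the _bucket, _sum, and _count series can be sketched with the standard library alone. This is a simplified model, not the client_golang implementation; the histogram type and its methods are hypothetical names chosen for illustration.

```go
package main

import "fmt"

// histogram is a simplified, hypothetical model of a Prometheus histogram.
type histogram struct {
	upperBounds []float64 // sorted bucket upper bounds (the "le" values)
	buckets     []uint64  // cumulative counts, one per upper bound
	count       uint64    // exposed as <name>_count
	sum         float64   // exposed as <name>_sum
}

func newHistogram(upperBounds ...float64) *histogram {
	return &histogram{
		upperBounds: upperBounds,
		buckets:     make([]uint64, len(upperBounds)),
	}
}

// observe records one sample. Buckets are cumulative: a sample is counted in
// every bucket whose upper bound is greater than or equal to the value.
func (h *histogram) observe(v float64) {
	for i, le := range h.upperBounds {
		if v <= le {
			h.buckets[i]++
		}
	}
	h.count++
	h.sum += v
}

// write prints the series in the style of the text exposition format.
func (h *histogram) write(name string) {
	for i, le := range h.upperBounds {
		fmt.Printf("%s_bucket{le=\"%g\"} %d\n", name, le, h.buckets[i])
	}
	// The implicit +Inf bucket always equals the total sample count.
	fmt.Printf("%s_bucket{le=\"+Inf\"} %d\n", name, h.count)
	fmt.Printf("%s_sum %g\n", name, h.sum)
	fmt.Printf("%s_count %d\n", name, h.count)
}

func main() {
	h := newHistogram(0.3, 1.0)
	for _, v := range []float64{0.1, 0.25, 0.5, 2.0} {
		h.observe(v)
	}
	h.write("http_request_duration_seconds")
}
```

With these four observations, two fall in the 0.3 bucket, three in the 1.0 bucket, and all four in the +Inf bucket, which is why the bucket counts never decrease as the bound grows.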

The following query uses a histogram metric to calculate, per job, the fraction of requests over the last five minutes that completed within 0.3 s.

  sum(rate(http_request_duration_seconds_bucket{le="0.3"}[5m])) by (job)
/
  sum(rate(http_request_duration_seconds_count[5m])) by (job)
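The arithmetic behind this query can be sketched in Go. This is a rough illustration only: the real rate() function also handles counter resets and extrapolation, which this sketch does not attempt. The sample type, its fields, and fractionWithin are hypothetical names.

```go
package main

import "fmt"

// sample is one scrape of the two series used by the query.
// The type and field names are hypothetical.
type sample struct {
	seconds  float64 // scrape time in seconds
	bucketLE float64 // http_request_duration_seconds_bucket{le="0.3"}
	count    float64 // http_request_duration_seconds_count
}

// fractionWithin returns the share of requests between two scrapes that
// finished within the bucket bound. ok is false when the window is empty,
// no requests arrived, or a counter went backwards.
func fractionWithin(first, last sample) (fraction float64, ok bool) {
	window := last.seconds - first.seconds
	if window <= 0 {
		return 0, false
	}
	bucketRate := (last.bucketLE - first.bucketLE) / window
	countRate := (last.count - first.count) / window
	if countRate <= 0 || bucketRate < 0 {
		return 0, false
	}
	return bucketRate / countRate, true
}

func main() {
	first := sample{seconds: 0, bucketLE: 900, count: 1000}
	last := sample{seconds: 300, bucketLE: 1140, count: 1300}
	if f, ok := fractionWithin(first, last); ok {
		fmt.Printf("%.2f of requests finished within 0.3s\n", f)
	}
}
```

In the example, 240 of the 300 requests seen during the five-minute window landed in the 0.3 bucket, so the fraction is 0.80.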

If you need to aggregate data, use a histogram. A histogram also fits when you know in advance the value boundaries you care about (such as 300 ms). If you only care about a percentile (for example, the 95th), use a summary to define the metric.

Here we need to bring in another dependency:

go get github.com/prometheus/client_golang/prometheus

The following defines two metrics, one of gauge type and one of counter type. Following the classification above, they represent the CPU temperature and the number of disk failures, respectively.

var (
    cpuTemp = prometheus.NewGauge(prometheus.GaugeOpts{
        Name: "cpu_temperature_celsius",
        Help: "Current temperature of the CPU.",
    })
    hdFailures = prometheus.NewCounterVec(
        prometheus.CounterOpts{
            Name: "hd_errors_total",
            Help: "Number of hard-disk errors.",
        },
        []string{"device"},
    )
)

Labels can also be declared here. For the disk failure counter above we pass a device name along with each update, which yields several distinct series, each holding the failure count for one device.

Registering metrics

func init() {
    // Metrics have to be registered to be exposed:
    prometheus.MustRegister(cpuTemp)
    prometheus.MustRegister(hdFailures)
}

prometheus.MustRegister registers the metrics with the default registry. As in the first example, metrics in the default registry need no additional code to be exposed. Once registered, the metrics can be used anywhere in the program. Here we use the methods provided by the metrics defined earlier (Set and With().Inc) to change their values:

func main() {
    cpuTemp.Set(65.3)
    hdFailures.With(prometheus.Labels{"device": "/dev/sda"}).Inc()

    // The Handler function provides a default handler to expose metrics
    // via an HTTP server. "/metrics" is the usual endpoint for that.
    http.Handle("/metrics", promhttp.Handler())
    log.Fatal(http.ListenAndServe(":8080", nil))
}

The With function supplies the value for the label "device" defined earlier, so the generated metrics look like this:

cpu_temperature_celsius 65.3
hd_errors_total{device="/dev/sda"} 1
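How a metric name, its labels, and a value turn into one such line can be sketched with the standard library. This is not the client_golang encoder; formatSample is a hypothetical helper that covers only plain samples, without HELP or TYPE comments.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// escaper escapes the characters that need a backslash inside a label value:
// backslash, double quote, and line feed.
var escaper = strings.NewReplacer(`\`, `\\`, `"`, `\"`, "\n", `\n`)

// formatSample renders one sample line. Labels are sorted by name so that
// the output is stable across calls.
func formatSample(name string, labels map[string]string, value float64) string {
	names := make([]string, 0, len(labels))
	for n := range labels {
		names = append(names, n)
	}
	sort.Strings(names)

	var b strings.Builder
	b.WriteString(name)
	if len(names) > 0 {
		b.WriteByte('{')
		for i, n := range names {
			if i > 0 {
				b.WriteByte(',')
			}
			fmt.Fprintf(&b, `%s="%s"`, n, escaper.Replace(labels[n]))
		}
		b.WriteByte('}')
	}
	b.WriteByte(' ')
	b.WriteString(strconv.FormatFloat(value, 'g', -1, 64))
	return b.String()
}

func main() {
	fmt.Println(formatSample("cpu_temperature_celsius", nil, 65.3))
	fmt.Println(formatSample("hd_errors_total", map[string]string{"device": "/dev/sda"}, 1))
}
```

Running the sketch prints the same two lines as the exporter output above.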

Of course, setting the values in main like this is a problem: each metric changes only once and does not change on later scrapes. What we want is for the program to fetch fresh values automatically on every scrape and return them over HTTP.
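The idea of reading the value at scrape time rather than at startup can be sketched with only net/http. This is an illustration of the principle, not the client_golang mechanism. scrapeTimeHandler and scrape are hypothetical helpers, and the changing temperature is simulated.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// scrapeTimeHandler returns a handler that calls read on every request,
// so each scrape reports a fresh value instead of one fixed at startup.
func scrapeTimeHandler(name string, read func() float64) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "text/plain; version=0.0.4")
		fmt.Fprintf(w, "%s %g\n", name, read())
	})
}

// scrape performs one in-memory request against h and returns the body.
func scrape(h http.Handler) string {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/metrics", nil))
	return strings.TrimSpace(rec.Body.String())
}

func main() {
	temp := 65.0
	h := scrapeTimeHandler("cpu_temperature_celsius", func() float64 {
		temp += 0.5 // simulated sensor drift, purely for illustration
		return temp
	})
	fmt.Println(scrape(h)) // first scrape
	fmt.Println(scrape(h)) // second scrape sees a new value
}
```

Each call to scrape triggers a new read, so two consecutive scrapes return different values. A custom collector achieves the same effect inside the Prometheus client library.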

Counter data collection example

The following example collects counter-type data. It implements a custom struct that satisfies the Collector interface and registers that struct manually, so that the collection runs automatically on every scrape.

Let's first look at the Collector interface:

type Collector interface {
    // Describe sends the descriptors of all metrics this collector can
    // possibly produce to the channel. Duplicate descriptors sent by the
    // same collector are ignored, but two different collectors must not
    // send the same descriptor.
    Describe(chan<- *Desc)

    // Collect is called by the Prometheus registry to do the actual
    // collection work. It sends each collected metric to the channel.
    // Every metric must use one of the descriptors sent by Describe.
    // Collect may be called concurrently, so it must be safe for
    // concurrent use.
    Collect(chan<- Metric)
}
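How a registry might drive this channel-based interface can be sketched with the standard library. This is a deliberately simplified model and not how client_golang's registry is implemented. The sample, collector, constCollector, and gather names are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// sample and collector are simplified, hypothetical stand-ins for
// prometheus.Metric and prometheus.Collector.
type sample struct {
	name  string
	value float64
}

type collector interface {
	collect(ch chan<- sample)
}

// constCollector emits the same fixed samples on every collection.
type constCollector []sample

func (c constCollector) collect(ch chan<- sample) {
	for _, s := range c {
		ch <- s
	}
}

// gather runs every collector in its own goroutine and drains the shared
// channel until all of them have returned.
func gather(collectors ...collector) []sample {
	ch := make(chan sample)
	var wg sync.WaitGroup
	for _, c := range collectors {
		wg.Add(1)
		go func(c collector) {
			defer wg.Done()
			c.collect(ch)
		}(c)
	}
	// Close the channel once every collector is done, which ends the
	// range loop below.
	go func() {
		wg.Wait()
		close(ch)
	}()

	var out []sample
	for s := range ch {
		out = append(out, s)
	}
	return out
}

func main() {
	samples := gather(
		constCollector{{"oom_crashes_total", 3}},
		constCollector{{"ram_usage_bytes", 1024}, {"ram_usage_bytes_peak", 2048}},
	)
	fmt.Println(len(samples), "samples gathered")
}
```

Because the collectors run concurrently, the order of the gathered samples is not fixed. This is also why a collector's collect method has to be safe for concurrent use.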

With the interface in mind, we can write our own implementation. First define the struct, a metric collector for one cluster. Each cluster has its own Zone, which holds the name of the cluster. The other two fields hold the descriptors of the collected metrics.

type ClusterManager struct {
    Zone         string
    OOMCountDesc *prometheus.Desc
    RAMUsageDesc *prometheus.Desc
}

The collection work itself goes in the ReallyExpensiveAssessmentOfTheSystemState function. Each call returns two maps keyed by host name: the OOM crash count and the RAM usage.

func (c *ClusterManager) ReallyExpensiveAssessmentOfTheSystemState() (
    oomCountByHost map[string]int, ramUsageByHost map[string]float64,
) {
    // The numeric bounds were lost in the source text. 1000 and 100 are
    // assumed here; they are consistent with the sample output shown later.
    oomCountByHost = map[string]int{
        "foo.example.org": int(rand.Int31n(1000)),
        "bar.example.org": int(rand.Int31n(1000)),
    }
    ramUsageByHost = map[string]float64{
        "foo.example.org": rand.Float64() * 100,
        "bar.example.org": rand.Float64() * 100,
    }
    return
}

Implement the Describe method, which sends the metric descriptors to the channel:

// Describe simply sends the two Descs in the struct to the channel.
func (c *ClusterManager) Describe(ch chan<- *prometheus.Desc) {
    ch <- c.OOMCountDesc
    ch <- c.RAMUsageDesc
}

The Collect method calls the function above and sends the returned data to the channel. Each value is sent together with its descriptor and its metric type (one counter and one gauge).

func (c *ClusterManager) Collect(ch chan<- prometheus.Metric) {
    oomCountByHost, ramUsageByHost := c.ReallyExpensiveAssessmentOfTheSystemState()
    for host, oomCount := range oomCountByHost {
        ch <- prometheus.MustNewConstMetric(
            c.OOMCountDesc,
            prometheus.CounterValue,
            float64(oomCount),
            host,
        )
    }
    for host, ramUsage := range ramUsageByHost {
        ch <- prometheus.MustNewConstMetric(
            c.RAMUsageDesc,
            prometheus.GaugeValue,
            ramUsage,
            host,
        )
    }
}

Create the struct and its descriptors. The first argument to NewDesc is the metric name, the second is the help text, which appears as a comment above the metric, the third is the list of variable label names, and the fourth is the set of constant labels.

func NewClusterManager(zone string) *ClusterManager {
    return &ClusterManager{
        Zone: zone,
        OOMCountDesc: prometheus.NewDesc(
            "clustermanager_oom_crashes_total",
            "Number of OOM crashes.",
            []string{"host"},
            prometheus.Labels{"zone": zone},
        ),
        RAMUsageDesc: prometheus.NewDesc(
            "clustermanager_ram_usage_bytes",
            "RAM usage as reported to the cluster manager.",
            []string{"host"},
            prometheus.Labels{"zone": zone},
        ),
    }
}
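The difference between the variable label names and the constant labels passed to NewDesc can be sketched as follows. The desc type and labelsFor method are hypothetical stand-ins for illustration, not part of client_golang.

```go
package main

import (
	"errors"
	"fmt"
)

// desc is a simplified, hypothetical stand-in for prometheus.Desc.
type desc struct {
	name           string
	variableLabels []string          // names only; values arrive with each metric
	constLabels    map[string]string // fixed when the descriptor is created
}

// labelsFor merges the constant labels with the values supplied for the
// variable labels, in the order the names were declared.
func (d desc) labelsFor(values ...string) (map[string]string, error) {
	if len(values) != len(d.variableLabels) {
		return nil, errors.New("label value count does not match variable labels")
	}
	out := make(map[string]string, len(values)+len(d.constLabels))
	for k, v := range d.constLabels {
		out[k] = v
	}
	for i, name := range d.variableLabels {
		out[name] = values[i]
	}
	return out, nil
}

func main() {
	d := desc{
		name:           "clustermanager_oom_crashes_total",
		variableLabels: []string{"host"},
		constLabels:    map[string]string{"zone": "db"},
	}
	labels, err := d.labelsFor("foo.example.org")
	fmt.Println(labels, err)
}
```

The constant label "zone" is the same for every metric built from the descriptor, while the value for "host" is supplied each time a metric is created, as the host argument is in Collect.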

Running the main program

func main() {
    workerDB := NewClusterManager("db")
    workerCA := NewClusterManager("ca")

    // Since we are dealing with custom Collector implementations, it might
    // be a good idea to try it out with a pedantic registry.
    reg := prometheus.NewPedanticRegistry()
    reg.MustRegister(workerDB)
    reg.MustRegister(workerCA)
}

If we run the code above as is, we get no metrics: the program exits immediately, and we have not defined an HTTP endpoint to expose the data. We need an HTTP handler to serve the scrape requests.

Adding the following code to the main function exposes the data over HTTP:

    // Note: this snippet assumes that log is github.com/prometheus/common/log
    // rather than the standard library log package.
    gatherers := prometheus.Gatherers{
        prometheus.DefaultGatherer,
        reg,
    }

    h := promhttp.HandlerFor(gatherers,
        promhttp.HandlerOpts{
            ErrorLog:      log.NewErrorLogger(),
            ErrorHandling: promhttp.ContinueOnError,
        })
    http.HandleFunc("/metrics", func(w http.ResponseWriter, r *http.Request) {
        h.ServeHTTP(w, r)
    })
    log.Infoln("Start server at :8080")
    if err := http.ListenAndServe(":8080", nil); err != nil {
        log.Errorf("Error occurred when starting server: %v", err)
        os.Exit(1)
    }

prometheus.Gatherers defines a collection of gatherers, so several sources of collected data can be merged into one result set. Here we include the default DefaultGatherer, so the output also contains the Go runtime metrics. reg is the registry created earlier, which collects data from our custom collectors.

promhttp.HandlerFor() takes the gatherers object and returns an http.Handler, whose ServeHTTP method handles the HTTP request and writes the response. promhttp.HandlerOpts configures the handler; with ContinueOnError it keeps serving the remaining metrics if an error occurs.

Refresh the browser a few times to see the latest metric values:

clustermanager_oom_crashes_total{host="bar.example.org",zone="ca"} 364
clustermanager_oom_crashes_total{host="bar.example.org",zone="db"}
clustermanager_oom_crashes_total{host="foo.example.org",zone="ca"} 844
clustermanager_oom_crashes_total{host="foo.example.org",zone="db"} 801
# HELP clustermanager_ram_usage_bytes RAM usage as reported to the cluster manager.
# TYPE clustermanager_ram_usage_bytes gauge
clustermanager_ram_usage_bytes{host="bar.example.org",zone="ca"} 10.738111282075208
clustermanager_ram_usage_bytes{host="bar.example.org",zone="db"} 19.003276633920805
clustermanager_ram_usage_bytes{host="foo.example.org",zone="ca"} 79.72085409108028
clustermanager_ram_usage_bytes{host="foo.example.org",zone="db"} 13.041384617379178

Every refresh returns different data, which simulates a collector whose values keep changing. The actual metrics and collection functions still need to be adapted to real business needs.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
