Exporters are an important part of a Prometheus-based monitoring system: they do the actual work of collecting metrics. The official exporter list already covers most common system metrics, such as node_exporter for machine performance and snmp_exporter for network equipment. With these existing exporters, very little configuration is needed to get thorough metric collection.
Sometimes we need metrics tied to business logic that the common exporters cannot provide, for example end-to-end monitoring of DNS resolution. Knowing how to write an exporter is therefore important for business monitoring and is one step toward a complete monitoring system. The rest of this article shows how to write one. The examples are written in Go; official client libraries also exist for Python, Java, and other languages, and the collection approach is very similar.
Build the Environment
First make sure that Go (version 1.7 or later) is installed on the machine and that GOPATH is set up. Then we can start writing code. Here is a simple exporter.
Download the Prometheus client package:
go get github.com/prometheus/client_golang/prometheus/promhttp
The main function:
package main
import (
	"log"
	"net/http"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)
func main() {
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":8080", nil))
}
This code registers a single path with the net/http package and passes promhttp.Handler() from the client_golang library as its handler. That is enough to serve metrics: two lines of code make an exporter. Internally it uses a default collector that gathers information about the current Go runtime through NewGoCollector, such as stack usage and goroutine counts. The full list of metrics can be viewed at http://localhost:8080/metrics.
The code above only exposes the default collector. The interface call hides most of the implementation details, which does not help with further development, so we need to understand a few basic concepts before implementing custom monitoring.
Metric Types
Prometheus mainly uses four metric types:
- Counter (cumulative metric)
- Gauge (measured value)
- Summary (summary)
- Histogram (histogram)
A counter is a cumulative metric whose value only increases over time, such as the total number of tasks a program has completed or the total number of errors that have occurred. Traffic data collected from switches over SNMP is commonly of this type too, representing continuously accumulating packet or byte counts.
A gauge represents a single numeric value that can go up or down, such as CPU usage, memory usage, or current disk space.
Histograms and summaries are used less often; both are based on sampling observations. Client libraries also differ in how well they support these two types, and some implement them only partially. They suit business questions such as how many requests per unit of time completed in under 300 ms, or what response time 95% of user requests stay below. A histogram or summary produces several series at once: _count is the total number of samples and _sum is the sum of the sampled values. For a histogram, _bucket is the number of samples that fall into the corresponding range.
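To make the three series concrete, here is a minimal sketch in plain Go of how they could be derived from raw observations. The histogram type, its fields, and the bucket bounds are hypothetical stand-ins written for this illustration; they are not the client_golang implementation.

```go
package main

import "fmt"

// histogram is a simplified, hypothetical stand-in for a Prometheus
// histogram. It only mirrors the series that get exposed.
type histogram struct {
	upperBounds []float64 // bucket upper bounds (the "le" label), ascending
	buckets     []uint64  // cumulative count of samples <= each bound
	count       uint64    // exposed as <name>_count
	sum         float64   // exposed as <name>_sum
}

func newHistogram(bounds ...float64) *histogram {
	return &histogram{upperBounds: bounds, buckets: make([]uint64, len(bounds))}
}

// observe records one sample. Buckets are cumulative, so a sample is counted
// in every bucket whose upper bound is greater than or equal to it.
func (h *histogram) observe(v float64) {
	for i, bound := range h.upperBounds {
		if v <= bound {
			h.buckets[i]++
		}
	}
	h.count++
	h.sum += v
}

func main() {
	h := newHistogram(0.1, 0.3, 1.0)
	for _, seconds := range []float64{0.05, 0.2, 0.25, 0.7, 2.5} {
		h.observe(seconds)
	}
	for i, bound := range h.upperBounds {
		fmt.Printf("bucket{le=\"%g\"} %d\n", bound, h.buckets[i])
	}
	fmt.Printf("count %d\n", h.count)
	fmt.Printf("sum %g\n", h.sum)
}
```

With these five samples the bucket counts are 1, 3, and 4, and the total count is 5. The 2.5 s sample exceeds every bound, so it appears only in the count and the sum. A real Prometheus histogram also exposes a final +Inf bucket that equals the count.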
The following query uses a metric defined as a histogram. It calculates, per job, the fraction of requests over the last five minutes that completed within 0.3 s.
sum(rate(http_request_duration_seconds_bucket{le="0.3"}[5m])) by (job)
/
sum(rate(http_request_duration_seconds_count[5m])) by (job)
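The arithmetic behind this query can be sketched in plain Go. The sample type, the function names, and the counter readings below are made up for illustration. The rate here is deliberately simplified: it takes only the first and last reading in the window and ignores counter resets and extrapolation, which the real rate() function handles.

```go
package main

import "fmt"

// sample is a hypothetical pair of counter readings taken at the start and
// end of the query window.
type sample struct {
	first, last float64
}

// rate returns the average per-second increase over the window.
// Simplification: counter resets and extrapolation are ignored.
func rate(s sample, windowSeconds float64) float64 {
	return (s.last - s.first) / windowSeconds
}

// fractionWithin divides the rate of the le="0.3" bucket by the rate of all
// requests, giving the share of requests served within 0.3 s.
func fractionWithin(bucket, total sample, windowSeconds float64) float64 {
	return rate(bucket, windowSeconds) / rate(total, windowSeconds)
}

func main() {
	const window = 300 // five minutes, in seconds

	bucket := sample{first: 1000, last: 1450} // ..._bucket{le="0.3"}
	total := sample{first: 1200, last: 1800}  // ..._count

	fmt.Printf("%.2f\n", fractionWithin(bucket, total, window)) // prints 0.75
}
```

In this made-up window, 450 of 600 requests finished within 0.3 s, so the ratio is 0.75.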
If you need to aggregate data, use a histogram. A histogram is also a good choice when you know the thresholds of interest in advance (such as 300 ms). If what you need is a value at a given percentage (for example, the 95th percentile), define the metric as a summary.
Here we need another dependency:
go get github.com/prometheus/client_golang/prometheus
The following defines two metrics, one of the gauge type and one of the counter type. They represent the CPU temperature and the number of disk failures respectively, classified according to the definitions above.
var (
	cpuTemp = prometheus.NewGauge(prometheus.GaugeOpts{
		Name: "cpu_temperature_celsius",
		Help: "Current temperature of the CPU.",
	})
	hdFailures = prometheus.NewCounterVec(
		prometheus.CounterOpts{
			Name: "hd_errors_total",
			Help: "Number of hard-disk errors.",
		},
		[]string{"device"},
	)
)
Labels can also be declared here. For the disk failure counter above, we pass a device name along with each update, which gives us a number of separate metrics, each holding the number of disk failures for one device.
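The idea of one series per label value can be sketched in plain Go. The counterVec type below is a hypothetical, simplified stand-in with a single label and no locking; the real CounterVec supports several labels and is safe for concurrent use.

```go
package main

import "fmt"

// counterVec is a simplified, hypothetical stand-in for a labeled counter:
// it keeps one independent count per value of a single label.
type counterVec struct {
	counts map[string]float64
}

func newCounterVec() *counterVec {
	return &counterVec{counts: make(map[string]float64)}
}

// inc adds one to the series identified by the given label value.
func (v *counterVec) inc(device string) {
	v.counts[device]++
}

func main() {
	failures := newCounterVec()
	failures.inc("/dev/sda")
	failures.inc("/dev/sda")
	failures.inc("/dev/sdb")

	// Each device has its own count.
	fmt.Println(failures.counts["/dev/sda"], failures.counts["/dev/sdb"]) // prints 2 1
}
```

Each distinct label value creates a new series, so labels work best for values drawn from a small, bounded set such as device names.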
Registering Metrics
func init() {
	// Metrics have to be registered to be exposed:
	prometheus.MustRegister(cpuTemp)
	prometheus.MustRegister(hdFailures)
}
prometheus.MustRegister registers the metrics directly with the default registry. As in the example above, the default registry needs no additional code to expose them. After registration the metrics can be used anywhere in the program. Here we use the methods provided by the metrics defined earlier (Set and With().Inc) to change their values.
func main() {
	cpuTemp.Set(65.3)
	hdFailures.With(prometheus.Labels{"device": "/dev/sda"}).Inc()
	// The Handler function provides a default handler to expose metrics
	// via an HTTP server. "/metrics" is the usual endpoint for that.
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":8080", nil))
}
The With function takes a value for the previously declared label "device", so the generated metrics look like this:
cpu_temperature_celsius 65.3
hd_errors_total{device="/dev/sda"} 1
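The shape of these lines, a metric name, an optional set of label pairs in braces, and a value, can be reproduced with a small helper. This is only a sketch: formatSample is a hypothetical function, not part of client_golang, and it relies on Go's %q quoting, which matches the Prometheus escaping rules only for simple label values like the ones here.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// formatSample renders one sample in the style of the Prometheus text
// format: name{label="value",...} value. Hypothetical helper for
// illustration only.
func formatSample(name string, labels map[string]string, value float64) string {
	if len(labels) == 0 {
		return fmt.Sprintf("%s %g", name, value)
	}
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys) // map order is random, so sort for stable output

	pairs := make([]string, 0, len(keys))
	for _, k := range keys {
		pairs = append(pairs, fmt.Sprintf("%s=%q", k, labels[k]))
	}
	return fmt.Sprintf("%s{%s} %g", name, strings.Join(pairs, ","), value)
}

func main() {
	fmt.Println(formatSample("cpu_temperature_celsius", nil, 65.3))
	fmt.Println(formatSample("hd_errors_total", map[string]string{"device": "/dev/sda"}, 1))
}
```

Running it prints the same two lines as above. In a real exporter the client library produces this output, so a helper like this is useful only for understanding the format.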
Of course, setting the values in main like this is a problem: each metric changes only once and does not change the next time the data is collected. What we want is for the program to fetch fresh values automatically on every collection and return them over HTTP.
Counter Data Collection Example
The following example collects counter-type data. It implements a custom struct that satisfies the Collector interface and registers it manually, so that collection runs automatically on every query.
Let's first look at the definition of the Collector interface:
type Collector interface {
	// Describe sends the descriptors of all possible metrics to the channel.
	// New descriptors may be added while the program runs to collect new metrics.
	// Duplicate descriptors are ignored. Two different collectors must not
	// send the same descriptor.
	Describe(chan<- *Desc)
	// Collect is called by the Prometheus registry to do the actual collection
	// and passes the collected metrics back through the channel.
	// The metrics must match the descriptors sent by Describe. Collect may be
	// called concurrently, so it must be safe for concurrent use.
	Collect(chan<- Metric)
}
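The channel-based design may be easier to follow with a toy version. The sketch below uses hypothetical stand-in types (metric, collector, constCollector, gather) in plain Go. It only illustrates the pattern of collectors sending on a channel while a caller drains it; it is not how the client_golang registry is actually written.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// metric and collector are simplified, hypothetical stand-ins for
// prometheus.Metric and prometheus.Collector.
type metric struct {
	name  string
	value float64
}

type collector interface {
	collect(ch chan<- metric)
}

// constCollector sends one fixed metric each time it is asked to collect.
type constCollector struct{ m metric }

func (c constCollector) collect(ch chan<- metric) { ch <- c.m }

// gather runs every collector in its own goroutine and drains the shared
// channel, roughly what a registry does on each scrape.
func gather(collectors ...collector) []metric {
	ch := make(chan metric)
	var wg sync.WaitGroup
	for _, c := range collectors {
		wg.Add(1)
		go func(c collector) {
			defer wg.Done()
			c.collect(ch)
		}(c)
	}
	// Close the channel once all collectors are done so the loop below ends.
	go func() {
		wg.Wait()
		close(ch)
	}()

	var out []metric
	for m := range ch {
		out = append(out, m)
	}
	// Arrival order depends on scheduling, so sort for stable output.
	sort.Slice(out, func(i, j int) bool { return out[i].name < out[j].name })
	return out
}

func main() {
	for _, m := range gather(
		constCollector{metric{"b_total", 2}},
		constCollector{metric{"a_total", 1}},
	) {
		fmt.Println(m.name, m.value)
	}
}
```

Because collectors run concurrently here, their collect methods must be safe for concurrent use, which is the same requirement the real interface places on Collect.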
With the interface understood, we can write our own implementation. First define a struct that collects metrics for a cluster. Each cluster has its own Zone, which holds the cluster's name. The other two fields hold the descriptors of the metrics to collect.
type ClusterManager struct {
	Zone         string
	OOMCountDesc *prometheus.Desc
	RAMUsageDesc *prometheus.Desc
}
The collection work itself lives in the ReallyExpensiveAssessmentOfTheSystemState function. Each call returns data keyed by host name; the two return values are the OOM error count and the RAM usage.
func (c *ClusterManager) ReallyExpensiveAssessmentOfTheSystemState() (
	oomCountByHost map[string]int, ramUsageByHost map[string]float64,
) {
	// The upper bounds 1000 and 100 are assumed here; they are consistent
	// with the sample output shown at the end of this article.
	oomCountByHost = map[string]int{
		"foo.example.org": int(rand.Int31n(1000)),
		"bar.example.org": int(rand.Int31n(1000)),
	}
	ramUsageByHost = map[string]float64{
		"foo.example.org": rand.Float64() * 100,
		"bar.example.org": rand.Float64() * 100,
	}
	return
}
Implement the Describe method, which sends the metric descriptors to the channel:
// Describe simply sends the Descs in the struct to the channel.
func (c *ClusterManager) Describe(ch chan<- *prometheus.Desc) {
	ch <- c.OOMCountDesc
	ch <- c.RAMUsageDesc
}
The Collect method runs the fetch function and sends the returned data to the channel. Each value is sent together with its original descriptor and its metric type (one counter and one gauge).
func (c *ClusterManager) Collect(ch chan<- prometheus.Metric) {
	oomCountByHost, ramUsageByHost := c.ReallyExpensiveAssessmentOfTheSystemState()
	for host, oomCount := range oomCountByHost {
		ch <- prometheus.MustNewConstMetric(
			c.OOMCountDesc,
			prometheus.CounterValue,
			float64(oomCount),
			host,
		)
	}
	for host, ramUsage := range ramUsageByHost {
		ch <- prometheus.MustNewConstMetric(
			c.RAMUsageDesc,
			prometheus.GaugeValue,
			ramUsage,
			host,
		)
	}
}
Create the struct and its descriptors. The first argument of NewDesc is the metric name; the second is the help text, which is displayed as a comment above the metric; the third is the list of variable label names; the fourth is the set of constant labels.
func NewClusterManager(zone string) *ClusterManager {
	return &ClusterManager{
		Zone: zone,
		OOMCountDesc: prometheus.NewDesc(
			"clustermanager_oom_crashes_total",
			"Number of OOM crashes.",
			[]string{"host"},
			prometheus.Labels{"zone": zone},
		),
		RAMUsageDesc: prometheus.NewDesc(
			"clustermanager_ram_usage_bytes",
			"RAM usage as reported to the cluster manager.",
			[]string{"host"},
			prometheus.Labels{"zone": zone},
		),
	}
}
Running the Main Program
func main() {
	workerDB := NewClusterManager("db")
	workerCA := NewClusterManager("ca")
	// Since we are dealing with custom Collector implementations, it might
	// be a good idea to try it out with a pedantic registry.
	reg := prometheus.NewPedanticRegistry()
	reg.MustRegister(workerDB)
	reg.MustRegister(workerCA)
}
If we run the code above as is, we get no metrics: the program exits immediately, and we have not defined an HTTP endpoint to expose the data. We therefore need an http.Handler to serve HTTP requests while the program runs.
Adding the following code to the main function exposes the data over HTTP:
	// log here is assumed to be github.com/prometheus/common/log, which
	// provides NewErrorLogger, Infoln and Errorf; the import is not shown
	// in the original.
	gatherers := prometheus.Gatherers{
		prometheus.DefaultGatherer,
		reg,
	}
	h := promhttp.HandlerFor(gatherers,
		promhttp.HandlerOpts{
			ErrorLog:      log.NewErrorLogger(),
			ErrorHandling: promhttp.ContinueOnError,
		})
	http.HandleFunc("/metrics", func(w http.ResponseWriter, r *http.Request) {
		h.ServeHTTP(w, r)
	})
	log.Infoln("Start server at :8080")
	if err := http.ListenAndServe(":8080", nil); err != nil {
		log.Errorf("Error occurred when starting server: %v", err)
		os.Exit(1)
	}
prometheus.Gatherers defines a collection of gatherers, merging several sources of collected data into one result set. Here we include the default DefaultGatherer, so the output also contains the Go runtime metrics. We also include reg, the registry object created earlier, which collects data from our custom collectors.
promhttp.HandlerFor() takes the gatherers object and returns an http.Handler, whose ServeHTTP method takes over the HTTP request and returns the response. promhttp.HandlerOpts configures the collection process; with ContinueOnError, collection of the remaining data continues if an error occurs.
Refresh the browser a few times to see the latest metric values:
clustermanager_oom_crashes_total{host="bar.example.org",zone="ca"} 364
clustermanager_oom_crashes_total{host="bar.example.org",zone="db"}
clustermanager_oom_crashes_total{host="foo.example.org",zone="ca"} 844
clustermanager_oom_crashes_total{host="foo.example.org",zone="db"} 801
# HELP clustermanager_ram_usage_bytes RAM usage as reported to the cluster manager.
# TYPE clustermanager_ram_usage_bytes gauge
clustermanager_ram_usage_bytes{host="bar.example.org",zone="ca"} 10.738111282075208
clustermanager_ram_usage_bytes{host="bar.example.org",zone="db"} 19.003276633920805
clustermanager_ram_usage_bytes{host="foo.example.org",zone="ca"} 79.72085409108028
clustermanager_ram_usage_bytes{host="foo.example.org",zone="db"} 13.041384617379178
Each refresh returns different data, which simulates a collector whose values keep changing. The specific metrics and collection functions will of course need to be adapted to actual business needs.