Objective
For Kubernetes resource auditing and billing, containers differ greatly from virtual machines: metering containers is considerably harder to implement than metering VMs.
Resource metrics can be collected with Heapster or Prometheus. As a previous article introduced, Prometheus has two problems: storage bottlenecks, and a tendency to OOM on queries over large data volumes. So I chose Heapster. In addition, Heapster implements many aggregators and calculators internally and does a lot of work at the aggregation layer, whereas with Prometheus you have to do the aggregation at query time.
Heapster supports many metric outputs, called sinks. The currently supported sinks include the following:
I prefer the ClickHouse database; previous articles have already covered ClickHouse in detail.
So this article mainly describes how to add a ClickHouse sink to Heapster.
Code Analysis and implementation
Looking at the code, adding a sink is quite simple. It is a typical factory design pattern: implement the Name, Stop, and ExportData interface methods, then provide an initialization function for the factory to call.
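For orientation, the interface a sink must satisfy is small. The sketch below is self-contained and simplified: the interface shape (Name/ExportData/Stop) follows the description above, but the stub DataBatch and empty method bodies are mine, not Heapster's actual core package.

```go
package main

import "fmt"

// DataBatch is a stand-in for Heapster's core.DataBatch (simplified here).
type DataBatch struct{}

// DataSink mirrors the three methods a Heapster sink must implement,
// as described in the article.
type DataSink interface {
	Name() string
	ExportData(batch *DataBatch)
	Stop()
}

// clickhouseSink is a skeleton; the real one also holds the config,
// a DB client, a mutex, a WaitGroup, and the concurrency channel.
type clickhouseSink struct{}

func (s *clickhouseSink) Name() string                { return "clickhouse" }
func (s *clickhouseSink) ExportData(batch *DataBatch) { /* convert + batch insert */ }
func (s *clickhouseSink) Stop()                       { /* release resources */ }

func main() {
	var sink DataSink = &clickhouseSink{} // compile-time interface check
	fmt.Println(sink.Name())
}
```

The assignment to a `DataSink` variable doubles as a compile-time check that the struct really satisfies the interface.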
Initialization Method NewClickhouseSink
Specific code:
config, err := clickhouse_common.BuildConfig(uri)
if err != nil {
	return nil, err
}

client, err := sql.Open("clickhouse", config.DSN)
if err != nil {
	glog.Errorf("connecting to clickhouse: %v", err)
	return nil, err
}

sink := &clickhouseSink{
	c:       *config,
	client:  client,
	conChan: make(chan struct{}, config.Concurrency),
}

glog.Infof("created clickhouse sink with options: host:%s user:%s db:%s", config.Host, config.UserName, config.Database)
return sink, nil
Basically, it parses the configuration and initializes the ClickHouse client.
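The config carries a DSN that is handed straight to sql.Open. As an illustration of how such a DSN might be assembled from individual fields, here is a small sketch; the buildDSN helper and the exact field set are my invention, not the fork's code, and the tcp:// query-string form follows the clickhouse-go v1 driver convention.

```go
package main

import (
	"fmt"
	"net/url"
)

// clickhouseConfig holds only the fields this sketch needs;
// the fork's config also carries BatchSize, Concurrency, etc.
type clickhouseConfig struct {
	Host     string
	UserName string
	Password string
	Database string
}

// buildDSN assembles a clickhouse-go (v1) style DSN such as
// tcp://host:9000?database=heapster&username=default
func buildDSN(c clickhouseConfig) string {
	q := url.Values{}
	q.Set("username", c.UserName)
	if c.Password != "" {
		q.Set("password", c.Password)
	}
	q.Set("database", c.Database)
	return fmt.Sprintf("tcp://%s?%s", c.Host, q.Encode())
}

func main() {
	c := clickhouseConfig{Host: "127.0.0.1:9000", UserName: "default", Database: "heapster"}
	fmt.Println(buildDSN(c))
	// → tcp://127.0.0.1:9000?database=heapster&username=default
}
```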
In the Build method in factory.go, add the initialization function just implemented:
func (this *sinkFactory) Build(uri flags.Uri) (core.DataSink, error) {
	switch uri.Key {
	case "elasticsearch":
		return elasticsearch.NewElasticSearchSink(&uri.Val)
	case "gcm":
		return gcm.CreateGCMSink(&uri.Val)
	case "stackdriver":
		return stackdriver.CreateStackdriverSink(&uri.Val)
	case "statsd":
		return statsd.NewStatsdSink(&uri.Val)
	case "graphite":
		return graphite.NewGraphiteSink(&uri.Val)
	case "hawkular":
		return hawkular.NewHawkularSink(&uri.Val)
	case "influxdb":
		return influxdb.CreateInfluxdbSink(&uri.Val)
	case "kafka":
		return kafka.NewKafkaSink(&uri.Val)
	case "librato":
		return librato.CreateLibratoSink(&uri.Val)
	case "log":
		return logsink.NewLogSink(), nil
	case "metric":
		return metricsink.NewMetricSink(140*time.Second, 15*time.Minute, []string{
			core.MetricCpuUsageRate.MetricDescriptor.Name,
			core.MetricMemoryUsage.MetricDescriptor.Name}), nil
	case "opentsdb":
		return opentsdb.CreateOpenTSDBSink(&uri.Val)
	case "wavefront":
		return wavefront.NewWavefrontSink(&uri.Val)
	case "riemann":
		return riemann.CreateRiemannSink(&uri.Val)
	case "honeycomb":
		return honeycomb.NewHoneycombSink(&uri.Val)
	case "clickhouse":
		return clickhouse.NewClickhouseSink(&uri.Val)
	default:
		return nil, fmt.Errorf("Sink not recognized: %s", uri.Key)
	}
}
Name and Stop
func (sink *clickhouseSink) Name() string {
	return "clickhouse"
}

func (tsdbSink *clickhouseSink) Stop() {
	// Do nothing
}
The Stop function is called when Heapster shuts down; it is where unmanaged resources are released.
ExportData
This is the core part.
func (sink *clickhouseSink) ExportData(dataBatch *core.DataBatch) {
	sink.Lock()
	defer sink.Unlock()

	if err := sink.client.Ping(); err != nil {
		glog.Warningf("Failed to ping clickhouse: %v", err)
		return
	}

	dataPoints := make([]point, 0, 0)
	for _, metricSet := range dataBatch.MetricSets {
		for metricName, metricValue := range metricSet.MetricValues {
			var value float64
			if core.ValueInt64 == metricValue.ValueType {
				value = float64(metricValue.IntValue)
			} else if core.ValueFloat == metricValue.ValueType {
				value = float64(metricValue.FloatValue)
			} else {
				continue
			}

			pt := point{
				name:    metricName,
				cluster: sink.c.ClusterName,
				val:     value,
				ts:      dataBatch.Timestamp,
			}

			for key, value := range metricSet.Labels {
				if _, exists := clickhouseBlacklistLabels[key]; !exists {
					if value != "" {
						if key == "labels" {
							lbs := strings.Split(value, ",")
							for _, lb := range lbs {
								ts := strings.Split(lb, ":")
								if len(ts) == 2 && ts[0] != "" && ts[1] != "" {
									pt.tags = append(pt.tags, fmt.Sprintf("%s=%s", ts[0], ts[1]))
								}
							}
						} else {
							pt.tags = append(pt.tags, fmt.Sprintf("%s=%s", key, value))
						}
					}
				}
			}

			dataPoints = append(dataPoints, pt)
			if len(dataPoints) >= sink.c.BatchSize {
				sink.concurrentSendData(dataPoints)
				dataPoints = make([]point, 0, 0)
			}
		}
	}

	if len(dataPoints) >= 0 {
		sink.concurrentSendData(dataPoints)
	}

	sink.wg.Wait()
}
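The point struct itself is not shown in the excerpt. Judging from the fields the ExportData code references, it is roughly the following shape; this is my inference, not the fork's actual definition.

```go
package main

import (
	"fmt"
	"time"
)

// point: inferred from the fields ExportData populates.
type point struct {
	name    string    // metric name, e.g. "cpu/usage_rate"
	cluster string    // cluster name from the sink config
	val     float64   // metric value, normalized to float64
	ts      time.Time // batch timestamp
	tags    []string  // "key=value" label pairs
}

func main() {
	pt := point{
		name:    "cpu/usage_rate",
		cluster: "c1",
		val:     42,
		ts:      time.Now(),
		tags:    []string{"pod_name=nginx"},
	}
	fmt.Printf("%s{%v} = %v\n", pt.name, pt.tags, pt.val)
}
```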
There are several places to be aware of:
- Data format conversion. You need to convert Heapster's DataBatch into the format you want to store. This is what makes it easy for the pipeline to support multiple outputs.
- Bulk writes. In general, bulk writing is an effective technique for handling large data volumes.
- Writing to the destination store concurrently, with the concurrency controlled by a configuration parameter. Goroutines are used here; the following code starts one goroutine per batch send.
func (sink *clickhouseSink) concurrentSendData(dataPoints []point) { sink.wg.Add(1) // use the channel to block until there's less than the maximum number of concurrent requests running sink.conChan <- struct{}{} go func(dataPoints []point) { sink.sendData(dataPoints) }(dataPoints)}
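The goroutine above acquires a slot by sending on the buffered channel; the matching release (reading from conChan and calling wg.Done) presumably happens inside sendData, which is not shown. The pattern itself, a buffered channel used as a counting semaphore plus a WaitGroup, can be demonstrated standalone; everything below is a toy, with a sleep standing in for the real batch insert.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// maxConcurrent launches `batches` fake sends limited by a buffered
// channel of size `limit`, and returns the observed high-water mark
// of simultaneous sends.
func maxConcurrent(batches, limit int) int64 {
	var wg sync.WaitGroup
	conChan := make(chan struct{}, limit) // counting semaphore, like sink.conChan
	var inFlight, peak int64

	for i := 0; i < batches; i++ {
		wg.Add(1)
		conChan <- struct{}{} // blocks once `limit` sends are in flight
		go func() {
			defer wg.Done()
			defer func() { <-conChan }() // release the slot (sendData's job in the sink)

			n := atomic.AddInt64(&inFlight, 1)
			for { // record the high-water mark
				p := atomic.LoadInt64(&peak)
				if n <= p || atomic.CompareAndSwapInt64(&peak, p, n) {
					break
				}
			}
			time.Sleep(5 * time.Millisecond) // stand-in for the batch insert
			atomic.AddInt64(&inFlight, -1)
		}()
	}
	wg.Wait()
	return peak
}

func main() {
	fmt.Println("peak concurrent sends:", maxConcurrent(8, 2))
}
```

Because the slot is acquired before the goroutine starts, the number of in-flight sends can never exceed the channel's capacity.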
Get configuration parameters
clickhouse.go mainly parses the configuration parameters, fills in default values for some of them, and validates the configuration.
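A minimal sketch of what that parsing could look like is below. The field names, query parameters, and default values here are illustrative, not necessarily the fork's; Heapster passes the part after `--sink=clickhouse:` to the sink as a URI.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// Illustrative defaults, not the fork's actual values.
const (
	defaultBatchSize   = 1000
	defaultConcurrency = 1
)

type clickhouseConfig struct {
	DSN         string
	Database    string
	BatchSize   int
	Concurrency int
}

// buildConfig parses the sink URI, applies defaults, and validates.
func buildConfig(uri string) (*clickhouseConfig, error) {
	u, err := url.Parse(uri)
	if err != nil {
		return nil, err
	}
	if u.Host == "" {
		return nil, fmt.Errorf("clickhouse host missing in %q", uri)
	}
	q := u.Query()

	c := &clickhouseConfig{
		DSN:         fmt.Sprintf("tcp://%s", u.Host),
		Database:    "heapster",
		BatchSize:   defaultBatchSize,
		Concurrency: defaultConcurrency,
	}
	if db := q.Get("database"); db != "" {
		c.Database = db
	}
	if bs := q.Get("batchsize"); bs != "" {
		n, err := strconv.Atoi(bs)
		if err != nil || n <= 0 {
			return nil, fmt.Errorf("invalid batchsize %q", bs)
		}
		c.BatchSize = n
	}
	if cc := q.Get("concurrency"); cc != "" {
		n, err := strconv.Atoi(cc)
		if err != nil || n <= 0 {
			return nil, fmt.Errorf("invalid concurrency %q", cc)
		}
		c.Concurrency = n
	}
	return c, nil
}

func main() {
	c, err := buildConfig("tcp://127.0.0.1:9000?database=metrics&batchsize=500")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", *c)
}
```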
Changes to Dockerfile
The original base image was scratch:
FROM scratch

COPY heapster eventer /
COPY ca-certificates.crt /etc/ssl/certs/

# nobody:nobody
USER 65534:65534

ENTRYPOINT ["/heapster"]
Because the timezone needed to be changed, I switched to an Alpine base image:
FROM alpine

RUN apk add -U tzdata
RUN ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime

COPY heapster eventer /
COPY ca-certificates.crt /etc/ssl/certs/
RUN chmod +x /heapster

ENTRYPOINT ["/heapster"]
It is actually possible to add the timezone while staying on scratch, at the cost of a few extra package-loading instructions and a larger image. Rather than that, I preferred to base it on Alpine, which I am more familiar with.
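For completeness, one way to stay on scratch is a multi-stage build that copies the zoneinfo file out of an intermediate Alpine stage. This is a sketch of the idea, not the fork's actual Dockerfile:

```dockerfile
# Intermediate stage only exists to fetch tzdata
FROM alpine AS tz
RUN apk add -U tzdata

FROM scratch
# Copy just the one zoneinfo file we need into the final image
COPY --from=tz /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
COPY heapster eventer /
COPY ca-certificates.crt /etc/ssl/certs/
# nobody:nobody
USER 65534:65534
ENTRYPOINT ["/heapster"]
```

This keeps the final image minimal while still fixing the timezone.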
Summary
The forked project's address and an actual run log:
Thanks to ClickHouse's excellent write performance, it runs very stably.