Istio Source Analysis--Mixer Telemetry Report
Statement
- This article assumes a basic understanding of Istio, Kubernetes, Golang, Envoy, and Mixer.
- The environment analyzed is Kubernetes with Istio version 0.8.0.
What are telemetry reports?
This article mainly introduces a gRPC interface provided by Mixer that is responsible for receiving the logs reported by Envoy and presenting them in Stdio and Prometheus. The term "telemetry report" was borrowed from the Chinese translation of the Istio documentation; the first time I heard it, it felt strange and grandiose. After reading the source code, the role of this interface is easier to understand through the phrase "log subscription". To summarize its function in one sentence: "I have these logs, what do you want to do with them?" Stdio and Prometheus are just two forms of presentation for these logs.
```go
// istio.io/istio/mixer/pkg/api/grpcServer.go #187
func (s *grpcServer) Report(legacyCtx legacyContext.Context, req *mixerpb.ReportRequest) (*mixerpb.ReportResponse, error) {
    ......
    var errors *multierror.Error
    for i := 0; i < len(req.Attributes); i++ {
        ......
        if i > 0 {
            if err := accumBag.UpdateBagFromProto(&req.Attributes[i], s.globalWordList); err != nil {
                ......
                break
            }
        }
        ......
        if err := s.dispatcher.Preprocess(newctx, accumBag, reportBag); err != nil {
            ......
        }
        ......
        if err := reporter.Report(reportBag); err != nil {
            ......
            continue
        }
        ......
    }
    ......
    if err := reporter.Flush(); err != nil {
        errors = multierror.Append(errors, err)
    }
    reporter.Done()
    ......
    return reportResp, nil
}
```
What data is received--ReportRequest
The second parameter of the Report interface is the data that Envoy reports to Mixer. The data below was obtained by printing the logs to the terminal and capturing them.
Structure

```go
// istio.io/api/mixer/v1/report.pb.go #22
type ReportRequest struct {
    ......
    Attributes []CompressedAttributes `protobuf:"bytes,1,rep,name=attributes" json:"attributes"`
    ......
    DefaultWords    []string
    GlobalWordCount uint32 `protobuf:"varint,3,opt,name=global_word_count,json=globalWordCount,proto3" json:"global_word_count,omitempty"`
}
```
The data received
req.Attributes:

```json
[{"strings":{"131":92,"152":-1,"154":-2,"17":-7,"18":-4,"19":90,"22":92},"int64s":{"1":33314,"151":8080,"169":292,"170":918,"23":0,"27":780,"30":200},"bools":{"177":false},"timestamps":{"24":"2018-07-05T08:12:20.125365976Z","28":"2018-07-05T08:12:20.125757852Z"},"durations":{"29":426699},"bytes":{"0":"rBQDuw==","150":"AAAAAAAAAAAAAP//rBQDqg=="},"string_maps":{"15":{"entries":{"100":92,"102":-5,"118":113,"119":-3,"31":-4,"32":90,"33":-7,"55":134,"98":-6}},"26":{"entries":{"117":134,"35":136,"55":-9,"58":110,"60":-8,"82":93}}}}]
```

req.DefaultWords:

```json
["istio-pilot.istio-system.svc.cluster.local","kubernetes://istio-pilot-8696f764dd-fqxtg.istio-system","1000","rds","3a7a649f-4eeb-4d70-972c-ad2d43a680af","172.00.00.000","/v1/routes/8088/index/sidecar~172.20.3.187~index-85df88964c-tzzds.default~default.svc.cluster.local","Thu, 05 Jul 2018 08:12:19 GMT","780","/v1/routes/9411/index/sidecar~172.00.00.000~index-85df88964c-tzzds.default~default.svc.cluster.local","bc1f172f-b8e3-4ec0-a070-f2f6de38a24f","718"]
```

req.GlobalWordCount:

178
The first time you see this data, it raises a lot of question marks, and the official attribute vocabulary does not obviously connect to it at first glance. What mainly concerns us in this data are the attribute types (strings, int64s, and so on) and those strange numbers. These mysteries are unraveled below.
Data Conversion--UpdateBagFromProto
GlobalList

```go
// istio.io/istio/mixer/pkg/attribute/list.gen.go #13
globalList = []string{
    "source.ip",
    "source.port",
    "source.name",
    ......
}
```
UpdateBagFromProto

```go
// istio.io/istio/mixer/pkg/attribute/mutableBag.go #3018
func (mb *MutableBag) UpdateBagFromProto(attrs *mixerpb.CompressedAttributes, globalWordList []string) error {
    messageWordList := attrs.Words
    ......
    lg("Setting string attributes:")
    for k, v := range attrs.Strings {
        name, e = lookup(k, e, globalWordList, messageWordList)
        value, e = lookup(v, e, globalWordList, messageWordList)
        if err := mb.insertProtoAttr(name, value, seen, lg); err != nil {
            return err
        }
    }
    lg("Setting int64 attributes:")
    ......
    lg("Setting double attributes:")
    ......
    lg("Setting bool attributes:")
    ......
    lg("Setting timestamp attributes:")
    ......
    lg("Setting duration attributes:")
    ......
    lg("Setting bytes attributes:")
    ......
    lg("Setting string map attributes:")
    ......
    return e
}
```
Istio attributes are strongly typed, so the conversion is done type by type. As can be seen, DefaultWords and globalList together form a dictionary, and the numbers in Attributes record positions in this dictionary. After being processed by UpdateBagFromProto, the data is finally converted into the official attribute vocabulary.
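To make the dictionary mechanism concrete, here is a minimal sketch, not Mixer's actual code: the `lookup` helper below is illustrative, but it follows the encoding visible in the captured data, where a non-negative index points into the global word list and a negative index points into the per-message word list (DefaultWords) at position `-index-1`.

```go
package main

import "fmt"

// lookup is an illustrative helper, not Mixer's implementation:
// a non-negative index resolves against the global word list, while a
// negative index resolves against the per-message word list (DefaultWords)
// at position -index-1.
func lookup(index int32, globalWords, messageWords []string) (string, bool) {
    if index >= 0 {
        if int(index) < len(globalWords) {
            return globalWords[index], true
        }
        return "", false
    }
    i := int(-index - 1)
    if i < len(messageWords) {
        return messageWords[i], true
    }
    return "", false
}

func main() {
    globalWords := []string{"source.ip", "source.port", "source.name"}
    messageWords := []string{"istio-pilot.istio-system.svc.cluster.local"}

    key, _ := lookup(2, globalWords, messageWords)    // non-negative: global list
    value, _ := lookup(-1, globalWords, messageWords) // negative: messageWords[-(-1)-1] = messageWords[0]
    fmt.Printf("%s = %s\n", key, value)
}
```

This is why the same small integers can appear as both keys and values in `req.Attributes`: they are all just positions in the combined dictionary.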
Conversion results

```
connection.mtls : false
context.protocol : http
destination.port : 8080
......
request.host : rds
request.method : GET
......
```
Data processing--Preprocess
In a Kubernetes environment, the result of this method is to append Kubernetes-related data to the attributes.
```go
// istio.io/istio/mixer/template/template.gen.go #33425
outBag := newWrapperAttrBag(
    func(name string) (value interface{}, found bool) {
        field := strings.TrimPrefix(name, fullOutName)
        if len(field) != len(name) && out.WasSet(field) {
            switch field {
            case "source_pod_ip":
                return []uint8(out.SourcePodIp), true
            case "source_pod_name":
                return out.SourcePodName, true
            ......
            default:
                return nil, false
            }
        }
        return attrs.Get(name)
    },
    ......
)
return mapper(outBag)
```
The final appended data:

```
destination.labels : map[istio:pilot pod-template-hash:4252932088]
destination.namespace : istio-system
......
```
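The wrapper-bag pattern used by Preprocess can be sketched in isolation as follows. This is a simplified illustration with hypothetical names, not Mixer's actual types: a lookup function first serves the freshly generated Kubernetes attributes and falls back to the original bag for everything else.

```go
package main

import "fmt"

// Bag is a hypothetical, simplified stand-in for Mixer's attribute bag:
// a lookup by attribute name.
type Bag func(name string) (interface{}, bool)

// newWrapperBag overlays generated attributes on top of a parent bag,
// mirroring the idea of newWrapperAttrBag above.
func newWrapperBag(generated map[string]interface{}, parent Bag) Bag {
    return func(name string) (interface{}, bool) {
        if v, ok := generated[name]; ok {
            return v, true // serve the newly appended attribute
        }
        return parent(name) // fall back to the original bag
    }
}

func main() {
    parent := func(name string) (interface{}, bool) {
        m := map[string]interface{}{"request.host": "rds"}
        v, ok := m[name]
        return v, ok
    }
    bag := newWrapperBag(map[string]interface{}{
        "destination.namespace": "istio-system",
    }, parent)

    v1, _ := bag("destination.namespace") // from the generated overlay
    v2, _ := bag("request.host")          // from the original bag
    fmt.Println(v1, v2)
}
```

The design choice is worth noting: instead of copying every attribute into a new bag, Mixer layers a cheap lookup function over the existing one.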
Data distribution--Report
Report distributes the data to every Template whose Variety is istio_adapter_model_v1beta1.TEMPLATE_VARIETY_REPORT, applying some filtering conditions along the way. In the current environment, the data is distributed to LogEntry and Metric.
```go
// istio.io/istio/mixer/pkg/runtime/dispatcher/session.go #105
func (s *session) dispatch() error {
    ......
    for _, destination := range destinations.Entries() {
        var state *dispatchState
        if s.variety == tpb.TEMPLATE_VARIETY_REPORT {
            state = s.reportStates[destination]
            if state == nil {
                state = s.impl.getDispatchState(ctx, destination)
                s.reportStates[destination] = state
            }
        }
        for _, group := range destination.InstanceGroups {
            ......
            for j, input := range group.Builders {
                ......
                var instance interface{}
                // bind the log data to the Template
                if instance, err = input.Builder(s.bag); err != nil {
                    ......
                    continue
                }
                ......
                if s.variety == tpb.TEMPLATE_VARIETY_REPORT {
                    state.instances = append(state.instances, instance)
                    continue
                }
                ......
            }
        }
    }
    ......
    return nil
}
```
Data display--Asynchronous Flush
Flush makes LogEntry and Metric call their respective adapters to process the data. Because the adapters have no dependencies on each other, Golang goroutines are used here for asynchronous processing.
```go
// istio.io/istio/mixer/pkg/runtime/dispatcher/session.go #200
func (s *session) dispatchBufferedReports() {
    // Ensure that we can run dispatches to all destinations in parallel.
    s.ensureParallelism(len(s.reportStates))

    // dispatch the buffered dispatchStates we've got
    for k, v := range s.reportStates {
        // here v is handed to a goroutine for processing
        s.dispatchToHandler(v)
        delete(s.reportStates, k)
    }

    // wait for all adapters to finish
    s.waitForDispatched()
}
```
Goroutine Pool
As seen above, v is handed off to a goroutine for processing; in fact, Mixer uses a goroutine pool here. Using a pool reduces the creation and destruction of goroutines, and limits the number of goroutines in the service, thereby reducing the system's resource consumption. Mixer's pool creates a fixed number of goroutines in advance for the business to use; if the pool cannot keep up with the work, callers have to block and wait. Here are the steps Mixer takes to use the goroutine pool.
- Initialize the goroutine pool
First build a fixed-length channel to serve as the work queue.

```go
// istio.io/istio/mixer/pkg/pool/goroutine.go
func NewGoroutinePool(queueDepth int, singleThreaded bool) *GoroutinePool {
    gp := &GoroutinePool{
        queue:          make(chan work, queueDepth),
        singleThreaded: singleThreaded,
    }
    gp.AddWorkers(1)
    return gp
}
```
- Put tasks into the queue
An executable function and its parameter are put into the queue as a task.

```go
func (gp *GoroutinePool) ScheduleWork(fn WorkFunc, param interface{}) {
    if gp.singleThreaded {
        fn(param)
    } else {
        gp.queue <- work{fn: fn, param: param}
    }
}
```
- Allocate workers
Workers can be allocated according to available resources; each worker continuously pulls tasks from the queue and executes them.

```go
func (gp *GoroutinePool) AddWorkers(numWorkers int) {
    if !gp.singleThreaded {
        gp.wg.Add(numWorkers)
        for i := 0; i < numWorkers; i++ {
            go func() {
                for work := range gp.queue {
                    work.fn(work.param)
                }
                gp.wg.Done()
            }()
        }
    }
}
```
LogEntry Adapter prints the data to the terminal (Stdio)
Each Template has its own DispatchReport, which is responsible for interacting with the adapter and getting the logs displayed.
```go
// istio.io/istio/mixer/template/template.gen.go #1311
logentry.TemplateName: {
    Name:    logentry.TemplateName,
    Impl:    "logentry",
    CtrCfg:  &logentry.InstanceParam{},
    Variety: istio_adapter_model_v1beta1.TEMPLATE_VARIETY_REPORT,
    ......
    DispatchReport: func(ctx context.Context, handler adapter.Handler, inst []interface{}) error {
        ......
        instances := make([]*logentry.Instance, len(inst))
        for i, instance := range inst {
            instances[i] = instance.(*logentry.Instance)
        }

        // Invoke the handler.
        if err := handler.(logentry.Handler).HandleLogEntry(ctx, instances); err != nil {
            return fmt.Errorf("failed to report all values: %v", err)
        }
        return nil
    },
},
```
```go
// istio.io/istio/mixer/adapter/stdio/stdio.go #53
func (h *handler) HandleLogEntry(_ context.Context, instances []*logentry.Instance) error {
    var errors *multierror.Error
    fields := make([]zapcore.Field, 0, 6)
    for _, instance := range instances {
        ......
        for _, varName := range h.logEntryVars[instance.Name] {
            // filter out the data this adapter does not want
            if value, ok := instance.Variables[varName]; ok {
                fields = append(fields, zap.Any(varName, value))
            }
        }
        if err := h.write(entry, fields); err != nil {
            errors = multierror.Append(errors, err)
        }
        fields = fields[:0]
    }
    return errors.ErrorOrNil()
}
```
Each adapter declares the data it wants, which can be configured in the startup file istio-demo.yaml.

```yaml
apiVersion: "config.istio.io/v1alpha2"
kind: logentry
metadata:
  name: accesslog
  namespace: istio-system
spec:
  severity: '"Info"'
  timestamp: request.time
  variables:
    originIp: origin.ip | ip("0.0.0.0")
    sourceIp: source.ip | ip("0.0.0.0")
    sourceService: source.service | ""
    ......
```
The following log was captured from the Mixer terminal:

```json
{"level":"info","time":"2018-07-15T09:27:30.739801Z","instance":"accesslog.logentry.istio-system","apiClaims":"","apiKey":"","apiName":"","apiVersion":"","connectionMtls":false,"destinationIp":"10.00.0.00","destinationNamespace":"istio-system"......}
```
Problem
Through analyzing the source code of this interface, we found some problems:
- The interface needs to wait for the responses of all adapters
- If the goroutine pool is blocked, the interface has to wait
Based on these two points we wondered: if the goroutine pool blocks, the interface responds more slowly; will that affect business requests? From the Chinese translation of the official Istio blog post "Mixer and the SPOF Myth", we know that Envoy reports data asynchronously in a "fire-and-forget" manner. But since I have no C++ background, I don't quite understand how this "fire-and-forget" is implemented.
With this question in mind, we ran a simulation test. The hypothesis of the test: the interface is blocked, with added delays of 50ms, 100ms, 150ms, 200ms, 250ms, and 300ms (the "simulated blocking time"), while observing the impact on business requests under the same load.
- Environment: Kubernetes via Docker for Mac, on a MacBook Air
- Load-testing tool: hey
- Load: -c 50 -n 200 (the machine configuration is not high)
- PC configuration: i5, 4G RAM
- Load test command: hey -c 50 -n 200 http://127.0.0.1:30935/sleep
- Service code under test
- Delay code added to the Mixer interface:

```go
func (s *grpcServer) Report(legacyCtx legacyContext.Context, req *mixerpb.ReportRequest) (*mixerpb.ReportResponse, error) {
    time.Sleep(50 * time.Millisecond) // simulated blocking time, 50ms in this run
    ......
    return reportResp, nil
}
```
Attention
Each load-test result was taken after warm-up, measured 10 times, with the median reported.
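For clarity, taking the median of 10 runs can be sketched as follows (the sample QPS numbers are invented for illustration):

```go
package main

import (
    "fmt"
    "sort"
)

// median returns the median of the measured values, as in the methodology
// above: sort a copy, take the middle element (or the mean of the two
// middle elements for an even count).
func median(xs []float64) float64 {
    s := append([]float64(nil), xs...) // copy so the input order is preserved
    sort.Float64s(s)
    n := len(s)
    if n%2 == 1 {
        return s[n/2]
    }
    return (s[n/2-1] + s[n/2]) / 2
}

func main() {
    runs := []float64{101, 98, 105, 99, 102, 100, 97, 103, 100, 104}
    fmt.Println(median(runs)) // 100.5
}
```

The median is used instead of the mean so that a single outlier run does not skew the reported QPS.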
Results:
As we can see, as the delay increases, the QPS of business processing declines. This means that under the current 0.8.0 version, when the pool cannot process tasks fast enough (production faster than consumption), blocking occurs and business requests are affected. Of course, we can mitigate this by scaling Mixer or increasing the number of workers in the goroutine pool.
But I think the root problem is the blocking step itself. Without blocking, the business would not be affected.
Learn from Jaeger to avoid blocking
The log-data-processing scenario here is similar to Jaeger, which I had studied before. Both Jaeger and Mixer process log data, so they can learn from each other. Jaeger also has its own goroutine pool, and the idea is the same as Mixer's pool, although the implementation details differ. So what does Jaeger do when production is faster than consumption? The specific scenario can be seen here.
```go
// github.com/jaegertracing/jaeger/pkg/queue/bounded_queue.go #76
func (q *BoundedQueue) Produce(item interface{}) bool {
    if atomic.LoadInt32(&q.stopped) != 0 {
        q.onDroppedItem(item)
        return false
    }
    select {
    case q.items <- item:
        atomic.AddInt32(&q.size, 1)
        return true
    default:
        // drop the data
        if q.onDroppedItem != nil {
            q.onDroppedItem(item)
        }
        return false
    }
}
```
The above is Jaeger's source code. Compared with Mixer's ScheduleWork, one difference is that when Jaeger's items queue is full and more data arrives, the data is discarded, thereby avoiding blocking. This idea could also be applied to Mixer's log processing: sacrifice some log data to guarantee the stability of business requests. After all, the business matters most.
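The drop-instead-of-block idea can be demonstrated in isolation. This is a simplified sketch with hypothetical types, not Jaeger's actual BoundedQueue (which also tracks the size and a stopped flag atomically): a non-blocking send via select/default, counting drops instead of stalling the caller.

```go
package main

import "fmt"

// BoundedQueue is a minimal illustration of a queue that never blocks
// its producer: when the channel buffer is full, the item is dropped.
type BoundedQueue struct {
    items   chan int
    dropped int
}

// Produce attempts a non-blocking enqueue. The select/default makes the
// channel send fall through immediately when the buffer is full.
func (q *BoundedQueue) Produce(item int) bool {
    select {
    case q.items <- item:
        return true
    default:
        q.dropped++ // queue full: drop instead of blocking the caller
        return false
    }
}

func main() {
    q := &BoundedQueue{items: make(chan int, 2)}
    accepted := 0
    for i := 0; i < 5; i++ {
        if q.Produce(i) {
            accepted++
        }
    }
    fmt.Println(accepted, q.dropped) // 2 3
}
```

With no consumer draining the channel, only the first two items fit; the rest are dropped immediately, and the producer never stalls, which is the trade-off proposed above for Mixer.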
Related Blogs
Adapter model for Mixer