Istio Source Analysis--Mixer Telemetry Report

Statement

    1. This article assumes basic knowledge of Istio, Kubernetes, Golang, Envoy, and Mixer.
    2. The environment analyzed is Kubernetes with Istio version 0.8.0.

What are telemetry reports?

This article mainly introduces a gRPC interface provided by Mixer, which receives the logs reported by Envoy and presents them through Stdio and Prometheus. The term "telemetry report" is borrowed from the Chinese translation of the Istio documentation; the first time I heard it, it sounded strange and rather grand. After reading the source code, it is easier to understand this interface as a "log subscription". In one sentence, the interface says: "I have these logs, what do you want to do with them?" Stdio and Prometheus are just different ways of presenting those logs.
istio.io/istio/mixer/pkg/api/grpcServer.go #187
func (s *grpcServer) Report(legacyCtx legacyContext.Context, req *mixerpb.ReportRequest) (*mixerpb.ReportResponse, error) {
  ......
  var errors *multierror.Error
  for i := 0; i < len(req.Attributes); i++ {
    ......
    if i > 0 {
      if err := accumBag.UpdateBagFromProto(&req.Attributes[i], s.globalWordList); err != nil {
        ......
        break
      }
    }
    ......
    if err := s.dispatcher.Preprocess(newctx, accumBag, reportBag); err != nil {
      ......
    }
    ......
    if err := reporter.Report(reportBag); err != nil {
      ......
      continue
    }
    ......
  }
  ......
  if err := reporter.Flush(); err != nil {
    errors = multierror.Append(errors, err)
  }
  reporter.Done()
  ......
  return reportResp, nil
}
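For a concrete picture of the caller's side, here is a minimal sketch of how a client could invoke this Report interface, assuming the generated Mixer v1 gRPC client from istio.io/api/mixer/v1. The address and the empty attribute batch are placeholders for illustration; Envoy actually reports through its built-in mixer filter rather than hand-written Go.

package main

import (
  "context"
  "log"

  "google.golang.org/grpc"
  mixerpb "istio.io/api/mixer/v1"
)

func main() {
  // Dial the Mixer telemetry endpoint (address is a placeholder).
  conn, err := grpc.Dial("istio-telemetry.istio-system:9091", grpc.WithInsecure())
  if err != nil {
    log.Fatal(err)
  }
  defer conn.Close()

  client := mixerpb.NewMixerClient(conn)

  // Send one batch of compressed attributes; a real caller would fill
  // Attributes and DefaultWords the same way Envoy does.
  if _, err := client.Report(context.Background(), &mixerpb.ReportRequest{
    Attributes: []mixerpb.CompressedAttributes{{}},
  }); err != nil {
    log.Fatal(err)
  }
}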

What data is received--ReportRequest

The second parameter of the Report interface is the data that Envoy reports to Mixer. The data below was obtained by printing it to the terminal and capturing the output.

Structure

istio.io/api/mixer/v1/report.pb.go #22
type ReportRequest struct {
  ......
  Attributes []CompressedAttributes `protobuf:"bytes,1,rep,name=attributes" json:"attributes"`
  ......
  DefaultWords []string
  ......
  GlobalWordCount uint32 `protobuf:"varint,3,opt,name=global_word_count,json=globalWordCount,proto3" json:"global_word_count,omitempty"`
}

The data received

req.Attributes:
[{"strings":{"131":92,"152":-1,"154":-2,"17":-7,"18":-4,"19":90,"22":92},"int64s":{"1":33314,"151":8080,"169":292,"170":918,"23":0,"27":780,"30":200},"bools":{"177":false},"timestamps":{"24":"2018-07-05T08:12:20.125365976Z","28":"2018-07-05T08:12:20.125757852Z"},"durations":{"29":426699},"bytes":{"0":"rBQDuw==","150":"AAAAAAAAAAAAAP//rBQDqg=="},"string_maps":{"15":{"entries":{"100":92,"102":-5,"118":113,"119":-3,"31":-4,"32":90,"33":-7,"55":134,"98":-6}},"26":{"entries":{"117":134,"35":136,"55":-9,"58":110,"60":-8,"82":93}}}}]

req.DefaultWords:
["istio-pilot.istio-system.svc.cluster.local","kubernetes://istio-pilot-8696f764dd-fqxtg.istio-system","1000","rds","3a7a649f-4eeb-4d70-972c-ad2d43a680af","172.00.00.000","/v1/routes/8088/index/sidecar~172.20.3.187~index-85df88964c-tzzds.default~default.svc.cluster.local","Thu, 05 Jul 2018 08:12:19 GMT","780","/v1/routes/9411/index/sidecar~172.00.00.000~index-85df88964c-tzzds.default~default.svc.cluster.local","bc1f172f-b8e3-4ec0-a070-f2f6de38a24f","718"]

req.GlobalWordCount:
178

The first time you see this data, it raises nothing but question marks, and it seems to have no connection to the attribute vocabulary on the official website. What we mainly care about in this data are the attribute types (strings, int64s, ...) and those strange numbers. These mysteries are unraveled below.

Data Conversion--UpdateBagFromProto

globalList

istio.io/istio/mixer/pkg/attribute/list.gen.go #13
globalList = []string{
  "source.ip",
  "source.port",
  "source.name",
  ......
}

UpdateBagFromProto

istio.io/istio/mixer/pkg/attribute/mutablebag.go #3018
func (mb *MutableBag) UpdateBagFromProto(attrs *mixerpb.CompressedAttributes, globalWordList []string) error {
  messageWordList := attrs.Words
  lg("Setting string attributes:")
  for k, v := range attrs.Strings {
    name, e = lookup(k, e, globalWordList, messageWordList)
    value, e = lookup(v, e, globalWordList, messageWordList)
    if err := mb.insertProtoAttr(name, value, seen, lg); err != nil {
      return err
    }
  }
  lg("Setting int64 attributes:")
  ......
  lg("Setting double attributes:")
  ......
  lg("Setting bool attributes:")
  ......
  lg("Setting timestamp attributes:")
  ......
  lg("Setting duration attributes:")
  ......
  lg("Setting bytes attributes:")
  ......
  lg("Setting string map attributes:")
  ......
  return e
}
Istio attributes are strongly typed, so the data is converted type by type. As you can see, DefaultWords and globalList together form a dictionary, and the reported Attributes record positions within that dictionary; after UpdateBagFromProto processes them, the data is finally converted into the official attribute vocabulary.
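To make the index-to-word mapping concrete, here is a minimal sketch of the lookup idea as I read it from the source: a non-negative index points into globalList, while a negative index points into the per-request DefaultWords list. The function name and the exact -(position+1) encoding of negative indexes are my assumptions for illustration.

// lookupWord resolves a compressed-attribute index into a word.
// Sketch only: non-negative indexes refer to the shared globalList,
// negative indexes refer to the per-request DefaultWords, assumed to
// be encoded as -(position+1).
func lookupWord(index int32, globalWords, messageWords []string) (string, bool) {
  if index >= 0 {
    if int(index) < len(globalWords) {
      return globalWords[index], true
    }
    return "", false
  }
  i := int(-index) - 1
  if i < len(messageWords) {
    return messageWords[i], true
  }
  return "", false
}

With the sample data above, for example, the key "19" would resolve through globalList, while an index of -1 (as in "152":-1) would resolve to the first entry of req.DefaultWords under this assumed encoding.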

Conversion results

connection.mtls               : false
context.protocol              : http
destination.port              : 8080
......
request.host                  : rds
request.method                : GET
......

Data Processing--Preprocess

In the k8s environment, the result of this method is to append Kubernetes-related data:
istio.io/istio/mixer/template/template.gen.go #33425
outBag := newWrapperAttrBag(
  func(name string) (value interface{}, found bool) {
    field := strings.TrimPrefix(name, fullOutName)
    if len(field) != len(name) && out.WasSet(field) {
      switch field {
      case "source_pod_ip":
        return []uint8(out.SourcePodIp), true
      case "source_pod_name":
        return out.SourcePodName, true
      ......
      default:
        return nil, false
      }
    }
    return attrs.Get(name)
  }
  ......
)
return mapper(outBag)

The final appended data

destination.labels            : map[istio:pilot pod-template-hash:4252932088]
destination.namespace         : istio-system
......

Data Distribution--Report

Report distributes the data to the Templates whose Variety is istio_adapter_model_v1beta1.TEMPLATE_VARIETY_REPORT, applying some filtering conditions along the way; in the current environment the data is distributed to LogEntry and Metric.
istio.io/istio/mixer/pkg/runtime/dispatcher/session.go #105
func (s *session) dispatch() error {
  ......
  for _, destination := range destinations.Entries() {
    var state *dispatchState
    if s.variety == tpb.TEMPLATE_VARIETY_REPORT {
      state = s.reportStates[destination]
      if state == nil {
        state = s.impl.getDispatchState(ctx, destination)
        s.reportStates[destination] = state
      }
    }
    for _, group := range destination.InstanceGroups {
      ......
      for j, input := range group.Builders {
        ......
        var instance interface{}
        // bind the log data to the Template
        if instance, err = input.Builder(s.bag); err != nil {
          ......
          continue
        }
        ......
        if s.variety == tpb.TEMPLATE_VARIETY_REPORT {
          state.instances = append(state.instances, instance)
          continue
        }
        ......
      }
    }
  }
  ......
  return nil
}

Data display--Asynchronous flush

Flush lets LogEntry and Metric call their respective adapters to process the data. Since the adapters have no dependencies on each other, Golang goroutines are used here for asynchronous processing.
istio.io/istio/mixer/pkg/runtime/dispatcher/session.go #200
func (s *session) dispatchBufferedReports() {
  // Ensure that we can run dispatches to all destinations in parallel.
  s.ensureParallelism(len(s.reportStates))

  // dispatch the buffered dispatchStates we've got
  for k, v := range s.reportStates {
    // here v is handed to a goroutine (from the pool) for processing
    s.dispatchToHandler(v)
    delete(s.reportStates, k)
  }

  // wait for all adapters to finish
  s.waitForDispatched()
}

Goroutine Pool

As seen above, v is handed to a goroutine for processing; in fact, Mixer uses a goroutine pool here. A pool reduces the creation and destruction of goroutines and keeps the number of concurrent workers in the service under control, which lowers the system's resource consumption. Mixer's pool creates a fixed number of goroutines in advance for the business to use; if the pool cannot keep up with the work, callers have to block and wait. The steps Mixer follows to use the goroutine pool are listed below; a small usage sketch follows the list.
    • Initializing the goroutine pool
Create a channel with a fixed capacity that serves as the work queue.
istio.io/istio/mixer/pkg/pool/goroutine.go
func NewGoroutinePool(queueDepth int, singleThreaded bool) *GoroutinePool {
  gp := &GoroutinePool{
    queue:          make(chan work, queueDepth),
    singleThreaded: singleThreaded,
  }
  gp.AddWorkers(1)
  return gp
}
    • Put the task into the queue
Put an executable function and its parameter into the queue as a task.
func (gp *GoroutinePool) ScheduleWork(fn WorkFunc, param interface{}) {
  if gp.singleThreaded {
    fn(param)
  } else {
    gp.queue <- work{fn: fn, param: param}
  }
}
    • Let the workers work
The number of workers can be chosen according to available resources; each worker keeps pulling tasks from the queue and executing them.
func (gp *GoroutinePool) AddWorkers(numWorkers int) {
  if !gp.singleThreaded {
    gp.wg.Add(numWorkers)
    for i := 0; i < numWorkers; i++ {
      go func() {
        for work := range gp.queue {
          work.fn(work.param)
        }
        gp.wg.Done()
      }()
    }
  }
}
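Putting the three steps together, here is a small usage sketch of the pool as I understand it from the source above; the queue depth and worker count are made-up numbers for illustration, not Mixer's defaults.

package main

import "istio.io/istio/mixer/pkg/pool"

func main() {
  // Step 1: create the work queue (queue depth 128 is illustrative).
  gp := pool.NewGoroutinePool(128, false)

  // Step 3: start workers that keep pulling tasks off the queue (8 is illustrative).
  gp.AddWorkers(8)

  // Step 2: schedule a task; this blocks if the queue is already full.
  gp.ScheduleWork(func(param interface{}) {
    // process one buffered report here
    _ = param
  }, "some-report")
}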

LogEntry Adapter Prints the data to the terminal (stdio)

    • Interacting with the adapter
Each Template has its own DispatchReport, which is responsible for interacting with the adapter so that the logs get displayed.
istio.io/istio/mixer/template/template.gen.go #1311
logentry.TemplateName: {
  Name:    logentry.TemplateName,
  Impl:    "logentry",
  CtrCfg:  &logentry.InstanceParam{},
  Variety: istio_adapter_model_v1beta1.TEMPLATE_VARIETY_REPORT,
  ......
  DispatchReport: func(ctx context.Context, handler adapter.Handler, inst []interface{}) error {
    ......
    instances := make([]*logentry.Instance, len(inst))
    for i, instance := range inst {
      instances[i] = instance.(*logentry.Instance)
    }

    // Invoke the handler.
    if err := handler.(logentry.Handler).HandleLogEntry(ctx, instances); err != nil {
      return fmt.Errorf("failed to report all values: %v", err)
    }
    return nil
  },
}
    • Collating the log data
istio.io/istio/mixer/adapter/stdio/stdio.go #53
func (h *handler) HandleLogEntry(_ context.Context, instances []*logentry.Instance) error {
  var errors *multierror.Error
  fields := make([]zapcore.Field, 0, 6)
  for _, instance := range instances {
    ......
    for _, varName := range h.logEntryVars[instance.Name] {
      // filter out data the adapter does not want
      if value, ok := instance.Variables[varName]; ok {
        fields = append(fields, zap.Any(varName, value))
      }
    }
    if err := h.write(entry, fields); err != nil {
      errors = multierror.Append(errors, err)
    }
    fields = fields[:0]
  }
  return errors.ErrorOrNil()
}
Each adapter declares the data it wants, which can be configured in the startup file istio-demo.yaml:
apiVersion: "config.istio.io/v1alpha2"
kind: logentry
metadata:
  name: accesslog
  namespace: istio-system
spec:
  severity: '"Info"'
  timestamp: request.time
  variables:
    originIp: origin.ip | ip("0.0.0.0")
    sourceIp: source.ip | ip("0.0.0.0")
    sourceService: source.service | ""
    ......
    • Show results
The following log line was captured from the Mixer terminal:
{"level":"info","time":"2018-07-15T09:27:30.739801Z","instance":"accesslog.logentry.istio-system","apiClaims":"","apiKey":"","apiName":"","apiVersion":"","connectionMtls":false,"destinationIp":"10.00.0.00","destinationNamespace":"istio-system"......}

Problem

Through analysis of this interface's source code, we notice two things:
    1. The interface has to process the responses returned by all adapters.
    2. If the goroutine pool is blocked, the interface has to wait.
Based on these two points we wondered: if the goroutine pool is blocked and the interface responds more slowly, will business requests be affected? From the Chinese translation of the official Istio blog post "Mixer and the SPOF Myth" we know that Envoy reports data asynchronously in a "fire-and-forget" manner. But without a C++ background, I don't fully understand how this fire-and-forget is implemented.
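As a rough illustration of the fire-and-forget idea (not Envoy's actual C++ implementation, which I have not read), the pattern is simply to hand the report off in the background and never wait for the response on the request path. reportAsync is a hypothetical helper reusing the context, log and mixerpb packages from the earlier client sketch.

// Hypothetical fire-and-forget helper: the request path hands the report off and
// returns immediately; failures are logged and never propagated to the business request.
func reportAsync(client mixerpb.MixerClient, req *mixerpb.ReportRequest) {
  go func() {
    if _, err := client.Report(context.Background(), req); err != nil {
      log.Printf("telemetry report failed (ignored): %v", err)
    }
  }()
}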

Because of this question, we ran a simulation test. The test assumes the interface is blocked, injecting delays of 50ms, 100ms, 150ms, 200ms, 250ms and 300ms (the "simulated blocking time"), and observes the impact on business requests under the same load.

    • Environment: Kubernetes via Docker for Mac on a MacBook Air
    • Load-testing tool: hey
    • Load: -c 50 -n 200 (the machine's configuration is not high)
    • PC configuration: i5, 4 GB RAM
    • Load-test command: hey -c 50 -n 200 http://127.0.0.1:30935/sleep
    • The service code under test
    • Delay code added to the Mixer Report interface:
func (s *grpcServer) Report(legacyCtx legacyContext.Context, req *mixerpb.ReportRequest) (*mixerpb.ReportResponse, error) {
  time.Sleep(50 * time.Millisecond) // simulated blocking time; varied from 50ms to 300ms across runs
  ......
  return reportResp, nil
}

Note

Each load-test result was taken after warm-up; every configuration was measured 10 times and the median was used.

Results:

As we can see, as the injected latency increases, the QPS of business processing keeps declining. This means that in the current 0.8.0 version, if the goroutine pool cannot process tasks fast enough and blocking occurs, business requests are affected. Of course, we can mitigate this by scaling out Mixer or adding workers to the goroutine pool, but I think the core problem is the blocking step itself: without blocking, the business would not be affected.

Learning from Jaeger to avoid blocking

The log-data processing scenario here is similar to Jaeger, which I had looked at before. Both Jaeger and Mixer process log data, so they can learn from each other. Jaeger also has its own queue and goroutine pool; the idea is the same as Mixer's pool, although the implementation details differ. So what does Jaeger do when data is produced faster than it can be consumed? The relevant code is shown below.
github.com/jaegertracing/jaeger/pkg/queue/bounded_queue.go #76
func (q *BoundedQueue) Produce(item interface{}) bool {
  if atomic.LoadInt32(&q.stopped) != 0 {
    q.onDroppedItem(item)
    return false
  }
  select {
  case q.items <- item:
    atomic.AddInt32(&q.size, 1)
    return true
  default:
    // drop the data
    if q.onDroppedItem != nil {
      q.onDroppedItem(item)
    }
    return false
  }
}
The code above is from Jaeger. Compared with Mixer's ScheduleWork, one difference is that when Jaeger's items queue is full and more data arrives, the data is simply discarded, which avoids blocking. This idea could also be applied to Mixer's log processing: sacrifice some log data to keep business requests stable. After all, the business comes first.
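As a thought experiment, a non-blocking variant of Mixer's ScheduleWork could adopt the same select/default trick. This is only a sketch of the idea (no such method exists in Mixer 0.8.0), reusing the queue, work and singleThreaded fields shown in the pool source earlier.

// Hypothetical non-blocking variant of ScheduleWork, modeled on Jaeger's Produce.
// It drops the task and returns false instead of blocking when the queue is full.
func (gp *GoroutinePool) TryScheduleWork(fn WorkFunc, param interface{}) bool {
  if gp.singleThreaded {
    fn(param)
    return true
  }
  select {
  case gp.queue <- work{fn: fn, param: param}:
    return true
  default:
    // queue full: drop the telemetry work rather than stall the request path
    return false
  }
}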

Related Blogs

Adapter model for Mixer
