Istio Source Analysis: pilot-discovery, Service Discovery and Configuration Center
Statement
- This article assumes basic knowledge of Istio, Kubernetes, Golang, and Envoy.
- The environment analyzed is Kubernetes, with Istio version 0.8.0.
The role of pilot-discovery
Envoy exposes a common set of data-plane APIs through which service discovery and configuration can be updated dynamically. Istio needs to integrate with service discovery systems such as Kubernetes and Consul, so an intermediary is needed to collect the service registrations and configuration from Kubernetes or Consul and serve them to Envoy.
Differences between the Envoy V1 API and V2 API
The V1 and V2 APIs have some history behind them; see the Envoy blog for details. When Envoy was first open-sourced, dynamic service discovery and configuration were implemented with HTTP plus polling, but that approach has the following drawbacks:
- The weakly typed interface data makes it hard to build generic services on top of it.
- The control plane prefers push, to shorten the time it takes for an update to reach the proxies.
With closer collaboration with Google, the Envoy team developed the V2 API on top of gRPC plus push. It implements the SDS/CDS/RDS/LDS interfaces of the V1 API and continues to support the JSON/YAML data format, and it adds ADS (which aggregates the four SDS/CDS/RDS/LDS interfaces into one), HDS, and other interfaces.
Building the underlying cache data
In fact, pilot-discovery can be thought of as a small non-persistent key/value database: it caches both the Istio configuration and the service registration information, which lets configuration changes take effect faster.
What data is cached
```go
// istio.io/istio/pilot/pkg/model/config.go
var (
	......
	// RouteRule describes route rules
	RouteRule = ProtoSchema{
		Type: "route-rule",
		......
	}

	// VirtualService describes v1alpha3 route rules
	VirtualService = ProtoSchema{
		Type: "virtual-service",
		......
	}

	// Gateway describes a gateway (how a proxy is exposed on the network)
	Gateway = ProtoSchema{
		Type: "gateway",
		......
	}

	// IngressRule describes ingress rules
	IngressRule = ProtoSchema{
		Type: "ingress-rule",
		......
	}
)
```
Anyone who has worked through a beginner's Istio task should find the above familiar: Type corresponds to the kind field in the configuration. Configuration saved into k8s is fetched by pilot-discovery through the API server and cached.
```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: reviews
......
```
- Service registration information obtained from k8s
```go
// istio.io/istio/pilot/pkg/serviceregistry/kube/controller.go #102
func NewController(client kubernetes.Interface, options ControllerOptions) *Controller {
	......
	out.services = out.createInformer(&v1.Service{}, "Service", options.ResyncPeriod,
		func(opts meta_v1.ListOptions) (runtime.Object, error) {
			return client.CoreV1().Services(options.WatchedNamespace).List(opts)
		},
		func(opts meta_v1.ListOptions) (watch.Interface, error) {
			return client.CoreV1().Services(options.WatchedNamespace).Watch(opts)
		})
	......
	out.nodes = out.createInformer(&v1.Node{}, "Node", options.ResyncPeriod,
		func(opts meta_v1.ListOptions) (runtime.Object, error) {
			return client.CoreV1().Nodes().List(opts)
		},
		func(opts meta_v1.ListOptions) (watch.Interface, error) {
			return client.CoreV1().Nodes().Watch(opts)
		})
	......
	return out
}
```
Other data is cached as well but not listed here. As can be seen above, the cache is built through the List and Watch pattern (the Istio configuration data works the same way): List does the initial load of the data, and Watch then polls for changes and keeps the cache up to date.
- The requests translate to addresses such as:

https://{k8s.ip}:443/apis/config.istio.io/v1alpha2/httpapispecs?limit=500&resourceVersion=0
https://{k8s.ip}:443/apis/config.istio.io/v1alpha2/servicerolebindings?limit=500&resourceVersion=0
https://{k8s.ip}:443/apis/networking.istio.io/v1alpha3/virtualservices?limit=500&resourceVersion=0
https://{k8s.ip}:443/apis/config.istio.io/v1alpha2/quotaspecbindings?limit=500&resourceVersion=0
https://{k8s.ip}:443/apis/config.istio.io/v1alpha2/serviceroles?limit=500&resourceVersion=0
https://{k8s.ip}:443/apis/networking.istio.io/v1alpha3/serviceentries?limit=500&resourceVersion=0
https://{k8s.ip}:443/apis/config.istio.io/v1alpha2/routerules?limit=500&resourceVersion=0
https://{k8s.ip}:443/apis/config.istio.io/v1alpha2/egressrules?limit=500&resourceVersion=0
https://{k8s.ip}:443/apis/authentication.istio.io/v1alpha1/policies?limit=500&resourceVersion=0
https://{k8s.ip}:443/apis/config.istio.io/v1alpha2/httpapispecbindings?limit=500&resourceVersion=0
https://{k8s.ip}:443/apis/networking.istio.io/v1alpha3/destinationrules?limit=500&resourceVersion=0
https://{k8s.ip}:443/apis/config.istio.io/v1alpha2/quotaspecs?limit=500&resourceVersion=0
https://{k8s.ip}:443/apis/networking.istio.io/v1alpha3/gateways?limit=500&resourceVersion=0
https://{k8s.ip}:443/apis/config.istio.io/v1alpha2/destinationpolicies?limit=500&resourceVersion=0
https://{k8s.ip}:443/api/v1/nodes?limit=500&resourceVersion=0
https://{k8s.ip}:443/api/v1/namespaces/istio-system/configmaps/istio-ingress-controller-leader-istio
https://{k8s.ip}:443/api/v1/services?limit=500&resourceVersion=0
https://{k8s.ip}:443/api/v1/endpoints?limit=500&resourceVersion=0
https://{k8s.ip}:443/api/v1/pods?limit=500&resourceVersion=0
Generation of keys
In pilot-discovery, the cached data falls into two categories: Istio configuration information and service registration information. These are further subdivided into virtualservices, routerules, nodes, pods, and so on, and the data is then cached under keys of the form namespace/name (the k8s namespace plus the application name).
```go
// k8s.io/client-go/tools/cache/store.go #76
func MetaNamespaceKeyFunc(obj interface{}) (string, error) {
	if key, ok := obj.(ExplicitKey); ok {
		return string(key), nil
	}
	meta, err := meta.Accessor(obj)
	if err != nil {
		return "", fmt.Errorf("object has no meta: %v", err)
	}
	if len(meta.GetNamespace()) > 0 {
		return meta.GetNamespace() + "/" + meta.GetName(), nil
	}
	return meta.GetName(), nil
}
```

Example keys:

default/sleep
kube-system/grafana
istio-system/servicegraph
Storing data
As mentioned above, the cache is populated through List and Watch; let's see how that is implemented.
```go
// k8s.io/client-go/tools/cache/reflector.go #239
func (r *Reflector) ListAndWatch(stopCh <-chan struct{}) error {
	......
	list, err := r.listerWatcher.List(options)
	......
	resourceVersion = listMetaInterface.GetResourceVersion()
	items, err := meta.ExtractList(list)
	......
	// cache the data
	if err := r.syncWith(items, resourceVersion); err != nil {
		return fmt.Errorf("%s: Unable to sync list result: %v", r.name, err)
	}
	......
	for {
		......
		w, err := r.listerWatcher.Watch(options)
		......
		if err := r.watchHandler(w, &resourceVersion, resyncerrc, stopCh); err != nil {
			......
			return nil
		}
	}
}
```
List can be seen as the initial load of the data, while Watch listens for the data's change events: add, modify, and delete. The cached data is added, updated, and deleted according to these events.
```go
// k8s.io/client-go/tools/cache/reflector.go #358
func (r *Reflector) watchHandler(w watch.Interface, resourceVersion *string, errc chan error, stopCh <-chan struct{}) error {
	......
loop:
	for {
		select {
		case <-stopCh:
			return errorStopRequested
		case err := <-errc:
			return err
		case event, ok := <-w.ResultChan():
			......
			switch event.Type {
			case watch.Added:
				err := r.store.Add(event.Object)
				......
			case watch.Modified:
				err := r.store.Update(event.Object)
				......
			case watch.Deleted:
				......
				err := r.store.Delete(event.Object)
				......
			default:
				utilruntime.HandleError(fmt.Errorf("%s: unable to understand watch event %#v", r.name, event))
			}
			......
		}
	}
	......
	return nil
}
```
- Rate limiting: the token bucket algorithm
As we just saw, change events are watched by continuously requesting the k8s API server inside a for{} loop. Without any restriction this would amount to a DDoS attack on the API server, so pilot-discovery applies flow control.
```go
// k8s.io/client-go/rest/config.go
const (
	DefaultQPS   float32 = 5.0
	DefaultBurst int     = 10
)
```
These defaults mean that bursts of up to 10 requests are allowed; once the burst is used up, subsequent requests are limited to 5 per second while the bucket refills.
```go
// k8s.io/client-go/rest/request.go #616
func (r *Request) request(fn func(*http.Request, *http.Response)) error {
	......
	retries := 0
	for {
		......
		if retries > 0 {
			......
			// throttle using the token bucket algorithm
			r.tryThrottle()
		}
		resp, err := client.Do(req)
		......
		done := func() bool {
			......
			retries++
			......
		}
		......
	}
}
```
Building an in-memory key/value cache in Golang is very simple: define a map[string]interface{} variable and put the data in it. But Go's map type is not goroutine-safe, so a small database like pilot-discovery's, with concurrent reads and writes, would race on the shared map without locking. Hence the locks in thread_safe_store.go:
```go
// k8s.io/client-go/tools/cache/thread_safe_store.go
type ThreadSafeStore interface {
	Add(key string, obj interface{})
	Update(key string, obj interface{})
	Delete(key string)
	Get(key string) (item interface{}, exists bool)
	List() []interface{}
	ListKeys() []string
	Replace(map[string]interface{}, string)
	Index(indexName string, obj interface{}) ([]interface{}, error)
	IndexKeys(indexName, indexKey string) ([]string, error)
	ListIndexFuncValues(name string) []string
	ByIndex(indexName, indexKey string) ([]interface{}, error)
	GetIndexers() Indexers

	// AddIndexers adds more indexers to this store. If you call this after you already have data
	// in the store, the results are undefined.
	AddIndexers(newIndexers Indexers) error
	Resync() error
}
```
Providing the interfaces
Whether it is the V1 API or the V2 API, both are built on the underlying cached data: following the Envoy interface documentation, they assemble the cached data into the format Envoy expects.
Exposing the V1 RESTful API
Pilot-discovery exposes the SDS/CDS/RDS/LDS interfaces, and Envoy polls these interfaces to fetch its configuration.
```go
// istio.io/istio/pilot/pkg/proxy/envoy/v1/discovery.go #376
func (ds *DiscoveryService) Register(container *restful.Container) {
	ws := &restful.WebService{}
	ws.Produces(restful.MIME_JSON)
	......
	ws.Route(ws.
		GET(fmt.Sprintf("/v1/registration/{%s}", ServiceKey)).
		To(ds.ListEndpoints).
		Doc("SDS registration").
		Param(ws.PathParameter(ServiceKey, "tuple of service name and tag name").DataType("string")))
	......
	ws.Route(ws.
		GET(fmt.Sprintf("/v1/clusters/{%s}/{%s}", ServiceCluster, ServiceNode)).
		To(ds.ListClusters).
		Doc("CDS registration").
		Param(ws.PathParameter(ServiceCluster, "client proxy service cluster").DataType("string")).
		Param(ws.PathParameter(ServiceNode, "client proxy service node").DataType("string")))
	......
	ws.Route(ws.
		GET(fmt.Sprintf("/v1/routes/{%s}/{%s}/{%s}", RouteConfigName, ServiceCluster, ServiceNode)).
		To(ds.ListRoutes).
		Doc("RDS registration").
		Param(ws.PathParameter(RouteConfigName, "route configuration name").DataType("string")).
		Param(ws.PathParameter(ServiceCluster, "client proxy service cluster").DataType("string")).
		Param(ws.PathParameter(ServiceNode, "client proxy service node").DataType("string")))
	......
	ws.Route(ws.
		GET(fmt.Sprintf("/v1/listeners/{%s}/{%s}", ServiceCluster, ServiceNode)).
		To(ds.ListListeners).
		Doc("LDS registration").
		Param(ws.PathParameter(ServiceCluster, "client proxy service cluster").DataType("string")).
		Param(ws.PathParameter(ServiceNode, "client proxy service node").DataType("string")))
	......
	container.Add(ws)
}
```
The caching here can be understood by analogy with everyday development: fetch data from the database, run the business logic, cache the final result, and return it to the client; the next request is served straight from the cache. Likewise, the V1 API handlers fetch data from the underlying cache, assemble it into the format Envoy requires, cache the result, and return it to Envoy.
- ListEndpoints (EDS)
The other interface methods are similar and not listed here.
```go
// istio.io/istio/pilot/pkg/proxy/envoy/v1/discovery.go #567
func (ds *DiscoveryService) ListEndpoints(request *restful.Request, response *restful.Response) {
	......
	key := request.Request.URL.String()
	out, resourceCount, cached := ds.sdsCache.cachedDiscoveryResponse(key)
	// not cached yet
	if !cached {
		/* assemble the response */
		......
		resourceCount = uint32(len(endpoints))
		if resourceCount > 0 {
			// cache the response
			ds.sdsCache.updateCachedDiscoveryResponse(key, resourceCount, out)
		}
	}
	observeResources(methodName, resourceCount)
	writeResponse(response, out)
}
```
Exposing the V2 gRPC API
I have only just started using gRPC bidirectional streaming, and I understand it as a long-lived connection over which client and server can both send messages. Here, the client (Envoy) opens a gRPC connection; pilot-discovery initially sends the data to Envoy, and afterwards, whenever the data changes, pilot-discovery pushes the updates to Envoy over the same stream.
- The ADS aggregation interface
The aggregation interface serves the SDS/CDS/RDS/LDS configuration data through a single interface. The implementation is a bit long; only one branch is shown here, but the others work the same way.
```go
// istio.io/istio/pilot/pkg/proxy/envoy/v2/ads.go #237
func (s *DiscoveryServer) StreamAggregatedResources(stream ads.AggregatedDiscoveryService_StreamAggregatedResourcesServer) error {
	......
	var receiveError error
	reqChannel := make(chan *xdsapi.DiscoveryRequest, 1)
	go receiveThread(con, reqChannel, &receiveError)
	for {
		// Block until either a request is received or the ticker ticks
		select {
		case discReq, ok := <-reqChannel:
			......
			switch discReq.TypeUrl {
			case ClusterType:
				......
			case ListenerType:
				......
			case RouteType:
				......
			case EndpointType:
				......
				// push data
				err := s.pushEds(con)
				if err != nil {
					return err
				}
				......
			}
		......
		// push triggered by listener events
		case <-con.pushChannel:
			......
			if len(con.Clusters) > 0 {
				err := s.pushEds(con)
				if err != nil {
					return err
				}
			}
			......
		}
	}
}
```
Clearing the second-level cache and triggering pushes
Clearing the second-level cache and triggering a push actually share the same trigger point: the moment the data changes. The change events arrive in no particular order, but configuration updates should be applied in order. So a task queue is used here, and events are processed one at a time.
- During List and Watch initialization, handlers for the add, update, and delete events are registered.
```go
// istio.io/istio/pilot/pkg/config/kube/crd/controller.go #133
func (c *controller) createInformer(
	o runtime.Object,
	otype string,
	resyncPeriod time.Duration,
	lf cache.ListFunc,
	wf cache.WatchFunc) cacheHandler {
	......
	informer.AddEventHandler(
		cache.ResourceEventHandlerFuncs{
			AddFunc: func(obj interface{}) {
				......
				c.queue.Push(kube.NewTask(handler.Apply, obj, model.EventAdd))
			},
			......
		})
	return cacheHandler{informer: informer, handler: handler}
}
```
When an event is triggered, handler.Apply runs, which in turn executes the registered methods.
```go
// istio.io/istio/pilot/pkg/serviceregistry/kube/queue.go #142
func (ch *ChainHandler) Apply(obj interface{}, event model.Event) error {
	for _, f := range ch.funcs {
		if err := f(obj, event); err != nil {
			return err
		}
	}
	return nil
}
```
- Registration method
```go
// istio.io/istio/pilot/pkg/proxy/envoy/v1/discovery.go #328
func NewDiscoveryService(ctl model.Controller, configCache model.ConfigStoreCache,
	environment model.Environment, o DiscoveryServiceOptions) (*DiscoveryService, error) {
	......
	serviceHandler := func(*model.Service, model.Event) { out.clearCache() }
	if err := ctl.AppendServiceHandler(serviceHandler); err != nil {
		return nil, err
	}
	instanceHandler := func(*model.ServiceInstance, model.Event) { out.clearCache() }
	if err := ctl.AppendInstanceHandler(instanceHandler); err != nil {
		return nil, err
	}
	if configCache != nil {
		......
		configHandler := func(model.Config, model.Event) { out.clearCache() }
		for _, descriptor := range model.IstioConfigTypes {
			configCache.RegisterEventHandler(descriptor.Type, configHandler)
		}
	}
	return out, nil
}
```
The out.clearCache() method both clears the second-level cache and pushes data:
```go
// istio.io/istio/pilot/pkg/proxy/envoy/v1/discovery.go #480
func (ds *DiscoveryService) clearCache() {
	......
	// clear the second-level cache
	ds.sdsCache.clear()
	ds.cdsCache.clear()
	ds.rdsCache.clear()
	ds.ldsCache.clear()
	if V2ClearCache != nil {
		// push the data to envoy
		V2ClearCache()
	}
}
```
Pilot-discovery also exposes an interface for clearing the second-level cache:
```go
// istio.io/istio/pilot/pkg/proxy/envoy/v1/discovery.go #436
func (ds *DiscoveryService) Register(container *restful.Container) {
	ws := &restful.WebService{}
	ws.Produces(restful.MIME_JSON)
	......
	ws.Route(ws.
		POST("/cache_stats_delete").
		To(ds.ClearCacheStats).
		Doc("Clear discovery service cache stats"))
	container.Add(ws)
}
```
Small tips
The evil of panic
Anyone developing in Golang has run into panic in one way or another: a third-party package that likes to panic, a failed type assertion, and so on, any of which can bring the whole service down. To avoid this we generally use recover to catch the panic, but I have never felt great about my own way of handling it. So during this source reading I deliberately looked at how the k8s Go client deals with panics; it is from Google, after all.
```go
// k8s.io/apimachinery/pkg/util/runtime/runtime.go #47
func HandleCrash(additionalHandlers ...func(interface{})) {
	if r := recover(); r != nil {
		// by default this logs the file and line where the panic occurred
		for _, fn := range PanicHandlers {
			fn(r)
		}
		// hooks for the caller: any extra handling you want on panic
		for _, fn := range additionalHandlers {
			fn(r)
		}
		// if you are sure, you can let it really panic
		if ReallyCrash {
			// Actually proceed to panic.
			panic(r)
		}
	}
}
```
As seen above, the k8s client handles panics much the way we would, but its encapsulation is friendlier. In the k8s Go client, HandleCrash is usually paired with a for{} loop.
```go
// k8s.io/apimachinery/pkg/watch/streamwatcher.go #88
func (sw *StreamWatcher) receive() {
	......
	defer utilruntime.HandleCrash()
	for {
		......
	}
}

// k8s.io/client-go/tools/record/event.go #224
func (eventBroadcaster *eventBroadcasterImpl) StartEventWatcher(eventHandler func(*v1.Event)) watch.Interface {
	......
	go func() {
		defer utilruntime.HandleCrash()
		for {
			......
		}
	}()
	return watcher
}
```
Conclusion
Through this source analysis I not only came to understand the design and implementation of pilot-discovery, but also learned, via the k8s Go client, about delay queues, flow control, goroutine-safe stores, and their application scenarios. I gained a lot.