Istio Source Analysis: pilot-discovery, Service Discovery and Configuration Center
Statement
- This article assumes basic knowledge of Istio, Kubernetes, Golang, and Envoy.
- The environment analyzed is Kubernetes, with Istio version 0.8.0.
The role of pilot-discovery
Envoy exposes a common set of data-plane APIs through which service discovery and configuration can be updated dynamically. Istio needs to integrate with service discovery systems such as Kubernetes and Consul, so an intermediary is needed to collect the service registrations and configuration from Kubernetes or Consul and serve them to Envoy.
Differences between the Envoy V1 API and V2 API
The V1 and V2 APIs have some history behind them; see the Envoy blog for details. When Envoy was first open-sourced, dynamic service discovery and configuration were implemented with HTTP plus polling, but that approach has the following drawbacks:
- The weakly typed interface data makes it hard to build generic services on top of it.
- The control plane prefers push, to shorten the time it takes for an update to reach the proxies.
With closer collaboration with Google, the Envoy team developed the V2 API on top of gRPC plus push. It implements the SDS/CDS/RDS/LDS interfaces of the V1 API and continues to support the JSON/YAML data format, and it adds ADS (which aggregates the four SDS/CDS/RDS/LDS interfaces into one), HDS, and other interfaces.
Building the underlying cache data
In fact, pilot-discovery can be thought of as a small non-persistent key/value database: it caches both the Istio configuration and the service registration information, which lets configuration changes take effect faster.
What data is cached
```go
// istio.io/istio/pilot/pkg/model/config.go
var (
	......
	// RouteRule describes route rules
	RouteRule = ProtoSchema{
		Type: "route-rule",
		......
	}

	// VirtualService describes v1alpha3 route rules
	VirtualService = ProtoSchema{
		Type: "virtual-service",
		......
	}

	// Gateway describes a gateway (how a proxy is exposed on the network)
	Gateway = ProtoSchema{
		Type: "gateway",
		......
	}

	// IngressRule describes ingress rules
	IngressRule = ProtoSchema{
		Type: "ingress-rule",
		......
	}
)
```
Anyone who has worked through a beginner's Istio task should find the above familiar: Type corresponds to the kind field in the configuration. Configuration saved into k8s is fetched by pilot-discovery through the API server and cached.
```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: reviews
......
```
- Service registration information obtained from k8s
```go
// istio.io/istio/pilot/pkg/serviceregistry/kube/controller.go #102
func NewController(client kubernetes.Interface, options ControllerOptions) *Controller {
	......
	out.services = out.createInformer(&v1.Service{}, "Service", options.ResyncPeriod,
		func(opts meta_v1.ListOptions) (runtime.Object, error) {
			return client.CoreV1().Services(options.WatchedNamespace).List(opts)
		},
		func(opts meta_v1.ListOptions) (watch.Interface, error) {
			return client.CoreV1().Services(options.WatchedNamespace).Watch(opts)
		})
	......
	out.nodes = out.createInformer(&v1.Node{}, "Node", options.ResyncPeriod,
		func(opts meta_v1.ListOptions) (runtime.Object, error) {
			return client.CoreV1().Nodes().List(opts)
		},
		func(opts meta_v1.ListOptions) (watch.Interface, error) {
			return client.CoreV1().Nodes().Watch(opts)
		})
	......
	return out
}
```
Other data is cached as well but not listed here. As can be seen above, the cache is built through the List and Watch pattern (the Istio configuration data works the same way): List does the initial load of the data, and Watch then polls for changes and keeps the cache up to date.
- The requests translate to addresses such as:

https://{k8s.ip}:443/apis/config.istio.io/v1alpha2/httpapispecs?limit=500&resourceVersion=0
https://{k8s.ip}:443/apis/config.istio.io/v1alpha2/servicerolebindings?limit=500&resourceVersion=0
https://{k8s.ip}:443/apis/networking.istio.io/v1alpha3/virtualservices?limit=500&resourceVersion=0
https://{k8s.ip}:443/apis/config.istio.io/v1alpha2/quotaspecbindings?limit=500&resourceVersion=0
https://{k8s.ip}:443/apis/config.istio.io/v1alpha2/serviceroles?limit=500&resourceVersion=0
https://{k8s.ip}:443/apis/networking.istio.io/v1alpha3/serviceentries?limit=500&resourceVersion=0
https://{k8s.ip}:443/apis/config.istio.io/v1alpha2/routerules?limit=500&resourceVersion=0
https://{k8s.ip}:443/apis/config.istio.io/v1alpha2/egressrules?limit=500&resourceVersion=0
https://{k8s.ip}:443/apis/authentication.istio.io/v1alpha1/policies?limit=500&resourceVersion=0
https://{k8s.ip}:443/apis/config.istio.io/v1alpha2/httpapispecbindings?limit=500&resourceVersion=0
https://{k8s.ip}:443/apis/networking.istio.io/v1alpha3/destinationrules?limit=500&resourceVersion=0
https://{k8s.ip}:443/apis/config.istio.io/v1alpha2/quotaspecs?limit=500&resourceVersion=0
https://{k8s.ip}:443/apis/networking.istio.io/v1alpha3/gateways?limit=500&resourceVersion=0
https://{k8s.ip}:443/apis/config.istio.io/v1alpha2/destinationpolicies?limit=500&resourceVersion=0
https://{k8s.ip}:443/api/v1/nodes?limit=500&resourceVersion=0
https://{k8s.ip}:443/api/v1/namespaces/istio-system/configmaps/istio-ingress-controller-leader-istio
https://{k8s.ip}:443/api/v1/services?limit=500&resourceVersion=0
https://{k8s.ip}:443/api/v1/endpoints?limit=500&resourceVersion=0
https://{k8s.ip}:443/api/v1/pods?limit=500&resourceVersion=0
Generation of keys
In pilot-discovery, the cached data falls into two categories: Istio configuration information and service registration information. These are further subdivided into virtualservices, routerules, nodes, pods, and so on, and the data is then cached under keys of the form namespace/name (the k8s namespace plus the application name).
```go
// k8s.io/client-go/tools/cache/store.go #76
func MetaNamespaceKeyFunc(obj interface{}) (string, error) {
	if key, ok := obj.(ExplicitKey); ok {
		return string(key), nil
	}
	meta, err := meta.Accessor(obj)
	if err != nil {
		return "", fmt.Errorf("object has no meta: %v", err)
	}
	if len(meta.GetNamespace()) > 0 {
		return meta.GetNamespace() + "/" + meta.GetName(), nil
	}
	return meta.GetName(), nil
}
```

Example keys:

default/sleep
kube-system/grafana
istio-system/servicegraph
Storing data
As mentioned above, the cache is populated through List and Watch; let's see how that is implemented.
```go
// k8s.io/client-go/tools/cache/reflector.go #239
func (r *Reflector) ListAndWatch(stopCh <-chan struct{}) error {
	......
	list, err := r.listerWatcher.List(options)
	......
	resourceVersion = listMetaInterface.GetResourceVersion()
	items, err := meta.ExtractList(list)
	......
	// cache the data
	if err := r.syncWith(items, resourceVersion); err != nil {
		return fmt.Errorf("%s: Unable to sync list result: %v", r.name, err)
	}
	......
	for {
		......
		w, err := r.listerWatcher.Watch(options)
		......
		if err := r.watchHandler(w, &resourceVersion, resyncerrc, stopCh); err != nil {
			......
			return nil
		}
	}
}
```
List can be seen as the initial load of the data, while Watch listens for the data's change events: add, modify, and delete. The cached data is added, updated, and deleted according to these events.
```go
// k8s.io/client-go/tools/cache/reflector.go #358
func (r *Reflector) watchHandler(w watch.Interface, resourceVersion *string, errc chan error, stopCh <-chan struct{}) error {
	......
loop:
	for {
		select {
		case <-stopCh:
			return errorStopRequested
		case err := <-errc:
			return err
		case event, ok := <-w.ResultChan():
			......
			switch event.Type {
			case watch.Added:
				err := r.store.Add(event.Object)
				......
			case watch.Modified:
				err := r.store.Update(event.Object)
				......
			case watch.Deleted:
				......
				err := r.store.Delete(event.Object)
				......
			default:
				utilruntime.HandleError(fmt.Errorf("%s: unable to understand watch event %#v", r.name, event))
			}
			......
		}
	}
	......
	return nil
}
```
- Rate limiting: the token bucket algorithm
As we just saw, change events are watched by continuously requesting the k8s API server inside a for{} loop. Without any restriction this would amount to a DDoS attack on the API server, so pilot-discovery applies flow control.
```go
// k8s.io/client-go/rest/config.go
const (
	DefaultQPS   float32 = 5.0
	DefaultBurst int     = 10
)
```
These defaults mean that bursts of up to 10 requests are allowed; once the burst is used up, subsequent requests are limited to 5 per second while the bucket refills.
```go
// k8s.io/client-go/rest/request.go #616
func (r *Request) request(fn func(*http.Request, *http.Response)) error {
	......
	retries := 0
	for {
		......
		if retries > 0 {
			......
			// throttle using the token bucket algorithm
			r.tryThrottle()
		}
		resp, err := client.Do(req)
		......
		done := func() bool {
			......
			retries++
			......
		}
		......
	}
}
```
Building an in-memory key/value cache in Golang is very simple: define a map[string]interface{} variable and put the data in it. But Go's map type is not goroutine-safe, so a small database like pilot-discovery's, with concurrent reads and writes, would race on the shared map without locking. Hence the locks in thread_safe_store.go:
```go
// k8s.io/client-go/tools/cache/thread_safe_store.go
type ThreadSafeStore interface {
	Add(key string, obj interface{})
	Update(key string, obj interface{})
	Delete(key string)
	Get(key string) (item interface{}, exists bool)
	List() []interface{}
	ListKeys() []string
	Replace(map[string]interface{}, string)
	Index(indexName string, obj interface{}) ([]interface{}, error)
	IndexKeys(indexName, indexKey string) ([]string, error)
	ListIndexFuncValues(name string) []string
	ByIndex(indexName, indexKey string) ([]interface{}, error)
	GetIndexers() Indexers

	// AddIndexers adds more indexers to this store. If you call this after you already have data
	// in the store, the results are undefined.
	AddIndexers(newIndexers Indexers) error
	Resync() error
}
```
Providing the interfaces
Whether it is the V1 API or the V2 API, both are built on the underlying cached data: following the Envoy interface documentation, they assemble the cached data into the format Envoy expects.
Exposing the V1 RESTful API
Pilot-discovery exposes the SDS/CDS/RDS/LDS interfaces, and Envoy polls these interfaces to fetch its configuration.
```go
// istio.io/istio/pilot/pkg/proxy/envoy/v1/discovery.go #376
func (ds *DiscoveryService) Register(container *restful.Container) {
	ws := &restful.WebService{}
	ws.Produces(restful.MIME_JSON)
	......
	ws.Route(ws.
		GET(fmt.Sprintf("/v1/registration/{%s}", ServiceKey)).
		To(ds.ListEndpoints).
		Doc("SDS registration").
		Param(ws.PathParameter(ServiceKey, "tuple of service name and tag name").DataType("string")))
	......
	ws.Route(ws.
		GET(fmt.Sprintf("/v1/clusters/{%s}/{%s}", ServiceCluster, ServiceNode)).
		To(ds.ListClusters).
		Doc("CDS registration").
		Param(ws.PathParameter(ServiceCluster, "client proxy service cluster").DataType("string")).
		Param(ws.PathParameter(ServiceNode, "client proxy service node").DataType("string")))
	......
	ws.Route(ws.
		GET(fmt.Sprintf("/v1/routes/{%s}/{%s}/{%s}", RouteConfigName, ServiceCluster, ServiceNode)).
		To(ds.ListRoutes).
		Doc("RDS registration").
		Param(ws.PathParameter(RouteConfigName, "route configuration name").DataType("string")).
		Param(ws.PathParameter(ServiceCluster, "client proxy service cluster").DataType("string")).
		Param(ws.PathParameter(ServiceNode, "client proxy service node").DataType("string")))
	......
	ws.Route(ws.
		GET(fmt.Sprintf("/v1/listeners/{%s}/{%s}", ServiceCluster, ServiceNode)).
		To(ds.ListListeners).
		Doc("LDS registration").
		Param(ws.PathParameter(ServiceCluster, "client proxy service cluster").DataType("string")).
		Param(ws.PathParameter(ServiceNode, "client proxy service node").DataType("string")))
	......
	container.Add(ws)
}
```
The caching here can be understood by analogy with everyday development: fetch data from the database, run the business logic, cache the final result, and return it to the client; the next request is served straight from the cache. Likewise, the V1 API handlers fetch data from the underlying cache, assemble it into the format Envoy requires, cache the result, and return it to Envoy.
- ListEndpoints (EDS)
The other interface methods are similar and not listed here.
```go
// istio.io/istio/pilot/pkg/proxy/envoy/v1/discovery.go #567
func (ds *DiscoveryService) ListEndpoints(request *restful.Request, response *restful.Response) {
	......
	key := request.Request.URL.String()
	out, resourceCount, cached := ds.sdsCache.cachedDiscoveryResponse(key)
	// not cached yet
	if !cached {
		/* assemble the response */
		......
		resourceCount = uint32(len(endpoints))
		if resourceCount > 0 {
			// cache the response
			ds.sdsCache.updateCachedDiscoveryResponse(key, resourceCount, out)
		}
	}
	observeResources(methodName, resourceCount)
	writeResponse(response, out)
}
```
Exposing the V2 gRPC API
I have only just started using gRPC bidirectional streaming, and I understand it as a long-lived connection over which client and server can both send messages. Here, the client (Envoy) opens a gRPC connection; pilot-discovery initially sends the data to Envoy, and afterwards, whenever the data changes, pilot-discovery pushes the updates to Envoy over the same stream.
- The ADS aggregation interface
The aggregation interface serves the SDS/CDS/RDS/LDS configuration data through a single interface. The implementation is a bit long; only one branch is shown here, but the others work the same way.
```go
// istio.io/istio/pilot/pkg/proxy/envoy/v2/ads.go #237
func (s *DiscoveryServer) StreamAggregatedResources(stream ads.AggregatedDiscoveryService_StreamAggregatedResourcesServer) error {
	......
	var receiveError error
	reqChannel := make(chan *xdsapi.DiscoveryRequest, 1)
	go receiveThread(con, reqChannel, &receiveError)
	for {
		// Block until either a request is received or the ticker ticks
		select {
		case discReq, ok := <-reqChannel:
			......
			switch discReq.TypeUrl {
			case ClusterType:
				......
			case ListenerType:
				......
			case RouteType:
				......
			case EndpointType:
				......
				// push data
				err := s.pushEds(con)
				if err != nil {
					return err
				}
				......
			}
		......
		// push triggered by listener events
		case <-con.pushChannel:
			......
			if len(con.Clusters) > 0 {
				err := s.pushEds(con)
				if err != nil {
					return err
				}
			}
			......
		}
	}
}
```
Clearing the second-level cache and triggering pushes
Clearing the second-level cache and triggering a push actually share the same trigger point: the moment the data changes. The change events arrive in no particular order, but configuration updates should be applied in order. So a task queue is used here, and events are processed one at a time.
- During List and Watch initialization, handlers for the add, update, and delete events are registered.
```go
// istio.io/istio/pilot/pkg/config/kube/crd/controller.go #133
func (c *controller) createInformer(
	o runtime.Object,
	otype string,
	resyncPeriod time.Duration,
	lf cache.ListFunc,
	wf cache.WatchFunc) cacheHandler {
	......
	informer.AddEventHandler(
		cache.ResourceEventHandlerFuncs{
			AddFunc: func(obj interface{}) {
				......
				c.queue.Push(kube.NewTask(handler.Apply, obj, model.EventAdd))
			},
			......
		})
	return cacheHandler{informer: informer, handler: handler}
}
```
When an event is triggered, handler.Apply runs, which in turn executes the registered methods.
```go
// istio.io/istio/pilot/pkg/serviceregistry/kube/queue.go #142
func (ch *ChainHandler) Apply(obj interface{}, event model.Event) error {
	for _, f := range ch.funcs {
		if err := f(obj, event); err != nil {
			return err
		}
	}
	return nil
}
```
- Registration method
```go
// istio.io/istio/pilot/pkg/proxy/envoy/v1/discovery.go #328
func NewDiscoveryService(ctl model.Controller, configCache model.ConfigStoreCache,
	environment model.Environment, o DiscoveryServiceOptions) (*DiscoveryService, error) {
	......
	serviceHandler := func(*model.Service, model.Event) { out.clearCache() }
	if err := ctl.AppendServiceHandler(serviceHandler); err != nil {
		return nil, err
	}
	instanceHandler := func(*model.ServiceInstance, model.Event) { out.clearCache() }
	if err := ctl.AppendInstanceHandler(instanceHandler); err != nil {
		return nil, err
	}
	if configCache != nil {
		......
		configHandler := func(model.Config, model.Event) { out.clearCache() }
		for _, descriptor := range model.IstioConfigTypes {
			configCache.RegisterEventHandler(descriptor.Type, configHandler)
		}
	}
	return out, nil
}
```
The out.clearCache() method both clears the second-level cache and pushes data:
```go
// istio.io/istio/pilot/pkg/proxy/envoy/v1/discovery.go #480
func (ds *DiscoveryService) clearCache() {
	......
	// clear the second-level cache
	ds.sdsCache.clear()
	ds.cdsCache.clear()
	ds.rdsCache.clear()
	ds.ldsCache.clear()
	if V2ClearCache != nil {
		// push the data to envoy
		V2ClearCache()
	}
}
```
Pilot-discovery also exposes an interface for clearing the second-level cache:
```go
// istio.io/istio/pilot/pkg/proxy/envoy/v1/discovery.go #436
func (ds *DiscoveryService) Register(container *restful.Container) {
	ws := &restful.WebService{}
	ws.Produces(restful.MIME_JSON)
	......
	ws.Route(ws.
		POST("/cache_stats_delete").
		To(ds.ClearCacheStats).
		Doc("Clear discovery service cache stats"))
	container.Add(ws)
}
```
Small tips
The evil of panic
Anyone developing in Golang has run into panic in one way or another: a third-party package that likes to panic, a failed type assertion, and so on, any of which can bring the whole service down. To avoid this we generally use recover to catch the panic, but I have never felt great about my own way of handling it. So during this source reading I deliberately looked at how the k8s Go client deals with panics; it is from Google, after all.
```go
// k8s.io/apimachinery/pkg/util/runtime/runtime.go #47
func HandleCrash(additionalHandlers ...func(interface{})) {
	if r := recover(); r != nil {
		// by default this logs the file and line where the panic occurred
		for _, fn := range PanicHandlers {
			fn(r)
		}
		// hooks for the caller: any extra handling you want on panic
		for _, fn := range additionalHandlers {
			fn(r)
		}
		// if you are sure, you can let it really panic
		if ReallyCrash {
			// Actually proceed to panic.
			panic(r)
		}
	}
}
```
As seen above, the k8s client handles panics much the way we would, but its encapsulation is friendlier. In the k8s Go client, HandleCrash is usually paired with a for{} loop.
```go
// k8s.io/apimachinery/pkg/watch/streamwatcher.go #88
func (sw *StreamWatcher) receive() {
	......
	defer utilruntime.HandleCrash()
	for {
		......
	}
}

// k8s.io/client-go/tools/record/event.go #224
func (eventBroadcaster *eventBroadcasterImpl) StartEventWatcher(eventHandler func(*v1.Event)) watch.Interface {
	......
	go func() {
		defer utilruntime.HandleCrash()
		for {
			......
		}
	}()
	return watcher
}
```
Conclusion
Through this source analysis I not only came to understand the design and implementation of pilot-discovery, but also learned, via the k8s Go client, about delay queues, flow control, goroutine-safe stores, and their application scenarios. I gained a lot.