Istio Source Analysis: How pilot-agent Manages the Envoy Life Cycle


Notes

    1. The source code analyzed is version 0.7.1
    2. The environment is Kubernetes (k8s)
    3. Since I have no C++ background, the analysis stops where the code enters C++; even so, I learned a lot along the way

What is Pilot-agent?

When we run kubectl apply -f <(istioctl kube-inject -f sleep.yaml), k8s creates three containers for us.
[root@izwz9cffi0prthtem44cp9z ~]# docker ps | grep sleep
8e0de7294922        istio/proxy
ccddc800b2a2        registry.cn-shenzhen.aliyuncs.com/jukylin/sleep
990868aa4a42        registry-vpc.cn-shenzhen.aliyuncs.com/acs/pause-amd64:3.0
Of these three containers, the one we care about is istio/proxy. This container runs two services; pilot-agent is the subject of this article: how it manages the envoy life cycle.
[root@izwz9cffi0prthtem44cp9z ~]# docker exec -it 8e0de7294922 ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
1337         1     0  0 May09 ?        00:00:49 /usr/local/bin/pilot-agent proxy
1337       567     1  1 09:18 ?        00:04:42 /usr/local/bin/envoy -c /etc/ist

Why is pilot-agent needed?

envoy does not interact directly with platforms such as k8s, consul, or eureka, so it needs another service to integrate with them and manage its configuration; pilot-agent is one such piece of the "control plane."

Start envoy

Load configuration

pilot-agent generates a configuration file, /etc/istio/proxy/envoy-rev0.json, before starting envoy:
istio.io/istio/pilot/pkg/proxy/envoy/v1/config.go #88

func BuildConfig(config meshconfig.ProxyConfig, pilotSAN []string) *Config {
    ......
    return out
}
The actual contents of the file can be viewed directly inside the container:
docker exec -it 8e0de7294922 cat /etc/istio/proxy/envoy-rev0.json
The meaning of each configuration item is described in the official documentation.

Startup parameters

A binary usually needs some parameters at startup, and envoy is no exception.
istio.io/istio/pilot/pkg/proxy/envoy/v1/watcher.go #274

func (proxy envoy) args(fname string, epoch int) []string {
    ......
    return startupArgs
}
envoy's startup parameters can be seen with docker logs 8e0de7294922; below are the parameters captured from the terminal. See the official documentation for what each parameter means.
-c /etc/istio/proxy/envoy-rev0.json
--restart-epoch 0
--drain-time-s 45
--parent-shutdown-time-s 60
--service-cluster sleep
--service-node sidecar~172.00.00.000~sleep-55b5877479-rwcct.default~default.svc.cluster.local
--max-obj-name-len 189
-l info
--v2-config-only
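As a rough sketch of what the args function assembles, here is a minimal, self-contained Go example. The buildEnvoyArgs function, its parameters, and the hard-coded drain/shutdown values are invented for illustration and are not the actual Istio code:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// buildEnvoyArgs is a simplified, hypothetical illustration of how
// pilot-agent assembles envoy's startup flags from a config file
// path, service identity, and a restart epoch.
func buildEnvoyArgs(configPath, serviceCluster, serviceNode string, epoch int) []string {
	return []string{
		"-c", configPath,
		"--restart-epoch", strconv.Itoa(epoch),
		"--drain-time-s", "45",
		"--parent-shutdown-time-s", "60",
		"--service-cluster", serviceCluster,
		"--service-node", serviceNode,
	}
}

func main() {
	args := buildEnvoyArgs("/etc/istio/proxy/envoy-rev0.json", "sleep", "sidecar~node", 0)
	fmt.Println(strings.Join(args, " "))
}
```

Note how the config file name and --restart-epoch are the only parts that vary between a first start (epoch 0) and later hot restarts.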

Start envoy

pilot-agent starts envoy with exec.Command and watches envoy's running state (if envoy exits abnormally, cmd.Wait returns a non-nil error and pilot-agent applies a restart policy).

proxy.config.BinaryPath is the path to the envoy binary: /usr/local/bin/envoy.

args holds the envoy startup parameters described above.

istio.io/istio/pilot/pkg/proxy/envoy/v1/watcher.go #353

func (proxy envoy) Run(config interface{}, epoch int, abort <-chan error) error {
    ......
    /* #nosec */
    cmd := exec.Command(proxy.config.BinaryPath, args...)
    cmd.Stdout = os.Stdout
    cmd.Stderr = os.Stderr
    if err := cmd.Start(); err != nil {
        return err
    }
    ......
    done := make(chan error, 1)
    go func() {
        done <- cmd.Wait()
    }()
    select {
    case err := <-abort:
        ......
    case err := <-done:
        return err
    }
}
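The start-and-watch pattern above can be reduced to a small, self-contained sketch. The runAndWatch helper is invented for this example (it is not Istio code), and /bin/true stands in for the envoy binary:

```go
package main

import (
	"fmt"
	"os/exec"
)

// runAndWatch starts a child process and reports its exit status on a
// channel, the same supervise pattern pilot-agent uses for envoy.
func runAndWatch(binary string, args ...string) (<-chan error, error) {
	cmd := exec.Command(binary, args...)
	if err := cmd.Start(); err != nil {
		return nil, err
	}
	done := make(chan error, 1)
	go func() {
		// A non-nil error here means the child exited abnormally.
		done <- cmd.Wait()
	}()
	return done, nil
}

func main() {
	done, err := runAndWatch("true")
	if err != nil {
		panic(err)
	}
	// A nil value received from done means the child exited cleanly.
	fmt.Println("exit status:", <-done)
}
```

In the real code a select over this done channel and an abort channel lets pilot-agent either react to the exit or kill envoy on request.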

Hot update envoy

Here we only discuss how pilot-agent performs an envoy hot update; what triggers this step is covered in a later article.

Envoy Hot Update Policy

To learn more about envoy's hot restart strategy, see the blog post "Envoy hot restart."

A brief overview of the envoy hot update steps:

    1. Start another envoy process, envoy2 (the secondary process)
    2. envoy2 notifies envoy1 (the primary process) to close the ports it manages; envoy2 takes them over
    3. envoy1 passes its available listen sockets to envoy2 over a Unix domain socket (UDS)
    4. Once envoy2 initializes successfully, it notifies envoy1 to gracefully drain in-flight requests within a time window (drain-time-s)
    5. When the window expires (parent-shutdown-time-s), envoy2 notifies envoy1 to shut itself down
    6. envoy2 is promoted to be the new primary process
From these steps, pilot-agent is only responsible for starting the second envoy process; everything else is handled by envoy itself.

When does the hot update take place?

When pilot-agent starts, it watches the files under the /etc/certs/ directory; if a file in that directory is modified or deleted, pilot-agent notifies envoy to perform a hot update. How these files come to be modified or deleted is covered in the next article.
istio.io/istio/pilot/pkg/proxy/envoy/v1/watcher.go #177

func watchCerts(ctx context.Context, certsDirs []string, watchFileEventsFn watchFileEventsFn,
    minDelay time.Duration, updateFunc func()) {
    fw, err := fsnotify.NewWatcher()
    if err != nil {
        log.Warnf("failed to create a watcher for certificate files: %v", err)
        return
    }
    defer func() {
        if err := fw.Close(); err != nil {
            log.Warnf("closing watcher encounters an error %v", err)
        }
    }()
    // watch all directories
    for _, d := range certsDirs {
        if err := fw.Watch(d); err != nil {
            log.Warnf("watching %s encounters an error %v", d, err)
            return
        }
    }
    watchFileEventsFn(ctx, fw.Event, minDelay, updateFunc)
}

Hot Update Startup Parameters

-c /etc/istio/proxy/envoy-rev1.json
--restart-epoch 1
--drain-time-s 45
--parent-shutdown-time-s 60
--service-cluster sleep
--service-node sidecar~172.00.00.000~sleep-898b65f84-pnsxr.default~default.svc.cluster.local
--max-obj-name-len 189
-l info
--v2-config-only
The hot-update startup parameters differ from the first-start parameters only in -c and --restart-epoch. For -c, only the configuration file name differs; the contents are the same. --restart-epoch increments by 1 on every hot update and is used to decide whether envoy is performing a hot restart or starting for the first time.
See the official documentation for details.
istio.io/istio/pilot/pkg/proxy/agent.go #258

func (a *agent) reconcile() {
    ......
    // discover and increment the latest running epoch
    epoch := a.latestEpoch() + 1
    // buffer aborts to prevent blocking on failing proxy
    abortCh := make(chan error, MaxAborts)
    a.epochs[epoch] = a.desiredConfig
    a.abortCh[epoch] = abortCh
    a.currentConfig = a.desiredConfig
    go a.waitForExit(a.desiredConfig, epoch, abortCh)
}

Capture the hot update log from the terminal

2018-04-24T13:59:35.513160Z    info    watchFileEvents: "/etc/certs//..2018_04_24_13_59_35.824521609": CREATE
2018-04-24T13:59:35.513228Z    info    watchFileEvents: "/etc/certs//..2018_04_24_13_59_35.824521609": MODIFY|ATTRIB
2018-04-24T13:59:35.513283Z    info    watchFileEvents: "/etc/certs//..data_tmp": RENAME
2018-04-24T13:59:35.513347Z    info    watchFileEvents: "/etc/certs//..data": CREATE
2018-04-24T13:59:35.513372Z    info    watchFileEvents: "/etc/certs//..2018_04_24_04_30_11.964751916": DELETE

Rescue envoy

envoy is a service, and no service can guarantee 100% availability. If envoy is unlucky enough to go down, how does pilot-agent rescue it to keep envoy highly available?

Get exit status

As mentioned above, when pilot-agent starts envoy it watches envoy's exit status; when it sees an abnormal exit, it rescues envoy.
func (proxy envoy) Run(config interface{}, epoch int, abort <-chan error) error {
    ......
    // Set if the caller is monitoring envoy, for example in tests or if envoy runs in same
    // container with the app.
    if proxy.errChan != nil {
        // Caller passed a channel, will wait itself for termination
        go func() {
            proxy.errChan <- cmd.Wait()
        }()
        return nil
    }
    done := make(chan error, 1)
    go func() {
        done <- cmd.Wait()
    }()
    ......
}

Rescue envoy

kill -9 can be used to simulate an abnormal envoy exit. When an abnormal exit occurs, pilot-agent's rescue mechanism is triggered. If the first rescue attempt succeeds, great; if it fails, pilot-agent keeps trying, up to 10 times, doubling the interval each attempt (starting from 200 time.Millisecond). After 10 failed attempts, pilot-agent gives up the rescue, declares envoy dead, and exits istio/proxy so that k8s restarts a new container.
istio.io/istio/pilot/pkg/proxy/agent.go #164

func (a *agent) Run(ctx context.Context) {
    ......
    for {
        ......
        case status := <-a.statusCh:
            ......
            if status.err == errAbort {
                // pilot-agent requested the exit, or envoy exited abnormally
                log.Infof("epoch %d aborted", status.epoch)
            } else if status.err != nil {
                // envoy exited abnormally
                log.Warnf("epoch %d terminated with an error: %v", status.epoch, status.err)
                ......
                a.abortAll()
            } else {
                // envoy exited normally
                log.Infof("epoch %d exited normally", status.epoch)
            }
            ......
            if status.err != nil {
                // skip retrying twice by checking retry restart delay
                if a.retry.restart == nil {
                    if a.retry.budget > 0 {
                        delayDuration := a.retry.InitialInterval * (1 << uint(a.retry.MaxRetries-a.retry.budget))
                        restart := time.Now().Add(delayDuration)
                        a.retry.restart = &restart
                        a.retry.budget = a.retry.budget - 1
                        log.Infof("epoch %d: set retry delay to %v, budget to %d", status.epoch, delayDuration, a.retry.budget)
                    } else {
                        // declare death, exit istio/proxy
                        log.Error("Permanent error: budget exhausted trying to fulfill the desired configuration")
                        a.proxy.Panic(a.desiredConfig)
                        return
                    }
                } else {
                    log.Debugf("epoch %d: restart already scheduled", status.epoch)
                }
            }
        case <-time.After(delay):
            ......
        case _, more := <-ctx.Done():
            ......
    }
}
istio.io/istio/pilot/pkg/proxy/agent.go #72

var (
    errAbort = errors.New("epoch aborted")
    // DefaultRetry configuration for proxies
    DefaultRetry = Retry{
        MaxRetries:      10,
        InitialInterval: 200 * time.Millisecond,
    }
)

Rescue log

Epoch 6: set retry delay to 200ms, budget to 9
Epoch 6: set retry delay to 400ms, budget to 8
Epoch 6: set retry delay to 800ms, budget to 7

Gracefully close envoy

When a service goes offline or is upgraded, we want it to happen gently, so that users never notice. That requires the service, upon receiving an exit notification, to finish the tasks in progress before shutting down rather than exiting immediately. Does envoy support graceful shutdown? For that, k8s and pilot-agent must support it too, because of the management chain: k8s manages pilot-agent, and pilot-agent manages envoy.

K8s lets the service gracefully exit

There are blog posts online summarizing how k8s gracefully shuts down pods; briefly, the process is:
    1. k8s sends the SIGTERM signal to PID 1 of every container in the pod
    2. On receiving the signal, the service gracefully finishes its tasks and exits
    3. After a grace period (30s by default), if the service has not exited, k8s sends SIGKILL to force the container to quit

Pilot-agent lets envoy gracefully quit

    • pilot-agent receives the k8s signal
pilot-agent listens for syscall.SIGINT and syscall.SIGTERM; either of these two signals triggers the graceful envoy shutdown path.
istio.io/istio/pkg/cmd/cmd.go #29

func WaitSignal(stop chan struct{}) {
    sigs := make(chan os.Signal, 1)
    signal.Notify(sigs, syscall.SIGINT, syscall.SIGTERM)
    <-sigs
    close(stop)
    _ = log.Sync()
}
    • Notify the sub-services to close envoy
Go's context package provides cancellation management: a single cancel() call broadcasts to every sub-service, telling each one to perform its shutdown.
istio.io/istio/pilot/cmd/pilot-agent/main.go #242

ctx, cancel := context.WithCancel(context.Background())
go watcher.Run(ctx)
stop := make(chan struct{})
cmd.WaitSignal(stop)
<-stop
// notify the sub-services
cancel()

istio.io/istio/pilot/pkg/proxy/agent.go

func (a *agent) Run(ctx context.Context) {
    ......
    for {
        ......
        // receive the master's notification and tell envoy to exit
        case _, more := <-ctx.Done():
            if !more {
                a.terminate()
                return
            }
    }
}

istio.io/istio/pilot/pkg/proxy/envoy/v1/watcher.go #297

func (proxy envoy) Run(config interface{}, epoch int, abort <-chan error) error {
    ......
    select {
    case err := <-abort:
        log.Warnf("Aborting epoch %d", epoch)
        // send a kill signal to envoy
        if errKill := cmd.Process.Kill(); errKill != nil {
            log.Warnf("killing epoch %d caused an error %v", epoch, errKill)
        }
        return err
    ......
    }
}
The above shows the flow from pilot-agent receiving the signal from k8s to notifying envoy to shut down, which shows that pilot-agent itself supports graceful shutdown. In the end, though, envoy does not shut down gracefully: pilot-agent simply sends it a kill signal, because envoy itself does not support graceful shutdown.

Envoy graceful shutdown

    • A regrettable notice
Unfortunately, envoy cannot shut down gracefully. envoy handles four signals, SIGTERM, SIGHUP, SIGCHLD, and SIGUSR1, but none of them has anything to do with graceful shutdown; what each does can be found in the official documentation. The envoy developers have noticed this problem as well; see GitHub issues 2920 and 3307.
    • Alternative solutions
The real goal of graceful shutdown is a smooth service upgrade with minimal impact on users. That can also be achieved with a canary deployment, without envoy implementing it. Roughly:
    1. Define the old version of the service (v1) and the new version (v2)
    2. Publish the new version
    3. Gradually shift traffic from v1 to v2
    4. Once migration is complete and v2 has run for a while without problems, shut down v1
    • Gracefully exiting an HTTP service in Go
Let's take this opportunity to look at graceful shutdown in Go, which has been supported since Go 1.8.
net/http/server.go #2487

func (srv *Server) Shutdown(ctx context.Context) error {
    atomic.AddInt32(&srv.inShutdown, 1)
    defer atomic.AddInt32(&srv.inShutdown, -1)

    srv.mu.Lock()
    // close the listeners
    lnerr := srv.closeListenersLocked()
    srv.closeDoneChanLocked()

    // run developer-registered shutdown hooks, if any
    for _, f := range srv.onShutdown {
        go f()
    }
    srv.mu.Unlock()

    // periodically check whether any connections remain open
    ticker := time.NewTicker(shutdownPollInterval)
    defer ticker.Stop()
    for {
        if srv.closeIdleConns() {
            return lnerr
        }
        select {
        case <-ctx.Done():
            return ctx.Err()
        case <-ticker.C:
        }
    }
}
In fact, Go's shutdown mechanism is very similar to the graceful shutdown mechanism discussed for envoy on GitHub:

Golang mechanism

    1. Close the listener ( ln, err := net.Listen("tcp", addr) )
    2. Periodically check whether any connections remain open
    3. Once all connections have closed, the service exits

Envoy Mechanism:

    1. Ingress listeners stop accepting new connections (clients see TCP connection refused) or continue to serve existing connections; egress listeners are completely unaffected
    2. A configurable delay allows the workload to finish serving existing connections
    3. envoy (and the workload) both terminate