Istio Source Analysis: How pilot-agent Manages the Envoy Life Cycle
Notes
- The source code analyzed is version 0.7.1
- The environment is Kubernetes (K8s)
- Since I have no C++ background, the analysis stops where the code crosses into C++ (Envoy itself); even so, I learned a lot along the way
What is Pilot-agent?
When we run

```
kubectl apply -f <(istioctl kube-inject -f sleep.yaml)
```

K8s creates 3 containers for us.
```
[root@izwz9cffi0prthtem44cp9z ~]# docker ps | grep sleep
8e0de7294922 istio/proxy
ccddc800b2a2 registry.cn-shenzhen.aliyuncs.com/jukylin/sleep
990868aa4a42 registry-vpc.cn-shenzhen.aliyuncs.com/acs/pause-amd64:3.0
```
Of these 3 containers, the one we care about is istio/proxy. It runs 2 services, pilot-agent and envoy. The rest of this article looks at how pilot-agent manages the life cycle of envoy.
```
[root@izwz9cffi0prthtem44cp9z ~]# docker exec -it 8e0de7294922 ps -ef
UID   PID PPID C STIME TTY TIME     CMD
1337    1    0 0 May09 ?   00:00:49 /usr/local/bin/pilot-agent proxy
1337  567    1 1 09:18 ?   00:04:42 /usr/local/bin/envoy -c /etc/ist
```
Why is pilot-agent needed?
Envoy does not interact directly with platforms such as K8s, Consul, or Eureka, so it needs another service to integrate with them and manage its configuration. pilot-agent is one piece of that "control plane".
Start envoy
Load configuration
Before starting envoy, pilot-agent generates a configuration file, /etc/istio/proxy/envoy-rev0.json:
```go
// istio.io/istio/pilot/pkg/proxy/envoy/v1/config.go #88
func BuildConfig(config meshconfig.ProxyConfig, pilotSAN []string) *Config {
    ......
    return out
}
```
The actual contents of the file can be viewed directly inside the container:

```
docker exec -it 8e0de7294922 cat /etc/istio/proxy/envoy-rev0.json
```

The meaning of each configuration item is explained in the official documentation.
Startup parameters
Starting a binary almost always requires some arguments, and envoy is no exception.
```go
// istio.io/istio/pilot/pkg/proxy/envoy/v1/watcher.go #274
func (proxy envoy) args(fname string, epoch int) []string {
    ......
    return startupArgs
}
```
The envoy startup parameters can be viewed with

```
docker logs 8e0de7294922
```

Below are the parameters captured from the terminal; see the official documentation for what each one means.
```
-c /etc/istio/proxy/envoy-rev0.json
--restart-epoch 0
--drain-time-s 45
--parent-shutdown-time-s 60
--service-cluster sleep
--service-node sidecar~172.00.00.000~sleep-55b5877479-rwcct.default~default.svc.cluster.local
--max-obj-name-len 189
-l info
--v2-config-only
```
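To make the shape of args() concrete, here is a minimal sketch of how such a flag list can be assembled. The proxyConfig struct and its field names are illustrative stand-ins, not the real istio 0.7.1 types:

```go
package main

import (
    "fmt"
    "strconv"
)

// illustrative stand-in for istio's proxy configuration; not the real struct
type proxyConfig struct {
    DrainSeconds          int
    ParentShutdownSeconds int
    ServiceCluster        string
    ServiceNode           string
}

// buildArgs mirrors what args(fname, epoch) produces for the envoy command line
func buildArgs(cfg proxyConfig, fname string, epoch int) []string {
    return []string{
        "-c", fname,
        "--restart-epoch", strconv.Itoa(epoch),
        "--drain-time-s", strconv.Itoa(cfg.DrainSeconds),
        "--parent-shutdown-time-s", strconv.Itoa(cfg.ParentShutdownSeconds),
        "--service-cluster", cfg.ServiceCluster,
        "--service-node", cfg.ServiceNode,
    }
}

func main() {
    cfg := proxyConfig{45, 60, "sleep", "sidecar~..."}
    fmt.Println(buildArgs(cfg, "/etc/istio/proxy/envoy-rev0.json", 0))
}
```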
Starting the envoy process
pilot-agent uses exec.Command to start envoy and watches envoy's running state: if envoy exits abnormally, the status returned by Wait is non-nil and pilot-agent applies its restart policy (covered below). proxy.config.BinaryPath is the path to the envoy binary, /usr/local/bin/envoy, and args holds the startup parameters described above.
```go
// istio.io/istio/pilot/pkg/proxy/envoy/v1/watcher.go #353
func (proxy envoy) Run(config interface{}, epoch int, abort <-chan error) error {
    ......
    /* #nosec */
    cmd := exec.Command(proxy.config.BinaryPath, args...)
    cmd.Stdout = os.Stdout
    cmd.Stderr = os.Stderr
    if err := cmd.Start(); err != nil {
        return err
    }
    ......
    done := make(chan error, 1)
    go func() {
        done <- cmd.Wait()
    }()

    select {
    case err := <-abort:
        ......
    case err := <-done:
        return err
    }
}
```
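To see the pattern in isolation, here is a minimal, self-contained sketch of the same supervise-and-abort logic; the binary path and arguments are placeholders rather than the real envoy invocation:

```go
package main

import (
    "fmt"
    "os"
    "os/exec"
)

// runProxy starts a child process and returns when it exits on its own
// or when an abort is requested, mirroring envoy.Run above.
func runProxy(binary string, args []string, abort <-chan error) error {
    cmd := exec.Command(binary, args...)
    cmd.Stdout = os.Stdout
    cmd.Stderr = os.Stderr
    if err := cmd.Start(); err != nil {
        return err
    }

    done := make(chan error, 1)
    go func() { done <- cmd.Wait() }()

    select {
    case err := <-abort:
        // caller asked us to abort: kill the child and report why
        _ = cmd.Process.Kill()
        return err
    case err := <-done:
        // child exited by itself; err is non-nil on an abnormal exit
        return err
    }
}

func main() {
    abort := make(chan error, 1)
    // placeholder command standing in for the envoy binary
    if err := runProxy("/bin/sleep", []string{"1"}, abort); err != nil {
        fmt.Println("proxy exited abnormally:", err)
    }
}
```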
Hot restart of envoy
Here we only discuss how pilot-agent carries out a hot restart of envoy; what triggers this step will be described in the next article.
Envoy's hot restart strategy
To learn more about Envoy's hot restart strategy, see the official blog post "Envoy Hot Restart".
A brief outline of the hot restart steps:
- Start a second envoy process, envoy2 (the secondary process)
- envoy2 notifies envoy1 (the primary process) to release the ports it manages, and envoy2 takes them over
- envoy1 hands its available listen sockets to envoy2 over a Unix domain socket (UDS)
- Once envoy2 has initialized successfully, it notifies envoy1 to gracefully drain in-flight requests within a time window (--drain-time-s)
- When the window (--parent-shutdown-time-s) expires, envoy2 notifies envoy1 to shut itself down
- envoy2 is promoted to the primary process, taking envoy1's place
As the steps above show, pilot-agent is only responsible for starting the new envoy process; everything else is handled by envoy itself.
When does a hot restart happen?
When pilot-agent starts, it watches the files under the /etc/certs/ directory. If the files in that directory are modified or deleted, pilot-agent notifies envoy to perform a hot restart. How those files come to be modified or deleted will be introduced in the next article.
```go
// istio.io/istio/pilot/pkg/proxy/envoy/v1/watcher.go #177
func watchCerts(ctx context.Context, certsDirs []string, watchFileEventsFn watchFileEventsFn,
    minDelay time.Duration, updateFunc func()) {
    fw, err := fsnotify.NewWatcher()
    if err != nil {
        log.Warnf("failed to create a watcher for certificate files: %v", err)
        return
    }
    defer func() {
        if err := fw.Close(); err != nil {
            log.Warnf("closing watcher encounters an error %v", err)
        }
    }()

    // watch all directories
    for _, d := range certsDirs {
        if err := fw.Watch(d); err != nil {
            log.Warnf("watching %s encounters an error %v", d, err)
            return
        }
    }
    watchFileEventsFn(ctx, fw.Event, minDelay, updateFunc)
}
```
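The interesting parameter here is minDelay: one certificate rotation produces a burst of file events (see the log captured later in this section), but should trigger only a single hot restart. Here is a dependency-free sketch of that coalescing logic, modeled on the watchFileEvents function that watchCerts hands its events to:

```go
package main

import (
    "fmt"
    "time"
)

// debounce reads events from ch and calls update at most once per burst:
// the first event arms a quiet-period timer, and events arriving before
// it fires are coalesced into the same update.
func debounce(ch <-chan string, minDelay time.Duration, update func()) {
    var timer <-chan time.Time // nil channel blocks forever in select
    for {
        select {
        case ev := <-ch:
            fmt.Println("event:", ev)
            if timer == nil {
                timer = time.After(minDelay)
            }
        case <-timer:
            timer = nil
            update()
        }
    }
}

func main() {
    ch := make(chan string)
    go debounce(ch, 100*time.Millisecond, func() { fmt.Println("trigger hot restart") })
    // simulate the burst of cert-rotation events seen in the log
    for _, ev := range []string{"CREATE", "MODIFY", "RENAME", "DELETE"} {
        ch <- ev
    }
    time.Sleep(300 * time.Millisecond) // give the quiet-period timer time to fire
}
```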
Hot restart startup parameters
```
-c /etc/istio/proxy/envoy-rev1.json
--restart-epoch 1
--drain-time-s 45
--parent-shutdown-time-s 60
--service-cluster sleep
--service-node sidecar~172.00.00.000~sleep-898b65f84-pnsxr.default~default.svc.cluster.local
--max-obj-name-len 189
-l info
--v2-config-only
```
The hot-restart startup parameters differ from the first start only in -c and --restart-epoch. For -c, only the configuration file name differs; the contents are the same. --restart-epoch is incremented by 1 on every hot restart, and envoy uses it to decide whether it is performing a hot restart or starting for the first time. See the official documentation for details.
```go
// istio.io/istio/pilot/pkg/proxy/agent.go #258
func (a *agent) reconcile() {
    ......
    // discover and increment the latest running epoch
    epoch := a.latestEpoch() + 1
    // buffer aborts to prevent blocking on failing proxy
    abortCh := make(chan error, MaxAborts)
    a.epochs[epoch] = a.desiredConfig
    a.abortCh[epoch] = abortCh
    a.currentConfig = a.desiredConfig
    go a.waitForExit(a.desiredConfig, epoch, abortCh)
}
```
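Note that reconcile() picks epoch = latestEpoch() + 1, and the configuration rendered for that epoch is written to a file carrying the epoch in its name, which is why the hot-restarted envoy reads envoy-rev1.json. A sketch of that naming scheme; the helper below is my own illustration, grounded in the rev0/rev1 file names seen above:

```go
package main

import "fmt"

// configFile derives the per-epoch bootstrap file name: epoch 0 yields
// envoy-rev0.json, the first hot restart yields envoy-rev1.json, and so on.
func configFile(dir string, epoch int) string {
    return fmt.Sprintf("%s/envoy-rev%d.json", dir, epoch)
}

func main() {
    fmt.Println(configFile("/etc/istio/proxy", 0)) // first start
    fmt.Println(configFile("/etc/istio/proxy", 1)) // after one hot restart
}
```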
Hot restart log captured from the terminal:
```
2018-04-24T13:59:35.513160Z info watchFileEvents: "/etc/certs//..2018_04_24_13_59_35.824521609": CREATE
2018-04-24T13:59:35.513228Z info watchFileEvents: "/etc/certs//..2018_04_24_13_59_35.824521609": MODIFY|ATTRIB
2018-04-24T13:59:35.513283Z info watchFileEvents: "/etc/certs//..data_tmp": RENAME
2018-04-24T13:59:35.513347Z info watchFileEvents: "/etc/certs//..data": CREATE
2018-04-24T13:59:35.513372Z info watchFileEvents: "/etc/certs//..2018_04_24_04_30_11.964751916": DELETE
```
Rescuing envoy
envoy is a service, and no service can guarantee 100% availability. If envoy is unlucky enough to go down, how does pilot-agent rescue it and keep it highly available?
Getting the exit status
As mentioned above, when pilot-agent starts envoy it watches envoy's exit status; if it detects an abnormal exit, it steps in to rescue envoy.
```go
func (proxy envoy) Run(config interface{}, epoch int, abort <-chan error) error {
    ......
    // Set if the caller is monitoring envoy, for example in tests or if envoy runs in same
    // container with the app.
    if proxy.errChan != nil {
        // Caller passed a channel, will wait itself for termination
        go func() {
            proxy.errChan <- cmd.Wait()
        }()
        return nil
    }

    done := make(chan error, 1)
    go func() {
        done <- cmd.Wait()
    }()
    ......
}
```
Rescuing envoy
You can simulate an abnormal envoy exit with kill -9. When an abnormal exit occurs, pilot-agent's rescue mechanism kicks in. If the first rescue succeeds, all is well; if it fails, pilot-agent keeps trying, up to 10 times, with the interval doubling each time (2^n × 200 ms). If envoy still has not been rescued after 10 attempts, pilot-agent gives up, declares envoy dead, and exits istio/proxy, letting K8s start a fresh container.
```go
// istio.io/istio/pilot/pkg/proxy/agent.go #164
func (a *agent) Run(ctx context.Context) {
    ......
    for {
        ......
        select {
        case status := <-a.statusCh:
            ......
            if status.err == errAbort {
                // pilot-agent asked this epoch to exit, or envoy exited abnormally
                log.Infof("Epoch %d aborted", status.epoch)
            } else if status.err != nil {
                // envoy exited abnormally
                log.Warnf("Epoch %d terminated with an error: %v", status.epoch, status.err)
                ......
                a.abortAll()
            } else {
                // normal exit
                log.Infof("Epoch %d exited normally", status.epoch)
            }
            ......
            if status.err != nil {
                // skip retrying twice by checking retry restart delay
                if a.retry.restart == nil {
                    if a.retry.budget > 0 {
                        delayDuration := a.retry.InitialInterval * (1 << uint(a.retry.MaxRetries-a.retry.budget))
                        restart := time.Now().Add(delayDuration)
                        a.retry.restart = &restart
                        a.retry.budget = a.retry.budget - 1
                        log.Infof("Epoch %d: set retry delay to %v, budget to %d", status.epoch, delayDuration, a.retry.budget)
                    } else {
                        // declare death, exit istio/proxy
                        log.Error("Permanent error: budget exhausted trying to fulfill the desired configuration")
                        a.proxy.Panic(a.desiredConfig)
                        return
                    }
                } else {
                    log.Debugf("Epoch %d: restart already scheduled", status.epoch)
                }
            }
        case <-time.After(delay):
            ......
        case _, more := <-ctx.Done():
            ......
        }
    }
}
```
```go
// istio.io/istio/pilot/pkg/proxy/agent.go #72
var (
    errAbort = errors.New("epoch aborted")

    // DefaultRetry configuration for proxies
    DefaultRetry = Retry{
        MaxRetries:      10,
        InitialInterval: 200 * time.Millisecond,
    }
)
```
Rescue log
```
Epoch 6: set retry delay to 200ms, budget to 9
Epoch 6: set retry delay to 400ms, budget to 8
Epoch 6: set retry delay to 800ms, budget to 7
```
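These delays follow directly from the formula in Run() above: delay = InitialInterval × 2^(MaxRetries − budget). A few lines of Go reproduce the logged sequence:

```go
package main

import (
    "fmt"
    "time"
)

const (
    maxRetries      = 10                     // matches DefaultRetry.MaxRetries
    initialInterval = 200 * time.Millisecond // matches DefaultRetry.InitialInterval
)

func main() {
    budget := maxRetries
    // reproduce the first few lines of the rescue log above
    for i := 0; i < 3; i++ {
        delay := initialInterval * (1 << uint(maxRetries-budget))
        budget--
        fmt.Printf("Epoch 6: set retry delay to %v, budget to %d\n", delay, budget)
    }
}
```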
Gracefully closing envoy
When a service goes offline or is upgraded, we want the transition to be gentle enough that users never notice. That requires the service, on receiving an exit notification, to finish the tasks it is currently handling before shutting down, rather than dying on the spot. Does envoy support a graceful shutdown? K8s and pilot-agent have to support it as well, because of the chain of management: K8s manages pilot-agent, and pilot-agent manages envoy.
How K8s lets a service exit gracefully
There are blog posts online summarizing how K8s gracefully shuts down pods; briefly, the process is:
- K8s sends SIGTERM to process 1 of every container in the pod
- On receiving the signal, the service finishes its in-flight work and exits
- After a grace period (30s by default), if the service still has not exited, K8s sends SIGKILL to force the container to quit
How pilot-agent lets envoy exit gracefully
- pilot-agent receives the signal from K8s
pilot-agent listens for syscall.SIGINT and syscall.SIGTERM; either of these two signals starts the graceful shutdown of envoy.
```go
// istio.io/istio/pkg/cmd/cmd.go #29
func WaitSignal(stop chan struct{}) {
    sigs := make(chan os.Signal, 1)
    signal.Notify(sigs, syscall.SIGINT, syscall.SIGTERM)
    <-sigs
    close(stop)
    _ = log.Sync()
}
```
- Notify the child services to shut down envoy
Go ships a context-management package, context, which lets the main service notify every sub-service to perform its shutdown via a single broadcast (cancelling a shared context).
```go
// istio.io/istio/pilot/cmd/pilot-agent/main.go #242
ctx, cancel := context.WithCancel(context.Background())
go watcher.Run(ctx)

stop := make(chan struct{})
cmd.WaitSignal(stop)
<-stop
// notify the sub-services
cancel()

// istio.io/istio/pilot/pkg/proxy/agent.go
func (a *agent) Run(ctx context.Context) {
    ......
    for {
        ......
        // receive the notification from the main service and tell envoy to exit
        case _, more := <-ctx.Done():
            if !more {
                a.terminate()
                return
            }
        }
    }
}

// istio.io/istio/pilot/pkg/proxy/envoy/v1/watcher.go #297
func (proxy envoy) Run(config interface{}, epoch int, abort <-chan error) error {
    ......
    select {
    case err := <-abort:
        log.Warnf("Aborting epoch %d", epoch)
        // send the kill signal to envoy
        if errKill := cmd.Process.Kill(); errKill != nil {
            log.Warnf("killing epoch %d caused an error %v", epoch, errKill)
        }
        return err
    ......
    }
}
```
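Stripped of the istio details, the broadcast works like this: cancel() closes the channel returned by ctx.Done(), and every sub-service blocked on that channel wakes up at once. A minimal sketch:

```go
package main

import (
    "context"
    "fmt"
    "sync"
    "time"
)

// subService blocks on the shared Done channel until the main service cancels.
func subService(ctx context.Context, name string, wg *sync.WaitGroup) {
    defer wg.Done()
    <-ctx.Done()
    fmt.Println(name, "shutting down:", ctx.Err())
}

func main() {
    ctx, cancel := context.WithCancel(context.Background())
    var wg sync.WaitGroup
    for _, name := range []string{"watcher", "agent", "proxy"} {
        wg.Add(1)
        go subService(ctx, name, &wg)
    }
    time.Sleep(50 * time.Millisecond) // stand-in for cmd.WaitSignal(stop)
    cancel()                          // one call broadcasts to all sub-services
    wg.Wait()
}
```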
The above traces the flow from pilot-agent receiving the signal from K8s to notifying envoy to shut down, which shows that pilot-agent itself supports graceful shutdown. In the end, though, envoy is not shut down gracefully: pilot-agent simply sends it a kill signal, because envoy itself does not support graceful shutdown.
Envoy and graceful shutdown
Unfortunately, Envoy cannot shut down gracefully. Envoy handles 4 signals: SIGTERM, SIGHUP, SIGCHLD, and SIGUSR1, but none of them has anything to do with graceful shutdown (the official documentation describes what each one does). The maintainers have noticed this problem too; see GitHub issues 2920 and 3307.
In fact, the goal of a graceful shutdown is to upgrade a service smoothly and minimize the impact on users, and we can reach that goal with a canary deployment instead; it does not have to be implemented inside envoy. The rough process:
- Define the old version of the service (v1) and the new version (v2)
- Deploy the new version
- Gradually shift traffic over to v2
- Once the migration is complete and v2 has run for a while without problems, shut down v1
Gracefully shutting down an HTTP service in Go
While we are at it, let's look at how Go does a graceful shutdown of an HTTP service, which has been supported since Go 1.8.
```go
// net/http/server.go #2487
func (srv *Server) Shutdown(ctx context.Context) error {
    atomic.AddInt32(&srv.inShutdown, 1)
    defer atomic.AddInt32(&srv.inShutdown, -1)

    srv.mu.Lock()
    // close the listeners
    lnerr := srv.closeListenersLocked()
    srv.closeDoneChanLocked()
    // run any developer-registered shutdown hooks
    for _, f := range srv.onShutdown {
        go f()
    }
    srv.mu.Unlock()

    // poll periodically for connections that have not closed yet
    ticker := time.NewTicker(shutdownPollInterval)
    defer ticker.Stop()
    for {
        if srv.closeIdleConns() {
            return lnerr
        }
        select {
        case <-ctx.Done():
            return ctx.Err()
        case <-ticker.C:
        }
    }
}
```
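For completeness, here is the typical way Shutdown is wired up to a signal; this is a sketch of standard library usage, not istio code:

```go
package main

import (
    "context"
    "log"
    "net/http"
    "os"
    "os/signal"
    "syscall"
    "time"
)

func main() {
    srv := &http.Server{Addr: ":8080"}

    go func() {
        sigs := make(chan os.Signal, 1)
        signal.Notify(sigs, syscall.SIGINT, syscall.SIGTERM)
        <-sigs
        // give in-flight requests up to 30s to finish
        ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
        defer cancel()
        if err := srv.Shutdown(ctx); err != nil {
            log.Printf("forced shutdown: %v", err)
        }
    }()

    // ErrServerClosed is returned after Shutdown, i.e. a clean exit
    if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
        log.Fatalf("listen: %v", err)
    }
}
```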
In fact, Go's shutdown mechanism is very similar to the graceful shutdown mechanism discussed for envoy on GitHub.
Go's mechanism:
- Close the listener (the ln in ln, err := net.Listen("tcp", addr)) so that no new connections are accepted
- Periodically check whether any connections remain open
- Once all connections have closed, the service exits
Envoy's mechanism:
- Ingress listeners stop accepting new connections (clients see TCP connection refused) or continue to service existing connections; egress listeners are completely unaffected
- A configurable delay allows the workload to finish servicing existing connections
- envoy (and the workload) both terminate