The agent and the manager communicate through a session. The agent.session struct is defined as follows:
    // session encapsulates one round of registration with the manager. session
    // starts the registration and heartbeat control cycle. Any failure will result
    // in a complete shutdown of the session and it must be reestablished.
    //
    // All communication with the master is done through session. Changes that
    // flow into the agent, such as task assignment, are called back into the
    // agent through errs, messages and tasks.
    type session struct {
        agent     *Agent
        sessionID string
        session   api.Dispatcher_SessionClient
        errs      chan error
        messages  chan *api.SessionMessage
        tasks     chan *api.TasksMessage

        registered chan struct{} // closed registration
        closed     chan struct{}
    }
(1) The registered channel signals that the agent has registered successfully with the manager:
    func (s *session) run(ctx context.Context, delay time.Duration) {
        time.Sleep(delay) // delay before registering.

        if err := s.start(ctx); err != nil {
            select {
            case s.errs <- err:
            case <-s.closed:
            case <-ctx.Done():
            }
            return
        }

        ctx = log.WithLogger(ctx, log.G(ctx).WithField("session.id", s.sessionID))

        go runctx(ctx, s.closed, s.errs, s.heartbeat)
        go runctx(ctx, s.closed, s.errs, s.watch)
        go runctx(ctx, s.closed, s.errs, s.listen)

        close(s.registered)
    }
In session.run(), if session.start() succeeds, the function closes the registered channel as its last step. Agent.run() waits on that channel:
    func (a *Agent) run(ctx context.Context) {
        .....
        session = newSession(ctx, a, backoff) // start the initial session
        registered = session.registered
        for {
            select {
            ......
            case <-registered:
                log.G(ctx).Debugln("agent: registered")
                if ready != nil {
                    close(ready)
                }
                ready = nil
                registered = nil // we only care about this once per session
                backoff = 0      // reset backoff
                sessionq = a.sessionq
            ......
            }
        }
    }
Once the registered channel is closed, the <-registered case fires immediately.
(2) When an error occurs while the session is running, the error is sent to the errs channel. Agent.run() handles it as follows:
    case err := <-session.errs:
        // TODO(stevvooe): This may actually block if a session is closed
        // but no error was sent. Session.close must only be called here
        // for this to work.
        if err != nil {
            log.G(ctx).WithError(err).Error("agent: session failed")
            backoff = initialSessionFailureBackoff + 2*backoff
            if backoff > maxSessionFailureBackoff {
                backoff = maxSessionFailureBackoff
            }
        }

        if err := session.close(); err != nil {
            log.G(ctx).WithError(err).Error("agent: closing session failed")
        }
        sessionq = nil
        // if we're here before <-registered, do nothing for that event
        registered = nil

        // Bounce the connection.
        if a.config.Picker != nil {
            a.config.Picker.Reset()
        }
Once an error is received, the backoff is increased (up to a maximum), the session is closed, and some cleanup is performed.
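The backoff update in the error branch can be examined on its own. The sketch below copies the formula from the excerpt into a hypothetical helper named nextBackoff; the two constant values are assumptions chosen for illustration, since the article does not state them.

```go
package main

import (
	"fmt"
	"time"
)

// Assumed values for illustration only; the article does not give them.
const (
	initialSessionFailureBackoff = 100 * time.Millisecond
	maxSessionFailureBackoff     = 8 * time.Second
)

// nextBackoff applies the same update as the error branch of the agent loop:
// roughly double the previous value, add the initial step, and cap the result.
func nextBackoff(prev time.Duration) time.Duration {
	next := initialSessionFailureBackoff + 2*prev
	if next > maxSessionFailureBackoff {
		next = maxSessionFailureBackoff
	}
	return next
}

func main() {
	var backoff time.Duration // reset to 0 after a successful registration
	for i := 1; i <= 8; i++ {
		backoff = nextBackoff(backoff)
		fmt.Printf("failure %d: backoff %v\n", i, backoff)
	}
}
```

With these assumed constants the backoff grows as 100ms, 300ms, 700ms, 1.5s, 3.1s, 6.3s and then stays at the 8s cap.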
(3) The messages channel receives the messages that the manager sends to the agent and hands them to Agent.run() for processing:
    case msg := <-session.messages:
        if err := a.handleSessionMessage(ctx, msg); err != nil {
            log.G(ctx).WithError(err).Error("session message handler failed")
        }
(4) The tasks channel receives the tasks that the manager has assigned to this node. These are likewise handed to Agent.run() for processing:
    case msg := <-session.tasks:
        if err := a.worker.Assign(ctx, msg.Tasks); err != nil {
            log.G(ctx).WithError(err).Error("task assignment failed")
        }
(5) The closed channel is closed in session.close(), which is called from the case err := <-session.errs: branch. Once the closed channel is closed, a new session is established:
    case <-session.closed:
        log.G(ctx).Debugf("agent: rebuild session")

        // select a session registration delay from backoff range.
        delay := time.Duration(rand.Int63n(int64(backoff)))
        session = newSession(ctx, a, delay)
        registered = session.registered
        sessionq = a.sessionq
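The delay is drawn uniformly from the range [0, backoff), which spreads out reconnect attempts from many agents. A minimal sketch of that step follows; registrationDelay is a hypothetical helper name. Note that rand.Int63n panics when its argument is not positive, so the sketch adds a guard for a zero backoff. Whether the original loop can reach this branch with a zero backoff is not visible from the excerpt.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// registrationDelay returns a random duration in [0, backoff).
// The zero guard is an addition of this sketch: rand.Int63n(0) would panic.
func registrationDelay(backoff time.Duration) time.Duration {
	if backoff <= 0 {
		return 0
	}
	return time.Duration(rand.Int63n(int64(backoff)))
}

func main() {
	for _, b := range []time.Duration{0, 100 * time.Millisecond, 8 * time.Second} {
		fmt.Printf("backoff %v -> delay %v\n", b, registrationDelay(b))
	}
}
```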
Now look at the session.start() function:
    // start begins the session and returns the first SessionMessage.
    func (s *session) start(ctx context.Context) error {
        log.G(ctx).Debugf("(*session).start")

        client := api.NewDispatcherClient(s.agent.config.Conn)

        description, err := s.agent.config.Executor.Describe(ctx)
        if err != nil {
            log.G(ctx).WithError(err).WithField("executor", s.agent.config.Executor).
                Errorf("node description unavailable")
            return err
        }
        // Override hostname
        if s.agent.config.Hostname != "" {
            description.Hostname = s.agent.config.Hostname
        }

        errChan := make(chan error, 1)
        var (
            msg    *api.SessionMessage
            stream api.Dispatcher_SessionClient
        )
        // Note: we don't defer cancellation of this context, because the
        // streaming RPC is used after this function returned. We only cancel
        // it in the timeout case to make sure the goroutine completes.
        sessionCtx, cancelSession := context.WithCancel(ctx)

        // Need to run Session in a goroutine since there's no way to set a
        // timeout for an individual Recv call in a stream.
        go func() {
            stream, err = client.Session(sessionCtx, &api.SessionRequest{
                Description: description,
            })
            if err != nil {
                errChan <- err
                return
            }

            msg, err = stream.Recv()
            errChan <- err
        }()

        select {
        case err := <-errChan:
            if err != nil {
                return err
            }
        case <-time.After(dispatcherRPCTimeout):
            cancelSession()
            return errors.New("session initiation timed out")
        }

        s.sessionID = msg.SessionID
        s.session = stream

        return s.handleSessionMessage(ctx, msg)
    }
(1)
    client := api.NewDispatcherClient(s.agent.config.Conn)

    description, err := s.agent.config.Executor.Describe(ctx)
    if err != nil {
        log.G(ctx).WithError(err).WithField("executor", s.agent.config.Executor).
            Errorf("node description unavailable")
        return err
    }
    // Override hostname
    if s.agent.config.Hostname != "" {
        description.Hostname = s.agent.config.Hostname
    }
api.NewDispatcherClient() and the type it returns are defined as follows:
    type dispatcherClient struct {
        cc *grpc.ClientConn
    }

    func NewDispatcherClient(cc *grpc.ClientConn) DispatcherClient {
        return &dispatcherClient{cc}
    }
s.agent.config.Conn is the gRPC connection to the manager, obtained earlier in Node.runAgent() with the following code:
    conn, err := grpc.Dial(manager.Addr,
        grpc.WithPicker(picker),
        grpc.WithTransportCredentials(creds),
        grpc.WithBackoffMaxDelay(maxSessionFailureBackoff))
s.agent.config.Executor.Describe() returns a description of the current node (type: *api.NodeDescription).
(2)
    errChan := make(chan error, 1)
    var (
        msg    *api.SessionMessage
        stream api.Dispatcher_SessionClient
    )
    // Note: we don't defer cancellation of this context, because the
    // streaming RPC is used after this function returned. We only cancel
    // it in the timeout case to make sure the goroutine completes.
    sessionCtx, cancelSession := context.WithCancel(ctx)

    // Need to run Session in a goroutine since there's no way to set a
    // timeout for an individual Recv call in a stream.
    go func() {
        stream, err = client.Session(sessionCtx, &api.SessionRequest{
            Description: description,
        })
        if err != nil {
            errChan <- err
            return
        }

        msg, err = stream.Recv()
        errChan <- err
    }()
The code of dispatcherClient.Session() is as follows:
    func (c *dispatcherClient) Session(ctx context.Context, in *SessionRequest, opts ...grpc.CallOption) (Dispatcher_SessionClient, error) {
        stream, err := grpc.NewClientStream(ctx, &_Dispatcher_serviceDesc.Streams[0], c.cc, "/docker.swarmkit.v1.Dispatcher/Session", opts...)
        if err != nil {
            return nil, err
        }
        x := &dispatcherSessionClient{stream}
        if err := x.ClientStream.SendMsg(in); err != nil {
            return nil, err
        }
        if err := x.ClientStream.CloseSend(); err != nil {
            return nil, err
        }
        return x, nil
    }
It returns a value of the Dispatcher_SessionClient interface type:
    type Dispatcher_SessionClient interface {
        Recv() (*SessionMessage, error)
        grpc.ClientStream
    }
grpc.NewClientStream() returns a grpc.ClientStream interface value, and dispatcherSessionClient is defined as follows:
    type dispatcherSessionClient struct {
        grpc.ClientStream
    }
To satisfy the Dispatcher_SessionClient interface definition, the dispatcherSessionClient struct also implements the Recv method:
    func (x *dispatcherSessionClient) Recv() (*SessionMessage, error) {
        m := new(SessionMessage)
        if err := x.ClientStream.RecvMsg(m); err != nil {
            return nil, err
        }
        return m, nil
    }
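The pattern here is interface embedding: the struct embeds the generic stream interface, so the stream's methods are promoted, and only the typed Recv method has to be written by hand. The following is a stripped-down sketch with no gRPC dependency. Stream, MessageClient, messageClient and fakeStream are hypothetical stand-ins for grpc.ClientStream, Dispatcher_SessionClient, dispatcherSessionClient and the real transport.

```go
package main

import (
	"errors"
	"fmt"
)

// Stream plays the role of grpc.ClientStream, reduced to one method.
type Stream interface {
	RecvMsg(m *string) error
}

// MessageClient plays the role of Dispatcher_SessionClient:
// a typed Recv plus everything the generic stream offers.
type MessageClient interface {
	Recv() (string, error)
	Stream
}

// messageClient embeds Stream, so RecvMsg is promoted automatically.
type messageClient struct {
	Stream
}

// Recv is the only method that must be added to satisfy MessageClient.
func (c *messageClient) Recv() (string, error) {
	var m string
	if err := c.Stream.RecvMsg(&m); err != nil {
		return "", err
	}
	return m, nil
}

// fakeStream is an in-memory transport used only for this demonstration.
type fakeStream struct{ queue []string }

func (f *fakeStream) RecvMsg(m *string) error {
	if len(f.queue) == 0 {
		return errors.New("stream closed")
	}
	*m, f.queue = f.queue[0], f.queue[1:]
	return nil
}

// Compile-time check that *messageClient satisfies MessageClient.
var _ MessageClient = (*messageClient)(nil)

func main() {
	var c MessageClient = &messageClient{&fakeStream{queue: []string{"hello"}}}
	fmt.Println(c.Recv()) // first call yields the queued message
	fmt.Println(c.Recv()) // second call yields the stream error
}
```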
x.ClientStream.SendMsg() sends the SessionRequest, which contains only a NodeDescription:
    // SessionRequest starts a session.
    type SessionRequest struct {
        Description *NodeDescription `protobuf:"bytes,1,opt,name=description" json:"description,omitempty"`
    }
x.ClientStream.CloseSend() indicates that all send operations have completed.
Next, the goroutine receives the first message from the manager and sends err to errChan:
    msg, err = stream.Recv()
    errChan <- err
(3)
    select {
    case err := <-errChan:
        if err != nil {
            return err
        }
    case <-time.After(dispatcherRPCTimeout):
        cancelSession()
        return errors.New("session initiation timed out")
    }

    s.sessionID = msg.SessionID
    s.session = stream

    return s.handleSessionMessage(ctx, msg)
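The calling goroutine initially blocks in select. Once a valid response is received, session initialization is complete; if nothing arrives within dispatcherRPCTimeout, the session context is cancelled and start() returns an error. After a successful start, the session waits for the manager to assign tasks.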
Once session.start() succeeds, three more goroutines are launched:
    go runctx(ctx, s.closed, s.errs, s.heartbeat)
    go runctx(ctx, s.closed, s.errs, s.watch)
    go runctx(ctx, s.closed, s.errs, s.listen)
session.heartbeat() creates a new dispatcherClient and sends the first api.HeartbeatRequest after 1 second. The manager replies with an api.HeartbeatResponse that tells the agent how often to send heartbeats; the default period is currently 5 seconds.
session.watch() creates a new dispatcherTasksClient and sends an api.TasksRequest to tell the manager that the agent is ready. It then blocks in Recv(), waiting for the manager to send tasks.
session.listen() reuses the session.session variable: it blocks in Recv(), waiting for the manager to send a SessionMessage, and then processes it.