Communication between the Agent and the manager is carried out through a session. The following is the definition of the agent's session structure:
    // session encapsulates one round of registration with the manager. session
    // starts the registration and heartbeat control cycle. Any failure will result
    // in a complete shutdown of the session and it must be reestablished.
    //
    // All communication with the master is done through session. Changes that
    // flow into the agent, such as task assignment, are called back into the
    // agent through errs, messages and tasks.
    type session struct {
        agent     *Agent
        sessionID string
        session   api.Dispatcher_SessionClient
        errs      chan error
        messages  chan *api.SessionMessage
        tasks     chan *api.TasksMessage

        registered chan struct{} // closed registration
        closed     chan struct{}
    }
(1) The registered channel is used to signal that the agent has successfully registered with the manager:
    func (s *session) run(ctx context.Context, delay time.Duration) {
        time.Sleep(delay) // delay before registering.

        if err := s.start(ctx); err != nil {
            select {
            case s.errs <- err:
            case <-s.closed:
            case <-ctx.Done():
            }
            return
        }

        ctx = log.WithLogger(ctx, log.G(ctx).WithField("session.id", s.sessionID))

        go runctx(ctx, s.closed, s.errs, s.heartbeat)
        go runctx(ctx, s.closed, s.errs, s.watch)
        go runctx(ctx, s.closed, s.errs, s.listen)

        close(s.registered)
    }
In the session.run function, if session.start() runs without error, the registered channel is closed as the last step. And in Agent.run():
    func (a *Agent) run(ctx context.Context) {
        .....
        session = newSession(ctx, a, backoff) // start the initial session
        registered = session.registered
        for {
            select {
            ......
            case <-registered:
                log.G(ctx).Debugln("agent: registered")
                if ready != nil {
                    close(ready)
                }
                ready = nil
                registered = nil // we only care about this once per session
                backoff = 0      // reset backoff
                sessionq = a.sessionq
            ......
            }
        }
    }
Once the registered channel is closed, the <-registered case is executed immediately.
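The handling of registered relies on two channel idioms: a receive from a closed channel never blocks, and a nil channel is never ready in a select. A minimal sketch follows; countRegistered is a hypothetical helper written for this illustration, not part of the agent code.

```go
package main

import "fmt"

// countRegistered polls a select loop for the given number of rounds and
// counts how often the "registered" case fires. It is a hypothetical helper
// that only mirrors the shape of the agent loop.
func countRegistered(registered <-chan struct{}, rounds int) int {
	fired := 0
	for i := 0; i < rounds; i++ {
		select {
		case <-registered:
			fired++
			// A nil channel is never ready, so this case is now disabled.
			registered = nil
		default:
			// Nothing is ready in this round.
		}
	}
	return fired
}

func main() {
	registered := make(chan struct{})
	close(registered) // receiving from a closed channel never blocks
	fmt.Println(countRegistered(registered, 5))
}
```

Without the assignment to nil, the closed channel would stay ready and the case would fire on every round, which is why the agent clears registered after the first notification.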
(2) When an error occurs while the session is running, the error is sent to the errs channel. In Agent.run():
    case err := <-session.errs:
        // TODO(stevvooe): This may actually block if a session is closed
        // but no error was sent. Session.close must only be called here
        // for this to work.
        if err != nil {
            log.G(ctx).WithError(err).Error("agent: session failed")
            backoff = initialSessionFailureBackoff + 2*backoff
            if backoff > maxSessionFailureBackoff {
                backoff = maxSessionFailureBackoff
            }
        }

        if err := session.close(); err != nil {
            log.G(ctx).WithError(err).Error("agent: closing session failed")
        }
        sessionq = nil
        // if we're here before <-registered, do nothing for that event
        registered = nil

        // Bounce the connection.
        if a.config.Picker != nil {
            a.config.Picker.Reset()
        }
Once an error is received, the session is closed and some cleanup work is done.
(3) The messages channel receives the messages that the manager sends to the agent and passes them to the Agent.run() function for processing:
    case msg := <-session.messages:
        if err := a.handleSessionMessage(ctx, msg); err != nil {
            log.G(ctx).WithError(err).Error("session message handler failed")
        }
(4) The tasks channel receives the tasks that the manager sends to the agent to run on this node. These tasks likewise need to be passed to the Agent.run() function for processing:
    case msg := <-session.tasks:
        if err := a.worker.Assign(ctx, msg.Tasks); err != nil {
            log.G(ctx).WithError(err).Error("task assignment failed")
        }
(5) The closed channel is closed in the session.close() function, which is executed in the case err := <-session.errs: branch. Once the closed channel is closed, the connection is re-established:
    case <-session.closed:
        log.G(ctx).Debugf("agent: rebuild session")

        // select a session registration delay from backoff range.
        delay := time.Duration(rand.Int63n(int64(backoff)))
        session = newSession(ctx, a, delay)
        registered = session.registered
        sessionq = a.sessionq
Now look at the session.start() function:
    // start begins the session and returns the first SessionMessage.
    func (s *session) start(ctx context.Context) error {
        log.G(ctx).Debugf("(*session).start")

        client := api.NewDispatcherClient(s.agent.config.Conn)

        description, err := s.agent.config.Executor.Describe(ctx)
        if err != nil {
            log.G(ctx).WithError(err).WithField("executor", s.agent.config.Executor).
                Errorf("node description unavailable")
            return err
        }
        // Override hostname
        if s.agent.config.Hostname != "" {
            description.Hostname = s.agent.config.Hostname
        }

        errChan := make(chan error, 1)
        var (
            msg    *api.SessionMessage
            stream api.Dispatcher_SessionClient
        )
        // Note: we don't defer cancellation of this context, because the
        // streaming RPC is used after this function returned. We only cancel
        // it in the timeout case to make sure the goroutine completes.
        sessionCtx, cancelSession := context.WithCancel(ctx)

        // Need to run Session in a goroutine since there's no way to set a
        // timeout for an individual Recv call in a stream.
        go func() {
            stream, err = client.Session(sessionCtx, &api.SessionRequest{
                Description: description,
            })
            if err != nil {
                errChan <- err
                return
            }

            msg, err = stream.Recv()
            errChan <- err
        }()

        select {
        case err := <-errChan:
            if err != nil {
                return err
            }
        case <-time.After(dispatcherRPCTimeout):
            cancelSession()
            return errors.New("session initiation timed out")
        }

        s.sessionID = msg.SessionID
        s.session = stream

        return s.handleSessionMessage(ctx, msg)
    }
(1)
    client := api.NewDispatcherClient(s.agent.config.Conn)

    description, err := s.agent.config.Executor.Describe(ctx)
    if err != nil {
        log.G(ctx).WithError(err).WithField("executor", s.agent.config.Executor).
            Errorf("node description unavailable")
        return err
    }
    // Override hostname
    if s.agent.config.Hostname != "" {
        description.Hostname = s.agent.config.Hostname
    }
The definition of the api.NewDispatcherClient() function and of the type it returns is as follows:
    type dispatcherClient struct {
        cc *grpc.ClientConn
    }

    func NewDispatcherClient(cc *grpc.ClientConn) DispatcherClient {
        return &dispatcherClient{cc}
    }
s.agent.config.Conn is the gRPC connection to the manager, obtained earlier in the Node.runAgent() function through the following code:
    conn, err := grpc.Dial(manager.Addr,
        grpc.WithPicker(picker),
        grpc.WithTransportCredentials(creds),
        grpc.WithBackoffMaxDelay(maxSessionFailureBackoff))
s.agent.config.Executor.Describe() returns a description of the current node (type: *api.NodeDescription).
(2)
    errChan := make(chan error, 1)
    var (
        msg    *api.SessionMessage
        stream api.Dispatcher_SessionClient
    )
    // Note: we don't defer cancellation of this context, because the
    // streaming RPC is used after this function returned. We only cancel
    // it in the timeout case to make sure the goroutine completes.
    sessionCtx, cancelSession := context.WithCancel(ctx)

    // Need to run Session in a goroutine since there's no way to set a
    // timeout for an individual Recv call in a stream.
    go func() {
        stream, err = client.Session(sessionCtx, &api.SessionRequest{
            Description: description,
        })
        if err != nil {
            errChan <- err
            return
        }

        msg, err = stream.Recv()
        errChan <- err
    }()
The code of dispatcherClient.Session() is as follows:
    func (c *dispatcherClient) Session(ctx context.Context, in *SessionRequest, opts ...grpc.CallOption) (Dispatcher_SessionClient, error) {
        stream, err := grpc.NewClientStream(ctx, &_Dispatcher_serviceDesc.Streams[0], c.cc, "/docker.swarmkit.v1.Dispatcher/Session", opts...)
        if err != nil {
            return nil, err
        }
        x := &dispatcherSessionClient{stream}
        if err := x.ClientStream.SendMsg(in); err != nil {
            return nil, err
        }
        if err := x.ClientStream.CloseSend(); err != nil {
            return nil, err
        }
        return x, nil
    }
It returns a value that satisfies the Dispatcher_SessionClient interface:
    type Dispatcher_SessionClient interface {
        Recv() (*SessionMessage, error)
        grpc.ClientStream
    }
The grpc.NewClientStream() function returns a grpc.ClientStream interface, and dispatcherSessionClient is defined as follows:
    type dispatcherSessionClient struct {
        grpc.ClientStream
    }
To satisfy the Dispatcher_SessionClient interface
definition, the dispatcherSessionClient
struct also implements the Recv
method:
    func (x *dispatcherSessionClient) Recv() (*SessionMessage, error) {
        m := new(SessionMessage)
        if err := x.ClientStream.RecvMsg(m); err != nil {
            return nil, err
        }
        return m, nil
    }
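The wrapper works because embedding an interface in a struct promotes the interface's methods, so only the typed Recv method has to be written by hand. The following sketch reproduces the pattern without gRPC; Stream, TypedStream, typedStream and fakeStream are hypothetical stand-ins, not the generated types.

```go
package main

import "fmt"

// Stream is a hypothetical stand-in for grpc.ClientStream.
type Stream interface {
	RecvMsg(m *string) error
}

// TypedStream is a hypothetical stand-in for Dispatcher_SessionClient:
// a typed Recv method plus everything the embedded Stream offers.
type TypedStream interface {
	Recv() (string, error)
	Stream
}

// typedStream embeds Stream, so RecvMsg is promoted automatically.
type typedStream struct {
	Stream
}

// Recv adds the one method that the embedded interface does not provide.
func (t *typedStream) Recv() (string, error) {
	var m string
	if err := t.Stream.RecvMsg(&m); err != nil {
		return "", err
	}
	return m, nil
}

// fakeStream is a trivial in-memory Stream used only for the demonstration.
type fakeStream struct{ next string }

func (f fakeStream) RecvMsg(m *string) error {
	*m = f.next
	return nil
}

func main() {
	var c TypedStream = &typedStream{fakeStream{next: "session-1"}}
	msg, err := c.Recv()
	fmt.Println(msg, err)
}
```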
x.ClientStream.SendMsg() sends the SessionRequest, which contains only a NodeDescription:
    // SessionRequest starts a session.
    type SessionRequest struct {
        Description *NodeDescription `protobuf:"bytes,1,opt,name=description" json:"description,omitempty"`
    }
x.ClientStream.CloseSend() indicates that all send operations have completed.
Next, the message from the manager is received, and err is sent to errChan:
    msg, err = stream.Recv()
    errChan <- err
(3)
    select {
    case err := <-errChan:
        if err != nil {
            return err
        }
    case <-time.After(dispatcherRPCTimeout):
        cancelSession()
        return errors.New("session initiation timed out")
    }

    s.sessionID = msg.SessionID
    s.session = stream

    return s.handleSessionMessage(ctx, msg)
The calling goroutine initially blocks in select. Once the correct response is received, the initialization of the session is complete, and the session then goes on to wait for the manager to assign tasks.
Once session.start() succeeds, 3 more goroutines are launched:
    go runctx(ctx, s.closed, s.errs, s.heartbeat)
    go runctx(ctx, s.closed, s.errs, s.watch)
    go runctx(ctx, s.closed, s.errs, s.listen)
session.heartbeat() creates a new dispatcherClient variable and sends an api.HeartbeatRequest after 1 second. The manager returns an api.HeartbeatResponse that tells the agent how often to send heartbeats; the default period is currently 5 seconds.
session.watch() creates a new dispatcherTasksClient variable and then sends an api.TasksRequest to inform the manager that the agent is ready. It then blocks in the Recv() function, waiting for the manager to send tasks.
session.listen() reuses the session.session variable, blocks in the Recv() function waiting for the manager to send a SessionMessage, and then processes it.