Docker Swarm principles demystified

Tags: docker, swarm, etcd

Docker Swarm is a cluster management and scheduling tool for Docker, released by Docker Inc. in 2014; the official documentation is at https://docs.docker.com/swarm/overview/. Swarm builds a Docker cluster out of many hosts, and users can manage the Docker engines on all of those hosts through the Swarm API alone; combined with an overlay network, this enables simple container scheduling and cross-host access between containers.

Docker Swarm follows a pluggable design philosophy: installing a cluster only requires starting a few Docker containers. The installation process can be found here: http://www.tuicool.com/articles/UJJJFjU.

Swarm's features can be summarized as follows:
1. Registration and discovery of worker nodes
2. The management node collects all the information in the cluster
3. The management node supports HA
4. The management node can detect changes in the cluster and reschedule containers after a node goes down
5. Filter and scheduler scheduling strategies for containers in the cluster

Below, this article walks through the source code to show how Swarm implements each of these features.

First, the overall architecture diagram (the diagram comes from DaoCloud): http://blog.daocloud.io/wp-content/uploads/2015/01/swarmarchitecture.jpg

Registration and discovery of worker nodes

When a worker node starts, it registers itself in the KV store backend, at a path like etcd://ip:2376/docker/swarm/nodeip; the worker writes its eth0 IP for the current cluster into etcd and sets a TTL on the entry, for example 3 seconds. It then starts a for loop that re-registers every 2 seconds (the configurable heartbeat), so if the entry disappears from etcd it means the worker is dead.

for {
    log.WithFields(log.Fields{"addr": addr, "discovery": dflag}).Infof("Registering on the discovery service every %s...", hb)
    if err := d.Register(addr); err != nil {
        log.Error(err)
    }
    time.Sleep(hb)
}
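
For intuition, here is a minimal sketch of what one registration step against etcd could look like using the libkv library that Swarm's discovery backends are built on. The endpoint, key layout, TTL and heartbeat values are illustrative assumptions, not Swarm's exact code:

package main

import (
    "log"
    "time"

    "github.com/docker/libkv"
    "github.com/docker/libkv/store"
    "github.com/docker/libkv/store/etcd"
)

func main() {
    // Register the etcd backend with libkv.
    etcd.Register()

    // Connect to the etcd endpoint used as the discovery backend (assumed address).
    kv, err := libkv.NewStore(store.ETCD, []string{"127.0.0.1:2379"}, &store.Config{})
    if err != nil {
        log.Fatal(err)
    }

    addr := "192.168.1.10:2376"         // the worker's advertised address (illustrative)
    key := "docker/swarm/nodes/" + addr // assumed discovery path layout
    ttl := 3 * time.Second              // entry expires if it is not refreshed
    heartbeat := 2 * time.Second        // re-register before the TTL expires

    for {
        // Each Put refreshes the TTL; if the worker dies, the key simply expires.
        if err := kv.Put(key, []byte(addr), &store.WriteOptions{TTL: ttl}); err != nil {
            log.Println("register error:", err)
        }
        time.Sleep(heartbeat)
    }
}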

The manager's leader starts a goroutine that watches the registered IPs in the backend KV store: when a new node registers, it is added to the nodes held in the manager's memory and the manager starts collecting its data; when a node dies, it is removed.

discoveryCh, errCh := cluster.discovery.Watch(nil)
go cluster.monitorDiscovery(discoveryCh, errCh)
go cluster.monitorPendingEngines()


for {
    select {
    case entries := <-ch:
        added, removed := currentEntries.Diff(entries)
        currentEntries = entries

        // Remove engines first. `addEngine` will refuse to add an engine
        // if there's already an engine with the same ID. If an engine
        // changes address, we have to remove it first, then add it back.
        for _, entry := range removed {
            c.removeEngine(entry.String())
        }

        for _, entry := range added {
            c.addEngine(entry.String())
        }
    case err := <-errCh:
        log.Errorf("Discovery error: %v", err)
    }
}
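
To make the watch mechanism concrete, here is a small sketch of my own (not Swarm's code) that watches a discovery prefix with libkv's WatchTree, reusing the kv store handle from the registration sketch above; the prefix is again an assumption. Every time the set of registered nodes changes, the channel delivers the full list, which is what lets the manager diff it against its in-memory view:

// watchNodes sketches how a manager could watch the discovery prefix with libkv.
func watchNodes(kv store.Store) {
    stopCh := make(chan struct{})
    events, err := kv.WatchTree("docker/swarm/nodes", stopCh)
    if err != nil {
        log.Fatal(err)
    }
    for pairs := range events {
        addrs := make([]string, 0, len(pairs))
        for _, p := range pairs {
            addrs = append(addrs, p.Key)
        }
        // A real manager would diff this list against its known engines,
        // removing engines that disappeared and adding new ones.
        log.Println("current nodes:", addrs)
    }
}
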
The management node collects all information in the cluster

The management node collects information from all hosts in the cluster into memory. When a host joins the Swarm, the manager first collects all of the node's information into memory, and then creates a Docker client connection through which it receives updates from the host via the events API.

The code that joins a host first performs a full synchronization of the host's state, and then starts the events monitor to watch the host's events:

e.eventsMonitor = NewEventsMonitor(e.apiClient, e.handler)

// Fetch the engine labels.
if err := e.updateSpecs(); err != nil {
    return err
}

e.StartMonitorEvents()

// Force a state update before returning.
if err := e.RefreshContainers(true); err != nil {
    return err
}

if err := e.RefreshImages(); err != nil {
    return err
}

// Do not check error as older daemon doesn't support this call.
e.RefreshVolumes()
e.RefreshNetworks()
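
For comparison, a minimal standalone sketch of subscribing to a single host's event stream with the official Docker Go client; this is my illustration of the events API the manager relies on, not Swarm's own code, and the endpoint, API version, and client package version are assumptions:

package main

import (
    "context"
    "log"

    "github.com/docker/docker/api/types"
    "github.com/docker/docker/client"
)

func main() {
    // Connect to a single Docker host (endpoint and API version are assumed).
    cli, err := client.NewClient("tcp://192.168.1.10:2376", "1.24", nil, nil)
    if err != nil {
        log.Fatal(err)
    }

    // Events returns a message channel and an error channel.
    msgs, errs := cli.Events(context.Background(), types.EventsOptions{})
    for {
        select {
        case m := <-msgs:
            // Swarm's handler switches on m.Type / m.Action to decide what to refresh.
            log.Printf("event: type=%s action=%s id=%s", m.Type, m.Action, m.ID)
        case err := <-errs:
            log.Println("event stream error:", err)
            return
        }
    }
}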

The event handler updates the corresponding kind of data according to the event's category. The full handler is a bit long because it has to deal with Docker event compatibility, so only an excerpt is shown here:

switch msg.Type {
case "network":
    e.refreshNetwork(msg.Actor.ID)
case "volume":
    e.refreshVolume(msg.Actor.ID)
case "image":
    e.RefreshImages()
case "container":
    action := msg.Action
    // Healthcheck events are like 'health_status: unhealthy'
    if strings.HasPrefix(action, "health_status") {
        action = "health_status"
    }
    switch action {
    case "commit":
        // Commit on a container will generate a new image
        e.RefreshImages()
    case "die", "kill", "oom", "pause", "start", "restart", "stop", "unpause", "rename", "update", "health_status":
        e.refreshContainer(msg.ID, true)
    case "top", "resize", "export", "exec_create", "exec_start", "exec_detach", "attach", "detach", "extract-to-dir", "copy", "archive-path":
        // No action needed
    default:
        e.refreshContainer(msg.ID, false)
    }
}
The management node supports HA

Like many other distributed projects, Docker Swarm achieves HA for the management node through leader election, built on the KV store via the docker/leadership library; let's look at its implementation.

First, the candidate and follower are created; note that the leader election path is docker/swarm/leader:

client := kvDiscovery.Store()
p := path.Join(kvDiscovery.Prefix(), leaderElectionPath)

candidate := leadership.NewCandidate(client, p, addr, leaderTTL)
follower := leadership.NewFollower(client, p)

Then two goroutines are started. One runs the election: if it wins, this manager becomes the leader. The other watches the election results: if it sees another manager become the leader, it puts itself into replica (candidate) mode and proxies any API requests it receives to the real leader.

primary := api.NewPrimary(cluster, tlsConfig, &statusHandler{cluster, candidate, follower}, c.GlobalBool("debug"), c.Bool("cors"))
replica := api.NewReplica(primary, tlsConfig)

go func() {
    for {
        run(cluster, candidate, server, primary, replica)
        time.Sleep(defaultRecoverTime)
    }
}()

go func() {
    for {
        follow(follower, replica, addr)
        time.Sleep(defaultRecoverTime)
    }
}()

server.SetHandler(primary)
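
The election itself is handled by the docker/leadership library on top of the KV store. Below is a rough, self-contained sketch of the run/follow pattern; the key and TTL are illustrative, and the channel-returning signatures match the library as Swarm used it at the time but may differ in other versions:

package main

import (
    "log"
    "time"

    "github.com/docker/leadership"
    "github.com/docker/libkv/store"
)

// electAndFollow shows the two halves of the HA pattern: one goroutine runs for
// leadership, the other watches who the current leader is.
func electAndFollow(kv store.Store, addr string) {
    key := "docker/swarm/leader" // election path, as noted above
    candidate := leadership.NewCandidate(kv, key, addr, 20*time.Second)
    follower := leadership.NewFollower(kv, key)

    go func() {
        electedCh, errCh := candidate.RunForElection()
        for {
            select {
            case isElected := <-electedCh:
                if isElected {
                    log.Println("we are now the leader: serve the primary API")
                } else {
                    log.Println("leadership lost: fall back to proxying requests")
                }
            case err := <-errCh:
                log.Println("election error:", err)
                return
            }
        }
    }()

    go func() {
        leaderCh, errCh := follower.FollowElection()
        for {
            select {
            case leader := <-leaderCh:
                if leader != "" && leader != addr {
                    log.Println("current leader is", leader, "- proxy API requests to it")
                }
            case err := <-errCh:
                log.Println("follow error:", err)
                return
            }
        }
    }()
}
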
The management node can detect cluster changes and reschedule containers after a node goes down

Because the worker periodically re-registers itself in the KV store, the manager notices almost immediately when a node goes down: the watch fires, removeEngine is triggered, and the containers that were on that node can then be rescheduled onto other nodes.

Filter and scheduler scheduling strategies for containers in the cluster

In fact, once all the cluster's node information is in memory, scheduling a container becomes simple. Swarm provides filters and schedulers so that users can define the scheduling policy.

Scheduling is essentially the policy that decides which node in the cluster a container is assigned to.
A filter specifies which nodes are excluded: a node that does not satisfy the filter's condition will not be assigned the container.
The scheduler (strategy) then ranks the nodes that passed the filters by some priority, and the container is assigned to the top-ranked node. A toy sketch of this two-phase idea follows below.
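
As a toy illustration of the two-phase idea (purely hypothetical types and values, not Swarm's actual Filter or Strategy interfaces): first drop nodes that fail a condition, then rank the survivors and pick the best one:

package main

import (
    "fmt"
    "sort"
)

// node is a hypothetical stand-in for a Swarm engine's state.
type node struct {
    Name       string
    Labels     map[string]string
    FreeMemory int64 // bytes
}

func main() {
    nodes := []node{
        {"node-1", map[string]string{"storage": "ssd"}, 4 << 30},
        {"node-2", map[string]string{"storage": "disk"}, 16 << 30},
        {"node-3", map[string]string{"storage": "ssd"}, 8 << 30},
    }

    // Filter phase: keep only nodes that satisfy a constraint (e.g. storage == ssd).
    var accepted []node
    for _, n := range nodes {
        if n.Labels["storage"] == "ssd" {
            accepted = append(accepted, n)
        }
    }

    // Strategy phase: rank the accepted nodes, e.g. most free memory first
    // (roughly what a spread-like strategy optimizes for).
    sort.Slice(accepted, func(i, j int) bool {
        return accepted[i].FreeMemory > accepted[j].FreeMemory
    })

    // The container is placed on the top-ranked node.
    fmt.Println("schedule container on:", accepted[0].Name)
}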

I will not repeat the various types of filters and schedulers here; see the official documentation: https://docs.docker.com/swarm/scheduler/rescheduling/#rescheduling-policies (it seems a rescheduling policy has recently been added).

The scheduling code is as follows:

accepted, err := filter.ApplyFilters(s.filters, config, nodes, soft)
if err != nil {
    return nil, err
}

if len(accepted) == 0 {
    return nil, errNoNodeAvailable
}

return s.strategy.RankAndSort(config, accepted)

When actually using Docker Swarm, you will find that its design still has some defects, which lead to limitations such as:

1. The worker's behavior is too simple: it only syncs its state to the KV store and starts containers, doing no real work itself and leaving everything to the manager, which is rather wasteful.
2. Because the worker "does nothing", the manager has to maintain a long-lived TCP connection to every node, so scalability is poor.
3. There is no replica control built in.

In conclusion, Swarm, as a first-generation Docker scheduling tool, provides basic scheduling capabilities and can satisfy some internal CI/CD use cases, but because of its poor scalability and lack of replica control it cannot be used directly to deploy production systems, which is a pity.
