"Kubernetes/k8s Source Analysis" Kube-scheduler Source analysis

Last Update:2018-07-17 Source: Internet

Author: User

Tags k8s


Objective
In the Kubernetes system, the scheduler is the only module implemented as a plugin. This pluggable design makes it easy to supply a user-defined scheduling algorithm, which is why its source lives under the plugin directory, in plugin/cmd and plugin/pkg/scheduler.

The scheduler is responsible for assigning each pod to a specific node. It watches pods through the interface provided by the API server, picks up pods that are waiting to be scheduled, evaluates every node against a series of predicate (pre-selection) and priority (scoring) policies, and binds the pod to the node with the highest score. The kubelet on that node is then responsible for creating the pod.

Kubernetes scheduling is divided into two phases, predicate (pre-selection) and priority (scoring):
Predicate: traverse all nodes and filter out those that do not satisfy the predicate policies. If no node satisfies them, the pod stays pending until some node meets all the policies.
Priority: score the nodes that passed the first phase according to the priority policies, and take the node with the highest score.
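The two phases can be illustrated with a small, self-contained sketch. This is not the scheduler's own code: the node type, the fitsCPU predicate, and the leastRequested priority are hypothetical names invented for the example, and the scoring rule is deliberately simplistic.

```go
package main

import "fmt"

// node is a hypothetical, simplified stand-in for a cluster node.
type node struct {
	name    string
	freeCPU int // free CPU in millicores
}

// predicate reports whether a node can run a pod requesting reqCPU millicores.
type predicate func(n node, reqCPU int) bool

// priority scores a feasible node; higher is better.
type priority func(n node, reqCPU int) int

// schedule filters nodes with every predicate, then returns the best-scoring one.
// ok is false when no node passes the predicates (the pod would stay pending).
func schedule(nodes []node, reqCPU int, preds []predicate, prios []priority) (best string, ok bool) {
	bestScore := -1
	for _, n := range nodes {
		feasible := true
		for _, p := range preds {
			if !p(n, reqCPU) {
				feasible = false
				break
			}
		}
		if !feasible {
			continue
		}
		score := 0
		for _, pr := range prios {
			score += pr(n, reqCPU)
		}
		if score > bestScore {
			best, bestScore, ok = n.name, score, true
		}
	}
	return best, ok
}

// fitsCPU is an illustrative predicate: the node must have enough free CPU.
func fitsCPU(n node, reqCPU int) bool { return n.freeCPU >= reqCPU }

// leastRequested is an illustrative priority: prefer nodes with more CPU left over.
func leastRequested(n node, reqCPU int) int { return n.freeCPU - reqCPU }

func main() {
	nodes := []node{{"node-a", 500}, {"node-b", 2000}, {"node-c", 1200}}
	host, ok := schedule(nodes, 1000, []predicate{fitsCPU}, []priority{leastRequested})
	fmt.Println(host, ok) // prints "node-b true"
}
```

Here node-a is removed by the predicate, and node-b wins the priority phase because it has the most CPU left after placing the pod.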

Data structure
In plugin/cmd/kube-scheduler/app/options.go, SchedulerServer defines the context and parameters required to run a scheduler.

// SchedulerServer has all context and params needed to run a Scheduler
type SchedulerServer struct {
       componentconfig.KubeSchedulerConfiguration
       // Master is the address of the Kubernetes API server (overrides any
       // value in kubeconfig).
       Master string
       // Kubeconfig is Path to kubeconfig file with authorization and master
       // location information.
       Kubeconfig string
       // Dynamic configuration for scheduler features.
}

Kubernetes Scheduler Start Process
One. Entry main function
The entry point is plugin/cmd/kube-scheduler/scheduler.go. It initializes a SchedulerServer, defines the command-line flags, and then hands the main logic over to Run, which is covered in part Two.

func main() {
       s := options.NewSchedulerServer()
       s.AddFlags(pflag.CommandLine)

       flag.InitFlags()
       logs.InitLogs()
       defer logs.FlushLogs()

       verflag.PrintAndExitIfRequested()

       if err := app.Run(s); err != nil {
              glog.Fatalf("scheduler app failed to run: %v", err)
       }
}

AddFlags defines a long list of command-line flags; only a few of them are shown here.

// AddFlags adds flags for a specific SchedulerServer to the specified FlagSet
func (s *SchedulerServer) AddFlags(fs *pflag.FlagSet) {
       fs.Int32Var(&s.Port, "port", s.Port, "The port that the scheduler's http service runs on")
       fs.StringVar(&s.Address, "address", s.Address, "The IP address to serve on (set to 0.0.0.0 for all interfaces)")
       fs.StringVar(&s.AlgorithmProvider, "algorithm-provider", s.AlgorithmProvider, "The scheduling algorithm provider to use, one of: "+factory.ListAlgorithmProviders())
       ...
       leaderelection.BindFlags(&s.LeaderElection, fs)
       utilfeature.DefaultFeatureGate.AddFlag(fs)
}
}
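The flag-binding pattern used by AddFlags (each field is bound to a flag, with the field's current value as the default) can be tried in isolation. The sketch below uses the standard library flag package in place of pflag so that it runs without third-party dependencies; serverOptions, newServerOptions, parseOptions, and the default values are hypothetical and only illustrate the pattern.

```go
package main

import (
	"flag"
	"fmt"
)

// serverOptions is a hypothetical, reduced stand-in for SchedulerServer.
type serverOptions struct {
	Port              int
	Address           string
	AlgorithmProvider string
}

// newServerOptions returns options pre-filled with illustrative defaults.
func newServerOptions() *serverOptions {
	return &serverOptions{Port: 10251, Address: "0.0.0.0", AlgorithmProvider: "DefaultProvider"}
}

// addFlags binds each field to a flag; the current field value is the default.
func (s *serverOptions) addFlags(fs *flag.FlagSet) {
	fs.IntVar(&s.Port, "port", s.Port, "port the HTTP service listens on")
	fs.StringVar(&s.Address, "address", s.Address, "IP address to serve on")
	fs.StringVar(&s.AlgorithmProvider, "algorithm-provider", s.AlgorithmProvider, "scheduling algorithm provider")
}

// parseOptions builds options from the defaults and then applies args.
func parseOptions(args []string) (*serverOptions, error) {
	s := newServerOptions()
	fs := flag.NewFlagSet("scheduler-sketch", flag.ContinueOnError)
	s.addFlags(fs)
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	return s, nil
}

func main() {
	s, err := parseOptions([]string{"--port=8080"})
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(s.Port, s.Address, s.AlgorithmProvider)
}
```

Because the defaults come from the struct itself, any flag that is not passed on the command line keeps the value set by the constructor.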

Two. Run function
Path: plugin/cmd/kube-scheduler/app/server.go

func Run(s *options.SchedulerServer) error

2.1 createClient creates a client connection to the master (the API server)

kubecli, err := createClient(s)
if err != nil {
       return fmt.Errorf("unable to create kube client: %v", err)
}

2.2 createRecorder creates an event broadcaster and returns a recorder, which the scheduler uses to publish scheduling events to the cluster.

recorder := createRecorder(kubecli, s)

2.3 CreateScheduler creates the scheduler

sched, err := CreateScheduler(
       s,
       kubecli,
       informerFactory.Core().V1().Nodes(),
       podInformer,
       informerFactory.Core().V1().PersistentVolumes(),
       informerFactory.Core().V1().PersistentVolumeClaims(),
       informerFactory.Core().V1().ReplicationControllers(),
       informerFactory.Extensions().V1beta1().ReplicaSets(),
       informerFactory.Apps().V1beta1().StatefulSets(),
       informerFactory.Core().V1().Services(),
       recorder,
)

2.4 startHTTP starts an HTTP server for health checks, profiling, and metrics. The /debug/pprof endpoints expose profiling data, and the /metrics endpoint is scraped by Prometheus to collect monitoring data.

func startHTTP(s *options.SchedulerServer) {
       mux := http.NewServeMux()
       healthz.InstallHandler(mux)
       if s.EnableProfiling {
              mux.HandleFunc("/debug/pprof/", pprof.Index)
              mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
              mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
              mux.HandleFunc("/debug/pprof/trace", pprof.Trace)
              if s.EnableContentionProfiling {
                     goruntime.SetBlockProfileRate(1)
              }
       }
       if c, err := configz.New("componentconfig"); err == nil {
              c.Set(s.KubeSchedulerConfiguration)
       } else {
              glog.Errorf("unable to register configz: %s", err)
       }
       configz.InstallHandler(mux)
       mux.Handle("/metrics", prometheus.Handler())

       server := &http.Server{
              Addr:    net.JoinHostPort(s.Address, strconv.Itoa(int(s.Port))),
              Handler: mux,
       }
       glog.Fatal(server.ListenAndServe())
}
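To see how the profiling switch changes which endpoints are served, the mux can be reproduced with the standard library alone. The sketch below is an approximation: newMux and statusOf are hypothetical helpers, the /healthz handler is a plain stand-in for healthz.InstallHandler, and the Prometheus and configz handlers are left out because they need third-party packages.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/http/pprof"
)

// newMux builds a mux with a health check and, optionally, pprof handlers.
func newMux(enableProfiling bool) *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, _ *http.Request) {
		fmt.Fprint(w, "ok")
	})
	if enableProfiling {
		mux.HandleFunc("/debug/pprof/", pprof.Index)
		mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
		mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
		mux.HandleFunc("/debug/pprof/trace", pprof.Trace)
	}
	return mux
}

// statusOf sends an in-memory GET request to the mux and returns the status code.
func statusOf(mux *http.ServeMux, path string) int {
	rec := httptest.NewRecorder()
	mux.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	fmt.Println(statusOf(newMux(true), "/healthz"))       // 200
	fmt.Println(statusOf(newMux(false), "/debug/pprof/")) // 404: profiling disabled
}
```

Using httptest keeps the example from opening a real port; a real server would pass the mux to http.Server as startHTTP does.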

2.5 Once leader election is settled (or skipped, when it is disabled), execution enters the key function sched.Run. Its main logic is scheduleOne, in plugin/pkg/scheduler/scheduler.go: it starts a goroutine that calls scheduleOne in a loop until a shutdown signal is received.

run := func(_ <-chan struct{}) {
       sched.Run()
       select {}
}

if !s.LeaderElection.LeaderElect {
       run(nil)
       panic("unreachable")
}
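The "loop until a shutdown signal" pattern described above can be sketched with a stop channel. The helper names runUntil and runN are hypothetical, and the step function is a counter rather than a real scheduling pass; the sketch only shows how closing a channel ends the loop.

```go
package main

import "fmt"

// runUntil calls step repeatedly until stop is closed.
// It is a simplified stand-in for the loop that drives scheduleOne.
func runUntil(step func(), stop <-chan struct{}) {
	for {
		select {
		case <-stop:
			return
		default:
		}
		step()
	}
}

// runN runs a counting step and closes the stop channel after n iterations.
// n must be positive, otherwise the stop channel is never closed.
func runN(n int) int {
	stop := make(chan struct{})
	count := 0
	runUntil(func() {
		count++
		if count == n {
			close(stop)
		}
	}, stop)
	return count
}

func main() {
	fmt.Println(runN(3)) // prints 3
}
```

In a long-running process the loop would be started with the go keyword and the caller would block afterwards, which is the role select {} plays in the run closure.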

2.6 scheduleOne takes one pod at a time, calls schedule (2.7) to run the predicate and priority phases and select a suitable host, and then binds the pod to that host.

// Excerpt: the definition of start and all error handling are elided.
func (sched *Scheduler) scheduleOne() {
       pod := sched.config.NextPod()
       suggestedHost, err := sched.schedule(pod)
       metrics.SchedulingAlgorithmLatency.Observe(metrics.SinceInMicroseconds(start))
       assumedPod := *pod
       // assume modifies `assumedPod` by setting NodeName=suggestedHost
       err = sched.assume(&assumedPod, suggestedHost)
       // bind the pod to its host asynchronously (we can do this b/c of the assumption step above).
       go func() {
              err := sched.bind(&assumedPod, &v1.Binding{
                     ObjectMeta: metav1.ObjectMeta{Namespace: assumedPod.Namespace, Name: assumedPod.Name, UID: assumedPod.UID},
                     Target: v1.ObjectReference{
                            Kind: "Node",
                            Name: suggestedHost,
                     },
              })
              metrics.E2eSchedulingLatency.Observe(metrics.SinceInMicroseconds(start))
       }()
}
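The assume-then-bind split in scheduleOne is what allows binding to happen in the background: the placement is recorded locally first, so the next scheduling decision already accounts for it. A minimal sketch of that idea follows; the cache type and its assume and bind methods are hypothetical stand-ins, not the scheduler's cache API.

```go
package main

import (
	"fmt"
	"sync"
)

// cache is a hypothetical stand-in for the scheduler's view of pod placements.
type cache struct {
	mu      sync.Mutex
	assumed map[string]string // pod name -> node name, recorded before binding
	bound   map[string]string // pod name -> node name, recorded after binding
}

func newCache() *cache {
	return &cache{assumed: map[string]string{}, bound: map[string]string{}}
}

// assume records the placement optimistically so that the next scheduling
// decision already accounts for this pod.
func (c *cache) assume(pod, node string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.assumed[pod] = node
}

// bind stands in for the slower call that persists the placement.
func (c *cache) bind(pod, node string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.bound[pod] = node
}

// scheduleAll assumes each placement synchronously and binds it asynchronously.
func scheduleAll(c *cache, placements map[string]string) {
	var wg sync.WaitGroup
	for pod, node := range placements {
		c.assume(pod, node)
		wg.Add(1)
		go func(pod, node string) {
			defer wg.Done()
			c.bind(pod, node)
		}(pod, node)
	}
	wg.Wait() // waiting here only keeps the sketch deterministic
}

func main() {
	c := newCache()
	scheduleAll(c, map[string]string{"pod-1": "node-a", "pod-2": "node-b"})
	fmt.Println(len(c.assumed), len(c.bound)) // prints "2 2"
}
```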

2.7 schedule mainly calls the Schedule method of the ScheduleAlgorithm interface. A default implementation is registered at startup (explained in the third chapter). It has three main steps:
findNodesThatFit: filters the node list down to the nodes that pass all predicate algorithms.
PrioritizeNodes: scores the nodes that passed and returns a prioritized list.
selectHost: selects the best node from the prioritized list.

// Schedule tries to schedule the given pod to one of the nodes in the node list.
// If it succeeds, it will return the name of the node.
// If it fails, it will return a FitError error with reasons.
func (g *genericScheduler) Schedule(pod *v1.Pod, nodeLister algorithm.NodeLister) (string, error) {
       nodes, err := nodeLister.List()
       filteredNodes, failedPredicateMap, err := findNodesThatFit(pod, g.cachedNodeInfoMap, nodes, g.predicates, g.extenders, g.predicateMetaProducer, g.equivalenceCache)
       if len(filteredNodes) == 0 {
              return "", &FitError{
                     Pod:              pod,
                     FailedPredicates: failedPredicateMap,
              }
       }
       trace.Step("Prioritizing")
       metaPrioritiesInterface := g.priorityMetaProducer(pod, g.cachedNodeInfoMap)
       priorityList, err := PrioritizeNodes(pod, g.cachedNodeInfoMap, metaPrioritiesInterface, g.prioritizers, filteredNodes, g.extenders)
       trace.Step("Selecting host")
       return g.selectHost(priorityList)
}
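The article does not show the body of selectHost. One plausible way to pick "an optimal node" from a scored list is to sort by descending score and rotate among the nodes that tie for the top score, so that equally good nodes share the load. The sketch below makes that assumption explicit; hostPriority and selector are hypothetical types, not the scheduler's own.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// hostPriority pairs a node name with its total score (hypothetical type).
type hostPriority struct {
	Host  string
	Score int
}

// selector keeps a counter so that repeated calls rotate among tied nodes.
type selector struct {
	lastIndex uint64
}

// selectHost sorts by descending score and picks one of the top-scoring
// hosts in round-robin order. It returns an error for an empty list.
func (s *selector) selectHost(list []hostPriority) (string, error) {
	if len(list) == 0 {
		return "", errors.New("empty priority list")
	}
	sorted := append([]hostPriority(nil), list...) // avoid mutating the caller's slice
	sort.SliceStable(sorted, func(i, j int) bool { return sorted[i].Score > sorted[j].Score })
	// Count how many hosts share the maximum score.
	ties := sort.Search(len(sorted), func(i int) bool { return sorted[i].Score < sorted[0].Score })
	ix := int(s.lastIndex % uint64(ties))
	s.lastIndex++
	return sorted[ix].Host, nil
}

func main() {
	s := &selector{}
	list := []hostPriority{{"node-a", 7}, {"node-b", 9}, {"node-c", 9}}
	first, _ := s.selectHost(list)
	second, _ := s.selectHost(list)
	fmt.Println(first, second) // prints "node-b node-c"
}
```

A stable sort keeps tied nodes in their input order, which makes the rotation predictable from one call to the next.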

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
