"Kubernetes/k8s Source Analysis" Kube-scheduler Source analysis

Source: Internet
Author: User
Tags: k8s
Objective: In the Kubernetes system, the scheduler is the only module implemented in plugin form. This pluggable design makes it easy for users to define their own scheduling algorithms, which is why its source lives under the plugin directory, in plugin/cmd and plugin/pkg/scheduler.

The scheduler is responsible for assigning pods to specific nodes. It watches pods through the interface provided by the API server, fetches the pods waiting to be scheduled, ranks the nodes according to a series of preselection (predicate) and optimization (priority) strategies, and then assigns each pod to the node with the highest score. The kubelet on that node is then responsible for creating the pod.

Kubernetes scheduling is divided into two phases, predicate (preselection) and priority (optimization):
Preselection: traverse all nodes and keep only those that satisfy the predicate policies. If no node satisfies them, the pod stays pending until some node meets all the policies.
Optimization: score the nodes that passed the first step according to the priority policies, and pick the node with the highest score.
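The two phases can be illustrated with a small, self-contained sketch. It is not the scheduler's real code: the node type, the enoughCPU predicate, and the mostFreeCPU priority are hypothetical names invented for this example, and real predicates and priorities look at far more than free CPU.

```go
package main

import "fmt"

// node is a hypothetical, simplified stand-in for a cluster node.
type node struct {
	name    string
	freeCPU int // millicores still available
}

// predicate reports whether a node can host a pod that needs cpu millicores.
type predicate func(n node, cpu int) bool

// priority returns a score for a node; higher is better.
type priority func(n node) int

// filter is the preselection phase: keep only nodes that pass every predicate.
func filter(nodes []node, cpu int, preds []predicate) []node {
	var fit []node
	for _, n := range nodes {
		ok := true
		for _, p := range preds {
			if !p(n, cpu) {
				ok = false
				break
			}
		}
		if ok {
			fit = append(fit, n)
		}
	}
	return fit
}

// pick is the optimization phase: sum every priority score per node and
// return the best node. It returns false when no node survived filtering,
// which corresponds to the pod staying pending.
func pick(nodes []node, prios []priority) (string, bool) {
	best, bestScore, found := "", 0, false
	for _, n := range nodes {
		score := 0
		for _, p := range prios {
			score += p(n)
		}
		if !found || score > bestScore {
			best, bestScore, found = n.name, score, true
		}
	}
	return best, found
}

func enoughCPU(n node, cpu int) bool { return n.freeCPU >= cpu }
func mostFreeCPU(n node) int         { return n.freeCPU }

func main() {
	nodes := []node{{"node-a", 500}, {"node-b", 2000}, {"node-c", 1200}}
	fit := filter(nodes, 1000, []predicate{enoughCPU})
	host, ok := pick(fit, []priority{mostFreeCPU})
	fmt.Println(host, ok)
}
```

Keeping the two phases as separate functions mirrors the split described above: a predicate can only reject a node, while a priority can only rank the nodes that remain.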


Struct
In the plugin/cmd/kube-scheduler/app/options/options.go file, SchedulerServer defines the context and parameters required to run a scheduler.

// SchedulerServer has all the context and params needed to run a Scheduler
type SchedulerServer struct {
       componentconfig.KubeSchedulerConfiguration
       // Master is the address of the Kubernetes API server (overrides any
       // value in kubeconfig).
       Master string
       // Kubeconfig is Path to kubeconfig file with authorization and master
       // location information.
       Kubeconfig string
       // Dynamic configuration for scheduler features.
}



Kubernetes Scheduler Start Process
One. Entry main function
The path is plugin/cmd/kube-scheduler/scheduler.go. main initializes a SchedulerServer, defines the command-line arguments, and then hands the main logic to Run, covered in chapter Two.
func main() {
       s := options.NewSchedulerServer()
       s.AddFlags(pflag.CommandLine)

       flag.InitFlags()
       logs.InitLogs()
       // Flush buffered log entries when main returns.
       defer logs.FlushLogs()

       verflag.PrintAndExitIfRequested()

       if err := app.Run(s); err != nil {
              glog.Fatalf("scheduler app failed to run: %v", err)
       }
}

AddFlags defines a large pile of command-line arguments; only a few of them are shown here.
// AddFlags adds flags for a specific SchedulerServer to the specified FlagSet
func (s *SchedulerServer) AddFlags(fs *pflag.FlagSet) {
       fs.Int32Var(&s.Port, "port", s.Port, "The port that the scheduler's HTTP service runs on")
       fs.StringVar(&s.Address, "address", s.Address, "The IP address to serve on (set to 0.0.0.0 for all interfaces)")
       fs.StringVar(&s.AlgorithmProvider, "algorithm-provider", s.AlgorithmProvider, "The scheduling algorithm to use, one of: "+factory.ListAlgorithmProviders())
       ...
       leaderelection.BindFlags(&s.LeaderElection, fs)
       utilfeature.DefaultFeatureGate.AddFlag(fs)
}
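The flag-binding pattern can be tried out with the standard library's flag package, which offers the same Var-style calls as pflag for this purpose. This is only a sketch: serverOptions, parseArgs, and the default values are hypothetical names and numbers chosen for the example, not taken from the scheduler source.

```go
package main

import (
	"flag"
	"fmt"
)

// serverOptions is a hypothetical, trimmed-down analogue of SchedulerServer.
type serverOptions struct {
	Port    int
	Address string
}

// addFlags binds each field to a flag. The field's current value is passed
// as the default, so a flag that is not given leaves the field unchanged.
func (o *serverOptions) addFlags(fs *flag.FlagSet) {
	fs.IntVar(&o.Port, "port", o.Port, "port the HTTP service listens on")
	fs.StringVar(&o.Address, "address", o.Address, "IP address to serve on")
}

// parseArgs builds options with illustrative defaults and applies args to them.
func parseArgs(args []string) (*serverOptions, error) {
	opts := &serverOptions{Port: 10251, Address: "0.0.0.0"}
	fs := flag.NewFlagSet("demo", flag.ContinueOnError)
	opts.addFlags(fs)
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	return opts, nil
}

func main() {
	opts, err := parseArgs([]string{"--port=8080"})
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(opts.Port, opts.Address)
}
```

Because every flag writes straight into a struct field, the rest of the program only ever reads the options struct and never touches the flag set again.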
Two. Run function
The path is plugin/cmd/kube-scheduler/app/server.go. The steps below walk through the body of Run.
func Run(s *options.SchedulerServer) error

2.1 createClient creates a client connection to the master
kubecli, err := createClient(s)
if err != nil {
       // Return the error to the caller instead of discarding it.
       return fmt.Errorf("unable to create kube client: %v", err)
}

2.2 createRecorder creates an event recorder, backed by an event broadcaster, that publishes scheduling events for the cluster.
recorder := createRecorder(kubecli, s)

2.3 createScheduler creates a scheduler server
sched, err := createScheduler(
       s,
       kubecli,
       informerFactory.Core().V1().Nodes(),
       podInformer,
       informerFactory.Core().V1().PersistentVolumes(),
       informerFactory.Core().V1().PersistentVolumeClaims(),
       informerFactory.Core().V1().ReplicationControllers(),
       informerFactory.Extensions().V1beta1().ReplicaSets(),
       informerFactory.Apps().V1beta1().StatefulSets(),
       informerFactory.Core().V1().Services(),
       recorder,
)

2.4 startHTTP creates the HTTP service used for profiling and metrics: the /debug/pprof endpoints collect performance data, and the /metrics endpoint exposes monitoring data for Prometheus to scrape.
func startHTTP(s *options.SchedulerServer) {
       mux := http.NewServeMux()
       healthz.InstallHandler(mux)
       if s.EnableProfiling {
              mux.HandleFunc("/debug/pprof/", pprof.Index)
              mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
              mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
              mux.HandleFunc("/debug/pprof/trace", pprof.Trace)
              if s.EnableContentionProfiling {
                     goruntime.SetBlockProfileRate(1)
              }
       }
       if c, err := configz.New("componentconfig"); err == nil {
              c.Set(s.KubeSchedulerConfiguration)
       } else {
              glog.Errorf("unable to register configz: %s", err)
       }
       configz.InstallHandler(mux)
       mux.Handle("/metrics", prometheus.Handler())

       server := &http.Server{
              Addr:    net.JoinHostPort(s.Address, strconv.Itoa(int(s.Port))),
              Handler: mux,
       }
       glog.Fatal(server.ListenAndServe())
}
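The conditional registration of the profiling endpoints can be reproduced with the standard library alone. In this sketch newMux and status are hypothetical helpers, and the /healthz handler is a plain stand-in for what healthz.InstallHandler sets up; the requests run in memory through httptest, so no port is opened.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/http/pprof"
)

// newMux builds a ServeMux with a health endpoint and, only when profiling
// is enabled, the pprof endpoints.
func newMux(enableProfiling bool) *http.ServeMux {
	mux := http.NewServeMux()
	// Hypothetical stand-in for the scheduler's healthz handler.
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, _ *http.Request) {
		fmt.Fprint(w, "ok")
	})
	if enableProfiling {
		mux.HandleFunc("/debug/pprof/", pprof.Index)
		mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
	}
	return mux
}

// status issues an in-memory GET against the mux and returns the status code.
func status(mux *http.ServeMux, path string) int {
	rec := httptest.NewRecorder()
	mux.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	fmt.Println(status(newMux(true), "/healthz"))
	fmt.Println(status(newMux(true), "/debug/pprof/"))
	fmt.Println(status(newMux(false), "/debug/pprof/"))
}
```

Using a private ServeMux rather than http.DefaultServeMux keeps the profiling endpoints hidden unless the option is switched on, which is the point of the enableProfiling check.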

2.5 With leader election handled, execution enters the key body, sched.Run. Its main logic is scheduleOne, in plugin/pkg/scheduler/scheduler.go: sched.Run starts a goroutine that executes scheduleOne in a loop until a shutdown signal is received. When leader election is disabled, run is called directly and never returns.
run := func(_ <-chan struct{}) {
       sched.Run()
       // Block forever; the scheduling loop runs in its own goroutine.
       select {}
}

if !s.LeaderElection.LeaderElect {
       run(nil)
       panic("unreachable")
}
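The "loop until a shutdown signal" behaviour can be sketched with a stop channel. This version uses only the standard library rather than the Kubernetes helper packages; runUntil and drain are hypothetical names, and the unbuffered queue channel stands in for the blocking call that fetches the next pod.

```go
package main

import (
	"fmt"
	"sync"
)

// runUntil calls step repeatedly in a new goroutine until stop is closed.
// The returned WaitGroup lets the caller wait for the loop to exit.
func runUntil(step func(), stop <-chan struct{}) *sync.WaitGroup {
	wg := &sync.WaitGroup{}
	wg.Add(1)
	go func() {
		defer wg.Done()
		for {
			select {
			case <-stop:
				return
			default:
				step()
			}
		}
	}()
	return wg
}

// drain pushes n hypothetical pods through the loop and reports how many
// were handled before shutdown.
func drain(n int) int {
	queue := make(chan int) // unbuffered: each send is matched by one step
	stop := make(chan struct{})
	handled := 0
	step := func() {
		// Block until a pod arrives or shutdown is requested.
		select {
		case <-queue:
			handled++
		case <-stop:
		}
	}
	wg := runUntil(step, stop)
	for i := 0; i < n; i++ {
		queue <- i
	}
	close(stop)
	wg.Wait() // after this, handled is safe to read
	return handled
}

func main() {
	fmt.Println(drain(3))
}
```

Closing the channel, rather than sending on it, wakes both the outer loop and a step that is blocked waiting for work, so shutdown does not depend on another pod arriving.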

2.6 scheduleOne takes one pod at a time, uses the schedule function (2.7) to run preselection (predicate) and optimization (priority) and pick a suitable host, and then binds the pod to that host.
func (sched *Scheduler) scheduleOne() {
       pod := sched.config.NextPod()

       // start holds the scheduling start time; its assignment and the
       // error handling are elided in this excerpt.
       suggestedHost, err := sched.schedule(pod)
       metrics.SchedulingAlgorithmLatency.Observe(metrics.SinceInMicroseconds(start))

       assumedPod := *pod
       // assume modifies `assumedPod` by setting NodeName=suggestedHost
       err = sched.assume(&assumedPod, suggestedHost)

       // bind the pod to its host asynchronously (we can do this b/c of the assumption step above).
       go func() {
              err := sched.bind(&assumedPod, &v1.Binding{
                     ObjectMeta: metav1.ObjectMeta{Namespace: assumedPod.Namespace, Name: assumedPod.Name, UID: assumedPod.UID},
                     Target: v1.ObjectReference{
                            Kind: "Node",
                            Name: suggestedHost,
                     },
              })
              metrics.E2eSchedulingLatency.Observe(metrics.SinceInMicroseconds(start))
       }()
}
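The "assume first, bind asynchronously" idea can be shown in isolation. The sketch below is an illustration under assumptions, not the scheduler's implementation: the cache type, its methods, and the rollback on a failed bind are choices made for this example, and bind is a caller-supplied function standing in for the API call.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errBindFailed is a hypothetical error used to simulate a failed bind.
var errBindFailed = errors.New("bind failed")

// cache is a hypothetical in-memory record of pod -> node placements.
type cache struct {
	mu       sync.Mutex
	assigned map[string]string
}

func (c *cache) assume(pod, node string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.assigned[pod] = node
}

func (c *cache) forget(pod string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.assigned, pod)
}

func (c *cache) nodeOf(pod string) (string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	n, ok := c.assigned[pod]
	return n, ok
}

// scheduleOne records the placement synchronously, so the next scheduling
// decision already sees it, then performs the slow bind in a goroutine.
// Rolling back on failure is an assumption of this sketch.
func scheduleOne(c *cache, pod, node string, bind func(pod, node string) error) <-chan error {
	c.assume(pod, node)
	done := make(chan error, 1)
	go func() {
		err := bind(pod, node)
		if err != nil {
			c.forget(pod)
		}
		done <- err
	}()
	return done
}

func main() {
	c := &cache{assigned: map[string]string{}}
	ok := func(string, string) error { return nil }
	fmt.Println(<-scheduleOne(c, "pod-1", "node-a", ok))
	fmt.Println(c.nodeOf("pod-1"))
}
```

The assumption step is what makes the asynchronous bind safe: the loop can move on to the next pod without waiting, yet it will not hand out the same resources twice.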

2.7 schedule mainly calls the Schedule method of the ScheduleAlgorithm interface; a default implementation is registered at startup (explained in the third chapter). It has three main parts:
findNodesThatFit: filters the list of eligible nodes according to all the preselection algorithms
PrioritizeNodes: scores the nodes that passed and produces a prioritized list
selectHost: selects the optimal node from the prioritized list
// Schedule tries to schedule the given pod to one of the nodes in the node list.
// If it succeeds, it will return the name of the node.
// If it fails, it will return a FitError error with reasons.
func (g *genericScheduler) Schedule(pod *v1.Pod, nodeLister algorithm.NodeLister) (string, error) {
       // Tracing setup and error handling are elided in this excerpt.
       nodes, err := nodeLister.List()

       filteredNodes, failedPredicateMap, err := findNodesThatFit(pod, g.cachedNodeInfoMap, nodes, g.predicates, g.extenders, g.predicateMetaProducer, g.equivalenceCache)

       if len(filteredNodes) == 0 {
              return "", &FitError{
                     Pod:              pod,
                     FailedPredicates: failedPredicateMap,
              }
       }

       trace.Step("Prioritizing")
       metaPrioritiesInterface := g.priorityMetaProducer(pod, g.cachedNodeInfoMap)
       priorityList, err := PrioritizeNodes(pod, g.cachedNodeInfoMap, metaPrioritiesInterface, g.prioritizers, filteredNodes, g.extenders)

       trace.Step("Selecting host")
       return g.selectHost(priorityList)
}
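The last step, choosing one host from a scored list, is sketched below. The article does not show the body of selectHost, so this is an illustration under assumptions: hostPriority and selector are hypothetical types, and rotating between nodes that tie for the top score is one plausible tie-breaking policy rather than a statement of what the real function does.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// hostPriority is a hypothetical (host, score) pair produced by the scoring phase.
type hostPriority struct {
	Host  string
	Score int
}

// selector keeps a counter so that ties rotate between equally scored hosts.
type selector struct{ last uint64 }

// selectHost returns one of the hosts with the highest score.
func (s *selector) selectHost(list []hostPriority) (string, error) {
	if len(list) == 0 {
		return "", errors.New("empty priority list")
	}
	// Sort a copy by descending score; the stable sort keeps input order among ties.
	sorted := append([]hostPriority(nil), list...)
	sort.SliceStable(sorted, func(i, j int) bool { return sorted[i].Score > sorted[j].Score })
	// top is the number of hosts sharing the maximum score (always at least 1).
	top := sort.Search(len(sorted), func(i int) bool { return sorted[i].Score < sorted[0].Score })
	host := sorted[s.last%uint64(top)].Host
	s.last++
	return host, nil
}

func main() {
	s := &selector{}
	list := []hostPriority{{"node-a", 5}, {"node-b", 9}, {"node-c", 9}}
	for i := 0; i < 3; i++ {
		host, _ := s.selectHost(list)
		fmt.Println(host)
	}
}
```

Rotating among the top-scored hosts avoids piling consecutive pods onto whichever node happens to sort first when several nodes are equally good.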
