Kubernetes Scheduling in Detail



By Zhang Wei, Cloud Platform development engineer at Sky Cloud Software


Zhang Wei is mainly responsible for the design and development of Sky Cloud Software's Skyform Cloud Platform. He is familiar with a variety of open source IaaS platforms, such as CloudStack and OpenStack, and with various resource management and scheduling frameworks, such as Kubernetes, Mesos, YARN, and Borg.


After six months of continuous optimization, Kubernetes announced that version 1.2 can support clusters of 1000+ nodes with excellent responsiveness, which is a significant improvement for Kubernetes. As Kubernetes clusters grow, the Kubernetes scheduler, acting as the brain of the cluster, becomes increasingly important for improving cluster resource utilization and keeping the services in the cluster running stably. This article introduces the Kubernetes scheduler in detail, covering the scheduling process, the scheduling algorithm, and resource constraints.


The Kubernetes Scheduler is primarily responsible for accepting the new pods created through the API server, choosing a host for each of them, and writing that information to etcd. Of course, what happens in this process is far from simple. The scheduler has to weigh a variety of factors, such as spreading the pods of the same replication controller across different hosts so that a host going down does not severely affect the service, and balancing resources so that the utilization of the whole cluster improves.

Scheduling Process



The Kubernetes Scheduler uses the API server to find pods that have not yet been assigned a host and tries to allocate hosts for these pods, as shown in the following illustration:



1. The client submits a creation request, either through the API server's RESTful API or with the kubectl command-line tool. Supported data formats include JSON and YAML.
2. The API server handles the user request and stores the pod data in etcd.
3. The scheduler watches for unbound pods through the API server and tries to assign a host to each of them.
4. Filter hosts: the scheduler filters out hosts that do not meet the pod's requirements using a set of rules. For example, when the pod specifies the amount of resources it requires, hosts with fewer free resources than the pod needs are filtered out.
5. Rate hosts: the hosts that passed the filtering step are scored. In this phase the scheduler applies overall optimization strategies, for example spreading the replicas of a replication controller across different hosts and preferring the host with the lowest load.
6. Select a host: the highest-rated host is chosen, the binding operation is performed, and the result is stored in etcd.
7. The kubelet on the selected host performs the pod creation operation based on the scheduling result.

A minimal sketch of this filter-then-score flow is shown below.
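The following Go program is a rough illustration of steps 4–6 only, not the actual scheduler code; the Pod, Node, filterNodes, and rateNodes names and the single-resource scoring rule are assumptions made for the example.

```go
package main

import "fmt"

// Pod and Node are simplified stand-ins for the real API objects.
type Pod struct {
	Name       string
	CPURequest int // millicores requested by the pod's containers
}

type Node struct {
	Name    string
	FreeCPU int // millicores still unallocated on the node
}

// filterNodes drops nodes that cannot satisfy the pod's request (step 4).
func filterNodes(pod Pod, nodes []Node) []Node {
	var fit []Node
	for _, n := range nodes {
		if n.FreeCPU >= pod.CPURequest {
			fit = append(fit, n)
		}
	}
	return fit
}

// rateNodes gives each surviving node a 0-10 score and returns the best one
// (step 5); here the score is simply "the more free CPU left over, the better".
func rateNodes(pod Pod, nodes []Node) (best Node, bestScore int) {
	bestScore = -1
	for _, n := range nodes {
		score := (n.FreeCPU - pod.CPURequest) * 10 / n.FreeCPU
		if score > bestScore {
			best, bestScore = n, score
		}
	}
	return best, bestScore
}

func main() {
	pod := Pod{Name: "web-1", CPURequest: 500}
	nodes := []Node{{"node-a", 400}, {"node-b", 2000}, {"node-c", 800}}

	candidates := filterNodes(pod, nodes)       // filter hosts
	chosen, score := rateNodes(pod, candidates) // rate hosts and pick the winner
	// A real scheduler would now bind the pod to the chosen host via the API
	// server, which stores the binding in etcd (step 6).
	fmt.Printf("bind %s -> %s (score %d)\n", pod.Name, chosen.Name, score)
}
```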



Scheduling Algorithm

Kubernetes selects a host for each unscheduled pod through a set of rules. As described in the scheduling process above, the Kubernetes scheduling algorithm consists of two main parts: host filtering and host scoring.



The source code of the Kubernetes Scheduler is located under kubernetes/plugin/, and the directory structure looks roughly like this:



kubernetes
  plugin
    cmd                  // kube-scheduler start function (cmd package)
    pkg                  // scheduling-related implementation
      scheduler
        algorithm
          predicates     // host filtering policies
          priorities     // host scoring policies
        algorithmprovider
          defaults       // defines the default scheduler


The purpose of host filtering is to exclude hosts that do not meet the pod's requirements. The filtering rules currently implemented in Kubernetes (in kubernetes/plugin/pkg/scheduler/algorithm/predicates) include the following; a minimal sketch of one such predicate follows this list.

- NoDiskConflict: checks whether there is a volume conflict on the host. If the host has already mounted a volume, other pods that use the same volume cannot be scheduled to this host. GCE, Amazon EBS, and Ceph RBD use the following rules: GCE allows multiple volumes to be mounted at the same time, as long as they are all read-only; Amazon EBS does not allow different pods to mount the same volume; Ceph RBD does not allow any two pods to share the same monitor, pool, and image.
- NoVolumeZoneConflict: given the zone restrictions, checks whether deploying the pod on this host would cause a volume conflict. Some volumes may carry zone scheduling constraints, and VolumeZonePredicate evaluates whether the pod's volume requirements can be satisfied on the node. The prerequisite is that any zone labels on the volumes must exactly match the zone labels on the node. A node can carry multiple zone labels (for example, a hypothetical replicated volume might allow zone-wide access). Currently this only supports PersistentVolumeClaims, and only looks at labels within the PersistentVolume scope. Handling volumes defined directly in the pod spec (that is, without a PersistentVolume) may become more difficult, because it would likely require calling the cloud provider during scheduling to determine the volume's zone.
- PodFitsResources: checks that the host's resources satisfy the pod's requirements. Scheduling is based on the amount of resources requested, not on the amount of resources actually in use.
- PodFitsHostPorts: checks whether any HostPort required by the pod's containers is already occupied by another container. The pod cannot be scheduled to this host if a required HostPort is unavailable.
- HostName: checks whether the host's name matches the node name specified by the pod, if any; a pod that specifies a node name can only be scheduled to that host.
- MatchNodeSelector: checks whether the host's labels satisfy the pod's *nodeSelector* attribute.
- MaxEBSVolumeCount: ensures that the number of mounted EBS volumes does not exceed the configured maximum (the default is 39). It counts volumes used directly as well as PVCs that indirectly use this type of storage; the totals of the different kinds of volumes are calculated, and the pod cannot be scheduled to this host if deploying it would push the volume count over the maximum.
- MaxGCEPDVolumeCount: ensures that the number of mounted GCE persistent disk volumes does not exceed the configured maximum (the default is 16). The rule is the same as above.
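To make the idea of a filtering predicate concrete, here is a minimal Go sketch of a host-port conflict check in the spirit of PodFitsHostPorts. It is illustrative only and does not use the real scheduler types; the wantedPorts and usedPorts names are assumptions for the example.

```go
package main

import "fmt"

// podFitsHostPorts reports whether every host port the pod asks for is still
// free on the candidate node; it mirrors the idea of PodFitsHostPorts, not
// the real implementation.
func podFitsHostPorts(wantedPorts []int, usedPorts map[int]bool) bool {
	for _, p := range wantedPorts {
		if usedPorts[p] {
			return false // port already taken on this host: filter the host out
		}
	}
	return true
}

func main() {
	used := map[int]bool{80: true, 443: true}        // ports already claimed on the node
	fmt.Println(podFitsHostPorts([]int{8080}, used)) // true: the host passes the filter
	fmt.Println(podFitsHostPorts([]int{443}, used))  // false: the host is filtered out
}
```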


The default set of filtering rules used by Kubernetes can be modified through configuration.



After filtering, the scheduler scores the hosts that meet the requirements and finally chooses the highest-scoring host to deploy the pod. Kubernetes uses a set of priority functions to evaluate each candidate host (implemented in kubernetes/plugin/pkg/scheduler/algorithm/priorities). Each priority function returns a score from 0 to 10, where a higher score means the host is "better", and each function also has a weight. The final score of a host is calculated with the following formula:



finalScoreNode = (weight1 * priorityFunc1) + (weight2 * priorityFunc2) + ... + (weightN * priorityFuncN)
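As an illustration of this weighted sum, here is a minimal Go sketch. The weightedPriority type and the two priority functions are invented for the example and are not the scheduler's real code.

```go
package main

import "fmt"

// weightedPriority pairs a 0-10 scoring function with its weight.
type weightedPriority struct {
	weight   int
	priority func(node string) int
}

// finalScore computes sum(weight_i * priorityFunc_i) for one node, which is
// exactly the formula above.
func finalScore(node string, priorities []weightedPriority) int {
	total := 0
	for _, wp := range priorities {
		total += wp.weight * wp.priority(node)
	}
	return total
}

func main() {
	// Two made-up priority functions with equal weight.
	priorities := []weightedPriority{
		{weight: 1, priority: func(node string) int { return 7 }}, // e.g. a "least requested" style score
		{weight: 1, priority: func(node string) int { return 4 }}, // e.g. a "balanced allocation" style score
	}
	fmt.Println(finalScore("node-a", priorities)) // 11
}
```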



The priority functions that are currently supported include the following:

LeastRequestedPriority: if the new pod were assigned to the node, the node's priority is determined by the ratio of the node's remaining capacity to its total capacity, that is, (total capacity - capacity sum of the pods already on the node - capacity of the new pod) / total capacity. CPU and memory are weighted equally, and the node with the largest ratio scores highest. Note that this priority function has the effect of spreading pods across nodes according to resource consumption. The calculation formula is as follows:



score = (cpu((capacity - sum(requested)) * 10 / capacity) + memory((capacity - sum(requested)) * 10 / capacity)) / 2

BalancedResourceAllocation: tries to choose the host whose resource usage is more balanced after the pod is deployed. BalancedResourceAllocation cannot be used alone; it must be used together with LeastRequestedPriority. It calculates the CPU and memory usage fractions on the host separately, and the host's score is determined by the "distance" between the CPU fraction and the memory fraction. The calculation formula is as follows (a code sketch of both formulas appears below, after the remaining priority functions):



score = 10 - abs(cpuFraction - memoryFraction) * 10

SelectorSpreadPriority: for pods that belong to the same service or replication controller, try to spread them across different hosts. If a zone is specified, the pods are also spread across hosts in different zones as far as possible. When scheduling a pod, the scheduler finds the service or replication controller the pod belongs to, then looks at the pods that already exist in that service or replication controller; the fewer of those existing pods running on a host, the higher that host's score.

CalculateAntiAffinityPriority: for pods that belong to the same service, try to spread them across hosts with different values of the specified label.

ImageLocalityPriority: rates a host based on whether the images the pod needs are already present on it. ImageLocalityPriority checks whether the pod's required images already exist on the host and returns a 0-10 score based on the total size of the images that are present: it returns 0 if none of the required images are on the host, and if some of them are, the score grows with their size, so the larger the images already present, the higher the score.

NodeAffinityPriority (an experimental feature new in Kubernetes 1.2): the affinity mechanism in Kubernetes scheduling. Node selectors (which restrict a pod to specified nodes at scheduling time) now support a variety of operators (In, NotIn, Exists, DoesNotExist, Gt, Lt) and are no longer limited to exact matches on node labels. In addition, Kubernetes supports two types of selectors. One is the "hard" selector (requiredDuringSchedulingIgnoredDuringExecution), which guarantees that the selected host must satisfy all of the pod's rules for the host; it is much like the earlier nodeSelector, but with a more expressive syntax. The other is the "soft" selector (preferredDuringSchedulingIgnoredDuringExecution), which serves as a hint to the scheduler: the scheduler will try, but does not guarantee, to satisfy all of its requirements.
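The LeastRequestedPriority and BalancedResourceAllocation formulas above can be expressed in a few lines of Go. This is a minimal sketch with invented function names and example numbers, not the scheduler's actual code.

```go
package main

import (
	"fmt"
	"math"
)

// leastRequestedScore implements ((capacity - requested) * 10 / capacity),
// averaged over CPU and memory.
func leastRequestedScore(cpuCap, cpuReq, memCap, memReq float64) float64 {
	cpuScore := (cpuCap - cpuReq) * 10 / cpuCap
	memScore := (memCap - memReq) * 10 / memCap
	return (cpuScore + memScore) / 2
}

// balancedResourceScore implements 10 - abs(cpuFraction - memoryFraction) * 10,
// where a fraction is requested/capacity.
func balancedResourceScore(cpuCap, cpuReq, memCap, memReq float64) float64 {
	cpuFraction := cpuReq / cpuCap
	memFraction := memReq / memCap
	return 10 - math.Abs(cpuFraction-memFraction)*10
}

func main() {
	// A node with 4000m CPU and 8192Mi memory, of which 1000m and 6144Mi would
	// be requested once the new pod is placed: CPU is lightly loaded, memory is not.
	fmt.Printf("least requested: %.1f\n", leastRequestedScore(4000, 1000, 8192, 6144))  // 5.0
	fmt.Printf("balanced:        %.1f\n", balancedResourceScore(4000, 1000, 8192, 6144)) // 5.0
}
```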



Multi-level resource restrictions

Kubernetes provides several kinds of resource constraints to control resource sharing at the container, pod, and multi-tenant (namespace) levels.






Each container can specify *spec.containers[].resources.limits.cpu*, *spec.containers[].resources.limits.memory*, *spec.containers[].resources.requests.cpu*, and *spec.containers[].resources.requests.memory*. Resource restrictions on containers are optional, and default resource restrictions can be set by modifying the cluster configuration.
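As an illustration of these fields, the following Go snippet builds a container spec with requests and limits using the current k8s.io/api and k8s.io/apimachinery packages. The package paths are those of today's Kubernetes client libraries, not the layout of the 1.2-era code this article describes, and the concrete values are invented for the example.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

func main() {
	// A container that requests 250m CPU / 128Mi memory and is capped
	// at 500m CPU / 256Mi memory.
	c := corev1.Container{
		Name:  "web",
		Image: "nginx:1.25",
		Resources: corev1.ResourceRequirements{
			Requests: corev1.ResourceList{
				corev1.ResourceCPU:    resource.MustParse("250m"),
				corev1.ResourceMemory: resource.MustParse("128Mi"),
			},
			Limits: corev1.ResourceList{
				corev1.ResourceCPU:    resource.MustParse("500m"),
				corev1.ResourceMemory: resource.MustParse("256Mi"),
			},
		},
	}
	fmt.Println(c.Resources.Requests.Cpu().String()) // "250m"
}
```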



LimitRange is a capacity limit set at the namespace level; all pods in the corresponding namespace are subject to the LimitRange's resource constraints.
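The sketch below shows what a LimitRange object can look like when built with the same Go API types; the name, namespace, and values are invented, and whether you create such an object in Go or in a manifest is a matter of preference.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	// A LimitRange that applies default limits and requests to every container
	// created in its namespace that does not set them itself.
	lr := corev1.LimitRange{
		ObjectMeta: metav1.ObjectMeta{Name: "default-limits", Namespace: "team-a"},
		Spec: corev1.LimitRangeSpec{
			Limits: []corev1.LimitRangeItem{{
				Type: corev1.LimitTypeContainer,
				Default: corev1.ResourceList{ // used as limits when none are set
					corev1.ResourceCPU:    resource.MustParse("500m"),
					corev1.ResourceMemory: resource.MustParse("256Mi"),
				},
				DefaultRequest: corev1.ResourceList{ // used as requests when none are set
					corev1.ResourceCPU:    resource.MustParse("250m"),
					corev1.ResourceMemory: resource.MustParse("128Mi"),
				},
			}},
		},
	}
	fmt.Println(lr.Name, string(lr.Spec.Limits[0].Type))
}
```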



The resource limit of a pod is the sum of the corresponding resource limits of its containers; the scheduler compares this sum with the available capacity on the host to determine whether the host meets the resource requirements. This is static resource scheduling: even if the real load on a host is very low, a pod cannot be deployed to it as long as the capacity limits are not satisfied.
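To illustrate how the pod-level sum feeds into this static check, here is a minimal sketch with invented types rather than the real scheduler data structures: the per-container requests are summed and compared against the node's free capacity, with no reference to actual usage.

```go
package main

import "fmt"

// containerRequest is a simplified per-container resource request.
type containerRequest struct {
	cpuMillis int
	memoryMiB int
}

// podFitsNode sums the container requests of a pod and checks them against
// the node's free capacity; this is a static check based on requested
// amounts, not on the node's actual load.
func podFitsNode(containers []containerRequest, freeCPUMillis, freeMemMiB int) bool {
	totalCPU, totalMem := 0, 0
	for _, c := range containers {
		totalCPU += c.cpuMillis
		totalMem += c.memoryMiB
	}
	return totalCPU <= freeCPUMillis && totalMem <= freeMemMiB
}

func main() {
	pod := []containerRequest{{cpuMillis: 250, memoryMiB: 128}, {cpuMillis: 500, memoryMiB: 256}}
	fmt.Println(podFitsNode(pod, 1000, 512)) // true: 750m / 384Mi fits
	fmt.Println(podFitsNode(pod, 1000, 256)) // false: the memory sum exceeds the free memory
}
```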



Resource Quota is a resource constraint set at the namespace level, primarily intended to address the problem of multiple tenants sharing resources. The CPU value in a Resource Quota is the total CPU that all pods in the corresponding namespace may use, and the memory value is the total memory that all pods in the namespace may use.
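For completeness, here is a sketch of a ResourceQuota built with the same Go API types; the name, namespace, and the specific caps are invented for the example.

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	// A quota capping the total CPU, total memory, and pod count of a namespace.
	rq := corev1.ResourceQuota{
		ObjectMeta: metav1.ObjectMeta{Name: "team-a-quota", Namespace: "team-a"},
		Spec: corev1.ResourceQuotaSpec{
			Hard: corev1.ResourceList{
				corev1.ResourceCPU:    resource.MustParse("10"),   // CPU sum across all pods
				corev1.ResourceMemory: resource.MustParse("20Gi"), // memory sum across all pods
				corev1.ResourcePods:   resource.MustParse("50"),
			},
		},
	}
	fmt.Println(rq.Name, rq.Spec.Hard.Cpu().String())
}
```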



Conclusion

The Kubernetes Scheduler is implemented entirely as a plugin and can easily be extended or replaced by other schedulers. The current scheduler includes several levels of resource constraints, but the scheduling logic itself is relatively simple, relying on a set of static rules; its effect on improving cluster resource utilization and guaranteeing service QoS is not yet obvious.



The Kubernetes Scheduler may later support using different schedulers for different tasks, and this experimental new feature will be useful for users who deploy multiple types of workloads in a Kubernetes cluster.



Kubernetes is also developing QoS features, drawing on Borg's mature experience and focusing on improving cluster resource utilization; this feature is well worth looking forward to.

