Kubernetes 1.6 new features: Full support for multiple GPUs

Source: Internet
Author: User

(i) Background information

A GPU is a graphics processor; GPU is the abbreviation for graphics processing unit. Before an image appears on a computer monitor it must go through a series of processing steps, a process known by the proprietary term "rendering". Early computers had no GPU, and all rendering was done by the CPU; the calculations involved are very time-consuming and consumed most of the CPU's time. The GPU was then created specifically to handle rendering calculations and free the CPU: it is designed to perform the complex mathematical and geometric computations that rendering requires.

Below is a comparison diagram of the CPU and the GPU from Baidu Encyclopedia, in which the green blocks are compute units:


As can be seen, the GPU has a large number of compute units, which is why the GPU is so well suited to computation-heavy work such as rendering.

(ii) Areas of application

The earliest GPU applications were limited to graphics, such as 3D rendering in games and other image processing. Today the GPU is applied very widely: in games, entertainment, research, medicine, the Internet, and any other area involving large-scale computation, GPU applications can be found, for example high-performance computing, machine learning, AI, autonomous driving, virtual reality, and natural language processing.

1. First, take a look at NVIDIA's analysis of GPU use in the deep-learning field:


It can be seen that from 2013 to 2015 the deep-learning field showed explosive growth.

2. See the information provided by NVIDIA below:


With the use of GPUs for deep learning applications, the efficiency of autonomous driving, medical diagnostics, and machine learning is significantly improved.

(iii) The GPU support implementation in Kubernetes 1.3

Kubernetes 1.3 introduced support for NVIDIA GPUs as an alpha feature. Each node in a Kubernetes-managed cluster gains a new resource name, alpha.kubernetes.io/nvidia-gpu, by extending the existing capacity and allocatable fields. The capacity field describes a node's actual resource capacity, covering CPU, memory, storage, and alpha.kubernetes.io/nvidia-gpu; the allocatable field describes the resource capacity of the node that can be allocated, covering the same resources.
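As a rough sketch, a node with one NVIDIA GPU would then report the new resource in both fields like this (the CPU and memory values are purely illustrative):

```yaml
status:
  capacity:
    cpu: "4"
    memory: 8010196Ki
    alpha.kubernetes.io/nvidia-gpu: "1"
  allocatable:
    cpu: "4"
    memory: 7907796Ki
    alpha.kubernetes.io/nvidia-gpu: "1"
```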


To bring a node's GPU under Kubernetes management, add the --experimental-nvidia-gpus parameter when starting kubelet. This parameter tells kubelet how many NVIDIA GPUs the node has: 0 means the node has no NVIDIA GPU, and if the parameter is omitted the system likewise assumes the node has none.

When multiple NVIDIA GPUs are installed on a node, --experimental-nvidia-gpus can be given a value greater than 1. In Kubernetes 1.3, however, GPU support is still an alpha feature, and in the code the parameter effectively supports only two values, 0 and 1, as the following code shows:
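The behavior can be sketched as the following Go snippet. This is a reconstruction of the idea, not the actual kubelet source; the Device type and the control-device paths /dev/nvidiactl and /dev/nvidia-uvm are simplifications of what Docker and the NVIDIA driver expose.

```go
package main

import "fmt"

// Device mirrors Docker's host-device mapping (a simplified stand-in type).
type Device struct {
	PathOnHost        string
	PathInContainer   string
	CgroupPermissions string
}

// nvidiaGPUDevices sketches the Kubernetes 1.3 logic: regardless of how many
// GPUs --experimental-nvidia-gpus reports, only the first GPU (/dev/nvidia0)
// plus the NVIDIA control devices are mapped into the container.
func nvidiaGPUDevices(nvidiaGPUs int) []Device {
	if nvidiaGPUs <= 0 {
		return nil
	}
	paths := []string{"/dev/nvidia0", "/dev/nvidiactl", "/dev/nvidia-uvm"}
	devices := make([]Device, 0, len(paths))
	for _, p := range paths {
		devices = append(devices, Device{
			PathOnHost:        p,
			PathInContainer:   p,
			CgroupPermissions: "mrw",
		})
	}
	return devices
}

func main() {
	fmt.Println(nvidiaGPUDevices(1))
}
```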


When Docker is run, the node's devices must be mapped into the container, and kubelet tells Docker to map only the first NVIDIA GPU. The code also shows why only NVIDIA GPUs are supported in Kubernetes 1.3: different GPU brands map different device paths into the Linux operating system, so each brand would require its own implementation.

(iv) The GPU support implementation in Kubernetes 1.6

Kubernetes 1.6 provides more comprehensive support for NVIDIA GPUs. It preserves the alpha resource name introduced in 1.3, alpha.kubernetes.io/nvidia-gpu, but the kubelet parameter --experimental-nvidia-gpus has been removed. The alpha feature is now enabled by setting the Accelerators feature gate to true; the full startup parameter is --feature-gates="Accelerators=true".

Kubernetes 1.3 could exploit only one NVIDIA GPU per node, whereas Kubernetes 1.6 automatically discovers and schedules all NVIDIA GPUs on a node.
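The discovery step can be sketched as follows. This is an illustrative reconstruction, not the actual kubelet code: the real kubelet scans the /dev directory, which is passed in here as a plain list so the sketch stays self-contained.

```go
package main

import (
	"fmt"
	"regexp"
)

// nvidiaDeviceRE matches GPU device nodes such as nvidia0, nvidia1, ...
// but not the control devices nvidiactl or nvidia-uvm.
var nvidiaDeviceRE = regexp.MustCompile(`^nvidia[0-9]+$`)

// discoverGPUs returns the entries of a /dev listing that are GPU devices,
// as full device paths.
func discoverGPUs(devEntries []string) []string {
	var gpus []string
	for _, name := range devEntries {
		if nvidiaDeviceRE.MatchString(name) {
			gpus = append(gpus, "/dev/"+name)
		}
	}
	return gpus
}

func main() {
	fmt.Println(discoverGPUs([]string{"nvidia0", "nvidia1", "nvidiactl", "nvidia-uvm", "sda"}))
}
```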


As the code above shows, in 1.6 all NVIDIA GPU devices on a node can be obtained.

The following is the NVIDIA GPU-related structure added to kubelet in 1.6:
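A simplified sketch of that structure is shown below. The field names follow the description in this article, but the supporting types (podGPUs, the Docker client, and the active-pods lister) are stubbed out here and are assumptions, not the real kubelet definitions.

```go
package main

import "fmt"

// podGPUs maps a pod UID to the set of GPU device paths bound to it.
type podGPUs struct {
	podGPUMapping map[string]map[string]bool
}

type dockerInterface interface{}  // stand-in for kubelet's Docker client
type activePodsLister interface{} // stand-in for the active-pods lister

// nvidiaGPUManager tracks every GPU on the node and which pods hold them.
type nvidiaGPUManager struct {
	allGPUs          map[string]bool // device paths of all GPUs on this node
	allocated        *podGPUs        // GPUs already bound to pods
	dockerClient     dockerInterface
	activePodsLister activePodsLister
}

func main() {
	m := nvidiaGPUManager{
		allGPUs:   map[string]bool{"/dev/nvidia0": true, "/dev/nvidia1": true},
		allocated: &podGPUs{podGPUMapping: map[string]map[string]bool{}},
	}
	fmt.Printf("GPUs on node: %d\n", len(m.allGPUs))
}
```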


In the nvidiaGPUManager structure, the allGPUs field holds the information for all GPUs on the node, and the allocated field records the GPUs that have already been allocated; allocated is a podGPUs struct that records the relationship between each pod and the GPUs it uses. The dockerClient field is a Docker interface used to find all Docker containers that use GPUs, and the activePodsLister field lists all pods in the active state on the node; through it, GPU resources still bound to terminated pods can be released.

The NVIDIA GPU feature in Kubernetes takes effect only when the container runtime is Docker; if the container runtime is rkt, NVIDIA GPUs cannot be used.

In 1.6 you can refer to the following example of using an NVIDIA GPU:
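A minimal pod manifest requesting the alpha GPU resource might look like this (the pod name and image are placeholders, not from the source):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  containers:
    - name: gpu-container
      image: gcr.io/tensorflow/tensorflow:latest-gpu
      resources:
        limits:
          alpha.kubernetes.io/nvidia-gpu: 2 # this container gets two whole GPUs to itself
```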


As the example shows, when the GPU is used in 1.6 there is no way to share a GPU between containers; each container monopolizes an entire GPU. It is also assumed that the NVIDIA GPU type is the same on all nodes in the cluster. If different nodes in a cluster carry different NVIDIA GPU types, you additionally need to configure node labels and node selectors so that the scheduler can distinguish nodes by GPU type.

At node startup, you can record the NVIDIA GPU type and pass it to kubelet as a node label, as follows:
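For example, assuming an accelerator label key (the key name and values are a convention chosen here for illustration, not mandated by Kubernetes):

```shell
# At kubelet startup (other required kubelet flags omitted):
kubelet --node-labels='accelerator=nvidia-tesla-k80' ...

# Or label a node that is already registered:
kubectl label nodes node-1 accelerator=nvidia-tesla-k80
```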


In use, you can refer to the following examples:
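A sketch of such a pod, assuming nodes carry an accelerator label recording their GPU type (the label key, values, and image are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-affinity-pod
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: accelerator
                operator: In
                values: ["nvidia-tesla-k80", "nvidia-tesla-p100"]
  containers:
    - name: gpu-container
      image: gcr.io/tensorflow/tensorflow:latest-gpu
      resources:
        limits:
          alpha.kubernetes.io/nvidia-gpu: 1
```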


In this example, a node affinity rule ensures that the pod can be scheduled only onto nodes whose GPU type is Tesla K80 or Tesla P100.

CUDA (Compute Unified Device Architecture) is a general-purpose parallel computing platform launched by the graphics vendor NVIDIA that enables the GPU to solve complex computational problems; it includes the CUDA instruction set architecture and the parallel computing engine inside the GPU. If CUDA is already installed on the node, pods can access the CUDA libraries via the hostPath volume plugin:
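A sketch of such a pod, assuming the NVIDIA driver and CUDA libraries live under /usr/lib/nvidia-375 on the host (the paths and image are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cuda-pod
spec:
  containers:
    - name: gpu-container
      image: gcr.io/tensorflow/tensorflow:latest-gpu
      resources:
        limits:
          alpha.kubernetes.io/nvidia-gpu: 1
      volumeMounts:
        - name: nvidia-libraries
          mountPath: /usr/local/nvidia/lib64
  volumes:
    - name: nvidia-libraries
      hostPath:
        path: /usr/lib/nvidia-375
```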


(v) Future prospects

This alpha feature will be perfected in the future, making the GPU a regular part of Kubernetes compute resources, improving the convenience of using GPU resources, and allowing Kubernetes to automatically ensure that applications using GPUs achieve optimal performance.

With machine learning booming, Kubernetes is likely to keep improving its GPU handling in order to support the various GPU-based machine-learning computing platforms, gradually becoming the underlying orchestration architecture for machine learning.
