How does one perform GPU computing in a KVM virtual machine?
We know that CUDA is a general-purpose parallel computing architecture from NVIDIA that enables complex parallel computation on the GPU. In some scenarios you must use virtual machines for resource isolation while still relying on a physical GPU for large-scale parallel computing. This article walks through the related practice: the NVIDIA graphics card is passed through to the virtual machine, and the CUDA platform is then used for GPU computation.
Graphics card model: NVIDIA Tesla P4
View the graphics card on the physical host:
# lspci | grep NVIDIA
81:00.0 3D controller: NVIDIA Corporation Device 1bb3 (rev a1)
#
Detach the PCI graphics card from the host:
# virsh nodedev-list
pci_0000_81_00_0
# virsh nodedev-dettach pci_0000_81_00_0
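Optionally (an extra check not in the original steps), confirm on the host that the card is no longer claimed by a host driver; with KVM/VFIO passthrough it is typically bound to vfio-pci or pci-stub:
# lspci -k -s 81:00.0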
Assign the PCI graphics card directly to the VM in its domain XML:
......
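A typical hostdev entry looks like the following sketch (the PCI address must match the one lspci reported on the host, 81:00.0 here; adjust domain/bus/slot/function for your own card):
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x81' slot='0x00' function='0x0'/>
  </source>
</hostdev>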
Check whether the graphics card is visible inside the VM:
# lspci | grep NVIDIA
. 0 3D controller: NVIDIA Corporation Device 1bb3 (rev a1)
#
Prepare the environment in the virtual machine:
Ubuntu 16.04
# apt-get install gcc
# apt-get install linux-headers-$(uname -r)
Download CUDA Toolkit 9.1 (the local .deb installer) in the virtual machine.
Install the CUDA Toolkit in the virtual machine:
# dpkg -i cuda-repo-ubuntu1604-9-1-local_9.1.85-1_amd64.deb
# apt-key add /var/cuda-repo-9-1-local/7fa2af80.pub
# apt-get update
# apt-get install cuda
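After installation (a reboot may be needed so the NVIDIA kernel module loads), the driver should see the card; as an optional check not in the original steps, run:
# nvidia-smi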
Sample GPU computation code:
// add.cu
#include <iostream>
#include <math.h>

// Kernel function to add the elements of two arrays
__global__
void add(int n, float *x, float *y)
{
  for (int i = 0; i < n; i++)
    y[i] = x[i] + y[i];
}

int main(void)
{
  int N = 1<<20;
  float *x, *y;

  // Allocate Unified Memory - accessible from CPU or GPU
  cudaMallocManaged(&x, N * sizeof(float));
  cudaMallocManaged(&y, N * sizeof(float));

  // Initialize x and y arrays on the host
  for (int i = 0; i < N; i++) {
    x[i] = 1.0f;
    y[i] = 2.0f;
  }

  // Run kernel on 1M elements on the GPU
  add<<<1, 1>>>(N, x, y);

  // Wait for GPU to finish before accessing on host
  cudaDeviceSynchronize();

  // Check for errors (all values should be 3.0f)
  float maxError = 0.0f;
  for (int i = 0; i < N; i++)
    maxError = fmax(maxError, fabs(y[i] - 3.0f));
  std::cout << "Max error: " << maxError << std::endl;

  // Free memory
  cudaFree(x);
  cudaFree(y);

  return 0;
}
Compile and run in the VM:
# /usr/local/cuda-9.1/bin/nvcc add.cu -o add_cuda
# ./add_cuda
# /usr/local/cuda-9.1/bin/nvprof ./add_cuda
Running result:
The result shows that the program run inside the virtual machine is indeed executed on the Tesla P4. We can then run deep learning workloads inside the virtual machine.
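The sample above launches the kernel with <<<1, 1>>>, i.e. a single GPU thread, which is enough to prove the passthrough works but leaves the Tesla P4 mostly idle. As an optional follow-up not covered in the article, here is a minimal sketch of a parallel variant: the kernel uses a grid-stride loop and is launched with enough 256-thread blocks to cover all N elements.
// add_parallel.cu - illustrative sketch, not part of the original article
#include <iostream>
#include <math.h>

// Grid-stride kernel: each thread starts at its global index and
// steps forward by the total number of threads in the grid
__global__
void add(int n, float *x, float *y)
{
  int index = blockIdx.x * blockDim.x + threadIdx.x;
  int stride = blockDim.x * gridDim.x;
  for (int i = index; i < n; i += stride)
    y[i] = x[i] + y[i];
}

int main(void)
{
  int N = 1<<20;
  float *x, *y;

  // Unified Memory, same as the original sample
  cudaMallocManaged(&x, N * sizeof(float));
  cudaMallocManaged(&y, N * sizeof(float));

  for (int i = 0; i < N; i++) {
    x[i] = 1.0f;
    y[i] = 2.0f;
  }

  // Launch enough 256-thread blocks to cover all N elements
  int blockSize = 256;
  int numBlocks = (N + blockSize - 1) / blockSize;
  add<<<numBlocks, blockSize>>>(N, x, y);
  cudaDeviceSynchronize();

  float maxError = 0.0f;
  for (int i = 0; i < N; i++)
    maxError = fmax(maxError, fabs(y[i] - 3.0f));
  std::cout << "Max error: " << maxError << std::endl;

  cudaFree(x);
  cudaFree(y);
  return 0;
}
It compiles with the same nvcc command as above; under nvprof the add kernel should then take far less device time than the single-thread version.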