CUDA (4): Device Management
NVIDIA provides several ways to query and manage GPU devices. Knowing how to query GPU information is important because it helps you choose the kernel execution configuration.
This post mainly covers the following two approaches:
- the CUDA runtime API functions
- the NVIDIA system management command-line tool (nvidia-smi)
Use the runtime API to query GPU information
You can use the following function to query all GPU device information:
cudaError_t cudaGetDeviceProperties(cudaDeviceProp *prop, int device);
GPU information is stored in the cudaDeviceProp struct.
Code
#include <cuda_runtime.h>
#include <stdio.h>
#include <math.h>

int main(int argc, char **argv) {
    printf("%s Starting...\n", argv[0]);

    // Count the CUDA-capable devices in the system.
    int deviceCount = 0;
    cudaError_t error_id = cudaGetDeviceCount(&deviceCount);
    if (error_id != cudaSuccess) {
        printf("cudaGetDeviceCount returned %d\n-> %s\n", (int)error_id, cudaGetErrorString(error_id));
        printf("Result = FAIL\n");
        exit(EXIT_FAILURE);
    }
    if (deviceCount == 0) {
        printf("There are no available device(s) that support CUDA\n");
    } else {
        printf("Detected %d CUDA Capable device(s)\n", deviceCount);
    }

    // Query the properties of device 0.
    int dev = 0, driverVersion = 0, runtimeVersion = 0;
    cudaSetDevice(dev);
    cudaDeviceProp deviceProp;
    cudaGetDeviceProperties(&deviceProp, dev);
    printf("Device %d: \"%s\"\n", dev, deviceProp.name);

    cudaDriverGetVersion(&driverVersion);
    cudaRuntimeGetVersion(&runtimeVersion);
    printf("  CUDA Driver Version / Runtime Version          %d.%d / %d.%d\n",
           driverVersion / 1000, (driverVersion % 100) / 10,
           runtimeVersion / 1000, (runtimeVersion % 100) / 10);
    printf("  CUDA Capability Major/Minor version number:    %d.%d\n",
           deviceProp.major, deviceProp.minor);
    printf("  Total amount of global memory:                 %.2f GBytes (%llu bytes)\n",
           (float)deviceProp.totalGlobalMem / pow(1024.0, 3),
           (unsigned long long)deviceProp.totalGlobalMem);
    printf("  GPU Clock rate:                                %.0f MHz (%0.2f GHz)\n",
           deviceProp.clockRate * 1e-3f, deviceProp.clockRate * 1e-6f);
    printf("  Memory Clock rate:                             %.0f MHz\n",
           deviceProp.memoryClockRate * 1e-3f);
    printf("  Memory Bus Width:                              %d-bit\n",
           deviceProp.memoryBusWidth);
    if (deviceProp.l2CacheSize) {
        printf("  L2 Cache Size:                                 %d bytes\n",
               deviceProp.l2CacheSize);
    }
    printf("  Max Texture Dimension Size (x,y,z)             1D=(%d), 2D=(%d,%d), 3D=(%d,%d,%d)\n",
           deviceProp.maxTexture1D,
           deviceProp.maxTexture2D[0], deviceProp.maxTexture2D[1],
           deviceProp.maxTexture3D[0], deviceProp.maxTexture3D[1], deviceProp.maxTexture3D[2]);
    printf("  Max Layered Texture Size (dim) x layers        1D=(%d) x %d, 2D=(%d,%d) x %d\n",
           deviceProp.maxTexture1DLayered[0], deviceProp.maxTexture1DLayered[1],
           deviceProp.maxTexture2DLayered[0], deviceProp.maxTexture2DLayered[1],
           deviceProp.maxTexture2DLayered[2]);
    printf("  Total amount of constant memory:               %lu bytes\n", deviceProp.totalConstMem);
    printf("  Total amount of shared memory per block:       %lu bytes\n", deviceProp.sharedMemPerBlock);
    printf("  Total number of registers available per block: %d\n", deviceProp.regsPerBlock);
    printf("  Warp size:                                     %d\n", deviceProp.warpSize);
    printf("  Maximum number of threads per multiprocessor:  %d\n", deviceProp.maxThreadsPerMultiProcessor);
    printf("  Maximum number of threads per block:           %d\n", deviceProp.maxThreadsPerBlock);
    printf("  Maximum sizes of each dimension of a block:    %d x %d x %d\n",
           deviceProp.maxThreadsDim[0], deviceProp.maxThreadsDim[1], deviceProp.maxThreadsDim[2]);
    printf("  Maximum sizes of each dimension of a grid:     %d x %d x %d\n",
           deviceProp.maxGridSize[0], deviceProp.maxGridSize[1], deviceProp.maxGridSize[2]);
    printf("  Maximum memory pitch:                          %lu bytes\n", deviceProp.memPitch);

    exit(EXIT_SUCCESS);
}
Compile and run:
$ nvcc checkDeviceInfor.cu -o checkDeviceInfor
$ ./checkDeviceInfor
Output:
./checkDeviceInfor Starting...
Detected 2 CUDA Capable device(s)
Device 0: "Tesla M2070"
  CUDA Driver Version / Runtime Version          5.5 / 5.5
  CUDA Capability Major/Minor version number:    2.0
  Total amount of global memory:                 5.25 GBytes (5636554752 bytes)
  GPU Clock rate:                                1147 MHz (1.15 GHz)
  Memory Clock rate:                             1566 MHz
  Memory Bus Width:                              384-bit
  L2 Cache Size:                                 786432 bytes
  Max Texture Dimension Size (x,y,z)             1D=(65536), 2D=(65536,65535), 3D=(2048,2048,2048)
  Max Layered Texture Size (dim) x layers        1D=(16384) x 2048, 2D=(16384,16384) x 2048
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 32768
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1536
  Maximum number of threads per block:           1024
  Maximum sizes of each dimension of a block:    1024 x 1024 x 64
  Maximum sizes of each dimension of a grid:     65535 x 65535 x 65535
  Maximum memory pitch:                          2147483647 bytes
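These properties are exactly what you need when choosing a kernel execution configuration. Below is a minimal sketch of how maxThreadsPerBlock and warpSize might be used to pick a launch configuration; the kernel dummyKernel and the problem size are made up purely for illustration and are not part of the original example.

#include <cuda_runtime.h>
#include <stdio.h>

// Hypothetical kernel, used only to illustrate the launch configuration.
__global__ void dummyKernel(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main(void) {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);

    int n = 1 << 20;                    // assumed problem size
    int blockSize = 8 * prop.warpSize;  // a multiple of the warp size, e.g. 256
    if (blockSize > prop.maxThreadsPerBlock)
        blockSize = prop.maxThreadsPerBlock;
    int gridSize = (n + blockSize - 1) / blockSize;

    float *d_data;
    cudaMalloc(&d_data, n * sizeof(float));
    dummyKernel<<<gridSize, blockSize>>>(d_data, n);
    cudaDeviceSynchronize();
    cudaFree(d_data);

    printf("Launched %d blocks of %d threads (device limit: %d threads per block)\n",
           gridSize, blockSize, prop.maxThreadsPerBlock);
    return 0;
}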
Determine the best GPU
For systems with multiple GPUs, we need to choose one of them as our device. A simple heuristic for picking the GPU with the best compute performance is to choose the one with the largest number of multiprocessors; the following code selects such a GPU.
int numDevices = 0;
cudaGetDeviceCount(&numDevices);
if (numDevices > 1) {
    int maxMultiprocessors = 0, maxDevice = 0;
    for (int device = 0; device < numDevices; device++) {
        cudaDeviceProp props;
        cudaGetDeviceProperties(&props, device);
        if (maxMultiprocessors < props.multiProcessorCount) {
            maxMultiprocessors = props.multiProcessorCount;
            maxDevice = device;
        }
    }
    cudaSetDevice(maxDevice);
}
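If you also want to report which GPU was chosen, a small follow-up (not part of the original snippet, placed right after the cudaSetDevice(maxDevice) call inside the if block) could be:

// After cudaSetDevice(maxDevice), the selected GPU is the current device
// for all subsequent allocations and kernel launches.
cudaDeviceProp best;
cudaGetDeviceProperties(&best, maxDevice);
printf("Using device %d: %s (%d multiprocessors)\n",
       maxDevice, best.name, best.multiProcessorCount);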
Use nvidia-smi to query GPU information
nvidia-smi is a command-line tool that helps you manage and monitor GPU devices; it lets you query and change device states.
For example, the following command lists all GPUs in the system:
$ nvidia-smi -L
GPU 0: Tesla M2070 (UUID: GPU-68df8aec-e85c-9934-2b81-0c9e689a43a7)
GPU 1: Tesla M2070 (UUID: GPU-382f23c1-5160-01e2-3291-ff9628930b70)
Then you can use the following command to query detailed GPU 0 information:
$ nvidia-smi -q -i 0
nvidia-smi also accepts the -d option to limit the report to specific sections. The supported section names are:
MEMORY
UTILIZATION
ECC
TEMPERATURE
POWER
CLOCK
COMPUTE
PIDS
PERFORMANCE
SUPPORTED_CLOCKS
PAGE_RETIREMENT
ACCOUNTING
For example, to display only device memory information:
$ nvidia-smi -q -i 0 -d MEMORY | tail -n 5
    Memory Usage
        Total           : 5375 MB
        Used            : 9 MB
        Free            : 5366 MB
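In the same way you can restrict the query to any other section from the list above, for example the utilization report (output omitted here, since it depends on the running workload):

$ nvidia-smi -q -i 0 -d UTILIZATION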
Set device
On multi-GPU systems, you can use nvidia-smi to view the attributes of each GPU. GPUs are numbered starting from 0, and the environment variable CUDA_VISIBLE_DEVICES can be used to control which GPUs are visible to an application without modifying the application itself.
You can set CUDA_VISIBLE_DEVICES=2 to hide the other GPUs so that only GPU 2 is used. You can also set CUDA_VISIBLE_DEVICES=2,3 to expose multiple GPUs; inside the application their device IDs then become 0 and 1, respectively.
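As a minimal sketch of how this looks in a shell (assuming the checkDeviceInfor program from above and a machine with at least four GPUs):

# Expose only physical GPUs 2 and 3 to the application;
# inside the program they appear as devices 0 and 1.
$ export CUDA_VISIBLE_DEVICES=2,3
$ ./checkDeviceInfor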
Download Code: CodeSamples.zip