CUDA 4: Device Management

NVIDIA provides centralized ways to query and manage GPU devices. Knowing how to query GPU information is important because it helps you choose an appropriate kernel execution configuration.

This blog will mainly introduce the following two approaches:

  • the CUDA runtime API functions
  • the NVIDIA system management command line (nvidia-smi)
Use the runtime API to query GPU Information

You can use the following function to query all GPU device information:

cudaError_t cudaGetDeviceProperties(cudaDeviceProp *prop, int device);

GPU information is stored in the cudaDeviceProp struct.

Code
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

int main(int argc, char **argv) {
    printf("%s Starting...\n", argv[0]);

    int deviceCount = 0;
    cudaError_t error_id = cudaGetDeviceCount(&deviceCount);
    if (error_id != cudaSuccess) {
        printf("cudaGetDeviceCount returned %d\n-> %s\n",
               (int)error_id, cudaGetErrorString(error_id));
        printf("Result = FAIL\n");
        exit(EXIT_FAILURE);
    }
    if (deviceCount == 0) {
        printf("There are no available device(s) that support CUDA\n");
    } else {
        printf("Detected %d CUDA Capable device(s)\n", deviceCount);
    }

    int dev = 0, driverVersion = 0, runtimeVersion = 0;
    cudaSetDevice(dev);
    cudaDeviceProp deviceProp;
    cudaGetDeviceProperties(&deviceProp, dev);
    printf("Device %d: \"%s\"\n", dev, deviceProp.name);

    cudaDriverGetVersion(&driverVersion);
    cudaRuntimeGetVersion(&runtimeVersion);
    printf("  CUDA Driver Version / Runtime Version          %d.%d / %d.%d\n",
           driverVersion / 1000, (driverVersion % 100) / 10,
           runtimeVersion / 1000, (runtimeVersion % 100) / 10);
    printf("  CUDA Capability Major/Minor version number:    %d.%d\n",
           deviceProp.major, deviceProp.minor);
    printf("  Total amount of global memory:                 %.2f GBytes (%llu bytes)\n",
           (float)deviceProp.totalGlobalMem / pow(1024.0, 3),
           (unsigned long long)deviceProp.totalGlobalMem);
    printf("  GPU Clock rate:                                %.0f MHz (%0.2f GHz)\n",
           deviceProp.clockRate * 1e-3f, deviceProp.clockRate * 1e-6f);
    printf("  Memory Clock rate:                             %.0f MHz\n",
           deviceProp.memoryClockRate * 1e-3f);
    printf("  Memory Bus Width:                              %d-bit\n",
           deviceProp.memoryBusWidth);
    if (deviceProp.l2CacheSize) {
        printf("  L2 Cache Size:                                 %d bytes\n",
               deviceProp.l2CacheSize);
    }
    printf("  Max Texture Dimension Size (x,y,z)             1D=(%d), 2D=(%d,%d), 3D=(%d,%d,%d)\n",
           deviceProp.maxTexture1D,
           deviceProp.maxTexture2D[0], deviceProp.maxTexture2D[1],
           deviceProp.maxTexture3D[0], deviceProp.maxTexture3D[1],
           deviceProp.maxTexture3D[2]);
    printf("  Max Layered Texture Size (dim) x layers        1D=(%d) x %d, 2D=(%d,%d) x %d\n",
           deviceProp.maxTexture1DLayered[0], deviceProp.maxTexture1DLayered[1],
           deviceProp.maxTexture2DLayered[0], deviceProp.maxTexture2DLayered[1],
           deviceProp.maxTexture2DLayered[2]);
    printf("  Total amount of constant memory:               %lu bytes\n",
           deviceProp.totalConstMem);
    printf("  Total amount of shared memory per block:       %lu bytes\n",
           deviceProp.sharedMemPerBlock);
    printf("  Total number of registers available per block: %d\n",
           deviceProp.regsPerBlock);
    printf("  Warp size:                                     %d\n",
           deviceProp.warpSize);
    printf("  Maximum number of threads per multiprocessor:  %d\n",
           deviceProp.maxThreadsPerMultiProcessor);
    printf("  Maximum number of threads per block:           %d\n",
           deviceProp.maxThreadsPerBlock);
    printf("  Maximum sizes of each dimension of a block:    %d x %d x %d\n",
           deviceProp.maxThreadsDim[0], deviceProp.maxThreadsDim[1],
           deviceProp.maxThreadsDim[2]);
    printf("  Maximum sizes of each dimension of a grid:     %d x %d x %d\n",
           deviceProp.maxGridSize[0], deviceProp.maxGridSize[1],
           deviceProp.maxGridSize[2]);
    printf("  Maximum memory pitch:                          %lu bytes\n",
           deviceProp.memPitch);

    exit(EXIT_SUCCESS);
}

 

Compile and run:

$ nvcc checkDeviceInfor.cu -o checkDeviceInfor
$ ./checkDeviceInfor

Output:

./checkDeviceInfor Starting...
Detected 2 CUDA Capable device(s)
Device 0: "Tesla M2070"
  CUDA Driver Version / Runtime Version          5.5 / 5.5
  CUDA Capability Major/Minor version number:    2.0
  Total amount of global memory:                 5.25 GBytes (5636554752 bytes)
  GPU Clock rate:                                1147 MHz (1.15 GHz)
  Memory Clock rate:                             1566 MHz
  Memory Bus Width:                              384-bit
  L2 Cache Size:                                 786432 bytes
  Max Texture Dimension Size (x,y,z)             1D=(65536), 2D=(65536,65535), 3D=(2048,2048,2048)
  Max Layered Texture Size (dim) x layers        1D=(16384) x 2048, 2D=(16384,16384) x 2048
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 32768
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1536
  Maximum number of threads per block:           1024
  Maximum sizes of each dimension of a block:    1024 x 1024 x 64
  Maximum sizes of each dimension of a grid:     65535 x 65535 x 65535
  Maximum memory pitch:                          2147483647 bytes
Determine the best GPU

On systems with multiple GPUs, we need to choose one of them as our device. One way to pick the GPU with the best compute performance is by its number of multiprocessors; you can use the following code to select such a GPU.

int numDevices = 0;
cudaGetDeviceCount(&numDevices);
if (numDevices > 1) {
    int maxMultiprocessors = 0, maxDevice = 0;
    for (int device = 0; device < numDevices; device++) {
        cudaDeviceProp props;
        cudaGetDeviceProperties(&props, device);
        if (maxMultiprocessors < props.multiProcessorCount) {
            maxMultiprocessors = props.multiProcessorCount;
            maxDevice = device;
        }
    }
    cudaSetDevice(maxDevice);
}
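Note that the snippet above calls the runtime API without checking return codes. A common pattern (a minimal sketch, not the only way) is to wrap every runtime call in a CHECK macro that reports the failing file and line:

```cuda
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

// Abort with a descriptive message if a runtime call fails.
#define CHECK(call)                                                \
do {                                                               \
    cudaError_t err = (call);                                      \
    if (err != cudaSuccess) {                                      \
        fprintf(stderr, "CUDA error %s:%d: %s\n",                  \
                __FILE__, __LINE__, cudaGetErrorString(err));      \
        exit(EXIT_FAILURE);                                        \
    }                                                              \
} while (0)

int main(void) {
    int numDevices = 0;
    CHECK(cudaGetDeviceCount(&numDevices));
    printf("Detected %d device(s)\n", numDevices);
    return 0;
}
```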

Use nvidia-smi to query GPU Information

nvidia-smi is a command-line tool for managing GPU devices; it lets you query and change device state.

nvidia-smi is very useful. For example, the following command lists all GPUs:

$ nvidia-smi -L
GPU 0: Tesla M2070 (UUID: GPU-68df8aec-e85c-9934-2b81-0c9e689a43a7)
GPU 1: Tesla M2070 (UUID: GPU-382f23c1-5160-01e2-3291-ff9628930b70)

Then you can use the following command to query detailed information about GPU 0:

$ nvidia-smi -q -i 0

The -d option narrows the detailed query to one of the following sections, streamlining the information nvidia-smi displays:

  • MEMORY
  • UTILIZATION
  • ECC
  • TEMPERATURE
  • POWER
  • CLOCK
  • COMPUTE
  • PIDS
  • PERFORMANCE
  • SUPPORTED_CLOCKS
  • PAGE_RETIREMENT
  • ACCOUNTING

For example, display only device memory information:

$ nvidia-smi -q -i 0 -d MEMORY | tail -n 5
    Memory Usage
        Total : 5375 MB
        Used  : 9 MB
        Free  : 5366 MB
Set device

For multi-GPU systems, you can use nvidia-smi to view the attributes of each GPU. GPUs are numbered starting from 0, and the environment variable CUDA_VISIBLE_DEVICES lets you control which GPUs an application sees without modifying the application.

You can set CUDA_VISIBLE_DEVICES=2 to hide the other GPUs so that only GPU 2 can be used. You can also expose multiple GPUs, e.g. CUDA_VISIBLE_DEVICES=2,3; inside the application their device IDs become 0 and 1 respectively.
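For example, to restrict a run to physical GPUs 2 and 3 (assuming the checkDeviceInfor binary built earlier; on a 2-GPU machine you would use valid IDs such as 0,1):

```shell
# Only physical GPUs 2 and 3 are visible; the program enumerates them as 0 and 1.
export CUDA_VISIBLE_DEVICES=2,3
echo "$CUDA_VISIBLE_DEVICES"
# ./checkDeviceInfor   # the program would now detect only these two devices
```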

 

Download Code: CodeSamples.zip
