A GPGPU is a many-core device: it contains a large number of compute units and uses them to achieve very high parallel throughput.
When you program an NVIDIA graphics card with CUDA, you can use the events that CUDA provides as a timer.
Of course, almost every programming language also provides a function for reading the system time, such as the timer functions in C, C++, and Java.
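For comparison, a host-side timer in C++ could be sketched as follows. This is only a minimal illustration, not code from the original program: the helper name host_elapsed_ms is hypothetical, and it relies on std::chrono::steady_clock from the C++ standard library.

```cpp
#include <chrono>

// Hypothetical helper (not part of the original program): runs `work` once
// and returns the wall-clock time it took on the host, in milliseconds.
template <typename F>
double host_elapsed_ms(F work)
{
    const auto t0 = std::chrono::steady_clock::now();
    work();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

Note that a kernel launch returns to the host before the kernel has finished, so a host timer wrapped around a launch alone would mostly measure launch overhead unless the host synchronizes with the device first.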
An event can be used to measure the precise elapsed time of a task or code segment running on the GPU.
The following example program (CALTIME.CU) shows how:
#include <stdio.h>
#include <cuda_runtime.h>

// A function declared __global__ tells the compiler that this code is
// called by the CPU and executed by the GPU.
__global__ void Mul(int *dev_a, const int NUM)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    int dis = blockDim.x * gridDim.x;  // total number of threads in the grid
    while (idx < NUM)
    {
        dev_a[idx] = dev_a[idx] % 23 * dev_a[idx] * 5 % 9;
        idx += dis;
    }
}

int main(void)
{
    const int thread_pre_block = 64;  // number of threads per block
    const int block_pre_grid = 8;     // number of blocks per grid
    const int NUM = 45056;

    // Allocate host memory and initialize it
    int host_a[NUM];
    for (int i = 0; i < NUM; i++)
        host_a[i] = i;

    // Define a cudaError_t, defaulting to cudaSuccess (0)
    cudaError_t err = cudaSuccess;

    // Allocate GPU memory
    int *dev_a;
    err = cudaMalloc((void **)&dev_a, sizeof(int) * NUM);
    if (err != cudaSuccess)
    {
        fprintf(stderr, "cudaMalloc on the GPU failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Copy the data to be computed to the GPU with cudaMemcpy
    cudaMemcpy(dev_a, host_a, sizeof(host_a), cudaMemcpyHostToDevice);

    dim3 threads = dim3(thread_pre_block);
    dim3 blocks = dim3(block_pre_grid);

    // Use events to measure the time
    float time_elapsed = 0;
    cudaEvent_t start, stop;
    cudaEventCreate(&start);  // create the events
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);  // record the current time
    Mul<<<blocks, threads, 0, 0>>>(dev_a, NUM);
    cudaEventRecord(stop, 0);   // record the current time

    cudaEventSynchronize(start);  // wait for the event to complete
    cudaEventSynchronize(stop);   // wait for the event, and the work recorded before it, to complete
    cudaEventElapsedTime(&time_elapsed, start, stop);  // compute the time difference

    // Transfer the results back to the CPU
    cudaMemcpy(host_a, dev_a, sizeof(host_a), cudaMemcpyDeviceToHost);

    cudaEventDestroy(start);  // destroy the events
    cudaEventDestroy(stop);
    cudaFree(dev_a);          // free GPU memory

    printf("Execution time: %f (ms)\n", time_elapsed);
    return 0;
}
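The kernel's while loop is a grid-stride loop: with 8 blocks of 64 threads there are 512 threads in total, so each thread handles 45056 / 512 = 88 elements. The host-only C++ sketch below replays that indexing on the CPU to check that every element is visited exactly once. The function names simulate_grid_stride and mul_reference are hypothetical and do not appear in the original program; they exist only for illustration.

```cpp
#include <vector>

// CPU reference for the per-element formula used in the kernel.
int mul_reference(int v)
{
    return v % 23 * v * 5 % 9;
}

// Hypothetical host-side simulation (illustration only): replays the kernel's
// grid-stride loop for every (block, thread) pair and counts how many times
// each element index is visited.
std::vector<int> simulate_grid_stride(int num, int blocks_per_grid, int threads_per_block)
{
    std::vector<int> visits(num, 0);
    const int dis = threads_per_block * blocks_per_grid;  // blockDim.x * gridDim.x
    for (int block = 0; block < blocks_per_grid; ++block) {
        for (int thread = 0; thread < threads_per_block; ++thread) {
            int idx = block * threads_per_block + thread;  // blockIdx.x * blockDim.x + threadIdx.x
            while (idx < num) {
                ++visits[idx];
                idx += dis;
            }
        }
    }
    return visits;
}
```

Calling simulate_grid_stride(45056, 8, 64) and checking that every count equals 1 confirms the coverage. After copying the results back from the GPU, mul_reference(i) could likewise be compared against host_a[i] as a simple correctness check.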
To compile and run the code: