Measuring elapsed time on the GPU
// checkCudaErrors and getLastCudaError come from helper_cuda.h in the CUDA samples
cudaEvent_t start, stop;
checkCudaErrors(cudaEventCreate(&start));
checkCudaErrors(cudaEventCreate(&stop));
checkCudaErrors(cudaDeviceSynchronize());

float gpu_time = 0.0f;
cudaEventRecord(start, 0);  // the event is recorded in stream 0 once preceding work in the CUDA context has completed

// allocate device memory for the input
float *d_idata;
checkCudaErrors(cudaMalloc((void **) &d_idata, mem_size));

// copy host data to device memory
checkCudaErrors(cudaMemcpy(d_idata, h_idata, mem_size, cudaMemcpyHostToDevice));

// allocate device memory for the result
float *d_odata;
checkCudaErrors(cudaMalloc((void **) &d_odata, mem_size));

// set the execution parameters
dim3 grid(1, 1, 1);
dim3 threads(num_threads, 1, 1);

// launch the kernel; launch parameters: grid is the grid dimensions, threads is the block
// dimensions, and mem_size is the number of bytes of dynamically allocated shared memory per block
testKernel<<< grid, threads, mem_size >>>(d_idata, d_odata);

// check the kernel execution status
getLastCudaError("Kernel execution failed");

// allocate host memory for the result
float *h_odata = (float *) malloc(mem_size);

// copy the result from device to host
checkCudaErrors(cudaMemcpy(h_odata, d_odata, sizeof(float) * num_threads,
                           cudaMemcpyDeviceToHost));

cudaEventRecord(stop, 0);

// poll until the stop event has been reached, counting CPU iterations in the meantime
unsigned long int counter = 0;
while (cudaEventQuery(stop) == cudaErrorNotReady)
{
    counter++;
}

checkCudaErrors(cudaEventElapsedTime(&gpu_time, start, stop));
printf("GPU Execution Time: %.2f (ms)\n", gpu_time);
printf("CPU executed %lu iterations while waiting for GPU to finish\n", counter);
Measuring elapsed time on the CPU
// StopWatchInterface and the sdk*Timer functions come from helper_timer.h in the CUDA samples
StopWatchInterface *timer = 0;
sdkCreateTimer(&timer);
sdkResetTimer(&timer);

sdkStartTimer(&timer);
// compute the reference solution on the CPU
float *reference = (float *) malloc(mem_size);
computeGold(reference, h_idata, num_threads);
sdkStopTimer(&timer);
printf("Serial Time: %f (ms)\n", sdkGetTimerValue(&timer));
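The CPU timing above depends on the timer helpers shipped with the CUDA samples. When those headers are not available, the same start/stop/read pattern can be sketched with std::chrono from the C++ standard library. This is only a minimal sketch under stated assumptions: time_ms and compute_reference are hypothetical names introduced here for illustration, and compute_reference is a stand-in workload, not the original computeGold.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the CPU reference computation (NOT the original
// computeGold): multiplies every input element by the number of elements.
void compute_reference(std::vector<float> &out, const std::vector<float> &in)
{
    assert(out.size() == in.size());
    const float scale = static_cast<float>(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = in[i] * scale;
    }
}

// Hypothetical helper: runs `work` once and returns the elapsed wall-clock
// time in milliseconds, measured with a monotonic clock.
template <typename Work>
double time_ms(Work work)
{
    const auto begin = std::chrono::steady_clock::now();
    work();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - begin).count();
}
```

A call such as time_ms([&] { compute_reference(reference, input); }) then plays the role of the sdkStartTimer / sdkStopTimer / sdkGetTimerValue sequence, and the returned value can be printed with the same "Serial Time" format string. steady_clock is used rather than system_clock so that adjustments to the system time cannot distort the measurement.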
Methods for measuring GPU and CPU elapsed time