Measuring GPU and CPU execution time

Source: Internet
Author: User

Measuring GPU-side execution time

    cudaEvent_t start, stop;
    checkCudaErrors(cudaEventCreate(&start));
    checkCudaErrors(cudaEventCreate(&stop));
    checkCudaErrors(cudaDeviceSynchronize());

    float gpu_time = 0.0f;
    cudaEventRecord(start, 0);  // record the start event in stream 0 (the default stream)

    // allocate device memory for the input
    float *d_idata;
    checkCudaErrors(cudaMalloc((void **)&d_idata, mem_size));

    // copy host data to device memory
    checkCudaErrors(cudaMemcpy(d_idata, h_idata, mem_size, cudaMemcpyHostToDevice));

    // allocate device memory for the result
    float *d_odata;
    checkCudaErrors(cudaMalloc((void **)&d_odata, mem_size));

    // set execution parameters
    dim3 grid(1, 1, 1);
    dim3 threads(num_threads, 1, 1);

    // launch the kernel: grid gives the grid dimensions, threads gives the
    // block dimensions, and mem_size is the number of bytes of dynamically
    // allocated shared memory per block
    testKernel<<<grid, threads, mem_size>>>(d_idata, d_odata);

    // check the kernel launch status
    getLastCudaError("Kernel execution failed");

    // allocate host memory for the result
    float *h_odata = (float *)malloc(mem_size);
    // copy the result from device to host
    checkCudaErrors(cudaMemcpy(h_odata, d_odata, sizeof(float) * num_threads,
                               cudaMemcpyDeviceToHost));

    cudaEventRecord(stop, 0);  // record the stop event in stream 0

    // poll until the stop event has completed, counting CPU iterations meanwhile
    unsigned long int counter = 0;
    while (cudaEventQuery(stop) == cudaErrorNotReady)
    {
        counter++;
    }

    // elapsed time between the two events, in milliseconds
    checkCudaErrors(cudaEventElapsedTime(&gpu_time, start, stop));
    printf("GPU execution time: %.2f (ms)\n", gpu_time);
    printf("CPU executed %lu iterations while waiting for GPU to finish\n", counter);

Measuring CPU-side execution time

    StopWatchInterface *timer = 0;
    sdkCreateTimer(&timer);
    sdkResetTimer(&timer);

    sdkStartTimer(&timer);
    // compute the reference solution on the CPU
    float *reference = (float *)malloc(mem_size);
    computeGold(reference, h_idata, num_threads);
    sdkStopTimer(&timer);
    printf("Serial time: %f (ms)\n", sdkGetTimerValue(&timer));
