How to Use Events for Program Timing in CUDA

Source: Internet
Author: User
Tags current time

A GPGPU is a many-core device: it contains a large number of compute units, which lets it achieve very high parallel throughput.

When you program an NVIDIA graphics card with CUDA, you can use the events provided by CUDA as a timer.

Of course, nearly every programming language provides a function for reading the system time, such as the timer functions available in C, C++, and Java programs.
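
For comparison, a host-side timer of the kind mentioned above might look like the following minimal sketch. The helper name host_elapsed_ms is hypothetical and does not come from the original article; the code is plain C++ using std::chrono, so it also compiles as host code in a .cu file.

```cpp
#include <chrono>

// Hypothetical helper (not from the original article): runs `work` once on
// the host and returns the wall-clock time it took, in milliseconds.
template <typename F>
double host_elapsed_ms(F work)
{
    auto begin = std::chrono::steady_clock::now();  // monotonic host clock
    work();                                         // code being timed
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - begin).count();
}
```

A kernel launch returns to the host before the kernel has finished. A host timer wrapped around a launch therefore measures only the launch overhead, unless the timed code also waits for the GPU, for example with cudaDeviceSynchronize().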

An event, by contrast, can be used to measure precisely the elapsed time of a task or code segment running on the GPU.

The following example program (CALTIME.CU) shows this:

#include <stdio.h>
#include <cuda_runtime.h>

// A function declared __global__ is launched from the CPU and executed on the GPU.
__global__ void mul(int *dev_a, const int NUM)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    int dis = blockDim.x * gridDim.x;   // total number of threads in the grid
    while (idx < NUM) {
        dev_a[idx] = dev_a[idx] % 23 * dev_a[idx] * 5 % 9;
        idx += dis;
    }
}

int main(void)
{
    const int thread_pre_block = 64;    // number of threads per block
    const int block_pre_grid = 8;       // number of blocks in the grid
    const int NUM = 45056;

    // Allocate host memory and initialize it
    int host_a[NUM];
    for (int i = 0; i < NUM; i++)
        host_a[i] = i;

    // Define a cudaError_t, defaulting to cudaSuccess (0)
    cudaError_t err = cudaSuccess;

    // Allocate GPU memory
    int *dev_a;
    err = cudaMalloc((void **)&dev_a, sizeof(int) * NUM);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc on the GPU failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Copy the data to be processed to the GPU with cudaMemcpy
    cudaMemcpy(dev_a, host_a, sizeof(host_a), cudaMemcpyHostToDevice);

    dim3 threads = dim3(thread_pre_block);
    dim3 blocks = dim3(block_pre_grid);

    // Use events to measure the time
    float time_elapsed = 0;
    cudaEvent_t start, stop;
    cudaEventCreate(&start);            // create the events
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);          // record the current time
    mul<<<blocks, threads, 0, 0>>>(dev_a, NUM);
    cudaEventRecord(stop, 0);           // record the current time

    cudaEventSynchronize(start);        // wait for the event to complete
    cudaEventSynchronize(stop);         // wait for the event, and the work recorded before it, to complete
    cudaEventElapsedTime(&time_elapsed, start, stop);   // compute the time difference

    // Copy the results back to the CPU
    cudaMemcpy(host_a, dev_a, sizeof(host_a), cudaMemcpyDeviceToHost);

    cudaEventDestroy(start);            // destroy the events
    cudaEventDestroy(stop);
    cudaFree(dev_a);                    // free GPU memory

    printf("Execution time: %f (ms)\n", time_elapsed);
    return 0;
}
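
The create, record, synchronize, elapsed-time, and destroy steps in the program above could be bundled into a small reusable helper. The following is only a sketch under assumptions: the names CHECK_CUDA and GpuTimer are hypothetical and not part of the original article or of the CUDA runtime, while the cudaEvent* calls are the same ones the program uses.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical macro: stop with a readable message if a CUDA runtime call fails.
#define CHECK_CUDA(call)                                           \
    do {                                                           \
        cudaError_t err_ = (call);                                 \
        if (err_ != cudaSuccess) {                                 \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,     \
                    cudaGetErrorString(err_));                     \
            exit(EXIT_FAILURE);                                    \
        }                                                          \
    } while (0)

// Hypothetical wrapper around a start/stop event pair on one stream.
struct GpuTimer {
    cudaEvent_t start_, stop_;
    cudaStream_t stream_;

    explicit GpuTimer(cudaStream_t stream = 0) : stream_(stream) {
        CHECK_CUDA(cudaEventCreate(&start_));
        CHECK_CUDA(cudaEventCreate(&stop_));
    }
    ~GpuTimer() {
        cudaEventDestroy(start_);
        cudaEventDestroy(stop_);
    }
    // The events are owned by this object, so copying is disabled.
    GpuTimer(const GpuTimer &) = delete;
    GpuTimer &operator=(const GpuTimer &) = delete;

    // Record the start event in the stream.
    void start() { CHECK_CUDA(cudaEventRecord(start_, stream_)); }

    // Record the stop event, wait for it, and return the elapsed milliseconds.
    float stop_ms() {
        CHECK_CUDA(cudaEventRecord(stop_, stream_));
        CHECK_CUDA(cudaEventSynchronize(stop_));
        float ms = 0.0f;
        CHECK_CUDA(cudaEventElapsedTime(&ms, start_, stop_));
        return ms;
    }
};
```

With such a helper, timing the kernel from the program would reduce to declaring a GpuTimer, calling start() before the mul launch, and calling stop_ms() after it. Waiting on the stop event alone is sufficient, because the start event was recorded earlier in the same stream.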

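To compile and run the example program, build the source file with nvcc, NVIDIA's CUDA compiler, and run the resulting executable; it prints the kernel's execution time in milliseconds.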
