cudaDeviceSynchronize vs cudaThreadSynchronize vs cudaStreamSynchronize

Source: Internet
Author: User

Let's start by explaining these three functions:

cudaDeviceSynchronize() blocks the calling host thread until all work previously issued to the device has completed, that is, until every thread of every launched kernel has finished executing.

cudaThreadSynchronize() does essentially the same thing as cudaDeviceSynchronize(). It has been deprecated in newer versions of CUDA and is not recommended; if your program really needs to synchronize, use cudaDeviceSynchronize() instead.

cudaStreamSynchronize() is similar to the two functions above, but it takes a parameter: the CUDA stream (a cudaStream_t handle). It blocks the host only until the work issued to that stream has completed; work issued to other streams keeps running asynchronously.
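To make the difference concrete, here is a minimal sketch of waiting on a single stream. The kernel `fill`, the function `stream_sync_demo`, and the names `streamA` and `streamB` are made up for this illustration; only the `cuda*` runtime calls come from CUDA itself, and most error checks are omitted for brevity.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: writes `value` into every element of `data`.
__global__ void fill(int *data, int n, int value) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = value;
}

// Returns true if the data produced in streamA is correct after
// waiting on streamA only.
bool stream_sync_demo() {
    const int n = 1 << 16;
    const size_t bytes = n * sizeof(int);
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    cudaStream_t streamA, streamB;
    cudaStreamCreate(&streamA);
    cudaStreamCreate(&streamB);

    int *dA = nullptr, *dB = nullptr, *hA = nullptr;
    cudaMalloc((void **)&dA, bytes);
    cudaMalloc((void **)&dB, bytes);
    cudaMallocHost((void **)&hA, bytes);  // pinned host buffer for the async copy

    // Independent work queued in two different streams.
    fill<<<blocks, threads, 0, streamA>>>(dA, n, 1);
    fill<<<blocks, threads, 0, streamB>>>(dB, n, 2);
    cudaMemcpyAsync(hA, dA, bytes, cudaMemcpyDeviceToHost, streamA);

    // Blocks the host until everything queued in streamA has finished.
    // Work in streamB may still be running at this point.
    cudaError_t err = cudaStreamSynchronize(streamA);

    bool ok = (err == cudaSuccess);
    for (int i = 0; ok && i < n; ++i) ok = (hA[i] == 1);

    cudaDeviceSynchronize();  // make sure streamB is done before cleanup
    cudaFree(dA);
    cudaFree(dB);
    cudaFreeHost(hA);
    cudaStreamDestroy(streamA);
    cudaStreamDestroy(streamB);
    return ok;
}
```

To try it, call `stream_sync_demo()` from `main` and build with `nvcc`. Only `hA` is read after the stream-level wait, because nothing has been guaranteed about `streamB` at that point.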

As explained in CUDA's official documentation, kernel launches are asynchronous: control returns to the CPU immediately after the launch, and the CPU continues with the statements that follow. Given this, suppose we write a CUDA program such as:

kernel1<<<X,Y>>>(...);
kernel2<<<X,Y>>>(...);
cudaMemcpy(...);

Should we add a synchronization call after each kernel launch? Or, more generally, when do we need one?

In fact, although kernels are launched asynchronously with respect to the host, all CUDA operations within the same stream execute sequentially, in the order they were issued, as described in the official documentation. If you do not specify a stream, the operation goes into the default stream, so the code above is correct even without a synchronization call: kernel2 starts only after kernel1 finishes, and cudaMemcpy() starts only after kernel2 finishes. Of course, if you are still uneasy, adding a synchronization call does no harm; it just costs a little extra time.
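The three-statement pattern above can be fleshed out into a small sketch. The kernels `set_to_index` and `add_one` and the function `default_stream_demo` are hypothetical stand-ins for kernel1, kernel2, and the surrounding host code; everything runs in the default stream and no explicit synchronization call is used.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical stand-in for kernel1: data[i] = i.
__global__ void set_to_index(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = i;
}

// Hypothetical stand-in for kernel2: data[i] += 1.
__global__ void add_one(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// Returns true if the host sees the result of both kernels.
bool default_stream_demo() {
    const int n = 1 << 16;
    const size_t bytes = n * sizeof(int);
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    std::vector<int> host(n, 0);
    int *dev = nullptr;
    if (cudaMalloc((void **)&dev, bytes) != cudaSuccess) return false;

    // Both launches return to the host immediately, but because they
    // share the default stream the device runs them in issue order.
    set_to_index<<<blocks, threads>>>(dev, n);
    add_one<<<blocks, threads>>>(dev, n);

    // cudaMemcpy is queued behind both kernels and blocks the host
    // until the copy has completed.
    cudaError_t err = cudaMemcpy(host.data(), dev, bytes,
                                 cudaMemcpyDeviceToHost);
    cudaFree(dev);
    if (err != cudaSuccess) return false;

    for (int i = 0; i < n; ++i) {
        if (host[i] != i + 1) return false;
    }
    return true;
}
```

Calling `default_stream_demo()` from `main` should return true: every element equals its index plus one, which can only happen if `add_one` ran after `set_to_index` and the copy ran after both.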

However, when a program uses more than one stream and the streams need to exchange results at some point, a synchronization call is required at that point, for example cudaDeviceSynchronize().
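A minimal sketch of that situation, under assumptions: two streams each produce one input array, and a third kernel needs both. The kernels `fill` and `add`, the function `multi_stream_demo`, and the stream names `s1` and `s2` are invented for this example.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical producer kernel: data[i] = value.
__global__ void fill(float *data, int n, float value) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = value;
}

// Hypothetical consumer kernel: out[i] = a[i] + b[i].
__global__ void add(const float *a, const float *b, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = a[i] + b[i];
}

// Returns true if the consumer saw the finished output of both producers.
bool multi_stream_demo() {
    const int n = 1 << 16;
    const size_t bytes = n * sizeof(float);
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    float *a = nullptr, *b = nullptr, *out = nullptr;
    cudaMalloc((void **)&a, bytes);
    cudaMalloc((void **)&b, bytes);
    cudaMalloc((void **)&out, bytes);

    // The two producers run in different streams and may overlap.
    fill<<<blocks, threads, 0, s1>>>(a, n, 1.0f);
    fill<<<blocks, threads, 0, s2>>>(b, n, 2.0f);

    // The consumer below runs in s1 but reads b, which was written in s2.
    // Stream order alone does not cover that dependency, so wait for
    // all outstanding work on the device first.
    cudaDeviceSynchronize();

    add<<<blocks, threads, 0, s1>>>(a, b, out, n);
    cudaDeviceSynchronize();  // wait for the consumer before copying back

    std::vector<float> host(n, 0.0f);
    cudaError_t err = cudaMemcpy(host.data(), out, bytes,
                                 cudaMemcpyDeviceToHost);

    cudaFree(a);
    cudaFree(b);
    cudaFree(out);
    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    if (err != cudaSuccess) return false;

    for (int i = 0; i < n; ++i) {
        if (host[i] != 3.0f) return false;
    }
    return true;
}
```

Without the first cudaDeviceSynchronize(), `add` could start in `s1` while `fill` is still writing `b` in `s2`, which is exactly the cross-stream case described above.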

The documentation also states that cudaMemcpy() is synchronous with respect to the host, whereas cudaMemcpyAsync() is asynchronous. So a cudaMemcpyAsync() followed by a cudaDeviceSynchronize() has the same effect as a single cudaMemcpy().
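As a rough illustration, the sketch below copies the same device buffer to the host both ways and compares the results. The kernel `fill`, the function `memcpy_equivalence_demo`, and the buffer names are hypothetical. Pinned host memory is used here on the assumption that the asynchronous copy should be able to return before the transfer finishes.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: data[i] = value.
__global__ void fill(int *data, int n, int value) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = value;
}

// Returns true if both copy styles deliver identical, correct data.
bool memcpy_equivalence_demo() {
    const int n = 1 << 16;
    const size_t bytes = n * sizeof(int);
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    int *dev = nullptr, *hSync = nullptr, *hAsync = nullptr;
    cudaMalloc((void **)&dev, bytes);
    cudaMallocHost((void **)&hSync, bytes);   // pinned host buffers
    cudaMallocHost((void **)&hAsync, bytes);

    fill<<<blocks, threads>>>(dev, n, 7);

    // Variant 1: blocking copy. The data is in hSync when this returns.
    cudaError_t e1 = cudaMemcpy(hSync, dev, bytes, cudaMemcpyDeviceToHost);

    // Variant 2: asynchronous copy in the default stream (stream 0).
    // The call may return before the transfer is done, so hAsync must
    // not be read until after the synchronization call.
    cudaError_t e2 = cudaMemcpyAsync(hAsync, dev, bytes,
                                     cudaMemcpyDeviceToHost, 0);
    cudaError_t e3 = cudaDeviceSynchronize();

    bool ok = (e1 == cudaSuccess && e2 == cudaSuccess && e3 == cudaSuccess);
    for (int i = 0; ok && i < n; ++i) {
        ok = (hSync[i] == 7 && hAsync[i] == 7);
    }

    cudaFree(dev);
    cudaFreeHost(hSync);
    cudaFreeHost(hAsync);
    return ok;
}
```

The practical difference is what the host may do in between: with the asynchronous variant, host code placed between the copy call and the synchronization call can run while the transfer is in flight.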
