The title is a bit roundabout. What I actually want to talk about is a global variable declared in CUDA as __device__ unsigned char data[64].
While parallelizing a class of algorithms, I found that the data does not need to be copied to the GPU on every loop iteration; it only needs to be copied once, at initialization. So I defined a global __device__ variable and had all the computation write its results into data[]. That is where the problem appeared. Once the computation finished, the values in data could not be copied back from the GPU: cudaMemcpy returned 11, which I took to mean an out-of-bounds pitch. My guess was that cudaMemcpy first takes the address of the array with &, and that when it then offsets by 64 the step might not be 64 bytes. That would be the difference from unsigned char* data, where each offset step is exactly one byte. A more likely cause is that the host-side address of a __device__ variable is not a valid device pointer, so passing data or &data to cudaMemcpy fails; such variables are meant to be copied with cudaMemcpyFromSymbol and cudaMemcpyToSymbol, or through the pointer returned by cudaGetSymbolAddress.
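To make the situation above concrete, here is a minimal sketch of reading a __device__ array back to the host. It is an illustration under assumptions, not the original code: the kernel fill and the buffer host are hypothetical names, and the kernel simply writes each thread's index into data[]. It shows two ways that should work for a global __device__ symbol: copying by symbol with cudaMemcpyFromSymbol, or asking the runtime for the real device address with cudaGetSymbolAddress and then using an ordinary cudaMemcpy. It also prints cudaGetErrorString instead of the raw number, since the numeric values of cudaError_t differ between toolkit versions.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Global device buffer, declared as in the post.
__device__ unsigned char data[64];

// Hypothetical kernel for illustration: thread i writes i into data[i].
__global__ void fill(void)
{
    int i = threadIdx.x;
    if (i < 64) {
        data[i] = (unsigned char)i;
    }
}

// Print a readable message and report whether the call failed.
static int failed(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s: %s\n", what, cudaGetErrorString(err));
        return 1;
    }
    return 0;
}

int main(void)
{
    unsigned char host[64] = {0};

    fill<<<1, 64>>>();
    if (failed(cudaGetLastError(), "kernel launch")) return 1;
    if (failed(cudaDeviceSynchronize(), "kernel execution")) return 1;

    // Option 1: copy by symbol. The runtime resolves the device address.
    if (failed(cudaMemcpyFromSymbol(host, data, sizeof(data)),
               "cudaMemcpyFromSymbol")) return 1;
    printf("by symbol:  host[0]=%d host[63]=%d\n", host[0], host[63]);

    // Option 2: look up the device address, then use a plain cudaMemcpy.
    unsigned char *d_ptr = NULL;
    if (failed(cudaGetSymbolAddress((void **)&d_ptr, data),
               "cudaGetSymbolAddress")) return 1;
    if (failed(cudaMemcpy(host, d_ptr, sizeof(data), cudaMemcpyDeviceToHost),
               "cudaMemcpy")) return 1;
    printf("by address: host[0]=%d host[63]=%d\n", host[0], host[63]);

    return 0;
}
```

Assuming a file name such as symbol_copy.cu, this can be built with nvcc symbol_copy.cu -o symbol_copy. The same idea applies in the other direction: cudaMemcpyToSymbol uploads the initial data once, and the kernels then keep working on data[] without further host-to-device copies.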
The above is only my own conjecture. I am writing it down so that I am not stuck the next time I run into a similar problem. If you have hit the same issue, I would be glad to compare notes.