A question about CUDA zero-copy memory

Last Update:2014-09-03 Source: Internet

Author: User

Today I was thinking about CUDA zero-copy memory. It looked useful for a program I am designing, so I looked up the relevant material.

Some helpful links:

Zero-copy usage in CUDA: two-dimensional pointers

Zero-copy usage in CUDA: one-dimensional pointers

Zero-copy usage in CUDA: two-dimensional struct pointers

Discussion of CUDA zero-copy memory

From what I found, zero-copy is suited to compute-intensive work that needs few memory copies, such as a vector dot product or a sum reduction.
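To make the dot-product case concrete, here is a minimal sketch of a zero-copy dot product. It is not taken from the linked articles; the names `dot_partial`, `zero_copy_dot`, `THREADS` and `BLOCKS`, the grid size and the test data are all assumptions chosen for illustration. The buffers are allocated with `cudaHostAlloc` and the `cudaHostAllocMapped` flag, and the kernel reads them through the pointers returned by `cudaHostGetDevicePointer`, so no `cudaMemcpy` appears anywhere.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                       \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            fprintf(stderr, "%s: %s\n", #call, cudaGetErrorString(err_)); \
            exit(EXIT_FAILURE);                                           \
        }                                                                 \
    } while (0)

#define THREADS 256  // threads per block (assumed; must be a power of two)
#define BLOCKS  64   // blocks in the grid (assumed)

// Each block accumulates a[i] * b[i] over a grid-stride loop, reduces the
// per-thread sums in shared memory, and writes one partial sum.
__global__ void dot_partial(const float *a, const float *b, float *partial, int n)
{
    __shared__ float cache[THREADS];
    float sum = 0.0f;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += blockDim.x * gridDim.x)
        sum += a[i] * b[i];
    cache[threadIdx.x] = sum;
    __syncthreads();
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (threadIdx.x < s)
            cache[threadIdx.x] += cache[threadIdx.x + s];
        __syncthreads();
    }
    if (threadIdx.x == 0)
        partial[blockIdx.x] = cache[0];
}

// Hypothetical helper: dot product of two n-element vectors held in
// mapped (zero-copy) host memory. Test data: a[i] = 1, b[i] = 2.
float zero_copy_dot(int n)
{
    float *a, *b, *partial;        // host pointers to pinned, mapped memory
    float *d_a, *d_b, *d_partial;  // device-side aliases of the same memory

    CHECK(cudaHostAlloc((void **)&a, n * sizeof(float), cudaHostAllocMapped));
    CHECK(cudaHostAlloc((void **)&b, n * sizeof(float), cudaHostAllocMapped));
    CHECK(cudaHostAlloc((void **)&partial, BLOCKS * sizeof(float), cudaHostAllocMapped));
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    CHECK(cudaHostGetDevicePointer((void **)&d_a, a, 0));
    CHECK(cudaHostGetDevicePointer((void **)&d_b, b, 0));
    CHECK(cudaHostGetDevicePointer((void **)&d_partial, partial, 0));

    dot_partial<<<BLOCKS, THREADS>>>(d_a, d_b, d_partial, n);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());  // the host must wait before reading partial[]

    float result = 0.0f;
    for (int i = 0; i < BLOCKS; ++i)
        result += partial[i];

    CHECK(cudaFreeHost(a));
    CHECK(cudaFreeHost(b));
    CHECK(cudaFreeHost(partial));
    return result;
}

int main()
{
    // Request host-memory mapping before any other CUDA work.
    CHECK(cudaSetDeviceFlags(cudaDeviceMapHost));
    cudaDeviceProp prop;
    CHECK(cudaGetDeviceProperties(&prop, 0));
    if (!prop.canMapHostMemory) {
        fprintf(stderr, "Device 0 cannot map host memory\n");
        return 1;
    }
    printf("dot = %f\n", zero_copy_dot(1 << 20));
    return 0;
}
```

Each input element is read exactly once by the GPU, which is the access pattern where skipping the explicit copy is most likely to pay off. Data that a kernel reads repeatedly would cross the bus every time, so a regular device allocation may be the better choice there.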

Zero-copy allocates memory on the host (CPU side) that the GPU can access directly, so I had a question: "If the buffer allocated on the host is larger than the memory available on the GPU, will GPU memory overflow?"

Specifically:

Suppose the GPU has 1 GB of memory, 999 MB is in use, and only 1 MB is free. If I allocate a 10 MB zero-copy buffer on the host and ask the GPU to operate on it, will the GPU's memory overflow?

After some investigation, the conclusion is that it will not.

On the CSDN forum, someone asked: "When mapping the memory, don't you need to consider whether the GPU's memory is large enough?"

Someone replied: "You can allocate more memory than the GPU has, as long as the host has enough memory."

Also: "You can write a program to try it yourself. Use the API mentioned above to allocate a buffer larger than the GPU memory, then get the device pointer and operate on it. My GPU has 6 GB of memory and the host has 32 GB. In my experiment I requested 16 GB; the allocation succeeded and the kernel produced correct results."

That answers the question. The conclusion is that zero-copy allocates the whole buffer in host memory, and the GPU reads and operates on it directly from there as needed, rather than copying the whole block into GPU memory.
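The experiment described in the forum reply can be sketched roughly as follows. This is an illustration under assumptions, not the forum poster's program: `add_one`, `run_mapped`, the 64 MB default and the command-line argument are all made up here. The program prints the free device memory reported by `cudaMemGetInfo` before and after the mapped allocation, runs a kernel over the buffer, and counts wrong elements. To repeat the forum test, pass a size in MB that exceeds the free device memory and fits in host RAM.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                       \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            fprintf(stderr, "%s: %s\n", #call, cudaGetErrorString(err_)); \
            exit(EXIT_FAILURE);                                           \
        }                                                                 \
    } while (0)

// Increment every element; size_t indexing so buffers over 2^31 elements work.
__global__ void add_one(int *data, size_t n)
{
    size_t stride = (size_t)blockDim.x * gridDim.x;
    for (size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        data[i] += 1;
}

// Hypothetical helper: allocate `bytes` of mapped host memory, let the GPU
// update it in place, and return the number of elements with a wrong value.
size_t run_mapped(size_t bytes)
{
    size_t n = bytes / sizeof(int);
    size_t free_before, free_after, total;
    int *h_data, *d_data;

    CHECK(cudaMemGetInfo(&free_before, &total));
    CHECK(cudaHostAlloc((void **)&h_data, bytes, cudaHostAllocMapped));
    for (size_t i = 0; i < n; ++i)
        h_data[i] = (int)(i % 1000);
    CHECK(cudaHostGetDevicePointer((void **)&d_data, h_data, 0));
    CHECK(cudaMemGetInfo(&free_after, &total));

    printf("mapped buffer: %zu MB, device total: %zu MB\n", bytes >> 20, total >> 20);
    printf("device free before: %zu MB, after: %zu MB\n",
           free_before >> 20, free_after >> 20);

    add_one<<<128, 256>>>(d_data, n);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());  // results are in host memory once this returns

    size_t bad = 0;
    for (size_t i = 0; i < n; ++i)
        if (h_data[i] != (int)(i % 1000) + 1)
            ++bad;

    CHECK(cudaFreeHost(h_data));
    return bad;
}

int main(int argc, char **argv)
{
    // Request host-memory mapping before any other CUDA work.
    CHECK(cudaSetDeviceFlags(cudaDeviceMapHost));
    // Buffer size in MB; 64 is an arbitrary default for a quick run.
    size_t mb = (argc > 1) ? strtoull(argv[1], NULL, 10) : 64;
    size_t bad = run_mapped(mb << 20);
    printf("mismatched elements: %zu\n", bad);
    return bad == 0 ? 0 : 1;
}
```

If the conclusion above holds, the free device memory should stay roughly the same across the allocation, because the buffer lives in pinned host RAM. The limiting resource is then how much host memory the system is willing to pin, not the size of the GPU's memory.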

P.S. Some people report this problem: "Zero-copy does not seem to support complex operations, and make_float4() is not supported. I get an error when I use it." I will check this when I actually use it; I don't know whether the same problem occurs in later CUDA versions.
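A small probe along the following lines could be used to check that report on a given toolkit and GPU. It is only a sketch: `fill_float4` and `float4_probe` are hypothetical names, and it makes no claim about which CUDA versions, if any, showed the error. The kernel builds values with `make_float4()` and stores them straight into mapped host memory, and the host then compares every component.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                       \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            fprintf(stderr, "%s: %s\n", #call, cudaGetErrorString(err_)); \
            exit(EXIT_FAILURE);                                           \
        }                                                                 \
    } while (0)

// Write one float4 per thread directly into the mapped host buffer.
__global__ void fill_float4(float4 *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float f = (float)i;
        out[i] = make_float4(f, 2.0f * f, 3.0f * f, 4.0f * f);
    }
}

// Hypothetical helper: returns the number of float4 elements that do not
// hold the values the kernel was supposed to write.
int float4_probe(int n)
{
    float4 *h_out, *d_out;
    CHECK(cudaHostAlloc((void **)&h_out, n * sizeof(float4), cudaHostAllocMapped));
    CHECK(cudaHostGetDevicePointer((void **)&d_out, h_out, 0));

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    fill_float4<<<blocks, threads>>>(d_out, n);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());

    int bad = 0;
    for (int i = 0; i < n; ++i) {
        float f = (float)i;
        if (h_out[i].x != f || h_out[i].y != 2.0f * f ||
            h_out[i].z != 3.0f * f || h_out[i].w != 4.0f * f)
            ++bad;
    }
    CHECK(cudaFreeHost(h_out));
    return bad;
}

int main()
{
    // Request host-memory mapping before any other CUDA work.
    CHECK(cudaSetDeviceFlags(cudaDeviceMapHost));
    int bad = float4_probe(1024);
    printf("mismatched float4 elements: %d\n", bad);
    return bad == 0 ? 0 : 1;
}
```

A compile error, a runtime error from `CHECK`, or a nonzero mismatch count would each point to a different kind of problem, so it is worth noting which one appears.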

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
