Today I was thinking about zero-copy memory in CUDA. It looked useful for a program I am designing, so I looked up the relevant material.
The following are some helpful links:
Zero-copy usage in CUDA: two-dimensional pointers
Zero-copy usage in CUDA: one-dimensional pointers
Zero-copy usage in CUDA: two-dimensional struct pointers
Discussion of CUDA zero-copy memory
After looking into it, I found that zero-copy is suited to compute-intensive work that needs only a few memory copies, such as vector dot products and sum reductions.
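As an illustration of the dot-product case, here is a minimal sketch of how it might look with mapped host memory. The helper name zero_copy_dot, the fill values and the launch configuration are my own assumptions for the example; only the runtime calls (cudaHostAlloc with cudaHostAllocMapped, cudaHostGetDevicePointer) are the standard zero-copy API.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define THREADS 256
#define BLOCKS 32

// Each block accumulates a[i] * b[i] over a grid-stride loop, reduces the
// per-thread sums in shared memory, and writes one partial sum.
__global__ void dot_partial(const float* a, const float* b, float* partial, int n) {
    __shared__ float cache[THREADS];
    float sum = 0.0f;
    for (int i = threadIdx.x + blockIdx.x * blockDim.x; i < n; i += blockDim.x * gridDim.x)
        sum += a[i] * b[i];
    cache[threadIdx.x] = sum;
    __syncthreads();
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (threadIdx.x < s) cache[threadIdx.x] += cache[threadIdx.x + s];
        __syncthreads();
    }
    if (threadIdx.x == 0) partial[blockIdx.x] = cache[0];
}

// Hypothetical helper (not from the linked posts): dot product of two
// n-element vectors filled with 1.0 and 2.0, using only mapped host memory.
// Returns -1.0f if any CUDA call fails.
float zero_copy_dot(int n) {
    // Needed before the context exists on older setups. If the context is
    // already active this returns an error, which is cleared and ignored.
    cudaSetDeviceFlags(cudaDeviceMapHost);
    cudaGetLastError();

    float *a = nullptr, *b = nullptr, *partial = nullptr;
    float *d_a = nullptr, *d_b = nullptr, *d_partial = nullptr;
    float result = -1.0f;
    size_t bytes = (size_t)n * sizeof(float);

    if (cudaHostAlloc((void**)&a, bytes, cudaHostAllocMapped) == cudaSuccess &&
        cudaHostAlloc((void**)&b, bytes, cudaHostAllocMapped) == cudaSuccess &&
        cudaHostAlloc((void**)&partial, BLOCKS * sizeof(float), cudaHostAllocMapped) == cudaSuccess &&
        cudaHostGetDevicePointer((void**)&d_a, a, 0) == cudaSuccess &&
        cudaHostGetDevicePointer((void**)&d_b, b, 0) == cudaSuccess &&
        cudaHostGetDevicePointer((void**)&d_partial, partial, 0) == cudaSuccess) {
        for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }
        // No cudaMemcpy: the kernel reads a and b, and writes partial,
        // directly in host memory through the device pointers.
        dot_partial<<<BLOCKS, THREADS>>>(d_a, d_b, d_partial, n);
        if (cudaGetLastError() == cudaSuccess && cudaDeviceSynchronize() == cudaSuccess) {
            result = 0.0f;
            for (int i = 0; i < BLOCKS; ++i) result += partial[i];
        }
    }
    if (a) cudaFreeHost(a);
    if (b) cudaFreeHost(b);
    if (partial) cudaFreeHost(partial);
    return result;
}
```

Calling zero_copy_dot(1 << 20) should give 2 * n, since every product is 1.0 * 2.0. Each input element is read exactly once, which is the access pattern where skipping the explicit copy seems most likely to pay off.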
Since zero-copy allocates the memory on the host (CPU) side and lets the GPU access it directly, I had a question: "If the buffer allocated on the host is larger than the memory available on the GPU, will GPU memory overflow?"
Specifically:
Suppose the GPU has 1 GB of memory, 999 MB is in use and only 1 MB is free, but the buffer allocated on the host is 10 MB and the GPU has to operate on it. Will GPU memory overflow in this case?
After some investigation, the conclusion is that it will not overflow.
On the CSDN forum, someone asked: "During the mapping process, is the GPU's memory large enough? Doesn't that need to be considered?"
Someone replied: "You can allocate more memory than the GPU has, as long as host memory is large enough."
And also: "You can write a program to try it yourself. Use the API mentioned above to allocate a buffer larger than GPU memory, then get the device pointer and operate on it. My GPU has 6 GB of memory and the host has 32 GB. In my experiment I requested 16 GB; the allocation succeeded and the kernel results were correct."
That answers the question. The conclusion is that zero-copy allocates the whole buffer in host memory, and the GPU reads from that host memory as it computes rather than copying the entire block into GPU memory.
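The experiment described in the forum reply could be reproduced along the following lines. This is only a sketch under my own assumptions: touch_mapped_buffer and add_one are hypothetical names, and the buffer size is left to the caller so it can be set above whatever cudaMemGetInfo reports as free, provided the host has enough RAM to pin.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Adds 1 to every element. Grid-stride loop with size_t indices so the
// buffer may hold more than 2^31 elements.
__global__ void add_one(unsigned int* data, size_t n) {
    size_t stride = (size_t)blockDim.x * gridDim.x;
    for (size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        data[i] += 1u;
}

// Hypothetical helper: allocates `bytes` of mapped host memory, lets the GPU
// update it in place, and verifies the result on the host. Returns false if
// the allocation fails, a CUDA call fails, or any element is wrong.
bool touch_mapped_buffer(size_t bytes) {
    // May return an error if the context is already active; cleared and ignored.
    cudaSetDeviceFlags(cudaDeviceMapHost);
    cudaGetLastError();

    size_t free_before = 0, free_after = 0, total = 0;
    cudaMemGetInfo(&free_before, &total);

    unsigned int *h = nullptr, *d = nullptr;
    if (cudaHostAlloc((void**)&h, bytes, cudaHostAllocMapped) != cudaSuccess) {
        printf("mapped host allocation of %zu bytes failed\n", bytes);
        cudaGetLastError();  // clear the recorded error
        return false;
    }
    size_t n = bytes / sizeof(unsigned int);
    for (size_t i = 0; i < n; ++i) h[i] = (unsigned int)i;

    bool ok = cudaHostGetDevicePointer((void**)&d, h, 0) == cudaSuccess;
    if (ok) {
        add_one<<<64, 256>>>(d, n);
        ok = cudaGetLastError() == cudaSuccess && cudaDeviceSynchronize() == cudaSuccess;
    }

    // Printed for comparison only; nothing is asserted about these numbers.
    cudaMemGetInfo(&free_after, &total);
    printf("device free before: %zu MB, after: %zu MB, mapped buffer: %zu MB\n",
           free_before >> 20, free_after >> 20, bytes >> 20);

    for (size_t i = 0; ok && i < n; ++i) ok = (h[i] == (unsigned int)i + 1u);
    cudaFreeHost(h);
    return ok;
}
```

A small call such as touch_mapped_buffer((size_t)64 << 20) checks that the path works at all. Repeating it with a size above the reported free device memory is the actual test of the claim. Whether the larger allocation succeeds depends on how much host memory the system allows to be pinned.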
P.S. Some people report the following problem: "Zero-copy does not seem to support complex operations, and make_float4() is not supported; I get an error when I use it." I will check this in later use; I don't know whether the same problem occurs in later CUDA versions.
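One way to check the make_float4() report is a small kernel that builds float4 values from mapped input and stores them into a mapped output buffer. This is a sketch I have not tied to any particular CUDA version; the helper name make_float4_on_mapped_memory is hypothetical, and whether it passes on a given toolkit is exactly what running it would show.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Builds one float4 per input element and writes it to the output buffer.
__global__ void build_float4(const float* in, float4* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = make_float4(in[i], 2.0f * in[i], 3.0f * in[i], 4.0f * in[i]);
}

// Hypothetical helper: runs build_float4 with both buffers in mapped host
// memory and returns true only if every component matches on the host.
bool make_float4_on_mapped_memory(int n) {
    // May return an error if the context is already active; cleared and ignored.
    cudaSetDeviceFlags(cudaDeviceMapHost);
    cudaGetLastError();

    float *in = nullptr, *d_in = nullptr;
    float4 *out = nullptr, *d_out = nullptr;
    bool ok =
        cudaHostAlloc((void**)&in, n * sizeof(float), cudaHostAllocMapped) == cudaSuccess &&
        cudaHostAlloc((void**)&out, n * sizeof(float4), cudaHostAllocMapped) == cudaSuccess &&
        cudaHostGetDevicePointer((void**)&d_in, in, 0) == cudaSuccess &&
        cudaHostGetDevicePointer((void**)&d_out, out, 0) == cudaSuccess;

    if (ok) {
        for (int i = 0; i < n; ++i) in[i] = (float)i;
        int threads = 256;
        int blocks = (n + threads - 1) / threads;
        build_float4<<<blocks, threads>>>(d_in, d_out, n);
        ok = cudaGetLastError() == cudaSuccess && cudaDeviceSynchronize() == cudaSuccess;
    }
    // Compare each component against the value computed on the host.
    for (int i = 0; ok && i < n; ++i) {
        float v = (float)i;
        ok = out[i].x == v && out[i].y == 2.0f * v &&
             out[i].z == 3.0f * v && out[i].w == 4.0f * v;
    }
    if (in) cudaFreeHost(in);
    if (out) cudaFreeHost(out);
    return ok;
}
```

If make_float4_on_mapped_memory(1024) returns true, the reported error does not reproduce on that setup. If it returns false, checking the individual CUDA return codes would be the next step.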