I started with CUDA a few months ago, but at that time I only learned how to use it. I have now reread Programming Massively Parallel Processors, which covers the first generation of the CUDA architecture. GPUs have since gone through Fermi and are now on the Kepler architecture, while I am still using a G80 card. It seems I have to keep up with the times.
Today, while using CUDA, I ran into problems with my earlier approach. After solving them, I summarized the following:
1. CUDA can be used in console applications or GUI applications. When using a GUI, put the CUDA computation in a function in a .cu file (only .cu files are compiled by nvcc), then call that function from the GUI code.
2. A newly created project does not yet use the nvcc compiler. To fix this, right-click the project, open Custom Build Rules, and add the CUDA rule; for example, checking only the CUDA Runtime API rule is enough.
3. I use CUDA 3.2 + VS2008. It seems debugging is not supported inside .cu files, so I cannot inspect memory contents during GPU computation.
4. Pay close attention to memory usage. The global memory, shared memory, and registers a kernel needs must be calculated carefully to achieve optimal performance.
5. Performance also depends on whether all threads in a warp execute the same instructions (no branch divergence), and whether the memory reads of a warp are coalesced into as few transactions as possible.
6. Data prefetching is very effective at hiding the long latency of global memory loads.
7. Every instruction in a kernel consumes instruction-processing bandwidth, so reduce the instruction count as much as possible, for example by unrolling accumulation loops (see the example in the book).
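Item 1 can be sketched as follows: a .cu file holding the kernel plus an ordinary host function with C linkage, which the console or GUI code declares and calls. The file name scale.cu and the names scaleKernel and scaleOnGpu are hypothetical. The kernel also touches on item 5: thread i accesses element i, so each warp reads and writes a contiguous run of floats, and the only branch is the bounds check.

```cuda
// scale.cu: compiled by nvcc. File and function names are hypothetical.
#include <cuda_runtime.h>

// Thread i handles element i, so the threads of a warp access
// consecutive floats (a coalesced access pattern).
__global__ void scaleKernel(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                  // guard the last, partially filled block
        data[i] *= factor;
}

// Plain host function with C linkage, so code compiled by cl.exe
// (console or GUI) can call it after declaring the same prototype.
// Returns 0 (cudaSuccess) on success, otherwise the CUDA error code.
extern "C" int scaleOnGpu(float *hostData, float factor, int n)
{
    if (n <= 0) return 0;       // nothing to do; avoids a zero-block launch
    float *devData = NULL;
    size_t bytes = (size_t)n * sizeof(float);
    cudaError_t err = cudaMalloc((void **)&devData, bytes);
    if (err != cudaSuccess) return (int)err;

    err = cudaMemcpy(devData, hostData, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        int threads = 256;
        int blocks = (n + threads - 1) / threads;
        scaleKernel<<<blocks, threads>>>(devData, factor, n);
        err = cudaGetLastError();   // catches launch configuration errors
    }
    if (err == cudaSuccess)         // this copy also waits for the kernel
        err = cudaMemcpy(hostData, devData, bytes, cudaMemcpyDeviceToHost);
    cudaFree(devData);
    return (int)err;
}
```

In the GUI or console source, declare `extern "C" int scaleOnGpu(float *hostData, float factor, int n);` and call it like any other function; only the .cu file needs nvcc.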
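For item 4, the register and shared memory usage the compiler actually assigned can be checked rather than guessed. A minimal sketch, assuming device 0 and a hypothetical kernel tileCopyKernel launched with TILE*TILE threads per block; it reads the numbers with cudaFuncGetAttributes and compares them with the limits from cudaGetDeviceProperties.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define TILE 16   // assumed launch shape: TILE*TILE threads per block

// Hypothetical kernel used only to have something to measure: it stages
// input in shared memory and copies it back out.
__global__ void tileCopyKernel(const float *in, float *out, int n)
{
    __shared__ float tile[TILE * TILE];     // 1 KB of shared memory per block
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) tile[threadIdx.x] = in[i];
    __syncthreads();
    if (i < n) out[i] = tile[threadIdx.x];
}

// Prints per-kernel resource usage and the device limits.
// Returns 0 if one block's registers fit, -1 if not, else a CUDA error code.
extern "C" int reportKernelResources(void)
{
    cudaFuncAttributes attr;
    cudaError_t err = cudaFuncGetAttributes(&attr, tileCopyKernel);
    if (err != cudaSuccess) return (int)err;

    cudaDeviceProp prop;
    err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) return (int)err;

    // Casts keep the format strings valid on older compilers such as VS2008.
    printf("registers per thread : %d\n", attr.numRegs);
    printf("static shared memory : %lu bytes per block\n",
           (unsigned long)attr.sharedSizeBytes);
    printf("local (spilled) mem  : %lu bytes per thread\n",
           (unsigned long)attr.localSizeBytes);
    printf("device limits        : %d registers, %lu bytes shared per block\n",
           prop.regsPerBlock, (unsigned long)prop.sharedMemPerBlock);

    int blockRegs = attr.numRegs * TILE * TILE;   // registers one block needs
    return blockRegs <= prop.regsPerBlock ? 0 : -1;
}
```

Compiling with `nvcc --ptxas-options=-v` prints the same register and shared memory figures at build time, which is handy when no GPU is at hand.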
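Items 6 and 7 can be combined in a tiled matrix multiplication, in the spirit of the book's example but not copied from it. A sketch, assuming square row-major matrices whose width is a multiple of a hypothetical tile width TILE = 16: each thread loads the next tile into registers before computing on the current one (prefetching), and the inner product loop over the tile is fully unrolled with `#pragma unroll` so the loop counter, compare, and branch instructions disappear. Consecutive threadIdx.x values read consecutive addresses of A and B, so the loads are coalesced (item 5).

```cuda
#include <cuda_runtime.h>

#define TILE 16   // assumed tile width; width must be a multiple of TILE

// C = A * B for square, row-major width x width matrices.
__global__ void matMulPrefetch(const float *A, const float *B, float *C, int width)
{
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int tx = threadIdx.x, ty = threadIdx.y;
    int row = blockIdx.y * TILE + ty;
    int col = blockIdx.x * TILE + tx;

    // Prefetch: load the first pair of tile elements into registers.
    float aNext = A[row * width + tx];
    float bNext = B[ty * width + col];
    float sum = 0.0f;

    int numTiles = width / TILE;
    for (int t = 0; t < numTiles; ++t) {
        // Deposit the prefetched values into shared memory.
        As[ty][tx] = aNext;
        Bs[ty][tx] = bNext;
        __syncthreads();

        // Start the global loads for the next tile now, so their latency
        // overlaps with the arithmetic below (same t for the whole block,
        // so this branch does not diverge).
        if (t + 1 < numTiles) {
            aNext = A[row * width + (t + 1) * TILE + tx];
            bNext = B[((t + 1) * TILE + ty) * width + col];
        }

        // Fully unrolled inner product: only the multiply-adds remain.
#pragma unroll
        for (int k = 0; k < TILE; ++k)
            sum += As[ty][k] * Bs[k][tx];
        __syncthreads();   // all reads done before shared memory is overwritten
    }
    C[row * width + col] = sum;
}

// Host wrapper (hypothetical name). Returns 0 on success, -1 for an
// unsupported width, otherwise the CUDA error code.
extern "C" int matMulOnGpu(const float *hA, const float *hB, float *hC, int width)
{
    if (width <= 0 || width % TILE != 0) return -1;
    size_t bytes = (size_t)width * width * sizeof(float);
    float *dA = NULL, *dB = NULL, *dC = NULL;
    cudaError_t err = cudaMalloc((void **)&dA, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void **)&dB, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void **)&dC, bytes);
    if (err == cudaSuccess) err = cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) err = cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        dim3 block(TILE, TILE);
        dim3 grid(width / TILE, width / TILE);
        matMulPrefetch<<<grid, block>>>(dA, dB, dC, width);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess) err = cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);   // cudaFree(NULL) is harmless
    return (int)err;
}
```

Prefetching costs two extra registers per thread, which ties back to item 4: whether it pays off depends on whether the higher register count still lets enough blocks run per multiprocessor.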
Note: The custom build rule in my earlier post used the CUDA Runtime API rule. When I tried it today, it seemed not to work correctly and sometimes failed to produce correct results, while the plain CUDA rule worked. In other cases the reverse happened: the CUDA rule had problems and the CUDA Runtime API rule gave correct results. I did not know why. (This has been solved: it was a GPU device setup problem. After calling cudaSetDevice(0), the CUDA Runtime API rule gives accurate results, because the Runtime API then uses the real GPU device.)
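The fix in the note can be sketched as a small setup function called once at program start, before any allocation or kernel launch. This is a minimal sketch assuming device 0 is the intended GPU; the name selectGpu0 is hypothetical. It checks that a CUDA device exists, selects it explicitly, and prints its name so you can confirm the code is not running on some other device.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: explicitly select GPU 0 and report what it is.
// Call it before the first cudaMalloc or kernel launch. Returns 0 on success.
extern "C" int selectGpu0(void)
{
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return -1;
    }
    if (count == 0) {
        fprintf(stderr, "no CUDA device found\n");
        return -1;
    }

    err = cudaSetDevice(0);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaSetDevice(0) failed: %s\n", cudaGetErrorString(err));
        return -1;
    }

    cudaDeviceProp prop;
    err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) return -1;
    printf("using device 0: %s (compute capability %d.%d)\n",
           prop.name, prop.major, prop.minor);
    return 0;
}
```

Checking every return value this way also makes the "sometimes wrong results" symptom easier to trace, since a failed call reports itself instead of silently leaving stale data in host buffers.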