There are three projects: the first uses global memory, and the other two use 1D and 2D texture memory, respectively.
Texture memory is read-only. Like constant memory, it is cached on chip, so in some cases it reduces memory requests and provides higher effective memory bandwidth. Texture caches were designed for graphics applications whose memory access patterns exhibit a great deal of spatial locality. In a computing application, this means that a thread is likely to read from an address "very close" to the addresses read by neighboring threads. Such addresses can be adjacent in a 2D grid without being consecutive in linear memory, so a cache that only groups consecutive addresses would not keep them together. The texture cache is designed to accelerate exactly this kind of access.
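To make the locality argument concrete, here is a minimal sketch of a global-memory update step, in the spirit of the first project. The names blend_kernel_global and step_global, the speed parameter, and the border handling are assumptions for illustration, and error checking is omitted for brevity. Each thread reads its own cell plus four neighbors: left and right are one element away, while top and bottom are dim elements away in linear memory even though they are adjacent in the 2D grid.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical global-memory baseline: one thread per grid cell.
__global__ void blend_kernel_global(float *dst, const float *src,
                                    int dim, float speed) {
    int x = threadIdx.x + blockIdx.x * blockDim.x;
    int y = threadIdx.y + blockIdx.y * blockDim.y;
    if (x >= dim || y >= dim) return;
    int offset = x + y * dim;

    // Border cells reuse their own value for the missing neighbor (assumed).
    int left   = (x == 0)       ? offset : offset - 1;
    int right  = (x == dim - 1) ? offset : offset + 1;
    int top    = (y == 0)       ? offset : offset - dim;   // dim elements away
    int bottom = (y == dim - 1) ? offset : offset + dim;   // dim elements away

    float c = src[offset];
    dst[offset] = c + speed * (src[top] + src[bottom] +
                               src[left] + src[right] - 4.0f * c);
}

// Host helper: copies the grid to the device, runs one step, copies it back.
std::vector<float> step_global(const std::vector<float> &grid,
                               int dim, float speed) {
    size_t bytes = grid.size() * sizeof(float);
    float *src = nullptr, *dst = nullptr;
    cudaMalloc(&src, bytes);
    cudaMalloc(&dst, bytes);
    cudaMemcpy(src, grid.data(), bytes, cudaMemcpyHostToDevice);

    dim3 threads(16, 16);
    dim3 blocks((dim + 15) / 16, (dim + 15) / 16);
    blend_kernel_global<<<blocks, threads>>>(dst, src, dim, speed);

    std::vector<float> out(grid.size());
    cudaMemcpy(out.data(), dst, bytes, cudaMemcpyDeviceToHost);
    cudaFree(src);
    cudaFree(dst);
    return out;
}
```

Threads in the same block read overlapping neighborhoods, which is the reuse a cache tuned for 2D locality can exploit.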
The memory access pattern of the temperature computation has a great deal of spatial locality, so it can be accelerated with GPU texture memory. First, declare the input as a texture reference: texture<float> Tex; after the buffer has been allocated, this reference must be bound to it. Then, inside the kernel, reads must go through a special function that tells the GPU to route the request through the texture unit instead of standard global memory: blend_kernel() reads the buffer with tex1Dfetch() rather than with square brackets. An additional parameter to blend_kernel(), dstOut, tells the kernel which buffer is the input and which is the output.
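The steps above can be sketched as follows for the 1D case. This is a minimal sketch, not the original project's code. It swaps in the texture object API (cudaTextureObject_t, cudaCreateTextureObject) for the texture<float> reference declaration, because texture references were deprecated and then removed in CUDA 12. The names blend_kernel_tex, make_linear_texture, and run_steps, the speed parameter, and the border handling are assumptions for illustration. Here the dstOut flag lives on the host and selects which buffer is read through its texture and which is written. Error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel: reads go through the texture unit via tex1Dfetch()
// instead of indexing the input buffer with square brackets.
__global__ void blend_kernel_tex(float *dst, cudaTextureObject_t in,
                                 int dim, float speed) {
    int x = threadIdx.x + blockIdx.x * blockDim.x;
    int y = threadIdx.y + blockIdx.y * blockDim.y;
    if (x >= dim || y >= dim) return;
    int offset = x + y * dim;

    // Border cells reuse their own value for the missing neighbor (assumed).
    int left   = (x == 0)       ? offset : offset - 1;
    int right  = (x == dim - 1) ? offset : offset + 1;
    int top    = (y == 0)       ? offset : offset - dim;
    int bottom = (y == dim - 1) ? offset : offset + dim;

    float c = tex1Dfetch<float>(in, offset);
    float l = tex1Dfetch<float>(in, left);
    float r = tex1Dfetch<float>(in, right);
    float t = tex1Dfetch<float>(in, top);
    float b = tex1Dfetch<float>(in, bottom);
    dst[offset] = c + speed * (t + b + l + r - 4.0f * c);
}

// Wrap an already allocated device buffer in a 1D (linear) texture object.
// This plays the role of "binding" the texture to the buffer.
static cudaTextureObject_t make_linear_texture(float *dev_ptr, size_t count) {
    cudaResourceDesc res = {};
    res.resType = cudaResourceTypeLinear;
    res.res.linear.devPtr = dev_ptr;
    res.res.linear.desc = cudaCreateChannelDesc<float>();
    res.res.linear.sizeInBytes = count * sizeof(float);

    cudaTextureDesc tex = {};
    tex.readMode = cudaReadModeElementType;  // return the raw float values

    cudaTextureObject_t obj = 0;
    cudaCreateTextureObject(&obj, &res, &tex, nullptr);
    return obj;
}

// Run `steps` updates, ping-ponging between two buffers.
std::vector<float> run_steps(const std::vector<float> &init, int dim,
                             int steps, float speed) {
    size_t count = init.size();
    size_t bytes = count * sizeof(float);
    float *buf[2] = {nullptr, nullptr};
    cudaMalloc(&buf[0], bytes);
    cudaMalloc(&buf[1], bytes);
    cudaMemcpy(buf[0], init.data(), bytes, cudaMemcpyHostToDevice);
    cudaTextureObject_t tex[2] = {make_linear_texture(buf[0], count),
                                  make_linear_texture(buf[1], count)};

    dim3 threads(16, 16);
    dim3 blocks((dim + 15) / 16, (dim + 15) / 16);
    bool dstOut = true;  // true: read buf[0], write buf[1]
    for (int i = 0; i < steps; ++i) {
        int src = dstOut ? 0 : 1;
        blend_kernel_tex<<<blocks, threads>>>(buf[1 - src], tex[src], dim, speed);
        dstOut = !dstOut;  // swap input and output for the next step
    }

    // After the final flip, the most recently written buffer is the one
    // that would serve as input to the next step.
    int last = dstOut ? 0 : 1;
    std::vector<float> out(count);
    cudaMemcpy(out.data(), buf[last], bytes, cudaMemcpyDeviceToHost);

    cudaDestroyTextureObject(tex[0]);
    cudaDestroyTextureObject(tex[1]);
    cudaFree(buf[0]);
    cudaFree(buf[1]);
    return out;
}
```

Each launch reads one buffer through its texture and writes the other through a plain pointer, so a kernel never reads a texture whose underlying memory it is writing in the same launch.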
CUDA Heat Conduction Simulation Based on Texture Memory