Because of work requirements, I (the blogger) have started learning GPU programming, mainly for GPU-based deep learning. Since I had never done any GPU programming before, I am studying it systematically here. If you share this interest, you are welcome to exchange ideas with me; my email is caijinping220@gmail.com. I am using the GeForce 103M graphics card in my old laptop; although it is weak compared with today's mainstream cards, it is still good enough for learning. This series of posts documents my learning process, moving from simple to complex.
0. Table of contents
GPU Programming from Beginner to Proficient (I): Installing the CUDA Environment
GPU Programming from Beginner to Proficient (II): Running the First Program
GPU Programming from Beginner to Proficient (III): The First GPU Program
GPU Programming from Beginner to Proficient (IV): GPU Program Optimization
GPU Programming from Beginner to Proficient (V): Advanced GPU Program Optimization
1. CUDA initialization function
Because we use the Runtime API, include the cuda_runtime.h header at the top of the source file.
The initialization function consists of the following steps.
1.1. Get the number of CUDA devices
Use the cudaGetDeviceCount function to get the number of CUDA devices, as follows:
// get the CUDA device count
int count = 0;
cudaGetDeviceCount(&count);
if (count == 0) {
    fprintf(stderr, "There is no device.\n");
    return false;
}
cudaGetDeviceCount writes the number of CUDA-capable devices into count, whose address is passed to the function.
1.2. Get CUDA device properties
Use the cudaGetDeviceProperties function to obtain the properties of a CUDA device, as follows:
// find a device with compute capability >= 1.x
int i;
for (i = 0; i < count; ++i) {
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, i) == cudaSuccess) {
        if (prop.major >= 1) {
            printDeviceProp(prop);
            break;
        }
    }
}
// if we can't find the device
if (i == count) {
    fprintf(stderr, "There is no device supporting CUDA 1.x.\n");
    return false;
}
cudaGetDeviceProperties fills in prop, whose address is passed to it. The loop selects the first device whose major compute-capability version is at least 1 and prints its properties with printDeviceProp, which looks like this:
// print the properties of a CUDA device
void printDeviceProp(const cudaDeviceProp &prop) {
    printf("Device Name : %s.\n", prop.name);
    printf("totalGlobalMem : %zu.\n", prop.totalGlobalMem);       // size_t, so %zu
    printf("sharedMemPerBlock : %zu.\n", prop.sharedMemPerBlock);
    printf("regsPerBlock : %d.\n", prop.regsPerBlock);
    printf("warpSize : %d.\n", prop.warpSize);
    printf("memPitch : %zu.\n", prop.memPitch);
    printf("maxThreadsPerBlock : %d.\n", prop.maxThreadsPerBlock);
    printf("maxThreadsDim[0 - 2] : %d %d %d.\n", prop.maxThreadsDim[0], prop.maxThreadsDim[1], prop.maxThreadsDim[2]);
    printf("maxGridSize[0 - 2] : %d %d %d.\n", prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2]);
    printf("totalConstMem : %zu.\n", prop.totalConstMem);
    printf("major.minor : %d.%d.\n", prop.major, prop.minor);
    printf("clockRate : %d.\n", prop.clockRate);
    printf("textureAlignment : %zu.\n", prop.textureAlignment);
    printf("deviceOverlap : %d.\n", prop.deviceOverlap);
    printf("multiProcessorCount : %d.\n", prop.multiProcessorCount);
}
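printDeviceProp prints sizes as raw byte counts, which are hard to read for a field like totalGlobalMem. A small variation that converts them into MiB/KiB, assuming device 0 exists; bytesToMiB() is a hypothetical helper written for this example, not a CUDA function:

```cuda
// Sketch: print memory-related cudaDeviceProp fields in readable units.
// bytesToMiB() is a hypothetical helper, not part of the CUDA API.
#include <stdio.h>
#include <cuda_runtime.h>

// Convert a byte count (as stored in the size_t fields) to mebibytes.
static double bytesToMiB(size_t bytes) {
    return (double)bytes / (1024.0 * 1024.0);
}

int main(void) {
    cudaDeviceProp prop;
    // Query device 0 and report the runtime's error string on failure.
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("Device name        : %s\n", prop.name);
    printf("totalGlobalMem     : %.1f MiB\n", bytesToMiB(prop.totalGlobalMem));
    printf("sharedMemPerBlock  : %.1f KiB\n", prop.sharedMemPerBlock / 1024.0);
    printf("totalConstMem      : %.1f KiB\n", prop.totalConstMem / 1024.0);
    printf("compute capability : %d.%d\n", prop.major, prop.minor);
    return 0;
}
```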
1.3. Set the CUDA device
Select the CUDA device to use with the cudaSetDevice function, as follows:
// set the CUDA device
cudaSetDevice(i);
1.4. Complete CUDA initialization code
/********************************************************************
##### File Name: first_cuda.cu
##### File Func: initialize the CUDA device and print device properties
##### Author: caijinping
##### E-mail: caijinping220@gmail.com
##### Create Time: 2014-4-21
*********************************************************************/
#include <stdio.h>
#include <cuda_runtime.h>

// print the properties of a CUDA device
void printDeviceProp(const cudaDeviceProp &prop) {
    printf("Device Name : %s.\n", prop.name);
    printf("totalGlobalMem : %zu.\n", prop.totalGlobalMem);
    printf("sharedMemPerBlock : %zu.\n", prop.sharedMemPerBlock);
    printf("regsPerBlock : %d.\n", prop.regsPerBlock);
    printf("warpSize : %d.\n", prop.warpSize);
    printf("memPitch : %zu.\n", prop.memPitch);
    printf("maxThreadsPerBlock : %d.\n", prop.maxThreadsPerBlock);
    printf("maxThreadsDim[0 - 2] : %d %d %d.\n", prop.maxThreadsDim[0], prop.maxThreadsDim[1], prop.maxThreadsDim[2]);
    printf("maxGridSize[0 - 2] : %d %d %d.\n", prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2]);
    printf("totalConstMem : %zu.\n", prop.totalConstMem);
    printf("major.minor : %d.%d.\n", prop.major, prop.minor);
    printf("clockRate : %d.\n", prop.clockRate);
    printf("textureAlignment : %zu.\n", prop.textureAlignment);
    printf("deviceOverlap : %d.\n", prop.deviceOverlap);
    printf("multiProcessorCount : %d.\n", prop.multiProcessorCount);
}

bool InitCUDA() {
    int count = 0;  // used to count the devices
    // get the CUDA device count
    cudaGetDeviceCount(&count);
    if (count == 0) {
        fprintf(stderr, "There is no device.\n");
        return false;
    }
    // find a device with compute capability >= 1.x
    int i;
    for (i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, i) == cudaSuccess) {
            if (prop.major >= 1) {
                printDeviceProp(prop);
                break;
            }
        }
    }
    // if we can't find the device
    if (i == count) {
        fprintf(stderr, "There is no device supporting CUDA 1.x.\n");
        return false;
    }
    // set the CUDA device
    cudaSetDevice(i);
    return true;
}

int main(int argc, char const *argv[]) {
    if (InitCUDA()) {
        printf("CUDA initialized.\n");
    }
    return 0;
}
2. Runtime API functions explained
2.1. cudaGetDeviceCount
cudaGetDeviceCount
Returns the number of compute-capable devices
Function prototype:
cudaError_t cudaGetDeviceCount(int* count)
Function description:
Returns in *count the number of devices with compute capability greater than or equal to 1.0 that are available for execution. If there is no such device, the early CUDA releases this description comes from set *count to 1, with device 0 supporting only device emulation mode.
Return value:
cudaSuccess. Note that this function may also return error codes from previous, asynchronous launches.
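The snippets in section 1 ignore the cudaError_t that cudaGetDeviceCount returns. A minimal sketch of checking it, assuming a CUDA toolkit and compilation with nvcc; checkCuda() is a hypothetical helper written for this example, not part of the Runtime API:

```cuda
// Sketch: check the cudaError_t returned by cudaGetDeviceCount.
// checkCuda() is a hypothetical helper, not a CUDA API function.
#include <stdio.h>
#include <cuda_runtime.h>

// Print a readable message for a failed runtime call; return true on success.
static bool checkCuda(cudaError_t err, const char *what) {
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

int main(void) {
    int count = 0;  // initialized in case the call fails before writing it
    if (!checkCuda(cudaGetDeviceCount(&count), "cudaGetDeviceCount")) {
        return 1;
    }
    printf("CUDA devices found: %d\n", count);
    return count > 0 ? 0 : 1;
}
```

On success it prints the device count; if the driver or device is missing, checkCuda() prints the runtime's own error string (from cudaGetErrorString) instead of silently continuing with a meaningless count.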
2.2. cudaGetDeviceProperties
cudaGetDeviceProperties
Returns information about the compute device
Function prototype:
cudaError_t cudaGetDeviceProperties(struct cudaDeviceProp* prop, int dev)
Function description:
Returns the properties of device dev in *prop.
Return value:
cudaSuccess, cudaErrorInvalidDevice. Note that this function may also return error codes from previous, asynchronous launches.
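Section 1.2 stops at the first device whose major version is at least 1. On a machine with several GPUs you may prefer the one with the highest compute capability. The sketch below does that with cudaGetDeviceProperties and the major/minor fields of cudaDeviceProp described in the structure definition that follows; isNewer() is a hypothetical helper, and ranking devices by compute capability alone is an assumption (memory size or multiprocessor count could matter more for a given workload):

```cuda
// Sketch: enumerate all devices and pick the highest compute capability.
// isNewer() is a hypothetical helper, not part of the CUDA API.
#include <stdio.h>
#include <cuda_runtime.h>

// True if (major, minor) is a higher compute capability than (bestMajor, bestMinor).
static bool isNewer(int major, int minor, int bestMajor, int bestMinor) {
    return major > bestMajor || (major == bestMajor && minor > bestMinor);
}

int main(void) {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        fprintf(stderr, "No CUDA device found.\n");
        return 1;
    }
    int best = -1, bestMajor = -1, bestMinor = -1;
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaError_t err = cudaGetDeviceProperties(&prop, dev);
        if (err != cudaSuccess) {
            fprintf(stderr, "device %d: %s\n", dev, cudaGetErrorString(err));
            continue;  // skip devices that cannot be queried
        }
        printf("device %d: %s (compute %d.%d)\n", dev, prop.name, prop.major, prop.minor);
        if (isNewer(prop.major, prop.minor, bestMajor, bestMinor)) {
            best = dev;
            bestMajor = prop.major;
            bestMinor = prop.minor;
        }
    }
    if (best < 0) {
        fprintf(stderr, "Could not query any device.\n");
        return 1;
    }
    printf("selected device %d (compute %d.%d)\n", best, bestMajor, bestMinor);
    return 0;
}
```

The Runtime API also offers cudaChooseDevice(), which picks the device that best matches a cudaDeviceProp you fill in; the explicit loop here simply makes the selection rule visible.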
The cudaDeviceProp structure is defined as follows (this is the field set documented for early CUDA releases; newer toolkits add many more fields):
struct cudaDeviceProp {
    char name[256];
    size_t totalGlobalMem;
    size_t sharedMemPerBlock;
    int regsPerBlock;
    int warpSize;
    size_t memPitch;
    int maxThreadsPerBlock;
    int maxThreadsDim[3];
    int maxGridSize[3];
    size_t totalConstMem;
    int major;
    int minor;
    int clockRate;
    size_t textureAlignment;
    int deviceOverlap;
    int multiProcessorCount;
};
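Two groups of these fields, maxThreadsPerBlock (together with maxThreadsDim) and maxGridSize, bound the launch configuration of any kernel. A minimal sketch of deriving a one-dimensional configuration from them, assuming device 0, a hypothetical problem size n, and a preferred block size of 256 threads (both values chosen only for illustration); blocksFor() is a hypothetical helper:

```cuda
// Sketch: derive a 1-D launch configuration that respects device limits.
// n, the preferred block size, and blocksFor() are illustrative assumptions.
#include <stdio.h>
#include <cuda_runtime.h>

// Number of blocks needed so that blocks * threadsPerBlock >= n (ceiling division).
static int blocksFor(int n, int threadsPerBlock) {
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

int main(void) {
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    const int n = 1000000;  // hypothetical number of elements to process
    // Prefer 256 threads per block, clamped to both per-block limits.
    int threads = 256;
    if (threads > prop.maxThreadsPerBlock) threads = prop.maxThreadsPerBlock;
    if (threads > prop.maxThreadsDim[0]) threads = prop.maxThreadsDim[0];
    int blocks = blocksFor(n, threads);
    // A 1-D grid must not exceed the x-dimension grid limit.
    if (blocks > prop.maxGridSize[0]) {
        fprintf(stderr, "%d blocks exceed maxGridSize[0] = %d\n", blocks, prop.maxGridSize[0]);
        return 1;
    }
    printf("n = %d -> %d blocks x %d threads (warpSize %d)\n", n, blocks, threads, prop.warpSize);
    return 0;
}
```

Keeping the block size a multiple of warpSize (256 is a multiple of 32) avoids partially filled warps; the clamping only matters on devices whose limits are below the preferred value.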
The fields of cudaDeviceProp have the following meanings:
name
An ASCII string identifying the device;
totalGlobalMem
The total amount of global memory available on the device, in bytes;
sharedMemPerBlock
The maximum amount of shared memory available to a thread block, in bytes; this amount is shared by all thread blocks simultaneously resident on a multiprocessor;
regsPerBlock
The maximum number of 32-bit registers available to a thread block; these registers are shared by all thread blocks simultaneously resident on a multiprocessor;
warpSize
The warp size, in threads;
memPitch
The maximum pitch, in bytes, allowed by the memory copy functions that involve memory regions allocated through cudaMallocPitch();
maxThreadsPerBlock
The maximum number of threads per block;
maxThreadsDim[3]
The maximum size of each dimension of a block;
maxGridSize[3]
The maximum size of each dimension of a grid;
totalConstMem
The total amount of constant memory available on the device, in bytes;
major, minor
The major and minor revision numbers defining the device's compute capability;
clockRate
The clock frequency, in kilohertz;
textureAlignment
The alignment requirement; texture base addresses aligned to textureAlignment bytes do not need an offset applied to texture fetches;
deviceOverlap
1 if the device can concurrently copy memory between host and device while executing a kernel, or 0 if not;
multiProcessorCount
The number of multiprocessors on the device.
2.3. cudaSetDevice
cudaSetDevice
Sets the device used for GPU execution
Function prototype:
cudaError_t cudaSetDevice(int dev)
Function description:
Records dev as the device on which the active host thread executes device code.
Return value:
cudaSuccess, cudaErrorInvalidDevice. Note that this function may also return error codes from previous, asynchronous launches.
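The initialization code calls cudaSetDevice without checking its result. A minimal sketch that validates the ordinal first, checks the return value, and reads the choice back with cudaGetDevice(); taking the ordinal from argv[1] and the isValidDevice() helper are assumptions made for this example:

```cuda
// Sketch: select a device with cudaSetDevice and verify it with cudaGetDevice.
// The argv[1] convention and isValidDevice() are hypothetical.
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// True if dev is a valid ordinal on a system with count devices.
static bool isValidDevice(int dev, int count) {
    return dev >= 0 && dev < count;
}

int main(int argc, char **argv) {
    // Device ordinal from the command line, defaulting to 0.
    int dev = argc > 1 ? atoi(argv[1]) : 0;
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || !isValidDevice(dev, count)) {
        fprintf(stderr, "device %d is not available (%d device(s) found)\n", dev, count);
        return 1;
    }
    cudaError_t err = cudaSetDevice(dev);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaSetDevice(%d) failed: %s\n", dev, cudaGetErrorString(err));
        return 1;
    }
    int current = -1;
    cudaGetDevice(&current);  // reads back the device used by this host thread
    printf("active device: %d\n", current);
    return 0;
}
```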
3. Compiling the code with nvcc
nvcc is the CUDA compiler driver. It splits a .cu file into the parts that execute on the GPU and the parts that execute on the host, so you do not have to separate them manually. The GPU portion is compiled by NVIDIA's compiler into intermediate code, and the host portion is compiled by invoking the host compiler (GCC).
You can compile the first_cuda.cu program written above with the following command:
nvcc -o first_cuda first_cuda.cu
This produces the executable first_cuda.
Running ./first_cuda prints the properties of the selected device followed by the line "CUDA initialized."
This post showed how to write a first CUDA program with the Runtime API; it illustrates the general workflow of a CUDA program. The next post describes how to write code that runs on the GPU.
You are welcome to discuss GPU programming with me:
caijinping220@gmail.com
http://blog.csdn.net/xsc_c