Reprinted from http://soft.zdnet.com.cn/software_zone/2009/1127/1527418.shtml
1. Software requirements
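cudadriver_2.3_winvista_64_190.38_general (CUDA driver, version 190.38, 64-bit Windows Vista)
cudatoolkit_2.3_win_64 (CUDA Toolkit 2.3, 64-bit)
cudasdk_2.3_win_64 (CUDA SDK 2.3, 64-bit)
Visual Studio 2008 (VS2008)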
Before installing, uninstall any previously installed CUDA SDK, toolkit, and driver. If the development machine does not have a CUDA-capable graphics card, you do not need to install cudadriver_2.3_winvista_64_190.38_general.
2. Installation check
Run nvcc -V in a command prompt (cmd) to check the installed compiler version. The output should look similar to this:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2009 NVIDIA Corporation
Built on Mon_Aug__3_19:43:55_PDT_2009
Cuda compilation tools, release 2.3, V0.2.1221
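If a script or build step should verify the toolkit version automatically, the last line of the nvcc -V output can be parsed. The following is a minimal C sketch, assuming that line has the form shown above; parse_nvcc_release and release_at_least are hypothetical helper names, not part of the CUDA toolkit.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: extract "release X.Y" from the version line
   printed by nvcc -V, e.g.
   "Cuda compilation tools, release 2.3, V0.2.1221".
   Returns 1 and fills *major / *minor on success, 0 otherwise. */
int parse_nvcc_release(const char *line, int *major, int *minor)
{
    const char *p;

    assert(line != NULL && major != NULL && minor != NULL);
    p = strstr(line, "release ");
    if (p == NULL)
        return 0; /* not the version line */
    return sscanf(p, "release %d.%d", major, minor) == 2;
}

/* Hypothetical helper: 1 if major.minor >= want_major.want_minor. */
int release_at_least(int major, int minor, int want_major, int want_minor)
{
    return major > want_major ||
           (major == want_major && minor >= want_minor);
}
```

For the output above, the parsed release is 2.3, so a check such as release_at_least(major, minor, 2, 3) succeeds.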
Run bandwidthTest to check that the configuration works.
Go to the C:\ProgramData\NVIDIA Corporation\NVIDIA GPU Computing SDK\C\bin\win64\Release directory and run:
bandwidthTest.exe --memory=pinned --mode=range --start=10240000 --end=10240000 --increment=10240000
If everything is working, you should see output similar to this:
Running on ......
Device 0: Quadro FX 580
Range Mode
Host to device bandwidth for pinned memory
Transfer size (bytes) bandwidth (MB/s)
10240000 5101.1
Range Mode
Device to host bandwidth for pinned memory
Transfer size (bytes) bandwidth (MB/s)
10240000 4650.8
Range Mode
Device to device bandwidth
Transfer size (bytes) bandwidth (MB/s)
10240000 14812.5
Test passed
Press enter to exit...
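Each reported figure is the number of bytes moved divided by the elapsed time. The C sketch below makes that arithmetic explicit. It assumes 1 MB = 2^20 bytes; check the bandwidthTest source for the convention it actually uses. The helper names are hypothetical. Under that assumption, a 10,240,000-byte pinned host-to-device copy at 5101.1 MB/s takes about 1.91 ms.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 1 MB = 2^20 bytes (verify against the bandwidthTest source). */
#define BYTES_PER_MB (1024.0 * 1024.0)

/* Hypothetical helper: bandwidth in MB/s for `bytes` moved in `elapsed_ms`. */
double bandwidth_mb_per_s(size_t bytes, double elapsed_ms)
{
    assert(elapsed_ms > 0.0);
    return ((double)bytes / BYTES_PER_MB) / (elapsed_ms / 1000.0);
}

/* Hypothetical helper: milliseconds needed to move `bytes` at `mb_per_s`. */
double transfer_time_ms(size_t bytes, double mb_per_s)
{
    assert(mb_per_s > 0.0);
    return ((double)bytes / BYTES_PER_MB) / mb_per_s * 1000.0;
}
```

Applying transfer_time_ms to the three results above shows that the device-to-device copy (14812.5 MB/s) finishes roughly three times faster than either copy across the PCIe bus.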
Run deviceQuery.exe (in the same directory) to see the exact model and capabilities of the graphics card:
deviceQuery.exe
If everything is working, you should see output similar to this:
CUDA Device Query (Runtime API) version (CUDART static linking)
There is 1 device supporting CUDA
Device 0: "Quadro FX 580"
Cuda driver version: 2.30
Cuda runtime version: 2.30
Cuda capability major revision number: 1
Cuda capability minor revision number: 1
Total amount of global memory: 536870912 bytes
Number of multiprocessors: 4
Number of cores: 32
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 16384 bytes
Total number of Registers available per block: 8192
Warp Size: 32
Maximum number of threads per block: 512
Maximum sizes of each dimension of a block: 512x512x64
Maximum sizes of each dimension of a grid: 65535x65535x1
Maximum memory pitch: 262144 bytes
Texture alignment: 256 bytes
Clock rate: 1.13 GHz
Concurrent copy and execution: Yes
Run Time limit on kernels: No
Integrated: No
Support host page-locked memory mapping: No
Compute mode: default (multiple host threads can use this device simultaneously)
Test passed
Press enter to exit...
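Two of the figures above can be cross-checked by hand. On compute capability 1.x devices each multiprocessor has 8 scalar cores, so 4 multiprocessors give the 32 cores reported, and 536870912 bytes of global memory is exactly 512 MiB. A small C sketch of both calculations follows; the helper names are hypothetical, and other compute capabilities are deliberately not handled.

```c
#include <assert.h>

/* Hypothetical helper: total cores for a compute capability 1.x device,
   which has 8 scalar cores per multiprocessor. Returns -1 for other
   major revisions, which this sketch does not cover. */
int cuda_cores_cc1x(int major, int multiprocessors)
{
    assert(multiprocessors > 0);
    if (major != 1)
        return -1;
    return 8 * multiprocessors;
}

/* Hypothetical helper: convert a byte count printed by deviceQuery to MiB. */
double bytes_to_mib(double bytes)
{
    return bytes / (1024.0 * 1024.0);
}
```

With the values above, cuda_cores_cc1x(1, 4) gives 32 and bytes_to_mib(536870912) gives 512.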
From this information, the card's peak single-precision floating-point throughput can be estimated as 3 * 32 * 1.13 = 108.48 GFLOPS (3 floating-point operations per core per clock, 32 cores, 1.13 GHz clock rate).
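The estimate multiplies floating-point operations per core per clock by the number of cores and by the clock rate in GHz. The factor 3 assumes that each compute capability 1.x core issues a multiply-add (2 flops) together with a separate multiply (1 flop) every clock. This gives a theoretical peak, not a measured figure. A minimal C sketch (peak_gflops is a hypothetical name):

```c
#include <assert.h>

/* Hypothetical helper: theoretical peak GFLOPS =
   flops per core per clock * number of cores * clock rate in GHz.
   flops_per_clock = 3 assumes a dual-issued multiply-add + multiply. */
double peak_gflops(double flops_per_clock, int cores, double clock_ghz)
{
    assert(cores > 0 && clock_ghz > 0.0);
    return flops_per_clock * cores * clock_ghz;
}
```

Counting only the multiply-add (2 flops per clock) gives the more conservative estimate 2 * 32 * 1.13 = 72.32 GFLOPS.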
3. Set system environment variables
Add the SDK's binary directories to the system PATH environment variable. For example, C:\ProgramData\NVIDIA Corporation\NVIDIA GPU Computing SDK\C\bin\win64 contains the following directories:
├── Debug
├── EmuDebug
├── EmuRelease
└── Release
Add these directories to the system PATH so that programs can find the corresponding DLLs at run time. (In practice: create a system variable named cudarelease that holds the directory path, then add %cudarelease% to PATH.)
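To confirm that a directory actually ended up on PATH, you can search for it among the semicolon-separated entries. The C sketch below works under simplifying assumptions. It compares entries case-insensitively, as Windows does for paths, but it does not normalize trailing backslashes or expand %VAR% references. path_contains is a hypothetical name.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Case-insensitive comparison of the first n characters of a and b. */
static int path_equal_n(const char *a, const char *b, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return 0;
    }
    return 1;
}

/* Hypothetical helper: 1 if `dir` is exactly one entry of the
   ';'-separated `path` string (ignoring case), 0 otherwise. */
int path_contains(const char *path, const char *dir)
{
    const char *start = path;
    const char *end;
    size_t len;
    size_t dir_len;

    assert(path != NULL && dir != NULL);
    dir_len = strlen(dir);
    for (;;) {
        end = strchr(start, ';');
        len = end ? (size_t)(end - start) : strlen(start);
        if (len == dir_len && path_equal_n(start, dir, len))
            return 1;
        if (end == NULL)
            return 0;
        start = end + 1; /* move past the ';' to the next entry */
    }
}
```

In a program you could pass getenv("PATH") as the first argument. To inspect the value by eye, run echo %PATH% in a newly opened cmd window, because windows that were already open do not see the updated variable.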
Make the required header files available to VS2008:
Copy the C:\ProgramData\NVIDIA Corporation\NVIDIA GPU Computing SDK\C\common directory to the C:\Users\dawning\Documents\Visual Studio 2008 directory.
4. Build a simple CUDA project in VS2008
Copy the template project C:\ProgramData\NVIDIA Corporation\NVIDIA GPU Computing SDK\C\src\template to the VS2008 projects directory C:\Users\dawning\Documents\Visual Studio 2008\Projects.
Start VS2008 and open the template_vc90 solution.