First, download the cuDNN v3 library package and the samples package from the NVIDIA website (the two downloads were marked with a red box in the original screenshot).
cuDNN v3 requires CUDA 7.0 or later to be installed beforehand; that installation is not repeated here.
Copy the two downloaded .tgz packages to a directory on the target machine that already has CUDA 7.0 installed (here /home/yongke.zyk/local_install) and extract them. This yields three subdirectories: include/, lib64/ and samples/. include/ contains the header file cudnn.h, which must be included at compile time. lib64/ contains the cuDNN static library libcudnn_static.a, the dynamic library libcudnn.so.7.0.58 and its symbolic links libcudnn.so and libcudnn.so.7.0; these are needed when the program is linked. samples/ contains mnistCUDNN, a simple sample program that uses cuDNN to recognize handwritten digits.
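As a quick sanity check after extraction, the layout described above can be verified with a short script. This is only a sketch: the function name missing_cudnn_entries is made up for illustration, and the list of expected entries is simply the files and directories named in the paragraph above (the versioned libcudnn.so.7.0.58 is left out so the check does not depend on the exact patch release).

```python
import os

# Entries named in the text above; paths are relative to the install prefix.
EXPECTED_FILES = [
    os.path.join("include", "cudnn.h"),
    os.path.join("lib64", "libcudnn_static.a"),
    os.path.join("lib64", "libcudnn.so"),  # symlink to the versioned library
]
EXPECTED_DIRS = ["samples"]


def missing_cudnn_entries(prefix):
    """Return the expected files/directories that are absent under prefix."""
    missing = []
    for rel in EXPECTED_FILES:
        # isfile() follows symlinks, so a dangling libcudnn.so counts as missing.
        if not os.path.isfile(os.path.join(prefix, rel)):
            missing.append(rel)
    for rel in EXPECTED_DIRS:
        if not os.path.isdir(os.path.join(prefix, rel)):
            missing.append(rel)
    return missing


if __name__ == "__main__":
    # Prefix used in this article; change it to your own extraction directory.
    prefix = "/home/yongke.zyk/local_install"
    missing = missing_cudnn_entries(prefix)
    if missing:
        print("Missing under %s: %s" % (prefix, ", ".join(missing)))
    else:
        print("cuDNN layout under %s looks complete" % prefix)
```

An empty result means the header, both library entries and the samples directory are all in place.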
Modify lines 41-46 of mnistCUDNN/Makefile as follows:
else
CUDA_PATH = /usr/local/cuda
CUDA_LIB_PATH = $(CUDA_PATH)/$(CUDA_LIBSUBDIR)
CUDNN_LIB_PATH = /home/yongke.zyk/local_install/lib64
CUDNN_INCLUDE_PATH = /home/yongke.zyk/local_install/include
endif
Run make to compile and link the executable mnistCUDNN. Running it then produces the following output:
$ ./mnistCUDNN
cudnnGetVersion() : 3002 , CUDNN_VERSION from cudnn.h : 3002 (3.0.02)
Host compiler version : GCC 4.4.6
There are 2 CUDA capable devices on your machine :
device 0 : sms capabilities 5.2, SmClock 1076.0 Mhz, MemSize (Mb) 12287, MemClock 3505.0 Mhz, Ecc=0, boardGroupID=0
device 1 : sms capabilities 5.2, SmClock 1076.0 Mhz, MemSize (Mb) 12287, MemClock 3505.0 Mhz, Ecc=0, boardGroupID=1
Using device 0

Testing single precision
Loading image data/one_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm ...
Fastest algorithm is Algo 1
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.045056 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.055328 time requiring 3464 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.060416 time requiring 57600 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.142336 time requiring 207360 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Resulting weights from Softmax:
0.0000000 0.9999399 0.0000000 0.0000000 0.0000561 0.0000000 0.0000012 0.0000017 0.0000010 0.0000000
Loading image data/three_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 0.9999288 0.0000000 0.0000711 0.0000000 0.0000000 0.0000000 0.0000000
Loading image data/five_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 0.9999819 0.0000154 0.0000000 0.0000012 0.0000006

Result of classification: 1 3 5

Test passed!

Testing half precision (math in single precision)
Loading image data/one_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm ...
Fastest algorithm is Algo 1
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.036896 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.039936 time requiring 3464 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.078880 time requiring 28800 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.131104 time requiring 207360 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Resulting weights from Softmax:
0.0000001 1.0000000 0.0000001 0.0000000 0.0000563 0.0000001 0.0000012 0.0000017 0.0000010 0.0000001
Loading image data/three_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 1.0000000 0.0000000 0.0000709 0.0000000 0.0000000 0.0000000 0.0000000
Loading image data/five_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 1.0000000 0.0000154 0.0000000 0.0000012 0.0000006

Result of classification: 1 3 5

Test passed!
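Two parts of this log can be reproduced by hand, and the sketch below does so in plain Python without needing cuDNN. It assumes the version number is encoded as major * 1000 + minor * 100 + patch, which matches the "3002 (3.0.02)" line in the output, and it treats each row of softmax weights as scores for the digits 0 to 9, so the predicted digit is the index of the largest weight. The function names decode_cudnn_version and classify are made up for illustration and are not part of the sample.

```python
def decode_cudnn_version(version):
    """Split a version integer such as 3002 into (major, minor, patch).

    Assumes the encoding major * 1000 + minor * 100 + patch.
    """
    major, rest = divmod(version, 1000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


def classify(weights):
    """Return the index of the largest softmax weight (the predicted digit)."""
    return max(range(len(weights)), key=lambda i: weights[i])


# Softmax rows copied from the single-precision run in the log.
SINGLE_PRECISION_WEIGHTS = [
    [0.0000000, 0.9999399, 0.0000000, 0.0000000, 0.0000561,
     0.0000000, 0.0000012, 0.0000017, 0.0000010, 0.0000000],  # one_28x28.pgm
    [0.0000000, 0.0000000, 0.0000000, 0.9999288, 0.0000000,
     0.0000711, 0.0000000, 0.0000000, 0.0000000, 0.0000000],  # three_28x28.pgm
    [0.0000000, 0.0000008, 0.0000000, 0.0000002, 0.0000000,
     0.9999819, 0.0000154, 0.0000000, 0.0000012, 0.0000006],  # five_28x28.pgm
]

if __name__ == "__main__":
    print(decode_cudnn_version(3002))                        # (3, 0, 2)
    print([classify(w) for w in SINGLE_PRECISION_WEIGHTS])   # [1, 3, 5]
```

The resulting digits 1, 3 and 5 agree with the "Result of classification" line, which is why the sample reports that the test passed.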
Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission.