Theano Study Notes (1. Environment: Anaconda + Theano + VS2010 + CUDA)
I have recently been getting into deep learning. Given its demands on speed and GPU computing, and the increasingly complex gradient derivations as the network layers deepen, I decided to set up a Theano environment (and drop MATLAB), purely for my own entertainment (and a bit of casual blogging) ...
Main steps:
1. Theano, CPU only
2. Install CUDA and VS2010
3. Theano GPU computing
1. Theano CPU Only
Prerequisites for installing Theano:
- You need numpy, scipy, nose and other Python packages. Download Anaconda (from the official site, http://www.continuum.io/downloads); it is very easy to set up, and Anaconda bundles pip, numpy, scipy and many other common scientific-computing packages. Add to the PATH environment variable: "H:\Anaconda2; H:\Anaconda2\Scripts; H:\Anaconda2\Library\bin;" (change the paths to your own install location).
- A GCC compiler is required. If your machine already has one, you can skip this step. If not, you can download the latest MinGW from http://www.mingw.org/ and follow its instructions. The simplest way, however, is to install two packages with conda (Anaconda's package manager, which it ships alongside pip): enter conda install mingw libpython at the cmd prompt. Then add to PATH: "H:\Anaconda2\MinGW\bin; H:\Anaconda2\MinGW\x86_64-w64-mingw32\lib;"
- Connect gcc and Theano. Create a new text file in the home folder (c:/user/{your name}) named .theanorc.txt and enter the following in it. The [blas] section accelerates CPU computation (see http://icl.cs.utk.edu/lapack-for-windows/lapack/), but if you are after the GPU's many parallel units you can leave it empty:
[blas]
ldflags =
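If you do install an optimized BLAS such as OpenBLAS (see the LAPACK-for-Windows link above), ldflags is where you point Theano at it. A hypothetical example with a placeholder path, adjust to wherever the library actually lives:
[blas]
ldflags = -LC:\openblas\bin -lopenblas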
I installed the GCC compiler directly through conda as described above; with the other installation methods you need to add a few more lines:
[blas]
ldflags =
[gcc]
cxxflags = -I{your gcc address}
That is, tell Theano where your gcc compiler is; if you installed MinGW via conda this is not required.
After that, we can happily install Theano itself: just enter pip install theano in cmd.
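As a quick sanity check that everything is in place, something like the following can be run (a minimal sketch: it only prints the versions Anaconda installed, the settings Theano picked up from .theanorc.txt, and confirms that g++ is reachable on PATH):
import subprocess
import numpy
import scipy
import theano

# Versions of the scientific stack installed by Anaconda
print "numpy %s, scipy %s, theano %s" % (numpy.__version__, scipy.__version__, theano.__version__)
# Settings Theano picked up from .theanorc.txt
print "device=%s, floatX=%s" % (theano.config.device, theano.config.floatX)
# Confirm the MinGW g++ added to PATH is visible
subprocess.call(["g++", "--version"])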
OK, and finally test it in Spyder:
import numpy as np
import time
import theano

A = np.random.rand(1000, 10000).astype(theano.config.floatX)
B = np.random.rand(10000, 1000).astype(theano.config.floatX)
np_start = time.time()
AB = A.dot(B)
np_end = time.time()
X, Y = theano.tensor.matrices('XY')
mf = theano.function([X, Y], X.dot(Y))
t_start = time.time()
tAB = mf(A, B)
t_end = time.time()
print "NP time: %f[s], theano time: %f[s] (times should be close if run on cpu!)" % (
    np_end - np_start, t_end - t_start)
print "Result difference: %f" % (np.abs(AB - tAB).max(),)
The result is:
NP time: 0.675000[s], theano time: 0.535000[s] (times should be close if run on cpu!)
Result difference: 0.000000
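The two timings are close because, with an empty [blas] ldflags, Theano falls back on numpy for the matrix product, so both measurements exercise essentially the same code. If you want to see which BLAS your numpy build uses, numpy can report it (a standard numpy call, shown only as an aside):
import numpy as np
np.show_config()  # prints the BLAS/LAPACK libraries numpy was built against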
2. Install CUDA and VS2010
VS installation: http://blog.sina.com.cn/s/blog_4f8cdc9e0100kklr.html (hopefully MS won't blacklist me for this ...)
CUDA configuration: http://blog.csdn.net/yeyang911/article/details/17450963
Follow the steps in those posts. One thing to emphasize: before installing CUDA, be sure to download the appropriate driver from NVIDIA's official site yourself rather than letting some driver-wizard tool install it; that is prone to problems, and my first attempt left the graphics driver broken ...
As for how to know whether your graphics card can be used with the CUDA parallel computing framework, https://developer.nvidia.com/cuda-gpus will tell you.
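Once the driver, VS2010 and CUDA are installed, a quick way to confirm the CUDA toolkit ended up on PATH is to query the compiler version (a minimal check; nvcc is CUDA's compiler driver):
import subprocess
# Prints the CUDA toolkit release if the installer added nvcc to PATH
subprocess.call(["nvcc", "--version"])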
3. Theano GPU Computing
After completing the above steps, the rest is very simple. Modify the contents of .theanorc.txt in the home folder (c:/user/{your name}) to:
[blas]
ldflags =
[nvcc]
fastmath = True
flags = -LH:\Anaconda2\libs
compiler_bindir = D:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\bin
[global]
floatX = float32
device = gpu
Change flags and compiler_bindir to your own paths.
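As an aside, the same options can also be supplied per session through Theano's THEANO_FLAGS environment variable instead of editing .theanorc.txt; a minimal sketch (the variable must be set before theano is imported):
import os
# Overrides .theanorc.txt for this run only; must happen before importing theano
os.environ["THEANO_FLAGS"] = "device=gpu,floatX=float32"
import theano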
Then run the following test in Spyder:
from theano import function, config, shared, sandbox
import theano.tensor as T
import numpy
import time

vlen = 10 * 30 * 768  # 10 x #cores x # threads per core
iters = 1000

rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f = function([], T.exp(x))
print(f.maker.fgraph.toposort())
t0 = time.time()
for i in xrange(iters):
    r = f()
t1 = time.time()
print("It took %f seconds" % (t1 - t0))
if numpy.any([isinstance(x.op, T.Elemwise) for x in f.maker.fgraph.toposort()]):
    print('Used the CPU')
else:
    print('Used the GPU')
The code above applies exp to a vector of 10*30*768 elements, 1000 times over (10*30*768*1000 scalar exponentials in total), expressed as a Theano tensor. f.maker.fgraph.toposort() lists the ops in the compiled graph: if any of them is a plain (CPU) Elemwise, the computation stayed on the CPU; on the GPU it is compiled to GpuElemwise instead.
Comparing CPU and GPU computation on my machine:
CPU (i7-4720HQ):
[Elemwise{exp,no_inplace}(<TensorType(float64, vector)>)]
It took 55.188000 seconds
Used the CPU
GPU (GTX 960M):
[GpuElemwise{exp,no_inplace}(<CudaNdarrayType(float32, vector)>), HostFromGpu(GpuElemwise{exp,no_inplace}.0)]
It took 1.817000 seconds
Used the GPU
Using gpu device 0: GeForce GTX 960M
That's roughly a 30x speedup ...
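With device=gpu and floatX=float32 in place, the matrix-product test from section 1 also runs on the GPU without any code changes. A sketch along those lines (in the printed graph, ops whose names start with Gpu indicate that the product was compiled for the GPU):
import numpy as np
import time
import theano
import theano.tensor as T

# Same shapes as the CPU benchmark in section 1
A = np.random.rand(1000, 10000).astype(theano.config.floatX)
B = np.random.rand(10000, 1000).astype(theano.config.floatX)

X = T.matrix('X')
Y = T.matrix('Y')
mf = theano.function([X, Y], X.dot(Y))
print(mf.maker.fgraph.toposort())  # Gpu* ops mean the dot runs on the GPU

t0 = time.time()
mf(A, B)
print("theano dot: %f[s]" % (time.time() - t0))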
Oh yeah, time to have some fun with it.