Deep learning five: using GPU to accelerate neural network training


One of the biggest problems in training neural networks is speed; in deep learning especially, the large number of parameters consumes a lot of time. Most of the computation during training consists of matrix operations, which is exactly what GPUs are good at. GPUs were designed for graphics processing, but because they handle matrix computations so efficiently, they are now widely applied to deep learning. Theano supports GPU programming, although only for NVIDIA graphics cards; for a Python program, modifying a small amount of code is enough to use the GPU for acceleration.

First, you need to set up the GPU environment. (Note: I initially followed the official website and ran into errors; the configuration below is what I finally arrived at after consulting various sources online, and it works on my machine.)

  1. Install CUDA. CUDA is the GPU development environment provided by NVIDIA and can be downloaded directly from the official website. I installed the Windows 64-bit version and followed the installer step by step.
  2. Install Visual Studio 2010 (CUDA supports Visual Studio 2010, 2012, and 2013). Not having it installed is exactly what caused my error: nvcc compiler not found.
  3. The CUDA installer automatically adds the CUDA_PATH environment variable to the Windows environment variables: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5
  4. Add the following directories to the Path environment variable:
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\bin; C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\libnvvp;
  5. The previous article described installing Theano under Windows, which involved a .theanorc.txt file. If you need to use the GPU, change that file to:
```
[global]
device = gpu
floatX = float32
openmp = False

[blas]
ldflags =

[gcc]
cxxflags = -ID:\Anaconda2\MinGW

[cuda]
root = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v7.5\bin

[nvcc]
fastmath = True
flags = -LD:\Anaconda2\libs
compiler_bindir = C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\bin
```
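Step 3 above can be sanity-checked from Python before going any further. This is a minimal sketch that only inspects the environment variables, so it is safe to run on any machine, with or without CUDA installed:

```python
import os

# Sanity check for step 3: the CUDA installer should have set CUDA_PATH.
# This only reads the environment, so it runs safely anywhere.
cuda_path = os.environ.get('CUDA_PATH')
if cuda_path is None:
    print("CUDA_PATH is not set - rerun the CUDA installer or set it manually")
else:
    print("CUDA_PATH = " + cuda_path)
```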

To verify that the GPU has been enabled successfully, you can run a test program (see gputest.py); if the output shows that a GPU is being used, the setup succeeded. From here on you can write GPU-accelerated programs with Theano, keeping a few points in mind:

  1. Floating-point numbers in Python are float64 by default, but CUDA requires float32. That is the reason for the floatX=float32 line in the .theanorc.txt above; alternatively, you can use 32-bit tensor types such as T.fvector.
  2. When programming for the GPU, shared variables in Theano must hold float32 data throughout: arrays must be defined with dtype='float32' or converted with methods such as astype('float32').
  3. Be careful about moving data between the CPU and the GPU. If you want all the data to stay on the GPU, make all the parameters float32 shared variables, and avoid (or use cautiously) the gpu_from_host method.

Knowing the above, we can change the code from the previous article into code that runs on the GPU. The changes are as follows: convert all data types to float32.
```python
np.random.seed(0)
train_X, train_y = datasets.make_moons(noise=0.20)
train_y_onehot = np.eye(2)[train_y]

# Parameters
num_examples = len(train_X)
nn_input_dim = 2    # number of input neurons
nn_output_dim = 2   # number of output neurons
nn_hdim = 1000      # number of hidden neurons

# Gradient descent parameters
epsilon = np.float32(0.01)     # learning rate
reg_lambda = np.float32(0.01)  # regularization strength

# Shared variables
# GPU NOTE: conversion to float32 to store them on the GPU!
X = theano.shared(train_X.astype('float32'))  # initialized on the GPU
y = theano.shared(train_y_onehot.astype('float32'))
# GPU NOTE: conversion to float32 to store them on the GPU!
W1 = theano.shared(np.random.randn(nn_input_dim, nn_hdim).astype('float32'), name='W1')
b1 = theano.shared(np.zeros(nn_hdim).astype('float32'), name='b1')
W2 = theano.shared(np.random.randn(nn_hdim, nn_output_dim).astype('float32'), name='W2')
b2 = theano.shared(np.zeros(nn_output_dim).astype('float32'), name='b2')

W1.set_value((np.random.randn(nn_input_dim, nn_hdim) / np.sqrt(nn_input_dim)).astype('float32'))
b1.set_value(np.zeros(nn_hdim).astype('float32'))
W2.set_value((np.random.randn(nn_hdim, nn_output_dim) / np.sqrt(nn_hdim)).astype('float32'))
b2.set_value(np.zeros(nn_output_dim).astype('float32'))
```
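The float32 conversions above can be sketched with plain NumPy, independently of Theano. This is a minimal sketch (the array shapes here are arbitrary) showing the two ways of getting float32 data mentioned earlier:

```python
import numpy as np

# NumPy arrays are float64 by default, which the GPU cannot use
w = np.random.randn(2, 1000)
print(w.dtype)  # float64

# Either convert an existing array with astype ...
w32 = w.astype('float32')

# ... or create the array as float32 from the start with dtype=
b32 = np.zeros(1000, dtype='float32')

print(w32.dtype, b32.dtype)  # float32 float32
```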

The input values train_X and train_y are also set as Theano shared variables, so the data is placed on the GPU for computation. The rest of the process is the same as before; the entire code is shown below:
```python
# -*- coding: utf-8 -*-
import theano
import theano.tensor as T
import numpy as np
from sklearn import datasets
import matplotlib.pyplot as plt
import time

# Generate data
np.random.seed(0)
train_X, train_y = datasets.make_moons(noise=0.20)
train_y_onehot = np.eye(2)[train_y]

# Parameters
num_examples = len(train_X)
nn_input_dim = 2    # number of input neurons
nn_output_dim = 2   # number of output neurons
nn_hdim = 1000      # number of hidden neurons

# Gradient descent parameters
epsilon = np.float32(0.01)     # learning rate
reg_lambda = np.float32(0.01)  # regularization strength

# Shared variables
# GPU NOTE: conversion to float32 to store them on the GPU!
X = theano.shared(train_X.astype('float32'))  # initialized on the GPU
y = theano.shared(train_y_onehot.astype('float32'))
W1 = theano.shared(np.random.randn(nn_input_dim, nn_hdim).astype('float32'), name='W1')
b1 = theano.shared(np.zeros(nn_hdim).astype('float32'), name='b1')
W2 = theano.shared(np.random.randn(nn_hdim, nn_output_dim).astype('float32'), name='W2')
b2 = theano.shared(np.zeros(nn_output_dim).astype('float32'), name='b2')

# Forward propagation
z1 = X.dot(W1) + b1
a1 = T.tanh(z1)
z2 = a1.dot(W2) + b2
y_hat = T.nnet.softmax(z2)

# Regularization term
loss_reg = 1./num_examples * reg_lambda/2 * (T.sum(T.square(W1)) + T.sum(T.square(W2)))
loss = T.nnet.categorical_crossentropy(y_hat, y).mean() + loss_reg

# Prediction
prediction = T.argmax(y_hat, axis=1)

forward_prop = theano.function([], y_hat)
calculate_loss = theano.function([], loss)
predict = theano.function([], prediction)

# Gradients
dW2 = T.grad(loss, W2)
db2 = T.grad(loss, b2)
dW1 = T.grad(loss, W1)
db1 = T.grad(loss, b1)

# Update step
gradient_step = theano.function(
    [],
    updates=((W2, W2 - epsilon * dW2),
             (b2, b2 - epsilon * db2),
             (W1, W1 - epsilon * dW1),
             (b1, b1 - epsilon * db1)))

def build_model(num_passes=20000, print_loss=False):
    W1.set_value((np.random.randn(nn_input_dim, nn_hdim) / np.sqrt(nn_input_dim)).astype('float32'))
    b1.set_value(np.zeros(nn_hdim).astype('float32'))
    W2.set_value((np.random.randn(nn_hdim, nn_output_dim) / np.sqrt(nn_hdim)).astype('float32'))
    b2.set_value(np.zeros(nn_output_dim).astype('float32'))
    for i in xrange(0, num_passes):
        start = time.time()
        gradient_step()
        end = time.time()
        # print "Time required:"
        # print(end - start)
        if print_loss and i % 1000 == 0:
            print "Loss after iteration %i: %f" % (i, calculate_loss())

def accuracy_rate():
    predict_result = predict()
    count = 0
    for i in range(len(predict_result)):
        real_result = train_y[i]
        if real_result == predict_result[i]:
            count += 1
    print "count"
    print count
    print "The correct rate is: %f" % (float(count) / len(predict_result))

def plot_decision_boundary(pred_func):
    # Set min and max values and give it some padding
    x_min, x_max = train_X[:, 0].min() - .5, train_X[:, 0].max() + .5
    y_min, y_max = train_X[:, 1].min() - .5, train_X[:, 1].max() + .5
    h = 0.01
    # Generate a grid of points with distance h between them
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
    # Predict the function value for the whole grid
    Z = pred_func(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    # Plot the contour and training examples
    plt.contourf(xx, yy, Z, cmap=plt.cm.Spectral)
    plt.scatter(train_X[:, 0], train_X[:, 1], c=train_y, cmap=plt.cm.Spectral)
    plt.show()

build_model(print_loss=True)
accuracy_rate()
# plot_decision_boundary(lambda x: predict(x))
# plt.title("Decision boundary for hidden layer size 3")
```
To make the acceleration effect more obvious, increase the number of hidden units to 1000 and set the number of training iterations to 5000. First, look at the results of the execution:

Then we compare the time cost of one iteration before and after GPU acceleration. To run on the CPU, simply change device=gpu to device=cpu in the configuration file above. With the GPU, the time for one iteration of gradient_step() is:

The result of running with the CPU is:

My graphics card is a GT720, a relatively low-end card, and my CPU is an Intel i5, which is not a bad CPU. Even with this configuration, the speedup is more than 5x; on a slightly better GPU, the experiment can run at 7.5 ms per iteration, a speedup of more than 40x. The GPU's acceleration of the training process is therefore obvious.
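The comparison above boils down to timing gradient_step() under each device setting and taking the ratio of the two times. A minimal sketch of that measurement follows; the placeholder function and the CPU time are illustrative assumptions (the CPU figure is chosen only to match the quoted 40x, it is not a measured value):

```python
import time

def gradient_step():
    # hypothetical stand-in for the compiled Theano update function
    sum(i * i for i in range(100000))

# Time a single iteration, as in the comparison above
start = time.time()
gradient_step()
elapsed = time.time() - start
print("Time for one gradient_step(): %f s" % elapsed)

# Speedup is just the ratio of per-iteration times; the CPU figure
# below is an illustrative assumption, not a measurement
gpu_ms = 7.5     # per-iteration time quoted for a better GPU
cpu_ms = 300.0   # assumed CPU per-iteration time
speedup = cpu_ms / gpu_ms
print("Speedup: %.1fx" % speedup)  # Speedup: 40.0x
```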


