Deep Learning Study Notes (ii): Python Implementation of a Neural Network


Python implementation of multilayer neural networks.

The code is pasted first; the programming details themselves are not explained.

For the underlying theory, see the next note: Deep Learning Study Notes (iii): Derivation of the Neural Network Backpropagation Algorithm.

For the SupervisedLearningModel, NNLayer, and SoftmaxRegression classes that appear in the code, refer to the previous note: Deep Learning Study Notes (i): Softmax Regression.
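Those classes are not reproduced here. As a rough orientation, the stub below sketches the interface that the code in this note assumes of them; the names and attributes are taken from the calls made below, and the bodies are placeholders, not the implementation from note (i).

    # Interface stubs only -- reconstructed from how the MNN code below uses
    # these classes; see note (i) for the real implementations.

    class SupervisedLearningModel:
        def train(self, X, y):
            # fit the model, typically by running an optimizer on costFunc
            raise NotImplementedError
        def performance(self, Xtest, ytest):
            # classification accuracy on a test set, in percent
            raise NotImplementedError

    class NNLayer:
        def __init__(self, inputSize, outputSize, Lambda, actFunc='sigmoid'):
            self.inputSize = inputSize
            self.outputSize = outputSize
            self.input = None   # activation of the previous layer, set before forward()
            self.delta = None   # error term assigned during backpropagation
        def forward(self): ...              # returns this layer's activation
        def layerGradient(self): ...        # gradient of weights and intercept
        def backpropagate(self): ...        # delta to pass to the previous layer
        def flatTheta(self): ...            # weights and intercept as a 1-dim vector
        def rebuildTheta(self, theta): ...  # inverse of flatTheta

    # Stubbed as a subclass of NNLayer only for brevity; whether the real class
    # inherits from NNLayer is not shown in the notes.  It also exposes
    # .activation and .y_mat after forward() / setTrainingLabels().
    class SoftmaxRegression(NNLayer):
        def setTrainingLabels(self, y): ...  # builds the indicator matrix self.y_mat
        def predict(self, act): ...          # predicted class labels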

Multilayer Neural Network:

    import numpy as np
    from dp.supervised.nnbase import NNLayer
    from dp.supervised.softmax import SoftmaxRegression
    from dp.supervised import nnbase
    from time import time


    class MNN(nnbase.SupervisedLearningModel):
        '''
        classdocs
        '''

        def __init__(self, params):
            '''
            Constructor

            Parameters:
                params - the network configuration, dict
                    params['inputSize']      - dimension of input features
                    params['outputSize']     - number of output classes
                    params['layerSizes']     - an array, sizes of all layers,
                                               including all hidden layers and the output layer
                    params['Lambda']         - scaling parameter for the L2 weight regularization penalty
                    params['activitionFunc'] - which type of activation function to use in hidden layers
            '''
            layerSizes = params['layerSizes']
            self.numLayers = len(layerSizes)
            self.allLayers = []
            self.X = 0

            # initialize all hidden layers
            inputSize = params['inputSize']
            for i in range(self.numLayers - 1):
                layer = NNLayer(inputSize, layerSizes[i], params['Lambda'],
                                actFunc=params['activitionFunc'])
                self.allLayers.append(layer)
                inputSize = layerSizes[i]

            # initialize the softmax layer - output layer
            outputLayer = SoftmaxRegression(inputSize, params['outputSize'], params['Lambda'])
            self.allLayers.append(outputLayer)

        def rebuildTheta(self, theta):
            '''
            Convert the 1-dim weight vector into the weights and intercepts of all layers.
            Overwrites the super class.
            '''
            starter = 0
            for i in range(self.numLayers):
                thetaSize = (self.allLayers[i].inputSize + 1) * self.allLayers[i].outputSize
                th = theta[starter:starter + thetaSize]
                starter = starter + thetaSize
                self.allLayers[i].rebuildTheta(th)

        def flatTheta(self):
            '''
            Convert all weights and intercepts into a 1-dim vector.
            Overwrites the super class.
            '''
            theta = self.allLayers[0].flatTheta()
            for i in range(self.numLayers - 1):
                temp = self.allLayers[i + 1].flatTheta()
                theta = np.hstack((theta, temp))
            return theta

        def nnForward(self, theta, X, y):
            '''
            The forward pass through all layers.
            '''
            act = X
            self.rebuildTheta(theta)
            self.allLayers[-1].setTrainingLabels(y)
            for i in range(self.numLayers):
                self.allLayers[i].input = act
                act = self.allLayers[i].forward()
            return act

        def cost(self, theta, X, y):
            '''
            The cost function.

            Parameters:
                theta - the vector holding the weights and intercepts,
                        needed by the scipy.optimize functions
                        size: (numClasses - 1) * (numFeatures + 1)
            '''
            h = np.log(self.nnForward(theta, X, y))
            # h * y_mat applies the indicator function
            cost = -np.sum(h * self.allLayers[-1].y_mat, axis=(0, 1)) / X.shape[1]
            return cost

        def gradient(self, theta, X, y):
            '''
            Compute the gradient.
            Overwrites the method of the super class.

            Parameters:
                theta - 1-dim vector containing all weights and intercepts
            '''
            self.nnForward(theta, X, y)
            i = self.numLayers - 1
            grad = np.empty(0)
            while i > 0:
                # get the gradient of one layer
                gwb = self.allLayers[i].layerGradient()
                # backpropagate the error terms
                self.allLayers[i - 1].delta = self.allLayers[i].backpropagate()
                grad = np.hstack((gwb.ravel(), grad))
                i = i - 1
            # get the gradient of the first hidden layer
            gwb = self.allLayers[0].layerGradient()
            grad = np.hstack((gwb.ravel(), grad))
            return grad

        def costFunc(self, theta, X, y):
            '''
            Compute cost and gradient in one pass.
            '''
            grad = self.gradient(theta, X, y)
            h = np.log(self.allLayers[-1].activation)
            cost = -np.sum(h * self.allLayers[-1].y_mat, axis=(0, 1)) / X.shape[1]
            return cost, grad

        def predict(self, Xtest):
            '''
            Prediction.
            Overwrites the method of the super class.
            Before calling this method, this model should be trained.

            Parameter:
                Xtest - the data to be predicted, numFeatures by numData
            '''
            act = Xtest
            for i in range(self.numLayers - 1):
                self.allLayers[i].input = act
                act = self.allLayers[i].forward()
            return self.allLayers[-1].predict(act)


    def checkGradient(X, y):
        params = dict()
        params['inputSize'] = X.shape[0]
        params['outputSize'] = 10
        params['layerSizes'] = [50, 20, 10]
        params['Lambda'] = 0
        params['activitionFunc'] = 'sigmoid'

        testnn = MNN(params)
        # testnn.setTrainData(X, y)
        theta = testnn.flatTheta()
        cost, grad = testnn.costFunc(theta, X, y)
        # print(np.size(theta))
        # print(np.size(grad))

        # numerical gradient by central differences
        numGrad = np.zeros(grad.shape)
        e = 1e-6
        for i in range(np.size(grad)):
            theta[i] = theta[i] - e
            loss1, g1 = testnn.costFunc(theta, X, y)
            theta[i] = theta[i] + 2 * e
            loss2, g2 = testnn.costFunc(theta, X, y)
            theta[i] = theta[i] - e
            numGrad[i] = (-loss1 + loss2) / (2 * e)

        # mean absolute difference between analytic and numerical gradients
        print(np.sum(np.abs(grad - numGrad)) / np.size(grad))
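A quick way to exercise checkGradient without loading MNIST is to feed it a small random batch; the shapes and label range below are arbitrary and only for illustration (run it after the definitions above).

    # Hypothetical smoke test for checkGradient: random features and labels.
    # The printed mean absolute difference should be very small (e.g. < 1e-8)
    # if the analytic gradient matches the numerical estimate.
    np.random.seed(0)
    Xs = np.random.rand(20, 5)           # 20 input features, 5 examples
    ys = np.random.randint(0, 10, 5)     # class labels in {0, ..., 9}
    checkGradient(Xs, ys)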

Stochastic gradient descent (rewritten from the UFLDL MATLAB stochastic gradient descent code):

    import numpy as np


    def minFuncSGD(funcObj, theta, data, labels, options):
        '''
        Runs stochastic gradient descent with momentum to optimize the
        parameters for the given objective.

        Parameters:
            funcObj - function handle which accepts as input theta, data, labels
                      and returns cost and gradient w.r.t. theta.
            theta   - unrolled parameter vector
            data    - stores data in an m x n x numExamples tensor
            labels  - corresponding labels in a numExamples x 1 vector
            options - dict to store specific options for optimization

        Returns:
            opttheta - optimized parameter vector

        Options (* required):
            epochs*    - number of epochs through the data
            alpha*     - initial learning rate
            minibatch* - size of minibatch
            momentum   - momentum constant, defaults to 0.9
        '''
        epochs = options['epochs']
        alpha = options['alpha']
        minibatch = options['minibatch']
        if options.get('momentum') is None:
            options['momentum'] = 0.9
        m = labels.shape[0]

        mom = 0.5
        momIncrease = 20
        velocity = np.zeros(theta.shape)

        # SGD loop
        it = 0
        for e in range(epochs):
            # randomly permute the indices of the data for quick minibatch sampling
            rp = np.random.permutation(m)
            for i in range(0, m - minibatch, minibatch):
                it = it + 1
                # increase momentum after momIncrease iterations
                if it == momIncrease:
                    mom = options['momentum']

                # get the next randomly selected minibatch
                mb_data = data[:, rp[i:i + minibatch]]
                mb_labels = labels[rp[i:i + minibatch]]

                # evaluate the objective function on the next minibatch
                cost, grad = funcObj(theta, mb_data, mb_labels)

                # Instructions: Add the weighted velocity vector to the gradient
                # evaluated above, scaled by the learning rate.  Then update the
                # current weights theta according to the SGD update rule.
                velocity = mom * velocity + alpha * grad
                theta = theta - velocity

                print('Epoch %d: cost on iteration %d is %f\n' % (e, it, cost))

            # anneal the learning rate by a factor of two after each epoch
            alpha = alpha / 2.0

        return theta
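How this optimizer is wired to the network above is not shown in the original code; the sketch below is one plausible way to do it, assuming X is a numFeatures-by-numExamples array and y the matching label vector. The option values are only illustrative.

    # Sketch: training an MNN with minFuncSGD instead of nn.train().
    options = {'epochs': 3, 'alpha': 0.1, 'minibatch': 256, 'momentum': 0.9}

    nn = MNN(params)                 # params as in the test code below
    theta0 = nn.flatTheta()          # initial unrolled parameter vector
    optTheta = minFuncSGD(nn.costFunc, theta0, X, y, options)
    nn.rebuildTheta(optTheta)        # load the optimized weights back into the layers
    pred = nn.predict(Xtest)         # predictions on held-out data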

Test:

The MNIST dataset is used for testing; the accuracy is about 96%.

Test code:

    X = np.load('../../common/trainImages.npy') / 255
    X = X.T
    y = np.load('../../common/trainLabels.npy')
    '''
    x1 = X[:, :10]
    y1 = y[:10]
    checkGradient(x1, y1)
    '''
    Xtest = np.load('../../common/testImages.npy') / 255
    Xtest = Xtest.T
    ytest = np.load('../../common/testLabels.npy')

    params = dict()
    params['inputSize'] = X.shape[0]
    params['outputSize'] = 10
    params['layerSizes'] = [256, 10]
    params['Lambda'] = 0
    params['activitionFunc'] = 'sigmoid'

    nn = MNN(params)
    t0 = time()
    nn.train(X, y)
    print('Training time %.5f s' % (time() - t0))
    print('Test acc: %.3f%%' % (nn.performance(Xtest, ytest)))
    


Known problems:

1. With the fmin_cg and fmin_l_bfgs_b functions from scipy.optimize (see the scipy.optimize sketch below), a network with a single hidden layer works fine and gives the expected result, but with more than one hidden layer the optimizer does not reach a correct result and stops after only a single-digit number of iterations. With plain gradient descent or stochastic gradient descent, models with several hidden layers do reach the expected result. It is not clear whether the fault lies in the neural network implementation or in scipy.optimize.

2. The cost function and gradient in the code do not include the regularization penalty. Because the output layer is a softmax (with the output of the last class fixed to 0 and no penalty applied), it is unclear whether the hidden-layer parameters should be regularized (see the penalty sketch below). In practice, however, training without any penalty gives results similar to those of a quadratic cost function plus a penalty term.
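For reference, the scipy.optimize calls mentioned in problem 1 presumably look something like the sketch below; the original post does not show the actual call, and the argument values are only illustrative.

    import scipy.optimize as opt

    nn = MNN(params)
    theta0 = nn.flatTheta()

    # L-BFGS-B: costFunc returns (cost, grad), so no separate fprime is needed.
    optTheta, fmin, info = opt.fmin_l_bfgs_b(nn.costFunc, theta0, args=(X, y), maxiter=100)

    # Conjugate gradient: cost and gradient are passed as separate callables.
    optTheta = opt.fmin_cg(nn.cost, theta0, fprime=nn.gradient, args=(X, y), maxiter=100)

    nn.rebuildTheta(optTheta)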
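If one did want to penalize the hidden-layer weights, the change could look roughly like the sketch below. It is only a sketch: `W` (the weight matrix without the intercept column) is a hypothetical attribute name, since the actual NNLayer from note (i) may store its parameters differently, and the matching term would also have to be added to the gradient inside each layer's layerGradient().

    import numpy as np

    # Sketch only: L2 weight decay on the hidden layers, leaving the softmax
    # output layer unpenalized as described above.
    def costFuncL2(nn, theta, X, y, lam):
        cost, grad = nn.costFunc(theta, X, y)
        for layer in nn.allLayers[:-1]:                # hidden layers only
            cost += 0.5 * lam * np.sum(layer.W ** 2)   # layer.W is hypothetical
        # The corresponding lam * layer.W term must also be added to the weight
        # entries of grad (inside layerGradient()) for the gradient check to pass.
        return cost, grad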
