Neural Networks and Deep Learning series, Article 16: The Backpropagation Algorithm Code


Source: Michael Nielsen's "Neural Networks and Deep Learning". Click "Read the original" at the end of this article to view the original English text.

Translator for this section: Li Shengyu, master's student at HIT SCIR

Disclaimer: To reprint, please contact [email protected]; reproduction without authorization is not permitted.

    1. Using neural networks to recognize handwritten digits

    2. How the backpropagation algorithm works

      • Warm up: a fast matrix-based approach to computing the output of a neural network

      • The two assumptions we need about the cost function

      • The Hadamard product

      • The four fundamental equations behind backpropagation

      • Proof of the four fundamental equations (optional reading)

      • The backpropagation algorithm

      • The backpropagation algorithm code

      • Why the backpropagation algorithm is efficient

      • Backpropagation: the big picture

    3. Improving the way neural networks learn

    4. A visual proof that neural networks can compute any function

    5. Why deep neural networks are hard to train

    6. Deep learning

Hint: this section contains a lot of code, so it is best read at a computer.

Having understood the backpropagation algorithm in theory, we can now understand the code that was used to implement it in the first chapter. Recall the update_mini_batch and backprop methods of the Network class from that chapter. The code can be seen as a direct translation of the algorithm described above. In particular, the update_mini_batch method updates the Network's weights and biases by computing the gradient for the current mini-batch (mini_batch):

class Network(object):
    def update_mini_batch(self, mini_batch, eta):
        """Update the network's weights and biases by applying
        gradient descent using backpropagation to a single mini batch.
        The "mini_batch" is a list of tuples "(x, y)", and "eta"
        is the learning rate."""
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        for x, y in mini_batch:
            delta_nabla_b, delta_nabla_w = self.backprop(x, y)
            nabla_b = [nb+dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
            nabla_w = [nw+dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]
        self.weights = [w-(eta/len(mini_batch))*nw
                        for w, nw in zip(self.weights, nabla_w)]
        self.biases = [b-(eta/len(mini_batch))*nb
                       for b, nb in zip(self.biases, nabla_b)]
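For context, update_mini_batch is called once per mini-batch by the outer stochastic gradient descent loop. The following is only a minimal sketch of how such a loop might drive it; the function name sgd_epoch and its variable names are illustrative and are not an exact copy of the Chapter 1 code.

import random

# Sketch: shuffle the training data, split it into mini-batches,
# and apply one gradient-descent update per mini-batch.
def sgd_epoch(network, training_data, mini_batch_size, eta):
    random.shuffle(training_data)
    mini_batches = [training_data[k:k+mini_batch_size]
                    for k in xrange(0, len(training_data), mini_batch_size)]
    for mini_batch in mini_batches:
        network.update_mini_batch(mini_batch, eta)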

Most of the work in update_mini_batch is done by the line delta_nabla_b, delta_nabla_w = self.backprop(x, y), which uses the backprop method to compute the partial derivatives of the cost with respect to every bias and weight. The backprop method follows the algorithm described in the previous section closely, with one difference: it uses a slightly different scheme for indexing the layers. The change takes advantage of Python's negative list indices, which count backwards from the end of a list; for example, l[-3] is the third-to-last entry of the list l (a quick illustration is given below). The code for backprop then follows, together with a few helper functions used to compute the sigmoid function, its derivative sigmoid_prime, and the derivative of the cost function. You should be able to understand the code on its own; if you run into difficulty, refer back to the description of the code in the first chapter.
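As a quick, self-contained illustration of the negative-index convention (the list below is purely hypothetical and not part of network.py):

# Python negative indices count backwards from the end of a list.
layers = ['input', 'hidden1', 'hidden2', 'output']
print(layers[-1])   # 'output'  -- the last entry
print(layers[-3])   # 'hidden1' -- the third-to-last entry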

class Network(object):
...
    def backprop(self, x, y):
        """Return a tuple "(nabla_b, nabla_w)" representing the
        gradient for the cost function C_x.  "nabla_b" and
        "nabla_w" are layer-by-layer lists of numpy arrays, similar
        to "self.biases" and "self.weights"."""
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        # feedforward
        activation = x
        activations = [x] # list to store all the activations, layer by layer
        zs = [] # list to store all the z vectors, layer by layer
        for b, w in zip(self.biases, self.weights):
            z = np.dot(w, activation)+b
            zs.append(z)
            activation = sigmoid(z)
            activations.append(activation)
        # backward pass
        delta = self.cost_derivative(activations[-1], y) * \
            sigmoid_prime(zs[-1])
        nabla_b[-1] = delta
        nabla_w[-1] = np.dot(delta, activations[-2].transpose())
        # Note that the variable l in the loop below is used a little
        # differently to the notation in Chapter 2 of the book.  Here,
        # l = 1 means the last layer of neurons, l = 2 is the
        # second-last layer, and so on.  It's a renumbering of the
        # scheme in the book, used here to take advantage of the fact
        # that Python can use negative indices in lists.
        for l in xrange(2, self.num_layers):
            z = zs[-l]
            sp = sigmoid_prime(z)
            delta = np.dot(self.weights[-l+1].transpose(), delta) * sp
            nabla_b[-l] = delta
            nabla_w[-l] = np.dot(delta, activations[-l-1].transpose())
        return (nabla_b, nabla_w)

...

    def cost_derivative(self, output_activations, y):
        """Return the vector of partial derivatives \partial C_x /
        \partial a for the output activations."""
        return (output_activations-y)

def sigmoid(z):
    """The sigmoid function."""
    return 1.0/(1.0+np.exp(-z))

def sigmoid_prime(z):
    """Derivative of the sigmoid function."""
    return sigmoid(z)*(1-sigmoid(z))
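One small point worth noting: the first chapter uses the quadratic cost, C_x = (1/2) * ||a - y||^2, whose vector of partial derivatives with respect to the output activations a is simply (a - y). That is exactly what cost_derivative returns, which is why the backward pass can begin with cost_derivative(activations[-1], y) multiplied elementwise by sigmoid_prime(zs[-1]).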
Problem
    • A fully matrix-based approach to backpropagation over a mini-batch

Our implementation of stochastic gradient descent loops over the training examples in a mini-batch one at a time. The backpropagation algorithm can be modified so that it computes the gradients for all the training examples in a mini-batch simultaneously. The idea is to pass in a matrix (rather than a single vector) whose columns are the input vectors of the mini-batch. Forward propagation then proceeds by multiplying by the weight matrices, adding a suitable matrix of bias terms, and applying the sigmoid function everywhere; backpropagation proceeds along similar lines. Write this approach out explicitly and modify network.py so that it uses this fully matrix-based method. The advantage is that it takes full advantage of modern linear algebra libraries and can run considerably faster than looping over the mini-batch (on my laptop, for example, it is roughly twice as fast on MNIST classification problems like the ones discussed in the previous chapter). In practice, all serious backpropagation libraries use this fully matrix-based approach or some variant of it. A rough sketch of such a matrix-based backward pass is given below.
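To give an idea of what this exercise is asking for, here is a minimal sketch of such a matrix-based backward pass. It is only a sketch under the assumptions used in network.py (quadratic cost, biases stored as column vectors); the function backprop_matrix and its interface are illustrative and not part of network.py.

import numpy as np

def sigmoid(z):
    return 1.0/(1.0+np.exp(-z))

def sigmoid_prime(z):
    return sigmoid(z)*(1-sigmoid(z))

def backprop_matrix(weights, biases, X, Y):
    """Sketch of a fully matrix-based backward pass.  X and Y hold one
    training example per column; weights and biases are lists of numpy
    arrays laid out as in the Network class."""
    nabla_b = [np.zeros(b.shape) for b in biases]
    nabla_w = [np.zeros(w.shape) for w in weights]
    # Feedforward: every column of the activation matrix is one example;
    # the (n, 1) bias vectors broadcast across the columns.
    activation = X
    activations = [X]
    zs = []
    for b, w in zip(biases, weights):
        z = np.dot(w, activation) + b
        zs.append(z)
        activation = sigmoid(z)
        activations.append(activation)
    # Backward pass: delta now has one column per training example.
    delta = (activations[-1] - Y) * sigmoid_prime(zs[-1])  # quadratic cost
    # Summing delta over its columns accumulates the bias gradient for the
    # whole mini-batch; the matrix product does the same for the weights.
    nabla_b[-1] = delta.sum(axis=1, keepdims=True)
    nabla_w[-1] = np.dot(delta, activations[-2].transpose())
    num_layers = len(weights) + 1
    for l in xrange(2, num_layers):
        sp = sigmoid_prime(zs[-l])
        delta = np.dot(weights[-l+1].transpose(), delta) * sp
        nabla_b[-l] = delta.sum(axis=1, keepdims=True)
        nabla_w[-l] = np.dot(delta, activations[-l-1].transpose())
    return (nabla_b, nabla_w)

The per-example sums that update_mini_batch accumulates with a Python loop are produced here by a single matrix product per layer and a column sum of delta, which is where the speedup over the loop comes from; the caller would still divide by the mini-batch size when applying the learning-rate step, just as update_mini_batch does.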

In the next section we will cover why the backpropagation algorithm is efficient, so stay tuned!

    • "Hit Scir" public number

    • Editorial board: Guo Jiang, Li Jiaqi, Xu June, Li Zhongyang, Hulin Lin

    • Editor for this issue: Li Zhongyang

