Source: Michael Nielsen's "Neural Networks and Deep Learning". Click "read the original" at the end of this post to view the original English text.
Translator for this section: Li Shengyu, master's student at HIT-SCIR
Disclaimer: To reprint, please contact [email protected]; reproduction without authorization is not permitted.
Using neural networks to recognize handwritten digits
How the backpropagation algorithm works
Warm up: a fast matrix-based method for computing the output of a neural network
Two assumptions about the cost function
The Hadamard product
The four fundamental equations behind backpropagation
Proof of the four fundamental equations (optional reading)
The backpropagation algorithm
The code for backpropagation
Why the backpropagation algorithm is efficient
Backpropagation: the big picture
Improving the way neural networks learn
A visual proof that neural networks can compute any function
Why deep neural networks are hard to train
Deep learning
Hint: this section contains a lot of code, so it is best read at a computer.
Having understood the theory behind the backpropagation algorithm, we can now understand the code used in the previous chapter to implement it. Recall the update_mini_batch and backprop methods of the Network class from chapter 1. The code can be seen as a direct translation of the algorithm described above. In particular, the update_mini_batch method updates the Network's weights and biases by computing the gradient for the current mini-batch of training examples.
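For reference, this is the mini-batch gradient descent update rule from chapter 1 that update_mini_batch implements, where m is the size of the mini-batch, \eta is the learning rate, and the sums run over the training examples X_j in the mini-batch:

\[
w_k \rightarrow w_k' = w_k - \frac{\eta}{m} \sum_j \frac{\partial C_{X_j}}{\partial w_k},
\qquad
b_l \rightarrow b_l' = b_l - \frac{\eta}{m} \sum_j \frac{\partial C_{X_j}}{\partial b_l}.
\]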
class Network(object):
...
    def update_mini_batch(self, mini_batch, eta):
        """Update the network's weights and biases by applying
        gradient descent using backpropagation to a single mini batch.
        The "mini_batch" is a list of tuples "(x, y)", and "eta"
        is the learning rate."""
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        for x, y in mini_batch:
            delta_nabla_b, delta_nabla_w = self.backprop(x, y)
            nabla_b = [nb+dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
            nabla_w = [nw+dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]
        self.weights = [w-(eta/len(mini_batch))*nw
                        for w, nw in zip(self.weights, nabla_w)]
        self.biases = [b-(eta/len(mini_batch))*nb
                       for b, nb in zip(self.biases, nabla_b)]
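To make the calling convention concrete, here is a minimal usage sketch (my own, not from the book). It assumes the chapter 1 file network.py is importable, uses the 784-30-10 architecture from the MNIST example, and substitutes random placeholder data for real training pairs:

import numpy as np
import network  # the chapter 1 code (network.py)

# A 784-30-10 network, as in the MNIST example of chapter 1.
net = network.Network([784, 30, 10])

# Ten placeholder (input, target) column-vector pairs, standing in for
# real MNIST training data loaded via mnist_loader.
mini_batch = [(np.random.randn(784, 1), np.random.randn(10, 1))
              for _ in range(10)]

# One gradient descent step on this mini-batch, with learning rate eta = 3.0.
net.update_mini_batch(mini_batch, 3.0)

In the book's SGD method, a call like this happens once for every mini-batch in every epoch of training.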
Most of the work is done by the line delta_nabla_b, delta_nabla_w = self.backprop(x, y), which uses the backprop method to compute the partial derivatives ∂C_x/∂b and ∂C_x/∂w of the cost with respect to the biases and weights. The backprop method follows the algorithm described in the last section closely, with one difference: we use a slightly different scheme to index the layers. The change takes advantage of Python's negative list indices, which index a list from the end. For example, l[-3] is the third-to-last entry of the list l. The code for backprop is shown below, together with a few helper functions used to compute the sigmoid function, its derivative, and the derivative of the cost function. You should be able to follow the code; if you run into difficulty, you may find it helpful to consult the description of the code in chapter 1.
class Network(object):
...
    def backprop(self, x, y):
        """Return a tuple "(nabla_b, nabla_w)" representing the
        gradient for the cost function C_x.  "nabla_b" and
        "nabla_w" are layer-by-layer lists of numpy arrays, similar
        to "self.biases" and "self.weights"."""
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        # feedforward
        activation = x
        activations = [x] # list to store all the activations, layer by layer
        zs = [] # list to store all the z vectors, layer by layer
        for b, w in zip(self.biases, self.weights):
            z = np.dot(w, activation)+b
            zs.append(z)
            activation = sigmoid(z)
            activations.append(activation)
        # backward pass
        delta = self.cost_derivative(activations[-1], y) * \
            sigmoid_prime(zs[-1])
        nabla_b[-1] = delta
        nabla_w[-1] = np.dot(delta, activations[-2].transpose())
        # Note that the variable l in the loop below is used a little
        # differently to the notation in Chapter 2 of the book.  Here,
        # l = 1 means the last layer of neurons, l = 2 is the
        # second-last layer, and so on.  It's a renumbering of the
        # scheme in the book, used here to take advantage of the fact
        # that Python can use negative indices in lists.
        for l in xrange(2, self.num_layers):
            z = zs[-l]
            sp = sigmoid_prime(z)
            delta = np.dot(self.weights[-l+1].transpose(), delta) * sp
            nabla_b[-l] = delta
            nabla_w[-l] = np.dot(delta, activations[-l-1].transpose())
        return (nabla_b, nabla_w)

...

    def cost_derivative(self, output_activations, y):
        """Return the vector of partial derivatives \partial C_x /
        \partial a for the output activations."""
        return (output_activations-y)

def sigmoid(z):
    """The sigmoid function."""
    return 1.0/(1.0+np.exp(-z))

def sigmoid_prime(z):
    """Derivative of the sigmoid function."""
    return sigmoid(z)*(1-sigmoid(z))
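As a quick sanity check (my own sketch, not part of network.py), you can call backprop on a single placeholder example and confirm that it returns one gradient array per layer, with the same shapes as self.biases and self.weights, and that negative indices count back from the output layer:

import numpy as np
import network  # the chapter 1 code (network.py)

net = network.Network([784, 30, 10])
x = np.random.randn(784, 1)   # placeholder input
y = np.random.randn(10, 1)    # placeholder target
nabla_b, nabla_w = net.backprop(x, y)

# One gradient array per layer, matching the bias/weight shapes.
assert [nb.shape for nb in nabla_b] == [b.shape for b in net.biases]
assert [nw.shape for nw in nabla_w] == [w.shape for w in net.weights]

# Negative indices count from the output layer backwards:
# nabla_w[-1].shape == (10, 30)   weights into the output layer
# nabla_w[-2].shape == (30, 784)  weights into the hidden layer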
Problem
In our implementation of stochastic gradient descent we loop over the training examples in a mini-batch one at a time. It is possible to modify the backpropagation algorithm so that it computes the gradients for all the training examples in a mini-batch simultaneously. The idea is to pass in a matrix (rather than a vector) at the input, whose columns are the input vectors in the mini-batch. Forward propagation then proceeds by multiplying by the weight matrices, adding a suitable bias matrix, and applying the sigmoid function everywhere; the backward pass is computed in a similar fashion. Explicitly write out this approach to backpropagation, and modify network.py so that it uses this fully matrix-based method. The advantage of this approach is that it takes better advantage of modern linear algebra libraries and can run considerably faster than looping over the examples. (On my laptop, for example, it is roughly twice as fast on an MNIST classification problem like the one discussed in the previous chapter.) In practice, all serious libraries for backpropagation use this fully matrix-based approach or some variant of it.
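For concreteness, here is one possible sketch of such a fully matrix-based backprop (my own, not the book's solution; the method name backprop_matrix and the variables X, Y are invented for illustration). X and Y hold one mini-batch example per column, numpy broadcasting of the bias vectors plays the role of the "bias matrix", and each matrix product processes the whole batch at once:

class Network(object):
    ...
    def backprop_matrix(self, X, Y):
        """Hypothetical matrix-based variant of backprop.  "X" and "Y"
        contain one training example per column.  Returns the gradients
        (nabla_b, nabla_w) summed over the whole mini-batch."""
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        # feedforward: every column of "activation" is one example
        activation = X
        activations = [X]
        zs = []
        for b, w in zip(self.biases, self.weights):
            z = np.dot(w, activation) + b   # b broadcasts across the columns
            zs.append(z)
            activation = sigmoid(z)
            activations.append(activation)
        # backward pass: "delta" holds one error column per example
        delta = self.cost_derivative(activations[-1], Y) * sigmoid_prime(zs[-1])
        # summing delta's columns gives the bias gradient summed over the
        # batch; the matrix product already sums the weight gradients
        nabla_b[-1] = delta.sum(axis=1, keepdims=True)
        nabla_w[-1] = np.dot(delta, activations[-2].transpose())
        for l in xrange(2, self.num_layers):
            sp = sigmoid_prime(zs[-l])
            delta = np.dot(self.weights[-l+1].transpose(), delta) * sp
            nabla_b[-l] = delta.sum(axis=1, keepdims=True)
            nabla_w[-l] = np.dot(delta, activations[-l-1].transpose())
        return (nabla_b, nabla_w)

A matching update method would first stack the mini-batch, for example X = np.column_stack([x for x, y in mini_batch]) and Y = np.column_stack([y for x, y in mini_batch]), call backprop_matrix once, and then apply the same (eta/len(mini_batch)) update as before.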
In the next section we will cover "Why the backpropagation algorithm is efficient", so stay tuned!
"Hit Scir" public number
Editorial office: Guo Jiang, Li Jiaqi, Xu June, Li Zhongyang, Hulin Lin
Editor of the issue: Li Zhongyang
Article 16 of the "Neural Networks and Deep Learning" series: The code for backpropagation