The backpropagation (BP) algorithm explained in plain language (reprint)

Source: Internet
Author: User

I have recently been digging into deep learning. I started with Andrew Ng's UFLDL tutorial; there is a Chinese version, so I read that first, but some parts were unclear, so I turned to the English version and other material, and only then discovered that the Chinese translators had filled in derivation steps that the original had omitted, and filled them in incorrectly, which is why things did not add up. Backpropagation is really the foundation of neural networks, yet many people hit problems when learning it, or see a page full of formulas and feel like giving up. It is actually not difficult: it is just the chain rule applied over and over. If you do not want to stare at formulas, you can simply plug in numbers and work through the calculation by hand to get a feel for the process, then come back and derive the formulas; it will feel much easier.

When it comes to neural networks, this diagram should look familiar:

It shows the basic structure of a typical three-layer neural network: layer L1 is the input layer, layer L2 is the hidden layer, and layer L3 is the output layer. Suppose we have a set of input data {x1, x2, x3, ..., xn} and a set of output data {y1, y2, y3, ..., yn}, and we want the hidden layer to perform some transformation so that the input is turned into the output we expect. If you want the output to be exactly the same as the original input, that is the most common auto-encoder (Auto-encoder) model. You might ask: what is the point of making the input and output the same? It is actually very widely used, for example in image recognition and text classification; I will write a dedicated article on the auto-encoder, including some of its variants. If the output is different from the original input, then it is an ordinary artificial neural network, which amounts to passing the raw data through a mapping to obtain the output data we want, and that is the topic of this article.

This article works directly through an example, plugging numbers in to demonstrate the backpropagation process; the formula derivation is left for the next article, on the auto-encoder. It is really quite simple, and interested readers are encouraged to try the derivation themselves :) (Note: this article assumes you already understand the basic composition of a neural network; if not, see Poll's notes: [Machine Learning & Algorithm] Neural Network Fundamentals.)

Suppose you have the following network:

The first layer is the input layer and contains two neurons, i1 and i2, plus an intercept term b1; the second layer is the hidden layer and contains two neurons, h1 and h2, plus an intercept term b2; the third layer is the output layer with outputs o1 and o2. The label wi on each connecting line is the weight of the connection between the layers, and the activation function defaults to the sigmoid function.

Now assign them some initial values, for example:

Input data: i1 = 0.05, i2 = 0.10;

Target output: o1 = 0.01, o2 = 0.99;

Initial weights: w1 = 0.15, w2 = 0.20, w3 = 0.25, w4 = 0.30;

w5 = 0.40, w6 = 0.45, w7 = 0.50, w8 = 0.55.

Goal: given the input data i1, i2 (0.05 and 0.10), make the network's output as close as possible to the target output o1, o2 (0.01 and 0.99).

Step 1 Forward Propagation

1. Input layer --> hidden layer:

Calculate the total net input of neuron h1, net(h1).

Then pass it through the activation function to get the output of neuron h1, out(h1) (the activation function here is the sigmoid function).

In the same way, we can calculate the output out(h2) of neuron h2.
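
The formula images are not reproduced in this reprint; written out, with net denoting a neuron's total weighted input and out its activation (the same naming used in the code at the end), the calculation is:

net_{h1} = w_1 \cdot i_1 + w_2 \cdot i_2 + b_1 \cdot 1

out_{h1} = \sigma(net_{h1}) = 1 / (1 + e^{-net_{h1}})

net_{h2} = w_3 \cdot i_1 + w_4 \cdot i_2 + b_1 \cdot 1, \quad out_{h2} = \sigma(net_{h2})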

  

2. Hidden layer --> output layer:

Calculate the values of the output-layer neurons o1 and o2.
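
Written out in the same way, the forward pass through the output layer is:

net_{o1} = w_5 \cdot out_{h1} + w_6 \cdot out_{h2} + b_2 \cdot 1, \quad out_{o1} = \sigma(net_{o1})

net_{o2} = w_7 \cdot out_{h1} + w_8 \cdot out_{h2} + b_2 \cdot 1, \quad out_{o2} = \sigma(net_{o2})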

  

This completes the forward propagation. The output we obtain, [0.75136079, 0.772928465], is still far from the target values [0.01, 0.99], so we now back-propagate the error, update the weights, and recompute the output.
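
As a quick numeric check, here is a minimal sketch of the forward pass. The intercept values are not restated in the text, so b1 = 0.35 and b2 = 0.6 are assumed here (b2 matches the output_layer_bias used in the full code at the end, and this pair lands on the output values quoted above, up to small rounding differences).

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

i1, i2 = 0.05, 0.10
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55
b1, b2 = 0.35, 0.60   # assumed intercepts, see the note above

out_h1 = sigmoid(w1 * i1 + w2 * i2 + b1 * 1)          # net_h1 = 0.3775
out_h2 = sigmoid(w3 * i1 + w4 * i2 + b1 * 1)
out_o1 = sigmoid(w5 * out_h1 + w6 * out_h2 + b2 * 1)  # roughly 0.7514
out_o2 = sigmoid(w7 * out_h1 + w8 * out_h2 + b2 * 1)  # roughly 0.7729
print([out_o1, out_o2])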

Step 2 Back Propagation

1. Calculate the total error

Total error: we use the squared error.

Since there are two outputs, we calculate the errors of o1 and o2 separately, and the total error is the sum of the two.
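
Written out, with target denoting the desired output:

E_{total} = \sum \tfrac{1}{2}(target - output)^2

E_{o1} = \tfrac{1}{2}(target_{o1} - out_{o1})^2, \quad E_{o2} = \tfrac{1}{2}(target_{o2} - out_{o2})^2

E_{total} = E_{o1} + E_{o2}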

2. Updating the weights from the hidden layer to the output layer:

Take the weight w5 as an example. If we want to know how much influence w5 has on the total error, we take the partial derivative of the total error with respect to w5, using the chain rule.
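
In symbols, following the path out(o1) --> net(o1) --> w5:

\partial E_{total}/\partial w_5 = \partial E_{total}/\partial out_{o1} \cdot \partial out_{o1}/\partial net_{o1} \cdot \partial net_{o1}/\partial w_5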

The figure below shows more intuitively how the error is propagated backwards:

Now let's work out each factor in turn (the formulas are collected below):

First, ∂E(total)/∂out(o1), the derivative of the total error with respect to the output.

Next, ∂out(o1)/∂net(o1), the derivative of the output with respect to its net input.

(This step is just differentiating the sigmoid function, which is simple enough to derive yourself.)

Then ∂net(o1)/∂w5, the derivative of the net input with respect to the weight.

Finally, we multiply the three factors together.
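
Reconstructing the missing formulas, the three factors and their product are:

\partial E_{total}/\partial out_{o1} = -(target_{o1} - out_{o1})

\partial out_{o1}/\partial net_{o1} = out_{o1} \cdot (1 - out_{o1})

\partial net_{o1}/\partial w_5 = out_{h1}

\partial E_{total}/\partial w_5 = -(target_{o1} - out_{o1}) \cdot out_{o1}(1 - out_{o1}) \cdot out_{h1}

These are exactly the three helper derivatives implemented in the code at the end (calculate_pd_error_wrt_output, calculate_pd_total_net_input_wrt_input and calculate_pd_total_net_input_wrt_weight).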

This gives us the partial derivative of the total error E(total) with respect to w5.

Looking back at the formula above, we notice that the first two factors are exactly the derivative of the total error with respect to net(o1).

For ease of notation, we use δ(o1) to denote this output-layer error term.

The partial derivative of the total error E(total) with respect to w5 can then be written compactly in terms of δ(o1).

If the output-layer error term is instead defined with the opposite sign, the same result is written with a minus sign in front.
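
Reconstructed, these statements read:

\delta_{o1} = \partial E_{total}/\partial out_{o1} \cdot \partial out_{o1}/\partial net_{o1} = \partial E_{total}/\partial net_{o1} = -(target_{o1} - out_{o1}) \cdot out_{o1}(1 - out_{o1})

\partial E_{total}/\partial w_5 = \delta_{o1} \cdot out_{h1}

and, with the opposite sign convention for \delta_{o1}, the same quantity is \partial E_{total}/\partial w_5 = -\delta_{o1} \cdot out_{h1}.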

Finally, we update the value of w5.
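
Reconstructing the missing update formula, this is a standard gradient-descent step:

w_5^{+} = w_5 - \eta \cdot \partial E_{total}/\partial w_5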

(where η is the learning rate; here we take η = 0.5)

In the same way, we can update w6, w7 and w8.

3. Updating the weights from the input layer to the hidden layer:

The method is essentially the same as above, but with one difference. When we computed the partial derivative of the total error with respect to w5, the path was out(o1) --> net(o1) --> w5; for the weights between the input layer and the hidden layer, the path is out(h1) --> net(h1) --> w1, and out(h1) receives error from both E(o1) and E(o2), so both contributions have to be accounted for here.
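
In symbols, the chain for w1 therefore branches through both output errors:

\partial E_{total}/\partial w_1 = \partial E_{total}/\partial out_{h1} \cdot \partial out_{h1}/\partial net_{h1} \cdot \partial net_{h1}/\partial w_1

\partial E_{total}/\partial out_{h1} = \partial E_{o1}/\partial out_{h1} + \partial E_{o2}/\partial out_{h1}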

Calculate ∂E(total)/∂out(h1), which collects error from both outputs (formulas below):

First compute ∂E(o1)/∂out(h1).

In the same way, compute ∂E(o2)/∂out(h1).

Adding the two together gives the total.
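
Written out, using the output-layer error terms δ defined earlier (w5 and w7 are the two weights leaving h1, toward o1 and o2 respectively):

\partial E_{o1}/\partial out_{h1} = \partial E_{o1}/\partial net_{o1} \cdot \partial net_{o1}/\partial out_{h1} = \delta_{o1} \cdot w_5

\partial E_{o2}/\partial out_{h1} = \partial E_{o2}/\partial net_{o2} \cdot \partial net_{o2}/\partial out_{h1} = \delta_{o2} \cdot w_7

\partial E_{total}/\partial out_{h1} = \delta_{o1} \cdot w_5 + \delta_{o2} \cdot w_7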

Next, calculate ∂out(h1)/∂net(h1).

Then calculate ∂net(h1)/∂w1.

Finally, multiply the three together.
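
Reconstructed, these are:

\partial out_{h1}/\partial net_{h1} = out_{h1} \cdot (1 - out_{h1})

\partial net_{h1}/\partial w_1 = i_1

\partial E_{total}/\partial w_1 = (\delta_{o1} w_5 + \delta_{o2} w_7) \cdot out_{h1}(1 - out_{h1}) \cdot i_1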

To simplify the formula, we use δ(h1) to denote the error term of hidden-layer neuron h1.
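
In the same shorthand as for the output layer:

\delta_{h1} = (\delta_{o1} w_5 + \delta_{o2} w_7) \cdot out_{h1}(1 - out_{h1}) = \partial E_{total}/\partial net_{h1}

\partial E_{total}/\partial w_1 = \delta_{h1} \cdot i_1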

Finally, we update the weight w1.
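
With the same learning rate η = 0.5 as before:

w_1^{+} = w_1 - \eta \cdot \partial E_{total}/\partial w_1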

In the same way, the weights w2, w3 and w4 can be updated.

This completes one round of error backpropagation. Finally, we recompute the output with the updated weights and keep iterating. In this example, after the first iteration the total error E(total) drops from 0.298371109 to 0.291027924. After 10,000 iterations, the total error is 0.000035085 and the output is [0.015912196, 0.984065734] (the target is [0.01, 0.99]), which shows that the method works well.
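
This single round can be checked numerically with the short sketch below. It does one forward pass and one round of weight updates, again assuming the intercepts b1 = 0.35 and b2 = 0.6 (the intercepts themselves are not updated, just as in the full code that follows); the printed error should drop from roughly 0.298 to roughly 0.291, matching the numbers above.

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

i1, i2 = 0.05, 0.10                      # inputs
t1, t2 = 0.01, 0.99                      # target outputs
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30  # input -> hidden weights
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55  # hidden -> output weights
b1, b2 = 0.35, 0.60                      # intercepts (assumed, see the note above)
eta = 0.5                                # learning rate

def forward(w1, w2, w3, w4, w5, w6, w7, w8):
    out_h1 = sigmoid(w1 * i1 + w2 * i2 + b1)
    out_h2 = sigmoid(w3 * i1 + w4 * i2 + b1)
    out_o1 = sigmoid(w5 * out_h1 + w6 * out_h2 + b2)
    out_o2 = sigmoid(w7 * out_h1 + w8 * out_h2 + b2)
    return out_h1, out_h2, out_o1, out_o2

def total_error(out_o1, out_o2):
    return 0.5 * (t1 - out_o1) ** 2 + 0.5 * (t2 - out_o2) ** 2

out_h1, out_h2, out_o1, out_o2 = forward(w1, w2, w3, w4, w5, w6, w7, w8)
print("error before update:", round(total_error(out_o1, out_o2), 9))    # about 0.298371109

# output-layer error terms: delta_o = -(target - out) * out * (1 - out)
d_o1 = -(t1 - out_o1) * out_o1 * (1 - out_o1)
d_o2 = -(t2 - out_o2) * out_o2 * (1 - out_o2)

# hidden-layer error terms: delta_h = (sum over outputs of delta_o * w) * out_h * (1 - out_h)
d_h1 = (d_o1 * w5 + d_o2 * w7) * out_h1 * (1 - out_h1)
d_h2 = (d_o1 * w6 + d_o2 * w8) * out_h2 * (1 - out_h2)

# gradient-descent updates, w <- w - eta * dE(total)/dw (all gradients use the old weights)
w5, w6, w7, w8 = (w5 - eta * d_o1 * out_h1, w6 - eta * d_o1 * out_h2,
                  w7 - eta * d_o2 * out_h1, w8 - eta * d_o2 * out_h2)
w1, w2, w3, w4 = (w1 - eta * d_h1 * i1, w2 - eta * d_h1 * i2,
                  w3 - eta * d_h2 * i1, w4 - eta * d_h2 * i2)

out_h1, out_h2, out_o1, out_o2 = forward(w1, w2, w3, w4, w5, w6, w7, w8)
print("error after one update:", round(total_error(out_o1, out_o2), 9))  # about 0.291027924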

Code (Python):

# coding: utf-8
import random
import math

#
# Parameter naming:
#   "pd_"  : prefix for a partial derivative
#   "d_"   : prefix for a derivative
#   "w_ho" : index of a hidden-layer -> output-layer weight
#   "w_ih" : index of an input-layer -> hidden-layer weight
#

class NeuralNetwork:
    LEARNING_RATE = 0.5

    def __init__(self, num_inputs, num_hidden, num_outputs, hidden_layer_weights=None,
                 hidden_layer_bias=None, output_layer_weights=None, output_layer_bias=None):
        self.num_inputs = num_inputs

        self.hidden_layer = NeuronLayer(num_hidden, hidden_layer_bias)
        self.output_layer = NeuronLayer(num_outputs, output_layer_bias)

        self.init_weights_from_inputs_to_hidden_layer_neurons(hidden_layer_weights)
        self.init_weights_from_hidden_layer_neurons_to_output_layer_neurons(output_layer_weights)

    def init_weights_from_inputs_to_hidden_layer_neurons(self, hidden_layer_weights):
        weight_num = 0
        for h in range(len(self.hidden_layer.neurons)):
            for i in range(self.num_inputs):
                if not hidden_layer_weights:
                    self.hidden_layer.neurons[h].weights.append(random.random())
                else:
                    self.hidden_layer.neurons[h].weights.append(hidden_layer_weights[weight_num])
                weight_num += 1

    def init_weights_from_hidden_layer_neurons_to_output_layer_neurons(self, output_layer_weights):
        weight_num = 0
        for o in range(len(self.output_layer.neurons)):
            for h in range(len(self.hidden_layer.neurons)):
                if not output_layer_weights:
                    self.output_layer.neurons[o].weights.append(random.random())
                else:
                    self.output_layer.neurons[o].weights.append(output_layer_weights[weight_num])
                weight_num += 1

    def inspect(self):
        print('------')
        print('* Inputs: {}'.format(self.num_inputs))
        print('------')
        print('Hidden Layer')
        self.hidden_layer.inspect()
        print('------')
        print('* Output Layer')
        self.output_layer.inspect()
        print('------')

    def feed_forward(self, inputs):
        hidden_layer_outputs = self.hidden_layer.feed_forward(inputs)
        return self.output_layer.feed_forward(hidden_layer_outputs)

    def train(self, training_inputs, training_outputs):
        self.feed_forward(training_inputs)

        # 1. Error terms of the output neurons: dE/dz_o
        pd_errors_wrt_output_neuron_total_net_input = [0] * len(self.output_layer.neurons)
        for o in range(len(self.output_layer.neurons)):
            pd_errors_wrt_output_neuron_total_net_input[o] = self.output_layer.neurons[o].calculate_pd_error_wrt_total_net_input(training_outputs[o])

        # 2. Error terms of the hidden neurons
        pd_errors_wrt_hidden_neuron_total_net_input = [0] * len(self.hidden_layer.neurons)
        for h in range(len(self.hidden_layer.neurons)):
            # dE/dy_h = sum over outputs of dE/dz_o * dz_o/dy_h = sum of dE/dz_o * w_ho
            d_error_wrt_hidden_neuron_output = 0
            for o in range(len(self.output_layer.neurons)):
                d_error_wrt_hidden_neuron_output += pd_errors_wrt_output_neuron_total_net_input[o] * self.output_layer.neurons[o].weights[h]
            # dE/dz_h = dE/dy_h * dy_h/dz_h
            pd_errors_wrt_hidden_neuron_total_net_input[h] = d_error_wrt_hidden_neuron_output * self.hidden_layer.neurons[h].calculate_pd_total_net_input_wrt_input()

        # 3. Update the output-layer weights
        for o in range(len(self.output_layer.neurons)):
            for w_ho in range(len(self.output_layer.neurons[o].weights)):
                # dE/dw_ho = dE/dz_o * dz_o/dw_ho
                pd_error_wrt_weight = pd_errors_wrt_output_neuron_total_net_input[o] * self.output_layer.neurons[o].calculate_pd_total_net_input_wrt_weight(w_ho)
                # w = w - LEARNING_RATE * dE/dw
                self.output_layer.neurons[o].weights[w_ho] -= self.LEARNING_RATE * pd_error_wrt_weight

        # 4. Update the hidden-layer weights
        for h in range(len(self.hidden_layer.neurons)):
            for w_ih in range(len(self.hidden_layer.neurons[h].weights)):
                # dE/dw_ih = dE/dz_h * dz_h/dw_ih
                pd_error_wrt_weight = pd_errors_wrt_hidden_neuron_total_net_input[h] * self.hidden_layer.neurons[h].calculate_pd_total_net_input_wrt_weight(w_ih)
                # w = w - LEARNING_RATE * dE/dw
                self.hidden_layer.neurons[h].weights[w_ih] -= self.LEARNING_RATE * pd_error_wrt_weight

    def calculate_total_error(self, training_sets):
        total_error = 0
        for t in range(len(training_sets)):
            training_inputs, training_outputs = training_sets[t]
            self.feed_forward(training_inputs)
            for o in range(len(training_outputs)):
                total_error += self.output_layer.neurons[o].calculate_error(training_outputs[o])
        return total_error


class NeuronLayer:
    def __init__(self, num_neurons, bias):
        # neurons in the same layer share one intercept term b
        self.bias = bias if bias else random.random()
        self.neurons = []
        for i in range(num_neurons):
            self.neurons.append(Neuron(self.bias))

    def inspect(self):
        print('Neurons:', len(self.neurons))
        for n in range(len(self.neurons)):
            print(' Neuron', n)
            for w in range(len(self.neurons[n].weights)):
                print('  Weight:', self.neurons[n].weights[w])
            print('  Bias:', self.bias)

    def feed_forward(self, inputs):
        outputs = []
        for neuron in self.neurons:
            outputs.append(neuron.calculate_output(inputs))
        return outputs

    def get_outputs(self):
        outputs = []
        for neuron in self.neurons:
            outputs.append(neuron.output)
        return outputs


class Neuron:
    def __init__(self, bias):
        self.bias = bias
        self.weights = []

    def calculate_output(self, inputs):
        self.inputs = inputs
        self.output = self.squash(self.calculate_total_net_input())
        return self.output

    def calculate_total_net_input(self):
        total = 0
        for i in range(len(self.inputs)):
            total += self.inputs[i] * self.weights[i]
        return total + self.bias

    # sigmoid activation function
    def squash(self, total_net_input):
        return 1 / (1 + math.exp(-total_net_input))

    # dE/dz = dE/dy * dy/dz
    def calculate_pd_error_wrt_total_net_input(self, target_output):
        return self.calculate_pd_error_wrt_output(target_output) * self.calculate_pd_total_net_input_wrt_input()

    # each neuron's error is measured with the squared-error formula
    def calculate_error(self, target_output):
        return 0.5 * (target_output - self.output) ** 2

    # dE/dy = -(target - output)
    def calculate_pd_error_wrt_output(self, target_output):
        return -(target_output - self.output)

    # dy/dz = output * (1 - output), the derivative of the sigmoid
    def calculate_pd_total_net_input_wrt_input(self):
        return self.output * (1 - self.output)

    # dz/dw = the input on that connection
    def calculate_pd_total_net_input_wrt_weight(self, index):
        return self.inputs[index]


# The example from this article (the hidden-layer intercept is assumed to be b1 = 0.35,
# which reproduces the numbers quoted above):
nn = NeuralNetwork(2, 2, 2,
                   hidden_layer_weights=[0.15, 0.2, 0.25, 0.3], hidden_layer_bias=0.35,
                   output_layer_weights=[0.4, 0.45, 0.5, 0.55], output_layer_bias=0.6)
for i in range(10000):
    nn.train([0.05, 0.1], [0.01, 0.99])
    print(i, round(nn.calculate_total_error([[[0.05, 0.1], [0.01, 0.99]]]), 9))


# Another example; comment out the example above and run this instead:
# training_sets = [
#     [[0, 0], [0]],
#     [[0, 1], [1]],
#     [[1, 0], [1]],
#     [[1, 1], [0]]
# ]
#
# nn = NeuralNetwork(len(training_sets[0][0]), 5, len(training_sets[0][1]))
# for i in range(10000):
#     training_inputs, training_outputs = random.choice(training_sets)
#     nn.train(training_inputs, training_outputs)
#     print(i, nn.calculate_total_error(training_sets))

  

That's it for this post. I haven't used LaTeX to typeset the math here; I originally wanted to write the formulas on paper and scan them in, but that would hurt the reading experience too much, so I will re-edit the formulas with an equation editor later. The sigmoid activation function is used throughout this article, but there are several other activation functions to choose from; see [3] for details. Finally, I recommend an online neural-network demo at http://www.emergentmind.com/neural-network, where you can fill in the inputs and outputs and watch the weights change at every iteration. It's a lot of fun. If there are mistakes or anything is unclear, feel free to leave a comment :)

References:

1. Poll's notes: [Machine Learning & Algorithm] Neural Network Fundamentals (http://www.cnblogs.com/maybe2030/p/5597716.html#3457159)

2. Rachel_Zhang: http://blog.csdn.net/abcjennifer/article/details/7758797

3.http://www.cedar.buffalo.edu/%7esrihari/cse574/chap5/chap5.3-backprop.pdf

4.https://mattmazur.com/2015/03/17/a-step-by-step-backpropagation-example/
