1. Neural network
Below is the basic composition of a common three-layer neural network: layer L1 is the input layer, layer L2 is the hidden layer, and layer L3 is the output layer. When we feed in data such as x1, x2, x3, the hidden layer computes and transforms it and produces the output you expect. When the input and the output are the same, the model becomes an auto-encoder; when the input and output differ, we have what we usually call an artificial neural network.
2. How propagation is calculated
First we build a simple network as an example.
This network has three layers:
First layer, the input layer: neurons i1, i2; intercept (bias) b1; weights w1, w2, w3, w4 connect it to the hidden layer.
Second layer, the hidden layer: neurons h1, h2; intercept b2; weights w5, w6, w7, w8 connect it to the output layer.
Third layer, the output layer: neurons o1, o2.
We use the sigmoid function, out = 1/(1+e^(-net)), as the activation function.
Suppose the input data is i1 = 0.02, i2 = 0.04, the intercepts are b1 = 0.4, b2 = 0.7, and the expected outputs are o1 = 0.5 and o2 = 0.9.
The unknowns are the weights w1, w2, w3, w4, w5, w6, w7, w8.
Our aim is to calculate weight values w1, w2, w3, ..., w8 that produce the desired outputs o1 = 0.5 and o2 = 0.9.
First we construct initial values for w1, w2, w3, ..., w8, then obtain the best weights by iterative calculation.
The initial values of the weights:
w1=0.25
w2=0.25
w3=0.15
w4=0.20
w5=0.30
w6=0.35
w7=0.40
w8=0.35
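To make the arithmetic below easy to check, here is a minimal Python sketch that sets up this example network. The variable names follow the text; the sigmoid helper and the plain-Python style are our own illustrative choices, not part of the original.

```python
import math

# Inputs, intercepts (biases), and expected outputs from the example
i1, i2 = 0.02, 0.04
b1, b2 = 0.4, 0.7
target_o1, target_o2 = 0.5, 0.9

# Initial weights
w1, w2, w3, w4 = 0.25, 0.25, 0.15, 0.20
w5, w6, w7, w8 = 0.30, 0.35, 0.40, 0.35

def sigmoid(net):
    """Activation function: squashes a net input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-net))
```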
2.1 Forward Propagation
2.1.1 Input layer to hidden layer
net(h1) = w1*i1 + w2*i2 + b1 = 0.25*0.02 + 0.25*0.04 + 0.4 = 0.005 + 0.01 + 0.4 = 0.415
The neuron h1 passes net(h1) through the sigmoid activation function to produce its output out(h1):
out(h1) = 1/(1 + e^(-net(h1))) = 1/(1 + 0.660340281) = 0.602286177
Similarly, we can get the value of out(h2):
net(h2) = w3*i1 + w4*i2 + b1 = 0.15*0.02 + 0.20*0.04 + 0.4 = 0.003 + 0.008 + 0.4 = 0.411
out(h2) = 1/(1 + e^(-net(h2))) = 1/(1 + 0.662986932) = 0.601327636
2.1.2 Hidden layer to output layer
Compute the values of the output-layer neurons o1 and o2; the calculation method is the same as from the input layer to the hidden layer.
net(o1) = w5*out(h1) + w6*out(h2) + b2 = 0.3*0.602286177 + 0.35*0.601327636 + 0.7 = 0.180685853 + 0.210464672 + 0.7 = 1.091150525
out(o1) = 1/(1 + e^(-net(o1))) = 1/(1 + 0.335829891) = 0.748598311
Similarly:
net(o2) = w7*out(h1) + w8*out(h2) + b2 = 0.4*0.602286177 + 0.35*0.601327636 + 0.7 = 0.240914471 + 0.210464672 + 0.7 = 1.151379143
out(o2) = 1/(1 + e^(-net(o2))) = 1/1.316200383 = 0.759762733
The outputs o1 = 0.748598311 and o2 = 0.759762733 are still far from our desired o1 = 0.5 and o2 = 0.9.
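Continuing the sketch above, the whole forward pass fits in a few lines; running it reproduces the numbers just computed (the function name `forward` is our own):

```python
def forward(w1, w2, w3, w4, w5, w6, w7, w8):
    """Forward pass: input layer -> hidden layer -> output layer."""
    out_h1 = sigmoid(w1 * i1 + w2 * i2 + b1)          # 0.602286177
    out_h2 = sigmoid(w3 * i1 + w4 * i2 + b1)          # 0.601327636
    out_o1 = sigmoid(w5 * out_h1 + w6 * out_h2 + b2)  # 0.748598311
    out_o2 = sigmoid(w7 * out_h1 + w8 * out_h2 + b2)  # 0.759762733
    return out_h1, out_h2, out_o1, out_o2
```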
2.2 Calculating the total error
Formula:
E(total) = Σ (1/2)*(target - output)^2
That is, we calculate the error of each output and sum them:
E(total) = E(o1) + E(o2) = (1/2)*(0.5 - 0.748598311)^2 + (1/2)*(0.9 - 0.759762733)^2 = 0.030900560 + 0.009833246 = 0.040733806
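The same error calculation as a sketch, reusing the forward pass above (the helper name is ours):

```python
def total_error(out_o1, out_o2):
    """Sum of the squared errors of both outputs, each scaled by 1/2."""
    e_o1 = 0.5 * (target_o1 - out_o1) ** 2  # 0.030900560
    e_o2 = 0.5 * (target_o2 - out_o2) ** 2  # 0.009833246
    return e_o1 + e_o2                      # 0.040733806
```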
2.3 Backpropagation
We now need to work out the impact of each weight on the error; backpropagation transmits the error backwards through the network, layer by layer.
2.3.1 Weights from the hidden layer to the output layer
The weights from the hidden layer to the output layer are w5, w6, w7, w8 in the example above.
We take the weight w6 as an example and calculate how much w6 influences the overall error by taking the partial derivative of the total error with respect to w6: ∂E(total)/∂w6.
Obviously E(total) has no formula written directly in terms of w6; w6 only appears in the formula for net(o1).
But according to the chain rule for partial derivatives, we can multiply the derivatives that we do have along the chain:
∂E(total)/∂w6 = ∂E(total)/∂out(o1) * ∂out(o1)/∂net(o1) * ∂net(o1)/∂w6
Let's calculate each partial derivative in turn.
Calculating ∂E(total)/∂out(o1):
E(total) = (1/2)*(target(o1) - out(o1))^2 + (1/2)*(target(o2) - out(o2))^2
This is the derivative of a composite function. If f and g are two differentiable functions of x, the derivative of the composite function (f∘g)(x) is (f∘g)'(x) = f'(g(x)) * g'(x).
Here g = target(o1) - out(o1) and f(g) = (1/2)*g^2, so differentiating with respect to out(o1) gives g' = -1 and f'(g) = g, and therefore
∂E(total)/∂out(o1) = -(target(o1) - out(o1)) = -(0.5 - 0.748598311) = 0.248598311
Calculating ∂out(o1)/∂net(o1):
We know out(o1) = 1/(1 + e^(-net(o1))).
Let's derive ∂out(o1)/∂net(o1), again using the derivative of a composite function:
∂out(o1)/∂net(o1) = e^(-net(o1)) / (1 + e^(-net(o1)))^2 = out(o1)*(1 - out(o1))
The result of the final derivation:
∂out(o1)/∂net(o1) = 0.748598311*(1 - 0.748598311) = 0.748598311*0.251401689 = 0.18819888
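A quick numerical sanity check of this derivative, using the sigmoid helper from the setup sketch (the finite-difference step eps is our own choice):

```python
net = 1.091150525  # net(o1) from the forward pass
eps = 1e-6
analytic = sigmoid(net) * (1 - sigmoid(net))                     # 0.18819888
numeric = (sigmoid(net + eps) - sigmoid(net - eps)) / (2 * eps)
print(analytic, numeric)  # both print ~0.188199
```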
Calculating ∂net(o1)/∂w6:
net(o1) = w5*out(h1) + w6*out(h2) + b2
That is, ∂net(o1)/∂w6 = out(h2) = 0.601327636
Finally, our formula multiplies the three parts together:
∂E(total)/∂w6 = ∂E(total)/∂out(o1) * ∂out(o1)/∂net(o1) * ∂net(o1)/∂w6
= 0.248598311 * 0.18819888 * 0.601327636 = 0.028133669
2.3.1.1 Updating the weight w6
w6_new = w6 - x * ∂E(total)/∂w6
where x is what we usually call the learning rate. Setting the learning rate x to 0.1, the new value of w6 is
w6_new = 0.35 - 0.1*0.028133669 = 0.347186633
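The same chain-rule product and update in code, continuing the sketch (the intermediate variable names are ours):

```python
out_h1, out_h2, out_o1, out_o2 = forward(w1, w2, w3, w4, w5, w6, w7, w8)

# Chain rule: dE/dw6 = dE/dout(o1) * dout(o1)/dnet(o1) * dnet(o1)/dw6
dE_dout_o1 = -(target_o1 - out_o1)                    # 0.248598311
dout_o1_dnet_o1 = out_o1 * (1 - out_o1)               # 0.18819888
dnet_o1_dw6 = out_h2                                  # 0.601327636
dE_dw6 = dE_dout_o1 * dout_o1_dnet_o1 * dnet_o1_dw6   # 0.028133669

lr = 0.1                   # the learning rate x from the text
w6_new = w6 - lr * dE_dw6  # 0.347186633
```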
For the same reason, we can also calculate the new weights w5, w7, and w8.
But how do we calculate the new weights w1, w2, w3, w4?
2.3.2 Updating the weights from the input layer to the hidden layer
The approach is similar to the previous section.
Calculation formula:
∂E(total)/∂w1 = ∂E(total)/∂out(h1) * ∂out(h1)/∂net(h1) * ∂net(h1)/∂w1
2.3.2.1 Calculating ∂E(total)/∂out(h1)
out(h1) feeds into both o1 and o2, so E(total) does not depend on out(h1) through a single path; we need to split the total error into the two parts E(o1) and E(o2) and compute them separately.
The formula is as follows:
∂E(total)/∂out(h1) = ∂E(o1)/∂out(h1) + ∂E(o2)/∂out(h1)
Then expand the first term with the chain rule:
∂E(o1)/∂out(h1) = ∂E(o1)/∂out(o1) * ∂out(o1)/∂net(o1) * ∂net(o1)/∂out(h1)
= 0.248598311 * 0.18819888 * w5 = 0.046785924 * 0.3 = 0.014035777
Similarly, you can calculate
∂E(o2)/∂out(h1) = ∂E(o2)/∂out(o2) * ∂out(o2)/∂net(o2) * ∂net(o2)/∂out(h1)
= -(0.9 - 0.759762733) * 0.759762733*(1 - 0.759762733) * w7
= -0.140237267 * 0.182523322 * 0.4 = -0.010238629
Adding the two parts:
∂E(total)/∂out(h1) = 0.014035777 - 0.010238629 = 0.003797148
2.3.2.2 Calculating ∂out(h1)/∂net(h1)
∂out(h1)/∂net(h1) = out(h1)*(1 - out(h1)) = 0.602286177*(1 - 0.602286177) = 0.239537538
2.3.2.3 Calculating ∂net(h1)/∂w1
net(h1) = w1*i1 + w2*i2 + b1, so ∂net(h1)/∂w1 = i1 = 0.02
The last three results are multiplied together:
∂E(total)/∂w1 = 0.003797148 * 0.239537538 * 0.02 = 0.000018191
2.3.2.4 The overall formula
According to the previous steps, we can deduce the final formula as one expanded chain:
∂E(total)/∂w1 = (∂E(o1)/∂out(h1) + ∂E(o2)/∂out(h1)) * ∂out(h1)/∂net(h1) * ∂net(h1)/∂w1
2.3.2.5 Updating the weight w1
The update is the same as for the weight w6:
w1_new = w1 - x * ∂E(total)/∂w1
With the learning rate x = 0.1, the new value of w1 is
w1_new = 0.25 - 0.1*0.000018191 = 0.249998181
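A sketch of the same computation for w1, continuing the code above; it mirrors the w6 calculation but first sums the error signal flowing back from both outputs:

```python
# dE/dout(h1) sums the contributions through both output neurons
delta_o1 = dE_dout_o1 * dout_o1_dnet_o1                   # 0.046785924
delta_o2 = -(target_o2 - out_o2) * out_o2 * (1 - out_o2)  # -0.025596572
dE_dout_h1 = delta_o1 * w5 + delta_o2 * w7                # 0.003797148

dout_h1_dnet_h1 = out_h1 * (1 - out_h1)                   # 0.239537538
dnet_h1_dw1 = i1                                          # 0.02
dE_dw1 = dE_dout_h1 * dout_h1_dnet_h1 * dnet_h1_dw1       # ~0.000018191

w1_new = w1 - lr * dE_dw1                                 # ~0.249998181
```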
3. Calculating the best weights
We iterate with the new weights, repeating the forward and backward passes a number of times until the outputs are close to the expected o1 = 0.5 and o2 = 0.9; the weights w1...w8 at that point are the weights we require.
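Putting it all together, here is a minimal training loop under the same assumptions (plain Python, all eight weights updated each step, our own function name; the number of steps is arbitrary). Each update applies exactly the formulas derived above:

```python
def train(weights, steps=10000, lr=0.1):
    """Iterate forward pass + gradient-descent updates on all 8 weights."""
    w1, w2, w3, w4, w5, w6, w7, w8 = weights
    for _ in range(steps):
        out_h1, out_h2, out_o1, out_o2 = forward(w1, w2, w3, w4, w5, w6, w7, w8)
        # Output-layer error signals: dE/dnet(o) = -(target - out) * out * (1 - out)
        d_o1 = -(target_o1 - out_o1) * out_o1 * (1 - out_o1)
        d_o2 = -(target_o2 - out_o2) * out_o2 * (1 - out_o2)
        # Hidden-layer error signals, summed over both output neurons
        d_h1 = (d_o1 * w5 + d_o2 * w7) * out_h1 * (1 - out_h1)
        d_h2 = (d_o1 * w6 + d_o2 * w8) * out_h2 * (1 - out_h2)
        # Gradient-descent updates
        w5, w6 = w5 - lr * d_o1 * out_h1, w6 - lr * d_o1 * out_h2
        w7, w8 = w7 - lr * d_o2 * out_h1, w8 - lr * d_o2 * out_h2
        w1, w2 = w1 - lr * d_h1 * i1, w2 - lr * d_h1 * i2
        w3, w4 = w3 - lr * d_h2 * i1, w4 - lr * d_h2 * i2
    return [w1, w2, w3, w4, w5, w6, w7, w8]
```

Calling train([w1, w2, w3, w4, w5, w6, w7, w8]) and then forward(*result) should drive out(o1) toward 0.5 and out(o2) toward 0.9.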