Stanford UFLDL Tutorial: Deriving Gradients Using the Backpropagation Idea

Contents
1 Introduction
2 Examples
2.1 Example 1: Objective function for the weight matrix in sparse coding
2.2 Example 2: Smoothed topographic L1 sparsity penalty in sparse coding
2.3 Example 3: ICA reconstruction cost

In the section on the backpropagation algorithm, we introduced the use of backpropagation to compute the gradients needed by the sparse autoencoder. It turns out that combining backpropagation with matrix calculus gives a powerful and intuitive method for computing the gradients of more complex matrix functions (functions mapping a matrix to a real number, or symbolically, functions from ℝ^(r×c) to ℝ).


First, let us recall the backpropagation idea, presented here in a form slightly modified to suit our purposes:

1. For each output unit i in layer n_l (the last layer), set

   δ_i^(n_l) = ∂J(z^(n_l)) / ∂z_i^(n_l)

   where J(z) is our "objective function" (explained below).

2. For l = n_l − 1, n_l − 2, ..., 1, and for each node i in layer l, set

   δ_i^(l) = ( Σ_j W_ji^(l) δ_j^(l+1) ) · f'(z_i^(l))

3. Compute the desired partial derivatives: the gradient with respect to the weights between layers l and l+1 is δ^(l+1) (a^(l))^T, and the gradient with respect to the input of layer l is δ^(l).


Notation recap:

- n_l is the number of layers in the neural network
- W_ji^(l) is the weight from the i-th node in layer l to the j-th node in layer l+1
- z_i^(l) is the input to the i-th unit in layer l
- a_i^(l) is the activation of the i-th unit in layer l
- A ∙ B is the Hadamard (element-wise) product: for matrices A and B of the same size, their product is the matrix C = A ∙ B with C_rc = A_rc · B_rc
- f^(l) is the activation function applied to each unit in layer l

Suppose we have a function F that takes a matrix X as its argument and produces a real number. We would like to use the backpropagation idea to compute the gradient of F with respect to X, that is, ∇_X F. The general idea is to view the function F as a multi-layer neural network and to derive the gradient using backpropagation.

To carry this out, we choose the objective function J(z) so that, when applied to the outputs of the neurons in the last layer, it yields the value F(X). For the intermediate layers, we likewise choose the activation functions f^(l) to this end.

As we will see below, this method makes it easy to compute derivatives with respect to the input X as well as with respect to any of the weights in the network.
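Before turning to the examples, here is a minimal sketch of how the recipe above could be organized in code. It is an illustration under assumed conventions (0-indexed Python lists of per-layer weights and activation functions), not code from the tutorial.

```python
import numpy as np

# Minimal sketch of the modified backpropagation recipe (illustrative only).
# weights[l] maps layer l+1 to layer l+2 (0-indexed lists);
# acts[l] / dacts[l] are the activation function of layer l+1 and its derivative.

def forward(x, weights, acts):
    """Return the inputs z^(l) and activations a^(l) of every layer."""
    zs, activs = [x], [acts[0](x)]
    for W, f in zip(weights, acts[1:]):
        zs.append(W @ activs[-1])
        activs.append(f(zs[-1]))
    return zs, activs

def deltas(zs, weights, dacts, dJ_dz_top):
    """Backpropagate: delta^(l) = (W^(l)T delta^(l+1)) * f'(z^(l))."""
    ds = [dJ_dz_top]  # delta at the last layer, i.e. dJ/dz^(n_l)
    for W, fp, z in zip(reversed(weights), reversed(dacts[:-1]), reversed(zs[:-1])):
        ds.append((W.T @ ds[-1]) * fp(z))
    return list(reversed(ds))
```

The gradients then follow from the deltas as in step 3: the gradient with respect to the weights between layers l and l+1 is δ^(l+1) (a^(l))^T, and the gradient with respect to the input is δ^(1) when the first layer uses the identity activation.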


Examples

To illustrate the use of the backpropagation idea to compute derivatives with respect to the inputs, we use two functions from the sparse coding section in Examples 1 and 2. In Example 3, we use a function from the independent component analysis section to illustrate how to use this idea to compute derivatives with respect to the weights, and, in this particular case, how to handle weights that are tied or repeated.


Example 1: Objective function for the weight matrix in sparse coding

Recall from sparse coding that, given a feature matrix s, the objective function for the weight matrix A is:

F(A) = ||As − x||_2^2 + γ||A||_2^2

We would like to find the gradient of F with respect to A, that is, ∇_A F. Since the objective function is the sum of two terms in A, its gradient is the sum of the gradients of the two terms. The gradient of the second term is easy to find, so we consider only the gradient of the first term.


The first term, ||As − x||_2^2, can be seen as an instance of a neural network that takes s as its input and proceeds in four steps:

1. Apply A as the weights from the first layer to the second layer.
2. Subtract x from the activation of the second layer, which uses the identity activation function.
3. Pass the result unchanged to the third layer via identity weights, and use the square function as the activation function of the third layer.
4. Sum all the activations of the third layer.


The weights and activation functions of this network are as follows:

Layer | Weight       | Activation function f
1     | A            | f(z_i) = z_i (identity)
2     | I (identity) | f(z_i) = z_i − x_i
3     | N/A          | f(z_i) = z_i^2

To have J(z^(3)) = F(x), we can set J(z^(3)) = Σ_k (z_k^(3))^2.
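As a quick sanity check that this construction really computes F, the forward pass can be written out directly. The sketch below is illustrative (the shapes and random values are assumptions), not code from the tutorial.

```python
import numpy as np

# Forward pass of the network above: weights A into layer 2, subtract x,
# square, then sum the third-layer activations.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))   # assumed shapes
s = rng.standard_normal((3, 1))
x = rng.standard_normal((5, 1))

a2 = A @ s - x                    # layer 2: identity activation minus x
J = np.sum(a2 ** 2)               # layer 3: square, then sum
print(np.isclose(J, np.linalg.norm(A @ s - x) ** 2))   # equals ||As - x||^2
```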

Once we see F as a neural network, the gradient becomes easy to compute by applying backpropagation:

Layer | Derivative of activation function f' | Delta            | Input z to this layer
3     | f'(z_i) = 2z_i                       | f'(z_i) = 2z_i   | As − x
2     | f'(z_i) = 1                          | (I^T δ^(3)) ∙ 1  | As
1     | f'(z_i) = 1                          | (A^T δ^(2)) ∙ 1  | s


Hence,

∇_X F = A^T δ^(2) = 2A^T (As − x)
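The deltas in the table can also be checked numerically. The following sketch (shapes and values assumed, not code from the tutorial) verifies the backpropagated gradient with respect to the input s, and, using step 3 of the recipe, the gradient of this first term with respect to the weights A.

```python
import numpy as np

# Gradients of ||As - x||^2 via the deltas in the table above (illustrative).
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
s = rng.standard_normal((3, 1))
x = rng.standard_normal((5, 1))

delta3 = 2 * (A @ s - x)      # delta^(3) = 2 z^(3), with z^(3) = As - x
delta2 = delta3               # identity weights and f'(z) = 1
grad_s = A.T @ delta2         # gradient w.r.t. the input s: 2 A^T (As - x)
grad_A = delta2 @ s.T         # gradient of this term w.r.t. A: 2 (As - x) s^T

def F(A, s):
    return float(np.sum((A @ s - x) ** 2))

h = 1e-6
s_p = s.copy(); s_p[1, 0] += h
A_p = A.copy(); A_p[2, 1] += h
print(grad_s[1, 0], (F(A, s_p) - F(A, s)) / h)   # should agree closely
print(grad_A[2, 1], (F(A_p, s) - F(A, s)) / h)   # should agree closely
```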


Example 2: Smoothed topographic L1 sparsity penalty in sparse coding

Recall from the sparse coding section the smoothed topographic L1 sparsity penalty on s:

Σ √(Vss^T + ε)

where V is the grouping matrix, s is the feature matrix, and ε is a constant.

We would like to find ∇_s Σ √(Vss^T + ε). As above, we view this term as an instance of a neural network:


The weights and activation functions of this network are as follows:

Layer | Weight | Activation function f
1     | I      | f(z_i) = z_i^2
2     | V      | f(z_i) = z_i
3     | I      | f(z_i) = z_i + ε
4     | N/A    | f(z_i) = √z_i


To have J(z^(4)) = F(x), we can set J(z^(4)) = Σ_k √(z_k^(4)).

Once we see F as a neural network, the gradient becomes easy to compute by applying backpropagation:

Layer | Derivative of activation function f' | Delta                       | Input z to this layer
4     | f'(z_i) = (1/2) z_i^(−1/2)           | f'(z_i) = (1/2) z_i^(−1/2)  | Vss^T + ε
3     | f'(z_i) = 1                          | (I^T δ^(4)) ∙ 1             | Vss^T
2     | f'(z_i) = 1                          | (V^T δ^(3)) ∙ 1             | ss^T
1     | f'(z_i) = 2z_i                       | (I^T δ^(2)) ∙ 2s            | s


Hence,

∇_X F = δ^(1) = ( V^T ( (1/2)(Vss^T + ε)^(−1/2) ) ) ∙ 2s = ( V^T (Vss^T + ε)^(−1/2) ) ∙ s
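A small numerical sketch may make this concrete. One concrete reading of the term Vss^T, used here purely as an assumption, is elementwise: V applied to the elementwise square of s. Under that reading the backpropagated gradient above can be checked against a finite-difference estimate; this is illustrative, not code from the tutorial.

```python
import numpy as np

# Hedged sketch: smoothed topographic L1 penalty read elementwise as
# F(s) = sum( sqrt( V @ (s * s) + eps ) )   (an assumption; see text above).
rng = np.random.default_rng(0)
V = rng.random((5, 8))            # grouping matrix (assumed shape, nonnegative)
s = rng.standard_normal((8, 3))   # feature matrix (assumed shape)
eps = 1e-2

def F(s):
    return float(np.sum(np.sqrt(V @ (s * s) + eps)))

# Deltas from the table above:
# delta^(4) = (1/2)(Vss^T + eps)^(-1/2), delta^(3) = delta^(4),
# delta^(2) = V^T delta^(3), delta^(1) = delta^(2) * 2s
delta4 = 0.5 / np.sqrt(V @ (s * s) + eps)
grad = (V.T @ delta4) * (2 * s)

# Finite-difference check of one entry
i, j, h = 2, 1, 1e-6
s_p = s.copy(); s_p[i, j] += h
print(grad[i, j], (F(s_p) - F(s)) / h)   # should agree closely
```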


Example 3: ICA reconstruction cost

Recall the reconstruction cost from the independent component analysis (ICA) section:

||W^T W x − x||_2^2

where W is the weight matrix and x is the input.

This time we would like to compute the derivative with respect to the weight matrix W, rather than with respect to the input as in the first two examples. We still proceed in a similar way, treating the expression as an instance of a neural network:


The weights and activation functions of this network are as follows:

Layer | Weight | Activation function f
1     | W      | f(z_i) = z_i
2     | W^T    | f(z_i) = z_i
3     | I      | f(z_i) = z_i − x_i
4     | N/A    | f(z_i) = z_i^2

To have J(z^(4)) = F(x), we can set J(z^(4)) = Σ_k (z_k^(4))^2.

Now that we see F as a neural network, we can compute the gradient. The difficulty is that W appears twice in the network. Fortunately, it turns out that if W appears multiple times in the network, the gradient with respect to W is simply the sum of the gradients with respect to each instance of W in the network (you should work out a rigorous proof of this fact to convince yourself). Knowing this, we first compute the deltas:

Layer | Derivative of activation function f' | Delta            | Input z to this layer
4     | f'(z_i) = 2z_i                       | f'(z_i) = 2z_i   | W^T W x − x
3     | f'(z_i) = 1                          | (I^T δ^(4)) ∙ 1  | W^T W x
2     | f'(z_i) = 1                          | (W δ^(3)) ∙ 1    | Wx
1     | f'(z_i) = 1                          | (W^T δ^(2)) ∙ 1  | x

To compute the gradient with respect to W, we first compute the gradient with respect to each instance of W in the network.

For the W^T instance (the weights from layer 2 to layer 3):

∇_(W^T) F = δ^(3) (a^(2))^T = 2(W^T W x − x)(Wx)^T

For the W instance (the weights from layer 1 to layer 2):

∇_W F = δ^(2) (a^(1))^T = 2W(W^T W x − x) x^T

Finally, we sum them to obtain the overall gradient with respect to W, noting that the gradient with respect to the W^T instance must be transposed to give a gradient with respect to W (apologies for the slight abuse of notation here):

∇_W F = 2(Wx)(W^T W x − x)^T + 2W(W^T W x − x) x^T
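As a sanity check, the summed gradient can be compared against a finite-difference estimate. The sketch below is illustrative (shapes and values are assumptions), not code from the tutorial.

```python
import numpy as np

# Gradient of F(W) = ||W^T W x - x||_2^2 with respect to W, obtained by
# summing the gradients of the two instances of W as above (illustrative).
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 6))   # assumed shape
x = rng.standard_normal((6, 1))

r = W.T @ W @ x - x               # residual W^T W x - x, so delta^(4) = 2r
grad_WT_instance = 2 * r @ (W @ x).T    # delta^(3) (a^(2))^T
grad_W_instance = 2 * (W @ r) @ x.T     # delta^(2) (a^(1))^T
grad_W = grad_WT_instance.T + grad_W_instance

def F(W):
    return float(np.sum((W.T @ W @ x - x) ** 2))

i, j, h = 1, 2, 1e-6
W_p = W.copy(); W_p[i, j] += h
print(grad_W[i, j], (F(W_p) - F(W)) / h)   # should agree closely
```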


