Stanford UFLDL Tutorial: Deriving Gradients Using the Backpropagation Idea

Contents
1 Introduction
2 Examples
2.1 Example 1: Objective function for the weight matrix in sparse coding
2.2 Example 2: Smoothed topographic L1 sparsity penalty in sparse coding
2.3 Example 3: ICA reconstruction cost

In the section on the backpropagation algorithm, we introduced the use of backpropagation to compute the gradients needed by the sparse autoencoder. It turns out that combining backpropagation with matrix calculus gives a powerful and intuitive method for computing the gradients of more complex matrix functions (functions mapping a matrix to a real number, or symbolically, functions from ℝ^(r×c) to ℝ).


First, let us recall the backpropagation idea, presented here in a form slightly modified to suit our purposes:

1. For each output unit i in layer n_l (the last layer), set

   δ_i^(n_l) = ∂J(z^(n_l)) / ∂z_i^(n_l)

   where J(z) is our "objective function" (explained below).

2. For l = n_l − 1, n_l − 2, ..., 1, and for each node i in layer l, set

   δ_i^(l) = ( Σ_j W_ji^(l) δ_j^(l+1) ) · f'(z_i^(l))

3. Compute the desired partial derivatives: the gradient with respect to the weights between layers l and l+1 is δ^(l+1) (a^(l))^T, and the gradient with respect to the input of layer l is δ^(l).


Notation recap:

- n_l is the number of layers in the neural network
- W_ji^(l) is the weight from the i-th node in layer l to the j-th node in layer l+1
- z_i^(l) is the input to the i-th unit in layer l
- a_i^(l) is the activation of the i-th unit in layer l
- A ∙ B is the Hadamard (element-wise) product: for matrices A and B of the same size, their product is the matrix C = A ∙ B with C_rc = A_rc · B_rc
- f^(l) is the activation function applied to each unit in layer l

Suppose we have a function F that takes a matrix X as its argument and produces a real number. We would like to use the backpropagation idea to compute the gradient of F with respect to X, that is, ∇_X F. The general idea is to view the function F as a multi-layer neural network and to derive the gradient using backpropagation.

To carry this out, we choose the objective function J(z) so that, when applied to the outputs of the neurons in the last layer, it yields the value F(X). For the intermediate layers, we likewise choose the activation functions f^(l) to this end.

As we will see below, this method makes it easy to compute derivatives with respect to the input X as well as with respect to any of the weights in the network.
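Before turning to the examples, here is a minimal sketch of how the recipe above could be organized in code. It is an illustration under assumed conventions (0-indexed Python lists of per-layer weights and activation functions), not code from the tutorial.

```python
import numpy as np

# Minimal sketch of the modified backpropagation recipe (illustrative only).
# weights[l] maps layer l+1 to layer l+2 (0-indexed lists);
# acts[l] / dacts[l] are the activation function of layer l+1 and its derivative.

def forward(x, weights, acts):
    """Return the inputs z^(l) and activations a^(l) of every layer."""
    zs, activs = [x], [acts[0](x)]
    for W, f in zip(weights, acts[1:]):
        zs.append(W @ activs[-1])
        activs.append(f(zs[-1]))
    return zs, activs

def deltas(zs, weights, dacts, dJ_dz_top):
    """Backpropagate: delta^(l) = (W^(l)T delta^(l+1)) * f'(z^(l))."""
    ds = [dJ_dz_top]  # delta at the last layer, i.e. dJ/dz^(n_l)
    for W, fp, z in zip(reversed(weights), reversed(dacts[:-1]), reversed(zs[:-1])):
        ds.append((W.T @ ds[-1]) * fp(z))
    return list(reversed(ds))
```

The gradients then follow from the deltas as in step 3: the gradient with respect to the weights between layers l and l+1 is δ^(l+1) (a^(l))^T, and the gradient with respect to the input is δ^(1) when the first layer uses the identity activation.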


Examples

To illustrate the use of the backpropagation idea to compute derivatives with respect to the inputs, we use two functions from the sparse coding section in Examples 1 and 2. In Example 3, we use a function from the independent component analysis section to illustrate how to use this idea to compute derivatives with respect to the weights, and, in this particular case, how to handle weights that are tied or repeated.


Example 1: Objective function for the weight matrix in sparse coding

Recall from sparse coding that, given a feature matrix s, the objective function for the weight matrix A is:

F(A) = ||As − x||_2^2 + γ||A||_2^2

We would like to find the gradient of F with respect to A, that is, ∇_A F. Since the objective function is the sum of two terms in A, its gradient is the sum of the gradients of the two terms. The gradient of the second term is easy to find, so we consider only the gradient of the first term.


The first term, ||As − x||_2^2, can be seen as an instance of a neural network that takes s as its input and proceeds in four steps:

1. Apply A as the weights from the first layer to the second layer.
2. Subtract x from the activation of the second layer, which uses the identity activation function.
3. Pass the result unchanged to the third layer via identity weights, and use the square function as the activation function of the third layer.
4. Sum all the activations of the third layer.


The weights and activation functions of this network are as follows:

Layer | Weight       | Activation function f
1     | A            | f(z_i) = z_i (identity)
2     | I (identity) | f(z_i) = z_i − x_i
3     | N/A          | f(z_i) = z_i^2

To have J(z^(3)) = F(x), we can set J(z^(3)) = Σ_k (z_k^(3))^2.
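As a quick sanity check that this construction really computes F, the forward pass can be written out directly. The sketch below is illustrative (the shapes and random values are assumptions), not code from the tutorial.

```python
import numpy as np

# Forward pass of the network above: weights A into layer 2, subtract x,
# square, then sum the third-layer activations.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))   # assumed shapes
s = rng.standard_normal((3, 1))
x = rng.standard_normal((5, 1))

a2 = A @ s - x                    # layer 2: identity activation minus x
J = np.sum(a2 ** 2)               # layer 3: square, then sum
print(np.isclose(J, np.linalg.norm(A @ s - x) ** 2))   # equals ||As - x||^2
```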

Once we see F as a neural network, the gradient becomes easy to compute by applying backpropagation:

Layer | Derivative of activation function f' | Delta            | Input z to this layer
3     | f'(z_i) = 2z_i                       | f'(z_i) = 2z_i   | As − x
2     | f'(z_i) = 1                          | (I^T δ^(3)) ∙ 1  | As
1     | f'(z_i) = 1                          | (A^T δ^(2)) ∙ 1  | s


Hence,

∇_X F = A^T δ^(2) = 2A^T (As − x)
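The deltas in the table can also be checked numerically. The following sketch (shapes and values assumed, not code from the tutorial) verifies the backpropagated gradient with respect to the input s, and, using step 3 of the recipe, the gradient of this first term with respect to the weights A.

```python
import numpy as np

# Gradients of ||As - x||^2 via the deltas in the table above (illustrative).
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))
s = rng.standard_normal((3, 1))
x = rng.standard_normal((5, 1))

delta3 = 2 * (A @ s - x)      # delta^(3) = 2 z^(3), with z^(3) = As - x
delta2 = delta3               # identity weights and f'(z) = 1
grad_s = A.T @ delta2         # gradient w.r.t. the input s: 2 A^T (As - x)
grad_A = delta2 @ s.T         # gradient of this term w.r.t. A: 2 (As - x) s^T

def F(A, s):
    return float(np.sum((A @ s - x) ** 2))

h = 1e-6
s_p = s.copy(); s_p[1, 0] += h
A_p = A.copy(); A_p[2, 1] += h
print(grad_s[1, 0], (F(A, s_p) - F(A, s)) / h)   # should agree closely
print(grad_A[2, 1], (F(A_p, s) - F(A, s)) / h)   # should agree closely
```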


Example 2: Smoothed topographic L1 sparsity penalty in sparse coding

Recall from the sparse coding section the smoothed topographic L1 sparsity penalty on s:

Σ √(Vss^T + ε)

where V is the grouping matrix, s is the feature matrix, and ε is a constant.

We would like to find ∇_s Σ √(Vss^T + ε). As above, we view this term as an instance of a neural network:


The weights and activation functions of this network are as follows:

Layer | Weight | Activation function f
1     | I      | f(z_i) = z_i^2
2     | V      | f(z_i) = z_i
3     | I      | f(z_i) = z_i + ε
4     | N/A    | f(z_i) = √z_i


To have J(z^(4)) = F(x), we can set J(z^(4)) = Σ_k √(z_k^(4)).

Once we see F as a neural network, the gradient becomes easy to compute by applying backpropagation:

Layer | Derivative of activation function f' | Delta                       | Input z to this layer
4     | f'(z_i) = (1/2) z_i^(−1/2)           | f'(z_i) = (1/2) z_i^(−1/2)  | Vss^T + ε
3     | f'(z_i) = 1                          | (I^T δ^(4)) ∙ 1             | Vss^T
2     | f'(z_i) = 1                          | (V^T δ^(3)) ∙ 1             | ss^T
1     | f'(z_i) = 2z_i                       | (I^T δ^(2)) ∙ 2s            | s


Hence,

∇_X F = δ^(1) = ( V^T ( (1/2)(Vss^T + ε)^(−1/2) ) ) ∙ 2s = ( V^T (Vss^T + ε)^(−1/2) ) ∙ s
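A small numerical sketch may make this concrete. One concrete reading of the term Vss^T, used here purely as an assumption, is elementwise: V applied to the elementwise square of s. Under that reading the backpropagated gradient above can be checked against a finite-difference estimate; this is illustrative, not code from the tutorial.

```python
import numpy as np

# Hedged sketch: smoothed topographic L1 penalty read elementwise as
# F(s) = sum( sqrt( V @ (s * s) + eps ) )   (an assumption; see text above).
rng = np.random.default_rng(0)
V = rng.random((5, 8))            # grouping matrix (assumed shape, nonnegative)
s = rng.standard_normal((8, 3))   # feature matrix (assumed shape)
eps = 1e-2

def F(s):
    return float(np.sum(np.sqrt(V @ (s * s) + eps)))

# Deltas from the table above:
# delta^(4) = (1/2)(Vss^T + eps)^(-1/2), delta^(3) = delta^(4),
# delta^(2) = V^T delta^(3), delta^(1) = delta^(2) * 2s
delta4 = 0.5 / np.sqrt(V @ (s * s) + eps)
grad = (V.T @ delta4) * (2 * s)

# Finite-difference check of one entry
i, j, h = 2, 1, 1e-6
s_p = s.copy(); s_p[i, j] += h
print(grad[i, j], (F(s_p) - F(s)) / h)   # should agree closely
```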


Example 3: ICA reconstruction cost

Recall the reconstruction cost from the independent component analysis (ICA) section:

||W^T W x − x||_2^2

where W is the weight matrix and x is the input.

This time we would like to compute the derivative with respect to the weight matrix W, rather than with respect to the input as in the first two examples. We still proceed in a similar way, treating the expression as an instance of a neural network:


The weights and activation functions of this network are as follows:

Layer | Weight | Activation function f
1     | W      | f(z_i) = z_i
2     | W^T    | f(z_i) = z_i
3     | I      | f(z_i) = z_i − x_i
4     | N/A    | f(z_i) = z_i^2

To have J(z^(4)) = F(x), we can set J(z^(4)) = Σ_k (z_k^(4))^2.

Now that we see F as a neural network, we can compute the gradient. The difficulty is that W appears twice in the network. Fortunately, it turns out that if W appears multiple times in the network, the gradient with respect to W is simply the sum of the gradients with respect to each instance of W in the network (you should work out a rigorous proof of this fact to convince yourself). Knowing this, we first compute the deltas:

Layer | Derivative of activation function f' | Delta            | Input z to this layer
4     | f'(z_i) = 2z_i                       | f'(z_i) = 2z_i   | W^T W x − x
3     | f'(z_i) = 1                          | (I^T δ^(4)) ∙ 1  | W^T W x
2     | f'(z_i) = 1                          | (W δ^(3)) ∙ 1    | Wx
1     | f'(z_i) = 1                          | (W^T δ^(2)) ∙ 1  | x

To compute the gradient with respect to W, we first compute the gradient with respect to each instance of W in the network.

For the W^T instance (the weights from layer 2 to layer 3):

∇_(W^T) F = δ^(3) (a^(2))^T = 2(W^T W x − x)(Wx)^T

For the W instance (the weights from layer 1 to layer 2):

∇_W F = δ^(2) (a^(1))^T = 2W(W^T W x − x) x^T

Finally, we sum them to obtain the overall gradient with respect to W, noting that the gradient with respect to the W^T instance must be transposed to give a gradient with respect to W (apologies for the slight abuse of notation here):

∇_W F = 2(Wx)(W^T W x − x)^T + 2W(W^T W x − x) x^T
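As a sanity check, the summed gradient can be compared against a finite-difference estimate. The sketch below is illustrative (shapes and values are assumptions), not code from the tutorial.

```python
import numpy as np

# Gradient of F(W) = ||W^T W x - x||_2^2 with respect to W, obtained by
# summing the gradients of the two instances of W as above (illustrative).
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 6))   # assumed shape
x = rng.standard_normal((6, 1))

r = W.T @ W @ x - x               # residual W^T W x - x, so delta^(4) = 2r
grad_WT_instance = 2 * r @ (W @ x).T    # delta^(3) (a^(2))^T
grad_W_instance = 2 * (W @ r) @ x.T     # delta^(2) (a^(1))^T
grad_W = grad_WT_instance.T + grad_W_instance

def F(W):
    return float(np.sum((W.T @ W @ x - x) ** 2))

i, j, h = 1, 2, 1e-6
W_p = W.copy(); W_p[i, j] += h
print(grad_W[i, j], (F(W_p) - F(W)) / h)   # should agree closely
```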


