So far we have described supervised learning of neural networks, where the training samples are labeled. Now suppose we have a training set with no labels. An autoencoder is a neural network that uses backpropagation for unsupervised learning; the goal of learning is to make the output equal to the input. The following shows an autoencoder:
The autoencoder tries to learn a function $h_{W,b}(x) \approx x$. In other words, it tries to approximate the identity function, so that the output is very close to the input. For example, suppose the input $x$ consists of the gray values of a $10\times 10$ image (100 pixels in total), so $n = 100$, and the hidden layer has $s_2 = 50$ nodes; note that the output $y \in \mathbb{R}^{100}$ is also 100-dimensional. Because there are only 50 hidden-layer nodes, the network is forced to learn a compressed representation of the input: given only the vector of hidden-layer activations $a^{(2)} \in \mathbb{R}^{50}$, it must reconstruct the 100-pixel gray-value input. If the input were completely random, this compression task would be very difficult; but if the data has some structure, for example if some input features are correlated with each other, then the algorithm can discover those correlations. In effect, the autoencoder ends up learning a low-dimensional representation of the input.
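As a concrete illustration, here is a minimal numpy sketch of the forward pass of such an autoencoder; the 100/50 layer sizes match the example above, but the random initialization and variable names are illustrative assumptions, not code from the tutorial:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_input, n_hidden = 100, 50            # 100 gray-value pixels, 50 hidden units

# Randomly initialized parameters: encoder (W1, b1) and decoder (W2, b2)
W1 = rng.normal(scale=0.01, size=(n_hidden, n_input))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.01, size=(n_input, n_hidden))
b2 = np.zeros(n_input)

def forward(x):
    """Map a 100-pixel input to a 50-dimensional code and back to a 100-pixel reconstruction."""
    a2 = sigmoid(W1 @ x + b1)          # hidden activations: the compressed representation
    a3 = sigmoid(W2 @ a2 + b2)         # reconstruction of the input
    return a2, a3

x = rng.random(n_input)                # one fake "image" of gray values in [0, 1]
code, reconstruction = forward(x)
print(code.shape, reconstruction.shape)   # (50,) (100,)
```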
The discussion above relies on the number of hidden-layer nodes being small. However, even when the number of hidden nodes is large (perhaps larger than the number of input pixels), we can still discover interesting structure by imposing other constraints on the network. In particular, we can impose a sparsity constraint on the hidden-layer nodes.
Informally, if a neuron's output is close to 1, we say the neuron is active; if its output is close to 0, we say it is inactive. We want to constrain the neurons so that they are inactive most of the time. Here we assume the activation function is the sigmoid function.
Recall that $a^{(2)}_j$ denotes the activation of hidden-layer (layer 2) node $j$ in the network above. However, this notation does not make explicit which input $x$ produced that activation, so we write $a^{(2)}_j(x)$ for the activation of hidden unit $j$ when the network is given input $x$. Further, let

$$\hat{\rho}_j = \frac{1}{m}\sum_{i=1}^{m} a^{(2)}_j\big(x^{(i)}\big)$$

denote the average activation of hidden unit $j$ (each input sample produces one activation for the unit, and we average over all $m$ samples). We then enforce the constraint

$$\hat{\rho}_j = \rho,$$

where $\rho$ is a sparsity parameter, typically a small value close to 0 (for example $\rho = 0.05$). In other words, we want the average activation of each hidden-layer neuron to be close to 0.05. To satisfy this constraint, most of a hidden unit's activations must be close to 0.
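In code, the average activation $\hat{\rho}_j$ of every hidden unit can be computed with one forward pass over the training set; a small numpy sketch (the matrix shapes and names are assumptions consistent with the earlier sketch):

```python
import numpy as np

def average_activation(X, W1, b1):
    """Return rho_hat, the average activation of each hidden unit over the whole training set.

    X:  (m, n_input) matrix whose rows are the training samples
    W1: (n_hidden, n_input) encoder weights; b1: (n_hidden,) encoder biases
    """
    A2 = 1.0 / (1.0 + np.exp(-(X @ W1.T + b1)))   # hidden activations, shape (m, n_hidden)
    return A2.mean(axis=0)                        # shape (n_hidden,), one rho_hat_j per unit
```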
To achieve this, we add an extra penalty term to the optimization objective that penalizes $\hat{\rho}_j$ deviating significantly from the sparsity parameter $\rho$. We use the following penalty term:

$$\sum_{j=1}^{s_2} \rho \log\frac{\rho}{\hat{\rho}_j} + (1-\rho)\log\frac{1-\rho}{1-\hat{\rho}_j},$$

where $s_2$ is the number of neurons in the hidden layer and the index $j$ runs over the hidden units. The penalty can also be written as

$$\sum_{j=1}^{s_2} \mathrm{KL}(\rho \,\|\, \hat{\rho}_j),$$
where $\mathrm{KL}(\rho \,\|\, \hat{\rho}_j) = \rho \log\frac{\rho}{\hat{\rho}_j} + (1-\rho)\log\frac{1-\rho}{1-\hat{\rho}_j}$ is the Kullback-Leibler divergence between a Bernoulli random variable with mean $\rho$ and one with mean $\hat{\rho}_j$. It is easy to verify that $\mathrm{KL}(\rho \,\|\, \hat{\rho}_j) = 0$ if $\hat{\rho}_j = \rho$, and that the term grows as $\hat{\rho}_j$ moves away from $\rho$, which is exactly the penalty behavior we want. For example, if we set $\rho = 0.2$ and plot $\mathrm{KL}(\rho \,\|\, \hat{\rho}_j)$ as $\hat{\rho}_j$ changes, we can see that the function reaches its minimum value of 0 at $\hat{\rho}_j = \rho = 0.2$, and increases sharply as $\hat{\rho}_j$ moves away from 0.2 in either direction.
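The penalty term itself is a one-liner in numpy; this sketch also reproduces the behavior just described with $\rho = 0.2$:

```python
import numpy as np

def kl_penalty(rho, rho_hat):
    """Sum over hidden units of KL(rho || rho_hat_j) for Bernoulli distributions."""
    return np.sum(rho * np.log(rho / rho_hat)
                  + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))

for rho_hat in [0.05, 0.1, 0.2, 0.4, 0.8]:
    print(rho_hat, kl_penalty(0.2, np.array([rho_hat])))
# The penalty is 0 at rho_hat = 0.2 and grows as rho_hat moves away from 0.2.
```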
Our overall cost (loss) function therefore becomes

$$J_{\text{sparse}}(W,b) = J(W,b) + \beta \sum_{j=1}^{s_2} \mathrm{KL}(\rho \,\|\, \hat{\rho}_j),$$

where $J(W,b)$ is defined as in the previous blog post, "Sparse autoencoding: the backpropagation algorithm (BP)", and $\beta$ controls the weight of the sparsity penalty term. Note that $\hat{\rho}_j$ also (implicitly) depends on $W$ and $b$, because it is the average activation of hidden unit $j$, and computing that average requires first computing all of the unit's activations over the training set, which in turn depend on $W$ and $b$.
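Putting the pieces together, a sketch of $J_{\text{sparse}}(W,b)$ is shown below; here $J(W,b)$ is taken to be the mean squared reconstruction error plus a weight-decay term with coefficient `lam`, which is an assumption about the cost used in the previous post, and the default values of `rho`, `beta`, and `lam` are illustrative only:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sparse_cost(X, W1, b1, W2, b2, rho=0.05, beta=3.0, lam=1e-4):
    """J_sparse(W, b) = reconstruction error + weight decay + beta * sum_j KL(rho || rho_hat_j)."""
    m = X.shape[0]
    A2 = sigmoid(X @ W1.T + b1)                   # hidden activations, shape (m, n_hidden)
    A3 = sigmoid(A2 @ W2.T + b2)                  # reconstructions,    shape (m, n_input)
    rho_hat = A2.mean(axis=0)                     # average activation of each hidden unit

    reconstruction = 0.5 * np.sum((A3 - X) ** 2) / m
    weight_decay = 0.5 * lam * (np.sum(W1 ** 2) + np.sum(W2 ** 2))
    sparsity = np.sum(rho * np.log(rho / rho_hat)
                      + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
    return reconstruction + weight_decay + beta * sparsity
```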
Incorporating the penalty term into the cost function and computing its derivative requires only a minor change to the original code. For example, in layer 2 of backpropagation, the previous blog post computed

$$\delta^{(2)}_i = \Big(\sum_{j=1}^{s_3} W^{(2)}_{ji}\,\delta^{(3)}_j\Big) f'\big(z^{(2)}_i\big),$$

which is now replaced by

$$\delta^{(2)}_i = \left(\sum_{j=1}^{s_3} W^{(2)}_{ji}\,\delta^{(3)}_j + \beta\Big(-\frac{\rho}{\hat{\rho}_i} + \frac{1-\rho}{1-\hat{\rho}_i}\Big)\right) f'\big(z^{(2)}_i\big).$$
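As a sketch, the modified layer-2 error term might be computed like this (here `W2`, `delta3`, `a2`, and `rho_hat` are assumed to come from an ordinary backpropagation pass, as in the previous post):

```python
import numpy as np

def delta2_with_sparsity(W2, delta3, a2, rho_hat, rho=0.05, beta=3.0):
    """Layer-2 error terms with the extra sparsity gradient added.

    W2:      (n_input, n_hidden) decoder weights
    delta3:  (m, n_input)  output-layer error terms
    a2:      (m, n_hidden) hidden activations; for the sigmoid, f'(z2) = a2 * (1 - a2)
    rho_hat: (n_hidden,)   average hidden activations over the training set
    """
    sparsity_grad = beta * (-rho / rho_hat + (1 - rho) / (1 - rho_hat))   # shape (n_hidden,)
    return (delta3 @ W2 + sparsity_grad) * a2 * (1 - a2)
```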
Note that computing this term requires knowing $\hat{\rho}_i$. Therefore, a feed-forward pass must be run first to compute the average activation of every neuron in the hidden layer. When the training set is small, the activations of all neurons (and their averages) can be stored in memory during the feed-forward pass, and the stored activations can then be reused during backpropagation. If the training set is too large to keep all the intermediate results in memory, you can instead scan through the samples during the feed-forward pass and accumulate each node's activations only to compute its average: once a node's average activation has been computed, its per-sample activations can be discarded and only the average kept. Afterwards, before backpropagation, the neuron activations must be obtained again by a second feed-forward pass; this reduces computational efficiency but keeps the memory requirement manageable.
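One possible way to implement the memory-friendly variant is to accumulate a running sum of activations over mini-batches and keep only the averages, as in this sketch (the batching scheme is an illustrative assumption):

```python
import numpy as np

def average_activation_streaming(X, W1, b1, batch_size=256):
    """Compute rho_hat without keeping all hidden activations in memory at once."""
    m = X.shape[0]
    rho_sum = np.zeros(W1.shape[0])
    for start in range(0, m, batch_size):
        batch = X[start:start + batch_size]
        A2 = 1.0 / (1.0 + np.exp(-(batch @ W1.T + b1)))   # activations for this batch only
        rho_sum += A2.sum(axis=0)                         # accumulate, then discard A2
    return rho_sum / m
```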
The pseudocode for the whole gradient descent procedure is the same as in the previous blog post, "Sparse autoencoding: the backpropagation algorithm (BP)", except that the objective function is now $J_{\text{sparse}}(W,b)$.
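For completeness, here is a sketch of one batch gradient descent step on $J_{\text{sparse}}$, combining the pieces above; the squared-error-plus-weight-decay form of $J(W,b)$ and the hyperparameter values are assumptions, not the exact code of the referenced post:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sparse_autoencoder_step(X, W1, b1, W2, b2, rho=0.05, beta=3.0, lam=1e-4, alpha=0.5):
    """One batch gradient descent step on J_sparse(W, b). X: (m, n_input) training matrix."""
    m = X.shape[0]

    # Feed-forward pass (also gives the average activations needed by the sparsity term).
    A2 = sigmoid(X @ W1.T + b1)              # (m, n_hidden)
    A3 = sigmoid(A2 @ W2.T + b2)             # (m, n_input)
    rho_hat = A2.mean(axis=0)                # (n_hidden,)

    # Backward pass with the sparsity correction on layer 2.
    delta3 = (A3 - X) * A3 * (1 - A3)        # (m, n_input)
    sparsity_grad = beta * (-rho / rho_hat + (1 - rho) / (1 - rho_hat))
    delta2 = (delta3 @ W2 + sparsity_grad) * A2 * (1 - A2)   # (m, n_hidden)

    # Gradients of J_sparse, averaged over the batch, plus weight decay on the weights.
    gW2 = delta3.T @ A2 / m + lam * W2
    gb2 = delta3.mean(axis=0)
    gW1 = delta2.T @ X / m + lam * W1
    gb1 = delta2.mean(axis=0)

    # Gradient descent update.
    return W1 - alpha * gW1, b1 - alpha * gb1, W2 - alpha * gW2, b2 - alpha * gb2
```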
The derivative checking method described in "Sparse autoencoding: gradient checking" can be used to verify that the code is correct.
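A simple numerical check compares the analytic gradient against a two-sided finite-difference estimate; the sketch below uses a toy quadratic as the usage example, and the `eps` value and tolerance are illustrative assumptions:

```python
import numpy as np

def check_gradient(cost, grad, theta, eps=1e-4):
    """Compare an analytic gradient with a finite-difference estimate of the same gradient."""
    numeric = np.zeros_like(theta)
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = eps
        numeric[i] = (cost(theta + step) - cost(theta - step)) / (2 * eps)
    analytic = grad(theta)
    # A tiny relative difference (e.g. < 1e-8) indicates the gradient code is correct.
    return np.linalg.norm(numeric - analytic) / np.linalg.norm(numeric + analytic)

# Usage on a simple quadratic whose gradient is known exactly:
theta0 = np.array([1.0, -2.0, 3.0])
print(check_gradient(lambda t: np.sum(t ** 2), lambda t: 2 * t, theta0))
```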
Learning source: Autoencoders and Sparsity, http://deeplearning.stanford.edu/wiki/index.php/Autoencoders_and_Sparsity