The paper reviewed here was published in 2008.
It mainly discusses the denoising autoencoder, which is used to learn robust intermediate features. Those features can then serve to initialize the weights of a neural network, which amounts to an unsupervised pre-training approach.
When we use neural networks for various recognition tasks, getting better performance means using a deeper or wider network so that it can model more complex distributions. Once the network becomes deep, how to train it becomes a very important problem: if it is trained poorly, a deep network can actually perform worse than a shallow one.
Existing approaches to training deep networks include:
1. Randomly initialize the weights. This works very badly; the network easily gets stuck in poor solutions.
2. Pre-train the network with stacked restricted Boltzmann machines, then fine-tune it with the up-down algorithm.
3. Initialize the network weights with stacked autoencoders, then fine-tune with gradient descent.
The basic autoencoder used in method 3 looks like this:
Now the question is: can we improve it so that the intermediate features it learns are more representative? (That is, so that it learns intermediate features that are robust to small corruptions of the input.)
This paper proposes the denoising autoencoder. Its main idea: given an input x, first corrupt it in some way to obtain a corrupted version of x, then learn intermediate features from the corrupted version and use them to reconstruct the original input.
Improved: the denoising autoencoder
We can then use it to train the initial weights of the network layer by layer.
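To make the idea concrete, here is a minimal sketch of a single denoising autoencoder in NumPy. It uses masking noise, sigmoid units, untied weights, and a squared-error reconstruction loss; the class name and hyperparameters are purely illustrative (the paper itself mostly works with a cross-entropy reconstruction loss for binary inputs), so treat this as a sketch rather than the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

class DenoisingAutoencoder:
    """Corrupt the input, encode it, then reconstruct the *clean* input."""

    def __init__(self, n_visible, n_hidden, corruption=0.3, lr=0.1):
        self.W1 = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))  # encoder weights
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.01, size=(n_hidden, n_visible))  # decoder weights
        self.b2 = np.zeros(n_visible)
        self.corruption = corruption
        self.lr = lr

    def corrupt(self, X):
        # masking noise: randomly zero out a fraction of the input components
        return X * (rng.random(X.shape) > self.corruption)

    def encode(self, X):
        return sigmoid(X @ self.W1 + self.b1)

    def train_step(self, X):
        """One SGD step on a mini-batch X of shape (n_samples, n_visible)."""
        n = X.shape[0]
        X_tilde = self.corrupt(X)                  # corrupted input
        Y = self.encode(X_tilde)                   # hidden features
        Z = sigmoid(Y @ self.W2 + self.b2)         # reconstruction
        # squared-error loss measured against the clean X, not X_tilde
        dZ = (Z - X) * Z * (1.0 - Z)
        dY = (dZ @ self.W2.T) * Y * (1.0 - Y)
        self.W2 -= self.lr * (Y.T @ dZ) / n
        self.b2 -= self.lr * dZ.mean(axis=0)
        self.W1 -= self.lr * (X_tilde.T @ dY) / n
        self.b1 -= self.lr * dY.mean(axis=0)
        return 0.5 * np.mean(np.sum((Z - X) ** 2, axis=1))
```

The key point is that the reconstruction error is measured against the clean input x, not against the corrupted version that the encoder actually sees; this forces the hidden features to capture structure that survives the corruption.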
The specific procedure:
1. Train the weights of the first layer: given the input x, add noise to obtain a corrupted x, then train a denoising autoencoder on it to obtain the first layer's weights.
2. Train the weights of the second layer: fix the first layer's weights, feed the input x through it to get the first layer's output y, treat y as the original input of a new denoising autoencoder, add noise to y to obtain a corrupted y, and train the autoencoder to obtain the second layer's initial weights.
3. Train the weights of the third layer: fix the weights of the first two layers, feed the input x through them to get the second layer's output z, treat z as the original input of a new denoising autoencoder, add noise to z, ...., and obtain the third layer's initial weights.
And so on.
In this way, the initial weights of the entire network are trained.
One thing to note: when we train the weights of the later layers, we do not add noise to the input x itself; we only take the (clean) output of the previous layers as the original input of that layer's denoising autoencoder and add noise to that. A sketch of the whole procedure is given below.
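Here is a rough sketch of that greedy layer-wise procedure, reusing the illustrative DenoisingAutoencoder class from the sketch above. The function name pretrain_stack and the arguments layer_sizes and epochs are made-up for illustration; a real implementation would iterate over mini-batches and many more epochs.

```python
def pretrain_stack(X, layer_sizes, epochs=10):
    """Greedy layer-wise pretraining with denoising autoencoders.

    Noise is only added inside each layer's own autoencoder (in train_step);
    the representation propagated up from the layers below stays clean.
    """
    autoencoders = []
    H = X                                  # clean "original input" for the current layer
    n_in = X.shape[1]
    for n_hidden in layer_sizes:
        dae = DenoisingAutoencoder(n_in, n_hidden)
        for _ in range(epochs):
            dae.train_step(H)              # corruption happens inside train_step
        autoencoders.append(dae)
        H = dae.encode(H)                  # propagate the clean input through the trained layer
        n_in = n_hidden
    # each layer's encoder weights initialize the corresponding network layer
    return [(dae.W1, dae.b1) for dae in autoencoders]
```

After pretraining, the returned encoder weights and biases initialize the corresponding layers of the network, which is then fine-tuned end to end with gradient descent as in method 3 above.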
The paper also motivates the denoising autoencoder from several different angles.
These include a manifold learning perspective, an information-theoretic perspective, a generative-model perspective, and so on. I read through them but did not really understand them; they require fairly deep mathematical and statistical knowledge, so I did not dig into that part.
The paper's experiments demonstrate the effectiveness of the approach.
In addition, the references in this paper are very valuable.
References:
Extracting and Composing Robust Features with Denoising Autoencoders (the paper);
Extracting and Composing Robust Features with Denoising Autoencoders (the accompanying slides)