9. Common models or methods of Deep Learning
9.1 AutoEncoder
One of the simplest approaches to Deep Learning exploits the structure of artificial neural networks (ANNs), which are themselves hierarchical systems. Suppose we take a neural network and require its output to be the same as its input, then train it and adjust its parameters to obtain the weights of each layer. We naturally obtain several different representations of the input I (each layer yields one representation), and these representations are the features. An autoencoder is a neural network that reproduces its input signal as faithfully as possible. To achieve this reproduction, the autoencoder must capture the most important factors that represent the input data; much like PCA, it must find the principal components that can represent the original information.
The specific process is described as follows:
1) Learn features from unlabeled data by unsupervised learning:
In the neural networks discussed earlier, as in the first figure, each input sample comes with a label (input, target), so the difference between the current output and the target (label) is used to adjust the parameters of the earlier layers until the network converges. But now we only have unlabeled data, as in the figure on the right. How can the error be obtained?
Suppose we feed the input into an encoder and obtain a code, which is a representation of the input. How do we know that this code actually represents the input? We add a decoder, which produces an output from the code. If that output is very similar to the original input signal (ideally identical), we have good reason to believe the code is reliable. Therefore, by adjusting the parameters of the encoder and decoder, we minimize the reconstruction error, and at that point we obtain the first representation of the input signal: the code. Because there is no labeled data, the error comes from comparing the direct reconstruction with the original input.
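A minimal sketch of this encoder/decoder loop, written here in PyTorch (the 784/128 layer sizes, the sigmoid activations, the Adam optimizer, and the random stand-in batch are all illustrative assumptions, not taken from the original article):

```python
import torch
import torch.nn as nn

# Encoder maps the input to a code; decoder tries to reproduce the input.
encoder = nn.Sequential(nn.Linear(784, 128), nn.Sigmoid())
decoder = nn.Sequential(nn.Linear(128, 784), nn.Sigmoid())

optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

x = torch.rand(64, 784)  # a batch of unlabeled inputs (random stand-in data)
for step in range(100):
    code = encoder(x)                # the learned representation of the input
    reconstruction = decoder(code)
    # No labels: the error compares the output with the input itself.
    loss = nn.functional.mse_loss(reconstruction, x)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```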
2) Generate features with the trained encoder, then train the next layer, layer by layer:
We now have the code of the first layer. The minimized reconstruction error lets us believe that this code is a good expression of the original input signal, or, putting it strongly, that it is exactly the same as the original signal (the expression differs, but it reflects the same thing). Training the second layer is no different from training the first: we treat the code output by the first layer as the input signal of the second layer and again minimize the reconstruction error, which yields the parameters of the second layer and the code of the second layer, i.e., the second expression of the original input. The remaining layers are handled the same way (while training one layer, the parameters of the earlier layers are held fixed, and their decoders are no longer needed and can be discarded). The greedy procedure is sketched below.
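A sketch of this greedy layer-by-layer training, continuing the assumptions of the previous snippet (`train_autoencoder_layer` is a hypothetical helper, and the layer widths are illustrative):

```python
import torch
import torch.nn as nn

def train_autoencoder_layer(data, in_dim, code_dim, steps=100):
    # Train one encoder/decoder pair to minimize reconstruction error.
    encoder = nn.Sequential(nn.Linear(in_dim, code_dim), nn.Sigmoid())
    decoder = nn.Sequential(nn.Linear(code_dim, in_dim), nn.Sigmoid())
    opt = torch.optim.Adam(
        list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
    for _ in range(steps):
        loss = nn.functional.mse_loss(decoder(encoder(data)), data)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return encoder  # the decoder is discarded once this layer is trained

layer_sizes = [784, 256, 64]   # illustrative widths; depth is chosen by hand
data = torch.rand(64, 784)     # unlabeled stand-in data
encoders = []
for in_dim, code_dim in zip(layer_sizes[:-1], layer_sizes[1:]):
    enc = train_autoencoder_layer(data, in_dim, code_dim)
    encoders.append(enc)
    with torch.no_grad():      # earlier layers stay fixed from now on
        data = enc(data)       # this layer's code is the next layer's input
```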
3) Supervised fine-tuning:
After the procedure above, we obtain a stack of layers. As for how many layers are needed (that is, how deep the network should be; there is currently no scientific way to evaluate this), it must be determined by experiment and tuning. Each layer yields a different expression of the original input, and of course we consider the more abstract expressions better, just as in the human visual system.
At this point, the AutoEncoder cannot yet be used to classify data, because it has not learned how to link an input to a class; it has only learned how to reconstruct or reproduce its input. In other words, it has only learned to obtain a feature that represents the input well, one that can represent the original input signal to the greatest extent. To perform classification, we can add a classifier (such as logistic regression or an SVM) on top of the highest encoding layer of the AutoEncoder, and then train with the standard supervised training method for multi-layer neural networks (gradient descent).
That is to say, we feed the feature code of the last layer into the final classifier and fine-tune it with labeled samples through supervised learning. There are two variants. One is to adjust only the classifier (the black part in the original figure):
The other is to fine-tune the entire system with labeled samples (if there is enough data, this is the best option: end-to-end learning). Both variants are sketched after the next paragraph.
Once supervised training is complete, the network can be used for classification. The top layer of the network acts as a linear classifier, and we can then replace it with a better-performing classifier if desired.
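Both fine-tuning options can be sketched as follows, assuming the `encoders` stack from the previous snippet, a 10-class problem, and a linear (softmax) classifier on top; all of these are illustrative assumptions:

```python
import torch
import torch.nn as nn

stack = nn.Sequential(*encoders)   # pretrained feature layers (from above)
classifier = nn.Linear(64, 10)     # top-layer linear classifier, 10 classes
x = torch.rand(64, 784)
y = torch.randint(0, 10, (64,))    # labeled stand-in data

# Option 1: adjust only the classifier; the pretrained features stay fixed.
opt = torch.optim.Adam(classifier.parameters(), lr=1e-3)
# Option 2: fine-tune the whole system end-to-end on the labeled data:
# opt = torch.optim.Adam(
#     list(stack.parameters()) + list(classifier.parameters()), lr=1e-4)

for _ in range(100):
    logits = classifier(stack(x))
    loss = nn.functional.cross_entropy(logits, y)  # supervised loss on labels
    opt.zero_grad()
    loss.backward()
    opt.step()
```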
In experiments, it has been found that adding automatically learned features to the original features can greatly improve accuracy, and can even make the classification result better than the current best classification algorithms!
There are some variants of AutoEncoder. Here we will briefly introduce two:
Sparse AutoEncoder:
We can also add constraints to obtain new Deep Learning methods. For example, if an L1 regularity constraint is added on top of the AutoEncoder (L1 mainly restricts most of the nodes in each layer to be 0, leaving only a few nodes nonzero; this is the origin of the name "Sparse"), we obtain the Sparse AutoEncoder method.
In other words, we constrain the code obtained each time to be a sparse expression. Sparse expressions are often more effective than other expressions (the brain seems to work this way: a given input stimulates only certain neurons, while most of the other neurons are inhibited).
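A sketch of the L1 constraint, following the earlier snippets (the penalty weight 1e-3 and the rest of the setup are illustrative choices):

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(784, 128), nn.Sigmoid())
decoder = nn.Sequential(nn.Linear(128, 784), nn.Sigmoid())
opt = torch.optim.Adam(
    list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
x = torch.rand(64, 784)

for _ in range(100):
    code = encoder(x)
    # Reconstruction error plus an L1 penalty on the code itself, which
    # pushes most code units toward zero (the "sparse" part of the name).
    loss = nn.functional.mse_loss(decoder(code), x) + 1e-3 * code.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```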
Denoising AutoEncoder:
The denoising autoencoder (DA) builds on the basic autoencoder by adding noise to the training data, so the autoencoder must learn to remove this noise and recover the true input, free of noise pollution. This forces the encoder to learn a more robust expression of the input signal, which is also why it generalizes better than an ordinary encoder. A DA can be trained with a gradient descent algorithm.
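A sketch of the denoising variant, again following the earlier snippets (the Gaussian corruption with standard deviation 0.2 is an illustrative choice):

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(784, 128), nn.Sigmoid())
decoder = nn.Sequential(nn.Linear(128, 784), nn.Sigmoid())
opt = torch.optim.Adam(
    list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
x = torch.rand(64, 784)            # clean inputs

for _ in range(100):
    noisy = x + 0.2 * torch.randn_like(x)    # corrupt the input with noise
    reconstruction = decoder(encoder(noisy))
    # The target is the *clean* input, so the network must remove the noise.
    loss = nn.functional.mse_loss(reconstruction, x)
    opt.zero_grad()
    loss.backward()
    opt.step()
```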
Source: http://blog.csdn.net/zouxy09/article/details/8775524