The problems mentioned in 6.12 that arise when increasing the depth of a network (gradient diffusion, convergence to poor local optima, etc.) can be avoided by using the stacked autoencoder training method.
A stacked autoencoder is built with greedy layer-wise training. The main idea of this layer-by-layer greedy approach is to train only one layer of the network at a time: first train a network containing a single hidden layer, and only after that layer has finished training, begin training a network with two hidden layers, and so on. At each step, the previously trained layers are fixed and a new layer is added on top (that is, the output of the pretrained layers is used as the input to the new layer). Each layer can be trained in a supervised way (for example, using the classification error at each step as the objective function), but more often unsupervised methods (such as autoencoders) are used. The weights obtained by training these layers individually are then used to initialize the weights of the final (or entire) deep network, after which the whole network is "fine-tuned" (i.e., all the layers are put together and trained jointly to minimize the error on the labeled training set).
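The layer-wise procedure above can be sketched in a few lines of NumPy. This is a minimal, illustrative implementation, not the tutorial's own code: the function names (`train_autoencoder`, `greedy_layerwise_pretrain`), the sigmoid activations, and the plain gradient-descent update are all assumptions chosen for brevity. Each autoencoder is trained on the codes produced by the previous (now frozen) layer, and the encoder weights are collected to initialize the deep network before fine-tuning.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_autoencoder(X, n_hidden, lr=0.1, epochs=200, seed=0):
    """Train one single-hidden-layer autoencoder on X by gradient descent
    on the squared reconstruction error; return (W, b, hidden codes)."""
    rng = np.random.default_rng(seed)
    n_in = X.shape[1]
    W1 = rng.normal(0.0, 0.1, (n_in, n_hidden))   # encoder weights
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, n_in))   # decoder weights
    b2 = np.zeros(n_in)
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)                  # encode
        Xhat = sigmoid(H @ W2 + b2)               # decode (reconstruct)
        # Backpropagate the reconstruction error through both layers.
        d_out = (Xhat - X) * Xhat * (1.0 - Xhat)
        d_hid = (d_out @ W2.T) * H * (1.0 - H)
        W2 -= lr * (H.T @ d_out) / len(X)
        b2 -= lr * d_out.mean(axis=0)
        W1 -= lr * (X.T @ d_hid) / len(X)
        b1 -= lr * d_hid.mean(axis=0)
    return W1, b1, sigmoid(X @ W1 + b1)

def greedy_layerwise_pretrain(X, layer_sizes):
    """Pretrain a stack of autoencoders, one layer at a time: each new
    autoencoder is trained on the codes of the previous (frozen) layer."""
    weights, inp = [], X
    for n_hidden in layer_sizes:
        W, b, inp = train_autoencoder(inp, n_hidden)
        weights.append((W, b))  # kept to initialize the deep network
    return weights
```

The returned `weights` would then initialize the corresponding layers of the full deep network (plus a final classification layer), which is fine-tuned end to end on the labeled data.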
The success of greedy layer-wise training can be attributed to the following:
1) Data acquisition: Although labeled data is expensive to obtain, it is easy to collect large amounts of unlabeled data. The potential of self-taught learning is that it can learn better models by exploiting this unlabeled data: the method uses unlabeled data to learn good initial weights for all layers (excluding the final classification layer used to predict labels). Compared with purely supervised learning, this self-taught approach can make use of far more data and can learn and discover patterns that exist in the data, so it usually improves the performance of the classifier.
2) Better local extrema: After the network has been pretrained on unlabeled data, the initial weights of each layer lie in a better region of the parameter space than random initialization would give, and the weights can then be fine-tuned starting from these positions. Empirically, starting from these positions makes it more likely that optimization converges to a better local extremum, because the unlabeled data has already provided prior information about the patterns contained in the input data.
In short, the stacked autoencoder is trained with this greedy layer-wise algorithm.
6.13 Neural Networks: Stacked Autoencoders