Deep Learning: Sixteen (Deep Networks)


This post describes how to build deep networks for classification, following http://deeplearning.stanford.edu/wiki/index.php/ufldl_tutorial.pdf. It is divided into the following two parts:

  1. From self-taught learning to deep networks:

From the earlier introduction to self-taught learning (Deep Learning: Fifteen (Self-Taught Learning Exercise)), we saw that the feature-extraction step of that method is entirely unsupervised. This post builds on it and continues by fine-tuning the network parameters with a supervised method, which gives better results. The structure obtained by combining the two steps of self-taught learning is as follows:

Clearly, the result is a multi-layer neural network with three layers (input, hidden, and output).

In general, the parameters of the preceding unsupervised model can be used as the initialization values for supervised learning. Then, when a large amount of labeled data is available, gradient descent or similar methods can continue optimizing the parameters. Because of this initialization, the optimization usually converges to a fairly good local optimum. With randomly initialized parameters, a multi-layer neural network rarely converges to a good local optimum, because its objective function is non-convex.

So when should fine-tuning be used to adjust the result of unsupervised learning? Only when a large number of labeled samples is available. When there are many unlabeled samples but only a few labeled ones, fine-tuning is not suitable. If fine-tuning is not used, the final classifier should take a cascaded representation as input, that is, the learned features concatenated with the original feature values. If fine-tuning is used, the result is better and the cascaded representation is no longer needed. A minimal sketch of the two options follows.
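Here is a small sketch of these two options, assuming a toy pretrained encoder; the `encode` function, the weight matrix `W_pre`, and all sizes below are hypothetical stand-ins for illustration, not anything from the tutorial.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for weights learned by the unsupervised stage (hypothetical values).
W_pre = rng.normal(0, 0.1, (20, 8))
encode = lambda X: 1.0 / (1.0 + np.exp(-(X @ W_pre)))   # learned feature mapping

X_lab = rng.random((30, 20))   # a small labeled set (toy data)

# Option A: few labels, no fine-tuning. Feed the cascaded representation
# [original features, learned features] to the final classifier.
features_cascade = np.hstack([X_lab, encode(X_lab)])     # shape (30, 28)

# Option B: many labels. Use W_pre only as the initial value and keep training
# all weights with gradient descent on the labeled data (fine-tuning);
# the learned features alone then suffice as classifier input.
W_finetuned = W_pre.copy()   # starting point for the supervised updates
print(features_cascade.shape, W_finetuned.shape)
```

Either way the unsupervised stage does the heavy lifting; the choice only concerns how the labeled set, small or large, is spent.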

 

  2. Summary of deep networks:

Using a multi-layer neural network yields a more complex representation of the input, because each layer is a nonlinear transformation of the previous one. This requires the activation function of each layer to be nonlinear; otherwise there is no point in using multiple layers, as the sketch below illustrates.
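A quick numerical check of this point (the matrix sizes and the tanh nonlinearity are arbitrary illustrative choices): two stacked linear layers collapse into a single linear map, whereas a nonlinearity between them does not.

```python
import numpy as np

rng = np.random.default_rng(0)
x  = rng.random(4)
W1 = rng.random((5, 4))   # first layer
W2 = rng.random((3, 5))   # second layer

# Purely linear stacking: identical to one layer with weights W2 @ W1.
two_linear_layers = W2 @ (W1 @ x)
one_layer         = (W2 @ W1) @ x
print(np.allclose(two_linear_layers, one_layer))        # True

# With a nonlinear activation in between, no single matrix reproduces the map.
two_nonlinear_layers = W2 @ np.tanh(W1 @ x)
print(np.allclose(two_nonlinear_layers, one_layer))     # False in general
```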

Advantages of deep networks:

1. More complex expressions than a single-layer neural network. For example, a function that can be learned by a K-layer neural network with a polynomial number of nodes per layer may require an exponential number of nodes if it is to be learned by a (K-1)-layer network.

2. The features learned by the different layers become increasingly abstract from the bottom layer to the top. For example, in image learning, the first hidden layer may learn edge features, the second hidden layer may learn contours, and later layers may learn parts of the target object; that is, the lower hidden layers learn low-level features and the higher hidden layers learn high-level features.

3. The structure of such a multi-layer neural network is quite similar to that of the human cerebral cortex, so it has a certain biological motivation.

Disadvantages of deep networks:

1. The deeper the network, the more training samples are required. With supervised learning the samples are harder to obtain, because they must be labeled, and if there are too few samples the network easily overfits.

2. Parameter optimization in a multi-layer neural network is a high-order non-convex optimization problem, which often converges to a rather poor local solution, and general-purpose optimization algorithms usually do not work well here. In short, parameter optimization is a difficult issue.

3. Gradient diffusion (vanishing gradients). When the network is deep, backpropagation is used to compute the partial derivatives of the loss function, but these gradient values shrink markedly as the depth increases, so the earlier layers contribute little to the final loss and their weights are updated very slowly. One theoretical workaround is to give the later layers more neurons so that the result depends less on how well the earlier layers learn, but then the network effectively behaves like a shallow one, so this is not appropriate. A small numerical sketch of the effect follows the list.
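A rough numerical illustration of gradient diffusion (the depth, width, sigmoid activation, and random weights are assumptions for a toy example): backpropagating through many sigmoid layers multiplies the gradient by the layer's weights and by sigma'(z) <= 0.25 at every step, so the signal reaching the early layers typically shrinks quickly.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

depth, width = 10, 20
x = rng.random(width)
weights = [rng.normal(0, 0.5, (width, width)) for _ in range(depth)]

# Forward pass, keeping every layer's activation for the backward pass.
acts = [x]
for W in weights:
    acts.append(sigmoid(W @ acts[-1]))

# Backward pass starting from a unit gradient at the output.
grad = np.ones(width)
for W, a in zip(reversed(weights), reversed(acts[1:])):
    grad = W.T @ (grad * a * (1 - a))    # chain rule through one sigmoid layer
    print(f"gradient norm: {np.linalg.norm(grad):.2e}")
```

The printed norms normally drop by orders of magnitude over the ten layers, which is exactly why the earliest weights barely move.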

Therefore, greedy layer-wise training is generally used to learn the network parameters: first train the first hidden layer, then the second, the third, and so on, and finally use these trained parameter values as the initial values for the parameters of the whole network. One advantage is that the required data are easier to obtain, because the earlier layers are trained with unsupervised methods, which is easy; only the final output layer needs labeled data. In addition, because the unsupervised learning implicitly provides some prior knowledge about the input data, starting the optimization from these parameter values usually leads to a better local optimum. Stacked autoencoders are a common model trained with this greedy layer-wise scheme. In the UFLDL notation, W^{(l,1)}, b^{(l,1)} are the weights and biases of the l-th encoder, W^{(l,2)}, b^{(l,2)} those of the l-th decoder, and n is the number of layers. The encoding step runs the encoders in forward order:

a^{(l)} = f(z^{(l)}), \quad z^{(l+1)} = W^{(l,1)} a^{(l)} + b^{(l,1)}

The decoding step runs the decoders in reverse order:

a^{(n+l)} = f(z^{(n+l)}), \quad z^{(n+l+1)} = W^{(n-l,2)} a^{(n+l)} + b^{(n-l,2)}

The last step is to use the parameters learned by the stacked autoencoders to initialize the entire network. The whole network can then be viewed as a single neural network model that simply has many layers, and the ordinary backpropagation algorithm works no matter how many layers there are, so the final parameter-adjustment (fine-tuning) step is the same as for the sparse-coding models studied earlier. A rough sketch of the whole process is given below.
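The following is a minimal numpy sketch of this pipeline under toy assumptions: the layer sizes, learning rates, random data, and the logistic output unit are illustrative choices, not the tutorial's exact setup. Each hidden layer is first pretrained as an autoencoder on unlabeled data, then the whole stack is fine-tuned with ordinary backpropagation on a small labeled set.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def pretrain_autoencoder(X, n_hidden, lr=0.5, epochs=200):
    """Greedy step: train one autoencoder to reconstruct X, return its encoder."""
    n, d = X.shape
    W1 = rng.normal(0, 0.1, (d, n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(0, 0.1, (n_hidden, d)); b2 = np.zeros(d)
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)             # encode
        R = sigmoid(H @ W2 + b2)             # decode (reconstruct X)
        dR = (R - X) * R * (1 - R) / n       # squared-error gradient at the output
        dH = (dR @ W2.T) * H * (1 - H)
        W2 -= lr * H.T @ dR;  b2 -= lr * dR.sum(0)
        W1 -= lr * X.T @ dH;  b1 -= lr * dH.sum(0)
    return W1, b1

# Toy data: many unlabeled samples, a small labeled subset.
X_unlab = rng.random((200, 20))
X_lab, y_lab = rng.random((50, 20)), rng.integers(0, 2, 50)

# Greedy layer-wise pretraining: each layer is trained on the previous layer's output.
W1, b1 = pretrain_autoencoder(X_unlab, 10)
W2, b2 = pretrain_autoencoder(sigmoid(X_unlab @ W1 + b1), 5)

# Fine-tuning: treat the stack plus a logistic output unit as one network
# and run plain backpropagation on the labeled data.
W3, b3 = rng.normal(0, 0.1, (5, 1)), np.zeros(1)
lr, n = 0.5, len(X_lab)
for _ in range(300):
    A1 = sigmoid(X_lab @ W1 + b1)
    A2 = sigmoid(A1 @ W2 + b2)
    P  = sigmoid(A2 @ W3 + b3).ravel()        # predicted probability of class 1
    d3 = ((P - y_lab) / n)[:, None]           # cross-entropy gradient at the output
    d2 = (d3 @ W3.T) * A2 * (1 - A2)
    d1 = (d2 @ W2.T) * A1 * (1 - A1)
    W3 -= lr * A2.T @ d3;    b3 -= lr * d3.sum(0)
    W2 -= lr * A1.T @ d2;    b2 -= lr * d2.sum(0)
    W1 -= lr * X_lab.T @ d1; b1 -= lr * d1.sum(0)

print("training accuracy:", np.mean((P > 0.5) == y_lab))
```

With random labels the accuracy here is meaningless; the point is only the shape of the procedure: pretrain each layer, stack them, then let backpropagation adjust every layer at once.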

 

  References:

http://deeplearning.stanford.edu/wiki/index.php/UFLDL_Tutorial

Deep Learning: Fifteen (Self-Taught Learning Exercise)

 

 

 

 

 
