Application of Deep Learning in Image Recognition -- Learning Notes 5: Introduction to Deep Learning


Feedback in Neural Networks

The criterion function is the measure of the error; the feedback process is, in essence, the process of optimizing this criterion function.

If the function f(θ) is convex, i.e., its Hessian matrix is positive semidefinite, the optimization problem is easy to solve.
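
As a quick illustration (not part of the original notes), convexity of a quadratic f(θ) = ½θᵀAθ can be checked numerically by testing whether the eigenvalues of its Hessian A are all non-negative:

```python
import numpy as np

# Illustrative sketch: f(theta) = 0.5 * theta^T A theta has constant Hessian A.
A = np.array([[2.0, 0.5],
              [0.5, 1.0]])

# f is convex iff its Hessian is positive semidefinite,
# i.e., all eigenvalues are >= 0.
eigenvalues = np.linalg.eigvalsh(A)
print("Hessian eigenvalues:", eigenvalues)        # both positive here
print("convex:", bool(np.all(eigenvalues >= 0)))  # True
```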

In practice, however, f(θ) is non-convex, and one way to attack the problem is the gradient descent method.


The principle of the gradient descent method: at each iteration, compute the gradient of f(θ) at the current θ, then subtract a step along this gradient from θ, i.e., θ ← θ - η·∇f(θ), so that the iterates move in the direction of steepest descent.
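
A minimal sketch of this update rule on a toy quadratic loss; the loss function, learning rate, and iteration count here are illustrative choices, not the author's:

```python
def grad_f(theta):
    # Gradient of the toy loss f(theta) = (theta - 3)^2.
    return 2.0 * (theta - 3.0)

theta = 0.0   # initial parameter
eta = 0.1     # learning rate (step size)

for _ in range(100):
    # theta <- theta - eta * grad f(theta): move against the gradient.
    theta -= eta * grad_f(theta)

print(theta)  # approaches the minimizer theta = 3
```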

Problem: each step depends only on the current parameter θ, so this is a greedy algorithm; it is not guaranteed to converge to the global optimum.

Note: converging to the global minimum is not necessarily meaningful, because the global minimum is defined on the training set. Converging to it means the network can fit the training set well, but not that it will also generalize well to a test set; this is often accompanied by a higher risk of overfitting. We should therefore not blindly pursue the global minimum, but strike a trade-off between performance on the training set and on the test set. Gradient descent, because it is simple and effective, is a very common strategy in optimization problems.


Parameter Update of the Softmax Classifier
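
For a softmax classifier trained with the cross-entropy loss on one-hot labels, the gradient of the loss with respect to the pre-softmax scores z is softmax(z) - y, and the weight gradient follows by the chain rule. A minimal NumPy sketch of one update (sizes and values are illustrative, not from the original notes):

```python
import numpy as np

def softmax(z):
    z = z - z.max()            # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# One training example: input x, one-hot label y, weight matrix W.
x = np.array([1.0, 2.0])       # 2 input features
y = np.array([0.0, 1.0, 0.0])  # 3 classes, true class = 1
W = np.zeros((3, 2))           # class scores z = W @ x

p = softmax(W @ x)             # predicted class probabilities
grad_z = p - y                 # d(cross-entropy)/dz for softmax
grad_W = np.outer(grad_z, x)   # chain rule: dL/dW = grad_z * x^T

eta = 0.5
W -= eta * grad_W              # gradient-descent parameter update
```
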
Error Propagation

Take a three-layer network as an example:
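
A minimal NumPy sketch of the forward and backward passes for such a three-layer (input-hidden-output) network with sigmoid activations and squared error; the sizes, data, and initialization are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=4)                    # input example
t = np.array([1.0, 0.0])                  # target
W1 = rng.normal(scale=0.1, size=(3, 4))   # input -> hidden weights
W2 = rng.normal(scale=0.1, size=(2, 3))   # hidden -> output weights

# Forward pass.
h = sigmoid(W1 @ x)                       # hidden activations
yhat = sigmoid(W2 @ h)                    # output activations

# Backward pass: propagate the error layer by layer.
delta2 = (yhat - t) * yhat * (1 - yhat)   # output-layer error
delta1 = (W2.T @ delta2) * h * (1 - h)    # error injected back to hidden layer

eta = 0.5
W2 -= eta * np.outer(delta2, h)           # update output-layer weights
W1 -= eta * np.outer(delta1, x)           # update hidden-layer weights
```
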
Backpropagation is also a greedy algorithm, and this can cause problems. Each error-propagation step updates a layer's parameters and only then injects the error back into the previous layer, so there is no guarantee that the propagated gradient is the true gradient; once the network has too many layers, the true derivatives of the front layers and the derivatives computed by backpropagation can differ too much.

In addition, if the sigmoid function is used as the activation function, it easily enters saturation, where the gradients of the front layers approach 0 and their parameters can no longer be updated; this phenomenon is called the vanishing gradient. One way to counteract vanishing gradients is to replace the sigmoid with the ReLU activation function. Why ReLU resists vanishing gradients has not yet been fully explained, but experimental results show that it does suppress them.
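
A quick numerical illustration (toy values, not from the original notes): the sigmoid derivative is at most 0.25, so multiplying per-layer derivatives across many layers shrinks the gradient exponentially, while the ReLU derivative is exactly 1 wherever the unit is active:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = 2.0                                      # an illustrative pre-activation
sig_grad = sigmoid(z) * (1.0 - sigmoid(z))   # sigmoid'(z) <= 0.25 everywhere
relu_grad = 1.0 if z > 0 else 0.0            # relu'(z) = 1 on the active side

layers = 20
print("sigmoid through 20 layers:", sig_grad ** layers)   # ~1e-20: vanished
print("relu through 20 layers:   ", relu_grad ** layers)  # 1.0: preserved
```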


Deep Belief Networks

A Deep Belief Network (DBN) differs from a traditional neural network in two respects: (1) it is a deep neural network; (2) it is built on the Restricted Boltzmann Machine (RBM).

Why use a deep structure?

A: With the total number of neurons held constant, a deep structure has stronger expressive power than a shallow structure.


Why introduce the Restricted Boltzmann Machine?

A: Because traditional deep structures cannot be trained effectively: for example, they suffer from vanishing gradients, and the initial parameter values of a traditional neural network greatly affect its convergence. An important contribution of the Restricted Boltzmann Machine is that it initializes the parameters of a deep neural network to good values.

Deep Belief Network training is divided into two stages: the pre-training stage and the parameter fine-tuning stage.

Pre-training stage: during DBN training, each pair of adjacent layers is treated as a Restricted Boltzmann Machine and trained with the RBM training method. The raw data is the input to the lowest layer, the output of each RBM's hidden layer becomes the input of the next layer, and the layers are trained greedily, one at a time, without supervision.
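
A compact sketch of this greedy layer-wise scheme, using one step of contrastive divergence (CD-1) per RBM; bias terms are omitted for brevity, and the sizes, data, and hyperparameters are illustrative, not the author's:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_rbm(data, n_hidden, epochs=10, eta=0.1):
    """Train one bias-free RBM with CD-1 and return its weight matrix."""
    n_visible = data.shape[1]
    W = rng.normal(scale=0.01, size=(n_visible, n_hidden))
    for _ in range(epochs):
        for v0 in data:
            ph0 = sigmoid(v0 @ W)                    # hidden probabilities
            h0 = (rng.random(n_hidden) < ph0) * 1.0  # sample hidden units
            v1 = sigmoid(W @ h0)                     # reconstruct visible units
            ph1 = sigmoid(v1 @ W)
            # CD-1 update: positive phase minus negative phase.
            W += eta * (np.outer(v0, ph0) - np.outer(v1, ph1))
    return W

# Greedy layer-wise pre-training: each RBM's hidden output
# becomes the input of the next RBM.
data = (rng.random((100, 8)) > 0.5) * 1.0    # toy binary data
layer_sizes = [6, 4]
weights, layer_input = [], data
for n_hidden in layer_sizes:
    W = train_rbm(layer_input, n_hidden)
    weights.append(W)
    layer_input = sigmoid(layer_input @ W)   # feed hidden activations upward
```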

Parameter fine-tuning stage: the global backpropagation algorithm is then executed to fine-tune the weights with supervision. This avoids the local-optimum problems that appear when backpropagation is used on its own. Because recognition transforms the data dimension layer by layer, a DBN can also be regarded as a feature-extraction method; correspondingly, deep learning is sometimes called "feature learning."


