"Wunda deeplearning.ai Note two" popular explanation under the neural network


4 Activation functions

One of the choices to make when building a neural network is which activation function to use in each layer. In logistic regression, the sigmoid function is always used as the activation function, but there are better options.

The expression for the tanh function (hyperbolic tangent) is:

tanh(z) = (e^z - e^(-z)) / (e^z + e^(-z))

[Figure: graph of tanh, an S-shaped curve through the origin with outputs in (-1, 1)]
The tanh function is actually a shifted and rescaled version of the sigmoid function. For hidden units, tanh as the activation function almost always works better than sigmoid, because tanh's outputs lie in (-1, 1): the activations passed to the next layer have a mean closer to zero, which makes learning in that layer easier. For binary classification, however, the sigmoid function is still used as the output-layer activation, so that the output lies in (0, 1) and can be read as a probability.
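As an illustration (not from the original notes), a minimal numpy sketch comparing the two activations on the same inputs; the batch of pre-activations z is made up:

    import numpy as np

    def sigmoid(z):
        # sigmoid squashes into (0, 1), so its outputs have a positive mean
        return 1.0 / (1.0 + np.exp(-z))

    def tanh(z):
        # tanh squashes into (-1, 1), so its outputs are roughly zero-centered
        return np.tanh(z)

    z = np.random.randn(1000)      # hypothetical pre-activations
    print(sigmoid(z).mean())       # roughly 0.5
    print(tanh(z).mean())          # roughly 0.0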


However, one drawback shared by the sigmoid and tanh functions is that when z is very large or very small, the derivative of both functions, i.e., the gradient, becomes very small, and gradient descent then proceeds very slowly (the vanishing-gradient problem).
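A quick numeric check of this (an added illustration, using the standard derivative formulas σ'(z) = σ(z)(1 - σ(z)) and tanh'(z) = 1 - tanh²(z)):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    for z in [0.0, 5.0, 10.0]:
        s = sigmoid(z)
        print(z, s * (1 - s), 1 - np.tanh(z) ** 2)
    # gradients shrink rapidly as |z| grows:
    # 0.25 and 1.0 at z = 0, but ~4.5e-5 and ~8.2e-9 at z = 10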


The rectified linear unit (ReLU), the function used earlier to explain what a neural network is, is also one of the activation functions commonly used in machine learning. Its expression is:

ReLU(z) = max(0, z)

[Figure: graph of ReLU, zero for z < 0 and the identity line for z ≥ 0]

When z is greater than 0, the derivative of the ReLU function is always 1, so with ReLU as the activation function, stochastic gradient descent converges much faster than with sigmoid or tanh. The drawback is that for z < 0 the gradient is 0, so information from the negative axis is lost (those units stop updating).
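A minimal sketch of ReLU and its derivative (added for illustration; the subgradient at z = 0 is conventionally taken as 0):

    import numpy as np

    def relu(z):
        return np.maximum(0.0, z)

    def relu_grad(z):
        # derivative: 1 for z > 0, 0 for z <= 0 (convention at z = 0)
        return (z > 0).astype(float)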

A modified version of the ReLU function, called Leaky ReLU, has the following expression:

LeakyReLU(z) = max(αz, z)

[Figure: graph of Leaky ReLU, slope α for z < 0 and slope 1 for z ≥ 0]

where α is a small constant (typically something like 0.01) that preserves a small, nonzero value and gradient on the negative axis.
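Correspondingly (again an added sketch, with α = 0.01 as an assumed default):

    import numpy as np

    def leaky_relu(z, alpha=0.01):
        # max(alpha * z, z): identity for z >= 0, small slope alpha for z < 0
        return np.maximum(alpha * z, z)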


Note that all of the activation functions above are non-linear. If a linear activation function were used instead, the output would just be a linear combination of the inputs, and using the neural network would be equivalent to using a plain linear model directly.

At that point the neural network is no better than a simple logistic regression model, and it loses its own advantage and value.
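A small demonstration of this point (an added sketch with made-up weight matrices): composing two linear layers is exactly one linear layer.

    import numpy as np

    rng = np.random.default_rng(0)
    W1, b1 = rng.standard_normal((4, 3)), rng.standard_normal(4)
    W2, b2 = rng.standard_normal((2, 4)), rng.standard_normal(2)
    x = rng.standard_normal(3)

    # two "layers" with linear (identity) activation
    out = W2 @ (W1 @ x + b1) + b2

    # a single equivalent linear layer
    W, b = W2 @ W1, W2 @ b1 + b2
    print(np.allclose(out, W @ x + b))  # True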


5 Forward propagation and backpropagation


During training, the result produced by forward propagation always differs somewhat from the true values of the training samples; this error is measured by the loss function.


To reduce this error, the most widely used algorithm is gradient descent: starting from the loss function and working backwards, layer by layer, we compute the partial derivatives with respect to each parameter. This is called backpropagation, usually abbreviated as the BP algorithm.

The derivative of the sigmoid function σ(z) = 1 / (1 + e^(-z)) is:

σ'(z) = σ(z)(1 - σ(z))

Backpropagation repeatedly applies the chain rule for derivatives of composite functions. For a single unit with z = wx + b and a = σ(z), the gradient of the loss L with respect to w is:

∂L/∂w = (∂L/∂a) · (∂a/∂z) · (∂z/∂w) = (∂L/∂a) · σ'(z) · x

Applying the chain rule in this way, layer by layer from the output back to the input, gives the whole derivation of backpropagation.
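To make the chain rule concrete, here is an added sketch of the gradient for a single sigmoid unit with cross-entropy loss (all names and values are hypothetical), checked against a finite difference:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    x, y = 2.0, 1.0        # one training example (made up)
    w, b = 0.5, -0.3       # current parameters (made up)

    # forward pass
    z = w * x + b
    a = sigmoid(z)
    loss = -(y * np.log(a) + (1 - y) * np.log(1 - a))

    # backward pass via the chain rule: dL/dw = dL/da * da/dz * dz/dw
    dL_da = -(y / a) + (1 - y) / (1 - a)
    da_dz = a * (1 - a)            # sigmoid'(z) = a(1 - a)
    dL_dw = dL_da * da_dz * x

    # finite-difference check
    eps = 1e-6
    a2 = sigmoid((w + eps) * x + b)
    loss2 = -(y * np.log(a2) + (1 - y) * np.log(1 - a2))
    print(dL_dw, (loss2 - loss) / eps)   # the two numbers should agree closely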

When implementing the algorithm, just as in gradient descent for logistic regression, the gradients should be vectorized over the training examples, averaged, and the parameters updated iteratively.
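As an added sketch of what that looks like (a vectorized logistic-regression update over m examples; the data shapes and learning rate are assumptions):

    import numpy as np

    def sigmoid(Z):
        return 1.0 / (1.0 + np.exp(-Z))

    # hypothetical data: n features, m examples
    n, m = 3, 100
    X = np.random.randn(n, m)
    Y = (np.random.rand(1, m) > 0.5).astype(float)

    w = np.zeros((n, 1))
    b = 0.0
    lr = 0.1

    for _ in range(1000):
        A = sigmoid(w.T @ X + b)   # forward pass, shape (1, m)
        dZ = A - Y                 # dL/dz for sigmoid + cross-entropy
        dw = (X @ dZ.T) / m        # averaged over all m examples
        db = dZ.mean()
        w -= lr * dw               # gradient-descent update
        b -= lr * db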


6 Deep neural networks

A deep neural network contains several hidden layers, constructed as described above. During training, the activation functions are chosen according to the situation, forward propagation computes the cost function, and then the BP algorithm, backpropagation with gradient descent, reduces the loss value.
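An added sketch of forward propagation through the layers of such a network (ReLU in the hidden layers, sigmoid at the output; the layer sizes are made up):

    import numpy as np

    def relu(Z):
        return np.maximum(0.0, Z)

    def sigmoid(Z):
        return 1.0 / (1.0 + np.exp(-Z))

    layer_sizes = [3, 4, 4, 1]     # hypothetical: input, two hidden, output
    params = [(np.random.randn(layer_sizes[l + 1], layer_sizes[l]) * 0.01,
               np.zeros((layer_sizes[l + 1], 1)))
              for l in range(len(layer_sizes) - 1)]

    def forward(X):
        A = X
        for i, (W, b) in enumerate(params):
            Z = W @ A + b
            # ReLU for hidden layers, sigmoid for the output layer
            A = sigmoid(Z) if i == len(params) - 1 else relu(Z)
        return A

    X = np.random.randn(3, 5)      # 5 made-up examples
    print(forward(X).shape)        # (1, 5)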


Deep neural networks with multiple hidden layers are better able to solve some problems. For example, when using a neural network to build a face-recognition system, given an input photo of a face, the first layer of the deep network can act as a feature detector responsible for finding edges and their orientations in the photo; convolutional neural networks (CNNs) are specialized for exactly this kind of recognition.

The second layer of the deep neural network can then detect the individual parts of a face in the photo, and a later layer can recognize different face shapes from the features acquired before.


In this way the first few layers of a deep neural network act as simple detection functions, which later layers combine into more complex learning functions. Starting from small details and building up, step by step, to larger and more complex models is exactly what building a deep neural network makes possible.


Source:

http://mp.weixin.qq.com/s?__biz=MzI4MDYzNzg4Mw==&mid=2247487329&idx=1&sn=d5db3df746b6a3f5872894f532948a06&chksm=ebb437b5dcc3bea3bb24d9b476b42197ccc907f230ace9718b912165e10b22210780b71c7d05&mpshare=1&scene=23&srcid=0320j1rzoy8i4rgrof7auvbz#rd
