This article shares content related to activation functions in machine learning. Let's take a look together; I hope it will be helpful to you in learning machine learning.
An activation function takes the output of a neural network layer as its input and transforms it. It is also used between two layers of a neural network.
So why do we need activation functions in a neural network?
For example, in logistic regression, the output is converted into 0/1 for classification. In neural networks, an activation function is used to decide whether the output is yes/no, or to map the output into a range, such as mapping the output to the digits 0-9 in handwritten digit recognition.
Activation functions generally fall into two classes: linear and nonlinear.
Linear or identity activation function
As with the function above, the output is not limited to any range, which does not suit the purposes described above.
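As a minimal sketch (NumPy is assumed here; the original article shows no code), the linear or identity activation simply returns its input, so the output is not squashed into any fixed range:

```python
import numpy as np

def linear(x):
    # Identity activation: the output equals the input, so it is unbounded
    return x

x = np.array([-100.0, -1.0, 0.0, 1.0, 100.0])
print(linear(x))  # values pass through unchanged -- no squashing into a fixed range
```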
Nonlinear activation functions
There are several properties of activation functions to understand:
Derivative or differentiability: when the optimization method is gradient-based, the gradient must be computable, so the function must be differentiable.
Monotonicity: when the activation function is monotonic, a single-layer network can be guaranteed to be a convex function.
Output value range: when the output of the activation function is bounded, gradient-based optimization is more stable, because the feature representation is more strongly constrained by the finite weights; when the output of the activation function is unbounded, model training can be more efficient, but in that case a smaller learning rate is generally required.
Here are a few common activation functions:
Sigmoid activation function
As shown above, the output always lies between 0 and 1, and it changes more slowly as it approaches 0 or 1. This is useful when the model predicts a probability. The function is differentiable, so the slope can be computed between any two points. The function itself is monotonic, but its derivative is not. This activation function can cause the neural network to get stuck during training; some of its disadvantages are as follows:
1. When the input is very large or very small, the gradient is close to 0. Therefore, when the initial values are very large, the neurons' gradients vanish and training becomes more difficult.
2. The mean of this function's output is not 0. As a result, neurons in the next layer take the non-zero-centered output of the previous layer as their input signal, and the resulting gradients are always positive (a short sketch follows this list).
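A minimal NumPy sketch of the sigmoid and its derivative (the function names are illustrative, not from the article); note how the gradient collapses toward 0 at the extremes:

```python
import numpy as np

def sigmoid(x):
    # Maps any real input into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # Derivative: sigmoid(x) * (1 - sigmoid(x)); close to 0 for large |x|
    s = sigmoid(x)
    return s * (1.0 - s)

x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print(sigmoid(x))       # outputs squashed into (0, 1)
print(sigmoid_grad(x))  # gradients near 0 at the extremes (vanishing gradient)
```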
Tanh (hyperbolic tangent) activation function
Similar to sigmoid, but generally better: the output lies between -1 and 1. Unlike sigmoid, the function's output has a mean of 0. It is commonly used in binary classification problems.
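A small sketch comparing tanh with sigmoid (NumPy assumed, as above): tanh's outputs are centered around 0, while sigmoid's are not.

```python
import numpy as np

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])

# tanh output lies in (-1, 1) and is zero-centered
print(np.tanh(x))

# sigmoid output lies in (0, 1); its mean is not 0
print(1.0 / (1.0 + np.exp(-x)))
```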
ReLU (rectified linear unit) activation function
This is currently the most widely used activation function in neural networks, mostly in convolutional neural networks and deep neural networks. As shown above, its range is from 0 to infinity. Both the function and its derivative are monotonic. Some of its advantages are:
1. Convergence is much faster than with sigmoid and tanh.
2. Compared with sigmoid and tanh, because of the function's form, only a simple threshold comparison is needed to compute the activation value. There are also drawbacks: for example, if a very large gradient flows through a ReLU neuron, the parameter update can be so large that the neuron stops activating on subsequent data (see the sketch below).
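A minimal ReLU sketch: the activation is just a threshold at zero, which is why it is cheap to compute, and its derivative is 0 for negative inputs, which is what allows a neuron pushed into the negative region to stop learning.

```python
import numpy as np

def relu(x):
    # Thresholding at zero: max(0, x)
    return np.maximum(0.0, x)

def relu_grad(x):
    # Derivative is 1 for positive inputs, 0 otherwise
    return (x > 0).astype(float)

x = np.array([-3.0, -0.1, 0.0, 0.1, 3.0])
print(relu(x))       # [0.  0.  0.  0.1 3. ]
print(relu_grad(x))  # [0. 0. 0. 1. 1.] -- inputs stuck in the negative region get no gradient
```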
Softmax activation function
Softmax is used in multi-class classification: it maps the outputs of multiple neurons into the (0, 1) interval so that they can be interpreted as probabilities, which is how the multi-class decision is made.
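A small softmax sketch (subtracting the maximum logit for numerical stability is a standard trick assumed here, not stated in the article): the outputs fall in (0, 1) and sum to 1, so they can be read as class probabilities.

```python
import numpy as np

def softmax(z):
    # Shift by the max for numerical stability; the result is mathematically unchanged
    z = z - np.max(z)
    e = np.exp(z)
    return e / np.sum(e)

logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)
print(probs)          # approx. [0.659 0.242 0.099]
print(np.sum(probs))  # 1.0 -- values in (0, 1) that sum to 1
```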
Why is the derivative, or differentiability, mentioned above? When updating parameters with gradient descent, you need to know the slope of the curve, because the negative gradient is the direction of steepest descent. Therefore, the derivative of the activation function is needed when training a neural network.
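As a toy illustration of this point (a hypothetical one-parameter example, not from the article), a gradient-descent step uses the activation's derivative via the chain rule and moves the weight against the slope:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy setup: one weight w, one input x, squared-error loss against target y
w, x, y, lr = 0.5, 2.0, 1.0, 0.1
for step in range(3):
    out = sigmoid(w * x)
    grad = (out - y) * out * (1 - out) * x   # chain rule uses sigmoid's derivative out*(1-out)
    w -= lr * grad                           # step in the direction of steepest descent
    print(step, w, out)
```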
Source: the Internet