This article shares content related to activation functions in machine learning. Let's take a look together; I hope it will be helpful to you in learning machine learning.
An activation function takes the output of a neural network layer as its input and transforms it. It is also used between two layers of a neural network.
So why do we need activation functions in a neural network?
For example, in logistic regression, the output is converted into 0/1 for classification. In neural networks, an activation function is used to decide whether the output is yes/no, or to map the output into a range, such as mapping the output to the digits 0-9 in handwritten digit recognition.
Activation functions generally fall into two classes: linear and nonlinear.
Linear or identity activation function
As with the function above, the output is not limited to any range, which does not suit the purposes described above.
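As a minimal sketch (NumPy is assumed here; the original article shows no code), the linear or identity activation simply returns its input, so the output is not squashed into any fixed range:

```python
import numpy as np

def linear(x):
    # Identity activation: the output equals the input, so it is unbounded
    return x

x = np.array([-100.0, -1.0, 0.0, 1.0, 100.0])
print(linear(x))  # values pass through unchanged -- no squashing into a fixed range
```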
Nonlinear activation functions
There are several properties of activation functions to understand:
Derivative or differentiability: when the optimization method is gradient-based, the gradient must be computable, so the function must be differentiable.
Monotonicity: when the activation function is monotonic, a single-layer network can be guaranteed to be a convex function.
Output value range: when the output of the activation function is bounded, gradient-based optimization is more stable, because the feature representation is more strongly constrained by the finite weights; when the output of the activation function is unbounded, model training can be more efficient, but in that case a smaller learning rate is generally required.
Here are a few common activation functions:
Sigmoid activation function
As shown above, the output always lies between 0 and 1, and it changes more slowly as it approaches 0 or 1. This is useful when the model predicts a probability. The function is differentiable, so the slope can be computed between any two points. The function itself is monotonic, but its derivative is not. This activation function can cause the neural network to get stuck during training; some of its disadvantages are as follows:
1. When the input is very large or very small, the gradient is close to 0. Therefore, when the initial values are very large, the neurons' gradients vanish and training becomes more difficult.
2. The mean of this function's output is not 0. As a result, neurons in the next layer take the non-zero-centered output of the previous layer as their input signal, and the resulting gradients are always positive (a short sketch follows this list).
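A minimal NumPy sketch of the sigmoid and its derivative (the function names are illustrative, not from the article); note how the gradient collapses toward 0 at the extremes:

```python
import numpy as np

def sigmoid(x):
    # Maps any real input into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # Derivative: sigmoid(x) * (1 - sigmoid(x)); close to 0 for large |x|
    s = sigmoid(x)
    return s * (1.0 - s)

x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print(sigmoid(x))       # outputs squashed into (0, 1)
print(sigmoid_grad(x))  # gradients near 0 at the extremes (vanishing gradient)
```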
Tanh (hyperbolic tangent) activation function
Similar to sigmoid, but generally better: the output lies between -1 and 1. Unlike sigmoid, the function's output has a mean of 0. It is commonly used in binary classification problems.
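A small sketch comparing tanh with sigmoid (NumPy assumed, as above): tanh's outputs are centered around 0, while sigmoid's are not.

```python
import numpy as np

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])

# tanh output lies in (-1, 1) and is zero-centered
print(np.tanh(x))

# sigmoid output lies in (0, 1); its mean is not 0
print(1.0 / (1.0 + np.exp(-x)))
```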
ReLU (rectified linear unit) activation function
This is currently the most widely used activation function in neural networks, mostly in convolutional neural networks and deep neural networks. As shown above, its range is from 0 to infinity. Both the function and its derivative are monotonic. Some of its advantages are:
1. Convergence is much faster than with sigmoid and tanh.
2. Compared with sigmoid and tanh, because of the function's form, only a simple threshold comparison is needed to compute the activation value. There are also drawbacks: for example, if a very large gradient flows through a ReLU neuron, the parameter update can be so large that the neuron stops activating on subsequent data (see the sketch below).
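A minimal ReLU sketch: the activation is just a threshold at zero, which is why it is cheap to compute, and its derivative is 0 for negative inputs, which is what allows a neuron pushed into the negative region to stop learning.

```python
import numpy as np

def relu(x):
    # Thresholding at zero: max(0, x)
    return np.maximum(0.0, x)

def relu_grad(x):
    # Derivative is 1 for positive inputs, 0 otherwise
    return (x > 0).astype(float)

x = np.array([-3.0, -0.1, 0.0, 0.1, 3.0])
print(relu(x))       # [0.  0.  0.  0.1 3. ]
print(relu_grad(x))  # [0. 0. 0. 1. 1.] -- inputs stuck in the negative region get no gradient
```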
Softmax activation function
Softmax is used in multi-class classification: it maps the outputs of multiple neurons into the (0, 1) interval so that they can be interpreted as probabilities, which is how the multi-class decision is made.
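A small softmax sketch (subtracting the maximum logit for numerical stability is a standard trick assumed here, not stated in the article): the outputs fall in (0, 1) and sum to 1, so they can be read as class probabilities.

```python
import numpy as np

def softmax(z):
    # Shift by the max for numerical stability; the result is mathematically unchanged
    z = z - np.max(z)
    e = np.exp(z)
    return e / np.sum(e)

logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)
print(probs)          # approx. [0.659 0.242 0.099]
print(np.sum(probs))  # 1.0 -- values in (0, 1) that sum to 1
```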
Why is the derivative, or differentiability, mentioned above? When updating parameters with gradient descent, you need to know the slope of the curve, because the negative gradient is the direction of steepest descent. Therefore, the derivative of the activation function is needed when training a neural network.
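As a toy illustration of this point (a hypothetical one-parameter example, not from the article), a gradient-descent step uses the activation's derivative via the chain rule and moves the weight against the slope:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy setup: one weight w, one input x, squared-error loss against target y
w, x, y, lr = 0.5, 2.0, 1.0, 0.1
for step in range(3):
    out = sigmoid(w * x)
    grad = (out - y) * out * (1 - out) * x   # chain rule uses sigmoid's derivative out*(1-out)
    w -= lr * grad                           # step in the direction of steepest descent
    print(step, w, out)
```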
Source: the Internet