Origins: a study of traditional activation functions and neuron activation frequency
The two most commonly used activation functions in traditional neural networks, the sigmoid family (logistic sigmoid and tanh), were long regarded as the core of neural networks.
Mathematically, the nonlinear sigmoid function has a large signal gain in the central region and a small signal gain on the two sides, which is useful for mapping the signal into feature space.
From the point of view of neuroscience, the central region resembles the excited state of a neuron and the two side regions resemble its inhibited state, so when training a neural network the key features can be pushed toward the central region and the non-key features toward the two sides.
Whichever explanation you prefer, the sigmoid looks a lot smarter than the early linear activation function (y = x) and the step activation functions (-1/1 or 0/1).
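The gain argument can be checked numerically. The following is a minimal sketch, not taken from the original article: it treats "signal gain" as the derivative of the activation function, and the function names (logistic_gain, tanh_gain) and the sample points are chosen here purely for illustration.

```python
import math


def logistic(x: float) -> float:
    """Logistic sigmoid: 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def logistic_gain(x: float) -> float:
    """Derivative of the logistic sigmoid: s(x) * (1 - s(x))."""
    s = logistic(x)
    return s * (1.0 - s)


def tanh_gain(x: float) -> float:
    """Derivative of tanh: 1 - tanh(x)^2."""
    return 1.0 - math.tanh(x) ** 2


# Sample points are an assumption: one central point and a few on each side.
for x in (-6.0, -2.0, 0.0, 2.0, 6.0):
    print(f"x={x:+.1f}  logistic gain={logistic_gain(x):.4f}  "
          f"tanh gain={tanh_gain(x):.4f}")
```

At x = 0 the gain is 0.25 for the logistic sigmoid and 1 for tanh, and both fall toward 0 as |x| grows. By comparison, the linear function y = x has a constant gain of 1 everywhere, and a step function has zero gain everywhere except at its jump.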
In 2001, the neuroscientists Dayan and Abbott modeled a more accurate activation model for neurons receiving signals:
This model
ReLU activation function