A sparse autoencoder is a three-layer network consisting of an input layer, a hidden layer, and an output layer. In the autoencoder described earlier, every neuron in the network uses the same activation function. A linear decoder modifies this definition: the output layer and the hidden layer use different activation functions. The model obtained with a linear decoder is therefore easier to apply, and it is more robust to changes in its parameters.
The forward propagation in the network is given by:

z(2) = W(1) a(1) + b(1)
a(2) = f(z(2))
z(3) = W(2) a(2) + b(2)
a(3) = f(z(3))

Here a(3) is the output. In the autoencoder, a(3) approximately reconstructs the input x = a(1).
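As a concrete illustration, here is a minimal NumPy sketch of this forward pass (the variable names and shapes are assumptions for illustration, not the original code):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def autoencoder_forward(x, W1, b1, W2, b2):
    """Forward pass of a 3-layer autoencoder with sigmoid at both layers."""
    a1 = x                      # a(1): input layer
    z2 = W1 @ a1 + b1           # z(2) = W(1) a(1) + b(1)
    a2 = sigmoid(z2)            # a(2) = f(z(2)), hidden activations
    z3 = W2 @ a2 + b2           # z(3) = W(2) a(2) + b(2)
    a3 = sigmoid(z3)            # a(3) = f(z(3)), reconstruction of x
    return a2, a3
```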
For an autoencoder whose last layer uses a sigmoid (or tanh) activation function, the outputs are confined to [0,1], so when f(z(3)) is the sigmoid (tanh) function, the input must be restricted or scaled so that it also lies in [0,1]. For raw pixel data such as MNIST this is easy to satisfy, but for other input data x it is difficult to meet this requirement; for example, input that has been processed by PCA whitening does not fall within the [0,1] range.
A simple fix for this problem is to set a(3) = z(3), that is, to use the identity function f(z) = z as the activation function of the output layer, so that a(3) = f(z(3)) = z(3). This special activation function is called the linear (identity) activation function.
The neurons in the hidden layer of a linear decoder still use the sigmoid (or tanh) activation function, so the hidden-unit activations are a(2) = σ(W(1) x + b(1)), where σ is the sigmoid function, x is the input, and W(1) and b(1) are the weights and biases of the hidden units. Only the output layer uses the linear activation function. An autoencoder consisting of a sigmoid (or tanh) hidden layer and a linear output layer is called a linear decoder.
In the linear decoder, a(3) = z(3) = W(2) a(2) + b(2). Because the output is a linear function of the hidden-unit activations, varying W(2) allows the output value a(3) to be greater than 1 or less than 0. This avoids having to scale the output data to [0,1], as a sigmoid output layer would require.
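For comparison, a sketch of the same forward pass for the linear decoder, reusing the sigmoid helper above; only the output activation changes (names are illustrative assumptions):

```python
def linear_decoder_forward(x, W1, b1, W2, b2):
    """Forward pass with a sigmoid hidden layer and a linear (identity) output layer."""
    z2 = W1 @ x + b1
    a2 = sigmoid(z2)            # hidden layer still uses the sigmoid
    z3 = W2 @ a2 + b2
    a3 = z3                     # identity activation: a(3) = z(3), may fall outside [0,1]
    return a2, a3
```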
As the activation function of the output units changes, the gradients of the output units change accordingly. Previously, the error term of each output unit was defined as:

δ(3) = −(y − a(3)) ⊙ f′(z(3))

where ⊙ denotes the element-wise product.
Here y = x is the desired output, a(3) is the output of the autoencoder, and f is the activation function. Because the activation function at the output layer is f(z) = z, we have f′(z) = 1, so the above formula simplifies to

δ(3) = −(y − a(3)).
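In code, the simplification amounts to dropping the f′(z(3)) factor at the output layer; a small sketch, assuming y and a3 are NumPy arrays of the same shape:

```python
# Sigmoid output layer: delta(3) = -(y - a3) * f'(z3), where f'(z3) = a3 * (1 - a3)
delta3_sigmoid = -(y - a3) * a3 * (1.0 - a3)

# Linear (identity) output layer: f'(z3) = 1, so the derivative factor disappears
delta3_linear = -(y - a3)
```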
The error term of the hidden layer is still computed with the backpropagation algorithm:

δ(2) = ((W(2))ᵀ δ(3)) ⊙ f′(z(2))
Because the hidden layer uses a sigmoid (or tanh) activation function f, the derivative of the sigmoid (or tanh) function remains in the formula above; for the sigmoid, f′(z(2)) = a(2) ⊙ (1 − a(2)). In other words, only the output-layer residual of the linear decoder differs from that of the ordinary autoencoder.
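A sketch of the hidden-layer error term and the corresponding gradients for a single example, assuming the variables from the earlier snippets (weight decay and sparsity terms omitted):

```python
# Hidden-layer error term: delta(2) = (W(2)^T delta(3)) * f'(z(2)),
# with f'(z(2)) = a2 * (1 - a2) for the sigmoid hidden layer.
delta2 = (W2.T @ delta3_linear) * a2 * (1.0 - a2)

# Gradients of the reconstruction error with respect to the parameters
grad_W2 = np.outer(delta3_linear, a2)
grad_b2 = delta3_linear
grad_W1 = np.outer(delta2, x)
grad_b1 = delta2
```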
Linear Decoder code:
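The original code listing is not reproduced here; the following is a minimal, self-contained NumPy sketch of a linear-decoder cost and gradient computation over a batch of examples (function and variable names are assumptions; the weight-decay and sparsity penalty terms of a full sparse autoencoder are omitted for brevity):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def linear_decoder_cost_grad(X, W1, b1, W2, b2):
    """Mean squared reconstruction cost and gradients for a linear decoder.

    X is an (n_features, m) batch of inputs; the hidden layer uses the
    sigmoid activation and the output layer uses the identity activation.
    """
    m = X.shape[1]

    # Forward pass
    z2 = W1 @ X + b1[:, None]
    a2 = sigmoid(z2)
    z3 = W2 @ a2 + b2[:, None]
    a3 = z3                                  # linear (identity) output layer

    # Squared-error reconstruction cost
    cost = 0.5 * np.sum((a3 - X) ** 2) / m

    # Backward pass
    delta3 = -(X - a3)                       # f'(z3) = 1 for the identity
    delta2 = (W2.T @ delta3) * a2 * (1.0 - a2)

    grad_W2 = delta3 @ a2.T / m
    grad_b2 = delta3.sum(axis=1) / m
    grad_W1 = delta2 @ X.T / m
    grad_b1 = delta2.sum(axis=1) / m

    return cost, grad_W1, grad_b1, grad_W2, grad_b2
```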
(VI) 6.16 Neural Networks: Linear Decoders and Their Implementation