Over the past few years, the feature-extraction capability of convolutional neural networks has made deep learning popular again. In fact, the algorithm was proposed many years ago, but because of its computational complexity it was not widely used for a long time.
As a general rule, the output of a convolution layer is computed in the following form:

$$x_j^l = f\left(\sum_{i \in M_j} x_i^{l-1} * k_{ij}^l + b_j^l\right)$$

where $x_j^l$ is the $j$-th feature map of the current ($l$-th) convolution layer and $x_i^{l-1}$ is the $i$-th feature map of the previous layer; $k_{ij}^l$ is the convolution kernel between the $j$-th feature map of the current layer and the $i$-th feature map of the previous layer; $M_j$ is the set of previous-layer feature maps that take part in the convolution; $b_j^l$ is the bias of the $j$-th feature map of the current layer; and $f$ is the activation function.
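To make the formula concrete, here is a minimal R sketch of computing one output feature map $x_j$ from two previous-layer maps. The toy sizes and the helper conv2d_valid are illustrative only (it implements the cross-correlation used by most CNN frameworks; it is not a library function):

conv2d_valid <- function(x, k) {
  # "valid" 2D cross-correlation of input x with kernel k
  n <- nrow(x) - nrow(k) + 1
  m <- ncol(x) - ncol(k) + 1
  out <- matrix(0, n, m)
  for (r in 1:n) for (c in 1:m)
    out[r, c] <- sum(x[r:(r + nrow(k) - 1), c:(c + ncol(k) - 1)] * k)
  out
}
sigmoid <- function(z) 1 / (1 + exp(-z))
x_prev  <- list(matrix(runif(25), 5, 5), matrix(runif(25), 5, 5))  # previous-layer maps x_i
kernels <- list(matrix(runif(9), 3, 3), matrix(runif(9), 3, 3))    # kernels k_ij
b_j <- 0.1                                                         # bias b_j
# x_j = f( sum_i x_i * k_ij + b_j )
x_j <- sigmoid(Reduce(`+`, Map(conv2d_valid, x_prev, kernels)) + b_j)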
The weights and biases of the convolution layers are obtained by stochastic gradient descent:

$$k_{ij}^l \leftarrow k_{ij}^l - \alpha \frac{\partial E}{\partial k_{ij}^l}, \qquad b_j^l \leftarrow b_j^l - \alpha \frac{\partial E}{\partial b_j^l}$$

where $\alpha$ is the learning rate.
The gradients of the loss function with respect to the convolution-layer parameters are obtained by the chain rule: the error term $\delta$ propagated back from the adjacent layer is combined with the layer's inputs to yield $\partial E / \partial k_{ij}^l$ and $\partial E / \partial b_j^l$.
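As a small illustration of the update rule, here is one stochastic-gradient-descent step for a plain least-squares model in R (a sketch of the principle only, not the full backpropagation through a CNN):

# One SGD step for a least-squares loss; grad is dE/dw obtained by the chain rule
sgd_step <- function(w, x_batch, y_batch, alpha = 0.01) {
  y_hat <- x_batch %*% w
  grad  <- t(x_batch) %*% (y_hat - y_batch) / nrow(x_batch)
  w - alpha * grad            # w <- w - alpha * dE/dw
}
set.seed(1)
X <- matrix(rnorm(40), 10, 4)
y <- X %*% c(1, -2, 0.5, 3)
w <- rnorm(4)
for (i in 1:100) w <- sgd_step(w, X, y)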
Several forms of activation function are used in convolutional neural networks, such as the sigmoid, the hyperbolic tangent, and the rectified-linear (ReLU) family. In the leaky-ReLU form, the negative-slope coefficient $a$ is a fixed parameter; in the randomized form, $a$ is instead drawn from a uniform distribution for each training batch, and the mean of that distribution is used at test time.
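For reference, a few of these activation functions written out in R (the RReLU interval bounds lo and hi are illustrative values):

sigmoid    <- function(z) 1 / (1 + exp(-z))
tansig     <- function(z) 2 / (1 + exp(-2 * z)) - 1          # equivalent to tanh(z)
relu       <- function(z) pmax(z, 0)
leaky_relu <- function(z, a = 0.01) ifelse(z > 0, z, a * z)  # a is a fixed parameter
# Randomized leaky ReLU: during training a is drawn from U(lo, hi) for each batch;
# at test time the mean (lo + hi) / 2 is used instead.
rrelu <- function(z, lo = 1/8, hi = 1/3, train = TRUE) {
  a <- if (train) runif(1, lo, hi) else (lo + hi) / 2
  ifelse(z > 0, z, a * z)
}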
As the above shows, learning in a convolutional neural network requires iterative gradient descent, and its time complexity is too high for practical applications such as industrial inspection. To solve this problem, the academic community proposed the extreme learning machine (ELM), a single-hidden-layer feed-forward network that dispenses with gradient iteration entirely, and it has since been widely used.
The output weights of the extreme learning machine are obtained in closed form by least squares, which amounts to a matrix inversion:

$$\hat{\beta} = H^{\dagger} T$$

where $H^{\dagger}$ is the Moore-Penrose generalized inverse of the hidden-layer output matrix $H$ and $T$ is the matrix of training targets.
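A quick sketch of this closed-form solution in R, using MASS::ginv with random toy matrices standing in for $H$ and $T$:

library(MASS)
set.seed(1)
H    <- matrix(runif(50 * 10), nrow = 50)  # hidden-layer outputs: 50 samples x 10 hidden units
Tmat <- matrix(rnorm(50), ncol = 1)        # training targets, one output per sample
beta <- ginv(H) %*% Tmat                   # beta = H^+ T, the minimum-norm least-squares solution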
1) Because the extreme learning machine only needs to compute a generalized inverse, its training is much faster than that of gradient-based learning algorithms.
2) Gradient-based learning algorithms suffer from many problems, such as a learning rate that is hard to choose and convergence to local minima of the network; the extreme learning machine largely avoids these problems and achieves better results in classification.
3) Unlike other neural-network algorithms, the extreme learning machine can use non-differentiable activation functions.
4) The training procedure of the extreme learning machine is not complicated: it completes learning in only three steps (randomize the hidden-layer parameters, compute the hidden-layer output matrix, solve the output weights by least squares).
Below, R code is used to explain the extreme learning machine (the functions shown come from the elmNN package).

### The training process takes four steps:
elmtrain.default <- function(x, y, nhid, actfun, ...) {
  require(MASS)
  if (nhid < 1) stop("ERROR: number of hidden neurons must be >= 1")
  ######## 1. Select the data: x and y
  T <- t(y)
  P <- t(x)
  ######## 2. Randomly generate the input weights and transform the x values
  inpweight <- randomMatrix(nrow(P), nhid, -1, 1)
  tempH <- inpweight %*% P
  biashid <- runif(nhid, min = -1, max = 1)
  biasMatrix <- matrix(rep(biashid, ncol(P)), nrow = nhid, ncol = ncol(P), byrow = FALSE)
  tempH <- tempH + biasMatrix
  ######## 3. Map the transformed x values to a high-dimensional space; the sig function is the most common choice
  if (actfun == "sig") H <- 1 / (1 + exp(-1 * tempH))
  else if (actfun == "sin") H <- sin(tempH)
  else if (actfun == "radbas") H <- exp(-1 * (tempH^2))
  else if (actfun == "hardlim") H <- hardlim(tempH)
  else if (actfun == "hardlims") H <- hardlims(tempH)
  else if (actfun == "satlins") H <- satlins(tempH)
  else if (actfun == "tansig") H <- 2 / (1 + exp(-2 * tempH)) - 1
  else if (actfun == "tribas") H <- tribas(tempH)
  else if (actfun == "poslin") H <- poslin(tempH)
  else if (actfun == "purelin") H <- tempH
  else stop(paste("ERROR:", actfun, "is not a valid activation function.", sep = " "))
  ######## 4. Fit the model coefficients, i.e. a in Y = aX
  outweight <- ginv(t(H), tol = sqrt(.Machine$double.eps)) %*% t(T)
  Y <- t(t(H) %*% outweight)
  model <- list(inpweight = inpweight, biashid = biashid, outweight = outweight,
                actfun = actfun, nhid = nhid, predictions = t(Y))
  model$fitted.values <- t(Y)
  model$residuals <- y - model$fitted.values
  model$call <- match.call()
  class(model) <- "elmNN"
  model
}
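A quick stand-alone check of the function just defined. Note that elmtrain.default calls randomMatrix, an internal helper of the elmNN package; the stand-in below is only an assumption consistent with how it is used above (it must return an nhid x nfeatures weight matrix):

# Hypothetical stand-in for elmNN's internal randomMatrix helper
randomMatrix <- function(nrow, ncol, lower, upper)
  matrix(runif(nrow * ncol, lower, upper), nrow = ncol, ncol = nrow)

set.seed(42)
x <- matrix(runif(100), ncol = 2)      # 50 samples, 2 features
y <- matrix(rowSums(x), ncol = 1)      # 50 targets
fit <- elmtrain.default(x, y, nhid = 10, actfun = "sig")
head(fit$residuals)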
### The test (prediction) process likewise takes four steps:
predict.elmNN <- function(object, newdata = NULL, ...)
{
  if (is.null(newdata))
    predictions <- fitted(object)
  else {
    if (!is.null(object$formula)) {
      x <- model.matrix(object$formula, newdata)
    }
    else {
      x <- newdata
    }
    ######## 1. Get the parameters stored in the trained model
    inpweight <- object$inpweight
    biashid <- object$biashid
    outweight <- object$outweight
    actfun <- object$actfun
    nhid <- object$nhid
    TV.P <- t(x)
    ######## 2. Transform the x values with those parameters
    tmpHTest <- inpweight %*% TV.P
    biasMatrixTE <- matrix(rep(biashid, ncol(TV.P)), nrow = nhid,
                           ncol = ncol(TV.P), byrow = FALSE)
    tmpHTest <- tmpHTest + biasMatrixTE
    ######## 3. Map to a high-dimensional space, usually with the sig function
    if (actfun == "sig") HTest <- 1 / (1 + exp(-1 * tmpHTest))
    else if (actfun == "sin") HTest <- sin(tmpHTest)
    else if (actfun == "radbas") HTest <- exp(-1 * (tmpHTest^2))
    else if (actfun == "hardlim") HTest <- hardlim(tmpHTest)
    else if (actfun == "hardlims") HTest <- hardlims(tmpHTest)
    else if (actfun == "satlins") HTest <- satlins(tmpHTest)
    else if (actfun == "tansig") HTest <- 2 / (1 + exp(-2 * tmpHTest)) - 1
    else if (actfun == "tribas") HTest <- tribas(tmpHTest)
    else if (actfun == "poslin") HTest <- poslin(tmpHTest)
    else if (actfun == "purelin") HTest <- tmpHTest
    else stop(paste("ERROR:", actfun, "is not a valid activation function.", sep = " "))
    ######## 4. Compute the predicted values, i.e. Y(forecast) = aX
    TY <- t(t(HTest) %*% outweight)
    predictions <- t(TY)
  }
  predictions
}
The above describes the internal structure of the extreme learning machine in R. The following is an example of using the elmNN package to make predictions with an extreme learning machine:
library(elmNN)
set.seed(1234)
Var1 <- runif(50, 0, 100)
sqrt.data <- data.frame(Var1, Sqrt = sqrt(Var1))
model <- elmtrain.formula(Sqrt ~ Var1, data = sqrt.data, nhid = 10, actfun = "sig")
new <- data.frame(Sqrt = 0, Var1 = runif(50, 0, 100))
p <- predict(model, newdata = new)
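To sanity-check the fit, the predictions can be compared with the true square roots of the new inputs:

# Compare the ELM predictions with the true square roots
head(data.frame(predicted = as.vector(p), actual = sqrt(new$Var1)))
mean(abs(as.vector(p) - sqrt(new$Var1)))   # mean absolute error on the new data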
Reposted from: https://ask.hellobi.com/blog/Zason/4543 (original title: "Fast deep learning for regression prediction in R")