Origins: the autoencoder
A single autoencoder is, at best, a beefed-up patch on PCA; running it only once is hardly satisfying.
So in their 2007 paper Greedy Layer-Wise Training of Deep Networks, Bengio et al., modeling on the DBN built from stacked RBMs, proposed the stacked autoencoder, which gave fresh momentum to the application of unsupervised learning in deep networks.
The technique is called layer-wise pre-training: unsupervised learning is used to pre-train the network layer by layer, initializing the deep network's parameters in place of the traditional random-small-value initialization. Once pre-training is complete, the learned parameters serve as the starting point for supervised training.
Part I: Principle
An unsupervised learning network is trained in the opposite way from a supervised one.
In a supervised network, each layer's parameters W are constrained by the output layer's error function, so the gradient of layer i's parameters depends on the gradient of layer i+1, forming the back-propagation pattern of "one iteration, update the whole network".
In unsupervised learning, however, each encoder's parameters W are constrained only by that layer's own input. Encoder i can therefore be trained to completion first, its parameters handed to layer i, and layer i's output, computed with those well-trained parameters, propagated forward to layer i+1 before training resumes there.
This forms a new training pattern: "all iterations, update a single layer". Layer i+1 benefits greatly, because its input has already absorbed the essence of layer i's full course of training.
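The "all iterations, update a single layer" flow can be sketched in plain NumPy (a toy illustration, not the Theano tutorial code; `TinyAutoencoder`, `pretrain`, and all hyperparameters here are invented for the sketch): each encoder is trained to completion on its input, and only then is its code fed forward as the next encoder's input.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TinyAutoencoder:
    """Tied-weight autoencoder trained by batch gradient descent on MSE."""
    def __init__(self, rng, n_in, n_hidden):
        self.W = rng.uniform(-0.5, 0.5, (n_in, n_hidden))
        self.b = np.zeros(n_hidden)       # encoder bias
        self.b_out = np.zeros(n_in)       # decoder bias

    def encode(self, x):
        return sigmoid(x @ self.W + self.b)

    def loss(self, x):
        r = sigmoid(self.encode(x) @ self.W.T + self.b_out)
        return np.mean((r - x) ** 2)

    def train_step(self, x, lr=0.5):
        n = len(x)
        h = self.encode(x)
        r = sigmoid(h @ self.W.T + self.b_out)        # reconstruction
        dr = (r - x) * r * (1.0 - r)                  # grad w.r.t. decoder pre-activation
        dh = (dr @ self.W) * h * (1.0 - h)            # grad w.r.t. encoder pre-activation
        self.W -= lr * (x.T @ dh + dr.T @ h) / n      # tied weights: both paths contribute
        self.b -= lr * dh.mean(axis=0)
        self.b_out -= lr * dr.mean(axis=0)

def pretrain(rng, layer_sizes, X, epochs=50):
    """Greedy layer-wise pre-training: fully train encoder i, then feed its
    codes forward as the input of encoder i+1."""
    encoders, inp = [], X
    for n_hidden in layer_sizes:
        ae = TinyAutoencoder(rng, inp.shape[1], n_hidden)
        for _ in range(epochs):       # "all iterations" spent on this single layer
            ae.train_step(inp)
        encoders.append(ae)
        inp = ae.encode(inp)          # propagate the trained codes to the next layer
    return encoders
```

In a real pipeline the stacked encoders would then initialize a supervised network for fine-tuning; the point here is only the training order.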
Part II: Code and implementation
Main reference: http://deeplearning.net/tutorial/SdA.html
The stacked network constructs every layer and every encoder in its constructor and stores them.
When building the stack in Theano, the easiest place to go wrong is the parameter sharing between the encoder (dA) and its layer.
Recall that Python lists are copied shallowly; likewise, all of Theano's shared variables are passed around by reference, i.e. as shallow copies.
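A minimal pure-Python reminder of what "shallow copy" means here (no Theano needed): mutating a shared object is visible through every name bound to it, but rebinding a name silently breaks the link.

```python
a = [1, 2, 3]
b = a                      # shallow copy: b references the same list object
b[0] = 99
assert a == [99, 2, 3]     # in-place mutation is seen through both names

b = [7, 7, 7]              # rebinding b does NOT touch a
assert a == [99, 2, 3]     # a still sees only the earlier mutation
```

The bugs below are all variations on confusing these two operations.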
So a first, wrong attempt looks like this:
```python
def __init__(self, rng, input, n_in, n_out, layer_sizes):
    ...
    for i in xrange(len(layer_sizes)):
        ...
        da.W = hidden_layer.W
        da.b_out = hidden_layer.b
```
Calling grad on the dA's cost from outside then raises an error, complaining that the params do not match the cost function.
This is because the symbolic tensor expression of the cost is fixed at the moment it is written, i.e. when the dA object is constructed; the da.W inside that expression is the shared variable created with random initial values.
Reassigning da.W after construction points the name at different memory (a shallow copy is just a reference), so the gradient computed against that cost is simply wrong.
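A pure-Python analogy of this failure (hypothetical names, no Theano): the cost expression captures the object that da.W referenced at construction time, so rebinding da.W afterwards has no effect on it.

```python
class DA:
    """Toy stand-in for the Theano dA: the cost closes over W at construction."""
    def __init__(self):
        self.W = [1.0, 2.0]          # stands in for the randomly initialized shared variable
        w = self.W                    # the cost "expression" captures THIS object
        self.cost = lambda: sum(w)

da = DA()
da.W = [5.0, 5.0]                     # rebind the attribute after construction
print(da.cost())                      # still 3.0, not 10.0: the cost never saw the rebinding
```

In Theano the symptom is harsher: grad is asked to differentiate the cost with respect to a variable that does not appear in it, hence the params/cost mismatch error.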
The assignment direction should in fact be reversed:
```python
def __init__(self, rng, input, n_in, n_out, layer_sizes):
    ...
    for i in xrange(len(layer_sizes)):
        ...
        hidden_layer.W = da.W
        hidden_layer.b = da.b_out
```
Now there is no error, and after training each encoder, get_value shows that the layer's values have indeed changed. But when training encoder i+1, why does it feel as if the pre-training had no effect?
It really has no effect, because layer i's parameters are never propagated to layer i+1.
The reason lies in Theano's dual Python/C memory design: while encoder i is being trained in compiled C code, the updated parameters never reach layer i. But didn't we set up a shallow copy? We did, yet the updates function knows nothing about that relationship in the C memory area, because the rebinding was made in the Python memory area.
The correct approach is to establish the sharing inside the dA's constructor. Then, when the C code is compiled, all the Python objects are reconstructed in the C memory area, and the sharing naturally carries over into it:
```python
da = dA(rng, layer_input, input_size, self.layer_sizes[i], hidden_layer.W, hidden_layer.b)
```
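The same pure-Python analogy (hypothetical names, no Theano) shows why constructor injection works: the cost now closes over the very object the layer owns, so in-place updates made through either name are visible to both.

```python
class DA:
    """Toy stand-in for the Theano dA: W is injected, so the cost shares it."""
    def __init__(self, W):
        self.W = W                    # share the layer's parameter object directly
        w = self.W                    # the cost "expression" captures the shared object
        self.cost = lambda: sum(w)

layer_W = [5.0, 5.0]                  # the hidden layer's parameters
da = DA(layer_W)
layer_W[0] = 1.0                      # an in-place update (like a Theano update)
assert da.cost() == 6.0               # the encoder's cost sees it immediately
```

Because the shared variable exists before the cost expression is built, the sharing survives compilation instead of being a Python-only rebinding.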