The multilayer autoencoder consists of several sparse autoencoders and a softmax classifier. The weights of each sparse autoencoder are learned from unlabeled training samples, while the parameters of the softmax classifier are learned from labeled training samples. Fine-tuning the multilayer autoencoder means treating it as an ordinary multi-layer neural network and adjusting all of its weights using a labeled training sample set.
1 Structure of the multilayer autoencoder
The structure of the multilayer autoencoder is shown in Figure 1. It consists of a stacked autoencoder with two hidden layers and a softmax model: the output of the last hidden layer of the stacked autoencoder is the input of the softmax model, and the output of the softmax model (a vector of conditional probabilities) is the output of the entire network.
Figure 1 Structure of the multilayer autoencoder
The process of fine-tuning the multilayer autoencoder is shown in Figure 2. It consists of three main parts:
(1) initialize the parameter vector to be optimized;
(2) call the optimization function to compute the optimal parameter vector;
(3) convert the optimal parameter vector back into the corresponding parameters of the network structure.
Among them, the cost function is minimized mainly with the minFunc function; the call to the optimization function has the following form:
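A sketch of the call, modeled on the UFLDL stacked-autoencoder exercise (the option values and the extra arguments forwarded to stackedAECost are illustrative assumptions):

    % L-BFGS minimization of the fine-tuning cost over the full parameter vector
    options = struct('Method', 'lbfgs', 'maxIter', 400, 'display', 'on');
    [stackedAEOptTheta, cost] = minFunc(@(p) stackedAECost(p, inputSize, ...
        hiddenSizeL2, numClasses, netconfig, lambda, trainData, trainLabels), ...
        stackedAETheta, options);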
To implement the optimization process, the key problem is writing the stackedAECost function.
Figure 2 Fine-tuning process for the multilayer autoencoder
2 Initialization of the network parameters
The parameter vector stackedAETheta of the whole network (in column-vector form) consists of two parts, the softmax classifier parameter vector followed by the sparse autoencoder parameter vector. Their initial values are the ones obtained from sparse autoencoder pre-training and softmax learning:
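For example, following the UFLDL convention (saeSoftmaxOptTheta from softmax training and stack holding the pre-trained autoencoder weights are assumed variable names):

    % flatten the pre-trained stack of autoencoder weights into a vector
    [stackparams, netconfig] = stack2params(stack);
    % softmax parameters first, then the sparse autoencoder parameters
    stackedAETheta = [saeSoftmaxOptTheta(:); stackparams];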
3 The stackedAECost function
3.1 Activation values of the sparse autoencoder part
3.1.1 Structure of the sparse autoencoder part
The sparse autoencoder portion of the multilayer network is shown in Figure 3.
Figure 3 Sparse autoencoder portion of the multilayer network
3.1.2 Activation values (outputs) of the sparse autoencoder part
Single sample:
    z^{(l+1)} = W^{(l)} a^{(l)} + b^{(l)},   a^{(l+1)} = f(z^{(l+1)})
Multiple samples:
    Z^{(l+1)} = W^{(l)} A^{(l)} + b^{(l)} \mathbf{1}^T,   A^{(l+1)} = f(Z^{(l+1)})
where f is the sigmoid activation function and each column of A^{(l)} holds the layer-l activations of one sample.
3.1.3 Activation values (outputs) of the softmax classifier
Single sample:
    h_\theta(x) = \frac{1}{\sum_{j=1}^{k} e^{\theta_j^T x}} [ e^{\theta_1^T x}, \ldots, e^{\theta_k^T x} ]^T
Multiple samples:
    M = \exp(\theta X), then normalize each column of M to sum to 1, giving the probability matrix P
where the columns of X are the input samples of the softmax model.
3.1.4 Program
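A minimal MATLAB sketch of this feedforward pass, assuming the UFLDL conventions (stack{l}.w, stack{l}.b, softmaxTheta, and a data matrix whose columns are samples):

    depth = numel(stack);
    a = cell(depth + 1, 1);
    a{1} = data;                                   % input layer activations
    for layer = 1:depth                            % forward through the stacked autoencoder
        z = bsxfun(@plus, stack{layer}.w * a{layer}, stack{layer}.b);
        a{layer + 1} = 1 ./ (1 + exp(-z));         % sigmoid activation
    end
    M = softmaxTheta * a{depth + 1};
    M = bsxfun(@minus, M, max(M, [], 1));          % subtract the max for numerical stability
    p = bsxfun(@rdivide, exp(M), sum(exp(M), 1));  % conditional probability matrix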
3.2 Cost function
3.2.1 Formula for the cost function
The cost function of the multilayer network is computed exactly as the cost function of the softmax model, with a regularization term added; note, however, that the regularization term added here must penalize all the parameters of the entire network!
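In symbols (a reconstruction consistent with the statement above, with m samples, k classes, and weight-decay coefficient λ):

    J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \sum_{j=1}^{k} 1\{y^{(i)} = j\} \log p(y^{(i)} = j \mid x^{(i)}) + \frac{\lambda}{2} \Big( \|\theta_{\mathrm{softmax}}\|^2 + \sum_{l} \|W^{(l)}\|^2 \Big)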
3.2.2 The program is as follows
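A sketch continuing the feedforward code above (groundTruth, the k-by-m indicator matrix of the labels, is an assumed name):

    numCases = size(data, 2);
    groundTruth = full(sparse(labels, 1:numCases, 1));
    % weight decay over ALL parameters: softmax weights plus every W^(l)
    weightDecay = sum(softmaxTheta(:).^2);
    for layer = 1:depth
        weightDecay = weightDecay + sum(stack{layer}.w(:).^2);
    end
    cost = -sum(sum(groundTruth .* log(p))) / numCases + lambda / 2 * weightDecay;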
3.3 Gradient Calculation
3.3.1 Softmax Model
The gradient for this part is computed with the same formula as for a standalone softmax model, namely

    \nabla_{\theta_j} J = -\frac{1}{m} \sum_{i=1}^{m} x^{(i)} \big( 1\{y^{(i)} = j\} - p(y^{(i)} = j \mid x^{(i)}) \big) + \lambda \theta_j

except that x here is the output h^{(2)} of the last hidden layer of the stacked autoencoder.
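In MATLAB this is a single line, continuing the sketches above (a{depth + 1} plays the role of x):

    softmaxThetaGrad = -(groundTruth - p) * a{depth + 1}' / numCases + lambda * softmaxTheta;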
3.3.2 Stacked autoencoder layers
Single sample (\bullet denotes the element-wise product, e_y the indicator vector of the label y):
    \delta^{(n)} = -\big( \theta^T (e_y - h_\theta(x)) \big) \bullet f'(z^{(n)})   (last hidden layer)
    \delta^{(l)} = \big( (W^{(l)})^T \delta^{(l+1)} \big) \bullet f'(z^{(l)})
Multiple samples (G the indicator matrix, P the probability matrix):
    \Delta^{(n)} = -\big( \theta^T (G - P) \big) \bullet f'(Z^{(n)})
    \Delta^{(l)} = \big( (W^{(l)})^T \Delta^{(l+1)} \big) \bullet f'(Z^{(l)})
with gradients \nabla_{W^{(l)}} J = \frac{1}{m} \Delta^{(l+1)} (A^{(l)})^T + \lambda W^{(l)} and \nabla_{b^{(l)}} J = \frac{1}{m} \Delta^{(l+1)} \mathbf{1}.
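A MATLAB sketch of this backpropagation, continuing the code above and using f'(z) = a .* (1 - a) for the sigmoid:

    delta = cell(depth + 1, 1);
    % error at the last hidden layer, propagated back from the softmax model
    delta{depth + 1} = -(softmaxTheta' * (groundTruth - p)) .* a{depth + 1} .* (1 - a{depth + 1});
    for layer = depth:-1:2
        delta{layer} = (stack{layer}.w' * delta{layer + 1}) .* a{layer} .* (1 - a{layer});
    end
    stackgrad = cell(depth, 1);
    for layer = 1:depth
        stackgrad{layer}.w = delta{layer + 1} * a{layer}' / numCases + lambda * stack{layer}.w;  % regularize all weights
        stackgrad{layer}.b = sum(delta{layer + 1}, 2) / numCases;
    end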
3.3.3 The gradient of the entire network
Finally, the gradients of the entire network (softmaxThetaGrad and stackgrad) are stored together in a single column vector:
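Following the UFLDL convention, the two pieces are concatenated in the same order as the parameter vector in section 2 (stack2params flattens the cell array of gradients):

    grad = [softmaxThetaGrad(:); stack2params(stackgrad)];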