Stanford UFLDL Tutorial: From Self-Taught Learning to Deep Networks

From Self-Taught Learning to Deep Networks

In the previous section, we used a sparse autoencoder to learn features that were then fed as input to a softmax or logistic regression classifier. Those features were learned using only unlabeled data. In this section, we describe how to fine-tune the features using labeled data to improve them further. When you have a large amount of labeled training data, fine-tuning can significantly improve the classifier's performance.


In self-taught learning, we first train a sparse autoencoder on the unlabeled data. Then, given a new example, we use the hidden layer of that autoencoder to extract features. This process is illustrated in the following diagram:
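
As a concrete illustration, here is a minimal sketch of that feature-extraction step in Python/NumPy. The sigmoid activation and the names W1, b1, and extract_features are illustrative assumptions; the tutorial does not prescribe an implementation at this point.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def extract_features(X, W1, b1):
        """Map inputs to the hidden-layer activations of a trained sparse autoencoder.

        X  : (m, n_inputs) data matrix, one example per row
        W1 : (n_hidden, n_inputs) first-layer weights learned on unlabeled data
        b1 : (n_hidden,) first-layer biases
        Returns the activations a, which serve as the new feature representation.
        """
        return sigmoid(X @ W1.T + b1)

Each row of the returned matrix plays the role of the feature vector a^(i) that replaces the raw input x^(i) in the next step.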


We are interested in a classification task, where the goal is to predict a sample's class label y. We have a labeled training set of examples (x^(i), y^(i)). We showed previously that the original features can be replaced with the features a^(i) computed by the sparse autoencoder, which gives a new training set of (feature, label) pairs. Finally, we train a logistic classifier that maps from these features to the class labels. To illustrate this step, in the style of the Neural Networks section, we draw the logistic regression unit (shown in orange) in the following diagram:
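
Continuing the sketch above, training the classifier on the replacement features might look as follows. This is ordinary softmax (multinomial logistic) regression fitted by batch gradient descent; the function name, learning rate, and iteration count are illustrative assumptions, not the tutorial's reference code.

    def train_softmax(A, y, n_classes, lr=0.1, n_iters=500):
        """Train a softmax classifier on the autoencoder features A.

        A : (m, n_hidden) replacement features a^(i)
        y : (m,) integer class labels y^(i)
        Returns weights W2 of shape (n_classes, n_hidden) and biases b2.
        """
        m, n_hidden = A.shape
        W2 = np.zeros((n_classes, n_hidden))
        b2 = np.zeros(n_classes)
        Y = np.eye(n_classes)[y]                    # one-hot labels, (m, n_classes)
        for _ in range(n_iters):
            scores = A @ W2.T + b2
            scores -= scores.max(axis=1, keepdims=True)
            P = np.exp(scores)
            P /= P.sum(axis=1, keepdims=True)       # predicted class probabilities
            grad = (P - Y) / m                      # gradient of the cross-entropy loss
            W2 -= lr * (grad.T @ A)
            b2 -= lr * grad.sum(axis=0)
        return W2, b2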


Now consider the overall classifier (that is, the input-output mapping) learned by this method. It describes a function that maps a new test example to a predicted label. By combining the two previous pictures, you get a graphical representation of this function. In other words, the final classifier looks like this:
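
In code, the end-to-end classifier is simply a forward pass that composes the two sketches above; the parameter names (W1, b1 from the autoencoder, W2, b2 from the softmax classifier) are the same illustrative assumptions as before.

    def predict(X, W1, b1, W2, b2):
        """Final classifier: raw input -> autoencoder features -> class probabilities."""
        A = extract_features(X, W1, b1)             # hidden-layer activations a
        scores = A @ W2.T + b2                      # logistic/softmax output layer
        scores -= scores.max(axis=1, keepdims=True)
        P = np.exp(scores)
        return P / P.sum(axis=1, keepdims=True)     # p(y = k | x) for each class k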


The parameters of this model were obtained in two stages: in the first layer of the network, the weights mapping the input to the hidden-unit activations were obtained by the sparse autoencoder training process; in the second layer, the weights mapping the hidden units to the output were obtained by logistic regression or softmax regression training.


The final classifier is clearly just one large neural network. Therefore, having obtained an initial set of parameters for the model (training the first layer with the autoencoder and the second layer with logistic/softmax regression), we can further modify all of the parameters to reduce the training error. Specifically, we can fine-tune the parameters: starting from their current values, run gradient descent (or L-BFGS) to reduce the training error on the labeled training set.
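
Below is a minimal sketch of that fine-tuning step, assuming the same sigmoid hidden layer and softmax output as above and plain batch gradient descent (the tutorial leaves the optimizer open, mentioning gradient descent or L-BFGS); all names carry over from the earlier illustrative snippets.

    def fine_tune(X, y, W1, b1, W2, b2, n_classes, lr=0.1, n_iters=200):
        """Jointly update both layers by backpropagation on the labeled training set."""
        m = X.shape[0]
        Y = np.eye(n_classes)[y]
        for _ in range(n_iters):
            # Forward pass through the whole network.
            A = sigmoid(X @ W1.T + b1)                  # hidden activations
            scores = A @ W2.T + b2
            scores -= scores.max(axis=1, keepdims=True)
            P = np.exp(scores)
            P /= P.sum(axis=1, keepdims=True)
            # Backward pass: cross-entropy gradient, backpropagated into the first layer.
            delta2 = (P - Y) / m
            dW2, db2 = delta2.T @ A, delta2.sum(axis=0)
            delta1 = (delta2 @ W2) * A * (1.0 - A)      # sigmoid derivative
            dW1, db1 = delta1.T @ X, delta1.sum(axis=0)
            # Gradient-descent update on every parameter, not just the output layer.
            W1 -= lr * dW1; b1 -= lr * db1
            W2 -= lr * dW2; b2 -= lr * db2
        return W1, b1, W2, b2

The key difference from the previous step is that the first-layer weights are now updated as well, so the learned features themselves change.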


When fine-tuning is used, the initial unsupervised feature learning steps (that is, training the autoencoder and the logistic classifier) are sometimes referred to as pre-training. The effect of fine-tuning is that the labeled dataset can also be used to modify the first-layer weights, so that the features extracted by the hidden units are adjusted as well.


So far, we have described this process assuming the "replacement" representation rather than the "concatenation" representation. In the replacement representation, the training examples seen by the logistic classifier are of the form (a^(i), y^(i)); in the concatenation representation, they are of the form ((x^(i), a^(i)), y^(i)). The concatenation representation can also be fine-tuned. (In a concatenation-representation neural network, the input values also feed directly into the logistic classifier; the earlier neural network diagram can be adapted with a slight change: the input nodes in the first layer, in addition to connecting to the hidden layer, also skip over the hidden layer and connect directly to the output nodes in the third layer.) However, as long as we are fine-tuning, the concatenation representation has little or no advantage over the replacement representation. Therefore, if fine-tuning is needed, we usually use a network built with the replacement representation (although the concatenation representation can sometimes perform much better when no fine-tuning is done).
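
For completeness, the only change the concatenation representation makes to the earlier sketches is the classifier's input: it sees the raw input and the autoencoder features side by side. Under the same illustrative assumptions:

    def concatenation_features(X, W1, b1):
        """Concatenation representation: the classifier sees (x, a) instead of a alone."""
        A = extract_features(X, W1, b1)
        return np.concatenate([X, A], axis=1)       # columns: raw inputs, then features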


When should fine-tuning be applied? It is typically used only when there is a large amount of labeled training data; in that setting, fine-tuning can significantly improve classifier performance. However, if there is a large unlabeled dataset (for unsupervised feature learning/pre-training) but only a relatively small labeled training set, fine-tuning is much less likely to help.


Chinese-English vocabulary (English terms): self-taught learning, deep networks, fine-tune, sparse autoencoder, gradient descent, unsupervised feature learning, pre-training.

From: http://ufldl.stanford.edu/wiki/index.php/%e4%bb%8e%e8%87%aa%e6%88%91%e5%ad%a6%e4%b9%a0%e5%88%b0%e6%b7%b1%e5%b1%82%e7%bd%91%e7%bb%9c
