Self-Training and Co-Training


The concept of semi-supervised learning is not complex at all: a model is trained on a dataset that contains both labeled and unlabeled data. Semi-supervised learning is a machine learning paradigm that lies between supervised learning and unsupervised learning.

In many NLP tasks it is difficult to obtain labeled data. In particular, annotating resources such as syntax and semantics often requires deep expert knowledge (for example, linguistic knowledge) to guide the tagging, so the labeling cost is high. Although everyone advocates simplifying, popularizing, and even gamifying annotation, it is hard to harness the wisdom of the crowd for this problem; the crowdsourcing techniques popular in recent years have not effectively solved this kind of "expert-level" labeling. From this perspective, semi-supervised learning is very meaningful and well worth studying. In the first half of the year, Gao Jing, a senior schoolmate now at UIUC, came back to school to give a talk on integrating supervised and unsupervised learning, and I personally found it very inspiring.

Since semi-supervised learning itself is not hard to understand, the concepts and definitions will not be repeated here. The following describes the semi-supervised learning methods (models) discussed in that talk.

Self-Training Models

The basic assumption of the self-training model is that when a classifier predicts samples, the samples predicted with high confidence are likely to be classified correctly. For example, when an SVM classifies samples, those far from the decision boundary can be considered correctly classified. Based on this assumption, the self-training model is very simple. Suppose I have two sets of data, A and B, where A is labeled data (i.e., it comes with labels) and B is unlabeled data. The self-training procedure is as follows:

  1. Train a classification model M on the labeled data A
  2. Use this model to predict B
  3. Add the K samples with the highest prediction confidence, together with their predicted labels, to the training data A and delete them from B
  4. Return to step 1

Of course, there are many variations of step 3. For example, you can add all samples of B to A and assign each sample a weight based on its prediction confidence.
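
To make the loop concrete, here is a minimal Python sketch of self-training. The base classifier (scikit-learn's LogisticRegression), the use of predict_proba as the confidence score, and the per-iteration budget k are illustrative assumptions, not prescribed by the description above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train(X_a, y_a, X_b, k=10, n_iter=20):
    """Self-training: repeatedly move the k most confidently predicted
    samples from the unlabeled pool B into the labeled set A."""
    X_a, y_a, X_b = np.asarray(X_a), np.asarray(y_a), np.asarray(X_b)
    model = LogisticRegression().fit(X_a, y_a)             # step 1: train M on A
    for _ in range(n_iter):
        if len(X_b) == 0:
            break
        proba = model.predict_proba(X_b)                   # step 2: predict B
        conf = proba.max(axis=1)                           # confidence of each prediction
        top = np.argsort(conf)[-k:]                        # k most confident samples
        X_a = np.vstack([X_a, X_b[top]])                   # step 3: move them into A ...
        y_a = np.concatenate([y_a, model.predict(X_b[top])])
        X_b = np.delete(X_b, top, axis=0)                  # ... and delete them from B
        model = LogisticRegression().fit(X_a, y_a)         # step 4: back to step 1
    return model
```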

The self-training model is the simplest and easiest semi-supervised model to implement, but simple things are often not that effective; there are not many things in this world that are both simple and good. The drawback of self-training is that once a misclassified sample is added to the original training set, the error it introduces only gets reinforced in subsequent training and can mislead other samples as well. It is especially vulnerable to outliers, against which the self-training model is powerless.

Generative Models

How to distinguish between a discriminative model and a generative model is itself a frequently debated topic. I recommend taking a look at a NIPS 2009 workshop: http://gen-disc2009.wikidot.com/. The integration of discriminative and generative models is a topic I have always wanted to work on; although I have already run some experiments on parse reranking, I still feel my understanding of the two is not deep enough.

The application of generative models to semi-supervised learning is relatively straightforward. The Gaussian mixture model (GMM) we often encounter is a typical generative model (of course, there are other mixture densities, such as mixtures of multinomial distributions). GMM is always tightly bound to the EM algorithm. We know that if the component (Gaussian) each sample in a GMM belongs to is known, it is easy to estimate the parameters by MLE; if no sample's component is known, we generally use the EM algorithm to estimate the parameters iteratively. The former is supervised learning and the latter is unsupervised learning. So it is easy to think of a third case: some samples have known classes while the classes of the remaining samples are unknown. In this case, you only need to change the form of the likelihood function slightly to apply the EM algorithm.
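
As a sketch of this "slightly changed likelihood", the toy example below fits a one-dimensional two-component Gaussian mixture with EM, where labeled samples keep hard (one-hot) responsibilities and unlabeled samples get the usual soft posteriors. The initialization from labeled statistics, the fixed iteration count, and the assumption that labels are 0/1 integers are all illustrative choices, not part of the original discussion.

```python
import numpy as np
from scipy.stats import norm

def semi_supervised_gmm(x_l, y_l, x_u, n_iter=50):
    """EM for a 1-D two-component Gaussian mixture in which some samples
    are labeled (hard responsibilities) and the rest are unlabeled (soft)."""
    x_l, y_l, x_u = np.asarray(x_l), np.asarray(y_l), np.asarray(x_u)
    x = np.concatenate([x_l, x_u])
    # initialize each component from its labeled samples
    pi = np.array([np.mean(y_l == c) for c in (0, 1)])
    mu = np.array([x_l[y_l == c].mean() for c in (0, 1)])
    sigma = np.array([x_l[y_l == c].std() + 1e-3 for c in (0, 1)])
    for _ in range(n_iter):
        # E-step: soft responsibilities for unlabeled samples, one-hot for labeled ones
        r_u = np.stack([pi[c] * norm.pdf(x_u, mu[c], sigma[c]) for c in (0, 1)], axis=1)
        r_u /= r_u.sum(axis=1, keepdims=True)
        r = np.vstack([np.eye(2)[y_l], r_u])
        # M-step: weighted maximum-likelihood updates over all samples
        nk = r.sum(axis=0)
        pi = nk / nk.sum()
        mu = (r * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((r * (x[:, None] - mu) ** 2).sum(axis=0) / nk) + 1e-6
    return pi, mu, sigma
```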

Cluster-then-label

From the previous use of the EM algorithm in semi-supervised learning, it is natural to think that unsupervised methods can generally be adapted to semi-supervised learning. K-means is a simple special case of the EM algorithm, so can clustering also be used for semi-supervised learning?

Cluster-then-label does what its name implies: cluster first, then label. The algorithm is as follows:

  1. Cluster all samples (both the labeled data A and the unlabeled data B)
  2. For each cluster in the result, perform steps 3 and 4. Let S be the set of labeled samples in the cluster
  3. If S is not empty, learn a classifier on S and use it to predict the unlabeled samples in the cluster
  4. If S is empty, predict the unlabeled samples in the cluster using a classifier trained on all labeled data

In this way, we will get the classes of all samples. The idea is very simple, but it still needs to be verified by experiment.
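
Here is a minimal sketch of cluster-then-label in Python, assuming K-means for the clustering step and a nearest-neighbor classifier inside each cluster; both of those choices, and the number of clusters, are placeholders rather than part of the algorithm description above.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

def cluster_then_label(X_a, y_a, X_b, n_clusters=5):
    """Cluster all samples, then label each cluster's unlabeled points with a
    classifier trained on the labeled points inside that cluster, falling back
    to a classifier trained on all labeled data when the cluster has none."""
    X_a, y_a, X_b = np.asarray(X_a), np.asarray(y_a), np.asarray(X_b)
    X_all = np.vstack([X_a, X_b])
    clusters = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X_all)  # step 1
    c_a, c_b = clusters[:len(X_a)], clusters[len(X_a):]
    y_b = np.empty(len(X_b), dtype=y_a.dtype)
    fallback = KNeighborsClassifier().fit(X_a, y_a)
    for c in np.unique(c_b):                                                # step 2
        in_a, in_b = (c_a == c), (c_b == c)
        if in_a.any():                                                      # step 3: S is non-empty
            clf = KNeighborsClassifier(n_neighbors=min(3, int(in_a.sum())))
            clf.fit(X_a[in_a], y_a[in_a])
            y_b[in_b] = clf.predict(X_b[in_b])
        else:                                                               # step 4: S is empty
            y_b[in_b] = fallback.predict(X_b[in_b])
    return y_b
```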

Co-Training

The first time I came across the term co-training was in my junior year. At the time I knew almost nothing about machine learning, and my head was filled with all kinds of fantasies about artificial intelligence. My understanding of co-training back then was: two robots that could interact with and learn from each other. It sounded very advanced. It turns out that real co-training is indeed somewhat similar to that picture.

Co-training, also known as collaborative training or collaborative learning, is a multi-view algorithm. "Multi-view" means recognizing a thing from multiple angles. Take the "moon", for example: in our minds we may see a bright moon like a mirror, or a disk, sometimes bright and sometimes dark, and we may even recall many beautiful lines of ancient poetry. Of course, some people will immediately think of the moon's physical characteristics: it is a satellite of the Earth, its surface is covered with craters, it orbits the Earth, and so on. These are two ways of looking at the same thing. From different perspectives we obtain different feature spaces, and in different feature spaces we can train different classification models. This is the basic idea of co-training.

The co-training procedure is as follows. Assume the data has two feature representations (views), such as image features (X1, Y1) and text features (X2, Y2); the unlabeled data likewise has both views. The algorithm is as follows:

  1. Train two classification models F1 and F2 on (X1, Y1) and (X2, Y2), respectively
  2. Use F1 and F2 to predict the unlabeled data separately
  3. Add the K samples predicted most confidently by F1, with their predicted labels, to the training set of F2
  4. Add the K samples predicted most confidently by F2, with their predicted labels, to the training set of F1
  5. Return to step 1
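
A minimal Python sketch of this loop is shown below, assuming the two views are given as parallel arrays and using scikit-learn's GaussianNB for both classifiers. The classifier choice, the confidence measure via predict_proba, and the per-iteration budget k are illustrative assumptions.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

def co_train(X1_l, X2_l, y_l, X1_u, X2_u, k=5, n_iter=10):
    """Co-training: f1 is trained on view 1, f2 on view 2; each iteration,
    the k unlabeled samples a classifier predicts most confidently are added
    (with their predicted labels) to the other classifier's training set."""
    L1_X, L1_y = list(X1_l), list(y_l)        # training data for f1 (view 1)
    L2_X, L2_y = list(X2_l), list(y_l)        # training data for f2 (view 2)
    U = list(range(len(X1_u)))                # indices of still-unlabeled samples
    f1 = f2 = None
    for _ in range(n_iter):
        f1 = GaussianNB().fit(np.array(L1_X), np.array(L1_y))   # step 1
        f2 = GaussianNB().fit(np.array(L2_X), np.array(L2_y))
        if not U:
            break
        # steps 2-4: each classifier teaches the other with its confident picks
        for f, X_self, X_other, dst_X, dst_y in (
            (f1, X1_u, X2_u, L2_X, L2_y),     # f1 -> training set of f2
            (f2, X2_u, X1_u, L1_X, L1_y),     # f2 -> training set of f1
        ):
            if not U:
                break
            Xu = np.array([X_self[i] for i in U])
            conf = f.predict_proba(Xu).max(axis=1)               # step 2
            picks = [U[j] for j in np.argsort(conf)[-k:]]
            for i in picks:
                dst_X.append(X_other[i])                         # other view's features
                dst_y.append(f.predict([X_self[i]])[0])          # predicted label
                U.remove(i)                                      # drop from unlabeled pool
    return f1, f2
```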

The basic co-training algorithm is quite simple; how to apply it to NLP tasks offers many research directions. I think the idea of co-training can also help us understand AI a little. I have always believed that artificial intelligence must be the result of collective interaction: a closed, self-learning system can hardly give rise to intelligence unless it has an extremely powerful knowledge base and induction and deduction system. As children grow, an intelligent agent mainly relies on expert guidance (our parents and teachers) to strengthen its learning system (the brain's neural network), and we constantly compare with and learn from each other. In this constant collision and inspiration, we gradually acquire knowledge and accurate judgment about things. The way humans acquire intelligence should be worth borrowing for machine learning methods. Maybe one day we can really simulate the thinking of the human brain, instead of merely simulating human behavior.

Widely used semi-supervised learning methods include:

1. EM with generative mixture models

2. Self-training

3. Co-training

4. Transductive support vector machines

5. Graph-based methods

Self-training:

A classifier is first trained with the small amount of labeled data. The classifier is then used to classify the unlabeled data. Typically, the most confidently predicted unlabeled data points, together with their predicted labels, are added to the training set. The classifier is re-trained and the procedure is repeated.

Self-training is a practical wrapper method when the existing supervised classifier is complicated and hard to modify. It has been applied to several natural language processing tasks, including word sense disambiguation, parsing, and machine translation, as well as to object detection in images.

Co-Training

Co-training assumes that the features can be split into two sets, that each sub-feature set is sufficient to train a good classifier, and that the two sets are conditionally independent given the class. Initially, two separate classifiers are trained with the labeled data, one on each sub-feature set. Each classifier then classifies the unlabeled data and "teaches" the other classifier with the few unlabeled examples (and the predicted labels) about which it is most confident.

Each classifier is retrained with the additional training examples given by the other classifier, and the process repeats.

When the features naturally split into two sets, co-training may be appropriate.

 

References:

http://www.jiangfeng.me/blog/tag/self-training

http://blog.csdn.net/bookwormno1/article/details/7024929

 

Original article: http://www.leexiang.com/self-training-and-co-training

