[Machine Learning] Active Learning

Source: Internet
Author: User

1. Preface

Supervised learning, unsupervised learning, and semi-supervised learning are three widely studied and widely used families of learning techniques in machine learning. Wikipedia describes them briefly as follows:

    • Supervised learning: learns, from pairs of input and output data, a function that maps each input to the appropriate output, as in classification.
    • Unsupervised learning: models the input data set directly, as in clustering.
    • Semi-supervised learning: uses both class-labeled and unlabeled data to learn an appropriate classification function.

In practice, much of machine learning deals with class assignment: given some data, decide which class each item belongs to, or which other items belong to the same class. If we partition (cluster) the data so that it is organized into classes automatically through its intrinsic properties and relationships, that is unsupervised learning. If we know from the start which classes exist, and some of the data (the training data) already carries class labels, we can generalize from the labeled data to a "data-to-class" mapping function and use it to classify the remaining data; that is supervised learning. Semi-supervised learning applies when labeled training data is scarce: it uses unlabeled data to improve learning accuracy.

2. What is active learning?

In real data-analysis scenarios we can often collect large amounts of data, but the data are unlabeled, so many classical classification algorithms cannot be applied directly. One might say: if the data are not labeled, just label them! The idea is natural and simple, but labeling is expensive. Even if we label only thousands or tens of thousands of training samples, the time and money required are considerable.

Before introducing the concept of active learning, let's first discuss how informative an individual sample is.

What is sample informativeness? Simply put, the samples in a training set do not carry the same amount of information for training the model: some samples contribute a great deal, others very little.
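One common way to make informativeness concrete (an illustration chosen here; the original text does not specify a measure) is uncertainty sampling: score each unlabeled sample by how unsure the current model is about it. The minimal Python sketch below assumes the model outputs class probabilities; the function names and probability values are hypothetical.

```python
import math
from typing import Dict, List, Sequence

def least_confidence(probs: Sequence[float]) -> float:
    """1 minus the top class probability; higher means the model is less sure."""
    return 1.0 - max(probs)

def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (natural log) of the predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Hypothetical predicted class probabilities for three unlabeled samples.
predictions: Dict[str, List[float]] = {
    "sample_a": [0.98, 0.01, 0.01],  # model is confident
    "sample_b": [0.40, 0.35, 0.25],  # model is torn between classes
    "sample_c": [0.70, 0.20, 0.10],
}

# Most informative first: these would be the first candidates sent to an expert.
ranked = sorted(predictions, key=lambda k: entropy(predictions[k]), reverse=True)
print(ranked)  # ['sample_b', 'sample_c', 'sample_a']
```

On this data, least-confidence scoring gives the same ranking; which measure works better in practice depends on the problem.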

Therefore, to reduce the size of the training set and the labeling cost as much as possible, the machine learning field proposed active learning as a way to optimize classification models.

Active learning refers to the following learning setting:

Sometimes class-labeled data is relatively scarce while unlabeled data is plentiful, but manual labeling is very expensive. In this case the learning algorithm can proactively issue labeling requests, submitting a selected subset of the data to experts for labeling.

How to perform this selection is the main research question of active learning.

3. The basic idea of active learning

An active learning algorithm can be modeled by the following five components:

$A = (C, L, S, Q, U)$
where $C$ is a classifier or a set of classifiers; $L$ is the set of labeled training samples; $Q$ is the query function, used to select the most informative samples from the unlabeled pool; $U$ is the whole set of unlabeled samples; and $S$ is the supervisor, who can label samples drawn from $U$.

Active learning algorithms mainly consist of two stages:

The first stage is initialization: a small random subset of samples is selected and labeled by the supervisor, and this labeled set serves as the training set for building an initial classifier.

The second stage is the iterative query phase: following the query criterion $Q$, some samples are selected from the unlabeled set $U$, labeled by the supervisor $S$, and added to the training set $L$; the classifier is then retrained. This repeats until a stopping criterion is met.

Active learning is thus an iterative process: at each iteration the classifier is retrained on the newly labeled samples fed back to it, and its classification performance improves step by step.
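The two stages can be sketched as a pool-based loop in Python. This is a minimal illustration under stated assumptions, not a reference implementation: the classifier $C$ is a toy nearest-centroid model on 1-D points, the query function $Q$ picks the unlabeled point closest to the current decision boundary (a simple form of uncertainty sampling), the supervisor $S$ is simulated by a hypothetical `oracle` function, and the stopping criterion is a fixed labeling budget.

```python
import random
from typing import Callable, Dict, List, Tuple

def train(labeled: List[Tuple[float, int]]) -> Dict[int, float]:
    """C: toy nearest-centroid classifier; returns the mean x of each class."""
    groups: Dict[int, List[float]] = {}
    for x, y in labeled:
        groups.setdefault(y, []).append(x)
    return {y: sum(xs) / len(xs) for y, xs in groups.items()}

def predict(model: Dict[int, float], x: float) -> int:
    """Assign x to the class whose centroid is nearest."""
    return min(model, key=lambda y: abs(x - model[y]))

def query(model: Dict[int, float], pool: List[float], rng: random.Random) -> float:
    """Q: pick the unlabeled point closest to the boundary between the centroids."""
    if len(model) < 2:
        # Only one class seen so far, so no boundary exists yet: fall back to random.
        return rng.choice(pool)
    boundary = sum(model.values()) / len(model)
    return min(pool, key=lambda x: abs(x - boundary))

def active_learning(pool: List[float], oracle: Callable[[float], int],
                    n_init: int = 2, budget: int = 6, seed: int = 0):
    rng = random.Random(seed)
    unlabeled = list(pool)                                   # U
    rng.shuffle(unlabeled)
    # Stage 1: label a small random subset with the supervisor S.
    labeled = [(x, oracle(x)) for x in unlabeled[:n_init]]   # L
    unlabeled = unlabeled[n_init:]
    model = train(labeled)
    # Stage 2: query, label, retrain until the labeling budget is spent.
    for _ in range(budget):
        if not unlabeled:
            break
        x = query(model, unlabeled, rng)
        unlabeled.remove(x)
        labeled.append((x, oracle(x)))
        model = train(labeled)
    return model, labeled

def oracle(x: float) -> int:
    """Hypothetical supervisor S: the hidden rule is 'class 1 iff x >= 5'."""
    return int(x >= 5.0)

pool = [i * 0.5 for i in range(21)]  # 0.0, 0.5, ..., 10.0
model, labeled = active_learning(pool, oracle)
accuracy = sum(predict(model, x) == oracle(x) for x in pool) / len(pool)
print(f"labeled {len(labeled)} of {len(pool)} points, accuracy {accuracy:.2f}")
```

The point of the design is that the expert is asked only about the points the model is least sure of, rather than about all 21 points.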

An example of active learning: face recognition in QQ Space (Qzone) photo albums.

4. The difference between active learning and semi-supervised learning

Many people regard active learning as a kind of semi-supervised learning, but the two are not the same. Semi-supervised learning, transductive learning, and active learning all make use of unlabeled data, but their underlying ideas differ.

As mentioned above, the "active" in active learning refers to the learner proactively requesting labels: it needs an external entity that can answer its labeling requests (usually a domain expert). In other words, active learning is interactive.

Semi-supervised learning, by contrast, requires no human intervention: the learning algorithm exploits the unlabeled data on its own.
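To make the contrast concrete, here is a minimal self-training sketch in Python. Self-training is one simple semi-supervised technique chosen for illustration; the original text does not name a specific method. It assumes a toy nearest-centroid model on 1-D points, an initial labeled set that covers both classes, and a hypothetical margin threshold; the model assigns pseudo-labels to unlabeled points it is confident about, with no expert involved.

```python
from typing import Dict, List, Tuple

def train(labeled: List[Tuple[float, int]]) -> Dict[int, float]:
    """Toy nearest-centroid model: mean x per class."""
    groups: Dict[int, List[float]] = {}
    for x, y in labeled:
        groups.setdefault(y, []).append(x)
    return {y: sum(xs) / len(xs) for y, xs in groups.items()}

def self_train(labeled: List[Tuple[float, int]], unlabeled: List[float],
               margin: float = 2.0, max_rounds: int = 10):
    """Repeatedly pseudo-label confident points; assumes >= 2 classes in `labeled`."""
    labeled = list(labeled)
    unlabeled = list(unlabeled)
    for _ in range(max_rounds):
        model = train(labeled)
        newly = []
        for x in unlabeled:
            # (distance, class) pairs to every centroid, nearest first.
            dists = sorted((abs(x - c), y) for y, c in model.items())
            if dists[1][0] - dists[0][0] >= margin:
                newly.append((x, dists[0][1]))  # pseudo-label, no human involved
        if not newly:
            break  # nothing left that the model is confident about
        labeled.extend(newly)
        taken = {x for x, _ in newly}
        unlabeled = [x for x in unlabeled if x not in taken]
    return labeled, unlabeled

labeled, leftover = self_train([(1.0, 0), (9.0, 1)],
                               [0.0, 2.0, 3.0, 4.5, 5.5, 7.0, 8.0, 10.0])
print(leftover)  # [4.5, 5.5]
```

The points left over are the ambiguous ones near the class boundary. Self-training simply leaves them alone, whereas an active learner would send exactly these points to a supervisor.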

5. References

[1] Active Learning, Wikipedia

[2] An Overview of Active Learning Algorithms, 2012
