Xue Guirong: Transfer Learning


This article is from: http://hi.baidu.com/jiyeqian/blog/item/8926e81770f16f0b4


In the traditional machine learning framework, the learning task is to learn a classification model from adequate given training data, and then use the learned model to classify and predict test documents. However, machine learning algorithms face a key problem in current Web mining research: in some newly emerging fields it is very difficult to obtain a large amount of training data. The Web is developing very quickly, and new domains keep appearing, from traditional news to web pages, images, blogs, podcasts, and so on. Traditional machine learning requires labeling a large amount of training data for each domain, which consumes a great deal of manpower and material resources; without large amounts of labeled data, much learning-related research and application becomes impossible.

Secondly, traditional machine learning assumes that the training data and the test data follow the same distribution. In many cases, however, this same-distribution assumption does not hold; for example, the training data may simply be out of date. This often forces us to label large amounts of fresh training data to meet our training needs, but labeling new data is very expensive. From another perspective, if we already have a large amount of training data under a different distribution, discarding it entirely is wasteful. How to make proper use of such data is the main problem that transfer learning aims to solve: transfer learning can migrate knowledge from existing data to help with future learning. The goal of transfer learning is to apply knowledge learned in one environment to help with learning tasks in a new environment. Therefore, transfer learning does not make the same-distribution assumption of traditional machine learning.

Our work on transfer learning can be divided into three parts: instance-based transfer learning in a homogeneous space, feature-based transfer learning in a homogeneous space, and transfer learning in heterogeneous spaces. Our research indicates that instance-based transfer learning has stronger knowledge-transfer ability, feature-based transfer learning has broader knowledge-transfer ability, and transfer across heterogeneous spaces offers broad applicability and scalability. Each of these methods has its own merits.

1. Instance-based Transfer Learning in a Homogeneous Space

The basic idea of instance-based transfer learning is that, although the auxiliary training data differ more or less from the source training data, some of the auxiliary instances should still be suitable for training an effective classification model that fits the test data. Our goal, therefore, is to find the instances in the auxiliary training data that fit the test data, and to transfer those instances into the learning on the source training data. For instance-based transfer learning, we extended the traditional AdaBoost algorithm and proposed a boosting algorithm with transfer ability, TrAdaBoost [9], which makes maximal use of the auxiliary training data to help classify the target data. Our key idea is to use boosting to filter out the auxiliary data that differ most from the source training data. Here boosting establishes an automatic weight-adjustment mechanism: the weights of important auxiliary training instances increase, while the weights of unimportant ones decrease. After the weights are adjusted, the re-weighted auxiliary training data serve as additional training data and, together with the source training data, increase the reliability of the classification model.
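The reweighting idea above can be illustrated with a minimal sketch. This is not the published TrAdaBoost algorithm (which, among other details, predicts using only the later half of the rounds); it is a simplified stand-in using a decision stump as the weak learner, in which misclassified auxiliary instances are down-weighted while misclassified same-distribution instances are up-weighted, AdaBoost-style. All names here are illustrative.

```python
import numpy as np

def weighted_stump(X, y, w):
    """Pick the single feature/threshold/polarity minimising weighted error."""
    best = (1.0, 0, 0.0, 1)  # (error, feature, threshold, polarity)
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f]):
            for pol in (1, -1):
                pred = np.where(pol * (X[:, f] - t) >= 0, 1, -1)
                err = w[pred != y].sum()
                if err < best[0]:
                    best = (err, f, t, pol)
    return best

def tradaboost_sketch(Xa, ya, Xs, ys, rounds=10):
    """Xa/ya: auxiliary (different-distribution) data; Xs/ys: scarce
    same-distribution data. Returns a predict function."""
    na = len(ya)
    X = np.vstack([Xa, Xs])
    y = np.concatenate([ya, ys])
    w = np.ones(len(y)) / len(y)
    # fixed shrink factor for misclassified auxiliary points (as in boosting theory)
    beta_aux = 1.0 / (1.0 + np.sqrt(2.0 * np.log(na) / rounds))
    learners = []
    for _ in range(rounds):
        w = w / w.sum()
        _, f, t, pol = weighted_stump(X, y, w)
        pred = np.where(pol * (X[:, f] - t) >= 0, 1, -1)
        # error is measured on the same-distribution part only
        eps = w[na:][pred[na:] != ys].sum() / w[na:].sum()
        eps = min(max(eps, 1e-10), 0.499)
        beta = eps / (1.0 - eps)
        miss = pred != y
        w[:na] *= np.where(miss[:na], beta_aux, 1.0)    # shrink bad auxiliary weights
        w[na:] *= np.where(miss[na:], 1.0 / beta, 1.0)  # grow hard source weights
        learners.append((f, t, pol, np.log(1.0 / beta)))
    def predict(Xq):
        score = np.zeros(len(Xq))
        for f, t, pol, alpha in learners:
            score += alpha * np.where(pol * (Xq[:, f] - t) >= 0, 1, -1)
        return np.where(score >= 0, 1, -1)
    return predict
```

With a 1-D toy problem where the auxiliary decision boundary is shifted relative to the target boundary, the up-weighting of hard same-distribution points pulls the ensemble toward the target concept.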

Instance-based transfer learning works only when the source data and the auxiliary data are very similar. When the difference between the source and auxiliary data is large, it is often difficult to find knowledge that can be transferred at the instance level. However, we found that even when the source and target data share no common knowledge at the instance level, they may still have overlapping features. We therefore studied feature-based transfer learning, which considers how to use this shared knowledge at the feature level to learn.

2. Feature-based Transfer Learning in a Homogeneous Space

For feature-based transfer learning, we proposed a variety of learning algorithms, such as the CoCC algorithm [7], the TPLSA algorithm [4], a spectral-analysis algorithm [2], and a self-taught learning algorithm [3]. Co-clustering algorithms are used to produce a common feature representation that helps learning. Our basic idea is to apply a co-clustering algorithm to both the source data and the auxiliary data to obtain a common feature representation. This new feature representation is better than one based on the source data alone, and re-representing the source data in the new space enables transfer learning. Following this idea, we proposed feature-based supervised transfer learning and feature-based unsupervised transfer learning.

2.1 Feature-based Supervised Transfer Learning

Our work on feature-based supervised transfer learning is cross-domain classification based on co-clustering [7]. The question addressed by this work is: given a new, different domain in which labeled data are scarce, how can a large amount of labeled data in the original domain be used for transfer learning? In the work on co-clustering-based cross-domain classification, we defined a unified information-theoretic formulation of the cross-domain classification problem, in which the co-clustering-based classification problem is transformed into the optimization of an objective function. In our model, the objective function is defined as the loss in mutual information between the common feature space and the auxiliary data instances.
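To make the "loss in mutual information" concrete, the small sketch below computes the kind of quantity such objectives minimise: the drop in mutual information I(X;Y) − I(X̂;Ŷ) when instances and features are merged into co-clusters. This is an illustration of the general information-theoretic co-clustering idea, not the CoCC objective itself; all function names are hypothetical.

```python
import numpy as np

def mutual_information(P):
    """I(X;Y) in bits for a joint distribution P (2-D array summing to 1)."""
    px = P.sum(axis=1, keepdims=True)
    py = P.sum(axis=0, keepdims=True)
    nz = P > 0
    return float((P[nz] * np.log2(P[nz] / (px @ py)[nz])).sum())

def coclustering_loss(P, row_clusters, col_clusters):
    """Loss in mutual information when rows (instances) and columns (features)
    are merged into clusters: I(X;Y) - I(X_hat;Y_hat)."""
    R = np.zeros((row_clusters.max() + 1, P.shape[0]))
    R[row_clusters, np.arange(P.shape[0])] = 1.0
    C = np.zeros((P.shape[1], col_clusters.max() + 1))
    C[np.arange(P.shape[1]), col_clusters] = 1.0
    P_hat = R @ P @ C  # aggregate the joint distribution over the co-clusters
    return mutual_information(P) - mutual_information(P_hat)
```

A co-clustering that respects the block structure of the joint distribution loses no mutual information, while one that mixes the blocks loses a lot; minimising this loss is what drives the search for a good common feature representation.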

2.2 Feature-based Unsupervised Transfer Learning: Self-taught Clustering

The self-taught clustering algorithm [3] addresses a feature-based unsupervised transfer learning task. The question we consider here is: in practice, labeled auxiliary data may be hard to obtain; in this situation, how can a large amount of unlabeled auxiliary data be used to help transfer learning? The basic idea of self-taught clustering is to cluster the source data and the auxiliary data together to obtain a common feature representation. Because this new representation is based on a large amount of auxiliary data, it is better than a feature representation generated from the source data alone, and it in turn helps the clustering.
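As a rough illustration of "cluster the pooled data, then re-represent the source data", here is a minimal sketch using plain k-means as a stand-in. The actual self-taught clustering algorithm [3] co-clusters instances and features with an information-theoretic objective; this simplified version only shows how abundant auxiliary data can shape a feature space for the scarce source data. All names are illustrative.

```python
import numpy as np

def kmeans(X, k, iters=50):
    """Tiny k-means with deterministic farthest-point initialisation."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(1) for c in centers], axis=0)
        centers.append(X[d.argmax()])  # next seed: farthest from chosen seeds
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        lab = d.argmin(1)
        for j in range(k):
            if (lab == j).any():
                centers[j] = X[lab == j].mean(0)
    return centers, lab

def self_taught_features(X_src, X_aux, k=3):
    """Cluster the pooled source + auxiliary data, then re-represent each
    source point by its (negated) squared distances to the shared centroids,
    a feature space shaped mostly by the abundant auxiliary data."""
    pooled = np.vstack([X_src, X_aux])
    centers, _ = kmeans(pooled, k)
    d = ((X_src[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return -d  # higher value = closer to that shared cluster
```

Even with only two labeled-free source points, the centroids estimated from the auxiliary data place them into the right regions of the shared space.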

The two learning strategies proposed above (feature-based supervised and unsupervised transfer learning) solve feature-based transfer learning when the source data and the auxiliary data lie in the same feature space. For the case where the source and auxiliary data are not in the same feature space, we also studied feature-based transfer learning across feature spaces, which likewise belongs to feature-based transfer learning.

3. Transfer Learning in Heterogeneous Spaces: Translated Learning

Our translated learning [1] [5] is designed to solve the problem in which the source data and the test data belong to two different feature spaces. In [1], we used a large amount of labeled text data to help an image classification problem with only a few labeled images. Our method builds a bridge between the two feature spaces using data that have two views. Although such multi-view data may not be usable as training data for classification, they can be used to build a translator. Through this translator, we combine a nearest-neighbor algorithm with feature translation to translate the auxiliary data into the feature space of the source data, and then use a unified language model for learning and classification.
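The bridge-building step can be sketched very simply. Below, a linear "translator" between the two feature spaces is estimated by least squares from co-occurring (text, image) view pairs, and instances are then labeled by nearest neighbor in the translated space. The published method [1] uses a probabilistic translator and a language model rather than a linear map; this is only a minimal illustration of the idea, and all names are hypothetical.

```python
import numpy as np

def learn_translator(V_text, V_img):
    """Estimate a linear translator T mapping image features into the text
    feature space from co-occurring view pairs, via least squares:
    V_text ~= V_img @ T."""
    T, *_ = np.linalg.lstsq(V_img, V_text, rcond=None)
    return T

def translate_and_classify(X_img, T, X_text_train, y_text_train):
    """Translate image instances into the text space and label each one by
    its nearest neighbour among the labelled text data."""
    Z = X_img @ T  # image features expressed in the text feature space
    d = ((Z[:, None, :] - X_text_train[None, :, :]) ** 2).sum(-1)
    return y_text_train[d.argmin(1)]
```

Once the translator is learned from unlabeled multi-view pairs, the scarce image data and the abundant labeled text data live in one space, so any classifier trained on the text side applies directly.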
