This article is from: http://hi.baidu.com/jiyeqian/blog/item/8926e81770f16f0b4
In the traditional machine learning framework, the learning task is to train a classification model from sufficient labeled training data, and then use the learned model to classify and predict test documents. However, current Web mining research exposes a key problem with machine learning algorithms: in some newly emerging fields, it is very difficult to obtain a large amount of labeled training data. The Web is developing very fast and new domains keep appearing, from traditional news to Web pages, images, blogs, podcasts, and so on. Traditional machine learning requires labeling a large amount of training data for each domain, which consumes a great deal of manpower and material resources; without enough labeled data, many learning-related studies and applications cannot proceed. Second, traditional machine learning assumes that the training data and the test data follow the same distribution. In many cases, however, this identical-distribution assumption does not hold; for example, the training data may simply be out of date. This often forces us to re-label a large amount of new training data to meet our training needs, but labeling new data is very expensive. From another perspective, if we already have a large amount of training data drawn from a different distribution, discarding it entirely is wasteful. How to make proper use of such data is the main problem that transfer learning tries to solve. Transfer learning migrates knowledge from existing data to help with future learning: its goal is to apply knowledge learned in one environment to learning tasks in a new environment. Therefore, transfer learning does not make the same-distribution assumption of traditional machine learning.
Our work on transfer learning can be divided into three parts: instance-based transfer learning in homogeneous space, feature-based transfer learning in homogeneous space, and transfer learning in heterogeneous space. Our research suggests that instance-based transfer learning has stronger knowledge-transfer ability, feature-based transfer learning has broader knowledge-transfer ability, and transfer learning in heterogeneous space has wide applicability and extensibility. Each of these methods has its own merits.
1. Instance-based transfer learning in homogeneous space
The basic idea of instance-based transfer learning is that, although the auxiliary training data differ more or less from the source training data, some of them should still be suitable for training an effective classification model that fits the test data. Our goal, therefore, is to find those instances in the auxiliary training data that suit the test data and transfer them into the learning on the source training data. For instance-based transfer learning, we extended the traditional AdaBoost algorithm and proposed a boosting algorithm with transfer ability, TrAdaBoost [9], so that it can transfer knowledge and make maximal use of the auxiliary training data to help classify the target data. Our key idea is to use boosting to filter out the auxiliary data that differ most from the source training data. Boosting establishes an automatic weight-adjustment mechanism: the weights of important auxiliary training instances increase, while the weights of unimportant ones decrease. After the weights are adjusted, the re-weighted auxiliary training data serve as additional training data and are used together with the source training data to improve the reliability of the classification model.
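The weight-adjustment mechanism described above can be sketched as a single boosting round. This is a minimal illustration of a TrAdaBoost-style update, not the exact algorithm from [9]; the function name and the clipping of the error rate are my own assumptions.

```python
import numpy as np

def tradaboost_update(w_aux, w_src, err_aux, err_src, n_rounds):
    """One TrAdaBoost-style weight update (illustrative sketch).

    w_aux, w_src     : current weights of auxiliary / source (target-task) examples
    err_aux, err_src : 0/1 arrays, 1 where the weak learner misclassified
    n_rounds         : total number of boosting rounds
    """
    # Weighted error of the weak learner, measured on the source data only.
    eps = float(np.sum(w_src * err_src) / np.sum(w_src))
    eps = min(max(eps, 1e-10), 0.499)  # keep both factors well defined
    beta = 1.0 / (1.0 + np.sqrt(2.0 * np.log(len(w_aux)) / n_rounds))
    beta_t = eps / (1.0 - eps)
    # Misclassified auxiliary examples look least like the target task:
    # shrink their weights so they fade out of later rounds.
    new_w_aux = w_aux * beta ** err_aux
    # Misclassified source examples are boosted, exactly as in AdaBoost.
    new_w_src = w_src * beta_t ** (-err_src)
    return new_w_aux, new_w_src
```

Since eps < 0.5 implies beta_t < 1, a misclassified source example gains weight (beta_t to a negative power exceeds 1) while a misclassified auxiliary example loses weight.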
Instance-based transfer learning works only when the source data and the auxiliary data are very similar. When the difference between them is large, it is often hard to find knowledge that can be transferred at the instance level. However, we found that even when the source and target data share no common knowledge at the instance level, they may still have overlapping features. We therefore studied feature-based transfer learning, which asks how to use this shared feature knowledge to learn.
2. Feature-based transfer learning in homogeneous space
For feature-based transfer learning, we proposed a variety of learning algorithms, such as the CoCC algorithm [7], the TPLSA algorithm [4], a spectral-analysis algorithm [2], and a self-taught clustering algorithm [3]. A co-clustering algorithm is used to generate a common feature representation, which in turn helps the learning algorithm. Our basic idea is to co-cluster the source data and the auxiliary data to obtain a common feature representation; this new representation is better than a representation based only on the source data. Representing the source data in the new space enables transfer learning. Following this idea, we proposed feature-based supervised transfer learning and feature-based unsupervised transfer learning.
2.1 Feature-based supervised transfer learning
Our work on feature-based supervised transfer learning is cross-domain classification based on co-clustering [7]. The question addressed is: given a new, different domain in which labeled data are scarce, how can a large amount of labeled data in the original domain be used for transfer learning? In this work, we defined a unified information-theoretic formulation for the cross-domain classification problem, in which the co-clustering-based classification problem is cast as the optimization of an objective function. In our model, the objective function is defined as the loss of mutual information between the common feature space and the auxiliary data instances.
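To make the objective concrete, here is a small numeric sketch of a mutual-information loss of the kind such a formulation minimizes: merging features (rows of a joint feature-instance distribution) into clusters can only lose mutual information, and the difference I(X;Y) - I(X_hat;Y) measures how much. This follows the general information-theoretic co-clustering idea; the function names and the exact quantity measured are illustrative assumptions, not the precise objective of [7].

```python
import numpy as np

def mutual_information(p_xy):
    """I(X; Y) in nats, computed from a joint probability table p_xy."""
    p_xy = p_xy / p_xy.sum()
    px = p_xy.sum(axis=1, keepdims=True)
    py = p_xy.sum(axis=0, keepdims=True)
    mask = p_xy > 0
    return float(np.sum(p_xy[mask] * np.log(p_xy[mask] / (px @ py)[mask])))

def clustering_mi_loss(p_xy, row_clusters):
    """Mutual information lost when rows (features) are merged into clusters:
    I(X; Y) - I(X_hat; Y), which a co-clustering objective tries to keep small."""
    k = int(row_clusters.max()) + 1
    p_cy = np.zeros((k, p_xy.shape[1]))
    for i, c in enumerate(row_clusters):
        p_cy[c] += p_xy[i]
    return mutual_information(p_xy) - mutual_information(p_cy)
```

Merging two perfectly discriminative features into one cluster loses all of their mutual information with the instances; merging two identical features loses none.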
2.2 Feature-based unsupervised transfer learning: self-taught clustering
The self-taught clustering algorithm [3] addresses a feature-based unsupervised transfer learning task. The question considered here is: in practice, even labeled auxiliary data may be hard to obtain; in that case, how can a large amount of unlabeled auxiliary data be used to help transfer learning? The basic idea of self-taught clustering is to co-cluster the source data and the auxiliary data to obtain a common feature representation. Because it is shaped by a large amount of auxiliary data, this new representation helps clustering more than a feature representation generated from the source data alone.
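As a rough illustration of why abundant auxiliary data can improve the representation used for clustering, the sketch below learns a shared low-dimensional space from the source and auxiliary data together (using plain SVD as a stand-in for the actual information-theoretic co-clustering of [3]) and then clusters only the source data in that space. The function name, the SVD substitution, and the farthest-point initialization are my own assumptions.

```python
import numpy as np

def cluster_with_auxiliary(X_src, X_aux, n_dims, n_clusters, n_iter=50):
    """Cluster source data in a feature space shaped by auxiliary data too."""
    X = np.vstack([X_src, X_aux])
    mean = X.mean(axis=0)
    # Top singular directions of source + auxiliary data define a shared
    # low-dimensional representation (standing in for a learned common space).
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    Z = (X_src - mean) @ Vt[:n_dims].T
    # Simple Lloyd k-means in the shared space, farthest-point initialization.
    centers = [Z[0]]
    for _ in range(n_clusters - 1):
        d = np.min([((Z - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Z[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = ((Z[:, None, :] - centers[None]) ** 2).sum(axis=2).argmin(axis=1)
        for c in range(n_clusters):
            if np.any(labels == c):
                centers[c] = Z[labels == c].mean(axis=0)
    return labels
```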
The two learning strategies proposed above (feature-based supervised and unsupervised transfer learning) address feature-based transfer learning when the source data and auxiliary data lie in the same feature space. When the source and auxiliary data are not in the same feature space, we also studied transfer learning across feature spaces, which likewise belongs to feature-based transfer learning.
3. Transfer learning in heterogeneous space: translated learning
Our translated learning [1][5] is designed for the case where the source data and the test data belong to two different feature spaces. In [1], we use a large amount of labeled text data to help an image classification problem that has only a few labeled images, as shown in the figure. Our method uses data that carry both views to build a bridge between the two feature spaces. Although such multi-view data may not be usable as training data for classification, they can be used to build a translator. Through this translator, we combine the nearest-neighbor algorithm with feature translation to translate the auxiliary data into the feature space of the source data, and then use a unified language model for learning and classification.
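A toy version of the translator idea: co-occurrence counts between text features and image features (e.g., from tagged images) give an estimate of p(image feature | text feature), which maps labeled text examples into the image feature space, where a nearest-neighbor rule can then label images. The matrix, the normalization, and the function names are illustrative assumptions, not the actual model of [1].

```python
import numpy as np

def translate(text_vecs, cooc):
    """Map text feature vectors into image-feature space through a
    co-occurrence 'translator' estimating p(image feature | text feature)."""
    p_img_given_txt = cooc / cooc.sum(axis=1, keepdims=True)
    return text_vecs @ p_img_given_txt

def nearest_neighbor_label(img_vec, translated_texts, text_labels):
    """Classify an image by its nearest translated labeled text example."""
    d = ((translated_texts - img_vec) ** 2).sum(axis=1)
    return text_labels[int(d.argmin())]
```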