Semi-supervised learning
Semi-supervised learning, as the name implies, is a learning paradigm that sits between classification (supervised learning) and clustering (unsupervised learning). Given class labels for only a very small number of samples, the research question is how to use those labeled samples to improve the accuracy of clustering. The graph-based label propagation algorithm is one of the influential algorithms in this area.
UCI Machine Learning Repository: http://archive.ics.uci.edu/ml/
Principle: the label (category) of a sample is assumed to be the same as the labels of the samples near it. We can assign a weight to each pair of samples based on distance or similarity: the smaller the distance between A and B (or the greater their similarity), the more likely it is that A has the same label as B.
Label propagation algorithm flow:
1. Construct the similarity matrix W
Build a graph $G = (V, E)$. The node set $V$ is the collection of data points, with $|V| = N$; the edge set $E$ is the set of vertex pairs; and the weight $w_{ij}$ on an edge represents the similarity between its two endpoints. A common choice is the Gaussian kernel $w_{ij} = \exp(-\lVert x_i - x_j \rVert^2 / \alpha^2)$. Here $\alpha$ is a parameter; a value between 1 and 10 is a reasonable starting point, or a suitable value can be chosen by experiment. If you do not use a complete graph, you can construct a k-NN graph instead (each point has edges only to its k nearest neighbors), but the resulting graph may not be connected and then needs further processing to make it connected.
2. Construct the transition probability matrix P, for example by normalizing each row of W: $P_{ij} = w_{ij} / \sum_{k=1}^{N} w_{ik}$.
3. Construct the data matrix
- Assume there are C classes and l labeled samples. Define an $l \times C$ label matrix $Y_L$ whose i-th row is the label indicator vector of the i-th sample: if sample i belongs to class j, the j-th element of that row is 1 and all other elements are 0.
- For the u unlabeled samples, construct a $u \times C$ label matrix $Y_U$ (its initial values can be set arbitrarily).
- Stack $Y_L$ and $Y_U$ to get an $N \times C$ matrix $F = [Y_L; Y_U]$, where $l + u = N$.
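Steps 1 to 3 can be sketched in plain Python as follows. This is only an illustration under stated assumptions: it uses a complete graph with Gaussian weights exp(-||x_i - x_j||^2 / alpha^2), normalizes each row of W to obtain P, and fills the unlabeled rows with uniform values (any starting value would do). The function names `similarity_matrix`, `transition_matrix`, and `label_matrix` are hypothetical and chosen here for readability.

```python
import math


def similarity_matrix(points, alpha=1.0):
    """Step 1: complete-graph weights w_ij = exp(-||x_i - x_j||^2 / alpha^2).

    The Gaussian form is an assumption of this sketch.
    """
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            W[i][j] = math.exp(-sq_dist / alpha ** 2)
    return W


def transition_matrix(W):
    """Step 2: normalize each row of W so that every row of P sums to 1."""
    P = []
    for row in W:
        total = sum(row)
        P.append([w / total for w in row])
    return P


def label_matrix(labels, num_unlabeled, num_classes):
    """Step 3: stack one-hot rows (Y_L) on top of the unlabeled rows (Y_U)."""
    Y_L = [[1.0 if c == y else 0.0 for c in range(num_classes)] for y in labels]
    # Y_U may be initialized arbitrarily; a uniform distribution is used here.
    Y_U = [[1.0 / num_classes] * num_classes for _ in range(num_unlabeled)]
    return Y_L + Y_U


# Two labeled points (classes 0 and 1) followed by two unlabeled points.
points = [(0.0, 0.0), (4.0, 0.0), (0.5, 0.0), (3.5, 0.0)]
W = similarity_matrix(points, alpha=1.0)
P = transition_matrix(W)
F = label_matrix([0, 1], num_unlabeled=2, num_classes=2)
```

The labeled samples are placed first so that the top l rows of F correspond to Y_L, which makes the reset in the propagation step a simple overwrite of those rows.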
4. Run the propagation algorithm
- (1) Propagate: $F^{(t+1)} = P F^{(t)}$, where t is the iteration index;
- (2) Reset the rows of F that belong to the labeled samples: $F_L = Y_L$;
- (3) Repeat steps (1) and (2) until F converges;
- (4) For each unlabeled sample, assign the class with the largest probability in its row of F.
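The propagation loop can be sketched as below. This is a minimal illustration, not a reference implementation: the function name `propagate`, the tolerance `tol`, the iteration cap `max_iter`, and the small hand-written matrix `P` are all assumptions made for the example. It expects the labeled samples to occupy the first rows of P and starts the unlabeled rows from a uniform distribution.

```python
def propagate(P, Y_L, num_classes, tol=1e-6, max_iter=1000):
    """Iterate F <- P F with the labeled rows clamped to Y_L.

    Returns the predicted class of each unlabeled sample and the final F.
    """
    n = len(P)
    num_labeled = len(Y_L)
    # Labeled rows first, then unlabeled rows with an arbitrary (uniform) start.
    F = [row[:] for row in Y_L]
    F += [[1.0 / num_classes] * num_classes for _ in range(n - num_labeled)]

    for _ in range(max_iter):
        # (1) Propagate: every row becomes a weighted average of all rows.
        F_new = [
            [sum(P[i][k] * F[k][c] for k in range(n)) for c in range(num_classes)]
            for i in range(n)
        ]
        # (2) Reset the labeled rows to their known labels.
        for i in range(num_labeled):
            F_new[i] = Y_L[i][:]
        # (3) Stop once the largest change in any entry falls below tol.
        delta = max(
            abs(F_new[i][c] - F[i][c]) for i in range(n) for c in range(num_classes)
        )
        F = F_new
        if delta < tol:
            break

    # (4) Each unlabeled sample takes the class with the largest value in its row.
    predicted = [
        max(range(num_classes), key=lambda c: F[i][c]) for i in range(num_labeled, n)
    ]
    return predicted, F


# Hypothetical transition matrix: samples 0 and 1 are labeled; sample 2 is
# strongly connected to sample 0, and sample 3 to sample 1.
P = [
    [0.6, 0.0, 0.4, 0.0],
    [0.0, 0.6, 0.0, 0.4],
    [0.4, 0.0, 0.5, 0.1],
    [0.0, 0.4, 0.1, 0.5],
]
Y_L = [[1.0, 0.0], [0.0, 1.0]]
predicted, F = propagate(P, Y_L, num_classes=2)
print(predicted)  # -> [0, 1]
```

Because the labeled rows are overwritten after every update, they act as fixed sources of label information, and the unlabeled rows settle to values determined by how strongly each sample is connected to each labeled class.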
UCI label Propagation algorithm