As mentioned earlier in the article, the first step of the K-means algorithm is to find, for each sample point, the cluster it belongs to. Two implementations are given below: one uses an ordinary loop, and the other is fully vectorized.
Assume:
X is the m x n sample matrix. Each row is one sample, m is the number of samples, and n is the number of features.
centroids is the K x n matrix of cluster centers. K is the number of clusters, n is the number of features, and each row is the center of one cluster.
idx is an m x 1 matrix, and idx(i) is the subscript of the cluster that sample i belongs to (value range 1..K).
The semi-loop, semi-vectorized implementation is as follows:
Idea: loop over the sample points. For each one, compute its squared distance to each of the K cluster centers and take the subscript of the cluster with the smallest value.
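idx = zeros(size(X, 1), 1);                  % preallocate the m x 1 result
for i = 1:size(X, 1)
  dif = bsxfun(@minus, X(i, :), centroids);  % K x n: sample i minus every center
  [~, iw] = min(sum(dif .* dif, 2));         % subscript of the smallest squared distance
  idx(i, :) = iw;
endfor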
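To make the loop concrete, here is a minimal sketch that runs it on a tiny data set. The values in X and centroids are made up for illustration and do not come from the article.

```octave
% Toy data (illustrative values): 4 samples, 2 features, 2 cluster centers.
X = [1 1; 1.5 2; 8 8; 9 9.5];
centroids = [1 1; 9 9];

idx = zeros(size(X, 1), 1);                  % m x 1 result
for i = 1:size(X, 1)
  dif = bsxfun(@minus, X(i, :), centroids);  % K x n differences for sample i
  [~, iw] = min(sum(dif .* dif, 2));         % nearest center by squared distance
  idx(i) = iw;
endfor
disp(idx');
```

With this data the first two samples should be assigned to cluster 1 and the last two to cluster 2.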
The fully vectorized implementation:
Idea: first, construct two m x n x K arrays. The first holds the sample point values, repeated along the third dimension, and the second holds the cluster center values.
Then compute the squared distance from every sample point to every cluster center and take the subscript of the cluster with the smallest value.
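K = size(centroids, 1);                            % number of clusters
X_ext = bsxfun(@plus, X, zeros([size(X), K]));     % m x n x K copies of the samples
centroids_ext = permute(centroids, [3, 2, 1]);     % 1 x n x K cluster centers
dif_ext = bsxfun(@minus, X_ext, centroids_ext);    % m x n x K differences
[~, ix] = min(sum(dif_ext .* dif_ext, 2), [], 3);  % smallest squared distance along dim 3
idx = ix;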
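The array shapes are the hard part of the vectorized version, so the sketch below prints them for a tiny data set. The values are made up for illustration, and the shortened variant at the end is an assumption of this sketch rather than the article's code: it relies on bsxfun expanding singleton dimensions on its own, which makes the explicit X_ext step optional.

```octave
% Toy data (illustrative values): 4 samples, 2 features, 2 cluster centers.
X = [1 1; 1.5 2; 8 8; 9 9.5];
centroids = [1 1; 9 9];
K = size(centroids, 1);

X_ext = bsxfun(@plus, X, zeros([size(X), K]));     % m x n x K
centroids_ext = permute(centroids, [3, 2, 1]);     % 1 x n x K
disp(size(X_ext));
disp(size(centroids_ext));

dif_ext = bsxfun(@minus, X_ext, centroids_ext);    % m x n x K
[~, idx] = min(sum(dif_ext .* dif_ext, 2), [], 3); % m x 1

% Shortened variant: let bsxfun broadcast X (m x n x 1) against 1 x n x K.
dif_short = bsxfun(@minus, X, centroids_ext);
[~, idx_short] = min(sum(dif_short .* dif_short, 2), [], 3);
disp(isequal(idx, idx_short));
```

Both variants should give the same assignment as the loop version: clusters 1, 1, 2, 2 for this data.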
This fully vectorized code is harder to understand and less concise than the semi-loop, semi-vectorized implementation above. Its speed has not been measured, and it is not necessarily faster. It is best regarded as a clever trick.
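Since the speed is left open, a rough way to check it is to time both versions with tic and toc. The sketch below is only a starting point: the sizes m, n and K and the random data are arbitrary assumptions, and the timings will vary by machine and Octave version.

```octave
% Arbitrary sizes chosen for illustration; adjust them to match real data.
m = 5000; n = 10; K = 8;
X = rand(m, n);
centroids = rand(K, n);

% Semi-loop, semi-vectorized version
t = tic;
idx_loop = zeros(m, 1);
for i = 1:m
  dif = bsxfun(@minus, X(i, :), centroids);
  [~, iw] = min(sum(dif .* dif, 2));
  idx_loop(i) = iw;
endfor
time_loop = toc(t);

% Fully vectorized version
t = tic;
X_ext = bsxfun(@plus, X, zeros([size(X), K]));
centroids_ext = permute(centroids, [3, 2, 1]);
dif_ext = bsxfun(@minus, X_ext, centroids_ext);
[~, idx_vec] = min(sum(dif_ext .* dif_ext, 2), [], 3);
time_vec = toc(t);

same = double(isequal(idx_loop, idx_vec));
printf("loop: %.4f s, vectorized: %.4f s, same result: %d\n", time_loop, time_vec, same);
```

Note that the vectorized version allocates m x n x K intermediate arrays, so its memory use grows with the number of clusters, while the loop only ever holds a K x n difference matrix.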
Octave tricks: finding the cluster index of sample points with vectorized computation