Sequential algorithms are a family of very simple clustering algorithms. Most of them use every feature vector once or only a few times, and the final result depends on the order in which the vectors are presented to the algorithm. These algorithms generally do not need to know the number of clusters k in advance, although an upper bound Q on the number of clusters can be given. This article mainly introduces the Basic Sequential Algorithmic Scheme (BSAS) and several variants, and gives a code implementation.
First, consider BSAS. It requires three user-defined inputs: the dissimilarity threshold θ, the maximum allowable number of clusters Q, and the order in which the vectors are presented. The basic idea of the algorithm is to consider each new vector in turn and, depending on its distance to the existing clusters, either assign it to an existing cluster or create a new cluster for it.
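The idea described above can be sketched in a few lines of Python. This is a minimal sketch, not a reference implementation: the function name `bsas`, the use of Euclidean distance, and the choice of the cluster mean as the cluster representative are all assumptions made for illustration. A new cluster is created when the distance to the nearest cluster exceeds θ and fewer than Q clusters exist; otherwise the vector joins the nearest cluster.

```python
import math

def bsas(vectors, theta, q):
    """One pass of a BSAS-style scheme over `vectors` in the given order.

    Assumptions (illustrative only): Euclidean distance, and each cluster
    is represented by the mean of its members.
    """
    clusters = [[vectors[0]]]          # the first vector starts the first cluster
    means = [tuple(vectors[0])]
    for x in vectors[1:]:
        dists = [math.dist(x, m) for m in means]
        k = min(range(len(means)), key=dists.__getitem__)   # nearest cluster
        if dists[k] > theta and len(clusters) < q:
            # too far from every cluster and still room: open a new cluster
            clusters.append([x])
            means.append(tuple(x))
        else:
            # join the nearest cluster and update its mean
            clusters[k].append(x)
            n = len(clusters[k])
            means[k] = tuple(sum(c) / n for c in zip(*clusters[k]))
    return clusters, means

# Hypothetical toy data, threshold, and cluster limit
data = [(0, 0), (1, 1), (2, 2), (3, 8), (4, 8)]
clusters, means = bsas(data, theta=3.0, q=3)
print(len(clusters), means)
```

Because the vectors are processed in a single pass, presenting the same data in a different order can produce a different clustering.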
Algorithm Example:
There are 10 pattern samples: {x1 (0, 0), x2 (3, 8), x3 (2, 2), x4 (1, 1), x5 (5, 3), x6 (4, 8), x7 (6, 3), x8 (5, 4), x9 (6, 4), x10 (7, 5)}
First step: Select any pattern sample as the first cluster center, for example z1 = x1.
Second step: Select the sample farthest from z1 as the second cluster center.
By calculation, ||x6 - z1|| is the largest distance, so z2 = x6.
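The second step can be checked with a short Python snippet. This is only a sketch that assumes Euclidean distance; the dictionary `samples` and the other variable names are chosen here for illustration.

```python
import math

# The ten pattern samples from the example
samples = {
    "x1": (0, 0), "x2": (3, 8), "x3": (2, 2), "x4": (1, 1), "x5": (5, 3),
    "x6": (4, 8), "x7": (6, 3), "x8": (5, 4), "x9": (6, 4), "x10": (7, 5),
}

z1 = samples["x1"]
# Euclidean distance from every sample to the first center z1
dist_to_z1 = {name: math.dist(p, z1) for name, p in samples.items()}
farthest = max(dist_to_z1, key=dist_to_z1.get)
print(farthest, round(dist_to_z1[farthest], 3))  # x6 8.944
```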
Third step: Calculate the distance between each pattern sample {x_i, i = 1, ..., N} and the centers {z1, z2}, i.e.
d_i1 = ||x_i - z1||
d_i2 = ||x_i - z2||
and take the minimum distance min(d_i1, d_i2) for each i = 1, ..., N.
Fourth step: Select the maximum of these minimum distances over all pattern samples. If this maximum exceeds a certain fraction θ of ||z1 - z2||, the corresponding sample is taken as the third cluster center z3, i.e.
if max{min(d_i1, d_i2), i = 1, ..., N} > θ ||z1 - z2||, then z3 = x_i.
Otherwise, no sample qualifies as a new cluster center, and the search for cluster centers ends.
Here, θ can be chosen heuristically as a fixed fraction, such as 1/2.
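In this example, taking θ = 1/2, the threshold is θ ||z1 - z2|| ≈ 4.47. The largest minimum distance occurs at i = 7 (about 5.39), which exceeds the threshold, so z3 = x7.
Fifth step: If z3 exists, calculate max{min(d_i1, d_i2, d_i3), i = 1, ..., N}. If this value exceeds the same fraction θ of ||z1 - z2||, the corresponding sample becomes z4; otherwise the search for cluster centers ends.
In this example, no sample satisfies the condition (the largest minimum distance is about 2.83, at x3), so there is no z4.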
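The center-finding steps so far can be written as one loop. The following is a minimal Python sketch under stated assumptions: Euclidean distance, θ = 1/2, the first sample as z1, and at least two distinct samples. The function name `maxmin_centers` is hypothetical and not taken from the MATLAB code.

```python
import math

def maxmin_centers(points, theta=0.5, first=0):
    """Return the indices of the cluster centers found by the
    maximum-of-minimum-distances rule described in the steps above."""
    # Second center: the sample farthest from the first one
    second = max(range(len(points)),
                 key=lambda i: math.dist(points[i], points[first]))
    centers = [first, second]
    threshold = theta * math.dist(points[first], points[second])
    while True:
        # Distance from each sample to its nearest existing center
        nearest = [min(math.dist(p, points[c]) for c in centers)
                   for p in points]
        candidate = max(range(len(points)), key=nearest.__getitem__)
        if nearest[candidate] <= threshold:
            break                      # no sample is far enough: stop
        centers.append(candidate)      # otherwise it becomes a new center
    return centers

points = [(0, 0), (3, 8), (2, 2), (1, 1), (5, 3),
          (4, 8), (6, 3), (5, 4), (6, 4), (7, 5)]
center_idx = maxmin_centers(points)
print([f"x{i + 1}" for i in center_idx])  # ['x1', 'x6', 'x7']
```

The threshold is computed once from ||z1 - z2|| and reused for every later candidate, matching the description in the fifth step.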
Sixth step: Assign each pattern sample {x_i, i = 1, ..., N} to its nearest cluster center:
z1 = x1: {x1, x3, x4} is the first class
z2 = x6: {x2, x6} is the second class
z3 = x7: {x5, x7, x8, x9, x10} is the third class
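The nearest-center assignment can be reproduced with the short Python sketch below. It assumes Euclidean distance and takes the three centers x1, x6, and x7 as given; the variable names are illustrative.

```python
import math

points = [(0, 0), (3, 8), (2, 2), (1, 1), (5, 3),
          (4, 8), (6, 3), (5, 4), (6, 4), (7, 5)]
centers = [points[0], points[5], points[6]]   # z1 = x1, z2 = x6, z3 = x7

classes = {k: [] for k in range(len(centers))}
for i, p in enumerate(points, start=1):
    # index of the center closest to sample x_i
    k = min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
    classes[k].append(f"x{i}")

for k, members in classes.items():
    print(f"z{k + 1}: {members}")
```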
Finally, we can compute the mean of the samples in each class to obtain a more representative cluster center.
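As a sketch of this last refinement, the class means for the example can be computed as follows. The class memberships are taken from the sixth step; the helper name `mean_point` is an assumption for illustration.

```python
# Class memberships from the sixth step of the example
classes = {
    "z1": [(0, 0), (2, 2), (1, 1)],
    "z2": [(3, 8), (4, 8)],
    "z3": [(5, 3), (6, 3), (5, 4), (6, 4), (7, 5)],
}

def mean_point(pts):
    """Coordinate-wise mean of a list of points."""
    n = len(pts)
    return tuple(sum(c) / n for c in zip(*pts))

refined = {name: mean_point(pts) for name, pts in classes.items()}
for name, center in refined.items():
    print(name, center)
```

For this data the refined centers come out at approximately (1, 1), (3.5, 8), and (5.8, 3.8).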
A MATLAB implementation of the algorithm, with detailed comments, is available at the download link.