DBSCAN is a commonly used data mining algorithm. Clustering methods fall into several families: the K-means algorithm discussed earlier is partitioning-based, while the DBSCAN algorithm covered in this article is density-based. Other families include hierarchical methods (agglomerative and divisive), model-based methods, and so on. I will first introduce and analyze the DBSCAN implementation in Weka, and then analyze the DBSCAN implementation I wrote in C#. Before going further, if you have not met this algorithm before, it is best to get familiar with a few concepts: the epsilon-neighborhood, core objects, (direct) density-reachability, and density-connectivity. These concepts are explained in the book "Data Mining: Concepts and Techniques", and understanding them is important for understanding the algorithm.
Let's take a look at how the DBSCAN algorithm is implemented in Weka:
The DBSCAN source code is in Weka's weka.clusterers package, in the file DBScan.java. The two core methods are buildClusterer and expandCluster. buildClusterer is the interface method that every clusterer implements, while expandCluster expands the density-connected region around a sample object. There is also a method called epsilonRangeQuery, which lives in the Database class and returns the set of sample objects that lie in the epsilon-neighborhood of a specified object.
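To make the idea of an epsilon-neighborhood query concrete, here is a minimal sketch in plain Java. It is not Weka's code and does not reproduce Weka's method signature: the class name EpsilonQuerySketch, the use of double[][] for the data, the Euclidean distance, and the isCoreObject helper are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class for illustration only; not part of Weka.
class EpsilonQuerySketch {

    // Euclidean distance between two points of equal dimension (an assumed metric).
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Returns the indices of all points within epsilon of points[queryIndex],
    // including the query point itself. This is a linear scan over all points.
    static List<Integer> epsilonRangeQuery(double[][] points, int queryIndex, double epsilon) {
        List<Integer> neighbors = new ArrayList<>();
        for (int i = 0; i < points.length; i++) {
            if (distance(points[queryIndex], points[i]) <= epsilon) {
                neighbors.add(i);
            }
        }
        return neighbors;
    }

    // A point is a core object if its epsilon-neighborhood
    // contains at least minPoints points.
    static boolean isCoreObject(double[][] points, int queryIndex, double epsilon, int minPoints) {
        return epsilonRangeQuery(points, queryIndex, epsilon).size() >= minPoints;
    }
}
```

A linear scan keeps the sketch short; with many points, a spatial index would normally replace it.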
In the buildClusterer method, expandCluster is called on each sample point that has not yet been assigned to a cluster, in order to find the maximal set of sample objects that are density-connected starting from that object. The main code of this method is shown below: when expandCluster returns true, a cluster has been formed and the next cluster label is taken.
Weka.dbscan
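// Visit every sample object in the database.
while (iterator.hasNext()) {
    DataObject dataObject = (DataObject) iterator.next();
    // Only objects without a cluster label can start a new cluster.
    if (dataObject.getClusterLabel() == DataObject.UNCLASSIFIED) {
        // expandCluster returns true when a cluster was formed around dataObject.
        if (expandCluster(dataObject)) {
            clusterID++;                  // move on to the next cluster label
            numberOfGeneratedClusters++;
        }
    }
}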
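The loop above can be tried out with a small self-contained sketch. The following is a simplified DBSCAN written for illustration, not Weka's implementation: the class DbscanSketch, its integer labels, the queue-based expandCluster, and the Euclidean linear-scan neighborhood query are all assumptions. Only the outer loop in run() mirrors the Weka excerpt.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical, simplified DBSCAN for illustration; not Weka's implementation.
class DbscanSketch {
    static final int UNCLASSIFIED = -1;
    static final int NOISE = -2;

    final double[][] points;
    final double epsilon;
    final int minPoints;
    final int[] labels;   // cluster label of each point
    int clusterID = 0;    // label that the next cluster will receive

    DbscanSketch(double[][] points, double epsilon, int minPoints) {
        this.points = points;
        this.epsilon = epsilon;
        this.minPoints = minPoints;
        this.labels = new int[points.length];
        Arrays.fill(labels, UNCLASSIFIED);
    }

    // Linear-scan neighborhood query; the result includes the query point itself.
    List<Integer> epsilonRangeQuery(int q) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < points.length; i++) {
            double sum = 0.0;
            for (int d = 0; d < points[q].length; d++) {
                double diff = points[q][d] - points[i][d];
                sum += diff * diff;
            }
            if (Math.sqrt(sum) <= epsilon) result.add(i);
        }
        return result;
    }

    // Tries to grow a cluster from point p; returns false if p is not a core object.
    boolean expandCluster(int p) {
        List<Integer> seeds = epsilonRangeQuery(p);
        if (seeds.size() < minPoints) {
            labels[p] = NOISE;   // may later be relabeled as a border point
            return false;
        }
        labels[p] = clusterID;
        Deque<Integer> queue = new ArrayDeque<>(seeds);
        while (!queue.isEmpty()) {
            int q = queue.poll();
            if (labels[q] == NOISE) labels[q] = clusterID;   // border point joins the cluster
            if (labels[q] != UNCLASSIFIED) continue;         // already handled
            labels[q] = clusterID;
            List<Integer> neighbors = epsilonRangeQuery(q);
            if (neighbors.size() >= minPoints) queue.addAll(neighbors);  // q is core: keep expanding
        }
        return true;
    }

    // Same shape as the Weka loop: each unclassified point may start a cluster.
    int[] run() {
        for (int i = 0; i < points.length; i++) {
            if (labels[i] == UNCLASSIFIED) {
                if (expandCluster(i)) {
                    clusterID++;
                }
            }
        }
        return labels;
    }
}
```

For example, constructing new DbscanSketch(points, 1.5, 3) over two tight groups of three points plus one far-away point and calling run() should label the two groups as separate clusters and mark the isolated point as NOISE.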