Analysis of the DBSCAN algorithm in Weka and its implementation in C#

Source: Internet
Author: User

DBSCAN is a commonly used data mining algorithm. Clustering methods fall into several families: the K-means algorithm discussed earlier is partition-based, while the DBSCAN algorithm covered in this article is density-based. Other families include hierarchical methods (agglomerative and divisive), model-based methods, and so on. I first introduce and analyze the DBSCAN implementation in Weka, and then analyze the DBSCAN method that I implemented in C#. Before that, a few concepts need explaining. If you have not met this algorithm before, it is best to familiarize yourself with the following: epsilon-neighborhood, core object, (directly) density-reachable, and density-connected. These concepts are covered in the book "Data Mining: Concepts and Techniques", and understanding them is important for understanding the algorithm.
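
For quick reference, the concepts above can be written down compactly. This is a sketch of the usual textbook formulation, not something taken from the Weka source; the symbols D (the data set), dist (the distance function) and MinPts (the minimum neighborhood size) are notation assumed here.

```latex
\begin{align*}
N_\varepsilon(p) &= \{\, q \in D \mid \operatorname{dist}(p, q) \le \varepsilon \,\}
  && \text{($\varepsilon$-neighborhood of $p$)} \\
p \text{ is a core object} &\iff |N_\varepsilon(p)| \ge \mathit{MinPts} \\
q \text{ is directly density-reachable from } p &\iff q \in N_\varepsilon(p) \text{ and } p \text{ is a core object} \\
q \text{ is density-reachable from } p &\iff \exists\, p_1 = p, \dots, p_n = q :\
  p_{i+1} \text{ is directly density-reachable from } p_i \\
p, q \text{ are density-connected} &\iff \exists\, o \in D :\
  p \text{ and } q \text{ are both density-reachable from } o
\end{align*}
```

A cluster is then a maximal set of mutually density-connected objects, and objects that belong to no cluster are treated as noise.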

Let's take a look at how the DBSCAN algorithm is implemented in Weka:

The DBSCAN source code is in Weka's weka.clusterers package, in the file DBScan.java. The two core methods are buildClusterer and expandCluster. buildClusterer is the interface method that every clusterer implements, while expandCluster grows the density-connected region around a given sample object. There is also a method called epsilonRangeQuery, defined on the Database class, which returns the set of sample objects in the epsilon-neighborhood of a specified object.
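
To make the role of the range query concrete, here is a minimal sketch of what an epsilon-neighborhood query can look like. It is not Weka's code: the class name RangeQuerySketch, the use of plain double[] points, the Euclidean distance, and the linear scan are all assumptions made for illustration. Weka's own epsilonRangeQuery works on DataObject instances and its exact signature is not shown in this article.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class for illustration; not part of Weka.
class RangeQuerySketch {

    // Euclidean distance between two points of equal dimension.
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Returns the indices of all points within epsilon of points.get(queryIndex).
    // The query point itself is included, because its distance to itself is 0.
    static List<Integer> epsilonRangeQuery(List<double[]> points, int queryIndex, double epsilon) {
        List<Integer> neighbours = new ArrayList<>();
        double[] query = points.get(queryIndex);
        for (int i = 0; i < points.size(); i++) {
            if (distance(query, points.get(i)) <= epsilon) {
                neighbours.add(i);
            }
        }
        return neighbours;
    }
}
```

A linear scan costs O(n) per query, so running it for every object gives O(n^2) overall. A spatial index can reduce that cost, but the scan is enough to show what the query has to return.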

In the buildClusterer method, expandCluster is called on each sample point that has not yet been assigned to a cluster, in order to find the maximal set of sample objects that are density-connected starting from that object. The main code of this method is shown below. When expandCluster returns true, a cluster has been formed and the next cluster label is taken.

Weka.dbscan

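// Visit every data object once; only still-unclassified objects can seed a new cluster.
while (iterator.hasNext()) {
    DataObject dataObject = (DataObject) iterator.next();
    if (dataObject.getClusterLabel() == DataObject.UNCLASSIFIED) {
        // true means a cluster was formed starting from dataObject
        if (expandCluster(dataObject)) {
            clusterID++;                  // take the next cluster label
            numberOfGeneratedClusters++;
        }
    }
}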
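
The excerpt above does not show expandCluster itself. The following is a minimal, self-contained sketch of how the loop and expandCluster can fit together, following the general DBSCAN procedure rather than Weka's actual source. The class DbscanSketch, the int label array, the NOISE value, and the claim helper are all hypothetical names introduced for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical, simplified DBSCAN; names echo the Weka excerpt but this is not Weka's code.
class DbscanSketch {
    static final int UNCLASSIFIED = -1; // assumed sentinel values
    static final int NOISE = -2;

    final double[][] points;
    final double epsilon;
    final int minPoints;
    final int[] labels;
    int clusterID = 0;

    DbscanSketch(double[][] points, double epsilon, int minPoints) {
        this.points = points;
        this.epsilon = epsilon;
        this.minPoints = minPoints;
        this.labels = new int[points.length];
        Arrays.fill(labels, UNCLASSIFIED);
    }

    // Linear-scan neighbourhood query; the result includes the query point itself.
    List<Integer> epsilonRangeQuery(int index) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < points.length; i++) {
            double sum = 0.0;
            for (int d = 0; d < points[i].length; d++) {
                double diff = points[i][d] - points[index][d];
                sum += diff * diff;
            }
            if (Math.sqrt(sum) <= epsilon) result.add(i);
        }
        return result;
    }

    // Tries to grow a cluster from the given point; returns false if it is not a core object.
    boolean expandCluster(int index) {
        List<Integer> neighbours = epsilonRangeQuery(index);
        if (neighbours.size() < minPoints) {
            labels[index] = NOISE; // may later be relabelled as a border point of a cluster
            return false;
        }
        Deque<Integer> seeds = new ArrayDeque<>();
        claim(neighbours, seeds); // also labels the starting point, which is in its own neighbourhood
        while (!seeds.isEmpty()) {
            int current = seeds.poll();
            List<Integer> reachable = epsilonRangeQuery(current);
            if (reachable.size() >= minPoints) { // only core objects extend the cluster
                claim(reachable, seeds);
            }
        }
        return true;
    }

    // Assigns the current cluster label to unclassified and noise points.
    // Only previously unclassified points are queued for further expansion.
    private void claim(List<Integer> candidates, Deque<Integer> seeds) {
        for (int c : candidates) {
            if (labels[c] == UNCLASSIFIED) seeds.add(c);
            if (labels[c] == UNCLASSIFIED || labels[c] == NOISE) labels[c] = clusterID;
        }
    }

    // Same control flow as the buildClusterer loop shown above.
    int[] run() {
        for (int i = 0; i < points.length; i++) {
            if (labels[i] == UNCLASSIFIED && expandCluster(i)) {
                clusterID++;
            }
        }
        return labels;
    }
}
```

For example, running the sketch on the points (0,0), (0,1), (1,0), (10,10), (10,11), (11,10) and (50,50) with epsilon 1.5 and minPoints 3 labels the first three points as cluster 0, the next three as cluster 1, and the last point as NOISE. One design choice to note: a point already claimed by another cluster is left alone, so a border point shared by two clusters stays with whichever cluster reached it first.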
