Clustering Concepts and Algorithms

Source: Internet
Author: User
As the saying goes, "birds of a feather flock together." The natural and social sciences are full of classification problems. A class, in everyday terms, is a collection of similar elements. Cluster analysis, also called group analysis, is a statistical method for studying classification problems (of samples or of indicators).

Cluster analysis originated in taxonomy. In early taxonomy, people classified mainly by experience and expert knowledge, rarely using mathematical tools for quantitative classification. As science and technology developed, the demands on classification grew so high that experience and expertise alone were often no longer sufficient. Mathematical tools were therefore gradually brought into taxonomy, forming numerical taxonomy; techniques from multivariate analysis were then introduced into numerical taxonomy, forming cluster analysis. Cluster analysis is a rich field, covering systematic (hierarchical) clustering, ordered-sample clustering, dynamic clustering, fuzzy clustering, graph-theoretic clustering, clustering prediction, and more.

The main families of clustering methods are as follows:

1. Partitioning methods: Given a dataset of n tuples or records, a partitioning method constructs k groupings, each representing a cluster, with k < n. These k groupings satisfy two conditions: (1) each grouping contains at least one record; (2) each record belongs to exactly one grouping (a requirement that some fuzzy clustering algorithms relax). For a given k, the algorithm first produces an initial grouping and then iteratively reassigns records so that each new grouping is better than the previous one. The criterion for "better" is that records in the same group should be as close together as possible, while records in different groups should be as far apart as possible. Representative algorithms: k-means, k-medoids, CLARANS.
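
The iterative "assign, then improve" idea behind partitioning methods can be sketched as a minimal k-means loop. This is a toy illustration in plain Python, not a production implementation; the `kmeans` helper, the fixed iteration count, and the 2-D points are assumptions for demonstration:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Toy k-means on 2-D points: assign each point to its nearest
    centroid, then move each centroid to the mean of its cluster,
    repeating for a fixed number of iterations."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)      # initial grouping: k random points
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # Assignment step: each record joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            d = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[d.index(min(d))].append(p)
        # Update step: recompute each centroid, improving the grouping.
        for i, cl in enumerate(clusters):
            if cl:
                centroids[i] = (sum(p[0] for p in cl) / len(cl),
                                sum(p[1] for p in cl) / len(cl))
    return centroids, clusters

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, k=2)
```

Two well-separated groups of three points each end up in separate clusters regardless of which points seed the centroids, because each iteration only ever lowers the within-cluster distances.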

2. Hierarchical methods: These methods decompose the given dataset hierarchically until some condition is satisfied, following either a "bottom-up" (agglomerative) or "top-down" (divisive) scheme. In the bottom-up scheme, for example, each record initially forms its own group; each iteration then merges the closest neighbouring groups into one, until all records end up in a single group or some condition is satisfied. Representative algorithms: BIRCH, CURE, Chameleon.
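
The bottom-up scheme can be sketched as a single-linkage agglomerative loop: start with one cluster per point and repeatedly merge the two closest clusters. The `single_linkage` helper and the stopping condition (a target cluster count) are illustrative assumptions, and the O(n^3) search is far simpler than what BIRCH or CURE actually do:

```python
def single_linkage(points, target_k):
    """Agglomerative clustering sketch: every point starts as its own
    cluster; the two closest clusters (single linkage) are merged
    until target_k clusters remain."""
    clusters = [[p] for p in points]

    def dist(a, b):
        # Single linkage: squared distance between the closest pair of members.
        return min((x[0] - y[0]) ** 2 + (x[1] - y[1]) ** 2
                   for x in a for y in b)

    while len(clusters) > target_k:
        # Find and merge the closest pair of clusters.
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] += clusters.pop(j)
    return clusters

clusters = single_linkage([(0, 0), (0, 1), (5, 5), (5, 6), (0.5, 0.5)],
                          target_k=2)
```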

3. Density-based methods: The fundamental difference between density-based methods and the other approaches is that they are based on density rather than on distances of various kinds. This overcomes a shortcoming of distance-based algorithms, which can only discover roughly "spherical" clusters. The idea is that as long as the density of points in a region exceeds a certain threshold, those points are added to the nearby cluster. Representative algorithms: DBSCAN, OPTICS, DENCLUE.
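
The density threshold idea can be sketched as a minimal DBSCAN: a point with at least `min_pts` neighbours within radius `eps` is a core point, clusters grow by expanding from core points, and unreachable points are labelled noise. This is a didactic sketch (linear-scan neighbour search, 2-D points), not an efficient implementation:

```python
def dbscan(points, eps, min_pts):
    """Minimal DBSCAN sketch: labels[i] is the cluster id of point i,
    or -1 for noise."""
    def neighbours(i):
        # All points (including i itself) within eps of point i.
        return [j for j in range(len(points))
                if (points[i][0] - points[j][0]) ** 2 +
                   (points[i][1] - points[j][1]) ** 2 <= eps * eps]

    labels = [None] * len(points)       # None = unvisited, -1 = noise
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbours(i)
        if len(nbrs) < min_pts:
            labels[i] = -1              # tentatively noise
            continue
        cluster += 1                    # i is a core point: start a cluster
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster     # border point reclaimed from noise
            if labels[j] is not None:
                continue
            labels[j] = cluster
            jn = neighbours(j)
            if len(jn) >= min_pts:      # j is also a core point: keep growing
                queue.extend(jn)
    return labels

labels = dbscan([(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)],
                eps=1.5, min_pts=3)
```

The four mutually close points form one cluster and the distant point is marked as noise, without any assumption about the cluster's shape.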

4. Grid-based methods: These methods first quantise the data space into a finite number of cells, forming a grid structure, and all processing is done on individual cells. An outstanding advantage of this approach is its speed: processing time is usually independent of the number of records in the target database and depends only on how many cells the data space is divided into. Representative algorithms: STING, CLIQUE, WaveCluster.
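
The cell-level processing can be sketched as follows: bin the points into square cells, keep cells holding at least `min_density` points, then join neighbouring dense cells into clusters with a flood fill over the grid. The `grid_clusters` helper and its two parameters are illustrative assumptions; real grid methods such as STING add hierarchical statistics per cell:

```python
from collections import defaultdict

def grid_clusters(points, cell_size, min_density):
    """Grid-based clustering sketch: once points are binned, all work is
    on cells, so cost depends on the number of cells, not of records."""
    cells = defaultdict(list)
    for p in points:
        cells[(int(p[0] // cell_size), int(p[1] // cell_size))].append(p)
    dense = {c for c, pts in cells.items() if len(pts) >= min_density}
    clusters, seen = [], set()
    for cell in dense:
        if cell in seen:
            continue
        # Flood fill across the 8-neighbourhood of dense cells.
        group, stack = [], [cell]
        seen.add(cell)
        while stack:
            cx, cy = stack.pop()
            group.extend(cells[(cx, cy)])
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (cx + dx, cy + dy)
                    if nb in dense and nb not in seen:
                        seen.add(nb)
                        stack.append(nb)
        clusters.append(group)
    return clusters

pts = [(0, 0), (0.5, 0.5), (1, 1), (4, 4), (10, 10), (10.5, 10.5)]
clusters = grid_clusters(pts, cell_size=2, min_density=2)
```

The lone point at (4, 4) sits in a cell below the density threshold and is dropped; the two dense, non-adjacent cells yield two clusters.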

5. Model-based methods: A model-based method hypothesises a model for each cluster and then searches for the data that best fit that model. Such a model might be a density distribution function of data points in space, or something else; the underlying assumption is that the target dataset is generated by a mixture of probability distributions. Two main approaches are used in practice: statistical methods and neural-network methods.
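
The statistical approach can be sketched with expectation-maximisation for a two-component, one-dimensional Gaussian mixture: the E-step computes each point's responsibility under each component, and the M-step re-estimates the weights, means, and variances from those responsibilities. The `em_gmm_1d` helper, its crude initialisation, and the fixed iteration count are assumptions for demonstration only:

```python
import math

def em_gmm_1d(data, iters=50):
    """EM sketch for a 2-component 1-D Gaussian mixture model.
    Returns the mixture weights, means, and variances."""
    # Crude initialisation (an assumption): split the sorted data in half.
    data = sorted(data)
    half = len(data) // 2
    mu = [sum(data[:half]) / half, sum(data[half:]) / (len(data) - half)]
    var = [1.0, 1.0]
    w = [0.5, 0.5]

    def pdf(x, m, v):
        # Gaussian density with mean m and variance v.
        return math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)

    for _ in range(iters):
        # E-step: responsibility r[i][k] of component k for point i.
        r = []
        for x in data:
            p = [w[k] * pdf(x, mu[k], var[k]) for k in range(2)]
            s = sum(p)
            r.append([pk / s for pk in p])
        # M-step: re-estimate each component's parameters.
        for k in range(2):
            nk = sum(ri[k] for ri in r)
            w[k] = nk / len(data)
            mu[k] = sum(ri[k] * x for ri, x in zip(r, data)) / nk
            var[k] = max(sum(ri[k] * (x - mu[k]) ** 2
                             for ri, x in zip(r, data)) / nk, 1e-6)
    return w, mu, var

w, mu, var = em_gmm_1d([-0.5, 0, 0.5, 9.5, 10, 10.5])
```

On data drawn from two well-separated groups, the fitted means settle near the group centres, and the responsibilities serve as a soft cluster assignment.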
