Cluster Analysis in R

Source: Internet
Author: User

1. Randomly generate three clusters of points:

> c1 <- cbind(rnorm(30, 2, 1), rnorm(30, 2, 1))
> c2 <- cbind(rnorm(30, 3, 1), rnorm(30, 20, 1))
> c3 <- cbind(rnorm(30, 15, 1), rnorm(30, 25, 1))
> v <- rbind(c1, c2, c3)

View the distribution of the points:

> plot(v)

Fig. 1 The randomly generated data

2. K clustering

Center-based partitioning algorithms (commonly k-means, k-medoids, and so on) such as PAM (Partitioning Around Medoids) run well on small data sets but do not scale well to large ones. To handle large data sets, a sampling-based approach called CLARA (Clustering LARge Applications) is often used. Instead of considering the entire data set, CLARA draws a random sample from it and then uses PAM to find the best medoids (center points) for that sample.
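The sampling idea can be sketched in a few lines of R. This is a simplified illustration, not the internals of the cluster package: the helper clara_pass is hypothetical, and the seed, the sample size of 46 (which mirrors clara's documented default of 40 + 2k), and the five passes are assumptions made for the example.

```r
library(cluster)

# Regenerate data shaped like section 1 (the seed is an arbitrary choice).
set.seed(1)
v <- rbind(cbind(rnorm(30, 2, 1),  rnorm(30, 2, 1)),
           cbind(rnorm(30, 3, 1),  rnorm(30, 20, 1)),
           cbind(rnorm(30, 15, 1), rnorm(30, 25, 1)))

# Hypothetical helper: one CLARA-style pass.
# 1. draw a random sample, 2. run PAM on the sample only,
# 3. assign every point of the full data set to its nearest medoid.
clara_pass <- function(x, k, sampsize) {
  idx <- sample(nrow(x), sampsize)
  medoids <- pam(x[idx, , drop = FALSE], k)$medoids
  # n x k matrix of Euclidean distances from each point to each medoid
  d <- sapply(seq_len(k), function(j) {
    sqrt(rowSums(sweep(x, 2, medoids[j, ])^2))
  })
  list(medoids    = medoids,
       clustering = max.col(-d, ties.method = "first"),
       objective  = mean(apply(d, 1, min)))  # mean distance to nearest medoid
}

# Repeat on several samples and keep the pass with the lowest objective.
passes <- lapply(1:5, function(i) clara_pass(v, k = 3, sampsize = 46))
best <- passes[[which.min(sapply(passes, `[[`, "objective"))]]
best$medoids
best$objective
```

Because PAM only ever sees the sample, the cost of the expensive step depends on the sample size rather than on the full data set; the price is that a poor sample can miss a good medoid, which is why several samples are tried.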

This experiment uses the clara function from the cluster package.

> library(cluster)
> clara(v, 3)
Call:
Medoids:
          [,1]      [,2]
[1,]  2.067384  1.761579
[2,]  3.037691 20.208036
[3,] 15.310366 25.211417
Objective function:  1.236222
Clustering vector:  int [1:90] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ...
Best sample:
[1] 2 4 5 6 7 11 12 13 23 24 25 26 27 29 32 34 37 41 42 43 44 45 47 49 51 52 53 54 57
[30] 59 60 61 62 63 64 65 67 74 75 77 81 82 83 84 85 89
Available components:

Plot the result:

> cls3 <- clara(v, 3)
> clusplot(cls3)

Fig. 2 Clustering result with k = 3

A drawback of k-means and related methods is that the number of clusters k must be specified manually; if k is chosen poorly, the clustering result will be poor.

Fig. 3 Clustering results with k = 2, 3, 4, and 5
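One common way to compare candidate values of k, not used in the original experiment, is the average silhouette width. The sketch below is an assumption-laden illustration: it uses pam rather than clara because the data set is small, and the seed and the range 2 to 5 are arbitrary choices that mirror Fig. 3.

```r
library(cluster)

# Regenerate data shaped like section 1 (the seed is an arbitrary choice).
set.seed(1)
v <- rbind(cbind(rnorm(30, 2, 1),  rnorm(30, 2, 1)),
           cbind(rnorm(30, 3, 1),  rnorm(30, 20, 1)),
           cbind(rnorm(30, 15, 1), rnorm(30, 25, 1)))

# Average silhouette width for each candidate k; values closer to 1
# indicate tighter, better separated clusters.
ks <- 2:5
avg_width <- sapply(ks, function(k) pam(v, k)$silinfo$avg.width)
names(avg_width) <- ks
avg_width

# Pick the k with the largest average silhouette width.
best_k <- ks[which.max(avg_width)]
best_k
```

On well separated data like this, one would expect the value of k that matches the number of generated groups to score highest, but the silhouette is only a heuristic and should be checked against a plot.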

3. Hierarchical clustering

Hierarchical clustering organizes data objects into a hierarchy, or "tree", of clusters. There are two kinds of hierarchical methods, agglomerative and divisive, which build the hierarchy with a bottom-up and a top-down strategy respectively.

A major problem with divisive methods is deciding how to split a large cluster into smaller ones. A set of n objects can be divided into two mutually exclusive subsets in 2^(n-1) - 1 ways, so when n is large the amount of computation is enormous. Divisive methods therefore usually rely on heuristics to choose the split, which can give inaccurate results, and for efficiency they do not backtrack on a split that has already been made. For these reasons, agglomerative methods are generally used more often than divisive ones.
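The count 2^(n-1) - 1 can be checked numerically. The helper names n_splits and count_splits below are hypothetical and exist only for this illustration.

```r
# Number of ways to split n objects into two non-empty,
# mutually exclusive subsets.
n_splits <- function(n) 2^(n - 1) - 1

# The count roughly doubles with every added object.
n_splits(c(3, 10, 30, 90))

# Brute-force check for small n: keep object 1 in the first subset,
# let each of the other n - 1 objects pick a side (0 or 1), and drop
# the single case in which the second subset stays empty.
count_splits <- function(n) {
  sides <- expand.grid(rep(list(0:1), n - 1))
  sum(rowSums(sides) > 0)
}

count_splits(4)  # 7, the same as n_splits(4)
```

Fixing object 1 on one side avoids counting each split twice, which is where the exponent n - 1 comes from; the subtraction of 1 removes the split that leaves one side empty.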

3.1 Agglomerative hierarchical clustering

This experiment uses the agnes (AGglomerative NESting) hierarchical clustering function from the cluster package.

> agnsingle <- agnes(daisy(v), diss = TRUE, method = "single")
> agncomplete <- agnes(daisy(v), diss = TRUE, method = "complete")
> agnaverage <- agnes(daisy(v), diss = TRUE, method = "average")
> plot(agnsingle)
> plot(agncomplete)
> plot(agnaverage)

Fig. 4 Dendrogram with single linkage (minimum distance between clusters)

Fig. 5 Dendrogram with complete linkage (maximum distance between clusters)

Fig. 6 Dendrogram with average linkage (average distance between clusters)
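Besides inspecting the three trees by eye, the linkage methods can be compared through the agglomerative coefficient that agnes stores in the ac component of its result. The sketch below is an illustration under assumptions: the seed is arbitrary and the data are regenerated so that the block runs on its own.

```r
library(cluster)

# Regenerate data shaped like section 1 (the seed is an arbitrary choice).
set.seed(1)
v <- rbind(cbind(rnorm(30, 2, 1),  rnorm(30, 2, 1)),
           cbind(rnorm(30, 3, 1),  rnorm(30, 20, 1)),
           cbind(rnorm(30, 15, 1), rnorm(30, 25, 1)))

# Compute the dissimilarities once and reuse them for every linkage.
d <- daisy(v)  # Euclidean distance by default for numeric data

methods <- c("single", "complete", "average")
fits <- lapply(methods, function(m) agnes(d, diss = TRUE, method = m))
names(fits) <- methods

# Agglomerative coefficient per linkage: values near 1 suggest a
# clearer clustering structure.
ac <- sapply(fits, function(f) f$ac)
ac
```

The coefficient tends to grow with the number of observations, so it is best used to compare linkages on the same data set rather than across data sets of different sizes.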

3.2 Divisive hierarchical clustering

This experiment uses the diana (DIvisive ANAlysis) hierarchical clustering function from the cluster package.

> dv <- diana(v)
> plot(dv)

Fig. 7 Dendrogram of the divisive hierarchical clustering result
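A dendrogram can be turned into flat cluster labels by cutting it at a chosen number of clusters. The sketch below is not part of the original experiment; it assumes an arbitrary seed and uses the known generation order of the rows (the vector truth is introduced here only for the comparison).

```r
library(cluster)

# Regenerate data shaped like section 1 (the seed is an arbitrary choice).
set.seed(1)
v <- rbind(cbind(rnorm(30, 2, 1),  rnorm(30, 2, 1)),
           cbind(rnorm(30, 3, 1),  rnorm(30, 20, 1)),
           cbind(rnorm(30, 15, 1), rnorm(30, 25, 1)))

# Rows 1-30, 31-60 and 61-90 were drawn from groups 1, 2 and 3.
truth <- rep(1:3, each = 30)

dv <- diana(v)
dv$dc  # divisive coefficient, between 0 and 1

# Convert to an hclust tree and cut it into three clusters.
labels <- cutree(as.hclust(dv), k = 3)

# Cross-tabulate the generated groups against the recovered clusters.
table(truth, labels)
```

If the three groups are recovered cleanly, each row of the table has a single non-zero cell; the cluster numbers themselves are arbitrary and need not match the group numbers.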
