MATLAB Cluster Analysis

Source: Internet
Author: User

MATLAB provides a set of functions for cluster analysis. The available approaches can be summarized as follows:
Method one: direct clustering. The clusterdata function clusters the sample data in a single call. Its drawback is that the user has few options to choose from; in the basic call T = clusterdata(X, cutoff) the distance metric is fixed and cannot be changed. Users of this method do not need to understand the principles and process of clustering, but the quality of the clustering is limited.
Method two: hierarchical clustering. This method is more flexible but requires a detailed understanding of how clustering works. It proceeds as follows: (1) find the pairwise similarity and dissimilarity between the objects (rows) in the dataset, using pdist to compute the distances between them; (2) use linkage to define the links between objects and build a hierarchical cluster tree; (3) evaluate the cluster tree with cophenet; (4) create the clusters with cluster.
Method three: partitional clustering, including k-means and k-medoids clustering. It also takes a series of steps and requires users to have a fairly clear understanding of the principles and process of clustering.
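The article gives no code for method three, so the following is a minimal sketch of both partitional variants, assuming the Statistics and Machine Learning Toolbox (kmedoids needs R2014b or later). The two-group data is synthetic and chosen only for illustration.

```matlab
% Synthetic data: two well-separated groups of 25 points in 2-D.
rng(3);                                % fix the seed for reproducibility
X = [randn(25,2); randn(25,2) + 6];

% k-means: each center is the mean of its cluster;
% 'Replicates' keeps the best result of 5 random starts.
[idxKm, Ckm] = kmeans(X, 2, 'Replicates', 5);

% k-medoids: each center (medoid) is an actual observation (row) of X.
[idxKmed, Ckmed] = kmedoids(X, 2);

disp(Ckm);    % 2-by-2 matrix of cluster means
disp(Ckmed);  % 2-by-2 matrix of medoids, each one a row of X
```

Because every medoid must be a data point, k-medoids is less sensitive to outliers than k-means, at the cost of more computation.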
The following sections introduce the relevant MATLAB functions and the corresponding clustering methods.
1. Introduction to the related functions in MATLAB
1.1 pdist function
Call format: Y = pdist(X, 'metric')
Description: Computes the distances between the objects in the data matrix X using the method specified by 'metric'. X is an m-by-n matrix holding a dataset of m objects, each described by n variables; Y is a row vector of the m(m-1)/2 pairwise distances. Values of 'metric' include: 'euclidean': Euclidean distance (default); 'seuclidean': standardized Euclidean distance; 'mahalanobis': Mahalanobis distance; 'cityblock': city block (Manhattan) distance; 'minkowski': Minkowski distance; 'cosine': one minus the cosine of the angle between objects; 'correlation': one minus the sample correlation between objects; 'jaccard': one minus the Jaccard coefficient; 'chebychev': Chebychev distance.
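A small sketch of pdist with a few of the metrics above, assuming the Statistics and Machine Learning Toolbox. The four points are made up so the distances can be checked by hand.

```matlab
% Four made-up objects (rows) with two variables each.
X = [0 0; 3 4; 6 8; 0 1];

% Distances come back in the order (1,2),(1,3),(1,4),(2,3),(2,4),(3,4).
Yeuc  = pdist(X);                  % Euclidean (default)
Ycity = pdist(X, 'cityblock');     % city block: sum of absolute differences
Ycheb = pdist(X, 'chebychev');     % Chebychev: largest absolute difference
Ymink = pdist(X, 'minkowski', 3);  % Minkowski with exponent p = 3

disp(Ycity);  % values: 7 14 1 7 6 13
disp(Ycheb);  % values: 4 8 1 4 3 7
```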
1.2 squareform function
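Call format: Z = squareform(Y, ...)
Description: Converts the distance vector returned by pdist into a symmetric m-by-m distance matrix with zeros on the diagonal, and converts such a matrix back into vector form.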
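A minimal sketch of the round trip between the two distance formats, assuming the Statistics and Machine Learning Toolbox; the three points are made up.

```matlab
X = [0 0; 3 4; 6 8];       % three made-up objects
Y = pdist(X);              % row vector [d12 d13 d23] = [5 10 5]
D = squareform(Y);         % 3-by-3 symmetric matrix, zeros on the diagonal
Yback = squareform(D);     % converts the matrix back to the vector form
disp(D);                   % [0 5 10; 5 0 5; 10 5 0]
```

The vector form is what linkage expects; the matrix form is easier to read and index.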
1.3 linkage function
Call format: Z = linkage(Y, 'method')
Input: Y is the row vector of m(m-1)/2 elements returned by pdist; the cluster tree is computed with the algorithm specified by 'method'. method can take the following values: 'single': shortest distance (default); 'complete': farthest distance; 'average': unweighted average distance; 'weighted': weighted average distance; 'centroid': centroid distance; 'median': weighted centroid distance; 'ward': inner squared distance (minimum variance algorithm).
Output: Z is an (m-1)-by-3 matrix describing the cluster tree. In each row, the first two columns are indices showing which two objects (or clusters) are merged into one cluster, and the third column is the distance between them. Each newly formed cluster is numbered m+1, m+2, ... in turn, after the m original objects.
To display Z more intuitively, use dendrogram(Z). It draws the cluster tree with U-shaped links: the objects are at the bottom, they are merged level by level, and at the top everything forms a single cluster. The height on the vertical axis is the distance at which clusters merge. By default at most 30 leaf nodes are shown; use dendrogram(Z, n) with 1 < n < m to change this, and dendrogram(Z, 0) to show all leaf nodes (that is, n = m).
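The following sketch, assuming the Statistics and Machine Learning Toolbox, builds the tree for five made-up points on a line, so the merge heights in the third column of Z can be checked by hand and compared between two linkage methods.

```matlab
% Five made-up objects on a line, so the distances are easy to check.
X = [0; 1; 5; 7; 20];
Y = pdist(X);

Zs = linkage(Y, 'single');    % shortest distance between clusters
Zc = linkage(Y, 'complete');  % farthest distance between clusters

% Each row merges two clusters; new clusters are numbered 6, 7, 8 (m = 5).
disp(Zs);   % third column (merge heights): 1, 2, 4, 13
disp(Zc);   % third column (merge heights): 1, 2, 7, 20
```

Both methods merge {1,2} and then {3,4} first; they differ in how far apart the two resulting groups are judged to be (4 versus 7).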
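1.4 dendrogram function
Call format: [H, T, ...] = dendrogram(Z, p, ...)
Description: Draws a dendrogram (tree diagram) showing only the top p nodes of the cluster tree. H holds the handles of the plotted lines, and T gives, for each original object, the leaf node it belongs to.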
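A minimal sketch of collapsing a larger tree to its top nodes, assuming the Statistics and Machine Learning Toolbox; the 40 observations are synthetic.

```matlab
rng(0);                                   % reproducible synthetic data
X = [randn(20,2); randn(20,2) + 5];       % 40 made-up observations
Z = linkage(pdist(X), 'average');

figure;
[H, T] = dendrogram(Z, 10);   % show only the top 10 leaf nodes
% T(i) is the leaf node (1..10) that contains observation i.

figure;
dendrogram(Z, 0);             % p = 0 shows all 40 leaves
```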
1.5 cophenet function
Call format: c = cophenet(Z, Y)
Description: Computes the cophenetic correlation coefficient from Y (produced by pdist) and Z (produced by linkage). It measures how well the binary cluster tree produced by a given algorithm matches the data: it is the correlation between the distances between elements in the tree (the height at which two objects are first joined) and the actual distances computed by pdist. The closer c is to 1, the better the tree reflects the data. You can also use inconsistent to quantify the differences between nodes of a hierarchical cluster tree.
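One practical use of cophenet is comparing linkage methods on the same data. The sketch below assumes the Statistics and Machine Learning Toolbox and uses synthetic data; it also checks that c is simply the Pearson correlation between the original and the cophenetic distances.

```matlab
rng(1);
X = [randn(15,3); randn(15,3) + 4];    % 30 made-up observations
Y = pdist(X);

linkMethods = {'single', 'complete', 'average', 'weighted'};
c = zeros(1, numel(linkMethods));
for k = 1:numel(linkMethods)
    Z = linkage(Y, linkMethods{k});
    c(k) = cophenet(Z, Y);             % cophenetic correlation coefficient
    fprintf('%-9s c = %.4f\n', linkMethods{k}, c(k));
end

% The second output holds the cophenetic distances (same format as Y);
% c is their Pearson correlation with the original distances.
[cAvg, D] = cophenet(linkage(Y, 'average'), Y);
R = corrcoef(Y, D);                    % R(1,2) equals cAvg
```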
1.6 cluster function
Call format: T = cluster(Z, ...)
Description: Creates clusters from the output Z of linkage, either by cutting the tree at a threshold ('cutoff') or by specifying the maximum number of clusters ('maxclust').
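The two ways of cutting the tree can be sketched on the same five made-up points used for linkage, assuming the Statistics and Machine Learning Toolbox. Cluster label numbers are arbitrary, so the comments describe the groups rather than the labels.

```matlab
X = [0; 1; 5; 7; 20];               % made-up 1-D data
Z = linkage(pdist(X), 'single');    % merge heights: 1, 2, 4, 13

% At most 3 clusters: {1,2}, {3,4} and {5}.
T1 = cluster(Z, 'maxclust', 3);

% Cut at height 5: links below 5 are kept, giving {1,2,3,4} and {5}.
T2 = cluster(Z, 'cutoff', 5, 'criterion', 'distance');
```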
1.7 clusterdata function
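Call format: T = clusterdata(X, ...)
Description: Creates clusters directly from the data. T = clusterdata(X, cutoff) is equivalent to the following commands: Y = pdist(X, 'euclidean'); Z = linkage(Y, 'single'); T = cluster(Z, 'cutoff', cutoff); When 0 < cutoff < 2, cutoff is a threshold on the inconsistency coefficient; when cutoff is an integer of 2 or more, it is the maximum number of clusters.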
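A sketch checking the stated equivalence on synthetic data, assuming the Statistics and Machine Learning Toolbox. The last call shows that name-value pairs ('Distance', 'Linkage', 'MaxClust') relax the fixed metric of the basic form.

```matlab
rng(2);
X = [randn(10,2); randn(10,2) + 6];    % 20 made-up observations
cutoff = 0.9;   % 0 < cutoff < 2: inconsistency-coefficient threshold

T1 = clusterdata(X, cutoff);           % one-step version

Y  = pdist(X, 'euclidean');            % step-by-step equivalent
Z  = linkage(Y, 'single');
T2 = cluster(Z, 'cutoff', cutoff);

% Name-value pairs change the metric and linkage in a single call.
T3 = clusterdata(X, 'Distance', 'cityblock', 'Linkage', 'average', 'MaxClust', 2);
fprintf('same result: %d\n', isequal(T1, T2));
```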
2. Designing a clustering program in MATLAB
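2.1 Method one: one-step clustering
x = [11978 12.5 93.5 31908; ..; 57500 67.6 238.0 15900];   % intermediate rows omitted in the original
T = clusterdata(x, 0.9)   % 0 < 0.9 < 2, so 0.9 is an inconsistency-coefficient cutoff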
2.2 Methods two and three: step-by-step clustering
Step 1: Use pdist to compute the distances (dissimilarities). There are many ways to compute distance; it is best to standardize the data with zscore first.
X2 = zscore(X);
Y2 = pdist(X2);   % compute distances
Step 2: Build the cluster tree.
Z2 = linkage(Y2);
Step 3: Evaluate the tree.
C2 = cophenet(Z2, Y2);   % 0.94698 for the author's data
Step 4: Create the clusters and draw the dendrogram.
T = cluster(Z2, 'maxclust', 6);
dendrogram(Z2);
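Here is a self-contained version of the four steps, assuming the Statistics and Machine Learning Toolbox. The data is synthetic (the author's matrix is not reproduced), and it uses 'average' linkage instead of the default 'single' and three clusters instead of six; both are choices for this illustration only.

```matlab
% Synthetic stand-in for the data matrix X: three groups of 20 rows.
rng(4);
X = [randn(20,4); randn(20,4) + 3; randn(20,4) - 3];

X2 = zscore(X);                  % Step 1: standardize each column,
Y2 = pdist(X2);                  %         then compute Euclidean distances
Z2 = linkage(Y2, 'average');     % Step 2: build the cluster tree
c2 = cophenet(Z2, Y2);           % Step 3: cophenetic correlation
T  = cluster(Z2, 'maxclust', 3); % Step 4: at most 3 clusters
figure; dendrogram(Z2);          % Step 4: dendrogram (up to 30 leaves)

fprintf('cophenetic correlation: %.4f\n', c2);
disp(accumarray(T, 1).');        % number of observations per cluster
```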
MATLAB provides functions such as cophenet and inconsistent for evaluating a cluster tree. Both compute coefficients: cophenet tests how well the binary cluster tree produced by a given algorithm agrees with the data (that is, how well the distances between elements in the tree match the actual distances computed by pdist), while inconsistent quantifies the differences between nodes of the hierarchical tree and can serve as the cutoff criterion for cluster.
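A sketch of using the inconsistency coefficient as the cutoff criterion, assuming the Statistics and Machine Learning Toolbox. The data, the depth of 3 and the cutoff of 1.2 are illustrative values, not recommendations from the article.

```matlab
rng(5);
X = [randn(15,2); randn(15,2) + 5];        % 30 made-up observations
Z = linkage(pdist(X), 'average');

% Columns of I: mean link height, standard deviation, number of links,
% inconsistency coefficient (computed over 3 levels below each link).
I = inconsistent(Z, 3);

% A node and all its subnodes form one cluster when their inconsistency
% coefficients are below the cutoff ('criterion' defaults to 'inconsistent').
T = cluster(Z, 'cutoff', 1.2, 'depth', 3);
fprintf('%d clusters\n', max(T));
```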

Copyright notice: This is the blogger's original article; it may not be reproduced without the blogger's permission.
