Use MATLAB for Cluster Analysis

Source: Internet
Author: User

Reprint 1:

MATLAB provides two methods for cluster analysis:

1. Use clusterdata. This function clusters the sample data in a single call, which is simple and convenient. Its drawback is that it leaves the user few choices: in the basic form clusterdata(X, cutoff), the distance is always Euclidean and the linkage is always single linkage, so you cannot adjust the parameters or the distance calculation to your own needs.

2. Step-by-step clustering: (1) use pdist to calculate the distances between variables, which measure the similarity and dissimilarity between each pair of objects in the data set; (2) use the linkage function to define the links between variables; (3) use the cophenet function to evaluate the clustering information; (4) use the cluster function to create the clusters.

The two methods are described below.

1. One-step clustering

The clusterdata function can be regarded as a combination of pdist, linkage, and cluster, and it is generally simpler to use.

clusterdata function:

Call format: T = clusterdata(X, cutoff)

Equivalent to: Y = pdist(X, 'euclidean'); Z = linkage(Y, 'single'); T = cluster(Z, cutoff)
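As a minimal sketch of that equivalence, the snippet below clusters a small made-up data set both ways and compares the groupings. The data values are hypothetical, and the code assumes a recent release of the Statistics and Machine Learning Toolbox, where the cluster count is passed as the 'maxclust' name-value pair rather than as a bare cutoff.

```matlab
% Hypothetical toy data: 5 samples (rows), 2 fields (columns).
X = [1 2; 2 2; 8 8; 9 8; 9 9];

% One-step clustering into at most 2 clusters.
% clusterdata defaults to Euclidean distance and single linkage.
T1 = clusterdata(X, 'maxclust', 2);

% The same clustering built step by step.
Y  = pdist(X, 'euclidean');      % pairwise distances
Z  = linkage(Y, 'single');       % hierarchical cluster tree
T2 = cluster(Z, 'maxclust', 2);  % cut the tree into at most 2 clusters

% Both calls should group the samples identically.
sameGrouping = isequal(T1 == T1(1), T2 == T2(1));
disp(sameGrouping)
```

Comparing T1 == T1(1) with T2 == T2(1) checks the grouping itself, so the comparison does not depend on which numeric label each cluster happens to receive.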

2. Step-by-step clustering

(1) Obtain the similarity between variables.

Use the pdist function to calculate the similarity matrix; several distance measures are available. If the data have not been made dimensionless beforehand (that is, the fields are measured on different scales), standardize them first with the zscore function.
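To illustrate why standardization matters, here is a small sketch with invented numbers in which the first field is a thousand times larger than the second. Without zscore the first field would dominate every distance.

```matlab
% Hypothetical data: column 1 is on a much larger scale than column 2.
X = [1000 1.0; 2000 3.0; 3000 2.0];

% zscore centers each column to mean 0 and scales it to standard deviation 1.
Xs = zscore(X);

colMeans = mean(Xs);   % approximately [0 0]
colStds  = std(Xs);    % [1 1]

% Distances before and after standardization.
Yraw = pdist(X);       % dominated by column 1
Ystd = pdist(Xs);      % both columns contribute comparably
```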

pdist function:

Call format: Y = pdist(X, 'metric')

Note: X is an m×n matrix made up of m samples, each with n fields.

Values of metric: 'euclidean': Euclidean distance (default); 'seuclidean': standardized Euclidean distance; 'mahalanobis': Mahalanobis distance; ...

pdist returns a row vector with m(m-1)/2 elements, one for the distance between each pair of the m samples. This saves storage space but is not easy to read. For a simple and intuitive view, use the squareform function to convert the vector into a square matrix, in which the element in row i and column j is the distance between sample i and sample j, and the diagonal elements are 0.
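The relationship between the two layouts can be seen on three made-up points chosen so that the distances are whole numbers:

```matlab
% Hypothetical data: 3 samples on a line through the origin.
X = [0 0; 3 4; 6 8];

% Row vector of m*(m-1)/2 = 3 distances, in the order (1,2), (1,3), (2,3).
Y = pdist(X);          % [5 10 5]

% Square, symmetric form with zeros on the diagonal.
D = squareform(Y);     % D(1,3) is the distance between samples 1 and 3

% squareform also converts back from the square matrix to the vector.
Yback = squareform(D);
```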

(2) Use the linkage function to generate a cluster tree

linkage function:

Call format: Z = linkage(Y, 'method')

Note: Y is the row vector of m(m-1)/2 elements returned by the pdist function.

Optional values of method:

'single': shortest distance method (default); 'complete': longest distance method;

'average': unweighted average distance method; 'weighted': weighted average distance method;

'centroid': centroid distance method; 'median': weighted centroid distance method;

'ward': inner squared distance method (minimum variance algorithm)

The returned Z is an (m-1)×3 matrix. The first two columns are indices that identify which two samples (or clusters) are merged at each step, and the third column is the distance between them. Beyond the original m samples, each newly formed cluster is numbered in turn m+1, m+2, ....
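The numbering of new clusters is easiest to follow on a tiny example. The sketch below uses four invented points (m = 4), so the first merged cluster is numbered 5 and the second 6.

```matlab
% Hypothetical data: two tight pairs of points, far apart from each other.
X = [0 0; 0 1; 5 5; 5 7];

Y = pdist(X);
Z = linkage(Y, 'single');

% Z has m-1 = 3 rows, one per merge:
%   row 1: samples 1 and 2 merge at distance 1          -> new cluster 5
%   row 2: samples 3 and 4 merge at distance 2          -> new cluster 6
%   row 3: clusters 5 and 6 merge at distance sqrt(41),
%          the smallest gap between the two pairs (samples 2 and 3)
disp(Z)
```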

The Z matrix can be displayed more intuitively as a cluster tree by calling dendrogram(Z). The resulting tree is made of inverted-U shaped links: the bottom shows the samples, which are merged level by level until they form a single class at the top. The height on the vertical axis corresponds to the distance column of Z.

You can also set the number of leaf nodes shown at the bottom of the tree. The default is 30, and it can be changed through the parameter n in dendrogram(Z, n), with 1 < n < m. dendrogram(Z, 0) corresponds to n = m and displays all leaf nodes.
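A short sketch of the two call forms follows, again on invented data. With only five samples the default plot already shows every leaf, so the second call collapses the tree to three leaf nodes to show the effect of n.

```matlab
% Hypothetical data: 5 samples.
X = [0 0; 0 1; 5 5; 5 7; 9 9];
Z = linkage(pdist(X), 'single');

% Full tree (at most 30 leaf nodes are drawn by default).
figure;
H = dendrogram(Z);

% Collapsed tree with n = 3 leaf nodes; T maps each sample to its leaf node.
figure;
[H3, T3] = dendrogram(Z, 3);
```

The second output T3 is useful when the tree is collapsed, because a leaf node may then stand for several original samples.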

(3) Use the cophenet function to evaluate the clustering information

cophenet function:

Call format: c = cophenet(Z, Y)

Description: computes the cophenetic correlation coefficient from the Y produced by the pdist function and the Z produced by the linkage function.

cophenet checks how consistent the binary cluster tree produced by a given algorithm is with the actual data. It measures the correlation between the distances implied by the tree and the actual distances computed by pdist; the closer the coefficient is to 1, the better the tree reflects the data. You can also use the inconsistent function to quantify the differences between nodes at a given level of the cluster tree.
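Both checks can be sketched on a small made-up data set. The choice of 'average' linkage here is only for illustration.

```matlab
% Hypothetical data: 5 samples.
X = [0 0; 0 1; 5 5; 5 7; 9 9];

Y = pdist(X);
Z = linkage(Y, 'average');

% Cophenetic correlation coefficient between the tree and the distances.
c = cophenet(Z, Y);

% Inconsistency table, one row per link in Z. Columns: mean link height,
% standard deviation of link heights, number of links included,
% and the inconsistency coefficient.
I = inconsistent(Z);

fprintf('cophenetic correlation: %.4f\n', c);
```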

(4) Finally, use the cluster function to return the column of cluster assignments, one label per sample.
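The tree can be cut either by the number of clusters or by a distance threshold. The sketch below shows both on invented data, assuming the name-value syntax of recent toolbox releases.

```matlab
% Hypothetical data: two tight pairs and one isolated sample.
X = [0 0; 0 1; 5 5; 5 7; 9 9];
Z = linkage(pdist(X), 'single');

% Cut by cluster count: at most 3 clusters.
Tn = cluster(Z, 'maxclust', 3);

% Cut by height: merges that happen at a distance of 3 or more are undone.
Td = cluster(Z, 'cutoff', 3, 'criterion', 'distance');

% For this data both cuts give {1,2}, {3,4}, {5}.
disp([Tn Td])
```

The numeric labels themselves are arbitrary; only which samples share a label is meaningful.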

 

 

Reprint 2:

MATLAB provides two methods for cluster analysis.

One is to cluster the sample data in a single step with the clusterdata function. Its drawback is that the user has few choices, and in the basic call the distance calculation cannot be changed.

The other is step-by-step clustering: (1) to find the similarity and dissimilarity between each pair of variables in the data set, use the pdist function to calculate the distances between variables; (2) use the linkage function to define the links between variables; (3) use the cophenet function to evaluate the clustering information; (4) use the cluster function to create the clusters.

1. Related MATLAB functions

1.1 pdist function

Call format: Y = pdist(X, 'metric')

Note: calculates the distance between objects in the data matrix X using the method specified by 'metric'.

X: an m×n matrix, that is, a data set made up of m objects, each of size n.

'metric' takes the following values:

'euclidean': Euclidean distance (default); 'seuclidean': standardized Euclidean distance;

'mahalanobis': Mahalanobis distance; 'cityblock': city block distance;

'minkowski': Minkowski distance; 'cosine': one minus the cosine of the angle between the objects;

'correlation': one minus the sample correlation between the objects; 'hamming': Hamming distance (the fraction of coordinates that differ);

'jaccard': one minus the Jaccard coefficient; 'chebychev': Chebychev distance.
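To make the differences between a few of these metrics concrete, the sketch below measures the same pair of invented points four ways. The exponent 3 passed to 'minkowski' is an arbitrary choice for illustration.

```matlab
% Hypothetical data: two points that differ by 3 in one field and 4 in the other.
X = [0 0; 3 4];

dEuc  = pdist(X, 'euclidean');     % sqrt(3^2 + 4^2) = 5
dCity = pdist(X, 'cityblock');     % 3 + 4 = 7
dCheb = pdist(X, 'chebychev');     % max(3, 4) = 4
dMink = pdist(X, 'minkowski', 3);  % (3^3 + 4^3)^(1/3), about 4.498

fprintf('%.3f  %.3f  %.3f  %.3f\n', dEuc, dCity, dCheb, dMink);
```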

1.2 squareform function

Call format: Z = squareform(Y, ...)

Note: converts the distance matrix from the upper-triangular (vector) form to the square form, or from the square form back to the upper-triangular form.

1.3 linkage function

Call format: Z = linkage(Y, 'method')

Note: computes the hierarchical cluster tree using the algorithm specified by the 'method' parameter.

Y: the distance vector returned by the pdist function;

method: optional values:

'single': shortest distance method (default); 'complete': longest distance method;

'average': unweighted average distance method; 'weighted': weighted average distance method;

'centroid': centroid distance method; 'median': weighted centroid distance method;

'ward': inner squared distance method (minimum variance algorithm)

Return value: Z is an (m-1)×3 matrix.
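One way to choose among the methods is to build a tree with each and compare the cophenetic correlation coefficients. The following is only a sketch on invented data; the variable name methodNames and the selection of four methods are illustrative assumptions.

```matlab
% Hypothetical data: 5 samples.
X = [0 0; 0 1; 5 5; 5 7; 9 9];
Y = pdist(X);

methodNames = {'single', 'complete', 'average', 'weighted'};
c = zeros(1, numel(methodNames));

for k = 1:numel(methodNames)
    Z = linkage(Y, methodNames{k});   % (m-1)-by-3 tree for this method
    c(k) = cophenet(Z, Y);            % how well the tree preserves Y
    fprintf('%-8s  c = %.4f\n', methodNames{k}, c(k));
end
```

A higher coefficient suggests that the tree distorts the original distances less, although it is not the only criterion for choosing a method.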

1.4 dendrogram function

Call format: [H, T, ...] = dendrogram(Z, P, ...)

Note: generates a dendrogram (tree diagram) that shows only the top P nodes.

1.5 cophenet function

Call format: c = cophenet(Z, Y)

Description: computes the cophenetic correlation coefficient from the Y produced by the pdist function and the Z produced by the linkage function.

1.6 cluster function

Call format: T = cluster(Z, ...)

Note: creates the clusters from the output Z of the linkage function.

1.7 clusterdata function

Call format: T = clusterdata(X, ...)

Description: creates clusters directly from the data.

T = clusterdata(X, cutoff) is equivalent to the following group of commands:

Y = pdist(X, 'euclidean');

Z = linkage(Y, 'single');

T = cluster(Z, cutoff);

2. MATLAB program

2.1 One-step clustering

X = [11978 12.5 93.5 31908 ;...; 57500 67.6 238.0 15900];

T = clusterdata(X, 0.9)

2.2 Step-by-step clustering

Step 1: Find the similarity between variables

Use the pdist function to calculate the similarity matrix. Several distance measures are available, and it is best to standardize the data with the zscore function before the calculation.

X2 = zscore(X);   % standardize the data

Y2 = pdist(X2);   % calculate the distances

Step 2: Define the links between variables

Z2 = linkage(Y2);

Step 3: Evaluate the clustering information

C2 = cophenet(Z2, Y2);   % 0.94698

Step 4: Create the clusters and plot the dendrogram

T = cluster(Z2, 'maxclust', 6);   % at most 6 clusters

H = dendrogram(Z2);

Classification result: {Canada}, {China, USA, Australia}, {Japan, Indonesia}, {Brazil}, {Former Soviet Union}; the remaining countries form one class.

From: http://blog.sina.com.cn/s/blog_5a13cf680100aj18.html
