Help Documentation Translation: Statistics Toolbox - Exploratory Data Analysis - Cluster Analysis - Hierarchical Clustering (cluster, clusterdata) (1)

Last Update:2015-09-23 Source: Internet

Author: User

Tags crosstab


Hierarchical clustering

Produce nested sets of clusters

Functions

cluster	Construct agglomerative clusters from a hierarchical cluster tree
clusterdata	Construct agglomerative clusters from sample data
cophenet	Cophenetic correlation coefficient
inconsistent	Inconsistency coefficient
linkage	Agglomerative hierarchical cluster tree
pdist	Pairwise distance between pairs of objects
sequentialfs	Sequential feature selection
squareform	Format a distance matrix
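
As a rough orientation, the functions in the table above are typically chained together as follows. This is a minimal sketch, not a prescribed workflow: the data matrix X is made up for illustration, and 'average' linkage is an arbitrary choice.

```matlab
% Small synthetic data set: two well-separated groups in 2-D
% (values chosen only for illustration).
X = [0 0; 0 1; 1 0; 10 10; 10 11; 11 10];

Y = pdist(X);                   % pairwise Euclidean distances, 1-by-15 vector
D = squareform(Y);              % the same distances as a 6-by-6 symmetric matrix
Z = linkage(Y, 'average');      % agglomerative tree, (m-1)-by-3 matrix

coph = cophenet(Z, Y);          % how faithfully the tree preserves the distances
I = inconsistent(Z);            % inconsistency coefficient for each link
T = cluster(Z, 'maxclust', 2);  % assign each row of X to one of at most 2 clusters
```

Rows 1 to 3 of X end up in one cluster and rows 4 to 6 in the other; which of the two gets label 1 is not something to rely on.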

cluster

Construct agglomerative clusters from a hierarchical cluster tree

Syntax

T = cluster(Z,'cutoff',c)

T = cluster(Z,'cutoff',c,'depth',d)

T = cluster(Z,'cutoff',c,'criterion',criterion)

T = cluster(Z,'maxclust',n)

Description

T = cluster(Z,'cutoff',c) constructs clusters from the agglomerative hierarchical cluster tree Z, as generated by the linkage function. Z is a matrix with m-1 rows and 3 columns, where m is the number of observations in the original data. c is the threshold for cutting Z into clusters. A cluster is formed when a node and all of its subnodes have inconsistent values less than c; all leaves at or below that node are grouped into one cluster. T is a vector of length m containing the cluster assignment of each observation.

If c is a vector, T is a matrix of cluster assignments with one column per cutoff value.
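
A minimal sketch of the vector form, assuming the fisheriris sample data that ships with the toolbox; the threshold values and the 'average' linkage method are illustrative choices only.

```matlab
load fisheriris                        % provides meas (150-by-4) and species
Z = linkage(pdist(meas), 'average');   % tree with 149 rows and 3 columns

c = [0.8 1.0 1.15];                    % illustrative inconsistency thresholds
T = cluster(Z, 'cutoff', c);           % one column of assignments per threshold

size(T)                                % 150 rows, 3 columns
numClusters = max(T)                   % number of clusters found for each threshold
```

Because cluster labels run from 1 upward, the column maximum of T is the number of clusters produced by the corresponding threshold.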

T = cluster(Z,'cutoff',c,'depth',d) evaluates inconsistent values by looking to a depth d below each node. The default depth is 2.
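
The effect of the depth can be inspected directly with inconsistent, which accepts the same depth as an optional second argument. The sketch below assumes the fisheriris data; the threshold 1.0 and the depth 4 are arbitrary.

```matlab
load fisheriris
Z = linkage(pdist(meas), 'average');

% Columns of the result: mean link height, standard deviation of heights,
% number of links included, inconsistency coefficient.
I2 = inconsistent(Z);       % default depth of 2
I4 = inconsistent(Z, 4);    % compare each link with links up to 4 levels below

T2 = cluster(Z, 'cutoff', 1.0);               % uses depth 2
T4 = cluster(Z, 'cutoff', 1.0, 'depth', 4);   % same threshold, deeper comparison
```

A larger depth includes more links in each comparison, so the same threshold may yield a different partition.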

T = cluster(Z,'cutoff',c,'criterion',criterion) uses the specified criterion for forming clusters, where criterion is 'inconsistent' (default) or 'distance'. The 'distance' criterion uses the distance between the two subnodes merged at a node as the height of that node. All leaves at or below a node whose height is less than c are grouped into one cluster.
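
A small sketch of the 'distance' criterion on made-up data: with single linkage, the third column of Z holds the merge heights, so a cutoff between the within-group and between-group heights separates the two groups.

```matlab
% Two tight groups of three points each (illustrative values).
X = [0 0; 0 1; 1 0; 10 10; 10 11; 11 10];
Z = linkage(pdist(X), 'single');

heights = Z(:,3);    % four merges at height 1, one final merge near 13.45

% Cut the tree at height 5: every node below the final merge qualifies.
T = cluster(Z, 'cutoff', 5, 'criterion', 'distance');
numel(unique(T))     % two clusters
```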

T = cluster(Z,'maxclust',n) constructs a maximum of n clusters using the 'distance' criterion. cluster finds the smallest height at which a horizontal cut through the tree leaves n or fewer clusters.

If n is a vector, T is a matrix of cluster assignments with one column per maximum value.

Example

Compare clusters of the Fisher iris data (fisheriris) with the species labels:

load fisheriris
d = pdist(meas);
Z = linkage(d);
c = cluster(Z,'maxclust',3:5);

crosstab(c(:,1),species)

ans =

     0     0     2
     0    50    48
    50     0     0

crosstab(c(:,2),species)

ans =

     0     0     1
     0    50    47
     0     0     2
    50     0     0

crosstab(c(:,3),species)

ans =

     0     4     0
     0    46    47
     0     0     1
     0     0     2
    50     0     0

clusterdata

Agglomerative clusters from data

Syntax

T = clusterdata(X,cutoff)

T = clusterdata(X,Name,Value)

Description

T = clusterdata(X,cutoff) constructs agglomerative clusters from the data X, with cutoff interpreted as described under Input parameters.

T = clusterdata(X,Name,Value) clusters with additional options specified by one or more Name,Value pair arguments.

Input parameters

X	A matrix with two or more rows. Each row represents an observation, and each column represents a category or dimension.
cutoff	When 0 < cutoff < 2, clusterdata forms clusters when inconsistent values are greater than cutoff. When cutoff is an integer greater than or equal to 2, clusterdata interprets cutoff as the maximum number of clusters to keep in the hierarchical tree generated by linkage.
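
The two readings of cutoff can be sketched as below. The step-by-step version assumes that clusterdata defaults to Euclidean distance and single linkage; the Euclidean default is listed later in this page, but the single-linkage default is my understanding of the MathWorks documentation rather than something stated here. The data are made up.

```matlab
X = [0 0; 0 1; 1 0; 10 10; 10 11; 11 10];   % illustrative data

% Integer cutoff >= 2: read as the maximum number of clusters.
T1 = clusterdata(X, 2);

% Assumed equivalent step-by-step form (default distance and linkage).
Y  = pdist(X, 'euclidean');
Z  = linkage(Y, 'single');
T2 = cluster(Z, 'maxclust', 2);

% Compare the partitions rather than the labels themselves.
samePartition = isequal(bsxfun(@eq, T1, T1'), bsxfun(@eq, T2, T2'))

% Cutoff between 0 and 2: read as an inconsistency threshold instead.
T3 = clusterdata(X, 1.0);
```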

Name-value pair arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name, and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name-value pair arguments in any order, as Name1,Value1,...,NameN,ValueN.
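
A short sketch of the name-value form, assuming the fisheriris data; 'ward' linkage and three clusters are arbitrary choices. The second call swaps the order of the pairs to show that order does not matter.

```matlab
load fisheriris

T  = clusterdata(meas, 'linkage', 'ward', 'maxclust', 3);
T2 = clusterdata(meas, 'maxclust', 3, 'linkage', 'ward');   % same options, reordered

isequal(T, T2)          % the two calls describe the same computation
crosstab(T, species)    % compare the clusters with the known species
```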

Input parameters

'criterion'

Either 'inconsistent' or 'distance'.

'cutoff'

Cutoff for the inconsistent or distance measure, a positive scalar. When 0 < cutoff < 2, clusterdata forms clusters when inconsistent values are greater than cutoff. When cutoff is an integer greater than or equal to 2, clusterdata interprets cutoff as the maximum number of clusters to keep in the hierarchical tree generated by linkage.

'depth'

Depth for computing inconsistent values, a positive integer.

'distance'

Any distance metric name accepted by pdist (follow the 'minkowski' option with the value of the exponent p):

Metric	Description
'euclidean'	Euclidean distance (default).
'seuclidean'	Standardized Euclidean distance. Each coordinate difference between rows of X is scaled by dividing by the corresponding element of the standard deviation S = nanstd(X). To specify another value for S, use D = pdist(X,'seuclidean',S).
'cityblock'	City block metric.
'minkowski'	Minkowski distance. The default exponent is 2. To specify a different exponent, use D = pdist(X,'minkowski',P), where P is a positive scalar exponent.
'chebychev'	Chebychev distance (maximum coordinate difference).
'mahalanobis'	Mahalanobis distance, using the sample covariance of X as computed by nancov. To use a different covariance, use D = pdist(X,'mahalanobis',C), where C is a symmetric positive definite matrix.
'cosine'	One minus the cosine of the angle between two points (treated as vectors).
'correlation'	One minus the sample correlation between two points (treated as sequences of values).
'spearman'	One minus the sample Spearman rank correlation between two observations (treated as sequences of values).
'hamming'	Hamming distance, the proportion of coordinates that differ.
Custom distance function	A distance function specified using @: D = pdist(X,@distfun). The function must have the form d2 = distfun(XI,XJ), taking as arguments a 1-by-n vector XI, corresponding to a single row of X, and an m2-by-n matrix XJ, corresponding to multiple rows of X. distfun must accept a matrix XJ with an arbitrary number of rows, and must return an m2-by-1 vector of distances d2 whose kth element is the distance between XI and XJ(k,:).
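
The custom distance contract described in the last table row can be sketched with a hypothetical weighted Euclidean distance. The weights w and the handle name distfun are illustrative assumptions; only the calling convention (1-by-n XI, m2-by-n XJ, m2-by-1 result) comes from the text above.

```matlab
load fisheriris

w = [1 1 0.5 0.5];   % hypothetical per-column weights, for illustration only

% XI is 1-by-n, XJ is m2-by-n; the result is an m2-by-1 column of distances.
distfun = @(XI, XJ) sqrt(bsxfun(@minus, XJ, XI).^2 * w');

D = pdist(meas, distfun);        % pairwise distances under the custom metric
Z = linkage(D, 'average');
T = cluster(Z, 'maxclust', 3);
```

The sketch goes through pdist, linkage, and cluster explicitly so that the only place the handle is used is the pdist call documented above.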

'linkage'

Any linkage method accepted by the linkage function:

'average'

'centroid'

'complete'

'median'

'single'

'ward'

'weighted'
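
One way to compare linkage methods on a given data set is the cophenetic correlation coefficient. The loop below is a sketch that assumes the fisheriris data; it is limited to four of the methods listed above, and the variable name linkMethods is my own.

```matlab
load fisheriris
Y = pdist(meas);    % Euclidean distances between all pairs of observations

linkMethods = {'single', 'complete', 'average', 'weighted'};
for k = 1:numel(linkMethods)
    Z = linkage(Y, linkMethods{k});
    coph = cophenet(Z, Y);    % values closer to 1 mean the tree preserves Y better
    fprintf('%-9s cophenetic correlation = %.3f\n', linkMethods{k}, coph);
end
```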

'maxclust'

Maximum number of clusters to form, a positive integer.

'savememory'

A string, either 'on' or 'off'. When applicable, the 'on' setting causes clusterdata to construct clusters without computing the distance matrix. savememory is applicable when both of the following hold:

linkage is 'centroid', 'median', or 'ward'

distance is 'euclidean' (default)

When savememory is 'on', the linkage run time is proportional to the number of dimensions (the number of columns of X). When savememory is 'off', the linkage memory requirement is proportional to N^2, where N is the number of observations. The best (least time-consuming) savememory setting therefore depends on the dimension of the problem, the number of observations, and the available memory. The default savememory setting is a rough approximation of the optimal setting.

Default: 'on' when X has 20 columns or fewer, or when the computer does not have enough memory to store the distance matrix; otherwise 'off'.
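
A minimal sketch of setting savememory explicitly, using random data of an arbitrary size. 'ward' linkage with the default Euclidean distance satisfies the applicability conditions above; the number of clusters is an arbitrary choice.

```matlab
rng(0);                % fixed seed so the illustrative data are reproducible
X = randn(500, 3);     % 500 observations, 3 dimensions

% Same clustering request with and without the distance matrix.
Ton  = clusterdata(X, 'linkage', 'ward', 'savememory', 'on',  'maxclust', 4);
Toff = clusterdata(X, 'linkage', 'ward', 'savememory', 'off', 'maxclust', 4);
```

Wrapping each call in tic and toc is a simple way to see which setting is faster for a particular data size.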


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
