Help documentation translation: Statistics Toolbox, Exploratory Data Analysis, Cluster Analysis, Hierarchical Clustering (cluster, clusterdata) (1)

Source: Internet
Author: User
Tags crosstab

Hierarchical clustering

Produce nested sets of clusters

Functions

cluster Construct agglomerative clusters from a hierarchical cluster tree
clusterdata Construct agglomerative clusters from data
cophenet Cophenetic correlation coefficient
inconsistent Inconsistency coefficient
linkage Agglomerative hierarchical cluster tree
pdist Pairwise distance between pairs of objects
sequentialfs Sequential feature selection
squareform Format a distance matrix
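As a rough orientation, the functions listed above are typically chained together. The sketch below assumes a small made-up data matrix X (not from the original page) and default options throughout; it is an illustration of how the pieces fit, not a recommended analysis.

```matlab
% Hypothetical data: five 2-D observations (two tight pairs and one distant point).
X = [0 0; 0 1; 5 5; 5 6; 20 20];

Y = pdist(X);                  % 1-by-10 vector of pairwise Euclidean distances
S = squareform(Y);             % the same distances as a symmetric 5-by-5 matrix
Z = linkage(Y);                % 4-by-3 cluster tree (single linkage by default)
coph = cophenet(Z,Y);          % cophenetic correlation between the tree and Y
I = inconsistent(Z);           % 4-by-4 matrix; column 4 holds the inconsistency coefficient
T = cluster(Z,'maxclust',3);   % cluster index for each of the 5 observations
```

clusterdata wraps the pdist, linkage, and cluster steps in a single call.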

cluster

Construct agglomerative clusters from a hierarchical cluster tree

Syntax

T = cluster(Z,'cutoff',c)

T = cluster(Z,'cutoff',c,'depth',d)

T = cluster(Z,'cutoff',c,'criterion',criterion)

T = cluster(Z,'maxclust',n)

Description

T = cluster(Z,'cutoff',c) constructs clusters from the agglomerative hierarchical cluster tree Z, as generated by the linkage function. Z is a matrix of size (m-1)-by-3, where m is the number of observations in the original data. c is the threshold for cutting Z into clusters. Clusters are formed when a node and all of its subnodes have inconsistent values less than c; all leaves at or below that node are grouped into one cluster. T is a vector of length m containing the cluster assignment of each observation.

If c is a vector, T is a matrix of cluster assignments with one column per cutoff value.
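The effect of a scalar versus vector cutoff can be sketched on a tiny data set. The matrix X and the two cutoff values below are made up for illustration; the inconsistency coefficients themselves can be inspected in column 4 of the output of inconsistent.

```matlab
% Hypothetical data: five 2-D observations.
X = [0 0; 0 1; 5 5; 5 6; 20 20];
Z = linkage(pdist(X));        % single-linkage tree on Euclidean distances

I = inconsistent(Z);          % column 4: inconsistency coefficient of each link
coef = I(:,4);

% A vector of cutoffs returns one column of assignments per cutoff value.
T = cluster(Z,'cutoff',[1.0 1.2]);

nClusters = [numel(unique(T(:,1))), numel(unique(T(:,2)))];
```

Comparing coef against each cutoff shows which nodes qualify as clusters; a larger cutoff can only merge more observations together.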

T = cluster(Z,'cutoff',c,'depth',d) evaluates inconsistent values by looking to a depth d below each node. The default depth is 2.
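Because the inconsistency coefficient of a link depends on how many levels below it are included, changing the depth can change the resulting clusters. The following sketch uses made-up data and an arbitrary cutoff of 1.2 to compare the default depth with a depth of 3.

```matlab
% Hypothetical data: five 2-D observations.
X = [0 0; 0 1; 5 5; 5 6; 20 20];
Z = linkage(pdist(X));

I2 = inconsistent(Z);       % coefficients at the default depth of 2
I3 = inconsistent(Z,3);     % coefficients when looking 3 levels deep

T2 = cluster(Z,'cutoff',1.2);              % uses depth 2
T3 = cluster(Z,'cutoff',1.2,'depth',3);    % uses depth 3

n2 = numel(unique(T2));     % number of clusters at depth 2
n3 = numel(unique(T3));     % number of clusters at depth 3
```

Comparing I2(:,4) with I3(:,4) shows which links change their coefficient when deeper links are included.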

T = cluster(Z,'cutoff',c,'criterion',criterion) uses the specified criterion for forming clusters, where criterion is 'inconsistent' (default) or 'distance'. The 'distance' criterion uses the distance between the two subnodes merged at a node as the height of that node. All leaves at or below a node whose height is less than c are grouped into one cluster.
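With the 'distance' criterion, the cutoff is compared directly against the link heights stored in the third column of Z. A minimal sketch, assuming made-up data and two arbitrary cutoff heights:

```matlab
% Hypothetical data: five 2-D observations.
X = [0 0; 0 1; 5 5; 5 6; 20 20];
Z = linkage(pdist(X));

heights = Z(:,3);   % merge heights: two links at 1, then about 6.4 and 20.5

% Cut below the third merge: the two tight pairs and the distant point stay apart.
T5 = cluster(Z,'cutoff',5,'criterion','distance');

% Cut above the third merge: the two pairs join, the distant point stays alone.
T10 = cluster(Z,'cutoff',10,'criterion','distance');

n5  = numel(unique(T5));    % 3 clusters
n10 = numel(unique(T10));   % 2 clusters
```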

T = cluster(Z,'maxclust',n) constructs a maximum of n clusters using the 'distance' criterion. cluster finds the smallest height at which a horizontal cut through the tree leaves n or fewer clusters.

If n is a vector, T is a matrix of cluster assignments with one column per maximum value.

Example

Compare clusters from the Fisher iris data with the species labels:

load fisheriris

d = pdist(meas);

Z = linkage(d);

c = cluster(Z,'maxclust',3:5);

crosstab(c(:,1),species)

ans =

0 0 2
0 50 48
50 0 0

crosstab(c(:,2),species)

ans =

0 0 1
0 50 47
0 0 2
50 0 0

crosstab(c(:,3),species)

ans =

0 4 0
0 46 47
0 0 1
0 0 2
50 0 0

clusterdata

Agglomerative clusters from data

Syntax

T = clusterdata(X,cutoff)

T = clusterdata(X,Name,Value)

Description

T = clusterdata(X,cutoff) constructs agglomerative clusters from the data X, with cutoff interpreted as described under Input parameters.

T = clusterdata(X,Name,Value) clusters with additional options specified by one or more Name,Value pair arguments.

Input parameters

X A matrix with two or more rows. Each row represents an observation, and each column represents a category or dimension.
cutoff When 0 < cutoff < 2, clusterdata forms clusters when inconsistent values are greater than cutoff. When cutoff is an integer greater than or equal to 2, clusterdata interprets cutoff as the maximum number of clusters to keep in the hierarchical tree generated by linkage.
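The two interpretations of cutoff can be checked against the step-by-step pipeline. The sketch below assumes made-up data and assumes the clusterdata defaults of Euclidean distance and single linkage; the values 3 and 1.1 are arbitrary.

```matlab
% Hypothetical data: five 2-D observations.
X = [0 0; 0 1; 5 5; 5 6; 20 20];

% Integer cutoff >= 2: interpreted as the maximum number of clusters.
T1 = clusterdata(X,3);

% The same steps written out by hand (assumed defaults).
Y = pdist(X,'euclidean');
Z = linkage(Y,'single');
T2 = cluster(Z,'maxclust',3);

% Cutoff between 0 and 2: interpreted as an inconsistency threshold.
T3 = clusterdata(X,1.1);
T4 = cluster(Z,'cutoff',1.1);

sameMax = isequal(T1,T2);
sameCut = isequal(T3,T4);
```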

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside single quotes (' '). You can specify several name-value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Input parameters

'criterion' Either 'inconsistent' or 'distance'.
'cutoff' Cutoff for the inconsistent or distance measure, a positive scalar. When 0 < cutoff < 2, clusterdata forms clusters when inconsistent values are greater than cutoff. When cutoff is an integer greater than or equal to 2, clusterdata interprets cutoff as the maximum number of clusters to keep in the hierarchical tree generated by linkage.
'depth' Depth for computing inconsistent values, a positive integer.
'distance'

Any of the distance metric names allowed by pdist (follow the 'minkowski' option with the value of the exponent p):

Metric Description
'euclidean' Euclidean distance (default).
'seuclidean' Standardized Euclidean distance. Each coordinate difference between rows in X is scaled by dividing by the corresponding element of the standard deviation S = nanstd(X). To specify another value for S, use D = pdist(X,'seuclidean',S).
'cityblock' City block metric.
'minkowski' Minkowski distance. The default exponent is 2. To specify a different exponent, use D = pdist(X,'minkowski',P), where P is a positive scalar value of the exponent.
'chebychev' Chebychev distance (maximum coordinate difference).
'mahalanobis' Mahalanobis distance, using the sample covariance of X as computed by nancov. To compute the distance with a different covariance, use D = pdist(X,'mahalanobis',C), where the matrix C is symmetric and positive definite.
'cosine' One minus the cosine of the included angle between points (treated as vectors).
'correlation' One minus the sample correlation between points (treated as sequences of values).
'spearman' One minus the sample Spearman's rank correlation between observations (treated as sequences of values).
'hamming' Hamming distance, the proportion of coordinates that differ.
custom distance function

A distance function specified using @:

D = pdist(X,@distfun)

A distance function must be of the form:

d2 = distfun(XI,XJ)

It takes as arguments a 1-by-n vector XI, corresponding to a single row of X, and an m2-by-n matrix XJ, corresponding to multiple rows of X. distfun must accept a matrix XJ with an arbitrary number of rows. distfun must return an m2-by-1 vector of distances d2, whose kth element is the distance between XI and XJ(k,:).
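To make the required form concrete, here is a minimal sketch of a custom distance function. The anonymous function distfun and the data X are made up for illustration; the function simply reimplements Euclidean distance so that its output can be compared with the built-in metric.

```matlab
% Hypothetical data: five 2-D observations.
X = [0 0; 0 1; 5 5; 5 6; 20 20];

% Hypothetical distance function in the form d2 = distfun(XI,XJ):
% XI is 1-by-n, XJ is m2-by-n, and the result is an m2-by-1 column vector.
distfun = @(XI,XJ) sqrt(sum(bsxfun(@minus,XJ,XI).^2,2));

D1 = pdist(X,distfun);         % distances from the custom function handle
D2 = pdist(X,'euclidean');     % built-in metric, for comparison

maxDiff = max(abs(D1 - D2));   % difference between the two results
Z = linkage(D1);               % the custom distances feed into linkage as usual
```

Any function with the same input and output shapes, such as a weighted distance, could be substituted for distfun.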

'linkage'

Any of the linkage methods allowed by the linkage function:

'average'

'centroid'

'complete'

'median'

'single'

'ward'

'weighted'

'maxclust' Maximum number of clusters to form, a positive integer.
'savememory'

A string, either 'on' or 'off'. When applicable, the 'on' setting causes clusterdata to construct clusters without computing the distance matrix. savememory is applicable when:

linkage is 'centroid', 'median', or 'ward'

distance is 'euclidean' (default)

When savememory is 'on', linkage run time is proportional to the number of dimensions (the number of columns of X). When savememory is 'off', the linkage memory requirement is proportional to N^2, where N is the number of observations. The best (least-time) setting therefore depends on the problem dimensions, the number of observations, and the available memory. The default savememory setting is a rough approximation of the optimal setting.

Default: 'on' when X has 20 columns or fewer, or the computer does not have enough memory to store the distance matrix; otherwise 'off'
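The name-value pairs above can be combined in one call. The sketch below uses made-up data and Ward linkage with the default Euclidean distance, one of the combinations for which savememory applies, and checks that both savememory settings lead to the same grouping.

```matlab
% Hypothetical data: five 2-D observations.
X = [0 0; 0 1; 5 5; 5 6; 20 20];

% Ward linkage with at most 3 clusters, with and without the memory-saving path.
Ton  = clusterdata(X,'linkage','ward','maxclust',3,'savememory','on');
Toff = clusterdata(X,'linkage','ward','maxclust',3,'savememory','off');

% Compare the partitions rather than the raw labels: identical groupings
% give a cross-tabulation with exactly one nonzero entry per cluster.
C = crosstab(Ton,Toff);
samePartition = (nnz(C) == 3);
```

On data this small the setting makes no practical difference; it matters when the number of observations is large enough that the full distance matrix is costly to store.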
