Memory-based data structures for cluster analysis:
1. Data matrix (two-mode matrix): n objects are described by p variables, giving an n x p table. Each row represents an object and each column represents a variable, so rows and columns represent different entities.
2. Dissimilarity matrix (one-mode matrix): rows and columns represent the same entities. It stores the proximity d(i,j) between every pair of the n objects, giving an n x n table.
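The relationship between the two structures can be sketched in a few lines of Python. This is only an illustration: the sample values are made up, the helper name dissimilarity_matrix is hypothetical, and Euclidean distance is assumed as the proximity measure (any other measure could be substituted).

```python
import math

# Hypothetical data matrix: 3 objects (rows) x 2 variables (columns).
data = [
    [1.0, 2.0],
    [4.0, 6.0],
    [1.0, 2.0],
]

def dissimilarity_matrix(rows):
    """Build the n x n matrix of pairwise distances from an n x p data matrix."""
    n = len(rows)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i):
            # Euclidean distance is an assumption made for this sketch.
            dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(rows[i], rows[j])))
            d[i][j] = d[j][i] = dist  # the matrix is symmetric
    return d

dm = dissimilarity_matrix(data)
for row in dm:
    print(row)
```

The diagonal is always 0 and the matrix is symmetric, so in practice only the lower triangle needs to be stored.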
Interval-scaled variables: for example, weight and height. These are continuous measurements on a roughly linear scale.
The unit of measurement directly affects the result of the cluster analysis, so the variables should be standardized: the original measurements are converted to unitless values (z-scores).
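A minimal sketch of the standardization step, assuming the population standard deviation as the measure of spread (some textbooks use the mean absolute deviation instead). The function name z_scores and the height values are made up for illustration.

```python
import math

def z_scores(values):
    """Convert raw measurements to unitless z-scores: (x - mean) / std."""
    mean = sum(values) / len(values)
    # Population standard deviation; assumes the variable is not constant.
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std for v in values]

# The same three heights expressed in two different units.
heights_cm = [160.0, 170.0, 180.0]
heights_m = [1.6, 1.7, 1.8]

z_cm = z_scores(heights_cm)
z_m = z_scores(heights_m)
print(z_cm)
print(z_m)
```

Up to floating-point rounding, both lists come out the same, which is the point: after standardization the choice of unit no longer influences the distances.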
Distance measures: Euclidean distance; Manhattan distance; Minkowski distance (which generalizes the other two).
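The three distances can be written with one function, since Minkowski distance with exponent 1 is Manhattan distance and with exponent 2 is Euclidean distance. The sketch below is illustrative only; the exponent is called q here (an assumed name, chosen to avoid confusion with p, the number of variables), and the sample points are made up.

```python
def minkowski(x, y, q):
    """Minkowski distance: (sum |x_k - y_k|^q)^(1/q), for q >= 1."""
    return sum(abs(a - b) ** q for a, b in zip(x, y)) ** (1.0 / q)

def manhattan(x, y):
    """Minkowski distance with q = 1."""
    return minkowski(x, y, 1)

def euclidean(x, y):
    """Minkowski distance with q = 2."""
    return minkowski(x, y, 2)

x = (1.0, 2.0)
y = (4.0, 6.0)
print(manhattan(x, y))   # |1-4| + |2-6|
print(euclidean(x, y))   # sqrt(3^2 + 4^2)
```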
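Symmetric binary variable: evaluate dissimilarity with the simple matching coefficient, d(i,j) = (b+c)/(a+b+c+d), where a counts the variables that are 1 for both objects, b those that are 1 for i and 0 for j, c those that are 0 for i and 1 for j, and d those that are 0 for both.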
Asymmetric binary variable: evaluate dissimilarity with the Jaccard coefficient, which ignores the cases where both values are 0 (d): d(i,j) = (b+c)/(a+b+c)
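Both binary coefficients can be computed from the same four counts. In the sketch below, a is the number of variables that are 1 for both objects, b the number that are 1 for i and 0 for j, c the number that are 0 for i and 1 for j, and d the number that are 0 for both. The function names and the two sample vectors are hypothetical.

```python
def contingency(x, y):
    """Count the four 0/1 combinations over two binary vectors."""
    a = sum(1 for u, v in zip(x, y) if u == 1 and v == 1)
    b = sum(1 for u, v in zip(x, y) if u == 1 and v == 0)
    c = sum(1 for u, v in zip(x, y) if u == 0 and v == 1)
    d = sum(1 for u, v in zip(x, y) if u == 0 and v == 0)
    return a, b, c, d

def simple_matching(x, y):
    """Symmetric case: both kinds of match (1-1 and 0-0) count equally."""
    a, b, c, d = contingency(x, y)
    return (b + c) / (a + b + c + d)

def jaccard_dissimilarity(x, y):
    """Asymmetric case: 0-0 matches (d) are ignored.
    Assumes at least one of the two vectors contains a 1."""
    a, b, c, _ = contingency(x, y)
    return (b + c) / (a + b + c)

obj_i = [1, 0, 1, 0, 0, 0]
obj_j = [1, 1, 0, 0, 0, 0]
print(contingency(obj_i, obj_j))
print(simple_matching(obj_i, obj_j))
print(jaccard_dissimilarity(obj_i, obj_j))
```

With many shared zeros, the simple matching value is pulled down while the Jaccard value is not, which is why the latter is preferred when a 0 carries little information.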
Nominal variable: simple matching, d(i,j) = (p-m)/p, where p is the total number of variables and m is the number of variables on which i and j match. Alternatively, create a new binary variable for each of the M states of the nominal variable and encode it with asymmetric binary variables.
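Both options for nominal variables can be sketched briefly. Here p is taken to be the number of variables and m the number of variables on which the two objects agree; the function names, attribute values, and state list are all made up for illustration.

```python
def nominal_dissimilarity(x, y):
    """Simple matching for nominal variables: (p - m) / p."""
    p = len(x)                                   # total number of variables
    m = sum(1 for u, v in zip(x, y) if u == v)   # number of matches
    return (p - m) / p

def one_hot(value, states):
    """Encode one nominal value as one binary variable per state."""
    return [1 if value == s else 0 for s in states]

obj_i = ["red", "round", "small"]
obj_j = ["red", "square", "small"]
print(nominal_dissimilarity(obj_i, obj_j))

colors = ["red", "green", "blue"]   # hypothetical state list, M = 3
print(one_hot("green", colors))
```

After the binary encoding, the objects can be compared with a binary coefficient such as Jaccard.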
Ordinal variable: the ordered states are replaced one by one by their ranks 1, ..., M, and each rank r is mapped onto [0,1] by z = (r-1)/(M-1), so that it can then be treated like an interval-scaled variable.
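A small sketch of the rank-to-[0,1] mapping, assuming the usual linear form in which the lowest state maps to 0 and the highest to 1. The function name and the state list are hypothetical.

```python
def ordinal_to_unit_interval(value, ordered_states):
    """Map an ordinal value to [0, 1] via its rank: z = (r - 1) / (M - 1).
    Assumes ordered_states is sorted from lowest to highest and M >= 2."""
    r = ordered_states.index(value) + 1   # ranks start at 1
    m = len(ordered_states)
    return (r - 1) / (m - 1)

levels = ["low", "medium", "high"]   # hypothetical ordered states, M = 3
mapped = [ordinal_to_unit_interval(v, levels) for v in levels]
print(mapped)
```

The mapped values can then be fed to any of the distance measures used for interval-scaled variables.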
Data Mining: Clustering