I. Hierarchical clustering
1) Distance and similarity coefficient
R uses dist(x, method = "euclidean", diag = FALSE, upper = FALSE, p = 2) to calculate distances, where x is the sample matrix or data frame and method selects which distance is calculated. The possible values of method are:
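euclidean Euclidean distance, the square root of the sum of squared differences.
maximum Chebyshev distance, the largest absolute difference between coordinates.
manhattan Absolute (Manhattan) distance, the sum of absolute differences.
canberra Lance (Canberra) distance.
minkowski Minkowski distance; the argument p specifies the power.
binary Distance for qualitative (0/1) variables.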
Qualitative variable distance: among the m items, let m0 be the number of 0:0 pairs, m1 the number of 1:1 pairs, and m2 the number of pairs that do not match (one value is 0 and the other is not). The distance is m2/(m1+m2).
When diag is TRUE, the diagonal of the distance matrix is printed. When upper is TRUE, the upper triangle of the matrix is printed.
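A small sketch of the dist options described above. The matrices x and b are hypothetical data made up for illustration; only dist and its documented arguments are used.

```r
# Hypothetical sample matrix: 3 samples (rows), 2 variables (columns).
x <- rbind(c(0, 0),
           c(3, 4),
           c(6, 8))

d_euc <- dist(x)                               # method = "euclidean" is the default
d_max <- dist(x, method = "maximum")           # Chebyshev distance
d_man <- dist(x, method = "manhattan")         # absolute distance
d_min <- dist(x, method = "minkowski", p = 3)  # Minkowski distance with power p = 3

# diag and upper only change how the result is printed.
print(dist(x, diag = TRUE, upper = TRUE))

# Binary distance on hypothetical 0/1 data:
# positions 1 and 2 are 1:1 pairs (m1 = 2), position 3 is a mismatch (m2 = 1),
# position 4 is a 0:0 pair (m0 = 1), so the distance is m2/(m1+m2) = 1/3.
b <- rbind(c(1, 1, 1, 0),
           c(1, 1, 0, 0))
d_bin <- dist(b, method = "binary")
```

The returned object stores only the lower triangle, in the order (2,1), (3,1), (3,2); as.matrix(d_euc) expands it to a full square matrix if that is easier to work with.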
R uses scale(x, center = TRUE, scale = TRUE) to center and standardize the data matrix. With scale(x, scale = FALSE), only centering is performed.
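As a quick illustration of the two calls, the sketch below applies scale to a hypothetical 3 x 2 matrix (the data are invented for the example) and checks the column means and standard deviations.

```r
# Hypothetical data, filled column by column: columns are (1, 2, 3) and (10, 20, 60).
x <- matrix(c(1, 2, 3, 10, 20, 60), ncol = 2)

z   <- scale(x)                 # center and scale: each column gets mean 0 and sd 1
cen <- scale(x, scale = FALSE)  # center only: each column gets mean 0, spread unchanged

colMeans(z)        # approximately 0 for every column
apply(z, 2, sd)    # 1 for every column
colMeans(cen)      # approximately 0 for every column
apply(cen, 2, sd)  # same as apply(x, 2, sd)
```

The result of scale also carries the column means (and standard deviations, when scaling) as the attributes "scaled:center" and "scaled:scale", which is useful for applying the same transformation to new data.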
R uses sweep(x, MARGIN, STATS, FUN = "-", ...) to apply an operation across a matrix. MARGIN = 1 applies the operation along the rows, and MARGIN = 2 along the columns. STATS holds the values used in the operation, and FUN is the function to apply, subtraction by default. The following uses sweep to apply a range normalization to the matrix x:
> center <- sweep(x, 2, apply(x, 2, mean))  # subtract the column means
> r <- apply(x, 2, max) - apply(x, 2, min)  # range of each column: maximum minus minimum
> x_star <- sweep(center, 2, r, "/")        # divide the centered matrix, column by column, by the range vector
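To make the role of MARGIN concrete, here is a self-contained sketch on a hypothetical matrix (the data are made up). It contrasts MARGIN = 1 with MARGIN = 2 and checks that the range-normalized columns end up with mean 0 and range 1.

```r
# Hypothetical data, filled column by column: columns are (1, 2, 3) and (10, 20, 60).
x <- matrix(c(1, 2, 3, 10, 20, 60), ncol = 2)

by_col <- sweep(x, 2, colMeans(x))  # MARGIN = 2: subtract each column's mean
by_row <- sweep(x, 1, rowMeans(x))  # MARGIN = 1: subtract each row's mean

# Range normalization, following the same steps as the commands above.
r      <- apply(x, 2, max) - apply(x, 2, min)  # column ranges
x_star <- sweep(by_col, 2, r, "/")             # centered values divided by the ranges

col_range <- apply(x_star, 2, function(v) max(v) - min(v))  # 1 for every column
col_mean  <- colMeans(x_star)                               # approximately 0 for every column
```

STATS must have one value per row when MARGIN = 1 and one value per column when MARGIN = 2, which is why rowMeans and colMeans are paired with them here.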