Multidimensional scaling (MDS) is a multivariate data analysis technique for displaying the "distance" structure of data in a low-dimensional space: the research objects (samples or variables) of a high-dimensional space are mapped into a low-dimensional space for positioning, analysis, and classification, while the original relationships between the objects are preserved.
Multidimensional scaling is similar to principal component analysis (PCA) and linear discriminant analysis (LDA) in that it can be used for dimensionality reduction.
Goal of multidimensional scaling: given the similarities (or distances) between n objects, determine a representation of these objects in a low-dimensional (usually Euclidean) space (known as a perceptual map, perceptual mapping) such that the similarities (or distances) in that space match the original ones as closely as possible, so that any distortion caused by the dimension reduction is minimized.
Each point placed in the low-dimensional (Euclidean) space represents one object, so the distances between points are highly correlated with the similarities between objects. That is, two similar objects are represented by two nearby points in the low-dimensional space, while two dissimilar objects are represented by two points far apart. The low-dimensional space is usually a two- or three-dimensional Euclidean space, but it can also be a non-Euclidean three-dimensional space.
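As a small illustration of this goal (not from the original post; the data here are randomly generated), when the distance matrix really is Euclidean and the points intrinsically lie in a 2-D space, a 2-D classical MDS embedding reproduces the pairwise distances with no distortion at all:

```python
import numpy as np

# Hypothetical data: 6 points that genuinely live in R^2
rng = np.random.default_rng(1)
X = rng.standard_normal((6, 2))
diff = X[:, None, :] - X[None, :, :]
D = np.sqrt((diff ** 2).sum(-1))          # exact Euclidean distance matrix

# Classical MDS: double-center, eigendecompose, keep the top 2 coordinates
n = len(X)
H = np.eye(n) - 1.0 / n
B = -0.5 * H @ D**2 @ H
w, V = np.linalg.eigh(B)                  # eigenvalues in ascending order
Y = V[:, -2:] @ np.diag(np.sqrt(np.maximum(w[-2:], 0)))

# Distances in the embedding match the originals (up to rotation/reflection)
Dhat = np.sqrt(((Y[:, None, :] - Y[None, :, :]) ** 2).sum(-1))
print(np.allclose(D, Dhat))
```

With real data the intrinsic dimension is usually higher than 2, and the match is only approximate rather than exact.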
Classical MDS:
• Euclidean distance matrices are used both in the original space and in the low-dimensional space
• The distance matrix D is Euclidean, that is, there exist a positive integer p and n points x1, …, xn in the space R^p such that d²ij = ||xi − xj||²
The goal is to find a configuration x1, …, xn that fits D. The idea is:
– Transform the squared Euclidean distance matrix D⁽²⁾ = (d²ij) into a nonnegative definite matrix B
– Obtain a configuration X from the eigenvalues and eigenvectors of B; each row of X represents a point in the low-dimensional space.
• To this end, denote the original p-dimensional objects (observation points) by x1, …, xn (generally unknown), with pairwise squared distances d²ij = ||xi − xj||². The matrix B is obtained by double centering:
B = −(1/2) H D⁽²⁾ H,  H = I_n − (1/n) 11′
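The double-centering step can be checked numerically: for centered points, B is exactly their Gram (inner-product) matrix. The snippet below (an illustration with random data, not from the original post) verifies this identity:

```python
import numpy as np

# Random configuration of 5 points in R^3
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))

# Squared pairwise distances D^(2)
diff = X[:, None, :] - X[None, :, :]
D2 = (diff ** 2).sum(-1)

# Double centering: B = -1/2 * H * D^(2) * H with H = I - (1/n)11'
n = len(X)
H = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * H @ D2 @ H

# B equals the Gram matrix of the centered points
Xc = X - X.mean(axis=0)
print(np.allclose(B, Xc @ Xc.T))
```

This is why B is nonnegative definite whenever D is a genuine Euclidean distance matrix.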
Here the dimension r is determined either by fixing r = 1, 2, or 3 in advance, or by computing the proportion of the leading eigenvalues in the sum of all eigenvalues.
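The eigenvalue-proportion rule can be implemented in a few lines. The spectrum and the 90% threshold below are hypothetical choices for illustration, not values from the original post:

```python
import numpy as np

# Hypothetical spectrum of B (nonnegative eigenvalues, descending)
eigval = np.array([120.0, 45.0, 6.0, 1.0, 0.0])

# Cumulative proportion of variance explained by the leading eigenvalues
prop = np.cumsum(eigval) / eigval.sum()

# Smallest r whose leading eigenvalues explain at least 90% of the total
r = int(np.searchsorted(prop, 0.9) + 1)
print(r)
```

With this spectrum the first two eigenvalues already account for more than 90% of the total, so r = 2 is chosen.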
import numpy as np

D = np.array([[0, 411, 213, 219, 296, 397],
              [411, 0, 204, 203, 120, 152],
              [213, 204, 0, 73, 136, 245],
              [219, 203, 73, 0, 90, 191],
              [296, 120, 136, 90, 0, 109],
              [397, 152, 245, 191, 109, 0]])
n = D.shape[0]

# Double centering: B = -1/2 * H * D^(2) * H, with H = I - (1/n)11'
# (an element-wise loop over the entries b_ij, as in the commented-out
# "solution 1" of the original code, produces the same matrix)
D2 = D ** 2
H = np.eye(n) - 1.0 / n
B = -0.5 * H @ D2 @ H

# Eigendecomposition; sort eigenvalues in descending order so that the
# first two columns really are the leading principal coordinates
eigval, eigvec = np.linalg.eig(B)
order = np.argsort(eigval)[::-1]
eigval, eigvec = eigval[order], eigvec[:, order]

# 2-D configuration: X = E_2 * Lambda_2^(1/2)
X = eigvec[:, :2] @ np.diag(np.sqrt(eigval[:2]))

print('Original distance', '\t', 'New distance')
for i in range(n):
    for j in range(i + 1, n):
        print(D[i, j], '\t\t', '%.4f' % np.linalg.norm(X[i] - X[j]))
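To quantify the distortion mentioned at the start, one can compare the original and embedded distance matrices with a stress-type measure. The self-contained sketch below repeats the embedding of the same distance matrix (using np.linalg.eigh, which is appropriate for the symmetric matrix B) and reports the relative mismatch; the "relative stress" formula used here is one common choice, not something prescribed by the original post:

```python
import numpy as np

D = np.array([[0, 411, 213, 219, 296, 397],
              [411, 0, 204, 203, 120, 152],
              [213, 204, 0, 73, 136, 245],
              [219, 203, 73, 0, 90, 191],
              [296, 120, 136, 90, 0, 109],
              [397, 152, 245, 191, 109, 0]], dtype=float)
n = D.shape[0]

# Classical MDS embedding into 2 dimensions
H = np.eye(n) - 1.0 / n
B = -0.5 * H @ D**2 @ H
eigval, eigvec = np.linalg.eigh(B)          # eigenvalues in ascending order
top = np.maximum(eigval[-2:], 0)            # guard against tiny negatives
X = eigvec[:, -2:] @ np.diag(np.sqrt(top))

# Pairwise distances in the embedding
Dhat = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

# Relative stress: ||D - Dhat||_F / ||D||_F (0 means a perfect fit)
stress = np.sqrt(((D - Dhat) ** 2).sum() / (D ** 2).sum())
print('relative stress: %.4f' % stress)
```

A value close to 0 indicates the 2-D map reproduces the original distances almost exactly; larger values signal that the distance matrix is not well represented in two dimensions.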
Output: a table comparing each original distance with the corresponding distance in the 2-D embedding.
Reference: Canonical Correlation Analysis and Multidimensional Scaling, lecture notes by Zhang Weiping.