Distance metrics and Python implementations (ii)

Source: Internet
Author: User

Next: http://www.cnblogs.com/denny402/p/7027954.html

7. Cosine of the angle (cosine)

This measure is also called cosine similarity. In geometry, the cosine of the angle between two vectors measures the difference in their directions; machine learning borrows the concept to measure the difference between sample vectors.
(1) The cosine of the angle between vector A(x1, y1) and vector B(x2, y2) in two-dimensional space:

$$\cos\theta = \frac{x_1 x_2 + y_1 y_2}{\sqrt{x_1^2 + y_1^2}\,\sqrt{x_2^2 + y_2^2}}$$

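(2) Cosine of the angle between two n-dimensional sample points a(x11, x12, ..., x1n) and b(x21, x22, ..., x2n)
Similarly, for two n-dimensional sample points a(x11, x12, ..., x1n) and b(x21, x22, ..., x2n), a concept analogous to the cosine of the angle can be used to measure how similar they are:

$$\cos\theta = \frac{a \cdot b}{|a|\,|b|}$$

That is:

$$\cos\theta = \frac{\sum_{k=1}^{n} x_{1k} x_{2k}}{\sqrt{\sum_{k=1}^{n} x_{1k}^2}\,\sqrt{\sum_{k=1}^{n} x_{2k}^2}}$$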

The cosine takes values in [-1, 1]. Once the angle between two vectors is known, its cosine can be used to characterize how similar the two vectors are. The smaller the angle (the closer to 0 degrees), the closer the cosine is to 1 and the more closely the two directions agree. When the two vectors point in exactly opposite directions, the cosine takes its minimum value, -1. When the cosine is 0, the two vectors are orthogonal and the angle is 90 degrees. Cosine similarity is therefore independent of the magnitude of the vectors and depends only on their direction.
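The three special cases described above can be checked numerically. The following is a minimal sketch; the helper name `cosine_similarity` and the test vectors are assumptions chosen for illustration, and the helper assumes both inputs are non-zero vectors.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between a and b (both assumed to be non-zero)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

v = np.array([1.0, 2.0, 3.0])

# Scaling a vector changes its magnitude but not its direction.
same_direction = cosine_similarity(v, 5.0 * v)
# Negating a vector reverses its direction.
opposite = cosine_similarity(v, -v)
# The two coordinate axes are at 90 degrees to each other.
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])

print(same_direction, opposite, orthogonal)
```

Up to floating-point rounding, the three values should be 1, -1 and 0 respectively.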

    import numpy as np
    from scipy.spatial.distance import pdist

    x = np.random.random(10)
    y = np.random.random(10)

    # Method 1: compute directly from the formula
    d1 = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

    # Method 2: use the SciPy library
    # pdist returns the cosine distance (1 - cosine), so subtract it from 1
    X = np.vstack([x, y])
    d2 = 1 - pdist(X, 'cosine')

When two vectors are identical, the cosine is 1; the following code gives d = 1.

    d = 1 - pdist([x, x], 'cosine')

8. Pearson correlation coefficient (Pearson correlation)

(1) Definition of the Pearson correlation coefficient

The cosine similarity described above depends only on the direction of the vectors, but it is affected by translation: if x is shifted to x+1 in the cosine formula, the cosine value changes. How can translation invariance be achieved? By using the Pearson correlation coefficient, which is sometimes simply called the correlation coefficient.

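If the cosine formula is written as:

$$\mathrm{CosSim}(x, y) = \frac{\sum_i x_i y_i}{\sqrt{\sum_i x_i^2}\,\sqrt{\sum_i y_i^2}} = \frac{\langle x, y \rangle}{\|x\|\,\|y\|}$$

to denote the cosine of the angle between vector x and vector y, then the Pearson correlation coefficient can be expressed as:

$$\mathrm{Corr}(x, y) = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2}\,\sqrt{\sum_i (y_i - \bar{y})^2}} = \frac{\langle x - \bar{x},\, y - \bar{y} \rangle}{\|x - \bar{x}\|\,\|y - \bar{y}\|} = \mathrm{CosSim}(x - \bar{x},\, y - \bar{y})$$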

The Pearson correlation coefficient is invariant to translation and to scaling by a positive factor; it measures the correlation between two vectors (dimensions).
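The difference between the two measures under translation can be illustrated with a small experiment. This is a sketch only; the helper names `cosine_similarity` and `pearson` and the sample vectors are assumptions made for the example.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def pearson(a, b):
    """Pearson correlation: cosine similarity of the mean-centered vectors."""
    return cosine_similarity(a - a.mean(), b - b.mean())

x = np.array([1.0, 2.0, 4.0, 7.0])
y = np.array([2.0, 1.0, 5.0, 6.0])

# Shifting x by 1 changes the cosine similarity ...
cos_before = cosine_similarity(x, y)
cos_after = cosine_similarity(x + 1, y)

# ... but leaves the Pearson coefficient unchanged, as does scaling by 3.
r_before = pearson(x, y)
r_shifted = pearson(x + 1, y)
r_scaled = pearson(3 * x, y)

print(cos_before, cos_after)
print(r_before, r_shifted, r_scaled)
```

The first line should show two different values, while the three values on the second line should agree up to rounding.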

Implementations in Python:

    import numpy as np

    x = np.random.random(10)
    y = np.random.random(10)

    # Method 1: compute from the formula (center each vector, then take the cosine)
    x_ = x - np.mean(x)
    y_ = y - np.mean(y)
    d1 = np.dot(x_, y_) / (np.linalg.norm(x_) * np.linalg.norm(y_))

    # Method 2: use the NumPy library
    # corrcoef returns the 2x2 correlation matrix; entry [0][1] is corr(x, y)
    X = np.vstack([x, y])
    d2 = np.corrcoef(X)[0][1]

The correlation coefficient measures the degree of linear correlation between the random variables x and y, and its value lies in [-1, 1]. The larger its absolute value, the stronger the correlation between x and y. When x and y are perfectly linearly related, the correlation coefficient is 1 (positive linear correlation) or -1 (negative linear correlation).
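The boundary values can be reproduced with exactly linear data. The sketch below uses made-up sample values; the last case is an added illustration that the coefficient captures only linear dependence, since a symmetric parabola depends on x yet has correlation 0 with it.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])

# y increases linearly with x: perfect positive correlation.
r_positive = np.corrcoef(x, 2.0 * x + 3.0)[0, 1]
# y decreases linearly with x: perfect negative correlation.
r_negative = np.corrcoef(x, -0.5 * x + 1.0)[0, 1]
# y is a parabola symmetric about the mean of x: no linear correlation.
r_nonlinear = np.corrcoef(x, (x - 2.0) ** 2)[0, 1]

print(r_positive, r_negative, r_nonlinear)
```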

9. Hamming distance (Hamming distance)
(1) Definition of Hamming distance
The Hamming distance between two equal-length strings s1 and s2 is the minimum number of substitutions needed to change one into the other. For example, the Hamming distance between the strings "1111" and "1001" is 2.
Application: information coding (to improve fault tolerance, the minimum Hamming distance between codewords should be as large as possible).
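The string example and the coding application can be sketched in plain Python. The function names and the four-word code below are hypothetical and serve only as an illustration.

```python
from itertools import combinations

def hamming_distance(s1, s2):
    """Number of positions at which two equal-length strings differ."""
    if len(s1) != len(s2):
        raise ValueError("Hamming distance requires strings of equal length")
    return sum(c1 != c2 for c1, c2 in zip(s1, s2))

def min_code_distance(codewords):
    """Smallest Hamming distance over all pairs of distinct codewords."""
    return min(hamming_distance(a, b) for a, b in combinations(codewords, 2))

# The example from the text: the strings differ in the two middle positions.
d = hamming_distance("1111", "1001")

# Hypothetical code with four 5-bit codewords (illustration only).
code = ["00000", "01011", "10101", "11110"]
d_min = min_code_distance(code)

print(d, d_min)
```

A larger minimum distance means more bit errors are needed to turn one valid codeword into another, which is why it is used as a measure of fault tolerance.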

Implementations in Python:

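    import numpy as np
    from scipy.spatial.distance import pdist

    # Two random 0/1 vectors of length 10
    x = np.random.random(10) > 0.5
    y = np.random.random(10) > 0.5
    x = np.asarray(x, np.int32)
    y = np.asarray(y, np.int32)

    # Method 1: compute from the formula
    # (this is the fraction of differing positions, not the raw count)
    d1 = np.mean(x != y)

    # Method 2: use the SciPy library (also returns the fraction)
    X = np.vstack([x, y])
    d2 = pdist(X, 'hamming')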

10. Jaccard similarity coefficient (Jaccard similarity coefficient)
(1) Jaccard similarity coefficient
The proportion of the intersection of two sets A and B within their union is called the Jaccard similarity coefficient of the two sets, denoted J(A, B).

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$

The Jaccard similarity coefficient is an indicator of how similar two sets are.
(2) Jaccard distance
The concept opposite to the Jaccard similarity coefficient is the Jaccard distance. It can be expressed with the following formula:

$$J_\delta(A, B) = 1 - J(A, B) = \frac{|A \cup B| - |A \cap B|}{|A \cup B|}$$

The Jaccard distance measures how different two sets are by the proportion of elements that are not shared among all elements of the two sets.
(3) Application of Jaccard similarity coefficient and Jaccard distance
The Jaccard similarity coefficient can be used to measure the similarity of samples.
Sample A and sample B are two n-dimensional vectors whose values in every dimension are 0 or 1, for example A(0111) and B(1011). We treat each sample as a set: 1 means that the set contains the corresponding element, and 0 means that it does not.
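The set interpretation of A(0111) and B(1011) can be worked through in a few lines. This is a minimal sketch; the helper names are assumptions, and returning 1.0 for two empty sets is a convention chosen here, not something the text specifies.

```python
def to_set(bits):
    """Treat a 0/1 string as a set: keep the indices whose value is 1."""
    return {i for i, bit in enumerate(bits) if bit == "1"}

def jaccard_similarity(a, b):
    """|A ∩ B| / |A ∪ B|; 1.0 for two empty sets (a convention assumed here)."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)

A = to_set("0111")  # elements at indices 1, 2, 3
B = to_set("1011")  # elements at indices 0, 2, 3

# Two shared elements (indices 2 and 3) out of four in the union.
similarity = jaccard_similarity(A, B)
distance = 1.0 - similarity

print(similarity, distance)
```

For this pair both the similarity coefficient and the distance come out as 0.5.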

Implementations in Python:

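    import numpy as np
    from scipy.spatial.distance import pdist

    # Two random 0/1 vectors of length 10
    x = np.random.random(10) > 0.5
    y = np.random.random(10) > 0.5
    x = np.asarray(x, np.int32)
    y = np.asarray(y, np.int32)

    # Method 1: compute the Jaccard distance from the formula
    # up: positions where x and y differ and at least one is non-zero
    up = np.double(np.bitwise_and((x != y), np.bitwise_or(x != 0, y != 0)).sum())
    # down: positions where at least one of x, y is non-zero
    # (zero only if both vectors are all zeros)
    down = np.double(np.bitwise_or(x != 0, y != 0).sum())
    d1 = up / down

    # Method 2: use the SciPy library
    X = np.vstack([x, y])
    d2 = pdist(X, 'jaccard')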

11. Bray-Curtis distance (Bray-Curtis distance)

12. Chi-square distance (chi-square distance)
