Various distance calculations in machine learning


Original: http://blog.csdn.net/qq_23617681/article/details/51471156

In machine learning, it is often necessary to calculate various distances: the distance to nearest neighbors in KNN, the distance to cluster centers in k-means, or the distances that underlie similarity calculations.
This distance is not necessarily the Euclidean distance; different needs and data with different characteristics call for different distance measures.
The distance measures commonly used in machine learning, together with their characteristics, are given below.

A distance function d(x, y) has the following basic properties:
1. d(x, x) = 0 (the distance from a point to itself is 0)
2. d(x, y) >= 0 (non-negativity)
3. d(x, y) = d(y, x) (symmetry: the distance from A to B equals the distance from B to A)
4. d(x, z) + d(z, y) >= d(x, y) (triangle inequality: the sum of two sides is at least the third side)

1. Minkowski distance
A common way to measure the distance between numeric points. The formula is:

$$d(x, y) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}$$

The most common forms of this distance take p = 2 or p = 1, which correspond to the Euclidean distance and the Manhattan distance respectively.
Picture a grid of city blocks: the straight line is the Euclidean distance, which cannot be traveled directly when tall buildings are in the way, while the stepwise routes along the grid give the Manhattan distance, and all such routes have equal length.

When p in the formula tends to infinity, the Minkowski distance becomes the Chebyshev distance: $d(x, y) = \max_i |x_i - y_i|$.

We know that the set of points in the plane at distance 1 from the origin is a circle under the Euclidean distance (p = 2); for other values of p, the unit "circle" takes other shapes. As p grows, it approaches a square, which is the shape under the Chebyshev distance.
Note: when p < 1, the Minkowski distance violates the triangle inequality. For example, with p < 1 the distance from (0, 0) to (1, 1) is $2^{1/p} > 2$, while the distance from (0, 1) to each of those two points is 1, so the two sides of length 1 sum to less than the third side.
Features: the Minkowski distance is intuitive, but it ignores the data distribution, which is a limitation. For example, if values in the x direction have much larger magnitude than values in the y direction, the Minkowski distance will magnify the effect of the x dimension. Before computing distances you can apply z-score normalization: subtract the mean and divide by the standard deviation.
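A minimal sketch (assuming NumPy is available) of the Minkowski distance for several values of p, the triangle-inequality failure for p < 1, and the z-score normalization just mentioned:

```python
import numpy as np

def minkowski(x, y, p):
    """Minkowski distance; p=1 is Manhattan, p=2 is Euclidean."""
    return float(np.sum(np.abs(x - y) ** p) ** (1.0 / p))

x, y = np.array([0.0, 0.0]), np.array([3.0, 4.0])
print(minkowski(x, y, 1))     # Manhattan: 7.0
print(minkowski(x, y, 2))     # Euclidean: 5.0
print(np.max(np.abs(x - y)))  # Chebyshev (p -> infinity): 4.0

# Triangle inequality fails for p < 1: d((0,0),(1,1)) = 2^(1/p) = 4.0 here,
# but d((0,0),(0,1)) + d((0,1),(1,1)) = 1 + 1 = 2.
print(minkowski(np.array([0.0, 0.0]), np.array([1.0, 1.0]), 0.5))

# z-score normalization: subtract the mean, divide by the standard deviation,
# so no dimension dominates the distance just because of its scale.
data = np.array([[1.0, 100.0], [2.0, 300.0], [3.0, 200.0]])
normalized = (data - data.mean(axis=0)) / data.std(axis=0)
```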

This normalization reflects the statistical characteristics of the data, but it assumes the dimensions are uncorrelated. For data where dimensions are correlated, such as a person's height and weight, the Mahalanobis distance is needed.
2. Mahalanobis distance
In the original example figure, the pair of points that appears closer under the Euclidean distance is the farther pair under the Mahalanobis distance.
This is because the Mahalanobis distance takes the correlations between dimensions into account, using a Cholesky transformation to eliminate the correlation and scale differences between the different dimensions.

For the specific transformation formulas, refer to reference article 1. The Mahalanobis distance between points x and y with covariance matrix $\Sigma$ is:

$$d_M(x, y) = \sqrt{(x - y)^T \Sigma^{-1} (x - y)}$$

For example, consider sample points distributed as follows.

The x and y scales differ, so the Euclidean distance is not directly meaningful. And because the x and y dimensions are correlated (the data lies roughly along an upward diagonal), per-dimension normalization (subtracting the mean and dividing by the standard deviation) is not enough either.
The most appropriate method is to perform a Cholesky transformation and compute the Mahalanobis distance. After the transformation, the space looks as follows:

After the transformation, the red star is closer to the origin, which contradicts the initial intuitive impression.
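A minimal sketch (assuming NumPy) of the Mahalanobis distance; the correlated sample data here is made up for illustration:

```python
import numpy as np

def mahalanobis(x, y, cov):
    """Mahalanobis distance: sqrt((x - y)^T * inv(cov) * (x - y))."""
    diff = x - y
    return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))

# Correlated 2-D data (height/weight-like), made up for illustration.
rng = np.random.default_rng(0)
data = rng.multivariate_normal([0, 0], [[4.0, 3.0], [3.0, 4.0]], size=500)
cov = np.cov(data, rowvar=False)

a = np.array([2.0, 2.0])   # lies along the correlation direction
b = np.array([2.0, -2.0])  # lies against the correlation direction
origin = np.zeros(2)

# Equal Euclidean distances from the origin, but very different
# Mahalanobis distances once the correlation is accounted for.
print(np.linalg.norm(a - origin), np.linalg.norm(b - origin))
print(mahalanobis(a, origin, cov), mahalanobis(b, origin, cov))
```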
3. Vector inner product
A common, effective, and intuitive way to measure similarity.
The inner product by itself does not remove the effect of vector length; computing the inner product of unit-length vectors instead gives the cosine similarity:

$$\text{cosine}(x, y) = \frac{x \cdot y}{\|x\| \, \|y\|}$$

The cosine similarity depends only on the direction of the vectors, not on their magnitude. It appears frequently in TF-IDF and in image similarity calculations.
However, the cosine similarity is affected by shifting the vector: if x becomes x + 1 (a constant added to every component), the cosine similarity changes. The remedy is the Pearson correlation, which removes both magnitude effects and translation effects:

$$\rho_{x, y} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2} \sqrt{\sum_i (y_i - \bar{y})^2}}$$

The Pearson correlation coefficient is therefore invariant to both translation and scaling.
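A minimal sketch (assuming NumPy) contrasting the two measures under a constant shift:

```python
import numpy as np

def cosine_similarity(x, y):
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

def pearson(x, y):
    # Pearson correlation = cosine similarity of the mean-centered vectors.
    return cosine_similarity(x - x.mean(), y - y.mean())

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 6.0, 8.0])

print(cosine_similarity(x, y))      # 1.0: same direction
print(cosine_similarity(x + 1, y))  # ~0.99: cosine is shift-sensitive
print(pearson(x, y))                # 1.0
print(pearson(x + 1, y))            # 1.0: Pearson is shift-invariant
```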
4. Distances between categorical data
Hamming distance: given two strings of equal length, the minimum number of single-position substitutions required to change one string into the other.
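A minimal sketch of the Hamming distance between two equal-length strings:

```python
def hamming(s, t):
    """Number of positions at which two equal-length strings differ."""
    if len(s) != len(t):
        raise ValueError("strings must have equal length")
    return sum(a != b for a, b in zip(s, t))

print(hamming("karolin", "kathrin"))  # 3
print(hamming("1011101", "1001001"))  # 2
```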
5. Distances between probability distributions
The preceding sections discussed distances between numeric points; it is also common to compute distances between probability distributions.
Common methods include the chi-square test and the KL divergence (Kullback-Leibler divergence).
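A minimal sketch (assuming NumPy) of the KL divergence between two discrete distributions and the chi-square statistic between observed and expected counts:

```python
import numpy as np

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; asymmetric, so not a true metric."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return float(np.sum(p * np.log(p / q)))

def chi_square(observed, expected):
    """Chi-square statistic between observed and expected counts."""
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    return float(np.sum((observed - expected) ** 2 / expected))

p = np.array([0.1, 0.4, 0.5])
q = np.array([0.3, 0.3, 0.4])

print(kl_divergence(p, q))  # differs from kl_divergence(q, p): KL is asymmetric
print(kl_divergence(q, p))
print(chi_square([18, 35, 47], [30, 30, 40]))
```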


Summary:
Jaccard: measures the similarity/distance between sets
KL divergence: measures the similarity/distance between two distributions
Cosine: measures the similarity/distance between vectors
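Jaccard similarity is mentioned in the summary but not defined above; a minimal sketch of the Jaccard similarity between two sets:

```python
def jaccard_similarity(a, b):
    """|A intersect B| / |A union B| for two sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

print(jaccard_similarity({1, 2, 3}, {2, 3, 4}))  # 2 / 4 = 0.5
```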

Reference articles
1. http://blog.jobbole.com/84876/
