This article starts from the Minkowski distance and describes how Euclidean distance, Manhattan distance, and Chebyshev distance relate to it.
In general, a distance function d(x, y) must satisfy the following rules:
1) d(x, x) = 0 // the distance from a point to itself is 0
2) d(x, y) >= 0 // distance is non-negative
3) d(x, y) = d(y, x) // symmetry: if the distance from A to B is a, then the distance from B to A is also a
4) d(x, k) + d(k, y) >= d(x, y) // triangle inequality: the sum of two sides is at least as long as the third side
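The four rules can be spot-checked numerically. The sketch below is only an illustration: the helper names `manhattan` and `satisfies_metric_axioms` and the sample points are made up for this example, and passing the check on a handful of points does not prove that a function is a metric. It can only reveal a violation.

```python
from itertools import product

def manhattan(a, b):
    """Sum of absolute coordinate differences (used here as a sample distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def satisfies_metric_axioms(d, points, tol=1e-12):
    """Check the four rules on every triple drawn from the sample points."""
    for x, y, k in product(points, repeat=3):
        if abs(d(x, x)) > tol:                  # 1) distance to itself is 0
            return False
        if d(x, y) < -tol:                      # 2) non-negativity
            return False
        if abs(d(x, y) - d(y, x)) > tol:        # 3) symmetry
            return False
        if d(x, k) + d(k, y) < d(x, y) - tol:   # 4) triangle inequality
            return False
    return True

sample = [(0, 0), (1, 0), (0, 1), (1, 1), (3, -2)]
print(satisfies_metric_axioms(manhattan, sample))
```

A function such as the squared Euclidean distance fails rule 4 on collinear points, so the same helper can be used to see a counterexample.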
Minkowski Distance:
Minkowski distance is a very common way to measure the distance between numerical points. Suppose the points P and Q have the following coordinates:
P = (x_1, x_2, ..., x_n) and Q = (y_1, y_2, ..., y_n)
Then the Minkowski distance is defined as:
d(P, Q) = (\sum_{i=1}^{n} |x_i - y_i|^p)^{1/p}
The most common values of p for this distance are 2 and 1: the former gives the Euclidean distance and the latter gives the Manhattan distance. Suppose you take a taxi from point P to point Q in Manhattan, with white blocks for tall buildings and grey for streets:
The green diagonal line is the Euclidean distance, a route that is impossible in reality because it cuts through the buildings. The other three lines represent the Manhattan distance, and the three polylines have equal length.
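The definition and its two most common special cases can be sketched in a few lines. This is a minimal illustration, not a library implementation: the function name `minkowski` and the points P = (0, 0) and Q = (3, 4) are chosen here for the example, and the formula assumed is the p-th root of the sum of |x_i - y_i|^p.

```python
def minkowski(p_point, q_point, p):
    """Minkowski distance of order p between two equal-length coordinate sequences."""
    if len(p_point) != len(q_point):
        raise ValueError("points must have the same number of coordinates")
    if p <= 0:
        raise ValueError("p must be positive")
    return sum(abs(x - y) ** p for x, y in zip(p_point, q_point)) ** (1.0 / p)

P = (0, 0)
Q = (3, 4)
print(minkowski(P, Q, 1))  # Manhattan: 3 + 4 = 7
print(minkowski(P, Q, 2))  # Euclidean: sqrt(9 + 16) = 5
```

The taxi has to drive 7 blocks, while the straight line through the buildings is only 5 long.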
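As p approaches infinity, the Minkowski distance becomes the Chebyshev distance, the largest absolute difference along any single coordinate:
\lim_{p \to \infty} (\sum_{i=1}^{n} |x_i - y_i|^p)^{1/p} = \max_i |x_i - y_i|
where x_i and y_i are the i-th coordinates of P and Q.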
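The limit can be observed numerically by increasing p. The sketch below uses made-up points and hypothetical helper names (`minkowski`, `chebyshev`). It only illustrates the convergence and is not a proof.

```python
def minkowski(a, b, p):
    """Minkowski distance of order p (assumes equal-length points and p > 0)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def chebyshev(a, b):
    """Largest absolute coordinate difference."""
    return max(abs(x - y) for x, y in zip(a, b))

P = (0, 0)
Q = (3, 4)
for p in (1, 2, 4, 16, 64):
    # The printed values shrink toward the Chebyshev distance as p grows.
    print(p, minkowski(P, Q, p))
print("limit:", chebyshev(P, Q))
```

For large p the largest coordinate difference dominates the sum, which is why the result approaches the maximum.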
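We know that on the plane, the set of points whose Euclidean distance (p = 2) to the origin is 1 forms a circle. What shape does it take when p takes other values? For p = 1 it is a diamond (a square rotated by 45 degrees), and as p approaches infinity it becomes an axis-aligned square.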
Note that when p < 1, the Minkowski distance no longer satisfies the triangle inequality. For example, when p < 1 the distance from (0,0) to (1,1) is (1 + 1)^{1/p} > 2, while the distance from (0,1) to each of these two points is 1, so the detour through (0,1) would be shorter than the direct distance.
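The counterexample can be checked directly. The sketch below picks p = 0.5 as one arbitrary value below 1. The function name `minkowski` and the variable names are chosen for this illustration.

```python
def minkowski(a, b, p):
    """Minkowski formula of order p (no longer a true distance when p < 1)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

p = 0.5
A, K, B = (0, 0), (0, 1), (1, 1)

direct = minkowski(A, B, p)                       # (1 + 1) ** (1 / 0.5) = 4
detour = minkowski(A, K, p) + minkowski(K, B, p)  # 1 + 1 = 2
print(direct, detour, direct > detour)
```

Because the direct value exceeds the sum over the detour, rule 4 is violated for this p.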
The Minkowski distance is intuitive, but it ignores the distribution of the data, which is a limitation: if the values along the x direction vary over a much larger range than those along the y direction, the distance formula overweights the x dimension. So before computing distances we may also need to z-transform (standardize) the data in each dimension by subtracting the mean and dividing by the standard deviation:
z = (x - \mu) / \sigma
where \mu is the mean and \sigma is the standard deviation of that dimension.
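A minimal sketch of the z-transform, applied per dimension, is shown below. The data is made up, the helper name `standardize` is hypothetical, and using the population standard deviation (`statistics.pstdev`) rather than the sample standard deviation is an assumption of this example.

```python
from statistics import mean, pstdev

def standardize(values):
    """Return z-scores: subtract the mean, then divide by the standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumed choice)
    if sigma == 0:
        raise ValueError("cannot standardize a constant column")
    return [(v - mu) / sigma for v in values]

# Made-up sample: x spans thousands, y spans single digits.
xs = [1000.0, 2000.0, 3000.0, 4000.0, 5000.0]
ys = [1.0, 2.0, 3.0, 4.0, 5.0]

zx, zy = standardize(xs), standardize(ys)
print(zx)
print(zy)
```

After the transform both dimensions have mean 0 and standard deviation 1, so neither dominates a distance computed on the z-scores.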
As you can see, this processing begins to reflect the statistical characteristics of the data. It uses the distribution of the data to rescale the distances, but it assumes that the dimensions are uncorrelated. If the dimensions are correlated (for example, a taller person is likely to be heavier, so height and weight are related), the Mahalanobis distance is used instead.
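For reference, the Mahalanobis distance between points a and b is sqrt((a - b)^T S^{-1} (a - b)), where S is the covariance matrix of the data. The sketch below handles only the two-dimensional case so that the matrix inverse can be written out by hand. The height and weight values are made up, the function names are hypothetical, and the population covariance (dividing by n) is an assumed choice.

```python
from math import sqrt

def covariance_2d(points):
    """Population covariance of 2-D points, returned as (sxx, sxy, syy)."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    return sxx, sxy, syy

def mahalanobis_2d(a, b, cov):
    """Mahalanobis distance between 2-D points a and b for covariance cov."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    if det == 0:
        raise ValueError("covariance matrix is singular")
    dx, dy = a[0] - b[0], a[1] - b[1]
    # Inverse of [[sxx, sxy], [sxy, syy]] is (1 / det) * [[syy, -sxy], [-sxy, sxx]].
    q = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return sqrt(q)

# Made-up (height in cm, weight in kg) sample with correlated dimensions.
data = [(150, 50), (160, 56), (170, 65), (180, 72), (190, 85)]
cov = covariance_2d(data)
print(mahalanobis_2d((160, 56), (180, 72), cov))
```

With an identity covariance the formula reduces to the Euclidean distance, and with a diagonal covariance it reduces to the Euclidean distance on z-transformed data.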