Distance in MATLAB

Source: Internet
Author: User

Using Data Mining and pattern recognition tasks involve calculating abstract "distances" between items or collections of items. some modeling algorithms, such as K-nearest neighbors or radial basis function neural networks) make direct use of Multivariate distances. one very useful distance measure, the Mahalanobis distance, will be explained and implemented here.
Euclidean distance
TheEuclidean distanceIs the geometric distance we are all familiar with in 3 spatial dimensions. the Euclidean distance is simple to caclculate: square the difference in each dimension (variable), and take the square root of the sum of these squared differences. in Matlab:

% Euclidean distance between vectors 'A 'and' B ', original recipe
Euclideandistance = SQRT (sum (a-B). ^ 2 ))

... Or, for the more linear algebra-minded:

% Euclidean distance between vectors 'A 'and' B ', linear algebra Style
Euclideandistance = norm (a-B)

This distance measure has a straightforward geometric interpretation, is easy to code and is fast to calculate, but it has two basic drawbacks:
First, the Euclidean distance is extremely sensitive to the scales of the variables involved. in geometric situations, all variables are measured in the same units of length. with other data, though, this is likely not the case. modeling problems might deal with variables which have very different scales, such as age, height, weight, etc. the scales of these variables are not comparable.
Second, the Euclidean distance is blind to correlated variables. consider a hypothetical data set containing 5 variables, where one variable is an exact duplicate of one of the others. the copied variable and its twin are thus completely correlated. yet, Euclidean distance has no means of taking into account that the copy brings no new information, and will essential weight the copied variable more heavily in its calculations than the other variables.
Mahalanobis distance
TheMahalanobis distanceTakes into account the covariance among the variables in calculating distances. with this measure, the problems of scale and correlation inherent in the Euclidean distance are no longer an issue. to understand how this works, consider that, when using Euclidean distance, the set of points equidistant from a given location is a sphere. the Mahalanobis distance stretches this sphere to correct for the respective scales of the different variables, and to account for correlation among variables.
TheMahalOrPdistFunctions in the statistics toolbox can calculate the Mahalanobis distance. it is also very easy to calculate in base Matlab. I must admit to some embarrassment at the simple-mindedness of my own implementation, once I reviewed what other programmers had crafted. see, for instance:
Mahaldist. m by Peter J. acklam
Note that it is common to calculate the square of the Mahalanobis distance. taking the square root is generally a waste of computer time since it will not affect the order of the distances and any critical values or thresholds used to identify outliers can be squared instead, to save time.
Application
The Mahalanobis distance can be applied directly to modeling problems as a replacement for the Euclidean distance, as in radial basis function neural networks. another important use of the Mahalanobis distance is the detection of outliers. consider the data graphed in the following chart (click the graph to enlarge ):

The point enclosed by the Red Square clearly does not obey the distribution exhibited by the rest of the data points. notice, though, that simple univariate tests for outliers wowould fail to detect this point. although the outlier does not sit at the center of either scale, there are quite a few points with more extreme values of bothVariable 1AndVariable 2. The Mahalanobis distance, however, wocould easily find this outlier.
Further reading
Multivariate Statistical Methods, By manly (ISBN: 0-412-28620-3)

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.