K Nearest Neighbor algorithm

The K Nearest Neighbor algorithm, usually called the KNN algorithm, is a fairly classical machine learning algorithm, and on the whole it is an easy algorithm to understand. The K refers to the K data samples closest to the point in question. The KNN algorithm is different from the K-means algorithm: K-means is used for clustering, that is, for working out which samples are similar to one another, while KNN is used for classification. In other words, the sample space is already divided into several classes; given a data point to classify, we find its K nearest samples and decide which class the point belongs to. Put simply, the K nearest points vote on the class of the data point being classified.

There is a classic figure in the Wikipedia entry on KNN:

From the figure we can see that there are two classes of sample data: one is the blue squares, the other the red triangles. The green circle is the data point we want to classify.

    • If k=3, the 3 points nearest the green circle are 2 red triangles and 1 blue square. These 3 points vote, so the green point is classified as a red triangle.
    • If k=5, the 5 points nearest the green circle are 2 red triangles and 3 blue squares. These 5 points vote, so the green point is classified as a blue square. (A minimal sketch of this vote follows the list.)
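
To make the voting concrete, here is a minimal sketch of the majority vote once the K nearest neighbors are known. The neighbor labels are taken from the figure described above; Python is assumed, since the article itself gives no code:

    from collections import Counter

    # Hypothetical labels of the k nearest neighbors of the green circle.
    neighbors_k3 = ["red triangle", "red triangle", "blue square"]   # k = 3
    neighbors_k5 = ["red triangle", "red triangle",
                    "blue square", "blue square", "blue square"]     # k = 5

    for neighbors in (neighbors_k3, neighbors_k5):
        label, votes = Counter(neighbors).most_common(1)[0]
        print(f"k={len(neighbors)}: classified as {label} with {votes} votes")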

We can see that machine learning, at its core, is a method based on data statistics. So what is this algorithm good for? Let's look at a few examples.

Product quality judgment

Suppose we need to judge the quality of paper towels. The quality of a paper towel can be described by two features, treated as a vector: one is "acid corrosion time", and the other is "the pressure it can withstand". Suppose our sample space is as follows (the so-called sample space, also called training data, is the data the machine learns from):

X1: Acid corrosion time (seconds) | X2: Pressure tolerance (kg/m²) | Quality y
--------------------------------- | ------------------------------ | ---------
7                                 | 7                              | Bad
7                                 | 4                              | Bad
3                                 | 4                              | Good
1                                 | 4                              | Good

So, if X1 = 3 and X2 = 7, what is the quality of this paper towel? Here we can use the KNN algorithm to judge.

Suppose k=3 (k should be an odd number, so that the vote cannot end in a tie). Here we calculate the distance from (3, 7) to all points. (For the distance formulas, refer to the distance formula in the K-means algorithm.)

X1: Acid corrosion time (seconds) | X2: Pressure tolerance (kg/m²) | Distance to (3, 7) | Vote y
--------------------------------- | ------------------------------ | ------------------ | ------
7                                 | 7                              | 4.00               | Bad
7                                 | 4                              | 5.00               | N/A (not among the 3 nearest)
3                                 | 4                              | 3.00               | Good
1                                 | 4                              | 3.61               | Good

So in the final vote, Good gets 2 votes and Bad gets 1 vote, and the point (3, 7) that we need to test is judged to be a good (qualified) product. (Of course, you can also use weights: take the distance as a weight, so that closer neighbors count for more; this tends to be more accurate.)
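
As a sketch only (this is not the original article's code, and Python is assumed), the whole procedure above can be reproduced with plain Euclidean distance:

    import math
    from collections import Counter

    # Training data: (acid corrosion time X1, pressure tolerance X2) -> quality y
    samples = [((7, 7), "Bad"),
               ((7, 4), "Bad"),
               ((3, 4), "Good"),
               ((1, 4), "Good")]

    def knn_classify(query, samples, k=3):
        # Euclidean distance from the query point to every training sample.
        distances = [(math.dist(query, x), y) for x, y in samples]
        distances.sort(key=lambda pair: pair[0])
        # The k nearest samples vote on the label.
        votes = Counter(y for _, y in distances[:k])
        return votes.most_common(1)[0][0]

    print(knn_classify((3, 7), samples, k=3))   # -> Good

A distance-weighted variant would replace the equal votes with a weight such as 1/distance per neighbor, as mentioned in the note above.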

Note: Example from here, k-nearestneighbors Excel table download

Forecast

Suppose we have the following set of data, where X is the number of elapsed seconds and Y is a value that changes over time (you can imagine a stock price):

So, when the time is 6.5 seconds, what is the Y value? We can use the KNN algorithm to predict it.

Here, let's assume k=2. We calculate the distance from every X to 6.5; for example, for x=5.1 the distance is |6.5 - 5.1| = 1.4, and for x=1.2 it is |6.5 - 1.2| = 5.3. This gives the following table:

Note that because k=2, the two nearest points are x=4 and x=5.1, whose Y values are 27 and 8 respectively. In this case we can simply take the average:

Therefore, the final predicted value is (27 + 8) / 2 = 17.5.
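
A minimal sketch of this prediction (KNN regression) in Python; only the x=4 and x=5.1 points carry the Y values quoted in the text (27 and 8), while the other sample points are invented placeholders, since the full table is not reproduced here:

    # Hypothetical time series: (elapsed seconds x, value Y).
    data = [(1.2, 11.0), (2.3, 19.0), (4.0, 27.0), (5.1, 8.0)]

    def knn_predict(x_query, data, k=2):
        # Sort the samples by their absolute distance to the query time
        # and average the Y values of the k nearest ones.
        nearest = sorted(data, key=lambda p: abs(p[0] - x_query))[:k]
        return sum(y for _, y in nearest) / k

    print(knn_predict(6.5, data, k=2))   # -> (27 + 8) / 2 = 17.5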

Note: Example from here, knn_timeseries Excel table download

Interpolation and curve smoothing

The KNN algorithm can also be used to smooth a curve, which is a somewhat unconventional use of it. Suppose our sample data is as follows (the same as above):

To smooth these points, we need to insert some interpolated values between them. For example, we generate interpolation points from 0 to 6 with a step of 0.1, and for each one we calculate its distance (absolute value) to every x in the sample. The data for the points from 0 to 0.5 is given below:

Take the 11 interpolation points from 2.5 to 3.5 and calculate their distances to each x. Suppose k=4: we take the Y values of the 4 nearest x points and average them, which gives the following table:

Doing this for every interpolation point 0.0, 0.1, 0.2, 0.3, ..., 1.1, 1.2, 1.3, ..., 3.1, 3.2, ..., 5.8, 5.9, 6.0 gives a large table, and depending on the value of K we get the following figure:
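
A rough Python sketch of the smoothing procedure under the same assumptions (interpolation points from 0 to 6 with step 0.1, k=4); the sample points are again hypothetical stand-ins for the table above:

    def knn_smooth(xs_query, data, k=4):
        """For each query x, average the Y values of the k nearest sample points."""
        smoothed = []
        for xq in xs_query:
            nearest = sorted(data, key=lambda p: abs(p[0] - xq))[:k]
            smoothed.append((xq, sum(y for _, y in nearest) / k))
        return smoothed

    # Hypothetical sample points; the real ones would come from the table above.
    data = [(0.5, 3.0), (1.2, 11.0), (2.3, 19.0), (4.0, 27.0), (5.1, 8.0), (6.0, 5.0)]

    # Interpolation points 0.0, 0.1, ..., 6.0, as described in the text.
    grid = [round(i * 0.1, 1) for i in range(61)]
    for x, y in knn_smooth(grid, data, k=4)[:6]:   # print the first few smoothed points
        print(x, y)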

Note: Example from here, knn_smoothing Excel table download

Postscript

Finally, I want to say two more things:

1) In machine learning, the algorithms themselves are basically quite simple. The hard parts are the mathematical modeling, that is, abstracting the characteristics of your business into feature vectors, and selecting appropriate sample data for the model. Neither of these is easy; compared with them, the algorithm is rather simple.

2) Finding the K points nearest to a given point, as KNN requires, is a classic algorithm interview question. The data structure to use is a max heap, which is a kind of binary tree; you can look up the relevant algorithms (a sketch follows).
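
As an illustration (Python assumed, and not the article's own code), a common way to find the K nearest points is to keep a bounded max heap of size K, so the farthest retained candidate can be evicted in O(log K). Since the standard heapq module only provides a min heap, the usual trick is to store negated distances:

    import heapq
    import math

    def k_nearest(query, points, k):
        """Return the k points nearest to `query`, nearest first, using a max heap of size k."""
        heap = []  # holds (-distance, point); the root is the farthest point kept so far
        for p in points:
            d = math.dist(query, p)
            if len(heap) < k:
                heapq.heappush(heap, (-d, p))
            elif d < -heap[0][0]:
                # The new point is closer than the current farthest: replace it.
                heapq.heapreplace(heap, (-d, p))
        return [p for _, p in sorted(heap, reverse=True)]

    print(k_nearest((3, 7), [(7, 7), (7, 4), (3, 4), (1, 4)], k=3))   # -> [(3, 4), (1, 4), (7, 7)]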

Http://coolshell.cn/articles/8052.html
