K Nearest Neighbor

The K-nearest-neighbor algorithm, abbreviated KNN, rests on a simple and direct idea: to classify a data instance x, compute the distance in feature space between x and every sample point whose category is already known, take the K sample points closest to x, and assign x to the category that accounts for the largest share of those K points. For example, if 2 of the 3 points closest to a green query point are red, then with k=3 the point is classified as red; with k=5, however, the same green point may end up in the blue category.
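
The code later in this article calls such a helper as knn.classify0. A minimal sketch of the same idea is shown here; the function name knn_classify and the toy data are my own illustration, and NumPy is assumed.

import numpy as np

def knn_classify(x, samples, labels, k):
    """Classify x by majority vote among the k labeled samples nearest to it (illustrative sketch)."""
    # Euclidean distance from x to every labeled sample point
    dists = np.sqrt(((samples - x) ** 2).sum(axis=1))
    # indices of the k closest samples
    nearest = dists.argsort()[:k]
    # count the labels of those neighbors and return the most frequent one
    votes = {}
    for i in nearest:
        votes[labels[i]] = votes.get(labels[i], 0) + 1
    return max(votes, key=votes.get)

# toy example: two red points and one blue point are nearest, so k=3 votes "red"
samples = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
labels = ['red', 'red', 'blue', 'blue']
print(knn_classify(np.array([0.9, 0.9]), samples, labels, 3))   # -> red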

The following example is from "Machine Learning in Action".

A woman collects three attributes of the men she meets on a dating site as features: frequent-flyer miles earned per year, percentage of time spent playing games each month, and liters of ice cream consumed per week. She tags each man she has dated as one of three classes: dislike, moderately attractive, or very attractive. Before meeting a new man, she wants to predict which type he is. The two figures below are two-dimensional scatter plots of {percentage of monthly game time, liters of ice cream consumed} and {percentage of monthly game time, annual flight miles}. You can see that {percentage of monthly game time, annual flight miles} separates the men into the three categories quite well.

Code to generate the scatter plot:

from numpy import array
import matplotlib.pyplot as plt
import knn   # the kNN module from "Machine Learning in Action" (file2matrix, autoNorm, classify0)

datingDataMat, datingLabels = knn.file2matrix("datingTestSet2.txt")
# print(datingDataMat)
# print(datingLabels[0:20])

fig = plt.figure()
ax = fig.add_subplot(111)
# size and color each point by its label so the three classes can be told apart
ax.scatter(datingDataMat[:, 0], datingDataMat[:, 1],
           20.0 * array(datingLabels), 15.0 * array(datingLabels))
plt.xlabel(u"liters of ice cream consumed per week", fontproperties='SimHei')
plt.ylabel(u"percentage of time spent playing games", fontproperties='SimHei')
plt.show()

KNN algorithm Flowchart:

Note that the feature values must be normalized at the very beginning. Otherwise, the distance between a sample point and the point being classified is distorted by the very different value ranges of the features (annual flight miles are orders of magnitude larger than liters of ice cream, so they would dominate the distance).
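
The autoNorm call in the code below presumably performs this rescaling. As a minimal sketch, min-max normalization maps every feature column to [0, 1]; the name auto_norm here is illustrative, and NumPy is assumed.

import numpy as np

def auto_norm(data):
    """Rescale each feature column to [0, 1] via (x - min) / (max - min)."""
    min_vals = data.min(axis=0)            # per-feature minimum
    ranges = data.max(axis=0) - min_vals   # per-feature value range
    norm = (data - min_vals) / ranges
    return norm, ranges, min_vals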

The following code uses part of datingTestSet2.txt as the labeled sample set and classifies the remaining samples. With k=3 the classification accuracy is 92%; with k=10 it is almost unchanged. Looking at the scatter plot of {percentage of monthly game time, annual flight miles}, I think the points near the class boundaries are what prevents the accuracy from improving further.

# continues from the snippet above: knn, datingDataMat, datingLabels are already defined
normMat, ranges, minVals = knn.autoNorm(datingDataMat)
# print('normMat:', normMat)
# print('ranges:', ranges)
# print('minVals:', minVals)

length = normMat.shape[0]
ratio = 0.8                         # fraction of the data kept as the labeled sample set
numSample = int(ratio * length)
numTest = length - numSample        # the rest is classified and checked against its true label
errCnt = 0
for i in range(numTest):
    val = knn.classify0(normMat[i, :], normMat[numTest:length, :],
                        datingLabels[numTest:length], 10)
    if val != datingLabels[i]:
        errCnt += 1
precision = 1 - errCnt / numTest
print('Precision of prediction:', precision)

KNN Algorithm Analysis:

The KNN algorithm is simple and effective, insensitive to outliers, and makes no assumptions about the input data. It can handle both classification and regression problems (for regression, use the mean of the K nearest points' values as the prediction). However, KNN must store the entire labeled sample set, so its space complexity is high, and its time complexity is also high because the distance to every known sample point has to be computed for each point being classified. Moreover, the K-nearest-neighbor algorithm only looks at where points sit in feature space; it does not capture any intrinsic meaning of the data.
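
As a sketch of the regression variant mentioned above (the name knn_regress is illustrative, NumPy assumed), the prediction is simply the mean of the k nearest neighbors' target values:

import numpy as np

def knn_regress(x, samples, values, k):
    """Predict a numeric value for x as the mean of its k nearest neighbors' values."""
    dists = np.sqrt(((samples - x) ** 2).sum(axis=1))   # distance to every known sample
    nearest = dists.argsort()[:k]                        # indices of the k closest samples
    return float(np.asarray(values)[nearest].mean())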
