KNN algorithm of Artificial intelligence

Source: Internet
Author: User

Reprinted from: https://www.cnblogs.com/magic-girl/p/python-kNN.html

A Python implementation of the KNN algorithm

The k-nearest neighbors algorithm (k-Nearest Neighbor, KNN) is a classification algorithm and one of the simplest algorithms in machine learning. Simple as it is, it works well on certain kinds of problems, which makes it a good starting point for learning machine learning.

The KNN algorithm is straightforward: it selects the k sample points nearest to the test point and outputs the label that occurs most often among those k samples. Assume each sample has m feature values (attributes). A sample can then be represented by an m-dimensional vector $X = (x_1, x_2, \ldots, x_m)$, and the feature values of the test point can likewise be written as $Y = (y_1, y_2, \ldots, y_m)$. So how do we define the "distance" between the two?

In two-dimensional space, the distance $D$ between two points satisfies $D^2 = (x_1 - y_1)^2 + (x_2 - y_2)^2$. In three-dimensional space it satisfies $D^2 = (x_1 - y_1)^2 + (x_2 - y_2)^2 + (x_3 - y_3)^2$. Generalizing to m-dimensional space, we define the distance by $D^2 = (x_1 - y_1)^2 + (x_2 - y_2)^2 + \cdots + (x_m - y_m)^2$. To implement the KNN algorithm, we only need to compute the distance from every sample point to the test point, select the k nearest samples, collect their labels, and return the label that occurs most often among them.
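The steps just described can be sketched in plain Python, without NumPy. This is only an illustrative sketch: the names `squared_distance` and `knn_predict` and the five sample points are hypothetical, not taken from the original article. It ranks samples by squared distance, which gives the same ordering as ranking by distance because the square root is monotonic.

```python
from collections import Counter


def squared_distance(x, y):
    """Return D^2 = (x1 - y1)^2 + ... + (xm - ym)^2 for two m-dimensional points."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def knn_predict(samples, labels, test_point, k):
    """Return the most common label among the k samples nearest to test_point."""
    # Sort sample indices by squared distance to the test point.
    order = sorted(range(len(samples)),
                   key=lambda i: squared_distance(samples[i], test_point))
    nearest_labels = [labels[i] for i in order[:k]]
    # Count the labels of the k nearest samples and take the most frequent one.
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical toy data: two points near (1, 1) and three points near (5, 5).
samples = [[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.5, 4.5], [4.8, 5.2]]
labels = ["a", "a", "b", "b", "b"]

prediction = knn_predict(samples, labels, [1.1, 1.0], 3)
print(prediction)  # the 3 nearest neighbors carry labels a, a, b, so this prints a
```

With k = 3 the vote is two to one in favor of label a, even though one of the three neighbors is far away. Choosing k is therefore a trade-off between noise sensitivity (small k) and pulling in distant points (large k).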

Before implementing the algorithm, we have to consider one problem: the value ranges of different features can differ greatly. Suppose we want to determine a person's gender. A girl is 1.70 m tall and weighs 60 kg, a boy is 1.80 m tall and weighs 70 kg, and a person of unknown gender is 1.81 m tall and weighs 64 kg. The squared "distance" between this person and the girl's data point is $D^2 = (1.70 - 1.81)^2 + (60 - 64)^2 = 0.0121 + 16.0 = 16.0121$, while the squared "distance" to the boy's data point is $D^2 = (1.80 - 1.81)^2 + (70 - 64)^2 = 0.0001 + 36.0 = 36.0001$. In this case the squared height difference is negligible next to the squared weight difference, so the unknown person is judged closer to the girl, even though height is very important for identifying gender. To solve this problem, the data must be normalized: subtract each feature's minimum from its values and divide by the feature's range (maximum minus minimum), so that every normalized value in the dataset lies between 0 and 1. We write a normData function to normalize the dataset:
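Before the full listing, the effect of normalization on the height and weight example can be checked with a small sketch in plain Python. The helper names `min_max_normalize` and `squared_distance` are hypothetical. The sketch assumes the normalization uses the minimum and range of the two known samples, and that the unknown point is scaled with those same values.

```python
def min_max_normalize(samples):
    """Scale each feature of `samples` to [0, 1] using its own minimum and range."""
    columns = list(zip(*samples))
    mins = [min(col) for col in columns]
    ranges = [max(col) - min(col) for col in columns]
    scaled = [[(v - lo) / r for v, lo, r in zip(row, mins, ranges)]
              for row in samples]
    return scaled, mins, ranges


def squared_distance(x, y):
    """Return the sum of squared per-feature differences between x and y."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


samples = [[1.70, 60.0], [1.80, 70.0]]  # [height in m, weight in kg]
labels = ["girl", "boy"]
unknown = [1.81, 64.0]

# Squared distances on the raw features: weight dominates.
raw = [squared_distance(s, unknown) for s in samples]

# Squared distances after min-max normalization: both features count.
scaled, mins, ranges = min_max_normalize(samples)
scaled_unknown = [(v - lo) / r for v, lo, r in zip(unknown, mins, ranges)]
normalized = [squared_distance(s, scaled_unknown) for s in scaled]

nearest_raw = labels[raw.index(min(raw))]
nearest_normalized = labels[normalized.index(min(normalized))]
print(nearest_raw, nearest_normalized)  # prints: girl boy
```

On the raw features the nearest sample is the girl (about 16.01 versus 36.00). After normalization it is the boy (about 0.37 versus 1.37). The normalized height of the unknown person is about 1.1, slightly above 1, because a test point can fall outside the range of the known samples. A feature with the same value in every sample would have a range of 0 and would need special handling to avoid division by zero.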

Full code:

    import numpy as np
    import operator as opt


    def normData(dataSet):
        # Min-max normalization: scale every feature (column) to the range 0~1
        maxVals = dataSet.max(axis=0)
        minVals = dataSet.min(axis=0)
        ranges = maxVals - minVals
        retData = (dataSet - minVals) / ranges
        return retData, ranges, minVals


    def kNN(dataSet, labels, testData, k):
        distSquareMat = (dataSet - testData) ** 2  # squared difference for each feature
        distSquareSums = distSquareMat.sum(axis=1)  # sum the squared differences in each row
        distances = distSquareSums ** 0.5  # square root gives the distance from each sample to the test point
        sortedIndices = distances.argsort()  # indices that sort the distances in ascending order
        indices = sortedIndices[:k]  # take the k smallest
        labelCount = {}  # number of occurrences of each label
        for i in indices:
            label = labels[i]
            labelCount[label] = labelCount.get(label, 0) + 1  # increment the count
        # sort labels by number of occurrences, largest first
        sortedCount = sorted(labelCount.items(), key=opt.itemgetter(1), reverse=True)
        return sortedCount[0][0]  # return the most frequent label


    if __name__ == "__main__":
        dataSet = np.array([[2, 3], [6, 8]])
        normDataSet, ranges, minVals = normData(dataSet)
        labels = ['a', 'b']
        testData = np.array([3.9, 5.5])
        # normalize the test point with the same minimums and ranges as the dataset
        normTestData = (testData - minVals) / ranges
        result = kNN(normDataSet, labels, normTestData, 1)
        print(result)

The program prints a, which matches the expected result: after normalization the test point is slightly closer to the first sample than to the second.
