Machine learning can be divided into supervised learning and unsupervised learning. In supervised learning, each training input comes with a known class label, so the task is to learn how inputs map to those labels. In unsupervised learning, the final classification is not known in advance, and no target values are given.
The k-nearest neighbor algorithm is a supervised classification algorithm. The idea: if, among the k samples most similar to a given sample (that is, its nearest neighbors in feature space), the majority belong to one category, then the sample belongs to that category.
We need to classify, but what is the basis for classification? Each object has its feature points, and these are the basis for classification. There can be many features, and generally the more features, the more accurate the classification.
Machine learning learns the classification from samples, so we first need to supply our samples, already divided into classes. Suppose each sample has two features, and we input three samples A, B, and C as [[1.0, 1.1], [1.0, 1.0], [0, 0]]. Then we input the target point, also given as features; the ultimate goal is to see whether its features are closer to class A or to class B. If we treat the features as coordinates, then a point with n features is a coordinate in n dimensions, and closeness is simply the distance between coordinates. So the question becomes: how do we tell whether the target is closer to A or to B?
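As a small sketch of my own (not part of the original code below): treating the feature points as coordinates, "closeness" is just the Euclidean distance between the two coordinate vectors.

```python
import numpy

# two feature vectors, reusing values from the samples in this post
a = numpy.array([1.0, 1.1])
b = numpy.array([0.0, 0.0])

# Euclidean distance: square the per-feature differences, sum them, take the root
distance = numpy.sqrt(numpy.sum((a - b) ** 2))
print(distance)  # about 1.4866
```

This is exactly the computation the classifier below performs, just for a single pair of points.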
Here is the code directly, in Python. First we input the feature labels and the sample group.
The modules that need to be imported at the outset:
```python
# scientific computing package
import numpy
# operator module
import operator
```
Simulated data samples and their classifications:
```python
# manually build a data source matrix (group) and its classification labels
def createDataSet():
    group = numpy.array([[1.0, 1.1], [1.0, 1.0], [5., 2.], [5.0, 0.1]])
    labels = ['A', 'A', 'B', 'B']
    return group, labels
```
Then comes the kNN algorithm itself.
```python
# newInput is the point to classify, dataSet is the matrix of samples,
# labels holds the classifications, k is the number of neighbors to take
def knnClassify(newInput, dataSet, labels, k):
    # read the number of rows of the matrix, i.e. the number of samples
    numSamples = dataSet.shape[0]
    print('numSamples:', numSamples)

    # tile newInput into the same shape as dataSet (rows = numSamples),
    # then subtract each sample point from the input point, feature by feature
    diff = numpy.tile(newInput, (numSamples, 1)) - dataSet
    print('diff:', diff)

    # square the differences
    squaredDiff = diff ** 2
    print('squaredDiff:', squaredDiff)

    # axis=0 sums by column, axis=1 sums by row
    squaredDist = numpy.sum(squaredDiff, axis=1)
    print('squaredDist:', squaredDist)

    # take the square root, and the distances come out
    distance = squaredDist ** 0.5
    print('distance:', distance)

    # argsort returns the indices that sort the distances in ascending order
    sortedDistIndices = numpy.argsort(distance)
    print('sortedDistIndices:', sortedDistIndices)

    classCount = {}
    for i in range(k):
        # the category (value) corresponding to the i-th smallest distance (key)
        voteLabel = labels[sortedDistIndices[i]]
        print('voteLabel:', voteLabel)

        # count the votes of the first k neighbors; their distances are not
        # compared against each other, every vote counts equally
        classCount[voteLabel] = classCount.get(voteLabel, 0) + 1
    print('classCount:', classCount)

    # return the class with the largest share of votes
    sortedClassCount = sorted(classCount.items(),
                              key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]
```
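For comparison, here is a shorter sketch of the same idea (my own rewrite, not from the original post), using `numpy.linalg.norm` to compute all the distances at once via broadcasting and `collections.Counter` to tally the votes:

```python
import numpy
from collections import Counter

def knn_classify_short(new_input, data_set, labels, k):
    # distance from new_input to every sample row at once (broadcasting)
    distances = numpy.linalg.norm(data_set - new_input, axis=1)
    # labels of the k nearest samples, each carrying one equal vote
    k_labels = [labels[i] for i in numpy.argsort(distances)[:k]]
    # most common label among the k votes
    return Counter(k_labels).most_common(1)[0][0]

group = numpy.array([[1.0, 1.1], [1.0, 1.0], [5., 2.], [5.0, 0.1]])
labels = ['A', 'A', 'B', 'B']
print(knn_classify_short(numpy.array([0, 0]), group, labels, 3))  # prints A
```

The logic is identical to the longer version; it just leans on library helpers instead of spelling out each step.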
The final test:
```python
dataSet, labels = createDataSet()

testX = numpy.array([0, 0])
k = 3
outputLabel = knnClassify(testX, dataSet, labels, k)
print('Your input is:', testX, 'and classified to class:', outputLabel)
```
Running it, we can see the output:
```
numSamples: 4
diff: [[-1.  -1.1]
 [-1.  -1. ]
 [-5.  -2. ]
 [-5.  -0.1]]
squaredDiff: [[  1.00000000e+00   1.21000000e+00]
 [  1.00000000e+00   1.00000000e+00]
 [  2.50000000e+01   4.00000000e+00]
 [  2.50000000e+01   1.00000000e-02]]
squaredDist: [  2.21   2.    29.    25.01]
distance: [ 1.48660687  1.41421356  5.38516481  5.0009999 ]
sortedDistIndices: [1 0 3 2]
voteLabel: A
voteLabel: A
voteLabel: B
classCount: {'A': 2, 'B': 1}
Your input is: [0 0] and classified to class: A
```
Here I have always had a question about the value of k: changing k can change the result, yet within the k nearest neighbors the actual distances no longer matter, since every neighbor gets an equally weighted vote. That is exactly why it is called the k-nearest neighbor classification algorithm.
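To make that point about k concrete, here is a small experiment of my own (the test point [3.0, 0.5] is my invention, chosen to sit roughly between the two sample clusters) showing the prediction flipping as k grows:

```python
import numpy
from collections import Counter

group = numpy.array([[1.0, 1.1], [1.0, 1.0], [5., 2.], [5.0, 0.1]])
labels = ['A', 'A', 'B', 'B']
test_x = numpy.array([3.0, 0.5])  # a point roughly between the two clusters

# distances to all samples, and the sample indices sorted nearest-first
distances = numpy.linalg.norm(group - test_x, axis=1)
order = numpy.argsort(distances)

for k in (1, 3):
    votes = Counter(labels[i] for i in order[:k])
    print(k, '->', votes.most_common(1)[0][0])
# prints: 1 -> B  then  3 -> A
```

The single nearest neighbor happens to be a B, but two of the three nearest are A's, so the majority vote flips the answer.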