The k-nearest neighbor algorithm for machine learning

Machine learning can be divided into supervised learning and unsupervised learning. In supervised learning the classes are known in advance: each training input, say [a, b, c], comes with the class it belongs to. In unsupervised learning the final classification is not known beforehand, and no target values are given.

The k-nearest neighbor algorithm is a supervised classification algorithm. The idea: if most of the K samples most similar to a given sample (that is, its nearest neighbors in feature space) belong to one category, then the sample belongs to that category too.
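In other words, the K nearest neighbors cast votes and the majority label wins. A minimal sketch of that voting step (my own illustration, not part of the original code):

from collections import Counter

# suppose the 3 nearest neighbors of a sample carry these labels
neighborLabels = ['A', 'A', 'B']
# the majority label decides the class: here 'A' wins 2 to 1
print Counter(neighborLabels).most_common(1)[0][0]  # 'A'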

To classify, we need a basis for classification. Every object has its feature points, and these are the basis: there can be many features, and generally the more features there are, the more accurate the classification.

Machine learning learns the classification from samples, so we must first supply samples that have already been labeled. Suppose each sample has two features, a and b, and we input three samples (call them samples 1, 2 and 3): [[1.0, 1.1], [1.0, 1.0], [0, 0]]. Then we input the target, also given as features; the final goal is to see whether the target is closer to the A samples or to the B samples. If we treat the features as coordinates, a sample with n feature points is a point in n-dimensional space, so closeness is simply the distance between coordinates. The question, then, is how to measure whether the target is closer to A or to B.
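The usual measure is straight-line (Euclidean) distance. A small illustration (my own addition; the values reuse the samples above):

import numpy

# Euclidean distance between the input [0, 0] and the sample [1.0, 1.1]:
# sqrt((0 - 1.0)^2 + (0 - 1.1)^2)
a = numpy.array([1.0, 1.1])
x = numpy.array([0, 0])
dist = numpy.sqrt(numpy.sum((x - a) ** 2))
print dist  # about 1.4866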

Here is the code, in Python. First we input the feature labels and the sample group.

Modules that need to be imported at the outset

# scientific computing package
# from numpy import *
import numpy

# operator module
import operator

Data samples and classification simulations

# manually build the data source matrix (group) and its classification labels
def createDataSet():
    group = numpy.array([[1.0, 1.1], [1.0, 1.0], [5., 2.], [5.0, 0.1]])
    labels = ['A', 'A', 'B', 'B']
    return group, labels

Then the KNN algorithm is used.

# newInput is the input target to classify, dataSet is the sample matrix,
# labels are the sample classifications, k is the number of neighbors to take
def kNNClassify(newInput, dataSet, labels, k):
    # read the number of rows of the matrix, i.e. the number of samples
    numSamples = dataSet.shape[0]
    print 'numSamples:', numSamples

    # tile the input so it has the same number of rows as the dataset
    # (numSamples copies down, 1 copy across), then subtract the sample
    # points from the input point, feature by feature
    diff = numpy.tile(newInput, (numSamples, 1)) - dataSet
    print 'diff:', diff

    # square the differences
    squaredDiff = diff ** 2
    print 'squaredDiff:', squaredDiff

    # axis=0 sums by column, axis=1 sums by row
    squaredDist = numpy.sum(squaredDiff, axis=1)
    print 'squaredDist:', squaredDist

    # take the square root, and the distances come out
    distance = squaredDist ** 0.5
    print 'distance:', distance

    # sort the distances in ascending order and return the sorted indices
    sortedDistIndices = numpy.argsort(distance)
    print 'sortedDistIndices:', sortedDistIndices

    classCount = {}
    for i in range(k):
        # the category (value) corresponding to this distance rank
        voteLabel = labels[sortedDistIndices[i]]
        print 'voteLabel:', voteLabel

        # count the votes of the first k neighbors; their actual distances
        # are no longer compared, each vote counts equally
        classCount[voteLabel] = classCount.get(voteLabel, 0) + 1
    print 'classCount:', classCount

    # return the label with the largest share of the votes
    sortedClassCount = sorted(classCount.iteritems(), key=operator.itemgetter(1), reverse=True)

    return sortedClassCount[0][0]
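For a sanity check (my own addition, assuming scikit-learn is available; the original post uses only NumPy), the same data can be run through the library classifier, which should agree with the hand-written version:

from sklearn.neighbors import KNeighborsClassifier
import numpy

group = numpy.array([[1.0, 1.1], [1.0, 1.0], [5., 2.], [5.0, 0.1]])
labels = ['A', 'A', 'B', 'B']

clf = KNeighborsClassifier(n_neighbors=3)  # same k = 3 as in the test below
clf.fit(group, labels)
print clf.predict([[0, 0]])  # expected to match the hand-written result: ['A']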

The final test

dataSet, labels = createDataSet()

testX = numpy.array([0, 0])
k = 3
outputLabel = kNNClassify(testX, dataSet, labels, k)
print "Your input is:", testX, "and classified to class:", outputLabel

Running this, we can see the output:

numSamples: 4
diff: [[-1.  -1.1]
 [-1.  -1. ]
 [-5.  -2. ]
 [-5.  -0.1]]
squaredDiff: [[  1.00000000e+00   1.21000000e+00]
 [  1.00000000e+00   1.00000000e+00]
 [  2.50000000e+01   4.00000000e+00]
 [  2.50000000e+01   1.00000000e-02]]
squaredDist: [  2.21   2.    29.    25.01]
distance: [ 1.48660687  1.41421356  5.38516481  5.0009999 ]
sortedDistIndices: [1 0 3 2]
voteLabel: A
voteLabel: A
voteLabel: B
classCount: {'A': 2, 'B': 1}
Your input is: [0 0] and classified to class: A
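As a further check (my own addition, not in the original output), a test point that sits near the two B samples should come back as class B:

testX = numpy.array([5.0, 1.0])
print kNNClassify(testX, dataSet, labels, 3)
# the nearest neighbors are [5., 2.] and [5.0, 0.1], both labeled B,
# so B wins the vote 2 to 1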

I have always had a question here about the value of k: the result may change as the value of k changes, yet within the chosen k all neighbors count the same regardless of their actual distances. That is why it is called the k-nearest neighbor classification algorithm.
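To see this concretely (my own sketch, reusing the functions above), the same input can be classified with several values of k:

dataSet, labels = createDataSet()
for k in range(1, 5):
    print 'k =', k, '->', kNNClassify(numpy.array([0, 0]), dataSet, labels, k)
# k = 1, 2 and 3 all answer A; at k = 4 the vote is tied 2-2 between A and B,
# and the winner depends on dictionary ordering -- one reason an odd k is
# usually preferred for two-class problems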
