Machine learning can be divided into supervised learning and unsupervised learning. In supervised learning, each training input comes with a known class label, so the task is to learn how inputs map to those labels. In unsupervised learning, the final classification is not known in advance, and no target values are given.
The k-nearest neighbor algorithm is a supervised classification algorithm. The idea: if, among the k samples most similar to a given sample (that is, its nearest neighbors in feature space), the majority belong to one category, then the sample belongs to that category.
We need to classify, but what is the basis for classification? Each object has its feature points, and these are the basis for classification. There can be many features, and generally the more features, the more accurate the classification.
Machine learning learns the classification from samples, so we first need to supply our samples, already divided into classes. Suppose each sample has two features, and we input three samples A, B, and C as [[1.0, 1.1], [1.0, 1.0], [0, 0]]. Then we input the target point, also given as features; the ultimate goal is to see whether its features are closer to class A or to class B. If we treat the features as coordinates, then a point with n features is a coordinate in n dimensions, and closeness is simply the distance between coordinates. So the question becomes: how do we tell whether the target is closer to A or to B?
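As a small sketch of my own (not part of the original code below): treating the feature points as coordinates, "closeness" is just the Euclidean distance between the two coordinate vectors.

```python
import numpy

# two feature vectors, reusing values from the samples in this post
a = numpy.array([1.0, 1.1])
b = numpy.array([0.0, 0.0])

# Euclidean distance: square the per-feature differences, sum them, take the root
distance = numpy.sqrt(numpy.sum((a - b) ** 2))
print(distance)  # about 1.4866
```

This is exactly the computation the classifier below performs, just for a single pair of points.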
Here is the code directly, in Python. First we input the feature labels and the sample group.
The modules that need to be imported at the outset:
```python
# scientific computing package
import numpy
# operator module
import operator
```
Simulated data samples and their classifications:
```python
# manually build a data source matrix (group) and its classification labels
def createDataSet():
    group = numpy.array([[1.0, 1.1], [1.0, 1.0], [5., 2.], [5.0, 0.1]])
    labels = ['A', 'A', 'B', 'B']
    return group, labels
```
Then comes the kNN algorithm itself.
```python
# newInput is the point to classify, dataSet is the matrix of samples,
# labels holds the classifications, k is the number of neighbors to take
def knnClassify(newInput, dataSet, labels, k):
    # read the number of rows of the matrix, i.e. the number of samples
    numSamples = dataSet.shape[0]
    print('numSamples:', numSamples)

    # tile newInput into the same shape as dataSet (rows = numSamples),
    # then subtract each sample point from the input point, feature by feature
    diff = numpy.tile(newInput, (numSamples, 1)) - dataSet
    print('diff:', diff)

    # square the differences
    squaredDiff = diff ** 2
    print('squaredDiff:', squaredDiff)

    # axis=0 sums by column, axis=1 sums by row
    squaredDist = numpy.sum(squaredDiff, axis=1)
    print('squaredDist:', squaredDist)

    # take the square root, and the distances come out
    distance = squaredDist ** 0.5
    print('distance:', distance)

    # argsort returns the indices that sort the distances in ascending order
    sortedDistIndices = numpy.argsort(distance)
    print('sortedDistIndices:', sortedDistIndices)

    classCount = {}
    for i in range(k):
        # the category (value) corresponding to the i-th smallest distance (key)
        voteLabel = labels[sortedDistIndices[i]]
        print('voteLabel:', voteLabel)

        # count the votes of the first k neighbors; their distances are not
        # compared against each other, every vote counts equally
        classCount[voteLabel] = classCount.get(voteLabel, 0) + 1
    print('classCount:', classCount)

    # return the class with the largest share of votes
    sortedClassCount = sorted(classCount.items(),
                              key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]
```
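For comparison, here is a shorter sketch of the same idea (my own rewrite, not from the original post), using `numpy.linalg.norm` to compute all the distances at once via broadcasting and `collections.Counter` to tally the votes:

```python
import numpy
from collections import Counter

def knn_classify_short(new_input, data_set, labels, k):
    # distance from new_input to every sample row at once (broadcasting)
    distances = numpy.linalg.norm(data_set - new_input, axis=1)
    # labels of the k nearest samples, each carrying one equal vote
    k_labels = [labels[i] for i in numpy.argsort(distances)[:k]]
    # most common label among the k votes
    return Counter(k_labels).most_common(1)[0][0]

group = numpy.array([[1.0, 1.1], [1.0, 1.0], [5., 2.], [5.0, 0.1]])
labels = ['A', 'A', 'B', 'B']
print(knn_classify_short(numpy.array([0, 0]), group, labels, 3))  # prints A
```

The logic is identical to the longer version; it just leans on library helpers instead of spelling out each step.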
The final test:
```python
dataSet, labels = createDataSet()

testX = numpy.array([0, 0])
k = 3
outputLabel = knnClassify(testX, dataSet, labels, k)
print('Your input is:', testX, 'and classified to class:', outputLabel)
```
Running it, we can see the output:
```
numSamples: 4
diff: [[-1.  -1.1]
 [-1.  -1. ]
 [-5.  -2. ]
 [-5.  -0.1]]
squaredDiff: [[  1.00000000e+00   1.21000000e+00]
 [  1.00000000e+00   1.00000000e+00]
 [  2.50000000e+01   4.00000000e+00]
 [  2.50000000e+01   1.00000000e-02]]
squaredDist: [  2.21   2.    29.    25.01]
distance: [ 1.48660687  1.41421356  5.38516481  5.0009999 ]
sortedDistIndices: [1 0 3 2]
voteLabel: A
voteLabel: A
voteLabel: B
classCount: {'A': 2, 'B': 1}
Your input is: [0 0] and classified to class: A
```
Here I have always had a question about the value of k: changing k can change the result, yet within the k nearest neighbors the actual distances no longer matter, since every neighbor gets an equally weighted vote. That is exactly why it is called the k-nearest neighbor classification algorithm.
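To make that point about k concrete, here is a small experiment of my own (the test point [3.0, 0.5] is my invention, chosen to sit roughly between the two sample clusters) showing the prediction flipping as k grows:

```python
import numpy
from collections import Counter

group = numpy.array([[1.0, 1.1], [1.0, 1.0], [5., 2.], [5.0, 0.1]])
labels = ['A', 'A', 'B', 'B']
test_x = numpy.array([3.0, 0.5])  # a point roughly between the two clusters

# distances to all samples, and the sample indices sorted nearest-first
distances = numpy.linalg.norm(group - test_x, axis=1)
order = numpy.argsort(distances)

for k in (1, 3):
    votes = Counter(labels[i] for i in order[:k])
    print(k, '->', votes.most_common(1)[0][0])
# prints: 1 -> B  then  3 -> A
```

The single nearest neighbor happens to be a B, but two of the three nearest are A's, so the majority vote flips the answer.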