KNN = k-nearest neighbours
Principle: sort the training samples by their distance to the input, take the k nearest (most similar) ones, and use the most frequent category among them as the prediction. Typically, k is no greater than 20.
Below is a simple example; the comments in the code explain each step:
import numpy as np
import operator
import os

def createDataSet():
    group = np.array([[1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]])
    labels = ['A', 'A', 'B', 'B']
    return group, labels

def classify(inX, dataSet, labels, k):
    dataSetSize = dataSet.shape[0]  # number of rows, i.e. number of samples
    # repeat inX once per sample (dataSetSize rows), then subtract the samples
    diffMat = np.tile(inX, (dataSetSize, 1)) - dataSet
    sqDiffMat = diffMat ** 2
    sqDistances = sqDiffMat.sum(axis=1)  # sum along axis 1, one value per row
    distances = sqDistances ** 0.5
    # argsort returns an array of indices, ordered from min to max distance
    sortedDistanceIndices = distances.argsort()
    classCount = dict()  # define a dictionary: label -> number of votes
    for i in range(k):
        voteILabel = labels[sortedDistanceIndices[i]]
        # get(key, default=None)
        classCount[voteILabel] = classCount.get(voteILabel, 0) + 1
    # sorted returns a list like [('C', 4), ('B', 3), ('A', 2)], not a dict
    # itemgetter(0) is the 1st element, so itemgetter(1) selects the count
    # the default order is from min to max, hence reverse=True
    sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]
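The note never actually calls the classifier, so here is a minimal, self-contained sketch of the same idea together with a usage example. It is an alternative formulation, not the code from the note: the name knn_classify and the two query points are made up for illustration, NumPy broadcasting stands in for np.tile, and collections.Counter stands in for the hand-written vote dictionary.

```python
from collections import Counter

import numpy as np


def knn_classify(query, samples, labels, k):
    """Return the majority label among the k samples closest to query.

    Hypothetical helper for illustration; not the function from the note.
    """
    samples = np.asarray(samples, dtype=float)
    query = np.asarray(query, dtype=float)
    # Broadcasting subtracts query from every row, so np.tile is not needed.
    distances = np.sqrt(((samples - query) ** 2).sum(axis=1))
    # Indices of the k smallest distances, nearest first.
    nearest = distances.argsort()[:k]
    # Count one vote per neighbour label.
    votes = Counter(labels[i] for i in nearest)
    # most_common(1) returns [(label, count)] for the top label.
    return votes.most_common(1)[0][0]


# Same four training samples as in the note.
group = np.array([[1.0, 1.1], [1.0, 1.0], [0, 0], [0, 0.1]])
labels = ['A', 'A', 'B', 'B']

print(knn_classify([0.0, 0.2], group, labels, k=3))  # B
print(knn_classify([0.9, 0.9], group, labels, k=3))  # A
```

With k=3, the point (0.0, 0.2) has two 'B' neighbours and one 'A' neighbour, so the vote goes to 'B'; the point (0.9, 0.9) has two 'A' neighbours and one 'B', so the vote goes to 'A'.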
MLiA Study Notes (II): KNN Algorithm