The principle of the KNN algorithm
The k-nearest neighbor (KNN) algorithm is a relatively simple machine learning algorithm. It classifies a sample by measuring the distance between feature vectors. The idea is simple: if most of the K nearest (most similar) samples to a given sample in the feature space belong to one category, then that sample belongs to the category as well.
The steps of the KNN algorithm
First stage: Determine the value of K (the number of nearest neighbors), generally an odd number.
Second stage: Determine the distance measure. Text classification generally uses the angle cosine, cos θ = (A · B) / (|A| |B|), computed between the point to be classified and every sample of known category; the K nearest samples (those with the largest cosine) are then selected.
Third stage: Count how many of the K samples fall into each category. The category with the most samples is assigned to the point being classified.
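The three stages can be sketched in a few lines of NumPy. This is a minimal illustration rather than the article's implementation: the function name knn_predict, the toy vectors, and the labels are all assumptions made up for the example.

```python
from collections import Counter

import numpy as np


def knn_predict(query, samples, labels, k=3):
    """Hypothetical helper: classify `query` by majority vote of its k nearest samples."""
    samples = np.asarray(samples, dtype=float)
    query = np.asarray(query, dtype=float)
    # Second stage: angle cosine between the query and every known sample
    cosines = samples @ query / (np.linalg.norm(samples, axis=1) * np.linalg.norm(query))
    # A larger cosine means a smaller angle, so take the k largest values
    nearest = np.argsort(-cosines)[:k]
    # Third stage: count how many of the k neighbors fall into each category
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Toy data (made up): two samples point mostly along x, two mostly along y
samples = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
labels = ["A", "A", "B", "B"]
print(knn_predict([1.0, 0.3], samples, labels, k=3))
```

With only two categories, an odd k cannot produce a tied vote, which is one reason the first stage suggests an odd value.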
Implementation of the KNN algorithm in Python
First stage: For initialization, you can reuse nbayes_lib.py from Nbayes.
Second stage: Implement the angle cosine distance formula
from numpy import *
import operator
from Nbayes_pre import *

k = 3

# Angle cosine distance formula
def cosdist(vector1, vector2):
    return dot(vector1, vector2) / (linalg.norm(vector1) * linalg.norm(vector2))
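To see how the angle cosine behaves, the sketch below evaluates it on a few hand-picked vectors. It redefines the function with an explicit np prefix so that the snippet runs on its own; the example vectors are assumptions chosen purely for illustration.

```python
import numpy as np


def cosdist(vector1, vector2):
    # Cosine of the angle between the two vectors
    return np.dot(vector1, vector2) / (np.linalg.norm(vector1) * np.linalg.norm(vector2))


a = np.array([1.0, 2.0, 0.0])
same_direction = cosdist(a, 3 * a)                  # parallel vectors
orthogonal = cosdist(a, np.array([0.0, 0.0, 5.0]))  # no shared components
partial = cosdist(a, np.array([1.0, 0.0, 0.0]))     # some overlap

print(same_direction, orthogonal, partial)
```

The value is a similarity rather than a distance: parallel vectors give a value close to 1 and orthogonal vectors give 0, so the nearest neighbors are the samples with the largest values. If either vector is all zeros, its norm is zero and the division produces nan, so such samples probably need to be filtered out beforehand.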
Third stage: Implement the KNN classifier
# kNN classifier
# testdata: test sample; trainset: training set; listclasses: category labels; k: number of nearest neighbors
def classify(testdata, trainset, listclasses, k):
    datasetsize = trainset.shape[0]  # number of rows (samples) in the training set
    distances = array(zeros(datasetsize))
    for indx in range(datasetsize):
        distances[indx] = cosdist(testdata, trainset[indx])
    # Sort the angle cosines from largest to smallest; the result is an array of indices
    sorteddistindicies = argsort(-distances)
    classcount = {}
    # Take the k entries with the smallest angle (largest cosine) as the reference
    for i in range(k):
        # Category label of the i-th nearest training sample
        votelilabel = listclasses[sorteddistindicies[i]]
        classcount[votelilabel] = classcount.get(votelilabel, 0) + 1
    # Re-sort the dictionary classcount by value.
    # sorted(data.items(), key=operator.itemgetter(1), reverse=True)
    # is the standard idiom for sorting a dictionary by value:
    # classcount.items() yields the (key, value) pairs, and
    # operator.itemgetter(1) selects the value of each pair as the sort key.
    sortedclasscount = sorted(classcount.items(), key=operator.itemgetter(1), reverse=True)
    return sortedclasscount[0][0]  # the label with the most votes
# Evaluate the classification result
# loaddataset and nbayes are defined in the imported Nbayes module
dataset, listclasses = loaddataset()
nb = nbayes()
nb.train_set(dataset, listclasses)
# Classify using the dataset and the tf vectors generated in the earlier Bayesian classification stage
print(classify(nb.tf[3], nb.tf, listclasses, k))
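The call above classifies nb.tf[3] against a training set that still contains that same row, so the sample has cosine 1 with itself and takes one of the k votes. A leave-one-out check avoids this. The sketch below is only an illustration under stated assumptions: nbayes_lib.py is not shown here, so a small made-up term-frequency matrix stands in for nb.tf, and the helper names classify_cosine and leave_one_out_accuracy are hypothetical.

```python
from collections import Counter

import numpy as np


def classify_cosine(testdata, trainset, listclasses, k):
    """Hypothetical stand-in for the classifier: cosine KNN with majority vote."""
    sims = trainset @ testdata / (
        np.linalg.norm(trainset, axis=1) * np.linalg.norm(testdata)
    )
    nearest = np.argsort(-sims)[:k]  # indices of the k largest cosines
    return Counter(listclasses[i] for i in nearest).most_common(1)[0][0]


def leave_one_out_accuracy(tf, listclasses, k):
    """Classify each row against all other rows and return the fraction correct."""
    hits = 0
    for i in range(tf.shape[0]):
        keep = np.arange(tf.shape[0]) != i  # boolean mask that drops row i
        rest_labels = [c for j, c in enumerate(listclasses) if j != i]
        if classify_cosine(tf[i], tf[keep], rest_labels, k) == listclasses[i]:
            hits += 1
    return hits / tf.shape[0]


# Made-up term-frequency rows standing in for nb.tf (6 documents, 4 terms)
tf = np.array([
    [3, 1, 0, 0],
    [2, 2, 0, 1],
    [4, 0, 1, 0],
    [0, 0, 3, 2],
    [1, 0, 2, 3],
    [0, 1, 4, 1],
], dtype=float)
listclasses = [0, 0, 0, 1, 1, 1]

print(leave_one_out_accuracy(tf, listclasses, k=3))
```

Holding each sample out before classifying it gives a less optimistic picture than testing on rows the classifier has already seen.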