Classification algorithms: the k-nearest neighbor algorithm (Python implementation)

Source: Internet
Author: User

The principle of the KNN algorithm

The k-nearest neighbor (KNN) algorithm is a relatively simple machine learning algorithm. It classifies a sample by measuring the distance between feature vectors. The idea is simple: if most of the K nearest (most similar) samples to a given sample in the feature space belong to one category, then that sample belongs to that category as well.

The steps of the KNN algorithm

First stage: Determine the value of K (the number of nearest neighbors). K is generally an odd number, which helps avoid tied votes.

Second stage: Choose the distance measure. Text classification generally uses the cosine of the angle between vectors. Compute it between the data point to be classified and every sample of known category, then select the K nearest samples.

Third stage: Count how many of the K samples fall into each category. The category with the most samples is assigned to the point being classified.
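
The second and third stages can be written compactly as follows. The notation is introduced here for illustration and does not come from the original article: x is the point to classify, x_i and y_i are a training sample and its category, and N_K(x) is the set of the K training samples with the largest cosine to x.

```latex
\cos\theta(x, x_i) = \frac{x \cdot x_i}{\lVert x \rVert \, \lVert x_i \rVert},
\qquad
\hat{y} = \arg\max_{c} \sum_{i \in N_K(x)} \mathbf{1}[\, y_i = c \,]
```

Because the cosine measures similarity rather than distance, the "nearest" samples are the ones with the largest cosine value.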

Implementation of the KNN algorithm in Python

First stage: Initialization. You can reuse nbayes_lib.py from the naive Bayes (Nbayes) implementation.

Second stage: Implement the angle-cosine distance formula

from numpy import *
import operator
from Nbayes_pre import *

k = 3

# Angle-cosine formula: a larger value means a smaller angle, i.e. a closer sample
def cosdist(vector1, vector2):
    return dot(vector1, vector2) / (linalg.norm(vector1) * linalg.norm(vector2))
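
To make the behaviour of the angle cosine concrete, here is a minimal standalone sketch. It uses only the Python standard library so that it runs without NumPy or the Nbayes module. The function name `cosine` and the sample vectors are illustrative assumptions, not part of the original code.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

same_direction = cosine([1.0, 2.0], [2.0, 4.0])   # parallel vectors
orthogonal = cosine([1.0, 0.0], [0.0, 3.0])       # perpendicular vectors
opposite = cosine([1.0, 2.0], [-1.0, -2.0])       # opposite directions

print(same_direction, orthogonal, opposite)
```

The three values come out at approximately 1, 0 and -1, which is why a classifier built on this measure has to rank samples from the largest value to the smallest. A zero vector would cause a division by zero, so this sketch assumes every vector has at least one non-zero component.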

Third stage: Implement the KNN classifier

# kNN classifier
# testData: test sample; trainSet: training set; listClasses: category labels; k: number of neighbors
def classify(testData, trainSet, listClasses, k):
    dataSetSize = trainSet.shape[0]  # number of rows (samples) in the training set
    distances = zeros(dataSetSize)
    for indx in range(dataSetSize):
        distances[indx] = cosdist(testData, trainSet[indx])
    # Sort the cosines from large to small; argsort returns the sample indices
    sortedDistIndicies = argsort(-distances)
    classCount = {}
    # Take the k samples with the smallest angle (largest cosine) as reference
    for i in range(k):
        voteIlabel = listClasses[sortedDistIndicies[i]]  # label of the i-th nearest sample
        classCount[voteIlabel] = classCount.get(voteIlabel, 0) + 1  # one vote for that label
    # Sort the classCount dictionary by value, largest first.
    # sorted(d.items(), key=operator.itemgetter(1), reverse=True) is the standard idiom:
    # d.items() yields (label, count) pairs and itemgetter(1) selects the count as the sort key.
    sortedClassCount = sorted(classCount.items(), key=operator.itemgetter(1), reverse=True)
    return sortedClassCount[0][0]  # label with the most votes

# Evaluate the classification result
# loadDataSet and NBayes come from the naive Bayes module imported above
dataSet, listClasses = loadDataSet()
nb = NBayes()
nb.train_set(dataSet, listClasses)
# Classify using the dataset and the TF vectors generated in the naive Bayes stage
print(classify(nb.tf[3], nb.tf, listClasses, k))
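
The evaluation code above depends on the naive Bayes module for its data. For readers who do not have that module, the following is a minimal self-contained sketch of the same idea (cosine ranking followed by a majority vote) using only the standard library. The names `cosine` and `knn_classify`, the toy term-frequency vectors, and the labels "sports" and "tech" are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def knn_classify(test_vec, train_vecs, labels, k):
    """Return the majority label among the k training vectors most similar to test_vec."""
    sims = [cosine(test_vec, vec) for vec in train_vecs]
    # Indices of the training samples, most similar first
    ranked = sorted(range(len(train_vecs)), key=lambda i: sims[i], reverse=True)
    # Count one vote per neighbor among the top k
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical term-frequency vectors over a 4-word vocabulary
train_vecs = [
    [3, 1, 0, 0],
    [2, 2, 0, 1],
    [4, 0, 1, 0],
    [0, 0, 3, 2],
    [0, 1, 2, 3],
    [1, 0, 4, 4],
]
labels = ["sports", "sports", "sports", "tech", "tech", "tech"]

predicted = knn_classify([0, 0, 2, 1], train_vecs, labels, k=3)
print(predicted)
```

In this toy data the three vectors most similar to the test vector all carry the label "tech", so that is the predicted category. `Counter` replaces the manual dictionary and sort used in the article's classifier; it is a design choice made for brevity here, not the article's method.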

