KNN (k-nearest neighbors), a supervised classification algorithm:
(1) Calculate the distance between the current point and every point in a dataset whose categories are known
(2) Sort the points in ascending order of distance
(3) Select the K points with the smallest distance from the current point
(4) Count how often each category occurs among these K points
(5) Return the category with the highest frequency among these K points as the predicted classification of the current point
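The five steps can be sketched on in-memory data as follows. This is only a minimal illustration: the name `knn_classify` and the toy points are hypothetical, Euclidean distance is assumed, and ties between categories are broken by whichever category was counted first.

```python
from collections import Counter
from math import sqrt


def knn_classify(labeled_points, query, k):
    """Return the majority label among the k labeled points nearest to query."""
    # (1) Euclidean distance from the query to every labeled point
    distances = [
        (sqrt(sum((p - q) ** 2 for p, q in zip(point, query))), label)
        for point, label in labeled_points
    ]
    # (2) sort in ascending order of distance
    distances.sort(key=lambda pair: pair[0])
    # (3) keep the k nearest points
    nearest = distances[:k]
    # (4) count how often each label occurs among them
    votes = Counter(label for _, label in nearest)
    # (5) the most frequent label is the prediction
    return votes.most_common(1)[0][0]


# Hypothetical toy data: (coordinates, label)
toy_points = [((1, 2), "a"), ((1, 3), "a"), ((1, 4), "a"),
              ((6, 2), "b"), ((6, 3), "b")]
print(knn_classify(toy_points, (1, 1), k=3))
```

With K larger than the number of points in the nearest cluster, points from other categories start to vote, so K is usually kept small relative to the dataset.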
# Data sample
1 2:a
1 3:a
1 4:a
1 5:b
6 2:b
6 3:b
200:c
101 199:c
444:d
299 50:D
10000:d
# Version 0: pure Python
"KNN"
from math import sqrt
from collections import Counter

def distance(a, b):
    # Euclidean distance; None when the two points differ in dimension
    if len(a) != len(b):
        return None
    return sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def distance2(a, b):
    # Same as distance(), for space-separated strings such as '1 2'
    return distance([int(i) for i in a.split()], [int(i) for i in b.split()])

# print(distance2('1 2 4 7 8', '2 5 5 6 9'))

def readData(file):
    # Each line is 'coordinates:category'; returns {coordinates: category}
    with open(file) as f:
        return dict(line.strip().split(':') for line in f if line.strip())

def judgeSpot2(dataIn, x='1 2', num=5):
    # Classify x by majority vote among its num nearest points in dataIn
    pairs = [(distance2(x, k), kind) for k, kind in dataIn.items()]
    # Points whose dimension differs from x cannot be compared and are skipped
    nearest = sorted((p for p in pairs if p[0] is not None), key=lambda p: p[0])[:num]
    return Counter(kind for _, kind in nearest).most_common(1)[0][0]

def judgeSpot(fileIn='test0.txt', x='1 2', num=5):
    # Same as judgeSpot2(), reading the labeled data from a file
    return judgeSpot2(readData(fileIn), x, num)

print(judgeSpot('test0.txt', '10000'))

# Rate of right: share of points classified correctly
def rateRight(fileIn='test0.txt', num=5):
    data = readData(fileIn)
    countRight = sum(1 for k in data if judgeSpot2(data, k, num) == data[k])
    return countRight / float(len(data))

print(rateRight())
# Version 1: NumPy (to be implemented)
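One possible shape for this version, offered only as a sketch and not as the author's implementation: the function name `knn_classify_np` and its parameters are hypothetical, and it assumes every point has the same number of coordinates, so single-coordinate sample lines would have to be handled separately.

```python
from collections import Counter

import numpy as np


def knn_classify_np(points, labels, query, k=5):
    """Return the majority label among the k rows of points nearest to query."""
    points = np.asarray(points, dtype=float)  # shape (n, d)
    query = np.asarray(query, dtype=float)    # shape (d,)
    # Broadcasting subtracts the query from every row at once
    distances = np.sqrt(((points - query) ** 2).sum(axis=1))
    # Indices of the k smallest distances, nearest first
    nearest = np.argsort(distances, kind="stable")[:k]
    # Majority vote over the labels of those rows
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data with two coordinates per point
sample_points = [[1, 2], [1, 3], [1, 4], [6, 2], [6, 3]]
sample_labels = ["a", "a", "a", "b", "b"]
print(knn_classify_np(sample_points, sample_labels, [1, 1], k=3))
```

Compared with the pure Python version, the distance computation becomes a single array expression, while the vote is still counted with `Counter`.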