This article describes a Python implementation of the KNN algorithm and a simple digit recognition method built on it. It is shared here for reference. The details are as follows:
Advantages and disadvantages of the KNN algorithm:
Advantages: high accuracy, insensitive to outliers, no assumptions about the input data
Disadvantages: high time complexity and high space complexity
Applicable data types: numeric and nominal
The idea of the algorithm:
The idea behind KNN (in full, the k-nearest neighbors algorithm) is very simple. Put plainly, it is "birds of a feather flock together": from a set of labeled training samples we find the k samples closest to the target, check which class most of them belong to, and use that class as the classification of the target.
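The idea described above can be sketched in a few lines of plain Python before bringing in NumPy. This is only an illustrative sketch: the function name knn_vote is hypothetical, the sample points are the same four points used in the article's test array, and math.dist assumes Python 3.8 or later.

```python
from collections import Counter
import math

def knn_vote(target, samples, labels, k):
    """Return the majority label among the k samples closest to target."""
    # Pair each sample's Euclidean distance to the target with its label.
    dists = [(math.dist(target, s), lab) for s, lab in zip(samples, labels)]
    # Keep the k closest samples.
    nearest = sorted(dists, key=lambda pair: pair[0])[:k]
    # Count the labels of those k samples and return the most common one.
    votes = Counter(lab for _, lab in nearest)
    return votes.most_common(1)[0][0]

samples = [(1.0, 0.9), (1.0, 1.0), (0.1, 0.2), (0.0, 0.1)]
labels = ['A', 'A', 'B', 'B']
print(knn_vote((0.0, 0.0), samples, labels, 3))  # prints B
```

With k=3 the three closest points to (0, 0) carry the labels B, B and A, so the vote goes to B.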
Function overview:
Library functions:
tile()
For example, tile(A, n) repeats the array A n times.
The code is as follows:
>>> import numpy as np
>>> a = np.array([0, 1, 2])
>>> np.tile(a, 2)
array([0, 1, 2, 0, 1, 2])
>>> np.tile(a, (2, 2))
array([[0, 1, 2, 0, 1, 2], [0, 1, 2, 0, 1, 2]])
>>> np.tile(a, (2, 1, 2))
array([[[0, 1, 2, 0, 1, 2]], [[0, 1, 2, 0, 1, 2]]])
>>> b = np.array([[1, 2], [3, 4]])
>>> np.tile(b, 2)
array([[1, 2, 1, 2], [3, 4, 3, 4]])
>>> np.tile(b, (2, 1))
array([[1, 2], [3, 4], [1, 2], [3, 4]])
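In the classification code later in this article, tile is used to stack the input vector once per training row so that it can be subtracted from the whole training matrix. The sketch below shows that step on its own, and also shows that NumPy broadcasting gives the same difference matrix without an explicit tile. The variable names here are illustrative and not taken from the article.

```python
import numpy as np

x = np.array([0.0, 0.0])  # one query point
data = np.array([[1.0, 0.9], [1.0, 1.0], [0.1, 0.2], [0.0, 0.1]])

# Stack x once per training row, giving the same shape as data.
tiled = np.tile(x, (data.shape[0], 1))
diff_tile = tiled - data

# Broadcasting stretches x across the rows of data automatically.
diff_broadcast = x - data

print(tiled.shape)                                # (4, 2)
print(np.array_equal(diff_tile, diff_broadcast))  # True
```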
Custom functions:
createDataSet(): generates a test array
kNNClassify(inputX, dataSet, labels, k): the classification function
inputX: the input vector to classify
dataSet: the training set
labels: the labels of the training set
k: the number of nearest neighbors
The code is as follows:
# coding=utf-8
import numpy as np

def createDataSet():
    group = np.array([[1.0, 0.9], [1.0, 1.0], [0.1, 0.2], [0.0, 0.1]])
    labels = ['A', 'A', 'B', 'B']
    return group, labels

# inputX: the input vector (the sample whose class we want to determine)
# dataSet: the training samples
# labels: the labels of the training samples
# k: the number of nearest neighbors to use
def kNNClassify(inputX, dataSet, labels, k):
    dataSetSize = dataSet.shape[0]  # number of training samples
    # start computing the Euclidean distances
    diffMat = np.tile(inputX, (dataSetSize, 1)) - dataSet
    sqDiffMat = diffMat ** 2
    sqDistances = sqDiffMat.sum(axis=1)  # sum each row of the matrix
    distances = sqDistances ** 0.5
    # Euclidean distances are done
    sortedDistance = distances.argsort()
    classCount = {}
    for i in range(k):
        voteLabel = labels[sortedDistance[i]]
        classCount[voteLabel] = classCount.get(voteLabel, 0) + 1
    # pick the label with the most votes; max(classCount) alone would compare the keys
    res = max(classCount, key=classCount.get)
    return res

def main():
    group, labels = createDataSet()
    t = kNNClassify([0, 0], group, labels, 3)
    print(t)

if __name__ == '__main__':
    main()
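One detail of the voting step is worth a short illustration. Calling max() directly on a dictionary compares its keys, not its counts, so it returns the largest label rather than the most frequent one. The sketch below uses a made-up vote dictionary to show the difference; passing key=votes.get is one way to select by count.

```python
# Hypothetical vote counts: label 'A' received two votes, label 'B' one.
votes = {'A': 2, 'B': 1}

largest_key = max(votes)                  # compares the keys 'A' and 'B'
most_voted = max(votes, key=votes.get)    # compares the counts 2 and 1

print(largest_key)  # prints B
print(most_voted)   # prints A
```

In the small test array the two happen to agree for the input [0, 0], which can hide the difference.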
KNN application example
Implementing a handwriting recognition system
Data set:
There are two data sets: training and test. The class label of each sample is encoded in its file name. Each image is 32*32 pixels, stored as a text file that the code below reads as 32 lines of 32 digit characters.
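The actual sample files are not reproduced here. As a rough sketch of how such a file can be handled, the snippet below recovers the label from a file name and flattens a text image into one vector. The file name 3_12.txt and the tiny 4*4 image are made-up examples, and the helper names are hypothetical; the name format is inferred from the way the article's code splits on '.' and '_'.

```python
def label_from_filename(file_name):
    """Return the class label, assumed to be the number before the underscore."""
    stem = file_name.split('.')[0]   # drop the extension
    return int(stem.split('_')[0])   # keep the part before '_'

def flatten_image(text):
    """Flatten lines of digit characters into one list of ints, row by row."""
    return [int(ch) for line in text.splitlines() for ch in line.strip()]

# A made-up 4*4 image; a real 32*32 sample would flatten to 1024 values.
tiny = "0110\n1001\n1001\n0110"
vec = flatten_image(tiny)

print(label_from_filename('3_12.txt'))  # prints 3
print(len(vec))                         # prints 16
```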
Method:
KNN is used as before, but the distance is now computed over many more features (1024). The main problem to solve is how to read the data; once the data is loaded, the comparison simply calls the classifier directly.
Speed:
It is still fairly slow. The data set here has 2000+ training samples and 900+ test samples (run on an i5 CPU).
With k=3 the run takes 32+ seconds.
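Much of that time goes into classifying the test samples one at a time in a Python loop. One possible way to reduce the loop overhead, sketched below, is to compute the distances from all test rows to all training rows in a single matrix expression. This is not the article's method: the function name knn_predict_batch is hypothetical, the labels are assumed to be a NumPy array of small non-negative integers, and ties are broken toward the smaller label, which may differ from the dictionary-based vote. No timing is claimed for it.

```python
import numpy as np

def knn_predict_batch(test_mat, train_mat, train_labels, k):
    """Classify every row of test_mat against train_mat in one pass."""
    # Squared distances via ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    sq = (
        (test_mat ** 2).sum(axis=1)[:, None]
        + (train_mat ** 2).sum(axis=1)[None, :]
        - 2.0 * test_mat @ train_mat.T
    )
    # Indices of the k smallest distances in each row (unordered within the k).
    nearest = np.argpartition(sq, k - 1, axis=1)[:, :k]
    neighbor_labels = train_labels[nearest]  # shape (n_test, k)
    # Majority vote per test row.
    return np.array([np.bincount(row).argmax() for row in neighbor_labels])

train = np.array([[1.0, 0.9], [1.0, 1.0], [0.1, 0.2], [0.0, 0.1]])
train_labels = np.array([0, 0, 1, 1])
test = np.array([[0.0, 0.0], [1.0, 1.0]])
pred = knn_predict_batch(test, train, train_labels, 3)
print(pred.tolist())  # prints [1, 0]
```

The square root is skipped because it does not change which neighbors are closest.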
The code is as follows:
# coding=utf-8
import numpy as np
import os
import time

def createDataSet():
    group = np.array([[1.0, 0.9], [1.0, 1.0], [0.1, 0.2], [0.0, 0.1]])
    labels = ['A', 'A', 'B', 'B']
    return group, labels

# inputX: the input vector (the sample whose class we want to determine)
# dataSet: the training samples
# labels: the labels of the training samples
# k: the number of nearest neighbors to use
def kNNClassify(inputX, dataSet, labels, k):
    dataSetSize = dataSet.shape[0]  # number of training samples
    # start computing the Euclidean distances
    diffMat = np.tile(inputX, (dataSetSize, 1)) - dataSet
    # diffMat = inputX.repeat(dataSetSize, axis=1) - dataSet
    sqDiffMat = diffMat ** 2
    sqDistances = sqDiffMat.sum(axis=1)  # sum each row of the matrix
    distances = sqDistances ** 0.5
    # Euclidean distances are done
    sortedDistance = distances.argsort()
    classCount = {}
    for i in range(k):
        voteLabel = labels[sortedDistance[i]]
        classCount[voteLabel] = classCount.get(voteLabel, 0) + 1
    # pick the label with the most votes; max(classCount) alone would compare the keys
    res = max(classCount, key=classCount.get)
    return res

# read a 32*32 text image into a 1*1024 vector
def img2vec(filename):
    returnVec = np.zeros((1, 1024))
    with open(filename) as fr:
        for i in range(32):
            lineStr = fr.readline()
            for j in range(32):
                returnVec[0, 32 * i + j] = int(lineStr[j])
    return returnVec

def handwritingClassTest(trainingFolder, testFolder, k):
    hwLabels = []
    trainingFileList = os.listdir(trainingFolder)
    m = len(trainingFileList)
    trainingMat = np.zeros((m, 1024))
    for i in range(m):
        fileName = trainingFileList[i]
        fileStr = fileName.split('.')[0]
        classNumStr = int(fileStr.split('_')[0])  # the label is the part before '_'
        hwLabels.append(classNumStr)
        trainingMat[i, :] = img2vec(os.path.join(trainingFolder, fileName))
    testFileList = os.listdir(testFolder)
    errorCount = 0.0
    mTest = len(testFileList)
    for i in range(mTest):
        fileName = testFileList[i]
        fileStr = fileName.split('.')[0]
        classNumStr = int(fileStr.split('_')[0])
        vectorUnderTest = img2vec(os.path.join(testFolder, fileName))
        classifierResult = kNNClassify(vectorUnderTest, trainingMat, hwLabels, k)
        # print(classifierResult, ' ', classNumStr)
        if classifierResult != classNumStr:
            errorCount += 1
    print('total error', errorCount)
    print('error rate', errorCount / mTest)

def main():
    t1 = time.perf_counter()  # time.clock() was removed in Python 3.8
    handwritingClassTest('trainingdigits', 'testdigits', 3)
    t2 = time.perf_counter()
    print('execute', t2 - t1)

if __name__ == '__main__':
    main()
I hope this article helps you with your Python programming.