This article describes a Python implementation of the KNN algorithm and applies it to simple handwritten digit recognition. It is shared here for your reference. The details are as follows:
Advantages and disadvantages of the KNN algorithm:
Advantages: high accuracy, insensitive to outliers, no assumptions about the input data
Disadvantages: high time complexity and high space complexity
Applicable data types: numeric and nominal
The idea of the algorithm:
The idea behind KNN (short for k-nearest neighbors) is very simple, in the spirit of "birds of a feather flock together": from a set of labeled training samples, find the k samples closest to the target, see which class most of them belong to, and assign the target to that class.
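The idea above can be sketched in a few lines of plain Python before turning to the NumPy version. This is only an illustrative sketch: the function name knn_vote, the sample points, and the labels "near" and "far" are made up for the example and do not come from the article's code.

```python
from collections import Counter


def knn_vote(target, samples, labels, k):
    """Return the majority label among the k samples closest to target."""
    # Pair each sample's Euclidean distance to the target with its label.
    by_distance = sorted(
        (sum((a - b) ** 2 for a, b in zip(target, s)) ** 0.5, label)
        for s, label in zip(samples, labels)
    )
    # Keep the labels of the k nearest samples and count them.
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical toy data: two points far from the origin, three close to it.
samples = [(1.0, 1.0), (0.9, 1.1), (0.0, 0.1), (0.2, 0.0), (0.1, 0.3)]
labels = ["far", "far", "near", "near", "near"]
print(knn_vote((0.0, 0.0), samples, labels, 3))  # near
```

Every prediction scans all training samples, which is where the high time and space cost mentioned above comes from.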
Function overview:
Library functions:
tile()
tile(a, n) repeats a n times.
For example:
>>> import numpy as np
>>> a = np.array([0, 1, 2])
>>> np.tile(a, 2)
array([0, 1, 2, 0, 1, 2])
>>> np.tile(a, (2, 2))
array([[0, 1, 2, 0, 1, 2], [0, 1, 2, 0, 1, 2]])
>>> np.tile(a, (2, 1, 2))
array([[[0, 1, 2, 0, 1, 2]], [[0, 1, 2, 0, 1, 2]]])
>>> b = np.array([[1, 2], [3, 4]])
>>> np.tile(b, 2)
array([[1, 2, 1, 2], [3, 4, 3, 4]])
>>> np.tile(b, (2, 1))
array([[1, 2], [3, 4], [1, 2], [3, 4]])
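In the classifier later in this article, tile is used to repeat the input vector once per training row so that it can be subtracted from the whole training set. The sketch below shows that step on its own, using the four sample points from the article's test array. It also shows that NumPy broadcasting gives the same difference matrix without tile; that variant is an alternative for comparison, not what the article's code does. The names data_set and input_x are chosen for this example only.

```python
import numpy as np

data_set = np.array([[1.0, 0.9], [1.0, 1.0], [0.1, 0.2], [0.0, 0.1]])
input_x = np.array([0.0, 0.0])

# Repeat the input vector once per training row, then subtract row by row.
tiled = np.tile(input_x, (data_set.shape[0], 1))
diff_tile = tiled - data_set

# Broadcasting stretches the 1-D vector across the rows implicitly.
diff_broadcast = input_x - data_set

print(tiled.shape)                                # (4, 2)
print(np.array_equal(diff_tile, diff_broadcast))  # True
```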
Custom functions:
createDataSet() generates a small test array
knnClassify(inputX, dataSet, labels, k) is the classification function
inputX: the input vector to classify
dataSet: the training set
labels: the labels of the training set
k: the number of nearest neighbors
The code is as follows:
# coding=utf-8
import numpy as np


def createDataSet():
    group = np.array([[1.0, 0.9], [1.0, 1.0], [0.1, 0.2], [0.0, 0.1]])
    labels = ['A', 'A', 'B', 'B']
    return group, labels


# inputX: the input vector (the sample whose class we want to determine)
# dataSet: the training samples
# labels: the labels of the training samples
# k: the number of nearest neighbors to select
def knnClassify(inputX, dataSet, labels, k):
    dataSetSize = dataSet.shape[0]  # number of training samples
    # Compute the Euclidean distance from inputX to every training sample
    diffMat = np.tile(inputX, (dataSetSize, 1)) - dataSet
    sqDiffMat = diffMat ** 2
    sqDistances = sqDiffMat.sum(axis=1)  # sum the entries of each row
    distances = sqDistances ** 0.5
    # Sort the sample indices by distance and let the k nearest samples vote
    sortedDistance = distances.argsort()
    classCount = {}
    for i in range(k):
        voteLabel = labels[sortedDistance[i]]
        classCount[voteLabel] = classCount.get(voteLabel, 0) + 1
    # Return the label that received the most votes
    res = max(classCount, key=classCount.get)
    return res


def main():
    group, labels = createDataSet()
    t = knnClassify([0, 0], group, labels, 3)
    print(t)


if __name__ == '__main__':
    main()
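When the votes are collected in a dictionary, it helps to be explicit about what max compares. The short sketch below uses a hypothetical tally (the dictionary contents are made up for illustration) to show the difference between taking the maximum over the keys and taking the key with the highest count.

```python
class_count = {"A": 3, "B": 1}  # hypothetical vote tally: label -> votes

# max() over a dict iterates its keys, so this compares the labels themselves.
largest_key = max(class_count)

# Passing key=class_count.get compares the vote counts instead.
most_voted = max(class_count, key=class_count.get)

print(largest_key)  # B
print(most_voted)   # A
```

For a majority vote, the second form is the one that returns the most frequent label.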
A case study of KNN
Implementing a handwriting recognition system
Data set:
There are two datasets: training and test. The class label is encoded in the filename. Each image is 32*32 pixels, stored as a text file with 32 lines of 32 digits each.
Method:
We use KNN as before, but the distance computation is heavier (1024 features). The main work is reading the data; for the comparison itself we simply call knnClassify.
Speed:
It is still fairly slow. With 2000+ training samples and 900+ test samples (i5 CPU),
a run with k=3 takes 32s+.
The code is as follows:
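To make the data layout concrete, here is a small sketch that parses one such image from memory and extracts a label from a filename. It rests on assumptions that are labeled in the code: the pixel characters are taken to be 0 and 1, the filename pattern is taken to be "<label>_<index>.txt" (the example name "3_12.txt" is invented), and the helper names parse_digit_text and label_from_filename are hypothetical. The image itself is synthetic, not taken from the real dataset.

```python
import io

import numpy as np


def parse_digit_text(text_file):
    """Flatten a 32-line, 32-characters-per-line digit image into 1024 values."""
    rows = [[int(ch) for ch in text_file.readline()[:32]] for _ in range(32)]
    return np.array(rows, dtype=float).reshape(1, 1024)


def label_from_filename(filename):
    """Assumed naming scheme '<label>_<index>.txt', e.g. '3_12.txt' -> 3."""
    return int(filename.split(".")[0].split("_")[0])


# Synthetic stand-in for one image file: a vertical bar of 1s in columns 14-17.
fake_image = "\n".join("0" * 14 + "1" * 4 + "0" * 14 for _ in range(32)) + "\n"
vec = parse_digit_text(io.StringIO(fake_image))

print(vec.shape)                        # (1, 1024)
print(int(vec.sum()))                   # 128
print(label_from_filename("3_12.txt"))  # 3
```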
# coding=utf-8
import os
import time

import numpy as np


def createDataSet():
    group = np.array([[1.0, 0.9], [1.0, 1.0], [0.1, 0.2], [0.0, 0.1]])
    labels = ['A', 'A', 'B', 'B']
    return group, labels


# inputX: the input vector (the sample whose class we want to determine)
# dataSet: the training samples
# labels: the labels of the training samples
# k: the number of nearest neighbors to select
def knnClassify(inputX, dataSet, labels, k):
    dataSetSize = dataSet.shape[0]  # number of training samples
    # Compute the Euclidean distance from inputX to every training sample
    diffMat = np.tile(inputX, (dataSetSize, 1)) - dataSet
    sqDiffMat = diffMat ** 2
    sqDistances = sqDiffMat.sum(axis=1)  # sum the entries of each row
    distances = sqDistances ** 0.5
    # Sort the sample indices by distance and let the k nearest samples vote
    sortedDistance = distances.argsort()
    classCount = {}
    for i in range(k):
        voteLabel = labels[sortedDistance[i]]
        classCount[voteLabel] = classCount.get(voteLabel, 0) + 1
    # Return the label that received the most votes
    res = max(classCount, key=classCount.get)
    return res


# Read a 32*32 text image into a 1*1024 row vector
def img2vec(filename):
    returnVec = np.zeros((1, 1024))
    with open(filename) as fr:
        for i in range(32):
            lineStr = fr.readline()
            for j in range(32):
                returnVec[0, 32 * i + j] = int(lineStr[j])
    return returnVec


def handwritingClassTest(trainingFolder, testFolder, k):
    # Load every training file; the class label is the filename part before '_'
    hwLabels = []
    trainingFileList = os.listdir(trainingFolder)
    m = len(trainingFileList)
    trainingMat = np.zeros((m, 1024))
    for i in range(m):
        fileName = trainingFileList[i]
        fileStr = fileName.split('.')[0]
        classNumStr = int(fileStr.split('_')[0])
        hwLabels.append(classNumStr)
        trainingMat[i, :] = img2vec(trainingFolder + '/' + fileName)
    # Classify every test file and count the misclassifications
    testFileList = os.listdir(testFolder)
    errorCount = 0.0
    mTest = len(testFileList)
    for i in range(mTest):
        fileName = testFileList[i]
        fileStr = fileName.split('.')[0]
        classNumStr = int(fileStr.split('_')[0])
        vectorUnderTest = img2vec(testFolder + '/' + fileName)
        classifierResult = knnClassify(vectorUnderTest, trainingMat, hwLabels, k)
        # print(classifierResult, classNumStr)
        if classifierResult != classNumStr:
            errorCount += 1
    print('total error', errorCount)
    print('error rate', errorCount / mTest)


def main():
    t1 = time.perf_counter()
    handwritingClassTest('trainingdigits', 'testdigits', 3)
    t2 = time.perf_counter()
    print('execute', t2 - t1)


if __name__ == '__main__':
    main()
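One possible way to reduce the run time, which is not part of the code above, is to compute the distances from all test vectors to all training vectors in a single matrix operation instead of calling the classifier once per test file. The sketch below is an assumption-laden illustration: the function name knn_classify_batch is hypothetical, it relies on the identity ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, and it is demonstrated on the small four-point test array rather than on the digit data. Whether it is actually faster on a given machine would have to be measured.

```python
import numpy as np


def knn_classify_batch(test_mat, training_mat, labels, k):
    """Classify every row of test_mat by majority vote of its k nearest training rows."""
    labels = np.asarray(labels)
    # Squared Euclidean distances via ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    sq_test = (test_mat ** 2).sum(axis=1)[:, np.newaxis]
    sq_train = (training_mat ** 2).sum(axis=1)[np.newaxis, :]
    sq_dists = sq_test + sq_train - 2.0 * test_mat @ training_mat.T
    # Indices of the k smallest distances per row (the square root is not
    # needed because it does not change the ranking).
    nearest = np.argsort(sq_dists, axis=1)[:, :k]
    results = []
    for row in labels[nearest]:
        values, counts = np.unique(row, return_counts=True)
        results.append(values[np.argmax(counts)])
    return np.array(results)


training = np.array([[1.0, 0.9], [1.0, 1.0], [0.1, 0.2], [0.0, 0.1]])
train_labels = ["A", "A", "B", "B"]
tests = np.array([[0.0, 0.0], [0.9, 1.0]])
print(knn_classify_batch(tests, training, train_labels, 3))  # ['B' 'A']
```

The distance matrix has one row per test sample and one column per training sample, so this trades extra memory for fewer Python-level loops.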
I hope this article will help you with your Python programming.