A Python implementation of the KNN algorithm and simple digit recognition

Source: Internet
Author: User

This article describes a Python implementation of the KNN algorithm and uses it for simple digit recognition. It is shared here for your reference. The details are as follows:

Advantages and disadvantages of the KNN algorithm:

Advantages: high accuracy, insensitive to outliers, no assumptions about the input data
Disadvantages: high time complexity and high space complexity
Applicable data types: numeric and nominal

The idea of the algorithm:

The KNN algorithm (short for k-nearest neighbors) rests on a very simple idea, much like "birds of a feather flock together": from a set of labeled training samples, find the k samples closest to the target, see which class most of them belong to, and assign the target to that class.
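
The idea above can be sketched in a few lines of plain Python before any NumPy is involved. This is only an illustrative sketch, not the article's implementation: the function name knn_vote is hypothetical, and math.dist assumes Python 3.8 or later.

```python
import math
from collections import Counter

def knn_vote(target, samples, labels, k):
    """Return the majority label among the k samples closest to target."""
    # Pair every training sample's Euclidean distance to the target with its label.
    distances = [(math.dist(target, s), label) for s, label in zip(samples, labels)]
    # Keep the k closest pairs, sorting on distance only.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Count the labels among those neighbors and pick the most common one.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

samples = [(1.0, 0.9), (1.0, 1.0), (0.1, 0.2), (0.0, 0.1)]
labels = ["A", "A", "B", "B"]
print(knn_vote((0.0, 0.0), samples, labels, k=3))  # prints B
```

With k=3 the three closest samples to (0, 0) carry the labels B, B and A, so the vote goes to B.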

Function overview:

Library functions:

tile()
For example, tile(A, n) repeats A n times

The code is as follows:
>>> import numpy as np
>>> A = np.array([0, 1, 2])
>>> np.tile(A, 2)
array([0, 1, 2, 0, 1, 2])
>>> np.tile(A, (2, 2))
array([[0, 1, 2, 0, 1, 2], [0, 1, 2, 0, 1, 2]])
>>> np.tile(A, (2, 1, 2))
array([[[0, 1, 2, 0, 1, 2]], [[0, 1, 2, 0, 1, 2]]])
>>> b = np.array([[1, 2], [3, 4]])
>>> np.tile(b, 2)
array([[1, 2, 1, 2], [3, 4, 3, 4]])
>>> np.tile(b, (2, 1))
array([[1, 2], [3, 4], [1, 2], [3, 4]])
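
To connect tile() to the classifier, here is a minimal sketch of how it is used to subtract every training row from a single point. The variable names are chosen for illustration only. It also shows that NumPy broadcasting yields the same differences, which is an alternative to tile() rather than what the article's code does.

```python
import numpy as np

point = np.array([0.0, 0.0])
training = np.array([[1.0, 0.9], [1.0, 1.0], [0.1, 0.2], [0.0, 0.1]])

# tile stacks the point once per training row, giving a (4, 2) array.
stacked = np.tile(point, (training.shape[0], 1))
diff_tile = stacked - training

# Broadcasting gives the same differences without the explicit copy.
diff_broadcast = point - training

print(stacked.shape)                              # (4, 2)
print(np.array_equal(diff_tile, diff_broadcast))  # True
```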

Custom functions:

createDataSet() generates a small test array
kNNClassify(inputX, dataSet, labels, k) is the classification function

inputX: the input vector to classify
dataSet: the training set
labels: the labels of the training set
k: the number of nearest neighbors

The code is as follows:

# coding=utf-8
from numpy import array, tile

def createDataSet():
    group = array([[1.0, 0.9], [1.0, 1.0], [0.1, 0.2], [0.0, 0.1]])
    labels = ['A', 'A', 'B', 'B']
    return group, labels

# inputX is the input vector (the sample whose category we want to determine)
# dataSet is the set of training samples
# labels holds the labels of the training samples
# k is the nearest-neighbor parameter: select the k nearest samples
def kNNClassify(inputX, dataSet, labels, k):
    dataSetSize = dataSet.shape[0]  # number of training samples
    # Start computing the Euclidean distances
    diffMat = tile(inputX, (dataSetSize, 1)) - dataSet

    sqDiffMat = diffMat ** 2
    sqDistances = sqDiffMat.sum(axis=1)  # sum the entries of each row
    distances = sqDistances ** 0.5
    # Euclidean distances computed
    sortedDistance = distances.argsort()
    classCount = {}
    for i in range(k):
        voteLabel = labels[sortedDistance[i]]
        classCount[voteLabel] = classCount.get(voteLabel, 0) + 1
    # Pick the label with the most votes; max(classCount) alone would compare the labels
    res = max(classCount, key=classCount.get)
    return res

def main():
    group, labels = createDataSet()
    t = kNNClassify([0, 0], group, labels, 3)
    print(t)

if __name__ == '__main__':
    main()
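
One detail of the voting step is easy to get wrong, so a small illustration may help. Calling max() on a dictionary compares its keys, so to get the label with the most votes the count lookup has to be passed as the key function. The vote counts below are made up for the example.

```python
votes = {"A": 2, "B": 1}

# max() over a dict iterates its keys, so this picks the largest label, not the winner.
print(max(votes))                  # B
# Passing the count lookup as the key function picks the label with the most votes.
print(max(votes, key=votes.get))   # A
```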

A case study of KNN

Implementing a handwriting recognition system

Data set:

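There are two datasets, training and test. The class label is encoded in the file name, and each image is 32*32 pixels. Each sample is stored as a text file of 32 lines with 32 digit characters per line, and the label is the part of the file name before the underscore.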
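
The data layout can be sketched without touching the file system. This is a hedged illustration: the helper names parse_label and text_to_vector, the file name 3_12.txt and the 4*4 sample are all hypothetical stand-ins, chosen to mirror how the full listing splits file names on '.' and '_' and reads one digit per character.

```python
def parse_label(filename):
    """Class label = the part of the file name before the first underscore."""
    stem = filename.split(".")[0]
    return int(stem.split("_")[0])

def text_to_vector(text, size=32):
    """Flatten size lines of size digit characters into one list, row by row."""
    rows = text.splitlines()[:size]
    return [int(ch) for row in rows for ch in row[:size]]

# Hypothetical 4x4 sample standing in for a real 32x32 file.
sample = "0110\n1001\n1001\n0110\n"
print(parse_label("3_12.txt"))          # 3
print(text_to_vector(sample, size=4))   # [0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0]
```

A real 32*32 sample flattened this way gives the 1024 features used for the distance computation.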

Method:

KNN is used again, but the distance computation is heavier here (1024 features). The main work is reading the data; the comparison itself just calls the classifier directly.

Speed:

The speed is still fairly slow. The dataset has 2000+ training samples and 900+ test samples (run on an i5 CPU).

With k=3, the run takes 32+ seconds.
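
One common way to reduce this cost is to compute all test-to-training distances in a single matrix operation, using the identity ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, instead of calling the classifier once per test sample. The sketch below is not the article's method and has not been timed against it; the function names pairwise_sq_distances and knn_predict are hypothetical. The square root is skipped because it does not change the ordering of the distances.

```python
import numpy as np

def pairwise_sq_distances(test, train):
    """Squared Euclidean distances between every test row and every train row."""
    test_sq = (test ** 2).sum(axis=1)[:, np.newaxis]    # shape (n_test, 1)
    train_sq = (train ** 2).sum(axis=1)[np.newaxis, :]  # shape (1, n_train)
    cross = test @ train.T                              # shape (n_test, n_train)
    # Rounding can push tiny values below zero, so clip at 0.
    return np.maximum(test_sq + train_sq - 2.0 * cross, 0.0)

def knn_predict(test, train, labels, k):
    """Label each test row by majority vote among its k nearest train rows."""
    labels = np.asarray(labels)
    d2 = pairwise_sq_distances(test, train)
    nearest = np.argsort(d2, axis=1)[:, :k]  # indices of the k closest train rows
    predictions = []
    for row in labels[nearest]:
        values, counts = np.unique(row, return_counts=True)
        predictions.append(values[np.argmax(counts)])
    return np.array(predictions)

train = np.array([[1.0, 0.9], [1.0, 1.0], [0.1, 0.2], [0.0, 0.1]])
labels = ["A", "A", "B", "B"]
test = np.array([[0.0, 0.0], [0.9, 1.0]])
print(knn_predict(test, train, labels, k=3))
```

The trade-off is memory: the full distance matrix has one entry per (test, training) pair, so for very large datasets it may need to be computed in batches.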

The code is as follows:

# coding=utf-8
from numpy import array, tile, zeros
import os
import time

def createDataSet():
    group = array([[1.0, 0.9], [1.0, 1.0], [0.1, 0.2], [0.0, 0.1]])
    labels = ['A', 'A', 'B', 'B']
    return group, labels

# inputX is the input vector (the sample whose category we want to determine)
# dataSet is the set of training samples
# labels holds the labels of the training samples
# k is the nearest-neighbor parameter: select the k nearest samples
def kNNClassify(inputX, dataSet, labels, k):
    dataSetSize = dataSet.shape[0]  # number of training samples
    # Start computing the Euclidean distances
    diffMat = tile(inputX, (dataSetSize, 1)) - dataSet
    # diffMat = inputX.repeat(dataSetSize, axis=0) - dataSet
    sqDiffMat = diffMat ** 2
    sqDistances = sqDiffMat.sum(axis=1)  # sum the entries of each row
    distances = sqDistances ** 0.5
    # Euclidean distances computed
    sortedDistance = distances.argsort()
    classCount = {}
    for i in range(k):
        voteLabel = labels[sortedDistance[i]]
        classCount[voteLabel] = classCount.get(voteLabel, 0) + 1
    # Pick the label with the most votes; max(classCount) alone would compare the labels
    res = max(classCount, key=classCount.get)
    return res

# Read a 32*32 text image into a 1*1024 row vector
def img2vec(filename):
    returnVec = zeros((1, 1024))
    with open(filename) as fr:
        for i in range(32):
            lineStr = fr.readline()
            for j in range(32):
                returnVec[0, 32 * i + j] = int(lineStr[j])
    return returnVec

def handwritingClassTest(trainingFolder, testFolder, k):
    hwLabels = []
    trainingFileList = os.listdir(trainingFolder)
    m = len(trainingFileList)
    trainingMat = zeros((m, 1024))
    for i in range(m):
        fileName = trainingFileList[i]
        fileStr = fileName.split('.')[0]
        classNumStr = int(fileStr.split('_')[0])  # label is the part before '_'
        hwLabels.append(classNumStr)
        trainingMat[i, :] = img2vec(os.path.join(trainingFolder, fileName))
    testFileList = os.listdir(testFolder)
    errorCount = 0.0
    mTest = len(testFileList)
    for i in range(mTest):
        fileName = testFileList[i]
        fileStr = fileName.split('.')[0]
        classNumStr = int(fileStr.split('_')[0])
        vectorUnderTest = img2vec(os.path.join(testFolder, fileName))
        classifierResult = kNNClassify(vectorUnderTest, trainingMat, hwLabels, k)
        # print(classifierResult, classNumStr)
        if classifierResult != classNumStr:
            errorCount += 1
    print('total error', errorCount)
    print('Error rate', errorCount / mTest)

def main():
    t1 = time.perf_counter()
    handwritingClassTest('trainingDigits', 'testDigits', 3)
    t2 = time.perf_counter()
    print('Execute', t2 - t1)

if __name__ == '__main__':
    main()

I hope this article will help you with your Python programming.
