This article uses a k-nearest neighbor (kNN) classifier to build a handwritten digit recognition system. The training set contains about 2,000 samples, roughly 200 per digit. Each sample is saved in a TXT file, and each handwritten image is a 32x32 binary image.

Test code for the handwritten digit recognition system:

```python
from os import listdir

from numpy import tile, zeros


# inX: the sample to classify
# dataSet: the training data, one sample per row
# labels: the label of each training row
# k: the number of nearest neighbors that vote
def classify0(inX, dataSet, labels, k):
    dataSetSize = dataSet.shape[0]  # number of rows
    # tile(inX, (dataSetSize, 1)) repeats inX so it matches the shape of dataSet
    diffMat = tile(inX, (dataSetSize, 1)) - dataSet
    sqDiffMat = diffMat ** 2  # square the differences
    sqDistances = sqDiffMat.sum(axis=1)  # axis=0 sums down columns, axis=1 sums along rows
    distances = sqDistances ** 0.5  # square root
    sortedDistIndicies = distances.argsort()  # indices that sort the distances in ascending order
    classCount = {}
    for i in range(k):
        voteIlabel = labels[sortedDistIndicies[i]]  # look up the label by index
        # record how often each class occurs in a dictionary
        classCount[voteIlabel] = classCount.get(voteIlabel, 0) + 1
    # sort the entries by count, largest first
    sortedClassCount = sorted(classCount.items(), key=lambda item: item[1], reverse=True)
    return sortedClassCount[0][0]


# Handwriting recognition
# First, each image has to be formatted as a vector: img2vector() converts
# a 32x32 binary image matrix into a 1x1024 vector.
def img2vector(filename):
    returnVect = zeros((1, 1024))
    with open(filename) as fr:
        for i in range(32):  # the image matrix is 32x32
            lineStr = fr.readline()  # the data set is large, so read one line at a time
            for j in range(32):
                returnVect[0, 32 * i + j] = int(lineStr[j])
    return returnVect


def handwritingClassTest():
    hwLabels = []
    trainingFileList = listdir(r'trainingdigits')  # training folder
    m = len(trainingFileList)  # number of training files
    trainingMat = zeros((m, 1024))  # m x 1024 matrix, one row per sample
    for i in range(m):
        fileNameStr = trainingFileList[i]  # file name
        fileStr = fileNameStr.split('.')[0]  # split at the dot to drop the extension
        classNumStr = int(fileStr.split('_')[0])  # split at the underscore to get the digit
        hwLabels.append(classNumStr)  # store the actual value
        trainingMat[i, :] = img2vector(r'trainingdigits/%s' % fileNameStr)
    testFileList = listdir('testdigits')  # test data
    errorCount = 0.0
    mTest = len(testFileList)
    for i in range(mTest):  # same processing as above, for the test data
        fileNameStr = testFileList[i]
        fileStr = fileNameStr.split('.')[0]  # take off .txt
        classNumStr = int(fileStr.split('_')[0])
        vectorUnderTest = img2vector(r'testdigits/%s' % fileNameStr)
        classifierResult = classify0(vectorUnderTest, trainingMat, hwLabels, 3)
        print("Calculated value: %d, actual value: %d" % (classifierResult, classNumStr))
        if classifierResult != classNumStr:
            errorCount += 1.0
    print("\nError count: %d" % errorCount)
    print("\nError rate: %f" % (errorCount / float(mTest)))


handwritingClassTest()
```

Result:

```
Calculated value: 9, actual value: 9
Calculated value: 9, actual value: 9
Calculated value: 9, actual value: 9
Calculated value: 9, actual value: 9
Calculated value: 9, actual value: 9
Calculated value: 9, actual value: 9

Error count: 10

Error rate: 0.010571
```

As you can see, the kNN algorithm consumes a lot of memory (my machine has 12 GB); I can't imagine what recognizing Chinese characters would take.
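To try the distance-and-vote step of the classifier without the digit files, here is a minimal sketch on made-up data. The function name `knn_classify`, the 2-D points, and the labels are assumptions for illustration and are not part of the original code. It uses NumPy broadcasting in place of `tile` and `collections.Counter` for the vote, which should rank neighbors the same way.

```python
from collections import Counter

import numpy as np


def knn_classify(query, samples, labels, k):
    """Return the majority label among the k samples closest to `query`.

    Hypothetical helper for illustration; not the original classifier.
    """
    # Euclidean distance from the query to every row of `samples`.
    # Broadcasting subtracts `query` from each row, so no tile() is needed.
    distances = np.sqrt(((samples - query) ** 2).sum(axis=1))
    # Indices of the k smallest distances (stable sort keeps ties in input order).
    nearest = np.argsort(distances, kind="stable")[:k]
    # Count the labels of those k neighbors and return the most frequent one.
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Toy data, made up for this example: two small clusters of 2-D points.
samples = np.array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
labels = ["A", "A", "B", "B"]

print(knn_classify(np.array([0.1, 0.2]), samples, labels, k=3))  # B
print(knn_classify(np.array([0.9, 1.2]), samples, labels, k=3))  # A
```

With k=3, the first query has two "B" neighbors and one "A" neighbor, so the vote returns "B"; the second query is the mirror case and returns "A".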
A handwriting recognition system in Python based on the KNN algorithm