This example is similar to the earlier one that improved matching on the dating site. Every digit is first converted into a 32*32-pixel black-and-white image, and each image is then stored as text, with every pixel written as the character 0 or 1. All 1024 pixels of an image are placed in a single one-dimensional vector, so that KNN can compute the Euclidean distance between images and return the label of the closest matches.
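To make the encoding concrete, here is a minimal sketch in plain Python (no NumPy) of how a text image becomes a 1024-element vector and how two such vectors are compared. The helper names `text_image_to_vector` and `euclidean_distance` and the two synthetic images are assumptions made for illustration; they are not part of the original program.

```python
import math

def text_image_to_vector(lines, size=32):
    """Flatten `size` rows of '0'/'1' characters into one list of ints."""
    vector = []
    for row in lines[:size]:
        vector.extend(int(ch) for ch in row[:size])
    return vector

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Two synthetic 32x32 images (assumed data): one all zeros, and one that
# is all zeros except for the first four pixels of the top row.
blank = ["0" * 32 for _ in range(32)]
marked = ["1111" + "0" * 28] + ["0" * 32 for _ in range(31)]

v_blank = text_image_to_vector(blank)
v_marked = text_image_to_vector(marked)

print(len(v_blank))                           # 1024
print(euclidean_distance(v_blank, v_marked))  # 2.0
```

The two images differ in exactly four pixels, so the squared differences sum to 4 and the distance is 2.0. Images of the same digit tend to share most pixels and therefore end up close together under this measure.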
    # kNN handwriting recognition: full listing
    import os
    import operator
    from numpy import tile, zeros


    def classify0(inX, dataSet, labels, k):
        dataSetSize = dataSet.shape[0]
        # Repeat inX once per training row so the two matrices can be subtracted
        diffMat = tile(inX, (dataSetSize, 1)) - dataSet
        sqDiffMat = diffMat ** 2
        # axis=0 sums down the columns, axis=1 sums across each row
        sqDistances = sqDiffMat.sum(axis=1)
        # Square root gives the Euclidean distance
        distances = sqDistances ** 0.5
        # Indices that would sort the distances in ascending order
        sortedDistIndicies = distances.argsort()
        classCount = {}
        for i in range(k):
            voteIlabel = labels[sortedDistIndicies[i]]
            # dict.get(key, default) returns default when the key is absent
            classCount[voteIlabel] = classCount.get(voteIlabel, 0) + 1
        sortedClassCount = sorted(classCount.items(),
                                  key=operator.itemgetter(1), reverse=True)
        return sortedClassCount[0][0]


    def img2vector(filename):
        # Read a 32x32 text image into a 1x1024 row vector
        returnVect = zeros((1, 1024))
        with open(filename) as f:
            for i in range(32):
                line = f.readline()
                for j in range(32):
                    returnVect[0, i * 32 + j] = int(line[j])
        return returnVect


    def handwritingClassTest():
        # Build the training matrix; the label is the part of the file name before '_'
        fileList = os.listdir('trainingdigits')
        m = len(fileList)
        traingMat = zeros((m, 1024))
        hwLabels = []
        for i in range(m):
            fileName = fileList[i]
            prefix = fileName.split('.')[0]
            number = int(prefix.split('_')[0])
            hwLabels.append(number)
            traingMat[i, :] = img2vector('trainingdigits/%s' % fileName)

        # Classify every test image and count the mistakes
        testFileList = os.listdir('testdigits')
        m = len(testFileList)
        errorNum = 0.0
        for i in range(m):
            testFileName = testFileList[i]
            prefix = testFileName.split('.')[0]
            realNumber = int(prefix.split('_')[0])
            testMat = img2vector('testdigits/%s' % testFileName)
            testResult = classify0(testMat, traingMat, hwLabels, 3)
            if testResult != realNumber:
                errorNum += 1
            print('The classifier came back with: %d, the real answer is: %d'
                  % (testResult, realNumber))
        print('error rate is %f' % (errorNum / float(m)))


    if __name__ == '__main__':
        handwritingClassTest()
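The voting step in `classify0` can be checked by hand on a tiny data set. The sketch below reimplements the same idea in plain Python with `collections.Counter` rather than NumPy; the function name `knn_vote` and the four 2-D sample points are made up for illustration and do not come from the digit data.

```python
import math
from collections import Counter

def knn_vote(query, samples, labels, k):
    """Return the most common label among the k samples nearest to query."""
    ranked = []
    for sample, label in zip(samples, labels):
        # Euclidean distance between the query and this sample
        dist = math.sqrt(sum((q - s) ** 2 for q, s in zip(query, sample)))
        ranked.append((dist, label))
    ranked.sort()  # ascending by distance
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Assumed toy data: two points near (1, 1) labelled "A",
# two points near (0, 0) labelled "B".
samples = [(1.0, 1.1), (1.0, 1.0), (0.0, 0.0), (0.0, 0.1)]
labels = ["A", "A", "B", "B"]

print(knn_vote((0.1, 0.2), samples, labels, k=3))  # B
print(knn_vote((0.9, 1.2), samples, labels, k=3))  # A
```

For the query (0.1, 0.2) the three nearest points carry the labels B, B and A, so B wins two votes to one. The digit classifier works the same way, only with 1024-dimensional vectors and ten possible labels.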
Machine Learning in Action: the k-nearest neighbor algorithm (handwriting recognition system)