I wrote a KNN classification algorithm earlier, and now I want to use KNN to recognize handwritten digits and see how accurate it is.
General idea: get images (you can draw them yourself, reuse my earlier code that converts black-and-white pictures to text, or find a data set online; either way, the more data the better) -> convert them to text -> build a large training data set -> associate each training sample with its category -> test.
Note: every training sample must be given an explicit category. In this experiment the handwritten digits fall into 10 categories, 0-9.
Converting the pictures to text was covered in an earlier post, so I'll skip it here and start directly with building the training data.
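For reference, the conversion script in the source listing below writes a 1 only for pixels whose R+G+B sum is exactly 0 (pure black) and a 0 otherwise, one text line per pixel column. A minimal pure-Python sketch of that rule follows, assuming the picture has already been decoded into a 2D list of (R, G, B) tuples; the function name `binarize_to_lines` and its `threshold` and `transpose` parameters are hypothetical additions, not part of the original code.

```python
# Hypothetical sketch of the picture-to-text rule; `binarize_to_lines`,
# `threshold` and `transpose` are illustrative names, not the author's code.
# `pixels` is a 2D list of (R, G, B) tuples indexed [row][col].

def binarize_to_lines(pixels, threshold=0, transpose=True):
    """Return one string of '0'/'1' characters per output line.

    A pixel becomes '1' when R+G+B <= threshold; threshold=0 reproduces the
    original rule (only pure black counts as ink).
    With transpose=True each output line is one pixel column, matching the
    original script, whose outer loop runs over the image width.
    """
    height = len(pixels)
    width = len(pixels[0]) if height else 0

    def bit(r, c):
        red, green, blue = pixels[r][c][:3]   # ignore an alpha channel if present
        return "1" if red + green + blue <= threshold else "0"

    if transpose:
        return ["".join(bit(r, c) for r in range(height)) for c in range(width)]
    return ["".join(bit(r, c) for c in range(width)) for r in range(height)]


if __name__ == "__main__":
    black, white = (0, 0, 0), (255, 255, 255)
    # A 2x3 image: top row black-white-white, bottom row white-white-black.
    img = [[black, white, white],
           [white, white, black]]
    print(binarize_to_lines(img))                   # column-major: ['10', '00', '01']
    print(binarize_to_lines(img, transpose=False))  # row-major: ['100', '001']
```

Raising `threshold` above 0 may help with JPEG pictures, where compression can leave dark strokes slightly above pure black; whether that is needed depends on your own images.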
First, load the data. The pictures are stored as text, which is inconvenient to process, so convert each file into an array. The 32 here is the width and height (in px) of the saved picture; set it according to the size of your own pictures.
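As a sketch of this text-to-array step (what `datatoarray()` in the source does with a file), the hypothetical helper `text_to_vector` below parses a string of 32 lines of 32 `0`/`1` characters into one flat list, row by row, so the character on line r, column c lands at index r*32 + c.

```python
# Hypothetical helper mirroring datatoarray() from the source, but reading
# from a string instead of a file so it can be tried without any data on disk.

def text_to_vector(text, size=32):
    """Flatten the first `size` lines (first `size` chars each) into a list of ints."""
    lines = text.splitlines()
    if len(lines) < size:
        raise ValueError(f"expected at least {size} lines, got {len(lines)}")
    vector = []
    for line in lines[:size]:
        if len(line) < size:
            raise ValueError(f"line too short: {line!r}")
        vector.extend(int(ch) for ch in line[:size])
    return vector


if __name__ == "__main__":
    rows = ["0" * 32 for _ in range(32)]
    rows[1] = "1" + "0" * 31            # ink on line 1, column 0
    vec = text_to_vector("\n".join(rows))
    print(len(vec))   # 1024
    print(vec[32])    # 1  (line 1 * 32 + column 0)
```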
Then store all the training data in one array: each text file becomes one row, so there are as many rows as there are files, and the number of columns is fixed at 32*32 = 1024.
The figure above shows how each category is associated with its training data: the category is the part of the file name before the underscore.
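A minimal sketch of building the training matrix together with its parallel label list, assuming each sample has already been flattened to 1024 ints and is keyed by a file name of the form `<digit>_<index>.txt`. The names `build_training_set` and `label_from_filename` are hypothetical; the source's `traindata()` and `seplabel()` do the same job with files on disk.

```python
import numpy as np

# Hypothetical sketch: one flattened sample per row, and the label of row i
# taken from the file name prefix ("7_45.txt" -> 7), as seplabel() does.

def label_from_filename(fname):
    """'7_45.txt' -> 7: the digit before the underscore is the true class."""
    return int(fname.split(".")[0].split("_")[0])


def build_training_set(samples):
    """samples: dict mapping file name -> flat list of 0/1 ints (same length).

    Returns (matrix, labels) where row i of matrix belongs to labels[i].
    """
    names = sorted(samples)                          # deterministic row order
    matrix = np.array([samples[n] for n in names], dtype=float)
    labels = [label_from_filename(n) for n in names]
    return matrix, labels


if __name__ == "__main__":
    demo = {"0_1.txt": [0] * 1024, "1_1.txt": [1] * 1024, "1_2.txt": [1] * 1024}
    X, y = build_training_set(demo)
    print(X.shape, y)   # (3, 1024) [0, 1, 1]
```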
Testing: use the KNN algorithm to classify the test data.
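The classification step can be sketched with NumPy broadcasting in place of `tile()` and `collections.Counter` in place of the hand-built vote dictionary. This is an equivalent reformulation of the source's `KNN()` under those substitutions, not the author's code, and `knn_predict` is a hypothetical name.

```python
from collections import Counter

import numpy as np

# Hypothetical vectorized sketch of the same k-nearest-neighbour vote as KNN()
# in the source: Euclidean distance to every training row, then a majority
# vote over the labels of the k closest rows.

def knn_predict(k, sample, train, labels):
    """Return the majority label among the k training rows closest to `sample`."""
    train = np.asarray(train, dtype=float)
    sample = np.asarray(sample, dtype=float)
    # Broadcasting subtracts `sample` from every row of `train`.
    distances = np.sqrt(((train - sample) ** 2).sum(axis=1))
    nearest = np.argsort(distances)[:k]              # indices, nearest first
    votes = Counter(labels[i] for i in nearest)
    # most_common(1) -> [(label, count)]; equal counts keep insertion order,
    # so a tie goes to the label whose neighbour was encountered first.
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    train = [[0, 0], [0, 1], [5, 5], [6, 5]]
    labels = [0, 0, 1, 1]
    print(knn_predict(3, [0.2, 0.4], train, labels))  # 0
    print(knn_predict(3, [5.5, 5.0], train, labels))  # 1
```

With the digit data, `train` would be the (num, 1024) matrix and `sample` one 1024-element vector; the choice of k (3 or 4 in this post) trades noise sensitivity against blurring class boundaries.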
Recognizing a single handwritten-digit file:
trainarray, labels = traindata()
tfile = "1_32.txt"  # Note: 1 is the file's true category; 32 means it is the 32nd sample in that category
tarray = datatoarray("D:/xx/testdata/" + tfile)
result = KNN(4, tarray, trainarray, labels)
print(result)
Recognizing handwritten-digit files in batch (the datatest() function in the source below):
Results:
Out of 964 test files, 11 were misclassified with K = 4, a correct rate of (964 - 11) / 964 ≈ 98.9%, so KNN performs quite well on this task.
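The correct rate is (number of files - number of errors) / number of files. A small hypothetical helper, `evaluate`, computes both values from paired predictions and true labels; the usage reproduces the figures above.

```python
# Hypothetical helper for the batch-evaluation arithmetic: count the
# mismatches and turn them into a correct rate.

def evaluate(predicted, actual):
    """Return (errors, accuracy) for two equal-length label sequences."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("no samples to evaluate")
    errors = sum(p != a for p, a in zip(predicted, actual))
    return errors, (len(actual) - errors) / len(actual)


if __name__ == "__main__":
    # Reproduce the reported figures: 964 test files, 11 of them misclassified.
    actual = [0] * 964
    predicted = [1] * 11 + [0] * 953
    errors, acc = evaluate(predicted, actual)
    print(errors, round(acc, 4))   # 11 0.9886
```

Note the parentheses around `len(actual) - errors`: the division must apply to the whole difference, and it must happen on numbers before any conversion to a string.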
Source:
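from numpy import *
import operator
from os import listdir


def KNN(k, testdata, traindata, labels):
    """Return the majority label among the k training rows nearest to testdata."""
    traindatasize = traindata.shape[0]
    # Repeat the test vector once per training row, then subtract
    dif = tile(testdata, (traindatasize, 1)) - traindata
    sqdif = dif ** 2
    sumsqdif = sqdif.sum(axis=1)
    distance = sumsqdif ** 0.5          # Euclidean distance to every training row
    sortdistance = distance.argsort()   # row indices, nearest first
    count = {}
    for i in range(0, k):
        vote = labels[sortdistance[i]]
        count[vote] = count.get(vote, 0) + 1
    # Sort (label, votes) pairs by vote count, highest first
    sortcount = sorted(count.items(), key=operator.itemgetter(1), reverse=True)
    return sortcount[0][0]
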
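# Convert one black-and-white picture into a 0/1 text file.
# The picture must be 32x32 px so that datatoarray() can read it back.
from PIL import Image

im = Image.open("C:/xx/xx/3.jpg").convert("RGB")   # make getpixel() return (R, G, B)
fh = open("C:/xx/xx/3_20.txt", "w")                # "w" so re-running does not append a second copy
width = im.size[0]
height = im.size[1]
for i in range(0, width):
    for j in range(0, height):
        cl = im.getpixel((i, j))
        clall = cl[0] + cl[1] + cl[2]
        if clall == 0:       # pure black pixel -> ink
            fh.write("1")
        else:
            fh.write("0")
    fh.write("\n")           # one text line per pixel column
fh.close()
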
def datatoarray(fname):
    """Read a 32x32 text file of 0/1 characters into a flat list of 1024 ints."""
    arr = []
    with open(fname) as fh:
        for i in range(0, 32):
            thisline = fh.readline()
            for j in range(0, 32):
                arr.append(int(thisline[j]))
    return arr

# Take the label from the file name prefix, e.g. "1_32.txt" -> 1
def seplabel(fname):
    filestr = fname.split(".")[0]
    label = int(filestr.split("_")[0])
    return label

def traindata():
    """Load every training file into a (num, 1024) array plus a parallel label list."""
    labels = []
    trainfile = listdir("D:/xx/traindata")
    num = len(trainfile)
    trainarr = zeros((num, 1024))
    for i in range(0, num):
        thisfname = trainfile[i]
        thislabel = seplabel(thisfname)
        labels.append(thislabel)
        trainarr[i, :] = datatoarray("D:/xx/traindata/" + thisfname)
    return trainarr, labels

def datatest():
    trainarr, labels = traindata()
    testlist = listdir("D:/xx/testdata")
    tnum = len(testlist)
    count = 0                        # number of misclassified files
    for i in range(0, tnum):
        thistestfile = testlist[i]
        reallabel = seplabel(thistestfile)
        testarr = datatoarray("D:/xx/testdata/" + thistestfile)
        rknn = KNN(3, testarr, trainarr, labels)
        if rknn != reallabel:
            count = count + 1
            print("KNN identified " + str(rknn) + " incorrectly; the true category is " + str(reallabel))
    print("KNN correct rate: " + str((tnum - count) / tnum))


datatest()
'''
# Pull individual test files out and try them one at a time
trainarr, labels = traindata()
testfile = listdir("D:/pythonlianxi/result/testdata")
for i in range(0, len(testfile)):
    thisfname = testfile[i]
    reallabel = seplabel(thisfname)
    testarr = datatoarray("D:/pythonlianxi/result/testdata/" + thisfname)
    rknn = KNN(4, testarr, trainarr, labels)
    print(rknn)
'''