Python: handwritten digit recognition based on the KNN classification algorithm

Source: Internet
Author: User

Having previously written code for the KNN classification algorithm, I wanted to use KNN to recognize handwritten digits and see what accuracy it achieves.

General idea: get pictures (you can write the digits yourself and convert them with the black-and-white-picture-to-text code I wrote earlier, or find them online; either way, the more data the better) -> convert them to text -> build a large training data set -> associate each training sample with its category -> test

Note: every training sample must be given an explicit category. In this experiment the handwritten digits fall into 10 categories, 0-9.

Converting a picture to text was covered earlier, so I skip it here and start directly with building the training data.
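For readers who have not seen the picture-to-text step, its rule can be sketched without any image library. This is an illustration under assumptions, not the author's converter: the picture is stood in for by a nested list of (R, G, B) tuples, and `pixels_to_text` is a hypothetical helper name. It applies the same rule as the converter in the source listing (a pure black pixel becomes 1, anything else 0), but walks the picture row by row.

```python
def pixels_to_text(pixels):
    """Turn rows of (R, G, B) tuples into lines of '0'/'1' characters."""
    lines = []
    for row in pixels:
        # A pixel counts as ink only when all three channels are 0 (pure black).
        lines.append("".join("1" if sum(rgb) == 0 else "0" for rgb in row))
    return lines

BLACK, WHITE = (0, 0, 0), (255, 255, 255)
# A tiny 3x3 stand-in picture: a vertical black stroke down the middle.
picture = [
    [WHITE, BLACK, WHITE],
    [WHITE, BLACK, WHITE],
    [WHITE, BLACK, WHITE],
]
text_lines = pixels_to_text(picture)
print("\n".join(text_lines))
```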

First load the data. The picture is stored as text, which is inconvenient to process, so convert it to an array. The 32 here is the picture's width and height in pixels; choose it according to the actual size of the picture.
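As a rough illustration of this step, the sketch below flattens a text grid into one list of integers. It uses a 4x4 grid held in memory instead of a 32x32 file so that it runs anywhere; `grid_to_vector` and its `size` parameter are hypothetical names, not part of the original code.

```python
def grid_to_vector(lines, size):
    """Flatten `size` lines of `size` 0/1 characters into a list of ints."""
    vector = []
    for line in lines[:size]:
        # Only the first `size` characters count; a trailing newline is ignored.
        vector.extend(int(ch) for ch in line[:size])
    return vector

sample = ["0110\n", "1001\n", "1001\n", "0110\n"]
vec = grid_to_vector(sample, 4)
print(len(vec))  # 16, i.e. 4 * 4
print(vec[:4])
```

For the 32x32 files used in this article the same idea yields 1024 values per picture.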


Then store all the training data in one array: each text file occupies one row, so there are as many rows as there are files, and the number of columns is fixed at 32*32=1024.
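The row-per-file layout might look like the following sketch, assuming NumPy is available. The three 2x2 "files" and the name `stack_samples` are made up for illustration; with real 32x32 data the array would have 1024 columns.

```python
import numpy as np

def stack_samples(vectors):
    """Place each flat sample vector in its own row of a 2-D array."""
    n_cols = len(vectors[0])
    matrix = np.zeros((len(vectors), n_cols))
    for row, vector in enumerate(vectors):
        matrix[row, :] = vector
    return matrix

# Three hypothetical samples of 2*2 = 4 values each.
samples = [[0, 1, 1, 0], [1, 1, 0, 0], [0, 0, 1, 1]]
train_matrix = stack_samples(samples)
print(train_matrix.shape)  # (3, 4): rows = number of samples, columns = values per sample
```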


The figure above (not preserved here) shows how each category is linked to its training data.
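One way to picture that link, going by the filename convention used in this article (the digit before the underscore is the category): the label list is built in the same order as the rows of the training array, so `labels[i]` describes row `i`. `label_from_name` is a hypothetical stand-in and the filenames are invented.

```python
def label_from_name(fname):
    """Return the category encoded before the underscore, e.g. '3_20.txt' -> 3."""
    stem = fname.split(".")[0]
    return int(stem.split("_")[0])

filenames = ["0_1.txt", "3_20.txt", "9_7.txt"]  # invented examples
labels = [label_from_name(name) for name in filenames]
# Row i of the training array and labels[i] come from the same file.
for row, (name, label) in enumerate(zip(filenames, labels)):
    print(row, name, label)
```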

Testing: use the KNN algorithm to classify the test data.
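To make the classification step concrete, here is a minimal pure-Python sketch of a k-nearest-neighbour vote on toy 2-D points. It is not the article's implementation (that one uses NumPy and appears in the source listing); `knn_classify` and the sample points are assumptions chosen for illustration.

```python
import math
from collections import Counter

def knn_classify(k, query, train_points, train_labels):
    """Return the majority label among the k training points nearest to query."""
    # Euclidean distance from the query to every training point.
    distances = [math.dist(query, point) for point in train_points]
    # Indices of the training points, nearest first.
    nearest = sorted(range(len(train_points)), key=lambda i: distances[i])[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

train_points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_labels = [0, 0, 0, 1, 1, 1]
print(knn_classify(3, (0.5, 0.5), train_points, train_labels))
print(knn_classify(3, (5.5, 5.5), train_points, train_labels))
```

A handwritten digit is handled the same way, except that each point has 1024 coordinates instead of 2.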


Recognizing a single handwritten digit file:

trainarray, labels = traindata()
tfile = "1_32.txt"  # Note: 1 is the true category of the file; 32 means it is the 32nd sample in that category
tarray = datatoarray("D:/xx/testdata/" + tfile)
result = knn(4, tarray, trainarray, labels)
print(result)


Batch recognition of handwritten digit files:
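The batch run boils down to classifying every test sample, comparing the result with the true label, and counting the mismatches. A hedged sketch of that loop follows; `evaluate` and the stand-in classifier `nearest_digit` are hypothetical, and the data is invented so that the snippet runs without any files.

```python
def evaluate(classify, test_items):
    """Return (errors, accuracy) for a list of (true_label, sample) pairs."""
    errors = 0
    for true_label, sample in test_items:
        predicted = classify(sample)
        if predicted != true_label:
            errors += 1
            print("predicted", predicted, "but true category is", true_label)
    accuracy = (len(test_items) - errors) / len(test_items)
    return errors, accuracy

def nearest_digit(sample):
    # Stand-in classifier for illustration only: rounds a number to an int.
    return round(sample)

test_items = [(1, 1.2), (2, 1.9), (3, 3.6), (4, 4.1)]
errors, accuracy = evaluate(nearest_digit, test_items)
print(errors, accuracy)
```

In the real experiment the classifier would be the KNN function and each sample a 1024-value vector read from a test file.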


Results:


Out of 964 test files in total, 11 were misclassified with K = 4, which is about 98.9% correct, so KNN's accuracy is quite acceptable.

Source code:

from numpy import tile, zeros
import operator
from os import listdir

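def knn(k, testdata, traindata, labels):
    # number of training samples (rows)
    traindatasize = traindata.shape[0]
    # repeat the test vector once per training row, then subtract
    dif = tile(testdata, (traindatasize, 1)) - traindata
    sqdif = dif ** 2
    # sum the squared differences along each row
    sumsqdif = sqdif.sum(axis=1)
    # Euclidean distance to every training sample
    distance = sumsqdif ** 0.5
    # indices of the training samples, nearest first
    sortdistance = distance.argsort()
    count = {}
    for i in range(0, k):
        vote = labels[sortdistance[i]]
        count[vote] = count.get(vote, 0) + 1
    # the category with the most votes among the k nearest neighbours wins
    sortcount = sorted(count.items(), key=operator.itemgetter(1), reverse=True)
    return sortcount[0][0]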

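from PIL import Image

# Picture-to-text conversion: write 1 for a pure black pixel, 0 otherwise
im = Image.open("C:/xx/xx/3.jpg")
fh = open("C:/xx/xx/3_20.txt", "a")
width = im.size[0]
height = im.size[1]
for i in range(0, width):
    for j in range(0, height):
        # assumes an RGB picture, so cl is an (R, G, B) tuple
        cl = im.getpixel((i, j))
        clall = cl[0] + cl[1] + cl[2]
        if clall == 0:
            fh.write("1")
        else:
            fh.write("0")
    # one text line per pixel column i
    fh.write("\n")
fh.close()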

def datatoarray(fname):
    # read a 32x32 text file of 0/1 characters into a flat list of 1024 ints
    arr = []
    fh = open(fname)
    for i in range(0, 32):
        thisline = fh.readline()
        for j in range(0, 32):
            arr.append(int(thisline[j]))
    fh.close()
    return arr

# Extract the category label from the filename prefix, e.g. "3_20.txt" -> 3
def seplabel(fname):
    filestr = fname.split(".")[0]
    label = int(filestr.split("_")[0])
    return label

def traindata():
    labels = []
    trainfile = listdir("D:/xx/traindata")
    num = len(trainfile)
    # one row per training file, 32*32 = 1024 columns
    trainarr = zeros((num, 1024))
    for i in range(0, num):
        thisfname = trainfile[i]
        thislabel = seplabel(thisfname)
        labels.append(thislabel)
        trainarr[i, :] = datatoarray("D:/xx/traindata/" + thisfname)
    return trainarr, labels

def datatest():
    trainarr, labels = traindata()
    testlist = listdir("D:/xx/testdata")
    tnum = len(testlist)
    # number of misclassified test files
    count = 0
    for i in range(0, tnum):
        thistestfile = testlist[i]
        reallabel = seplabel(thistestfile)
        # read from the same directory that was listed above
        testarr = datatoarray("D:/xx/testdata/" + thistestfile)
        rknn = knn(3, testarr, trainarr, labels)
        if rknn != reallabel:
            count = count + 1
            print("KNN identified " + str(rknn) + " in error, true category is " + str(reallabel))
    print("KNN correct rate: " + str((tnum - count) / tnum))

datatest()

'''
# Pull out individual test files to experiment with
trainarr, labels = traindata()
testfile = listdir("D:/pythonlianxi/result/testdata")
for i in range(0, len(testfile)):
    thisfname = testfile[i]
    reallabel = seplabel(thisfname)
    testarr = datatoarray("D:/pythonlianxi/result/testdata/" + thisfname)
    rknn = knn(4, testarr, trainarr, labels)
    print(rknn)
'''
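A side note on the distance computation in the listing: it repeats the test vector with `tile` before subtracting. NumPy broadcasting should give the same distances without the explicit copy. The sketch below checks this on small made-up arrays; it assumes NumPy is installed and is meant as an illustration, not a replacement for the listing.

```python
import numpy as np

train = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])  # made-up training rows
test = np.array([0.0, 0.0])                              # made-up test vector

# Explicit repetition, as in the listing.
tiled = np.tile(test, (train.shape[0], 1)) - train
dist_tiled = (tiled ** 2).sum(axis=1) ** 0.5

# Broadcasting: the 1-D test vector is subtracted from every row.
dist_broadcast = ((test - train) ** 2).sum(axis=1) ** 0.5

print(dist_tiled)
print(np.array_equal(dist_tiled, dist_broadcast))
```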

