Machine Learning in Action: K-Nearest Neighbor Algorithm (KNN) 03 - Handwriting Recognition System

Source: Internet
Author: User
A handwriting recognition system using the k-nearest neighbor algorithm

The system built here can only recognize the digits 0 to 9.
The digits to be recognized have already been processed with graphics software so that they share the same color and size: black-and-white images 32 pixels wide by 32 pixels high.

Example: steps of the handwriting recognition system using the k-nearest neighbor algorithm

(1) Collect data: a text file is provided.
(2) Prepare data: write a function that converts the image format into the list format used by the classifier classify0().
(3) Analyze data: check the data at the Python command prompt to make sure it meets the requirements.
(4) Train the algorithm: this step does not apply to the k-nearest neighbor algorithm.
(5) Test the algorithm: write a function that uses part of the provided data set as test samples. Test samples differ from non-test samples in that they have already been classified; if the predicted class differs from the actual class, it is counted as an error.
(6) Use the algorithm: this example does not complete this step. Interested readers can build a complete application that extracts digits from images and recognizes them; the United States mail sorting system is a similar system in practical operation.

Preparing data: converting an image to a test vector

To use the classifier from the previous two examples, each image must be formatted as a vector. We convert each 32x32 binary image matrix into a 1x1024 vector so that the classifier used earlier can process the digit image data.
First, write a function img2vector that converts an image to a vector: it creates a 1x1024 NumPy array, opens the given file, loops over the first 32 lines of the file, stores the first 32 characters of each line in the NumPy array, and returns the array.
Add the following to the knn.py file:

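# Convert an image to a vector
from numpy import zeros  # needed if knn.py does not already import it

def img2vector(filename):
    returnvector = zeros((1, 1024))      # 1x1024 array of zeros
    fr = open(filename)
    for i in range(32):                  # first 32 lines of the file
        linestr = fr.readline()
        for j in range(32):              # first 32 characters of each line
            returnvector[0, 32*i+j] = int(linestr[j])
    fr.close()
    return returnvector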
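The function above can be tried without the digit data set. The following is a minimal sketch, not part of the original article: it repeats img2vector so that the block runs on its own, writes a hypothetical 32x32 text image to a temporary file (the sample pattern and the temporary file are assumptions made for illustration), and checks that the character at row i, column j lands at index 32*i+j of the vector.

```python
import os
import tempfile
from numpy import zeros

def img2vector(filename):
    """Read a 32x32 text image of 0/1 characters into a 1x1024 array."""
    returnvector = zeros((1, 1024))
    with open(filename) as fr:
        for i in range(32):
            linestr = fr.readline()
            for j in range(32):
                returnvector[0, 32 * i + j] = int(linestr[j])
    return returnvector

# Hypothetical sample image: row 3 has ones in columns 10..19, all else is 0.
rows = ["0" * 32 for _ in range(32)]
rows[3] = "0" * 10 + "1" * 10 + "0" * 12

# Write the sample to a temporary text file, one image row per line.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as f:
    f.write("\n".join(rows) + "\n")

vec = img2vector(path)
os.remove(path)

print(vec.shape)             # shape of the returned array
print(vec[0, 32 * 3 + 10])   # pixel at row 3, column 10
print(int(vec.sum()))        # number of pixels set to 1
```

The index arithmetic 32*i+j is what flattens the image row by row, so the first 32 entries of the vector are the first line of the file, the next 32 the second line, and so on.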

At the Python command line, enter the following commands to test the img2vector function:

In [2]: import knn
Backend TkAgg is interactive backend. Turning interactive mode on.
In [3]: testvector = knn.img2vector('testDigits/0_13.txt')
In [4]: testvector[0, 0:31]
Out[4]:
array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  1.,  1.,  1.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.])

In [5]: testvector[0, 32:63]
Out[5]:
array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  1.,
        1.,  1.,  1.,  1.,  1.,  1.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.])
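Slicing the vector shows one image row at a time. For the data check in step (3), it can be easier to look at the whole digit at once. The sketch below is an illustration rather than code from the original article: the helper name vector_to_text is hypothetical, and the vector is a synthetic stand-in (a filled rectangle) so that no data files are needed.

```python
import numpy as np

# Synthetic stand-in for an img2vector result: a 32x32 grid with a filled
# rectangle, flattened row by row into a 1x1024 vector.
grid = np.zeros((32, 32))
grid[8:24, 12:20] = 1.0
vector = grid.reshape(1, 1024)

def vector_to_text(vec):
    """Render a 1x1024 vector of 0/1 values as 32 lines of 32 characters."""
    rows = vec.reshape(32, 32).astype(int)
    return "\n".join("".join(str(v) for v in row) for row in rows)

text = vector_to_text(vector)
print(text)
```

With a real digit, the printed block should look like the original text file, which is a quick way to confirm that the conversion did not shift or drop any rows.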

Testing the algorithm: using the k-nearest neighbor algorithm to recognize handwritten digits

The function handwritingclasstest() below is self-contained code for testing the classifier. Before adding it, put from os import listdir at the top of the file; this imports the function listdir from the os module, which lists the file names in a given directory.

# Test code for the handwritten digit recognition system
def handwritingclasstest():
    hwlabels = []
    # Get the contents of the 'trainingDigits' directory
    trainingfilelist = listdir('trainingDigits')
    m = len(trainingfilelist)                    # number of training files
    trainingmat = zeros((m, 1024))
    for i in range(m):
        # Parse the digit from the file name, e.g. '0_13.txt' -> 0
        filenamestr = trainingfilelist[i]
        filestr = filenamestr.split('.')[0]      # split on '.' and take the first part
        classnumstr = int(filestr.split('_')[0])
        hwlabels.append(classnumstr)             # store the label in the hwlabels list
        trainingmat[i, :] = img2vector('trainingDigits/%s' % filenamestr)
    testfilelist = listdir('testDigits')
    errorcount = 0.0
    mtest = len(testfilelist)
    for i in range(mtest):
        filenamestr = testfilelist[i]
        filestr = filenamestr.split('.')[0]
        classnumstr = int(filestr.split('_')[0])
        vectorundertest = img2vector('testDigits/%s' % filenamestr)
        classifierresult = classify0(vectorundertest, trainingmat, hwlabels, 3)
        print("the classifier came back with: %d, the real answer is: %d" % (classifierresult, classnumstr))
        if classifierresult != classnumstr:
            errorcount += 1
    print("\nthe total number of errors is: %d" % errorcount)
    print("\nthe total error rate is: %f" % (errorcount / float(mtest)))
    return trainingmat, hwlabels   # returned so handwritingclass() can reuse them
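The test function calls classify0, which comes from the earlier examples in this series and is not repeated in this part. For readers who do not have it at hand, here is a minimal sketch of what such a function might look like. It is an assumption, not the author's code: it uses Euclidean distance and a majority vote among the k closest training rows, with the argument order taken from the call above (input vector, training matrix, labels, k). The toy data at the bottom is made up for illustration.

```python
import operator
from numpy import array

def classify0(inx, dataset, labels, k):
    """Return the majority label among the k training rows closest to inx."""
    # Euclidean distance from inx to every row (NumPy broadcasts inx over rows).
    diffmat = dataset - inx
    distances = ((diffmat ** 2).sum(axis=1)) ** 0.5
    sortedindices = distances.argsort()
    # Count the labels of the k nearest rows.
    classcount = {}
    for i in range(k):
        votelabel = labels[sortedindices[i]]
        classcount[votelabel] = classcount.get(votelabel, 0) + 1
    # Pick the label with the most votes.
    sortedclasscount = sorted(classcount.items(),
                              key=operator.itemgetter(1), reverse=True)
    return sortedclasscount[0][0]

# Toy data: two points near (1, 1) labeled 1, two points near (0, 0) labeled 0.
group = array([[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]])
toylabels = [1, 1, 0, 0]

near_origin = classify0(array([[0.0, 0.0]]), group, toylabels, 3)
near_one = classify0(array([[1.0, 1.2]]), group, toylabels, 3)
print(near_origin, near_one)
```

Because k-nearest neighbor has no training step, all the work happens at query time: each of the test vectors is compared against every row of the training matrix, which is why the test run is slow on the full data set.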

Result output:

In [20]: reload(knn)
Out[20]: <module 'knn' from '/home/vickyleexy/pycharmprojects/handwriting_knn/knn.py'>
In [ ]: knn.handwritingclasstest()
the classifier came back with: 6, the real answer is: 6
the classifier came back with: 4, the real answer is: 4
the classifier came back with: 6, the real answer is: 6
the classifier came back with: 2, the real answer is: 2
the classifier came back with: 5, the real answer is: 5
the classifier came back with: 1, the real answer is: 1
the classifier came back with: 3, the real answer is: 3
the classifier came back with: 1, the real answer is: 1
the classifier came back with: 2, the real answer is: 2
the classifier came back with: 1, the real answer is: 1
the classifier came back with: 1, the real answer is: 1
the classifier came back with: 7, the real answer is: 7
the classifier came back with: 0, the real answer is: 0
...
the classifier came back with: 5, the real answer is: 5
the classifier came back with: 7, the real answer is: 7
the classifier came back with: 9, the real answer is: 9
the classifier came back with: 0, the real answer is: 0
the classifier came back with: 8, the real answer is: 8
the classifier came back with: 3, the real answer is: 3
the classifier came ... answer is: 7

the total number of errors is: 11

the total error rate is: 0.011628
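The two totals in the output can be checked against each other, since the error rate is the number of errors divided by the number of test files. The output does not print the number of test files; the value computed below is inferred from the two reported figures and is an assumption, not a number stated in the article.

```python
errors = 11               # reported total number of errors
reported_rate = 0.011628  # reported total error rate

# Inferred number of test files (an assumption: not printed in the output).
mtest = int(round(errors / reported_rate))

# Recompute the rate the same way the test function does.
rate = errors / float(mtest)
print("%d test files, error rate %f" % (mtest, rate))
```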

The error rate is about 1.2%. Changing the value of k, modifying handwritingclasstest() to select the training samples at random, or changing the number of training samples will each affect the error rate of the k-nearest neighbor algorithm.

Calling the handwriting recognition system

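# Calling code for handwritten digit recognition
def handwritingclass():
    filename = 'testDigits/0_2.txt'   # author's note (truncated in the source): do not know how to open the new ...
    vector = img2vector(filename)
    # Relies on handwritingclasstest() returning (trainingmat, hwlabels)
    trainingmat, hwlabels = handwritingclasstest()
    classifierresult = classify0(vector, trainingmat, hwlabels, 3)
    print("The handwriting number is: %d" % classifierresult)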
