KNN Algorithm

The book I am studying is excellent entry-level material for machine learning. It explains the algorithm as follows: "There is a sample data set, also known as a training sample set, and every item in the sample set carries a label; that is, we know the correspondence between each piece of data in the sample set and its category. After new data without a label is entered, each feature of the new data is compared with the features of the data in the sample set, and the algorithm extracts the classification labels of the most similar (nearest-neighbor) data. Generally, we select only the K most similar items in the sample data set; this is the source of the K in the K-Nearest Neighbors algorithm, and K is usually an integer not greater than 20. Finally, the category that occurs most frequently among the K most similar items is taken as the category of the new data."
Advantages: high precision, insensitive to outliers, and no assumptions about the input data.
Disadvantages: high computational complexity and high space complexity.
Applicable data range: numeric or nominal.
Python implementation of the algorithm:
def knn(data, dataset, datalabel, k=3, similarity=sim_distance):
    # score every training sample by its distance to the query point
    scores = [(similarity(data, dataset[i]), datalabel[i]) for i in range(len(dataset))]
    # keep the k nearest neighbors
    sortedscore = sorted(scores, key=lambda d: d[0], reverse=False)
    scores = sortedscore[0:k]
    # vote: count how often each label appears among the k neighbors
    classcount = {}
    for score in scores:
        classcount[score[1]] = classcount.get(score[1], 0) + 1
    sortedclasscount = sorted(classcount.items(), key=lambda d: d[1], reverse=True)
    return sortedclasscount[0][0]
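To see the voting behave on something small, here is a minimal self-contained sketch: `sim_distance` (squared Euclidean distance, as in the full listing at the end of this post) and a compact `knn` are repeated so the snippet runs on its own, and the two-cluster toy data is purely hypothetical.

```python
def sim_distance(a, b):
    # squared Euclidean distance between two feature vectors
    return sum((a[i] - b[i]) ** 2 for i in range(len(a)))

def knn(data, dataset, datalabel, k=3, similarity=sim_distance):
    # take the k (distance, label) pairs with the smallest distance
    scores = sorted((similarity(data, dataset[i]), datalabel[i])
                    for i in range(len(dataset)))[:k]
    # majority vote among the k nearest labels
    classcount = {}
    for _, label in scores:
        classcount[label] = classcount.get(label, 0) + 1
    return max(classcount.items(), key=lambda d: d[1])[0]

# hypothetical toy 2-D data: two clusters labeled 'A' and 'B'
dataset = [[1.0, 1.1], [1.0, 1.0], [0.0, 0.0], [0.0, 0.1]]
labels = ['A', 'A', 'B', 'B']
print(knn([0.1, 0.1], dataset, labels, k=3))  # → B
print(knn([0.9, 1.0], dataset, labels, k=3))  # → A
```

With k=3, the query near the origin picks up both 'B' points plus one 'A' point, and the 2-to-1 vote returns 'B'.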
The following steps are used to study this algorithm:
(1) Prepare the data
(2) Test the algorithm
First, we introduce a handwriting recognition system. For simplicity, the system recognizes only the digits 0-9. The digits to be recognized have already been processed with graphics software into images of uniform color and size: 32*32-pixel black-and-white images. The trainingdigits directory contains about 2000 training samples, and the testdigits directory contains about 900 test samples.
Step 1: Prepare the data: convert the image data into a vector. This step converts each 32*32 binary image matrix into a 1*1024 vector.
def img2vector(filename):
    vec = []
    with open(filename) as f:
        # each image file holds 32 lines of 32 characters ('0' or '1')
        for i in range(32):
            line = f.readline()
            for j in range(32):
                vec.append(int(line[j]))
    return vec
Step 2: Test the algorithm's accuracy. We use the training samples in the trainingdigits directory to classify the samples in the testdigits directory and compute the error rate.
def test():
    traindata, trainlabel = [], []
    trainfilelist = os.listdir('digits/trainingdigits/')
    for filename in trainfilelist:
        traindata.append(img2vector('digits/trainingdigits/%s' % filename))
        # the digit label is encoded in the file name, e.g. "9_45.txt"
        trainlabel.append(int(filename.split('_')[0]))
    succcnt, failcnt = 0, 0
    testfilelist = os.listdir('digits/testdigits')
    for filename in testfilelist:
        data = img2vector('digits/testdigits/%s' % filename)
        num = knn(data, traindata, trainlabel)
        if num == int(filename.split('_')[0]):
            succcnt += 1
            print('succ')
        else:
            failcnt += 1
            print('fail')
    print("error rate is: %f" % (failcnt / float(failcnt + succcnt)))
I tested this; with K at its default value of 3, the error rate is 0.013742.
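The error rate depends on the choice of K, so it is worth sweeping a few values. Since the digits data set is not included here, this sketch uses a hypothetical synthetic stand-in (two Gaussian clusters); the loop structure is what matters, not the exact rates.

```python
import random

def sim_distance(a, b):
    # squared Euclidean distance
    return sum((a[i] - b[i]) ** 2 for i in range(len(a)))

def knn(data, dataset, datalabel, k=3, similarity=sim_distance):
    scores = sorted((similarity(data, dataset[i]), datalabel[i])
                    for i in range(len(dataset)))[:k]
    classcount = {}
    for _, label in scores:
        classcount[label] = classcount.get(label, 0) + 1
    return max(classcount.items(), key=lambda d: d[1])[0]

random.seed(0)
# synthetic stand-in for train/test data: clusters centered at 0 and 1
train = [([random.gauss(c, 0.3), random.gauss(c, 0.3)], c)
         for c in (0, 1) for _ in range(50)]
test_pts = [([random.gauss(c, 0.3), random.gauss(c, 0.3)], c)
            for c in (0, 1) for _ in range(20)]
traindata = [p for p, _ in train]
trainlabel = [l for _, l in train]

rates = {}
for k in (1, 3, 5, 7):
    errors = sum(1 for p, l in test_pts
                 if knn(p, traindata, trainlabel, k=k) != l)
    rates[k] = errors / len(test_pts)
    print('k=%d  error rate=%.3f' % (k, rates[k]))
```

On the real digits data the same loop would replace the synthetic points with `img2vector` output, at the cost of one full pass over the test set per K.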
Since I cannot upload files, the complete code is pasted below; the test data can be found in chapter 2 at http://download.csdn.net/detail/wyb_009/5649337.
import os

def sim_distance(a, b):
    # squared Euclidean distance between two feature vectors
    return sum(pow(a[i] - b[i], 2) for i in range(len(a)))

def knn(data, dataset, datalabel, k=3, similarity=sim_distance):
    scores = [(similarity(data, dataset[i]), datalabel[i]) for i in range(len(dataset))]
    sortedscore = sorted(scores, key=lambda d: d[0], reverse=False)
    scores = sortedscore[0:k]
    classcount = {}
    for score in scores:
        classcount[score[1]] = classcount.get(score[1], 0) + 1
    sortedclasscount = sorted(classcount.items(), key=lambda d: d[1], reverse=True)
    return sortedclasscount[0][0]

def img2vector(filename):
    vec = []
    with open(filename) as f:
        for i in range(32):
            line = f.readline()
            for j in range(32):
                vec.append(int(line[j]))
    return vec

def test():
    traindata, trainlabel = [], []
    trainfilelist = os.listdir('digits/trainingdigits/')
    for filename in trainfilelist:
        traindata.append(img2vector('digits/trainingdigits/%s' % filename))
        trainlabel.append(int(filename.split('_')[0]))
    print("load train data OK")
    succcnt, failcnt = 0, 0
    testfilelist = os.listdir('digits/testdigits')
    for filename in testfilelist:
        data = img2vector('digits/testdigits/%s' % filename)
        num = knn(data, traindata, trainlabel)
        if num == int(filename.split('_')[0]):
            succcnt += 1
            print('succ')
        else:
            failcnt += 1
            print('fail: knn got %d, real is %d' % (num, int(filename.split('_')[0])))
    print("error rate is: %f" % (failcnt / float(failcnt + succcnt)))

if __name__ == "__main__":
    test()