kNN algorithm: Python implementation and simple digit recognition
Advantages and disadvantages of the kNN algorithm. Advantages: high accuracy, insensitive to outliers, and no assumptions about the input data. Disadvantages: both the time complexity and the space complexity are high. Applicable data range: numeric and nominal values.

Idea of the algorithm: kNN (k-Nearest Neighbor) is very simple. Simply put, it rests on the idea that similar things cluster together: given a target sample, find the k samples in the known training set that are closest to it, look at which class is most common among those k neighbors, and use that class as the classification result.

Function parsing: the library function tile(). tile(A, n) repeats A n times:

>>> import numpy as np
>>> a = np.array([0, 1, 2])
>>> np.tile(a, 2)
array([0, 1, 2, 0, 1, 2])
>>> np.tile(a, (2, 2))
array([[0, 1, 2, 0, 1, 2],
       [0, 1, 2, 0, 1, 2]])
>>> np.tile(a, (2, 1, 2))
array([[[0, 1, 2, 0, 1, 2]],
       [[0, 1, 2, 0, 1, 2]]])
>>> B = np.array([[1, 2], [3, 4]])
>>> np.tile(B, 2)
array([[1, 2, 1, 2],
       [3, 4, 3, 4]])
>>> np.tile(B, (2, 1))
array([[1, 2],
       [3, 4],
       [1, 2],
       [3, 4]])

Self-implemented functions: createDataSet() generates a small test dataset; kNNclassify(inputX, dataSet, labels, k) is the classification function, where inputX is the input vector, dataSet is the training set, labels are the training set labels, and k is the number of nearest neighbors.

# coding=utf-8
from numpy import *
import operator

def createDataSet():
    group = array([[1.0, 0.9], [1.0, 1.0], [0.1, 0.2], [0.0, 0.1]])
    labels = ['A', 'A', 'B', 'B']
    return group, labels

# inputX: the input vector (the sample we want to classify)
# dataSet: the training samples
# labels: the training sample labels
# k: the nearest-neighbor parameter, i.e. how many neighbors to consider
def kNNclassify(inputX, dataSet, labels, k):
    dataSetSize = dataSet.shape[0]  # number of training samples
    # compute the Euclidean distances
    diffMat = tile(inputX, (dataSetSize, 1)) - dataSet
    sqDiffMat = diffMat ** 2
    sqDistances = sqDiffMat.sum(axis=1)  # sum over each row of the matrix
    distances = sqDistances ** 0.5       # the Euclidean distances
    sortedDistance = distances.argsort()
    classCount = {}
    for i in xrange(k):
        voteLabel = labels[sortedDistance[i]]
        classCount[voteLabel] = classCount.get(voteLabel, 0) + 1
    res = max(classCount, key=classCount.get)  # the label with the most votes
    return res

def main():
    group, labels = createDataSet()
    t = kNNclassify([0.2, 0.3], group, labels, 3)  # example query point (the original input was left blank)
    print t

if __name__ == '__main__':
    main()

kNN application instance: a handwriting recognition system.

Dataset: two datasets, training and test. The classification label is encoded in the file name, and each sample is a 32*32 pixel image stored as a text file.

Method: the same kNN classifier is used, but the distance computation is heavier (1024 features per sample). The main problem to solve is how to read the data into vectors; once that is done, kNNclassify can be called directly for the comparison.

Speed: the speed is still relatively slow.
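Because each sample has 1024 features, the distance computation dominates the running time. One possible way to shave a little time and memory (not part of the original code, just a sketch) is to rely on numpy broadcasting instead of tile(), which avoids materializing the tiled copy of the input; the helper name fastClassify below is illustrative only.

# Not from the original post: a sketch of the same per-query classification
# using numpy broadcasting instead of tile(). fastClassify is an illustrative name.
from numpy import asarray

def fastClassify(inputX, dataSet, labels, k):
    diff = asarray(dataSet) - asarray(inputX)   # broadcasting replaces tile()
    distances = (diff ** 2).sum(axis=1) ** 0.5  # Euclidean distance to every training sample
    classCount = {}
    for i in distances.argsort()[:k]:           # indices of the k nearest neighbors
        voteLabel = labels[i]
        classCount[voteLabel] = classCount.get(voteLabel, 0) + 1
    return max(classCount, key=classCount.get)  # majority vote among the k neighbors

With the toy dataset from createDataSet(), fastClassify([0.2, 0.3], group, labels, 3) should return the same label as kNNclassify.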
Here the dataset is about 2,000 training samples and 900+ test samples; on an i5 CPU with k = 3 the run takes 32 s+.

# coding=utf-8
from numpy import *
import operator
import os
import time

def createDataSet():
    group = array([[1.0, 0.9], [1.0, 1.0], [0.1, 0.2], [0.0, 0.1]])
    labels = ['A', 'A', 'B', 'B']
    return group, labels

# inputX: the input vector (the sample we want to classify)
# dataSet: the training samples
# labels: the training sample labels
# k: the nearest-neighbor parameter, i.e. how many neighbors to consider
def kNNclassify(inputX, dataSet, labels, k):
    dataSetSize = dataSet.shape[0]  # number of training samples
    # compute the Euclidean distances
    diffMat = tile(inputX, (dataSetSize, 1)) - dataSet
    # alternative: diffMat = inputX.repeat(dataSetSize, axis=0) - dataSet
    sqDiffMat = diffMat ** 2
    sqDistances = sqDiffMat.sum(axis=1)  # sum over each row of the matrix
    distances = sqDistances ** 0.5       # the Euclidean distances
    sortedDistance = distances.argsort()
    classCount = {}
    for i in xrange(k):
        voteLabel = labels[sortedDistance[i]]
        classCount[voteLabel] = classCount.get(voteLabel, 0) + 1
    res = max(classCount, key=classCount.get)  # the label with the most votes
    return res

# read one 32*32 text image into a 1*1024 vector
def img2vec(filename):
    returnVec = zeros((1, 1024))
    fr = open(filename)
    for i in range(32):
        lineStr = fr.readline()
        for j in range(32):
            returnVec[0, 32 * i + j] = int(lineStr[j])
    return returnVec

def handwritingClassTest(trainingFloder, testFloder, k):
    hwLabels = []
    trainingFileList = os.listdir(trainingFloder)
    m = len(trainingFileList)
    trainingMat = zeros((m, 1024))
    for i in range(m):
        fileName = trainingFileList[i]
        fileStr = fileName.split('.')[0]
        classNumStr = int(fileStr.split('_')[0])  # the label is the part of the file name before '_'
        hwLabels.append(classNumStr)
        trainingMat[i, :] = img2vec(trainingFloder + '/' + fileName)
    testFileList = os.listdir(testFloder)
    errorCount = 0.0
    mTest = len(testFileList)
    for i in range(mTest):
        fileName = testFileList[i]
        fileStr = fileName.split('.')[0]
        classNumStr = int(fileStr.split('_')[0])
        vectorUnderTest = img2vec(testFloder + '/' + fileName)
        classifierResult = kNNclassify(vectorUnderTest, trainingMat, hwLabels, k)
        # print classifierResult, ' ', classNumStr
        if classifierResult != classNumStr:
            errorCount += 1
    print 'total error', errorCount
    print 'error rate', errorCount / mTest

def main():
    t1 = time.clock()
    handwritingClassTest('trainingdigits', 'testdigits', 3)
    t2 = time.clock()
    print 'execute', t2 - t1

if __name__ == '__main__':
    main()
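To run the script, the two folders passed to handwritingClassTest() ('trainingdigits' and 'testdigits' here) should sit next to it and contain one text file per digit image, named so that the true label comes before the underscore (the code recovers it with fileStr.split('_')[0]). A quick sanity check of img2vec is sketched below; the file name 3_107.txt is only a hypothetical example, not a file guaranteed to exist in your copy of the dataset.

# Hypothetical usage sketch: check that one digit file is read correctly.
# '3_107.txt' is an example name; substitute any file actually present in testdigits/.
vec = img2vec('testdigits/3_107.txt')
print vec.shape        # should be (1, 1024)
print int(vec.sum())   # number of non-zero pixels in the 32*32 image

The full evaluation is then handwritingClassTest('trainingdigits', 'testdigits', 3), exactly as main() does.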