Feature set analysis
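The dataset is letter-recognition.data. It contains 20,000 records, one per line, with comma-separated fields. A sample record is shown below: the first column is the letter label and the remaining 16 columns are the features.

T,2,8,3,5,1,8,13,0,6,6,10,8,0,8,0,8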
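As a quick check of that layout, here is a minimal stdlib-only sketch that parses the sample record into a numeric label and a feature list. The helper name `parse_record` is hypothetical. It maps the letter with `ord(ch) - ord('A')`, the same trick the `np.loadtxt` converter uses in the code, so 'A' becomes 0 and 'T' becomes 19.

```python
def parse_record(line):
    """Hypothetical helper: split one comma-separated record into (label, features)."""
    fields = line.strip().split(',')
    # Map the letter label to an integer class: 'A' -> 0, ..., 'Z' -> 25
    label = ord(fields[0]) - ord('A')
    # The remaining fields are the integer-valued features, kept as floats
    features = [float(v) for v in fields[1:]]
    return label, features


label, features = parse_record("T,2,8,3,5,1,8,13,0,6,6,10,8,0,8,0,8")
print(label, len(features))  # 19 16
```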
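Approach
1. Read in the data, splitting each record on the commas, and divide the 20,000 records into two halves: the first 10,000 for training and the last 10,000 for testing.
2. Use the first column as the label and the remaining columns as the feature data.
3. Initialize the classifier and train it on the training data.
4. Measure the accuracy on the test data.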
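The train/test and label/feature splitting relies on two NumPy calls used in the code: `np.vsplit(data, 2)` cuts the rows into two equal halves, and `np.hsplit(arr, [1])` cuts the columns at index 1, separating the label column from the feature columns. The toy 4-row array below is made up purely for illustration; on the real file the same calls produce the (10000, 17), (10000, 1) and (10000, 16) shapes shown in the results.

```python
import numpy as np

# Made-up stand-in for the loaded data: column 0 = label, columns 1.. = features
data = np.array([[19, 2, 8],
                 [8, 5, 12],
                 [3, 4, 11],
                 [13, 7, 11]], dtype=np.float32)

# Rows: first half -> training set, second half -> test set
train, test = np.vsplit(data, 2)

# Columns: split at index 1 -> (labels as an (N, 1) column, feature matrix)
response, trainData = np.hsplit(train, [1])
restest, testData = np.hsplit(test, [1])

print(train.shape, test.shape)          # (2, 3) (2, 3)
print(response.shape, trainData.shape)  # (2, 1) (2, 2)
```

`np.vsplit(data, 2)` raises a `ValueError` if the row count does not divide evenly; 20,000 does. Keeping the labels as an (N, 1) column is the same layout the code later passes to `knn.train`.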
Code
```python
# Python 2 + OpenCV 2.4 (cv2.KNearest API)
import cv2
import numpy as np

print 'load data'
# Column 0 is the letter label; the converter maps 'A'..'Z' to 0..25
data = np.loadtxt('letter-recognition.data', dtype='float32', delimiter=',',
                  converters={0: lambda ch: ord(ch) - ord('A')})

print 'split as train,test'
# First 10000 rows for training, last 10000 rows for testing
train, test = np.vsplit(data, 2)
print 'train.shape:\t', train.shape
print 'test.shape:\t', test.shape

print 'split train as the response,trainData'
# Column 0 -> labels (response), columns 1..16 -> features
response, trainData = np.hsplit(train, [1])
print 'response.shape:\t', response.shape
print 'trainData.shape:\t', trainData.shape

print 'split the test as response,trainData'
restest, testData = np.hsplit(test, [1])

print 'Init the knn'
# OpenCV 3+: knn = cv2.ml.KNearest_create(); knn.train(trainData, cv2.ml.ROW_SAMPLE, response)
knn = cv2.KNearest()
knn.train(trainData, response)

print 'test the knn'
# Classify each test sample by its 5 nearest neighbours
# OpenCV 3+: ret, result, neighbours, dist = knn.findNearest(testData, 5)
ret, result, neighbours, dist = knn.find_nearest(testData, 5)

print 'the rate:'
correct = np.count_nonzero(result == restest)
# Divide by the test-set size (10000) instead of hard-coding it
accuracy = correct * 100.0 / result.shape[0]
print 'accuracy is', accuracy, '%'
```
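To make the `find_nearest(testData, 5)` call less of a black box, here is a minimal NumPy sketch of brute-force k-nearest-neighbour classification: compute the Euclidean distance from each test sample to every training sample, take the k closest, and return their majority label. This is an illustrative reimplementation, not OpenCV's code. The function name `knn_predict`, the batch size, and the tie-breaking rule (smallest label wins, via `np.bincount(...).argmax()`) are assumptions of this sketch, so its accuracy on the real data may differ slightly from the figure reported by OpenCV.

```python
import numpy as np


def knn_predict(train_x, train_y, test_x, k=5, batch=100):
    """Hypothetical helper: brute-force k-NN with Euclidean distance and majority vote.

    train_x: (n_train, d) features; train_y: (n_train,) non-negative integer labels;
    test_x: (n_test, d) features. Returns an (n_test,) array of predicted labels.
    """
    preds = []
    # Process test samples in batches so the (batch, n_train, d) difference array stays small
    for start in range(0, len(test_x), batch):
        chunk = test_x[start:start + batch]
        # Squared Euclidean distances; the square root is skipped because it
        # does not change which training samples are nearest
        d2 = ((chunk[:, None, :] - train_x[None, :, :]) ** 2).sum(axis=2)
        # Indices of the k nearest training samples (unordered among themselves)
        nearest = np.argpartition(d2, k - 1, axis=1)[:, :k]
        for labels in train_y[nearest]:
            # Majority vote among the k neighbour labels; ties go to the smallest label
            preds.append(np.bincount(labels).argmax())
    return np.array(preds)


# Toy usage: two well-separated clusters labelled 0 and 1 (made-up data)
train_x = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=np.float32)
train_y = np.array([0, 0, 0, 1, 1, 1])
test_x = np.array([[0.5, 0.5], [10.5, 10.5]], dtype=np.float32)
print(knn_predict(train_x, train_y, test_x, k=3))  # [0 1]
```

On the real data the labels must be integers for `np.bincount`, e.g. `knn_predict(trainData, response.ravel().astype(int), testData, k=5)`. Batching trades a little speed for memory: a full 10000 x 10000 x 16 difference array would need several gigabytes.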
Results
```text
load data
split as train,test
train.shape: (10000, 17)
test.shape: (10000, 17)
split train as the response,trainData
response.shape: (10000, 1)
trainData.shape: (10000, 16)
split the test as response,trainData
Init the knn
test the knn
the rate:
accuracy is 93.22 %
```
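The reported accuracy is simply the fraction of test samples whose predicted label matches the true label. With $N = 10000$ test samples, 93.22% corresponds to 9322 correctly classified letters:

```latex
\text{accuracy} = \frac{\#\{\, i : \hat{y}_i = y_i \,\}}{N} \times 100\%
                = \frac{9322}{10000} \times 100\% = 93.22\%
```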
Dataset
http://download.csdn.net/detail/licong_carp/8612383
[Python-OpenCV] KNN English Letter Recognition