OpenCV Python Study Notes (8): Character Recognition with Classifiers (SVM, KNearest, RTrees, Boost, MLP) — Python


OpenCV provides several classifiers; the sample program illustrates them through character recognition.

1. Support Vector Machine (SVM): given training samples, a support vector machine constructs a hyperplane as the decision surface, so that the margin between the positive and negative examples is maximized.

Function prototype: cv2.SVM.train(trainData, responses[, varIdx[, sampleIdx[, params]]])

                   where trainData is the training data and responses holds the corresponding class labels.
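The decision-surface idea can be illustrated without OpenCV: once trained, a linear SVM classifies a sample by the sign of w·x + b. A minimal NumPy sketch, where the weight vector w and bias b are hand-picked for illustration rather than learned:

```python
import numpy as np

def svm_decision(w, b, x):
    """Linear SVM decision rule: sign of the signed distance to the hyperplane w.x + b = 0."""
    return 1 if np.dot(w, x) + b >= 0 else -1

# Hypothetical separating hyperplane x0 + x1 - 1 = 0
w, b = np.array([1.0, 1.0]), -1.0

print(svm_decision(w, b, np.array([2.0, 2.0])))  # point on the positive side -> 1
print(svm_decision(w, b, np.array([0.0, 0.0])))  # point on the negative side -> -1
```

Training consists of choosing w and b so that this margin is as wide as possible; the sketch only shows the classification step.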

2. K-Nearest Neighbors (KNearest): k-NN is a lazy learning method, and it is computationally intensive on large data sets. Nearest-neighbor methods are based on learning by analogy: a given test tuple is compared with the training tuples similar to it, where each training tuple is described by n attributes. Given an unknown tuple, k-NN finds the k training tuples closest to it, and the unknown tuple is assigned to the class most common among those k nearest neighbors.

Function prototype: cv2.KNearest.train(trainData, responses[, sampleIdx[, isRegression[, maxK[, updateBase]]]])

                   where trainData is the training data, responses holds the corresponding labels, isRegression selects regression versus classification, and maxK is the maximum number of neighbors.
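The rule described above fits in a few lines of plain NumPy — find the k closest training tuples and take a majority vote. A minimal sketch on toy data (the two clusters are made up for illustration):

```python
import numpy as np

def knn_predict(train_x, train_y, x, k=3):
    """Assign x the majority class among its k nearest training tuples (Euclidean distance)."""
    dists = np.linalg.norm(train_x - x, axis=1)  # distance from x to every training tuple
    nearest = train_y[np.argsort(dists)[:k]]     # labels of the k closest tuples
    return np.bincount(nearest).argmax()         # most common label among them wins

# Two toy clusters: class 0 near the origin, class 1 near (5, 5)
train_x = np.array([[0., 0.], [0., 1.], [1., 0.], [5., 5.], [5., 6.], [6., 5.]])
train_y = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(train_x, train_y, np.array([0.5, 0.5])))  # -> 0
print(knn_predict(train_x, train_y, np.array([5.5, 5.5])))  # -> 1
```

Note that no "training" happens beyond storing the data — which is exactly why the method is called lazy and why prediction is the expensive step.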

3. Random Trees (RTrees): at each node of an individual decision tree, the split is chosen from a randomly selected subset of the attributes. Each tree depends on an independently sampled random vector, drawn with the same distribution for all trees in the forest. For classification, every tree casts a vote and the class with the most votes is returned.

Function prototype: cv2.RTrees.train(trainData, tflag, responses[, varIdx[, sampleIdx[, varType[, missingDataMask[, params]]]]])

                   where trainData is the training data, responses holds the corresponding labels, and tflag indicates whether each feature vector is stored as a row or a column.
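The voting step at the end of the description is easy to sketch in isolation. Here each "tree" is faked as a fixed per-tree prediction (an assumption; real trees would each be trained on their own bootstrap sample), and the forest returns the majority class:

```python
import numpy as np

def forest_vote(tree_predictions):
    """Return the class receiving the most votes across the trees of the forest."""
    return np.bincount(np.asarray(tree_predictions)).argmax()

# Hypothetical votes from five trees for one sample: class 2 wins 3-to-2
print(forest_vote([2, 0, 2, 1, 2]))  # -> 2
```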

4. Boosting (Boost): a weight is assigned to each training tuple. k classifiers are learned iteratively; after classifier Mi is learned, the weights are updated so that the next classifier, Mi+1, pays more attention to the tuples that Mi misclassified. AdaBoost is a popular boosting algorithm. Given a data set D containing d class-labeled tuples, each training tuple initially receives an equal weight of 1/d. Generating the combined classifier takes k rounds. In round i, tuples are sampled from D with replacement to form a training set Di of size d — the same tuple may be selected more than once — and each tuple's chance of being selected is determined by its weight. Classifier Mi is derived from Di, and its error is then computed using Di as a test set. The weight of every misclassified tuple is increased and the weight of every correctly classified tuple is decreased, so the higher a tuple's weight, the more often it has been misclassified. These weights are used to sample the training set for the next round.

函數原型:cv2.Boost.train(trainData, tflag, responses[, varIdx[, sampleIdx[, varType[, missingDataMask[, params[, update]]]]]])
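The re-weighting arithmetic in the paragraph above can be checked directly. This sketch runs one AdaBoost round using the textbook update rule (multiply the weights of correctly classified tuples by err/(1-err), then normalize), which matches the scheme summarized above:

```python
import numpy as np

def adaboost_reweight(weights, correct):
    """One AdaBoost round: shrink weights of correctly classified tuples, then normalize."""
    err = weights[~correct].sum()         # weighted error of this round's classifier
    new_w = weights.copy()
    new_w[correct] *= err / (1.0 - err)   # correctly classified tuples lose weight
    return new_w / new_w.sum()            # normalization boosts the misclassified ones

w = np.full(4, 0.25)                           # four tuples, equal initial weight 1/d
correct = np.array([True, True, True, False])  # the last tuple was misclassified
w = adaboost_reweight(w, correct)
print(w)  # the misclassified tuple now carries weight 0.5
```

Here err = 0.25, so each correct weight shrinks to 0.25/3 before normalization; after normalizing, the misclassified tuple holds half the total weight and will dominate the sampling of the next round's training set.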

5. Multi-Layer Perceptron (MLP): the multi-layer perceptron was proposed to solve the nonlinear classification problems that a single-layer neural network cannot handle. The most popular method for training a multi-layer perceptron is back-propagation; the network maps multiple inputs to a single output, producing the classification result.

函數原型:cv2.ANN_MLP.train(inputs, outputs, sampleWeights[, sampleIdx[, params[, flags]]])
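What the trained network computes at prediction time is just a forward pass through the layers. A minimal NumPy sketch of a one-hidden-layer perceptron; the weights are random stand-ins (an assumption — a real network would learn them by back-propagation), and the layer sizes mirror the 2-input / 3-class shape used here for illustration:

```python
import numpy as np

def mlp_forward(x, W1, b1, W2, b2):
    """Forward pass of a one-hidden-layer perceptron with tanh hidden units."""
    h = np.tanh(W1 @ x + b1)  # hidden layer: the nonlinearity enables nonlinear boundaries
    return W2 @ h + b2        # output scores; argmax gives the predicted class

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 2)), np.zeros(4)  # 2 inputs  -> 4 hidden units
W2, b2 = rng.normal(size=(3, 4)), np.zeros(3)  # 4 hidden  -> 3 class scores

scores = mlp_forward(np.array([0.5, -1.0]), W1, b1, W2, b2)
print(int(scores.argmax()))  # index of the winning class
```

This mirrors the sample program's MLP: the letter classifier below builds layer_sizes = [var_n, 100, 100, 26] and takes the argmax of the 26 output scores.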

 

Program with comments (Python 2, OpenCV 2.x API):

# -*- coding: utf-8 -*-
import numpy as np
import cv2

def load_base(fn):
    # load the letter feature data, converting each letter to a numeric class label
    a = np.loadtxt(fn, np.float32, delimiter=',', converters={ 0 : lambda ch : ord(ch)-ord('A') })
    samples, responses = a[:,1:], a[:,0]  # features go to samples, class labels to responses
    return samples, responses

class LetterStatModel(object):
    class_n = 26
    train_ratio = 0.5

    def load(self, fn):
        self.model.load(fn)
    def save(self, fn):
        self.model.save(fn)

    def unroll_samples(self, samples):
        sample_n, var_n = samples.shape  # number of samples and feature dimensionality
        new_samples = np.zeros((sample_n * self.class_n, var_n+1), np.float32)
        new_samples[:,:-1] = np.repeat(samples, self.class_n, axis=0)
        new_samples[:,-1] = np.tile(np.arange(self.class_n), sample_n)
        return new_samples

    def unroll_responses(self, responses):
        sample_n = len(responses)
        new_responses = np.zeros(sample_n*self.class_n, np.int32)
        resp_idx = np.int32( responses + np.arange(sample_n)*self.class_n )
        new_responses[resp_idx] = 1
        return new_responses

class RTrees(LetterStatModel):
    def __init__(self):
        self.model = cv2.RTrees()

    def train(self, samples, responses):
        sample_n, var_n = samples.shape
        var_types = np.array([cv2.CV_VAR_NUMERICAL] * var_n + [cv2.CV_VAR_CATEGORICAL], np.uint8)
        #CvRTParams(10,10,0,false,15,0,true,4,100,0.01f,CV_TERMCRIT_ITER));
        params = dict(max_depth=10)
        self.model.train(samples, cv2.CV_ROW_SAMPLE, responses, varType = var_types, params = params)

    def predict(self, samples):
        return np.float32( [self.model.predict(s) for s in samples] )

class KNearest(LetterStatModel):
    def __init__(self):
        self.model = cv2.KNearest()

    def train(self, samples, responses):
        self.model.train(samples, responses)

    def predict(self, samples):
        retval, results, neigh_resp, dists = self.model.find_nearest(samples, k = 10)
        return results.ravel()

class Boost(LetterStatModel):
    def __init__(self):
        self.model = cv2.Boost()

    def train(self, samples, responses):
        sample_n, var_n = samples.shape
        new_samples = self.unroll_samples(samples)
        new_responses = self.unroll_responses(responses)
        var_types = np.array([cv2.CV_VAR_NUMERICAL] * var_n + [cv2.CV_VAR_CATEGORICAL, cv2.CV_VAR_CATEGORICAL], np.uint8)
        #CvBoostParams(CvBoost::REAL, 100, 0.95, 5, false, 0 )
        params = dict(max_depth=5) #, use_surrogates=False)
        self.model.train(new_samples, cv2.CV_ROW_SAMPLE, new_responses, varType = var_types, params=params)

    def predict(self, samples):
        new_samples = self.unroll_samples(samples)
        pred = np.array( [self.model.predict(s, returnSum = True) for s in new_samples] )
        pred = pred.reshape(-1, self.class_n).argmax(1)
        return pred

class SVM(LetterStatModel):
    train_ratio = 0.1
    def __init__(self):
        self.model = cv2.SVM()

    def train(self, samples, responses):
        params = dict( kernel_type = cv2.SVM_LINEAR,
                       svm_type = cv2.SVM_C_SVC,
                       C = 1 )
        self.model.train(samples, responses, params = params)

    def predict(self, samples):
        return np.float32( [self.model.predict(s) for s in samples] )

class MLP(LetterStatModel):
    def __init__(self):
        self.model = cv2.ANN_MLP()

    def train(self, samples, responses):
        sample_n, var_n = samples.shape
        new_responses = self.unroll_responses(responses).reshape(-1, self.class_n)
        layer_sizes = np.int32([var_n, 100, 100, self.class_n])
        self.model.create(layer_sizes)

        # CvANN_MLP_TrainParams::BACKPROP,0.001
        params = dict( term_crit = (cv2.TERM_CRITERIA_COUNT, 300, 0.01),
                       train_method = cv2.ANN_MLP_TRAIN_PARAMS_BACKPROP,
                       bp_dw_scale = 0.001,
                       bp_moment_scale = 0.0 )
        self.model.train(samples, np.float32(new_responses), None, params = params)

    def predict(self, samples):
        ret, resp = self.model.predict(samples)
        return resp.argmax(-1)


if __name__ == '__main__':
    import getopt
    import sys

    models = [RTrees, KNearest, Boost, SVM, MLP] # NBayes
    models = dict( [(cls.__name__.lower(), cls) for cls in models] )  # map lower-cased class names to the classes

    print 'USAGE: letter_recog.py [--model <model>] [--data <data fn>] [--load <model fn>] [--save <model fn>]'
    print 'Models: ', ', '.join(models)
    print

    args, dummy = getopt.getopt(sys.argv[1:], '', ['model=', 'data=', 'load=', 'save='])
    args = dict(args)
    args.setdefault('--model', 'boost')
    args.setdefault('--data', '../letter-recognition.data')

    print 'loading data %s ...' % args['--data']
    samples, responses = load_base(args['--data'])
    Model = models[args['--model']]
    model = Model()

    train_n = int(len(samples)*model.train_ratio)  # number of samples used for training
    if '--load' in args:
        fn = args['--load']
        print 'loading model from %s ...' % fn
        model.load(fn)
    else:
        print 'training %s ...' % Model.__name__
        model.train(samples[:train_n], responses[:train_n])

    print 'testing...'
    train_rate = np.mean(model.predict(samples[:train_n]) == responses[:train_n])  # accuracy on the training half
    test_rate  = np.mean(model.predict(samples[train_n:]) == responses[train_n:])  # accuracy on the held-out half

    print 'train rate: %f  test rate: %f' % (train_rate*100, test_rate*100)
    if '--save' in args:
        fn = args['--save']
        print 'saving model to %s ...' % fn
        model.save(fn)
    cv2.destroyAllWindows()
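The unroll_responses helper in the listing turns integer class labels into one flattened one-hot block per sample (which is how the Boost and MLP models consume the labels); its effect can be checked without OpenCV:

```python
import numpy as np

def unroll_responses(responses, class_n=26):
    """Flattened one-hot encoding, as in LetterStatModel.unroll_responses."""
    sample_n = len(responses)
    new_responses = np.zeros(sample_n * class_n, np.int32)
    resp_idx = np.int32(responses + np.arange(sample_n) * class_n)  # position of each 1
    new_responses[resp_idx] = 1
    return new_responses

# Labels 0 ('A') and 2 ('C'), with class_n=4 instead of 26 for readability
out = unroll_responses(np.array([0, 2]), class_n=4)
print(out)  # [1 0 0 0 0 0 1 0]
```

Each sample occupies a block of class_n slots, with a single 1 at the index of its class — reshape(-1, class_n) recovers the usual one-hot matrix.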


 

                    
