Tags: python, sklearn, cross-validation, k-fold validation, k-fold
The k-fold validation in this post uses the StratifiedKFold class from Python's sklearn package.
For the idea behind the method, see: http://scikit-learn.org/stable/modules/cross_validation.html
StratifiedKFold is a variation of k-fold which returns stratified folds: each set contains approximately the same percentage of samples of each target class as the complete set.
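In short: StratifiedKFold spreads the samples of each class evenly across the folds, so every fold keeps roughly the same class proportions as the full dataset.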
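To make the stratification property concrete, here is a minimal sketch, assuming scikit-learn 0.18 or later (where StratifiedKFold lives in sklearn.model_selection). The 16 toy labels are invented for illustration: class 0 makes up half the data and classes 1 and 2 a quarter each, so every test fold should reproduce that 2:1:1 ratio.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy labels (invented for illustration): 8 samples of class 0, 4 of class 1, 4 of class 2
y = np.array([0] * 8 + [1] * 4 + [2] * 4)
X = np.zeros((len(y), 1))  # feature values do not affect how the folds are chosen

skf = StratifiedKFold(n_splits=4)
for fold, (train_idx, test_idx) in enumerate(skf.split(X, y)):
    # np.bincount counts how many test samples fall into each class
    print(fold, np.bincount(y[test_idx], minlength=3))
```

Each printed row should read `[2 1 1]`: two samples of class 0 and one each of classes 1 and 2 in every test fold.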
For the other splitting strategies, see: http://scikit-learn.org/stable/modules/cross_validation.html
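For contrast with one of those other splitters, the sketch below runs plain KFold on the same kind of toy labels (again invented, and sorted by class). Without shuffling, KFold cuts contiguous blocks of indices, so some test folds contain only one class, while StratifiedKFold keeps the class mix in every fold. It assumes scikit-learn 0.18 or later; the helper `fold_class_counts` is a hypothetical name introduced only for this example.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Toy labels sorted by class (invented for illustration): 8 x class 0, 4 x class 1, 4 x class 2
y = np.array([0] * 8 + [1] * 4 + [2] * 4)
X = np.zeros((len(y), 1))

def fold_class_counts(splitter):
    """Hypothetical helper: per-class counts of the test samples in each fold."""
    return [np.bincount(y[test], minlength=3).tolist()
            for _, test in splitter.split(X, y)]

kfold_counts = fold_class_counts(KFold(n_splits=4))           # no shuffling: contiguous blocks
strat_counts = fold_class_counts(StratifiedKFold(n_splits=4))  # class ratio kept per fold
print("KFold:          ", kfold_counts)
print("StratifiedKFold:", strat_counts)
```

Here plain KFold gives `[[4, 0, 0], [4, 0, 0], [0, 4, 0], [0, 0, 4]]`, so a model evaluated on the third fold would never see class 1 during training, while StratifiedKFold gives `[2, 1, 1]` for every fold.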
Enough talk; here is the code.
【Humble Source Code】
```python
import numpy
import h5py
# sklearn.cross_validation was removed in scikit-learn 0.20; StratifiedKFold now lives in model_selection
from sklearn.model_selection import StratifiedKFold

## Generate a random matrix and save it (uncomment and run once to create the data files)
# arr = numpy.random.random([200, 400])
# labvec = []
# for i in numpy.arange(0, 200):
#     j = i % 10
#     arr[i, j*20:j*20+20] = arr[i, j*20:j*20+20] + 10
#     labvec.append(j)
# arr = arr.T
# with h5py.File('arr.mat', 'w') as file:
#     file.create_dataset('arr', data=arr)
# with h5py.File('labvec.mat', 'w') as file:
#     file.create_dataset('labvec', data=labvec)

# Open the files in read mode
with h5py.File('arr.mat', 'r') as myfile:
    arr = myfile['arr'][:]
arr = arr.T  # back to 200 samples x 400 features

with h5py.File('labvec.mat', 'r') as myfile:
    labvec = myfile['labvec'][:]

# 4 stratified folds; split() yields (train indices, test indices) for each fold
skf = StratifiedKFold(n_splits=4)
train_set = []
test_set = []
for train, test in skf.split(arr, labvec):
    train_set.append(train)
    test_set.append(test)
```
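If you want to try the split without creating the .mat files first, the sketch below rebuilds the same kind of synthetic data in memory (200 samples, 10 classes of 20, each with a shifted block of features, mirroring the commented-out generator in the script) and shows how the stored index arrays slice out one fold. It assumes scikit-learn 0.18 or later and NumPy 1.17 or later (for `default_rng`); `shuffle=True` and `random_state=0` are illustrative additions, not part of the original script.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Synthetic data built in memory instead of read from arr.mat / labvec.mat
rng = np.random.default_rng(0)
arr = rng.random((200, 400))           # 200 samples x 400 features
labvec = np.arange(200) % 10           # 10 classes, 20 samples each
for i, j in enumerate(labvec):
    arr[i, j * 20:j * 20 + 20] += 10   # class-specific block of shifted features

# Shuffle within each class before assigning folds (illustrative choice)
skf = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)
train_set, test_set = [], []
for train, test in skf.split(arr, labvec):
    train_set.append(train)
    test_set.append(test)

# Each index array selects rows of arr and entries of labvec for that fold
X_train, y_train = arr[train_set[0]], labvec[train_set[0]]
X_test, y_test = arr[test_set[0]], labvec[test_set[0]]
print(X_train.shape, X_test.shape)
print(np.bincount(y_test))
```

With 200 samples and 4 folds, each test fold holds 50 samples, 5 from each of the 10 classes, and the remaining 150 form the training set.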
For details, see:
http://scikit-learn.org/stable/modules/cross_validation.html
Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission.
Python: sklearn Cross-Validation and Data Splitting