Cross-validation in sklearn

Last Update:2018-11-03 Source: Internet

Author: User

Sklearn (scikit-learn) is a comprehensive and widely used third-party machine learning library for Python. This post records how to use cross-validation in sklearn, mainly following the official sklearn document "Cross-validation: evaluating estimator performance". If you read English comfortably, I recommend reading the official document directly, since it covers the details thoroughly.

1. cross_val_score
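Runs cross-validation on the dataset a specified number of times and returns the score from each run.
By default, each fold is scored with the estimator's own score method (accuracy for a classifier such as SVC, R² for a regressor). To use a different metric, pass the scoring parameter to cross_val_score, for example scoring='f1_macro'. The accepted values for classification and regression are listed in the official documentation.

Custom metrics can be built from sklearn.metrics (from sklearn import metrics) and passed through the same scoring parameter.
When cv is an int, KFold or StratifiedKFold is used to split the dataset: StratifiedKFold when the estimator is a classifier and the target is binary or multiclass, KFold otherwise. Neither shuffles the data by default. KFold and StratifiedKFold are described below.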

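The sessions below assume that svm and datasets were imported from sklearn, that numpy was imported as np, and that iris = datasets.load_iris() was run earlier in the session.

In [15]: from sklearn.model_selection import cross_val_score
In [16]: clf = svm.SVC(kernel='linear', C=1)
In [17]: scores = cross_val_score(clf, iris.data, iris.target, cv=5)
In [18]: scores
Out[18]: array([ 0.96666667,  1.        ,  0.96666667,  0.96666667,  1.        ])
In [19]: scores.mean()
Out[19]: 0.98000000000000009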
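To make the role of the scoring parameter concrete, here is a minimal sketch that scores the same five folds twice: once with the estimator's default score and once with scoring="f1_macro". The variable names acc and f1 are illustrative and do not come from the original post.

```python
from sklearn import datasets, svm
from sklearn.model_selection import cross_val_score

iris = datasets.load_iris()
clf = svm.SVC(kernel="linear", C=1)

# No scoring argument: each fold is scored with clf.score,
# which is accuracy for a classifier.
acc = cross_val_score(clf, iris.data, iris.target, cv=5)

# Same estimator and folds, scored with macro-averaged F1 instead.
f1 = cross_val_score(clf, iris.data, iris.target, cv=5, scoring="f1_macro")

print("accuracy per fold:", acc)
print("f1_macro per fold:", f1)
```

Both calls return one score per fold, so the results can be compared fold by fold.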

Instead of the default splitter, you can pass your own cross-validation object as cv, which lets you control, for example, the number of splits and the proportion of training and test data.

In [20]: from sklearn.model_selection import ShuffleSplit
In [21]: n_samples = iris.data.shape[0]
In [22]: cv = ShuffleSplit(n_splits=3, test_size=.3, random_state=0)
In [23]: cross_val_score(clf, iris.data, iris.target, cv=cv)
Out[23]: array([ 0.97777778,  0.97777778,  1.        ])

2. cross_val_predict
cross_val_predict is very similar to cross_val_score, but the return value differs: it returns the estimator's prediction (class label or regression value) for each sample, made while that sample was in the test fold. These predictions can be compared with the true target values to pinpoint exactly which samples are mispredicted, which is useful for later model improvement, parameter tuning, and troubleshooting.

In [28]: from sklearn.model_selection import cross_val_predict
In [29]: from sklearn import metrics
In [30]: predicted = cross_val_predict(clf, iris.data, iris.target, cv=10)
In [31]: predicted
Out[31]:
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
In [32]: metrics.accuracy_score(iris.target, predicted)
Out[32]: 0.96666666666666667
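As a hedged illustration of using these predictions to locate errors, the sketch below lists the indices of mispredicted samples and builds a confusion matrix. The names wrong and cm are illustrative and not part of the original post.

```python
import numpy as np
from sklearn import datasets, metrics, svm
from sklearn.model_selection import cross_val_predict

iris = datasets.load_iris()
clf = svm.SVC(kernel="linear", C=1)

# One out-of-fold prediction per sample.
predicted = cross_val_predict(clf, iris.data, iris.target, cv=10)

# Indices of the samples whose prediction differs from the true label.
wrong = np.flatnonzero(predicted != iris.target)

# Rows are true classes, columns are predicted classes.
cm = metrics.confusion_matrix(iris.target, predicted)

print("mispredicted sample indices:", wrong)
print(cm)
```

The off-diagonal entries of the confusion matrix show which classes are confused with each other, and the index list points to the individual samples worth inspecting.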

3. KFold

K-fold cross-validation is the standard way to divide a dataset into k parts (folds). Each fold serves as the test set once while the remaining k-1 folds form the training set, so every sample appears in a test set exactly once and in a training set k-1 times. The test sets of different splits do not overlap, which is equivalent to sampling the test sets without replacement.

In [33]: from sklearn.model_selection import KFold
In [34]: X = ['a', 'b', 'c', 'd']
In [35]: kf = KFold(n_splits=2)
In [36]: for train, test in kf.split(X):
    ...:     print(train, test)
    ...:     print(np.array(X)[train], np.array(X)[test])
    ...:     print('\n')
    ...:
[2 3] [0 1]
['c' 'd'] ['a' 'b']


[0 1] [2 3]
['a' 'b'] ['c' 'd']
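The "no overlap" property can be checked directly. The following is a small sketch, assuming a toy dataset of 10 samples and 5 folds; the counters test_counts and train_counts are illustrative names.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(10)
kf = KFold(n_splits=5)

test_counts = np.zeros(len(X), dtype=int)
train_counts = np.zeros(len(X), dtype=int)
for train, test in kf.split(X):
    # Within a split, training and test indices never overlap.
    assert len(np.intersect1d(train, test)) == 0
    train_counts[train] += 1
    test_counts[test] += 1

print(test_counts)   # [1 1 1 1 1 1 1 1 1 1]
print(train_counts)  # [4 4 4 4 4 4 4 4 4 4]
```

Every sample is tested exactly once and used for training in the other k-1 = 4 splits.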

4. LeaveOneOut
LeaveOneOut is a special case of KFold in which n_splits equals the number of samples, so each test set contains exactly one sample. Because it is used so often, it is provided as its own class, but it can be reproduced exactly with KFold.

In [37]: from sklearn.model_selection import LeaveOneOut
In [38]: X = [1, 2, 3, 4]
In [39]: loo = LeaveOneOut()
In [41]: for train, test in loo.split(X):
    ...:     print(train, test)
    ...:
[1 2 3] [0]
[0 2 3] [1]
[0 1 3] [2]
[0 1 2] [3]

# Use KFold to implement LeaveOneOut
In [42]: kf = KFold(n_splits=len(X))
In [43]: for train, test in kf.split(X):
    ...:     print(train, test)
    ...:
[1 2 3] [0]
[0 2 3] [1]
[0 1 3] [2]
[0 1 2] [3]
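The equivalence can also be checked programmatically rather than by eye. This is a minimal sketch, assuming the same four-element list; loo_splits and kf_splits are illustrative names.

```python
from sklearn.model_selection import KFold, LeaveOneOut

X = [1, 2, 3, 4]

# Collect (train, test) index pairs as plain lists so they can be compared.
loo_splits = [(tr.tolist(), te.tolist()) for tr, te in LeaveOneOut().split(X)]
kf_splits = [(tr.tolist(), te.tolist())
             for tr, te in KFold(n_splits=len(X)).split(X)]

print(loo_splits == kf_splits)  # True
```

This holds because KFold does not shuffle by default, so with n_splits equal to the number of samples each fold is a single sample taken in order.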

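5. LeavePOut
LeavePOut generalizes LeaveOneOut: it uses every possible combination of p samples as a test set, which gives C(n, p) splits for n samples. Unlike KFold, the test sets overlap when p > 1, so reproducing it with KFold is somewhat complicated.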

In [44]: from sklearn.model_selection import LeavePOut
In [45]: X = np.ones(4)
In [46]: lpo = LeavePOut(p=2)
In [47]: for train, test in lpo.split(X):
    ...:     print(train, test)
    ...:
[2 3] [0 1]
[1 3] [0 2]
[1 2] [0 3]
[0 3] [1 2]
[0 2] [1 3]
[0 1] [2 3]
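The six splits above are exactly the C(4, 2) ways of choosing two test samples out of four. A small sketch of that count, assuming Python 3.8+ for math.comb; test_counts is an illustrative name.

```python
from math import comb

import numpy as np
from sklearn.model_selection import LeavePOut

X = np.ones(4)
lpo = LeavePOut(p=2)

# Number of splits equals the number of ways to pick p test samples.
n_splits = lpo.get_n_splits(X)
print(n_splits, comb(len(X), 2))  # 6 6

# Count how often each sample lands in a test set.
test_counts = np.zeros(len(X), dtype=int)
for train, test in lpo.split(X):
    test_counts[test] += 1
print(test_counts)  # [3 3 3 3]
```

Each sample appears in three test sets, which shows that the test sets overlap. The number of splits grows combinatorially with the dataset size, so this scheme is only practical for small datasets.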

6. ShuffleSplit
ShuffleSplit is used in much the same way as LeavePOut, but the two behave quite differently. LeavePOut enumerates every possible test set of size p, so the splits together cover the dataset exhaustively. ShuffleSplit instead shuffles the data and draws a fresh random train/test split on every iteration, so across splits the test sets are effectively sampled with replacement: a sample may land in several test sets or in none. Only with a large enough number of splits can you expect the test sets to cover the whole dataset, typically several times over.

In [48]: from sklearn.model_selection import ShuffleSplit
In [49]: X = np.arange(5)
In [50]: ss = ShuffleSplit(n_splits=3, test_size=.25, random_state=0)
In [51]: for train_index, test_index in ss.split(X):
    ...:     print(train_index, test_index)
    ...:
[1 3 4] [2 0]
[1 4 3] [0 2]
[4 0 2] [1 3]
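To see how unevenly samples can be covered, the sketch below counts how often each sample lands in a test set. The counter name test_counts is illustrative, and the exact counts depend on the random state and the sklearn version, so no expected output is shown.

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit

X = np.arange(5)
ss = ShuffleSplit(n_splits=3, test_size=0.25, random_state=0)

test_counts = np.zeros(len(X), dtype=int)
for train, test in ss.split(X):
    # Within one split, training and test indices are still disjoint.
    assert len(np.intersect1d(train, test)) == 0
    test_counts[test] += 1

# Some samples may be tested more than once, others not at all.
print(test_counts)
```

In the session above, for instance, samples 0 and 2 are each tested twice while sample 4 is never tested.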

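7. StratifiedKFold

StratifiedKFold is a variant of KFold that keeps the class proportions of y roughly the same in every fold. As with KFold, the test sets are sampled without replacement, so they do not overlap.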

In [52]: from sklearn.model_selection import StratifiedKFold
In [53]: X = np.ones(10)
In [54]: y = [0,0,0,0,1,1,1,1,1,1]
In [55]: skf = StratifiedKFold(n_splits=3)
In [56]: for train, test in skf.split(X, y):
    ...:     print(train, test)
    ...:
[2 3 6 7 8 9] [0 1 4 5]
[0 1 3 4 5 8 9] [2 6 7]
[0 1 2 4 5 6 7] [3 8 9]
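The effect of stratification is easiest to see by counting the classes in each test fold. This is a minimal sketch on the same toy labels; fold_counts is an illustrative name, and the exact assignment of samples to folds can differ between sklearn versions, so no expected output is shown.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

X = np.ones(10)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
skf = StratifiedKFold(n_splits=3)

# For each test fold, count how many samples of class 0 and class 1 it holds.
fold_counts = np.array(
    [np.bincount(y[test], minlength=2) for _, test in skf.split(X, y)]
)
print(fold_counts)
```

Every test fold contains samples of both classes, in roughly the 4:6 ratio of the full dataset.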

Original: 71915259

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
