Introduction to the idea of the cross-validation (Cross Validation) method
Cross validation (Cross Validation), hereafter CV, is a statistical analysis method used to verify the performance of a classifier. The basic idea is to divide the original data (dataset) into groups in some sense: one part serves as the training set (train set) and the other part as the validation set (validation set). The classifier is first trained on the training set, and the trained model is then tested on the validation set; the resulting classification accuracy serves as the performance index of the classifier. Common CV methods are as follows:
1). Hold-out Method
The original data is randomly divided into two groups, one used as the training set and the other as the validation set. The classifier is trained on the training set and then verified on the validation set, and the final classification accuracy is recorded as the performance index of the classifier under the hold-out method. The benefit of this approach is that it is simple to carry out: just randomly divide the original data into two groups. Strictly speaking, however, the hold-out method is not really CV, because it does not embody the idea of cross validation. Since the original data are grouped randomly, the classification accuracy obtained on the validation set depends heavily on how the data happen to be split, so the result is not very persuasive.
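The following is a minimal sketch of the hold-out method, assuming scikit-learn and an SVM classifier as placeholder choices (the dataset, classifier, and split ratio are illustrative, not prescribed by the original text):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)

# Randomly split the original data into a training set and a validation set.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0)

clf = SVC().fit(X_train, y_train)                 # train on the training set
acc = accuracy_score(y_val, clf.predict(X_val))   # verify on the validation set
print(f"hold-out validation accuracy: {acc:.3f}")
```

Note that a different random_state (i.e. a different random split) can give a noticeably different accuracy, which is exactly the weakness described above.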
2). K-fold Cross Validation (recorded as K-CV)
The original data is divided into K groups (generally of equal size). Each subset in turn serves as the validation set while the remaining K-1 subsets form the training set, which yields K models. The average classification accuracy of these K models on their respective validation sets is the performance index of the classifier under K-CV. K is generally at least 2; in practice one usually starts from 3, and K = 2 is tried only when the original dataset is very small. K-CV effectively avoids both over-fitting and under-fitting, and its result is more persuasive.
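A minimal sketch of K-CV, assuming K = 5 and again using scikit-learn's KFold with a placeholder SVM classifier; the key point is that the reported index is the mean accuracy over the K validation folds:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
kf = KFold(n_splits=5, shuffle=True, random_state=0)

scores = []
for train_idx, val_idx in kf.split(X):
    # Train on the K-1 folds, validate on the held-out fold.
    clf = SVC().fit(X[train_idx], y[train_idx])
    scores.append(accuracy_score(y[val_idx], clf.predict(X[val_idx])))

# K-CV performance index: average accuracy over the K validation folds.
print(f"5-fold CV accuracy: {np.mean(scores):.3f}")
```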
3). Leave-one-out Cross Validation (recorded as LOO-CV)
If the original data contains n samples, then LOO-CV is n-CV: each sample in turn serves as the validation set while the remaining n-1 samples form the training set, so LOO-CV produces n models. The average classification accuracy of these n models on their single-sample validation sets is used as the performance index of the classifier under LOO-CV. Compared with the previous K-CV, LOO-CV has two distinct advantages:
A. Almost all of the samples are used to train the model in every round, so the training set is closest to the distribution of the original data, which makes the evaluation more reliable.
B. There are no random factors in the experimental procedure that could affect the results, so the experiment is reproducible.
The disadvantage of LOO-CV, however, is its high computational cost: the number of models that must be built equals the number of samples in the original data, so when the sample count is large, LOO-CV becomes difficult to carry out in practice, unless each training run produces a model quickly or parallel computation can be used to reduce the time required.
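A minimal sketch of LOO-CV, assuming scikit-learn's LeaveOneOut splitter and the same placeholder SVM; with n samples it fits n models, which illustrates the computational cost discussed above:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import LeaveOneOut
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
loo = LeaveOneOut()

hits = []
for train_idx, val_idx in loo.split(X):
    # Train on n-1 samples, validate on the single remaining sample.
    clf = SVC().fit(X[train_idx], y[train_idx])
    hits.append(clf.predict(X[val_idx])[0] == y[val_idx][0])

# LOO-CV performance index: fraction of held-out samples classified correctly.
print(f"LOO-CV accuracy: {np.mean(hits):.3f}")
```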