Paper 35: The idea behind the cross-validation method

Source: Internet
Author: User

Introduction to the idea of the cross-validation method

Cross-validation is abbreviated as CV below. CV is a statistical analysis method used to evaluate the performance of a classifier. The basic idea is to partition the original data (dataset) into groups in some way, using one part as the training set (train set) and the other part as the validation set (validation set). The classifier is first trained on the training set, and the trained model is then tested on the validation set; the result serves as the performance index for evaluating the classifier. Common CV methods are as follows:

1). Hold-out Method

The original data is randomly divided into two groups, one used as the training set and the other as the validation set. The classifier is trained on the training set, the model is then checked on the validation set, and the resulting classification accuracy is recorded as the performance index of the classifier under the Hold-out method. The advantage of this approach is that it is simple: the original data only needs to be randomly split into two groups. Strictly speaking, however, the Hold-out method is not CV, because it involves no crossing: the two groups never swap roles. Because the data is grouped randomly, the final validation-set accuracy depends heavily on how the original data happened to be split, so the results of this approach are not very convincing.
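The split described above can be sketched in a few lines of Python. This is only an illustration: the function name `hold_out_split`, the 30% validation fraction, and the `seed` parameter are assumptions made for the example and do not come from the original text.

```python
import random

def hold_out_split(samples, validation_fraction=0.3, seed=0):
    """Randomly split samples into (training set, validation set)."""
    shuffled = list(samples)                   # copy so the input is untouched
    random.Random(seed).shuffle(shuffled)      # the random grouping step
    n_val = max(1, int(round(len(shuffled) * validation_fraction)))
    return shuffled[n_val:], shuffled[:n_val]

data = list(range(10))                         # stand-in for 10 labelled samples
train_a, val_a = hold_out_split(data, seed=0)
train_b, val_b = hold_out_split(data, seed=1)
print("validation set with seed 0:", sorted(val_a))
print("validation set with seed 1:", sorted(val_b))
```

Changing the seed generally changes which samples end up in the validation set, which is why a single Hold-out accuracy depends so strongly on the particular grouping.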

2). K-fold Cross Validation (denoted K-CV)

The original data is divided into K groups (generally of equal size). Each subset serves once as the validation set while the remaining K-1 subsets together form the training set, which yields K models. The average classification accuracy of these K models on their respective validation sets is used as the performance index of the classifier under K-CV. K must be at least 2; in practice it generally starts from 3, and 2 is tried only when the original dataset is very small. K-CV is effective at guarding against overfitting and underfitting, and its results are more convincing.
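A minimal sketch of the K-CV procedure follows. The 1-nearest-neighbour classifier, the toy one-dimensional data, and all function names are assumptions chosen to keep the example self-contained; the data is assumed to be already shuffled, so contiguous folds are used.

```python
def k_fold_indices(n_samples, k):
    """Split indices 0..n_samples-1 into k contiguous folds of near-equal size."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def predict_1nn(train_x, train_y, x):
    """Toy classifier (illustrative assumption): label of the nearest training point."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]

def k_fold_accuracy(xs, ys, k):
    """Average validation accuracy over the K models, one per fold."""
    accuracies = []
    for fold in k_fold_indices(len(xs), k):
        held_out = set(fold)
        # The other K-1 folds form the training set for this round.
        train_x = [xs[i] for i in range(len(xs)) if i not in held_out]
        train_y = [ys[i] for i in range(len(xs)) if i not in held_out]
        correct = sum(predict_1nn(train_x, train_y, xs[i]) == ys[i] for i in fold)
        accuracies.append(correct / len(fold))
    return sum(accuracies) / k

# Toy data: class 0 lies near 0..3, class 1 near 10..13 (already interleaved).
xs = [0.0, 10.0, 1.0, 11.0, 2.0, 12.0, 3.0, 13.0]
ys = [0, 1, 0, 1, 0, 1, 0, 1]
print(k_fold_accuracy(xs, ys, k=4))
```

Every sample is used for validation exactly once and for training K-1 times, which is the "crossing" that the Hold-out method lacks.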

3). Leave-one-out Cross Validation (denoted LOO-CV)

If the original data has N samples, then LOO-CV is N-CV: each sample serves in turn as the validation set on its own, and the remaining N-1 samples form the training set, so LOO-CV produces N models. The average classification accuracy of these N models on their validation samples is used as the performance index of the classifier under LOO-CV. Compared with K-CV above, LOO-CV has two distinct advantages:


A. Almost all of the samples are used to train the model in each round, so the training set is closest to the distribution of the original sample, and the evaluation is therefore more reliable.

B. No random factors in the experimental process can affect the results, so the experiment is reproducible.

The disadvantage of LOO-CV is its high computational cost: the number of models to build equals the number of original samples. When that number is large, LOO-CV is almost impractical, unless each model can be trained very quickly or the computation can be parallelized to reduce the time required.
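LOO-CV can be sketched as the special case where every fold holds exactly one sample. As before, the 1-nearest-neighbour classifier, the toy data, and the function names are assumptions made for illustration only.

```python
def predict_1nn(train_x, train_y, x):
    """Toy classifier (illustrative assumption): label of the nearest training point."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]

def loo_accuracy(xs, ys):
    """Leave-one-out: N rounds, each validated on the single held-out sample."""
    n = len(xs)
    hits = 0
    for held_out in range(n):
        # Train on the other N-1 samples.
        train_x = xs[:held_out] + xs[held_out + 1:]
        train_y = ys[:held_out] + ys[held_out + 1:]
        hits += predict_1nn(train_x, train_y, xs[held_out]) == ys[held_out]
    return hits / n

xs = [0.0, 1.0, 2.0, 3.0, 6.0, 10.0, 11.0, 12.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
print(loo_accuracy(xs, ys))  # 0.875: only the sample at 6.0 is misclassified
```

No shuffling or seed is involved, so repeated runs give the same score, which matches advantage B. The loop also makes the cost visible: it runs once per sample, so the number of models grows linearly with the dataset size. The rounds are independent of one another, which is what makes parallelization possible.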
