Thoughts on Cross-Validation Methods
Cross-validation is abbreviated as CV below. CV is a statistical analysis method used to validate the performance of a classifier. The basic idea is to partition the original data (dataset) into groups in some way, using one part as the training set (train set) and the other part as the validation set (validation set). The classifier is first trained on the training set, and the resulting model is then tested on the validation set; the test result serves as the performance index of the classifier. Common CV methods are as follows:
1). Hold-out method
The original data are randomly divided into two groups, one used as the training set and the other as the validation set. The classifier is trained on the training set, the resulting model is then checked on the validation set, and the final classification accuracy is recorded as the performance index of the classifier under the hold-out method. The advantage of this approach is that it is simple to carry out: the raw data only need to be randomly split into two groups. Strictly speaking, however, the hold-out method is not CV, because it contains no "crossing": no sample ever switches between the training role and the validation role. Because the data are grouped at random, the classification accuracy on the validation set depends heavily on how the data happened to be split, so the results of this approach are not very persuasive.
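The hold-out procedure described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the one-dimensional data are made up, the "classifier" is a toy nearest-class-mean rule, and the names `holdout_split`, `fit_class_means`, `predict`, and `holdout_accuracy` are hypothetical helpers, not part of any library.

```python
import random

def holdout_split(n_samples, train_fraction=0.7, seed=0):
    """Randomly split the indices 0..n_samples-1 into train and validation lists."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_train = round(train_fraction * n_samples)
    return indices[:n_train], indices[n_train:]

def fit_class_means(xs, ys):
    """Toy classifier (an assumption): remember the mean feature value of each class."""
    grouped = {}
    for x, y in zip(xs, ys):
        grouped.setdefault(y, []).append(x)
    return {label: sum(v) / len(v) for label, v in grouped.items()}

def predict(model, x):
    """Assign x to the class whose stored mean is closest."""
    return min(model, key=lambda label: abs(x - model[label]))

def holdout_accuracy(xs, ys, train_fraction=0.7, seed=0):
    """Train on the training part, return accuracy on the validation part."""
    train_idx, val_idx = holdout_split(len(xs), train_fraction, seed)
    model = fit_class_means([xs[i] for i in train_idx],
                            [ys[i] for i in train_idx])
    hits = sum(predict(model, xs[i]) == ys[i] for i in val_idx)
    return hits / len(val_idx)

# Made-up, slightly overlapping one-dimensional data with two classes.
xs = [0.3, 0.8, 1.2, 1.9, 2.4, 2.8, 2.1, 2.6, 3.1, 3.7, 4.2, 4.9]
ys = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]

# One accuracy per random split; the values depend on the seed.
accuracies = [holdout_accuracy(xs, ys, seed=s) for s in range(5)]
print(accuracies)
```

Running the split with several seeds is a simple way to see how much a single hold-out estimate can depend on the grouping.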
2). K-fold Cross Validation (recorded as K-CV)
Divide the original data into K groups (usually of equal size). Each subset in turn is used once as the validation set, with the remaining K-1 subsets used as the training set, so K models are obtained. The average of the classification accuracies of these K models on their respective validation sets is used as the performance index of the classifier under K-CV. K must be at least 2. In practice K usually starts from 3, and 2 is tried only when the amount of raw data is small. K-CV helps guard against both overfitting and underfitting, and the final results are more persuasive.
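As a rough sketch of the K-CV loop just described, the following Python code deals shuffled indices into K folds, trains one model per fold, and averages the K validation accuracies. The data are made up, the nearest-class-mean classifier is a toy stand-in, and `k_fold_indices`, `fit_class_means`, `predict`, and `k_fold_accuracy` are hypothetical names chosen for illustration.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the indices and deal them into k folds of nearly equal size."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    return [indices[i::k] for i in range(k)]

def fit_class_means(xs, ys):
    """Toy classifier (an assumption): remember the mean feature value of each class."""
    grouped = {}
    for x, y in zip(xs, ys):
        grouped.setdefault(y, []).append(x)
    return {label: sum(v) / len(v) for label, v in grouped.items()}

def predict(model, x):
    """Assign x to the class whose stored mean is closest."""
    return min(model, key=lambda label: abs(x - model[label]))

def k_fold_accuracy(xs, ys, k, seed=0):
    """Return (mean accuracy, per-fold accuracies) over k train/validate rounds."""
    scores = []
    for val_idx in k_fold_indices(len(xs), k, seed):
        held_out = set(val_idx)
        # Every sample outside the current fold goes into the training set.
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        model = fit_class_means([xs[i] for i in train_idx],
                                [ys[i] for i in train_idx])
        hits = sum(predict(model, xs[i]) == ys[i] for i in val_idx)
        scores.append(hits / len(val_idx))
    return sum(scores) / k, scores

# Made-up, slightly overlapping one-dimensional data with two classes.
xs = [0.3, 0.8, 1.2, 1.9, 2.4, 2.8, 2.1, 2.6, 3.1, 3.7, 4.2, 4.9]
ys = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]

mean_acc, fold_scores = k_fold_accuracy(xs, ys, k=3)
print(mean_acc, fold_scores)
```

Unlike the hold-out split, every sample here is used for validation exactly once and for training K-1 times.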
3). Leave-one-out Cross Validation (recorded as LOO-CV)
Suppose the original data contain N samples. LOO-CV is then N-CV: each sample in turn serves alone as the validation set, and the remaining N-1 samples serve as the training set, so LOO-CV yields N models. The average of the classification accuracies of these N models on their validation sets is used as the performance index of the classifier under LOO-CV. Compared with K-CV above, LOO-CV has two obvious advantages:
a. Almost all of the samples are used to train the model in each round, so the training set is as close as possible to the distribution of the original samples, and the evaluation results are more reliable.
b. No random factors affect the experimental results, so the experimental process can be replicated.
The disadvantage of LOO-CV, however, is its high computational cost, because the number of models that must be built equals the number of original data samples. When the number of samples is large, LOO-CV is practically infeasible unless each model can be trained quickly or parallelization can be used to reduce the computation time.
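A minimal LOO-CV sketch in Python makes both properties visible: there is no shuffling, so repeated runs give the same result, and the loop trains one model per sample, which is where the cost comes from. As before, the data are made up, the nearest-class-mean classifier is a toy stand-in, and `fit_class_means`, `predict`, and `loo_accuracy` are hypothetical names.

```python
def fit_class_means(xs, ys):
    """Toy classifier (an assumption): remember the mean feature value of each class."""
    grouped = {}
    for x, y in zip(xs, ys):
        grouped.setdefault(y, []).append(x)
    return {label: sum(v) / len(v) for label, v in grouped.items()}

def predict(model, x):
    """Assign x to the class whose stored mean is closest."""
    return min(model, key=lambda label: abs(x - model[label]))

def loo_accuracy(xs, ys):
    """Leave each sample out once; return (accuracy, number of models trained)."""
    n = len(xs)
    hits = 0
    models_trained = 0
    for i in range(n):
        # Train on all samples except sample i.
        model = fit_class_means(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        models_trained += 1
        # Validate on the single held-out sample.
        hits += predict(model, xs[i]) == ys[i]
    return hits / n, models_trained

# Made-up, well-separated one-dimensional data with two classes.
xs = [0.2, 0.5, 0.9, 1.1, 5.0, 5.4, 5.8, 6.1]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

accuracy, n_models = loo_accuracy(xs, ys)
print(accuracy, n_models)
```

Because each of the N rounds is independent of the others, the loop body is a natural unit to distribute across workers when parallelization is available.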
from:http://www.ilovematlab.cn/viewthread.php?tid=49143