The application of big data in education, Part 2 notes


K-fold cross-validation randomly divides the original data into K parts. Of these K parts, one is chosen as the test data and the remaining K-1 parts serve as the training data.

The cross-validation process actually repeats the experiment K times. Each experiment selects a different one of the K parts as the test data (ensuring that each of the K parts is tested exactly once) and uses the remaining K-1 parts as training data; finally, the results of the K experiments are averaged.
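To make this rotation concrete, here is a minimal pure-Python sketch of the procedure. The helper names and the train_and_score callback (which trains on the training indices and returns the accuracy on the test indices) are hypothetical, invented for illustration:

    import random

    def k_fold_indices(n_samples, k, seed=0):
        # Randomly partition indices 0..n_samples-1 into k roughly equal folds.
        idx = list(range(n_samples))
        random.Random(seed).shuffle(idx)
        return [idx[i::k] for i in range(k)]

    def cross_validate(n_samples, k, train_and_score):
        # Repeat the experiment k times: each fold is the test set exactly once.
        folds = k_fold_indices(n_samples, k)
        scores = []
        for i in range(k):
            test_idx = folds[i]
            train_idx = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
            scores.append(train_and_score(train_idx, test_idx))
        return sum(scores) / k  # average of the k experiment results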

http://www.ilovematlab.cn/thread-49143-1-1.html

Introduction to the idea of the cross-validation method



In what follows, cross-validation (Cross Validation) is abbreviated as CV. CV is a statistical analysis method for verifying the performance of a classifier. The basic idea is to partition the original data (dataset) into groups: one part serves as the training set (train set) and another as the validation set (validation set). The classifier is first trained on the training set, and the trained model is then tested on the validation set; the resulting accuracy serves as the performance index of the classifier. The common CV methods are as follows:

1). Hold-out Method


The raw data is randomly divided into two groups: one group is used as the training set and the other as the validation set. The classifier is trained on the training set, and the model is then validated on the validation set; the final classification accuracy is recorded as the classifier's performance index under the Hold-out Method. The benefit of this method is its simplicity: the original data is just randomly split into two groups. Strictly speaking, however, the Hold-out Method is not true CV, because it does not embody the idea of crossing. Since the raw data is grouped randomly, the final validation-set classification accuracy depends heavily on how the data happened to be split, so the results of this approach are not very persuasive.
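A minimal sketch of the Hold-out Method using scikit-learn; the iris dataset and the k-NN classifier are illustrative choices, not from the original text:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_iris(return_X_y=True)

    # One random split: 70% training set, 30% validation set.
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.3, random_state=42)

    clf = KNeighborsClassifier().fit(X_train, y_train)
    print("Hold-out accuracy:", clf.score(X_val, y_val))

Changing random_state gives a different split and often a noticeably different accuracy, which is exactly the weakness of the Hold-out Method described above.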

 

2). K-fold Cross Validation (denoted K-CV)


Divide the raw data into K groups (usually evenly). Each subset in turn serves as the validation set while the remaining K-1 subsets form the training set, which yields K models. The average of the classification accuracies of the K models on their validation sets is used as the performance index of the K-CV classifier. K is generally greater than or equal to 2; in practice it usually starts at 3, and 2 is tried only when the amount of raw data is very small. K-CV can effectively avoid both over-learning (overfitting) and under-learning (underfitting), and its results are more persuasive.
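A minimal K-CV sketch using scikit-learn; the dataset, classifier, and choice of K=5 are assumptions made for illustration:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import KFold, cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_iris(return_X_y=True)

    # 5-fold CV: each of the 5 subsets is the validation set exactly once.
    kf = KFold(n_splits=5, shuffle=True, random_state=0)
    scores = cross_val_score(KNeighborsClassifier(), X, y, cv=kf)
    print("Per-fold accuracy:", scores)
    print("K-CV performance index (mean):", scores.mean())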

 

3). Leave-one-out Cross Validation (denoted LOO-CV)


If the original dataset contains N samples, then LOO-CV is N-CV: each sample in turn serves as the validation set while the remaining N-1 samples form the training set, so LOO-CV yields N models. The average of the classification accuracies of the N models on their validation sets is used as the performance index of the LOO-CV classifier. Compared with K-CV, LOO-CV has two distinct advantages:


A. Almost all of the samples are used to train the model in each round, so the training set is closest to the distribution of the original sample, which makes the evaluation more reliable.


B. There are no random factors in the experimental process that could affect the results, which ensures that the experiment is reproducible.

But the disadvantage of LOO-CV is its computational cost: the number of models to build equals the number of raw data samples, so when the sample count is quite large, LOO-CV becomes difficult to carry out in practice, unless each classifier can be trained quickly or parallel computation is used to reduce the required time.

If you already understand K-fold cross-validation, the idea here is much the same. K-fold takes 1/K of the whole sample as the prediction sample and the remaining (K-1)/K as the training sample; the training sample is used to build the model, and the prediction sample is used to test it.

Leave-one-out uses N-1 samples as the training set and leaves one sample as the prediction set, looping so that each sample acts as the prediction set exactly once; the cross-validation accuracy is then computed over all N predictions.

http://blog.xuite.net/x5super/studyroom/61471385-%E4%B8%80%E7%AF%87%E5%BE%88%E6%A3%92%E7%9A%84%E6%B8%AC%E8%a9%a6%28%e5%9b%9e%e6%b8%ac%29%e6%8a%80%e8%a1%93%e6%96%87%e7%ab%a0
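A minimal LOO-CV sketch using scikit-learn; as before, the dataset and classifier are illustrative assumptions:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import LeaveOneOut, cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    X, y = load_iris(return_X_y=True)  # N = 150 samples, so 150 models

    # Each sample is the validation set exactly once; the splits are
    # deterministic, so there is no randomness (advantage B above).
    # n_jobs=-1 parallelizes the N fits, one way to offset the cost.
    scores = cross_val_score(KNeighborsClassifier(), X, y,
                             cv=LeaveOneOut(), n_jobs=-1)
    print("Number of models trained:", len(scores))
    print("LOO-CV accuracy:", scores.mean())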
