"Deeplearning.ai Notes Lesson 1.1" Training set, validation set and test set

Source: Internet
Author: User

A data set is generally divided into three parts: a training set (train set), a validation set (valid set), and a test set.

These are used, respectively, to train the model, tune the hyper-parameters, and evaluate the final model.
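The three roles can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not part of the course notes: the synthetic data, the one-feature ridge model, the candidate values of the hyper-parameter `lam`, and the helper names `fit_slope` and `mse`.

```python
import random

def fit_slope(points, lam):
    """Fit y = w * x by one-feature ridge regression (closed form)."""
    sxy = sum(x * y for x, y in points)
    sxx = sum(x * x for x, _ in points)
    return sxy / (sxx + lam)

def mse(points, w):
    """Mean squared error of the model y = w * x on the given points."""
    return sum((y - w * x) ** 2 for x, y in points) / len(points)

rng = random.Random(0)
# Synthetic data (assumed): y = 3x plus Gaussian noise.
xs = [rng.uniform(-1.0, 1.0) for _ in range(300)]
data = [(x, 3.0 * x + rng.gauss(0.0, 1.0)) for x in xs]
train_set, dev_set, test_set = data[:180], data[180:240], data[240:]

candidates = [0.0, 0.1, 1.0, 10.0, 100.0]  # hypothetical hyper-parameter grid

# 1) Train set: fit one model per hyper-parameter value.
models = {lam: fit_slope(train_set, lam) for lam in candidates}
# 2) Validation (dev) set: pick the hyper-parameter with the lowest error.
best_lam = min(candidates, key=lambda lam: mse(dev_set, models[lam]))
# 3) Test set: used once, to report the error of the chosen model.
test_error = mse(test_set, models[best_lam])
print("best lam:", best_lam, "test MSE:", round(test_error, 3))
```

The test set plays no part in fitting or in choosing `lam`, so its error is an estimate of performance on data the model has never influenced.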

The validation set is also called the development set, or dev set for short.

Cross-validation (hold-out cross-validation)

In hold-out validation, part of the data is randomly selected to build the model and the rest is held out to test it. The most common form of cross-validation is 10-fold cross-validation: the training set is randomly divided into 10 parts, and each part in turn serves as the validation set while the remaining 9 parts serve as the training set. With n parts this gives n models and n validation results, and the average of these n results is used to measure the performance of the model.

Distribution ratio
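The 10-fold cross-validation procedure described above can be sketched roughly as follows. The helper names (`k_fold_indices`, `cross_validate`), the toy "predict the mean" model, and the synthetic data are assumptions for illustration only.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the indices 0..n_samples-1 and deal them into k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validate(data, k, fit, score, seed=0):
    """Train k models, each validated on a different fold; average the scores."""
    folds = k_fold_indices(len(data), k, seed)
    scores = []
    for i, valid_idx in enumerate(folds):
        # All folds except fold i form the training set for this round.
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        model = fit([data[j] for j in train_idx])
        scores.append(score(model, [data[j] for j in valid_idx]))
    return sum(scores) / k, scores

# Toy model (assumed): predict the mean of the training values.
def fit_mean(values):
    return sum(values) / len(values)

def mean_squared_error(mean, values):
    return sum((v - mean) ** 2 for v in values) / len(values)

rng = random.Random(1)
data = [rng.gauss(5.0, 2.0) for _ in range(100)]
avg_score, fold_scores = cross_validate(data, 10, fit_mean, mean_squared_error)
print(len(fold_scores), "validation results, average MSE:", round(avg_score, 3))
```

Each example is used for validation exactly once and for training in the other nine rounds, which is why the averaged score is less dependent on one lucky or unlucky split.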

In the traditional machine learning era, when data sets were small by today's standards, the usual split ratio was 6:2:2 (train : validation : test).
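A 6:2:2 split can be produced with a few lines of code. This is only a sketch: the function name `split_by_ratio` and the stand-in data set of 1,000 integers are assumptions for illustration.

```python
import random

def split_by_ratio(data, ratios, seed=0):
    """Shuffle a copy of data and cut it into parts proportional to ratios."""
    items = list(data)
    random.Random(seed).shuffle(items)
    total = sum(ratios)
    parts, start = [], 0
    for r in ratios[:-1]:
        size = int(len(items) * r / total)
        parts.append(items[start:start + size])
        start += size
    parts.append(items[start:])  # the last part takes whatever remains
    return parts

train_set, dev_set, test_set = split_by_ratio(range(1000), (6, 2, 2))
print(len(train_set), len(dev_set), len(test_set))  # 600 200 200
```

Shuffling before cutting matters: if the data are stored in some meaningful order, an unshuffled split would give the three sets different distributions.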

In the era of big data, this ratio is less applicable. With a data set of one million examples, even 1% of the data gives 10,000 test examples, which is enough, so more of the data can go to training. A common ratio is therefore 98:1:1, and it can even reach 99.5:0.4:0.1.

Mismatched train/test distribution
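The arithmetic behind the big-data ratios above is easy to check. In this sketch the ratios are written in parts per thousand so that only integer arithmetic is needed; the function name `split_sizes` is a hypothetical helper.

```python
def split_sizes(n_examples, per_mille):
    """Return the size of each part, given ratios in parts per thousand."""
    assert sum(per_mille) == 1000
    return [n_examples * p // 1000 for p in per_mille]

n = 1_000_000
print(split_sizes(n, (980, 10, 10)))  # 98:1:1        -> [980000, 10000, 10000]
print(split_sizes(n, (995, 4, 1)))    # 99.5:0.4:0.1  -> [995000, 4000, 1000]
```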

In real projects, the training set sometimes comes from a different distribution than the validation set and the test set.

For example, the training set may consist of cat pictures crawled from the Internet, while the validation set and the test set are photos taken by users on their mobile phones.

In this case, make sure that the validation set and the test set come from the same distribution; otherwise the evaluation of the model is unreliable.

Having only a train set and a dev set, without a test set
Many teams refer to the dev set in this case as the test set, even though it is actually being used as a dev set.
