Basic concepts: overfitting, pruning, false positives, false negatives


Generally, the entire dataset is divided into two parts: roughly 60-80% of the data goes into the training set and is used to build the model; the rest goes into a test set, which is used to check the model's accuracy immediately after the model is generated.
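
A minimal sketch of this split, assuming Python and scikit-learn's train_test_split (the breast-cancer sample dataset and the 75/25 ratio are illustrative choices, not from the original article):

```python
# Split the data: most rows build the model, the rest are held out for testing.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)

# Keep 75% of the rows for training, hold out 25% for the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

print(f"training rows: {len(X_train)}, test rows: {len(X_test)}")
```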

Why is this extra step so important? The problem it guards against is called overfitting: if we use all of our data to create the model, the model may fit that data perfectly, but it will only apply to that data. Remember: we want to use this model to predict future unknowns; we do not want it merely to reproduce values we already know. This is why we create a test set. After creating the model, we check that its accuracy does not drop on the test set, which gives us confidence that the model can accurately predict unknown values in the future.
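
One way to see overfitting in practice, sketched here with an assumed scikit-learn decision tree on the same illustrative dataset, is to compare accuracy on the training set against accuracy on the held-out test set; a large gap is exactly the symptom described above:

```python
# An unrestricted decision tree fits the training data almost perfectly,
# but its accuracy drops on held-out data. Dataset and split are illustrative.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# No depth limit: the tree may grow until it memorizes the training data.
tree = DecisionTreeClassifier(random_state=42)
tree.fit(X_train, y_train)

print("training accuracy:", tree.score(X_train, y_train))  # typically ~1.0
print("test accuracy:    ", tree.score(X_test, y_test))    # noticeably lower
```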

Pruning. Pruning, as the name implies, means cutting branches off the classification tree. Why would anyone want to remove information from the tree? Again, because of overfitting. As the dataset grows and the number of attributes increases, the trees we build become more and more complex. In theory, a tree can have as many as Leaves = (Rows × Attributes) leaves. But what good does that do? For predicting future unknowns it does not help us, because such a tree only fits the training data we already have. So we need a balance: we want our trees to be as simple as possible, with as few nodes and leaves as possible, while still being as accurate as possible.
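
As a sketch of that balance, scikit-learn's decision trees can be kept simple with pre-pruning parameters such as max_depth and min_samples_leaf, or post-pruned with cost-complexity pruning (ccp_alpha); the dataset and the specific parameter values below are assumptions for illustration:

```python
# A pruned tree: deliberately limited complexity so it generalizes better.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

pruned = DecisionTreeClassifier(
    max_depth=3,          # pre-pruning: cap tree depth
    min_samples_leaf=5,   # pre-pruning: no tiny leaves
    ccp_alpha=0.01,       # post-pruning: cost-complexity penalty
    random_state=42,
)
pruned.fit(X_train, y_train)

# Far fewer leaves than an unrestricted tree, ideally with similar test accuracy.
print("leaves:       ", pruned.get_n_leaves())
print("test accuracy:", pruned.score(X_test, y_test))
```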

A false positive refers to a data instance where the model we created predicts a positive value but the actual value is negative. Likewise, a false negative refers to a data instance where the model predicts a negative value but the actual value is positive.
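
Assuming a binary classifier like the illustrative tree above, both kinds of error can be counted with scikit-learn's confusion_matrix; class 1 is treated as the positive class here:

```python
# Count false positives and false negatives on the held-out test set.
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

model = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
tn, fp, fn, tp = confusion_matrix(y_test, model.predict(X_test)).ravel()

print("false positives (predicted positive, actually negative):", fp)
print("false negatives (predicted negative, actually positive):", fn)
```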

These errors indicate a problem in our model: it is misclassifying some of the data. Some misclassification is expected, and the acceptable percentage of errors is determined by the model creator. For example, if you are testing heart monitors in a hospital, clearly a very low error rate is required. If you are only mining some fictitious data for a data mining article, the error rate can be higher. Going one step further, you also need to decide what proportions of false negatives and false positives are acceptable. The example that comes to mind immediately is a spam model: a false positive (a real email marked as spam) is far more destructive than a false negative (a spam message that is not marked as spam). In such a case, we might decide that a false negative to false positive ratio of at least 100:1 is acceptable.
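
One way to express such a trade-off, sketched here under the assumption that a false positive is weighted 100 times more heavily than a false negative, is a simple weighted-cost function; the filter counts below are made up for illustration:

```python
# Compare two hypothetical spam filters under an assumed 100:1 cost ratio.
FP_COST = 100  # cost of marking a real email as spam
FN_COST = 1    # cost of letting a spam message through

def weighted_cost(fp: int, fn: int) -> int:
    """Total misclassification cost under the assumed 100:1 weighting."""
    return fp * FP_COST + fn * FN_COST

print(weighted_cost(fp=2, fn=50))   # 250  -> preferred: few real emails lost
print(weighted_cost(fp=10, fn=5))   # 1005 -> worse, despite fewer total errors
```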
