Clever Methods of Overfitting and how to avoid them



Overfitting is the bane of data science in the age of Big Data. John Langford reviews "clever" methods of overfitting, including traditional overfitting, parameter tweaking, brittle measures, bad statistics, and human-loop overfitting, and gives suggestions and directions for avoiding them.

Comments by John Langford (Microsoft, hunch.net)

(Gregory Piatetsky: I recently came across this classic post from 2005 by John Langford, Clever Methods of Overfitting, which addresses one of the most critical issues in data science. The problem of overfitting is a major bane of big data, and the issues described below are perhaps even more relevant than before. I have made several of these mistakes myself in the past. John agreed to repost it on KDnuggets, so enjoy and comment if you find new methods.)

"Overfitting" is traditionally defined as training some flexible representation so, it memorizes the data but fails to Predict well in the future. For this post, I'll define overfitting more generally as over-representing the performance of systems. There is both styles of general overfitting:over-representing performance on particular datasets and (implicitly) over-re Presenting performance of a method on the future datasets.



We should all be aware of these methods, avoid them where possible, and take them into account otherwise. I have used "reprobleming" and "old datasets", and may have participated in "overfitting by review"; some of these are very difficult to avoid.

1. Traditional overfitting: Train a complex predictor on too-few examples.

Remedy:
    1. Pristine examples for testing.
    2. Use a simpler predictor.
    3. Get more training examples.
    4. Integrate over many predictors.
    5. Reject papers which do this.
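
To make the failure mode concrete, here is a minimal sketch (my own illustrative code, not from the post): a degree-9 polynomial fit to ten noisy points drives training error to nearly zero, while its error on pristine test examples is far worse than that of a simpler predictor.

    import numpy as np

    rng = np.random.default_rng(0)

    def make_data(n):
        x = rng.uniform(-1, 1, n)
        y = np.sin(3 * x) + rng.normal(0, 0.2, n)   # smooth signal plus noise
        return x, y

    x_train, y_train = make_data(10)     # too few examples
    x_test, y_test = make_data(1000)     # pristine examples for testing

    for degree in (2, 9):                # simple vs. complex predictor
        coefs = np.polyfit(x_train, y_train, degree)
        train_mse = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)
        test_mse = np.mean((np.polyval(coefs, x_test) - y_test) ** 2)
        print(f"degree {degree}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")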


2. Parameter tweak overfitting: Use a learning algorithm with many parameters. Choose the parameters based on the test set performance.

For example, choosing the features so as to optimize test set performance can achieve this.

Remedy: same as above.
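
A hedged sketch of the remedy (assuming scikit-learn; the dataset, model, and parameter grid are illustrative): tune the parameter on a validation split carved out of the training data, and touch the test set exactly once at the end. Picking C by test-set accuracy instead silently turns the test set into another training signal.

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=600, n_features=20, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)
    X_fit, X_val, y_fit, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=0)

    # Tune on the validation split, never on the test set.
    best_C, best_val = None, -1.0
    for C in (0.01, 0.1, 1.0, 10.0, 100.0):
        acc = SVC(C=C).fit(X_fit, y_fit).score(X_val, y_val)
        if acc > best_val:
            best_C, best_val = C, acc

    final = SVC(C=best_C).fit(X_train, y_train)
    print("chosen C:", best_C, "test accuracy (reported once):", final.score(X_test, y_test))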

3. Brittle measure: Use a measure of performance which is especially brittle to overfitting.

Examples: "entropy", "mutual information", and leave-one-out cross-validation are all surprisingly brittle. This is particularly severe when used in conjunction with another approach.

Remedy: prefer less brittle measures of performance.
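
One way to see the brittleness (my own illustrative numbers, not from the post): a single confidently wrong probability barely changes the 0/1 error, but it can blow up an unbounded loss such as log-loss, which is what entropy and mutual information estimates are built from.

    import numpy as np

    y = np.array([1, 1, 1, 1, 0, 0, 0, 0])
    p_good = np.array([0.9, 0.8, 0.9, 0.7, 0.1, 0.2, 0.1, 0.3])
    p_bad = p_good.copy()
    p_bad[0] = 1e-9          # one confidently wrong probability for a positive example

    def zero_one_error(y, p):
        return np.mean((p >= 0.5) != y)

    def log_loss(y, p):
        return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

    for name, p in (("calibrated", p_good), ("one bad prediction", p_bad)):
        print(name, "| 0/1 error:", zero_one_error(y, p), "| log-loss:", round(log_loss(y, p), 2))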

4. Bad Statistics: Misuse statistics to overstate confidences.

One common example is pretending that cross-validation performance is drawn from an i.i.d. Gaussian, then using standard confidence intervals. Cross-validation errors are not independent. Another standard method is to make known-false assumptions about some system and then derive excessive confidence.

Remedy: don't do this. Reject papers which do this.
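
A sketch of the mistake itself (my own illustrative code, assuming scikit-learn): treating k-fold scores as independent Gaussian samples and reporting a normal-theory confidence interval. Because the folds share most of their training data, the resulting interval is typically narrower than the true uncertainty; see the Bengio and Grandvalet result discussed in the comments below.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=200, n_features=20, random_state=0)
    scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=10)

    # Naive: pretend the 10 fold scores are i.i.d. Gaussian samples.
    mean = scores.mean()
    sem = scores.std(ddof=1) / np.sqrt(len(scores))
    print(f"naive 95% CI: {mean:.3f} +/- {1.96 * sem:.3f}")   # overstates the confidence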

5. Choice of measure: Choose the best of accuracy, error rate, (A)ROC, F1, percent improvement on the previous best, percent improvement of error rate, etc. for your method. For bonus points, use ambiguous graphs.

This is fairly common and tempting.

Remedy: use canonical performance measures, for example the performance measure directly motivated by the problem.
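
As an illustration of why this is tempting (a sketch with made-up predictions, assuming scikit-learn): on an imbalanced problem, two classifiers can trade places depending on whether you report accuracy, F1, or AUC, so there is usually some measure on which your method "wins".

    import numpy as np
    from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

    rng = np.random.default_rng(0)
    y = np.array([1] * 10 + [0] * 90)               # 10% positives

    scores_a = np.zeros(100)                        # A: always predicts the majority class
    scores_b = 0.3 * y + rng.uniform(0, 0.6, 100)   # B: noisy, but ranks positives higher

    for name, s in (("A (majority)", scores_a), ("B (noisy)", scores_b)):
        pred = (s >= 0.5).astype(int)
        print(name,
              "| accuracy:", accuracy_score(y, pred),
              "| F1:", round(f1_score(y, pred, zero_division=0), 2),
              "| AUC:", round(roc_auc_score(y, s), 2))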

6. Incomplete prediction: Instead of (say) making a multiclass prediction, make a set of binary predictions, then compute the optimal multiclass prediction.

Sometimes it's tempting to leave a gap to be filled in by a human when you don't otherwise succeed.

Remedy: reject papers which do this.
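
One way the gap shows up (a sketch, assuming scikit-learn; the argmax decoding here is just one illustrative choice): the individual one-vs-rest binary accuracies look far better than the multiclass accuracy you get once you actually commit to a single class per example.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=1500, n_features=20, n_informative=10,
                               n_classes=5, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.33, random_state=0)

    bin_accs, scores = [], []
    for k in range(5):                    # one binary task per class: "class k or not?"
        clf = LogisticRegression(max_iter=1000).fit(X_tr, (y_tr == k).astype(int))
        bin_accs.append(clf.score(X_te, (y_te == k).astype(int)))
        scores.append(clf.decision_function(X_te))

    multiclass_acc = np.mean(np.argmax(np.column_stack(scores), axis=1) == y_te)
    print("mean binary accuracy:       ", round(float(np.mean(bin_accs)), 2))
    print("decoded multiclass accuracy:", round(float(multiclass_acc), 2))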

7. Human-loop overfitting: Use a human as part of a learning algorithm and don't take into account overfitting by the entire human/computer interaction.

This is subtle and comes in many forms. One example is a human using a clustering algorithm (on training and test examples) to guide learning algorithm choice.

Remedy: make sure test examples are not available to the human.

8. Data set selection: Choose to report results on some subset of datasets where your algorithm performs well.

The reason why we test on natural datasets is that we believe there is some structure captured by past problems that helps on future problems. Data set selection subverts this and is very difficult to detect.

Remedy: use comparisons on standard datasets. Select datasets without using the test set. Good contest performance can't be faked.
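
A small simulation of why selection works (my own sketch, not from the post): give two methods identical true accuracy on every dataset, then report only the datasets where "our" method happens to win, and the average improvement looks real.

    import numpy as np

    rng = np.random.default_rng(0)
    n_datasets, n_test = 30, 200

    # Both methods have true accuracy 0.80 everywhere; differences are pure noise.
    ours = rng.binomial(n_test, 0.80, n_datasets) / n_test
    baseline = rng.binomial(n_test, 0.80, n_datasets) / n_test

    wins = ours > baseline
    print("average gain on all datasets:      %+.3f" % (ours - baseline).mean())
    print("average gain on reported datasets: %+.3f" % (ours - baseline)[wins].mean())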

9. Reprobleming: Alter the problem so that your performance improves.

For example, take a time series dataset and use cross-validation. Or, ignore asymmetric false positive/false negative costs. This can be completely unintentional, for example when someone uses an ill-specified UCI dataset.

Remedy: discount papers which do this. Make sure problem specifications are clear.
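
To see the time-series case concretely (a sketch, assuming scikit-learn; the model and series are illustrative): shuffled k-fold cross-validation lets the model interpolate between past and future points, while a forward-chaining split forces it to predict genuinely unseen future values.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import KFold, TimeSeriesSplit, cross_val_score

    rng = np.random.default_rng(0)
    y = np.cumsum(rng.normal(0, 1, 500))          # a drifting series: neighbours are correlated
    X = np.arange(500, dtype=float).reshape(-1, 1)

    model = RandomForestRegressor(n_estimators=50, random_state=0)
    shuffled = cross_val_score(model, X, y, cv=KFold(5, shuffle=True, random_state=0), scoring="r2")
    forward = cross_val_score(model, X, y, cv=TimeSeriesSplit(5), scoring="r2")

    print("shuffled k-fold R^2 (peeks at the future):", round(shuffled.mean(), 2))
    print("forward-chaining R^2 (honest):            ", round(forward.mean(), 2))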

10. Old datasets: Create an algorithm for the purpose of improving performance on old datasets.

After a dataset has been released, algorithms can be made to perform well on it using a process of feedback design, indicating better performance than we might expect in the future. Some conferences have canonical datasets that have been used for a decade...

Remedy: prefer simplicity in algorithm design. Weight newer datasets higher in consideration. Making test examples not publicly available for datasets slows the feedback design process but does not eliminate it.

11. Overfitting by review: People submit papers to a conference. The one with the best result is accepted.

This is a systemic problem which is very difficult to detect or eliminate. We want to prefer presentation of good results, but doing so can result in overfitting.

Remedy: 
    1. Be more pessimistic about confidence statements in papers published at high-rejection-rate conferences.
    2. Some people have advocated allowing the publication of methods with poor performance. (I have doubts this would work.)

I have personally observed all of these methods in action, and there are doubtless others.

Selected comments on John's post:

Negative results:  
  • Aleks Jakulin: How about an index of negative results in machine learning? There's a Journal of Negative Results in other domains: ecology & evolutionary biology, biomedicine, and there is the Journal of Articles in Support of the Null Hypothesis. A section on negative results at learning conferences? This kind of information is very useful in preventing people from taking pathways that lead nowhere: if one wants to classify an algorithm as good or bad, one certainly benefits from unexpectedly bad examples too, not just unexpectedly good examples.
  • John Langford: I visited the workshop on negative results at NIPS 2002. My impression was that it did not work well. The difficulty with negative results in machine learning is that they are too easy. For example, there is a plethora of ways to say that "learning is impossible (in the worst case)". On the applied side, it's still common for learning algorithms to not work on simple-seeming problems. In this situation, positive results ("this works") are generally more valuable than negative results ("this doesn't work").

Brittle measures  

  • What do you mean by "brittle"? Why is mutual information brittle?
  • John Langford: What I mean by brittle: Suppose you have a box which takes some feature values as input and predicts some probability of label 1 as output. You are not allowed to open this box or determine how it works, other than by the process of giving it inputs and observing outputs.

    Let x be an input.
    Let y be an output.
    Assume (x, y) is drawn from a fixed but unknown distribution D.
    Let p(x) be a prediction.

    For classification error I(|y - p(x)| < 0.5) you can prove a theorem of the rough form:
    for all D, with high probability over the draw of m examples drawn independently from D, the expected classification error rate of the box with respect to D is bounded by a function of the observations.
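
    As one concrete instance of a bound of this rough form (my own example, stated via a Hoeffding inequality; the notation $\delta$, $\mathrm{err}_D$, $\widehat{\mathrm{err}}_m$ is mine, not from the comment):

    $$\mathrm{err}_D(p) \;\le\; \widehat{\mathrm{err}}_m(p) + \sqrt{\frac{\ln(1/\delta)}{2m}} \qquad \text{with probability at least } 1 - \delta,$$

    where $\mathrm{err}_D(p)$ is the probability that the box misclassifies a fresh example drawn from $D$ and $\widehat{\mathrm{err}}_m(p)$ is the fraction of the $m$ observed examples it misclassifies. The statement is distribution-free only because the 0/1 loss is bounded in $[0, 1]$, which is exactly what fails for log-loss below.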

    What I mean by "brittle" is that no statement of this sort can be made for any unbounded loss (including log-loss, which is integral to mutual information and entropy). You can of course open up the box and analyze its structure, or make extra assumptions about D, to get a similar but inherently more limited analysis.

    The situation with leave-one-out cross-validation is not so bad, but it's still pretty bad. In particular, there exists a very simple learning algorithm/problem pair with the property that the leave-one-out estimate has the variance and deviations of a single coin flip. Yoshua Bengio and Yves Grandvalet in fact proved that there is no unbiased estimator of the variance. The paper that I pointed to above shows that for k-fold cross-validation on m examples, all moments of the deviations might be no better than those of a test set of size $m/k$.

    I'm not sure what a "valid summary" is, but leave-one-out cross-validation cannot provide results I trust, because I know how to break it.

    I have personally observed people using leave-one-out cross-validation with feature selection to quickly achieve a severe overfit.
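
    A sketch of how that break happens (my own reconstruction of the well-known selection-bias mistake, assuming scikit-learn; it is not code from this thread): select the features using all of the labels, and only then run leave-one-out cross-validation. On pure noise the estimate comes out far above chance; selecting features inside each fold instead brings it back to roughly 50%.

        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import LeaveOneOut, cross_val_score

        rng = np.random.default_rng(0)
        X = rng.normal(size=(40, 1000))      # pure noise features
        y = rng.integers(0, 2, 40)           # random labels: true accuracy is 50%

        # The mistake: pick the features most correlated with y on the FULL dataset...
        corr = np.abs(np.corrcoef(X.T, y)[-1, :-1])
        top = np.argsort(corr)[-20:]

        # ...then run leave-one-out CV only on the downstream classifier.
        acc = cross_val_score(LogisticRegression(max_iter=1000), X[:, top], y, cv=LeaveOneOut()).mean()
        print("LOO accuracy on random labels:", round(float(acc), 2))   # typically far above 0.5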


Related:
    • The Cardinal Sin of Data Mining and Data Science: Overfitting
    • Big Data Winter ahead – unless we change course, warns Michael Jordan
    • The First Law of Data Science: Do umbrellas cause rain?

