Handling skewed data---trading off precision and recall

Source: Internet
Author: User

Trade-offs between precision and recall

We continue with the cancer-prediction example, where y = 1 means the patient is predicted to have cancer. With logistic regression we normally predict y = 1 if $h_\theta(x) \ge 0.5$ and y = 0 if $h_\theta(x) < 0.5$.
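
The decision rule above can be sketched in Python as a minimal illustration (not the course's own code). The parameter vector `theta` and the feature vector `x` are hypothetical values, and the bias feature x0 = 1 is assumed to be the first entry of `x`. The `threshold` argument is the quantity the rest of this article varies.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Logistic function g(z) = 1 / (1 + e^(-z)), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def h_theta(theta: Sequence[float], x: Sequence[float]) -> float:
    """Hypothesis h_theta(x) = g(theta^T x), read as the estimated P(y = 1 | x)."""
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))


def predict(theta: Sequence[float], x: Sequence[float], threshold: float = 0.5) -> int:
    """Predict y = 1 (cancer) if h_theta(x) >= threshold, otherwise y = 0."""
    return 1 if h_theta(theta, x) >= threshold else 0


# Hypothetical parameters and one patient's features; x[0] = 1 is the bias term.
theta = [-1.0, 2.0]
x = [1.0, 0.8]

print(round(h_theta(theta, x), 3))       # 0.646
print(predict(theta, x))                 # 1 with the default threshold 0.5
print(predict(theta, x, threshold=0.7))  # 0 with a stricter threshold
```

The same patient is labeled as having cancer at threshold 0.5 but not at 0.7, which is exactly the lever discussed next.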

Suppose we want to predict y = 1 (cancer) only when we are very confident, because telling a patient they have cancer has a significant effect on them and sends them into treatment. We can raise the threshold to 0.7, predicting y = 1 only if $h_\theta(x) \ge 0.7$. This gives higher precision (the patients we flag are more likely to actually have cancer) but lower recall (more of the real cancer cases are missed). Setting the threshold to 0.9 pushes further in the same direction, typically giving even higher precision and even lower recall.

Conversely, suppose we want to avoid missing patients who do have cancer, that is, avoid false negatives (a patient who has cancer but is told they don't, which delays their treatment). We can lower the threshold to 0.3. This gives lower precision (more of the patients flagged as having cancer actually don't) but higher recall (most of the actual cancer cases are flagged).
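
To see both effects on concrete numbers, the sketch below computes precision and recall at several thresholds. The ten labels and scores are made up purely for illustration; in practice the scores would be $h_\theta(x)$ values from a trained model. Precision is taken as 0 when nothing is predicted positive, which is a convention, not part of the original text.

```python
from typing import Sequence, Tuple


def precision_recall(y_true: Sequence[int], scores: Sequence[float],
                     threshold: float) -> Tuple[float, float]:
    """Precision and recall when predicting y = 1 for every score >= threshold."""
    tp = fp = fn = 0
    for y, s in zip(y_true, scores):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1          # true positive: cancer, flagged
        elif pred == 1 and y == 0:
            fp += 1          # false positive: no cancer, flagged
        elif pred == 0 and y == 1:
            fn += 1          # false negative: cancer, missed
    # Convention: report 0 instead of dividing by zero.
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    return precision, recall


# Hypothetical labels (1 = cancer) and classifier outputs h_theta(x) for 10 patients.
y_true = [1, 1, 0, 1, 0, 1, 0, 1, 0, 0]
scores = [0.95, 0.85, 0.75, 0.65, 0.55, 0.45, 0.35, 0.25, 0.15, 0.05]

for t in (0.3, 0.5, 0.7, 0.9):
    p, r = precision_recall(y_true, scores, t)
    print(f"threshold={t}: precision={p:.3f}, recall={r:.3f}")
```

On this toy data, raising the threshold from 0.3 to 0.9 moves precision from 0.571 up to 1.000 and recall from 0.800 down to 0.200.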

So for most classifiers, including logistic regression, we have to trade precision off against recall.

As the threshold varies, precision and recall change together, tracing out a precision-recall curve (shown in the figure on the right). The curve can take many different shapes, depending on the specific algorithm.
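
The curve can be traced by using every distinct score as a threshold in turn. This is a minimal sketch on hypothetical labels and scores; plotting is left out, and the function name `pr_curve` is just an illustrative choice.

```python
from typing import List, Sequence, Tuple


def pr_curve(y_true: Sequence[int],
             scores: Sequence[float]) -> List[Tuple[float, float, float]]:
    """(threshold, precision, recall) for each distinct score used as the threshold,
    ordered from the strictest threshold to the most lenient one."""
    n_pos = sum(y_true)
    points = []
    for t in sorted(set(scores), reverse=True):
        # Labels of the examples predicted as y = 1 at this threshold.
        flagged = [y for y, s in zip(y_true, scores) if s >= t]
        tp = sum(flagged)
        precision = tp / len(flagged)          # never empty: t is one of the scores
        recall = tp / n_pos if n_pos else 0.0
        points.append((t, precision, recall))
    return points


# Hypothetical labels (1 = cancer) and classifier scores h_theta(x).
y_true = [1, 1, 0, 1, 0, 1, 0, 1, 0, 0]
scores = [0.95, 0.85, 0.75, 0.65, 0.55, 0.45, 0.35, 0.25, 0.15, 0.05]

for t, p, r in pr_curve(y_true, scores):
    print(f"threshold={t:.2f}  precision={p:.3f}  recall={r:.3f}")
```

Recall can only grow as the threshold is lowered, while precision tends to fall but not necessarily monotonically (on this data it goes 1.000, 1.000, 0.667, 0.750, ...), so the exact shape depends on how the classifier's scores are distributed.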

So can we automatically pick the right threshold?

How to choose the right threshold

The three algorithms above use different thresholds, so they have different precision and recall values. Which of the three models should we choose? To decide, we need a single-number evaluation metric.

Precision and recall cannot serve as that metric directly, because they are two separate numbers, and with two numbers it is not obvious which model is better. So we rule them out on their own.

One option is to use their average, (P + R)/2, as the evaluation metric. Algorithm 3 then has the largest average, yet it is not a good algorithm: we can get this kind of result simply by lowering the threshold until every example is predicted as y = 1, which gives very high recall but very low precision. Such a classifier is clearly useless, yet it scores well on the average, so the average cannot be used as the evaluation metric.
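
The failure of the average is easy to reproduce. The sketch below assumes hypothetical skewed data in which 2 of 100 patients have cancer, builds "algorithm 3" as the classifier that predicts y = 1 for everyone, and pairs it with two made-up (precision, recall) values for algorithms 1 and 2; none of these numbers come from the original figure.

```python
# Hypothetical skewed data: 2 of 100 patients have cancer.
y_true = [1] * 2 + [0] * 98

# "Algorithm 3": the threshold is lowered until every patient is predicted as y = 1.
y_pred = [1] * len(y_true)
tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
precision3 = tp / sum(y_pred)   # 2 / 100 = 0.02
recall3 = tp / sum(y_true)      # 2 / 2 = 1.0

# Hypothetical (precision, recall) values for two more reasonable algorithms.
results = {
    "algorithm 1": (0.5, 0.4),
    "algorithm 2": (0.7, 0.1),
    "algorithm 3": (precision3, recall3),
}


def average(precision: float, recall: float) -> float:
    """Arithmetic mean of precision and recall."""
    return (precision + recall) / 2


for name, (p, r) in results.items():
    print(f"{name}: P={p:.2f}  R={r:.2f}  average={average(p, r):.3f}")

best = max(results, key=lambda name: average(*results[name]))
print("best by average:", best)  # the useless "always predict 1" classifier
```

The "always predict 1" classifier gets an average of 0.51 and is ranked first, which is exactly the problem described above.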

F score (or F1 score): a single evaluation metric used in machine learning that combines precision P and recall R, and that can be used to select the threshold:

$$F_1 = 2\frac{PR}{P + R}$$

When either precision or recall is small, the F1 score is also small, which avoids the problem with the average described above. In other words, the F1 score is large only if both precision and recall are reasonably large.
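
Here is a minimal sketch of the F1 score, F1 = 2PR/(P + R), applied to hypothetical (precision, recall) values for three algorithms, where algorithm 3 behaves like "always predict y = 1" on data in which 2% of patients have cancer. Treating F1 as 0 when P + R = 0 is a common convention assumed here to avoid division by zero.

```python
def f1_score(precision: float, recall: float) -> float:
    """F1 = 2PR / (P + R); taken as 0 when P + R == 0 to avoid dividing by zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical (precision, recall) values; algorithm 3 behaves like
# "always predict y = 1" on data where 2% of patients have cancer.
results = {
    "algorithm 1": (0.5, 0.4),
    "algorithm 2": (0.7, 0.1),
    "algorithm 3": (0.02, 1.0),
}

for name, (p, r) in results.items():
    print(f"{name}: average={(p + r) / 2:.3f}  F1={f1_score(p, r):.3f}")

best = max(results, key=lambda name: f1_score(*results[name]))
print("best by F1:", best)
```

Algorithm 3, which wins on the average, now has the lowest F1 score (about 0.039), and algorithm 1 comes out on top.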

If either precision or recall is 0, the F1 score is 0. For a perfect model, with precision and recall both equal to 1, the F1 score is 1. So in practice the F1 score lies between 0 and 1.
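
A short derivation makes this range precise. For P, R > 0, F1 is the harmonic mean of P and R. Writing m = min(P, R) and M = max(P, R), we have F1 = 2mM/(m + M), and the denominator m + M is at most 2M, at least M, and at least 2m, which gives the bounds in the second line. The third line reuses the hypothetical "always predict y = 1" numbers (P = 0.02, R = 1) for comparison with the average.

```latex
\begin{aligned}
F_1 &= \frac{2PR}{P+R} = \frac{2}{1/P + 1/R}, && 0 < P, R \le 1,\\
\min(P,R) &\le F_1 \le \min\bigl(2\min(P,R),\ \max(P,R)\bigr) \le 1, &&\\
P = 0.02,\ R = 1 &\;\Rightarrow\; F_1 = \frac{0.04}{1.02} \approx 0.039, && \frac{P+R}{2} = 0.51.
\end{aligned}
```

The bound F1 ≤ 2 min(P, R) is what drives F1 to 0 whenever either value goes to 0, and F1 = 1 requires min(P, R) = max(P, R) = 1, that is, P = R = 1.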

Summary

    1. Precision and recall trade off against each other; we change their values by changing the threshold.
    2. Different thresholds give different precision and recall values. To choose a suitable threshold and get a good model, compare F1 scores on the cross-validation set.
    3. To select the threshold automatically, try a range of different thresholds and pick the one with the highest F1 score on the cross-validation set.
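
Steps 2 and 3 can be sketched as a small search loop. The cross-validation labels and scores below are hypothetical (in practice the scores would come from a model trained on the training set), and the candidate grid 0.1, 0.2, ..., 0.9 is an arbitrary illustrative choice. The F1 score is computed with the equivalent count form 2TP / (2TP + FP + FN), which equals 2PR/(P + R).

```python
from typing import Sequence


def f1_at_threshold(y_true: Sequence[int], scores: Sequence[float],
                    threshold: float) -> float:
    """F1 when predicting y = 1 for scores >= threshold, via 2TP / (2TP + FP + FN)."""
    tp = fp = fn = 0
    for y, s in zip(y_true, scores):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 1:
            fn += 1
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def select_threshold(y_cv: Sequence[int], scores_cv: Sequence[float],
                     candidates: Sequence[float]) -> float:
    """Return the candidate threshold with the highest F1 on the cross-validation set
    (the first one wins if several tie)."""
    return max(candidates, key=lambda t: f1_at_threshold(y_cv, scores_cv, t))


# Hypothetical cross-validation labels and scores h_theta(x) from an already trained model.
y_cv = [1, 1, 0, 1, 0, 1, 0, 1, 0, 0]
scores_cv = [0.95, 0.85, 0.75, 0.65, 0.55, 0.45, 0.35, 0.25, 0.15, 0.05]
candidates = [i / 10 for i in range(1, 10)]  # 0.1, 0.2, ..., 0.9

t = select_threshold(y_cv, scores_cv, candidates)
print(f"chosen threshold = {t}, F1 on CV set = {f1_at_threshold(y_cv, scores_cv, t):.3f}")
```

On this toy data the search picks 0.2 (F1 about 0.769), well below the default 0.5; with real data the result depends entirely on the model and the cross-validation set.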
