Outlier Handling

Source: Internet
Author: User

Outlier handling is one of the key points of model optimization. My earlier understanding of outliers was only that they lie far from the mean, but how far is far enough? In fact, different models weigh this differently: outliers affect each model in a different way, so the outliers each model can tolerate also differ.

1. Types of outliers

Viewed in two dimensions, there are three types of outliers. The first affects only the vertical (y) direction and is called a vertical outlier; the indicator for detecting it is the standardized residual (the studentized residual also works). The second affects both x and y; the indicator for detecting it is Cook's distance. The third is an extreme value of x that affects the horizontal direction, called a leverage point; the indicator for detecting it is the leverage (hat value).
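The three indicators can be computed by hand for a simple linear regression. The following is a minimal sketch, assuming a single predictor and an ordinary least squares fit; the function name `simple_regression_diagnostics` and the toy data are made up for illustration and are not from the original article.

```python
import math

def simple_regression_diagnostics(x, y):
    """Return (standardized residuals, leverages, Cook's distances)
    for the least squares line y = intercept + slope * x."""
    n = len(x)
    p = 2  # fitted parameters: intercept and slope
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean

    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Residual variance estimate with n - p degrees of freedom.
    s2 = sum(e ** 2 for e in residuals) / (n - p)

    # Leverage (hat value): depends only on how far x_i is from the mean of x.
    leverages = [1 / n + (xi - x_mean) ** 2 / sxx for xi in x]
    # Standardized (internally studentized) residual: vertical deviation.
    std_resid = [e / math.sqrt(s2 * (1 - h)) for e, h in zip(residuals, leverages)]
    # Cook's distance: combines residual size and leverage.
    cooks = [(r ** 2 / p) * h / (1 - h) for r, h in zip(std_resid, leverages)]
    return std_resid, leverages, cooks

# Hypothetical data: the 5th point is shifted vertically,
# the last point has an extreme x but lies on the trend.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20]
y = [3.1, 4.9, 7.2, 8.8, 21.0, 13.1, 14.9, 17.0, 19.2, 41.0]
std_resid, leverages, cooks = simple_regression_diagnostics(x, y)

for i, (r, h, d) in enumerate(zip(std_resid, leverages, cooks)):
    print(f"point {i}: std resid {r:+.2f}, leverage {h:.2f}, Cook's D {d:.2f}")
```

In this toy data the vertically shifted point is the only one whose standardized residual falls outside ±2, while the point with the extreme x has the largest leverage but a small residual. This illustrates why the two indicators measure different things.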

2. Which outlier types different models care about, and how to handle them

How outliers are handled depends on the roles the x and y variables play in the model.

First, models without a y, where there is only a set of x variables. Outliers are detected through descriptive analysis, by drawing a box plot. In this case, outliers in a single variable are generally not deleted; they are simply watched.

Second, models with a y. When y is continuous, as in a linear regression model, the model focuses on vertical outliers (the reason is related to the standard error of the estimate), so the main check is to compare the standardized residuals with ±2: observations outside that range are treated as outliers. When y is a binary variable, as in a logistic model, all three types of outliers must be considered, so the standardized residual, Cook's distance, and leverage are examined together. This leads to the deviance residual (cutoff 8) and the Pearson chi-square (cutoff 100). Outliers found in this case should be deleted.

Third, models that do not distinguish x from y, where all variables are equally important. Outliers are detected from the two-dimensional scatter plot of a cluster analysis. In data mining, such outliers may be kept rather than deleted and examined closely, because they represent the behavior of a minority of consumers, perhaps VIP behavior.
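For the case without a y variable, the box plot check can be sketched as follows. It assumes the common convention of fences at 1.5 times the interquartile range beyond the quartiles, which the original text does not specify. The function name `boxplot_outliers` and the sample values are hypothetical.

```python
import statistics

def boxplot_outliers(values, k=1.5):
    """Return the values lying outside the box plot fences
    [Q1 - k*IQR, Q3 + k*IQR]. k=1.5 is an assumed convention."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical single-variable sample with one extreme value.
spend = [12, 13, 14, 15, 15, 16, 17, 18, 95]
flagged = boxplot_outliers(spend)
print(flagged)
```

The function only reports the flagged values and does not drop them, in line with the advice that single-variable outliers are generally watched rather than deleted.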
