How to handle an imbalance of positive and negative samples in the training set

Source: Internet
Author: User

1. Solving sample imbalance by oversampling and undersampling

(1) Oversampling: Oversampling (also called upsampling) balances the classes by increasing the number of minority-class samples. The most straightforward approach is to duplicate minority samples. For example, if the positive-to-negative ratio is 1:10, adding 9 extra copies of each positive example brings the ratio to 1:1. The drawback is that if the samples have few features, this may lead to overfitting. Improved oversampling methods instead generate new synthetic minority-class samples by adding random noise or perturbations, or by following certain rules, as in the SMOTE algorithm.
(2) Undersampling: Undersampling (also called downsampling) balances the classes by reducing the number of majority-class samples. The most straightforward way is to randomly remove some majority-class samples. The drawback is that important information carried by the discarded majority-class samples may be lost.
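
The two basic strategies above can be sketched in a few lines of Python. This is only an illustration using the standard library; the function names `random_oversample` and `random_undersample`, the `seed` parameter, and the assumption of a binary problem whose two classes are already split into lists are all hypothetical choices, not part of the original article.

```python
import random

def random_oversample(minority, majority, seed=0):
    """Add randomly chosen copies of minority samples until both classes have equal size."""
    if not minority:
        raise ValueError("minority class must contain at least one sample")
    rng = random.Random(seed)
    n_extra = max(0, len(majority) - len(minority))
    # Each extra record is an exact duplicate of an existing minority sample.
    extra = [rng.choice(minority) for _ in range(n_extra)]
    return minority + extra, majority

def random_undersample(minority, majority, seed=0):
    """Randomly keep only as many majority samples as there are minority samples."""
    rng = random.Random(seed)
    n_keep = min(len(minority), len(majority))
    # sample() draws without replacement, so no majority record is kept twice.
    kept = rng.sample(majority, n_keep)
    return minority, kept

if __name__ == "__main__":
    positives = [("pos", i) for i in range(10)]    # 1 part positive
    negatives = [("neg", i) for i in range(100)]   # 10 parts negative
    over_pos, over_neg = random_oversample(positives, negatives)
    under_pos, under_neg = random_undersample(positives, negatives)
    print(len(over_pos), len(over_neg))
    print(len(under_pos), len(under_neg))
```

With a 1:10 ratio, the oversampled set keeps every negative and grows the positives to the same count, while the undersampled set keeps every positive and discards most of the negatives, which is where the information loss comes from.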

In summary, oversampling and undersampling are better suited to large datasets with an uneven class distribution, and the first (oversampling) is the more widely applied.
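
The synthetic-sample idea mentioned under oversampling can be illustrated as follows. This is a simplified, SMOTE-style sketch, not the reference SMOTE implementation: it assumes numeric feature tuples, uses brute-force neighbour search, and the names `smote_like`, `n_new`, `k`, and `seed` are hypothetical. Each new point is placed at a random position on the line segment between a minority sample and one of its nearest minority neighbours.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def smote_like(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority points by interpolating towards near neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the other minority samples, nearest first.
        others = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: squared_distance(base, minority[j]),
        )
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic

if __name__ == "__main__":
    minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
    for point in smote_like(minority, n_new=3, k=2):
        print(point)
```

Because the new points are interpolated rather than copied, the minority class gains variety instead of exact duplicates, which is the motivation the text gives for preferring such methods over plain copying.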

2. Solving sample imbalance with penalty weights for the positive and negative samples
3. Solving sample imbalance with ensemble (combination) methods
4. Solving sample imbalance by feature selection
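
As a rough illustration of the penalty-weight idea in item 2, one common heuristic (the one behind scikit-learn's `class_weight="balanced"` option) gives each class the weight n_samples / (n_classes * class_count), so that mistakes on the rarer class cost more. The sketch below is an assumption about how such weights might be computed and applied; the function names `balanced_class_weights` and `weighted_error` are hypothetical and do not come from the original article.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class inversely to its frequency: n_samples / (n_classes * count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}

def weighted_error(y_true, y_pred, weights):
    """Misclassification rate where each sample counts by the weight of its true class."""
    total = sum(weights[y] for y in y_true)
    wrong = sum(weights[y] for y, p in zip(y_true, y_pred) if y != p)
    return wrong / total

if __name__ == "__main__":
    labels = [1] * 10 + [0] * 100        # positive:negative = 1:10
    weights = balanced_class_weights(labels)
    always_negative = [0] * len(labels)  # a classifier that ignores the minority class
    print(weights)
    print(weighted_error(labels, always_negative, weights))
```

Under these weights, a classifier that always predicts the majority class no longer looks about 91% accurate; its weighted error is about one half, because the ten missed positives together weigh as much as the hundred negatives.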

The last three methods are not described in detail here. For the specific steps, see the reference address.
Reference Address: https://www.zhihu.com/question/56662976
