Paper note "The Impact of Imbalanced Training Data for CNN"

The original paper is: "The Impact of Imbalanced Training Data for Convolutional Neural Networks"

This post is my reading notes on the paper, so it inevitably contains mistakes in the details.

Please bear with me; criticism and corrections are welcome.

For more related posts, see: http://blog.csdn.net/cyh_24

If you want to repost this article, please include a link to it: http://blog.csdn.net/cyh_24/article/details/49871387

Abstract

This paper studies how training a CNN on imbalanced data affects image classification. The data set is CIFAR-10, from which the authors manually build training sets with different class distributions, for example one where some classes make up a large share of the images and the others a small share. A CNN is trained on each of these training sets and its accuracy is measured.

The results show that imbalanced training sets have a significant negative impact on the results, while a balanced training set gives the best performance.

Furthermore, the paper concludes that oversampling is an effective way to deal with imbalanced training sets.

Experimental process

Dataset

The data set is CIFAR-10, which has 10 classes with 6,000 images per class, 60,000 images in total.

CIFAR-10 is split so that 5,000 images per class are used for training and 1,000 per class for testing.

Generate different data distributions

Explanation:

    • Dist.1 is the balanced data: each class accounts for 10% of the images;
    • Dist.2 has airplane, automobile, bird and cat at 8% each, while the other classes account for 12% each ... The remaining distributions can be read the same way.

This gives 11 training sets. The same CNN is trained on each of them, and each trained model is evaluated on the original test data.
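As a rough illustration of how such a training set could be built, here is a minimal sketch in plain Python. The function name `subsample_to_distribution`, the toy labels, and the rule that the largest class keeps all of its images while the others are randomly subsampled are my own assumptions; the note does not say exactly how the paper draws the subsets. The shares are treated as relative weights.

```python
import random
from collections import Counter


def subsample_to_distribution(labels, class_shares, seed=0):
    """Return sorted indices of a subset whose class mix follows class_shares.

    labels: list of class ids, one per training image.
    class_shares: dict mapping class id -> relative share (any positive weight).
    Assumption (not from the paper): the class with the largest share keeps
    all of its images and the other classes are randomly subsampled.
    """
    rng = random.Random(seed)

    # Group image indices by class.
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    # Largest number of images per unit of share that every class can supply.
    scale = min(len(by_class[c]) / share for c, share in class_shares.items())

    chosen = []
    for c, share in class_shares.items():
        keep = min(len(by_class[c]), int(round(scale * share)))
        chosen.extend(rng.sample(by_class[c], keep))  # sample without replacement
    return sorted(chosen)


# Toy stand-in for CIFAR-10 labels: 10 classes, 60 images each (hypothetical data).
toy_labels = [c for c in range(10) for _ in range(60)]

# Dist.2 as described in this note: first four classes at 8, the rest at 12.
dist2 = {c: (8 if c < 4 else 12) for c in range(10)}

subset = subsample_to_distribution(toy_labels, dist2)
print(Counter(toy_labels[i] for i in subset))
```

With real CIFAR-10 labels the same call would return the indices of the images to keep for that distribution; the test set is left untouched.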

Oversampling

The oversampling method used in the paper is very simple:

For each class, randomly selected images are duplicated until the class has as many images as the largest class.
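The procedure just described can be sketched in a few lines of plain Python. This is only an illustration: the function name `random_oversample` and the toy labels are hypothetical, and drawing the duplicates with replacement is my assumption, since the note only says that images are picked at random.

```python
import random
from collections import Counter


def random_oversample(labels, seed=0):
    """Return image indices in which every class is padded, by duplicating
    randomly chosen images, up to the size of the largest class."""
    rng = random.Random(seed)

    # Group image indices by class.
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    target = max(len(indices) for indices in by_class.values())

    result = []
    for indices in by_class.values():
        result.extend(indices)  # keep every original image once
        missing = target - len(indices)
        # Assumption: duplicates are drawn with replacement.
        result.extend(rng.choices(indices, k=missing))
    return result


# Toy labels (hypothetical): class 0 has 5 images, class 1 has 2, class 2 has 3.
toy_labels = [0] * 5 + [1] * 2 + [2] * 3
balanced = random_oversample(toy_labels)
print(Counter(toy_labels[i] for i in balanced))
```

Only the training set is oversampled; no new images are created, so the smaller classes simply see the same pictures more often.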

Results

Distribution performance

Oversampling Performance

After training the CNN on the oversampled data, almost every class improves, but Dist.1 (the balanced training data) still scores highest.

Total Performance

Averaging the accuracy for each distribution gives a comparison chart in which the dark bars show the accuracy with the imbalanced data and the light bars show the accuracy after oversampling.

The paper's goal is clear and its idea is simple, with no other tricks, so I will leave it at that.

To summarize, the paper's findings and conclusions are:
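For reference, this kind of averaging could be computed as in the sketch below, assuming the average is taken over per-class accuracies (the note does not spell this out). The function names and the toy predictions are made up for illustration and are not results from the paper.

```python
def per_class_accuracy(y_true, y_pred):
    """Map each true class to the fraction of its samples predicted correctly."""
    totals, correct = {}, {}
    for t, p in zip(y_true, y_pred):
        totals[t] = totals.get(t, 0) + 1
        correct[t] = correct.get(t, 0) + (1 if t == p else 0)
    return {c: correct[c] / totals[c] for c in totals}


def mean_class_accuracy(y_true, y_pred):
    """Average the per-class accuracies into one number per training set."""
    acc = per_class_accuracy(y_true, y_pred)
    return sum(acc.values()) / len(acc)


# Toy labels and predictions (hypothetical, not taken from the paper).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
print(per_class_accuracy(y_true, y_pred))
print(mean_class_accuracy(y_true, y_pred))
```

Because the test set has the same number of images in every class, this mean equals the overall accuracy on the test set.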

    1. The distribution of the training data has a large impact on CNN results;
    2. A balanced training set is optimal: the more imbalanced the data, the lower the accuracy;
    3. Oversampling improves the accuracy.

Copyright notice: If you want to repost this article, please include a link to it. Thank you very much! Author's homepage: http://blog.csdn.net/cyh_24
