Sentiment classification: solutions to class imbalance in a corpus

Last Update:2018-08-23 Source: Internet

Author: User

Contents:
First, Introduction
Second, Impact
Third, Other People's Solutions (data level, algorithm level)
Fourth, My Own Solution
Fifth, References

First, Introduction

Until now I had only worked on sentiment classification with classic corpora such as SST. When I tried to build my own corpus, I found that things were not as simple as I had imagined. The corpus has to be cleaned and split (10-fold cross-validation), and now its balance has to be considered as well.

Imbalance problem: the number of samples varies greatly between categories.
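A quick way to see how unbalanced a corpus is: count the samples per category and compare the largest class with the smallest. The sketch below is only an illustration. The labels are made up and are not the author's data, and `class_distribution` is a hypothetical helper name.

```python
from collections import Counter

def class_distribution(labels):
    """Return (counts, imbalance_ratio) for a list of class labels."""
    counts = Counter(labels)
    largest = max(counts.values())
    smallest = min(counts.values())
    return counts, largest / smallest

# Hypothetical labels for a 3-class sentiment corpus (not the author's data).
labels = ["pos"] * 900 + ["neg"] * 90 + ["neutral"] * 10
counts, ratio = class_distribution(labels)

print(counts.most_common())  # classes ordered from largest to smallest
print(ratio)                 # size of the largest class divided by the smallest
```

A ratio close to 1 means the corpus is roughly balanced. The larger the ratio, the more the problems described in the rest of the article apply.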

Take a look at my corpus:

There are 6 categories in total, and the number of samples per category varies widely.

Second, Impact

The imbalance in sample counts between categories limits the accuracy of many classification algorithms. Many classifiers tend to assign samples to the large classes, which lowers accuracy on the small classes. Yet imbalanced classification is real and pervasive, and often the minority classes are exactly the ones we care about, for example in detecting cyber attacks or fraudulent credit card transactions. Fraudulent transactions are a minority class, the classifier is less accurate on them, and so the fraudulent records are hard to find.
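The effect can be made concrete with a small sketch. Assume a made-up data set of 990 legal and 10 fraudulent transactions, and a degenerate classifier that always predicts the majority class. Overall accuracy looks excellent, while not a single fraudulent record is found.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def recall(y_true, y_pred, cls):
    """Fraction of true `cls` samples that were predicted as `cls`."""
    relevant = [p for t, p in zip(y_true, y_pred) if t == cls]
    return sum(p == cls for p in relevant) / len(relevant)

# Made-up data: 990 legal transactions and 10 fraudulent ones.
y_true = ["legal"] * 990 + ["fraud"] * 10
# A degenerate classifier that always predicts the majority class.
y_pred = ["legal"] * len(y_true)

overall = accuracy(y_true, y_pred)
fraud_recall = recall(y_true, y_pred, "fraud")
print(overall, fraud_recall)
```

This is why per-class metrics such as recall are worth reporting alongside overall accuracy when the classes are unbalanced.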
Why is accuracy low on the minority classes? Because the features of minority-class samples are not distinctive, they are easily confused with noise. Most classification methods rely on the features of each class, so when those features are weak the minority-class samples are hard to tell apart.

Third, Other People's Solutions

General practice

Data level:

Oversampling by direct copying: the samples of the categories with fewer samples are duplicated.

Oversampling by interpolation: normalize and sample the data to obtain the sample distribution, extreme values, and mean, then generate new samples from those statistics to expand the number of samples.
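Both oversampling ideas can be sketched in a few lines of Python. The example below is an illustration under stated assumptions, not the method of any particular library. The function names and the data are hypothetical, and the interpolation step uses straight-line interpolation between two existing minority samples (the idea behind SMOTE) as one possible way to generate new samples, rather than the distribution-based generation described above.

```python
import random

def oversample_by_copying(samples, target_size, rng):
    """Duplicate randomly chosen samples until there are target_size of them."""
    extra = [rng.choice(samples) for _ in range(target_size - len(samples))]
    return list(samples) + extra

def oversample_by_interpolation(vectors, target_size, rng):
    """Add synthetic vectors on the line segment between two existing ones."""
    result = [list(v) for v in vectors]
    while len(result) < target_size:
        a, b = rng.sample(vectors, 2)
        t = rng.random()  # position between a (t=0) and b (t=1)
        result.append([x + t * (y - x) for x, y in zip(a, b)])
    return result

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical minority-class texts, oversampled by plain duplication.
minority_texts = ["terrible service", "never again", "awful"]
copied = oversample_by_copying(minority_texts, 9, rng)

# Hypothetical 2-d feature vectors for the same minority class.
minority_vectors = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
synthetic = oversample_by_interpolation(minority_vectors, 9, rng)

print(len(copied), len(synthetic))
```

Copying works directly on raw text, whereas interpolation needs numeric feature vectors. Resampling should be applied to the training split only, so that duplicated or synthetic samples do not leak into the test data.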

Undersampling by direct deletion: randomly remove samples from the majority classes to reduce their number.

Algorithm level:

Weighted loss function: a common way to handle unbalanced data is to weight the loss function so that misclassifying a minority-class sample costs more than misclassifying a majority-class sample. In Python's scikit-learn, the class_weight parameter sets these weights, for example by giving a minority class 10 times the weight of the majority classes.
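To show what such weights do, here is a minimal pure-Python sketch. It computes per-class weights with the formula n_samples / (n_classes * class_count), which is the heuristic scikit-learn applies for class_weight="balanced", and plugs them into a weighted cross-entropy. The helper names and the data are hypothetical.

```python
import math
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}

def weighted_cross_entropy(y_true, probs, weights):
    """Mean of -w[y] * log(p[y]) over all samples."""
    losses = [-weights[y] * math.log(p[y]) for y, p in zip(y_true, probs)]
    return sum(losses) / len(losses)

# Made-up corpus: 90 negative samples and 10 positive ones.
labels = ["neg"] * 90 + ["pos"] * 10
weights = balanced_class_weights(labels)
print(weights)

# The same uncertain prediction costs more on the minority class.
uncertain = {"neg": 0.5, "pos": 0.5}
loss_on_minority = weighted_cross_entropy(["pos"], [uncertain], weights)
loss_on_majority = weighted_cross_entropy(["neg"], [uncertain], weights)
print(loss_on_minority, loss_on_majority)
```

In scikit-learn, a dictionary like `weights` can be passed to estimators that accept the parameter, for example `LogisticRegression(class_weight=weights)`, or the string `"balanced"` can be passed to let the library compute the weights itself.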

RBG (Ross Girshick) and Kaiming He propose a good method, which is not covered in detail here.
For more information, see: http://blog.csdn.net/u014380165/article/details/77019084
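The article does not name the method. Assuming it is focal loss, from the paper "Focal Loss for Dense Object Detection" that both authors worked on, the idea is to scale the cross-entropy of each sample by (1 - p)^gamma, where p is the probability the model assigns to the true class. Samples that are already classified well then contribute very little, so the loss concentrates on hard samples, which are often the minority ones. A minimal sketch under that assumption:

```python
import math

def cross_entropy(p_true):
    """Standard cross-entropy for one sample, given the true-class probability."""
    return -math.log(p_true)

def focal_loss(p_true, gamma=2.0, alpha=1.0):
    """Focal loss for one sample: -alpha * (1 - p)^gamma * log(p)."""
    return -alpha * (1.0 - p_true) ** gamma * math.log(p_true)

easy, hard = 0.9, 0.1  # true-class probabilities of an easy and a hard sample
for p in (easy, hard):
    # The last column shows how strongly the sample is down-weighted.
    print(p, cross_entropy(p), focal_loss(p), focal_loss(p) / cross_entropy(p))
```

With gamma = 0 and alpha = 1 the function reduces to plain cross-entropy. Larger values of gamma suppress easy samples more strongly.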

From a blog post:
It is a very imaginative idea, a special kind of sampling.

First, analyze the categories with fewer samples: parse the syntactic dependencies of the text and the attributes of each word, then generate new text by synonym substitution. The method is simple and effective.
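The synonym substitution step can be sketched as follows. This is a simplified illustration, not the blog's implementation. It skips the dependency analysis entirely and replaces any word found in a small, hypothetical synonym table, where a real system would use a thesaurus and decide from the parse which words are safe to replace.

```python
import random

# Hypothetical synonym table; a real system would use a proper thesaurus.
SYNONYMS = {
    "bad": ["poor", "awful"],
    "slow": ["sluggish"],
    "service": ["support"],
}

def augment(sentence, synonyms, rng, replace_prob=0.5):
    """Return a copy of `sentence` with some words swapped for synonyms."""
    words = []
    for word in sentence.split():
        options = synonyms.get(word.lower())
        if options and rng.random() < replace_prob:
            words.append(rng.choice(options))
        else:
            words.append(word)
    return " ".join(words)

rng = random.Random(0)  # fixed seed so the sketch is reproducible
original = "the service was bad and slow"

# Generate several variants of one minority-class sentence.
variants = {augment(original, SYNONYMS, rng) for _ in range(20)}
for v in sorted(variants):
    print(v)
```

Each variant keeps the label of the original sentence, so the minority class grows with text that is new on the surface but close in meaning.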

https://blog.csdn.net/u014535908/article/details/79035653

Fourth, My Own Solution

I have not worked this out yet. I want to try a few things first and will fill in this section once the results are in.

Fifth, References

https://blog.csdn.net/jerryfy007/article/details/72904257
http://blog.sina.com.cn/s/blog_afa352bf0102vo57.html
https://blog.csdn.net/u014380165/article/details/77019084

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
