A detailed explanation of semi-supervised learning with the EM algorithm for naive Bayes text classification

Source: Internet
Author: User
1. Preface

Labeling large amounts of text for classification is tedious and time-consuming. Unlabeled text, by contrast, is easy and inexpensive to obtain, for example from the vast amount available on the Internet. The following sections show how semi-supervised learning with the EM algorithm can make use of a large number of unlabeled samples to reach higher text classification accuracy. A multinomial naive Bayes classifier is trained with the EM algorithm on both labeled and unlabeled data. We study how multi-class classification accuracy varies with the proportion of unlabeled data in the training set, and explore ways to reduce the computational cost of EM to speed up training. The results show that the semi-supervised EM-NB classifier reaches an accuracy above 50% with only 2% of the training data labeled, and above 70% with 33% labeled.

This article is based on Appendix 1 in the References, where the detailed code and explanation can be found at the link.
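The training procedure described above alternates between two steps: the E-step assigns soft class probabilities to each unlabeled document using the current classifier, and the M-step re-estimates the multinomial naive Bayes parameters from the labeled documents (fixed one-hot labels) plus the unlabeled documents (soft labels). The sketch below illustrates this loop with NumPy. It is not the code from Appendix 1. The function names (`train_nb`, `posterior`, `em_nb`, `predict`), the Laplace smoothing with `alpha=1.0`, and the `tol` stopping rule are all illustrative assumptions.

```python
import numpy as np

def train_nb(X, resp, alpha=1.0):
    """M-step (hypothetical helper): estimate multinomial NB parameters.
    X: (n_docs, n_words) word counts; resp: (n_docs, n_classes) with P(c|d)."""
    n_docs, n_classes = resp.shape
    # Class priors from (soft) class counts, Laplace-smoothed
    log_prior = np.log((resp.sum(axis=0) + 1.0) / (n_docs + n_classes))
    # Expected word counts per class, smoothed by alpha (assumed value)
    word_counts = resp.T @ X + alpha                    # (n_classes, n_words)
    log_lik = np.log(word_counts / word_counts.sum(axis=1, keepdims=True))
    return log_prior, log_lik

def posterior(X, log_prior, log_lik):
    """E-step (hypothetical helper): P(c|d) per document, computed in log space."""
    log_joint = X @ log_lik.T + log_prior               # (n_docs, n_classes)
    log_joint -= log_joint.max(axis=1, keepdims=True)   # avoid underflow in exp
    p = np.exp(log_joint)
    return p / p.sum(axis=1, keepdims=True)

def em_nb(X_lab, y_lab, X_unl, n_classes, n_iter=20, tol=1e-4):
    """Semi-supervised EM for multinomial NB (illustrative sketch)."""
    resp_lab = np.eye(n_classes)[y_lab]                 # labeled docs: fixed one-hot
    params = train_nb(X_lab, resp_lab)                  # initialize from labeled data only
    X_all = np.vstack([X_lab, X_unl])
    prev = None
    for _ in range(n_iter):
        resp_unl = posterior(X_unl, *params)            # E-step on unlabeled docs
        params = train_nb(X_all, np.vstack([resp_lab, resp_unl]))  # M-step on all docs
        # Stop early once the soft labels stop changing (assumed criterion)
        if prev is not None and np.abs(resp_unl - prev).max() < tol:
            break
        prev = resp_unl
    return params

def predict(X, params):
    return posterior(X, *params).argmax(axis=1)

# Toy usage: 4-word vocabulary; class 0 favours words 0-1, class 1 favours words 2-3
X_lab = np.array([[3, 1, 0, 0], [0, 0, 2, 3]])
y_lab = np.array([0, 1])
X_unl = np.array([[2, 2, 0, 0], [1, 3, 0, 1], [0, 1, 3, 2], [0, 0, 1, 4]])
params = em_nb(X_lab, y_lab, X_unl, n_classes=2)
print(predict(X_unl, params))  # [0 0 1 1]
```

Computing the E-step in log space keeps long documents from underflowing to zero probability. The `tol` check, which stops EM once the soft labels stabilize, is one simple way to cut the number of iterations. It is offered here only as an example and is not necessarily the speed-up technique studied in the original article.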

2. Introduction to the Model

3. Key Code Implementation

X. References

Appendix 1: Text Classification Using EM and Semi-Supervised Learning