Demonstration System for Bayesian Forum Spam Post Filtering
Introduction: One of a forum moderator's tasks is to maintain the quality of forum posts by deleting advertising posts, fake posts, and the like. This demonstration system was developed to reduce the moderator's workload by identifying spam posts automatically. Its theoretical basis is the naive Bayes principle. The procedure is as follows:
1. First, log on to the system by registering an account with zule.
2. Input the raw training data for the system. There are two types: spam and non-spam.
3. Enter the posts to be checked and view the probability, as a percentage, that each one is spam.
You are welcome to discuss and improve this program.
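The train-then-check procedure described in the introduction can be illustrated with a minimal naive Bayes sketch. This is not the demonstration system's actual code: the class name `NaiveBayesSpamFilter`, the labels `"spam"` and `"ham"`, the simple regular-expression tokenizer, and the add-one (Laplace) smoothing are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


class NaiveBayesSpamFilter:
    """Hypothetical word-count naive Bayes filter (illustrative only)."""

    LABELS = ("spam", "ham")  # "ham" stands for non-spam posts

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = {label: 0 for label in self.LABELS}

    @staticmethod
    def tokenize(text):
        # Assumed tokenizer: lowercase runs of letters, digits, apostrophes.
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, text, label):
        # Step 2 of the procedure: feed one labeled training post.
        self.doc_counts[label] += 1
        self.word_counts[label].update(self.tokenize(text))

    def spam_probability(self, text):
        # Step 3 of the procedure: estimate P(spam | post) for a new post.
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        words = self.tokenize(text)
        log_scores = {}
        for label in self.LABELS:
            # Smoothed log prior of the class.
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            total_words = sum(self.word_counts[label].values())
            for word in words:
                # Add-one smoothing; the extra +1 reserves mass for unseen words.
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(vocab) + 1))
            log_scores[label] = score
        # Normalize in log space to avoid underflow on long posts.
        top = max(log_scores.values())
        weights = {k: math.exp(v - top) for k, v in log_scores.items()}
        return weights["spam"] / (weights["spam"] + weights["ham"])


spam_filter = NaiveBayesSpamFilter()
spam_filter.train("cheap pills buy now", "spam")
spam_filter.train("buy cheap watches now", "spam")
spam_filter.train("how do I configure the forum theme", "ham")
spam_filter.train("thanks for the helpful answer", "ham")

# Report the estimate as a percentage, as the system's last step does.
print(f"{spam_filter.spam_probability('buy cheap pills') * 100:.1f}% spam")
```

With only four training posts the estimate is crude; a real deployment would need far more labeled data, and Chinese-language posts would need a word segmenter in place of the regular-expression tokenizer.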
Microsoft Research Asia, Natural Language Computing Group
Papers
- Dependency Language Model for Information Retrieval
Jianfeng Gao, Jian-Yun Nie, Guangyuan Wu and Guihong Cao, "Dependence Language Model for Information Retrieval", in SIGIR-2004, Sheffield, UK, July 25-29, 2004.
- A New Method for English-Chinese Named Entity Alignment
Dong-Hui Feng, Ya-Juan Lv, Ming Zhou, "A New Approach for English-Chinese Named Entity Alignment", 2004 Conference on Empirical Methods in Natural Language Processing, Barcelona, Spain, Jul. 2004.
- Automatic Acquisition of Collocation Translations from Monolingual Corpora
Ya-Juan Lv, Ming Zhou, "Collocation Translation Acquisition Using Monolingual Corpora", 42nd Annual Meeting of the Association for Computational Linguistics, Barcelona, Spain, Jul. 2004.
- Adaptive Chinese Word Segmentation
Jianfeng Gao, Andi Wu, Mu Li, Chang-Ning Huang, Hongqiao Li, Xinsong Xia and Haowei Qin, "Adaptive Chinese Word Segmentation", 42nd Annual Meeting of the Association for Computational Linguistics, Barcelona, Spain, Jul. 2004.
- Using SVM to Identify Chinese New Words
Hongqiao Li, Chang-Ning Huang, Jianfeng Gao and Xiaozhong Fan, "The Use of SVM for Chinese New Word Identification", in IJCNLP-04, Sanya City, Hainan Island, China, March 22-24, 2004.
- Experience in obtaining long-distance dependency in language models
Jianfeng Gao and Hisami Suzuki, "Capturing Long Distance Dependency for Language Modeling: An Empirical Study", in IJCNLP-04, Sanya City, Hainan Island, China, March 22-24, 2004.
- Word Translation Disambiguation Using Bilingual Bootstrapping
Hang Li and Cong Li, "Word Translation Disambiguation Using Bilingual Bootstrapping", Computational Linguistics, 30 (1), 1-22, 2004.
- Text Classification Using Stochastic Keyword Generation
Cong Li, Ji-Rong Wen, and Hang Li, "Text Classification Using Stochastic Keyword Generation", Proc. of ICML '03, 464.
- Uncertainty Reduction in Collaborative Bootstrapping: Measure and Algorithm
Yunbo Cao, Hang Li, and Li Lian, "Uncertainty Reduction in Collaborative Bootstrapping: Measure and Algorithm", Proc. of ACL '03, 327.
- Application of Improved Source-Channel Models to Chinese Word Segmentation
Jianfeng Gao, Mu Li and Chang-Ning Huang, "Improved Source-Channel Models for Chinese Word Segmentation", 41st Annual Meeting of the Association for Computational Linguistics, Sapporo, Japan, July 7-12, 2003.
- Topic Analysis Using a Finite Mixture Model
Hang Li and Kenji Yamanishi, "Topic Analysis Using a Finite Mixture Model", Information Processing & Management, 39 (4), 521-541, (2003).
- Using Bilingual Web Data to Mine and Rank Translations
Hang Li, Yunbo Cao, and Cong Li, "Using Bilingual Web Data to Mine and Rank Translations", IEEE Intelligent Systems, vol. 18 (4), 54-59, (2003).