1. Common steps
2. Chinese word segmentation
1) Compared with sentiment analysis of English text, this is a preprocessing step unique to Chinese.
2) Common methods: dictionary-based, rule-based, statistical, character-tagging-based, and artificial-intelligence-based.
3) Common tools: HIT Language Cloud (LTP-Cloud), the NiuTrans statistical machine translation system from Northeastern University, ICTCLAS by Dr. Zhang Huaping of the Chinese Academy of Sciences, Boson technology (BosonNLP), jieba, ansj, HanLP.
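As a rough illustration of the dictionary-based approach listed above, here is a minimal forward maximum matching sketch. It is not how any of the listed tools work internally; the function name, the `max_len` parameter, and the toy dictionary are all assumptions made up for this example.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Segment text by greedily taking the longest dictionary word at each position."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary word is found.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Toy dictionary, invented for illustration only.
toy_dictionary = {"我", "喜欢", "这部", "电影", "非常"}
segmented = forward_max_match("我非常喜欢这部电影", toy_dictionary)
print(segmented)
```

Greedy matching is simple but cannot resolve ambiguous splits, which is one reason the statistical and tagging-based methods exist.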
3. Feature Extraction
1) Decide what in the text serves as a feature.
2) Common methods: by part of speech (adjective, adverb, verb), by word combination (unigram, bigram), by position.
3) When the text is represented by word combinations, there are two options: whether each word occurs (presence), or how many times each word occurs (count).
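The two representations can be sketched in a few lines. This is only an illustrative example on an already-segmented toy sentence; the helper names `ngrams`, `count_features`, and `presence_features` are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) from a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_features(tokens):
    """Count representation: how many times each unigram and bigram occurs."""
    return Counter(ngrams(tokens, 1) + ngrams(tokens, 2))


def presence_features(tokens):
    """Presence representation: 1 for every unigram or bigram that occurs at least once."""
    return {gram: 1 for gram in count_features(tokens)}


tokens = ["good", "movie", "good", "plot"]  # toy input, already segmented
counts = count_features(tokens)
presence = presence_features(tokens)
print(counts[("good",)], presence[("good",)])
```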
4. Feature Selection
1) Decide which features to keep. If every extracted feature is used, the computation becomes very expensive and the feature matrix is high-dimensional and sparse.
2) Common methods: stop-word removal, chi-square, mutual information.
3) Common tools: word2vec, Doc2vec
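To make the chi-square method concrete, the sketch below scores one term against one class using the standard 2x2 contingency form, chi2 = N(AD - BC)^2 / ((A+B)(C+D)(A+C)(B+D)). The function name and the toy corpus are assumptions for illustration; a real pipeline would more likely use a library routine.

```python
def chi_square(docs, labels, term, target):
    """Chi-square statistic between a term's presence and membership in the target class."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        in_class = label == target
        if has_term and in_class:
            a += 1  # term present, document in class
        elif has_term:
            b += 1  # term present, document not in class
        elif in_class:
            c += 1  # term absent, document in class
        else:
            d += 1  # term absent, document not in class
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0  # term or class never varies, so it carries no information
    return n * (a * d - b * c) ** 2 / denominator


# Toy corpus, invented for illustration only.
docs = [["good", "movie"], ["good", "plot"], ["bad", "movie"], ["bad", "plot"]]
labels = ["pos", "pos", "neg", "neg"]
for term in ["good", "movie"]:
    print(term, chi_square(docs, labels, term, "pos"))
```

Terms with a high score (here "good") are kept as features; terms spread evenly across classes (here "movie") score near zero and can be dropped.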
5. Classification model
1) Training and testing.
2) Common methods: naive Bayes, maximum entropy, SVM.
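The train-then-test flow can be sketched with a small multinomial naive Bayes classifier. This is a minimal version assuming add-one (Laplace) smoothing and bag-of-words input; the function names and the toy data are hypothetical, and it is not meant to stand in for a production implementation.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels):
    """Collect per-class document counts, per-class word counts, and the vocabulary."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
    vocabulary = {word for counter in word_counts.values() for word in counter}
    return class_counts, word_counts, vocabulary


def predict(model, tokens):
    """Return the class with the highest log-probability under add-one smoothing."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, doc_count in class_counts.items():
        score = math.log(doc_count / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in tokens:
            if token not in vocabulary:
                continue  # ignore words never seen in training
            likelihood = (word_counts[label][token] + 1) / (total_words + len(vocabulary))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training and test data, invented for illustration only.
train_docs = [["good", "movie"], ["great", "plot"], ["bad", "movie"], ["boring", "plot"]]
train_labels = ["pos", "pos", "neg", "neg"]
model = train_naive_bayes(train_docs, train_labels)
print(predict(model, ["good", "plot"]), predict(model, ["boring", "movie"]))
```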
6. Evaluation metrics
1) Accuracy
Accuracy = (TP + TN)/(TP + FN + FP + TN) reflects the classifier's ability to judge the whole sample set correctly: positives judged positive and negatives judged negative.
2) Precision
Precision = TP/(TP + FP) reflects the proportion of true positives among the samples the classifier judged positive.
3) Recall
Recall = TP/(TP + FN) reflects the proportion of all positive samples that are correctly judged positive.
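The three formulas above can be checked on a small example. The sketch below counts TP, FP, FN, and TN from paired true and predicted labels and then applies the formulas directly; the function names and the toy labels are assumptions for illustration.

```python
def confusion_counts(y_true, y_pred, positive="pos"):
    """Count TP, FP, FN, TN for a binary task with the given positive label."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall); a ratio with a zero denominator is reported as 0.0."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Toy labels, invented for illustration only.
y_true = ["pos", "pos", "pos", "neg", "neg"]
y_pred = ["pos", "pos", "neg", "pos", "neg"]
accuracy, precision, recall = metrics(*confusion_counts(y_true, y_pred))
print(accuracy, precision, recall)
```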
7. Available resources
1) Introduction to basic Chinese word segmentation algorithms
2) ICTCLAS Chinese part-of-speech tag set
3) Text classification techniques
4) Text classification and SVM
5) Text classification based on the Bayesian algorithm
6) A Chinese text classification prototype based on LIBSVM
7) LDA-math: text modeling
8) Sentiment analysis resources
9) Feature extraction techniques for sentiment analysis
9.1. Stanford University natural language processing course, lecture 7: sentiment analysis
10) Deep learning, natural language processing, and representations
Deep Learning in NLP (1): word vectors and language models