0. Mind the encoding Weka uses for Chinese text
In RunWeka.ini, set fileEncoding=utf-8
1. After running the documents through the word segmenter, convert the segmented output to an ARFF file with this command
java weka.core.converters.TextDirectoryLoader -dir D:\weibo\catagory\data10W\nlpirSegment\noNI > D:\weibo\catagory\data10W\nlpirSegment\weka\wb10w.arff
The conversion turned out to be particularly fast
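
TextDirectoryLoader treats each subdirectory of the -dir folder as one class and each file inside it as one document. The following is a minimal plain-Java sketch of the kind of ARFF text such a conversion produces, not Weka's own code. The class name ArffSketch, the helper names, and the sample data are hypothetical; the attribute names text and @@class@@ are assumed to match what the loader emits, and class names are assumed to need no quoting.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ArffSketch {
    // Wraps a document in single quotes and escapes characters that would break the ARFF line.
    static String quote(String text) {
        String escaped = text.replace("\\", "\\\\")
                .replace("'", "\\'")
                .replace("\r", "\\r")
                .replace("\n", "\\n");
        return "'" + escaped + "'";
    }

    // Builds ARFF text with one string attribute (the document) and one nominal class attribute.
    // Assumption: class names are simple tokens without spaces or commas.
    static String toArff(String relation, Map<String, List<String>> docsByClass) {
        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(relation).append("\n\n");
        sb.append("@attribute text string\n");
        sb.append("@attribute @@class@@ {")
          .append(String.join(",", docsByClass.keySet()))
          .append("}\n\n");
        sb.append("@data\n");
        for (Map.Entry<String, List<String>> entry : docsByClass.entrySet()) {
            for (String doc : entry.getValue()) {
                sb.append(quote(doc)).append(',').append(entry.getKey()).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical sample: two classes, one already-segmented document each.
        Map<String, List<String>> docs = new LinkedHashMap<>();
        docs.put("sports", List.of("team wins match"));
        docs.put("news", List.of("market opens higher"));
        System.out.print(toArff("wb_sample", docs));
    }
}
```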
2. Open the file above in Weka and generate the word vectors: weight each feature by whether the word is present (true/false), keep 1000 features per document class, and finally save the result as wb10w_vsm_true_false_weight.arff
This generates more than 6,000 features
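
In Weka this step is presumably done with the StringToWordVector filter. As a rough illustration of what a true/false word vector is, here is a plain-Java sketch that does not use Weka. The class name WordPresenceSketch and its methods are hypothetical, the documents are assumed to be already segmented into whitespace-separated words, and the per-class limit of 1000 words is not implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class WordPresenceSketch {
    // Collects every distinct word across the documents, in sorted order.
    static List<String> vocabulary(List<String> docs) {
        TreeSet<String> vocab = new TreeSet<>();
        for (String doc : docs) {
            for (String token : doc.trim().split("\\s+")) {
                if (!token.isEmpty()) {
                    vocab.add(token);
                }
            }
        }
        return new ArrayList<>(vocab);
    }

    // Returns 1 at position i if vocabulary word i occurs in the document, otherwise 0.
    static int[] presenceVector(String doc, List<String> vocab) {
        Set<String> seen = new HashSet<>(Arrays.asList(doc.trim().split("\\s+")));
        int[] vector = new int[vocab.size()];
        for (int i = 0; i < vocab.size(); i++) {
            vector[i] = seen.contains(vocab.get(i)) ? 1 : 0;
        }
        return vector;
    }

    public static void main(String[] args) {
        // Hypothetical segmented documents.
        List<String> docs = List.of("weather very good", "weather not good");
        List<String> vocab = vocabulary(docs);
        System.out.println(vocab);
        for (String doc : docs) {
            System.out.println(Arrays.toString(presenceVector(doc, vocab)));
        }
    }
}
```

With more than 6,000 such columns per document, most entries are 0, which is why the saved ARFF file is usually stored in sparse form.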
3. Open the ARFF file above and run feature selection: choose IG (information gain) as the attribute evaluator and Ranker as the search method, set the number of features to 5000, and save the result as wb10w_as_true_false_weight.arff
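
Information gain scores a feature by how much knowing it reduces the entropy of the class label, and a ranker then keeps the highest-scoring features. The following plain-Java sketch shows that idea for true/false features; it is not Weka's implementation, and the class name InfoGainSketch, its methods, and the toy data are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class InfoGainSketch {
    // Shannon entropy (in bits) of a class distribution given as counts.
    static double entropy(int[] counts) {
        int total = 0;
        for (int c : counts) total += c;
        if (total == 0) return 0.0;
        double h = 0.0;
        for (int c : counts) {
            if (c > 0) {
                double p = (double) c / total;
                h -= p * Math.log(p) / Math.log(2);
            }
        }
        return h;
    }

    // IG = H(class) - H(class | feature) for one true/false feature.
    // present[i] says whether the word occurs in document i; labels[i] is in [0, numClasses).
    static double infoGain(boolean[] present, int[] labels, int numClasses) {
        int n = labels.length;
        if (n == 0) return 0.0;
        int[] all = new int[numClasses];
        int[] withWord = new int[numClasses];
        int[] withoutWord = new int[numClasses];
        int nWith = 0;
        for (int i = 0; i < n; i++) {
            all[labels[i]]++;
            if (present[i]) {
                withWord[labels[i]]++;
                nWith++;
            } else {
                withoutWord[labels[i]]++;
            }
        }
        double conditional = ((double) nWith / n) * entropy(withWord)
                + ((double) (n - nWith) / n) * entropy(withoutWord);
        return entropy(all) - conditional;
    }

    // Returns the indices of the k features with the highest information gain.
    static List<Integer> rankTopK(boolean[][] features, int[] labels, int numClasses, int k) {
        double[] gains = new double[features.length];
        List<Integer> order = new ArrayList<>();
        for (int f = 0; f < features.length; f++) {
            gains[f] = infoGain(features[f], labels, numClasses);
            order.add(f);
        }
        order.sort((a, b) -> Double.compare(gains[b], gains[a]));
        return new ArrayList<>(order.subList(0, Math.min(k, order.size())));
    }

    public static void main(String[] args) {
        int[] labels = {0, 0, 1, 1};
        boolean[][] features = {
            {true, false, true, false},  // occurs equally in both classes
            {true, true, false, false}   // occurs only in class 0
        };
        System.out.println(rankTopK(features, labels, 2, 1));
    }
}
```

In the step above, k would be 5000 and the candidate features would be the 6,000-odd word columns from step 2.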
4. Run a Bayes classifier with 66% of the data used for training, and copy the results to the result file
A first exercise in text classification with Weka