# -*- coding: utf-8 -*-
import nltk

def wordfeatures(word):
    return {"Cnword": word}

# ... (construction of the samplewords / testwords feature sets is elided here)

classifier = nltk.NaiveBayesClassifier.train(samplewords)

# The category that "University" (大学) belongs to
print u"----The category 'University' belongs to-----"
print classifier.classify({"Cnword": u"大学"})

# The category that "Brain" (大脑) belongs to
print u"----The category 'Brain' belongs to-----"
print classifier.classify({"Cnword": u"大脑"})

# Classification accuracy on the test data
print nltk.classify.accuracy(classifier, testwords)

# The 10 words that are most informative for classification on feature 0
for wf, mostword in classifier.most_informative_features(10):
    print mostword,
print

# To display UTF-8 properly, re-implement the body of show_most_informative_features;
# classifier.show_most_informative_features(10) could be called directly,
# but its output garbles UTF-8 words.
cpdist = classifier._feature_probdist
print('Most Informative Features')
for (fname, fval) in classifier.most_informative_features(10):
    def labelprob(l):
        return cpdist[l, fname].prob(fval)
    labels = sorted([l for l in classifier._labels
                     if fval in cpdist[l, fname].samples()],
                    key=labelprob)
    if len(labels) == 1:
        continue
    l0 = labels[0]
    l1 = labels[-1]
    if cpdist[l0, fname].prob(fval) == 0:
        ratio = 'INF'
    else:
        ratio = '%8.1f' % (cpdist[l1, fname].prob(fval) / cpdist[l0, fname].prob(fval))
    print fname + "=" + fval,
    print('%6s : %-6s = %s : 1.0' % (("%s" % l1)[:6], ("%s" % l0)[:6], ratio))

Running result:

----The category 'University' belongs to-----
Education
----The category 'Brain' belongs to-----
Technology
0.977346278317
World Company Advance Game After Field Adopting Subject Inside Technology
Most Informative Features
Cnword=World        Technology : Education = 20.6 : 1.0
Cnword=Company      Technology : Education = 12.4 : 1.0
Cnword=Advance      Technology : Education =  5.8 : 1.0
Cnword=Game         Technology : Education =  5.8 : 1.0
Cnword=After        Technology : Education =  4.5 : 1.0
Cnword=Field        Technology : Education =  4.5 : 1.0
Cnword=Adopting     Technology : Education =  4.5 : 1.0
Cnword=Subject      Technology : Education =  4.1 : 1.0
Cnword=Inside       Technology : Education =  4.1 : 1.0
Cnword=Technology   Technology : Education =  4.1 : 1.0
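The construction of samplewords and testwords is elided above. As a rough, hypothetical sketch of the shape NLTK's NaiveBayesClassifier.train() expects, namely a list of (featureset, label) pairs, the setup might look like the following; the words and category labels here are placeholders, not the post's actual corpus.

# -*- coding: utf-8 -*-
# Hypothetical setup sketch: the real post derives these pairs from a
# segmented, labelled Chinese corpus; the entries below are placeholders.
import nltk

def wordfeatures(word):
    return {"Cnword": word}

# (word, category) pairs standing in for the labelled corpus
labelled_words = [
    (u"大学", u"Education"), (u"学科", u"Education"), (u"课堂", u"Education"),
    (u"公司", u"Technology"), (u"技术", u"Technology"), (u"产品", u"Technology"),
]
samplewords = [(wordfeatures(w), label) for (w, label) in labelled_words]
testwords = [(wordfeatures(u"校园"), u"Education"),
             (wordfeatures(u"互联网"), u"Technology")]

classifier = nltk.NaiveBayesClassifier.train(samplewords)
print classifier.classify({"Cnword": u"大学"})      # expected: Education
print nltk.classify.accuracy(classifier, testwords)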
All content on this blog is original; if you reproduce it, please credit the source: http://blog.csdn.net/myhaspl/
The naive Bayes classifier categorizes the entries as shown above.
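Behind each classify() call is the usual naive Bayes decision rule: choose the label that maximizes P(label) * P(Cnword=w | label). To see those per-label scores rather than only the winning label, NLTK's prob_classify() exposes them; the snippet below is an assumed inspection step, not part of the original post.

# Assumed inspection step (not in the original post): print the posterior
# probability the classifier assigns to each label for one word.
dist = classifier.prob_classify({"Cnword": u"大学"})
for label in dist.samples():
    print label, dist.prob(label)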
The Road to Mathematics (Machine Learning Practice Guide) - Text Mining with NLP (6)