Learning the naive Bayes classification API with Python 3
Goal: extract feature vectors from text strings and classify news articles
The full source code can be downloaded from my GitHub repository: https://github.com/linyi0604/kaggle
from sklearn.datasets import fetch_20newsgroups
# train_test_split moved from sklearn.cross_validation to sklearn.model_selection in scikit-learn 0.20
from sklearn.model_selection import train_test_split
# Import the text feature vectorization module
from sklearn.feature_extraction.text import CountVectorizer
# Import the naive Bayes model
from sklearn.naive_bayes import MultinomialNB
# Model evaluation module
from sklearn.metrics import classification_report

"""
The naive Bayes model is widely used in large-scale Internet text classification tasks.
Because it assumes the features are conditionally independent, the number of parameters
to estimate drops from exponential to nearly linear in the number of features,
saving memory and training time.
However, the model cannot account for relationships between features, so it performs
less well on classification tasks where the features are correlated.
"""

"""
1 Reading the data
"""
# This API downloads the data set over the network if it is not already cached
news = fetch_20newsgroups(subset="all")
# Check the data size and a sample record
# print(len(news.data))
# print(news.data[0])
"""
18846

From: Mamatha Devineni Ratnam <[email protected]>
Subject: Pens fans reactions
Organization: Post Office, Carnegie Mellon, Pittsburgh, PA
Lines: 12
NNTP-Posting-Host: po4.andrew.cmu.edu

I am sure some bashers of Pens fans are pretty confused about the lack
of any kind of posts about the recent Pens massacre of the Devils. Actually,
I am bit puzzled too and a bit relieved. However, I am going to put an end
to non-Pittsburghers' relief with a bit of praise for the Pens. Man, they
are killing those Devils worse than I thought. Jagr just showed you why
he is much better than his regular season stats. He is also a lot
fo fun to watch in the playoffs. Bowman should let Jagr have a lot of
fun in the next couple of games since the Pens are going to beat the pulp out of
Jersey anyway. I was very disappointed not to see the Islanders lose the final
regular season game.
PENS RULE!!!
"""

"""
2 Splitting the data
"""
x_train, x_test, y_train, y_test = train_test_split(news.data,
                                                    news.target,
                                                    test_size=0.25,
                                                    random_state=33)

"""
3 The naive Bayes classifier predicts news categories
"""
# Convert the text into feature vectors
vec = CountVectorizer()
x_train = vec.fit_transform(x_train)
x_test = vec.transform(x_test)
# Initialize the naive Bayes model
mnb = MultinomialNB()
# Fit the training set to estimate the model parameters
mnb.fit(x_train, y_train)
# Predict on the test set and save the predictions
y_predict = mnb.predict(x_test)

"""
4 Model evaluation
"""
print("Accuracy:", mnb.score(x_test, y_test))
print("Other metrics:\n", classification_report(y_test, y_predict, target_names=news.target_names))

"""
Accuracy: 0.8397707979626485
Other metrics:
                           precision    recall  f1-score   support

             alt.atheism       0.86      0.86      0.86       201
           comp.graphics       0.59      0.86      0.70
 comp.os.ms-windows.misc       0.89      0.10      0.17       248
comp.sys.ibm.pc.hardware       0.60      0.88      0.72
   comp.sys.mac.hardware       0.93      0.78      0.85       242
          comp.windows.x       0.82      0.84      0.83       263
            misc.forsale       0.91      0.70      0.79       257
               rec.autos       0.89      0.89      0.89       238
         rec.motorcycles       0.98      0.92      0.95       276
      rec.sport.baseball       0.98      0.91      0.95       251
        rec.sport.hockey       0.93      0.99      0.96       233
               sci.crypt       0.86      0.98      0.91       238
         sci.electronics       0.85      0.88      0.86       249
                 sci.med       0.92      0.94      0.93       245
               sci.space       0.89      0.96      0.92       221
  soc.religion.christian       0.78      0.96      0.86       232
      talk.politics.guns       0.88      0.96      0.92       251
   talk.politics.mideast       0.90      0.98      0.94       231
      talk.politics.misc       0.79      0.89      0.84       188
      talk.religion.misc       0.93      0.44      0.60       158

             avg / total       0.86      0.84      0.82      4712
"""
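Each row of the report above is computed per class from the confusion counts (true positives, false positives, false negatives). As a reminder of how those columns are derived, here is a minimal sketch using invented toy labels (not the news data):

```python
from sklearn.metrics import precision_score, recall_score, f1_score

# Toy binary labels, invented for illustration only
y_true    = [1, 1, 1, 0, 0, 0]
y_predict = [1, 1, 0, 1, 0, 0]
# Confusion counts for the positive class: TP = 2, FP = 1, FN = 1

p = precision_score(y_true, y_predict)   # TP / (TP + FP) = 2 / 3
r = recall_score(y_true, y_predict)      # TP / (TP + FN) = 2 / 3
f = f1_score(y_true, y_predict)          # harmonic mean of precision and recall
print(p, r, f)
```

The "support" column is simply the number of true samples of each class in the test set, which is why it sums to the test-set size (0.25 × 18846 ≈ 4712).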
Machine learning path: predicting news categories with a Python naive Bayes classifier