Machine Learning Path: Predicting News Categories with a Naive Bayes Classifier in Python

Source: Internet
Author: User

Learn to use the naive Bayes classification API in Python 3.

This involves extracting feature vectors from text strings.

Feel free to download the source code from my Git repository: https://github.com/linyi0604/kaggle

```python
from sklearn.datasets import fetch_20newsgroups
# sklearn.cross_validation was removed in scikit-learn 0.20; train_test_split lives in model_selection
from sklearn.model_selection import train_test_split
# Text feature-vector conversion module
from sklearn.feature_extraction.text import CountVectorizer
# Naive Bayes model
from sklearn.naive_bayes import MultinomialNB
# Model evaluation module
from sklearn.metrics import classification_report

'''
The naive Bayes model is widely used for large-scale Internet text classification.
Because it assumes that features are conditionally independent given the class,
the number of parameters to estimate drops from exponential to roughly linear
in the number of features, which saves memory and computation time.
However, the model cannot take relationships between features into account,
so it performs poorly on classification tasks where the features are strongly correlated.
'''

# 1. Load the data
# fetch_20newsgroups downloads the dataset on first use (network access required)
# and caches it locally for later runs
news = fetch_20newsgroups(subset="all")
# Check the data size and details
# print(len(news.data))
# print(news.data[0])
'''
18846

From: Mamatha Devineni Ratnam <[email protected]>
Subject: Pens fans reactions
Organization: Post Office, Carnegie Mellon, Pittsburgh, PA
Lines: 12
NNTP-Posting-Host: po4.andrew.cmu.edu

I am sure some bashers of Pens fans are pretty confused about the lack
of any kind of posts about the recent Pens massacre of the Devils. Actually,
I am bit puzzled too and a bit relieved. However, I am going to put an end
to non-Pittsburghers' relief with a bit of praise for the Pens. Man, they
are killing those Devils worse than I thought. Jagr just showed you why
he is much better than his regular season stats. He is also a lot
fo fun to watch in the playoffs. Bowman should let Jagr have a lot of
fun in the next couple of games since the Pens are going to beat the pulp out of Jersey anyway. I was very disappointed not to see the Islanders lose the final
regular season game.          PENS RULE!!!
'''

# 2. Split the data (75% training, 25% test)
X_train, X_test, y_train, y_test = train_test_split(news.data,
                                                    news.target,
                                                    test_size=0.25,
                                                    random_state=33)

# 3. Predict news categories with the naive Bayes classifier
# Convert the raw text into token-count feature vectors
vec = CountVectorizer()
X_train = vec.fit_transform(X_train)  # learn the vocabulary from the training set only
X_test = vec.transform(X_test)        # reuse that vocabulary on the test set
# Initialize the naive Bayes model
mnb = MultinomialNB()
# Fit on the training set (estimate the model parameters)
mnb.fit(X_train, y_train)
# Predict on the test set and keep the predictions
y_predict = mnb.predict(X_test)

# 4. Evaluate the model
print("Accuracy:", mnb.score(X_test, y_test))
print("Other metrics:\n", classification_report(y_test, y_predict, target_names=news.target_names))
'''
Accuracy: 0.8397707979626485
Other metrics:
                          precision    recall  f1-score   support

             alt.atheism       0.86      0.86      0.86       201
           comp.graphics       0.59      0.86      0.70
 comp.os.ms-windows.misc       0.89      0.10      0.17       248
comp.sys.ibm.pc.hardware       0.60      0.88      0.72
   comp.sys.mac.hardware       0.93      0.78      0.85       242
          comp.windows.x       0.82      0.84      0.83       263
            misc.forsale       0.91      0.70      0.79       257
               rec.autos       0.89      0.89      0.89       238
         rec.motorcycles       0.98      0.92      0.95       276
      rec.sport.baseball       0.98      0.91      0.95       251
        rec.sport.hockey       0.93      0.99      0.96       233
               sci.crypt       0.86      0.98      0.91       238
         sci.electronics       0.85      0.88      0.86       249
                 sci.med       0.92      0.94      0.93       245
               sci.space       0.89      0.96      0.92       221
  soc.religion.christian       0.78      0.96      0.86       232
      talk.politics.guns       0.88      0.96      0.92       251
   talk.politics.mideast       0.90      0.98      0.94       231
      talk.politics.misc       0.79      0.89      0.84       188
      talk.religion.misc       0.93      0.44      0.60       158

             avg / total       0.86      0.84      0.82      4712
'''
```
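The claim that the independence assumption makes the parameter count roughly linear can be seen in a from-scratch sketch of multinomial naive Bayes. Each class c keeps a prior P(c) and one smoothed probability per vocabulary word, P(w|c) = (N_cw + alpha) / (N_c + alpha * |V|). A document's score is log P(c) plus the sum of log P(w|c) over its words, so the model needs |C| * |V| word parameters rather than one per word combination. This is a pure-Python illustration under stated assumptions: the helpers train_multinomial_nb and predict_nb are hypothetical (not scikit-learn APIs), tokenization is plain whitespace splitting instead of CountVectorizer, and the four-document corpus is invented. Add-one smoothing (alpha=1.0) matches MultinomialNB's default.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Hypothetical helper: estimate log P(c) and log P(w|c) with additive smoothing."""
    tokenized = [d.lower().split() for d in docs]  # simplistic whitespace tokenization
    vocab = sorted({w for toks in tokenized for w in toks})
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # class -> word -> occurrences in that class
    for toks, c in zip(tokenized, labels):
        word_counts[c].update(toks)

    log_prior = {c: math.log(n / len(docs)) for c, n in class_counts.items()}
    log_likelihood = {}
    for c in class_counts:
        # Denominator: total words in class c plus alpha for every vocabulary word
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        # One parameter per (class, word) pair, i.e. linear in the vocabulary size
        log_likelihood[c] = {w: math.log((word_counts[c][w] + alpha) / denom) for w in vocab}
    return log_prior, log_likelihood


def predict_nb(doc, log_prior, log_likelihood):
    """Hypothetical helper: return the class with the highest log posterior score."""
    scores = {}
    for c, prior in log_prior.items():
        # Conditional independence turns the joint likelihood into a sum of per-word terms
        score = prior
        for w in doc.lower().split():
            if w in log_likelihood[c]:  # skip words never seen in training
                score += log_likelihood[c][w]
        scores[c] = score
    return max(scores, key=scores.get)


# Invented toy data: two "hockey" and two "computing" documents
docs = ["pens beat devils", "pens win game", "gpu driver crash", "windows driver update"]
labels = ["hockey", "hockey", "computing", "computing"]
log_prior, log_likelihood = train_multinomial_nb(docs, labels)

print(predict_nb("devils game", log_prior, log_likelihood))   # hockey
print(predict_nb("driver crash", log_prior, log_likelihood))  # computing
print(len(log_likelihood["hockey"]))  # 10: one word probability per vocabulary word
```

scikit-learn's MultinomialNB computes the same quantities as matrix operations over the sparse count matrix; after fitting, its class_log_prior_ and feature_log_prob_ attributes hold the log priors and log word probabilities.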
