Four pickle files have been generated: documents, word_features, originalnaivebayes5k, and featuresets.
Of these, featuresets is by far the largest, at more than 300 MB. If the 5,000-word feature set is expanded further, the file size keeps growing, and accuracy improves as well.
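For reference, a minimal sketch of how these four files would have been written with pickle in the previous step; the variable names documents, word_features, featuresets, and classifier are assumed from that earlier training script:

import pickle

def save_pickle(obj, filename):
    # write one object to its own pickle file
    with open(filename, "wb") as f:
        pickle.dump(obj, f)

save_pickle(documents, "documents.pickle")
save_pickle(word_features, "word_features5k.pickle")
save_pickle(featuresets, "featuresets.pickle")          # the largest file
save_pickle(classifier, "originalnaivebayes5k.pickle")  # trained NaiveBayesClassifier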
https://www.pythonprogramming.net/sentiment-analysis-module-nltk-tutorial/
Creating a module for sentiment analysis with NLTK
# -*- coding: utf-8 -*-
"""
Created on Sat Jan 09:59:09 2017
@author: daxiong
"""
# File: sentiment_mod.py
import nltk
import random
import pickle
from nltk.tokenize import word_tokenize

# load the saved document list
documents_f = open("documents.pickle", "rb")
documents = pickle.load(documents_f)
documents_f.close()

# load the 5,000 most common words used as features
word_features5k_f = open("word_features5k.pickle", "rb")
word_features = pickle.load(word_features5k_f)
word_features5k_f.close()

def find_features(document):
    # mark which of the known feature words appear in the document
    words = word_tokenize(document)
    features = {}
    for w in word_features:
        features[w] = (w in words)
    return features

# load the precomputed feature sets
featuresets_f = open("featuresets.pickle", "rb")
featuresets = pickle.load(featuresets_f)
featuresets_f.close()

random.shuffle(featuresets)
print(len(featuresets))

testing_set = featuresets[10000:]
training_set = featuresets[:10000]

# load the trained Naive Bayes classifier
open_file = open("originalnaivebayes5k.pickle", "rb")
classifier = pickle.load(open_file)
open_file.close()

def sentiment(text):
    # classify a piece of text as positive or negative
    feats = find_features(text)
    return classifier.classify(feats)

print(sentiment("This movie is awesome! The acting was great, plot was wonderful, and there were pythons...so yea!"))
print(sentiment("This movie is utter junk. There were absolutely 0 pythons. I don't see how the point is at all. Horrible movie, 0/10"))
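Since the goal is a reusable module, the script above (saved as sentiment_mod.py next to the pickle files) can then be imported from another program. A minimal usage sketch, with the example sentences invented for illustration:

# usage_example.py - assumes sentiment_mod.py and the pickle files are in the working directory
import sentiment_mod as s

print(s.sentiment("This movie was great, the plot kept me hooked the whole time."))
print(s.sentiment("Boring, predictable, and far too long."))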
Test results
The classifier is reasonably accurate overall, but the positive movie-review test case is still misclassified. It looks like the algorithm needs improvement; consider using word-frequency analysis and filtering out junk (stop) words to raise the accuracy.
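One possible way to do that filtering, sketched under the assumption that all_words (a flat list of lowercased tokens from the training corpus) is available from the earlier training script:

import nltk
from nltk.corpus import stopwords

# drop English stop words and non-alphabetic tokens before ranking by frequency
stop_words = set(stopwords.words("english"))
filtered_words = [w for w in all_words if w.isalpha() and w not in stop_words]

# keep only the 5,000 most frequent remaining words as features
all_words_freq = nltk.FreqDist(filtered_words)
word_features = [w for (w, count) in all_words_freq.most_common(5000)]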
Nltk31_twitter sentiment analysis