The ideas and code here are derived from the following two articles:
A naively fat man: Sentiment polarity analysis of text with Python
Ran Fengzheng Blog: Code for text sentiment polarity analysis based on a sentiment dictionary
Sentiment analysis based on a sentiment dictionary is probably the simplest approach to sentiment analysis. Roughly, the idea is as follows:
Segment the document into words and find the sentiment words, negation words, and degree adverbs in it. For each sentiment word, check whether negation words or degree adverbs precede it, and group them with that sentiment word. If the group contains a negation word, multiply the sentiment word's weight by -1; if it contains a degree adverb, multiply by that adverb's degree value. Finally, add up the scores of all groups: a total greater than 0 is classified as positive, a total less than 0 as negative.
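The grouping idea can be sketched on an already segmented token list. This is only an illustration: the tiny lexicons, the English tokens, and the "neutral" label for a total of exactly 0 are assumptions, not part of the referenced articles.

```python
def score_tokens(tokens, sentiment, negations, degrees):
    """Sum the group scores; each group ends at a sentiment word."""
    total = 0.0
    weight = 1.0  # weight of the group currently being built
    for token in tokens:
        if token in negations:
            weight *= -1              # a negation word flips the sign
        elif token in degrees:
            weight *= degrees[token]  # a degree adverb scales the weight
        elif token in sentiment:
            total += weight * sentiment[token]
            weight = 1.0              # the next group starts fresh
    return total


def polarity(score):
    """Map a total score to a label (the 'neutral' case is an assumption)."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Hypothetical toy lexicons, for illustration only
sentiment = {"good": 1.5, "bad": -1.2}
negations = {"not"}
degrees = {"very": 2.0}

total = score_tokens(["not", "very", "good"], sentiment, negations, degrees)
print(total, polarity(total))
```

Here the single group is "not very good", so the weight is -1 * 2.0 and the sentiment value 1.5 ends up negative.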
Preparation:
1. BosonNLP sentiment dictionary
Since the analysis is dictionary-based, it needs a dictionary that contains all the sentiment words. One is already available online and can be downloaded directly:
https://bosonnlp.com/dev/resource
Here are a few positive entries copied from the downloaded file. The number after each word is its sentiment value; in general, positive words have positive values and negative words have negative values:
Colorful 1.87317228434
1.87321290817
subtle 1.87336937803
178.00 1.87338705728
hardworking 1.87338705728
Bulgaria 1.87338705728
Note: the BosonNLP sentiment dictionary was built from data sources such as microblogs, news, and forums, so it may not work well on other kinds of text.
Another approach is to set the sentiment value of every sentiment word to 1 for the calculation. For more on this, see the article:
Text Sentiment Classification (1): Traditional Models
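A minimal sketch of reading such a dictionary, assuming each line holds a word and a score separated by whitespace. The function name, the `unit_values` flag, and the "gloomy" entry are hypothetical. Mapping negative entries to -1 is one reading of "set every value to 1"; the text does not spell this out.

```python
def parse_sentiment_lines(lines, unit_values=False):
    """Turn 'word score' lines into a {word: float} dictionary."""
    scores = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank lines and lines without a word or a score
        word, raw = parts
        try:
            value = float(raw)
        except ValueError:
            continue  # skip lines whose score is not a number
        if unit_values:
            # Assumed reading: keep only the sign of the score
            if value > 0:
                value = 1.0
            elif value < 0:
                value = -1.0
        scores[word] = value
    return scores


sample = [
    "hardworking 1.87338705728\n",
    "subtle 1.87336937803\n",
    "\n",               # blank line
    "1.87321290817\n",  # score without a word
    "gloomy -1.5\n",    # hypothetical negative entry
]
print(parse_sentiment_lines(sample))
print(parse_sentiment_lines(sample, unit_values=True))
```

In practice the lines would come from the downloaded file, for example by iterating over it after opening it with `encoding='utf-8'`.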
2. Negation word dictionary
The article Text Sentiment Classification (1): Traditional Models provides a download package of sentiment polarity dictionaries, which includes a txt file of negation words:
no
not
never
stop
abort
lack
need not
have no
...
3. Degree adverb dictionary
Degree adverbs are words such as: very, extremely, especially, and so on.
The original blogger provides a download link for the HowNet sentiment analysis word set (beta version). That dictionary was said to contain degree adverbs with corresponding degree values, but the downloaded file turned out to contain only the degree adverbs, with no degree values.
Here is part of the degree-level words txt file. It lists only the degree words, without values, so you have to assign the values yourself as you see fit:
Chinese degree-level words 219
1. "extremely | most"
fully
utterly
very
extremely
...
After editing, the format is as follows: each line holds a degree adverb and its degree value, separated by a comma. You can define the degree values yourself:
100%,2
doubly,2
unbearable,2
fully,2
downright,2
...
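The comma-separated format above can be read into a dictionary along these lines. This is a sketch: the function name is hypothetical, and silently skipping malformed lines is an assumption about how you want to treat them.

```python
def parse_degree_lines(lines):
    """Turn 'adverb,value' lines into a {adverb: float} dictionary."""
    degrees = {}
    for line in lines:
        # Split on the last comma so the value is always the final field
        word, sep, raw = line.strip().rpartition(",")
        word = word.strip()
        if not sep or not word:
            continue  # blank line, or no adverb before the comma
        try:
            degrees[word] = float(raw)
        except ValueError:
            continue  # missing or non-numeric degree value
    return degrees


sample = ["100%,2\n", "doubly, 2\n", "\n", ",2\n", "downright,\n"]
print(parse_degree_lines(sample))
```

Converting the value to `float` at load time means the scoring code does not have to convert it on every use.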
4. Stop word dictionary
The Datatang download page never opened, so I could not get the Chinese stop word list from the resources the original blogger points to. I used the stop word dictionary from the SnowNLP source instead, but later found that it treats some sentiment words as stop words.
Datatang stop word download: http://www.datatang.com/data/43894
SnowNLP source: https://github.com/isnowfy/snownlp (the stop words are in stopwords.txt under the snownlp/normal folder)
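One possible way around the problem of sentiment words being listed as stop words is to remove every dictionary word from the stop word set before filtering. This is a sketch under that assumption: the function name and the toy data are hypothetical, and the source articles do not describe this step.

```python
def prune_stopwords(stopwords, *lexicons):
    """Return the stop words that are not in any of the given lexicons."""
    protected = set()
    for lexicon in lexicons:
        protected.update(lexicon)  # for a dict this adds its keys
    return set(stopwords) - protected


# Hypothetical toy data, for illustration only
stopwords = {"I", "today", "also", "happy", "very"}
sentiment = {"happy": 2.6}
degrees = {"very": 1.75}
negations = ["not"]

print(sorted(prune_stopwords(stopwords, sentiment, degrees, negations)))
```

With the pruned set, the stop word filter no longer drops words that the scoring step needs.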
5. Word segmentation tool
Since this is Python, I chose jieba for word segmentation.
With the data and tools ready, the sentiment analysis can begin.
Take a simple sentence: I'm very happy and happy today. (In the Chinese original the two degree adverbs are different words, as are the two sentiment words, which is why they carry different values below.)
(1) Segment the sentence and remove the stop words
"I", "today", and "also" are removed as stop words, leaving: very, happy, very, happy
import codecs
import jieba

def seg_word(sentence):
    """Segment the document with jieba and remove the stop words."""
    seg_list = jieba.cut(sentence)
    seg_result = []
    for w in seg_list:
        seg_result.append(w)
    # Read the stop word file
    stopwords = set()
    fr = codecs.open('stopwords.txt', 'r', 'utf-8')
    for word in fr:
        stopwords.add(word.strip())
    fr.close()
    # Remove the stop words
    return list(filter(lambda x: x not in stopwords, seg_result))
(2) Convert the segmentation result into a dictionary whose keys are the words and whose values are the words' indexes in the result. This raises a problem: with words as keys, if a sentiment word occurs several times in the text, only the position of its last occurrence is recorded, and the earlier ones are overwritten.
Converting the result of the previous step into a dictionary gives:
{'Very': 0, 'happy': 1, 'very': 2, 'Happy': 3}
def list_to_dict(word_list):
    """Convert the segmented word list into a dictionary. Each key is a word and each value
    is the word's index in the list, which corresponds to its position in the document."""
    data = {}
    for x in range(0, len(word_list)):
        data[word_list[x]] = x
    return data
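The overwriting problem mentioned in step (2) is easy to reproduce. One possible alternative, shown here only as a sketch with a hypothetical function name, is to keep a list of positions per word so that repeated words are not lost.

```python
from collections import defaultdict


def list_to_positions(word_list):
    """Map each word to the list of all indexes where it occurs."""
    positions = defaultdict(list)
    for index, word in enumerate(word_list):
        positions[word].append(index)
    return dict(positions)


words = ["very", "happy", "very", "happy"]

# A word-keyed dictionary keeps only the last index of each repeated word
last_only = {word: index for index, word in enumerate(words)}
print(last_only)                 # {'very': 2, 'happy': 3}

# Keeping a list per word preserves every occurrence
print(list_to_positions(words))  # {'very': [0, 2], 'happy': [1, 3]}
```

The classification step would then need to loop over every index of a word rather than a single one.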
(3) Classify the segmentation result to find the sentiment words, negation words, and degree adverbs
Sentiment words sen_word (the two "happy" words; the key is the word's index and the value is its sentiment weight):
{1: '1.48950851679', 3: '2.61234173173'}
Degree adverbs degree_word (the two "very" words; the key is the index and the value is the degree value):
{0: '1.75', 2: '2'}
Negation words not_word. No negation words occur, so this dictionary is empty:
{}
def classify_words(word_dict):
    """Classify the words: find the sentiment words, negation words, and degree adverbs."""
    # Read the sentiment dictionary file; key is the sentiment word, value is its score
    sen_dict = {}
    with open('BosonNLP_sentiment_score.txt', 'r', encoding='utf-8') as sen_file:
        for s in sen_file:
            # Each line is "word score" separated by a space: index 0 is the word, index 1 the score.
            # The file contains a blank line, so blank or malformed lines are skipped.
            parts = s.split()
            if len(parts) == 2:
                sen_dict[parts[0]] = parts[1]
    # Read the negation word file; negation words have no score, so a list is enough
    with open('NotDic.txt', 'r', encoding='utf-8') as not_word_file:
        # Strip the trailing newline, otherwise the membership tests below never match
        not_word_list = [line.strip() for line in not_word_file if line.strip()]
    # Read the degree adverb file; key is the degree adverb, value is its degree value
    degree_dic = {}
    with open('Degree.txt', 'r', encoding='utf-8') as degree_file:
        for d in degree_file:
            parts = d.strip().split(',')
            if len(parts) >= 2 and parts[0]:
                degree_dic[parts[0]] = parts[1]
    # Classification results: word index as key, score as value; negation words score -1
    sen_word = dict()
    not_word = dict()
    degree_word = dict()
    for word in word_dict.keys():
        if word in sen_dict and word not in not_word_list and word not in degree_dic:
            # The word is in the sentiment dictionary
            sen_word[word_dict[word]] = sen_dict[word]
        elif word in not_word_list and word not in degree_dic:
            # The word is in the negation word list
            not_word[word_dict[word]] = -1
        elif word in degree_dic:
            # The word is a degree adverb
            degree_word[word_dict[word]] = degree_dic[word]
    # Return the classification results
    return sen_word, not_word, degree_word
(4) Calculate the score
First set the initial weight W to 1. Starting from the first sentiment word, add W * the sentiment value of that word to the score (recorded in score). Then check whether there are degree adverbs or negation words between this sentiment word and the next one: for a negation word, W = W * -1; for a degree adverb, W = W * the adverb's degree value. The resulting W is the weight used for the next sentiment word. Repeat until all sentiment words have been traversed; the sum of the scores from each step is the document's sentiment score.
def socre_sentiment(sen_word, not_word, degree_word, seg_result):
    """Calculate the score."""
    # Initialize the weight to 1
    W = 1
    score = 0
    # Position of the current sentiment word within sentiment_index_list
    sentiment_index = -1
    # Indexes of the sentiment words, in document order
    sentiment_index_list = sorted(sen_word.keys())
    # Traverse the segmentation result to locate the degree adverbs and negation words between two sentiment words
    for i in range(0, len(seg_result)):
        # If it is a sentiment word (its index is in the sentiment word classification result)
        if i in sen_word.keys():
            # Weight * sentiment word score
            score += W * float(sen_word[i])
            # Advance to this sentiment word's position in the index list
            sentiment_index += 1
            if sentiment_index < len(sentiment_index_list) - 1:
                # Check for degree adverbs or negation words between the current and the next sentiment word
                for j in range(sentiment_index_list[sentiment_index], sentiment_index_list[sentiment_index + 1]):
                    if j in not_word.keys():
                        # Negation word: flip the sign of the weight
                        W *= -1
                    elif j in degree_word.keys():
                        # Degree adverb: multiply the weight by its degree value
                        W *= float(degree_word[j])
    return score
W = 1
score = 0
The first sentiment word is "happy", with a sentiment weight of 1.48950851679, so score = W * weight = 1 * 1.48950851679 = 1.48950851679
Between this "happy" and the next sentiment word there is the degree adverb "very", with a degree value of 2, so W = W * 2 = 1 * 2 = 2; then move on to the next sentiment word
The next sentiment word is the second "happy". Now W = 2, so score = score + 2 * 2.61234173173 = 1.48950851679 + 2 * 2.61234173173 = 6.71419198025
End of traversal
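The walk-through above can be replayed directly on the three index dictionaries from step (3). This is a compact sketch of the same accumulation rule; the function name is hypothetical, and it skips the file loading so it runs on its own.

```python
def accumulate_score(sen_word, not_word, degree_word):
    """Score with a weight that carries over from one sentiment word to the next."""
    weight = 1.0
    score = 0.0
    positions = sorted(sen_word)  # sentiment word indexes in document order
    for k, pos in enumerate(positions):
        score += weight * float(sen_word[pos])
        if k + 1 < len(positions):
            # Modifiers between this sentiment word and the next one
            for j in range(pos, positions[k + 1]):
                if j in not_word:
                    weight *= -1
                elif j in degree_word:
                    weight *= float(degree_word[j])
    return score


# The classification results shown in step (3)
sen_word = {1: '1.48950851679', 3: '2.61234173173'}
degree_word = {0: '1.75', 2: '2'}
not_word = {}

# 1 * 1.48950851679 + 2 * 2.61234173173, about 6.714
print(accumulate_score(sen_word, not_word, degree_word))
```

Note that the degree adverb at index 0 never enters the result, because the scan only starts at the first sentiment word.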
Two problems also show up here:
(1) Degree adverbs and negation words that appear before the first sentiment word are ignored
(2) When checking for negation words and degree adverbs between two sentiment words, W is not reset to 1, so W keeps being multiplied cumulatively
If you are interested, feel free to fix these.
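One possible modification that addresses both problems is sketched below. It is not the original author's code: the function name is hypothetical, and attaching each modifier to the next sentiment word that follows it is an assumption about the intended grouping.

```python
def grouped_score(sen_word, not_word, degree_word):
    """Score each sentiment word with only the modifiers that precede it."""
    score = 0.0
    previous = -1  # index of the previous sentiment word
    for pos in sorted(sen_word):
        weight = 1.0  # reset for every group (problem 2)
        # Scan from just after the previous sentiment word, so modifiers
        # before the first sentiment word are included (problem 1)
        for j in range(previous + 1, pos):
            if j in not_word:
                weight *= -1
            elif j in degree_word:
                weight *= float(degree_word[j])
        score += weight * float(sen_word[pos])
        previous = pos
    return score


sen_word = {1: '1.48950851679', 3: '2.61234173173'}
degree_word = {0: '1.75', 2: '2'}
not_word = {}

# 1.75 * 1.48950851679 + 2 * 2.61234173173
print(grouped_score(sen_word, not_word, degree_word))
```

On the example sentence this gives a higher score than the original calculation, because the first degree adverb now counts toward the first sentiment word.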
Complete code:
import codecs
import jieba


def seg_word(sentence):
    """Segment the document with jieba and remove the stop words."""
    seg_list = jieba.cut(sentence)
    seg_result = []
    for w in seg_list:
        seg_result.append(w)
    # Read the stop word file
    stopwords = set()
    fr = codecs.open('stopwords.txt', 'r', 'utf-8')
    for word in fr:
        stopwords.add(word.strip())
    fr.close()
    # Remove the stop words
    return list(filter(lambda x: x not in stopwords, seg_result))


def classify_words(word_dict):
    """Classify the words: find the sentiment words, negation words, and degree adverbs."""
    # Read the sentiment dictionary file; key is the sentiment word, value is its score
    sen_dict = {}
    with open('BosonNLP_sentiment_score.txt', 'r', encoding='utf-8') as sen_file:
        for s in sen_file:
            # Each line is "word score" separated by a space; skip blank or malformed lines
            parts = s.split()
            if len(parts) == 2:
                sen_dict[parts[0]] = parts[1]
    # Read the negation word file; negation words have no score, so a list is enough
    with open('NotDic.txt', 'r', encoding='utf-8') as not_word_file:
        # Strip the trailing newline, otherwise the membership tests below never match
        not_word_list = [line.strip() for line in not_word_file if line.strip()]
    # Read the degree adverb file; key is the degree adverb, value is its degree value
    degree_dic = {}
    with open('Degree.txt', 'r', encoding='utf-8') as degree_file:
        for d in degree_file:
            parts = d.strip().split(',')
            if len(parts) >= 2 and parts[0]:
                degree_dic[parts[0]] = parts[1]
    # Classification results: word index as key, score as value; negation words score -1
    sen_word = dict()
    not_word = dict()
    degree_word = dict()
    for word in word_dict.keys():
        if word in sen_dict and word not in not_word_list and word not in degree_dic:
            # The word is in the sentiment dictionary
            sen_word[word_dict[word]] = sen_dict[word]
        elif word in not_word_list and word not in degree_dic:
            # The word is in the negation word list
            not_word[word_dict[word]] = -1
        elif word in degree_dic:
            # The word is a degree adverb
            degree_word[word_dict[word]] = degree_dic[word]
    # Return the classification results
    return sen_word, not_word, degree_word


def list_to_dict(word_list):
    """Convert the segmented word list into a dictionary. Each key is a word and each value
    is the word's index in the list, which corresponds to its position in the document."""
    data = {}
    for x in range(0, len(word_list)):
        data[word_list[x]] = x
    return data


def get_init_weight(sen_word, not_word, degree_word):
    # Weight contributed by the words before the first sentiment word.
    # Note: socre_sentiment below does not call this helper; it starts from W = 1.
    W = 1
    # Indexes of the sentiment words, in document order
    sen_word_index_list = sorted(sen_word.keys())
    if len(sen_word_index_list) == 0:
        return W
    # Scan every word from index 0 up to the first sentiment word for degree adverbs and negation words
    for i in range(0, sen_word_index_list[0]):
        if i in not_word.keys():
            W *= -1
        elif i in degree_word.keys():
            # Degree adverb: multiply the weight by its degree value
            W *= float(degree_word[i])
    return W


def socre_sentiment(sen_word, not_word, degree_word, seg_result):
    """Calculate the score."""
    # Initialize the weight to 1
    W = 1
    score = 0
    # Position of the current sentiment word within sentiment_index_list
    sentiment_index = -1
    # Indexes of the sentiment words, in document order
    sentiment_index_list = sorted(sen_word.keys())
    # Traverse the segmentation result to locate the degree adverbs and negation words between two sentiment words
    for i in range(0, len(seg_result)):
        # If it is a sentiment word (its index is in the sentiment word classification result)
        if i in sen_word.keys():
            # Weight * sentiment word score
            score += W * float(sen_word[i])
            # Advance to this sentiment word's position in the index list
            sentiment_index += 1
            if sentiment_index < len(sentiment_index_list) - 1:
                # Check for degree adverbs or negation words between the current and the next sentiment word
                for j in range(sentiment_index_list[sentiment_index], sentiment_index_list[sentiment_index + 1]):
                    if j in not_word.keys():
                        # Negation word: flip the sign of the weight
                        W *= -1
                    elif j in degree_word.keys():
                        # Degree adverb: multiply the weight by its degree value
                        W *= float(degree_word[j])
    return score


# Compute the score
def setiment_score(sentence):
    # 1. Segment the document
    seg_list = seg_word(sentence)
    # 2. Convert the segmented word list to a dictionary, then find the sentiment words, negation words, and degree adverbs
    sen_word, not_word, degree_word = classify_words(list_to_dict(seg_list))
    # 3. Calculate the score
    score = socre_sentiment(sen_word, not_word, degree_word, seg_list)
    return score


# Test
print(setiment_score("I'm very happy and happy today"))