Text Mining with R, Part 5

Source: Internet
Author: User

Part 5: Sentiment Analysis

This is the last article in this series. In fact, every part of text mining is worth digging into and studying carefully on its own. I am still at an early stage, using ready-made algorithms in R to meet my own needs and drawing on the insights of many people online, so I want to summarize what I have learned and share it with everyone. I also hope to learn from what others share in return.

I read some articles online about sentiment analysis of Chinese text and compared them with my own method, and I realized my approach is really quite simple and direct. The following paper introduces sentiment orientation analysis of Chinese text: Http://wenku.baidu.com/link?url=TVf5LgNS6esnunpgubvM14z24m0f4lTyD483gw_HENP2RYEL6XZANSLZ8OCCZCFLWKLQD0PDBHVUCV4-0LOTDGP3HL-KQETTWJ3L91HFTA3. It describes three main approaches to sentiment analysis: first, building a sentiment-orientation dictionary from an existing electronic dictionary or lexical knowledge base; second, unsupervised machine learning; third, machine learning based on a manually labeled corpus.

I will not explain these three methods in detail, but they share one feature: all of them need a corpus of sentiment-bearing words. My implementation in R is similar to the first method. I compiled a positive (commendatory) word list and a negative (derogatory) word list; such lists are readily available online and only need a little tidying up. Each text is segmented into words, and the sentiment words among them are picked out. Every text starts with a sentiment score of 0; each word found in the positive list adds 1, and each word found in the negative list subtracts 1. A final score above 0 marks a positive review, and a score below 0 marks a negative one.

This method can basically judge sentiment orientation, but it leaves room for improvement. As the paper above suggests, words could be weighted by the strength of the sentiment they express instead of scoring only +1 or -1. Some words also carry different sentiment in different contexts, such as the word "pride" discussed in the paper; I think these need a separate list of special cases. Negations and rhetorical questions are another problem: "Not liking it is impossible!" comes out as a negative review under my scoring, and the rhetorical question "How is this cheap?" comes out as positive. I put "cheap" in the positive list. In "cheap and affordable" it is clearly positive, but "cheap goods are never good" would also be scored as positive, which is wrong. This is the second problem again: the same word can carry different sentiment in different contexts.
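The scoring rule described above, together with the two refinements it suggests (strength weights instead of a flat +1/-1, and flipping the sign after a negation word), can be sketched in R as follows. This is a minimal illustration, not the method used in this article: the function name `score_tokens`, the weight values, and the negation list are hypothetical, and the tokens are English placeholders rather than real segmenter output.

```r
# Hypothetical scorer: `weights` is a named numeric vector (word -> strength).
# With every weight set to +1 or -1 and no negation words, this reduces to the
# plain count-based score described above.
score_tokens <- function(tokens, weights, negations = character(0)) {
  score <- 0                                # every text starts at 0
  for (i in seq_along(tokens)) {
    w <- tokens[i]
    if (!(w %in% names(weights))) next      # ignore words outside the lexicon
    value <- weights[[w]]
    # Naive negation handling: flip the sign if the previous token is a negation word
    if (i > 1 && tokens[i - 1] %in% negations) value <- -value
    score <- score + value
  }
  score
}

# Illustrative weights only; real strengths would have to be assigned by hand
weights <- c(like = 1, love = 2, cheap = 1, faded = -1, terrible = -2)
score_tokens(c("love", "it", "but", "faded"), weights)                   # 2 - 1 = 1
score_tokens(c("do", "not", "like", "it"), weights, negations = "not")   # -1
```

The one-token negation window is a deliberate simplification: it still cannot handle double negatives or rhetorical questions like the examples above, which need phrase- or sentence-level rules.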

The implementation process in R:

1. Data input and processing

The data again comes from a brand's official Weibo account: I took 1,376 comments from its posts and read them into R together with a positive lexicon and a negative lexicon. The lexicons come from http://www.datatang.com/data/44317/. They may not be complete and need to be enriched: looking at clothing-related text, I found that words such as "faded", "involute", "show thin" and "fat" are missing, so these have to be added by hand.

> hlzj.comment <- readLines("Hlzj_commenttest.txt")
> negative <- readLines("D:\\r\\rworkspace\\hlzjworkfiles\\negative.txt")
> positive <- readLines("D:\\r\\rworkspace\\hlzjworkfiles\\positive.txt")
> length(hlzj.comment)
[1] 1376
> length(negative)
[1] 4477
> length(positive)
[1] 5588
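The missing domain words mentioned above can be appended to the lexicons before scoring. A minimal sketch, assuming the lexicons are plain character vectors like `positive` and `negative`; the helper `add_words`, the toy lists, and the polarity assigned to each new word are hypothetical (English placeholders for the Chinese terms).

```r
# Hypothetical helper: append new words to a lexicon, trimming stray
# leading/trailing whitespace and dropping duplicates.
add_words <- function(lexicon, words) {
  unique(trimws(c(lexicon, words)))
}

# Toy placeholder lexicons; in practice pass the `positive` and `negative`
# vectors read with readLines() above.
pos_demo <- c("good", "cheap ", "comfortable")
neg_demo <- c("bad", "torn")

# Clothing-related words found missing; the polarity here is a hand-made guess
pos_demo <- add_words(pos_demo, "show thin")
neg_demo <- add_words(neg_demo, c("faded", "involute", "fat"))

# A word present in both lists would be scored inconsistently, so check for overlaps
conflicts <- intersect(pos_demo, neg_demo)
length(conflicts)  # 0
```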

2. Word segmentation and scoring of comments

The segmentation step is similar to the one described in Part 2. I then wrote a function, getemotionaltype(), that compares the segmented words against the positive and negative lexicons and computes a score for each comment.

> commenttemp <- gsub("[0-9０１２３４５６７８９ < > ~]", "", hlzj.comment)  # strip ASCII and full-width digits, spaces, <, > and ~
> commenttemp <- segmentCN(commenttemp)  # segmentCN() from the Rwordseg package: one vector of words per comment
> commenttemp[1:2]
[[1]]
[1] "Congratulations" "everyone" "and" "no" "find" "I'm"

[[2]]
[1] "no" "private messages to" "i small i" "give" "drain"

> emotionrank <- getemotionaltype(commenttemp, positive, negative)
[1] 0.073
[1] 0.145
[1] 0.218
[1] 0.291
[1] 0.363
[1] 0.436
[1] 0.509
[1] 0.581
[1] 0.654
[1] 0.727
[1] 0.799
[1] 0.872
[1] 0.945
> emotionrank[1:10]
[1] 1 0 2 1 1 2 3 1 0 0
> commentemotionalrank <- list(rank = emotionrank, comment = hlzj.comment)
> commentemotionalrank <- as.data.frame(commentemotionalrank)
> fix(commentemotionalrank)  # opens the data editor to browse the scored comments

The getemotionaltype() function used above is defined as follows:

getemotionaltype <- function(x, pwords, nwords) {
    # x: list of segmented comments; pwords/nwords: positive and negative lexicons
    xlen <- length(x)
    emotiontype <- numeric(xlen)  # every comment starts with a score of 0
    for (index in seq_len(xlen)) {
        for (word in x[[index]]) {
            if (word %in% pwords) {
                emotiontype[index] <- emotiontype[index] + 1  # positive word: +1
            } else if (word %in% nwords) {
                emotiontype[index] <- emotiontype[index] - 1  # negative word: -1
            }
        }
        # report progress every 100 comments
        if (index %% 100 == 0) {
            print(round(index / xlen, 3))
        }
    }
    emotiontype
}
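The same scoring loop can also be written with `vapply()` and `%in%`, which avoids the explicit index bookkeeping. A sketch under the same rule (a word found in both lists counts as positive, as in the `if`/`else if` above); the names `score_comments` and `label_scores` and the toy data are hypothetical. The second function turns scores into labels following the rule stated earlier (above 0 positive, below 0 negative); treating exactly 0 as "neutral" is an assumption.

```r
score_comments <- function(x, pwords, nwords) {
  vapply(x, function(words) {
    pos <- words %in% pwords
    neg <- words %in% nwords & !pos   # the positive list wins, as in the if/else above
    sum(pos) - sum(neg)
  }, numeric(1), USE.NAMES = FALSE)
}

label_scores <- function(scores) {
  ifelse(scores > 0, "positive", ifelse(scores < 0, "negative", "neutral"))
}

tokens <- list(c("good", "cheap"), c("torn", "bad", "good"), c("ok"))
scores <- score_comments(tokens, pwords = c("good", "cheap"), nwords = c("bad", "torn"))
scores                 # 2 -1 0
label_scores(scores)   # "positive" "negative" "neutral"
```

On real data, `table(label_scores(emotionrank))` would give a quick count of positive, negative and neutral comments.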

Looking at the results, the first figure looks quite reasonable. The second figure shows comments posted when the clothes tore during the HLZJ-sponsored RM. I am not trying to bash the brand; I just wanted an example of negative reviews to illustrate the effect, and the result is not very satisfying. Rhetorical questions cannot be identified, and some colloquial expressions such as "drunk" and "too times" are not in the sentiment lexicon, so the sentiment orientation of these comments is not recognized well.
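One way to find colloquial words like these that are missing from the lexicons is to count the most frequent segmented words that appear in neither list and review them by hand. A minimal sketch, assuming the segmented comments are a list of character vectors like `commenttemp` above; the helper name `top_unknown_words` and the toy data are hypothetical.

```r
# Hypothetical helper: the n most frequent words found in neither lexicon
top_unknown_words <- function(x, pwords, nwords, n = 20) {
  words <- unlist(x, use.names = FALSE)
  unknown <- words[!(words %in% pwords | words %in% nwords)]
  counts <- sort(table(unknown), decreasing = TRUE)
  head(counts, n)
}

tokens <- list(c("drunk", "good"), c("drunk", "torn"), c("too times"))
top_unknown_words(tokens, pwords = "good", nwords = "torn", n = 2)
```

Candidates that turn up this way can then be added to the positive or negative list by hand.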


As I said before, the method needs improvement; mine is only the most basic implementation of sentiment analysis. If you have any questions, corrections are welcome.

If you repost this article, please credit the source. Thank you!
