Text sentiment classification

Source: Internet
Author: User
Tags: comments, html tags, regular expression, idf
Movie text sentiment classification


The task is sentiment classification of movie review text. Reviews are divided into positive and negative comments, so this is a binary classification problem, for which we can choose common models such as a Bayesian classifier or logistic regression. One of the challenges here is vectorizing the text content, so we first try a TF-IDF based vectorization method and then try Word2vec.

# -*- coding: utf-8 -*-
import re

import numpy as np
import pandas as pd
from bs4 import BeautifulSoup


def review_to_wordlist(review):
    '''
    Turn an IMDB review into a sequence of words.
    Reference: http://blog.csdn.net/longxinchen_ml/article/details/50629613
    '''
    # Remove HTML tags and keep the text content
    review_text = BeautifulSoup(review, "html.parser").get_text()
    # Use a regular expression to keep letters only
    review_text = re.sub("[^a-zA-Z]", " ", review_text)
    # Lowercase all words and split them into a word list
    words = review_text.lower().split()
    # Return the word list
    return words
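To see what the regular-expression and lowercasing steps do on their own, here is a minimal sketch that leaves out the HTML stripping so it needs only the standard library. The helper name `letters_only_words` and the sample sentence are hypothetical and not part of the original code.

```python
import re


def letters_only_words(text):
    # Replace every character that is not an ASCII letter with a space
    letters_only = re.sub("[^a-zA-Z]", " ", text)
    # Lowercase the result and split it on whitespace
    return letters_only.lower().split()


sample = "I've watched it 3 times... GREAT movie!"  # hypothetical sample
print(letters_only_words(sample))
# ['i', 've', 'watched', 'it', 'times', 'great', 'movie']
```

Digits and punctuation are dropped, and the apostrophe is replaced by a space, which is why a contraction such as "I've" ends up as the two tokens "i" and "ve".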
  
Load Data Set
# Load the data set
train = pd.read_csv('/Users/frank/Documents/workspace/kaggle/dataset/bag_of_words_meets_bags_of_popcorn/labeledTrainData.tsv', header=0, delimiter="\t", quoting=3)
test = pd.read_csv('/Users/frank/Documents/workspace/kaggle/dataset/bag_of_words_meets_bags_of_popcorn/testData.tsv', header=0, delimiter="\t", quoting=3)
print(train.head())
print(test.head())
         id  sentiment                                             review
0  "5814_8"          1  "With all this stuff going down at the moment ...
1  "2381_9"          1  "\"The Classic War of the Worlds\" by Timothy ...
2  "7759_3"          0  "The film starts with a manager (Nicholas Bell...
3  "3630_4"          0  "It must be assumed that those who praised thi...
4  "9495_8"          1  "Superbly trashy and wondrously unpretentious ...
           id                                             review
0  "12311_10"  "Naturally in a film who's main themes are of ...
1    "8348_2"  "This movie is a disaster within a disaster fi...
2    "5828_4"  "All in all, this is a movie for kids. We saw ...
3    "7186_2"  "Afraid of the Dark left me with the impressio...
4   "12128_7"  "A very accurate depiction of small time mob l...
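The `quoting=3` argument in the load call corresponds to `csv.QUOTE_NONE`, which tells pandas to treat double quotes as ordinary characters. That is why the ids and reviews in the output above still carry their surrounding quotes. A small sketch with a hypothetical two-row sample in the same tab-separated layout (the sample rows are invented for illustration):

```python
import csv
import io

import pandas as pd

# Hypothetical sample rows; the real files are much larger
sample_tsv = "\n".join([
    "id\tsentiment\treview",
    '"1_8"\t1\t"A \\"classic\\" film"',
    '"2_3"\t0\t"Dull plot"',
])

df = pd.read_csv(io.StringIO(sample_tsv), header=0, delimiter="\t",
                 quoting=csv.QUOTE_NONE)  # csv.QUOTE_NONE has the value 3

print(df.shape)
print(df["id"][0])      # the surrounding quotes stay part of the value
print(df["review"][0])  # inner quotes and backslashes are left untouched
```

Because the quotes are kept as data, they are only removed later, when the cleaning function replaces every non-letter character with a space.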
Preprocessing Data
# Preprocess the data
label = train['sentiment']
train_data = []
for i in range(len(train['review'])):
    train_data.append(' '.join(review_to_wordlist(train['review'][i])))
test_data = []
for i in range(len(test['review'])):
    test_data.append(' '.join(review_to_wordlist(test['review'][i])))

# Preview the data
print(train_data[0], '\n')
print(test_data[0])
with all this stuff going down at the moment with mj i ve started listening to his music watching the odd documentary here and there watched the wiz and watched moonwalker again maybe i just want to get a certain insight into this guy who i thought was really cool in the eighties just to maybe make up my mind whether he is guilty or innocent moonwalker is part biography part feature film which i remember going to see at the cinema when it was originally released some of it has subtle messages about mj s feeling towards the press and also the obvious message of drugs are bad m kay visually impressive but of course this is all about michael jackson so unless you remotely like mj in anyway then you are going to hate this and find it boring some may call mj an egotist for consenting to the making of this movie but mj and most of his fans would say that he made it for the fans which if true is really nice of him the actual feature film bit when it finally starts is only on for minutes or so excluding the smooth criminal sequence and joe pesci is convincing as a psychopathic all powerful drug lord why he wants mj dead so bad is beyond me because mj overheard his plans nah joe pesci s character ranted that he wanted people to know it is he who is supplying drugs etc so i dunno maybe he just hates mj s music lots of cool things in this like mj turning into a car and a robot and the whole speed demon sequence also the director must have had the patience of a saint when it came to filming the kiddy bad sequence as usually directors hate working with one kid let alone a whole bunch of them performing a complex dance scene bottom line this movie is for people who like mj on one level or another which i think is most people if not then stay away it does try and give off a wholesome message and ironically mj s bestest buddy in this movie is a girl michael jackson is truly one of the most talented people ever to grace this planet but is he guilty well with all the attention i ve gave this subject hmmm well i don t know because people can be different behind closed doors i know this for a fact he is either an extremely nice but stupid guy or one of the most sickest liars i hope he is not the latter

naturally in a film who s main themes are of mortality nostalgia and loss of innocence it is perhaps not surprising it is rated more highly by older viewers than younger ones however there is a craftsmanship and completeness to the film which anyone can enjoy the pace is steady and constant the characters full and engaging the relationships and interactions natural showing that you do not need floods of tears to show emotion screams to show fear shouting to show dispute or violence to show anger naturally joyce s short story lends the film a ready made structure as perfect as a polished diamond but the small changes huston makes such as the inclusion of the poem fit in neatly it is truly a masterpiece of tact subtlety and overwhelming beauty
Feature Processing

A computer cannot work directly on raw word text, so we need to convert the text into vectors. Several text vectorization methods are commonly used, such as:
Word count
TF-IDF vector
Word2vec vector
Let's try TF-IDF first.
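To make the difference between a plain word count and a TF-IDF weight concrete, here is a hand-rolled sketch using only the standard library. It assumes the textbook definition (term frequency times the log of inverse document frequency); library implementations such as scikit-learn's apply extra smoothing and normalization, so their numbers will differ. The three toy documents are hypothetical.

```python
import math
from collections import Counter

# Hypothetical cleaned reviews, already lowercased and split into words
docs = [
    "great movie great cast".split(),
    "boring movie".split(),
    "great fun".split(),
]

# Word count: raw number of occurrences of each word in one document
word_counts = [Counter(doc) for doc in docs]

# Document frequency: number of documents that contain each word
doc_freq = Counter(word for doc in docs for word in set(doc))


def tf_idf(word, doc_index):
    # Term frequency: share of the document's words that are `word`
    tf = word_counts[doc_index][word] / len(docs[doc_index])
    # Inverse document frequency: rarer words get a larger weight
    idf = math.log(len(docs) / doc_freq[word])
    return tf * idf


print(word_counts[0]["great"])        # raw count of "great" in the first document: 2
print(round(tf_idf("movie", 1), 3))   # "movie" occurs in two documents
print(round(tf_idf("boring", 1), 3))  # "boring" occurs in one document only
```

In the second document "movie" and "boring" have the same raw count, but "boring" receives the higher TF-IDF weight because it appears in fewer documents.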

from sklearn.feature_extraction.text import TfidfVectorizer as TFIDF
# Reference: http://blog.csdn.net/longxinchen_ml/article/details/50629613
tfidf = TFIDF(min_df=
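The call above is cut off in the source, so its remaining arguments are unknown. As a rough sketch of how TF-IDF vectors could feed a classifier, the example below assumes scikit-learn's `TfidfVectorizer` and `MultinomialNB` (one possible Bayesian model). The toy texts stand in for `train_data`, `label`, and `test_data`, and `min_df=1` is an assumed value chosen for this tiny corpus, not the author's setting.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical stand-ins for the cleaned train_data, label and test_data
train_texts = [
    "great movie loved it",
    "wonderful acting great fun",
    "boring plot bad acting",
    "terrible movie bad script",
]
train_labels = [1, 1, 0, 0]
test_texts = ["great fun", "bad boring script"]

# min_df=1 is an assumption for this toy corpus
vectorizer = TfidfVectorizer(min_df=1)
# Learn the vocabulary and IDF weights from the training texts only
X_train = vectorizer.fit_transform(train_texts)
# Reuse the same vocabulary and weights for the test texts
X_test = vectorizer.transform(test_texts)

model = MultinomialNB()
model.fit(X_train, train_labels)
print(model.predict(X_test))
```

Fitting the vectorizer on the training texts and only transforming the test texts keeps both matrices in the same feature space, which the classifier requires.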
