Sentiment Analysis on Movie Reviews: a Kaggle competition

Source: Internet
Author: User

Classify the sentiment of sentences from the Rotten Tomatoes dataset

Competition link: https://www.kaggle.com/c/sentiment-analysis-on-movie-reviews

I like IPython Notebook more and more. All of the work below can be done on a single page, and it works better in Firefox than in Chrome.

The data is split into train.tsv and test.tsv. Fields are delimited by \t, and each row of train.tsv has four fields: PhraseId, SentenceId, Phrase, Sentiment.

Sentiment labels:

0 - negative
1 - somewhat negative
2 - neutral
3 - somewhat positive
4 - positive

    import pandas as pd

    df = pd.read_csv('train.tsv', header=0, delimiter='\t')
    df.info()

    <class 'pandas.core.frame.DataFrame'>
    Int64Index: 156060 entries, 0 to 156059
    Data columns (total 4 columns):
    PhraseId      156060 non-null int64
    SentenceId    156060 non-null int64
    Phrase        156060 non-null object
    Sentiment     156060 non-null int64
    dtypes: int64(3), object(1)


    df.head()

    Out[6]:
       PhraseId  SentenceId  Phrase                                              Sentiment
    0  1         1           A series of escapades demonstrating the adage ...  1
    1  2         1           A series of escapades demonstrating the adage ...  2
    2  3         1           A series                                            2
    3  4         1           A                                                   2
    4  5         1           series                                              2

The share of each sentiment label in the training set:

    df.Sentiment.value_counts() / df.Sentiment.count()

    Out[13]:
    2    0.509945
    3    0.210989
    1    0.174760
    4    0.058990
    0    0.045316
    dtype: float64
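The proportions above also set a floor for any classifier: always predicting the most frequent label (2, neutral) would already be right about 51% of the time. A minimal sketch of that baseline in plain Python follows. The helper name majority_baseline is made up for illustration and is not part of the original notebook; the sample labels are the five Sentiment values shown by df.head().

```python
from collections import Counter

def majority_baseline(labels):
    """Return (most common label, accuracy of always predicting it)."""
    counts = Counter(labels)
    label, count = counts.most_common(1)[0]
    return label, count / len(labels)

# Sentiment values of the first five training rows shown by df.head().
sample_labels = [1, 2, 2, 2, 2]
label, accuracy = majority_baseline(sample_labels)
print(label, accuracy)  # prints: 2 0.8
```

Called on the full Sentiment column, the same function should return label 2 with an accuracy near 0.51, matching the first row of the value_counts output.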
Fit the model on the whole training set, then check the classification accuracy directly on its first five rows:
    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
    from sklearn.pipeline import Pipeline
    from sklearn.linear_model import LogisticRegression

    X_train = df['Phrase']
    y_train = df['Sentiment']

    # Bag-of-words counts -> TF-IDF weights -> logistic regression
    text_clf = Pipeline([('vect', CountVectorizer()),
                         ('tfidf', TfidfTransformer()),
                         ('clf', LogisticRegression()),
                         ])
    text_clf = text_clf.fit(X_train, y_train)

    # Sanity check on the first five training rows
    X_test = df.head()['Phrase']
    predicted = text_clf.predict(X_test)
    print(np.mean(predicted == df.head()['Sentiment']))
    for phrase, sentiment in zip(X_test, predicted):
        print('%r => %s' % (phrase, sentiment))
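To make the first two pipeline steps less of a black box, here is a rough plain-Python sketch of what counting followed by TF-IDF weighting produces. It follows scikit-learn's documented defaults as I understand them (lowercasing, tokens of two or more word characters, smoothed idf, L2 normalisation), but it is an illustration only, not the library's implementation. The function names tokenize and tfidf are made up for this example.

```python
import math
import re
from collections import Counter

# Lowercase, then keep tokens of two or more word characters, so a
# one-letter phrase such as 'A' produces no tokens at all.
TOKEN = re.compile(r"\b\w\w+\b")

def tokenize(text):
    return TOKEN.findall(text.lower())

def tfidf(docs):
    """Return one {term: weight} dict per document, L2-normalised."""
    counts = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for c in counts for term in c)
    # Smoothed idf: ln((1 + n) / (1 + df)) + 1
    idf = {t: math.log((1 + n) / (1 + f)) + 1 for t, f in df.items()}
    rows = []
    for c in counts:
        weights = {t: tf * idf[t] for t, tf in c.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        rows.append({t: w / norm for t, w in weights.items()} if norm else {})
    return rows

docs = ["A series of escapades", "A series", "A", "series"]
for doc, row in zip(docs, tfidf(docs)):
    print(repr(doc), row)
```

Under these assumptions the phrase 'A' maps to an empty vector, so the classifier has only its intercept to go on for that row, while 'A series' and 'series' end up with identical features.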
Classification accuracy and results:

    0.8
    'A series of escapades demonstrating the adage that what is good for the goose is also good for the gander, some of which occasionally amuses but none of which amounts to much of a story.' => 3
    'A series of escapades demonstrating the adage that what is good for the goose' => 2
    'A series' => 2
    'A' => 2
    'series' => 2
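The true labels for comparison (the values from the Sentiment column of df.head() above):

    df.head()['Sentiment']

    0    1
    1    2
    2    2
    3    2
    4    2

The first phrase is misclassified: the model predicts 3, but its true label is 1. Since these rows were also used for training, this is only a sanity check and not an estimate of accuracy on unseen data.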
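Scoring on rows the model was trained on says little about how it generalises, so a fairer check holds some data out before fitting. The phrases in this dataset are fragments of the same sentences (rows 0 to 4 above all share SentenceId 1), so one option is to split by SentenceId so that fragments of one sentence never land on both sides. A minimal sketch in plain Python follows; split_by_sentence, the 20% hold-out fraction and the seed are assumptions for illustration and do not come from the original notebook.

```python
import random

def split_by_sentence(sentence_ids, holdout_fraction=0.2, seed=0):
    """Return (train_rows, holdout_rows) as lists of row indices.

    All rows that share a SentenceId go to the same side.
    """
    unique_ids = sorted(set(sentence_ids))
    rng = random.Random(seed)
    rng.shuffle(unique_ids)
    # Hold out at least one sentence, even for tiny inputs.
    n_holdout = max(1, int(len(unique_ids) * holdout_fraction))
    holdout_ids = set(unique_ids[:n_holdout])
    train_rows, holdout_rows = [], []
    for row, sid in enumerate(sentence_ids):
        (holdout_rows if sid in holdout_ids else train_rows).append(row)
    return train_rows, holdout_rows

# Toy SentenceId column: three sentences with several phrases each.
sentence_ids = [1, 1, 1, 1, 1, 2, 2, 2, 3, 3]
train_rows, holdout_rows = split_by_sentence(sentence_ids)
print(train_rows, holdout_rows)
```

With the DataFrame above, passing df['SentenceId'].tolist() and then selecting df.iloc[train_rows] and df.iloc[holdout_rows] would give the two parts: fit the pipeline on the first and compute the accuracy on the second.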
The test data set:

    test_df = pd.read_csv('test.tsv', header=0, delimiter='\t')
    test_df.info()

    <class 'pandas.core.frame.DataFrame'>
    Int64Index: 66292 entries, 0 to 66291
    Data columns (total 3 columns):
    PhraseId      66292 non-null int64
    SentenceId    66292 non-null int64
    Phrase        66292 non-null object
    dtypes: int64(2), object(1)
Use the trained model to classify the test set:

    from numpy import savetxt

    X_test = test_df['Phrase']
    phraseIds = test_df['PhraseId']
    predicted = text_clf.predict(X_test)
    # This assumes the test PhraseIds continue from the training set,
    # so row i of test.tsv gets id i + 156061.
    pred = [[index + 156061, x] for index, x in enumerate(predicted)]
    savetxt('../submissions/lr_benchmark.csv', pred, delimiter=',', fmt='%d,%d', header='PhraseId,Sentiment', comments='')
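The code above builds each PhraseId as index + 156061, which assumes test.tsv is in order with no gaps. A more defensive alternative is to write the PhraseId values read from the file next to the predictions. The sketch below uses only the standard library; write_submission and the sample ids and labels are illustrative assumptions, not part of the original notebook.

```python
import csv
import io

def write_submission(handle, phrase_ids, predictions):
    """Write a 'PhraseId,Sentiment' CSV to an open text file handle."""
    if len(phrase_ids) != len(predictions):
        raise ValueError("phrase_ids and predictions differ in length")
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["PhraseId", "Sentiment"])
    for phrase_id, label in zip(phrase_ids, predictions):
        writer.writerow([int(phrase_id), int(label)])

# Made-up ids and labels, written to an in-memory buffer for demonstration.
buffer = io.StringIO()
write_submission(buffer, [156061, 156062, 156063], [2, 3, 1])
print(buffer.getvalue())
```

In the notebook, the same function could be given a file opened with open('../submissions/lr_benchmark.csv', 'w', newline='') together with the PhraseId column of test_df and the predicted array.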
Submission result:

Reference: http://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html
