Drawing a Word Cloud in Python (E-commerce Review Data)

Source: Internet
Author: User

I recently collected product information for windbreakers from Tmall search pages, together with 14,676 customer comments, and decided to make a word cloud from the comment data.
Here is the result:


In the picture above, words such as "clothes", "good", "quality", and "like" appear as the largest keywords, which suggests that the product sells well on Tmall. The implementation is documented below.
Data preview:


Python version: 3.4.4
Word segmentation library: jieba
Word cloud library: pytagcloud

Data preprocessing: reading the data and word segmentation

import pandas as pd
from pandas import DataFrame, Series
import numpy as np
import jieba
from ciyuntu_class import ciyuntu_class  # custom helper module; its source is not shown here
cyt = ciyuntu_class()
from pytagcloud import make_tags, create_tag_image
from random import sample
comment = pd.read_csv('Dealer review data.csv', encoding='GBK')
# There are more than 10,000 comments, and segmenting all of them takes a long time,
# so 5,000 comments are randomly sampled from the full data set for segmentation.
comment = comment.drop(['Unnamed: 0'], axis=1)
# df_comment = comment[['evaluation content']].loc[0:30]
index_5000 = sample(list(comment.index), 5000)
df_comment = comment[['evaluation content']].loc[index_5000]
df_comment.index = range(df_comment.shape[0])
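The sampling step above gives a different subset on every run. As a minimal sketch, assuming reproducibility matters, the draw can be seeded. The function name `sample_indices` and the seed value are my own choices for illustration, not part of the original code. (Newer pandas versions also offer `DataFrame.sample(n=..., random_state=...)`, which does the same thing directly on the data frame.)

```python
import random

def sample_indices(n_rows, k, seed=None):
    """Pick k distinct row positions out of n_rows, without replacement."""
    rng = random.Random(seed)  # a private generator, so the global state is untouched
    k = min(k, n_rows)         # never ask for more rows than exist
    return rng.sample(range(n_rows), k)

# 14,676 is the number of comments mentioned in the article.
picked = sample_indices(14676, 5000, seed=42)
print(len(picked), len(set(picked)))
```

With the same seed, the same 5,000 rows are selected each time, which makes the resulting word cloud repeatable.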
Word segmentation
# Segment the first comment
df_freq = cyt.fenci(df_comment.iloc[0, 0])
# Convert the result to a DataFrame
df_freq = cyt.sta_list(df_freq)
for i in range(df_comment.shape[0])[1:]:
    print(i)
    try:
        df_freq0 = cyt.fenci(df_comment.iloc[i, 0])
        df_freq0 = cyt.sta_list(df_freq0)
        # Merge the DataFrames
        df_freq = cyt.bind_df(df_freq, df_freq0)
    except Exception:
        # Print the comment that could not be processed and move on
        print(df_comment.iloc[i, 0])
        pass
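The `ciyuntu_class` helper is not shown, so the following is only a rough sketch of what the segment, count, and merge loop might boil down to, not the author's implementation. It uses `collections.Counter` from the standard library. The `tokenize` parameter is an assumption of mine: it defaults to `str.split` so the example runs without extra packages, and for Chinese comments one would pass `jieba.lcut` instead.

```python
from collections import Counter

def count_words(comments, tokenize=str.split):
    """Segment each comment and accumulate word frequencies in one Counter."""
    freq = Counter()
    for text in comments:
        try:
            freq.update(tokenize(text))
        except (TypeError, AttributeError):
            # Skip rows that are not strings (for example missing values from a CSV).
            print("skipped:", text)
    return freq

# Toy English comments, made up for illustration only.
comments = ["good quality good price", "like the quality", None]
freq = count_words(comments)
print(freq["good"], freq["quality"])
```

Updating a single `Counter` avoids building and merging one data frame per comment, which is likely to be noticeably faster for thousands of comments.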
        
Drawing the word cloud
# Draw the word cloud
tuple_list = cyt.df2tuple_list(df_freq)
tags = make_tags(tuple_list, maxsize=80)
create_tag_image(tags, 'comment_cloud.png', size=(900, 600), fontname='SimHei')
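As far as I can tell, `make_tags` expects a list of `(word, count)` pairs. The `df2tuple_list` helper is not shown, so here is a hedged sketch of an equivalent conversion from a plain dictionary of counts. The name `to_tuple_list`, the `top_n` cut-off, and the counts themselves are made up for illustration.

```python
def to_tuple_list(freq, top_n=100):
    """Return the top_n (word, count) pairs, most frequent first."""
    # Sort by descending count; break ties alphabetically for a stable order.
    pairs = sorted(freq.items(), key=lambda item: (-item[1], item[0]))
    return pairs[:top_n]

# Made-up counts, for illustration only.
freq = {"quality": 120, "good": 340, "like": 95, "clothes": 410}
tuple_list = to_tuple_list(freq, top_n=3)
print(tuple_list)  # [('clothes', 410), ('good', 340), ('quality', 120)]
```

Limiting the list to the most frequent words keeps the image readable and shortens the layout step.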
Summary:

First, I selected 1,000 comments and drew the word cloud:
It contains some neutral words such as "very", "all", "buy", and "also". These words say nothing about customers' attitudes, but their high frequency distorts the result, so they should be removed during word segmentation. Here is the result after removing them:
After the removal, the word cloud shows the topic of the comments and the attitudes they express much more clearly.
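The removal step can be sketched roughly as follows. This is a minimal example under my own assumptions: the stop-word set only contains the English renderings of the words mentioned above, and the counts are made up. The same filter could equally be applied to the tokens right after segmentation instead of to the final counts.

```python
from collections import Counter

# Hypothetical stop-word list; extend it with whatever neutral words show up.
STOP_WORDS = {"very", "all", "buy", "also"}

def remove_stop_words(freq, stop_words=STOP_WORDS):
    """Return a new Counter that keeps only words outside the stop-word set."""
    return Counter({word: n for word, n in freq.items() if word not in stop_words})

# Made-up counts, for illustration only.
freq = Counter({"good": 5, "very": 9, "quality": 4, "buy": 7})
filtered = remove_stop_words(freq)
print(sorted(filtered.items()))  # [('good', 5), ('quality', 4)]
```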
