I recently scraped the listing information for a windbreaker product from Tmall's search page, along with 14,676 of its comments, so I decided to make a word cloud from the comment data.
Let's first look at the result:
The picture above shows that keywords such as "clothes", "good", "quality" and "like" are the largest, which suggests this product sells fairly well on Tmall. The implementation is documented below.
Data preview:
Environment: Python 3.4.4; word segmentation: jieba; word cloud library: pytagcloud.
Data preprocessing: reading the data and word segmentation
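```python
import pandas as pd
from pandas import DataFrame, Series
import numpy as np
import jieba
from random import sample
from pytagcloud import make_tags, create_tag_image
# ciyuntu_class is the author's own helper module (not shown in this post);
# it provides the fenci, sta_list, bind_df and df2tuple_list methods used below
from ciyuntu_class import ciyuntu_class

cyt = ciyuntu_class()

# Read the comment data (the CSV file is GBK-encoded)
comment = pd.read_csv('Dealer review data.csv', encoding='GBK')
```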
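Two details of the reading step can be checked on a tiny in-memory file. The first is the `encoding` argument. The second is where the `Unnamed: 0` column, dropped in the next step, typically comes from. In the sketch below, the two-row table is hypothetical. Saving with `to_csv`'s default `index=True` is only an assumption about how the real file was produced.

```python
import io

import pandas as pd

# Hypothetical two-row stand-in for the real comment file, saved with the
# default index=True, which writes the index as an unnamed first column
original = pd.DataFrame({"evaluation content": ["衣服很好", "质量不错"]})
raw_bytes = original.to_csv().encode("gbk")

# Reading it back as GBK without index_col yields an extra 'Unnamed: 0' column
with_extra = pd.read_csv(io.BytesIO(raw_bytes), encoding="gbk")
print(list(with_extra.columns))  # ['Unnamed: 0', 'evaluation content']

# index_col=0 turns that column back into the index instead
clean = pd.read_csv(io.BytesIO(raw_bytes), encoding="gbk", index_col=0)
print(list(clean.columns))  # ['evaluation content']
```

If the real file has the same shape, passing `index_col=0` to `read_csv` would make the later `drop` unnecessary.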
There are more than 10,000 comments, and segmenting all of them would take a long time. So 5,000 comments are randomly sampled from the full data set:
```python
# Drop the unnamed leftover index column
comment = comment.drop(['Unnamed: 0'], axis=1)
# df_comment = comment[['evaluation content']].loc[0:30]
# Randomly pick 5,000 row labels without replacement
index_5000 = sample(list(comment.index), 5000)
df_comment = comment[['evaluation content']].loc[index_5000]
# Reset the index to 0..4999 so the rows can be looped over by position
df_comment.index = range(df_comment.shape[0])
```
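pandas can do the same sampling directly with `DataFrame.sample`. This avoids the separate `random.sample` call, and `random_state` makes the draw reproducible. The sketch below uses a hypothetical 12,000-row stand-in for the real table. The seed `42` is arbitrary.

```python
import pandas as pd

# Hypothetical stand-in for the real comment table
comment = pd.DataFrame({"evaluation content": ["comment %d" % i for i in range(12000)]})

# Draw 5,000 rows without replacement; random_state fixes the random draw
df_comment = comment[["evaluation content"]].sample(n=5000, random_state=42)

# Reset to a 0..n-1 index so positional loops over the rows line up
df_comment = df_comment.reset_index(drop=True)
```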
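Word segmentation
```python
# Segment the first comment into words
df_freq = cyt.fenci(df_comment.iloc[0, 0])
# Convert the word list to a word-frequency data frame
df_freq = cyt.sta_list(df_freq)
for i in range(1, df_comment.shape[0]):
    print(i)
    try:
        df_freq0 = cyt.fenci(df_comment.iloc[i, 0])
        df_freq0 = cyt.sta_list(df_freq0)
        # Merge this comment's word frequencies with the accumulated data frame
        df_freq = cyt.bind_df(df_freq, df_freq0)
    except Exception:
        # Print any comment that fails to process and skip it
        print(df_comment.iloc[i, 0])
```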
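The helper methods `fenci`, `sta_list` and `bind_df` are not shown in the post. Below is a minimal sketch of an alternative. It collects all word counts in a single `collections.Counter` and builds one data frame at the end, instead of merging a new data frame for every comment. `count_words`, its `tokenize` parameter and the `word`/`freq` column names are hypothetical. They are not the author's helper API. `tokenize` is any function that turns a string into a list of words.

```python
from collections import Counter

import pandas as pd


def count_words(comments, tokenize):
    """Segment every comment and return a word-frequency data frame.

    tokenize: callable mapping a str to a list of words (hypothetical parameter).
    """
    counts = Counter()
    for text in comments:
        if not isinstance(text, str):
            # Skip NaN or other non-text cells instead of catching every exception
            continue
        # Ignore whitespace-only tokens, which segmenters can emit
        counts.update(w for w in tokenize(text) if w.strip())
    # Build a single data frame, most frequent words first
    df = pd.DataFrame(list(counts.items()), columns=["word", "freq"])
    return df.sort_values("freq", ascending=False).reset_index(drop=True)
```

With jieba installed, the call would be `count_words(df_comment["evaluation content"], jieba.lcut)`, since `jieba.lcut` returns a list of segmented words. This avoids building and merging 5,000 intermediate data frames.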
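Drawing the word cloud
```python
# Convert the frequency data frame to a list of (word, count) tuples
tuple_list = cyt.df2tuple_list(df_freq)
tags = make_tags(tuple_list, maxsize=80)
# fontname must refer to a Chinese-capable font registered with pytagcloud
create_tag_image(tags, 'comment_cloud.png', size=(900, 600), fontname='SimHei')
```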
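As in pytagcloud's README example, `make_tags` consumes a list of `(word, count)` pairs, most frequent first. The sketch below shows one way a helper like `df2tuple_list` could produce that list. `df_to_tuple_list`, the `top_n` cutoff and the `word`/`freq` column names are all hypothetical.

```python
import pandas as pd


def df_to_tuple_list(df_freq, top_n=100, word_col="word", freq_col="freq"):
    """Hypothetical stand-in for df2tuple_list: top (word, count) pairs, descending."""
    top = df_freq.sort_values(freq_col, ascending=False).head(top_n)
    # Cast counts to plain int so the tuples hold ordinary Python values
    return [(w, int(c)) for w, c in zip(top[word_col], top[freq_col])]
```

Capping the list at `top_n` words keeps the image readable. Rare words would otherwise be drawn at the minimum size and clutter the picture.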
Summary:
I first drew a word cloud from 1,000 comments:
It contains some neutral words, such as the words for "very", "all", "buy" and "also". These words say nothing about people's attitudes, but their frequencies are high enough to distort the result, so they should be removed during word segmentation. Here is the result after removing them:
After the deletion, the word cloud shows the main topics and the commenters' attitudes much more clearly.
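Removing the neutral words during segmentation, as described above, amounts to filtering each comment's word list against a stop-word set before counting. A minimal sketch follows. `remove_stop_words` is a hypothetical name. The Chinese entries in `STOP_WORDS` are an assumption: they are my guess at the original words behind "very", "all", "buy" and "also".

```python
def remove_stop_words(tokens, stop_words):
    """Drop stop words and whitespace-only tokens from a segmented comment."""
    return [t for t in tokens if t.strip() and t not in stop_words]


# Hypothetical stop-word set; the entries are assumed, not taken from the post
STOP_WORDS = {"很", "非常", "都", "买", "也"}
```

In practice this would be applied to each comment before counting, for example `remove_stop_words(jieba.lcut(text), STOP_WORDS)`. A set is used so that each membership test is a constant-time lookup.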