1-3 Crawling the popularity of movie topics on Weibo (topic read counts and discussion counts)

Source: Internet
Author: User

weiboheat.py

# -*- coding: utf-8 -*-
"""
This script crawls popular movie information from the WAP version of Weibo
(m.weibo.cn), in particular the number of discussions and the number of
readings of each movie topic.
"""
import time

import requests
from pandas import DataFrame

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 '
                         '(KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36'}
i = 1                         # page number, the part of the URL that changes
movies = []                   # initialize the movie list
csvname = 'wh_allmovies.csv'  # name of the output file
cards = [1]                   # cold start: make the cards list non-empty
# The topic list is loaded dynamically, so keep looping as long as
# the page still returns content
while cards:
    try:
        # the movie card is at index 2 on page 1 and at index 0 afterwards
        if i == 1:
            j = 2
        else:
            j = 0
        url = ('http://m.weibo.cn/page/pageJson?containerid=&containerid='
               '100803_ctg1_100_-_page_topics_ctg1__100&luicode=10000011&lfid='
               '100808d35a54c4ae10c8311e64ae96c776f206&v_p=11&ext=&fid='
               '100803_ctg1_100_-_page_topics_ctg1__100&uicode='
               '10000011&next_cursor=&page=' + str(i))
        resp = requests.get(url, headers=headers, timeout=10)
        time.sleep(0.1)
        content = resp.json()  # the response body is JSON
        # Analyzing the JSON content reveals the pattern:
        cards = content['cards']
        card = cards[j]
        card_group = card['card_group']
        # card_group is a list of 10 movie entries per page; each entry is a
        # dict holding various information about one movie
        movies = movies + card_group
        print(i * 10)  # progress marker
        i += 1  # move on to the next page
    except Exception as e:
        # an empty cards list ends up here (cards[j] fails), and the loop then stops
        print('Error:', e)
    finally:
        # after every iteration, convert the movies list to a DataFrame and save it
        movies_df = DataFrame(movies)
        # df1 = DataFrame({'title': movies_df.loc[:, 'card_type_name'], 'heat': movies_df.loc[:, 'desc2'],
        #                  'scheme': movies_df.loc[:, 'scheme'],
        #                  'pic': movies_df.loc[:, 'pic']})
        movies_df.to_csv(csvname, index=False, encoding='utf-8')

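The crawler's control flow is a plain pagination loop. It requests page i, takes the card at index 2 on page 1 (index 0 afterwards), appends that card's card_group, and stops once a page returns an empty cards list. The minimal sketch below assumes the response layout described in the script and separates the same logic from the network so it can be tried offline. The function name collect_movies, the fetch_page callable, the max_errors cap and the sample pages are hypothetical illustrations, not part of the original script. The cap is one way to avoid retrying a failing page forever. The script above can do that because an exception raised before cards is reassigned leaves it non-empty.

```python
from typing import Callable, Dict, List


def collect_movies(fetch_page: Callable[[int], Dict],
                   max_errors: int = 3) -> List[Dict]:
    """Page through a dynamically loaded list until a page has no cards.

    fetch_page(i) is a hypothetical callable that returns the decoded JSON
    for page i (in weiboheat.py this would wrap requests.get(url).json()).
    """
    movies: List[Dict] = []
    page = 1
    errors = 0  # consecutive failures on the current page
    while True:
        try:
            content = fetch_page(page)
        except Exception:
            errors += 1
            if errors >= max_errors:  # give up instead of retrying forever
                break
            continue
        errors = 0
        cards = content.get('cards', [])
        j = 2 if page == 1 else 0  # movie card index: 2 on page 1, 0 afterwards
        if len(cards) <= j:  # empty (or too short) page: the list is exhausted
            break
        movies.extend(cards[j].get('card_group', []))
        page += 1
    return movies


# Offline usage with made-up pages shaped like the Weibo JSON
sample_pages = {
    1: {'cards': [{}, {}, {'card_group': [{'card_type_name': 'A'},
                                          {'card_type_name': 'B'}]}]},
    2: {'cards': [{'card_group': [{'card_type_name': 'C'}]}]},
    3: {'cards': []},
}
titles = [m['card_type_name'] for m in collect_movies(lambda i: sample_pages[i])]
print(titles)  # ['A', 'B', 'C']
```

Passing the fetcher in as a parameter is only a choice made for testability. In the real script, it would build the m.weibo.cn URL for page i and return resp.json().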
weiboheat_treatment.py

# -*- coding: utf-8 -*-
"""
This script processes the wh_allmovies.csv file produced by weiboheat.py and
adds three columns: the topic discussion count (discussnum), the topic read
count (readnum), and a heat score derived from the read count (score_weibo).
"""
import pandas as pd
from pandas import DataFrame

df = pd.read_csv('wh_allmovies.csv')
# Keep the needed columns and give them custom names
df1 = DataFrame({'title': df.loc[:, 'card_type_name'],
                 'heat': df.loc[:, 'desc2'],
                 'scheme': df.loc[:, 'scheme'],
                 'pic': df.loc[:, 'pic']})
# Take the heat column out of the DataFrame
heat = df1.loc[:, 'heat']


# Convert a count string such as '24亿阅读' (2.4 billion reads) into the int 2400000000.
# 亿 = 100,000,000 and 万 = 10,000; the last two characters are the label
# (阅读 "reads" or 讨论 "discussions"). The input is a str (Unicode text).
def getnum(heat):
    if '亿' in heat:
        # drop the last three characters (e.g. '亿阅读'), then scale by 100 million
        temp = float(heat[:-3]) * 100000000
    elif '万' in heat:
        # drop the last three characters (e.g. '万讨论'), then scale by 10,000
        temp = float(heat[:-3]) * 10000
    else:
        # only the two-character label to drop; no multiplier needed
        temp = float(heat[:-2])
    return int(round(temp))  # round first so float error cannot truncate the value


# Map a film's read count to a Weibo heat score from 1 to 5
def getscore(i):
    if 0 <= i < 100000000:
        return 1
    elif 100000000 <= i < 300000000:
        return 2
    elif 300000000 <= i < 500000000:
        return 3
    elif 500000000 <= i < 700000000:
        return 4
    elif i >= 700000000:
        return 5
    else:
        return None  # negative counts are invalid


discussnum = []   # initialize the list of discussion counts
readnum = []      # initialize the list of read counts
score_weibo = []  # initialize the list of Weibo heat scores
for heat_i in heat:
    # each heat item holds two space-separated parts
    heat_ilist = heat_i.split()
    heat_discuss = heat_ilist[0]  # discussion count, e.g. '275.8万讨论' (2.758 million discussions)
    heat_read = heat_ilist[1]     # read count, e.g. '13亿阅读' (1.3 billion reads)
    discussnum.append(getnum(heat_discuss))  # convert with getnum, then store
    read = getnum(heat_read)
    readnum.append(read)
    score_weibo.append(getscore(read))  # score the film by its read count
df2 = DataFrame({'discussnum': discussnum, 'readnum': readnum, 'score_weibo': score_weibo})
df3 = pd.concat([df1, df2], axis=1)
df3.to_csv('wh_allmovies_discussreadscore.csv', index=False)
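getnum strips a fixed number of trailing characters, which assumes that every count ends in exactly one unit character plus a two-character label. The sketch below is a slightly more defensive alternative, not the author's method. It parses the leading number and optional unit with a regular expression and replaces the getscore if/elif chain with a threshold lookup using bisect. The names getnum_re, getscore_bisect, UNITS and THRESHOLDS are hypothetical. The unit values (亿 = 10^8, 万 = 10^4) and the score boundaries come from the script. The sample strings assume the '<number><unit><label>' layout that its comments describe.

```python
import re
from bisect import bisect_right
from typing import Optional

# Chinese magnitude units in Weibo counts: 亿 = 100 million, 万 = 10 thousand
UNITS = {'亿': 10**8, '万': 10**4}
# Leading number, then an optional unit; the label (阅读/讨论) is ignored
COUNT_RE = re.compile(r'\s*(\d+(?:\.\d+)?)\s*([亿万]?)')
# Read-count boundaries from getscore: <1亿 -> 1, <3亿 -> 2, <5亿 -> 3, <7亿 -> 4, else 5
THRESHOLDS = [1 * 10**8, 3 * 10**8, 5 * 10**8, 7 * 10**8]


def getnum_re(text: str) -> int:
    """Parse a count such as '24亿阅读' or '275.8万讨论' into an int."""
    m = COUNT_RE.match(text)
    if m is None:
        raise ValueError(f'no count found in {text!r}')
    number, unit = m.groups()
    # round() before int() so a float product just below an integer
    # (e.g. 0.29 * 100 == 28.999999999999996) is not truncated
    return int(round(float(number) * UNITS.get(unit, 1)))


def getscore_bisect(reads: int) -> Optional[int]:
    """Map a read count to a 1-5 heat score; negative counts give None."""
    if reads < 0:
        return None
    # bisect_right counts the thresholds <= reads, so 10**8 itself scores 2
    return bisect_right(THRESHOLDS, reads) + 1


desc2 = '275.8万讨论 13亿阅读'  # sample value of the desc2 column
discuss, read = desc2.split()
print(getnum_re(discuss), getnum_re(read), getscore_bisect(getnum_re(read)))
# 2758000 1300000000 5
```

The bisect lookup keeps the score boundaries in a single list. Changing the scoring scheme then means editing THRESHOLDS rather than rewriting the comparison chain.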
