Crawling a friend's QQ Space posts with Python and generating a word cloud (very detailed)

Source: Internet
Author: User
Tags: xpath

Objective

Look first:

Approach

1. Work out the URL to visit.
2. Simulate logging in with your own QQ number.
3. Check whether the friend's space is access-restricted, switch to the frame that holds the posts, scrape the current page, scroll down, turn the page, and keep writing the scraped content to a local TXT file.
4. After scraping the last page, read the TXT file and generate a word cloud.

Detailed walkthrough

1. Work out the URL to visit
This is simple. By observation, the URL of a friend's QQ Space is:
https://user.qzone.qq.com/{friend QQ Number}/311
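
As a minimal sketch of this step, the URL can be built from the friend's QQ number with str.format. The helper name qzone_url and the sample number are made up for illustration; only the URL pattern comes from the text above.

```python
def qzone_url(qq):
    """Return the QQ Space URL for the given QQ number."""
    return 'https://user.qzone.qq.com/{}/311'.format(qq)


# Hypothetical QQ number, used only to show the resulting URL.
print(qzone_url('123456789'))
```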

2. Requesting that URL requires a login, so we simulate one: Selenium drives a browser, logs in with your own QQ number, and then visits the friend's QQ Space.
Here is the code for the simulated login:

    from selenium import webdriver

    # qq holds the friend's QQ number
    file = 'C:/users/administrator/desktop/{}.txt'.format(qq)
    driver = webdriver.Firefox()
    driver.maximize_window()  # maximize the window

    driver.get('https://user.qzone.qq.com/{}/311'.format(qq))  # URL
    driver.implicitly_wait(10)  # implicit wait, so the page can finish loading
    driver.find_element_by_id('login_div')
    driver.switch_to_frame('login_frame')  # switch to the frame with the account/password form
    driver.find_element_by_id('switcher_plogin').click()  # click "account password login"
    driver.find_element_by_id('u').clear()  # clear the account field
    driver.find_element_by_id('u').send_keys('your QQ account')  # enter your account
    driver.find_element_by_id('p').clear()  # clear the password field
    driver.find_element_by_id('p').send_keys('your QQ password')  # enter your password
    driver.find_element_by_id('login_button').click()  # click "Login"
    driver.switch_to_default_content()  # leave the current frame; omitting this causes an error, because another frame is entered after login

Note driver.switch_to_default_content() in particular. It leaves the current frame, which is critical because you have to switch to another frame after logging in. If you leave it out, the later steps fail with an error.
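
One way to make it hard to forget the switch back is to wrap the frame switch in a context manager. The sketch below is an illustration, not part of the original script: the name inside_frame is made up, and it uses the same driver calls as the login code. Newer Selenium releases spell these driver.switch_to.frame(...) and driver.switch_to.default_content().

```python
from contextlib import contextmanager


@contextmanager
def inside_frame(driver, frame_name):
    """Switch into a frame and always switch back out afterwards."""
    driver.switch_to_frame(frame_name)
    try:
        yield driver
    finally:
        # Return to the top-level document so later frame switches work,
        # even if the code inside the block raised an exception.
        driver.switch_to_default_content()


# Intended use (needs a real Selenium driver):
# with inside_frame(driver, 'login_frame'):
#     driver.find_element_by_id('login_button').click()
```

The finally clause is the point of the design: the driver leaves the frame whether or not the login steps succeed.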

3. The third step breaks down into several parts:
(1) Check whether the space is access-restricted

    from selenium.common.exceptions import NoSuchElementException

    try:
        driver.find_element_by_id('QM_OwnerInfo_Icon')  # check whether the QQ Space is access-restricted
        b = True
    except NoSuchElementException:
        b = False
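
The same check can be written without try/except, since the plural find_elements_* calls return an empty list when nothing matches. This is a sketch under assumptions: the function name space_is_accessible is made up, and marker_id should be the element id that the check above looks for. Newer Selenium releases spell the call driver.find_elements(By.ID, marker_id).

```python
def space_is_accessible(driver, marker_id):
    """Return True if an element with the given id is present on the page."""
    # find_elements_by_id returns [] instead of raising when nothing matches.
    return len(driver.find_elements_by_id(marker_id)) > 0
```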

(2) Switch to the frame that contains the posts. You can find it by inspecting the page.

(3) Scroll down
Scrolling down brings the 'Next page' link into view so that it can be clicked. Take care to scroll in the frame that owns the scroll bar, not in the frame the posts are scraped from.

    import time

    # scroll down 4 times to make sure the bottom is reached
    for _ in range(1, 5):
        driver.execute_script("window.scrollBy(0,5000)")
        time.sleep(2)
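
The scrolling loop can be pulled into a small helper so the number of scrolls and the pause are easy to tune. This is only a sketch: the name scroll_down and its parameters are illustrative, with defaults taken from the values above.

```python
import time


def scroll_down(driver, times=4, pixels=5000, pause=2.0):
    """Scroll the current page down `times` times, pausing after each scroll."""
    for _ in range(times):
        driver.execute_script("window.scrollBy(0, {})".format(pixels))
        time.sleep(pause)  # give newly revealed content time to load
```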

(4) Scrape the post data. This part is simple: I use XPath to get the text of each post. If you are interested, you can also collect the timestamps and other fields.

    from lxml import etree

    selector = etree.HTML(driver.page_source)
    title = selector.xpath('//li/div/div/pre/text()')
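
To see what that path selects, here is a small self-contained sketch on made-up markup. It swaps in the standard library's ElementTree as a stand-in for lxml so that it runs without extra packages. ElementTree only accepts well-formed XML, so for real page source lxml's HTML parser remains the better fit. The sample markup and the name extract_posts are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up markup that mimics the li/div/div/pre nesting.
SAMPLE = """<ol>
  <li><div><div><pre>First post</pre></div></div></li>
  <li><div><div><pre>Second post</pre></div></div></li>
  <li><div><span>not a post</span></div></li>
</ol>"""


def extract_posts(markup):
    """Return the text of every li/div/div/pre element in the markup."""
    root = ET.fromstring(markup)
    return [pre.text for pre in root.findall('.//li/div/div/pre') if pre.text]


print(extract_posts(SAMPLE))  # ['First post', 'Second post']
```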

(5) Turn the page
Just click 'Next page'.

    driver.find_element_by_link_text(u'下一页').click()  # the link text means "next page"
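
Putting scraping and paging together, the crawl can be sketched as a loop that stops when no 'Next page' link is left. The names crawl_all_pages and scrape_page and the max_pages safety limit are all hypothetical, and next_text should be the link text used above. The sketch assumes the link disappears on the last page, and a real crawl would also need to scroll and wait between pages.

```python
def crawl_all_pages(driver, scrape_page, next_text, max_pages=1000):
    """Scrape the current page, then follow the next-page link until it is gone."""
    results = []
    for _ in range(max_pages):
        results.extend(scrape_page(driver))
        # find_elements_by_link_text returns [] when there is no such link.
        links = driver.find_elements_by_link_text(next_text)
        if not links:
            break  # no next-page link: this was the last page
        links[0].click()
    return results
```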

(6) Write the data to the TXT file. Nothing special here: each scraped post is written out directly.

    import os

    for i in title:
        if not os.path.exists(file):
            print('Create txt Success')

        # 'a+' creates the file if needed; the with block closes it
        with open(file, 'a+') as f:
            f.write(i + '\n')
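
A slightly tidier variant opens the file once per page instead of once per post. This is a sketch, not the original code: the name append_titles is made up, and the explicit encoding parameter is an assumption, defaulting to 'gbk' so that it matches the encoding='GBK' used when the file is read back for the word cloud.

```python
def append_titles(path, titles, encoding='gbk'):
    """Append each title to the file at `path`, one per line."""
    # Mode 'a' creates the file if it is missing and appends otherwise.
    with open(path, 'a', encoding=encoding) as f:
        for title in titles:
            f.write(title + '\n')
    # The with block closes the file, so no explicit close() is needed.
```

Posts containing characters that GBK cannot encode would raise an error here, so a different encoding may be a better choice as long as the reading step uses the same one.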

4. Generate the word cloud. This is the standard approach. For details, see my earlier article or search online.

    import jieba
    import matplotlib.pyplot as plt
    from wordcloud import WordCloud

    def get_wordcloud(file):
        f = open(file, 'r', encoding='gbk').read()

        # segment with jieba and join the tokens into one string;
        # WordCloud cannot build a correct Chinese word cloud directly
        cut_text = " ".join(jieba.cut(f))

        wordcloud = WordCloud(
            # set the font, otherwise the characters come out garbled; this is
            # the usual system font path and can be replaced by another font
            font_path="C:/windows/fonts/simfang.ttf",
            # set the background color, width, and height
            background_color="white", width=2000, height=1380).generate(cut_text)

        plt.imshow(wordcloud, interpolation="bilinear")
        plt.axis("off")
        plt.show()
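
A word cloud draws frequent words larger, which is why the segmented tokens are joined with spaces first: each space-separated token can then be counted as its own word. The sketch below illustrates only that counting idea with the standard library. It is not how the wordcloud package is implemented, and the sample text is made up.

```python
from collections import Counter


def word_frequencies(cut_text):
    """Count how often each space-separated token occurs."""
    return Counter(cut_text.split())


sample = "python 爬虫 python 词云 python 爬虫"
print(word_frequencies(sample).most_common(2))  # [('python', 3), ('爬虫', 2)]
```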

Because of time constraints, this article only supports entering a single friend's QQ number. To crawl the posts of all your QQ friends, you can first get all of their QQ numbers from QQ Mail, put them in a list, and then crawl them one by one.
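
A minimal sketch of that extension, assuming the friends' QQ numbers are already in a list: build one URL and one output file path per friend, then run the crawl for each pair. The function name crawl_targets, the out_dir parameter, and the sample numbers are hypothetical. The URL and file name patterns are the ones used earlier in the article.

```python
def crawl_targets(qq_numbers, out_dir='C:/users/administrator/desktop'):
    """Return a (url, txt_path) pair for every friend's QQ number."""
    targets = []
    for qq in qq_numbers:
        url = 'https://user.qzone.qq.com/{}/311'.format(qq)
        path = '{}/{}.txt'.format(out_dir, qq)
        targets.append((url, path))
    return targets


# Hypothetical QQ numbers, used only to show the output shape.
for url, path in crawl_targets(['10001', '10002']):
    print(url, path)
```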
