Objective
Look first:
Approach
1. Determine the URL to visit
2. Log in to your own QQ account through a simulated login
3. Check whether the friend's space is access-restricted, switch to the frame that holds the posts ("shuoshuo"), scrape the current page, scroll down, turn the page, and keep writing the scraped content to a local TXT file
4. After the last page has been scraped, read the TXT file and generate a word cloud
Detailed walkthrough
1. Determine the URL to visit
This is simple. By inspecting the site, we find that the posts page of a friend's QQ space has the URL:
https://user.qzone.qq.com/{friend QQ Number}/311
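As a small illustration, the URL pattern above can be filled in with a helper like the one below. The function name `shuoshuo_url` and the digits-only check are assumptions made for this sketch; they are not part of the original script.

```python
def shuoshuo_url(qq):
    """Return the posts-page URL for a friend's QQ number (hypothetical helper)."""
    qq = str(qq).strip()
    if not qq.isdigit():
        # QQ numbers are numeric, so anything else is rejected early
        raise ValueError("QQ number must contain digits only: {!r}".format(qq))
    return 'https://user.qzone.qq.com/{}/311'.format(qq)


print(shuoshuo_url(123456789))  # https://user.qzone.qq.com/123456789/311
```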
2. Requesting that URL brings up a login prompt, so we simulate the login: Selenium drives a browser, signs in with your own QQ account, and then opens the friend's QQ space
Here is the code for the simulated login:
    from selenium import webdriver

    # qq holds the friend's QQ number
    file = 'C:/users/administrator/desktop/{}.txt'.format(qq)
    driver = webdriver.Firefox()
    driver.maximize_window()  # maximize the window

    driver.get('https://user.qzone.qq.com/{}/311'.format(qq))  # URL
    driver.implicitly_wait(10)  # implicit wait, so the page can load fully
    driver.find_element_by_id('login_div')
    driver.switch_to_frame('login_frame')  # switch to the frame with the account/password form
    driver.find_element_by_id('switcher_plogin').click()  # click 'account password login'
    driver.find_element_by_id('u').clear()  # clear the account field
    driver.find_element_by_id('u').send_keys('your QQ account')  # enter your account
    driver.find_element_by_id('p').clear()  # clear the password field
    driver.find_element_by_id('p').send_keys('your QQ password')  # enter your password
    driver.find_element_by_id('login_button').click()  # click 'Login'
    # leave the current frame; this step is critical, because after logging in
    # you have to switch to another frame, and omitting it causes an error
    driver.switch_to_default_content()
The call to emphasize is driver.switch_to_default_content(), which leaves the current frame. It is critical because you have to switch to another frame after logging in. If you omit it, the following steps raise an error.
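The `find_element_by_*` and `switch_to_*` helper methods used above come from older Selenium releases; recent Selenium 4 releases no longer provide them. A minimal sketch of the same login sequence in the Selenium 4 style follows, assuming the same login elements as the code above. The function name `login` is an assumption for illustration, and `driver` is expected to be a WebDriver instance such as `webdriver.Firefox()`.

```python
def login(driver, account, password):
    """Log in through the Qzone login frame using Selenium 4 style calls.

    Hypothetical helper: `driver` must already have loaded the page
    that shows the login frame.
    """
    # "id" is the value of selenium.webdriver.common.by.By.ID;
    # the literal is used so this sketch has no import-time dependency
    by_id = "id"

    driver.switch_to.frame('login_frame')                   # enter the login frame
    driver.find_element(by_id, 'switcher_plogin').click()   # account/password login
    user_box = driver.find_element(by_id, 'u')
    user_box.clear()
    user_box.send_keys(account)
    pass_box = driver.find_element(by_id, 'p')
    pass_box.clear()
    pass_box.send_keys(password)
    driver.find_element(by_id, 'login_button').click()
    driver.switch_to.default_content()                      # leave the login frame again
```

The final `switch_to.default_content()` plays the same role as `switch_to_default_content()` in the older API: without it, later lookups would still be scoped to the login frame.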
3. The third part has several points:
(1) Check whether access to the space is restricted
    from selenium.common.exceptions import NoSuchElementException

    try:
        # look for this element to tell whether access to the QQ space is restricted
        driver.find_element_by_id('QM_OwnerInfo_Icon')
        b = True
    except NoSuchElementException:
        b = False
(2) Switch to the frame that contains the posts. It is easy to find by inspecting the page.
(3) Scroll down
Scrolling down brings the 'Next page' link into the visible part of the page so that it can be clicked. One thing to watch when scrolling:
make sure you are in the right frame. The scrolling is done on the outer page, not inside the frame the posts are scraped from.
    import time

    # scroll down 4 times to make sure the bottom of the page is reached
    for _ in range(1, 5):
        driver.execute_script("window.scrollBy(0,5000)")
        time.sleep(2)
(4) Scrape the post data. This is simple: I used XPath to get the text of each post. Interested readers can extract the timestamp and other fields in the same way.
    from lxml import etree

    selector = etree.HTML(driver.page_source)
    title = selector.xpath('//li/div/div/pre/text()')
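To see what the XPath `//li/div/div/pre/text()` selects, here is a sketch on a made-up fragment. The markup is a hypothetical stand-in for the real page, and the snippet swaps lxml for the standard library's `xml.etree.ElementTree` so it has no third-party dependency. That module only handles well-formed markup and a subset of XPath, which is why it uses `.//` and the `.text` attribute instead of `//` and `text()`.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the posts list (not the real Qzone markup)
sample = """
<ol>
  <li><div><div><pre>First post</pre></div></div></li>
  <li><div><div><pre>Second post</pre></div></div></li>
  <li><div><span>not matched: no div/div/pre chain</span></div></li>
</ol>
"""

root = ET.fromstring(sample)
# ElementTree's XPath subset has no text() step, so select the <pre>
# elements and read their .text attribute instead
titles = [pre.text for pre in root.findall('.//li/div/div/pre')]
print(titles)  # ['First post', 'Second post']
```

Only `<pre>` elements that sit exactly two `<div>` levels below an `<li>` are matched, which is what the path in the scraping code expresses.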
(5) Turn the page
Just click 'Next page'.
    # the link text on the page is the Chinese for 'Next page'
    driver.find_element_by_link_text(u'下一页').click()
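(6) Write the data to the TXT file. Nothing more to say here: the scraped post text is written straight to the file
    import os

    if not os.path.exists(file):
        print('Create txt Success')  # the file itself is created by open() below

    # write as GBK, the encoding the word cloud step reads the file with
    with open(file, 'a+', encoding='GBK') as f:
        for i in title:
            f.write(i + '\n')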
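Steps (3) to (6) repeat for every page until no 'Next page' link is left. Below is a minimal sketch of that loop, with the Selenium-specific actions passed in as functions so the control flow can be tried without a browser. The names `crawl_all_pages`, `scrape_page`, `go_next`, and the `max_pages` safety cap are assumptions for illustration and do not come from the original script.

```python
def crawl_all_pages(scrape_page, go_next, out_path, max_pages=1000):
    """Scrape page after page, appending every post to out_path.

    scrape_page() -> list of post strings on the current page
    go_next()     -> True if the next page was opened, False on the last page
    Returns the number of pages visited.
    """
    pages = 0
    # GBK matches the encoding the word cloud step reads the file with
    with open(out_path, 'a', encoding='GBK') as f:
        while pages < max_pages:
            for post in scrape_page():
                f.write(post + '\n')
            pages += 1
            if not go_next():
                break  # no 'Next page' link: this was the last page
    return pages
```

With Selenium, `scrape_page` would wrap the scrolling and XPath steps, and `go_next` would click the 'Next page' link and return False when the link cannot be found.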
4. Generate the word cloud. This is just the standard approach; for the details, see my previous article or search online
    import jieba
    import matplotlib.pyplot as plt
    from wordcloud import WordCloud

    def get_wordcloud(file):
        with open(file, 'r', encoding='GBK') as fp:
            f = fp.read()

        # segment with jieba and join into one string; WordCloud cannot
        # generate a correct Chinese word cloud from unsegmented text
        cut_text = " ".join(jieba.cut(f))

        wordcloud = WordCloud(
            # set the font, otherwise the characters come out garbled; this is
            # the usual system font path and can be replaced by another font
            font_path="C:/windows/fonts/simfang.ttf",
            # set the background color, width and height
            background_color="white", width=2000, height=1380).generate(cut_text)

        plt.imshow(wordcloud, interpolation="bilinear")
        plt.axis("off")
        plt.show()
Because of limited time, this article only supports entering a single friend's QQ number. If you want to crawl the posts of all your QQ friends, you can first get all of your friends' QQ numbers from QQ Mail, put them in a list, and loop over it.
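A sketch of that extension: given a list of friends' QQ numbers, however it was obtained, derive one URL and one output file per friend and process them in turn. The helper name `crawl_jobs`, the sample numbers, and the output directory are hypothetical; the actual crawl for each friend would be the steps described above.

```python
import os


def crawl_jobs(qq_numbers, out_dir):
    """Yield (qq, url, txt_path) for each distinct QQ number, in input order."""
    seen = set()
    for qq in qq_numbers:
        qq = str(qq).strip()
        if not qq or qq in seen:
            continue  # skip blanks and duplicates
        seen.add(qq)
        url = 'https://user.qzone.qq.com/{}/311'.format(qq)
        yield qq, url, os.path.join(out_dir, '{}.txt'.format(qq))


# Made-up numbers, for illustration only
for qq, url, path in crawl_jobs(['10001', '10002', '10001'], 'output'):
    print(qq, url, path)
```

Keeping one TXT file per friend means the word cloud step can be run on each file separately.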
Python: crawling friends' QQ space posts and generating word clouds (in detail)