For my information retrieval course project, I first needed to write a crawler for Weibo.
After spending the afternoon reading the Weibo API documentation, I decided in the evening to drive Chromium with Selenium instead.
First install Selenium with pip, following the official documentation:
pip install selenium
Then you need to install the browser driver, which controls the browser much as its debug mode would.
I installed it with Chocolatey from PowerShell:
choco install selenium-all-drivers
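As a quick sanity check after the install, you can ask Python whether a driver executable is visible on PATH. This is only a sketch: driver_on_path is a hypothetical helper name, and it assumes the Chrome driver executable is called chromedriver, which is the usual name but worth confirming on your own machine.

```python
import shutil


def driver_on_path(name="chromedriver"):
    """Return the full path of the driver executable, or None if it is not on PATH."""
    # shutil.which does the same PATH lookup the shell would do
    # (on Windows it also tries the extensions listed in PATHEXT, such as .exe).
    return shutil.which(name)


if __name__ == "__main__":
    path = driver_on_path()
    if path is None:
        print("chromedriver not found on PATH")
    else:
        print("chromedriver found at", path)
```

If the lookup comes back empty, opening a new PowerShell window so that it picks up the updated PATH is a reasonable first thing to try.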
With that done, it is time to start writing code.
from selenium import webdriver
from selenium.webdriver.common.by import By
from time import sleep

browser = webdriver.Chrome()
try:
    print("Open the browser ...")
    browser.get(r'http://weibo.com')
    print(browser.title)
except Exception:
    print('Browser open failed ...')

# Give the JS-driven page time to finish loading.
sleep(5)

href_li = []
try:
    print("Select href links ...")
    # find_elements(By.CSS_SELECTOR, ...) replaces find_elements_by_css_selector,
    # which newer Selenium 4 releases no longer provide.
    href_li = browser.find_elements(By.CSS_SELECTOR, 'a')
    print("Total links:", len(href_li))
except Exception:
    print('No tag named <a>')

print("For all the links")
for href_element in href_li:
    print(href_element.text)
The Weibo home page is loaded dynamically by JavaScript. I had intended to browse as a guest without logging in, but a direct visit first sits on a blank visitor-verification page for a few seconds, so parsing the page right away finds nothing. After sleep(5), the elements you want can be found.
Since I would have to wait like this for everything, I decided to use the Weibo API after all ...
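A fixed sleep(5) either wastes time or is too short, so a common alternative is to poll until the page has what you need. Selenium ships WebDriverWait for this purpose. The sketch below uses a hypothetical stand-in helper, wait_until, written in plain Python so that it runs without a browser; the timeout and poll values are arbitrary assumptions, not tuned for Weibo.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds have passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll)


if __name__ == "__main__":
    # Stand-in for a page whose links only appear on the third check.
    attempts = []

    def page_ready():
        attempts.append(1)
        return ["<a>"] if len(attempts) >= 3 else []

    print(wait_until(page_ready, timeout=2, poll=0.01))

# Hypothetical usage with a real browser object
# (needs: from selenium.webdriver.common.by import By):
#   links = wait_until(lambda: browser.find_elements(By.CSS_SELECTOR, 'a'))
```

Because find_elements returns an empty list when nothing matches, the lambda is falsy until at least one link exists, and the helper returns as soon as the page is ready instead of always waiting the full five seconds.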
Pitfalls with Python Selenium