Collecting page elements with Selenium
PhantomJS is used here mainly to simulate logging in.
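Simulating the login mostly comes down to waiting for elements to appear. The script calls WebDriverWait(driver, 10).until(...) with a lambda that looks an element up by id. The sketch below is a simplified, stdlib-only stand-in for what such an explicit wait does; it is not Selenium's own code. It calls a condition repeatedly, treats a "not found" exception as "not ready yet", and gives up after a timeout. ElementNotFound and wait_until are hypothetical names. Selenium's real WebDriverWait polls every 0.5 s by default, ignores NoSuchElementException while polling, and raises TimeoutException.

```python
import time


class ElementNotFound(Exception):
    """Hypothetical stand-in for Selenium's NoSuchElementException."""


def wait_until(condition, timeout=10.0, poll=0.5, ignored=(ElementNotFound,)):
    """Call condition() until it returns a truthy value or the timeout expires.

    Exceptions listed in `ignored` mean "not ready yet" and are swallowed,
    similar in spirit to WebDriverWait.until().
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            value = condition()
            if value:
                return value
        except ignored:
            pass  # element not present yet; keep polling
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll)


if __name__ == "__main__":
    # Simulated page: the "login_area" element only appears on the third poll.
    calls = {"n": 0}

    def login_area_displayed():
        calls["n"] += 1
        if calls["n"] < 3:
            raise ElementNotFound("login_area")
        return True

    print(wait_until(login_area_displayed, timeout=2, poll=0.01))  # True
    print(calls["n"])  # 3
```

Polling against a deadline usually beats a fixed time.sleep(). A fixed sleep either wastes time on a fast page or is too short on a slow one.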
There's not much more to explain, so here is the code.
from selenium import webdriver
import selenium.webdriver.support.ui as ui
import time


# Python 2 script (print statements) for an older Selenium release
# that still ships the PhantomJS driver.
def crawl_cnblogs(blog_url, username, pwd):
    driver = webdriver.PhantomJS()
    # Login page; ReturnUrl is the page to redirect to after logging in
    driver.get("http://passport.cnblogs.com/user/signin?ReturnUrl=http%3A%2F%2Fwww.cnblogs.com%2F")
    wait = ui.WebDriverWait(driver, 10)
    # Wait until the login button is visible before filling in the form
    wait.until(lambda dr: dr.find_element_by_id('signin').is_displayed())
    driver.find_element_by_id("input1").send_keys(username)
    driver.find_element_by_id("input2").send_keys(pwd)
    driver.find_element_by_id("signin").click()
    wait.until(lambda dr: dr.find_element_by_id('login_area').is_displayed())
    # Login succeeded
    driver.get(blog_url)
    wait.until(lambda dr: dr.find_element_by_id('mainContent').is_displayed())
    time.sleep(3)
    # articles = driver.find_element_by_xpath('//div[@class="postTitle"]/a')  # why doesn't this work?
    articles = driver.find_elements_by_class_name("postTitle")
    for article in articles:
        print article
        # print article.text
        # print article.text.decode("utf-8", "ignore")  # this line surprisingly failed in my Scrapy crawler for car listings
    urls = driver.find_elements_by_class_name("postTitle2")
    for url in urls:
        print url.get_attribute("href")
    driver.save_screenshot('screen.png')
    driver.quit()


if __name__ == '__main__':
    crawl_cnblogs("http://www.cnblogs.com/xiaoyy3/", "xiaoyaoyou3", "------password---------")
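The commented-out XPath line asks why it doesn't work. One likely reason is a single-versus-list mix-up. find_element_by_xpath returns a single WebElement, so the following `for article in articles` loop raises a TypeError. The plural find_elements_by_xpath returns a list.

Python's xml.etree.ElementTree has the same distinction (find versus findall) and supports the [@class='...'] predicate, so the difference can be shown without a browser. The markup below is a hypothetical, simplified version of a post list (a postTitle div wrapping a postTitle2 link) with placeholder URLs. Real pages are not guaranteed to be well-formed XML, so this is only an illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified post-list markup (well-formed so ElementTree can parse it).
SAMPLE = """
<div id="mainContent">
  <div class="postTitle">
    <a class="postTitle2" href="https://example.com/post-1.html">First post</a>
  </div>
  <div class="postTitle">
    <a class="postTitle2" href="https://example.com/post-2.html">Second post</a>
  </div>
</div>
""".strip()

root = ET.fromstring(SAMPLE)
xpath = ".//div[@class='postTitle']/a"

first = root.find(xpath)     # one Element, analogous to find_element_by_xpath
links = root.findall(xpath)  # a list, analogous to find_elements_by_xpath

print(first.text)                      # First post
print([a.get("href") for a in links])  # both placeholder URLs
print([a.text for a in links])         # ['First post', 'Second post']
```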
Run results
This produced an encoding error, so the print line has to be changed to print article.text.encode('GB18030').
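The error most likely comes from Python 2 on a Chinese-locale Windows console. There, print encodes unicode text with the console code page (typically GBK/cp936), and any character GBK cannot represent raises UnicodeEncodeError. GB18030 is a superset of GBK that can encode every Unicode character, so encoding to it first avoids the exception.

Below is a minimal Python 3 sketch of the same idea. The emoji-containing title is a made-up example, and console_safe is a hypothetical helper that replaces unencodable characters with '?' instead of raising.

```python
def console_safe(text, encoding="gbk"):
    """Return text that a console using `encoding` can display.

    Characters the encoding cannot represent become '?' instead of
    raising UnicodeEncodeError.
    """
    return text.encode(encoding, errors="replace").decode(encoding)


# Hypothetical post title: CJK characters plus an emoji.
title = "Selenium \u6293\u53d6 \U0001F600"

# GBK (the usual Chinese Windows console encoding) cannot encode the emoji.
try:
    title.encode("gbk")
except UnicodeEncodeError as exc:
    print("gbk failed:", exc.reason)

# GB18030 covers all of Unicode, so it encodes and round-trips cleanly.
raw = title.encode("gb18030")
print(raw.decode("gb18030") == title)  # True

safe = console_safe(title)
print(safe.endswith("?"))  # True: the emoji was replaced
```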
The result after the change:
Selenium + PhantomJS: automated login and blog post crawling