A simple introductory crawler that scrapes the jokes on the Qiushibaike (Embarrassing Encyclopedia) homepage (text only, no images)
- Selenium and ChromeDriver need to be installed.
- Place chromedriver.exe in the Chrome installation directory.
- Configure the environment variable: right-click My Computer, then Properties -> Advanced system settings -> Environment Variables -> Path -> New, and add the Chrome installation location (mine, for example, is C:\Program Files (x86)\Google\Chrome\Application).
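Before launching the browser, it can help to confirm that the Path change actually took effect. The following is a minimal sketch using only the standard library; the helper name `find_on_path` is hypothetical and not part of Selenium.

```python
import shutil


def find_on_path(executable="chromedriver"):
    """Return the full path of `executable` if PATH can resolve it, else None.

    On Windows, shutil.which also tries the extensions listed in PATHEXT,
    so "chromedriver" matches chromedriver.exe.
    """
    return shutil.which(executable)


if __name__ == "__main__":
    location = find_on_path()
    if location is None:
        print("chromedriver was not found on PATH; recheck the environment variable.")
    else:
        print("chromedriver found at: " + location)
```

If the script reports that chromedriver was not found, open a new terminal first: an already open console does not pick up a changed Path.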
#!/usr/bin/env python
# coding: utf-8

# Import Selenium
from selenium import webdriver


class Qiubai:
    def __init__(self):
        # Open the Chrome browser
        self.dr = webdriver.Chrome()
        # Visit the Embarrassing Encyclopedia homepage
        self.dr.get('https://www.qiushibaike.com/')

    def print_content(self):
        # Note: the find_element_by_* helpers used here were removed in
        # Selenium 4.3; newer versions use find_element(By.ID, ...) instead.
        # Get the element with ID "content-left"
        main_content = self.dr.find_element_by_id('content-left')
        # Get the elements with class "content"
        contents = main_content.find_elements_by_class_name('content')
        # Print each joke, numbered, using a for loop
        i = 1
        for content in contents:
            print(str(i) + "." + content.text + '\n')
            i += 1
        self.quit()

    def quit(self):
        # Close the browser
        self.dr.quit()


Qiubai().print_content()
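The same flow can be sketched for Selenium 4, where elements are looked up with `find_element(by, value)` rather than the `find_element_by_*` helpers. This is an illustrative variant, not the original author's code: the function names `collect_jokes`, `number_jokes` and `main` are hypothetical, and it assumes the page still exposes the `content-left` ID and `content` class used above.

```python
def collect_jokes(driver, container_id="content-left", item_class="content"):
    """Return the text of every joke inside the main content container."""
    # "id" and "class name" are the string values of By.ID and By.CLASS_NAME
    # in selenium.webdriver.common.by; plain strings avoid an extra import.
    container = driver.find_element("id", container_id)
    items = container.find_elements("class name", item_class)
    return [item.text for item in items]


def number_jokes(texts):
    """Prefix each joke with its 1-based position, e.g. '1.first joke'."""
    return [f"{i}.{text}" for i, text in enumerate(texts, start=1)]


def main():
    # Imported here so the helpers above can be reused without a browser.
    from selenium import webdriver

    driver = webdriver.Chrome()
    try:
        driver.get("https://www.qiushibaike.com/")
        for line in number_jokes(collect_jokes(driver)):
            print(line + "\n")
    finally:
        # Close the browser even if a lookup fails.
        driver.quit()
```

Calling `main()` runs the crawl. Separating the lookup from the formatting keeps the numbering logic checkable without opening a browser, and the `try`/`finally` ensures the Chrome window is closed even when an element is missing.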