I wanted to read the novel "Ghost Blows Out the Light: The Ancient City of Jingjue", but the web version is full of ads, splits the story across many pages, and blocks copying. So I wrote a small crawler that saves the text to a Word document I can read at my own pace.
The code is as follows:
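    """Scrape the novel "Ghost Blows Out the Light" into a Word document."""
    import os

    from docx import Document
    from selenium import webdriver
    from selenium.common.exceptions import WebDriverException
    from selenium.webdriver.common.by import By


    class downloadfiles():
        def __init__(self):
            self.baseurl = 'http://www.luoxia.com/guichui/'
            # absolute path of the folder that holds this script
            self.basepath = os.path.dirname(os.path.abspath(__file__))
            # PhantomJS support was removed in Selenium 4, so use headless Chrome
            options = webdriver.ChromeOptions()
            options.add_argument('--headless=new')
            # one browser for all pages instead of launching a new one per chapter
            self.driver = webdriver.Chrome(options=options)

        def makedir(self, name):
            path = os.path.join(self.basepath, name)
            if os.path.exists(path):
                print('The folder already exists.')
            else:
                os.makedirs(path)
                print('The folder has been created.')
            # switch to this directory so the Word file is saved inside it
            os.chdir(path)

        def connect(self, url):
            """Load url in the browser; return False if it could not be loaded."""
            try:
                self.driver.get(url)
                print(url)
                return True
            except WebDriverException:
                print('Could not load: ' + url)
                return False

        def getcontent(self):
            doc = Document()
            self.makedir('storyfiles')
            try:
                # chapter pages 27426.htm ... 27460.htm (range() excludes 27461)
                for page in range(27426, 27461):
                    print('The page number is: ' + str(page))
                    url = self.baseurl + str(page) + '.htm'
                    if not self.connect(url):
                        continue
                    # each paragraph of chapter text is a <p> directly inside <article>
                    for r in self.driver.find_elements(By.XPATH, '//article/p'):
                        print(r.text)
                        doc.add_paragraph(r.text)
            finally:
                # always shut the browser down, even if a page fails
                self.driver.quit()
            # python-docx writes the .docx format, so use that extension
            doc.save('guichuideng.docx')


    if __name__ == '__main__':
        obj = downloadfiles()
        obj.getcontent()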
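The script above starts a real browser for every chapter. If the chapter text is already present in each page's raw HTML (an assumption the post does not confirm; if the site builds the text with JavaScript, this approach returns nothing and the browser is required), a lighter option is a plain HTTP fetch plus a standard-library parser that picks out the same elements the XPath `//article/p` selects. This is a minimal sketch, not the author's method: `ArticleParagraphs`, `extract_paragraphs`, `chapter_urls` and `fetch` are hypothetical helpers, and the site may also reject non-browser requests or have changed since the post was written.

```python
from html.parser import HTMLParser
from urllib.request import urlopen

# HTML elements that never have a closing tag
VOID_TAGS = {'area', 'base', 'br', 'col', 'embed', 'hr', 'img',
             'input', 'link', 'meta', 'source', 'track', 'wbr'}


class ArticleParagraphs(HTMLParser):
    """Hypothetical helper: collect the text of every <p> that is a direct
    child of an <article>, i.e. what the XPath //article/p selects."""

    def __init__(self):
        super().__init__()
        self.stack = []        # currently open tags, innermost last
        self.paragraphs = []   # text of each finished chapter paragraph
        self.current = None    # text pieces of the <p> being read, or None
        self.p_depth = 0       # len(stack) at the moment that <p> was opened

    def handle_starttag(self, tag, attrs):
        if tag == 'br' and self.current is not None:
            self.current.append('\n')  # keep line breaks inside a paragraph
        if tag in VOID_TAGS:
            return
        if tag == 'p' and self.stack and self.stack[-1] == 'p':
            self.handle_endtag('p')  # a new <p> implicitly closes an open one
        if (tag == 'p' and self.stack and self.stack[-1] == 'article'
                and self.current is None):
            self.current = []
            self.p_depth = len(self.stack)
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray closing tag: ignore it
        # pop up to and including the most recent matching open tag
        while self.stack:
            popped = self.stack.pop()
            if self.current is not None and len(self.stack) == self.p_depth:
                # the captured <p> has just been closed
                self.paragraphs.append(''.join(self.current).strip())
                self.current = None
            if popped == tag:
                break

    def handle_data(self, data):
        if self.current is not None:
            self.current.append(data)


def extract_paragraphs(html):
    """Return the text of each //article/p element in an HTML string."""
    parser = ArticleParagraphs()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


def chapter_urls(base='http://www.luoxia.com/guichui/', first=27426, last=27460):
    # range() stops before its second argument, so add 1 to include `last`
    return [base + str(n) + '.htm' for n in range(first, last + 1)]


def fetch(url):
    """Download a page and decode it using the charset the server declares."""
    with urlopen(url, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or 'utf-8'
        return resp.read().decode(charset, errors='replace')
```

Calling `extract_paragraphs(fetch(url))` for each `url` in `chapter_urls()` yields the paragraph strings, which can then go to `doc.add_paragraph()` just as in the Selenium version. The trade-off is that no browser or driver needs to be installed and pages load faster, but any text that only appears after JavaScript runs is missed.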
Python crawler: scraping the novel "Ghost Blows Out the Light: The Ancient City of Jingjue"