A Python script to download the online novels of Nanpai Sanshu (author of Daomu Biji, "The Grave Robbers' Chronicles") for offline reading

Source: Internet
Author: User

I've been busy reading novels lately, and I found a site that hosts the complete works of Nanpai Sanshu, so I decided to download them to read offline. I got hands-on, and with a lot of help from the experts in various QQ groups (my skills are very weak; the more complex parts were written under their guidance), I spent three or four days putting together this script.

It requires two libraries: BeautifulSoup and requests.
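If you don't have them yet, both can typically be installed with pip (the PyPI package that provides BeautifulSoup 4 is called beautifulsoup4):

pip install beautifulsoup4 requests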

(I've made the comments as detailed as I could.)

The script runs very slowly; could an expert tell me how to optimize it?!

#-*-coding:utf8-*-
from bs4 import BeautifulSoup
import requests
import re
import os

# Open the front page and collect the URLs we need into a list
r = requests.get('http://www.nanpaisanshu.org/').content  # fetch the page to read
content = BeautifulSoup(r, 'html.parser').findAll('a', href=re.compile(r'\Ahttp://www.nanpaisanshu.org/[a-z]+\Z'))  # find the links we need in the page
sc = str(content)  # convert to a string
lists = sc.split(',')
lists = list(set(lists))  # remove duplicate entries from the list
lisy = []
for line in lists:
    p = line.split('"')[1]  # split on '"' and take out the piece we need
    lisy.append(p)  # lisy now holds the URLs we need
    #print p
#print lisy

# Open every URL in turn and save each page as an HTML file
s = os.getcwd()  # current path
d = os.sep  # system path separator
namef = 'aaa'  # folder name
f = os.path.exists(s + d + namef)  # check whether the folder exists
if not f:
    os.mkdir(s + d + namef)  # create the folder if it does not exist yet
else:
    print u'already exists ' + namef
filenm = s + d + namef + d  # output path
i = 1
for line in lisy:
    r = requests.get(line)  # open every URL in turn
    print r.content
    print '\n'
    tfile = open(filenm + 'neirong' + str(i) + '.html', 'w')
    i = i + 1
    tfile.write(r.content)  # write the page content into the file
    tfile.close()

# Read each saved HTML file, pick out the matching chapter URLs and write them into a txt file
for i in range(1, len(lisy) + 1):
    fp = open(filenm + 'neirong' + str(i) + '.html', 'r')
    of = open(filenm + 'neirong' + str(i) + '.txt', 'w')
    content = fp.read()  # read the file content
    p = re.compile(r'http://www\.nanpaisanshu\.org/.*?\.html')  # regex for the chapter URLs
    #print p.findall(content)
    for line in p.findall(content):
        of.write(line + '\n')  # write each matched URL into the other file
    # close the files
    of.close()
    fp.close()

# Work through the txt files: fetch every chapter URL and append the text into one 'quanbu' file
for i in range(1, len(lisy) + 1):
    ot = open(filenm + 'neirong' + str(i) + '.txt', 'r')
    outfile = open(filenm + 'quanbu' + str(i) + '.txt', 'a+')
    li = []
    for line in ot:
        line = line.replace('\n', '')
        li.append(line)  # put the URLs from the file into a list
    li = sorted(li)  # sort the list
    for line in li:
        print line
        r = requests.get(line).content  # open every chapter URL in turn
        title = BeautifulSoup(r, 'html.parser').find('div', {'class': 'post_title'}).h2  # take out the title
        content = BeautifulSoup(r, 'html.parser').findAll('div', {'class': 'post_entry'})  # take out the chapter text
        sti = str(title).replace(

(The post breaks off mid-line here; judging from the loop, the remainder presumably stripped the HTML tags out of the title and content and wrote them into the quanbu file.)
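On the speed question: the script fetches pages strictly one at a time, so almost all of the runtime is spent waiting on the network. One common way to speed this kind of script up is to reuse a single requests.Session (so the TCP connection isn't re-opened for every page) and fetch several pages at once with a thread pool. This is only a minimal sketch of that idea, not part of the original script; fetch, fetch_all and the worker count are my own names and guesses:

# Thread-based Pool from the standard library; threads work well here
# because the job is network-bound, not CPU-bound.
from multiprocessing.dummy import Pool
import requests

session = requests.Session()  # reuse connections instead of reconnecting per page

def fetch(url):
    return url, session.get(url).content

def fetch_all(urls, workers=8):  # 'workers' is a guess; tune it to your bandwidth
    pool = Pool(workers)
    try:
        return dict(pool.map(fetch, urls))  # {url: html} for every page
    finally:
        pool.close()
        pool.join()

Something like fetch_all(lisy) would return the HTML for every URL at once, and the saving loop could then write the files out without waiting on the network between pages.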

Sometimes the connection fails and the program errors out, so I should really be checking that requests.get(url).status_code != 200. But after I added that check the script ran even slower, since every page gets checked. Sweat. The failures probably happen in the first place because my connection speed is only a few KB/s.
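For what it's worth, the status_code comparison itself is cheap; by the time it runs, the slow network round trip has already happened. What helps more on a slow, flaky connection is a timeout plus a small retry loop around each request. A minimal sketch of that idea; get_with_retries and its default values are my own assumptions, not from the original post:

import time
import requests

def get_with_retries(url, tries=3, timeout=10):
    # a timeout keeps one dead connection from hanging the whole script,
    # and a few retries paper over the occasional dropped connection
    for attempt in range(tries):
        try:
            r = requests.get(url, timeout=timeout)
            if r.status_code == 200:
                return r.content
        except requests.RequestException:
            pass
        time.sleep(2 ** attempt)  # back off a little longer after each failure
    return None  # the caller decides what to do with a page that never loaded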



