A Python script to download the online novel Tomb Raider Notes (Daomu Biji)

Last Update: 2014-10-14   Source: Internet

Author: User


Recently I've been busy reading novels and found a website that has all of Nanpai Sanshu's novels, so I decided to download them to read. I got a lot of help from experts in a QQ group (my own coding is terrible; the complicated parts of the program came from their guidance), and it took me three or four days to write this script.

It requires two libraries: BeautifulSoup (bs4) and requests.

(I've made the comments as detailed as possible.)

The script runs very slowly. Could any experts tell me how to optimize it?
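The following is a sketch of one common answer, not something taken from the original post. Most of the run time in a script like this is spent waiting on `requests.get` calls made one after another. Fetching pages in a few parallel threads and reusing HTTP connections usually helps more than anything else. The helper names (`make_session_fetcher`, `download_all`) and the default of 8 worker threads are assumptions. Keep the worker count small so the site is not hammered.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable

Fetch = Callable[[str], bytes]


def make_session_fetcher(timeout: float = 10.0) -> Fetch:
    """Return a hypothetical fetch(url) helper that reuses HTTP connections.

    Each worker thread gets its own requests.Session, so keep-alive
    connections are reused without sharing one Session between threads.
    """
    import requests  # imported lazily; download_all itself does not need it

    local = threading.local()

    def fetch(url: str) -> bytes:
        session = getattr(local, "session", None)
        if session is None:
            session = local.session = requests.Session()
        resp = session.get(url, timeout=timeout)
        resp.raise_for_status()  # turn HTTP errors (404, 500, ...) into exceptions
        return resp.content

    return fetch


def download_all(urls: Iterable[str], fetch: Fetch,
                 max_workers: int = 8) -> Dict[str, bytes]:
    """Fetch all URLs concurrently and return {url: page bytes}."""
    unique = sorted(set(urls))  # drop duplicates, keep a stable order
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() runs fetch in worker threads and yields results in input order
        pages = list(pool.map(fetch, unique))
    return dict(zip(unique, pages))
```

Used on the URL list the script builds, this would look like `pages = download_all(lisy, make_session_fetcher())`. Each value can then be written to its own file as before. Passing `fetch` in as a parameter also makes the concurrency logic easy to test without touching the network.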

    # -*- coding: utf-8 -*-
    import os
    import re

    import requests
    from bs4 import BeautifulSoup

    # Open the index page, find the URLs we need and put them in a list
    r = requests.get('http://www.nanpaisanshu.org/').content  # fetch the page to read
    links = BeautifulSoup(r, 'html.parser').find_all(
        'a', href=re.compile(r'\Ahttp://www.nanpaisanshu.org/[a-z]+\Z'))  # find the needed links
    lisy = list({a['href'] for a in links})  # take each link's href; the set removes duplicates

    # Open every URL found above and save each page as an HTML file
    namef = 'aaa'  # folder name
    filenm = os.path.join(os.getcwd(), namef)  # folder under the current path
    if os.path.exists(filenm):
        print('already exists: ' + namef)
    else:
        os.mkdir(filenm)  # create the folder if it does not exist
    for i, url in enumerate(lisy, start=1):
        r = requests.get(url)  # open each URL in turn
        print(r.text)
        with open(os.path.join(filenm, 'neirong%d.html' % i), 'wb') as tfile:
            tfile.write(r.content)  # write the raw page bytes to the file

    # Find the chapter URLs in each saved HTML file and write them to a TXT file
    p = re.compile(r'http://www\.nanpaisanshu\.org/.*?\.html')  # regex for chapter URLs
    for i in range(1, len(lisy) + 1):
        # assumes the pages are UTF-8; undecodable bytes are replaced, not fatal
        with open(os.path.join(filenm, 'neirong%d.html' % i),
                  encoding='utf-8', errors='replace') as fp:
            content = fp.read()  # read the file content
        with open(os.path.join(filenm, 'neirong%d.txt' % i), 'w', encoding='utf-8') as of:
            for line in p.findall(content):
                of.write(line + '\n')  # write each matched URL on its own line

    # Download every chapter listed in the TXT files
    for i in range(1, len(lisy) + 1):
        with open(os.path.join(filenm, 'neirong%d.txt' % i), encoding='utf-8') as ot:
            li = sorted(line.strip() for line in ot)  # read the URLs into a list and sort it
        outfile = open(os.path.join(filenm, 'quanbu%d.txt' % i), 'a+', encoding='utf-8')
        for line in li:
            print(line)
            r = requests.get(line).content  # open each chapter URL
            soup = BeautifulSoup(r, 'html.parser')  # parse the page once, not twice
            title = soup.find('div', {'class': 'post_title'}).h2  # take out the title
            content = soup.find_all('div', {'class': 'post_entry'})  # take out the content
            # sti = str(title).replace(' ...   (the original script is cut off here)
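This is one possible refinement and is not part of the original script. The last stage orders the chapter URLs with `sorted(...)`, which compares them as strings. As a result, a URL ending in `10000.html` would sort before one ending in `9701.html`. If the chapter IDs on the site ever differ in number of digits, sorting by the numeric ID avoids that problem.

The sketch makes two assumptions:
- Chapter URLs end in `/<digits>.html`.
- A larger ID means a later chapter.

`sort_chapters` is a hypothetical helper name.

```python
import re
from typing import Iterable, List

# Assumption: chapter URLs end in "/<digits>.html", e.g. .../9701.html
CHAPTER_ID = re.compile(r"/(\d+)\.html$")


def sort_chapters(urls: Iterable[str]) -> List[str]:
    """Drop duplicate URLs and order the rest by numeric chapter ID."""

    def key(url: str):
        match = CHAPTER_ID.search(url)
        if match:
            return (0, int(match.group(1)), url)
        return (1, 0, url)  # URLs without a numeric ID go last, in string order

    return sorted(set(urls), key=key)
```

This helper could replace the `sorted(...)` call in the last stage. It also drops duplicate links, so the same chapter is not downloaded twice.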

Sometimes the connection fails and the program stops with an error. I should check whether `requests.get(url).status_code != 200`, but after adding that check the script ran even slower, since every page gets checked (sweat). The failures are probably because my connection only manages a few KB/s.
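Here is a hedged sketch of one way to handle these failures; it is not the author's code. It does three things:
- **Checks the status code on the response already in hand.** If the check was written as a separate `requests.get(url).status_code` call, every page would be downloaded twice, which alone could explain much of the slowdown.
- **Sets a timeout.**
- **Retries a few times with a growing pause.**

The function names, the default of 3 attempts and the doubling back-off are assumptions. requests' own `ConnectionError` and `Timeout` exceptions derive from `IOError`, which is `OSError` in Python 3, so catching `OSError` covers them.

```python
import time
from typing import Callable, Optional, Tuple

# get(url) -> (HTTP status code, body); see requests_get below
Getter = Callable[[str], Tuple[int, bytes]]


class FetchError(Exception):
    """Raised when a page still cannot be fetched after every retry."""


def requests_get(url: str, timeout: float = 10.0) -> Tuple[int, bytes]:
    """Hypothetical adapter for the real site: one request, status and body together."""
    import requests  # imported lazily so fetch_with_retry can be tested offline

    resp = requests.get(url, timeout=timeout)
    return resp.status_code, resp.content


def fetch_with_retry(url: str, get: Getter = requests_get, retries: int = 3,
                     backoff: float = 1.0,
                     sleep: Callable[[float], None] = time.sleep) -> bytes:
    """Download url, retrying on connection errors and non-200 responses."""
    last_error: Optional[Exception] = None
    for attempt in range(retries):
        try:
            status, body = get(url)  # a single request per attempt
        except OSError as exc:  # connection reset, timeout, DNS failure, ...
            last_error = exc
        else:
            if status == 200:
                return body
            last_error = FetchError(f"HTTP {status} for {url}")
        if attempt < retries - 1:
            sleep(backoff * 2 ** attempt)  # with backoff=1.0: wait 1 s, 2 s, 4 s, ...
    raise FetchError(f"giving up on {url} after {retries} tries") from last_error
```

In the script, `fetch_with_retry(line)` would take the place of `requests.get(line).content`. A chapter that still fails raises `FetchError`, which the caller can catch, log, and skip instead of crashing the whole run.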

This article is an English translation of an article originally published in Chinese on aliyun.com.