Crawling all of Han Han's Sina blog posts with Python

Source: Internet
Author: User


Previously we crawled the posts linked from the first page of the blog's article list. Looking at the URLs of the list pages, it is not hard to see that they differ in only one place: the page number. So we just need to wrap the earlier code in an outer loop over the page numbers, and we can crawl the posts on every list page, that is, all the posts.
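As a minimal sketch of that observation (written for Python 3; the helper name `list_page_urls` is illustrative and not part of the original code), the list-page URLs can be generated by varying only the page number:

```python
# Base of the article-list URL used in this post; only the page number changes.
BASE = "http://blog.sina.com.cn/s/articlelist_1191258123_0_"


def list_page_urls(last_page):
    """Return the article-list URLs for pages 1..last_page (hypothetical helper)."""
    return [BASE + str(page) + ".html" for page in range(1, last_page + 1)]


# The post mentions 7 list pages at the time of writing.
for u in list_page_urls(7):
    print(u)
```

The page count is passed in as a parameter because it will change as the blog grows.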


# -*- coding: utf-8 -*-
# Python 2 code (it uses urllib.urlopen and the print statement).
import urllib
import time

url = [''] * 350
page = 1
link = 1
while page <= 7:  # there are currently 7 list pages
    con = urllib.urlopen('http://blog.sina.com.cn/s/articlelist_1191258123_0_' + str(page) + '.html').read()
    i = 0
    # Locate the first post link on this list page.
    title = con.find(r'<a title=')
    href = con.find(r'href=', title)
    html = con.find(r'.html', href)
    while title != -1 and href != -1 and html != -1 and i < 350:
        url[i] = con[href + 6:html + 5]
        # Download and save the post as soon as its link is found.
        content = urllib.urlopen(url[i]).read()
        open(r'allboke/' + url[i][-26:], 'w+').write(content)
        print 'link', link, url[i]
        # Continue searching after the link just processed.
        title = con.find(r'<a title=', html)
        href = con.find(r'href=', title)
        html = con.find(r'.html', href)
        i = i + 1
        link = link + 1
    else:
        print 'page', page, 'find end!'
    page = page + 1
else:
    print 'all find end'

# Earlier attempt at saving in a separate pass (disabled):
# i = 0
# while i < :
#     content = urllib.urlopen(url[i]).read()
#     open(r'save/' + url[i][-26:], 'w+').write(content)
#     print 'downloading', i, url[i]
#     i = i + 1
#     time.sleep(1)
# else:
#     print 'download article finished!'
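The link search in the listing above can be isolated into a small function. The following is only a sketch in Python 3, assuming the list page marks each post with `<a title=` followed by an `href="...html"` attribute, as the offsets in the original code suggest. The function name `extract_post_links` and the sample HTML (including its post IDs) are made up for illustration.

```python
def extract_post_links(page_html, limit=350):
    """Collect post URLs from a list page using plain string searches.

    Mirrors the find() logic of the original loop: look for '<a title=',
    then the next 'href=', then the next '.html'.
    """
    links = []
    title = page_html.find('<a title=')
    while title != -1 and len(links) < limit:
        href = page_html.find('href=', title)
        if href == -1:
            break
        end = page_html.find('.html', href)
        if end == -1:
            break
        # href + 6 skips 'href="'; end + 5 keeps the '.html' suffix.
        links.append(page_html[href + 6:end + 5])
        title = page_html.find('<a title=', end)
    return links


# Made-up sample markup, not copied from the real site.
sample = (
    '<a title="" target="_blank" '
    'href="http://blog.sina.com.cn/s/blog_0000000000000001.html">A</a>'
    '<a title="" target="_blank" '
    'href="http://blog.sina.com.cn/s/blog_0000000000000002.html">B</a>'
)
print(extract_post_links(sample))
```

Returning a fresh list per page avoids the fixed-size `url` array and the manual index `i`. String searching like this is fragile if the page markup changes; an HTML parser would be more robust, but that goes beyond what the original code does.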


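The last part of the code (now commented out), which saved the pages in a separate pass, could only ever save 50 pages, and I could not tell where it went wrong. One likely cause: `i` is reset to 0 for every list page, so the `url` list is overwritten each time and only holds the links from the last page processed.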

So I simply moved the saving code into the search loop: each post is saved as soon as its link is found.
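A minimal Python 3 sketch of this save-as-you-find approach is shown below. The names `fetch` and `save_posts` are illustrative, not from the original code; the `allboke` directory and the 26-character file name (`url[-26:]`) follow the original listing. The downloader is passed in as a parameter so the function can be tried without network access.

```python
import os
import urllib.request


def fetch(url):
    """Download a URL and return the raw bytes."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def save_posts(links, out_dir="allboke", fetch=fetch):
    """Save each linked post immediately and return the written file paths."""
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    for n, url in enumerate(links, start=1):
        content = fetch(url)
        # The last 26 characters of a post URL are 'blog_<id>.html'.
        path = os.path.join(out_dir, url[-26:])
        with open(path, "wb") as f:
            f.write(content)
        print("link", n, url)
        saved.append(path)
    return saved
```

Because every post is written to disk as soon as it is fetched, nothing depends on a list of URLs surviving across list pages.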


[Screenshot: console output of a successful run]

[Screenshot: execution result]