Download the list of articles from the blog's home page.
# Python 2 (uses urllib.urlopen and the print statement)
import time
import urllib

# Fetch the article list page
url_con = urllib.urlopen('http://blog.sina.com.cn/s/articlelist_1193111400_0_1.html').read()
print 'con', url_con

url = [''] * 40
i = 0

# Locate the first article link: <a title=... href="....html">
title = url_con.find(r'<a title=')
print "title", title
href = url_con.find(r'href=', title)
print "href", href
html = url_con.find(r'.html', href)
print "html", html

# Collect up to 40 article URLs
while title != -1 and href != -1 and html != -1 and i < 40:
    # Skip 'href="' (6 characters) and keep '.html' (5 characters)
    url[i] = url_con[href + 6:html + 5]
    print url[i]
    title = url_con.find(r'<a title=', html)
    href = url_con.find(r'href=', title)
    html = url_con.find(r'.html', href)
    i = i + 1

# Download each article that was found; the blog/ directory must already exist
j = 0
while j < i:
    content = urllib.urlopen(url[j]).read()
    filename = url[j][-26:]
    f = open(r'blog/' + filename, 'w')
    f.write(content)
    f.close()
    j = j + 1
    time.sleep(5)
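The code above extracts the article URLs from the list page, then downloads each article and saves it in the blog/ directory, pausing 5 seconds between requests.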
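The link-scanning step can also be pulled out into a small function so it can be tried without touching the network. The following is only a sketch, written in Python 3 syntax: the function name extract_article_urls, its limit parameter, and the sample markup are made up for illustration and do not come from the original script. It uses the same idea of searching for '<a title=', then 'href=', then '.html'.

```python
def extract_article_urls(page, limit=40):
    """Collect up to `limit` article URLs with a find-based scan (hypothetical helper)."""
    urls = []
    title = page.find('<a title=')
    while title != -1 and len(urls) < limit:
        href = page.find('href=', title)
        if href == -1:
            break
        html = page.find('.html', href)
        if html == -1:
            break
        # Skip 'href="' (6 characters) and keep '.html' (5 characters).
        urls.append(page[href + 6:html + 5])
        # Continue the search after the link just found.
        title = page.find('<a title=', html)
    return urls


# Made-up sample markup, only for illustration.
sample = (
    '<a title="First" target="_blank" href="http://example.com/s/blog_0001.html">First</a>'
    '<a title="Second" target="_blank" href="http://example.com/s/blog_0002.html">Second</a>'
)
print(extract_article_urls(sample))
```

Returning a list that only holds the links actually found avoids the empty entries that a fixed 40-slot list leaves behind when the page has fewer articles. This kind of string search depends on the exact markup of the page, so it may need adjusting if the page layout differs.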