Web crawler-python

Source: Internet
Author: User

With nothing to do over the weekend, I wrote a web crawler. First, what it does: it is a small program used mainly to crawl articles from pages such as blogs. Start by finding the articles you want to crawl, for example Han's Sina blog. Go to his article directory and note the directory link, such as http://blog.sina.com.cn/s/articlelist_1191258123_0_1.html. The directory contains a link to each article, so all we need to do is follow each link and copy the article into a file on our own computer. That is how the articles get crawled down, haha. Enough talk, straight to the code.

# Written for Python 2 (urllib.urlopen and the print statement).
import urllib
import time

url = [''] * 50
i = 0
j = 0

con = urllib.urlopen('http://blog.sina.com.cn/s/articlelist_1191258123_0_1.html').read()  # article directory link

title = con.find(r'<a title=')    # position of the first occurrence of <a title=
href = con.find(r'href=', title)  # position of href= after <a title=
html = con.find(r'.html', href)   # likewise, position of .html after href=

while title != -1 and href != -1 and html != -1 and i < 50:  # the directory lists about 50 articles
    url[i] = con[href + 6:html + 5]  # extract the link of each article
    print url[i]
    title = con.find(r'<a title=', html)  # move on to the next article
    href = con.find(r'href=', title)
    html = con.find(r'.html', href)
    i = i + 1

while j < i:  # visit only the links that were actually found
    content = urllib.urlopen(url[j]).read()  # read the content behind each link
    # print content
    filename = url[j][-26:]
    open(filename, 'w+').write(content)  # write the content to a file of your choosing
    print 'downloading', url[j]
    j = j + 1
    time.sleep(1)  # pause between requests
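For readers on Python 3, the same idea can be sketched roughly as follows. This is not the author's code: the function names extract_links, filename_for and crawl are hypothetical, urllib.request.urlopen stands in for the Python 2 urllib.urlopen, and decoding the directory page as UTF-8 is an assumption about the site. The link extraction keeps the original str.find approach, and the 26-character file name mirrors the url[j][-26:] slice above.

```python
import time
import urllib.request


def extract_links(page, limit=50):
    """Collect the href of each '<a title=' anchor, up to and including '.html'."""
    links = []
    title = page.find('<a title=')
    while title != -1 and len(links) < limit:
        href = page.find('href=', title)
        if href == -1:
            break
        end = page.find('.html', href)
        if end == -1:
            break
        # Skip 'href="' (6 characters) and keep the '.html' suffix (5 characters).
        links.append(page[href + 6:end + 5])
        title = page.find('<a title=', end)
    return links


def filename_for(url, length=26):
    """Use the last `length` characters of the URL as the local file name."""
    return url[-length:]


def crawl(index_url, delay=1.0):
    """Download every article linked from index_url into the current directory."""
    with urllib.request.urlopen(index_url) as response:
        # Assumption: the directory page can be decoded as UTF-8.
        page = response.read().decode('utf-8', errors='replace')
    for link in extract_links(page):
        with urllib.request.urlopen(link) as response:
            content = response.read()
        # Write the raw bytes so the page encoding is preserved as served.
        with open(filename_for(link), 'wb') as out:
            out.write(content)
        print('downloading', link)
        time.sleep(delay)  # pause between requests
```

Calling crawl('http://blog.sina.com.cn/s/articlelist_1191258123_0_1.html') would then follow the same two steps as the program above: collect the article links from the directory page, then save each article to a file. Because extract_links returns a list of the links it actually found, there is no need to preallocate 50 empty slots.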

This article is from the "Midnight" blog; reprinting is declined.
