Use Python to crawl blogs

Source: Internet
Author: User

Blog crawling with Python, by Wu xueying
Take crawling Wang Yin's blog as an example:

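# Python 3 version (the original used urllib2 and the print statement).
import re
import urllib.request

def getHtmlCode(url):
    # Download the page and return its HTML as text (assumes UTF-8).
    return urllib.request.urlopen(url).read().decode('utf-8')

def findTitleUrl(htmlString):
    # Every href value on the page.
    regTitleUrl = re.compile("href=\"(.+?)\"")
    return regTitleUrl.findall(htmlString)

def findTitleContent(htmlString):
    # The text of every link on the page.
    regTitleContent = re.compile("\">(.+?)</a>")
    return regTitleContent.findall(htmlString)

htmlCode = getHtmlCode('http://www.yinwang.org/')
titleContent = findTitleContent(htmlCode)
titleUrl = findTitleUrl(htmlCode)
# Offsets 3 and 8 presumably skip the non-post links at the top of the page;
# the range is bounded so the shifted indexes never run past either list.
for i in range(0, min(len(titleContent) - 3, len(titleUrl) - 8)):
    print(titleContent[i+3])
    print(titleUrl[i+8])
    htmlPage = getHtmlCode(titleUrl[i+8])
    f = open("%s.html" % (titleContent[i+3]), 'w', encoding='utf-8')
    f.write(htmlPage)
    f.close()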
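
The regular expressions above match any href and any link text on the page, so the two lists can drift out of step with each other. As a rough alternative, the sketch below pairs each link's URL with its text using the standard library's html.parser. The names LinkCollector and extract_links, and the sample HTML, are made up for illustration and are not part of the original script.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a> element that has an href."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html_string):
    """Return a list of (href, text) tuples found in html_string."""
    parser = LinkCollector()
    parser.feed(html_string)
    return parser.links


# Hypothetical sample markup, standing in for a downloaded index page.
sample = (
    '<ul><li><a href="/blog/a.html">First post</a></li>'
    '<li><a href="/blog/b.html">Second <b>post</b></a></li></ul>'
)
for href, text in extract_links(sample):
    print(href, text)
```

Because each URL stays attached to its own title, no index offsets are needed. Filtering out navigation links would still depend on the layout of the page being crawled.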



Recommended learning path for Python scripting

Learning path:

I. Lay a solid foundation
1. Find a suitable introductory book (Python Core Programming, 2nd edition, and Dive Into Python are recommended) and read it through once. Make sure you understand loops, conditionals, and the commonly used classes; skip anything that is too difficult for now.
2. Work through Python exercises regularly (Python Core Programming, 2nd edition, has plenty of end-of-chapter exercises).
3. Join a Python discussion group.
4. Write blog posts summarizing what you have learned.
II. Start using Python in your daily work.
For example: searching for files, batch processing, and web crawlers.
III. Start learning Django, Flask, Tornado, and other frameworks, and build some web applications.
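
As one illustration of the daily-work scripts mentioned in step II, here is a minimal sketch of a file search using only the standard library. The function name find_files and the "*.py" pattern are arbitrary choices for the example.

```python
import fnmatch
import os


def find_files(root, pattern):
    """Return sorted paths under root whose file name matches a shell-style pattern."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # fnmatch.filter keeps only the names that match the pattern.
        for name in fnmatch.filter(filenames, pattern):
            matches.append(os.path.join(dirpath, name))
    return sorted(matches)


# Example: list every Python file below the current directory.
for path in find_files(".", "*.py"):
    print(path)
```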
----------------------------
Recommended resources:
Concise Python Tutorial
Learning Programming with Children
Head First Python (Chinese edition)
Learn Python the Hard Way
Dive Into Python (Chinese edition, with course source code)
Python Core Programming
Deep Understanding of Python
Python Standard Library
Python Programming Guide
Django Book (Chinese edition)
For more systematic study, see the official Python documentation and the official Django documentation. Learn, summarize, and practice repeatedly to master Python.

How do I scrape CSDN blog content with Python?

r = requests.get('http://blog.csdn.net/u013055678')

A plain request like this runs into CSDN's anti-crawler protection, so send a browser-like User-Agent header with it:
r = requests.get('http://blog.csdn.net/u013055678', headers={'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; rv:32.0) Gecko/20100101 Firefox/32.0'})
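
The same idea can be sketched with the standard library's urllib.request, for cases where requests is not installed. The helper names build_request and fetch are hypothetical, the User-Agent value is only an assumed example of a browser string, and whether a given header is enough to get past a site's protection may change over time.

```python
import urllib.request

# Assumed header value: any realistic browser User-Agent string could be used here.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; rv:32.0) Gecko/20100101 Firefox/32.0",
}


def build_request(url):
    """Create a Request object that carries a browser-like User-Agent."""
    return urllib.request.Request(url, headers=BROWSER_HEADERS)


def fetch(url, timeout=10):
    """Download url and return the body as text (assumes UTF-8 content)."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


# Building the request does not touch the network; fetch() does.
req = build_request("http://blog.csdn.net/u013055678")
print(req.get_header("User-agent"))
```

Calling fetch('http://blog.csdn.net/u013055678') would then perform the download with the header attached.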
