A First Look at Python Crawlers

Source: Internet
Author: User


If the PyCharm console shows garbled text after a run, set the encoding under File >> Settings >> Editor >> File Encodings.
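Garbled console output of this kind usually comes from bytes written in one encoding being read in another. The snippet below is only a small illustration of that effect, written for Python 3; the sample string and the choice of GBK as the "wrong" encoding are assumptions made for the demo, not something taken from the original setup.

```python
# Illustration only: the sample text and the GBK codec are assumed for the demo.
text = "爬虫"                      # "crawler" in Chinese
raw = text.encode("utf-8")         # the bytes as a UTF-8 file or page would store them

# Reading UTF-8 bytes with the wrong codec produces garbled characters.
garbled = raw.decode("gbk", errors="replace")

# Reading them with the matching codec restores the original text.
restored = raw.decode("utf-8")

print(garbled)
print(restored)
```

Setting the IDE and the file to the same encoding (UTF-8 here) avoids the mismatch.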



Crawling Web pages

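    # -*- coding: utf-8 -*-
    # Note: this script is written for Python 2.
    import requests

    # Make UTF-8 the default encoding so Chinese text is handled correctly
    import sys
    reload(sys)
    sys.setdefaultencoding('utf-8')

    # Imitate a browser by sending a User-Agent header
    hea = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.71 Safari/537.36'}

    url = ''  # link to crawl
    html = requests.get(url, headers=hea)  # pass the variable url, not the string 'url'

    print html.text
    print 'Start crawling content ...'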
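For readers on Python 3, the same idea (send a browser-like User-Agent, then decode the response as UTF-8) can be sketched roughly as follows. This version swaps requests for the standard library's urllib.request so that it has no third-party dependency; the function names build_request and fetch_text and the example.com URL are placeholders invented for this sketch.

```python
from urllib.request import Request, urlopen

# Same browser identification string as in the script above.
USER_AGENT = ("Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/39.0.2171.71 Safari/537.36")


def build_request(url):
    """Return a Request object that carries a browser-like User-Agent header."""
    return Request(url, headers={"User-Agent": USER_AGENT})


def fetch_text(url, encoding="utf-8"):
    """Download the page and decode its bytes; undecodable bytes are replaced."""
    with urlopen(build_request(url), timeout=10) as response:
        return response.read().decode(encoding, errors="replace")


# Building the request does not touch the network; example.com is a placeholder.
request = build_request("http://example.com/")
print(request.get_header("User-agent"))
```

Calling fetch_text with a real URL performs the download. In Python 3 the reload(sys) and setdefaultencoding step is not needed, because strings are Unicode by default.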


Simulated login crawler (with a cookie)

The key is how to obtain the cookie.

P.S. If the cookie changes on every login, pay attention to which part changes; the part that changes is usually a random code.
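One way to find the part that changes is to capture the cookie from two separate logins and compare them field by field. The following is a minimal Python 3 sketch of that comparison; the helper names and the two sample cookie strings are hypothetical and made up for illustration.

```python
def parse_cookie_string(raw):
    """Split a raw 'a=1; b=2' cookie string into a dict {'a': '1', 'b': '2'}."""
    pairs = {}
    for part in raw.split(";"):
        if "=" in part:
            name, value = part.strip().split("=", 1)
            pairs[name] = value
    return pairs


def changed_cookie_names(first, second):
    """Return the sorted names whose values differ between two cookie strings."""
    a = parse_cookie_string(first)
    b = parse_cookie_string(second)
    return sorted(name for name in set(a) | set(b) if a.get(name) != b.get(name))


# Hypothetical cookies captured from two separate logins.
login_1 = "user=demo; token=8f3a2c; lang=zh"
login_2 = "user=demo; token=d41b97; lang=zh"

changed = changed_cookie_names(login_1, login_2)
print(changed)
```

The names it reports are the fields that need to be refreshed before each crawl.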


Method 1: Use the packet-capture tool Fiddler


Method 2:

Use Inspect Element directly in IE


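    # -*- coding: utf-8 -*-
    # Note: this script is written for Python 2.
    import requests

    cook = {'Cookie': ''}  # paste the captured cookie string here
    url = ''               # page that requires login

    # Send the raw cookie string as the Cookie request header
    html = requests.get(url, headers=cook).content
    print html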
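The cookie copied from Fiddler or from the browser is one long "name=value; name=value" string. requests.get accepts a dict of individual cookies through its cookies parameter, so the string can be split into such a dict first. Below is a small Python 3 sketch using the standard library's http.cookies module; the function name and the sample cookie values are placeholders, not values from a real site.

```python
from http.cookies import SimpleCookie


def cookie_header_to_dict(raw_cookie):
    """Turn a raw 'name=value; name=value' string into a {name: value} dict."""
    jar = SimpleCookie()
    jar.load(raw_cookie)
    return {name: morsel.value for name, morsel in jar.items()}


# Placeholder cookie string, standing in for one copied from a capture tool.
cookies = cookie_header_to_dict("sessionid=abc123; csrftoken=xyz789")
print(cookies)
```

The resulting dict can then be passed as requests.get(url, cookies=cookies).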


This article is from the "Michelle" blog; reprinting is not permitted.
