Request was rejected when crawling a Web site? Scrapy easy to resolve the request header settings! It's not reasonable.

Last Update:2018-06-23 Source: Internet

Author: User

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

Default Request Header

command line execution, new crawler

Scrapy startproject myspidercd myspider scrapy genspider scrapy_spider httpbin.org

Through the request to https://httpbin.org/get?show_env=1, we can view the browser information of this request and open it to see if it is your browser information.

Copy the returned text to https://www.json.cn/formatted as a JSON format for easy viewing, and the following actions are not mentioned.

Modify the request header

In that case, we'll modify the request header to achieve the fake effect.

Open the link below and choose a request header that you like

Http://www.useragentstring.com/pages/useragentstring.php?name=Chrome

This uses the Chrome browser request header

Re-visit to find that our request header has been successfully replaced

"User-agent": "mozilla/5.0 (Windows NT 6.1) applewebkit/537.36 (khtml, like Gecko) chrome/41.0.2228.0 safari/537.36",

Way Three: Link settings

This method takes effect on a single link, just this link of the request enjoy

Setting the headers parameter in the request method

Method Four: Middleware settings

This method can be changed from the entire project to modify the request header settings rules, changeable, different wording, you can configure the different settings, the following is a relatively simple example

We refer to scrapy default processing of the request header middleware

From scrapy.downloadermiddlewares.useragent import Useragentmiddleware

Writing middleware

Action Priority

If you make the following settings

# settings.pyuser_agent = "Settings"

# scrapy_spider.pycustom_settings = {"User_agent": "Custom_settings",}headers={"user-agent": "Header"}

The effect is:

"User-agent": "Header"

Comment out headers

"User-agent": "Custom_settings"

Comment out Custom_settings

"User-agent": "Custom_settings"

Comment out Settings

"User-agent": "scrapy/1.1.2 (+http://scrapy.org)"

The visible priority level is:

Headers > Custom_settings > settings.py > Scrapy Default

Attention

Note the wording of the user-agent parameter

headers={"User-agent": User_agent})

If it's wrong, it's likely something strange happened.

headers={"User_agent": User_agent}

More scrapy in the request header ...

"User-agent": "scrapy/1.1.2 (+http://scrapy.org), mozilla/5.0 (Windows NT 6.1) applewebkit/537.36 (khtml, like Gecko) chrome/41.0.2228.0 safari/537.36 ",

In fact, very well differentiated:

User-agent: Browser request header parameters, often used in HTML code-

User_agent:python variable

Suggestions:

Every time you write the browser parameters, if you are afraid to write wrong to open their own browser, casually test a page, copy from inside

As from the introduction to the actual combat, I stepped on a lot of pits, this article made a simple summary, but also shared a few more useful sites. Hope this article to share can provide you with a less detours shortcut, then the value of this article is reflected.

By the way, an ad.

Recently want to write an open source library, Chinesename Chinese name, has achieved the basic name, but the names need to be optimized, if you want to do things together classmates, can together

Request was rejected when crawling a Web site? Scrapy easy to resolve the request header settings! It's not reasonable.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

Request was rejected when crawling a Web site? Scrapy easy to resolve the request header settings! It's not reasonable.

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support

Request was rejected when crawling a Web site? Scrapy easy to resolve the request header settings! It's not reasonable.

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

Trending Topic

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support