scrapy shell returns 403 when debugging (extraction returns [])


Question: scrapy shell is very convenient for debugging, but sometimes it returns a 403, for example:

C:\Users\fendo>scrapy shell https://book.douban.com/subject/26805083/
2017-04-17 15:18:53 [scrapy.utils.log] INFO: Scrapy 1.3.3 started (bot: scrapybot)
2017-04-17 15:18:53 [scrapy.utils.log] INFO: Overridden settings: {'DUPEFILTER_CLASS': 'scrapy.dupefilters.BaseDupeFilter', 'LOGSTATS_INTERVAL': 0}
2017-04-17 15:18:53 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole']
2017-04-17 15:18:54 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2017-04-17 15:18:54 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2017-04-17 15:18:54 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2017-04-17 15:18:54 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6024
2017-04-17 15:18:54 [scrapy.core.engine] INFO: Spider opened
2017-04-17 15:18:54 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://book.douban.com/subject/26805083/> (referer: None)
2017-04-17 15:18:54 [traitlets] DEBUG: Using default logger
2017-04-17 15:18:54 [traitlets] DEBUG: Using default logger
[s] Available Scrapy objects:
[s]   scrapy     scrapy module (contains scrapy.Request, scrapy.Selector, etc)
[s]   crawler    <scrapy.crawler.Crawler object at 0x000001E696FBAD68>
[s]   item       {}
[s]   request    <GET https://book.douban.com/subject/26805083/>
[s]   response   <403 https://book.douban.com/subject/26805083/>
[s]   settings   <scrapy.settings.Settings object at 0x000001E6993C7B70>
[s]   spider     <DefaultSpider 'default' at 0x1e69964d1d0>
[s] Useful shortcuts:
[s]   fetch(url[, redirect=True]) Fetch URL and update local objects (by default, redirects are followed)
[s]   fetch(req)                  Fetch a scrapy.Request and update local objects
[s]   shelp()           Shell help (print this help)
[s]   view(response)    View response in a browser
In [1]:

Answer:


(1): The first method is to add -s USER_AGENT='Mozilla/5.0' to the command.
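
For example, reusing the Douban URL from the question (the UA string is just an illustrative value; on the Windows cmd prompt you may need double quotes instead of single quotes):

scrapy shell -s USER_AGENT='Mozilla/5.0' https://book.douban.com/subject/26805083/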

(2): The second method is to change Scrapy's default user-agent value.


Find the default_settings.py file under your Python installation directory; in my case it is F:\Software\Python36\Lib\site-packages\scrapy\settings\default_settings.py



USER_AGENT = 'Scrapy/%s (+http://scrapy.org)' % import_module('scrapy').__version__

Change it to


USER_AGENT = 'Mozilla/5.0 (Windows NT 5.1; rv:5.0) Gecko/20100101 Firefox/5.0'
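
Either way, you can check the result from inside scrapy shell by re-fetching the page with an explicit User-Agent header. A minimal sketch, reusing the Douban URL from the question; fetch(req) is the shortcut listed in the shell banner above:

In [1]: from scrapy import Request
In [2]: req = Request('https://book.douban.com/subject/26805083/',
   ...:               headers={'User-Agent': 'Mozilla/5.0'})
In [3]: fetch(req)        # re-fetch with the custom User-Agent
In [4]: response.status   # should now be 200 instead of 403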
