Question: It is often convenient to debug with the Scrapy shell, but sometimes fetching a page from the shell returns a 403, as in the following session:
c:\users\fendo>scrapy shell https://book.douban.com/subject/26805083/
2017-04-17 15:18:53 [scrapy.utils.log] INFO: Scrapy 1.3.3 started (bot: scrapybot)
2017-04-17 15:18:53 [scrapy.utils.log] INFO: Overridden settings: {'DUPEFILTER_CLASS': 'scrapy.dupefilters.BaseDupeFilter', 'LOGSTATS_INTERVAL': 0}
2017-04-17 15:18:53 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole']
2017-04-17 15:18:54 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2017-04-17 15:18:54 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2017-04-17 15:18:54 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2017-04-17 15:18:54 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6024
2017-04-17 15:18:54 [scrapy.core.engine] INFO: Spider opened
2017-04-17 15:18:54 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://book.douban.com/subject/26805083/> (referer: None)
2017-04-17 15:18:54 [traitlets] DEBUG: Using default logger
2017-04-17 15:18:54 [traitlets] DEBUG: Using default logger
[s] Available Scrapy objects:
[s]   scrapy     scrapy module (contains scrapy.Request, scrapy.Selector, etc)
[s]   crawler    <scrapy.crawler.Crawler object at 0x000001E696FBAD68>
[s]   item       {}
[s]   request    <GET https://book.douban.com/subject/26805083/>
[s]   response   <403 https://book.douban.com/subject/26805083/>
[s]   settings   <scrapy.settings.Settings object at 0x000001E6993C7B70>
[s]   spider     <DefaultSpider 'default' at 0x1e69964d1d0>
[s] Useful shortcuts:
[s]   fetch(url[, redirect=True]) Fetch URL and update local objects (by default, redirects are followed)
[s]   fetch(req)                  Fetch a scrapy.Request and update local objects
[s]   shelp()                     Shell help (print this help)
[s]   view(response)              View response in a browser
In [1]:
Answer:
(1) The first method is to add -s USER_AGENT='Mozilla/5.0' to the command line.
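For example, the failing session above could be retried with the setting overridden on the command line (a sketch; 'Mozilla/5.0' is a minimal browser-like token, and any realistic browser User-Agent string would work as well):

```shell
scrapy shell -s USER_AGENT='Mozilla/5.0' https://book.douban.com/subject/26805083/
```

The -s flag overrides a single Scrapy setting for that one invocation, so this change does not persist between runs.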
(2) The second method is to change Scrapy's default value for USER_AGENT.
Find the default_settings.py file under your Python installation directory; for example, mine is at F:\Software\Python36\Lib\site-packages\scrapy\settings\default_settings.py
Change

    USER_AGENT = 'Scrapy/%s (+http://scrapy.org)' % import_module('scrapy').__version__

to

    USER_AGENT = 'Mozilla/5.0 (Windows NT 5.1; rv:5.0) Gecko/20100101 Firefox/5.0'
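To see why this matters, the stock default expands to a string that openly identifies the client as a Scrapy bot, which some sites (douban among them) reject with a 403. A minimal sketch of the before/after values (the version number 1.3.3 matches the log above; the Firefox string is only one example of a browser User-Agent):

```python
# The stock default in default_settings.py builds the User-Agent from the
# installed Scrapy version:
#   USER_AGENT = 'Scrapy/%s (+http://scrapy.org)' % import_module('scrapy').__version__
# For the Scrapy 1.3.3 install shown in the log, that expands to:
default_ua = 'Scrapy/%s (+http://scrapy.org)' % '1.3.3'
print(default_ua)  # Scrapy/1.3.3 (+http://scrapy.org)

# The replacement identifies the client as an ordinary browser instead:
USER_AGENT = 'Mozilla/5.0 (Windows NT 5.1; rv:5.0) Gecko/20100101 Firefox/5.0'
print(USER_AGENT)
```

Note that editing default_settings.py changes the default for every project on the machine; inside a Scrapy project, the same USER_AGENT line can instead be placed in the project's settings.py, which overrides the default for that project only.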