No. 333, web crawlers explained 2 - Scrapy framework crawler - simulating a browser login with Scrapy - getting Scrapy cookies

Source: Internet
Author: User


Simulating a browser login

The start_requests() method returns the crawler's initial requests. Its return value plays the same role as start_urls: when start_requests() is defined, the requests it returns replace the requests that would otherwise be built from start_urls.

Request() sends a GET request by default; you can set the URL, cookies, callback function, and so on.

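FormRequest.from_response() submits a form by POST. Its first parameter is required: the response object returned by the previous request, which contains the form and whose cookies you want to keep using. The other parameters set the cookie jar (through meta), the URL, the form data, and so on.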

yield Request() hands a new request back to the crawler for execution.


Cookie handling when sending requests:
meta={'cookiejar': 1} starts recording cookies in cookie jar 1; it is written in Request().
meta={'cookiejar': response.meta['cookiejar']} reuses the cookie jar of the previous response; it is written in FormRequest.from_response() for the POST authorization.
meta={'cookiejar': True} uses the authorized cookies to access pages that can only be viewed after logging in. This relies on True and 1 being equal in Python, so both values select the same jar; passing response.meta['cookiejar'] again is the more explicit choice.
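The following plain-Python model is not Scrapy code. It is a toy stand-in (the JarStore class and the cookie name are made up) that assumes cookies are stored in a dictionary keyed by the cookiejar value, and it shows why the keys 1 and True reach the same stored cookies while a different key starts an empty session.

```python
from collections import defaultdict


class JarStore:
    """Toy per-key cookie store, used only to illustrate jar selection."""

    def __init__(self):
        # One dictionary of cookies per jar key.
        self.jars = defaultdict(dict)

    def save(self, key, name, value):
        self.jars[key][name] = value

    def load(self, key):
        # Return a copy of the cookies stored under this key.
        return dict(self.jars[key])


store = JarStore()
store.save(1, 'PHPSESSID', 'abc123')  # cookies recorded under jar 1
same_session = store.load(True)       # True == 1, so this is the same jar
other_session = store.load(2)         # a different key means a separate session
```

Because the jar is chosen per request, every request that should share the login session needs to carry a matching cookiejar value in its meta.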

Getting Scrapy cookies

Request cookies
cookie = response.request.headers.getlist('Cookie')
print(cookie)

Response cookies
cookie2 = response.headers.getlist('Set-Cookie')
print(cookie2)
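getlist() returns the raw header values as a list of bytes. A small sketch, using only the Python standard library, of how those values might be turned into a name-to-value dictionary; the helper name cookies_to_dict and the sample cookie values are made up for illustration.

```python
from http.cookies import SimpleCookie


def cookies_to_dict(raw_headers):
    """Turn header values as returned by getlist() (a list of bytes) into a dict."""
    parsed = {}
    for raw in raw_headers:
        cookie = SimpleCookie()
        cookie.load(raw.decode('latin-1'))
        for name, morsel in cookie.items():
            parsed[name] = morsel.value
    return parsed


# Sample values shaped like Scrapy header lists; the cookie names are made up.
response_cookies = cookies_to_dict([b'PHPSESSID=abc123; path=/'])
request_cookies = cookies_to_dict([b'PHPSESSID=abc123; lang=en'])
```

Attributes such as path are attached to the cookie they belong to rather than being treated as cookies of their own, so only the real cookie names end up in the dictionary.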

# -*- coding: utf-8 -*-
import scrapy
from scrapy.http import Request, FormRequest


class PachSpider(scrapy.Spider):  # A spider must inherit from scrapy.Spider
    name = 'pach'  # Spider name
    allowed_domains = ['edu.iqianyue.com']  # Domains allowed to be crawled
    # start_urls = ['http://edu.iqianyue.com/index_user_login.html']
    # start_urls only suits requests that need no login, because cookies and
    # other information cannot be set there

    # Browser user agent
    header = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:54.0) Gecko/20100101 Firefox/54.0'}

    def start_requests(self):  # start_requests() replaces start_urls
        # First request: fetch the login page, start recording cookies,
        # and set the callback function
        return [Request('http://edu.iqianyue.com/index_user_login.html',
                        meta={'cookiejar': 1}, callback=self.parse)]

    def parse(self, response):  # Callback for the login page
        data = {  # Login form fields, named as seen in the captured traffic
            'number': 'adc8868',
            'passwd': '279819',
            'submit': '',
        }

        # Response cookies: the cookies written back to the browser on the
        # first visit to the login page
        cookie1 = response.headers.getlist('Set-Cookie')
        print(cookie1)

        print('Signing in')
        # Second request: POST the form, carrying the cookies, the browser
        # user agent, and the login information, so the cookies get authorized
        return [FormRequest.from_response(
            response,
            url='http://edu.iqianyue.com/index_user_login',  # Real POST address
            meta={'cookiejar': response.meta['cookiejar']},
            headers=self.header,
            formdata=data,
            callback=self.next,
        )]

    def next(self, response):
        a = response.body.decode('utf-8')  # Login response body, for inspection
        # print(a)
        # Request a page that is only visible after login, such as the
        # personal center, using the authorized cookies
        yield Request('http://edu.iqianyue.com/index_user_index.html',
                      meta={'cookiejar': True}, callback=self.next2)

    def next2(self, response):
        # Request cookies
        cookie2 = response.request.headers.getlist('Cookie')
        print(cookie2)

        body = response.body  # Page content as bytes
        unicode_body = response.text  # Page content as str (the older body_as_unicode() did the same)
        a = response.xpath('/html/head/title/text()').extract()  # Title of the personal center page
        print(a)

