Python crawler series: simulating login with Scrapy

Source: Internet
Author: User

Background:

Newcomers to Python often assume that crawling any site is just a matter of parsing HTML or JSON, but that overlooks a common obstacle: many sites actively defend against crawlers, so besides a highly available proxy IP pool you often need to log in. A lot of content can only be crawled after logging in, and frequent logins trigger CAPTCHAs (some sites demand one on every attempt), which makes life hard for anyone running crawlers at scale. So what do we do? This post won't cover CAPTCHAs -- you can enter them manually or use a cloud captcha-solving platform. Instead, it shows how to handle the login itself with Scrapy.

Test login address: http://example.webscraping.com/places/default/user/login

Test profile page: http://example.webscraping.com/user/profile

1. I won't repeat how to create a Scrapy project and spider here; see my earlier posts for that.

2. A quick login method.

A quick recap first. As we all know, Scrapy's basic request flow is: the start_requests method iterates over the start_urls list and (via make_requests_from_url) issues a Request for each address. Normally those are GET requests, but here we need a POST instead -- in other words, a login.
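The flow above can be sketched without Scrapy at all. This is a minimal, framework-free illustration (FakeRequest and SketchSpider are made-up names, not real Scrapy API): by default each URL in start_urls becomes a GET request, while the login variant replaces that first step with a request to the login page.

```python
class FakeRequest:
    """Stand-in for scrapy.Request, just enough to show the flow."""
    def __init__(self, url, method="GET", callback=None):
        self.url, self.method, self.callback = url, method, callback

class SketchSpider:
    start_urls = ["http://example.webscraping.com/user/profile"]
    login_url = "http://example.webscraping.com/places/default/user/login"

    def default_start_requests(self):
        # roughly what Scrapy's stock start_requests does: one GET per URL
        for url in self.start_urls:
            yield FakeRequest(url)

    def start_requests(self):
        # the login variant: fetch the login page first; its callback
        # will then POST the credentials
        yield FakeRequest(self.login_url, callback="login")

requests = list(SketchSpider().default_start_requests())
print(requests[0].method, requests[0].url)
# GET http://example.webscraping.com/user/profile
```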

1. First we override the start_requests method to simply GET the login page's HTML. (Some will ask: isn't login a POST? Why GET first? Don't worry -- we need the login page itself to learn how the account and password are submitted, and where they are submitted to.)

2. The request issued in start_requests carries a callback parameter, so once the response arrives the next method (login) runs. In login we put the username and password into a dict (as always, it must be a dict), then submit the data with the Request subclass scrapy.FormRequest -- specifically its FormRequest.from_response class method.

Some will ask how from_response is used: its basic form takes a Response object as the first argument, and it builds the FormRequest from the <form> on that page. Most importantly, it automatically carries the values of the page's hidden <input> tags over into the submission, so with this method we only have to write in the username and password ourselves. (The traditional alternative is to construct the FormRequest by hand with an explicit URL and formdata dict.)
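The hidden-field behavior described above can be demonstrated with a stdlib-only sketch (a simplification of what from_response does, not Scrapy's actual code; the form HTML and credentials below are illustrative): collect every named <input> in the form, hidden tokens included, then overlay the caller's formdata.

```python
from html.parser import HTMLParser

class FormFieldCollector(HTMLParser):
    """Gather name -> value for every <input> tag, hidden ones included."""
    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            attr_map = dict(attrs)
            if "name" in attr_map:
                self.fields[attr_map["name"]] = attr_map.get("value", "")

def build_form_payload(login_page_html, formdata):
    collector = FormFieldCollector()
    collector.feed(login_page_html)
    payload = dict(collector.fields)  # hidden fields from the page first
    payload.update(formdata)          # caller-supplied values win
    return payload

# An illustrative login page with web2py-style hidden tokens:
login_page = '''
<form action="/places/default/user/login" method="post">
  <input type="hidden" name="_formkey" value="abc-123"/>
  <input type="hidden" name="_formname" value="login"/>
  <input type="text" name="email"/>
  <input type="password" name="password"/>
</form>
'''
payload = build_form_payload(login_page,
                             {"email": "user@test.com", "password": "secret"})
print(payload)
# {'_formkey': 'abc-123', '_formname': 'login',
#  'email': 'user@test.com', 'password': 'secret'}
```

Without the hidden _formkey/_formname values the server would reject the POST, which is exactly why from_response is so convenient.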

3. The parse_login method is the callback specified for the form submission; it runs after the form is posted, so we use it to verify success. Here we simply search the response for the words "Welcome Liu" to prove the login worked. That part is easy to understand; the key line is yield from super().start_requests(). It means: once the login succeeds, reuse the session cookies from the successful login and request the addresses in start_urls through the parent class's start_requests. That way the post-login responses can be handled directly in parse.
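The yield from delegation is plain Python, so it can be shown without Scrapy (BaseSpider and LoginSpider below are illustrative stand-ins): the subclass generator hands control to the parent's start_requests, so the start_urls requests are emitted only after the login check passes.

```python
class BaseSpider:
    start_urls = ["http://example.webscraping.com/user/profile"]

    def start_requests(self):
        for url in self.start_urls:
            yield ("GET", url)

class LoginSpider(BaseSpider):
    def parse_login(self, login_succeeded):
        if login_succeeded:
            # delegate to the parent generator: in Scrapy these requests
            # would now go out with the cookies obtained by the login
            yield from super().start_requests()

print(list(LoginSpider().parse_login(True)))
# [('GET', 'http://example.webscraping.com/user/profile')]
```

If the login check fails, the generator yields nothing and the crawl stops there.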

```python
# -*- coding: utf-8 -*-
import scrapy
from scrapy import FormRequest, Request


class ExampleLoginSpider(scrapy.Spider):
    name = "login_"
    allowed_domains = ["example.webscraping.com"]
    start_urls = ['http://example.webscraping.com/user/profile']
    login_url = 'http://example.webscraping.com/places/default/user/login'

    def parse(self, response):
        print(response.text)

    def start_requests(self):
        # fetch the login page first, then hand it to login()
        yield scrapy.Request(self.login_url, callback=self.login)

    def login(self, response):
        formdata = {
            'email': '[email protected]',
            'password': '12345678'}
        # from_response carries over the page's hidden form fields for us
        yield FormRequest.from_response(response, formdata=formdata,
                                        callback=self.parse_login)

    def parse_login(self, response):
        # print('>>>>>>>>' + response.text)
        if 'Welcome Liu' in response.text:
            # logged in: now crawl start_urls with the session cookies
            yield from super().start_requests()
```

Login successful
