Python crawler series: simulating login with Scrapy

Source: Internet
Author: User

Background:

Newcomers to Python often assume that crawling any site is just a matter of parsing HTML or JSON, but that overlooks a common obstacle: many sites actively defend against crawlers, so besides a highly available proxy IP pool you often need to log in. A lot of content can only be crawled after logging in, and frequent logins trigger CAPTCHAs (some sites demand one on every attempt), which makes life hard for anyone running crawlers at scale. So what do we do? This post won't cover CAPTCHAs -- you can enter them manually or use a cloud captcha-solving platform. Instead, it shows how to handle the login itself with Scrapy.

Test login address: http://example.webscraping.com/places/default/user/login

Test profile page: http://example.webscraping.com/user/profile

1. I won't repeat how to create a Scrapy project and spider here; see my earlier posts for that.

2. A quick login method.

A quick recap first. As we all know, Scrapy's basic request flow is: the start_requests method iterates over the start_urls list and (via make_requests_from_url) issues a Request for each address. Normally those are GET requests, but here we need a POST instead -- in other words, a login.
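The flow above can be sketched without Scrapy at all. This is a minimal, framework-free illustration (FakeRequest and SketchSpider are made-up names, not real Scrapy API): by default each URL in start_urls becomes a GET request, while the login variant replaces that first step with a request to the login page.

```python
class FakeRequest:
    """Stand-in for scrapy.Request, just enough to show the flow."""
    def __init__(self, url, method="GET", callback=None):
        self.url, self.method, self.callback = url, method, callback

class SketchSpider:
    start_urls = ["http://example.webscraping.com/user/profile"]
    login_url = "http://example.webscraping.com/places/default/user/login"

    def default_start_requests(self):
        # roughly what Scrapy's stock start_requests does: one GET per URL
        for url in self.start_urls:
            yield FakeRequest(url)

    def start_requests(self):
        # the login variant: fetch the login page first; its callback
        # will then POST the credentials
        yield FakeRequest(self.login_url, callback="login")

requests = list(SketchSpider().default_start_requests())
print(requests[0].method, requests[0].url)
# GET http://example.webscraping.com/user/profile
```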

1. First we override the start_requests method to simply GET the login page's HTML. (Some will ask: isn't login a POST? Why GET first? Don't worry -- we need the login page itself to learn how the account and password are submitted, and where they are submitted to.)

2. The request issued in start_requests carries a callback parameter, so once the response arrives the next method (login) runs. In login we put the username and password into a dict (as always, it must be a dict), then submit the data with the Request subclass scrapy.FormRequest -- specifically its FormRequest.from_response class method.

Some will ask how from_response is used: its basic form takes a Response object as the first argument, and it builds the FormRequest from the <form> on that page. Most importantly, it automatically carries the values of the page's hidden <input> tags over into the submission, so with this method we only have to write in the username and password ourselves. (The traditional alternative is to construct the FormRequest by hand with an explicit URL and formdata dict.)
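The hidden-field behavior described above can be demonstrated with a stdlib-only sketch (a simplification of what from_response does, not Scrapy's actual code; the form HTML and credentials below are illustrative): collect every named <input> in the form, hidden tokens included, then overlay the caller's formdata.

```python
from html.parser import HTMLParser

class FormFieldCollector(HTMLParser):
    """Gather name -> value for every <input> tag, hidden ones included."""
    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            attr_map = dict(attrs)
            if "name" in attr_map:
                self.fields[attr_map["name"]] = attr_map.get("value", "")

def build_form_payload(login_page_html, formdata):
    collector = FormFieldCollector()
    collector.feed(login_page_html)
    payload = dict(collector.fields)  # hidden fields from the page first
    payload.update(formdata)          # caller-supplied values win
    return payload

# An illustrative login page with web2py-style hidden tokens:
login_page = '''
<form action="/places/default/user/login" method="post">
  <input type="hidden" name="_formkey" value="abc-123"/>
  <input type="hidden" name="_formname" value="login"/>
  <input type="text" name="email"/>
  <input type="password" name="password"/>
</form>
'''
payload = build_form_payload(login_page,
                             {"email": "user@test.com", "password": "secret"})
print(payload)
# {'_formkey': 'abc-123', '_formname': 'login',
#  'email': 'user@test.com', 'password': 'secret'}
```

Without the hidden _formkey/_formname values the server would reject the POST, which is exactly why from_response is so convenient.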

3. The parse_login method is the callback specified for the form submission; it runs after the form is posted, so we use it to verify success. Here we simply search the response for the words "Welcome Liu" to prove the login worked. That part is easy to understand; the key line is yield from super().start_requests(). It means: once the login succeeds, reuse the session cookies from the successful login and request the addresses in start_urls through the parent class's start_requests. That way the post-login responses can be handled directly in parse.
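The yield from delegation is plain Python, so it can be shown without Scrapy (BaseSpider and LoginSpider below are illustrative stand-ins): the subclass generator hands control to the parent's start_requests, so the start_urls requests are emitted only after the login check passes.

```python
class BaseSpider:
    start_urls = ["http://example.webscraping.com/user/profile"]

    def start_requests(self):
        for url in self.start_urls:
            yield ("GET", url)

class LoginSpider(BaseSpider):
    def parse_login(self, login_succeeded):
        if login_succeeded:
            # delegate to the parent generator: in Scrapy these requests
            # would now go out with the cookies obtained by the login
            yield from super().start_requests()

print(list(LoginSpider().parse_login(True)))
# [('GET', 'http://example.webscraping.com/user/profile')]
```

If the login check fails, the generator yields nothing and the crawl stops there.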

```python
# -*- coding: utf-8 -*-
import scrapy
from scrapy import FormRequest, Request


class ExampleLoginSpider(scrapy.Spider):
    name = "login_"
    allowed_domains = ["example.webscraping.com"]
    start_urls = ['http://example.webscraping.com/user/profile']
    login_url = 'http://example.webscraping.com/places/default/user/login'

    def parse(self, response):
        print(response.text)

    def start_requests(self):
        # fetch the login page first, then hand it to login()
        yield scrapy.Request(self.login_url, callback=self.login)

    def login(self, response):
        formdata = {
            'email': '[email protected]',
            'password': '12345678'}
        # from_response carries over the page's hidden form fields for us
        yield FormRequest.from_response(response, formdata=formdata,
                                        callback=self.parse_login)

    def parse_login(self, response):
        # print('>>>>>>>>' + response.text)
        if 'Welcome Liu' in response.text:
            # logged in: now crawl start_urls with the session cookies
            yield from super().start_requests()
```

Login successful
