Python crawler: simulating a Douban login to get the movies you've seen recently

Last Update: 2015-08-03   Source: Internet

Author: User

Hahaha, the simulated login worked! La la la~~~~~

Important things deserve to be said three times, and yet I still forgot = =

Let's get started:

As we all know, many websites require you to log in before you can view certain pages, so simulating the login is the first step in crawling their content. Once this step succeeds, the rest is easy!

OK, enough chit-chat; straight to the point:

First, you need to understand the site's login process and the information you have to POST. Take Douban as an example:

source: movie
redir: https://movie.douban.com/mine?status=collect
form_email: username
form_password: password
captcha-solution: dress
captcha-id: 6rp40cbjzngdjuqogm3y6wns:en
login: Login

This is the information you need to submit: the username and password, plus the captcha ID and the captcha solution. You might wonder how to find the captcha ID. Rest assured: it is already sent to the client when the page loads, so you can see it directly in the browser (for example in the page source). Cool, right?
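Putting those two points together, here is a minimal sketch that pulls the hidden captcha ID out of a login page and assembles the POST fields from the list above. It assumes the ID sits in an `<input type="hidden" name="captcha-id">` element, which is what the crawler below also looks for. It uses only the standard library's `html.parser` so it runs without network access or bs4; `HiddenInputParser`, `build_login_form` and the sample markup are hypothetical names for illustration.

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collect name -> value for every <input type="hidden"> on a page."""

    def __init__(self):
        super().__init__()
        self.hidden = {}

    def handle_starttag(self, tag, attrs):
        # The default handle_startendtag() forwards self-closing tags
        # such as <input ... /> to this method as well.
        if tag != 'input':
            return
        attr = dict(attrs)
        if attr.get('type') == 'hidden' and 'name' in attr:
            self.hidden[attr['name']] = attr.get('value', '')


def build_login_form(page_html, username, password, captcha_solution):
    """Assemble the login POST fields; captcha-id comes from the page."""
    parser = HiddenInputParser()
    parser.feed(page_html)
    return {
        'source': 'movie',
        'redir': 'https://movie.douban.com/mine?status=collect',
        'form_email': username,
        'form_password': password,
        'captcha-solution': captcha_solution,
        'captcha-id': parser.hidden.get('captcha-id', ''),
        'login': 'Login',
    }


if __name__ == '__main__':
    # Hypothetical markup shaped like the hidden field described above.
    sample = ('<form><input type="hidden" name="captcha-id" '
              'value="6rp40cbjzngdjuqogm3y6wns:en"/></form>')
    form = build_login_form(sample, 'user@example.com', 'secret', 'dress')
    print(form['captcha-id'])  # 6rp40cbjzngdjuqogm3y6wns:en
```

If the page contains no captcha field, `captcha-id` simply comes back as an empty string, so the caller can decide whether to prompt for a captcha at all.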

Second, you need to know a little about the requests library. requests removes a great deal of the hassle of urllib and urllib2 and saves a lot of redundant code. As its official website says, "Requests: HTTP for Humans", it really is made for humans = =

Official website: Requests
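To see what requests saves you, here is a sketch of the same kind of login POST built with only the standard library's `urllib`. Nothing is sent: the credentials are hypothetical placeholders and the URL is the login address used by the crawler below. With requests, the whole thing collapses into `requests.post(url, data=form, headers=headers)`, which form-encodes the dict for you, and a `requests.Session()` keeps cookies between calls.

```python
import urllib.parse
import urllib.request

# Hypothetical placeholder credentials; no request is actually sent here.
form = {'form_email': 'user@example.com', 'form_password': 'secret'}

# With urllib you encode the form body yourself...
body = urllib.parse.urlencode(form).encode('utf-8')

# ...and wrap URL, body and headers in a Request object.
req = urllib.request.Request(
    'http://accounts.douban.com/login',
    data=body,
    headers={'User-Agent': 'Mozilla/5.0'},
)

print(req.get_method())  # POST, because data is set
print(body)              # b'form_email=user%40example.com&form_password=secret'

# Actually sending it would still need urllib.request.urlopen(req), plus an
# opener built with http.cookiejar to keep the login cookies around.
```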

If you already know re and BS4, great, go straight to the code!

Otherwise, it's worth getting to know BS4 (BeautifulSoup) first; it will save you a lot of trouble. Documentation: BeautifulSoup
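As a quick taste of the `re` side, here is the kind of pattern the crawler uses to grab the captcha ID, run against a hypothetical snippet. The non-greedy `(.*?)` stops at the first closing quote, and `re.findall` returns a list of captured groups, so you take element `[0]` to get the ID itself.

```python
import re

# Hypothetical snippet shaped like the captcha part of the login page.
page = ('<img id="captcha_image" src="https://example.com/captcha.jpg"/>'
        '<input type="hidden" name="captcha-id" value="6rp40cbjzngdjuqogm3y6wns:en"/>')

# (.*?) is non-greedy: it captures everything up to the next double quote.
pattern = r'<input type="hidden" name="captcha-id" value="(.*?)"'
captcha_ids = re.findall(pattern, page)

print(captcha_ids)     # ['6rp40cbjzngdjuqogm3y6wns:en']
print(captcha_ids[0])  # 6rp40cbjzngdjuqogm3y6wns:en
```

A regex like this depends on the exact attribute order in the markup. With BS4 the same lookup is `soup.find('input', attrs={'name': 'captcha-id'})['value']`, which doesn't care about order or spacing.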

Talk is cheap, show me the code. It's showtime!

```python
# -*- coding: utf-8 -*-
#############################
__author__ = "Andrewseu"
__date__ = "2015/8/3"
#############################
import re

import requests
from bs4 import BeautifulSoup

# Fill in your own Douban account here.
username = 'your_email@example.com'
password = 'your_password'

login_url = 'http://accounts.douban.com/login'
redir_url = 'http://movie.douban.com/mine?status=collect'

form_data = {
    'source': 'movie',
    'redir': redir_url,
    'form_email': username,
    'form_password': password,
    'login': 'Login',
}
headers = {
    'User-Agent': ('Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 '
                   '(KHTML, like Gecko) Chrome/43.0.2357.134 Safari/537.36'),
}

# A Session keeps cookies across the first POST, the captcha download
# and the second POST.
session = requests.Session()
session.headers.update(headers)

r = session.post(login_url, data=form_data)
page = r.text

# Get the captcha image: use bs4 to find its address.
soup = BeautifulSoup(page, 'html.parser')
captcha_img = soup.find('img', id='captcha_image')
if captcha_img is not None:
    captcha_addr = captcha_img['src']
    # Use a regular expression to get the captcha ID
    # (re.findall returns a list, so take the first match).
    re_captcha_id = r'<input type="hidden" name="captcha-id" value="(.*?)"'
    captcha_id = re.findall(re_captcha_id, page)[0]

    # Save the captcha image locally, look at it, then type it in.
    with open('captcha.jpg', 'wb') as f:
        f.write(session.get(captcha_addr).content)
    captcha = input('Please input the captcha: ')

    form_data['captcha-solution'] = captcha
    form_data['captcha-id'] = captcha_id
    r = session.post(login_url, data=form_data)
    page = r.text

# Treat the login as successful if we ended up on the "redir" page.
if r.url == redir_url:
    print('Login successfully!!!')
    print("Movies I've seen:", '-' * 60)
    # Get the movies I've seen.
    soup = BeautifulSoup(page, 'html.parser')
    for item in soup.find_all('li', attrs={'class': 'title'}):
        print(item.find('a').get_text(strip=True))
else:
    print('failed!')
```
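If you want to try the last step (listing the titles) offline, without bs4 or a live Douban session, here is a minimal sketch using the standard library's `html.parser`. Like the crawler's `find_all('li', attrs={'class': 'title'})`, it assumes each watched movie appears as an `<a>` inside an `<li>` whose class list contains `title`. `TitleParser`, `extract_titles` and the sample markup with its placeholder titles are hypothetical.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the link text inside every <li class="title"> element."""

    def __init__(self):
        super().__init__()
        self.in_title_li = False
        self.in_link = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        # Like bs4, match when "title" is one of possibly several classes.
        classes = (dict(attrs).get('class') or '').split()
        if tag == 'li' and 'title' in classes:
            self.in_title_li = True
        elif tag == 'a' and self.in_title_li:
            self.in_link = True
            self.titles.append('')

    def handle_endtag(self, tag):
        if tag == 'a':
            self.in_link = False
        elif tag == 'li':
            self.in_title_li = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so append to the current title.
        if self.in_link:
            self.titles[-1] += data


def extract_titles(page_html):
    parser = TitleParser()
    parser.feed(page_html)
    return [title.strip() for title in parser.titles]


if __name__ == '__main__':
    # Hypothetical markup with placeholder titles.
    sample = ('<ul><li class="title"><a href="#">Movie A</a></li>'
              '<li class="intro">not a title</li>'
              '<li class="title"><a href="#"> Movie B </a></li></ul>')
    print(extract_titles(sample))  # ['Movie A', 'Movie B']
```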

If anything is unclear, feel free to get in touch with me!

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
