Python: Simulating a Browser Login

Source: Internet
Author: User
Tags: set cookie, urlencode

Translated from: http://blog.csdn.net/shomy_liu/article/details/37658701

The previous article roughly covered two simple cases of fetching a web page with Python; this one moves on to logging in, using Renren as the working example.

Let's first summarize the steps a login requires:

1. Add the cookie configuration

When a site requires an account and password, hitting the URL directly or naively imitating the browser generally gets you nowhere. The usual solution is Python's cookielib module, which remembers the cookies saved locally after a successful login;

The specific code is in the Renren login example below.
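Note that cookielib and urllib2 are Python 2 modules; in Python 3 they became http.cookiejar and urllib.request. A minimal sketch of the same cookie setup in Python 3 terms:

```python
import urllib.request
import http.cookiejar

# http.cookiejar is the Python 3 name for cookielib;
# urllib.request replaces urllib2.
jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))

# After install_opener, plain urllib.request.urlopen() calls also go
# through this opener, so cookies are stored and resent automatically.
urllib.request.install_opener(opener)
```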

2. Add the form data that the login needs to submit

The POST data for a login generally includes the username, password, and a number of other fields; whether the remaining fields are necessary has to be tested. To see them, use HttpFox or the Network tab of the browser's developer tools: when you click Login, the POST and GET data appear there, and you can copy what you need.

Below is the code that imitates logging in to Renren. The comments are fairly detailed, for later reference. (No actual crawling is done here; that is left for later study.)

# -*- coding: cp936 -*-
# renren login
# filename: renren.py

import urllib2, urllib, cookielib

# Set up the cookie jar and bind it to an opener
cookiejar = cookielib.CookieJar()
cookie = urllib2.HTTPCookieProcessor(cookiejar)
opener = urllib2.build_opener(cookie, urllib2.HTTPHandler())
urllib2.install_opener(opener)

# Account information
email = raw_input('input mailbox')
password = raw_input('input password')
domain = 'renren.com'  # domain name
url = 'http://www.renren.com/PLogin.do'  # found by inspecting the page elements

# Captured with HttpFox: headers and domain are optional, and postdata
# contains many fields, but username and password are the essential ones.
# The User-Agent header is there to deal with anti-crawler checks.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:30.0) Gecko/20100101 Firefox/30.0'
}
data = {
    'email': email,
    'password': password,
    'domain': domain
}

# Encode the form data
postdata = urllib.urlencode(data)

# Make the request
req = urllib2.Request(url, postdata, headers)

# Print the page source
print urllib2.urlopen(req).read()
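The urllib2 code above is Python 2 only. As a rough Python 3 port of the same flow (the email and password below are placeholders, and the Renren endpoint may no longer behave as it did when the post was written):

```python
import urllib.request
import urllib.parse
import http.cookiejar

def build_login_request(url, fields, user_agent):
    """Build a POST request carrying URL-encoded form data and a browser User-Agent."""
    postdata = urllib.parse.urlencode(fields).encode('utf-8')  # Request wants bytes in Python 3
    return urllib.request.Request(url, data=postdata, headers={'User-Agent': user_agent})

jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))

req = build_login_request(
    'http://www.renren.com/PLogin.do',
    {'email': 'me@example.com', 'password': 'secret', 'domain': 'renren.com'},  # placeholders
    'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:30.0) Gecko/20100101 Firefox/30.0',
)
# opener.open(req) would send the POST and let `jar` capture the session cookies.
```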

Translated from: http://zipperary.com/2013/08/16/python-login/

I have posted several crawler scripts on this blog; they are handy for downloading pictures in bulk, and that kind of crawler is easy to implement. But some sites require users to log in before files can be downloaded, and the earlier approach cannot handle that. Today I describe how to simulate a browser's login process with Python, so that downloads requiring a login become possible.

Compared with the earlier crawlers, the one extra module the login needs is cookielib, which remembers the cookies saved locally after a successful login, making it easy to move between the site's pages.
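The "saved locally" part can be made literal: cookielib's MozillaCookieJar (http.cookiejar.MozillaCookieJar in Python 3) can write cookies to a file and reload them on the next run. A sketch, with an arbitrary filename:

```python
import http.cookiejar
import urllib.request

# MozillaCookieJar can persist cookies in a Netscape-format text file.
jar = http.cookiejar.MozillaCookieJar('cookies.txt')
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))

# After a successful login through `opener`:
jar.save(ignore_discard=True)   # keep session cookies too
# ...and on a later run, restore them and skip the login step:
jar.load(ignore_discard=True)
```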

The first code example:

#encoding=utf8
import urllib
import urllib2
import cookielib

### URL of the login page
lgurl = 'http://mlook.mobi/member/login'

### Create a CookieJar with the cookielib module, then build a cookie handler with urllib2
cookie = cookielib.CookieJar()
cookie_handler = urllib2.HTTPCookieProcessor(cookie)

### Some sites block crawlers; these headers disguise the program as a browser
hds = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.72 Safari/537.36'}

### Form data the login needs to submit
pstdata = {'formhash': '',         # fill in the formhash
           'person[login]': '',    # fill in your username for the site
           'person[password]': '', # fill in your password for the site
           }

dt = urllib.urlencode(pstdata)  # encode the form data into the format URLs use
req = urllib2.Request(url=lgurl, data=dt, headers=hds)  # a browser-like request carrying the form data; nothing is sent yet, this only creates the object
opener = urllib2.build_opener(cookie_handler)  # bind the handler, creating a custom opener
response = opener.open(req)  # request the page; returns a handle
page = response.read()  # read and return the page content

print page  # print to the terminal

Explanation:

I will not provide a username and password here. To find the form data to submit, Chrome users can press F12 -> Network -> fill in the account and password and log in -> find the POST request in Network; see the screenshots.

Click "login" to enter the following image interface.


"from data" inside the data is more, usually need user name, password, the remaining data is necessary, need to test. For this website, also need "formhash".

There are no encoding problems under Linux; if encoding problems appear on Windows, the terminal's incomplete encoding support is the likely cause.

After a successful login, the cookie_handler we created manages the cookies automatically; if the program needs to visit other pages of the site later, just open their URLs with the opener.

"user-agent" can also be found by F12.

For a more detailed and better-written description, refer to the link in the original post.

This post is not meant to explain the underlying principles; the point is to record this simple block of code, which other crawlers that need to log in can imitate.

The purpose of this program is to download Mlook e-books in bulk.
