ProxyHandler Processor (Proxy Setup II)

HTTPPasswordMgrWithDefaultRealm()

The HTTPPasswordMgrWithDefaultRealm() class creates a password management object that holds the username and password associated with an HTTP request. It is mainly used in two scenarios: verifying the username and password for proxy authorization (ProxyBasicAuthHandler()), and verifying the username and password of a web client (HTTPBasicAuthHandler()).
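
As a quick illustration before the full examples below, here is a minimal sketch (not part of the original code; the proxy address, username, and password are placeholders) showing that the same password manager can feed either handler, with the realm argument left as None so the default realm is used:

import urllib2

# One password manager can be shared by both handlers; realm=None means the default realm
passwdmgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
passwdmgr.add_password(None, "proxy.example.com:16816", "someuser", "somepassword")

proxyauth_handler = urllib2.ProxyBasicAuthHandler(passwdmgr)  # proxy authorization
httpauth_handler = urllib2.HTTPBasicAuthHandler(passwdmgr)    # web client authorization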

ProxyBasicAuthHandler (Proxy Authorization Authentication)

If we use the earlier code with a private (authenticated) proxy, we get an HTTP 407 error, which means the proxy has not been authenticated:

urllib2.HTTPError: HTTP Error 407: Proxy Authentication Required

So we need to modify the code, using HTTPPasswordMgrWithDefaultRealm() to store the username and password for the private proxy, and ProxyBasicAuthHandler() to handle the proxy authentication.

# urllib2_proxy2.py

import urllib2
import urllib

# Account authorized for the private proxy
user = "Mr_mao_hacker"
# Password authorized for the private proxy
passwd = "sffqry9r"
# Private proxy IP
proxyserver = "61.158.163.130:16816"

# 1. Build a password management object to hold the username and password to be processed
passwdmgr = urllib2.HTTPPasswordMgrWithDefaultRealm()

# 2. Add the account information. The first parameter, realm, is the domain information related to the remote server; it is almost always set to None. The next three parameters are the proxy server, username, and password.
passwdmgr.add_password(None, proxyserver, user, passwd)

# 3. Build a ProxyBasicAuthHandler processor object for proxy basic username/password authentication; the parameter is the password management object created above
#    Note that the plain ProxyHandler class is no longer used here
proxyauth_handler = urllib2.ProxyBasicAuthHandler(passwdmgr)

# 4. Use build_opener() with this handler to create a custom opener object that includes the constructed proxyauth_handler
opener = urllib2.build_opener(proxyauth_handler)

# 5. Construct the Request
request = urllib2.Request("http://www.baidu.com/")

# 6. Use the custom opener to send the request
response = opener.open(request)

# 7. Print the response content
print response.read()
HTTPBasicAuthHandler Processor (Web Client Authorization Authentication)

Some web servers, including HTTP/FTP servers, require user authentication. If the crawler accesses them directly, it will get an HTTP 401 error, which indicates an unauthorized access status:

urllib2.HTTPError: HTTP Error 401: Unauthorized

If we have the client's username and password, we can access and crawl the page in the following way:

import urllib
import urllib2

# Username
user = "test"
# Password
passwd = "123456"
# Web server IP
webserver = "http://192.168.199.107"

# 1. Build a password management object to hold the username and password to be processed
passwdmgr = urllib2.HTTPPasswordMgrWithDefaultRealm()

# 2. Add the account information. The first parameter, realm, is the domain information related to the remote server; it is almost always set to None. The next three parameters are the web server, username, and password.
passwdmgr.add_password(None, webserver, user, passwd)

# 3. Build an HTTPBasicAuthHandler processor object for HTTP basic username/password authentication; the parameter is the password management object created above
httpauth_handler = urllib2.HTTPBasicAuthHandler(passwdmgr)

# 4. Use build_opener() with this handler to create a custom opener object that includes the constructed httpauth_handler
opener = urllib2.build_opener(httpauth_handler)

# 5. Optionally, use install_opener() to install this opener as the global opener
urllib2.install_opener(opener)

# 6. Build the Request object
request = urllib2.Request("http://192.168.199.107")

# 7. After the opener is installed as the global opener, urlopen() can be used to send requests directly
response = urllib2.urlopen(request)

# 8. Print the response content
print response.read()

Cookies

A cookie is a text file stored in the user's browser that certain web servers use to identify users and perform session tracking. Cookies can keep login information until the user's next session with the server.

Cookie Principle

HTTP is a stateless, connection-oriented protocol. To maintain connection state, the cookie mechanism was introduced. A cookie is an attribute in the HTTP message header and includes:

Cookie name (Name)
Cookie value (Value)
Cookie expiration time (Expires / Max-Age)
Cookie path (Path)
Cookie domain (Domain)
Whether the cookie requires a secure connection (Secure)

The first two parameters are required for a cookie to work. A cookie also has a size (Size); browsers differ in how many cookies they allow and in the size limits they impose.

A cookie is made up of a variable name and a value. According to the Netscape specification, the cookie format is as follows:

Set-Cookie: name=value; expires=date; path=path; domain=domain_name; secure
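
As a small illustration (an addition, not from the original text; the cookie name, value, domain, and date are made up), Python 2's standard Cookie module can build a cookie with these attributes and print it in the Set-Cookie format shown above:

import Cookie

c = Cookie.SimpleCookie()
c["name"] = "value"
c["name"]["expires"] = "Thu, 01-Jan-2037 00:00:00 GMT"
c["name"]["path"] = "/"
c["name"]["domain"] = ".example.com"
c["name"]["secure"] = True

# Prints something like:
# Set-Cookie: name=value; Domain=.example.com; expires=Thu, 01-Jan-2037 00:00:00 GMT; Path=/; secure
print c.output()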

Cookie Application

The most typical application of cookies in crawlers is to determine whether a registered user is already logged in to a website. The user may be asked whether to keep their login information for the next visit to the site, in order to simplify the login procedure.

# Get a cookie that carries login information to simulate a login

import urllib2

# 1. Build the headers of a user who is already logged in
headers = {
    "Host": "www.renren.com",
    "Connection": "keep-alive",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
    "Accept-Language": "zh-CN,zh;q=0.8,en;q=0.6",

    # Easier to read in the terminal; indicates that compressed responses are not accepted
    # "Accept-Encoding": "gzip, deflate, sdch",

    # Key point: this cookie belongs to a user who saved the password and does not need to log in again; it records the username and password (usually RSA encrypted)
    "Cookie": "anonymid=ixrna3fysufnwv; depovince=GW; _r01_=1; JSESSIONID=abcmadhedqilM7riy5iMv; jebe_key=f6fb270b-d06d-42e6-8b53-e67c3156aa7e%7cc13c37f53bca9e1e7132d4b58ce00fa3%7c1484060607478%7c1%7c1484060607173; jebecookies=26fb58d1-cbe7-4fc3-a4ad-592233d1b42e|||||; ick_login=1f2b895d-34c7-4a1d-afb7-d84666fad409; _de=BF09EE3A28DED52E6B65F6A4705D973F1383380866D39FF5; p=99e54330ba9f910b02e6b08058f780479; ap=327550029; first_login_flag=1; ln_uact=mr_mao_hacker@163.com; ln_hurl=http://hdn.xnimg.cn/photos/hdn521/20140529/1055/h_main_9a3z_e0c300019f6a195a.jpg; t=214ca9a28f70ca6aa0801404dda4f6789; societyguester=214ca9a28f70ca6aa0801404dda4f6789; id=327550029; xnsid=745033c5; ver=7.0; loginfrom=syshome"
}

# 2. Build the Request object from the header information (mainly the cookie) in headers
request = urllib2.Request("http://www.renren.com/", headers = headers)

# 3. Access the Renren home page directly; the server decides from the headers (mainly the cookie) that this is a logged-in user and returns the corresponding page
response = urllib2.urlopen(request)

# 4. Print the response content
print response.read()

But this is too cumbersome: we first have to log in to the account in a browser and choose to save the password, and then capture the packets to obtain the cookie. There is a more convenient way.

cookielib Library and HTTPCookieProcessor Processor

Cookie handling in Python is generally done by combining the cookielib module with the HTTPCookieProcessor processor class of the urllib2 module.

cookielib module: its main role is to provide objects for storing cookies.

HTTPCookieProcessor processor: its main role is to process these cookie objects and build handler objects.

cookielib Library

The main classes of this module are CookieJar, FileCookieJar, MozillaCookieJar, and LWPCookieJar.

CookieJar: manages HTTP cookie values, stores cookies generated by HTTP requests, and adds cookies to outgoing HTTP requests. The entire set of cookies is stored in memory, and the cookies are lost after the CookieJar instance is garbage collected.

FileCookieJar(filename, delayload=None, policy=None): derived from CookieJar, used to create FileCookieJar instances, retrieve cookie information, and store cookies in a file. filename is the name of the file where the cookies are stored. When delayload is True, deferred access to the file is supported: the file is read, or data is stored in it, only when needed.

MozillaCookieJar(filename, delayload=None, policy=None): derived from FileCookieJar, creates FileCookieJar instances compatible with the Mozilla browser's cookies.txt format.

LWPCookieJar(filename, delayload=None, policy=None): derived from FileCookieJar, creates FileCookieJar instances compatible with the libwww-perl Set-Cookie3 file format.

In fact, in most cases we only use CookieJar(); if we need to interact with local files, we use MozillaCookieJar() or LWPCookieJar().
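
The cases below use CookieJar() and MozillaCookieJar(); for completeness, here is a minimal sketch (an addition, not from the original; the file name is arbitrary) of the LWPCookieJar() variant, which writes the file in the Set-Cookie3 format instead:

import cookielib
import urllib2

# Save cookies in libwww-perl Set-Cookie3 format instead of Mozilla's cookies.txt format
cookiejar = cookielib.LWPCookieJar('lwp_cookie.txt')
handler = urllib2.HTTPCookieProcessor(cookiejar)
opener = urllib2.build_opener(handler)

opener.open("http://www.baidu.com")

# ignore_discard/ignore_expires also keep session cookies and expired cookies
cookiejar.save(ignore_discard=True, ignore_expires=True)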

Let's go through a few cases:

1. Get cookies and save them to a CookieJar() object

# urllib2_cookielibtest1.py

import urllib2
import cookielib

# Build a CookieJar object instance to hold the cookies
cookiejar = cookielib.CookieJar()

# Use HTTPCookieProcessor() to create a cookie processor object, with the CookieJar() object as the parameter
handler = urllib2.HTTPCookieProcessor(cookiejar)

# Build the opener through build_opener()
opener = urllib2.build_opener(handler)

# 4. Access the page with a GET request; afterwards the cookies are automatically saved into cookiejar
opener.open("http://www.baidu.com")

## The saved cookies can be printed in the standard format
cookieStr = ""
for item in cookiejar:
    cookieStr = cookieStr + item.name + "=" + item.value + ";"

## Drop the trailing semicolon
print cookieStr[:-1]

We use the above method to save the cookies into the CookieJar object and then print out the cookie values, that is, the cookies obtained by visiting the Baidu home page.

The result of running it is as follows:

BAIDUID=4327a58e63a92b73ff7a297fb3b2b4d0:FG=1; BIDUPSID=4327a58e63a92b73ff7a297fb3b2b4d0; H_PS_PSSID=1429_21115_17001_21454_21409_21554_21398; PSTM=1480815736; BDSVRTM=0; BD_HOME=0
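
Each item in the CookieJar is a cookielib.Cookie object, so besides name and value it also exposes the attributes described in the Cookie Principle section. Continuing from the example above (an addition, not from the original):

# Inspect the other attributes of each cookie in the jar
for item in cookiejar:
    print item.name, item.value, item.domain, item.path, item.expires, item.secure
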
2. Visit the website to get the cookies and save them to a cookie file
# urllib2_cookielibtest2.py

import cookielib
import urllib2

# Name of the local disk file that the cookies are saved to
filename = 'cookie.txt'

# Declare a MozillaCookieJar (which implements save) object instance to hold the cookies, which are then written to the file
cookiejar = cookielib.MozillaCookieJar(filename)

# Use HTTPCookieProcessor() to create a cookie processor object, with the CookieJar() object as the parameter
handler = urllib2.HTTPCookieProcessor(cookiejar)

# Build the opener through build_opener()
opener = urllib2.build_opener(handler)

# Create a request; the principle is the same as urllib2's urlopen
response = opener.open("http://www.baidu.com")

# Save the cookies to the local file
cookiejar.save()
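
One caveat worth noting (an addition, not from the original tutorial): by default save() skips session cookies and already-expired cookies, so the resulting file can be smaller than expected. The optional flags below keep them as well:

# Also write session cookies (marked discard) and expired cookies to the file
cookiejar.save(ignore_discard=True, ignore_expires=True)
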
3. Get the cookies from the file and use them as part of the request to visit a website
# urllib2_cookielibtest2.py

import cookielib
import urllib2

# Create a MozillaCookieJar (which implements load) instance object
cookiejar = cookielib.MozillaCookieJar()

# Read the cookie contents from the file into the variable
cookiejar.load('cookie.txt')

# Use HTTPCookieProcessor() to create a cookie processor object, with the CookieJar() object as the parameter
handler = urllib2.HTTPCookieProcessor(cookiejar)

# Build the opener through build_opener()
opener = urllib2.build_opener(handler)

response = opener.open("http://www.baidu.com")

Log in to Renren using cookielib and POST
import urllib
import urllib2
import cookielib

# 1. Build a CookieJar object instance to hold the cookies
cookie = cookielib.CookieJar()

# 2. Use HTTPCookieProcessor() to create a cookie processor object, with the CookieJar() object as the parameter
cookie_handler = urllib2.HTTPCookieProcessor(cookie)

# 3. Build the opener through build_opener()
opener = urllib2.build_opener(cookie_handler)

# 4. addheaders accepts a list in which each element is a tuple of header information; the opener will attach these headers to its requests
opener.addheaders = [("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36")]

# 5. The account and password needed to log in
data = {"email": "mr_mao_hacker@163.com", "password": "alaxxxxxime"}

# 6. Transcode via urlencode()
postdata = urllib.urlencode(data)

# 7. Build the Request object, containing the username and password to be sent
request = urllib2.Request("http://www.renren.com/PLogin.do", data = postdata)

# 8. Send this request through the opener and get the cookie values after logging in
opener.open(request)

# 9. The opener now contains the user's login cookie values, so it can directly access pages that can only be visited after logging in
response = opener.open("http://www.renren.com/410043129/profile")

# 10. Print the response content
print response.read()
