Using a proxy to access a server from Python takes three main steps:
1. Create a proxy handler with ProxyHandler: proxy_support = urllib.request.ProxyHandler(). ProxyHandler is a class whose argument is a dictionary of the form {'type': 'proxy IP:port number'}.
What is a handler? A handler, also called a processor, knows how to open URLs through a specific protocol, or how to handle some aspect of URL opening, such as HTTP redirection or HTTP cookies.
2. Customize and create an opener: opener...
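The steps above can be sketched in Python 3 as follows. This is a minimal sketch, not the original article's code: the proxy address is a hypothetical placeholder, and because the third step is cut off in the text, installing the opener with install_opener is an assumption about what it was.

```python
import urllib.request

# Hypothetical proxy address, used only for illustration.
PROXY = {"http": "http://127.0.0.1:8080"}

# Step 1: create the proxy handler from a {scheme: proxy URL} dictionary.
proxy_support = urllib.request.ProxyHandler(PROXY)

# Step 2: build a custom opener that routes requests through the handler.
opener = urllib.request.build_opener(proxy_support)

# Step 3 (assumed): install the opener so urllib.request.urlopen() uses it.
urllib.request.install_opener(opener)

# With a working proxy, a request would then look like this:
# html = urllib.request.urlopen("http://example.com/").read()
```

If installing a global opener is not wanted, calling opener.open(url) directly has the same effect for a single request.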
The result of the operation is the same as with the previous method.
IV. Use of IP proxies
1. Why use an IP proxy
The User-Agent has been set, but there is another problem to consider: the program runs fast. If we use a crawler to fetch content from a site, the request frequency from a single fixed IP will be very high. This does not look like human behavior, because a person cannot visit a site that often within a few milliseconds. So some sites set a threshold for IP acce...
There are three main ways to access a web page using Python: urllib, urllib2, and httplib. urllib is simple but relatively weak in functionality; httplib is simple and powerful, but does not support sessions.
1. The simplest page access (get the server-side response)
res = urllib2.urlopen(url)
print res.read()
2. Add data for GET or POST
data = {"name": "Hank", "passwd": "HJZ"}
urllib2.urlopen(url, urllib.urlencode(data))
3. Add an HTTP header
header = {"user-agent": "mozilla-firefox5.0"}
urllib2.urlopen(url, urllib...
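The urllib2 calls above are Python 2. As a rough Python 3 counterpart, the sketch below builds a request that carries both form data and a custom header, without sending it. The URL and the User-Agent value are placeholders chosen for illustration, not values from the original article.

```python
import urllib.parse
import urllib.request

url = "http://example.com/login"          # placeholder URL
data = {"name": "Hank", "passwd": "HJZ"}  # form fields from the text above
headers = {"User-Agent": "Mozilla/5.0"}   # placeholder header value

# In Python 3 the POST body must be bytes, so encode the urlencoded string.
body = urllib.parse.urlencode(data).encode("utf-8")

# Passing data makes this a POST request; omit it for a plain GET.
req = urllib.request.Request(url, data=body, headers=headers)

print(req.get_method())
print(body)

# Sending the request would be:
# with urllib.request.urlopen(req) as res:
#     print(res.read())
```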
After the brief introduction to urllib2 above, the following collects some details of using urllib2.
1. Proxy settings
By default, urllib2 uses the http_proxy environment variable to set the HTTP proxy. If you want to control the proxy explicitly in your program, unaffected by environment variables, you can set the proxy in code. Create a new file test14 to implement a simple proxy demo:
The code is as follows:
import urllib2
enable_proxy = True
proxy_handler = urllib2...
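Since the demo is cut off, here is a hedged Python 3 sketch of the same idea: a switch that either routes requests through an explicit proxy or disables proxying altogether. The function name build_proxy_opener and the proxy address are hypothetical. Passing an empty dictionary to ProxyHandler is the documented way to turn off proxies detected from the environment.

```python
import urllib.request


def build_proxy_opener(enable_proxy, proxy_url="http://127.0.0.1:8080"):
    """Return an opener that uses proxy_url, or no proxy at all.

    proxy_url is a hypothetical address used only for illustration.
    """
    if enable_proxy:
        handler = urllib.request.ProxyHandler({"http": proxy_url})
    else:
        # An empty dict overrides any proxy set via environment variables.
        handler = urllib.request.ProxyHandler({})
    return urllib.request.build_opener(handler)


opener = build_proxy_opener(enable_proxy=True)
# urllib.request.install_opener(opener) would make urlopen() use it globally.
```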
://www.111cn.net/")
try:
    response = urlopen(req)
except URLError as e:
    if hasattr(e, 'reason'):
        print('We failed to reach a server.')
        print('Reason:', e.reason)
    elif hasattr(e, 'code'):
        print("The server couldn't fulfill the request.")
        print('Error code:', e.code)
else:
    print("good!")
    print(response.read().decode("utf8"))
8. HTTP Authentication
#!/usr/bin/env python3
import urllib.request
# Create a password manager
password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
# Add the username and password.
# If w...
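The error-handling pattern above can also be written with explicit exception types. In Python 3, HTTPError is a subclass of URLError and also exposes a reason attribute, so a check on 'reason' alone would match HTTP errors too; the sketch below therefore tests for HTTPError first. The helper names describe_error and fetch are hypothetical.

```python
import urllib.error
import urllib.request


def describe_error(exc):
    """Turn a urllib exception into a short message."""
    # HTTPError must be checked first because it subclasses URLError.
    if isinstance(exc, urllib.error.HTTPError):
        return "The server couldn't fulfill the request. Error code: %d" % exc.code
    if isinstance(exc, urllib.error.URLError):
        return "We failed to reach a server. Reason: %s" % exc.reason
    raise TypeError("not a urllib error: %r" % (exc,))


def fetch(url, timeout=10):
    """Return the decoded page body, or None if the request failed."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf8")
    except urllib.error.URLError as exc:
        print(describe_error(exc))
        return None
```

Keeping the message logic in its own function makes it possible to check it without touching the network.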
disconnected if the cookie is lost.
Set up cookie persistence in Python.
# Cookie settings
# used to keep the session
cj = cookielib.LWPCookieJar()
cookie_support = urllib2.HTTPCookieProcessor(cj)
opener = urllib2.build_opener(cookie_support, urllib2.HTTPHandler)
urllib2.install_opener(opener)
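For Python 3, where cookielib became http.cookiejar, a comparable setup might look like the sketch below. The function name make_cookie_opener and the idea of reloading cookies from a file on startup are illustrative assumptions, not part of the original snippet.

```python
import http.cookiejar
import os
import urllib.request


def make_cookie_opener(cookie_file):
    """Build an opener that keeps session cookies in an LWP-format file."""
    jar = http.cookiejar.LWPCookieJar(cookie_file)
    if os.path.exists(cookie_file):
        # Reuse cookies saved by a previous run, including session cookies.
        jar.load(ignore_discard=True)
    processor = urllib.request.HTTPCookieProcessor(jar)
    opener = urllib.request.build_opener(processor)
    return opener, jar


# Typical use (the file name is a placeholder):
# opener, jar = make_cookie_opener("cookies.txt")
# opener.open("http://example.com/login")
# jar.save(ignore_discard=True)  # write cookies back to disk
```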
The following is a library file that summarizes the above knowledge points for ease of use:
#
website login page, including the login URL, the POST request data, and the HTTP headers. Use urllib2.urlopen to send the request and receive the response from the web server. First, check the source code of the login page.
When urllib is used to process a URL, it actually works through a urllib2.OpenerDirector instance, which calls on resources for various operations, such as handling protocols, opening URLs, and processing cookies. The urlopen method uses the default ope...
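To make the role of OpenerDirector more concrete, the short Python 3 sketch below builds a default opener and lists the handlers it chains together. It only inspects objects and sends no request; the exact set of handlers can vary between Python versions.

```python
import urllib.request

# build_opener() with no arguments returns an OpenerDirector
# preloaded with the default handlers.
opener = urllib.request.build_opener()
print(type(opener).__name__)

# Each handler covers one protocol or one aspect of URL opening.
handler_names = [type(h).__name__ for h in opener.handlers]
for name in handler_names:
    print(name)
```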
cookie persistence in Python.
# Cookie settings
# used to keep the session
cj = cookielib.LWPCookieJar()
cookie_support = urllib2.HTTPCookieProcessor(cj)
opener = urllib2.build_opener(cookie_support, urllib2.HTTPHandler)
urllib2.install_opener(opener)
The following is a library file that summarizes the above knowledge points for ease of use:
# Filename: analogop.py
#!/usr/bin/pyth...
URLError as e:
    if hasattr(e, 'reason'):
        print('We failed to reach a server.')
        print('Reason:', e.reason)
    elif hasattr(e, 'code'):
        print("The server couldn't fulfill the request.")
        print('Error code:', e.code)
else:
    print("good!")
    print(response.read().decode("utf8"))
8. HTTP Authentication
#!/usr/bin/env python3
import urllib.request
# Create a password manager
password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
# Add the username and password.
# If we knew the realm, we c...
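The authentication example is cut off, so the following is a hedged sketch of how such a setup is commonly completed with the standard library. The URL, user name, and password are placeholders, and nothing is sent over the network.

```python
import urllib.request

# Placeholder values for illustration only.
top_level_url = "http://example.com/"
username = "username"
password = "password"

# Create a password manager. Passing None as the realm means the
# credentials apply to any realm under top_level_url.
password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, top_level_url, username, password)

# The handler answers HTTP 401 challenges using the stored credentials.
auth_handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
opener = urllib.request.build_opener(auth_handler)

# opener.open(top_level_url + "protected/") would retry with the
# credentials if the server replies with 401.
```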
This variable always refers to the top-level browser window that contains the split windows (frames). If you plan to execute commands from the highest level of the split windows, you can use the top variable.
parent
This variable refers to the parent window that contains the current split window. If one window contains a split window, and that split window itself contains another split window, the second-level split window can refer to the split window that contains it with the parent variable.
Using proxy IPs is the second most common technique in the crawler/anti-crawler contest, and usually the most useful one.
Many sites track how many times an IP visits within a certain period (through traffic statistics, system logs, and so on). If the number of visits does not look like a normal person's, the site blocks that IP.
So we can set up several proxy servers and switch proxies every so often; even if one IP is blocked, we can change to another IP and keep crawling.
urllib2 uses proxy servers through ProxyHandler,
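The proxy-switching idea described above might be sketched in Python 3 like this. The addresses in PROXY_POOL are made-up placeholders, and picking one at random per request is only one possible rotation policy; a real crawler would also need to drop proxies that stop working.

```python
import random
import urllib.request

# Hypothetical proxy addresses; replace with proxies you are allowed to use.
PROXY_POOL = [
    "http://10.0.0.1:8080",
    "http://10.0.0.2:8080",
    "http://10.0.0.3:8080",
]


def opener_with_random_proxy(pool=PROXY_POOL):
    """Pick one proxy from the pool and return (proxy, opener)."""
    proxy = random.choice(pool)
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    return proxy, urllib.request.build_opener(handler)


proxy, opener = opener_with_random_proxy()
print("Using proxy:", proxy)
# opener.open("http://example.com/") would send the request through it.
```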
http://blog.csdn.net/pleasecallmewhy/article/details/8925978