ProxyHandler Processor (Proxy setup, part one)

Source: Internet
Author: User

Using proxy IPs is the second most common trick in the crawler/anti-crawler arms race, and usually one of the most effective.

Many sites monitor the number of visits from a given IP over a period of time (through traffic statistics, system logs, etc.); if the visit pattern does not look like a normal human user, the site will ban that IP.

So we can set up several proxy servers and switch proxies from time to time; even if one IP gets banned, we can change to another IP and continue crawling.

urllib2 uses a proxy server through ProxyHandler. The following code shows how to route requests through a proxy using a custom opener:

# urllib2_proxy1.py

import urllib2

# Construct two proxy handlers, one with a proxy IP and one without
httpproxy_handler = urllib2.ProxyHandler({"http": "124.88.67.81:80"})
nullproxy_handler = urllib2.ProxyHandler({})

proxyswitch = True  # define a proxy switch

# Pass the handler objects to urllib2.build_opener() to create a custom opener object
# Depending on whether the proxy switch is on, use a different proxy mode
if proxyswitch:
    opener = urllib2.build_opener(httpproxy_handler)
else:
    opener = urllib2.build_opener(nullproxy_handler)

request = urllib2.Request("http://www.baidu.com/")

# 1. Written this way, the custom proxy is used only when the request is sent
#    with opener.open(); urlopen() does not use the custom proxy.
response = opener.open(request)

# 2. Written this way, the opener is installed globally, and from then on every
#    request, whether sent via opener.open() or urlopen(), uses the custom proxy.
# urllib2.install_opener(opener)
# response = urllib2.urlopen(request)

print response.read()
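The code above targets Python 2's urllib2. In Python 3 that module was merged into urllib.request, which keeps ProxyHandler and build_opener with the same shape; a minimal sketch of the same switch logic (no request is actually sent here, and the proxy IP is the same placeholder as above):

```python
import urllib.request

# Construct two proxy handlers, one with a proxy IP and one without
httpproxy_handler = urllib.request.ProxyHandler({"http": "http://124.88.67.81:80"})
nullproxy_handler = urllib.request.ProxyHandler({})

proxyswitch = True  # proxy switch

if proxyswitch:
    opener = urllib.request.build_opener(httpproxy_handler)
else:
    opener = urllib.request.build_opener(nullproxy_handler)

# opener.open("http://www.baidu.com/") would send the request through the proxy;
# urllib.request.install_opener(opener) would make the opener the global default.
print(httpproxy_handler.proxies)
```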

Free open proxies cost basically nothing. We can collect them from proxy-listing websites, test them, and keep the ones that work for use in our crawlers.
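A simple way to test a collected free proxy is to send one request through it with a short timeout and see whether it succeeds. The helper name `check_proxy`, the test URL, and the timeout value below are assumptions for illustration; a hedged Python 3 sketch:

```python
import urllib.request

def check_proxy(proxy, timeout=3):
    """Return True if a test request through `proxy` succeeds within `timeout`
    seconds. `proxy` is a dict like {"http": "http://1.2.3.4:80"}."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler(proxy))
    try:
        # Any stable page works as a test URL
        response = opener.open("http://www.baidu.com/", timeout=timeout)
        return response.getcode() == 200
    except Exception:
        # Dead, slow, or misbehaving proxies all end up here
        return False
```

Proxies that pass this check can then be appended to a list like the `proxy_list` used below.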

If you have enough proxy IPs, you can randomly select one for each visit to the site, just as you would randomly pick a User-Agent:

import urllib2
import random

proxy_list = [
    {"http": "124.88.67.81:80"},
    {"http": "124.88.67.81:80"},
    {"http": "124.88.67.81:80"},
    {"http": "124.88.67.81:80"},
    {"http": "124.88.67.81:80"}
]

# Randomly select a proxy
proxy = random.choice(proxy_list)
# Build a proxy handler object with the selected proxy
httpproxy_handler = urllib2.ProxyHandler(proxy)

opener = urllib2.build_opener(httpproxy_handler)

request = urllib2.Request("http://www.baidu.com/")
response = opener.open(request)
print response.read()

However, these free open proxies are generally shared by many users, and they tend to be short-lived, slow, weakly anonymous, and unstable in their HTTP/HTTPS support (free rarely means good).

Therefore, professional crawler engineers and crawler companies use high-quality private proxies. These usually have to be purchased from a dedicated proxy vendor and are then used with username/password authorization (you cannot catch the wolf without sacrificing the bait).
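With urllib2, and equally with urllib.request in Python 3, a username/password-authorized private proxy can be used by embedding the credentials in the proxy URL passed to ProxyHandler. The host, port, and credentials below are placeholders, not a real proxy; a minimal Python 3 sketch:

```python
import urllib.request

# Placeholder credentials and proxy address
authproxy_handler = urllib.request.ProxyHandler(
    {"http": "http://username:password@124.88.67.81:80"}
)
opener = urllib.request.build_opener(authproxy_handler)

# opener.open("http://www.baidu.com/") would authenticate against the private
# proxy; urllib.request.ProxyBasicAuthHandler is an alternative way to supply
# the credentials separately from the proxy URL.
print(authproxy_handler.proxies["http"])
```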
