urllib2 provides a wide range of methods for working with URL-based resources. You can use handlers to implement various functions, such as automatic redirects based on HTTP status codes, and cookie parsing and collection (status-code-based redirects are also implemented in urllib.FancyURLopener).
The step-by-step code is as follows:
import urllib2
import cookielib

cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
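A short usage sketch for the opener above (the URL is a placeholder); any cookies the server sets are collected in cj:

r = opener.open("http://example.com/")   # placeholder URL
for cookie in cj:
    print cookie.name, cookie.value      # inspect what the server set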
Using urllib2 to write crawlers in Python
The usage details of urllib2 are sorted out below.
1. Proxy Settings
By default, urllib2 uses the environment variable http_proxy to set its HTTP proxy.
If you want to control the proxy explicitly in your program, unaffected by environment variables, you can use ProxyHandler.
Create test14 to implement a simple proxy demo:
import urllib2

enable_proxy = True
proxy_handler = urllib2.ProxyHandler({"http": 'http://some-proxy.com:8080'})
null_proxy_handler = urllib2.ProxyHandler({})

if enable_proxy:
    opener = urllib2.build_opener(proxy_handler)
else:
    opener = urllib2.build_opener(null_proxy_handler)

urllib2.install_opener(opener)
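One design note: install_opener makes this opener the global default for urllib2.urlopen, which affects all later requests. If that is too invasive, skip install_opener and call the opener directly (a minimal sketch, with a placeholder URL):

response = opener.open('http://example.com/')
print response.read()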
Reprinted from Tao Road | Usage details of the Python standard library urllib2.
There are a number of useful utility classes in the Python standard library, but when you actually use them, the standard library documentation does not describe the details of their usage; urllib2, the HTTP client library, is one example. Here is a summary of some of urllib2's usage details.
1. Proxy settings
2. Timeout settings
3. Adding a specific Header to the HTTP Request
4. Redirect
5. Cookie
6. Using HTTP's PUT and DELETE methods
7. Getting the HTTP return code
8. Debug Log
Take fetching the contents of the first page as an example to detail the use of cookies. The following is the example given in the documentation; we will modify this example to achieve the functionality we want.
import cookielib
import urllib2

cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
r = opener.open("http://example.com/")
# coding: utf-8
import urllib2
import urllib
import cookielib

url = r'http://www.renren.com/ajaxL
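The modified example breaks off here (the URL itself is truncated in the original). A minimal sketch of the usual pattern it is heading toward, posting login form data through a cookie-aware opener; the form field names and values below are hypothetical, not from the original:

cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
# hypothetical form fields; the real names depend on the site's login form
data = urllib.urlencode({'email': 'user@example.com', 'password': 'secret'})
response = opener.open(url, data)  # supplying data makes this a POST
print response.getcode()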
RmDir: delete a folder
Sub dfksdlf()
    For i = 1 To 20 Step 2
        RmDir "C:\Users\McDelfino\Desktop\exercises\Folder-" & Format(i, "00")
    Next
End Sub
Kill: delete a file
Sub dfksdlf()
    Kill "C:\Users\McDelfino\Desktop\exercises\1.txt"
End Sub
This deletes the 1.txt file.
Sub dfksdlf()
    Kill "C:\Users\McDelfino\Desktop\exercises\*.*"
End Sub
This deletes all files, whatever their extension.
FileCopy: copy a file
Copy a file and change its file name at the same time.
1. Opener and handler concepts of urllib2
1.1 Openers: When you fetch a URL, you use an opener (an instance of urllib2.OpenerDirector). Normally we use the default opener, via urlopen, but you can create custom openers. You can use build_opener to create opener objects.
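A minimal sketch of building a custom opener from several handlers (the particular handler mix here is illustrative):

import urllib2
import cookielib

# build_opener accepts any number of handlers and returns an OpenerDirector
cookie_handler = urllib2.HTTPCookieProcessor(cookielib.CookieJar())
proxy_handler = urllib2.ProxyHandler({})  # empty dict: no proxy
opener = urllib2.build_opener(cookie_handler, proxy_handler)

# use it directly, or install it as the default for urllib2.urlopen
response = opener.open('http://example.com/')
urllib2.install_opener(opener)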
opener and parent (JavaScript): parent indicates the parent window. For example, if page A uses an iframe or frame to embed page B, the window containing page A is page B's parent. The following describes the usage in detail; refer to it if you are interested.
The code is as follows:
$ ("# SaveInfo"). show ();SetTimeout ('$ ("# saveInfo"). hide ();', 3000 );If (opener ! Opener
an iframe, the return value of window.open, or one of window.frames; msg is the message to be sent, a string; targetOrigin restricts the URI of the receiver window, including the primary domain name and port. "*" means no restriction, but to ensure security you should still set it, to prevent messages from being sent to malicious websites. If targetOrigin does not match the receiver window's URI, the message is discarded. B. The receiver obtains the message
geturl(): the real URL that was obtained; because urlopen (or the opener object) may follow redirects, the URL you get may differ from the request URL. info(): returns a dictionary-like object that describes the page obtained, typically the specific headers the server sent; it is an httplib.HTTPMessage instance.
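A short sketch of both calls (the URL is a placeholder):

import urllib2

response = urllib2.urlopen('http://example.com/')
print response.geturl()  # final URL, after any redirects
print response.info()    # httplib.HTTPMessage holding the response headers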
Python uses a proxy to access a server in 3 main steps, as shown in the sketch below:
1. Create a proxy handler: proxy_support = urllib.request.ProxyHandler(...). ProxyHandler is a class whose argument is a dictionary: {'type': 'proxy IP:port number'}. What is a handler? A handler, also called a processor, knows how to open URLs through a specific protocol, or how to handle various aspects of opening URLs, such as HTTP redirection or HTTP cookies.
2. Customize and create an opener: opener = urllib.request.build_opener(proxy_support).
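A sketch of the steps with urllib.request (Python 3); the proxy address is a placeholder, and install_opener as the third step is an assumption based on standard usage, since the original text is cut off:

import urllib.request

# Step 1: create the proxy handler ({protocol: 'proxy IP:port'})
proxy_support = urllib.request.ProxyHandler({'http': '127.0.0.1:8080'})  # placeholder address

# Step 2: build a custom opener from the handler
opener = urllib.request.build_opener(proxy_support)

# Step 3 (assumed): install it so urllib.request.urlopen uses the proxy
urllib.request.install_opener(opener)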
The result of the operation is the same as with the previous method.
IV. Using IP proxies
1. Why use an IP proxy
The User-Agent has been set up, but there is another problem to consider: the program runs fast. If we use a crawler to fetch content from a site, the access rate from a fixed IP will be very high. That does not match the pattern of human operation, because a human cannot make such frequent requests within a few milliseconds. So some sites set a threshold for IP access frequency.
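A common way around such per-IP thresholds is to rotate among several proxies. A minimal sketch with urllib.request (the proxy addresses are placeholders and would need to be live proxies):

import random
import urllib.request

proxies = ['121.1.1.1:8080', '121.2.2.2:8080']  # placeholder proxy list

# pick a proxy at random for each session, so requests are spread across IPs
proxy_support = urllib.request.ProxyHandler({'http': random.choice(proxies)})
opener = urllib.request.build_opener(proxy_support)
urllib.request.install_opener(opener)
response = urllib.request.urlopen('http://example.com/')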