Baptism of the soul, practicing Python (58) - crawler - [reprinted] the urllib3 module
urllib3
1. Overview
Compared with urllib and urllib2, urllib3 adds a number of new features. The module is unusual in that it is available for both Python 2 and Python 3, but to be honest, it is rarely used.
2. Methods/attributes
3. Common methods/attributes explained
Because the module is seldom used, there is very little material about it, and I rarely use it myself. If you are looking for urllib and urllib2, you should use the urllib package in Python 3 directly, or the third-party module requests. In fact, requests is the reason urllib3 is rarely used directly: almost everything urllib3 offers is also available through requests. The features of urllib3 are quite practical, but few people use it on its own. Still, the usage of the urllib3 module deserves to be covered. Fortunately, I found a post on Blog Garden that covers the urllib3 module in reasonable detail, so I am reprinting it directly. If you are interested, take a look.
Author: Victor. Original article link: Portal (if the original author has any objection, please contact me and I will delete this immediately). Details:
urllib3 is a powerful, clearly designed HTTP client library for Python. Much of the Python ecosystem already uses urllib3. It provides many important features that are missing from the Python standard library:
1. Thread safety
2. Connection pooling
3. Client-side SSL/TLS verification
4. File uploads with multipart encoding
5. Helpers for retrying requests and handling HTTP redirects
6. Support for compressed encodings
7. Support for HTTP and SOCKS proxies
8. 100% test coverage
urllib3 is very powerful, yet very simple to use.
Installation:
urllib3 can be installed with pip:
$ pip install urllib3
You can also download the latest source code from GitHub, unpack it, and install it:
$ git clone git://github.com/shazow/urllib3.git
$ python setup.py install
Using urllib3:
Making requests:
First, import the urllib3 module.
Then you need a PoolManager instance to make requests. This object handles connection pooling and all the thread-safety details, so you do not have to deal with them yourself.
Make a request with the request() method.
The request() method returns an HTTPResponse object.
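The steps above can be sketched as follows. To keep the sketch runnable without internet access, it talks to a throwaway local server built with the standard library; the handler class, its response body, and the URL are assumptions made for illustration, not part of urllib3.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import urllib3


class RobotsHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in server so the example runs offline."""

    def do_GET(self):
        body = b"User-agent: *\nDisallow: /deny\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the console quiet


# Port 0 lets the operating system pick a free port.
server = HTTPServer(("127.0.0.1", 0), RobotsHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/robots.txt" % server.server_port

# One PoolManager is normally shared by the whole application.
http = urllib3.PoolManager()

# request() returns an HTTPResponse object.
r = http.request("GET", url)
print(r.status)                   # HTTP status code
print(r.headers["Content-Type"])  # response headers, accessed like a dict
print(r.data)                     # response body as bytes

server.shutdown()
server.server_close()
```

In real code you would drop the local server and pass the URL of the site you want to fetch.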
You can also pass additional information to the request through the request() method.
The data that can accompany a request includes:
Headers:
In the request() method, you can pass a dictionary as the headers parameter.
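A minimal sketch of passing custom headers. The function name get_with_headers, the header X-Something, and the URL in the comment are placeholders chosen for illustration.

```python
import urllib3


def get_with_headers(http, url):
    """Send a GET request that carries one custom header."""
    # headers is a plain dict mapping header names to values.
    return http.request("GET", url, headers={"X-Something": "value"})


# Creating the PoolManager does not open any connection yet.
http = urllib3.PoolManager()

# With network access (placeholder URL):
# r = get_with_headers(http, "http://example.com/")
# print(r.status)
```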
Query parameters:
For GET, HEAD, and DELETE requests, you can simply pass the query parameters as a dictionary in the fields parameter.
For POST and PUT requests, you need to encode the query parameters yourself and append them to the URL.
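The two cases can be sketched like this. The helper names and the URLs are hypothetical; urlencode comes from the standard library and does the manual encoding for the POST case.

```python
from urllib.parse import urlencode

import urllib3


def get_with_query(http, url, params):
    """GET: urllib3 encodes `fields` into the URL query string."""
    return http.request("GET", url, fields=params)


def post_with_query(http, url, params):
    """POST: encode the parameters yourself and append them to the URL."""
    return http.request("POST", url + "?" + urlencode(params))


# What the manually encoded query string looks like.
print(urlencode({"arg": "value"}))

http = urllib3.PoolManager()

# With network access (placeholder URLs):
# get_with_query(http, "http://example.com/get", {"arg": "value"})
# post_with_query(http, "http://example.com/post", {"arg": "value"})
```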
Form data:
For PUT and POST requests, urllib3 automatically form-encodes the dictionary passed in the fields parameter.
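A minimal sketch of sending form data. As far as I know, request() applies multipart/form-data encoding to fields on POST and PUT by default, using the encode_multipart_formdata helper; the sketch calls that helper directly only to show the encoded body without sending anything. The function name post_form and the URL are placeholders.

```python
import urllib3
from urllib3.filepost import encode_multipart_formdata


def post_form(http, url, form):
    """POST a dict of form fields; urllib3 encodes them into the body."""
    return http.request("POST", url, fields=form)


# Inspect the encoding offline: the helper returns the body and the
# matching Content-Type header value (with a random boundary).
body, content_type = encode_multipart_formdata({"field": "value"})
print(content_type)
print(body.decode("utf-8"))

http = urllib3.PoolManager()

# With network access (placeholder URL):
# r = post_form(http, "http://example.com/post", {"field": "value"})
```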
JSON:
When making a request, you can send encoded JSON data by passing it as the body parameter and setting the Content-Type header in headers.
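A sketch of the JSON case, assuming the payload is a plain dictionary. The function name post_json, the sample payload, and the URL are made up for illustration.

```python
import json

import urllib3


def post_json(http, url, payload):
    """Serialize `payload` to JSON bytes and send it as the request body."""
    encoded = json.dumps(payload).encode("utf-8")
    return http.request(
        "POST",
        url,
        body=encoded,
        headers={"Content-Type": "application/json"},
    )


http = urllib3.PoolManager()

# With network access (placeholder URL):
# r = post_json(http, "http://example.com/post", {"attribute": "value"})
# print(json.loads(r.data.decode("utf-8")))
```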
Files & binary data:
To upload a file using multipart/form-data encoding, use the same approach as for form data and give the file field as a tuple (file_name, file_data).
Specifying the filename is not strictly required, but it is recommended in order to behave more like a browser. You can also add a third item to the tuple to specify the MIME type of the file.
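A sketch of a file upload with an explicit MIME type. The field name filefield, the file name, and the URL are placeholders; encode_multipart_formdata is called directly only to show offline how the tuple ends up in the request body.

```python
import urllib3
from urllib3.filepost import encode_multipart_formdata


def upload_file(http, url, filename, data, mime_type="text/plain"):
    """Upload `data` as a file field named 'filefield'."""
    fields = {"filefield": (filename, data, mime_type)}
    return http.request("POST", url, fields=fields)


# Offline look at the encoded body for a three-item tuple.
body, content_type = encode_multipart_formdata(
    {"filefield": ("example.txt", b"hello file", "text/plain")}
)
print(body.decode("utf-8"))

http = urllib3.PoolManager()

# With network access (placeholder URL and file):
# with open("example.txt", "rb") as fp:
#     r = upload_file(http, "http://example.com/post", "example.txt", fp.read())
```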
To send raw binary data, pass it as the body parameter. It is also recommended to set the Content-Type header.
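A sketch of sending raw bytes. The function name post_binary, the default content type, and the file name in the comment are assumptions; use whatever MIME type matches your data.

```python
import urllib3


def post_binary(http, url, payload, content_type="application/octet-stream"):
    """Send raw bytes as the request body with an explicit Content-Type."""
    return http.request(
        "POST",
        url,
        body=payload,
        headers={"Content-Type": content_type},
    )


http = urllib3.PoolManager()

# With network access (placeholder URL and file):
# with open("example.jpg", "rb") as fp:
#     r = post_binary(http, "http://example.com/post", fp.read(), "image/jpeg")
```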
Timeout:
The timeout parameter controls how long a request is allowed to run. In simple cases, you can set timeout to a floating point number of seconds.
For finer control, you can use a Timeout instance, which separates the connect timeout from the read timeout.
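Both forms of the timeout parameter can be sketched as follows. The function name get_with_timeout, the numeric values, and the URL are placeholders.

```python
import urllib3
from urllib3.util import Timeout


def get_with_timeout(http, url, timeout):
    """GET with a per-request timeout (a float or a Timeout instance)."""
    return http.request("GET", url, timeout=timeout)


# Simple form: one number of seconds.
simple = 4.0

# Precise form: separate limits for connecting and for reading.
precise = Timeout(connect=1.0, read=2.0)
print(precise.connect_timeout, precise.read_timeout)

http = urllib3.PoolManager()

# With network access (placeholder URL):
# get_with_timeout(http, "http://example.com/", simple)
# get_with_timeout(http, "http://example.com/", precise)
```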
If you want all requests to use the same timeout, set the timeout parameter on the PoolManager, either as a floating point number or as a Timeout instance.
A timeout given on a specific request overrides the timeout set at the PoolManager level.
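A sketch of a pool-wide timeout and a per-request override. The variable names, the function name, and the numeric values are placeholders.

```python
import urllib3
from urllib3.util import Timeout

# Every request made through these managers inherits the timeout.
http_simple = urllib3.PoolManager(timeout=3.0)
http_precise = urllib3.PoolManager(timeout=Timeout(connect=1.0, read=2.0))


def get_overriding_timeout(http, url):
    """The per-request timeout takes precedence over the manager-level one."""
    return http.request("GET", url, timeout=1.0)


# With network access (placeholder URL):
# get_overriding_timeout(http_simple, "http://example.com/")
```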
Retrying requests:
urllib3 can automatically retry idempotent requests, and the same mechanism also handles redirects. You control retries through the retries parameter. By default, urllib3 retries a request three times and follows up to three redirects.
To change the number of retries, pass an integer as the retries parameter.
To disable both retries and redirects, set retries to False.
To disable redirects but keep retrying requests, set the redirect parameter to False.
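The three variants can be sketched like this. The function names, the retry count, and the URL are placeholders.

```python
import urllib3


def get_with_retry_count(http, url, attempts=10):
    """Allow up to `attempts` retries for this one request."""
    return http.request("GET", url, retries=attempts)


def get_without_retries(http, url):
    """Disable both retries and redirects for this request."""
    return http.request("GET", url, retries=False)


def get_without_redirects(http, url):
    """Keep retrying on errors, but do not follow redirects."""
    return http.request("GET", url, redirect=False)


http = urllib3.PoolManager()

# With network access (placeholder URL):
# r = get_without_redirects(http, "http://example.com/")
# print(r.status)
```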
For finer control, you can use a Retry instance, which lets you configure request retries in more detail.
For example, to allow three retries in total but only two redirects, pass a matching Retry instance as the retries parameter.
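A sketch of that policy, assuming Retry is imported from urllib3.util. The names policy and get_with_policy and the URL are placeholders.

```python
import urllib3
from urllib3.util import Retry

# At most three retries in total, of which at most two may be redirects.
policy = Retry(3, redirect=2)
print(policy.total, policy.redirect)


def get_with_policy(http, url, retries=policy):
    """GET using a Retry instance instead of a plain integer."""
    return http.request("GET", url, retries=retries)


http = urllib3.PoolManager()

# With network access (placeholder URL):
# r = get_with_policy(http, "http://example.com/")
```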
If you want all requests to follow the same retry policy, set the retries parameter on the PoolManager, either to False or to a Retry instance.
A retries value given on a specific request overrides the policy set at the PoolManager level.
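A sketch of pool-wide retry policies and a per-request override. The variable names, the function name, and the numbers are placeholders.

```python
import urllib3
from urllib3.util import Retry

# Every request made through this manager follows the same retry policy.
http_policy = urllib3.PoolManager(retries=Retry(5, redirect=4))

# Alternatively, switch retries and redirects off for the whole manager.
http_no_retry = urllib3.PoolManager(retries=False)


def get_overriding_retries(http, url):
    """The per-request value takes precedence over the manager-level policy."""
    return http.request("GET", url, retries=1)


# With network access (placeholder URL):
# get_overriding_retries(http_policy, "http://example.com/")
```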