Python urllib library

Last Update:2018-03-12 Source: Internet

Author: User


urllib in Python 2 and Python 3

urllib provides a high-level Web communication library that supports the basic Web protocols, such as HTTP and FTP, as well as access to local files. (Older Python 2 versions also handled Gopher; Python 3 does not.)

Specifically, urllib uses these protocols to download data from the Internet, from a local area network (LAN), or from the local host.

You do not need to use httplib, ftplib, or gopherlib directly unless you need lower-level functionality. (In Python 3, httplib was renamed http.client, and gopherlib no longer exists.)
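As a minimal sketch of the "one call, many schemes" idea, assuming Python 3.4 or later: urllib.request.urlopen() picks a handler based on the URL scheme, so the same call that fetches http:// or ftp:// URLs also reads a local file:// URL. To stay runnable offline, the example writes a throwaway file (the name hello.txt is hypothetical) and also opens a data: URL instead of contacting a real server.

```python
import pathlib
import tempfile
from urllib.request import urlopen

# Create a small local file so the example needs no network access.
with tempfile.TemporaryDirectory() as tmp:
    path = pathlib.Path(tmp) / "hello.txt"  # hypothetical file name
    path.write_text("hello from a local file\n", encoding="utf-8")

    # urlopen() dispatches on the scheme; file:// reads from the local disk.
    with urlopen(path.as_uri()) as resp:
        local_text = resp.read().decode("utf-8")

# A data: URL (handled since Python 3.4) also works without a network.
with urlopen("data:text/plain;charset=utf-8,hello%20data") as resp:
    data_text = resp.read().decode("utf-8")

print(local_text.strip())  # hello from a local file
print(data_text)           # hello data
```

With a real http:// or https:// URL the code is identical; only the scheme, and therefore the handler urllib selects, changes.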

Python 2 has separate urllib, urlparse, and urllib2 modules, among others. In Python 3, they are consolidated into a single package named urllib.

Most of urllib and urllib2 went into the urllib.request module (quoting helpers such as urlencode() and quote() moved to urllib.parse, and exceptions such as URLError moved to urllib.error), and urlparse became urllib.parse.

The Python 3 urllib package also contains the submodules urllib.response, urllib.error, and urllib.robotparser.
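The following sketch shows where some commonly used Python 2 names live in Python 3; each comment gives the old Python 2 location. It is an illustrative subset rather than a complete migration table, and the sample query values are arbitrary.

```python
# Python 3 homes of common Python 2 URL-handling names.
from urllib.request import urlopen, Request      # urllib2.urlopen, urllib2.Request
from urllib.parse import (
    urlparse,    # urlparse.urlparse
    urljoin,     # urlparse.urljoin
    urlencode,   # urllib.urlencode
    quote,       # urllib.quote
)
from urllib.error import URLError, HTTPError     # urllib2.URLError, urllib2.HTTPError
from urllib.robotparser import RobotFileParser   # robotparser.RobotFileParser

# Quoting helpers now live in urllib.parse, not urllib.request.
print(urlencode({"cat": 6, "q": "python urllib"}))  # cat=6&q=python+urllib
print(quote("/path with spaces/"))                   # /path%20with%20spaces/

# HTTPError subclasses URLError, so one except clause can catch both.
print(issubclass(HTTPError, URLError))  # True
```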

URL format

prot_sch://net_loc/path;params?query#frag

Components of a URL (each part of the Web address):

prot_sch: network protocol or download scheme
net_loc: location of the server (may also contain user information)
path: slash-delimited (/) path to the file or CGI application
params: optional parameters
query: ampersand-delimited (&) set of key=value pairs
frag: fragment identifying a location within the document

net_loc can be split further into several components, some required and others optional:

user:passwd@host:port

user: user name or login
passwd: user password
host: name or address of the computer running the Web server (required)
port: port number (needed only if it is not the scheme's default, such as 80 for HTTP)
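To see these components on a concrete address, the sketch below runs urllib.parse.urlparse() on a made-up URL (the host www.example.com, the credentials alice:secret, and the path are all hypothetical). The result also exposes the net_loc subcomponents as attributes, and parse_qsl() splits the query into its key=value pairs.

```python
from urllib.parse import urlparse, parse_qsl

# Hypothetical URL that contains every component described above.
url = "http://alice:secret@www.example.com:8080/cgi-bin/app.py;type=a?x=1&y=2#top"
parts = urlparse(url)

# The six top-level components
print(parts.scheme)    # http
print(parts.netloc)    # alice:secret@www.example.com:8080
print(parts.path)      # /cgi-bin/app.py
print(parts.params)    # type=a
print(parts.query)     # x=1&y=2
print(parts.fragment)  # top

# net_loc broken down into user:passwd@host:port (port is an int)
print(parts.username, parts.password, parts.hostname, parts.port)
# alice secret www.example.com 8080

# The query string split into key=value pairs
print(parse_qsl(parts.query))  # [('x', '1'), ('y', '2')]

# port is None when the URL does not spell one out
print(urlparse("http://www.example.com/").port)  # None
```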

The urllib.parse module was called urlparse in Python 2 and was renamed urllib.parse in Python 3.

urllib.parse provides basic functions for processing URL strings, including urlparse(), urlunparse(), and urljoin().

urlparse() parses a URL string into a 6-tuple (prot_sch, net_loc, path, params, query, frag). In Python 3 the result is a ParseResult named tuple whose fields are scheme, netloc, path, params, query, and fragment:

Syntax: urlparse(urlstring, scheme='', allow_fragments=True)

>>> import urllib.parse
>>> urllib.parse.urlparse("https://www.smelond.com?cat=6")
ParseResult(scheme='https', netloc='www.smelond.com', path='', params='', query='cat=6', fragment='')

urlunparse() does the opposite of urlparse(): it takes a 6-tuple (prot_sch, net_loc, path, params, query, frag), such as the one urlparse() returns, joins the parts into a URL string, and returns it:

Syntax: urlunparse(parts)

>>> result = urllib.parse.urlparse("https://www.smelond.com")
>>> print(result)
ParseResult(scheme='https', netloc='www.smelond.com', path='', params='', query='', fragment='')
>>> urllib.parse.urlunparse(result)
'https://www.smelond.com'
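Because ParseResult is a named tuple, a common pattern is to parse a URL, swap out one component with _replace(), and rebuild it with urlunparse(). The sketch below does that; the path /archives, the page parameter, and /index.php are hypothetical values chosen for illustration.

```python
from urllib.parse import urlparse, urlunparse, urlencode

parts = urlparse("https://www.smelond.com?cat=6")

# _replace() returns a modified copy; the original ParseResult is unchanged.
new_parts = parts._replace(path="/archives", query=urlencode({"cat": 7, "page": 2}))
print(urlunparse(new_parts))  # https://www.smelond.com/archives?cat=7&page=2

# A plain 6-tuple (scheme, netloc, path, params, query, fragment) also works.
print(urlunparse(("https", "www.smelond.com", "/index.php", "", "p=1", "top")))
# https://www.smelond.com/index.php?p=1#top
```

Building the query with urlencode() rather than string concatenation keeps special characters correctly escaped.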

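When you need to work with several related URLs, such as the series of page URLs linked from a single Web page, use the urljoin() function.

urljoin() takes the scheme, net_loc, and directory part of the base URL's path (everything except the final file name) and combines them with the new URL. A new URL that starts with a slash replaces the whole base path, and a new URL that consists only of a query string keeps the base path and replaces the query.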

Syntax: urljoin(base, url, allow_fragments=True)

>>> urllib.parse.urljoin("https://www.smelond.com?cat=6", "?cat=7")
'https://www.smelond.com?cat=7'
>>> urllib.parse.urljoin("https://www.smelond.com?cat=6", "abc")
'https://www.smelond.com/abc'
>>> urllib.parse.urljoin("https://www.smelond.com?cat=6", "/test/abc.html")
'https://www.smelond.com/test/abc.html'
>>> urllib.parse.urljoin("https://www.smelond.com", "abc.html")
'https://www.smelond.com/abc.html'
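The examples above use a base URL with an empty path, so the "drop the final file name" rule does not show. The sketch below uses a base with a directory and a file; the /blog/page1.html path and the example.org URL are hypothetical. It also shows generating a series of related page URLs and the trailing-slash pitfall.

```python
from urllib.parse import urljoin

base = "https://www.smelond.com/blog/page1.html"  # hypothetical page URL

# The final file name (page1.html) is dropped and the new name takes its place.
print(urljoin(base, "page2.html"))     # https://www.smelond.com/blog/page2.html
# ".." climbs one directory above /blog/.
print(urljoin(base, "../about.html"))  # https://www.smelond.com/about.html
# A complete URL replaces the base entirely.
print(urljoin(base, "https://example.org/x"))  # https://example.org/x

# A series of related page URLs built from one base
pages = [urljoin(base, f"page{n}.html") for n in range(2, 5)]
print(pages)

# Without a trailing slash, "blog" counts as a file name and is replaced.
print(urljoin("https://www.smelond.com/blog", "page2.html"))   # https://www.smelond.com/page2.html
print(urljoin("https://www.smelond.com/blog/", "page2.html"))  # https://www.smelond.com/blog/page2.html
```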

Core Function Description in the urllib.parse Module

urlparse(urlstring, scheme='', allow_fragments=True): parses urlstring into its components. If urlstring specifies no scheme, the scheme argument is used as the default; allow_fragments determines whether a URL fragment (#frag) is split off.

urlunparse(parts): combines a tuple of URL components (parts) into a URL string.

urljoin(base, url, allow_fragments=True): combines the scheme, net_loc, and directory path of base with url to form a complete URL; allow_fragments plays the same role as in urlparse().
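The scheme and allow_fragments arguments are easy to misread, so here is a small sketch of how they behave, assuming Python 3. Note that scheme is only a fallback, and that a host is recognized as net_loc only when it follows "//". The /path and /a#top values are hypothetical.

```python
from urllib.parse import urlparse

# scheme fills in a missing scheme...
print(urlparse("//www.smelond.com/path", scheme="https"))
# ParseResult(scheme='https', netloc='www.smelond.com', path='/path', params='', query='', fragment='')

# ...but never overrides one that is already present.
print(urlparse("http://www.smelond.com", scheme="https").scheme)  # http

# Without the leading "//", the host is NOT recognized as net_loc.
print(repr(urlparse("www.smelond.com/path", scheme="https").netloc))  # ''

# allow_fragments=False leaves "#top" inside the path instead of splitting it off.
print(urlparse("https://www.smelond.com/a#top", allow_fragments=False))
# ParseResult(scheme='https', netloc='www.smelond.com', path='/a#top', params='', query='', fragment='')
```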

This article is an English version of an article originally written in Chinese on aliyun.com and is provided for information purposes only.
