Python urllib library

Source: Internet
Author: User

urllib in Python 2 and Python 3

urllib is a high-level Web communication library that supports the basic Web protocols, such as HTTP and FTP (and, in older Python 2 releases, Gopher). It also supports access to local files.

Specifically, the urllib module uses these protocols to download data from the Internet, a LAN, or the local host.

You do not need httplib (http.client in Python 3), ftplib, or gopherlib to use this module unless you need lower-level functionality.

Python 2 has urllib, urlparse, urllib2, and other related modules. In Python 3, all of these modules are merged into a single package named urllib.

urllib and urllib2 are merged into the urllib.request module (their exceptions move to urllib.error, and helpers such as quote() and urlencode() move to urllib.parse), and urlparse is merged into urllib.parse.

The urllib package in Python 3 also contains submodules such as response, error, and robotparser.
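As a small illustration of the local-file support mentioned above, the sketch below reads a file through urllib.request.urlopen() in Python 3. The temporary file and its contents are assumptions made up for this example, so that it runs without network access.

```python
import os
import tempfile
import urllib.request
from pathlib import Path

# Create a throwaway local file so the example needs no network access.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("hello from a local file")
    tmp_path = Path(tmp.name)

try:
    # Convert the absolute path into a file:// URL.
    file_url = tmp_path.as_uri()
    # urlopen() handles file:// URLs the same way it handles http:// ones.
    with urllib.request.urlopen(file_url) as response:
        content = response.read().decode("utf-8")
finally:
    os.remove(tmp_path)

print(content)
```

The same urlopen() call would accept an http:// or ftp:// URL instead; only the scheme changes.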
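For orientation, here is a short Python 3 sketch that imports the submodules named above. The comments about where each one came from in Python 2 summarize the mapping described in the text.

```python
# Python 3 locations of the modules discussed above.
import urllib.request      # opening URLs (formerly in urllib and urllib2)
import urllib.parse        # URL string handling (formerly urlparse)
import urllib.error        # exceptions such as URLError and HTTPError
import urllib.robotparser  # robots.txt parsing (formerly robotparser)

submodules = [urllib.request, urllib.parse, urllib.error, urllib.robotparser]
names = [module.__name__ for module in submodules]
print(names)
```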

 

URL format

prot_sch://net_loc/path;params?query#frag

Components of a URL (Web address)

prot_sch    Network protocol or download scheme
net_loc     Location of the server (may also contain user information)
path        Slash-delimited (/) path to the file or CGI application
params      Optional parameters
query       Key-value pairs separated by ampersands (&)
frag        Fragment identifying a specific anchor within the document

net_loc can be further split into several components, some required and others optional:

user:passwd@host:port

user        User name or login
passwd      User password
host        Name or address of the computer running the web server (required)
port        Port number (if not the default, which is 80 for HTTP)
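The components above can be inspected with urllib.parse.urlparse() in Python 3. The following is a minimal sketch; the URL, user name, and password are hypothetical values chosen only so that every component is present.

```python
from urllib.parse import urlparse

# Hypothetical URL that exercises every component of the format.
url = "http://alice:secret@www.example.com:8080/docs/index.html;type=a?lang=en&page=2#intro"
parts = urlparse(url)

# The six top-level components.
print(parts.scheme)    # prot_sch
print(parts.netloc)    # net_loc
print(parts.path)      # path
print(parts.params)    # params
print(parts.query)     # query
print(parts.fragment)  # frag

# net_loc broken down further into user:passwd@host:port.
print(parts.username, parts.password, parts.hostname, parts.port)
```

Note that port is returned as an integer, and it is None when the URL does not specify one.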

 

The urllib.parse module is called urlparse in Python 2 and has been renamed urllib.parse in Python 3.

The urllib.parse module provides some basic functions for processing URL strings. These functions include urlparse(), urlunparse(), and urljoin().

 

urlparse() parses urlstr into a 6-tuple (prot_sch, net_loc, path, params, query, frag):

Syntax: urlparse(urlstr, defProtSch=None, allowFrag=None)

(In the standard library itself, these parameters are named urlstring, scheme='', and allow_fragments=True.)

>>> import urllib.parse
>>> urllib.parse.urlparse("https://www.smelond.com?cat=6")
ParseResult(scheme='https', netloc='www.smelond.com', path='', params='', query='cat=6', fragment='')
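The result of urlparse() behaves both as a tuple and as an object with named fields. The sketch below shows the three ways of reading it; the path and fragment added to the URL are assumptions for illustration only.

```python
from urllib.parse import urlparse

result = urlparse("https://www.smelond.com/test/abc.html?cat=6#top")

# 1. Access by field name.
print(result.scheme, result.netloc)

# 2. Access by position, as with any tuple.
print(result[2])  # the path component

# 3. Unpack all six components at once.
scheme, netloc, path, params, query, fragment = result
print(query, fragment)
```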

 

urlunparse() is the opposite of urlparse(). It takes the 6-tuple urltup (prot_sch, net_loc, path, params, query, frag), such as the one produced by urlparse(), concatenates the components into a URL, and returns it:

Syntax: urlunparse(urltup)

>>> result = urllib.parse.urlparse("https://www.smelond.com")
>>> print(result)
ParseResult(scheme='https', netloc='www.smelond.com', path='', params='', query='', fragment='')
>>> urllib.parse.urlunparse(result)
'https://www.smelond.com'
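One common use of the parse/unparse pair is to change a single component and rebuild the URL. The sketch below is one possible way to do that, relying on the fact that the parse result is a named tuple and therefore has a _replace() method. The "/archive" path and the "top" fragment are made-up values.

```python
from urllib.parse import urlparse, urlunparse

original = "https://www.smelond.com?cat=6"
parsed = urlparse(original)

# Parsing and unparsing without changes gives back the same string here.
round_trip = urlunparse(parsed)

# Named tuples are immutable, so _replace() returns a modified copy.
updated = parsed._replace(path="/archive", query="cat=7")
new_url = urlunparse(updated)

# urlunparse() also accepts any plain 6-item tuple.
from_tuple = urlunparse(("https", "www.smelond.com", "/test/abc.html", "", "cat=6", "top"))

print(round_trip)
print(new_url)
print(from_tuple)
```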

 

When we need to process multiple related URLs, we can use the urljoin() function. For example, a Web page may contain a series of links to other pages on the same site.

urljoin() takes the base of baseurl (net_loc plus the full path leading up to, but not including, the final file) and joins it with newurl.

Syntax: urljoin(baseurl, newurl, allowFrag=None)

(In the standard library itself, these parameters are named base, url, and allow_fragments=True.)

>>> urllib.parse.urljoin("https://www.smelond.com?cat=6", "?cat=7")
'https://www.smelond.com?cat=7'
>>> urllib.parse.urljoin("https://www.smelond.com?cat=6", "abc")
'https://www.smelond.com/abc'
>>> urllib.parse.urljoin("https://www.smelond.com?cat=6", "/test/abc.html")
'https://www.smelond.com/test/abc.html'
>>> urllib.parse.urljoin("https://www.smelond.com", "abc.html")
'https://www.smelond.com/abc.html'
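To illustrate the "series of page URLs" case, the sketch below resolves several links against a single base URL. The base path /blog/index.html and the link values are hypothetical, standing in for href values that might be found on a page.

```python
from urllib.parse import urljoin

# Hypothetical page URL and the links found on it.
base = "https://www.smelond.com/blog/index.html"
links = [
    "page2.html",                 # sibling file: replaces index.html
    "../about.html",              # one directory up
    "/test/abc.html",             # absolute path: replaces the whole path
    "?cat=7",                     # query only: keeps the base path
    "https://www.example.com/x",  # full URL: the base is ignored
]

resolved = [urljoin(base, link) for link in links]
for url in resolved:
    print(url)
```

The final file name of the base (index.html) is dropped before a relative path is joined, which matches the description of urljoin() above.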

 

Core functions in the urllib.parse module

urlparse(urlstr, defProtSch=None, allowFrag=None)
    Parses urlstr into its separate components. If no protocol or scheme is given in urlstr, defProtSch is used; allowFrag determines whether a URL fragment is allowed.

urlunparse(urltup)
    Combines a tuple of URL data (urltup) into a single URL string.

urljoin(baseurl, newurl, allowFrag=None)
    Combines the base of baseurl with newurl to form a complete URL. The role of allowFrag is the same as in urlparse().
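The effect of the default scheme and the fragment switch can be seen in the short sketch below. It uses the keyword names of the Python 3 standard library, scheme and allow_fragments, which correspond to defProtSch and allowFrag in the descriptions above. The example paths are made up.

```python
from urllib.parse import urlparse

# A URL without a scheme picks up the default given by scheme=...
with_default = urlparse("//www.smelond.com/index.html", scheme="https")

# Without the leading "//", the host name is treated as part of the path.
no_slashes = urlparse("www.smelond.com/index.html", scheme="https")

# With allow_fragments=False, "#top" is not split off as a fragment.
frag_on = urlparse("https://www.smelond.com/a#top")
frag_off = urlparse("https://www.smelond.com/a#top", allow_fragments=False)

print(with_default)
print(no_slashes)
print(frag_on.path, frag_on.fragment)
print(frag_off.path, repr(frag_off.fragment))
```

The second call is a frequent source of confusion: urlparse() only recognizes net_loc when it is introduced by "//".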

 
