Python urllib Library

Source: Internet
Author: User

urllib in Python 2 and Python 3

urllib is a high-level web communications library that supports basic web protocols such as HTTP and FTP (and Gopher in older Python 2 releases), as well as access to local files.

Specifically, the urllib module uses these protocols to download data from the Internet, a local area network, or the local host.

Using this module removes the need for httplib (http.client in Python 3), ftplib, and gopherlib (present only in old Python 2 releases), unless lower-level functionality is required.
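As a minimal sketch of the "local files" case, the snippet below writes a temporary file and reads it back through urllib.request.urlopen() using a file:// URL, so it runs without network access. The helper name read_via_urllib and the file name sample.txt are made up for illustration; the same call should also accept http:// or ftp:// URLs.

```python
import tempfile
from pathlib import Path
from urllib.request import urlopen


def read_via_urllib(url):
    """Return the raw bytes behind a URL that urllib can open (hypothetical helper)."""
    with urlopen(url) as response:
        return response.read()


# Create a throwaway local file so the example needs no network access.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample = Path(tmp_dir) / "sample.txt"
    sample.write_bytes(b"hello from a local file\n")

    # Convert the absolute path to a file:// URL and fetch it like any other URL.
    file_url = sample.resolve().as_uri()
    data = read_via_urllib(file_url)

print(data)
```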

Python 2 ships urllib, urlparse, urllib2, and related modules. In Python 3, all of these are consolidated into a single package named urllib.

The contents of urllib and urllib2 are merged into the urllib.request module (a few helpers, such as quote() and urlencode(), moved to urllib.parse), and urlparse becomes urllib.parse.

The urllib package in Python 3 also includes the response, error, and robotparser submodules.
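The following sketch shows where some commonly used names live in Python 3, with their Python 2 locations noted in comments. The selection of names is only illustrative, not a complete mapping.

```python
from urllib.request import urlopen, Request      # Python 2: urllib2.urlopen, urllib2.Request
from urllib.parse import urlparse, urljoin       # Python 2: urlparse.urlparse, urlparse.urljoin
from urllib.parse import quote, urlencode        # Python 2: urllib.quote, urllib.urlencode
from urllib.error import URLError, HTTPError     # Python 2: urllib2.URLError, urllib2.HTTPError
from urllib.robotparser import RobotFileParser   # Python 2: robotparser.RobotFileParser

# Record which Python 3 submodule each imported name is defined in.
names = (urlopen, Request, urlparse, urljoin, quote, urlencode,
         URLError, HTTPError, RobotFileParser)
locations = {obj.__name__: obj.__module__ for obj in names}

for name, module in locations.items():
    print(f"{name:16} {module}")
```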

Format of the URL

prot_sch://net_loc/path;params?query#frag

Components of a URL

prot_sch    Network protocol or download scheme
net_loc     Server location (may also contain user information)
path        Slash-delimited (/) path to a file or CGI application
params      Optional parameters
query       A series of key-value pairs joined by ampersands (&)
frag        Fragment identifying a specific anchor within a document

net_loc can be further split into several components, some of which are required and others optional:

user:passwd@host:port

user        User name or login
passwd      User password
host        Name or address of the computer running the web server (required)
port        Port number (if it is not the default 80)
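To make the component names concrete, here is a small sketch that parses one URL containing every part and prints each piece. The URL itself (www.example.com, search.py, the query keys) is invented for illustration; the attribute names are those of the result object returned by urllib.parse.urlparse().

```python
from urllib.parse import parse_qs, urlparse

# Hypothetical URL chosen so that every component is present.
url = ("http://user:passwd@www.example.com:8080"
       "/cgi-bin/search.py;type=a?q=python&lang=en#results")
parts = urlparse(url)

print(parts.scheme)     # prot_sch -> http
print(parts.netloc)     # net_loc  -> user:passwd@www.example.com:8080
print(parts.path)       # path     -> /cgi-bin/search.py
print(parts.params)     # params   -> type=a
print(parts.query)      # query    -> q=python&lang=en
print(parts.fragment)   # frag     -> results

# net_loc is broken down further by convenience attributes.
print(parts.username, parts.password, parts.hostname, parts.port)

# The query string can be split into key-value pairs with parse_qs().
query_pairs = parse_qs(parts.query)
print(query_pairs)
```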

The urllib.parse module is called urlparse in Python 2; it was renamed urllib.parse in Python 3.

The urllib.parse module provides basic functionality for handling URL strings, including urlparse(), urlunparse(), and urljoin().

urlparse() parses urlstr into a 6-tuple (prot_sch, net_loc, path, params, query, frag):

Syntax: urlparse(urlstr, defProtSch=None, allowFrag=None)

In Python 3 the actual signature is urlparse(urlstring, scheme='', allow_fragments=True).

>>> urllib.parse.urlparse("https://www.smelond.com?cat=6")
ParseResult(scheme='https', netloc='www.smelond.com', path='', params='', query='cat=6', fragment='')
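The two optional arguments are easy to misread, so the sketch below exercises them using the Python 3 keyword names scheme and allow_fragments. The paths index.html and page are made up for illustration. One detail worth noting: the default scheme only helps when the string still starts with //, otherwise the host name is treated as part of the path.

```python
from urllib.parse import urlparse

# A scheme-less URL that starts with "//": the default scheme is filled in.
with_default = urlparse("//www.smelond.com/index.html", scheme="https")
print(with_default.scheme, with_default.netloc, with_default.path)

# Without the leading "//", the host is NOT recognized as net_loc;
# the whole string ends up in the path component instead.
no_slashes = urlparse("www.smelond.com/index.html", scheme="https")
print(repr(no_slashes.netloc), no_slashes.path)

# With allow_fragments=False, "#top" is not split off as a fragment
# and stays attached to the preceding component (here, the path).
frag_on = urlparse("https://www.smelond.com/page#top")
frag_off = urlparse("https://www.smelond.com/page#top", allow_fragments=False)
print(frag_on.path, repr(frag_on.fragment))
print(frag_off.path, repr(frag_off.fragment))
```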

urlunparse() does exactly the opposite of urlparse(): it takes urltup, the 6-tuple (prot_sch, net_loc, path, params, query, frag) produced by urlparse(), stitches the parts back into a URL, and returns it:

Syntax: urlunparse(urltup)

>>> result = urllib.parse.urlparse("https://www.smelond.com")
>>> print(result)
ParseResult(scheme='https', netloc='www.smelond.com', path='', params='', query='', fragment='')
>>> urllib.parse.urlunparse(result)
'https://www.smelond.com'
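A common use of this round trip is to change one component and rebuild the URL. The sketch below does that with the namedtuple method _replace(), and also shows that urlunparse() accepts a plain 6-tuple. The path index.html and the values cat=7 and top are illustrative.

```python
from urllib.parse import urlparse, urlunparse

parts = urlparse("https://www.smelond.com/?cat=6")

# The parse result is a named tuple, so _replace() returns a modified copy.
updated = parts._replace(query="cat=7", fragment="top")
rebuilt = urlunparse(updated)
print(rebuilt)      # https://www.smelond.com/?cat=7#top

# urlunparse() also accepts any 6-item sequence in the same order:
# (prot_sch, net_loc, path, params, query, frag)
from_tuple = urlunparse(("https", "www.smelond.com", "/index.html", "", "cat=6", ""))
print(from_tuple)   # https://www.smelond.com/index.html?cat=6
```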

urljoin() is useful when working with several related URLs, for example the series of page URLs that a single web page may link to.

urljoin() takes the base of baseurl (net_loc plus the full path before it, but not the final file component) and joins it with newurl.

Syntax: urljoin(baseurl, newurl, allowFrag=None)

>>> urllib.parse.urljoin("https://www.smelond.com?cat=6", "?cat=7")
'https://www.smelond.com?cat=7'
>>> urllib.parse.urljoin("https://www.smelond.com?cat=6", "abc")
'https://www.smelond.com/abc'
>>> urllib.parse.urljoin("https://www.smelond.com?cat=6", "/test/abc.html")
'https://www.smelond.com/test/abc.html'
>>> urllib.parse.urljoin("https://www.smelond.com", "abc.html")
'https://www.smelond.com/abc.html'
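The "not the final file component" rule is easier to see when the base URL has a deeper path. The sketch below resolves several kinds of links against one base; the paths under /blog/ are invented for illustration. It also shows that a trailing slash on the base changes the result, since without it the last segment is treated as a file and dropped.

```python
from urllib.parse import urljoin

# Hypothetical base page; "post.html" is the final file component.
base = "https://www.smelond.com/blog/2018/post.html"

links = [
    "next.html",                    # relative: replaces post.html
    "../index.html",                # relative: goes up one directory
    "/about.html",                  # root-relative: replaces the whole path
    "?page=2",                      # query only: keeps the base path
    "https://docs.python.org/3/",   # absolute: the base is ignored
]
resolved = [urljoin(base, link) for link in links]
for link, full in zip(links, resolved):
    print(f"{link:28} -> {full}")

# A trailing slash marks the last segment as a directory rather than a file.
with_slash = urljoin("https://www.smelond.com/blog/", "a.html")
without_slash = urljoin("https://www.smelond.com/blog", "a.html")
print(with_slash)      # https://www.smelond.com/blog/a.html
print(without_slash)   # https://www.smelond.com/a.html
```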

Core functions in the urllib.parse module

urlparse(urlstr, defProtSch=None, allowFrag=None)
    Parses urlstr into its individual components; if urlstr gives no protocol or scheme, defProtSch is used; allowFrag determines whether URL fragments are allowed

urlunparse(urltup)
    Assembles a tuple of URL data (urltup) into a URL string

urljoin(baseurl, newurl, allowFrag=None)
    Joins the base of baseurl with newurl to form a complete URL; allowFrag works the same as in urlparse()
