Python3 Crawler 4 -- Parsing Links


1. urlparse()

urlparse() belongs to urllib.parse.

In urlparse's view, a standard URL has the following format:

scheme://netloc/path;params?query#fragment

So, for url = 'http://www.baidu.com/index.html;user?id=5#comment'

urlparse() splits it into 6 parts:

ParseResult(scheme='http', netloc='www.baidu.com', path='/index.html', params='user', query='id=5', fragment='comment')

Here's how:

from urllib.parse import urlparse

res = urlparse('https://www.baidu.com/baidu?wd=query&tn=monline_dg&ie=utf-8')

print(res)
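
Run as-is, this should print something along these lines:

ParseResult(scheme='https', netloc='www.baidu.com', path='/baidu', params='', query='wd=query&tn=monline_dg&ie=utf-8', fragment='')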

The full signature of urlparse() is:

res = urlparse(urlstring, scheme='', allow_fragments=True)

scheme is the default protocol: if urlstring carries no protocol, the one given in scheme is used; if urlstring already has one, the protocol in urlstring wins.

allow_fragments controls whether the fragment is parsed; if it is False, the fragment is kept as part of the path, params, or query instead.
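
A minimal sketch of both parameters, using assumed example URLs:

from urllib.parse import urlparse

# No protocol in the URL string, so the default scheme is applied;
# note the host lands in path because there is no '//' prefix
print(urlparse('www.baidu.com/index.html#comment', scheme='https'))
# ParseResult(scheme='https', netloc='', path='www.baidu.com/index.html', params='', query='', fragment='comment')

# allow_fragments=False: the fragment stays inside the path
print(urlparse('http://www.baidu.com/index.html#comment', allow_fragments=False))
# ParseResult(scheme='http', netloc='www.baidu.com', path='/index.html#comment', params='', query='', fragment='')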

2. urlunparse()

Belongs to urllib.parse.

As its name implies, urlunparse() is the inverse of urlparse(). For example:

from urllib.parse import urlunparse

data = ['http', 'www.baidu.com', 'index.html', 'user', 'a=6', 'comment']

print(urlunparse(data))

This completes the construction of the URL string.
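
Assuming the data list above, the printed URL should look like:

http://www.baidu.com/index.html;user?a=6#comment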

3. urlsplit()

from urllib.parse import urlsplit

Similar to urlparse(), but urlsplit() divides the URL string into only 5 parts: there is no separate params component.

res = urlsplit('http://www.baidu.com/index.html;user?id=5#comment')

print(res)
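
This should print a SplitResult roughly like the one below; note that ;user stays inside path because params is not split out:

SplitResult(scheme='http', netloc='www.baidu.com', path='/index.html;user', query='id=5', fragment='comment')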

4. urlunsplit()

Usage is similar to urlunparse(), except that it takes 5 parts instead of 6; see the sketch below.
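
A minimal sketch with an assumed 5-part list:

from urllib.parse import urlunsplit

data = ['http', 'www.baidu.com', 'index.html', 'a=6', 'comment']

print(urlunsplit(data))

This should print something like: http://www.baidu.com/index.html?a=6#comment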

5. urljoin()

Belongs to urllib.parse.

urljoin() is another way to build a URL string. It takes two links: a base_url and a new link. It analyses the scheme, netloc and path of base_url and uses those three parts to fill in whatever is missing from the new link; any part the new link already has is kept unchanged. The joined link is then returned. For example:

from urllib.parse import urljoin

print(urljoin('http://www.baidu.com', 'wd=query&tn=monline_dg&ie=utf-8'))

The returned result is:

http://www.baidu.com/wd=query&tn=monline_dg&ie=utf-8
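
Two further sketches (with assumed inputs) showing how parts already present in the new link take priority over the base:

from urllib.parse import urljoin

print(urljoin('http://www.baidu.com/about.html', 'FAQ.html'))

This should give http://www.baidu.com/FAQ.html, because the relative path replaces the last segment of the base path.

print(urljoin('http://www.baidu.com', 'https://www.example.com/index.html'))

This should give https://www.example.com/index.html, because the new link is already a complete URL and the base is ignored.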

6. urlencode()

from urllib.parse import urlencode

urlencode() converts a dictionary into URL query parameters. For example:

param = {'name': 'Lihua', 'age': '23'}

base_url = 'http://www.baidu.com?'

url = base_url + urlencode(param)

print(url)
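
With the dictionary above, the printed URL should look something like:

http://www.baidu.com?name=Lihua&age=23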

7. parse_qs()

parse_qs() is the inverse of urlencode() (why the two names differ so much, I cannot explain).

from urllib.parse import parse_qs

query = 'wd=query&tn=monline_dg&ie=utf-8'

print(parse_qs(query))

The output is: {'tn': ['monline_dg'], 'wd': ['query'], 'ie': ['utf-8']}

So the query string is converted back into a dictionary.

8. parse_qsl()

from urllib.parse import parse_qsl

parse_qsl() converts a query string into a list of tuples.

query = 'wd=query&tn=monline_dg&ie=utf-8'

print(parse_qsl(query))

Output: [('wd', 'query'), ('tn', 'monline_dg'), ('ie', 'utf-8')]

9. quote()

The quote() method converts content into URL-encoded (percent-encoded) form. URLs containing Chinese characters can otherwise end up garbled, so quote() is needed.

from urllib.parse import quote

keyword = '美女'  # the Chinese word for "beauty"

url = 'https://www.baidu.com/s?wd=' + quote(keyword)

print(url)

Output: https://www.baidu.com/s?wd=%E7%BE%8E%E5%A5%B3
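
As a side note, quote() leaves '/' unencoded by default (its safe parameter), so a whole path can be passed through it; a small assumed example:

from urllib.parse import quote

print(quote('/s/美女'))

This should print /s/%E7%BE%8E%E5%A5%B3, because the path separators are kept as-is.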

10. unquote()

unquote() decodes a URL.

from urllib.parse import unquote

url = 'https://www.baidu.com/s?wd=%E7%BE%8E%E5%A5%B3'

print(unquote(url))

Output: https://www.baidu.com/s?wd=美女

The encoded keyword is decoded back into the original text.
