The urlparse module is mainly used to parse URLs: it splits a URL into its components according to a fixed format, or joins components back into a URL.
1.urlparse.urlparse
Divides the URL into 6 parts and returns a tuple of 6 strings: scheme (protocol), network location, path, parameters, query, fragment.
import urlparse
url_change = urlparse.urlparse('https://i.cnblogs.com/EditPosts.aspx?opt=1')
print url_change
The output is:
ParseResult(scheme='https', netloc='i.cnblogs.com', path='/EditPosts.aspx', params='', query='opt=1', fragment='')
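Here scheme is the protocol, netloc is the network location (the domain name of the server), path is the path on that server, params holds the parameters of the last path element, query is the query string, and fragment is the part after #.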
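As a small sketch of the field layout described above, the snippet below parses a URL in which every component is filled in and reads the parts both by name and by position. The ;type=a and #top pieces are made up for illustration and are not part of the original example. The try/except import is an assumption so that the same code runs under Python 3 (urllib.parse) as well as Python 2 (urlparse).

```python
try:
    from urllib.parse import urlparse  # Python 3 name of the module
except ImportError:
    from urlparse import urlparse      # Python 2

# Hypothetical URL: ';type=a' and '#top' are added only to fill params and fragment.
url = 'https://i.cnblogs.com/EditPosts.aspx;type=a?opt=1#top'
result = urlparse(url)

# The result is a 6-item named tuple, so fields can be read by name or by position.
scheme = result.scheme      # 'https'
netloc = result.netloc      # 'i.cnblogs.com'
path = result.path          # '/EditPosts.aspx'
params = result.params      # 'type=a'
query = result.query        # 'opt=1'
fragment = result.fragment  # 'top'
first_item = result[0]      # same value as result.scheme

print(result)
```

Reading fields by name is usually easier to follow than indexing into the tuple, but both forms return the same strings.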
urlparse.parse_qs(urlparse.urlparse(url).query)
Here urlparse.urlparse(url).query takes one item, the query, from the tuple produced by urlparse, and parse_qs then parses the conditions in that query string.
There are two variants of parse_qs:
urlparse.parse_qs returns a dictionary
urlparse.parse_qsl returns a list
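To make the difference between the two return types concrete, here is a minimal sketch. The URL with the repeated tag key is hypothetical, chosen only to show how repeated keys are handled. The import fallback is an assumption so the code runs on both Python 3 and Python 2.

```python
try:
    from urllib.parse import urlparse, parse_qs, parse_qsl  # Python 3
except ImportError:
    from urlparse import urlparse, parse_qs, parse_qsl      # Python 2

# Hypothetical URL with a repeated key ('tag'), used only for illustration.
url = 'https://i.cnblogs.com/EditPosts.aspx?opt=1&tag=a&tag=b'
query = urlparse(url).query  # 'opt=1&tag=a&tag=b'

# parse_qs groups values by key: every value is a list, even for a single item.
as_dict = parse_qs(query)

# parse_qsl keeps the original order as a list of (key, value) pairs.
as_list = parse_qsl(query)

print(as_dict)
print(as_list)
```

In this sketch parse_qs yields {'opt': ['1'], 'tag': ['a', 'b']} and parse_qsl yields [('opt', '1'), ('tag', 'a'), ('tag', 'b')], so the list form is the one to use when the order of the pairs matters.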
2.urlparse.urlsplit
Similar to urlparse, but divides the URL into 5 parts and returns a tuple of 5 strings: scheme (protocol), network location, path, query, fragment. The parameters are not split off from the path.
import urlparse
url_change = urlparse.urlsplit('https://i.cnblogs.com/EditPosts.aspx?opt=1')
print url_change
The output is:
SplitResult(scheme='https', netloc='i.cnblogs.com', path='/EditPosts.aspx', query='opt=1', fragment='')
Here scheme is the protocol, netloc is the network location (the domain name of the server), path is the path on that server, and query is the query string.
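The practical difference between the two functions only shows up when the path carries parameters. The following sketch compares them on a hypothetical URL containing ;type=a, which is not in the original example. The import fallback is again an assumption for running on either Python version.

```python
try:
    from urllib.parse import urlparse, urlsplit  # Python 3
except ImportError:
    from urlparse import urlparse, urlsplit      # Python 2

# Hypothetical URL: ';type=a' is added to give the path a parameter.
url = 'https://i.cnblogs.com/EditPosts.aspx;type=a?opt=1'

parsed = urlparse(url)  # 6 items: params is separated from path
split = urlsplit(url)   # 5 items: params stays inside path

parsed_path = parsed.path      # '/EditPosts.aspx'
parsed_params = parsed.params  # 'type=a'
split_path = split.path        # '/EditPosts.aspx;type=a'

print(parsed)
print(split)
```

For URLs without a ; in the path, the two results carry the same scheme, netloc, path, query and fragment.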
3.urlparse.urljoin
Combines a base URL and a relative address into a single URL. The input is not validated, but the base should be a full URL starting with a scheme such as http:// and ending with /; otherwise part or all of the base will not appear in the result.
import urlparse
new_url = urlparse.urljoin('https://baidu.com/ssss/', '88888')
print new_url
The output is: https://baidu.com/ssss/88888
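If the base is not a proper URL, for example new_url = urlparse.urljoin('122', '88888'), no error is raised, but the two are not merged and the output is '88888': the base '122' is treated as a last path segment and is replaced by the relative address.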
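The joining rules above can be sketched with a few cases side by side. The first and last calls come from the text; the other three are additional illustrations chosen here (example.com is a placeholder host), and the import fallback is an assumption for running on either Python version.

```python
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin      # Python 2

# Base ends with '/': the relative address is appended to the directory.
appended = urljoin('https://baidu.com/ssss/', '88888')

# Base does not end with '/': the last segment 'ssss' is replaced.
replaced = urljoin('https://baidu.com/ssss', '88888')

# Relative address starts with '/': it replaces the whole base path.
from_root = urljoin('https://baidu.com/ssss/', '/88888')

# Second argument is already a full URL: the base is ignored.
absolute = urljoin('https://baidu.com/ssss/', 'http://example.com/x')

# Base without scheme or trailing '/': only the relative address is left.
no_scheme = urljoin('122', '88888')

for joined in (appended, replaced, from_root, absolute, no_scheme):
    print(joined)
```

The trailing slash on the base is what decides whether the last segment is kept, which is why 'https://baidu.com/ssss/' and 'https://baidu.com/ssss' give different results.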
One last point: the urlparse module was renamed to urllib.parse in Python 3.0.
Official documentation: http://docs.python.org/library/urlparse.html