First, introduction to the urlparse module
The urlparse module mainly splits a URL into 6 parts and returns them as a tuple; the parts can also be reassembled into a URL. Its main functions include urljoin, urlsplit, urlunsplit, and urlparse.
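As a rough illustration of the split-and-reassemble idea, the sketch below parses a URL into its 6 parts and then rebuilds it. The URL is a made-up example, and the import fallback is an assumption so that the same snippet runs on Python 3, where these functions live in urllib.parse.

```python
# Python 3 moved the urlparse module to urllib.parse; fall back to the
# Python 2 module name if that import fails.
try:
    from urllib.parse import urlparse, urlunparse
except ImportError:
    from urlparse import urlparse, urlunparse

# Hypothetical URL, chosen so that all 6 parts are non-empty.
url = 'http://www.example.com/docs/index.php;v=1?id=18#top'

# The result behaves like a 6-item tuple:
# (scheme, netloc, path, params, query, fragment)
parts = urlparse(url)
scheme, netloc, path, params, query, fragment = parts

# urlunparse reassembles the 6 parts into a URL string.
rebuilt = urlunparse(parts)

print(tuple(parts))
print(rebuilt)
```

For a URL like this one, the rebuilt string should match the original.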
Second, using the urljoin function
urljoin mainly concatenates URLs: it takes base as the base address and combines it with a relative address to form an absolute URL. It is particularly useful when handling several files at the same location, because each new file name can be appended to the base URL. Note that if the base address does not end with the character /, the rightmost part of the base URL is replaced by the relative path. To keep the final directory in the path, make sure the base URL ends with /.
Enter the code:
    import urlparse
    # urljoin merges a base URL (domain name) with a relative path
    urljoin = urlparse.urljoin('http://www.sina.cn/cc', 'file/down.php')
    print urljoin
    urljoin1 = urlparse.urljoin('http://www.sina.cn/cc/', 'file/down.php')
    print urljoin1
Code run result:
    C:\Python27\python.exe c:/users/lee/desktop/d/pycharmprojects/untitled/test.py
    http://www.sina.cn/file/down.php
    http://www.sina.cn/cc/file/down.php
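The "several files at the same location" case can be sketched as a loop over file names joined onto one base URL. The file names here are hypothetical, and the import fallback is an assumption so the snippet also runs on Python 3.

```python
# Python 3 moved the urlparse module to urllib.parse; fall back to the
# Python 2 module name if that import fails.
try:
    from urllib.parse import urljoin
except ImportError:
    from urlparse import urljoin

# The base URL ends with '/', so the final directory 'cc' is kept.
base = 'http://www.sina.cn/cc/'

# Hypothetical relative paths that all live under the same base location.
file_names = ['index.php', 'file/down.php', 'img/logo.png']

# Join each relative path onto the base to get absolute URLs.
urls = [urljoin(base, name) for name in file_names]

for u in urls:
    print(u)
```

Because the base ends with /, every result stays under http://www.sina.cn/cc/.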
Third, using the urlparse and urlsplit functions
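urlsplit mainly parses urlstring and returns a tuple of 5 string items: scheme (protocol), network location, path, query, and fragment. When allow_fragments is False, the last item of the tuple (the fragment) is always empty, regardless of whether urlstring contains a fragment. Any item that is omitted from the URL is also returned as an empty string. urlsplit() and urlparse() are almost the same; the difference is that urlparse() also separates the path parameters (params), so it returns 6 items instead of 5.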
Enter the code:
    import urlparse
    url = 'http://www.baidu.com/good/index.php?id=18'
    # urlsplit splits a URL into its component parts
    result = urlparse.urlsplit(url)
    print result
    # show the scheme (protocol)
    print result.scheme
    # show the domain name
    print result.netloc
    # show the path
    print result.path
    # show the query parameters
    print result.query
Code run result:
    C:\Python27\python.exe c:/users/lee/desktop/d/pycharmprojects/untitled/test.py
    SplitResult(scheme='http', netloc='www.baidu.com', path='/good/index.php', query='id=18', fragment='')
    http
    www.baidu.com
    /good/index.php
    id=18
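The effect of allow_fragments, and the difference in tuple length between urlsplit and urlparse, can be checked with a small sketch like the one below. The #top fragment is added to the URL purely for illustration, and the import fallback is an assumption so the snippet also runs on Python 3.

```python
# Python 3 moved the urlparse module to urllib.parse; fall back to the
# Python 2 module name if that import fails.
try:
    from urllib.parse import urlsplit, urlparse
except ImportError:
    from urlparse import urlsplit, urlparse

# Same URL as in the example, with a '#top' fragment added for illustration.
url = 'http://www.baidu.com/good/index.php?id=18#top'

# Default behaviour: the fragment is recognised and split off.
with_fragment = urlsplit(url)

# allow_fragments=False: the fragment item is always empty, and the
# '#top' text stays attached to the preceding component (here, the query).
without_fragment = urlsplit(url, allow_fragments=False)

print(with_fragment.query, with_fragment.fragment)
print(without_fragment.query, without_fragment.fragment)

# urlsplit returns 5 items; urlparse returns 6 (it adds params).
print(len(urlsplit(url)), len(urlparse(url)))
```

With allow_fragments=False the text after # is not discarded; it is simply left as part of the component that precedes it.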