(i) URL Address
URL address components
A URL has the general form scheme://net_loc/path;params?query#fragment
Component | Description
scheme | Network protocol or download scheme
net_loc | Server location (may include user information)
path | Path to a file or CGI application, delimited by slashes (/)
params | Optional parameters
query | A series of key=value pairs delimited by ampersands (&)
fragment | A specific anchor within a document
net_loc components
user:password@host:port
Component | Description
user | User name or login
password | User password
host | Name or address of the computer running the Web server (required)
port | Port number (needed only if it differs from the protocol default, e.g. 80 for HTTP)
(ii) urllib
The following sections cover the two most commonly used submodules, urllib.request and urllib.parse.
(iii) urllib.request
urllib.request functions
Function | Description
urlopen(url, data=None) | Opens a URL and returns a file-like object, much as open() opens a local file in binary read-only mode. url: a URL string or a Request object. data: if given (as bytes), the request is sent as a POST carrying this data.
urlretrieve(url, filename=None) | Downloads the resource at url to a local file and returns a (filename, headers) tuple. filename: file name and path; a name without a directory is saved in the current working directory, and if filename is omitted the data goes to a temporary file.
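The data parameter of urlopen is what switches a request from GET to POST. The sketch below only builds the Request objects and inspects them, so it runs offline; the endpoint URL and the form fields are hypothetical placeholders.

```python
import urllib.parse
import urllib.request

# Hypothetical endpoint; the requests are only built here, not sent
url = 'https://www.example.com/search'
form = {'q': 'python', 'page': 1}

# urlopen() expects bytes for data, so encode the form first
body = urllib.parse.urlencode(form).encode('utf-8')
post_req = urllib.request.Request(url, data=body)
get_req = urllib.request.Request(url)

print(post_req.get_method())  # POST (data is present)
print(get_req.get_method())   # GET
print(post_req.data)          # b'q=python&page=1'

# To actually send the POST: urllib.request.urlopen(post_req)
```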
urlopen object methods
Method | Description
read() | Reads all data (as bytes)
readline() | Reads one line
readlines() | Reads all lines and returns them as a list
fileno() | Returns the file descriptor
close() | Closes the URL connection (these five methods behave like the same-named methods of a file object returned by open())
info() | Returns the MIME (Multipurpose Internet Mail Extensions) headers, which tell the client (e.g. a browser) what type of content was returned and which application can open it
geturl() | Returns the actual URL of the retrieved resource (if a redirect was followed, this is the final URL)
getcode() | Returns the HTTP status code
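To try these methods without a network connection, the sketch below opens a data: URL, which urllib.request can serve from memory. The URL content is a hypothetical stand-in for a real page. Because a data: URL is not fetched over HTTP, getcode() returns None here; for an http(s) URL it would return the status code, such as 200.

```python
import urllib.request

# Hypothetical in-memory resource: a data: URL, so no network access is needed
DATA_URL = 'data:text/plain;charset=utf-8,first%20line%0Asecond%20line%0A'

with urllib.request.urlopen(DATA_URL) as resp:
    first = resp.readline()                        # one line of bytes
    rest = resp.readlines()                        # remaining lines as a list
    content_type = resp.info().get_content_type()  # MIME type from the headers
    final_url = resp.geturl()                      # same URL, no redirect here
    status = resp.getcode()                        # None: no HTTP status for data:

print(first)         # b'first line\n'
print(rest)          # [b'second line\n']
print(content_type)  # text/plain
print(status)        # None
```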
```python
import urllib.request

url = 'https://tieba.baidu.com/p/5475267611'
# Open the URL (like open() on a local file in binary read-only mode) and read all the data with read()
with urllib.request.urlopen(url) as resp:
    html = resp.read()
print(type(html))  # <class 'bytes'>

url_file = 'https://imgsa.baidu.com/forum/w%3D580/sign=99114e38abec08fa260013af69ef3d4d/e549b13533fa828bc80c7764f61f4134960a5a85.jpg'
# Download the file at the URL and save it to a local path (raw string keeps the backslashes)
urllib.request.urlretrieve(url_file, r'C:\Temp\1.jpg')

# Return the MIME headers of the response
with urllib.request.urlopen(url) as resp:
    html_info = resp.info()
print(html_info)
```
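When urlretrieve is called without a filename, it saves the download to a temporary file and returns that path along with the headers. This sketch uses a data: URL as a hypothetical stand-in for a real download link so it runs offline, and calls urlcleanup() to delete the temporary file afterwards.

```python
import os
import urllib.request

# A data: URL stands in for a real download link, so no network access is needed
SRC = 'data:text/plain;charset=utf-8,hello%20urllib'

# With no filename, urlretrieve() writes to a temporary file and returns its path
tmp_path, headers = urllib.request.urlretrieve(SRC)
with open(tmp_path, 'rb') as f:
    content = f.read()

print(content)                     # b'hello urllib'
print(headers.get_content_type())  # text/plain

# Delete the temporary files created by earlier urlretrieve() calls
urllib.request.urlcleanup()
print(os.path.exists(tmp_path))    # False
```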
(iv) urllib.parse
urllib.parse functions
Function | Description
urlparse(urlstr) | Parses a URL string into a 6-tuple (scheme, netloc, path, params, query, fragment)
urlunparse(urltup) | The inverse of urlparse(): assembles a 6-tuple of URL components into a full URL
urljoin(base, url) | Combines base and url into a full URL. If url is an absolute path (starting with /), only the scheme and net_loc of base are kept; a relative path is resolved against the directory part of base
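How much of base survives depends on the form of url. This sketch reuses the cnblogs base URL from the example below; the relative paths and the docs.python.org URL are hypothetical inputs chosen to show each case.

```python
from urllib.parse import urljoin

base = 'https://www.cnblogs.com/cate/python/'

# Absolute path: only the scheme and netloc of base are kept
print(urljoin(base, '/cate/ruby/'))  # https://www.cnblogs.com/cate/ruby/
# Relative paths: resolved against the directory of base
print(urljoin(base, 'ruby/'))        # https://www.cnblogs.com/cate/python/ruby/
print(urljoin(base, '../ruby/'))     # https://www.cnblogs.com/cate/ruby/
# Full URL: base is ignored
print(urljoin(base, 'https://docs.python.org/3/'))  # https://docs.python.org/3/
# Without a trailing slash, the last path segment of base is replaced
print(urljoin('https://www.cnblogs.com/cate/python', 'ruby'))  # https://www.cnblogs.com/cate/ruby
```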
```python
import urllib.parse

url = 'https://www.cnblogs.com/cate/python/'
new_url = '/cate/ruby/'

# Parse a URL into a 6-tuple (scheme, netloc, path, params, query, fragment)
url_par = urllib.parse.urlparse(url)
print('urlparse example:', url_par)

# The inverse of urlparse: assemble the 6-tuple back into a full URL
url_unp = urllib.parse.urlunparse(url_par)
print('urlunparse example:', url_unp)

# Keep the scheme and netloc of url and join them with new_url
url_ruby = urllib.parse.urljoin(url, new_url)
print('urljoin example:', url_ruby)
```
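A common reason to pair urlparse with urlunparse is to change one component and rebuild the URL. Since the parse result is a named tuple, its _replace() method returns a modified copy. In this sketch the page=2 query is a hypothetical value added for illustration.

```python
from urllib.parse import urlencode, urlparse, urlunparse

url = 'https://www.cnblogs.com/cate/python/'
parts = urlparse(url)

# ParseResult is a named tuple, so _replace() returns a modified copy
new_parts = parts._replace(path='/cate/ruby/', query=urlencode({'page': 2}))
new_url = urlunparse(new_parts)
print(new_url)  # https://www.cnblogs.com/cate/ruby/?page=2

# An unmodified result survives a parse/unparse round trip
print(urlunparse(parts) == url)  # True
```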
Python notes (13): Urllib module