The HTML tags that may contain the links we want to parse:
Examples:
<frameset cols="25%,75%">
  <frame src="frame_a.htm">
  <frame src="frame_b.htm">
</frameset>
<a href="url">Link text</a>
Images and audio embedded in the site also carry links, but these are not the links we want to parse, because they do not come from <a> tags. For example, a picture is displayed on the site with:
<img src="url"/>
Of course, the links we parse from <a> tags will inevitably include links to pictures, audio, and other resources, so we need to check the type of each link.
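As a minimal sketch of the idea above, the following Python collects href values from <a> tags only and then guesses the link type from the file extension. The class name AnchorCollector, the helper is_media_link, the MEDIA_EXTENSIONS set, and the sample HTML are all assumptions made for illustration; an extension check is only a heuristic, and a real crawler might inspect the Content-Type header instead.

```python
from html.parser import HTMLParser
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Hypothetical list of extensions treated as media; extend as needed.
MEDIA_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".mp3", ".wav", ".mp4"}


class AnchorCollector(HTMLParser):
    """Collect href values from <a> tags and ignore every other tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def is_media_link(url):
    """Guess from the path extension whether a link points to a media file."""
    suffix = PurePosixPath(urlsplit(url).path).suffix.lower()
    return suffix in MEDIA_EXTENSIONS


# Sample markup (an assumption): two <a> links and one <img> that must be skipped.
html = '<a href="page.html">Page</a><img src="logo.png"/><a href="song.mp3">Song</a>'
collector = AnchorCollector()
collector.feed(html)

# Keep only the links that do not look like pictures or audio.
pages = [u for u in collector.links if not is_media_link(u)]
```

Here the <img> source is never collected because it is not an <a> tag, and the link to song.mp3 is collected but then filtered out by the type check.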
URL - Uniform Resource Locator
A Uniform Resource Locator (URL) is used to locate documents (or other data) on the World Wide Web. URLs, such as http://www.w3school.com.cn/html/index.asp, follow this syntax:
scheme://host.domain:port/path/filename
Explanation:
- scheme - defines the type of Internet service. The most common type is http.
- host - defines the domain host (the default host for http is www).
- domain - defines the Internet domain name, such as w3school.com.cn.
- :port - defines the port number on the host (the default port number for http is 80).
- path - defines the path on the server (if omitted, the document must be in the root directory of the website).
- filename - defines the name of the document or resource.
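The components listed above can be pulled apart with Python's standard urllib.parse module, as in this small sketch. The explicit :80 port is added to the example URL here purely for illustration. Note that urlsplit does not separate host from domain; it returns them together as the hostname.

```python
from urllib.parse import urlsplit

# Example URL from the text, with an explicit port added for illustration.
parts = urlsplit("http://www.w3school.com.cn:80/html/index.asp")

scheme = parts.scheme      # type of Internet service
hostname = parts.hostname  # host and domain together
port = parts.port          # port number as an int, or None if absent

# Split the path into the directory part and the file name.
directory, _, filename = parts.path.rpartition("/")

print(scheme, hostname, port, directory, filename)
```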
URL schemes
Here are some of the most popular schemes:
| Scheme | Access | Used for |
|---|---|---|
| http | Hypertext Transfer Protocol | Normal web pages that start with http://. Not encrypted. |
| https | Secure Hypertext Transfer Protocol | Secure web pages. All information exchanged is encrypted. |
| ftp | File Transfer Protocol | Downloading files from or uploading files to a website. |
| file | | Files on your own computer. |
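When parsing links, the scheme is one simple way to decide which links are worth following. The sketch below assumes we only care about http and https pages; the function name normalize_link and the ALLOWED_SCHEMES set are hypothetical. It also resolves relative links against the page URL first, since a relative href carries no scheme of its own.

```python
from urllib.parse import urljoin, urlsplit

# Assumption: only ordinary and secure web pages are of interest.
ALLOWED_SCHEMES = {"http", "https"}


def normalize_link(base_url, href):
    """Resolve href against base_url; return None if the scheme is not allowed."""
    absolute = urljoin(base_url, href)
    if urlsplit(absolute).scheme in ALLOWED_SCHEMES:
        return absolute
    return None


base = "http://www.w3school.com.cn/html/index.asp"
print(normalize_link(base, "frame_a.htm"))
print(normalize_link(base, "ftp://example.com/file.zip"))
```

The first call resolves the relative link against the directory of the base page, while the ftp link is rejected and yields None.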
Parsing links Related Knowledge