1. Concept
#1: Uniform Resource Identifier (URI): Uniquely identifies and locates an information resource anywhere in the world.
#2: Uniform Resource Locator (URL): Describes the specific location of a resource on a particular server. URLs are a subset of URIs.
#3: Uniform Resource Name (URN): Serves as a unique name for a particular piece of content. URNs are a subset of URIs.
2. URL Syntax
Most of the URL syntax is based on this 9-part general-purpose format:
<scheme>://<user>:<password>@
Generic URL components:
Component | Description | Default value
Scheme (scheme) | Which protocol to use when accessing a server to get a resource | None
User (user) | The user name some schemes require to access a resource | anonymous
Password (password) | The password that may follow the user name | <E-mail address>
Host (host) | The hostname or dotted IP address of the server hosting the resource | None
Port (port) | The port on which the server hosting the resource is listening | Scheme-specific
Path (path) | The local name of the resource on the server | None
Parameters (params) | Used by some schemes to specify input parameters. Parameters are name/value pairs separated by semicolons (;) | None
Query (query) | Used by some schemes to pass parameters to active applications such as databases, bulletin boards, search engines, and other Internet gateways | None
Fragment (frag) | A name for a piece or part of the resource. The frag field is not passed to the server when the object is referenced; it is used internally by the client | None
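As a rough illustration of the nine components, the sketch below splits a URL with Python's standard `urllib.parse.urlparse`. The URL is a made-up example that combines pieces used elsewhere in these notes so that every component is present; it is not taken from any real site.

```python
from urllib.parse import urlparse

# Hypothetical URL assembled so that all nine components appear.
url = ("http://joe:joespassword@www.joes-hardware.com:80"
       "/tools.html;graphics=true?item=12731#drills")
parts = urlparse(url)

# Map the component names from the table above to the parsed values.
components = {
    "scheme": parts.scheme,
    "user": parts.username,
    "password": parts.password,
    "host": parts.hostname,
    "port": parts.port,
    "path": parts.path,
    "params": parts.params,
    "query": parts.query,
    "frag": parts.fragment,
}

for name, value in components.items():
    print(f"{name:8} {value}")
```

Note that `urlparse` only recognizes parameters on the last path segment, which is enough for this example.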
2.1. Scheme: which protocol to use
#1: The scheme is the primary identifier of how to access a given resource; it tells the application interpreting the URL which protocol to use.
#2: Scheme names are case-insensitive, so "http://www.dawn.com" and "HTTP://www.dawn.com" are equivalent.
2.2. User name and password
Many servers require a user name and password before they allow access to data. FTP servers are a common example:
ftp://ftp.prep.ai.mit.edu/pub/gnu
ftp://anonymous@ftp.prep.ai.mit.edu/pub/gnu
ftp://anonymous:my_password@ftp.prep.ai.mit.edu/pub/gnu
http://joe:joespassword@www.joes-hardware.com/sales_info.txt
#1: The first example has no user name or password. If an application uses a URL scheme that requires a user name and password, such as FTP, and the user supplies neither, the application inserts defaults. For an FTP URL with no user name and password, "anonymous" is used as the user name; Internet Explorer sends "IEUser" as the password.
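The default-credential behaviour described above could be sketched as follows. The helper name `ftp_credentials` is hypothetical, and the "IEUser" default simply mirrors the Internet Explorer behaviour mentioned in the text; other clients may send something different.

```python
from urllib.parse import urlsplit

def ftp_credentials(url, default_password="IEUser"):
    """Return (user, password) for an FTP URL.

    Hypothetical helper: when the URL carries no user name at all, fall
    back to the anonymous user and a client-chosen default password.
    """
    parts = urlsplit(url)
    if parts.username is None:
        return "anonymous", default_password
    # A user name was given; the password may still be missing (None).
    return parts.username, parts.password

print(ftp_credentials("ftp://ftp.prep.ai.mit.edu/pub/gnu"))
print(ftp_credentials("ftp://anonymous:my_password@ftp.prep.ai.mit.edu/pub/gnu"))
```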
2.3. Host and Port
#1: The host component identifies the machine on the Internet that hosts the resource. The host can be given as a domain name or as an IP address.
#2: The port component identifies the network port on which the server is listening. For HTTP, which runs over TCP, the default port is 80.
2.4. Path
The path component describes where the resource resides on the server.
http://www.joes-hardware.com:80/seasonal/index-fail.html
The path in this example is /seasonal/index-fail.html.
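A minimal sketch of reading the host, port, and path, assuming Python's `urllib.parse`. The `DEFAULT_PORTS` table and the `host_port_path` helper are illustrative names; the table contains only the two defaults mentioned in these notes (80 for http, 443 for https).

```python
from urllib.parse import urlsplit

# Default ports named in these notes; other schemes are left out on purpose.
DEFAULT_PORTS = {"http": 80, "https": 443}

def host_port_path(url):
    """Return (host, port, path), filling in the scheme's default port."""
    parts = urlsplit(url)
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    return parts.hostname, port, parts.path

print(host_port_path("http://www.joes-hardware.com:80/seasonal/index-fail.html"))
print(host_port_path("http://www.joes-hardware.com/seasonal/index-fail.html"))
```

Because `urlsplit` lowercases the scheme, a URL written as "HTTP://www.dawn.com" also picks up port 80 here.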
2.5. Parameters
For some schemes, the hostname and path alone are not enough, and more information is needed to work with the resource. The application interpreting the URL needs these protocol parameters to access it; without them, the server may not service the request or, worse, may service it incorrectly.
ftp://prep.ai.mit.edu/pub/gnu;type=d
http://www.joes-hardware.com/hammers;sale=false/index.html;graphics=true
Example 1: There is one parameter, type=d, where the parameter name is type and its value is d.
Example 2: There are two path segments, hammers and index.html. The hammers segment has the parameter sale with the value false, and the index.html segment has the parameter graphics with the value true.
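Python's `urlparse` only splits parameters off the last path segment, so per-segment parameters like those in Example 2 need a little manual work. The helper below, `segment_params`, is a hypothetical sketch of one way to do it.

```python
from urllib.parse import urlsplit

def segment_params(url):
    """Return a list of (segment, {name: value}) pairs, one per path segment."""
    result = []
    for segment in urlsplit(url).path.split("/"):
        if not segment:
            continue  # skip the empty piece before the leading "/"
        # The first piece is the segment name; the rest are name=value params.
        name, *raw_params = segment.split(";")
        params = dict(p.split("=", 1) for p in raw_params if "=" in p)
        result.append((name, params))
    return result

print(segment_params("ftp://prep.ai.mit.edu/pub/gnu;type=d"))
print(segment_params(
    "http://www.joes-hardware.com/hammers;sale=false/index.html;graphics=true"))
```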
2.6. Query string
Many resources can be asked questions, or queried, to narrow down the resource being requested.
http://www.joes-hardware.com/inventory-check.cgi?item=12731&color=blue
In this example, the query component has two name/value pairs: item=12731 and color=blue.
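The two name/value pairs can be pulled out of the query component with the standard library, as in this short sketch:

```python
from urllib.parse import urlsplit, parse_qsl

url = "http://www.joes-hardware.com/inventory-check.cgi?item=12731&color=blue"

# parse_qsl splits the query string on "&" and "=" into ordered pairs.
pairs = parse_qsl(urlsplit(url).query)
print(pairs)
```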
2.7. Fragments
Some resource types can be subdivided below the resource level. To refer to a part of a resource, a URL supports a fragment component that identifies a piece within the resource.
http://www.joes-hardware.com/tools.html#drills
In this example, the fragment drills refers to the portion of /tools.html named "drills".
HTTP servers generally deal only with entire objects (the resource level), not with fragments of objects, so the client does not send the fragment to the server; the fragment is used internally by the client.
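A client-side sketch of this split, using `urllib.parse.urldefrag`: the part before "#" is what would be requested from the server, and the fragment stays with the client. The variable names are illustrative.

```python
from urllib.parse import urldefrag

url = "http://www.joes-hardware.com/tools.html#drills"

# request_url is what goes to the server; fragment is kept by the client.
request_url, fragment = urldefrag(url)
print(request_url)
print(fragment)
```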
3. URL Shortcuts
URLs come in two flavors: absolute and relative.
3.1. Relative URLs
A relative URL is a convenient shorthand notation. To get all the information needed to access a resource from a relative URL, you must interpret it relative to another URL, called its base URL.
<a href="./hammers.html">hammers</a>
This link appears in the HTML document at http://www.dawn.com/tools.html.
Resolved against that base URL, the relative URL "./hammers.html" becomes "http://www.dawn.com/hammers.html".
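The same resolution can be reproduced with `urllib.parse.urljoin`, which takes the base URL and the relative URL:

```python
from urllib.parse import urljoin

base_url = "http://www.dawn.com/tools.html"  # the document containing the link
relative_url = "./hammers.html"              # the href value from the link

resolved = urljoin(base_url, relative_url)
print(resolved)
```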
3.2. Automatic URL expansion
#1: Hostname expansion
Browsers can often expand a bare word into a full hostname without help; for example, "yahoo" is automatically expanded to "www.yahoo.com". If no site matching "yahoo" is found, some browsers try several expansions before giving up. Browsers use these simple tricks to save you time and reduce the chance of failing to find the site.
#2: History expansion
Browsers store a history of the URLs you have visited. When the text you type matches the prefix of a URL in the history, the browser offers the completed choices. If you type "http://joes-", for example, the browser may suggest "http://www.joes-hardware.com".
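Real browsers use their own, more elaborate heuristics, but both expansions can be roughly sketched as below. The function names, the "www." and ".com" guess, and the rule of ignoring a leading "www." when matching history are all assumptions made for illustration.

```python
def expand_hostname(name):
    """Guess a full hostname for a bare word (hypothetical heuristic)."""
    if "." in name:
        return name  # already looks like a full hostname
    return f"www.{name.lower()}.com"

def history_matches(typed, history):
    """Return history URLs that start with the typed text.

    Assumption: a leading "www." in the stored URL is ignored, so that
    typing "http://joes-" can match "http://www.joes-hardware.com".
    """
    matches = []
    for url in history:
        without_www = url.replace("://www.", "://", 1)
        if url.startswith(typed) or without_www.startswith(typed):
            matches.append(url)
    return matches

history = ["http://www.joes-hardware.com", "http://www.dawn.com/tools.html"]
print(expand_hostname("Yahoo"))
print(history_matches("http://joes-", history))
```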
4. URL Characters
4.1. URL Character Set
URLs use the US-ASCII character set. US-ASCII uses 7 bits to represent most of the keys available on an English typewriter, plus a few nonprinting control characters for text formatting and hardware signaling.
The designers of URLs also built in escape sequences. Escape sequences allow arbitrary character values or data to be encoded using a restricted subset of US-ASCII, which gives portability and completeness.
4.2. Encoding mechanism
The encoding scheme represents unsafe characters with an "escape" notation: a percent sign (%) followed by two hexadecimal digits that give the ASCII code of the character.
Example:
Character | ASCII code | Example URL
~ | 126 (0x7E) | http://www.dawn.com/%7Ejoe
Space | 32 (0x20) | http://www.dawn.com/more%20tools.html
% | 37 (0x25) | http://www.dawn.com/100%25satisfaction.html
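The escape notation can be reproduced in a few lines. The `escape` helper below is a hypothetical name for the manual rule (percent sign plus two hexadecimal digits); `quote` and `unquote` are the standard-library functions that apply and reverse it.

```python
from urllib.parse import quote, unquote

def escape(ch):
    """Escape one ASCII character as '%' plus two uppercase hex digits."""
    return f"%{ord(ch):02X}"

# Manual escapes for the three characters in the table above.
escapes = [escape(ch) for ch in "~ %"]
print(escapes)

# The standard library applies and reverses the same notation.
print(quote("more tools.html"))
print(quote("100%satisfaction.html"))
print(unquote("http://www.dawn.com/%7Ejoe"))
```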
4.3. Character Restrictions
Reserved and restricted characters:
Character | Reserved/Restricted
% | Reserved as the escape token for encoded characters
/ | Reserved for delimiting path segments in the path component
. | Reserved in the path component
.. | Reserved in the path component
# | Reserved as the fragment delimiter
? | Reserved as the query-string delimiter
; | Reserved as the params delimiter
: | Reserved to delimit the scheme, user/password, and host/port components
$ , + | Reserved
@ & = | Reserved because they have special meaning in the context of some schemes
{ } | \ ^ ~ [ ] ' | Restricted because various transport agents handle them unsafely
< > " | Unsafe; these characters often have meaning outside the scope of the URL, such as delimiting the URL itself
0x00-0x1F, 0x7F | Restricted; these are the nonprintable characters of US-ASCII
>0x7F | Restricted; these values fall outside the 7-bit range of US-ASCII
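As a rough check based only on the restricted and unsafe rows of the table above, a client might flag characters that need escaping before they go into a URL. The `is_restricted` helper and the `UNSAFE` set are illustrative names, and the set simply copies the table rather than any particular standard.

```python
# Characters the table marks as restricted or unsafe (copied from the table).
UNSAFE = set('{}|\\^~[]\'<>"')

def is_restricted(ch):
    """Return True if the table marks this character as restricted or unsafe."""
    code = ord(ch)
    if code <= 0x1F or code >= 0x7F:
        return True  # nonprintable, or outside 7-bit US-ASCII
    return ch in UNSAFE

flagged = [ch for ch in 'a/<~\x00\u00e9' if is_restricted(ch)]
print(flagged)
```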
5. Schemes
Scheme | Description
http | The Hypertext Transfer Protocol scheme, with no user name or password. The default port is 80. Basic format: http://
https | The https scheme is a twin of the http scheme. The only difference is that https uses SSL (Secure Sockets Layer), an end-to-end encryption mechanism, and the default port is 443. Basic format: https://
mailto | A mailto URL refers to an e-mail address. Basic format: mailto:<rfc-822-addr-spec> Example: mailto:joe@joes-hardware.com
ftp | File Transfer Protocol URLs can be used to download files from or upload files to an FTP server, and to obtain listings of the directory structure on an FTP server. Basic format: ftp://<user>:<password>@
rtsp, rtspu | RTSP URLs identify audio and video media resources that can be retrieved through the Real Time Streaming Protocol. The "u" in the rtspu scheme indicates that UDP is used to retrieve the resource. Basic formats: rtsp://<user>:<password>@ rtspu://<user>:<password>@
file | The file scheme denotes a file that is directly accessible on a given host. If the hostname is omitted, it defaults to the local host from which the URL is being used. Basic format: file://
news | The news scheme is used to access specific articles or newsgroups, as defined in RFC 1036. It has one unusual property: a news URL by itself does not contain enough information to locate the resource. News resources are available from multiple servers and are location-independent, because access to them does not depend on any one server. Basic formats: news:<newsgroup> news:<news-article-id> Example: news:rec.arts.startrek
telnet | The telnet scheme is used to access interactive services. Basic format: telnet://<user>:<password>@