URI, URL, URN

Last Update: 2018-07-26 Source: Internet

Author: User

Tags reserved rfc


java.net
Class URI

public final class URI extends Object implements Comparable<URI>, Serializable

Represents a Uniform Resource Identifier (URI) reference.

Aside from some minor deviations noted below, an instance of this class represents a URI reference as defined by RFC 2396: Uniform Resource Identifiers (URI): Generic Syntax, amended by RFC 2732: Format for Literal IPv6 Addresses in URLs. The literal IPv6 address format also supports scope_ids; their syntax and usage are described in the documentation of the Inet6Address class. This class provides constructors for creating URI instances from their components or by parsing their string forms, methods for accessing the various components of an instance, and methods for normalizing, resolving, and relativizing URI instances. Instances of this class are immutable.

URI syntax and components

At the highest level, a URI reference (hereinafter simply "URI") in string form has the syntax

    [scheme:]scheme-specific-part[#fragment]

where square brackets [...] delineate optional components and the characters : and # stand for themselves.
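A minimal sketch of that top-level split using the accessors java.net.URI already provides, assuming Java 8 or later. TopLevelParts is a hypothetical helper name; the sample strings are the ones used elsewhere in this article.

```java
import java.net.URI;

// Hypothetical helper: splits a URI reference into [scheme:]scheme-specific-part[#fragment].
final class TopLevelParts {
    static String[] split(String s) {
        // URI.create wraps URISyntaxException in IllegalArgumentException.
        URI u = URI.create(s);
        // getScheme() and getFragment() return null when the component is absent.
        return new String[] { u.getScheme(), u.getSchemeSpecificPart(), u.getFragment() };
    }

    public static void main(String[] args) {
        String[] p = split("http://java.sun.com/j2se/1.3/docs/guide/collections/designfaq.html#28");
        System.out.println("scheme:   " + p[0]); // http
        System.out.println("ssp:      " + p[1]); // //java.sun.com/j2se/1.3/docs/guide/collections/designfaq.html
        System.out.println("fragment: " + p[2]); // 28
    }
}
```

For a relative reference such as docs/guide/collections/designfaq.html#28 the scheme comes back as null, while the scheme-specific part and fragment are still split out.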

An absolute URI specifies a scheme; a URI that is not absolute is said to be relative. URIs are also classified according to whether they are opaque or hierarchical.

An opaque URI is an absolute URI whose scheme-specific part does not begin with a slash character ('/'). Opaque URIs are not subject to further parsing. Some examples of opaque URIs are:

mailto:java-net@java.sun.com
news:comp.lang.java
urn:isbn:096139210x

A hierarchical URI is either an absolute URI whose scheme-specific part begins with a slash character, or a relative URI, that is, a URI that does not specify a scheme. Some examples of hierarchical URIs are:

http://java.sun.com/j2se/1.3/
docs/guide/collections/designfaq.html#28
../../../demo/jfc/SwingSet2/src/SwingSet2.java
file:///~/calendar
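The two classifications can be checked directly with isAbsolute() and isOpaque(). This is a small sketch, assuming Java 8 or later; UriKinds is a hypothetical class name, and the inputs are the example URIs listed above.

```java
import java.net.URI;

// Hypothetical helper: reports absolute/relative and opaque/hierarchical for a URI string.
final class UriKinds {
    static String kind(String s) {
        URI u = URI.create(s);
        String abs = u.isAbsolute() ? "absolute" : "relative";   // absolute means a scheme is present
        String shape = u.isOpaque() ? "opaque" : "hierarchical"; // opaque: absolute, SSP not starting with '/'
        return abs + ", " + shape;
    }

    public static void main(String[] args) {
        String[] examples = {
            "mailto:java-net@java.sun.com",
            "news:comp.lang.java",
            "urn:isbn:096139210x",
            "http://java.sun.com/j2se/1.3/",
            "docs/guide/collections/designfaq.html#28",
            "../../../demo/jfc/SwingSet2/src/SwingSet2.java",
            "file:///~/calendar"
        };
        for (String s : examples) {
            System.out.println(s + " -> " + kind(s));
        }
    }
}
```

Note that a relative URI is never opaque: it always has a (possibly empty) path.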

A hierarchical URI is subject to further parsing according to the syntax

    [scheme:][//authority][path][?query][#fragment]

where the characters :, /, ?, and # stand for themselves. The scheme-specific part of a hierarchical URI consists of the characters between the scheme and fragment components.
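A sketch of the second-level parse, assuming Java 8 or later. HierarchicalParts is a hypothetical name and the input URI (with its port, query, and fragment) is a made-up example chosen to populate every component.

```java
import java.net.URI;

// Hypothetical helper: returns the hierarchical components of a URI reference.
final class HierarchicalParts {
    static String[] components(String s) {
        URI u = URI.create(s);
        return new String[] {
            u.getScheme(), u.getAuthority(), u.getPath(),
            u.getQuery(), u.getFragment(), u.getSchemeSpecificPart()
        };
    }

    public static void main(String[] args) {
        String[] names = { "scheme", "authority", "path", "query", "fragment", "scheme-specific part" };
        // Made-up example URI for illustration.
        String[] parts = components("http://java.sun.com:8080/j2se/1.3/index.html?lang=en#top");
        for (int i = 0; i < names.length; i++) {
            System.out.println(names[i] + ": " + parts[i]);
        }
    }
}
```

The scheme-specific part printed last is "//java.sun.com:8080/j2se/1.3/index.html?lang=en": everything between the scheme and the fragment.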

The authority component of a hierarchical URI is, if specified, either server-based or registry-based. A server-based authority parses according to the familiar syntax

    [user-info@]host[:port]

where the characters @ and : stand for themselves. Nearly all URI schemes currently in use are server-based. An authority component that does not parse in this way is considered to be registry-based.
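A sketch of the difference, assuming Java 8 or later. AuthorityKinds is a hypothetical class; parseServerAuthority() is the real java.net.URI method that forces server-based parsing. The host name my_host is a made-up example: the underscore is not legal in a host name, so URI falls back to a registry-based authority and leaves the host undefined.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class AuthorityKinds {
    // True if the authority parses as [user-info@]host[:port]. Also true when there is
    // no authority at all, because parseServerAuthority() then simply returns.
    static boolean isServerBased(String s) {
        try {
            URI.create(s).parseServerAuthority();
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        URI server = URI.create("http://duke@java.sun.com:8080/");
        // Server-based: user-info, host, and port are all defined.
        System.out.println(server.getUserInfo() + " | " + server.getHost() + " | " + server.getPort());

        // '_' is not allowed in a host name, so this authority is treated as registry-based.
        URI registry = URI.create("http://my_host:8080/");
        System.out.println(registry.getAuthority() + " | " + registry.getHost() + " | " + registry.getPort());

        System.out.println(isServerBased("http://duke@java.sun.com:8080/") + " "
                + isServerBased("http://my_host:8080/"));
    }
}
```

The registry-based case prints the whole authority but a null host and port -1, which is a common surprise when host names contain underscores.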

The path component of a hierarchical URI is itself said to be absolute if it begins with a slash character ('/'); otherwise it is relative. The path of a hierarchical URI that is either absolute or specifies an authority is always absolute.

All told, then, a URI instance has the following nine components:

Component	Type
Scheme	String
Scheme-specific part	String
Authority	String
User-information	String
Host	String
Port	int
Path	String
Query	String
Fragment	String

In a given instance any particular component is either undefined or defined with a distinct value. Undefined string components are represented by null, while undefined integer components are represented by -1. A string component may be defined to have the empty string as its value; this is not equivalent to that component being undefined.
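The null / -1 / empty-string distinction is easy to observe. A short sketch, assuming Java 8 or later; UndefinedComponents is a hypothetical class name.

```java
import java.net.URI;

// Shows null / -1 for undefined components versus "" for defined-but-empty ones.
final class UndefinedComponents {
    public static void main(String[] args) {
        URI noQuery = URI.create("http://java.sun.com/j2se/");
        URI emptyQuery = URI.create("http://java.sun.com/j2se/?");
        URI noPath = URI.create("http://java.sun.com");
        URI opaque = URI.create("mailto:java-net@java.sun.com");

        System.out.println(noQuery.getQuery());                  // null: query undefined
        System.out.println(noQuery.getPort());                   // -1: port undefined
        System.out.println("[" + emptyQuery.getQuery() + "]");   // []: query defined but empty
        System.out.println(noQuery.equals(emptyQuery));          // false: undefined != empty
        System.out.println("[" + noPath.getPath() + "]");        // []: hierarchical URIs always have a path
        System.out.println(opaque.getPath() + " " + opaque.getHost()); // null null: opaque URIs lack these
    }
}
```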

Whether a particular component is or is not defined in an instance depends upon the type of the URI being represented. An absolute URI has a scheme component. An opaque URI has a scheme, a scheme-specific part, and possibly a fragment, but has no other components. A hierarchical URI always has a path (though it may be empty) and a scheme-specific part (which at least contains the path), and may have any of the other components. If the authority component is present and is server-based then the host component will be defined and the user-information and port components may be defined.

Operations on URI instances

The key operations supported by this class are those of normalization, resolution, and relativization.

Normalization is the process of removing unnecessary "." and ".." segments from the path component of a hierarchical URI. Each "." segment is simply removed. A ".." segment is removed only if it is preceded by a non-".." segment. Normalization has no effect upon opaque URIs.
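A sketch of normalize() on the three interesting cases, assuming Java 8 or later. NormalizeDemo is a hypothetical name and the dotted paths are made-up inputs.

```java
import java.net.URI;

final class NormalizeDemo {
    static String normalize(String s) {
        return URI.create(s).normalize().toString();
    }

    public static void main(String[] args) {
        // "." segments vanish and "docs/.." cancels out.
        System.out.println(normalize("http://java.sun.com/j2se/./1.3/docs/../guide/index.html"));
        // -> http://java.sun.com/j2se/1.3/guide/index.html

        // The leading ".." has no non-".." segment before it, so it is kept.
        System.out.println(normalize("../a/./b/../c")); // -> ../a/c

        // Opaque URIs are returned unchanged.
        System.out.println(normalize("mailto:java-net@java.sun.com"));
    }
}
```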

Resolution is the process of resolving one URI against another, base URI. The resulting URI is constructed from components of both URIs in the manner specified by RFC 2396, taking components from the base URI for those not specified in the original. For hierarchical URIs, the path of the original is resolved against the path of the base and then normalized. The result, for example, of resolving

    docs/guide/collections/designfaq.html#28    (1)

against the base URI http://java.sun.com/j2se/1.3/ is the result URI

    http://java.sun.com/j2se/1.3/docs/guide/collections/designfaq.html#28

Resolving the relative URI

    ../../../demo/jfc/SwingSet2/src/SwingSet2.java    (2)

against this result yields, in turn,

    http://java.sun.com/j2se/1.3/demo/jfc/SwingSet2/src/SwingSet2.java

Resolution of both absolute and relative URIs, and of both absolute and relative paths in the case of hierarchical URIs, is supported. Resolving the URI file:///~calendar against any other URI simply yields the original URI, since it is absolute. Resolving the relative URI (2) above against the relative base URI (1) yields the normalized, but still relative, URI

    demo/jfc/SwingSet2/src/SwingSet2.java
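The worked examples above can be reproduced with URI.resolve(String), a convenience overload that parses its argument with URI.create. A sketch assuming Java 8 or later; ResolveDemo is a hypothetical name.

```java
import java.net.URI;

final class ResolveDemo {
    static String resolve(String base, String ref) {
        return URI.create(base).resolve(ref).toString();
    }

    public static void main(String[] args) {
        // (1) against the absolute base.
        String r1 = resolve("http://java.sun.com/j2se/1.3/", "docs/guide/collections/designfaq.html#28");
        // (2) against that result.
        String r2 = resolve(r1, "../../../demo/jfc/SwingSet2/src/SwingSet2.java");
        System.out.println(r1);
        System.out.println(r2);

        // An absolute reference is returned as-is, whatever the base.
        System.out.println(resolve(r1, "file:///~calendar"));

        // (2) against the relative base (1): normalized but still relative.
        System.out.println(resolve("docs/guide/collections/designfaq.html#28",
                                   "../../../demo/jfc/SwingSet2/src/SwingSet2.java"));
    }
}
```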

Relativization, finally, is the inverse of resolution: for any two normalized URIs u and v,

    u.relativize(u.resolve(v)).equals(v)  and
    u.resolve(u.relativize(v)).equals(v).

This operation is often useful when constructing a document containing URIs that must be made relative to the base URI of the document wherever possible. For example, relativizing the URI

    http://java.sun.com/j2se/1.3/docs/guide/index.html

against the base URI

    http://java.sun.com/j2se/1.3

yields the relative URI docs/guide/index.html.
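A sketch of relativize(), assuming Java 8 or later; RelativizeDemo is a hypothetical name and http://example.com/x is a made-up unrelated URI. The round-trip check deliberately uses a base ending in '/': resolving docs/guide/index.html against http://java.sun.com/j2se/1.3 (no trailing slash) replaces the last segment 1.3 and gives http://java.sun.com/j2se/docs/guide/index.html.

```java
import java.net.URI;

final class RelativizeDemo {
    static String relativize(String base, String target) {
        return URI.create(base).relativize(URI.create(target)).toString();
    }

    public static void main(String[] args) {
        String v = "http://java.sun.com/j2se/1.3/docs/guide/index.html";
        System.out.println(relativize("http://java.sun.com/j2se/1.3", v)); // docs/guide/index.html

        // Round trip u.resolve(u.relativize(v)).equals(v) with a base that ends in '/'.
        URI u = URI.create("http://java.sun.com/j2se/1.3/");
        URI target = URI.create(v);
        System.out.println(u.resolve(u.relativize(target)).equals(target)); // true

        // A URI with a different authority cannot be relativized and is returned unchanged.
        System.out.println(relativize("http://java.sun.com/j2se/1.3/", "http://example.com/x"));
    }
}
```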

Character categories

RFC 2396 specifies precisely which characters are permitted in the various components of a URI reference. The following categories, most of which are taken from that specification, are used to describe these constraints:

Category	Description
alpha	The US-ASCII alphabetic characters, 'A' through 'Z' and 'a' through 'z'
digit	The US-ASCII decimal digit characters, '0' through '9'
alphanum	All alpha and digit characters
unreserved	All alphanum characters together with those in the string "_-!.~'()*"
punct	The characters in the string ",;:$&+="
reserved	All punct characters together with those in the string "?/[]@"
escaped	Escaped octets, that is, triplets consisting of the percent character ('%') followed by two hexadecimal digits ('0'-'9', 'A'-'F', and 'a'-'f')
other	The Unicode characters that are not in the US-ASCII character set, are not control characters (according to the Character.isISOControl method), and are not space characters (according to the Character.isSpaceChar method) (deviation from RFC 2396, which is limited to US-ASCII)
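To make the categories concrete, here is a small classifier written directly from the table. Rfc2396Chars is a hypothetical helper, not part of java.net; java.net.URI uses its own internal tables, and this sketch only mirrors the categories as listed.

```java
// Hypothetical helper (not part of java.net) mirroring the RFC 2396 character categories.
final class Rfc2396Chars {
    private static final String MARKS = "_-!.~'()*";      // unreserved punctuation
    private static final String PUNCT = ",;:$&+=";
    private static final String RESERVED_EXTRA = "?/[]@";

    static boolean isAlpha(char c) { return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z'); }
    static boolean isDigit(char c) { return c >= '0' && c <= '9'; }
    static boolean isAlphanum(char c) { return isAlpha(c) || isDigit(c); }
    static boolean isUnreserved(char c) { return isAlphanum(c) || MARKS.indexOf(c) >= 0; }
    static boolean isPunct(char c) { return PUNCT.indexOf(c) >= 0; }
    static boolean isReserved(char c) { return isPunct(c) || RESERVED_EXTRA.indexOf(c) >= 0; }

    // "other": non-US-ASCII, not an ISO control character, not a space character.
    static boolean isOther(char c) {
        return c > 0x7F && !Character.isISOControl(c) && !Character.isSpaceChar(c);
    }

    private static boolean isHex(char c) {
        return isDigit(c) || (c >= 'A' && c <= 'F') || (c >= 'a' && c <= 'f');
    }

    // "escaped": '%' at index i followed by two hexadecimal digits.
    static boolean isEscaped(String s, int i) {
        return i + 2 < s.length() && s.charAt(i) == '%'
            && isHex(s.charAt(i + 1)) && isHex(s.charAt(i + 2));
    }

    // '%' and space fall through to "illegal": '%' is legal only as the start of an escaped octet.
    static String classify(char c) {
        if (isUnreserved(c)) return "unreserved";
        if (isReserved(c)) return "reserved";
        if (isOther(c)) return "other";
        return "illegal";
    }

    public static void main(String[] args) {
        for (char c : new char[] { 'a', '~', '@', ';', ' ', '%', '\u20AC' }) {
            System.out.println("'" + c + "' -> " + classify(c));
        }
        System.out.println(isEscaped("%E2%82%AC", 0)); // true
    }
}
```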

The set of all legal URI characters consists of the unreserved, reserved, escaped, and other characters.

Escaped octets, quotation, encoding, and decoding

RFC 2396 allows escaped octets to appear in the user-information, path, query, and fragment components. Escaping serves two purposes in URIs:

To encode non-US-ASCII characters when a URI is required to conform strictly to RFC 2396 by not containing any other characters.

To quote characters that are otherwise illegal in a component. The user-information, path, query, and fragment components differ slightly in terms of which characters are considered legal and illegal.

These purposes are served in this class by three related operations:

A character is encoded by replacing it with the sequence of escaped octets that represent that character in the UTF-8 character set. The euro currency symbol ('\u20AC'), for example, is encoded as "%E2%82%AC". (Deviation from RFC 2396, which does not specify any particular character set.)

An illegal character is quoted simply by encoding it. The space character, for example, is quoted by replacing it with "%20". UTF-8 contains US-ASCII, hence for US-ASCII characters this transformation has exactly the effect required by RFC 2396.

A sequence of escaped octets is decoded by replacing it with the sequence of characters that it represents in the UTF-8 character set. UTF-8 contains US-ASCII, hence decoding has the effect of de-quoting any quoted US-ASCII characters as well as that of decoding any encoded non-US-ASCII characters. If a decoding error occurs when decoding the escaped octets, the erroneous octets are replaced by '\uFFFD', the Unicode replacement character.
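The three operations show up through the constructors and getters of java.net.URI. A sketch, assuming Java 8 or later; EscapingDemo and its helper names are hypothetical, and example.com is a placeholder host.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical demo of quoting, encoding, and decoding in java.net.URI.
final class EscapingDemo {
    // Single-argument constructor: illegal characters (e.g. a space) must already be quoted.
    static boolean singleArgAccepts(String s) {
        try {
            new URI(s);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    // Multi-argument constructor: quotes illegal characters itself and always quotes '%'.
    static URI fromParts(String scheme, String host, String path) {
        try {
            return new URI(scheme, host, path, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(singleArgAccepts("http://example.com/hello world"));   // false
        System.out.println(singleArgAccepts("http://example.com/hello%20world")); // true

        // Quoting: space -> %20 and '%' -> %25 in the raw form; getPath() decodes them again.
        URI quoted = fromParts("http", "example.com", "/hello world/100%");
        System.out.println(quoted.getRawPath()); // /hello%20world/100%25
        System.out.println(quoted.getPath());    // /hello world/100%

        // Encoding: the euro sign is kept by toString() but becomes UTF-8 octets in toASCIIString().
        URI euro = fromParts("http", "example.com", "/\u20AC");
        System.out.println(euro.toASCIIString()); // http://example.com/%E2%82%AC

        // Decoding: 0xFF is never valid UTF-8, so getPath() yields the replacement character U+FFFD.
        URI bad = URI.create("http://example.com/%FF");
        System.out.println(bad.getPath().equals("/\uFFFD")); // true
    }
}
```

Passing an already-escaped string such as "/hello%20world" to a multi-argument constructor therefore double-encodes it to "/hello%2520world"; use the single-argument constructor for pre-quoted input.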

These operations are exposed in the constructors and methods of this class as follows:

The single-argument constructor requires any illegal characters in its argument to be quoted and preserves any escaped octets and other characters that are present.

The multi-argument constructors quote illegal characters as required by the components in which they appear. The percent character ('%') is always quoted by these constructors. Any other characters are preserved.

The getRawUserInfo, getRawPath, getRawQuery, getRawFragment, getRawAuthority, and getRawSchemeSpecificPart methods return the values of their corresponding components in raw form, without interpreting any escaped octets. The strings returned by these methods may contain both escaped octets and other characters, and will not contain any illegal characters.

The getUserInfo, getPath, getQuery, getFragment, getAuthority, and getSchemeSpecificPart methods decode any escaped octets in their corresponding components. The strings returned by these methods may contain both other characters and illegal characters, and will not contain any escaped octets.

The toString method returns a URI string with all necessary quotation, but which may contain other characters.

The toASCIIString method returns a fully quoted and encoded URI string that does not contain any other characters.

Identities

For any URI u, it is always the case that

    new URI(u.toString()).equals(u)

For any URI u that does not contain redundant syntax, such as two slashes before an empty authority (as in file:///tmp/) or a colon following a host name but no port (as in http://java.sun.com:), and that does not encode characters except those that must be quoted, the following identities also hold. In all cases:

    new URI(u.getScheme(),
            u.getSchemeSpecificPart(),
            u.getFragment())
    .equals(u)

If u is hierarchical:

    new URI(u.getScheme(),
            u.getAuthority(),
            u.getPath(), u.getQuery(),
            u.getFragment())
    .equals(u)

If u is hierarchical and has either no authority or a server-based authority:

    new URI(u.getScheme(),
            u.getUserInfo(), u.getHost(), u.getPort(),
            u.getPath(), u.getQuery(),
            u.getFragment())
    .equals(u)

URI, URL, and URN

A URI is a uniform resource identifier while a URL is a uniform resource locator. Hence every URL is a URI, abstractly speaking, but not every URI is a URL. This is because there is another subcategory of URIs, uniform resource names (URNs), which name resources but do not specify how to locate them. The mailto, news, and isbn URIs shown above are examples of URNs.

The conceptual distinction between URIs and URLs is reflected in the differences between this class and the URL class.

An instance of this class represents a URI reference in the syntactic sense defined by RFC 2396. A URI may be either absolute or relative. A URI string is parsed according to the generic syntax without regard to the scheme, if any, that it specifies. No lookup of the host, if any, is performed, and no scheme-dependent stream handler is constructed. Equality, hashing, and comparison are defined strictly in terms of the character content of the instance. In other words, a URI instance is little more than a structured string that supports the syntactic, scheme-independent operations of comparison, normalization, resolution, and relativization.

An instance of the URL class, by contrast, represents the syntactic components of a URL together with some of the information required to access the resource that it describes. A URL must be absolute, that is, it must always specify a scheme. A URL string is parsed according to its scheme. A stream handler is always established for a URL, and in fact it is impossible to create a URL instance for a scheme for which no handler is available. Equality and hashing depend upon both the scheme and the Internet address of the host, if any; comparison is not defined. In other words, a URL is a structured string that supports the syntactic operation of resolution as well as the network I/O operations of looking up the host and opening a connection to the specified resource.
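The "must be absolute" and "must have a handler" rules surface as exceptions from URI.toURL(). A sketch, assuming Java 8 or later; UriToUrl is a hypothetical name. It deliberately avoids URL.equals, which may resolve host names over the network.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

final class UriToUrl {
    // Describes what URI.toURL() does for a given URI string; no network access is performed.
    static String describe(String s) {
        try {
            URL url = URI.create(s).toURL();
            return "URL " + url;
        } catch (IllegalArgumentException e) {
            return "not absolute";        // a URL must specify a scheme
        } catch (MalformedURLException e) {
            return "no protocol handler"; // no stream handler exists for the scheme
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("http://java.sun.com/j2se/1.3/")); // URL http://java.sun.com/j2se/1.3/
        System.out.println(describe("docs/guide/index.html"));          // not absolute
        System.out.println(describe("urn:isbn:096139210x"));            // no protocol handler
    }
}
```

The URN is a perfectly good URI, but since the JDK ships no "urn" handler it cannot become a URL.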

java.net
Class URL

public final class URL extends Object implements Serializable

Class URL represents a Uniform Resource Locator, a pointer to a "resource" on the World Wide Web. A resource can be something as simple as a file or a directory, or it can be a reference to a more complicated object, such as a query to a database or to a search engine.

The URL class does not itself encode or decode any URL components according to the escaping mechanism defined in RFC 2396. It is the responsibility of the caller to encode any fields that need to be escaped prior to calling URL, and also to decode any escaped fields that are returned from URL. Furthermore, because URL has no knowledge of URL escaping, it does not recognize equivalence between the encoded and decoded forms of the same URL. For example, the two URLs:

    http://foo.com/hello world/ and http://foo.com/hello%20world

will be considered to be unequal.

Note that the URI class does perform escaping of its component fields in certain circumstances. The recommended way to manage the encoding and decoding of URLs is to use URI, and to convert between these two classes using toURI() and URI.toURL().
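A sketch of that recommendation, assuming Java 8 or later. UrlEscaping is a hypothetical name and foo.com is the placeholder host from the example above. URI does the quoting; the resulting URL keeps the escape verbatim and does not decode it.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

final class UrlEscaping {
    static String[] demo() {
        try {
            // The multi-argument URI constructor quotes the space as %20.
            URI uri = new URI("http", "foo.com", "/hello world/", null);
            URL url = uri.toURL(); // no network access happens here
            return new String[] {
                url.toString(),                          // http://foo.com/hello%20world/
                url.getPath(),                           // URL does not decode: /hello%20world/
                uri.getPath(),                           // URI decodes: /hello world/
                String.valueOf(url.toURI().equals(uri))  // round trip back to URI: true
            };
        } catch (URISyntaxException | MalformedURLException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (String s : demo()) {
            System.out.println(s);
        }
    }
}
```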

The URLEncoder and URLDecoder classes can also be used, but only for HTML form encoding, which is not the same as the encoding scheme defined in RFC 2396.
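The two encodings differ visibly on spaces and slashes. A sketch, assuming Java 10 or later for the URLEncoder.encode(String, Charset) overload; FormVsUriEncoding and its method names are hypothetical. The URI side builds a relative, path-only URI and takes its toASCIIString().

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class FormVsUriEncoding {
    // HTML form encoding (application/x-www-form-urlencoded): space -> '+', '/' -> %2F.
    static String formEncode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8); // Charset overload: Java 10+
    }

    // RFC 2396-style quoting via URI: space -> %20, '/' kept as a path separator,
    // non-US-ASCII characters encoded as UTF-8 octets by toASCIIString().
    static String uriEncodePath(String path) {
        try {
            return new URI(null, null, path, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String s = "hello world/\u20AC";
        System.out.println(formEncode(s));    // hello+world%2F%E2%82%AC
        System.out.println(uriEncodePath(s)); // hello%20world/%E2%82%AC
    }
}
```

A '+' in a path is a literal plus sign under RFC 2396, which is why form-encoded text should not be pasted into a URI path.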

Type	Method and description
URLConnection	openConnection() Returns a URLConnection instance that represents a connection to the remote object referred to by the URL.
URLConnection	openConnection(Proxy proxy) Same as openConnection(), except that the connection will be made through the specified proxy; protocol handlers that do not support proxying will ignore the proxy parameter and make a normal connection.
InputStream	openStream() Opens a connection to this URL and returns an InputStream for reading from that connection.
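These methods need a reachable resource. To avoid the network, this sketch reads a temporary file through a file: URL with openStream(). It assumes Java 9 or later for InputStream.readAllBytes; OpenStreamDemo is a hypothetical name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class OpenStreamDemo {
    // Reads everything from url.openStream(); try-with-resources closes the stream.
    static String readAll(URL url) throws IOException {
        try (InputStream in = url.openStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8); // Java 9+
        }
    }

    // Writes text to a temporary file, reads it back through a file: URL, then deletes the file.
    static String roundTrip(String text) {
        try {
            Path tmp = Files.createTempFile("url-demo", ".txt");
            try {
                Files.write(tmp, text.getBytes(StandardCharsets.UTF_8));
                URL url = tmp.toUri().toURL(); // Path -> URI -> URL
                return readAll(url);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello from a file: URL"));
    }
}
```

For an http: URL the same readAll call would open a network connection, and openConnection() returns the URLConnection if headers or timeouts need configuring before reading.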

This article is an English version of an article originally published in Chinese on aliyun.com and is provided for information purposes only.
