java.net
Class URI
public final class URI
extends Object implements Comparable<URI>, Serializable
Represents a Uniform Resource Identifier (URI) reference.
Aside from some minor deviations noted below, an instance of this class represents a URI reference as defined by RFC 2396: Uniform Resource Identifiers (URI): Generic Syntax, amended by RFC 2732: Format for Literal IPv6 Addresses in URLs. The literal IPv6 address format also supports scope_ids; their syntax and usage are described in the Inet6Address class documentation. This class provides constructors for creating URI instances from their components or by parsing their string forms, methods for accessing the various components of an instance, and methods for normalizing, resolving, and relativizing URI instances. Instances of this class are immutable.
URI syntax and components
At the highest level a URI reference (hereinafter simply "URI") in string form has the syntax
[scheme:]scheme-specific-part[#fragment]
where square brackets [...] delineate optional components and the characters : and # stand for themselves.
An absolute URI specifies a scheme; a URI that is not absolute is said to be relative. URIs are also classified according to whether they are opaque or hierarchical.
An opaque URI is an absolute URI whose scheme-specific part does not begin with a slash character ('/'). Opaque URIs are not subject to further parsing. Some examples of opaque URIs are:
mailto:java-net@java.sun.com
news:comp.lang.java
urn:isbn:096139210x
A hierarchical URI is either an absolute URI whose scheme-specific part begins with a slash character, or a relative URI, that is, a URI that does not specify a scheme. Some examples of hierarchical URIs are:
http://java.sun.com/j2se/1.3/
docs/guide/collections/designfaq.html#28
../../../demo/jfc/SwingSet2/src/SwingSet2.java
file:///~/calendar
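The two classifications above can be checked directly with the isAbsolute() and isOpaque() methods of java.net.URI. The following is a minimal sketch; the class name UriKindDemo and the helper classify are hypothetical names chosen for illustration.

```java
import java.net.URI;

class UriKindDemo {
    /** Labels a URI string as absolute or relative, and as opaque or hierarchical. */
    static String classify(String text) {
        URI uri = URI.create(text);
        String absoluteness = uri.isAbsolute() ? "absolute" : "relative";
        String kind = uri.isOpaque() ? "opaque" : "hierarchical";
        return absoluteness + " " + kind;
    }

    public static void main(String[] args) {
        // The sample URIs are the ones listed in the text above.
        String[] samples = {
            "mailto:java-net@java.sun.com",
            "news:comp.lang.java",
            "urn:isbn:096139210x",
            "http://java.sun.com/j2se/1.3/",
            "docs/guide/collections/designfaq.html#28",
            "../../../demo/jfc/SwingSet2/src/SwingSet2.java",
            "file:///~/calendar"
        };
        for (String sample : samples) {
            System.out.println(sample + " -> " + classify(sample));
        }
    }
}
```

Every opaque URI is reported as absolute, since an opaque URI always specifies a scheme.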
A hierarchical URI is subject to further parsing according to the syntax
[scheme:][//authority][path][?query][#fragment]
where the characters :, /, ?, and # stand for themselves. The scheme-specific part of a hierarchical URI consists of the characters between the scheme and fragment components.
The authority component of a hierarchical URI is, if specified, either server-based or registry-based. A server-based authority parses according to the familiar syntax
[user-info@]host[:port]
where the characters @ and : stand for themselves. Nearly all URI schemes currently in use are server-based. An authority component that does not parse in this way is considered to be registry-based.
The path component of a hierarchical URI is itself said to be absolute if it begins with a slash character ('/'); otherwise it is relative. The path of a hierarchical URI that is either absolute or specifies an authority is always absolute.
All told, then, a URI instance has the following nine components:
Component | Type
scheme | String
scheme-specific-part | String
authority | String
user-info | String
host | String
port | int
path | String
query | String
fragment | String
In a given instance any particular component is either undefined or defined with a distinct value. Undefined string components are represented by null, while undefined integer components are represented by -1. A string component may be defined to have the empty string as its value; this is not equivalent to that component being undefined.
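As a minimal sketch of how defined and undefined components surface through the accessor methods, the helper below collects all nine components of a parsed URI into a map. The class name UriComponentsDemo, the helper components, and the sample URI with user "user" and port 8080 are assumptions made for illustration.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

class UriComponentsDemo {
    /** Collects the nine components; undefined ones appear as null (strings) or -1 (port). */
    static Map<String, Object> components(String text) {
        URI uri = URI.create(text);
        Map<String, Object> parts = new LinkedHashMap<>();
        parts.put("scheme", uri.getScheme());
        parts.put("scheme-specific-part", uri.getSchemeSpecificPart());
        parts.put("authority", uri.getAuthority());
        parts.put("user-info", uri.getUserInfo());
        parts.put("host", uri.getHost());
        parts.put("port", uri.getPort());
        parts.put("path", uri.getPath());
        parts.put("query", uri.getQuery());
        parts.put("fragment", uri.getFragment());
        return parts;
    }

    public static void main(String[] args) {
        // A fully specified hierarchical URI (hypothetical user and port).
        System.out.println(components("http://user@java.sun.com:8080/j2se/1.3/?q=1#top"));
        // A relative URI: scheme, authority and host are null, port is -1.
        System.out.println(components("docs/guide/index.html"));
        // An opaque URI: only scheme and scheme-specific part are defined here.
        System.out.println(components("mailto:java-net@java.sun.com"));
        // A defined but empty path, which is not the same as an undefined one.
        System.out.println(components("http://java.sun.com"));
    }
}
```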
Whether a particular component is or is not defined in an instance depends upon the type of the URI being represented. An absolute URI has a scheme component. An opaque URI has a scheme, a scheme-specific part, and possibly a fragment, but has no other components. A hierarchical URI always has a path (though it may be empty) and a scheme-specific part (which at least contains the path), and may have any of the other components. If the authority component is present and is server-based then the host component will be defined and the user-information and port components may be defined.
Operations on URI instances
The key operations supported by this class are those of normalization, resolution, and relativization.
Normalization is the process of removing unnecessary "." and ".." segments from the path component of a hierarchical URI. Each "." segment is simply removed. A ".." segment is removed only if it is preceded by a non-".." segment. Normalization has no effect upon opaque URIs.
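A short sketch of these rules using the normalize() method follows. The class name UriNormalizeDemo and the sample paths are hypothetical, chosen only to exercise each rule.

```java
import java.net.URI;

class UriNormalizeDemo {
    /** Parses the text, normalizes the path, and returns the string form. */
    static String normalize(String text) {
        return URI.create(text).normalize().toString();
    }

    public static void main(String[] args) {
        // "." is dropped, and "1.3/.." cancels out.
        System.out.println(normalize("http://java.sun.com/j2se/./1.3/../1.4/docs/"));
        // Leading ".." segments have no preceding non-".." segment, so they stay.
        System.out.println(normalize("../../demo/./jfc"));
        // Opaque URIs are returned unchanged.
        System.out.println(normalize("mailto:java-net@java.sun.com"));
    }
}
```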
Resolution is the process of resolving one URI against another, base URI. The resulting URI is constructed from components of both URIs in the manner specified by RFC 2396, taking components from the base URI for those not specified in the original. For hierarchical URIs, the path of the original is resolved against the path of the base and then normalized. The result, for example, of resolving
docs/guide/collections/designfaq.html#28 (1)
against the base URI http://java.sun.com/j2se/1.3/ is the result URI
http://java.sun.com/j2se/1.3/docs/guide/collections/designfaq.html#28
Resolving the relative URI
../../../demo/jfc/SwingSet2/src/SwingSet2.java (2)
against this result yields, in turn,
http://java.sun.com/j2se/1.3/demo/jfc/SwingSet2/src/SwingSet2.java
Resolution of both absolute and relative URIs, and of both absolute and relative paths in the case of hierarchical URIs, is supported. Resolving the URI file:///~calendar against any other URI simply yields the original URI, since it is absolute. Resolving the relative URI (2) above against the relative base URI (1) yields the normalized, yet still relative, URI
demo/jfc/SwingSet2/src/SwingSet2.java
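The resolution examples above can be reproduced with the resolve method. This is a minimal sketch; the class name UriResolveDemo and the helper resolve are hypothetical.

```java
import java.net.URI;

class UriResolveDemo {
    /** Resolves the given URI string against the base and returns the string form. */
    static String resolve(String base, String given) {
        return URI.create(base).resolve(given).toString();
    }

    public static void main(String[] args) {
        String base = "http://java.sun.com/j2se/1.3/";
        String first = resolve(base, "docs/guide/collections/designfaq.html#28");
        System.out.println(first);

        // Resolve relative URI (2) against the previous result.
        String relative2 = "../../../demo/jfc/SwingSet2/src/SwingSet2.java";
        System.out.println(resolve(first, relative2));

        // An absolute URI resolves to itself regardless of the base.
        System.out.println(resolve(base, "file:///~calendar"));

        // Relative against relative: normalized, but still relative.
        System.out.println(resolve("docs/guide/collections/designfaq.html#28", relative2));
    }
}
```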
Relativization, finally, is the inverse of resolution: for any two normalized URIs u and v,
u.relativize(u.resolve(v)).equals(v) and
u.resolve(u.relativize(v)).equals(v).
This operation is often useful when constructing a document containing URIs that must be made relative to the base URI of the document wherever possible. For example, relativizing the URI
http://java.sun.com/j2se/1.3/docs/guide/index.html
against the base URI
http://java.sun.com/j2se/1.3
yields the relative URI docs/guide/index.html.
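The relativization example and the two identities can be sketched as follows. The class name UriRelativizeDemo and its helpers are hypothetical, and the round-trip check assumes a base URI whose path ends in a slash.

```java
import java.net.URI;

class UriRelativizeDemo {
    /** Relativizes the target against the base and returns the string form. */
    static String relativize(String base, String target) {
        return URI.create(base).relativize(URI.create(target)).toString();
    }

    /** Checks both identities from the text for a base u and a relative URI v. */
    static boolean roundTrips(String base, String relative) {
        URI u = URI.create(base);
        URI v = URI.create(relative);
        return u.relativize(u.resolve(v)).equals(v)
            && u.resolve(u.relativize(v)).equals(v);
    }

    public static void main(String[] args) {
        System.out.println(relativize(
            "http://java.sun.com/j2se/1.3",
            "http://java.sun.com/j2se/1.3/docs/guide/index.html"));

        // A URI under a different authority cannot be relativized; it comes back unchanged.
        System.out.println(relativize(
            "http://java.sun.com/j2se/1.3",
            "http://example.com/other"));

        System.out.println(roundTrips("http://java.sun.com/j2se/1.3/", "docs/guide/index.html"));
    }
}
```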
Character categories
RFC 2396 specifies precisely which characters are permitted in the various components of a URI reference. The following categories, most of which are taken from that specification, are used below to describe these constraints:
alpha | The US-ASCII alphabetic characters, 'A' through 'Z' and 'a' through 'z'
digit | The US-ASCII decimal digit characters, '0' through '9'
alphanum | All alpha and digit characters
unreserved | All alphanum characters together with those in the string "_-!.~'()*"
punct | The characters in the string ",;:$&+="
reserved | All punct characters together with those in the string "?/[]@"
escaped | Escaped octets, that is, triplets consisting of the percent character ('%') followed by two hexadecimal digits ('0'-'9', 'A'-'F', and 'a'-'f')
other | The Unicode characters that are not in the US-ASCII character set, are not control characters (according to the Character.isISOControl method), and are not space characters (according to the Character.isSpaceChar method) (deviation from RFC 2396, which is limited to US-ASCII)
The set of all legal URI characters consists of the unreserved, reserved, escaped, and other characters.
Escaped octets, quotation, encoding, and decoding
RFC 2396 allows escaped octets to appear in the user-information, path, query, and fragment components. Escaping serves two purposes in URIs:
To encode non-US-ASCII characters when a URI is required to conform strictly to RFC 2396 by not containing any other characters.
To quote characters that are otherwise illegal in a component. The user-information, path, query, and fragment components differ slightly in terms of which characters are considered legal and illegal.
These purposes are served in this class by three related operations:
A character is encoded by replacing it with the sequence of escaped octets that represent that character in the UTF-8 character set. The Euro currency symbol ('\u20AC'), for example, is encoded as "%E2%82%AC". (Deviation from RFC 2396, which does not specify any particular character set.)
An illegal character is quoted simply by encoding it. The space character, for example, is quoted by replacing it with "%20". UTF-8 contains US-ASCII, hence for US-ASCII characters this transformation has exactly the effect required by RFC 2396.
A sequence of escaped octets is decoded by replacing it with the sequence of characters that it represents in the UTF-8 character set. UTF-8 contains US-ASCII, hence decoding has the effect of de-quoting any quoted US-ASCII characters as well as that of decoding any encoded non-US-ASCII characters. If a decoding error occurs when decoding the escaped octets then the erroneous octets are replaced by '\uFFFD', the Unicode replacement character.
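The three operations can be observed side by side in a small sketch. It builds a URI from components with the four-argument constructor URI(scheme, host, path, fragment), then compares the quoted, raw, decoded, and fully encoded forms. The class name UriEscapingDemo, the helper forms, and the host example.com are assumptions made for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

class UriEscapingDemo {
    /**
     * Builds http://host/path from components and returns, in order:
     * toString(), getRawPath(), getPath(), toASCIIString().
     */
    static String[] forms(String host, String path) {
        try {
            URI uri = new URI("http", host, path, null);
            return new String[] {
                uri.toString(),      // illegal characters quoted, other characters kept
                uri.getRawPath(),    // raw form, escaped octets left as they are
                uri.getPath(),       // escaped octets decoded
                uri.toASCIIString()  // other characters encoded as UTF-8 octets
            };
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // A path containing a space (illegal) and the Euro sign (an "other" character).
        for (String form : forms("example.com", "/docs/a b/\u20AC.html")) {
            System.out.println(form);
        }
        // Parsing a string that already contains an escaped octet, then decoding it.
        System.out.println(URI.create("http://example.com/a%20b").getPath());
    }
}
```

Because the constructor used here takes separate components, a literal '%' passed to it is itself quoted rather than treated as the start of an escaped octet.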
These operations are exposed in the constructors and methods of this class as follows:
The single-argument constructor requires any illegal characters in its argument to be quoted and preserves any escaped octets and other characters that are present.
The multi-argument constructors quote illegal characters as required by the components in which they appear. The percent character ('%') is always quoted by these constructors. Any other characters are preserved.
The getRawUserInfo, getRawPath, getRawQuery, getRawFragment, getRawAuthority, and getRawSchemeSpecificPart methods return the values of their corresponding components in raw form, without interpreting any escaped octets. The strings returned by these methods may contain both escaped octets and other characters, and will not contain any illegal characters.
The getUserInfo, getPath, getQuery, getFragment, getAuthority, and getSchemeSpecificPart methods decode any escaped octets in their corresponding components. The strings returned by these methods may contain both other characters and illegal characters, and will not contain any escaped octets.
The toString method returns a URI string with all necessary quotation but which may contain other characters.
The toASCIIString method returns a fully quoted and encoded URI string that does not contain any other characters.
Identities
For any URI u, it is always the case that
new URI(u.toString()).equals(u).
For any URI u that does not contain redundant syntax such as two slashes before an empty authority (as in file:///tmp/) or a colon following a host name but no port (as in http://java.sun.com:), and that does not encode characters except those that must be quoted, the following identities also hold:
new URI(u.getScheme(),
        u.getSchemeSpecificPart(),
        u.getFragment())
.equals(u)
in all cases,
new URI(u.getScheme(),
        u.getAuthority(),
        u.getPath(), u.getQuery(),
        u.getFragment())
.equals(u)
if u is hierarchical, and
new URI(u.getScheme(),
        u.getUserInfo(), u.getHost(), u.getPort(),
        u.getPath(), u.getQuery(),
        u.getFragment())
.equals(u)
if u is hierarchical and has either no authority or a server-based authority.
URIs, URLs, and URNs
A URI is a uniform resource identifier while a URL is a uniform resource locator. Hence every URL is a URI, abstractly speaking, but not every URI is a URL. This is because there is another subcategory of URIs, uniform resource names (URNs), which name resources but do not specify how to locate them. The mailto, news, and isbn URIs shown above are examples of URNs.
The conceptual distinction between URIs and URLs is reflected in the differences between this class and the URL class.
An instance of this class represents a URI reference in the syntactic sense defined by RFC 2396. A URI may be either absolute or relative. A URI string is parsed according to the generic syntax without regard to the scheme, if any, that it specifies. No lookup of the host, if any, is performed, and no scheme-dependent stream handler is constructed. Equality, hashing, and comparison are defined strictly in terms of the character content of the instance. In other words, a URI instance is little more than a structured string that supports the syntactic, scheme-independent operations of comparison, normalization, resolution, and relativization.
An instance of the URL class, by contrast, represents the syntactic components of a URL together with some of the information required to access the resource that it describes. A URL must be absolute, that is, it must always specify a scheme. A URL string is parsed according to its scheme. A stream handler is always established for a URL, and in fact it is impossible to create a URL instance for a scheme for which no handler is available. Equality and hashing depend upon both the scheme and the Internet address of the host, if any; comparison is not defined. In other words, a URL is a structured string that supports the syntactic operation of resolution as well as the network I/O operations of looking up the host and opening a connection to the specified resource.
java.net
Class URL
public final class URL
extends Object implements Serializable
Class URL represents a Uniform Resource Locator, a pointer to a "resource" on the Internet. A resource can be something as simple as a file or a directory, or it can be a reference to a more complicated object, such as a query to a database or to a search engine.
The URL class does not itself encode or decode any URL components according to the escaping mechanism defined in RFC 2396. It is the responsibility of the caller to encode any fields that need to be escaped prior to calling URL, and also to decode any escaped fields that are returned from URL. Furthermore, because URL has no knowledge of URL escaping, it does not recognize equivalence between the encoded and decoded forms of the same URL. For example, the two URLs
http://foo.com/hello world/ and http://foo.com/hello%20world
would be considered not equal to each other.
Note that the URI class does perform escaping of its component fields in certain circumstances. The recommended way to manage the encoding and decoding of URLs is to use URI, and to convert between these two classes using toURI() and URI.toURL().
The URLEncoder and URLDecoder classes can also be used, but only for HTML form encoding, which is not the same as the encoding scheme defined in RFC 2396.
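The recommended conversion can be sketched as follows: a URI built from components takes care of the quoting, URI.toURL() produces the URL, and URL.toURI() recovers a URI whose getPath() decodes the path again. The class name UriUrlConversionDemo and its helpers are hypothetical, and the URLEncoder.encode(String, Charset) overload used for comparison assumes Java 10 or later.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class UriUrlConversionDemo {
    /** Lets URI quote the path, then converts the result to a URL. No connection is opened. */
    static URL toUrl(String host, String path) {
        try {
            return new URI("http", host, path, null).toURL();
        } catch (URISyntaxException | MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Converts the URL back to a URI and returns the decoded path. */
    static String decodedPath(URL url) {
        try {
            return url.toURI().getPath();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** HTML form encoding, shown only to contrast it with RFC 2396 escaping. */
    static String formEncode(String text) {
        return URLEncoder.encode(text, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        URL url = toUrl("foo.com", "/hello world/");
        System.out.println(url);               // the space is quoted as %20
        System.out.println(url.getPath());     // URL returns the path still escaped
        System.out.println(decodedPath(url));  // URI decodes it
        System.out.println(formEncode("hello world")); // form encoding uses '+' for a space
    }
}
```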
URLConnection | openConnection(): Returns a URLConnection object that represents a connection to the remote object referred to by the URL.
URLConnection | openConnection(Proxy proxy): Same as openConnection(), except that the connection is made through the specified proxy; protocol handlers that do not support proxying ignore the proxy parameter and make a normal connection.
InputStream | openStream(): Opens a connection to this URL and returns an InputStream for reading from that connection.
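A minimal sketch of openStream() and openConnection() follows. To stay runnable without network access it reads from a file: URL pointing at a temporary file rather than from a remote host. The class name UrlOpenStreamDemo and its helpers are hypothetical, and InputStream.readAllBytes() assumes Java 9 or later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class UrlOpenStreamDemo {
    /** Reads the whole resource, either via openStream() or via openConnection(). */
    static String readAll(URL url, boolean viaConnection) throws IOException {
        try (InputStream in = viaConnection
                ? url.openConnection().getInputStream()
                : url.openStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    /** Writes text to a temporary file, then reads it back through a file: URL. */
    static String roundTrip(String text, boolean viaConnection) {
        try {
            Path file = Files.createTempFile("url-demo", ".txt");
            try {
                Files.write(file, text.getBytes(StandardCharsets.UTF_8));
                return readAll(file.toUri().toURL(), viaConnection);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("read with openStream()", false));
        System.out.println(roundTrip("read with openConnection()", true));
    }
}
```

The same readAll helper would work for an http URL, at the cost of depending on the network and the remote host.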